API Documentation

Transcribe audio and video files programmatically. Real-time streaming, speaker diarization, AI summarization, 100+ languages.

Overview

The STT.ai API provides speech-to-text transcription, real-time streaming, and AI-powered summarization. All requests go directly to our GPU-powered API server.

Base URL: https://api.stt.ai
Languages: 100+
Models: 5 (STT.ai Enhanced, Whisper Turbo, Large V3, Medium, Small)
Real-time: WebSocket streaming

Supported input formats: MP3, WAV, FLAC, OGG, M4A, AAC, OPUS, WMA, MP4, WebM, MKV, AVI, MOV, WMV, MPEG. Maximum file size: 2 GB.

Authentication

Send your API key in the Authorization header as a Bearer token:

Authorization: Bearer YOUR_API_KEY

Get your API key from your Account Settings. Anonymous requests are allowed, with a limit of 3 transcriptions per day per IP.
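As a minimal sketch of the rule above, the helper below builds the request headers and falls back to an anonymous request when no key is supplied. The function name `auth_headers` is hypothetical and not part of any official SDK.

```python
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Return HTTP headers for an STT.ai request.

    Without a key the request is sent anonymously, which the docs limit
    to 3 transcriptions per day per IP.
    """
    if not api_key:
        return {}
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as the `headers` argument of any HTTP client.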

Rate Limits

Tier | Transcriptions | Max file size | Concurrent
Anonymous | 3/day per IP | 100 MB | 1
Free (registered) | 600 min/month | 500 MB | 2
Paid plans | Credit-based | 2 GB | 5

Credits are deducted based on audio duration: 1 credit = 1 minute of audio, rounded.
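The credit arithmetic can be estimated before uploading. The sketch below assumes that partial minutes are rounded up; the text above only says "rounded", so treat the direction as an assumption. `credits_for` is a hypothetical helper name.

```python
import math


def credits_for(duration_seconds: float) -> int:
    """Estimate the credits charged for a clip of the given length.

    Assumption: 1 credit per started minute (rounding up).
    """
    return math.ceil(duration_seconds / 60)
```

Under that assumption, the 125.4-second clip used in the response example further down would cost 3 credits.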


Endpoints

POST https://api.stt.ai/v1/transcribe

Upload an audio or video file for transcription with speaker diarization, language detection, and word-level timestamps.

Request Parameters

Send as multipart/form-data.

Parameter | Type | Required | Default | Description
file | file | Yes | - | Audio or video file
model | string | No | large-v3-turbo | Model: stt-ai-enhanced, large-v3-turbo, large-v3, medium, small. Call GET /v1/models for the live list with metadata.
language | string | No | auto | ISO 639-1 code or auto
diarize | boolean | No | true | Enable speaker diarization
speakers | integer | No | 0 | Expected speakers (0 = auto)
response_format | string | No | json | json, txt, srt, vtt
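Because multipart form fields are sent as strings, it can help to validate and encode the optional parameters in one place. This is a sketch only: `build_transcribe_fields` is a hypothetical helper, the allowed values are copied from the parameter list above, and the "true"/"false" encoding follows the cURL example in this document.

```python
ALLOWED_MODELS = {"stt-ai-enhanced", "large-v3-turbo", "large-v3", "medium", "small"}
ALLOWED_FORMATS = {"json", "txt", "srt", "vtt"}


def build_transcribe_fields(model="large-v3-turbo", language="auto",
                            diarize=True, speakers=0, response_format="json"):
    """Return the non-file form fields for POST /v1/transcribe."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    if speakers < 0:
        raise ValueError("speakers must be 0 (auto) or a positive integer")
    return {
        "model": model,
        "language": language,
        "diarize": "true" if diarize else "false",  # form fields are strings
        "speakers": str(speakers),
        "response_format": response_format,
    }
```

With the `requests` library, the result would go in `data=` while the upload itself goes in `files={"file": ...}`.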
Response (JSON)
{
  "text": "Hello, welcome to the meeting...",
  "language": "en",
  "duration": 125.4,
  "segments": [
    {
      "start": 0.0,
      "end": 3.2,
      "text": "Hello, welcome to the meeting.",
      "speaker": "Speaker 1",
      "confidence": 0.95,
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "welcome", "start": 0.5, "end": 0.9}
      ]
    }
  ],
  "speakers": ["Speaker 1", "Speaker 2"]
}
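As one illustration of consuming this response, the sketch below totals speaking time per speaker from the `segments` array. It relies only on the `start`, `end`, and `speaker` fields shown above; `talk_time_by_speaker` is a hypothetical helper name.

```python
from collections import defaultdict


def talk_time_by_speaker(result: dict) -> dict:
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for seg in result.get("segments", []):
        # Segments without a speaker label (e.g. diarize=false) are grouped together.
        totals[seg.get("speaker", "unknown")] += seg["end"] - seg["start"]
    return dict(totals)
```

Pass it the parsed JSON body of a `/v1/transcribe` response.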
POST https://api.stt.ai/v1/summarize

Summarize transcript text using an on-device LLM. No data leaves our servers.

Request Body (JSON)

Parameter | Type | Required | Description
text | string | Yes | Transcript text to summarize
style | string | No | brief (default), detailed, action_items, bullet_points

Response
{
  "summary": "The team discussed Q3 revenue growth of 15%...",
  "style": "brief",
  "model": "qwen2.5-1.5b-instruct"
}
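The request can be assembled with the Python standard library alone. The sketch below builds, but does not send, a `/v1/summarize` request and rejects styles that are not in the list above. `build_summarize_request` is a hypothetical name, and `urllib` is used here only to avoid a third-party dependency.

```python
import json
import urllib.request

SUMMARY_STYLES = {"brief", "detailed", "action_items", "bullet_points"}


def build_summarize_request(text, api_key, style="brief", base="https://api.stt.ai"):
    """Build a POST request for /v1/summarize without sending it."""
    if style not in SUMMARY_STYLES:
        raise ValueError(f"unknown style: {style}")
    body = json.dumps({"text": text, "style": style}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/v1/summarize",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response body should then contain the `summary`, `style`, and `model` fields shown above.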
WS wss://api.stt.ai/v1/stream

Real-time speech-to-text via WebSocket. Send raw PCM audio (16-bit, 16kHz, mono) and receive transcription updates instantly.

Protocol
  1. Connect to wss://api.stt.ai/v1/stream
  2. Send JSON config: {"language": "en", "model": "large-v3-turbo"}
  3. Wait for {"status": "ready"}
  4. Stream raw PCM Int16 audio chunks (binary frames)
  5. Receive JSON updates: {"text": "...", "partial": "..."}
  6. Send {"action": "stop"} to finalize
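The client-side pieces of this protocol can be sketched without a WebSocket library: the JSON config message, the conversion of float samples to 16-bit PCM, and splitting the audio into binary frames. Little-endian byte order and the 250 ms chunk size are assumptions (the protocol above does not state either), and all function names are hypothetical.

```python
import json
import struct

SAMPLE_RATE = 16000  # Hz, mono, as required by the endpoint
STOP_MESSAGE = json.dumps({"action": "stop"})


def config_message(language="en", model="large-v3-turbo") -> str:
    """Step 2: the JSON config sent as a text frame after connecting."""
    return json.dumps({"language": language, "model": model})


def float_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit PCM (little-endian assumed)."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, chunk_ms: int = 250):
    """Step 4: yield binary frames of roughly chunk_ms milliseconds each."""
    size = SAMPLE_RATE * 2 * chunk_ms // 1000  # 2 bytes per sample
    for i in range(0, len(pcm), size):
        yield pcm[i:i + size]
```

A client would send `config_message()`, wait for the ready status, send each chunk as a binary frame, and finish with `STOP_MESSAGE`.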
Server Messages

Field | Description
status | "ready": connection established, ready for audio
partial | Partial/interim transcript (updates as you speak)
text | Finalized transcript segment
is_final | true when stream is complete
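One way to consume these messages is a small state holder that accumulates finalized segments and tracks the latest partial. The sketch assumes each `text` value is a new segment rather than the cumulative transcript; the field table suggests this but does not guarantee it. `StreamTranscript` is a hypothetical class.

```python
import json


class StreamTranscript:
    """Accumulates server messages from the streaming endpoint."""

    def __init__(self):
        self.ready = False
        self.done = False
        self.final_segments = []
        self.partial = ""

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg.get("status") == "ready":
            self.ready = True
        if msg.get("text"):
            # Assumption: "text" carries one finalized segment per message.
            self.final_segments.append(msg["text"])
            self.partial = ""
        if "partial" in msg:
            self.partial = msg["partial"]
        if msg.get("is_final"):
            self.done = True

    @property
    def text(self) -> str:
        return " ".join(self.final_segments)
```

Call `handle()` for every text frame received and read `text` once `done` is set.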
GET https://api.stt.ai/v1/models

List all available transcription models. No authentication required.

{
  "models": [
    {"id": "large-v3-turbo", "name": "Whisper Large V3 Turbo", "languages": 99, "speed": "fast"},
    {"id": "large-v3", "name": "Whisper Large V3", "languages": 99, "speed": "standard"},
    {"id": "medium", "name": "Whisper Medium", "languages": 99, "speed": "fast"},
    {"id": "small", "name": "Whisper Small", "languages": 99, "speed": "very_fast"}
  ]
}
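A client might use this list to choose a model at runtime. The sketch below orders models by the `speed` label; the ranking very_fast, then fast, then standard is an assumption inferred from the label names, and `models_by_speed` is a hypothetical helper.

```python
# Assumed ordering of the "speed" labels, fastest first.
SPEED_RANK = {"very_fast": 0, "fast": 1, "standard": 2}


def models_by_speed(models_response: dict) -> list:
    """Return model ids ordered fastest first; unknown labels sort last."""
    ordered = sorted(
        models_response["models"],
        key=lambda m: SPEED_RANK.get(m.get("speed"), len(SPEED_RANK)),
    )
    return [m["id"] for m in ordered]
```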
GET https://api.stt.ai/v1/languages

List all supported languages with ISO codes. No authentication required.

{
  "languages": [
    {"code": "en", "name": "english"},
    {"code": "es", "name": "spanish"},
    {"code": "fr", "name": "french"},
    ...
  ]
}
POST https://api.stt.ai/v1/translate

Translate text to 450+ languages using MadLAD-400 (Apache 2.0). Runs on-device — no third-party APIs.

Request Body (JSON)

Parameter | Type | Required | Description
text | string | Yes* | Single text to translate
texts | array | Yes* | Array of texts to batch translate
target | string | Yes | Target language code (e.g., "es", "fr", "zh")
source | string | No | Source language code (default: "en")

* Provide either text or texts.

Response
// Single text
{"translated_text": "Hola mundo", "source_language": "en", "target_language": "es"}

// Batch
{"translations": [{"translated_text": "Hola"}, {"translated_text": "Mundo"}]}
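Since the endpoint accepts either a single text or a batch and answers with two different shapes, a thin wrapper can normalize both directions. This sketch assumes exactly one of `text` and `texts` should be sent, which is how the asterisks in the parameter list read. Both function names are hypothetical.

```python
def build_translate_payload(target, text=None, texts=None, source=None) -> dict:
    """Build the JSON body for POST /v1/translate."""
    if (text is None) == (texts is None):
        raise ValueError("pass exactly one of text or texts")
    payload = {"target": target}
    if text is not None:
        payload["text"] = text
    else:
        payload["texts"] = list(texts)
    if source:
        payload["source"] = source  # server default is "en"
    return payload


def extract_translations(response: dict) -> list:
    """Return translated strings from a single or batch response."""
    if "translations" in response:
        return [item["translated_text"] for item in response["translations"]]
    return [response["translated_text"]]
```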
POST https://api.stt.ai/v1/analyze

Analyze transcript text: sentiment, topics, entities, action items, questions, PII redaction.

Request Body (JSON)
Parameter | Type | Description
text | string | Text to analyze
type | string | sentiment, topics, entities, action_items, questions, pii_redact
POST https://api.stt.ai/v1/generate

Generate content from transcript text: blog posts, social media, meeting notes, study guides, flashcards, quizzes.

Request Body (JSON)
Parameter | Type | Description
text | string | Transcript text
type | string | blog_post, social_media, newsletter, key_quotes, show_notes, meeting_notes, study_guide, flashcards, quiz, chapter_markers
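Both /v1/analyze and /v1/generate take the same two fields, so one helper can validate the `type` value against the lists documented for each endpoint. The helper name `build_text_task` is hypothetical, and the sets below are copied from this page rather than fetched from the API.

```python
ANALYZE_TYPES = {"sentiment", "topics", "entities", "action_items",
                 "questions", "pii_redact"}
GENERATE_TYPES = {"blog_post", "social_media", "newsletter", "key_quotes",
                  "show_notes", "meeting_notes", "study_guide", "flashcards",
                  "quiz", "chapter_markers"}


def build_text_task(endpoint: str, text: str, task_type: str):
    """Return (url, json_body) for the analyze or generate endpoint."""
    allowed = {"analyze": ANALYZE_TYPES, "generate": GENERATE_TYPES}
    if endpoint not in allowed:
        raise ValueError(f"unknown endpoint: {endpoint}")
    if task_type not in allowed[endpoint]:
        raise ValueError(f"type {task_type!r} is not valid for /v1/{endpoint}")
    return f"https://api.stt.ai/v1/{endpoint}", {"text": text, "type": task_type}
```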
POST https://api.stt.ai/v1/enhance-audio

Remove noise and normalize audio. Returns the enhanced WAV file.

Send as multipart/form-data with a file field. Returns binary audio/wav.
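Because the response is binary WAV, it is worth checking the bytes before writing them to disk. The sketch below uses the standard-library `wave` module and assumes the returned file is plain PCM-encoded WAV, which this page does not specify. `wav_info` is a hypothetical helper.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Parse WAV bytes and report channel count, sample rate, and duration."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration": wav.getnframes() / rate,  # seconds
        }
```

A `wave.Error` raised here would indicate the body was not a WAV file, for example a JSON error response.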

POST https://api.stt.ai/v1/tts

Clone a voice from a reference audio clip and generate speech. Uses F5-TTS (MIT license).

Send as multipart/form-data:

Parameter | Type | Description
reference | file | 3-10 seconds of voice reference audio
text | string | Text to speak in the cloned voice

Returns binary audio/wav. Headers include X-Duration and X-Generation-Time.
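The two response headers can be combined into a real-time factor. The sketch assumes both headers carry plain decimal seconds; their exact format is not documented on this page, so verify against a real response first. `tts_timing` is a hypothetical helper.

```python
def tts_timing(headers) -> dict:
    """Read X-Duration and X-Generation-Time (assumed to be seconds)."""
    audio = float(headers["X-Duration"])
    generation = float(headers["X-Generation-Time"])
    return {
        "audio_seconds": audio,
        "generation_seconds": generation,
        # Below 1.0 means speech was generated faster than it plays back.
        "realtime_factor": generation / audio if audio else None,
    }
```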

POST https://api.stt.ai/v1/embed

Generate sentence embeddings for semantic search. 384-dimensional vectors from all-MiniLM-L6-v2.

Request Body (JSON)
{"texts": ["Hello world", "How are you"]}
Response
{"embeddings": [[0.123, -0.456, ...], [...]], "dimensions": 384}
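For semantic search, the returned vectors are typically compared with cosine similarity. The following sketch ranks document embeddings against a query embedding using only the standard library; the function names are hypothetical, and the vectors would come from the `embeddings` array above.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs) -> list:
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```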
GET https://api.stt.ai/health

Check GPU and API health. No authentication required.

{
  "status": "ok",
  "gpu_available": true,
  "gpu_name": "NVIDIA A100",
  "gpu_memory_mb": 8188
}

REST API (Django)

Manage your account, transcripts, API keys, and more via the Django REST API at https://stt.ai/api/.

GET PUT https://stt.ai/api/v1/account/

Get or update your account info, email preferences, credits, plan details.

GET https://stt.ai/api/v1/transcripts/

List your transcripts with pagination. Filter by status, language, date.

GET DELETE https://stt.ai/api/v1/transcripts/:id/

Get transcript detail with segments, or delete a transcript.

GET https://stt.ai/api/v1/transcripts/:id/export/:format/

Export as txt, srt, vtt, json, csv, docx, or pdf.
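The export URL pattern can be filled in with a small helper that rejects formats outside the documented list. `export_url` is a hypothetical name, and the type of the transcript id is not stated here, so the sketch accepts anything that formats as a string.

```python
EXPORT_FORMATS = {"txt", "srt", "vtt", "json", "csv", "docx", "pdf"}


def export_url(transcript_id, fmt: str) -> str:
    """Build the export URL for a transcript in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    return f"https://stt.ai/api/v1/transcripts/{transcript_id}/export/{fmt}/"
```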

POST https://stt.ai/api/v1/transcripts/:id/chat/

Ask AI questions about a transcript. Uses RAG with semantic search + Qwen2.5 LLM.

// Request
{"question": "What were the action items?", "session_id": "optional"}

// Response
{"answer": "...", "sources": [{"segment_order": 5, "text": "...", "score": 0.92}]}
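The `sources` array makes it possible to show which segments an answer was drawn from. This sketch formats an answer with its sources ordered by score. `format_answer` and the `min_score` cutoff are illustrative and not part of the API.

```python
def format_answer(response: dict, min_score: float = 0.0) -> str:
    """Render a chat response as text, listing sources by descending score."""
    lines = [response["answer"]]
    sources = sorted(response.get("sources", []),
                     key=lambda s: s["score"], reverse=True)
    for src in sources:
        if src["score"] >= min_score:
            lines.append(
                f"  [segment {src['segment_order']}, score {src['score']:.2f}] {src['text']}"
            )
    return "\n".join(lines)
```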
POST https://stt.ai/api/v1/transcripts/:id/analyze/

Analyze transcript: sentiment, topics, entities, action_items, questions.

POST https://stt.ai/api/v1/transcripts/:id/generate/

Generate content: blog_post, social_media, meeting_notes, study_guide, flashcards, quiz.

GET POST https://stt.ai/api/v1/keys/

List or create API keys. POST returns the raw key once.

DELETE https://stt.ai/api/v1/keys/:id/

Revoke an API key.

GET https://stt.ai/api/v1/usage/

30-day usage breakdown by day.

GET https://stt.ai/api/v1/cloud/

List Private Cloud instances (if subscribed).


Code Examples

cURL
# Transcribe a file
curl -X POST https://api.stt.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@meeting.mp3" \
  -F "model=large-v3-turbo" \
  -F "language=auto" \
  -F "diarize=true"

# Get SRT subtitles
curl -X POST https://api.stt.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=srt" \
  -o subtitles.srt

# Summarize text
curl -X POST https://api.stt.ai/v1/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Full transcript text here...", "style": "brief"}'

# List models (no auth needed)
curl https://api.stt.ai/v1/models

# Health check
curl https://api.stt.ai/health
Python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.stt.ai"

# Transcribe a file
with open("meeting.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/v1/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("meeting.mp3", f, "audio/mpeg")},
        data={"model": "large-v3-turbo", "language": "auto", "diarize": "true"},
    )

resp.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body
result = resp.json()
print(f"Language: {result['language']}, Duration: {result['duration']:.1f}s")

for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s] {seg.get('speaker', '')}: {seg['text']}")

# Summarize the transcript
summary = requests.post(
    f"{BASE}/v1/summarize",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": result["text"], "style": "bullet_points"},
).json()
print(summary["summary"])
Node.js
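// Requires Node.js 18+ (built-in fetch, FormData, and Blob).
const fs = require("fs");
const path = require("path");

const API_KEY = "YOUR_API_KEY";
const BASE = "https://api.stt.ai";

async function transcribe(filePath) {
  // Use the built-in FormData: the "form-data" npm package does not
  // serialize correctly as a body for Node's native fetch.
  const form = new FormData();
  const audio = new Blob([await fs.promises.readFile(filePath)]);
  form.append("file", audio, path.basename(filePath));
  form.append("model", "large-v3-turbo");
  form.append("language", "auto");
  form.append("diarize", "true");

  const resp = await fetch(`${BASE}/v1/transcribe`, {
    method: "POST",
    // fetch sets the multipart Content-Type (with boundary) from the body.
    headers: { Authorization: `Bearer ${API_KEY}` },
    body: form,
  });
  if (!resp.ok) {
    throw new Error(`HTTP ${resp.status}: ${await resp.text()}`);
  }

  const result = await resp.json();
  console.log(`Duration: ${result.duration.toFixed(1)}s`);

  for (const seg of result.segments) {
    console.log(`[${seg.start.toFixed(1)}s] ${seg.speaker}: ${seg.text}`);
  }
  return result;
}

transcribe("meeting.mp3").catch(console.error);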
WebSocket (Browser)
// Real-time transcription from microphone
const ws = new WebSocket("wss://api.stt.ai/v1/stream");
ws.binaryType = "arraybuffer";

ws.onopen = () => {
  ws.send(JSON.stringify({ language: "auto", model: "large-v3-turbo" }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.status === "ready") {
    startStreaming(); // Begin sending audio
  }
  if (data.text) console.log("Final:", data.text);
  if (data.partial) console.log("Partial:", data.partial);
};

async function startStreaming() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (e) => {
    const float32 = e.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32767));
    }
    if (ws.readyState === 1) ws.send(int16.buffer);
  };

  source.connect(processor);
  processor.connect(ctx.destination);
}

// Stop: ws.send(JSON.stringify({ action: "stop" }));

Error Handling

The API returns standard HTTP status codes with JSON error bodies.

Status | Meaning | When
200 | OK | Request succeeded
400 | Bad Request | Missing file, unsupported format
401 | Unauthorized | Invalid or missing API key
402 | Payment Required | No credits remaining
429 | Too Many Requests | Rate limit exceeded (free tier)
503 | Service Unavailable | GPU temporarily unavailable
// Error response format
{"error": "No credits remaining. Upgrade your plan."}
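A client can turn these status codes into a single exception type and decide which failures are worth retrying. In this sketch, treating 429 and 503 as retryable and using exponential backoff are design assumptions; the API does not prescribe a retry policy. All names are hypothetical.

```python
import json

RETRYABLE = {429, 503}  # assumption: rate limits and GPU outages are transient


class SttApiError(Exception):
    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        self.retryable = status in RETRYABLE


def error_from_response(status: int, body: str):
    """Return an SttApiError for a failed response, or None on success."""
    if 200 <= status < 300:
        return None
    try:
        message = json.loads(body).get("error", body)
    except (ValueError, AttributeError):
        message = body  # body was not a JSON object
    return SttApiError(status, message)


def backoff_delays(attempts: int = 4, base: float = 1.0, cap: float = 30.0) -> list:
    """Seconds to wait before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A caller would retry only while `error.retryable` is true, sleeping for each value from `backoff_delays()` in turn, and surface 400, 401, and 402 errors immediately.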

SDKs and Libraries

Official SDKs for Python and Node.js. Install and start transcribing in minutes.

REST API
Works with any HTTP client

Ready to get started?

Sign up for free and get an API key in seconds. 600 minutes/month free.
