API Reference
Transcribe audio and video files programmatically. Real-time streaming, speaker diarization, AI summarization, 100+ languages.
Overview
The STT.ai API provides speech-to-text transcription, real-time streaming, and AI-powered summarization. All requests go directly to our GPU-powered API server.
https://api.stt.ai
You can upload audio and video files in a wide range of formats, including MP3, WAV, WMA, OGG, M4A, AAC, Opus, MP4, WebM, MKV, AVI, MOV, WMV, MPG, MPEG, FLV, 3GP, VOB, and ASF.
Authentication
Send your API key in the Authorization header as a bearer token:
Authorization: Bearer YOUR_API_KEY
Get your API key from your account settings. Anonymous requests are permitted, with a limit of 3 transcriptions per day per IP.
Rate Limits
| Tier | Transcriptions | Max File Size | Concurrent |
|---|---|---|---|
| Anonymous | 3/day per IP | 100 MB | 1 |
| Free (registered) | 600 min/month | 500 MB | 2 |
| Paid plans | Credit-based | 2 GB | 5 |
Credits are deducted based on audio duration: 1 credit = 1 minute of audio, rounded up.
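As a sanity check when estimating costs, the rounding rule above can be sketched in a few lines of Python (the function name is illustrative, not part of any SDK):

```python
import math

def credits_for(duration_seconds: float) -> int:
    """1 credit per minute of audio, rounded up (per the rule above)."""
    return math.ceil(duration_seconds / 60)

print(credits_for(125.4))  # a 125.4 s file costs 3 credits
```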
Endpoints
https://api.stt.ai/v1/transcribe
Upload an audio or video file for transcription with speaker diarization, language detection, and word-level timestamps.
Request Parameters
Send as multipart/form-data.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | — | Audio or video file |
| model | string | No | large-v3-turbo | Model: large-v3-turbo, large-v3, medium, small |
| language | string | No | auto | ISO 639-1 code or auto |
| diarize | boolean | No | true | Enable speaker diarization |
| speakers | integer | No | 0 | Expected speakers (0 = auto) |
| response_format | string | No | json | json, txt, srt, vtt |
Response (JSON)
{
  "text": "Hello, welcome to the meeting...",
  "language": "en",
  "duration": 125.4,
  "segments": [
    {
      "start": 0.0,
      "end": 3.2,
      "text": "Hello, welcome to the meeting.",
      "speaker": "Speaker 1",
      "confidence": 0.95,
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "welcome", "start": 0.5, "end": 0.9}
      ]
    }
  ],
  "speakers": ["Speaker 1", "Speaker 2"]
}
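If you request json but later need subtitles, the segments can be converted locally rather than re-uploading with response_format=srt. A minimal sketch against the response shape above (helper names are illustrative):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Build an SRT document from /v1/transcribe segments."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"

segments = [{"start": 0.0, "end": 3.2, "text": "Hello, welcome to the meeting."}]
print(segments_to_srt(segments))
```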
https://api.stt.ai/v1/summarize
Summarize transcript text using an on-device LLM. No data leaves our servers.
Request Body (JSON)
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Transcript text to summarize |
| style | string | No | brief (default), detailed, action_items, bullet_points |
Response
{
  "summary": "The team discussed Q3 revenue growth of 15%...",
  "style": "brief",
  "model": "qwen2.5-1.5b-instruct"
}
wss://api.stt.ai/v1/stream
Real-time speech-to-text via WebSocket. Send raw PCM audio (16-bit, 16kHz, mono) and receive transcription updates instantly.
Protocol
- Connect to wss://api.stt.ai/v1/stream
- Send JSON config: {"language": "en", "model": "large-v3-turbo"}
- Wait for {"status": "ready"}
- Stream raw PCM Int16 audio chunks (binary frames)
- Receive JSON updates: {"text": "...", "partial": "..."}
- Send {"action": "stop"} to finalize
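Audio captured as floating-point samples must be converted to 16-bit PCM before it is sent as binary frames. A dependency-free sketch of that conversion (the function name is illustrative):

```python
import struct

def floats_to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM,
    clamping out-of-range values -- the Int16 frames the stream expects."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)

chunk = floats_to_pcm16([0.0, 0.5, -1.0, 1.0])
print(len(chunk))  # 4 samples -> 8 bytes
```

Each chunk returned by this helper can be sent directly as a binary WebSocket frame after the ready message arrives.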
Server Messages
| Field | Description |
|---|---|
| status | "ready" — connection established, ready for audio |
| partial | Partial/interim transcript (updates as you speak) |
| text | Finalized transcript segment |
| is_final | true when stream is complete |
https://api.stt.ai/v1/models
List all available transcription models. No authentication required.
{
  "models": [
    {"id": "large-v3-turbo", "name": "Whisper Large V3 Turbo", "languages": 99, "speed": "fast"},
    {"id": "large-v3", "name": "Whisper Large V3", "languages": 99, "speed": "standard"},
    {"id": "medium", "name": "Whisper Medium", "languages": 99, "speed": "fast"},
    {"id": "small", "name": "Whisper Small", "languages": 99, "speed": "very_fast"}
  ]
}
https://api.stt.ai/v1/languages
List all supported languages with ISO codes. No authentication required.
{
  "languages": [
    {"code": "en", "name": "english"},
    {"code": "es", "name": "spanish"},
    {"code": "fr", "name": "french"},
    ...
  ]
}
https://api.stt.ai/v1/translate
Translate text to 450+ languages using MadLAD-400 (Apache 2.0). Runs on-device — no third-party APIs.
Request Body (JSON)
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes* | Single text to translate |
| texts | array | Yes* | Array of texts to batch translate |
| target | string | Yes | Target language code (e.g., "es", "fr", "zh") |
| source | string | No | Source language code (default: "en") |

*Provide either text or texts.
// Single text
{"translated_text": "Hola mundo", "source_language": "en", "target_language": "es"}
// Batch
{"translations": [{"translated_text": "Hola"}, {"translated_text": "Mundo"}]}
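The text/texts split can be handled with a small helper that builds the request body from either a single string or a list (the helper name is illustrative, not part of any SDK):

```python
def translate_payload(content, target: str, source: str = "en") -> dict:
    """Build a /v1/translate body: a list becomes "texts" (batch mode),
    a single string becomes "text"."""
    key = "texts" if isinstance(content, list) else "text"
    return {key: content, "target": target, "source": source}

print(translate_payload("Hello world", "es"))
print(translate_payload(["Hello", "World"], "es"))
```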
https://api.stt.ai/v1/analyze
Analyze transcript text: sentiment, topics, entities, action items, questions, PII redaction.
Request Body (JSON)
| Parameter | Type | Description |
|---|---|---|
text | string | Text to analyze |
type | string | sentiment, topics, entities, action_items, questions, pii_redact |
https://api.stt.ai/v1/generate
Generate content from transcript text: blog posts, social media, meeting notes, study guides, flashcards, quizzes.
Request Body (JSON)
| Parameter | Type | Description |
|---|---|---|
text | string | Transcript text |
type | string | blog_post, social_media, newsletter, key_quotes, show_notes, meeting_notes, study_guide, flashcards, quiz, chapter_markers |
https://api.stt.ai/v1/enhance-audio
Remove noise and normalize audio. Returns the enhanced WAV file.
Send as multipart/form-data with a file field. Returns binary audio/wav.
https://api.stt.ai/v1/tts
Clone a voice from a reference audio clip and generate speech. Uses F5-TTS (MIT license).
Send as multipart/form-data:
| Parameter | Type | Description |
|---|---|---|
reference | file | 3-10 seconds of voice reference audio |
text | string | Text to speak in the cloned voice |
Returns binary audio/wav. Headers include X-Duration and X-Generation-Time.
https://api.stt.ai/v1/embed
Generate sentence embeddings for semantic search. 384-dimensional vectors from all-MiniLM-L6-v2.
Request Body (JSON)
{"texts": ["Hello world", "How are you"]}
Response
{"embeddings": [[0.123, -0.456, ...], [...]], "dimensions": 384}
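Embeddings from /v1/embed can power semantic search entirely client-side: embed your documents once, embed each query, and rank by cosine similarity. A minimal pure-Python sketch (function names are illustrative; toy 3-dimensional vectors stand in for the real 384-dimensional ones):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_match(query_vec: list[float], doc_vecs: list[list[float]]) -> int:
    """Index of the document embedding most similar to the query."""
    return max(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]))

docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(top_match([0.9, 0.1, 0.0], docs))  # 0
```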
https://api.stt.ai/health
Check GPU and API health. No authentication required.
{
  "status": "ok",
  "gpu_available": true,
  "gpu_name": "NVIDIA A100",
  "gpu_memory_mb": 8188
}
REST API (Django)
Manage your account, transcripts, API keys, and more via the Django REST API at https://stt.ai/api/.
https://stt.ai/api/v1/account/
Get or update your account info, email preferences, credits, plan details.
https://stt.ai/api/v1/transcripts/
List your transcripts with pagination. Filter by status, language, date.
https://stt.ai/api/v1/transcripts/:id/
Get transcript detail with segments, or delete a transcript.
https://stt.ai/api/v1/transcripts/:id/export/:format/
Export as txt, srt, vtt, json, csv, docx, or pdf.
https://stt.ai/api/v1/transcripts/:id/chat/
Ask AI questions about a transcript. Uses RAG with semantic search + Qwen2.5 LLM.
// Request
{"question": "What were the action items?", "session_id": "optional"}
// Response
{"answer": "...", "sources": [{"segment_order": 5, "text": "...", "score": 0.92}]}
https://stt.ai/api/v1/transcripts/:id/analyze/
Analyze transcript: sentiment, topics, entities, action_items, questions.
https://stt.ai/api/v1/transcripts/:id/generate/
Generate content: blog_post, social_media, meeting_notes, study_guide, flashcards, quiz.
https://stt.ai/api/v1/keys/
List or create API keys. POST returns the raw key once.
https://stt.ai/api/v1/keys/:id/
Revoke an API key.
https://stt.ai/api/v1/usage/
30-day usage breakdown by day.
https://stt.ai/api/v1/cloud/
List Private Cloud instances (if subscribed).
Code Examples
cURL
# Transcribe a file
curl -X POST https://api.stt.ai/v1/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@meeting.mp3" \
-F "model=large-v3-turbo" \
-F "language=auto" \
-F "diarize=true"
# Get SRT subtitles
curl -X POST https://api.stt.ai/v1/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@video.mp4" \
-F "response_format=srt" \
-o subtitles.srt
# Summarize text
curl -X POST https://api.stt.ai/v1/summarize \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"text": "Full transcript text here...", "style": "brief"}'
# List models (no auth needed)
curl https://api.stt.ai/v1/models
# Health check
curl https://api.stt.ai/health
Python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.stt.ai"

# Transcribe a file
with open("meeting.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/v1/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("meeting.mp3", f, "audio/mpeg")},
        data={"model": "large-v3-turbo", "language": "auto", "diarize": "true"},
    )
result = resp.json()

print(f"Language: {result['language']}, Duration: {result['duration']:.1f}s")
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s] {seg.get('speaker', '')}: {seg['text']}")

# Summarize the transcript
summary = requests.post(
    f"{BASE}/v1/summarize",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": result["text"], "style": "bullet_points"},
).json()
print(summary["summary"])
Node.js
const fs = require("fs");
const FormData = require("form-data");

const API_KEY = "YOUR_API_KEY";
const BASE = "https://api.stt.ai";

async function transcribe(filePath) {
  const form = new FormData();
  form.append("file", fs.createReadStream(filePath));
  form.append("model", "large-v3-turbo");
  form.append("language", "auto");
  form.append("diarize", "true");

  const resp = await fetch(`${BASE}/v1/transcribe`, {
    method: "POST",
    headers: { Authorization: `Bearer ${API_KEY}`, ...form.getHeaders() },
    body: form,
  });
  const result = await resp.json();

  console.log(`Duration: ${result.duration.toFixed(1)}s`);
  for (const seg of result.segments) {
    console.log(`[${seg.start.toFixed(1)}s] ${seg.speaker}: ${seg.text}`);
  }
  return result;
}

transcribe("meeting.mp3");
WebSocket (Browser)
// Real-time transcription from microphone
const ws = new WebSocket("wss://api.stt.ai/v1/stream");
ws.binaryType = "arraybuffer";

ws.onopen = () => {
  ws.send(JSON.stringify({ language: "auto", model: "large-v3-turbo" }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.status === "ready") {
    startStreaming(); // Begin sending audio
  }
  if (data.text) console.log("Final:", data.text);
  if (data.partial) console.log("Partial:", data.partial);
};

async function startStreaming() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e) => {
    const float32 = e.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32767));
    }
    if (ws.readyState === 1) ws.send(int16.buffer);
  };
  source.connect(processor);
  processor.connect(ctx.destination);
}

// Stop: ws.send(JSON.stringify({ action: "stop" }));
Error Handling
The API returns standard HTTP status codes with JSON error bodies.
| Status | Meaning | When |
|---|---|---|
| 200 | OK | Request succeeded |
| 400 | Bad Request | Missing file, unsupported format |
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | No credits remaining |
| 429 | Too Many Requests | Rate limit exceeded (free tier) |
| 503 | Service Unavailable | GPU temporarily unavailable |
// Error response format
{"error": "No credits remaining. Upgrade your plan."}
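A defensive client can branch on these codes before touching the body. A sketch of that pattern (the helper name and the choice of which statuses to treat as retryable are assumptions, not API guidance):

```python
import json

# 429 (rate limited) and 503 (GPU busy) are assumed worth retrying with backoff.
RETRYABLE = {429, 503}

def check_response(status: int, body: str) -> dict:
    """Return the parsed JSON body on success; otherwise raise with the
    API's error message, flagging transient statuses for retry."""
    payload = json.loads(body) if body else {}
    if status == 200:
        return payload
    message = payload.get("error", f"HTTP {status}")
    if status in RETRYABLE:
        raise RuntimeError(f"transient error, retry later: {message}")
    raise RuntimeError(message)

print(check_response(200, '{"text": "ok"}'))
```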
SDKs & Libraries
Official SDKs for Python and Node.js. Install and start transcribing in minutes.
Ready to get started?
Sign up for free and get your API key in seconds. 600 free minutes/month.
Sign up free · View pricing