API Documentation
Transcribe audio and video files programmatically. Real-time streaming, speaker diarization, AI summaries, 100+ languages.
Overview
The STT.ai API provides speech-to-text transcription, real-time streaming, and AI-powered summaries. All requests go to our GPU-powered API servers.
Base URL: https://api.stt.ai
Supported file formats: MP3, WAV, FLAC, OGG, M4A, AAC, OPUS, WMA, MP4, WebM, MKV, AVI, MOV, WMV, MPG, MPEG. Maximum file size: 2 GB.
Authentication
Pass your API key in the Authorization header as a Bearer token:
Authorization: Bearer YOUR_API_KEY
Get your API key from your account dashboard. Unauthenticated requests are limited to 3 transcriptions per day per IP.
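As a minimal sketch of the header described above, the helper below builds a request carrying the Bearer token using only the Python standard library. The function names `auth_headers` and `authed_request` are illustrative and not part of any official SDK.

```python
import urllib.request

BASE_URL = "https://api.stt.ai"  # base URL given in the overview


def auth_headers(api_key: str) -> dict:
    """Return the Authorization header for a Bearer-token request."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key}"}


def authed_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries the Bearer token."""
    return urllib.request.Request(BASE_URL + path, headers=auth_headers(api_key))
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected offline.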
Rate Limits
| Plan | Limit | Max File Size | Concurrent Jobs |
|---|---|---|---|
| Anonymous | 3/day per IP | 100 MB | 1 |
| Free | 600 min/month | 500 MB | 2 |
| Paid | Based on plan credits | 2 GB | 5 |
Credits are based on audio duration: 1 credit = 1 minute of audio.
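The limits in the table above can be checked on the client before uploading. The sketch below is illustrative only: the plan keys (`anonymous`, `free`, `paid`) are hypothetical labels, the megabyte is assumed to be decimal, and rounding partial minutes up to a whole credit is an assumption the documentation does not confirm.

```python
import math

MB = 1000 * 1000  # assumption: decimal megabytes; the docs do not specify

# (max file size in bytes, concurrent jobs), copied from the table above.
# The dictionary keys are hypothetical labels, not API values.
PLAN_LIMITS = {
    "anonymous": (100 * MB, 1),
    "free": (500 * MB, 2),
    "paid": (2000 * MB, 5),
}


def fits_plan(plan: str, file_size_bytes: int) -> bool:
    """Return True if the file is within the plan's size limit."""
    return file_size_bytes <= PLAN_LIMITS[plan][0]


def credits_for(duration_seconds: float) -> int:
    """1 credit = 1 minute of audio; rounding up is an assumption."""
    return math.ceil(duration_seconds / 60)
```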
Àwọn Ààyè-iṣẹ́ Ìparí
https://api.stt.ai/v1/transcribe
Upload an audio or video file for transcription with speaker diarization, language detection, and word-level timestamps.
Request Parameters
Send as multipart/form-data.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | - | Audio or video file |
| model | string | No | large-v3-turbo | Model: stt-ai-enhanced, large-v3-turbo, large-v3, medium, small. Call GET /v1/models for the live list with metadata. |
| language | string | No | auto | ISO 639-1 code or auto |
| diarize | boolean | No | true | Enable speaker diarization |
| speakers | integer | No | 0 | Expected speakers (0 = auto) |
| response_format | string | No | json | json, txt, srt, vtt |
Response
{
  "text": "Hello, welcome to the meeting...",
  "language": "en",
  "duration": 125.4,
  "segments": [
    {
      "start": 0.0,
      "end": 3.2,
      "text": "Hello, welcome to the meeting.",
      "speaker": "Speaker 1",
      "confidence": 0.95,
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "welcome", "start": 0.5, "end": 0.9}
      ]
    }
  ],
  "speakers": ["Speaker 1", "Speaker 2"]
}
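A common next step with a response of this shape is to merge consecutive segments from the same speaker into readable turns. The sketch below assumes only the `segments`, `speaker`, and `text` fields shown above; the second and third sample segments are invented for illustration.

```python
def speaker_turns(result: dict) -> list:
    """Merge consecutive same-speaker segments into (speaker, text) turns."""
    turns = []
    for seg in result.get("segments", []):
        speaker = seg.get("speaker", "Unknown")
        text = seg["text"].strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous segment: extend that turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


# Illustrative data in the documented shape (not real API output).
sample = {
    "segments": [
        {"start": 0.0, "end": 3.2, "text": "Hello, welcome to the meeting.", "speaker": "Speaker 1"},
        {"start": 3.4, "end": 5.0, "text": "Let's begin.", "speaker": "Speaker 1"},
        {"start": 5.2, "end": 6.1, "text": "Sounds good.", "speaker": "Speaker 2"},
    ]
}

for speaker, text in speaker_turns(sample):
    print(f"{speaker}: {text}")
```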
https://api.stt.ai/v1/summarize
Summarize transcript text using an on-device LLM. No data leaves our servers.
Request Body (JSON)
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Transcript text to summarize |
| style | string | No | brief (default), detailed, action_items, bullet_points |
Response
{
  "summary": "The team discussed Q3 revenue growth of 15%...",
  "style": "brief",
  "model": "qwen2.5-1.5b-instruct"
}
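To catch an unsupported `style` before the request leaves the client, a small builder can validate it against the documented values. This is a sketch using the standard library; `build_summarize_request` is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.request

# Style values listed in the parameter table above.
SUMMARY_STYLES = {"brief", "detailed", "action_items", "bullet_points"}


def build_summarize_request(text: str, api_key: str, style: str = "brief"):
    """Build a POST request for /v1/summarize without sending it."""
    if style not in SUMMARY_STYLES:
        raise ValueError(f"unsupported style {style!r}; expected one of {sorted(SUMMARY_STYLES)}")
    body = json.dumps({"text": text, "style": style}).encode("utf-8")
    return urllib.request.Request(
        "https://api.stt.ai/v1/summarize",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a matter of `urllib.request.urlopen(req)` and decoding the JSON body, whose `summary` field holds the result.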
wss://api.stt.ai/v1/stream
Real-time speech-to-text via WebSocket. Send raw PCM audio (16-bit, 16kHz, mono) and receive transcription updates instantly.
Protocol
- Connect to wss://api.stt.ai/v1/stream
- Send JSON config: {"language": "en", "model": "large-v3-turbo"}
- Wait for {"status": "ready"}
- Stream raw PCM Int16 audio chunks (binary frames)
- Receive JSON updates: {"text": "...", "partial": "..."}
- Send {"action": "stop"} to finalize
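The binary frames in the streaming step carry 16-bit PCM samples. Audio libraries often hand back floats in the range [-1.0, 1.0], so a conversion like the one sketched below is usually needed first. Little-endian byte order is an assumption here; the documentation only says "PCM Int16".

```python
import struct


def floats_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit signed PCM bytes.

    Out-of-range values are clipped. Little-endian order is assumed.
    """
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```

At 16 kHz mono, each sample takes two bytes, so 100 ms of audio is 1600 samples or 3200 bytes.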
Response Fields
| Field | Description |
|---|---|
| status | "ready": connection established, ready for audio |
| partial | Partial/interim transcript (updates as you speak) |
| text | Finalized transcript segment |
| is_final | true when stream is complete |
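One way to consume these messages is a small state object that accumulates finalized segments and keeps only the latest interim text. The sketch below is independent of any WebSocket library; `StreamState` is a hypothetical name, and clearing the partial text when a finalized segment arrives is an assumption about how the two fields relate.

```python
import json


class StreamState:
    """Tracks transcript state across JSON messages from the stream."""

    def __init__(self):
        self.ready = False
        self.final_segments = []
        self.partial = ""
        self.done = False

    def handle(self, raw: str) -> None:
        """Apply one JSON text message to the state."""
        msg = json.loads(raw)
        if msg.get("status") == "ready":
            self.ready = True
        if msg.get("text"):
            self.final_segments.append(msg["text"])
            self.partial = ""  # assumption: a final segment supersedes interim text
        if msg.get("partial"):
            self.partial = msg["partial"]
        if msg.get("is_final"):
            self.done = True

    @property
    def transcript(self) -> str:
        """All finalized segments joined by spaces."""
        return " ".join(self.final_segments)
```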
https://api.stt.ai/v1/models
List all available transcription models. No authentication required.
{
  "models": [
    {"id": "large-v3-turbo", "name": "Whisper Large V3 Turbo", "languages": 99, "speed": "fast"},
    {"id": "large-v3", "name": "Whisper Large V3", "languages": 99, "speed": "standard"},
    {"id": "medium", "name": "Whisper Medium", "languages": 99, "speed": "fast"},
    {"id": "small", "name": "Whisper Small", "languages": 99, "speed": "very_fast"}
  ]
}
https://api.stt.ai/v1/languages
List all supported languages with ISO codes. No authentication required.
{
  "languages": [
    {"code": "en", "name": "english"},
    {"code": "es", "name": "spanish"},
    {"code": "fr", "name": "french"},
    ...
  ]
}
https://api.stt.ai/v1/translate
Translate text to 450+ languages using MadLAD-400 (Apache 2.0). Runs on-device — no third-party APIs.
Request Body (JSON)
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes* | Single text to translate |
| texts | array | Yes* | Array of texts to batch translate |
| target | string | Yes | Target language code (e.g., "es", "fr", "zh") |
| source | string | No | Source language code (default: "en") |
// Single text
{"translated_text": "Hola mundo", "source_language": "en", "target_language": "es"}
// Batch
{"translations": [{"translated_text": "Hola"}, {"translated_text": "Mundo"}]}
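Because the single and batch forms use different request fields and different response shapes, a thin wrapper can hide the distinction from calling code. The helpers below are a sketch based only on the fields shown above; their names are illustrative.

```python
def translate_payload(text_or_texts, target: str, source: str = "en") -> dict:
    """Build the JSON body: "text" for one string, "texts" for a list."""
    payload = {"target": target, "source": source}
    if isinstance(text_or_texts, str):
        payload["text"] = text_or_texts
    else:
        payload["texts"] = list(text_or_texts)
    return payload


def translated_strings(response: dict) -> list:
    """Normalize single and batch responses into a list of strings."""
    if "translations" in response:
        return [item["translated_text"] for item in response["translations"]]
    return [response["translated_text"]]
```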
https://api.stt.ai/v1/analyze
Analyze transcript text: sentiment, topics, entities, action items, questions, PII redaction.
Request Body (JSON)
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to analyze |
| type | string | sentiment, topics, entities, action_items, questions, pii_redact |
https://api.stt.ai/v1/generate
Generate content from transcript text: blog posts, social media, meeting notes, study guides, flashcards, quizzes.
Request Body (JSON)
| Parameter | Type | Description |
|---|---|---|
| text | string | Transcript text |
| type | string | blog_post, social_media, newsletter, key_quotes, show_notes, meeting_notes, study_guide, flashcards, quiz, chapter_markers |
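The /v1/analyze and /v1/generate endpoints share the same `text` plus `type` body but accept different `type` values. A hedged sketch of a shared payload builder that rejects a value sent to the wrong endpoint follows; `task_payload` is a hypothetical helper.

```python
# Accepted "type" values, copied from the two parameter tables.
ANALYZE_TYPES = {"sentiment", "topics", "entities", "action_items", "questions", "pii_redact"}
GENERATE_TYPES = {
    "blog_post", "social_media", "newsletter", "key_quotes", "show_notes",
    "meeting_notes", "study_guide", "flashcards", "quiz", "chapter_markers",
}
ALLOWED = {"/v1/analyze": ANALYZE_TYPES, "/v1/generate": GENERATE_TYPES}


def task_payload(endpoint: str, text: str, task_type: str) -> dict:
    """Return the JSON body for an endpoint, validating the type first."""
    allowed = ALLOWED[endpoint]
    if task_type not in allowed:
        raise ValueError(
            f"{task_type!r} is not valid for {endpoint}; expected one of {sorted(allowed)}"
        )
    return {"text": text, "type": task_type}
```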
https://api.stt.ai/v1/enhance-audio
Remove noise and normalize audio. Returns the enhanced WAV file.
Send as multipart/form-data with a file field. Returns binary audio/wav.
https://api.stt.ai/v1/tts
Clone a voice from a reference audio clip and generate speech. Uses F5-TTS (MIT license).
Send as multipart/form-data:
| Parameter | Type | Description |
|---|---|---|
| reference | file | 3-10 seconds of voice reference audio |
| text | string | Text to speak in the cloned voice |
Returns binary audio/wav. Headers include X-Duration and X-Generation-Time.
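For clients that cannot depend on a library such as `requests` (whose `files=` argument handles this automatically), the multipart body can be assembled by hand. The encoder below is a minimal sketch of the multipart/form-data layout; `encode_multipart` is an illustrative name, and the field names come from the table above.

```python
import uuid


def encode_multipart(fields: dict, files: dict):
    """Encode text fields and files as multipart/form-data.

    files maps field name -> (filename, data_bytes, content_type).
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    for name, (filename, data, ctype) in files.items():
        head = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'
        )
        parts.append(head.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned header value goes in `Content-Type`, and the body is posted as-is. In the response, `X-Duration` and `X-Generation-Time` would be read from the headers rather than the WAV payload.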
https://api.stt.ai/v1/embed
Generate sentence embeddings for semantic search. 384-dimensional vectors from all-MiniLM-L6-v2.
Request Body (JSON)
{"texts": ["Hello world", "How are you"]}
Response
{"embeddings": [[0.123, -0.456, ...], [...]], "dimensions": 384}
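Semantic search over these vectors typically ranks candidates by cosine similarity to a query embedding. The sketch below shows that ranking step in plain Python; it works for any vector length, and the short vectors in the usage note are toy values, not real model output.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs) -> list:
    """Return document indices ordered from most to least similar."""
    scores = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

For example, `rank([1, 0], [[0, 1], [1, 0], [1, 1]])` puts index 1 first, since that vector points in the same direction as the query.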
https://api.stt.ai/health
Check GPU and API health. No authentication required.
{
  "status": "ok",
  "gpu_available": true,
  "gpu_name": "NVIDIA A100",
  "gpu_memory_mb": 8188
}
REST API (Django)
Manage your account, transcripts, API keys, and more via the Django REST API at https://stt.ai/api/.
https://stt.ai/api/v1/account/
Get or update your account info, email preferences, credits, plan details.
https://stt.ai/api/v1/transcripts/
List your transcripts with pagination. Filter by status, language, date.
https://stt.ai/api/v1/transcripts/:id/
Get transcript detail with segments, or delete a transcript.
https://stt.ai/api/v1/transcripts/:id/export/:format/
Export as txt, srt, vtt, json, csv, docx, or pdf.
https://stt.ai/api/v1/transcripts/:id/chat/
Ask AI questions about a transcript. Uses RAG with semantic search + Qwen2.5 LLM.
// Request
{"question": "What were the action items?", "session_id": "optional"}
// Response
{"answer": "...", "sources": [{"segment_order": 5, "text": "...", "score": 0.92}]}
https://stt.ai/api/v1/transcripts/:id/analyze/
Analyze transcript: sentiment, topics, entities, action_items, questions.
https://stt.ai/api/v1/transcripts/:id/generate/
Generate content: blog_post, social_media, meeting_notes, study_guide, flashcards, quiz.
https://stt.ai/api/v1/keys/
List or create API keys. POST returns the raw key once.
https://stt.ai/api/v1/keys/:id/
Revoke an API key.
https://stt.ai/api/v1/usage/
30-day usage breakdown by day.
https://stt.ai/api/v1/cloud/
List Private Cloud instances (if subscribed).
Code Examples
cURL
# Transcribe a file
curl -X POST https://api.stt.ai/v1/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@meeting.mp3" \
-F "model=large-v3-turbo" \
-F "language=auto" \
-F "diarize=true"
# Get SRT subtitles
curl -X POST https://api.stt.ai/v1/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@video.mp4" \
-F "response_format=srt" \
-o subtitles.srt
# Summarize text
curl -X POST https://api.stt.ai/v1/summarize \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"text": "Full transcript text here...", "style": "brief"}'
# List models (no auth needed)
curl https://api.stt.ai/v1/models
# Health check
curl https://api.stt.ai/health
Python
import requests
API_KEY = "YOUR_API_KEY"
BASE = "https://api.stt.ai"
# Transcribe a file
with open("meeting.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/v1/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("meeting.mp3", f, "audio/mpeg")},
        data={"model": "large-v3-turbo", "language": "auto", "diarize": "true"},
    )
resp.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body
result = resp.json()
print(f"Language: {result['language']}, Duration: {result['duration']:.1f}s")
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s] {seg.get('speaker', '')}: {seg['text']}")
# Summarize the transcript
summary = requests.post(
    f"{BASE}/v1/summarize",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": result["text"], "style": "bullet_points"},
).json()
print(summary["summary"])
Node.js
const fs = require("fs");
const FormData = require("form-data");
// node-fetch v2 accepts form-data streams; Node's built-in fetch does not.
const fetch = require("node-fetch");
const API_KEY = "YOUR_API_KEY";
const BASE = "https://api.stt.ai";
async function transcribe(filePath) {
  const form = new FormData();
  form.append("file", fs.createReadStream(filePath));
  form.append("model", "large-v3-turbo");
  form.append("language", "auto");
  form.append("diarize", "true");
  const resp = await fetch(`${BASE}/v1/transcribe`, {
    method: "POST",
    headers: { Authorization: `Bearer ${API_KEY}`, ...form.getHeaders() },
    body: form,
  });
  const result = await resp.json();
  console.log(`Duration: ${result.duration.toFixed(1)}s`);
  for (const seg of result.segments) {
    console.log(`[${seg.start.toFixed(1)}s] ${seg.speaker}: ${seg.text}`);
  }
  return result;
}
transcribe("meeting.mp3").catch(console.error);
WebSocket (Browser)
// Real-time transcription from microphone
const ws = new WebSocket("wss://api.stt.ai/v1/stream");
ws.binaryType = "arraybuffer";
ws.onopen = () => {
  ws.send(JSON.stringify({ language: "auto", model: "large-v3-turbo" }));
};
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.status === "ready") {
    startStreaming(); // Begin sending audio
  }
  if (data.text) console.log("Final:", data.text);
  if (data.partial) console.log("Partial:", data.partial);
};
async function startStreaming() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e) => {
    // Convert Float32 samples in [-1, 1] to 16-bit PCM
    const float32 = e.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32767));
    }
    if (ws.readyState === WebSocket.OPEN) ws.send(int16.buffer);
  };
  source.connect(processor);
  processor.connect(ctx.destination);
}
// Stop: ws.send(JSON.stringify({ action: "stop" }));
Error Handling
The API returns standard HTTP status codes with JSON error bodies.
| Code | Meaning | Description |
|---|---|---|
| 200 | OK | Request succeeded |
| 400 | Bad Request | Missing file, unsupported format |
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | No credits remaining |
| 429 | Too Many Requests | Rate limit exceeded (free tier) |
| 503 | Service Unavailable | GPU temporarily unavailable |
// Error response format
{"error": "No credits remaining. Upgrade your plan."}
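On the client side, these codes can be turned into exceptions, with the transient ones (429 and 503) retried after a delay. The sketch below is one possible approach and not prescribed by the API: `ApiError`, `error_from_response`, and the exponential backoff schedule are all illustrative choices.

```python
import json

RETRYABLE = {429, 503}  # rate limit and temporary GPU unavailability


class ApiError(Exception):
    """Carries the HTTP status and the message from the JSON error body."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def error_from_response(status: int, body: str):
    """Return an ApiError for a non-200 response, or None on success."""
    if status == 200:
        return None
    try:
        message = json.loads(body).get("error", body)
    except (ValueError, AttributeError):
        message = body  # body was not a JSON object; keep the raw text
    return ApiError(status, message)


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential delays in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A caller would retry only when `err.status in RETRYABLE`, sleeping for each delay in turn, and raise immediately for 400, 401, and 402.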
SDKs
Official SDKs for Python and Node.js. Install and start transcribing in minutes.
Ready to get started?
Sign up for free and get your API key in seconds. 600 minutes/month free.