API Documentation

Transcribe audio and video files, stream speech in real time, work with 100+ languages, and use AI-powered features such as summarization.
Overview
The STT.ai API provides speech-to-text transcription, real-time streaming, and AI-powered summarization. All requests go directly to our GPU-powered API server.
https://api.stt.ai
Supported input formats: MP3, WAV, FLAC, OGG, M4A, AAC, OPUS, WMA, MP4, WebM, MKV, AVI, MOV, WMV, MPG, MPEG. Max file size: 2GB.
Authentication
Send your API key as a Bearer token in the Authorization header:

```
Authorization: Bearer YOUR_API_KEY
```
Get an API key from your Account Settings. Anonymous requests are also supported without a key, limited to 3 transcriptions per day per IP.
Rate Limits
| Tier | Transcription | Max file size | Concurrent requests |
|---|---|---|---|
| Anonymous | 3/day per IP | 100 MB | 1 |
| Personal (registered) | 600 min/month | 500 MB | 2 |
| Paid plans | Per plan | 2 GB | 5 |
Credits are deducted based on audio duration: 1 credit = 1 minute of audio, rounded up.
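The rounding rule above can be sketched in a few lines (a minimal illustration; `credits_for` is a hypothetical helper, not part of any SDK):

```python
import math

def credits_for(duration_seconds: float) -> int:
    """Credits charged for one file: 1 credit per started minute of audio."""
    return math.ceil(duration_seconds / 60)

# A 125.4 s recording spans three started minutes, so it costs 3 credits.
```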
Endpoints
https://api.stt.ai/v1/transcribe
Upload an audio or video file for transcription with speaker diarization, language detection, and word-level timestamps.
Request Parameters

Send as `multipart/form-data`:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | — | Audio or video file |
| model | string | No | large-v3-turbo | Model: large-v3-turbo, large-v3, medium, small |
| language | string | No | auto | ISO 639-1 code or auto |
| diarize | boolean | No | true | Enable speaker diarization |
| speakers | integer | No | 0 | Expected number of speakers (0 = auto-detect) |
| response_format | string | No | json | json, txt, srt, vtt |
Response (JSON)

```json
{
  "text": "Hello, welcome to the meeting...",
  "language": "en",
  "duration": 125.4,
  "segments": [
    {
      "start": 0.0,
      "end": 3.2,
      "text": "Hello, welcome to the meeting.",
      "speaker": "Speaker 1",
      "confidence": 0.95,
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "welcome", "start": 0.5, "end": 0.9}
      ]
    }
  ],
  "speakers": ["Speaker 1", "Speaker 2"]
}
```
https://api.stt.ai/v1/summarize
Summarize transcript text using an LLM hosted on our own servers. No data leaves our infrastructure.
Request Body (JSON)

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Transcript text to summarize |
| style | string | No | brief (default), detailed, action_items, bullet_points |
Response

```json
{
  "summary": "The team discussed Q3 revenue growth of 15%...",
  "style": "brief",
  "model": "qwen2.5-1.5b-instruct"
}
```
wss://api.stt.ai/v1/stream
Real-time speech-to-text via WebSocket. Send raw PCM audio (16-bit, 16kHz, mono) and receive transcription updates instantly.
Protocol

1. Connect to `wss://api.stt.ai/v1/stream`
2. Send a JSON config: `{"language": "en", "model": "large-v3-turbo"}`
3. Wait for `{"status": "ready"}`
4. Stream raw PCM Int16 audio chunks as binary frames
5. Receive JSON updates: `{"text": "...", "partial": "..."}`
6. Send `{"action": "stop"}` to finalize
Message Fields

| Field | Description |
|---|---|
| status | "ready": connection established, ready for audio |
| partial | Partial/interim transcript (updates as you speak) |
| text | Finalized transcript segment |
| is_final | true when the stream is complete |
https://api.stt.ai/v1/models
List all available transcription models. No authentication required.
```json
{
  "models": [
    {"id": "large-v3-turbo", "name": "Whisper Large V3 Turbo", "languages": 99, "speed": "fast"},
    {"id": "large-v3", "name": "Whisper Large V3", "languages": 99, "speed": "standard"},
    {"id": "medium", "name": "Whisper Medium", "languages": 99, "speed": "fast"},
    {"id": "small", "name": "Whisper Small", "languages": 99, "speed": "very_fast"}
  ]
}
```
https://api.stt.ai/v1/languages
List all supported languages with ISO codes. No authentication required.
```json
{
  "languages": [
    {"code": "en", "name": "english"},
    {"code": "es", "name": "spanish"},
    {"code": "fr", "name": "french"},
    ...
  ]
}
```
https://api.stt.ai/v1/translate
Translate text to 450+ languages using MadLAD-400 (Apache 2.0). Runs entirely on our own servers; no third-party APIs are involved.
Request Body (JSON)
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes* | Single text to translate |
| texts | array | Yes* | Array of texts to batch-translate |
| target | string | Yes | Target language code (e.g., "es", "fr", "zh") |
| source | string | No | Source language code (default: "en") |

\* Provide exactly one of `text` or `texts`.
```json
// Single text
{"translated_text": "Hola mundo", "source_language": "en", "target_language": "es"}

// Batch
{"translations": [{"translated_text": "Hola"}, {"translated_text": "Mundo"}]}
```
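A short Python sketch of calling this endpoint (assumes the `requests` package, which the examples below also use; `build_payload` and `translate` are illustrative helper names, not SDK functions):

```python
import requests

API_KEY = "YOUR_API_KEY"

def build_payload(text_or_texts, target, source="en"):
    """Pick the "text" (single) or "texts" (batch) field for the request body."""
    key = "texts" if isinstance(text_or_texts, list) else "text"
    return {key: text_or_texts, "target": target, "source": source}

def translate(text_or_texts, target, source="en"):
    """POST to /v1/translate and return the parsed JSON response."""
    resp = requests.post(
        "https://api.stt.ai/v1/translate",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=build_payload(text_or_texts, target, source),
    )
    resp.raise_for_status()
    return resp.json()

# translate("Hello world", target="es")      # single text
# translate(["Hello", "World"], target="es") # batch
```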
https://api.stt.ai/v1/analyze
Analyze transcript text: sentiment, topics, entities, action items, questions, PII redaction.
Request Body (JSON)
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to analyze |
| type | string | sentiment, topics, entities, action_items, questions, pii_redact |
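A sketch of a client call that validates the analysis type before sending the request (assumes `requests`; the `analyze` helper and its early validation are illustrative, not part of the API):

```python
import requests

# The analysis types listed in the table above
ANALYSIS_TYPES = {"sentiment", "topics", "entities",
                  "action_items", "questions", "pii_redact"}

def analyze(text, analysis_type, api_key):
    """POST transcript text to /v1/analyze, rejecting unknown types early."""
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"unknown analysis type: {analysis_type}")
    resp = requests.post(
        "https://api.stt.ai/v1/analyze",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"text": text, "type": analysis_type},
    )
    resp.raise_for_status()
    return resp.json()
```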
https://api.stt.ai/v1/generate
Generate content from transcript text: blog posts, social media, meeting notes, study guides, flashcards, quizzes.
Request Body (JSON)
| Parameter | Type | Description |
|---|---|---|
| text | string | Transcript text |
| type | string | blog_post, social_media, newsletter, key_quotes, show_notes, meeting_notes, study_guide, quiz, flashcards, chapter_markers |
https://api.stt.ai/v1/enhance-audio
Remove noise and normalize audio. Returns the enhanced WAV file.
Send as multipart/form-data with a file field. Returns binary audio/wav.
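A sketch of uploading a file and saving the binary response (assumes `requests`; `enhance_audio` and the RIFF/WAVE sanity check are illustrative helpers):

```python
import requests

def looks_like_wav(data: bytes) -> bool:
    """Sanity-check that returned bytes carry the RIFF/WAVE magic numbers."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"

def enhance_audio(in_path, out_path, api_key):
    """Upload a noisy file and save the enhanced WAV the API returns."""
    with open(in_path, "rb") as f:
        resp = requests.post(
            "https://api.stt.ai/v1/enhance-audio",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": f},
        )
    resp.raise_for_status()
    assert looks_like_wav(resp.content)  # response is binary audio/wav
    with open(out_path, "wb") as out:
        out.write(resp.content)
```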
https://api.stt.ai/v1/tts
Clone a voice from a reference audio clip and generate speech. Uses F5-TTS (MIT license).
Send as multipart/form-data:
| Parameter | Type | Description |
|---|---|---|
| reference | file | 3-10 seconds of voice reference audio |
| text | string | Text to speak in the cloned voice |
Returns binary audio/wav. Headers include X-Duration and X-Generation-Time.
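A sketch of the voice-clone call, including reading the timing headers above (assumes `requests`; `clone_and_speak` and `parse_timing` are illustrative helper names):

```python
import requests

def parse_timing(headers) -> dict:
    """Read the documented X-Duration / X-Generation-Time response headers."""
    return {
        "audio_seconds": float(headers.get("X-Duration", 0.0)),
        "generation_seconds": float(headers.get("X-Generation-Time", 0.0)),
    }

def clone_and_speak(reference_path, text, out_path, api_key):
    """Send a short reference clip plus text; save the generated WAV."""
    with open(reference_path, "rb") as ref:
        resp = requests.post(
            "https://api.stt.ai/v1/tts",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"reference": ref},
            data={"text": text},
        )
    resp.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(resp.content)  # binary audio/wav
    return parse_timing(resp.headers)
```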
https://api.stt.ai/v1/embed
Generate sentence embeddings for semantic search. 384-dimensional vectors from all-MiniLM-L6-v2.
Request Body (JSON)

```json
{"texts": ["Hello world", "How are you"]}
```

Response

```json
{"embeddings": [[0.123, -0.456, ...], [...]], "dimensions": 384}
```
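Embedding vectors are typically compared with cosine similarity for semantic search; a sketch (assumes `requests`; the `embed` and `cosine` helpers are illustrative):

```python
import math
import requests

def embed(texts, api_key):
    """Fetch 384-dimensional vectors for a batch of texts from /v1/embed."""
    resp = requests.post(
        "https://api.stt.ai/v1/embed",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"texts": texts},
    )
    resp.raise_for_status()
    return resp.json()["embeddings"]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# vecs = embed(["Hello world", "How are you"], "YOUR_API_KEY")
# cosine(vecs[0], vecs[1])  # higher = more semantically similar
```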
https://api.stt.ai/health
Check GPU and API health. No authentication required.
```json
{
  "status": "ok",
  "gpu_available": true,
  "gpu_name": "NVIDIA A100",
  "gpu_memory_mb": 8188
}
```
REST API (Django)
Manage your account, transcripts, API keys, and more via the Django REST API at https://stt.ai/api/.
https://stt.ai/api/v1/account/
Get or update your account info, email preferences, credits, plan details.
https://stt.ai/api/v1/transcripts/
List your transcripts with pagination. Filter by status, language, date.
https://stt.ai/api/v1/transcripts/:id/
Get transcript detail with segments, or delete a transcript.
https://stt.ai/api/v1/transcripts/:id/export/:format/
Export as txt, srt, vtt, json, csv, docx, or pdf.
https://stt.ai/api/v1/transcripts/:id/chat/
Ask AI questions about a transcript. Uses RAG with semantic search + Qwen2.5 LLM.
```json
// Request
{"question": "What were the action items?", "session_id": "optional"}

// Response
{"answer": "...", "sources": [{"segment_order": 5, "text": "...", "score": 0.92}]}
```
https://stt.ai/api/v1/transcripts/:id/analyze/
Analyze transcript: sentiment, topics, entities, action_items, questions.
https://stt.ai/api/v1/transcripts/:id/generate/
Generate content: blog_post, social_media, meeting_notes, study_guide, flashcards, quiz.
https://stt.ai/api/v1/keys/
List or create API keys. POST returns the raw key once.
https://stt.ai/api/v1/keys/:id/
Revoke an API key.
https://stt.ai/api/v1/usage/
30-day usage breakdown by day.
https://stt.ai/api/v1/cloud/
List Private Cloud instances (if subscribed).
Code Examples
cURL
```bash
# Transcribe a file
curl -X POST https://api.stt.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@meeting.mp3" \
  -F "model=large-v3-turbo" \
  -F "language=auto" \
  -F "diarize=true"

# Get SRT subtitles
curl -X POST https://api.stt.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=srt" \
  -o subtitles.srt

# Summarize text
curl -X POST https://api.stt.ai/v1/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Full transcript text here...", "style": "brief"}'

# List models (no auth needed)
curl https://api.stt.ai/v1/models

# Health check
curl https://api.stt.ai/health
```
Python
```python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.stt.ai"

# Transcribe a file
with open("meeting.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/v1/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("meeting.mp3", f, "audio/mpeg")},
        data={"model": "large-v3-turbo", "language": "auto", "diarize": "true"},
    )

result = resp.json()
print(f"Language: {result['language']}, Duration: {result['duration']:.1f}s")
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s] {seg.get('speaker', '')}: {seg['text']}")

# Summarize the transcript
summary = requests.post(
    f"{BASE}/v1/summarize",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": result["text"], "style": "bullet_points"},
).json()
print(summary["summary"])
```
Node.js
```javascript
// Node 18+: uses the built-in fetch, FormData, and Blob
const { readFileSync } = require("fs");
const path = require("path");

const API_KEY = "YOUR_API_KEY";
const BASE = "https://api.stt.ai";

async function transcribe(filePath) {
  const form = new FormData();
  form.append("file", new Blob([readFileSync(filePath)]), path.basename(filePath));
  form.append("model", "large-v3-turbo");
  form.append("language", "auto");
  form.append("diarize", "true");

  const resp = await fetch(`${BASE}/v1/transcribe`, {
    method: "POST",
    // fetch sets the multipart Content-Type (with boundary) automatically
    headers: { Authorization: `Bearer ${API_KEY}` },
    body: form,
  });

  const result = await resp.json();
  console.log(`Duration: ${result.duration.toFixed(1)}s`);
  for (const seg of result.segments) {
    console.log(`[${seg.start.toFixed(1)}s] ${seg.speaker}: ${seg.text}`);
  }
  return result;
}

transcribe("meeting.mp3");
```
WebSocket (Browser)
```javascript
// Real-time transcription from microphone
const ws = new WebSocket("wss://api.stt.ai/v1/stream");
ws.binaryType = "arraybuffer";

ws.onopen = () => {
  ws.send(JSON.stringify({ language: "auto", model: "large-v3-turbo" }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.status === "ready") {
    startStreaming(); // Begin sending audio
  }
  if (data.text) console.log("Final:", data.text);
  if (data.partial) console.log("Partial:", data.partial);
};

async function startStreaming() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e) => {
    const float32 = e.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32767));
    }
    if (ws.readyState === 1) ws.send(int16.buffer);
  };
  source.connect(processor);
  processor.connect(ctx.destination);
}

// Stop: ws.send(JSON.stringify({ action: "stop" }));
```
Error Handling
The API returns standard HTTP status codes with JSON error bodies.
| Status | Meaning | When |
|---|---|---|
| 200 | OK | Request succeeded |
| 400 | Bad Request | Missing file or unsupported format |
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | No credits remaining |
| 429 | Too Many Requests | Rate limit exceeded (free tier) |
| 503 | Service Unavailable | GPU temporarily unavailable |
```json
// Error response format
{"error": "No credits remaining. Upgrade your plan."}
```
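One way to surface these codes in a client (a sketch; `raise_for_stt_error` is a hypothetical helper that accepts any object with `status_code` and `json()`, such as a `requests.Response`):

```python
# Hints mirroring the status-code table above
ERROR_HINTS = {
    400: "Bad Request: missing file or unsupported format",
    401: "Unauthorized: invalid or missing API key",
    402: "Payment Required: no credits remaining",
    429: "Too Many Requests: rate limit exceeded (free tier)",
    503: "Service Unavailable: GPU temporarily unavailable",
}

def raise_for_stt_error(resp):
    """Return the JSON body on 200; otherwise raise with the API's error message."""
    if resp.status_code == 200:
        return resp.json()
    detail = resp.json().get("error", "")
    hint = ERROR_HINTS.get(resp.status_code, "Unexpected error")
    raise RuntimeError(f"{resp.status_code} {hint}: {detail}")
```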
SDKs and Libraries
Official SDKs for Python and Node.js. Install and start transcribing in minutes.
Ready to get started? Sign up or view pricing.