
API Documentation

Transcribe audio and video files programmatically, with real-time streaming, speaker diarization, AI summaries, and 100+ languages.

Overview

The STT.ai API provides speech-to-text transcription, real-time streaming, and AI summarization. All requests are sent directly to our GPU servers.

Base URL

https://api.stt.ai

OpenAPI Spec (JSON)

100+ languages

4 models: Whisper Turbo, Large V3, Medium, Small

Real-time WebSocket streaming

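Supported input formats: MP3, WAV, FLAC, OGG, M4A, AAC, OPUS, WMA, MP4, WebM, MKV, AVI, MOV, WMV, MPG, MPEG. Maximum file size: 2 GB on paid plans. Anonymous and free tiers have lower limits; see Rate Limits.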

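Authentication

Send your API key as a Bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Get your API key from your account settings. Anonymous requests are allowed, limited to 3 transcriptions per day per IP address.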

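Rate Limits

Tier	Transcriptions	Max file size	Concurrent requests
Anonymous	3/day per IP	100 MB	1
Free (registered)	600 min/month	500 MB	2
Paid plans	Credit-based	2 GB	5

Credits are deducted based on audio duration: 1 credit = 1 minute of audio, rounded up.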
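The rounding rule can be written as a short Python sketch for estimating cost before uploading. It assumes rounding is applied per file to the `duration` field (in seconds) returned by `/v1/transcribe`. `credits_for_duration` is a hypothetical helper and is not part of the official SDK.

```python
import math


def credits_for_duration(duration_seconds: float) -> int:
    """Return the credits charged for one file: 1 credit per started minute.

    Assumption: rounding is applied per file to the `duration` value (seconds).
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds / 60)


# The 125.4 s example response under /v1/transcribe would cost 3 credits.
print(credits_for_duration(125.4))  # 3
```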

Endpoints

POST https://api.stt.ai/v1/transcribe

Upload an audio or video file for transcription with speaker diarization, language detection, and word-level timestamps.

Request Parameters

Send as multipart/form-data.

Parameter	Type	Required	Default	Description
`file`	file	Yes	—	Audio or video file
`model`	string	No	`large-v3-turbo`	Model: `stt-ai-enhanced`, `large-v3-turbo`, `large-v3`, `medium`, `small`. Call `GET /v1/models` for the live list with metadata.
`language`	string	No	`auto`	ISO 639-1 code or `auto`
`diarize`	boolean	No	`true`	Enable speaker diarization
`speakers`	integer	No	`0`	Expected speakers (0 = auto)
`response_format`	string	No	`json`	`json`, `txt`, `srt`, `vtt`

Response (JSON)

{
  "text": "Hello, welcome to the meeting...",
  "language": "en",
  "duration": 125.4,
  "segments": [
    {
      "start": 0.0,
      "end": 3.2,
      "text": "Hello, welcome to the meeting.",
      "speaker": "Speaker 1",
      "confidence": 0.95,
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "welcome", "start": 0.5, "end": 0.9}
      ]
    }
  ],
  "speakers": ["Speaker 1", "Speaker 2"]
}
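Here is a minimal Python sketch of consuming this response: it totals speaking time per speaker from `segments`. The helper name `talk_time_by_speaker` and the second sample segment are illustrative. The fallback label `"Unknown"` assumes `speaker` can be absent when diarization is turned off.

```python
from collections import defaultdict
from typing import Dict


def talk_time_by_speaker(response: dict) -> Dict[str, float]:
    """Sum segment durations (end - start, in seconds) per speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in response.get("segments", []):
        # Assumption: "speaker" may be missing if diarization is disabled.
        speaker = seg.get("speaker", "Unknown")
        totals[speaker] += seg["end"] - seg["start"]
    return dict(totals)


example = {
    "segments": [
        {"start": 0.0, "end": 3.2, "text": "Hello, welcome to the meeting.", "speaker": "Speaker 1"},
        {"start": 3.5, "end": 5.0, "text": "Thanks.", "speaker": "Speaker 2"},  # illustrative
    ]
}
print(talk_time_by_speaker(example))  # {'Speaker 1': 3.2, 'Speaker 2': 1.5}
```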

POST https://api.stt.ai/v1/summarize

Summarize transcript text using a locally hosted LLM. No data leaves our servers.

Request Body (JSON)

Parameter	Type	Required	Description
`text`	string	Yes	Transcript text to summarize
`style`	string	No	`brief` (default), `detailed`, `action_items`, `bullet_points`

Response

{
  "summary": "The team discussed Q3 revenue growth of 15%...",
  "style": "brief",
  "model": "qwen2.5-1.5b-instruct"
}
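This sketch makes the same call with no third-party dependencies, using Python's standard `urllib.request`. (The Python example under Code Examples uses `requests` instead.) `build_summarize_request` and `summarize` are hypothetical helpers, and the 120-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# Styles listed in the /v1/summarize request table.
SUMMARY_STYLES = {"brief", "detailed", "action_items", "bullet_points"}


def build_summarize_request(text: str, api_key: str, style: str = "brief",
                            base: str = "https://api.stt.ai") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/summarize request."""
    if style not in SUMMARY_STYLES:
        raise ValueError(f"unsupported style: {style!r}")
    body = json.dumps({"text": text, "style": style}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/v1/summarize",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def summarize(text: str, api_key: str, style: str = "brief") -> str:
    """Send the request and return the "summary" field of the JSON response."""
    req = build_summarize_request(text, api_key, style)
    with urllib.request.urlopen(req, timeout=120) as resp:  # timeout chosen arbitrarily
        return json.load(resp)["summary"]
```

Splitting request construction from sending keeps the validation testable without network access.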

WS wss://api.stt.ai/v1/stream

Real-time speech-to-text via WebSocket. Send raw PCM audio (16-bit, 16 kHz, mono) and receive transcription updates in real time.

Protocol

1. Connect to wss://api.stt.ai/v1/stream.
2. Send a JSON config: {"language": "en", "model": "large-v3-turbo"}
3. Wait for {"status": "ready"}.
4. Stream raw PCM Int16 audio chunks as binary frames.
5. Receive JSON updates: {"text": "...", "partial": "..."}
6. Send {"action": "stop"} to finalize.
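A non-browser client must convert its audio to the 16-bit PCM format described above before sending it as binary frames. This sketch covers only the message payloads and the float-to-Int16 conversion, mirroring the clamping and truncation in the browser example. The WebSocket connection itself would need a client library and is not shown. Little-endian byte order is an assumption.

```python
import json
import struct
from typing import Sequence

# Config message sent right after connecting (values from the protocol above).
CONFIG_MESSAGE = json.dumps({"language": "en", "model": "large-v3-turbo"})
# Message sent to finalize the stream.
STOP_MESSAGE = json.dumps({"action": "stop"})


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit signed PCM bytes.

    Out-of-range values are clamped and fractions are truncated toward zero,
    like the browser example's Int16Array conversion. Little-endian order
    ("<") is an assumption.
    """
    ints = [int(max(-32768.0, min(32767.0, s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


chunk = float_to_pcm16([0.0, 0.5, -1.0, 1.5])
print(len(chunk))  # 8 (4 samples x 2 bytes)
```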

Server Messages

Field	Description
`status`	`"ready"` — connection established, ready for audio
`partial`	Partial/interim transcript (updates as you speak)
`text`	Finalized transcript segment
`is_final`	`true` when stream is complete
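This sketch shows client-side state handling for these messages. It assumes three things: each `text` message carries one newly finalized segment, a new `partial` replaces the previous one, and segments can be joined with spaces. `StreamTranscript` is a hypothetical class.

```python
import json


class StreamTranscript:
    """Accumulate /v1/stream server messages into a running transcript."""

    def __init__(self) -> None:
        self.ready = False
        self.done = False
        self.partial = ""
        self.segments: list = []

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg.get("status") == "ready":
            self.ready = True          # safe to start sending audio
        if msg.get("text"):
            self.segments.append(msg["text"])
            self.partial = ""          # interim text has been finalized
        if "partial" in msg:
            self.partial = msg["partial"]
        if msg.get("is_final"):
            self.done = True           # server has finished the stream

    @property
    def text(self) -> str:
        return " ".join(self.segments)
```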

GET https://api.stt.ai/v1/models

List all available transcription models. No authentication required.

{
  "models": [
    {"id": "large-v3-turbo", "name": "Whisper Large V3 Turbo", "languages": 99, "speed": "fast"},
    {"id": "large-v3", "name": "Whisper Large V3", "languages": 99, "speed": "standard"},
    {"id": "medium", "name": "Whisper Medium", "languages": 99, "speed": "fast"},
    {"id": "small", "name": "Whisper Small", "languages": 99, "speed": "very_fast"}
  ]
}
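Because `/v1/models` needs no authentication, a client can fetch it at startup and choose a model programmatically. This sketch ranks models by their `speed` label. The ordering in `SPEED_RANK` is inferred from the example values and is an assumption, as are the helper names.

```python
import json
import urllib.request

# Assumed ordering of the "speed" labels seen in the example response.
SPEED_RANK = {"very_fast": 0, "fast": 1, "standard": 2}


def fetch_models(base: str = "https://api.stt.ai") -> dict:
    """GET /v1/models (no authentication required)."""
    with urllib.request.urlopen(f"{base}/v1/models", timeout=30) as resp:
        return json.load(resp)


def fastest_model(payload: dict) -> str:
    """Return the id of the fastest listed model; unknown labels sort last."""
    return min(payload["models"],
               key=lambda m: SPEED_RANK.get(m.get("speed"), len(SPEED_RANK)))["id"]


example = {"models": [
    {"id": "large-v3-turbo", "speed": "fast"},
    {"id": "large-v3", "speed": "standard"},
    {"id": "small", "speed": "very_fast"},
]}
print(fastest_model(example))  # small
```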

GET https://api.stt.ai/v1/languages

List all supported languages with ISO codes. No authentication required.

{
  "languages": [
    {"code": "en", "name": "english"},
    {"code": "es", "name": "spanish"},
    {"code": "fr", "name": "french"},
    ...
  ]
}
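This sketch turns the response into a lookup table and validates a `language` value before a large file is uploaded. Per the `/v1/transcribe` parameters, a valid value is an ISO 639-1 code or `auto`. The helper names are hypothetical.

```python
from typing import Dict


def language_lookup(payload: dict) -> Dict[str, str]:
    """Map ISO 639-1 code -> language name from a /v1/languages response."""
    return {lang["code"]: lang["name"] for lang in payload["languages"]}


def check_language(code: str, lookup: Dict[str, str]) -> str:
    """Validate a `language` value for /v1/transcribe before uploading."""
    if code == "auto" or code in lookup:
        return code
    raise ValueError(f"unsupported language code: {code!r}")


languages = language_lookup({"languages": [
    {"code": "en", "name": "english"},
    {"code": "es", "name": "spanish"},
    {"code": "fr", "name": "french"},
]})
print(check_language("es", languages))  # es
```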

POST https://api.stt.ai/v1/translate

Translate text into 450+ languages using MADLAD-400 (Apache 2.0). Runs locally on our servers, with no third-party APIs.

Request Body (JSON)

Parameter	Type	Required	Description
`text`	string	Yes*	Single text to translate
`texts`	array	Yes*	Array of texts to batch translate
`target`	string	Yes	Target language code (e.g., "es", "fr", "zh")
`source`	string	No	Source language code (default: "en")

* Provide either `text` (single) or `texts` (batch).

Response

// Single text
{"translated_text": "Hola mundo", "source_language": "en", "target_language": "es"}

// Batch
{"translations": [{"translated_text": "Hola"}, {"translated_text": "Mundo"}]}
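The `Yes*` markers suggest that each request sends exactly one of `text` or `texts`. This sketch enforces that assumption and normalizes both response shapes to a list of strings. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence


def build_translate_body(target: str, text: Optional[str] = None,
                         texts: Optional[Sequence[str]] = None,
                         source: str = "en") -> dict:
    """Build a /v1/translate JSON body with exactly one of `text` / `texts`."""
    if (text is None) == (texts is None):
        raise ValueError("provide exactly one of text or texts")
    body: dict = {"target": target, "source": source}
    if text is not None:
        body["text"] = text
    else:
        body["texts"] = list(texts)
    return body


def translated_strings(response: dict) -> List[str]:
    """Normalize single and batch responses to a list of translated strings."""
    if "translations" in response:
        return [item["translated_text"] for item in response["translations"]]
    return [response["translated_text"]]
```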

POST https://api.stt.ai/v1/analyze

Analyze transcript text: sentiment, topics, entities, action items, questions, PII redaction.

Request Body (JSON)

Parameter	Type	Description
`text`	string	Text to analyze
`type`	string	`sentiment`, `topics`, `entities`, `action_items`, `questions`, `pii_redact`
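The table lists a single `type` per request, so running several analyses presumably means one call per type. This sketch assumes that. It also assumes `/v1/analyze` takes the same Bearer header as the other endpoints. The response schema is not documented here, so the raw JSON is kept for each type. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Iterable, List

ANALYSIS_TYPES = ("sentiment", "topics", "entities", "action_items", "questions", "pii_redact")


def analysis_bodies(text: str, types: Iterable[str] = ANALYSIS_TYPES) -> List[dict]:
    """One request body per analysis type (assumed: one `type` per call)."""
    bodies = []
    for kind in types:
        if kind not in ANALYSIS_TYPES:
            raise ValueError(f"unknown analysis type: {kind!r}")
        bodies.append({"text": text, "type": kind})
    return bodies


def run_analyses(text: str, api_key: str, types: Iterable[str] = ANALYSIS_TYPES,
                 base: str = "https://api.stt.ai") -> Dict[str, dict]:
    """POST each body to /v1/analyze and keep the raw JSON response per type."""
    results = {}
    for body in analysis_bodies(text, types):
        req = urllib.request.Request(
            f"{base}/v1/analyze",
            data=json.dumps(body).encode("utf-8"),
            headers={"Authorization": f"Bearer {api_key}",
                     "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=120) as resp:
            results[body["type"]] = json.load(resp)
    return results
```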

POST https://api.stt.ai/v1/generate

Generate content from transcript text: blog posts, social media, meeting notes, study guides, flashcards, quizzes.

Request Body (JSON)

Parameter	Type	Description
`text`	string	Transcript text
`type`	string	`blog_post`, `social_media`, `newsletter`, `key_quotes`, `show_notes`, `meeting_notes`, `study_guide`, `flashcards`, `quiz`, `chapter_markers`

POST https://api.stt.ai/v1/enhance-audio

Remove noise and normalize audio. Returns the enhanced WAV file.

Send as multipart/form-data with a file field. Returns binary audio/wav.

POST https://api.stt.ai/v1/tts

Clone a voice from a reference audio clip and generate speech. Uses F5-TTS (MIT license).

Send as multipart/form-data:

Parameter	Type	Description
`reference`	file	3-10 seconds of voice reference audio
`text`	string	Text to speak in the cloned voice

Returns binary audio/wav. Headers include X-Duration and X-Generation-Time.

POST https://api.stt.ai/v1/embed

Generate sentence embeddings for semantic search. 384-dimensional vectors from all-MiniLM-L6-v2.

Request Body (JSON)

{"texts": ["Hello world", "How are you"]}

Response

{"embeddings": [[0.123, -0.456, ...], [...]], "dimensions": 384}
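Embeddings from all-MiniLM-L6-v2 are typically compared with cosine similarity. Here is a minimal pure-Python ranking sketch. It assumes the query and the documents have already been embedded via `/v1/embed` and their vectors taken from `embeddings`. Toy 2-D vectors stand in for real 384-dimensional ones, and `cosine` and `rank` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: Sequence[float], doc_vectors: Sequence[Sequence[float]],
         docs: Sequence[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Return the top_k (score, doc) pairs, highest similarity first."""
    scored = [(cosine(query, vec), doc) for vec, doc in zip(doc_vectors, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 2-D vectors stand in for the 384-dimensional embeddings.
results = rank([1.0, 0.0], [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]], ["c", "b", "a"], top_k=2)
print(results)
```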

GET https://api.stt.ai/health

Check GPU and API health. No authentication required.

{
  "status": "ok",
  "gpu_available": true,
  "gpu_name": "NVIDIA A100",
  "gpu_memory_mb": 8188
}

REST API (Django)

Manage your account, transcripts, API keys, and more via the Django REST API at https://stt.ai/api/.

GET PUT https://stt.ai/api/v1/account/

Get or update your account info, email preferences, credits, plan details.

GET https://stt.ai/api/v1/transcripts/

List your transcripts with pagination. Filter by status, language, date.

GET DELETE https://stt.ai/api/v1/transcripts/:id/

Get transcript detail with segments, or delete a transcript.

GET https://stt.ai/api/v1/transcripts/:id/export/:format/

Export as txt, srt, vtt, json, csv, docx, or pdf.
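This sketch builds export URLs and saves the file using the standard library. It assumes the Django REST API accepts the same `Authorization: Bearer` API key as `api.stt.ai`, which this page does not state. `export_url` and `download_export` are hypothetical helpers.

```python
import shutil
import urllib.request

EXPORT_FORMATS = {"txt", "srt", "vtt", "json", "csv", "docx", "pdf"}


def export_url(transcript_id, fmt: str, base: str = "https://stt.ai/api/v1") -> str:
    """Build the export URL, rejecting formats not listed in the docs."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    return f"{base}/transcripts/{transcript_id}/export/{fmt}/"


def download_export(transcript_id, fmt: str, api_key: str, dest: str) -> None:
    """Stream the exported file to `dest` (Bearer auth here is an assumption)."""
    req = urllib.request.Request(export_url(transcript_id, fmt),
                                 headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=120) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```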

POST https://stt.ai/api/v1/transcripts/:id/chat/

Ask AI questions about a transcript. Uses RAG with semantic search + Qwen2.5 LLM.

// Request
{"question": "What were the action items?", "session_id": "optional"}

// Response
{"answer": "...", "sources": [{"segment_order": 5, "text": "...", "score": 0.92}]}

POST https://stt.ai/api/v1/transcripts/:id/analyze/

Analyze transcript: sentiment, topics, entities, action_items, questions.

POST https://stt.ai/api/v1/transcripts/:id/generate/

Generate content: blog_post, social_media, meeting_notes, study_guide, flashcards, quiz.

GET POST https://stt.ai/api/v1/keys/

List or create API keys. POST returns the raw key once.

DELETE https://stt.ai/api/v1/keys/:id/

Revoke an API key.

GET https://stt.ai/api/v1/usage/

Daily usage breakdown for the last 30 days.

GET https://stt.ai/api/v1/cloud/

List Private Cloud instances (if subscribed).

Code Examples

cURL

# Transcribe a file
curl -X POST https://api.stt.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@meeting.mp3" \
  -F "model=large-v3-turbo" \
  -F "language=auto" \
  -F "diarize=true"

# Get SRT subtitles
curl -X POST https://api.stt.ai/v1/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=srt" \
  -o subtitles.srt

# Summarize text
curl -X POST https://api.stt.ai/v1/summarize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Full transcript text here...", "style": "brief"}'

# List models (no auth needed)
curl https://api.stt.ai/v1/models

# Health check
curl https://api.stt.ai/health

Python

import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.stt.ai"

# Transcribe a file
with open("meeting.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/v1/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("meeting.mp3", f, "audio/mpeg")},
        data={"model": "large-v3-turbo", "language": "auto", "diarize": "true"},
    )
resp.raise_for_status()  # Raise on 4xx/5xx responses (see Error Handling)

result = resp.json()
print(f"Language: {result['language']}, Duration: {result['duration']:.1f}s")

for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s] {seg.get('speaker', '')}: {seg['text']}")

# Summarize the transcript
summary_resp = requests.post(
    f"{BASE}/v1/summarize",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": result["text"], "style": "bullet_points"},
)
summary_resp.raise_for_status()
print(summary_resp.json()["summary"])

Node.js

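// Requires Node.js 18+ (global fetch, FormData, and Blob).
const fs = require("fs");
const path = require("path");

const API_KEY = "YOUR_API_KEY";
const BASE = "https://api.stt.ai";

async function transcribe(filePath) {
  const form = new FormData();
  // Native fetch cannot send streams from the `form-data` package, so wrap the file in a Blob.
  form.append("file", new Blob([fs.readFileSync(filePath)]), path.basename(filePath));
  form.append("model", "large-v3-turbo");
  form.append("language", "auto");
  form.append("diarize", "true");

  // fetch sets the multipart Content-Type (with boundary) automatically.
  const resp = await fetch(`${BASE}/v1/transcribe`, {
    method: "POST",
    headers: { Authorization: `Bearer ${API_KEY}` },
    body: form,
  });
  if (!resp.ok) {
    const err = await resp.json().catch(() => ({}));
    throw new Error(`HTTP ${resp.status}: ${err.error ?? resp.statusText}`);
  }

  const result = await resp.json();
  console.log(`Duration: ${result.duration.toFixed(1)}s`);

  for (const seg of result.segments) {
    console.log(`[${seg.start.toFixed(1)}s] ${seg.speaker ?? ""}: ${seg.text}`);
  }
  return result;
}

transcribe("meeting.mp3").catch(console.error);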

WebSocket (Browser)

// Real-time transcription from microphone
const ws = new WebSocket("wss://api.stt.ai/v1/stream");
ws.binaryType = "arraybuffer";

ws.onopen = () => {
  ws.send(JSON.stringify({ language: "auto", model: "large-v3-turbo" }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.status === "ready") {
    startStreaming(); // Begin sending audio
  }
  if (data.text) console.log("Final:", data.text);
  if (data.partial) console.log("Partial:", data.partial);
  if (data.is_final) ws.close(); // Server has finished the stream
};

async function startStreaming() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (e) => {
    const float32 = e.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      int16[i] = Math.max(-32768, Math.min(32767, float32[i] * 32767));
    }
    if (ws.readyState === WebSocket.OPEN) ws.send(int16.buffer);
  };

  source.connect(processor);
  processor.connect(ctx.destination);
}

// Stop: ws.send(JSON.stringify({ action: "stop" }));

Error Handling

The API returns standard HTTP status codes with JSON error bodies.

Status	Meaning	When
`200`	OK	Request succeeded
`400`	Bad Request	Missing file, unsupported format
`401`	Unauthorized	Invalid or missing API key
`402`	Payment Required	No credits remaining
`429`	Too Many Requests	Rate limit exceeded (free tier)
`503`	Service Unavailable	GPU temporarily unavailable

// Error response format
{"error": "No credits remaining. Upgrade your plan."}
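Of these codes, `429` and `503` describe conditions that may clear on their own, while `400`, `401`, and `402` need a change on the client side. This sketch retries only the former, with exponential backoff and jitter, using the standard library. The retry count, delays, and helper names are assumptions, not documented API behavior.

```python
import json
import random
import time
import urllib.error
import urllib.request
from typing import Callable, List

# Transient statuses; 400/401/402 are raised immediately.
RETRYABLE_STATUSES = {429, 503}


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def error_message(body: bytes) -> str:
    """Extract the "error" field from a JSON error body, else return the raw text."""
    try:
        return json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        return body.decode("utf-8", errors="replace")


def open_with_retry(req: urllib.request.Request, retries: int = 4,
                    sleep: Callable[[float], None] = time.sleep):
    """urlopen() that retries 429/503 responses with jittered exponential backoff."""
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            return urllib.request.urlopen(req, timeout=120)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE_STATUSES or attempt == retries:
                raise RuntimeError(f"HTTP {err.code}: {error_message(err.read())}") from err
            sleep(delays[attempt] + random.uniform(0, 0.5))  # jitter spreads out retries
```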

SDKs and Libraries

Official SDKs for Python and Node.js. Install and start transcribing in minutes.

Python

pip install sttai

Node.js

npm install @sttainpm/sttai

REST API

Works with any HTTP client.

Ready to get started?

Sign up for free and get an API key in seconds. 600 free minutes per month.
