AI Policy & Transparency

What AI we use, where it runs, and how we comply with disclosure requirements (EU AI Act Article 50, effective 2026-08-02).

TL;DR

  • All transcripts are AI-generated. Every output carries a machine-readable disclosure.
  • All AI runs on our own GPU. We do NOT send your audio or text to OpenAI, Anthropic, Google, or any third-party LLM API.
  • We do NOT train base models on your transcripts. Only an opt-in fine-tune uses corrections you explicitly make.
  • Synthetic voice clones (TTS) are clearly labeled as AI-generated in filename, metadata, and on the page.

Models we use

Model                                    | Used for                                        | License                                       | Runs on
Whisper large-v3-turbo (faster-whisper)  | Transcription (default)                         | MIT                                           | Our GPU
STT.ai Enhanced (custom fine-tune)       | Transcription (paid plans)                      | MIT (base) / Proprietary (fine-tune weights)  | Our GPU
Vosk                                     | Real-time word streaming                        | Apache 2.0                                    | Our GPU
SpeechBrain ECAPA-TDNN                   | Speaker diarization                             | Apache 2.0                                    | Our GPU
MadLAD-400 3B                            | Translation (450+ languages)                    | Apache 2.0                                    | Our GPU
Qwen2.5-1.5B (llama.cpp)                 | Summary, analysis, content generation, RAG chat | Apache 2.0                                    | Our GPU
F5-TTS                                   | Voice cloning / text-to-speech                  | MIT                                           | Our GPU
all-MiniLM-L6-v2                         | Embeddings for RAG search                       | Apache 2.0                                    | Our GPU

We do not call OpenAI, Anthropic, Google Cloud, or any third-party LLM API for any feature. Every model above runs on hardware we own and operate. The only external AI service we use is translateapi.ai (also Muddy Holdings) for translating UI strings — this never touches your transcribed content.

How we disclose AI generation

  • Transcript HTML pages: include <meta name="ai-generated" content="true">, a JSON-LD CreativeWork annotation that names a SoftwareApplication as the creator, and a visible footer line on the page.
  • Text exports (TXT, SRT, VTT, JSON, CSV, DOCX, PDF): include an 'AI-generated transcript' header line at the top of every file.
  • Synthetic voice / TTS output: WAV files include a 'synthetic-voice' tag in metadata and a clear notice on the download page. An audible disclaimer is on the roadmap.
  • API responses: include an _ai_generated: true field in every JSON response that contains transcribed content.
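As an illustration of the API disclosure above, a response carrying transcribed content might be assembled like this. Only the _ai_generated field comes from this policy; the function name and the other fields are hypothetical placeholders, not the actual API schema.

```python
import json

def build_response(transcript_text: str) -> str:
    """Sketch of a JSON API response with the machine-readable
    AI-generation disclosure flag set on every payload that
    contains transcribed content."""
    payload = {
        "_ai_generated": True,        # disclosure flag described in this policy
        "transcript": transcript_text # illustrative field name
    }
    return json.dumps(payload)

print(build_response("Hello world."))
```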

Training data

  • Base models (Whisper, MadLAD, Qwen, etc.) come pre-trained from their respective publishers. We use them as-shipped.
  • Your transcripts are NOT used to train base models.
  • If you correct a transcript segment (the pencil icon) or mark it incorrect (the flag icon), that correction may be used to fine-tune our STT.ai Enhanced model — but only the (incorrect text → corrected text) pair, never the surrounding context. You can opt out at /privacy-settings/.
  • Audio you upload is not retained beyond what's required to produce the transcript. Files are deleted within 24 hours of upload via the cleanup_uploads cron. The transcript text follows the retention policy of your plan (anonymous 24h, free 30 days, Pro+ permanent).
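The opt-in fine-tune data described above can be pictured as a bare text pair: the flagged segment and its correction, with no audio, no surrounding segments, and no identifiers. This record shape is an illustration, not our actual storage schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CorrectionPair:
    """What the opt-in fine-tune sees: just the two strings.
    No audio, no neighboring context, no speaker or account IDs."""
    incorrect_text: str
    corrected_text: str

pair = CorrectionPair(
    incorrect_text="wreck a nice beach",
    corrected_text="recognize speech",
)
```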

Accuracy and errors

AI transcription is not perfect. Word error rates vary by speaker accent, audio quality, language, and domain vocabulary. For critical uses (legal, medical, regulated industries), verify against the original audio. Our public WER benchmarks per model are at /models/.
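For reference, word error rate (WER) is the standard metric behind the benchmarks mentioned above: substitutions, deletions, and insertions between the reference and the hypothesis, divided by the number of words in the reference. A minimal sketch of that computation (not our benchmark harness):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance:
    (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution in a three-word reference -> WER of 1/3.
print(wer("the cat sat", "the bat sat"))
```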

Compliance questions

For EU AI Act, GDPR, or other compliance questions: hello@stt.ai or use the contact form.

Last updated: 2026-04-26.