AI Policy & Transparency

What AI we use, where it runs, and how we comply with disclosure requirements (EU AI Act Article 50, effective 2026-08-02).

TL;DR

  • All transcripts are AI-generated. Every output carries a machine-readable disclosure.
  • All AI runs on our own GPU. We do NOT send your audio or text to OpenAI, Anthropic, Google, or any third-party LLM API.
  • We do NOT train base models on your transcripts. Only an opt-in fine-tune uses corrections you explicitly make.
  • Synthetic voice clones (TTS) are clearly labeled as AI-generated in filename, metadata, and on the page.

Models we use

Model                                    | Used for                                        | License                                       | Runs on
Whisper large-v3-turbo (faster-whisper)  | Transcription (default)                         | MIT                                           | Our GPU
STT.ai Enhanced (custom fine-tune)       | Transcription (paid plans)                      | MIT (base) / Proprietary (fine-tune weights)  | Our GPU
Vosk                                     | Real-time word streaming                        | Apache 2.0                                    | Our GPU
SpeechBrain ECAPA-TDNN                   | Speaker diarization                             | Apache 2.0                                    | Our GPU
MadLAD-400 3B                            | Translation (450+ languages)                    | Apache 2.0                                    | Our GPU
Qwen2.5-1.5B (llama.cpp)                 | Summary, analysis, content generation, RAG chat | Apache 2.0                                    | Our GPU
F5-TTS                                   | Voice cloning / text-to-speech                  | MIT                                           | Our GPU
all-MiniLM-L6-v2                         | Embeddings for RAG search                       | Apache 2.0                                    | Our GPU

We do not call OpenAI, Anthropic, Google Cloud, or any third-party LLM API for any feature. Every model above runs on hardware we own and operate. The only external AI service we use is translateapi.ai (also Muddy Holdings) for translating UI strings — this never touches your transcribed content.

How we disclose AI generation

  • Transcript HTML pages: include <meta name="ai-generated" content="true">, a JSON-LD CreativeWork annotation that names a SoftwareApplication as the creator, and a visible footer line on the page.
  • Text exports (TXT, SRT, VTT, JSON, CSV, DOCX, PDF): include an 'AI-generated transcript' header line at the top of every file.
  • Synthetic voice / TTS output: WAV files include a 'synthetic-voice' tag in metadata and a clear notice on the download page. An audible disclaimer is on the roadmap.
  • API responses: include an _ai_generated: true field in every JSON response that contains transcribed content.
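As an illustration of the API disclosure above, a response carrying transcribed content might be assembled like this. Only the _ai_generated field comes from this policy; the function name and the other fields are hypothetical placeholders, not the actual API schema.

```python
import json

def build_response(transcript_text: str) -> str:
    """Sketch of a JSON API response with the machine-readable
    AI-generation disclosure flag set on every payload that
    contains transcribed content."""
    payload = {
        "_ai_generated": True,        # disclosure flag described in this policy
        "transcript": transcript_text # illustrative field name
    }
    return json.dumps(payload)

print(build_response("Hello world."))
```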

Training data

  • Base models (Whisper, MadLAD, Qwen, etc.) come pre-trained from their respective publishers. We use them as-shipped.
  • Your transcripts are NOT used to train base models.
  • If you correct a transcript segment (the pencil icon) or mark it incorrect (the flag icon), that correction may be used to fine-tune our STT.ai Enhanced model — but only the (incorrect text → corrected text) pair, never the surrounding context. You can opt out at /privacy-settings/.
  • Audio you upload is not retained beyond what's required to produce the transcript. Files are deleted within 24 hours of upload via the cleanup_uploads cron. The transcript text follows the retention policy of your plan (anonymous 24h, free 30 days, Pro+ permanent).
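The opt-in fine-tune data described above can be pictured as a bare text pair: the flagged segment and its correction, with no audio, no surrounding segments, and no identifiers. This record shape is an illustration, not our actual storage schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CorrectionPair:
    """What the opt-in fine-tune sees: just the two strings.
    No audio, no neighboring context, no speaker or account IDs."""
    incorrect_text: str
    corrected_text: str

pair = CorrectionPair(
    incorrect_text="wreck a nice beach",
    corrected_text="recognize speech",
)
```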

Accuracy and errors

AI transcription is not perfect. Word error rates vary by speaker accent, audio quality, language, and domain vocabulary. For critical uses (legal, medical, regulated industries), verify against the original audio. Our public WER benchmarks per model are at /models/.
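For reference, word error rate (WER) is the standard metric behind the benchmarks mentioned above: substitutions, deletions, and insertions between the reference and the hypothesis, divided by the number of words in the reference. A minimal sketch of that computation (not our benchmark harness):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance:
    (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution in a three-word reference -> WER of 1/3.
print(wer("the cat sat", "the bat sat"))
```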

Compliance questions

For EU AI Act, GDPR, or other compliance questions: hello@stt.ai or use the contact form.

Last updated: 2026-04-26.