AI Policy & Transparency
What AI we use, where it runs, and how we comply with disclosure requirements (EU AI Act Article 50, effective 2026-08-02).
TL;DR
- All transcripts are AI-generated. Every output carries a machine-readable disclosure.
- All AI that processes your audio or text runs on our own GPU. We do NOT send your audio or text to OpenAI, Anthropic, Google, or any third-party LLM API.
- We do NOT train base models on your transcripts. Only an opt-in fine-tune uses corrections you explicitly make.
- Synthetic voice clones (TTS) are clearly labeled as AI-generated in filename, metadata, and on the page.
Models we use
| Model | Used for | License | Runs on |
|---|---|---|---|
| Whisper large-v3-turbo (faster-whisper) | Transcription (default) | MIT | Our GPU |
| STT.ai Enhanced (custom fine-tune) | Transcription (paid plans) | MIT (base) / Proprietary (fine-tune weights) | Our GPU |
| Vosk | Real-time word streaming | Apache 2.0 | Our GPU |
| SpeechBrain ECAPA-TDNN | Speaker diarization | Apache 2.0 | Our GPU |
| MADLAD-400 3B | Translation (450+ languages) | Apache 2.0 | Our GPU |
| Qwen2.5-1.5B (llama.cpp) | Summary, analysis, content generation, RAG chat | Apache 2.0 | Our GPU |
| F5-TTS | Voice cloning / text-to-speech | MIT | Our GPU |
| all-MiniLM-L6-v2 | Embeddings for RAG search | Apache 2.0 | Our GPU |
We do not call OpenAI, Anthropic, Google Cloud, or any third-party LLM API for any feature. Every model above runs on hardware we own and operate. The only external AI service we use is translateapi.ai (also a Muddy Holdings service), which translates UI strings only; it never touches your transcribed content.
How we disclose AI generation
- Transcript HTML pages: include <meta name="ai-generated" content="true">, a JSON-LD CreativeWork annotation whose creator is a SoftwareApplication, and a visible footer line on the page.
- Text exports (TXT, SRT, VTT, JSON, CSV, DOCX, PDF): include an 'AI-generated transcript' header line at the top of every file.
- Synthetic voice / TTS output: WAV files carry a 'synthetic-voice' tag in their metadata, and the download page shows a clear notice. An audible disclaimer is on the roadmap.
- API responses: include an _ai_generated: true field in every JSON response that contains transcribed content.
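The markers listed above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the function names `disclosure_head` and `is_ai_disclosed` are hypothetical, the JSON-LD property layout is one plausible reading of "a CreativeWork whose creator is a SoftwareApplication", and the check assumes `_ai_generated` sits at the top level of the response object.

```python
import json


def disclosure_head(title: str, generator: str = "STT.ai") -> str:
    """Build the <head> markup that discloses AI generation (illustrative)."""
    json_ld = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": title,
        # The creator is declared as software, not a person.
        "creator": {"@type": "SoftwareApplication", "name": generator},
    }
    # Escape "</" so a title containing "</script>" cannot close the tag early.
    payload = json.dumps(json_ld).replace("</", "<\\/")
    return (
        '<meta name="ai-generated" content="true">\n'
        f'<script type="application/ld+json">{payload}</script>'
    )


def is_ai_disclosed(response_body: str) -> bool:
    """Return True if a JSON response carries `_ai_generated: true`.

    Assumes the field is a top-level key of a JSON object.
    """
    payload = json.loads(response_body)
    return isinstance(payload, dict) and payload.get("_ai_generated") is True
```

A client consuming the API could call `is_ai_disclosed(body)` before storing or republishing a transcript, and propagate the disclosure downstream.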
Training data
- Base models (Whisper, MADLAD, Qwen, etc.) come pre-trained from their respective publishers. We use them as shipped.
- Your transcripts are NOT used to train base models.
- If you correct a transcript segment (the pencil icon) or mark it incorrect (the flag icon), AND you have opted in via the "Model Training" toggle at /privacy-settings/ (default off), the (incorrect text → corrected text) pair may be used to fine-tune our STT.ai Enhanced text-correction model. This path uses only the text pair, never the surrounding audio.
- Separately, the "Contribute corrections + audio to Voice Lab" toggle (also at /privacy-settings/, also default off) opts you in to contributing the audio of segments you correct, paired with the corrected text, to our Voice Lab dataset under CC-BY-SA-4.0. The two toggles are independent: you can enable either, both, or neither.
- Audio you upload is deleted within 24 hours by the cleanup_uploads cron job, UNLESS you have opted in to "Contribute corrections + audio to Voice Lab" at /privacy-settings/. In that case the audio is archived for up to 90 days so the daily Voice Lab ingest can extract the corrected segment; the source audio is deleted once it has been ingested or after 90 days, whichever is sooner. The transcript text follows the retention policy of your plan (anonymous: 24 hours; free: 30 days; Pro+: permanent).
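The consent and audio-retention rules above can be read as a small decision procedure. The sketch below restates them in Python for clarity only. Every name in it (`PrivacySettings`, `allowed_uses`, `audio_delete_deadline`, the string labels) is hypothetical and does not describe our actual code or the `cleanup_uploads` job.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PrivacySettings:
    # Both toggles live at /privacy-settings/ and default to off.
    model_training: bool = False          # "Model Training"
    voice_lab_contribution: bool = False  # "Contribute corrections + audio to Voice Lab"


def allowed_uses(settings: PrivacySettings, segment_corrected: bool) -> set[str]:
    """Which training uses a segment qualifies for (labels are illustrative)."""
    uses: set[str] = set()
    if not segment_corrected:
        return uses  # nothing is used unless you corrected or flagged the segment
    if settings.model_training:
        uses.add("text_pair_fine_tune")       # text pair only, no audio
    if settings.voice_lab_contribution:
        uses.add("voice_lab_audio_and_text")  # audio + corrected text, CC-BY-SA-4.0
    return uses


def audio_delete_deadline(
    uploaded_at: datetime,
    settings: PrivacySettings,
    ingested_at: Optional[datetime] = None,
) -> datetime:
    """Latest time by which the uploaded audio is deleted under the stated policy."""
    if not settings.voice_lab_contribution:
        return uploaded_at + timedelta(hours=24)
    archive_limit = uploaded_at + timedelta(days=90)
    if ingested_at is not None:
        # Deleted once ingested or after 90 days, whichever is sooner.
        return min(ingested_at, archive_limit)
    return archive_limit
```

With both toggles off, `allowed_uses` returns an empty set and the deadline is 24 hours after upload, which matches the default behavior described above.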
Accuracy and errors
AI transcription is not perfect. Word error rate (WER) varies with speaker accent, audio quality, language, and domain vocabulary. For critical use (legal, medical, regulated industries), verify the transcript against the original audio. Our public per-model WER benchmarks are at /models/.
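For readers unfamiliar with the metric, word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance. It is an illustration of the standard definition, not the exact pipeline behind our benchmarks; the lowercasing and whitespace tokenization are simplifying assumptions, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.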
Compliance questions
For EU AI Act, GDPR, or other compliance questions, email hello@stt.ai or use the contact form.
Last updated: 2026-04-26.