Free Speech to Text Online
Convert speech to text with AI-powered transcription. Upload audio files, record from your microphone, or paste a URL. 100+ languages, 10+ models, 98%+ accuracy.
1. Upload a Voice Recording
Upload an audio or video file, paste a URL, or record speech from your mic.
2. AI Converts Speech to Text
Choose from 10+ AI models. Speaker detection and automatic language detection are included.
3. Export Your Transcript
Download in 6 formats. Share transcript links with audio playback.
Speech to Text Models
Choose the AI model that fits your needs, or let us pick the best one.
Convert Speech to Text in 100+ Languages
Ready to convert speech to text?
Start Free →
Frequently Asked Questions (FAQ)
Speech to text (also known as speech recognition or ASR) automatically converts spoken audio into written words. STT.ai runs your recording through an AI model that listens to the audio and outputs editable text with timestamps and speaker labels, with no typing required.
An acoustic model maps the audio waveform to phonemes, then a language model assembles them into the most likely words and punctuation. STT.ai does this on GPUs with models such as Whisper Large V3 and NVIDIA Canary, so a one-hour recording is usually finished in 2-3 minutes.
Yes. Every visitor gets 600 free minutes per month, with no signup required for your first file. Paid plans start at $5/month and add longer files, private transcripts, and priority processing.
On clean speech our best models reach 95-97% accuracy (a 3-5% Word Error Rate on benchmarks). Accuracy drops with background noise, heavy accents, crosstalk, or low-bitrate audio — using a decent microphone and a quiet room makes the biggest difference.
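To make the accuracy figure concrete, here is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The function name and the sample sentences are illustrative assumptions, and the exact text normalization STT.ai applies before scoring is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: simple lowercasing and whitespace tokenization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative 20-word reference with one misheard word ("few" -> "view").
reference = ("please upload the audio file and wait a few minutes while "
             "the model turns your spoken words into editable text")
hypothesis = ("please upload the audio file and wait a view minutes while "
              "the model turns your spoken words into editable text")

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")
```

One wrong word in twenty gives a 5% WER, which corresponds to the lower end of the 95-97% accuracy range quoted above.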
Yes. Speak into your microphone and STT.ai streams the transcript live via the live-transcription tool. You can also upload a finished recording for batch transcription if you don't need it word-by-word as you talk.
STT.ai recognizes 100+ languages and auto-detects the spoken language for most audio. You can also set the language manually for a small accuracy lift, and for mixed-language recordings the model switches languages mid-clip.
Yes. Speaker diarization labels each voice (Speaker 1, Speaker 2, ...) and you can rename them in the editor. This works across every supported model and language.
STT.ai accepts 20+ input formats, including MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, WebM, and AVI. Transcripts can be exported as TXT, SRT, VTT, DOCX, JSON, or PDF.
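For readers curious about what a subtitle export involves, the sketch below turns timestamped segments into SRT text, using the standard `HH:MM:SS,mmm` timestamp form. The segment layout (a list of dicts with `start`, `end`, and `text` keys) is a hypothetical example and not the documented STT.ai JSON schema.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from segments; cues are numbered starting at 1."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Hypothetical segments, shaped like a transcript with speaker labels.
segments = [
    {"start": 0.0, "end": 2.5, "text": "Speaker 1: Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": "Speaker 2: Thanks for having me."},
]
print(to_srt(segments))
```

VTT output follows the same idea, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.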
Speech to text transcribes WHAT was said into words; voice recognition (speaker identification) determines WHO said it. STT.ai does both — transcription plus speaker diarization — but the terms describe different tasks.
Yes. Audio is processed and deleted by default. Pro plans add client-side encryption so transcripts are unreadable without your key, even to STT.ai, and your data is never used for model training without explicit opt-in.
Yes. STT.ai has a REST API with Python and Node.js SDKs plus an MCP server for Claude and Cursor. The free API tier includes 100 minutes/month, with per-second billing beyond that.
Yes. Every transcript opens in a built-in editor where you can fix misheard words, rename speakers, adjust timestamps, and add notes. Edits persist across every export format.