Free Speech to Text Online

Convert speech to text with AI-powered transcription. Upload audio files, record from your microphone, or paste a URL. 100+ languages, 10+ models, 98%+ accuracy.

Works with publicly available audio and video content. DRM-protected content is not supported.

Upgrade for Enhanced
Private transcript
Chat with transcript
Unlock with Pro →
Drop file here or click to browse
MP3, WAV, M4A, FLAC, MP4, MKV, MOV, WebM, up to 2 GB
Batch upload multiple files with Pro
Real-time: Vosk (instant)
Enhanced: Whisper (accurate)
Public links: 24 hours, text only · Sign up for 7 days plus audio · Pro for private links

AI auto-corrects as you speak; accuracy improves with more speaking time.

Test your microphone before you start
❤️ Love STT.ai? Tell your friends!
You've used your free transcriptions

Sign up for free to get 600 minutes/month, or upgrade for unlimited transcriptions.

10 free min/day · 600 free min with sign-up · No credit card · Encrypted
Sign up free →

1. Upload a Voice Recording

Upload an audio or video file, paste a URL, or record speech from your mic.

2. AI Converts Speech to Text

Choose from 10+ AI models. Speaker detection and automatic language detection are included.

3. Export Your Transcript

Download in 6 formats. Share transcript links with audio playback.

Supported Voice Input Formats

Speech to Text Use Cases

Ready to convert speech to text?

Start Free →

Frequently Asked Questions (FAQ)

Speech to text (also known as automatic speech recognition, or ASR) converts spoken audio into written words automatically. STT.ai runs your recording through an AI model that listens to the audio and outputs editable text with timestamps and speaker labels, so no typing is required.

An acoustic model maps the audio waveform to phonemes, then a language model assembles them into the most likely words and punctuation. STT.ai does this on GPUs with models such as Whisper Large V3 and NVIDIA Canary, so a one-hour recording is usually done in 2-3 minutes.

Yes. Every visitor gets 600 free minutes per month, with no sign-up required for your first file. Paid plans start at $5/month and add longer files, private transcripts, and priority processing.

On clean speech our best models reach 95-97% accuracy (a 3-5% Word Error Rate on benchmarks). Accuracy drops with background noise, heavy accents, crosstalk, or low-bitrate audio — using a decent microphone and a quiet room makes the biggest difference.
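For readers who want to see how a Word Error Rate figure like the one above is typically derived, here is a minimal sketch. It uses the standard definition (substitutions, deletions, and insertions divided by the number of reference words), computed with a word-level edit distance. The function name and the simple lowercase-and-split normalization are assumptions for illustration; this is not STT.ai's benchmarking code, and real benchmarks usually normalize punctuation and numbers as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
# prints "WER: 16.7%, accuracy: 83.3%"
```

One wrong word in a six-word sentence already gives a 16.7% WER, which is why a 3-5% WER corresponds to roughly one error every 20 to 33 words.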

Yes. Speak into your microphone and STT.ai streams the transcript live via the live-transcription tool. You can also upload a finished recording for batch transcription if you don't need it word-by-word as you talk.

STT.ai recognizes 100+ languages and auto-detects the spoken language for most audio. You can also set the language manually for a small accuracy lift, and mixed-language recordings are handled by switching mid-clip.

Yes. Speaker diarization labels each voice (Speaker 1, Speaker 2, ...) and you can rename them in the editor. This works across every supported model and language.

STT.ai accepts 20+ formats including MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, WebM, and AVI. Output to TXT, SRT, VTT, DOCX, JSON, or PDF.
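As an illustration of what the SRT output format looks like, the sketch below turns timestamped segments into SRT subtitle text. The `(start_seconds, end_seconds, text)` tuple shape and both function names are assumptions made for this example; they do not describe STT.ai's JSON schema or its exporter.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start, end, text) tuples (hypothetical shape)."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: sequence number, time range, then the caption text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


print(to_srt([
    (0.0, 2.5, "Speaker 1: Hello and welcome."),
    (2.5, 4.0, "Speaker 2: Thanks for having me."),
]))
```

VTT is similar in structure but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.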

Speech to text transcribes WHAT was said into words; voice recognition (speaker identification) determines WHO said it. STT.ai does both — transcription plus speaker diarization — but the terms describe different tasks.

Yes. Audio is processed and deleted by default. Pro plans add client-side encryption so transcripts are unreadable without your key, even to STT.ai, and your data is never used for model training without explicit opt-in.

Yes. STT.ai has a REST API with Python and Node.js SDKs plus an MCP server for Claude and Cursor. The free API tier includes 100 minutes/month, with per-second billing beyond that.

Yes. Every transcript opens in a built-in editor where you can fix misheard words, rename speakers, adjust timestamps, and add notes. Edits persist across every export format.