Free Speech to Text Online
Convert speech to text with AI-powered transcription. Upload audio files, record with your microphone, or paste a URL. 100+ languages, 10+ models, 98%+ accuracy.
1. Record your voice
Upload an audio or video file, paste a URL, or record speech from your microphone.
2. AI converts speech to text
Choose from 10+ AI models. Speaker detection and automatic language detection are included.
3. Export your transcript
Download in 6 formats. Share transcript links with audio playback.
Speech-to-text models
Select the AI model that fits your needs, or let us choose the best one.
Transcribe speech to text in 100+ languages
Start for free →
Frequently asked questions
Speech to text (also called speech recognition or ASR) automatically converts spoken audio into written words. STT.ai runs your recording through an AI model that listens to the audio and produces editable text with timestamps and speaker labels, with no typing required.
An acoustic model turns the sound waveform into phonemes, then a language model assembles them into the most probable words and punctuation. STT.ai does this on GPUs with models such as Whisper Large V3 and NVIDIA Canary, so a one-hour recording is normally done in 2-3 minutes.
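To put the quoted speed in perspective, one hour of audio processed in 2-3 minutes is roughly 20-30 times faster than real time. The short Python sketch below only does that arithmetic; the function name is illustrative and not part of any STT.ai tooling.

```python
def speedup_over_real_time(audio_minutes: float, processing_minutes: float) -> float:
    """Return how many times faster than real time the transcription ran."""
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return audio_minutes / processing_minutes


# Figures from the text: a one-hour recording finishing in 2-3 minutes.
slow_case = speedup_over_real_time(60, 3)  # 20.0
fast_case = speedup_over_real_time(60, 2)  # 30.0
print(f"{slow_case:.0f}x to {fast_case:.0f}x real time")
```

Actual throughput will depend on the model chosen, file length, and server load, so treat these numbers as a rough guide.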
On clean speech our best models reach 95-97% accuracy (a 3-5% Word Error Rate on benchmarks). Accuracy drops with background noise, heavy accents, crosstalk, or low-bitrate audio — using a decent microphone and a quiet room makes the biggest difference.
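For readers who want to check accuracy on their own recordings, Word Error Rate is usually computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in a reference transcript, and accuracy is reported as 1 minus WER. The Python sketch below is a minimal version of that standard calculation. The lowercase-and-split normalization is an assumption made here; benchmark toolkits typically normalize punctuation and numbers more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over the lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.0%}, accuracy = {1 - wer:.0%}")  # WER = 20%, accuracy = 80%
```

Here one substituted word and one dropped word out of ten reference words give a 20% WER. A 3-5% WER corresponds to three to five such errors per hundred words.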
Yes. Speak into your microphone and STT.ai streams the transcript live via the live-transcription tool. You can also upload a finished recording for batch transcription if you don't need it word-by-word as you talk.
STT.ai recognizes 100+ languages and auto-detects the spoken language for most audio. You can also set the language manually for a small accuracy lift. In mixed-language recordings, the detected language switches mid-clip as the speakers change language.
Yes. Speaker diarization labels each voice (Speaker 1, Speaker 2, ...) and you can rename them in the editor. This works across every supported model and language.
STT.ai accepts 20+ input formats, including MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, WebM, and AVI. Transcripts export to TXT, SRT, VTT, DOCX, JSON, or PDF.
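As an illustration of what a subtitle export contains, the Python sketch below turns timed segments into SRT text: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line between cues. The `(start_seconds, end_seconds, text)` tuple shape is an assumption for this example and is not STT.ai's actual JSON schema.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(cues)


segments = [
    (0.0, 2.5, "Speaker 1: Hello."),
    (2.5, 4.0, "Speaker 2: Hi."),
]
print(to_srt(segments))
```

VTT output is similar in spirit but starts with a `WEBVTT` header and uses a period instead of a comma before the milliseconds.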
Speech to text transcribes WHAT was said into words; voice recognition (speaker identification) determines WHO said it. STT.ai does both — transcription plus speaker diarization — but the terms describe different tasks.
Yes. Audio is processed and deleted by default. Pro plans add client-side encryption so transcripts are unreadable without your key, even to STT.ai, and your data is never used for model training without explicit opt-in.
Yes. STT.ai has a REST API with Python and Node.js SDKs plus an MCP server for Claude and Cursor. The free API tier includes 100 minutes/month, with per-second billing beyond that.
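To estimate API usage against the free tier, the Python sketch below subtracts the 100 free minutes from a month's usage and charges the remainder by the second. The per-minute rate is a hypothetical placeholder, since no price is stated here, and the assumption that only usage beyond the allowance is billed is this example's reading of the text.

```python
FREE_SECONDS_PER_MONTH = 100 * 60  # 100 free minutes per month, from the text


def billable_seconds(used_seconds: int) -> int:
    """Seconds of audio beyond the monthly free allowance."""
    return max(0, used_seconds - FREE_SECONDS_PER_MONTH)


def estimated_cost(used_seconds: int, rate_per_minute: float) -> float:
    """Cost of the billable seconds at a hypothetical per-minute rate."""
    return billable_seconds(used_seconds) * rate_per_minute / 60


# 101.5 minutes used in a month: only the extra 90 seconds are billable.
used = 101 * 60 + 30
print(billable_seconds(used))                   # 90
print(round(estimated_cost(used, 0.60), 2))     # 0.9 at a made-up 0.60 per minute
```

Per-second billing means a 90-second overage is charged as 1.5 minutes rather than being rounded up to 2.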
Yes. Every transcript opens in a built-in editor where you can fix misheard words, rename speakers, adjust timestamps, and add notes. Edits persist across every export format.