Free Speech to Text Online

Convert speech to text with AI-powered transcription. Upload an audio file, record from your microphone, or paste a URL. 100+ languages, 10+ models, 98%+ accuracy.

Works with publicly available audio and video. DRM-protected content is not supported.

Upgrade to Unlock

Private transcript

Chat with transcript

Unlock with Pro →

Drop files here or click to browse

MP3, WAV, M4A, FLAC, MP4, MKV, MOV, WebM up to 2GB

Batch upload multiple files with Pro

Real-time speech to text. The AI auto-corrects the transcript as you speak, and accuracy improves with longer speech.

Test your microphone first

10 minutes/day free · 600 free minutes with signup · No credit card · Encrypted

Sign up free →

How speech to text works →

1. Upload a voice recording

Upload an audio or video file, paste a URL, or record from your microphone.

2. AI converts speech to text

Choose from 10+ AI models. Speaker detection and automatic language detection are included.

3. Export your transcript

Download in 6 formats. Share a transcript link with audio playback.
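As an illustration of the export step, here is a minimal sketch of how timestamped, speaker-labelled segments could be rendered as SRT subtitles, one of the common subtitle export formats. The segment structure (`start`, `end`, `speaker`, `text`) and the function names are assumptions made for this example and do not describe STT.ai's actual output schema.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{time_range}\n{seg['speaker']}: {seg['text']}")
    return "\n\n".join(cues) + "\n"


# Hypothetical transcript segments, for illustration only.
segments = [
    {"start": 0.0, "end": 2.4, "speaker": "Speaker 1", "text": "Thanks for joining."},
    {"start": 2.4, "end": 5.1, "speaker": "Speaker 2", "text": "Happy to be here."},
]
print(to_srt(segments))
```

The same segment list could feed the other text-based formats; only the timestamp syntax and cue layout would change.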

Supported Speech Input Formats

MP3 WAV M4A FLAC OGG MP4 MKV MOV WebM AVI

Speech to Text Models

Choose the AI model that fits your needs — or let us pick the best one.

Speech to Text in 100+ Languages

English Spanish French German Japanese Arabic Hindi Portuguese Russian Korean All languages →

Speech to Text Use Cases

Ready to convert speech to text?

Start free →

Frequently Asked Questions

What is speech to text?

Speech to text (also called speech recognition or ASR) converts spoken audio into written words automatically. STT.ai runs your recording through an AI model that listens to the audio and outputs editable text with timestamps and speaker labels, with no typing required.

How does speech to text work?

An acoustic model maps the sound waveform to phonemes, then a language model assembles those into the most likely words and punctuation. STT.ai does this on GPU with models like Whisper Large V3 and NVIDIA Canary, so a one-hour recording is usually done in 2-3 minutes, roughly 20 to 30 times faster than real time.

Is STT.ai speech to text free?

Every visitor gets 600 free minutes per month, with no signup required for your first file. Paid plans start at $5/month and add longer files, private transcripts, and priority processing.

How accurate is speech to text?

On clean speech our best models reach 95-97% accuracy (a 3-5% Word Error Rate on benchmarks). Accuracy drops with background noise, heavy accents, crosstalk, or low-bitrate audio. Using a decent microphone and a quiet room makes the biggest difference.

Can I convert speech to text in real time?

Yes. Speak into your microphone and STT.ai streams the transcript live via the live-transcription tool. You can also upload a finished recording for batch transcription if you don't need it word-by-word as you talk.

What languages does speech to text support?

STT.ai recognizes 100+ languages and auto-detects the spoken language for most audio. You can also set the language manually for a small accuracy lift, and mixed-language recordings are handled by switching languages mid-clip.

Does speech to text identify who is speaking?

Yes. Speaker diarization labels each voice (Speaker 1, Speaker 2, ...) and you can rename them in the editor. This works across every supported model and language.

What audio and video formats can I convert to text?

STT.ai accepts 20+ formats including MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, WebM, and AVI. Output to TXT, SRT, VTT, DOCX, JSON, or PDF.

Is speech to text the same as voice recognition?

No. Speech to text transcribes WHAT was said into words; voice recognition (speaker identification) determines WHO said it. STT.ai does both (transcription plus speaker diarization), but the terms describe different tasks.

Is my audio private when I use speech to text?

Yes. Audio is processed and deleted by default. Pro plans add client-side encryption so transcripts are unreadable without your key, even to STT.ai, and your data is never used for model training without explicit opt-in.

Can developers add speech to text via an API?

Yes. STT.ai has a REST API with Python and Node.js SDKs plus an MCP server for Claude and Cursor. The free API tier includes 100 minutes/month, with per-second billing beyond that.

Can I edit the text after speech to text?

Yes. Every transcript opens in a built-in editor where you can fix misheard words, rename speakers, adjust timestamps, and add notes. Edits persist across every export format.
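The accuracy figures in the FAQ are quoted as Word Error Rate (WER). For readers who want to check a transcript against a hand-corrected reference, the sketch below computes WER in the standard way: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the simple lowercase-and-split normalization are assumptions for this example; benchmark scoring typically also normalizes punctuation and numerals, so published figures may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences for illustration: one substituted word out of eleven.
reference = "please send the quarterly report to the finance team by friday"
hypothesis = "please send the quarterly report to the finance team on friday"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")  # WER: 9.1%, accuracy: 90.9%
```

By this measure, a 3-5% WER corresponds to roughly one wrong word in every 20 to 33 reference words.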