Transcribe with NVIDIA Parakeet

Works with publicly available audio & video. DRM-protected content is not supported.

Supported uploads: MP3, WAV, M4A, FLAC, MP4, MKV, MOV, WebM, up to 2GB per file.
Live recording modes: Real-time (Vosk, instant) or Enhanced (Whisper, more accurate).
Shared links: public links last 24 hours and include text only; sign up for 7-day links with audio, or go Pro for private links.

Real-time speech to text: AI auto-corrects as you speak, and accuracy improves with longer speech.

10 free min/day · 600 free min/month with signup · No credit card · Encrypted
Upgrade for unlimited transcriptions.
WER: 3.0% · Languages: 1 · Speed: 55.0x real-time · License: CC-BY-4.0

About NVIDIA Parakeet

NVIDIA Parakeet TDT 1.1B is a state-of-the-art English automatic speech recognition (ASR) model that pairs a FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder. It reaches low word error rates on standard English benchmarks and is optimized for NVIDIA GPUs.
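To make "Token-and-Duration" concrete: at each step a TDT decoder scores both the next token and how many encoder frames to skip, so it can jump over several frames instead of advancing one at a time. The toy greedy decoder below is a minimal sketch of that idea, not NeMo's implementation. The `joint` callable, the blank id `0`, the duration set `(0, 1, 2, 3, 4)`, and the predictor that sees only the previous token are all simplifying assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Assumptions for this toy sketch (not taken from NeMo or STT.ai):
BLANK = 0                     # hypothetical blank token id
DURATIONS = (0, 1, 2, 3, 4)   # frame-skip values the duration head can predict

# joint(frame_index, previous_token) -> (token_scores, duration_scores)
JointFn = Callable[[int, int], Tuple[Sequence[float], Sequence[float]]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def tdt_greedy_decode(joint: JointFn, num_frames: int,
                      max_symbols_per_frame: int = 5) -> List[int]:
    """Greedy Token-and-Duration Transducer decoding over encoder frames."""
    tokens: List[int] = []
    prev_token = BLANK
    t = 0
    emitted_at_frame = 0
    while t < num_frames:
        token_scores, duration_scores = joint(t, prev_token)
        token = argmax(token_scores)
        duration = DURATIONS[argmax(duration_scores)]
        if token == BLANK:
            # A blank emits nothing, so it must move time forward.
            duration = max(duration, 1)
        else:
            tokens.append(token)
            prev_token = token
            emitted_at_frame += 1
            if duration == 0 and emitted_at_frame >= max_symbols_per_frame:
                # Guard against emitting tokens forever on one frame.
                duration = 1
        if duration > 0:
            emitted_at_frame = 0
        # Unlike a standard transducer, TDT can jump several frames at once.
        t += duration
    return tokens


# Scripted joint outputs: (frame, previous token) -> (token, duration).
script = {(0, BLANK): (5, 2), (2, 5): (7, 0), (2, 7): (BLANK, 4)}


def toy_joint(t: int, prev: int):
    token, duration = script[(t, prev)]
    token_scores = [1.0 if i == token else 0.0 for i in range(8)]
    duration_scores = [1.0 if d == duration else 0.0 for d in DURATIONS]
    return token_scores, duration_scores


print(tdt_greedy_decode(toy_joint, num_frames=6))  # [5, 7]
```

Here the decoder covers six frames with only three joint calls: it emits token 5 and skips two frames, emits token 7 without advancing, then predicts a blank that jumps four frames to the end.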

Languages Supported by NVIDIA Parakeet

English only.

Frequently Asked Questions

NVIDIA Parakeet is a speech-to-text model by NVIDIA. STT.ai hosts NVIDIA Parakeet on our GPU infrastructure so you can use it without provisioning your own hardware — upload audio or video and pick NVIDIA Parakeet from the model picker.

On standard benchmarks, NVIDIA Parakeet achieves a Word Error Rate (WER) of around 3.0%. Real-world accuracy depends on audio quality and accent; for noisy or heavily accented recordings, expect WER a few percentage points higher.
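WER counts the word-level substitutions (S), deletions (D), and insertions (I) needed to turn the model's output into a reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N. A minimal sketch using word-level edit distance, assuming plain whitespace tokenization with no case or punctuation normalization (benchmark scorers usually normalize text first, which changes the numbers):

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # One row of the edit-distance table: prev[j] is the minimum number of
    # edits between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words: 1/6
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))  # 0.167
```

Because insertions count too, WER can exceed 100% when the output is much longer than the reference.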

NVIDIA Parakeet is available on STT.ai's free tier, which includes 600 minutes/month with a free account. Paid plans add longer per-file limits, private transcripts, and priority queueing.

NVIDIA Parakeet is released under CC-BY-4.0, a permissive license that allows commercial use as long as you give appropriate attribution. You can self-host NVIDIA Parakeet on your own hardware or use STT.ai's hosted version; both are commercially usable.

NVIDIA Parakeet supports one language: English. Because the model is English-only, there is no language to auto-detect or select, and audio in other languages will not transcribe reliably.

NVIDIA Parakeet processes audio at about 55.0x real-time on STT.ai's GPUs, so a 1-hour file takes roughly 65 seconds of processing (3,600 s ÷ 55 ≈ 65 s). Longer files are queued, and you are notified by email when they finish.
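A speed factor of 55x real-time means processing time is the audio duration divided by 55. A small helper for estimating it, assuming the 55.0x figure holds for your file and ignoring upload and queue time:

```python
def processing_seconds(audio_seconds: float, speed_factor: float = 55.0) -> float:
    """Estimated compute time for audio processed at `speed_factor` x real-time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


for label, seconds in [("10 min", 600), ("1 hour", 3600), ("3 hours", 10800)]:
    print(f"{label}: ~{processing_seconds(seconds):.0f} s")
# 10 min: ~11 s
# 1 hour: ~65 s
# 3 hours: ~196 s
```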

NVIDIA Parakeet has 1.1B parameters. Larger models tend to be more accurate but slower; STT.ai hosts NVIDIA Parakeet on GPU so the parameter count doesn't affect your client-side performance.
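Parameter count matters mainly if you host the model yourself, because it sets the size of the weights: bytes ≈ parameters × bytes per parameter. A rough sketch for the 1.1B-parameter model, assuming the listed precisions; it counts weights only, not activations, audio buffers, or framework overhead, so real GPU memory use is higher.

```python
# Bytes per parameter for common weight precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("fp32", "fp16"):
    print(f"{precision}: ~{weight_memory_gb(1.1e9, precision):.1f} GB")
# fp32: ~4.4 GB
# fp16: ~2.2 GB
```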

NVIDIA Parakeet accepts every format STT.ai supports — MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, WebM, AVI, and others. Output as TXT, SRT, VTT, DOCX, JSON, or PDF.
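SRT, one of the export formats, is a plain-text subtitle format: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text, then a blank line. A sketch of rendering timestamped segments as SRT, assuming segments arrive as hypothetical `(start_seconds, end_seconds, text)` tuples; this is an illustration, not STT.ai's exporter.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # hypothetical (start_s, end_s, text) shape


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


print(to_srt([(0.0, 1.2, "Hello there."), (1.2, 3.75, "Welcome to the demo.")]))
```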

Speaker diarization runs alongside NVIDIA Parakeet on every transcription: each speaker is labeled, and you can rename speakers in the editor afterwards.

NVIDIA Parakeet runs in STT.ai's managed environment. By default, audio is deleted after processing and is never used for training without your explicit opt-in. Pro plans add client-side encryption for transcripts at rest.

Use the compare-stt tool to run NVIDIA Parakeet against any other supported model on the same audio — you'll see WER, segment count, speaker labels, and confidence scores side-by-side. The NVIDIA Parakeet vs Whisper Large V3 comparison is the most commonly run.

NVIDIA Parakeet is available through the API: pass "nvidia-parakeet" as the model parameter to the /v1/transcribe endpoint. The Python and Node.js SDKs include NVIDIA Parakeet examples, and the free API tier includes 100 minutes/month.
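A minimal sketch of building such a request with Python's standard library, assuming a multipart/form-data upload. Only the `/v1/transcribe` path and the `nvidia-parakeet` model value come from this page; the base URL `https://api.stt.ai`, Bearer-token auth, and the `file` field name are hypothetical placeholders, so check STT.ai's API reference (or use the official SDKs) before relying on them.

```python
import uuid
import urllib.request

# Hypothetical values: the real base URL, auth scheme, and field names may differ.
API_BASE = "https://api.stt.ai"
TRANSCRIBE_PATH = "/v1/transcribe"


def build_transcribe_request(audio: bytes, filename: str, api_key: str,
                             model: str = "nvidia-parakeet") -> urllib.request.Request:
    """Build (but do not send) a multipart POST asking for NVIDIA Parakeet."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + closing
    return urllib.request.Request(
        API_BASE + TRANSCRIBE_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it would be a matter of `urllib.request.urlopen(req)`; the response format is not described on this page, so parse it according to STT.ai's API documentation.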

Because NVIDIA Parakeet is licensed under CC-BY-4.0, you can self-host it, provided you give attribution. STT.ai's open-source page lists the project repo and weights. Most production teams use the hosted version to skip GPU procurement, model swaps, and ops.
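For self-hosting, the published checkpoint `nvidia/parakeet-tdt-1.1b` can be loaded with NVIDIA's NeMo toolkit. The sketch below assumes `pip install "nemo_toolkit[asr]"` and, ideally, a CUDA GPU. Because the shape of `transcribe()` output has changed across NeMo releases, the helper `normalize_transcripts` (a hypothetical name, not part of NeMo) flattens the common variants into plain strings; check the model card for the version you install.

```python
from typing import Any, List, Sequence


def normalize_transcripts(outputs: Any) -> List[str]:
    """Turn NeMo transcribe() output into plain strings.

    Depending on the NeMo release, transcribe() may return a list of strings,
    a list of Hypothesis objects with a `.text` field, or (for some RNNT/TDT
    versions) a (best_hypotheses, all_hypotheses) tuple.
    """
    if isinstance(outputs, tuple):
        outputs = outputs[0]  # keep only the best hypotheses
    return [getattr(item, "text", item) for item in outputs]


def transcribe_locally(paths: Sequence[str]) -> List[str]:
    """Self-host sketch: load Parakeet TDT 1.1B with NeMo and transcribe files."""
    # Imported lazily so this file loads even where NeMo is not installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-1.1b")
    return normalize_transcripts(model.transcribe(list(paths)))
```

Calling `transcribe_locally(["meeting.wav"])` downloads the weights on first use and returns one transcript per file; the model expects 16 kHz mono audio, so resample other formats first.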