Transcribe with NVIDIA Parakeet

Name: NVIDIA Parakeet
Author: NVIDIA

Works with publicly available audio & video. DRM-protected content is not supported.

Upload formats: MP3, WAV, M4A, FLAC, MP4, MKV, MOV, WebM (up to 2 GB)

Batch upload multiple files with Pro

Real-time speech to text. AI auto-corrects as you speak — accuracy improves with longer speech.

10 free min/day; 600 min free with signup; no credit card; encrypted

WER: 3.0%
Languages: 1 (English)
Speed: 55.0x real-time
License: CC-BY-4.0

About NVIDIA Parakeet

NVIDIA Parakeet TDT 1.1B is an English ASR model that combines a FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder. It reaches about 3.0% Word Error Rate on standard English benchmarks and is optimized for NVIDIA GPUs.

Languages Supported by NVIDIA Parakeet

English

Model Info

Provider: NVIDIA
Architecture: FastConformer with Token-and-Duration Transducer (TDT)
License: CC-BY-4.0
Updated: Mar 2026

Related Models

3.2% WER

4.2% WER

5.1% WER

3.5% WER

7.8% WER

Frequently Asked Questions

What is NVIDIA Parakeet?

NVIDIA Parakeet is a speech-to-text model by NVIDIA. STT.ai hosts it on our GPU infrastructure so you can use it without provisioning your own hardware: upload audio or video and choose NVIDIA Parakeet in the model picker.

How accurate is NVIDIA Parakeet?

On standard benchmarks, NVIDIA Parakeet achieves a Word Error Rate (WER) of around 3.0%, or roughly 3 word errors per 100 reference words. Real-world accuracy depends on audio quality, accent, and language; for noisy or accented recordings, expect WER to be a few percentage points higher.
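To make the WER figure concrete, here is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The function name and the sample sentences are illustrative, and benchmark pipelines usually normalize punctuation and numerals before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts, not benchmark data.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 22.2%
```

Two substituted words out of nine give 22.2% here; a 3.0% WER corresponds to about 3 such errors per 100 reference words.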

Is NVIDIA Parakeet free to use?

Yes. NVIDIA Parakeet runs on STT.ai's free tier: every visitor gets 600 minutes/month at no cost. Paid plans add longer per-file limits, private transcripts, and priority queueing.

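What license does NVIDIA Parakeet use?

NVIDIA Parakeet is released under CC-BY-4.0, a permissive Creative Commons license that allows commercial use as long as you give attribution. You can self-host NVIDIA Parakeet on your own hardware or use our hosted version; both are commercially usable.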

How many languages does NVIDIA Parakeet support?

NVIDIA Parakeet supports one language: English. STT.ai's auto-detection picks the language for most audio, but you can also set it to English manually for a small accuracy lift.

How fast is NVIDIA Parakeet?

NVIDIA Parakeet processes audio at about 55.0x real-time on our GPUs. A 1-hour audio file finishes in a little over 1 minute of processing (60 / 55 ≈ 1.1 minutes); longer files queue and notify you by email when done.
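As a rough check of the arithmetic, processing time is the audio duration divided by the real-time factor. The sketch below assumes the 55.0x figure holds steadily and ignores upload and queue time; the function name is illustrative.

```python
def processing_seconds(audio_seconds: float, speed_factor: float = 55.0) -> float:
    """Estimated compute time for audio of the given length."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


for minutes in (5, 60, 180):
    estimate = processing_seconds(minutes * 60)
    print(f"{minutes:>3} min of audio -> about {estimate:.0f} s of processing")
```

For a 1-hour file this works out to about 65 seconds of compute at 55.0x.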

How big is the NVIDIA Parakeet model?

NVIDIA Parakeet has 1.1B parameters. Larger models tend to be more accurate but slower; because STT.ai runs NVIDIA Parakeet on our GPUs, the parameter count doesn't affect your client-side performance.

What audio formats can NVIDIA Parakeet transcribe?

NVIDIA Parakeet accepts every format STT.ai supports: MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, WebM, AVI, and others. Transcripts can be exported as TXT, SRT, VTT, DOCX, JSON, or PDF.
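For readers unfamiliar with the SRT output, the sketch below shows how timed segments map onto SRT cues (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text). The segment dictionaries with `start`, `end`, and `text` keys are a hypothetical shape chosen for illustration; the actual JSON export may use different field names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


# Hypothetical segments; the real JSON export may use different field names.
segments = [
    {"start": 0.0, "end": 2.4, "text": "Welcome to the call."},
    {"start": 2.4, "end": 5.05, "text": "Thanks for joining."},
]
print(segments_to_srt(segments))
```

VTT is closely related; it adds a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.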

Does NVIDIA Parakeet detect multiple speakers?

Yes. Speaker diarization runs alongside NVIDIA Parakeet for every transcription: each speaker is labeled, and you can rename speakers in the editor afterwards.

Is my data private when using NVIDIA Parakeet?

Yes. NVIDIA Parakeet runs in our managed environment. By default, audio is deleted after processing and is never used for training without explicit opt-in. Pro plans add client-side encryption for transcripts at rest.

How does NVIDIA Parakeet compare to other STT models?

Use the compare-stt tool to run NVIDIA Parakeet against any other supported model on the same audio; you'll see WER, segment count, speaker labels, and confidence scores side by side. The NVIDIA Parakeet vs Whisper Large V3 comparison is the most commonly run.

Can I use NVIDIA Parakeet via the API?

Yes. Specify "nvidia-parakeet" as the model parameter on the /v1/transcribe endpoint. The Python and Node.js SDKs include NVIDIA Parakeet examples. The free API tier includes 100 minutes/month.
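As a rough illustration of selecting the model, the sketch below prepares (but does not send) a POST to /v1/transcribe using only the Python standard library. Only the endpoint path and the "nvidia-parakeet" model value come from this page. The base URL, the `audio_url` field, the JSON body, and the Bearer authorization header are assumptions made for the example, so check the API reference or the SDK examples for the real request shape.

```python
import json
import urllib.request

# Placeholder host: substitute the base URL from the STT.ai API docs.
API_BASE = "https://api.example.com"


def build_transcribe_request(audio_url: str, api_key: str,
                             model: str = "nvidia-parakeet") -> urllib.request.Request:
    """Prepare, but do not send, a POST to the /v1/transcribe endpoint."""
    # The "audio_url" field, JSON body, and Bearer header are assumptions.
    payload = {"model": model, "audio_url": audio_url}
    return urllib.request.Request(
        url=f"{API_BASE}/v1/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_transcribe_request("https://example.com/meeting.mp3", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Once the placeholders are replaced with real values, `urllib.request.urlopen(request)` would send it; the official SDKs are likely the more convenient route.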

Can I run NVIDIA Parakeet on my own server?

Yes. Because NVIDIA Parakeet is licensed under CC-BY-4.0, you can self-host it. STT.ai's open-source page lists the project repo and weights. Most production teams use our hosted version to skip GPU procurement, model swaps, and ops.
