Client-side encrypted storage: your transcripts are encrypted in your browser. Even we can't read them. Learn how it works →

Trusted by professionals worldwide

Podcasters Journalists Researchers Students Legal teams Medical pros

Speech-to-text models

Choose the best engine for your audio

View all models →

How STT.ai Works

Three steps to an accurate transcript

1. Upload, record, or paste a URL

Drag and drop any audio or video file (MP3, WAV, MP4, and 20+ formats). Record from your microphone in real time. Or paste a link from YouTube, Vimeo, TikTok, and 1,300+ platforms.

2. AI transcribes with your choice of model

Choose from 10+ AI models including Whisper, NVIDIA Canary (#1 accuracy), and Moonshine. The language is detected automatically from 100+ options. Speaker diarization identifies who said what.

3. Export, share, or integrate

Download as TXT, SRT, VTT, DOCX, JSON, or PDF. Share via link. Use our API to integrate transcription into your app. Perfect for subtitles, meeting notes, podcasts, and more.
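To make the subtitle export concrete, here is a minimal sketch of how timed segments map onto the SRT format: numbered cues, an HH:MM:SS,mmm timing line, then the text. The "start" and "end" field names (in seconds) and the sample segments are assumptions for illustration, not the documented STT.ai schema.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text'].strip()}")
    return "\n\n".join(cues) + "\n"


# Hypothetical segments; "start" and "end" are assumed field names.
example = [
    {"start": 0.0, "end": 2.5, "text": "Welcome back to the show."},
    {"start": 2.5, "end": 6.0, "text": "Today we talk about subtitles."},
]
print(segments_to_srt(example))
```

VTT is close to this layout but starts with a WEBVTT header line and uses a period instead of a comma before the milliseconds.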

Popular use cases

All use cases →

Meetings

Meeting notes and action items

Podcasts

Show notes

Subtitles

SRT, VTT, and more

Medical

Secure transcription

Lectures

Class notes and study guides

Legal

Depositions and court proceedings

Everything you need for audio & video

70+ free AI-powered tools

Speech to text

Transcribe audio and video files

Live transcription

Real-time transcription from your microphone

YouTube transcripts

Extract captions from any video

Subtitle editor

Edit SRT and VTT files online

Noise remover

Remove background noise from audio

Audio converter

MP3, WAV, FLAC, OGG, AAC, and more

Vocal remover

Isolate vocals or remove them

Audio trimmer

Cut and trim audio files

Caption converter

SRT, VTT, SSA, and SBV formats

Meeting minutes

Extract action items and summaries

Text to speech

Convert text to natural-sounding speech

Subtitle translator

Translate subtitles into 100+ languages

View all 70+ tools →

100+

Supported languages

70+

Free tools

1,300+

Supported platforms

Export formats

Developer-first API

Integrate speech to text into your app in minutes.

REST + WebSocket: file upload and real-time streaming

Multiple models: Whisper, Canary, Enhanced, and more

Speaker diarization: automatically detects who said what

Flexible output: JSON, TXT, SRT, VTT with word-level timestamps
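As an illustration of what word-level timestamps make possible, the sketch below greedily packs timed words into caption cues of a bounded length. The word schema ("word", "start", "end") and the 20-character limit are assumptions chosen for the example; the actual JSON layout should be taken from the API docs.

```python
def make_cue(words: list[dict]) -> dict:
    """A cue spans from its first word's start to its last word's end."""
    return {
        "start": words[0]["start"],
        "end": words[-1]["end"],
        "text": " ".join(w["word"] for w in words),
    }


def words_to_cues(words: list[dict], max_chars: int = 42) -> list[dict]:
    """Greedily pack timed words into cues of at most max_chars characters."""
    cues: list[dict] = []
    current: list[dict] = []
    for word in words:
        candidate = " ".join(w["word"] for w in current + [word])
        # Close the current cue when adding this word would exceed the limit.
        if current and len(candidate) > max_chars:
            cues.append(make_cue(current))
            current = []
        current.append(word)
    if current:
        cues.append(make_cue(current))
    return cues


# Hypothetical word-level output; field names are assumed.
sample_words = [
    {"word": "Thanks", "start": 0.0, "end": 0.4},
    {"word": "everyone", "start": 0.4, "end": 0.9},
    {"word": "for", "start": 0.9, "end": 1.0},
    {"word": "joining", "start": 1.0, "end": 1.5},
    {"word": "today", "start": 1.5, "end": 1.9},
]
for cue in words_to_cues(sample_words, max_chars=20):
    print(cue)
```

A single word longer than the limit still gets its own cue rather than being dropped, which keeps the sketch lossless.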

API Docs Playground

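# Python
import os

import requests

# Assumption: the API key is read from an environment variable of your choosing.
API_KEY = os.environ["STT_API_KEY"]

# Open the file in a context manager so the handle is closed after the upload.
with open("meeting.mp3", "rb") as audio:
    response = requests.post(
        "https://api.stt.ai/v1/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": audio},
        data={
            "model": "large-v3-turbo",
            "language": "auto",
            "diarize": "true",
            "response_format": "json",
        },
    )

# Fail loudly on HTTP errors instead of parsing an error body as a transcript.
response.raise_for_status()

# Print each diarized segment as "speaker: text".
result = response.json()
for seg in result["segments"]:
    print(f"{seg['speaker']}: {seg['text']}")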

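// Node.js 20+, run as an ES module (uses built-in fetch, FormData, and fs.openAsBlob)
import fs from "node:fs";

// Assumption: the API key is read from an environment variable of your choosing.
const API_KEY = process.env.STT_API_KEY;

const form = new FormData();
// The built-in FormData expects a Blob rather than a read stream.
form.append("file", await fs.openAsBlob("meeting.mp3"), "meeting.mp3");
form.append("model", "large-v3-turbo");
form.append("language", "auto");
form.append("diarize", "true");

const res = await fetch("https://api.stt.ai/v1/transcribe", {
  method: "POST",
  headers: { Authorization: `Bearer ${API_KEY}` },
  body: form,
});
if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);

// Print each diarized segment as "speaker: text".
const { segments } = await res.json();
segments.forEach(s =>
  console.log(`${s.speaker}: ${s.text}`)
);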
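The samples above print one line per segment, so a long monologue can show up as many consecutive lines from the same speaker. As a rough sketch, consecutive segments can be merged into speaker turns using only the "speaker" and "text" fields the samples already read. The segment data below is made up for illustration.

```python
from itertools import groupby


def merge_speaker_turns(segments: list[dict]) -> list[dict]:
    """Merge runs of consecutive segments by the same speaker into one turn."""
    turns = []
    # groupby only groups adjacent items, which is exactly what a "turn" is.
    for speaker, group in groupby(segments, key=lambda seg: seg["speaker"]):
        text = " ".join(seg["text"].strip() for seg in group)
        turns.append({"speaker": speaker, "text": text})
    return turns


# Hypothetical diarized segments.
sample_segments = [
    {"speaker": "Speaker 1", "text": "Let's get started."},
    {"speaker": "Speaker 1", "text": "First item is the budget."},
    {"speaker": "Speaker 2", "text": "I have the numbers."},
    {"speaker": "Speaker 1", "text": "Go ahead."},
]
for turn in merge_speaker_turns(sample_segments):
    print(f"{turn['speaker']}: {turn['text']}")
```

Because only adjacent segments are merged, a speaker who talks again later starts a new turn instead of being folded into the earlier one.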

Switching from another speech-to-text service?

STT.ai vs Otter.ai STT.ai vs TurboScribe STT.ai vs Fireflies STT.ai vs Rev Compare all →

Simple, transparent pricing

Start free. Scale as you grow.

Free

$0/mo

600 mins/month

5 languages
TXT and SRT export
API access

Starter

$9/mo

3,000 mins/month

100+ languages
All AI models
All export formats

MOST POPULAR

Pro

$19/mo

7,500 mins/month

Private transcripts
Unlimited team seats
Priority processing

Business

$39/mo

20,000 mins/month

Everything in Pro
50K min storage
Unlimited AI chat

View all plans and pricing →

Supported languages

All 100+ languages →

English Spanish French German Japanese Chinese Arabic Hindi Portuguese Russian Korean Italian Turkish Dutch Polish +85 more

Ready to transcribe?

Upload your first file free. No credit card, no signup. 600 minutes per month on the free plan.

Start transcribing

Frequently asked questions

How does speech to text work on STT.ai?

Speech to text runs in your browser: paste a URL, upload a file, or record from your mic. STT.ai picks the AI model and returns the transcript in under 5 minutes. Export as TXT, SRT, VTT, DOCX, JSON, or PDF.

Is speech to text free?

Yes. Every visitor gets 600 free minutes per month on STT.ai. Paid plans starting at $5/month unlock longer files, private transcripts, and priority queueing.

How accurate is speech to text?

Speech to text runs on the same AI models as the rest of STT.ai. Our best models reach 95-97% accuracy on clean speech (3-5% Word Error Rate on benchmarks). You can switch models on the fly if the first pass falls below your target.

What AI models can I use for speech to text?

Speech to text can run on any of STT.ai's 10+ models: STT.ai Enhanced (most accurate), Whisper Large V3 (99 languages), NVIDIA Canary (#1 WER on supported languages), Whisper Turbo (fast), Moonshine (lightweight), and more.

Can I get subtitles from speech to text?

Yes. Every transcript exports as SRT or VTT, which work with YouTube, Vimeo, TikTok, VLC, and every major video player. The burn-subtitles tool overlays them onto video as hardsubs.

Does speech to text detect different speakers?

Yes. Speaker diarization automatically labels each voice (Speaker 1, Speaker 2, ...) and you can rename them in the built-in editor. Works across all models and languages.

How long does speech to text take?

Most speech to text jobs finish in under 5 minutes. A 1-hour audio file typically completes in 2-3 minutes with our fastest models. Speed depends on the chosen model and current GPU load.

What input formats does speech to text support?

Speech to text accepts 20+ formats, including MP3, WAV, M4A, FLAC, OGG, MP4, MKV, MOV, WebM, and AVI. Output is available as TXT, SRT, VTT, DOCX, JSON, or PDF.

Is my audio private when I use speech to text?

Yes. Audio files submitted to speech to text are processed and deleted by default. Pro plans add client-side encryption: even if STT.ai's database is breached, your transcripts are unreadable without your key. Data is never used for model training without explicit opt-in.

Is there a speech to text API?

Yes. STT.ai offers a REST API with Python and Node.js SDKs, plus an MCP server for Claude and Cursor. The free API tier includes 100 minutes/month.

Can I edit a speech to text transcript after?

Yes. Every transcript opens in the built-in editor where you can correct words, rename speakers, adjust timestamps, and add notes. All changes save automatically.

How do I share what speech to text produces?

Every transcript gets a unique shareable URL. Export to DOCX or PDF for email. Pro plans add password-protected and permanent links, which are useful for client work.

What other platforms work beyond speech to text?

STT.ai handles 1,300+ platforms including YouTube, Vimeo, TikTok, SoundCloud, Zoom, Google Meet, podcast hosts, and more. URL transcription works with publicly available content only; DRM-protected sources can't be transcribed.
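The accuracy answer above quotes Word Error Rate (WER). As a worked example, WER is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance; the lowercasing and whitespace tokenization are simplifying assumptions, and published benchmarks may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion: reference word missing
                current[j - 1] + 1,      # insertion: extra word in transcript
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# Made-up sentences: one wrong word out of ten reference words.
reference = "please send the quarterly report to the finance team today"
hypothesis = "please send the quarterly report to the finance teem today"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")
```

Read this way, a 3-5% WER means roughly 3 to 5 word-level errors per 100 reference words, which is where the 95-97% accuracy figure comes from.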