Introducing Resonant-1 and Resonant-1-flash

Two frontier speech models that excel in speed and accuracy

April 8, 2026

Today we're launching Resonant-1 and Resonant-1-flash, two frontier speech models that set a new standard for both speed and accuracy.

Both models sit at the top of the Open ASR Leaderboard. They were trained on a four-phase schedule that relies heavily on reinforcement learning to push accuracy beyond the previous state of the art.

On the inference side, we've made deep optimizations to our model architecture and to Violin, our internal inference engine built specifically for speech. Fused kernels, CUDA graphs, and smarter scheduling during decoding allow Resonant-1-flash to process one hour of audio in under three seconds.
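
As a rough sanity check on the "under three seconds" figure, the sketch below converts an RTFx value into wall-clock processing time. It assumes RTFx means seconds of audio processed per second of compute, and it assumes the RTFx of 1438 reported for Resonant-1-flash on the short-form leaderboard carries over to a one-hour recording. The helper name `processing_seconds` is ours, for illustration only.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock seconds needed to transcribe `audio_seconds` of audio at a given RTFx."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


ONE_HOUR = 3600.0  # seconds of audio

# Assumption: the short-form leaderboard RTFx also holds for long recordings.
flash_seconds = processing_seconds(ONE_HOUR, 1438)
print(f"{flash_seconds:.2f} s")  # prints "2.50 s"
```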

SOTA Performance in Short-Form English

#	Model	RTFx	Average	AMI	Earnings22	Gigaspeech	SPGISpeech	Tedlium
1	Resonant-1	1187	6.51	9.39	9.02	8.76	3.05	2.32
2	Resonant-1 Flash	1438	6.71	9.57	9.72	8.86	3.09	2.31
3	Cohere Transcribe	525	6.78	8.13	10.86	9.34	3.08	2.49
4	Zoom Scribe v1	—	6.80	10.03	9.53	9.61	1.59	3.22
5	ibm-granite/granite-4.0-1b-speech	280	6.81	8.44	8.48	10.14	3.89	3.10
6	Qwen/Qwen3-ASR-1.7B	148	6.93	10.56	10.25	8.74	2.84	2.28
7	nvidia/canary-qwen-2.5b	418	6.94	10.19	10.45	9.43	1.90	2.71
8	ElevenLabs Scribe v2	—	7.09	11.86	9.43	9.11	2.68	2.37
9	ibm-granite/granite-speech-3.3-8b	145	7.18	8.98	9.42	10.19	3.91	3.40
10	microsoft/Phi-4-multimodal-instruct	151	7.32	11.09	10.16	9.33	3.06	2.94

Word Error Rate (%) on the Open ASR Leaderboard; lower is better. RTFx is the inverse real-time factor, i.e. seconds of audio transcribed per second of processing (higher is faster). Averages are computed over AMI, Earnings22, Gigaspeech, SPGISpeech, and Tedlium.
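
For readers less familiar with the metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. This is a generic textbook implementation, not the leaderboard's scoring code; evaluations typically also normalize text (casing, punctuation, numerals) before scoring, which this sketch skips. The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution (fox -> box) and one deletion (jumps) over 5 reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box"))  # 40.0
```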

Resonant-1 achieves the lowest average WER (6.51%) across the short-form English benchmarks, while Resonant-1-flash delivers the fastest inference, with an RTFx of 1438.

We omit Librispeech and Voxpopuli from the evaluation because we use these datasets during training and cannot guarantee an uncontaminated result. In addition, we observed that although training on these datasets produced a significant WER improvement, it hurt overall generalisation.

SOTA Performance in European Languages

Resonant-1 leads in French, Dutch, Spanish, and Polish, and achieves the lowest average WER when Swedish is excluded.

Model	Avg (excl. Swe)	English	German	French	Dutch	Spanish	Polish	Swedish
Resonate	4.22	4.69	3.83	4.71	4.88	2.67	4.56	7.38
Whisper v3	4.71	4.78	4.58	5.72	5.63	2.95	4.61	7.23
Cohere Transcribe	4.98	5.68	4.06	5.17	5.71	3.68	5.61	—
Resonate Flash	5.13	5.10	4.58	5.58	6.15	3.38	6.02	10.14
Qwen3-ASR	5.94	4.26	3.86	4.72	7.23	3.24	4.61	19.31

Word Error Rate (%) across FLEURS test sets; lower is better.
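
The "Avg (excl. Swe)" column can be reproduced as an unweighted mean over the six remaining languages. The sketch below does this for three rows copied from the table above; we assume a plain arithmetic mean, and the helper `average_excluding` is a hypothetical name used only for illustration. A missing score (shown as a dash in the table) is represented as `None`.

```python
from statistics import mean

LANGS = ["English", "German", "French", "Dutch", "Spanish", "Polish", "Swedish"]

# WER (%) per language, copied from the table above; None marks a missing score.
ROWS = {
    "Resonate": [4.69, 3.83, 4.71, 4.88, 2.67, 4.56, 7.38],
    "Whisper v3": [4.78, 4.58, 5.72, 5.63, 2.95, 4.61, 7.23],
    "Cohere Transcribe": [5.68, 4.06, 5.17, 5.71, 3.68, 5.61, None],
}


def average_excluding(scores, langs, excluded):
    """Unweighted mean WER over every language not listed in `excluded`."""
    kept = [s for s, lang in zip(scores, langs) if lang not in excluded]
    if any(s is None for s in kept):
        raise ValueError("missing score for a language that is not excluded")
    return mean(kept)


resonate_avg = average_excluding(ROWS["Resonate"], LANGS, {"Swedish"})
whisper_avg = average_excluding(ROWS["Whisper v3"], LANGS, {"Swedish"})
print(round(resonate_avg, 2))  # 4.22
print(round(whisper_avg, 2))   # 4.71
```

Cohere Transcribe has no Swedish score in the table, so an average over all seven languages could not be computed for every model; dropping Swedish keeps the column comparable across rows.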

Performance in Long-Form English

Processing one hour of speech in under 3 seconds, with little compromise on accuracy

#	Model	Average	Earnings21	Earnings22	Tedlium	CORAAL
1	ElevenLabs Scribe v2	7.32	6.48	9.99	2.12	10.67
2	AssemblyAI Universal 3 Pro	8.34	7.62	10.59	2.32	12.83
3	Resonate	8.58	7.28	11.29	2.24	13.51
4	Speechmatics Enhanced	8.80	7.90	10.75	2.26	14.29
5	Rev AI Fusion	9.54	7.56	15.47	2.52	12.60
6	Rev AI Machine	9.64	7.63	15.72	2.92	12.28
7	Cohere Transcribe	9.73	8.70	12.66	2.23	15.34

Word Error Rate (%) on long-form benchmarks; lower is better.

Pareto Optimal

[Figure: Pareto plot. Legend: Resonate, Resonate Flash, Cohere Transcribe, NVIDIA Canary Qwen 2.5B, IBM Granite 4.0 1B, Qwen3-ASR-1.7B, Kyutai STT 2.6B, OpenAI Whisper Large v3, Moonshine Streaming Med]
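
To make "Pareto optimal" concrete, here is a minimal sketch that finds the models no other model beats on both speed and accuracy. The data points behind the figure are not reproduced in this post, so the sketch instead uses the RTFx and average WER values from the short-form English table, skipping rows with no reported RTFx. The function names `dominates` and `pareto_front` are ours.

```python
# (model, RTFx, average WER %) from the short-form English table.
# Rows without a reported RTFx are left out.
POINTS = [
    ("Resonant-1", 1187, 6.51),
    ("Resonant-1-flash", 1438, 6.71),
    ("Cohere Transcribe", 525, 6.78),
    ("ibm-granite/granite-4.0-1b-speech", 280, 6.81),
    ("Qwen/Qwen3-ASR-1.7B", 148, 6.93),
    ("nvidia/canary-qwen-2.5b", 418, 6.94),
    ("ibm-granite/granite-speech-3.3-8b", 145, 7.18),
    ("microsoft/Phi-4-multimodal-instruct", 151, 7.32),
]


def dominates(a, b):
    """True if `a` is at least as fast and as accurate as `b`, and strictly better on one axis."""
    _, a_rtfx, a_wer = a
    _, b_rtfx, b_wer = b
    no_worse = a_rtfx >= b_rtfx and a_wer <= b_wer
    strictly_better = a_rtfx > b_rtfx or a_wer < b_wer
    return no_worse and strictly_better


def pareto_front(points):
    """Keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


front = [name for name, _, _ in pareto_front(POINTS)]
print(front)  # ['Resonant-1', 'Resonant-1-flash']
```

On this subset, every other model is both slower and less accurate than Resonant-1, while Resonant-1-flash trades 0.20 points of average WER for higher throughput, so neither of the two dominates the other.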
