
Introducing the Turns endpoint

Eager, accurate turn detection for voice agents with native barge-in

June 10, 2026

Today we're releasing the Turns endpoint — a streaming speech-to-text mode purpose-built for voice agents.

Eager turn detection

Turns predicts the end of a speaker's turn eagerly — it reads the acoustic and linguistic signal as the audio arrives and commits the moment the turn is genuinely complete, rather than waiting out a fixed silence timer.

The result is an agent that replies the instant you stop talking, without clipping you mid-sentence when you simply pause to think.
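
To make the contrast with a fixed silence timer concrete, here is a minimal sketch. It is not how Turns works internally: the `Frame` type, the 20 ms frame size, the `turn_complete_prob` score and the 0.9 threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

FRAME_MS = 20  # assumed frame duration, for illustration only


@dataclass
class Frame:
    is_speech: bool            # output of some voice activity detector (assumed)
    turn_complete_prob: float  # hypothetical "turn is finished" score in [0, 1]


def fixed_timer_endpoint(frames: List[Frame], silence_ms: int = 800) -> Optional[int]:
    """Commit only after `silence_ms` of uninterrupted silence following speech."""
    silent = 0
    heard_speech = False
    for i, frame in enumerate(frames):
        if frame.is_speech:
            heard_speech = True
            silent = 0
        elif heard_speech:
            silent += FRAME_MS
            if silent >= silence_ms:
                return (i + 1) * FRAME_MS  # time at which the turn is committed
    return None


def eager_endpoint(frames: List[Frame], threshold: float = 0.9,
                   fallback_silence_ms: int = 800) -> Optional[int]:
    """Commit as soon as the (hypothetical) completion score is high enough.

    Falls back to a plain silence timer if the score never crosses the threshold.
    """
    silent = 0
    heard_speech = False
    for i, frame in enumerate(frames):
        if frame.is_speech:
            heard_speech = True
            silent = 0
            continue
        if not heard_speech:
            continue
        silent += FRAME_MS
        if frame.turn_complete_prob >= threshold or silent >= fallback_silence_ms:
            return (i + 1) * FRAME_MS
    return None
```

In this toy model, 200 ms of speech followed by silence scored at 0.95 commits at 220 ms under the eager rule and at 1000 ms under an 800 ms timer. Shortening the timer to 200 ms closes that gap, but it then also fires during a 300 ms thinking pause, which is the clipping failure described above.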

Automatic barge-in

Real conversations interrupt. The Turns endpoint supports barge-in out of the box: when a speaker starts talking over the agent, Turns detects it immediately and surfaces the new turn, so your agent can stop speaking and listen. No separate interruption pipeline required.
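
On the agent side, reacting to a barge-in can be as small as cancelling playback when a new turn is reported. The sketch below assumes a hypothetical event shaped like `{"type": "turn_started"}`; that name, the `Agent` class and the sleep standing in for text-to-speech playback are illustrative and are not taken from the Turns API.

```python
import asyncio
from typing import List, Optional


class Agent:
    def __init__(self) -> None:
        self.playback: Optional[asyncio.Task] = None
        self.log: List[str] = []

    async def _speak(self, text: str) -> None:
        try:
            self.log.append(f"speaking: {text}")
            await asyncio.sleep(10)  # stand-in for streaming synthesized audio
            self.log.append("finished speaking")
        except asyncio.CancelledError:
            self.log.append("playback cancelled")
            raise

    def say(self, text: str) -> None:
        # Run playback as a task so it can be interrupted.
        self.playback = asyncio.create_task(self._speak(text))

    async def on_event(self, event: dict) -> None:
        # "turn_started" is a hypothetical event name used for this sketch.
        if event.get("type") != "turn_started":
            return
        if self.playback is not None and not self.playback.done():
            self.playback.cancel()
            try:
                await self.playback  # wait for the cancellation to land
            except asyncio.CancelledError:
                pass
            self.log.append("listening")


async def demo() -> List[str]:
    agent = Agent()
    agent.say("Your order will arrive on")
    await asyncio.sleep(0)  # let playback start
    await agent.on_event({"type": "turn_started"})
    return agent.log
```

Running `asyncio.run(demo())` records the agent starting to speak, having its playback cancelled, and switching to listening, without a separate interruption detector in the application code.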

Benchmarks

We ran the Turns endpoint through the open Pipecat STT benchmark: 1,000 samples from the smart-turn-data-v3.1 dataset, scored on two metrics. Semantic WER is a word error rate that counts only errors that would change an LLM agent's understanding. TTFS is the time from when you stop speaking to the final transcript.

Pooled semantic WER: 1.10%
Mean semantic WER: 1.39%
Median TTFS: 326ms

[Figure: scatter plot of accuracy (semantic WER %, lower is better) against median TTFS (ms, lower is better); the ideal region is the lower left, fast and accurate. Systems shown: Resonant-1 Turns (326ms median TTFS), Speechmatics, Cartesia ink-2, Soniox stt-rt-v4, Deepgram Nova-3, NVIDIA Nemotron 3.0, AssemblyAI u3-rt-pro, ElevenLabs Scribe v2 RT, OpenAI gpt-4o-transcribe, and Azure.]
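
The pooled and mean WER figures differ because they aggregate differently. The sketch below uses the conventional definitions, which this post does not spell out and which are assumed here: pooled WER divides total errors by total reference words, while mean WER averages the per-sample rates. The sample data is invented for illustration and is not benchmark output.

```python
from statistics import mean, median
from typing import List, Tuple

# (semantic errors, reference word count) per sample; made-up values
Sample = Tuple[int, int]


def pooled_wer(samples: List[Sample]) -> float:
    """Total errors over total reference words, so long samples weigh more."""
    total_errors = sum(errors for errors, _ in samples)
    total_words = sum(words for _, words in samples)
    return total_errors / total_words


def mean_wer(samples: List[Sample]) -> float:
    """Average of per-sample error rates, so every sample weighs the same."""
    return mean([errors / words for errors, words in samples])


samples = [(0, 50), (1, 10), (0, 40)]
ttfs_ms = [250, 300, 340, 900, 310]  # made-up latencies

print(f"pooled WER: {pooled_wer(samples):.2%}")  # 1 error in 100 words
print(f"mean WER:   {mean_wer(samples):.2%}")    # one short sample pulls it up
print(f"median TTFS: {median(ttfs_ms)} ms")      # robust to the 900 ms outlier
```

A single error in a short utterance raises the mean more than the pooled figure, which is one way a mean WER can sit above a pooled WER. The median is a natural summary for latency because one slow sample barely moves it.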

Getting started

Head over to our console and get started with the Turns endpoint today!