Introducing the Turns endpoint

Eager, accurate turn detection for voice agents with native barge-in

June 10, 2026 3 min read

Today we're releasing the Turns endpoint — a streaming speech-to-text mode purpose-built for voice agents.

Eager turn detection

Turns predicts the end of a speaker's turn eagerly — it reads the acoustic and linguistic signal as the audio arrives and commits the moment the turn is genuinely complete, rather than waiting out a fixed silence timer.

The result is an agent that replies the instant you stop talking, without clipping you mid-sentence when you simply pause to think.
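To make the contrast concrete, here is a minimal sketch of the fixed silence timer that eager detection is meant to replace. It is not how Turns works internally; the 20 ms frame size, the function names, and the toy utterance are all assumptions chosen for illustration.

```python
FRAME_MS = 20  # assumed duration of one voice-activity frame


def silence_timer_endpoint(vad_frames, timeout_ms):
    """Return the time (ms) at which a fixed silence timer would commit the
    turn, or None if the timer never fires. `vad_frames` is a list of
    booleans: True means speech was detected in that frame."""
    silent_ms = 0
    heard_speech = False
    for i, is_speech in enumerate(vad_frames):
        if is_speech:
            heard_speech = True
            silent_ms = 0  # any speech resets the timer
        elif heard_speech:
            silent_ms += FRAME_MS
            if silent_ms >= timeout_ms:
                return (i + 1) * FRAME_MS
    return None


def frames(duration_ms, is_speech):
    """Build a run of identical frames covering `duration_ms`."""
    return [is_speech] * (duration_ms // FRAME_MS)


# Toy utterance: 1 s of speech, a 600 ms pause to think, 1 s more speech,
# then trailing silence. The speaker actually finishes at 2600 ms.
utterance = (
    frames(1000, True) + frames(600, False)
    + frames(1000, True) + frames(1500, False)
)
true_end_ms = 2600

short_timer = silence_timer_endpoint(utterance, timeout_ms=500)
long_timer = silence_timer_endpoint(utterance, timeout_ms=1000)

print(short_timer)                # 1500: fires during the thinking pause
print(long_timer - true_end_ms)   # 1000: full timeout added after the real end
```

With a 500 ms timer the turn is committed mid-sentence; with a 1000 ms timer the pause is tolerated but every reply waits an extra second. A fixed timer has to pick one of these failure modes, which is the trade-off eager detection aims to avoid.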

Automatic barge-in

Real conversations involve interruptions. The Turns endpoint supports barge-in out of the box: when a speaker starts talking over the agent, Turns detects it immediately and surfaces the new turn so your agent can stop speaking and listen. No separate interruption pipeline is required.
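On the agent side, handling barge-in might look roughly like the sketch below. The event shape (`type`, `transcript`) and the event names `turn_start` and `turn_end` are hypothetical placeholders, not the actual Turns message format; consult the API reference for the real schema.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy voice agent that tracks whether it is currently speaking."""
    speaking: bool = False
    log: list = field(default_factory=list)

    def start_reply(self, text):
        self.speaking = True
        self.log.append(f"speak: {text}")

    def stop_reply(self):
        # Only act if a reply is actually playing.
        if self.speaking:
            self.speaking = False
            self.log.append("stop playback")

    def on_event(self, event):
        # Hypothetical event names, used for illustration only.
        if event["type"] == "turn_start":
            # The user started talking. If the agent is mid-reply,
            # this is a barge-in, so cut playback and listen.
            self.stop_reply()
        elif event["type"] == "turn_end":
            self.start_reply(f"reply to '{event['transcript']}'")


agent = Agent()
events = [
    {"type": "turn_start"},
    {"type": "turn_end", "transcript": "book a table"},
    {"type": "turn_start"},  # user talks over the agent's reply
    {"type": "turn_end", "transcript": "actually, cancel that"},
]
for event in events:
    agent.on_event(event)

for line in agent.log:
    print(line)
```

Because the new turn arrives on the same stream as ordinary turns, the agent needs only one event handler rather than a separate interruption detector.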

Benchmarks

We ran the Turns endpoint through the open Pipecat STT benchmark — 1,000 samples from the smart-turn-data-v3.1 dataset, scored on semantic WER (only errors that would change an LLM agent's understanding) and TTFS (time from when you stop speaking to the final transcript).

Pooled semantic WER: 1.10%

Mean semantic WER: 1.39%

Median TTFS: 326 ms

[Chart legend: Resonant-1 Turns, Speechmatics, Cartesia ink-2, Soniox stt-rt-v4, Deepgram Nova-3, NVIDIA Nemotron 3.0, AssemblyAI u3-rt-pro, ElevenLabs Scribe v2 RT, OpenAI gpt-4o-transcribe, Azure, Other systems]
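For readers unfamiliar with the two WER aggregates, the sketch below shows the usual definitions: pooled WER divides total errors by total reference words, while mean WER averages the per-sample rates. We assume the benchmark follows these conventions. The sample values are made up for illustration and are not benchmark data.

```python
from statistics import mean, median

# (semantic_errors, reference_words, ttfs_ms) per sample.
# Made-up numbers for illustration only.
samples = [
    (0, 20, 300),
    (1, 5, 350),   # one error in a very short utterance
    (0, 25, 310),
    (1, 50, 400),
]


def pooled_wer(rows):
    """Total errors over total reference words across all samples."""
    total_errors = sum(errors for errors, _, _ in rows)
    total_words = sum(words for _, words, _ in rows)
    return total_errors / total_words


def mean_wer(rows):
    """Average of the per-sample error rates."""
    return mean(errors / words for errors, words, _ in rows)


def median_ttfs(rows):
    """Median time from end of speech to final transcript, in ms."""
    return median(ttfs for _, _, ttfs in rows)


print(f"pooled WER:  {pooled_wer(samples):.2%}")
print(f"mean WER:    {mean_wer(samples):.2%}")
print(f"median TTFS: {median_ttfs(samples)} ms")
```

In this toy data the mean comes out higher than the pooled figure because a single error in a five-word utterance counts as a 20% sample. The same effect is one plausible reason a mean semantic WER can sit above the pooled figure.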

Getting started

Head over to our console to get started with the Turns endpoint today!