Cartesia

Ink-2

The fastest and most accurate speech-to-text model

Ranked #1 on accuracy, built for voice agents with semantic endpointing and industry-leading latency.

#1
Accuracy — lowest WER of any streaming STT
88ms
Time to final transcript
10–100×
Throughput vs transformers

Built for Voice Agents

Four capabilities that make Ink the transcription layer production agents rely on.

Heard right the first time.

Accuracy

In practice

In a voice agent, the transcript is the foundation everything else builds on. A transcription error corrupts the LLM's input and sends the interaction in the wrong direction.

Ink-2's approach

Ink has the lowest Word Error Rate (WER) of any streaming STT model, natively handling structured data — phone numbers, dates, emails, currencies, and UUIDs.
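
For readers unfamiliar with the metric, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance. It is an illustration of the standard definition only, not the benchmark methodology behind the ranking above; the function name `word_error_rate` is hypothetical, and no text normalization (casing, punctuation, number formatting) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong digit in a six-word phone number is one substitution out of six words.
wer = word_error_rate("call me at five five five", "call me at five nine five")
```

Structured data is where this metric bites hardest: a single substituted digit is a small WER contribution but makes the whole phone number unusable.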

Knows when you start and finish.

Conversational flow

In practice

A conversation has two critical moments — when a caller starts talking and when they finish. Miss the start and the agent misses the turn entirely. Trigger too early and the agent jumps in mid-thought.

Ink-2's approach

Ink-2 has native turn detection — turn.start and turn.end signaled directly by the model. Semantic endpointing determines turn end by meaning, not silence — so pauses mid-thought don't trigger the agent prematurely.
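
As a rough sketch of how an agent might consume these signals, the example below buffers transcript text between `turn.start` and `turn.end` and hands the full utterance to the agent only when the turn ends. Only the two event names come from the description above. The event shape (dicts with a `type` key, a `transcript` event carrying `text`) and the `TurnTracker` class are assumptions made for illustration, not the documented Ink-2 API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnTracker:
    """Collects transcript text between turn.start and turn.end events.

    The event schema used here is hypothetical; only the names
    "turn.start" and "turn.end" are taken from the product description.
    """
    in_turn: bool = False
    parts: List[str] = field(default_factory=list)

    def handle(self, event: dict) -> Optional[str]:
        kind = event.get("type")
        if kind == "turn.start":
            # Caller began speaking: open a fresh buffer.
            self.in_turn = True
            self.parts = []
        elif kind == "transcript" and self.in_turn:
            # Assumed event type carrying a piece of transcribed text.
            self.parts.append(event.get("text", ""))
        elif kind == "turn.end" and self.in_turn:
            # Model signaled the turn is complete: release the utterance.
            self.in_turn = False
            return " ".join(p for p in self.parts if p)
        return None


tracker = TurnTracker()
events = [
    {"type": "turn.start"},
    {"type": "transcript", "text": "my number is"},
    # A mid-thought pause produces no turn.end, so nothing is released here.
    {"type": "transcript", "text": "555 0100"},
    {"type": "turn.end"},
]
results = [tracker.handle(e) for e in events]
```

In this design the agent never acts on silence itself; it waits for the model's `turn.end`, so a pause between the two transcript pieces does not trigger a response.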

The caller stops talking. The agent starts thinking.

Speed — 88ms

In practice

When transcription is fast and consistent, the agent's response feels immediate. One slow transcript in ten means one call in ten where that immediacy breaks.

Ink-2's approach

Ink is the fastest streaming ASR model, running on a custom inference engine purpose-built for real-time conversation. Time to final transcript is 88ms.

Quality that doesn't cost more as you grow.

Cost efficiency

In practice

Voice is the most natural interface for communication. Getting cost and quality right at scale enables voice everywhere — the default interface across every agentic interaction.

Ink-2's approach

Ink's State Space Model architecture delivers 10–100× the throughput of transformers: lower compute cost at scale, with no quality tradeoffs.

Enterprise-grade security. From Cloud to Local.

HIPAA compliant
SOC 2 Type 2
GDPR
PCI

Frontier research, deployed in every conversation.