Which LLM API is the fastest?

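ModelStats monitors all major LLM APIs every 10 minutes. Gemini 2.5 Flash Lite currently has the fastest TTFT, but throughput is a separate ranking: the model that starts responding first is not necessarily the one that generates tokens fastest. Check modelstats.ai for real-time data.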

What is TTFT (Time to First Token)?

TTFT measures how long it takes from sending a request to receiving the first token of the response. Lower TTFT means the model starts responding faster, which is critical for real-time applications.
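
Measuring TTFT on the client comes down to starting a timer just before the request goes out and stopping it when the first streamed chunk arrives. The sketch below is not ModelStats' code. It assumes a hypothetical `send_request` callable that returns an iterator of text chunks, and any provider SDK's streaming iterator would need to be adapted to that shape. The clock is injectable so the function can be tested without network calls.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def measure_ttft(
    send_request: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], List[str]]:
    """Return (ttft_seconds, chunks); ttft is None if nothing was streamed.

    send_request is a hypothetical callable that sends one streaming request
    and returns an iterator over the response's text chunks.
    """
    start = clock()  # start timing before the request is sent
    ttft: Optional[float] = None
    chunks: List[str] = []
    for chunk in send_request():
        if ttft is None:
            ttft = clock() - start  # the first token has just arrived
        chunks.append(chunk)
    return ttft, chunks
```

For a quick local check, pass a generator function that sleeps briefly before its first `yield`. The measured TTFT should come out close to that sleep.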

What is Inter-Token Latency (ITL)?

Inter-Token Latency is the average time between consecutive streamed tokens. Lower ITL means faster, smoother streaming output once the first token has arrived, so it reflects how steadily a model generates text.
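
If you have the arrival time of each token, ITL is the mean gap between neighbouring tokens. The hypothetical helper below also returns the population standard deviation of those gaps as a rough jitter figure. That second number is an added assumption and is not part of the definition above. It does, however, capture the "consistency" side of streaming better than the mean alone.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence, Tuple


def inter_token_latency(token_times: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (mean_gap, stdev_gap), or None if fewer than two tokens arrived.

    token_times holds the arrival time of each streamed token, in order.
    The mean gap is the ITL; the spread of the gaps is a rough jitter measure.
    """
    if len(token_times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return mean(gaps), pstdev(gaps)
```

Two models can share the same mean ITL while one streams in bursts. A stream like that shows a larger standard deviation even though its average is identical.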

How does ModelStats collect performance data?

ModelStats pings every major LLM API (Claude, GPT-4, Gemini, DeepSeek) every 10 minutes with a real streaming request. We measure TTFT, total latency, inter-token latency, throughput, and error rates from actual API responses.
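
A collection loop like this can be sketched as a fixed-cadence scheduler. Everything here is an illustrative assumption, not ModelStats' implementation. `probe_round`, `run_forever`, and the per-model probe callables are hypothetical names. Each probe is assumed to send one real streaming request and return its measured metrics, or to raise if the request fails.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

POLL_INTERVAL_S = 10 * 60  # one round every 10 minutes

Probe = Callable[[], Dict[str, float]]  # hypothetical: one streaming request -> metrics


@dataclass
class ProbeResult:
    model: str
    ok: bool
    metrics: Dict[str, float] = field(default_factory=dict)
    error: Optional[str] = None


def probe_round(probes: Dict[str, Probe]) -> List[ProbeResult]:
    """Run each model's probe once; any raised exception is recorded as an error."""
    results: List[ProbeResult] = []
    for model, probe in probes.items():
        try:
            results.append(ProbeResult(model, ok=True, metrics=probe()))
        except Exception as exc:  # timeouts, HTTP errors, broken streams, etc.
            results.append(ProbeResult(model, ok=False, error=repr(exc)))
    return results


def run_forever(
    probes: Dict[str, Probe],
    record: Callable[[List[ProbeResult]], None],
    interval_s: float = POLL_INTERVAL_S,
) -> None:
    """Call probe_round on a fixed cadence and pass each batch to `record`."""
    while True:
        started = time.monotonic()
        record(probe_round(probes))
        # Sleep only for what is left of the interval so rounds stay evenly spaced.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

The probes here run one after another for simplicity. A real monitor would probably run them concurrently so that one slow provider does not shift the timing of the others.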

Claude Opus 4.8 Speed Test — First Look at Performance Data

Anthropic released Claude Opus 4.8 on May 28, 2026 — less than two months after Opus 4.7. We added it to ModelStats the same day and started pinging it every 10 minutes.

Here's what we're seeing.

What's New

Claude Opus 4.8 is Anthropic's new flagship. Like the 4.7 release before it, this is an Opus-only update — Sonnet 4.6 and Haiku 4.5 are unchanged. Anthropic positions 4.8 as having "sharper judgement, more honesty about its progress, and the ability to work independently for longer than its predecessors."

The headline numbers from Anthropic:

Agentic coding climbs from 64.3% to 69.2%
Fast mode is roughly 2.5× quicker than before
Pricing is unchanged from Opus 4.7

Those are Anthropic's own claims, mostly about quality and capability. ModelStats measures one thing: real-world API speed. So how fast is Opus 4.8 actually responding?

What We're Measuring

Every 10 minutes, we send Opus 4.8 a streaming request and measure:

TTFT (Time to First Token) — how long until the model starts responding
Total Latency — full round-trip time
Throughput (Tokens/s) — how fast it generates output
ITL (Inter-Token Latency) — streaming smoothness
Error Rate — reliability
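
Single probes are noisy, so individual requests matter less than the aggregate over many 10-minute samples. The sketch below shows one way to roll samples up. The `Sample` fields, the use of medians, and the choice to compute throughput only over the generation phase (after the first token) are all assumptions. Other dashboards divide output tokens by total latency instead.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One hypothetical probe result; timestamps are seconds on a monotonic clock."""
    ok: bool
    sent_at: float = 0.0
    first_token_at: Optional[float] = None
    finished_at: Optional[float] = None
    output_tokens: int = 0


def summarize(samples: List[Sample]) -> Dict[str, Optional[float]]:
    """Median TTFT, total latency and throughput over good samples, plus error rate."""
    good = [
        s for s in samples
        if s.ok and s.first_token_at is not None and s.finished_at is not None
    ]
    ttfts = [s.first_token_at - s.sent_at for s in good]
    totals = [s.finished_at - s.sent_at for s in good]
    # Assumption: throughput counts only tokens generated after the first one arrived.
    rates = [
        s.output_tokens / (s.finished_at - s.first_token_at)
        for s in good
        if s.finished_at > s.first_token_at
    ]
    return {
        "median_ttft_s": median(ttfts) if ttfts else None,
        "median_total_s": median(totals) if totals else None,
        "median_tokens_per_s": median(rates) if rates else None,
        # Failed samples and samples missing timestamps both count as errors.
        "error_rate": (len(samples) - len(good)) / len(samples) if samples else None,
    }
```

Medians are used here because a single slow request can drag a mean far off. In a sketch like this, percentiles such as p95 would be the natural next addition.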

Early Performance Data

These observations come from the first hours of monitoring; check the live dashboard for the most current numbers.

In early testing, Opus 4.8 is trending slightly ahead of 4.7 on TTFT and throughput, which is directionally consistent with Anthropic's claim that fast mode got a big speed boost. As always, the picture will sharpen over the coming days as we collect more samples across different times of day and load conditions.

How It Compares

The Anthropic lineup now looks like:

Model	Role	Status
Claude Opus 4.8	Flagship, most capable	New
Claude Opus 4.7	Previous flagship	Still available
Claude Sonnet 4.6	Balanced performance	Unchanged
Claude Haiku 4.5	Fastest, cheapest	Unchanged

Opus is the quality leader in the lineup, not the speed leader — if raw latency is your priority, Haiku 4.5 and Sonnet 4.6 remain the faster Anthropic options. The interesting question is whether 4.8's fast-mode improvements narrow that gap.

Cross-Provider Comparison

See how Opus 4.8 stacks up against the competition:

vs GPT-4o: Check the head-to-head comparison
vs Gemini 2.5 Flash: Check the comparison
vs Opus 4.7: See both on the live dashboard to compare generations side by side

Should You Upgrade?

If you're on Opus 4.7, the upgrade is a one-line model ID change (claude-opus-4-7 → claude-opus-4-8) and pricing is identical, so there's little downside to testing it. Monitor the performance data on ModelStats for a few days to confirm the speed profile holds for your traffic patterns before flipping production over.
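
One low-risk way to test before flipping production over is a canary rollout. You send a small, stable share of traffic to the new model ID and watch latency and errors on that slice. The helper below is hypothetical and is not part of any Anthropic SDK. Only the two model IDs come from this article, and the returned string would be passed as the model parameter of whatever client you already use.

```python
import hashlib

CURRENT_MODEL = "claude-opus-4-7"
CANDIDATE_MODEL = "claude-opus-4-8"


def pick_model(user_id: str, canary_fraction: float) -> str:
    """Route a stable slice of users (canary_fraction from 0.0 to 1.0) to the candidate.

    Hashing the user ID keeps each user on the same model across requests,
    so one user's session never mixes the two models.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # roughly uniform in [0, 1)
    return CANDIDATE_MODEL if bucket < canary_fraction else CURRENT_MODEL
```

A rollout could start at something like `canary_fraction=0.05` and increase only while the candidate's latency and error rate hold up. Setting it to `1.0` completes the switch.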

If you're choosing between providers, use our scatter plot comparison, which plots TTFT vs throughput across all 15 models we track.
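
On a TTFT-vs-throughput scatter plot, the models worth shortlisting are the ones that no other model beats on both axes at once, meaning lower TTFT and higher throughput. The hypothetical helper below computes that Pareto frontier. The model names and numbers in its example are made up for illustration and are not ModelStats measurements.

```python
from typing import Dict, List, Tuple


def pareto_front(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the models that are not dominated on (TTFT, throughput).

    models maps a name to (ttft_seconds, tokens_per_second). Model B dominates
    model A if B is at least as good on both axes and strictly better on one.
    """
    front = []
    for name, (ttft, tps) in models.items():
        dominated = any(
            o_ttft <= ttft and o_tps >= tps and (o_ttft < ttft or o_tps > tps)
            for other, (o_ttft, o_tps) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    example = {"model-a": (0.30, 100.0), "model-b": (0.50, 150.0), "model-c": (0.60, 90.0)}
    print(pareto_front(example))  # ['model-a', 'model-b']
```

Which frontier model to pick still depends on the workload. Chat UIs tend to favour low TTFT, while long batch generations favour throughput.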

Key Takeaways

Claude Opus 4.8 is available now — Opus only, not Sonnet or Haiku
Same pricing as 4.7, with stronger agentic coding (64.3% to 69.2%) and a fast mode Anthropic says is roughly 2.5× quicker
Early ModelStats data shows TTFT and throughput trending slightly ahead of 4.7
We're tracking it alongside 14 other models with real-time data

All data is from real API monitoring at modelstats.ai, updated every 10 minutes.