Claude vs GPT-4 vs Gemini — A Real Latency Comparison

Choosing between Anthropic, OpenAI, and Google for your LLM API isn't just about model quality. In production, latency and reliability determine whether your users stick around or bounce. We track all three providers every 5 minutes. Here's what the data shows as of April 3, 2026.

The Models

We're comparing flagship and mid-tier models from each provider:

| Provider  | Flagship        | Mid-tier          | Lightweight           |
|-----------|-----------------|-------------------|-----------------------|
| Anthropic | Claude Opus 4.6 | Claude Sonnet 4.6 | Claude Haiku 4.5      |
| OpenAI    | GPT-4.1         | GPT-4o            | GPT-4.1 Mini          |
| Google    | Gemini 2.5 Pro  | Gemini 2.5 Flash  | Gemini 2.5 Flash Lite |

Round 1: Time to First Token (TTFT)

TTFT is the most user-visible metric. It's the delay between your API call and the first streamed token arriving.
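In practice, TTFT is just a timer that stops when the first streamed chunk arrives. A minimal sketch (the `fake_stream` generator below is a stand-in for a real provider's streaming response, not any specific SDK):

```python
import time

def measure_ttft(stream):
    """Time from request start to the first streamed token arriving.

    `stream` is any iterator that yields tokens as they arrive --
    a stand-in here for a provider's streaming API response.
    """
    start = time.perf_counter()
    first_token = next(stream)  # blocks until the first chunk lands
    ttft_ms = (time.perf_counter() - start) * 1000
    return first_token, ttft_ms

def fake_stream(tokens, delay_s=0.05):
    """Simulated provider stream: ~50 ms of delay before each token."""
    for tok in tokens:
        time.sleep(delay_s)
        yield tok

tok, ttft = measure_ttft(fake_stream(["Hello", ",", " world"]))
print(tok, round(ttft))  # "Hello" after roughly 50 ms
```

The same harness works against any SSE or chunked-response stream, which is how a probe can compare providers on equal footing.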

| Model | TTFT (ms) |
|-------|-----------|
| Gemini 2.5 Flash Lite | 420 |
| Gemini 2.5 Flash | 492 |
| GPT-4o | 758 |
| GPT-4.1 Mini | 769 |
| GPT-4.1 | 844 |
| Claude Sonnet 4.6 | 1,293 |
| Claude Haiku 4.5 | 1,420 |
| Claude Opus 4.6 | 1,604 |
| Gemini 2.5 Pro | 6,065 |

Winner: Google (Flash models). Gemini 2.5 Flash and Flash Lite are sub-500ms — roughly 300ms faster than OpenAI's quickest option and over 800ms faster than Anthropic's fastest. However, Google's flagship Gemini 2.5 Pro is the slowest model in the entire comparison at 6,065ms, making Google's performance highly model-dependent.

OpenAI takes second place with a tight cluster between 758–844ms across GPT-4o, GPT-4.1 Mini, and GPT-4.1. The consistency here is notable — there's only 86ms separating their fastest and slowest standard models.

Anthropic is third on TTFT. Claude Sonnet 4.6 at 1,293ms is the fastest Claude model, but it's still 449ms behind GPT-4.1. Interestingly, Claude Haiku 4.5 (the lightweight model) is slower on TTFT than Claude Sonnet 4.6, which suggests Haiku's speed advantage lies elsewhere — in throughput, where it does out-stream Sonnet.

Round 2: Throughput (Tokens Per Second)

Once streaming starts, throughput determines how fast the response completes.
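Throughput here means decode-phase speed: tokens emitted after the first one, divided by the time spent streaming. Conventions vary between dashboards, but a common calculation looks like this (the function name and numbers are illustrative, not from the tables below):

```python
def tokens_per_second(output_tokens, total_latency_ms, ttft_ms):
    """Decode-phase throughput: output tokens divided by streaming
    time (total latency minus TTFT). One common convention; some
    dashboards divide by total latency instead.
    """
    stream_time_s = (total_latency_ms - ttft_ms) / 1000
    if stream_time_s <= 0:
        return None  # probe returned too few tokens to measure
    return output_tokens / stream_time_s

# Illustrative: 100 tokens streamed over 2 s after a 500 ms TTFT.
print(round(tokens_per_second(100, 2500, 500), 1))  # 50.0 tok/s
```

The `None` branch also explains why very short probe responses can fail to produce a throughput number at all.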

| Model | Tokens/sec |
|-------|------------|
| Gemini 2.5 Flash | 74.73 |
| GPT-4.1 | 15.60 |
| Claude Haiku 4.5 | 13.44 |
| Claude Opus 4.6 | 12.14 |
| Claude Sonnet 4.6 | 10.91 |
| GPT-4.1 Mini | 8.73 |
| Gemini 2.5 Flash Lite | 4.76 |
| Gemini 2.5 Pro | — |
| GPT-4o | — |

Winner: Google (again). Gemini 2.5 Flash at 74.73 tokens/sec is nearly 5x faster than the next model. This is a significant advantage for any workload involving long outputs.

OpenAI and Anthropic are closely matched. GPT-4.1 leads at 15.60 tokens/sec, followed by Claude Haiku 4.5 at 13.44 and Claude Opus 4.6 at 12.14. The differences here are small enough that they may fluctuate throughout the day.

Note: GPT-4o and Gemini 2.5 Pro did not return throughput data in this snapshot, likely due to minimal output tokens in the test probe.

Round 3: Total Latency

Total latency (time to last token) captures the full round-trip, including generation time.
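Total latency ties the first two metrics together: it is roughly TTFT plus generation time, so longer outputs amplify throughput differences. A back-of-envelope estimator (illustrative numbers, not from the table below):

```python
def estimate_total_latency_ms(ttft_ms, output_tokens, tokens_per_sec):
    """Back-of-envelope: total latency ~= TTFT + generation time.
    Useful for predicting how response length shifts perceived speed.
    """
    return ttft_ms + (output_tokens / tokens_per_sec) * 1000

# A 300-token answer at 15 tok/s with an 800 ms TTFT:
print(round(estimate_total_latency_ms(800, 300, 15)))  # 20800 ms
```

Note the implication: for short responses TTFT dominates, while for long responses throughput dominates — which is why the rankings below don't exactly mirror Round 1.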

| Model | Total Latency (ms) |
|-------|--------------------|
| Gemini 2.5 Flash Lite | 420 |
| Gemini 2.5 Flash | 696 |
| GPT-4o | 968 |
| GPT-4.1 | 1,026 |
| GPT-4.1 Mini | 1,031 |
| Claude Sonnet 4.6 | 1,559 |
| Claude Haiku 4.5 | 1,711 |
| Claude Opus 4.6 | 2,059 |
| GPT-4o Mini | 2,102 |
| Gemini 2.5 Pro | 13,154 |

Winner: Google (Flash tier). The pattern holds. Gemini Flash models are fastest end-to-end. OpenAI's standard models are in the 968–1,031ms range. Anthropic's models cluster between 1,559–2,059ms.

Round 4: Reliability

| Provider | Uptime (all models) |
|----------|---------------------|
| Anthropic | 100% |
| OpenAI | 100% |
| Google | 100% |

All three providers report 100% uptime at the time of measurement. This is a tie — and a good sign for the industry. A year ago, this wasn't the case.

The Verdict

Google wins on raw speed, but only with the Flash models. Gemini 2.5 Flash is the standout — fastest TTFT among mid-tier models, highest throughput by a wide margin, and lowest total latency. The catch is that Gemini 2.5 Pro, their flagship, is dramatically slower (6+ seconds TTFT).

OpenAI wins on consistency. Their standard models (GPT-4o, GPT-4.1, GPT-4.1 Mini) are all within a tight performance band. You get predictable latency regardless of which model you pick, and GPT-4.1's 15.60 tokens/sec throughput is solid.

Anthropic trails on speed but remains competitive. Claude Sonnet 4.6 is slower on TTFT than OpenAI's models and Google's Flash tier, but the gap is measured in hundreds of milliseconds, not seconds. For many applications, model quality and capabilities matter more than a 500ms TTFT difference.

The right choice depends on your constraints. If latency is the top priority, Gemini 2.5 Flash is hard to beat. If you need consistent performance across model tiers, OpenAI delivers. If you're optimizing for output quality and can tolerate slightly higher latency, Claude remains a strong option.

Track these numbers in real time at ModelStats.ai.
