Claude vs GPT-4 vs Gemini — A Real Latency Comparison

Choosing between Anthropic, OpenAI, and Google for your LLM API isn't just about model quality. In production, latency and reliability determine whether your users stick around or bounce. We track all three providers every 5 minutes. Here's what the data shows as of April 3, 2026.

The Models

We're comparing flagship and mid-tier models from each provider:

| Provider  | Flagship        | Mid-tier          | Lightweight           |
|-----------|-----------------|-------------------|-----------------------|
| Anthropic | Claude Opus 4.6 | Claude Sonnet 4.6 | Claude Haiku 4.5      |
| OpenAI    | GPT-4.1         | GPT-4o            | GPT-4.1 Mini          |
| Google    | Gemini 2.5 Pro  | Gemini 2.5 Flash  | Gemini 2.5 Flash Lite |

Round 1: Time to First Token (TTFT)

TTFT is the most user-visible metric. It's the delay between your API call and the first streamed token arriving.
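In practice, TTFT is just a timer that stops when the first streamed chunk arrives. A minimal sketch (the `fake_stream` generator below is a stand-in for a real provider's streaming response, not any specific SDK):

```python
import time

def measure_ttft(stream):
    """Time from request start to the first streamed token arriving.

    `stream` is any iterator that yields tokens as they arrive --
    a stand-in here for a provider's streaming API response.
    """
    start = time.perf_counter()
    first_token = next(stream)  # blocks until the first chunk lands
    ttft_ms = (time.perf_counter() - start) * 1000
    return first_token, ttft_ms

def fake_stream(tokens, delay_s=0.05):
    """Simulated provider stream: ~50 ms of delay before each token."""
    for tok in tokens:
        time.sleep(delay_s)
        yield tok

tok, ttft = measure_ttft(fake_stream(["Hello", ",", " world"]))
print(tok, round(ttft))  # "Hello" after roughly 50 ms
```

The same harness works against any SSE or chunked-response stream, which is how a probe can compare providers on equal footing.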

| Model | TTFT (ms) |
|-------|-----------|
| Gemini 2.5 Flash Lite | 420 |
| Gemini 2.5 Flash | 492 |
| GPT-4o | 758 |
| GPT-4.1 Mini | 769 |
| GPT-4.1 | 844 |
| Claude Sonnet 4.6 | 1,293 |
| Claude Haiku 4.5 | 1,420 |
| Claude Opus 4.6 | 1,604 |
| Gemini 2.5 Pro | 6,065 |

Winner: Google (Flash models). Gemini 2.5 Flash and Flash Lite are sub-500ms — roughly 300ms faster than OpenAI's quickest option and over 800ms faster than Anthropic's fastest. However, Google's flagship Gemini 2.5 Pro is the slowest model in the entire comparison at 6,065ms, making Google's performance highly model-dependent.

OpenAI takes second place with a tight cluster between 758–844ms across GPT-4o, GPT-4.1 Mini, and GPT-4.1. The consistency here is notable — there's only 86ms separating their fastest and slowest standard models.

Anthropic is third on TTFT. Claude Sonnet 4.6 at 1,293ms is the fastest Claude model, but it's still 449ms behind GPT-4.1. Interestingly, Claude Haiku 4.5 (the lightweight model) is slower on TTFT than Claude Sonnet 4.6, which suggests Haiku's speed advantage lies elsewhere — in throughput, where it does out-stream Sonnet.

Round 2: Throughput (Tokens Per Second)

Once streaming starts, throughput determines how fast the response completes.
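Throughput here means decode-phase speed: tokens emitted after the first one, divided by the time spent streaming. Conventions vary between dashboards, but a common calculation looks like this (the function name and numbers are illustrative, not from the tables below):

```python
def tokens_per_second(output_tokens, total_latency_ms, ttft_ms):
    """Decode-phase throughput: output tokens divided by streaming
    time (total latency minus TTFT). One common convention; some
    dashboards divide by total latency instead.
    """
    stream_time_s = (total_latency_ms - ttft_ms) / 1000
    if stream_time_s <= 0:
        return None  # probe returned too few tokens to measure
    return output_tokens / stream_time_s

# Illustrative: 100 tokens streamed over 2 s after a 500 ms TTFT.
print(round(tokens_per_second(100, 2500, 500), 1))  # 50.0 tok/s
```

The `None` branch also explains why very short probe responses can fail to produce a throughput number at all.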

| Model | Tokens/sec |
|-------|------------|
| Gemini 2.5 Flash | 74.73 |
| GPT-4.1 | 15.60 |
| Claude Haiku 4.5 | 13.44 |
| Claude Opus 4.6 | 12.14 |
| Claude Sonnet 4.6 | 10.91 |
| GPT-4.1 Mini | 8.73 |
| Gemini 2.5 Flash Lite | 4.76 |
| Gemini 2.5 Pro | — |
| GPT-4o | — |

Winner: Google (again). Gemini 2.5 Flash at 74.73 tokens/sec is nearly 5x faster than the next model. This is a significant advantage for any workload involving long outputs.

OpenAI and Anthropic are closely matched. GPT-4.1 leads at 15.60 tokens/sec, followed by Claude Haiku 4.5 at 13.44 and Claude Opus 4.6 at 12.14. The differences here are small enough that they may fluctuate throughout the day.

Note: GPT-4o and Gemini 2.5 Pro did not return throughput data in this snapshot, likely due to minimal output tokens in the test probe.

Round 3: Total Latency

Total latency (time to last token) captures the full round-trip, including generation time.
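Total latency ties the first two metrics together: it is roughly TTFT plus generation time, so longer outputs amplify throughput differences. A back-of-envelope estimator (illustrative numbers, not from the table below):

```python
def estimate_total_latency_ms(ttft_ms, output_tokens, tokens_per_sec):
    """Back-of-envelope: total latency ~= TTFT + generation time.
    Useful for predicting how response length shifts perceived speed.
    """
    return ttft_ms + (output_tokens / tokens_per_sec) * 1000

# A 300-token answer at 15 tok/s with an 800 ms TTFT:
print(round(estimate_total_latency_ms(800, 300, 15)))  # 20800 ms
```

Note the implication: for short responses TTFT dominates, while for long responses throughput dominates — which is why the rankings below don't exactly mirror Round 1.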

| Model | Total Latency (ms) |
|-------|--------------------|
| Gemini 2.5 Flash Lite | 420 |
| Gemini 2.5 Flash | 696 |
| GPT-4o | 968 |
| GPT-4.1 | 1,026 |
| GPT-4.1 Mini | 1,031 |
| Claude Sonnet 4.6 | 1,559 |
| Claude Haiku 4.5 | 1,711 |
| Claude Opus 4.6 | 2,059 |
| GPT-4o Mini | 2,102 |
| Gemini 2.5 Pro | 13,154 |

Winner: Google (Flash tier). The pattern holds. Gemini Flash models are fastest end-to-end. OpenAI's standard models are in the 968–1,031ms range. Anthropic's models cluster between 1,559–2,059ms.

Round 4: Reliability

| Provider | Uptime (all models) |
|----------|---------------------|
| Anthropic | 100% |
| OpenAI | 100% |
| Google | 100% |

All three providers report 100% uptime at the time of measurement. This is a tie — and a good sign for the industry. A year ago, this wasn't the case.

The Verdict

Google wins on raw speed, but only with the Flash models. Gemini 2.5 Flash is the standout — fastest TTFT among mid-tier models, highest throughput by a wide margin, and lowest total latency. The catch is that Gemini 2.5 Pro, their flagship, is dramatically slower (6+ seconds TTFT).

OpenAI wins on consistency. Their standard models (GPT-4o, GPT-4.1, GPT-4.1 Mini) are all within a tight performance band. You get predictable latency regardless of which model you pick, and GPT-4.1's 15.60 tokens/sec throughput is solid.

Anthropic trails on speed but remains competitive. Claude Sonnet 4.6 is slower on TTFT than OpenAI's models and Google's Flash tier, but the gap is measured in hundreds of milliseconds, not seconds. For many applications, model quality and capabilities matter more than a 500ms TTFT difference.

The right choice depends on your constraints. If latency is the top priority, Gemini 2.5 Flash is hard to beat. If you need consistent performance across model tiers, OpenAI delivers. If you're optimizing for output quality and can tolerate slightly higher latency, Claude remains a strong option.

Track these numbers in real time at ModelStats.ai.
