Which LLM API Is Fastest? Real-Time Data for April 2026
Speed matters. When your application calls an LLM API, the time between sending a request and receiving the first token (time to first token, or TTFT) directly impacts user experience. A chatbot that takes 3 seconds to start responding feels broken. One that starts in under 500ms feels instant.
We monitor 14 major LLM APIs every 5 minutes from standardized infrastructure. Here's how they rank by TTFT as of April 3, 2026.
The Full TTFT Ranking
| Rank | Model | TTFT (ms) | Total latency (ms) | Tokens/sec |
|------|-------|-----------|--------------|------------|
| 1 | Gemini 2.5 Flash Lite | 420 | 420 | 4.76 |
| 2 | Gemini 2.5 Flash | 492 | 696 | 74.73 |
| 3 | GPT-4o | 758 | 968 | — |
| 4 | GPT-4.1 Mini | 769 | 1,031 | 8.73 |
| 5 | GPT-4.1 | 844 | 1,026 | 15.60 |
| 6 | o4-mini | 1,164 | 1,194 | 41.88 |
| 7 | o3 | 1,167 | 1,229 | 40.69 |
| 8 | Claude Sonnet 4.6 | 1,293 | 1,559 | 10.91 |
| 9 | GPT-4o Mini | 1,413 | 2,102 | 9.51 |
| 10 | Claude Haiku 4.5 | 1,420 | 1,711 | 13.44 |
| 11 | DeepSeek R1 | 1,526 | 4,352 | 26.66 |
| 12 | DeepSeek V3 | 1,603 | 2,156 | 9.28 |
| 13 | Claude Opus 4.6 | 1,604 | 2,059 | 12.14 |
| 14 | Gemini 2.5 Pro | 6,065 | 13,154 | — |
Data captured April 3, 2026 via ModelStats.ai continuous monitoring.
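The two timing metrics in the table can be reproduced against any streaming API. Here's a minimal sketch of the measurement logic: TTFT is the time from request start to the first streamed token, and throughput is tokens generated after the first divided by the decode time. The token stream below is simulated (the delays are illustrative, not real API behavior); in practice you'd swap in a real streaming client.

```python
import time

def measure_stream(token_stream):
    """Measure TTFT and decode throughput for an iterable of tokens.

    TTFT = time from request start to first token.
    Tokens/sec = tokens after the first, divided by the time
    spent generating them (the decode phase).
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now
        count += 1
    end = time.perf_counter()
    ttft = first_token_at - start
    decode_time = end - first_token_at
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return ttft, tps

# Simulated stream: ~50 ms to first token, then ~10 ms per token.
def fake_stream(n=20):
    time.sleep(0.05)
    yield "tok"
    for _ in range(n - 1):
        time.sleep(0.01)
        yield "tok"

ttft, tps = measure_stream(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, throughput: {tps:.1f} tokens/sec")
```

Note that measuring from a single vantage point adds network latency to every TTFT figure, which is why monitoring from standardized infrastructure matters when comparing providers.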
Analysis
Google dominates the top of the leaderboard. Gemini 2.5 Flash Lite clocks in at 420ms TTFT — the fastest of any model we track. Gemini 2.5 Flash follows at 492ms. Both are sub-500ms, which is the threshold where responses start to feel instantaneous to users.
OpenAI's standard models are fast. GPT-4o at 758ms and GPT-4.1 Mini at 769ms sit comfortably in the mid-range. GPT-4.1 at 844ms is slightly slower but still under a second. The reasoning models (o3 and o4-mini) land around 1,165ms — respectable given the extra compute they perform before responding.
Anthropic's Claude models cluster around 1.3–1.6 seconds. Claude Sonnet 4.6 leads the Anthropic lineup at 1,293ms. Haiku 4.5, despite being the lightweight model, comes in at 1,420ms. Opus 4.6, the flagship, trails at 1,604ms. These numbers are higher than OpenAI and Google's fastest options, though the gap has narrowed compared to previous months.
DeepSeek is competitive but not leading on speed. DeepSeek R1 at 1,526ms and V3 at 1,603ms are in the same ballpark as Claude. Where DeepSeek R1 stands out is throughput — 26.66 tokens/sec is strong for a reasoning model.
Gemini 2.5 Pro is the outlier. At 6,065ms TTFT and 13,154ms total latency, it is significantly slower than everything else. This is likely related to the model's reasoning overhead and larger context processing. If you're using Gemini 2.5 Pro, expect your users to wait.
TTFT vs. Throughput: They're Different Tradeoffs
The fastest TTFT doesn't always mean the fastest end-to-end experience. Gemini 2.5 Flash Lite wins on TTFT (420ms) but only pushes 4.76 tokens/sec. Gemini 2.5 Flash, just 72ms slower to start, delivers 74.73 tokens/sec — by far the highest throughput of any model we track. OpenAI's o4-mini and o3 also show strong throughput at 41.88 and 40.69 tokens/sec respectively.
If you're building a chat interface where perceived speed matters most, optimize for TTFT. If you're generating long-form content or doing batch processing, throughput (tokens/sec) is the metric that determines total wall-clock time.
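The tradeoff can be made concrete with the figures above: end-to-end time is roughly TTFT plus output length divided by throughput. A quick back-of-envelope comparison of the two Gemini Flash variants (a simplified model that ignores network variance and per-request jitter):

```python
def total_time_ms(ttft_ms, tokens_per_sec, n_tokens):
    """Approximate end-to-end time: TTFT plus decode time."""
    return ttft_ms + (n_tokens / tokens_per_sec) * 1000

# Figures from the table above.
flash_lite = dict(ttft_ms=420, tokens_per_sec=4.76)
flash = dict(ttft_ms=492, tokens_per_sec=74.73)

for n in (10, 50, 500):
    lite = total_time_ms(n_tokens=n, **flash_lite)
    full = total_time_ms(n_tokens=n, **flash)
    print(f"{n:>4} tokens: Flash Lite {lite:,.0f} ms vs Flash {full:,.0f} ms")
```

With these numbers, Flash overtakes Flash Lite almost immediately: Flash Lite's 72ms TTFT head start is erased before the first full token of decode-time difference, so its edge only shows in perceived responsiveness, not total completion time.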
Key Takeaways
- Fastest TTFT: Gemini 2.5 Flash Lite (420ms) and Gemini 2.5 Flash (492ms) are the only sub-500ms models.
- Best throughput: Gemini 2.5 Flash leads at 74.73 tokens/sec. OpenAI's reasoning models (o3, o4-mini) follow at ~41 tokens/sec.
- Best balance: GPT-4.1 offers solid TTFT (844ms) with good throughput (15.60 tokens/sec) at a competitive price point.
- All models report 100% uptime at the time of this snapshot — a sign of infrastructure maturity across all four providers.
- Monitor continuously: These numbers shift throughout the day. Track real-time performance at ModelStats.ai.