OpenAI o3 Latency & TTFT — Complete Performance Guide

Real-time o3 performance data: TTFT, latency, tokens per second, and how it compares to o4-mini, GPT-4o, and DeepSeek R1.

OpenAI's o3 is one of the most capable reasoning models available. But how fast is it? We've been monitoring o3 every 5 minutes since launch, and here's what the data shows.

o3 Performance at a Glance

o3 is a reasoning model — it thinks before responding. This means its TTFT includes internal reasoning time, making it fundamentally different from standard models like GPT-4o.

Here's what we typically see:

  • TTFT: 800-1400ms (includes reasoning overhead)
  • Total Latency: 900-1500ms
  • Throughput: ~23 tokens/second
  • Error Rate: <1%

TTFT: What to Expect

o3's Time to First Token is higher than GPT-4o because it reasons before generating output. A typical request:

  1. 0-800ms: o3 is thinking (you see nothing)
  2. 800ms+: tokens start streaming

This is normal behavior for reasoning models. If your application has a 1-second timeout, you'll see intermittent failures. We recommend setting timeouts to at least 3 seconds for o3.
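To see where that timeout budget goes, here is a minimal sketch of measuring TTFT from a streaming response. The helper works with any iterator of token chunks; the simulated stream and its timings below are illustrative stand-ins, not real o3 output.

```python
import time

def measure_ttft(stream):
    """Return (ttft_seconds, tokens) for any iterator of token chunks.

    Pass it a streaming SDK response or, as here, a simulated stream.
    """
    start = time.monotonic()
    ttft = None
    tokens = []
    for chunk in stream:
        if ttft is None:
            # First chunk arrived: everything before this was "thinking".
            ttft = time.monotonic() - start
        tokens.append(chunk)
    return ttft, tokens

def simulated_o3_stream(thinking_s=0.05, n_tokens=5):
    # Hypothetical stand-in: reasoning models pause before the first token.
    time.sleep(thinking_s)
    for i in range(n_tokens):
        yield f"tok{i}"

ttft, tokens = measure_ttft(simulated_o3_stream())
print(f"TTFT: {ttft * 1000:.0f}ms, {len(tokens)} tokens")
```

With a real reasoning model, the measured TTFT is what your timeout has to cover, which is why a 1-second budget fails intermittently.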

o3 vs Other Reasoning Models

How does o3 compare to other reasoning models?

| Model | Avg TTFT | Throughput | Provider |
|-------|----------|------------|----------|
| o4-mini | ~1100ms | ~62 tok/s | OpenAI |
| o3 | ~1350ms | ~23 tok/s | OpenAI |
| DeepSeek R1 | ~1150ms | varies | DeepSeek |
| Gemini 2.5 Pro | ~2000ms+ | varies | Google |

o4-mini is faster and cheaper than o3 for most reasoning tasks. o3's advantage is on the hardest problems, where more reasoning depth helps.

o3 vs Standard Models

If you don't need reasoning, standard models are significantly faster:

| Model | Avg TTFT | Type |
|-------|----------|------|
| Gemini 2.5 Flash Lite | ~450ms | Standard |
| Claude Haiku 4.7 | ~500ms | Standard |
| GPT-4o Mini | ~600ms | Standard |
| GPT-4o | ~700ms | Standard |
| o3 | ~1350ms | Reasoning |

o3 is roughly 2-3x slower on TTFT than standard models. The tradeoff is accuracy on complex tasks.

Output Tokens Per Second

o3 generates output at ~23 tokens/second once it starts streaming. This is slower than standard models (GPT-4o does ~28 tok/s, Gemini Flash does 100+ tok/s) because reasoning models often output in larger batches rather than smooth token-by-token streaming.
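One subtlety when computing tokens/second yourself: measure over the streaming window only, after the first token, or TTFT drags the number down. A sketch of that calculation, with example numbers chosen to mirror the figures above (illustrative, not a measurement):

```python
def streaming_throughput(ttft_s, total_s, output_tokens):
    """Tokens/second over the streaming window (after the first token).

    ttft_s: seconds until the first token; total_s: seconds until the
    last token; output_tokens: number of streamed tokens.
    """
    stream_window = total_s - ttft_s
    if stream_window <= 0 or output_tokens <= 1:
        return 0.0
    # The first token marks the window start, so count the rest.
    return (output_tokens - 1) / stream_window

# Hypothetical run: 500 output tokens, first at 1.35s, last at 23.0s
print(round(streaming_throughput(1.35, 23.0, 500), 1))  # ~23 tok/s
```

Dividing `output_tokens` by `total_s` instead would report ~21.7 tok/s for the same run, understating the model's actual streaming rate.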

Reliability

o3 has been highly reliable in our monitoring:

  • Uptime: 99.5%+
  • Error rate: <1% (excluding rate limits)
  • Timeouts: rare, but P99 latency can hit 3-5 seconds
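P99 is the number to watch when sizing timeouts, since averages hide tail latency. A minimal nearest-rank percentile sketch; the latency samples below are hypothetical, not our monitoring data:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the ceil(p/100 * n)-th smallest sample."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[max(0, k)]

# Hypothetical: 98 fast requests plus two slow outliers (seconds).
latencies = [1.3] * 98 + [4.2, 5.0]
print(percentile(latencies, 50))   # median: 1.3
print(percentile(latencies, 99))   # tail: 4.2
```

The median looks healthy while P99 sits in the 3-5 second range, which is exactly the pattern that makes a 1-second timeout fail intermittently.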

When to Use o3

Use o3 when:

  • You need step-by-step reasoning
  • Accuracy matters more than speed
  • The task is complex (math, logic, code generation)

Use GPT-4o or Claude Sonnet instead when:

  • You need fast responses (<1 second)
  • The task is straightforward
  • You're optimizing for throughput

Monitor o3 in Real-Time

All the data in this post comes from modelstats.ai, where we ping o3 and 15 other models every 5 minutes. Switch to the "Reasoning Models" tab to see o3's live performance alongside o4-mini, DeepSeek R1, and Gemini 2.5 Pro.

See the live data

All metrics updated every 5 minutes on the ModelStats dashboard.
