Question 1

Which LLM API is the fastest?

Accepted Answer

ModelStats monitors all major LLM APIs every 10 minutes. Gemini 2.5 Flash Lite currently has the lowest TTFT, while throughput varies by model. Check modelstats.ai for real-time data.

Question 2

What is TTFT (Time to First Token)?

Accepted Answer

TTFT measures how long it takes from sending a request to receiving the first token of the response. Lower TTFT means the model starts responding faster, which is critical for real-time applications.
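As a rough illustration of the definition above, TTFT can be timed by starting a clock just before the request is opened and stopping it when the first streamed token arrives. This is only a sketch: `time_to_first_token` and `fake_stream` are hypothetical names, and the simulated stream stands in for a real streaming API call.

```python
import time
from typing import Callable, Iterable, Optional


def time_to_first_token(open_stream: Callable[[], Iterable[str]]) -> Optional[float]:
    """Seconds from opening the stream to the first token, or None if no token arrives."""
    start = time.perf_counter()
    for _token in open_stream():
        # Stop timing as soon as the first token is received.
        return time.perf_counter() - start
    return None


def fake_stream():
    """Hypothetical stand-in for a streaming LLM response."""
    time.sleep(0.05)  # simulated delay before the first token
    yield "Hello"
    time.sleep(0.01)
    yield " world"


ttft = time_to_first_token(fake_stream)
print(f"TTFT: {ttft * 1000:.0f} ms")
```

With a real API, `open_stream` would send the request and return the streamed response, so that connection setup and queueing time are included in the measurement.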

Question 3

What is Inter-Token Latency (ITL)?

Accepted Answer

Inter-Token Latency is the average time between consecutive streamed tokens. Lower ITL means smoother streaming output. It reflects how quickly a model keeps producing text once it has started responding.
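One way to compute this average, assuming the arrival time of each token has been recorded, is to take the mean of the gaps between consecutive arrivals. The function name and the sample timestamps below are made up for illustration.

```python
from statistics import mean
from typing import Optional, Sequence


def inter_token_latency(arrival_times: Sequence[float]) -> Optional[float]:
    """Mean gap in seconds between consecutive token arrivals.

    Returns None when fewer than two tokens arrived, since no gap exists.
    """
    if len(arrival_times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    return mean(gaps)


# Illustrative arrival times, in seconds since the request was sent.
arrivals = [0.30, 0.32, 0.35, 0.36]
itl = inter_token_latency(arrivals)
print(f"ITL: {itl * 1000:.1f} ms")
```

The first arrival time (0.30 s here) is the TTFT and does not count toward ITL; only the gaps after the first token do.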

Question 4

How does ModelStats collect performance data?

Accepted Answer

ModelStats pings every major LLM API (Claude, GPT-4, Gemini, DeepSeek) every 10 minutes with a real streaming request. We measure TTFT, total latency, inter-token latency, throughput, and error rates from actual API responses.
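The sketch below shows how those metrics could be derived from a single streamed response. It is not ModelStats's actual implementation: the names `probe`, `ProbeResult`, and `error_rate` are hypothetical, each streamed chunk is treated as one token, throughput is assumed to mean tokens divided by total latency, and simulated streams replace real API calls.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ProbeResult:
    ok: bool
    ttft_s: Optional[float] = None
    total_latency_s: Optional[float] = None
    itl_s: Optional[float] = None
    tokens_per_s: Optional[float] = None


def probe(open_stream: Callable[[], Iterable[str]]) -> ProbeResult:
    """Consume one streamed response and derive timing metrics from it."""
    start = time.perf_counter()
    arrivals: List[float] = []  # seconds since start, one entry per chunk
    try:
        for _chunk in open_stream():
            arrivals.append(time.perf_counter() - start)
    except Exception:
        return ProbeResult(ok=False)  # any failure counts as an error
    if not arrivals:
        return ProbeResult(ok=False)  # an empty response is treated as an error
    total = arrivals[-1]  # total latency: request start to last chunk
    n = len(arrivals)
    itl = (arrivals[-1] - arrivals[0]) / (n - 1) if n > 1 else None
    throughput = n / total if total > 0 else None
    return ProbeResult(True, arrivals[0], total, itl, throughput)


def error_rate(results: List[ProbeResult]) -> float:
    """Fraction of probes that failed."""
    if not results:
        return 0.0
    return sum(not r.ok for r in results) / len(results)


def healthy_stream():
    """Hypothetical stand-in for a working streaming API."""
    for piece in ["Hel", "lo", "!"]:
        time.sleep(0.01)
        yield piece


def failing_stream():
    """Hypothetical stand-in for an API that is down."""
    raise ConnectionError("simulated outage")
    yield  # unreachable; makes this function a generator


results = [probe(healthy_stream), probe(failing_stream)]
print(results[0])
print(f"error rate: {error_rate(results):.0%}")
```

Running `probe` against each API on a fixed schedule, such as the 10-minute interval described above, would produce the time series that the error rate and latency trends are computed from.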

Every LLM API, measured on the open wire.

From request to alert in three steps

We send a real request

We measure what matters

You see & get alerted
