World / Knowledge
Overall ranking
A summary of all our daily evaluations showing the aggregated performance of the models we're testing daily.
| Provider | Model | Correct answers | Average response time (ms) |
|---|---|---|---|
| OpenAI | GPT-4.1 | 53% | 816 ms |
| OpenAI | GPT-4o-mini | 47% | 703 ms |
| Gemini 2.0 Flash | 47% | 634 ms | |
| Gemini 2.0 Flash-Lite | 46% | 411 ms | |
| DeepSeek | DeepSeek-R1 | 46% | 1,813 ms |
| OpenAI | GPT-5 | 39% | 25,778 ms |
| Antrophic | Claude 3.7 Sonnet | 38% | 1,691 ms |
| Gemini 2.5 Pro | 35% | 7,907 ms | |
| Gemini 3.0 Flash | 35% | 13,592 ms | |
| Meta | Llama 3.3 70b | 30% | 290 ms |
| Antrophic | Claude 4.0 Sonnet | 29% | 2,396 ms |
Evaluation over time