World / Knowledge
Overall ranking
A summary of all our daily evaluations showing the aggregated performance of the models we're testing daily.
| Provider | Model | Correct answers | Average response time (ms) |
|---|---|---|---|
| OpenAI | GPT-4.1 | 53% | 816 ms |
| Gemini 2.5 Pro | 49% | 8,428 ms | |
| OpenAI | GPT-5 | 48% | 22,007 ms |
| OpenAI | GPT-4o-mini | 47% | 703 ms |
| Gemini 2.0 Flash | 47% | 634 ms | |
| Gemini 2.0 Flash-Lite | 46% | 411 ms | |
| DeepSeek | DeepSeek-R1 | 46% | 1,813 ms |
| Antrophic | Claude 4.0 Sonnet | 42% | 2,827 ms |
| Meta | Llama 3.3 70b | 41% | 342 ms |
| Antrophic | Claude 3.7 Sonnet | 38% | 1,691 ms |
Evaluation over time