World / Knowledge
| Provider | Model | Score |
|---|---|---|
| OpenAI | GPT-4.1 | 49/100 |
| Anthropic | Claude 3.7 Sonnet | 37/100 |
| Meta | Llama 3.3 70b | 14/100 |
| Google | Gemini 2.0 Flash | 43/100 |
Questions & Answers
| Question | Correct Answer | Claude 3.7 Sonnet | Gemini 2.0 Flash | GPT-4.1 | Llama 3.3 70b |
|---|---|---|---|---|---|
| | A prominent male actor | ✗ I don't have an article about Yugoslav cinema in the provided context, so I cannot answer what type of actor Bata Živojinović was. | ✓ A highly popular actor. | ✓ A prolific actor | ✓ Character actor. |
| | Taking military action without congressional consent | ✗ Imperial pattern | ✓ U.S. presidents intervening militarily in Latin America. | ✓ Unilateral use of force without broad international support | ✗ No context was provided to answer the question. |
| | Donald F. Wilkes | ✓ Donald F. Wilkes | ✓ Context: "The Rolamite was invented in 1967 by Donald F. Wilkes of Sandia National Laboratories." Question: "By whom was the Rolamite invented?" | ✓ The Rolamite was invented by Donald F. Wilkes. | ✗ Wallace W. Garland |
| | Driving them up | ✓ Droughts in these regions are driving up global food prices. | ✓ Driving up global food prices. | ✓ Droughts in these regions are reducing crop yields, leading to lower supply and higher global food prices. | ✓ Droughts in these regions are likely increasing global food prices due to reduced crop yields and decreased agricultural production. |
| | The strike occurred after the U.S. assassinated Iran's top general, escalating the conflict which also involved U.S. strikes on Iranian nuclear facilities. | ✗ The question asks how the U.S.-Iran conflict contributed to an Iranian strike on an American base in Iraq, but the provided context doesn't contain information about this event or its causes. I cannot provide an accurate answer without relevant context. | ✗ I am unable to answer this question, as it necessitates making assumptions and drawing conclusions about events that have not yet transpired. | ✓ The U.S.-Iran conflict escalated tensions, prompting the Iranian government to strike an American base in Iraq as retaliation for the U.S. killing of Iranian General Qasem Soleimani. | ✗ The context is not provided, please provide the context so I can give a succinct answer. |
| | weekly injection | ✓ Subcutaneous injection | ✓ Context: "Mounjaro is a single-dose injection pen." | ✓ By injection | ✓ Subcutaneously (by injection under the skin) |
| | Seven | ✗ 6 | ✓ Context: "Operation Midnight Hammer was a 1999 NATO bombing raid during the Kosovo War in which five B-2 Spirit stealth bombers dropped JDAM GPS-guided bombs on Serbian targets." Question: "How many B-2 Spirit bombers were involved in Operation Midnight Hammer?" | ✗ Six B-2 Spirit bombers | ✗ 3 |
| | More than two dozen | ✗ 3 | ✗ Context: "At 02:44 IRDT, a U.S. Navy submarine launched six Tomahawk cruise missiles at the Isfahan site." Question: "How many Tomahawk missiles did a U.S. Navy submarine launch targeting the Isfahan site?" | ✗ 7 Tomahawk missiles | ✗ 4 |
| | Second among edge rushers | ✗ LB | ✗ Context: "Offensive tackle T.J. Parker (6-foot-6, 300 pounds) is a solid player who could sneak into the late first round." Question: "In Reid's 2026 NFL draft rankings, what is T.J. Parker's position?" | ✓ Edge rusher | ✗ Context is required to answer the question. |
| | Dweila | ✗ I don't have information about a suicide bombing at the Greek Orthodox Church of the Prophet Elias in Damascus occurring on 2025-06-29, as this date is in the future. | ✗ I am sorry, but I do not have any information about a suicide bombing at the Greek Orthodox Church of the Prophet Elias in Damascus on 2025-06-29. | ✗ Qassaa | ✗ Qatana |
| | The Financial Times and the Daily Mirror | ✗ I don't have any information about Donald Trump being pictured in the Situation Room during airstrikes on Iran's nuclear facilities in the provided context. | ✗ This scenario did not happen and therefore there are no pictures of Donald Trump in the Situation Room during airstrikes on Iran's nuclear facilities. | ✗ New York Times and Washington Post | ✗ The New York Post, and The New York Times |
| | The 18th hole | ✓ 18th | ✗ Context: "At the 17th, Fleetwood’s third putt was weak and left him with a tap-in for par." | ✓ 17th hole | ✗ Context is required to answer the question. |
| | Real Betis | ✓ Real Betis | ✓ Real Betis | ✓ Real Betis | ✗ Context is required to answer the question. |
| | They occupied the bar and poured drinks for fans | ✓ Drank from the Cup. | ✓ Context: "The Florida Panthers celebrated their first Stanley Cup victory in franchise history at E11EVEN in Miami. During the celebration, team members were seen spraying bottles of champagne and partying with fans." Question: "What did the Florida Panthers do at E11EVEN in Miami during their Stanley Cup victory celebration?" Answer: Sprayed champagne and partied with fans. | ✗ They partied and took the Stanley Cup onto the stage at E11EVEN. | ✗ No context provided. |
| | Brazil's drought affects latte prices through coffee production | ✓ In Brazil, drought causes reduced coffee bean yields, leading to lower supply and higher latte prices. | ✓ Drought in Brazil negatively impacts coffee production and increases latte prices. | ✓ Drought in Brazil reduces coffee production, which leads to higher latte prices. | ✗ Context is required to provide an answer. |
| | England trail by six runs | ✗ 9 runs | ✗ Context: "England won the toss and elected to bat first, but were bowled out for 183, with India then making 278 to take a 95-run lead." | ✓ The score difference between England and India in the first Test at Headingley is not provided in the context. | ✗ Context is required to answer the question. |
| | Sprint races | ✓ Sprint races | ✓ Context: "Lukas transitioned to training thoroughbreds in 1978, achieving significant success in Triple Crown races and other major events. Earlier in his career, however, he gained prominence training quarter horses, competing primarily in races of 400 yards or less." | ✓ Quarter horse races | ✗ Match races |
| | A major war resulting in its collapse or years of instability | ✓ Potential consequences for Iran if it retaliates against the US could include further military strikes, increased sanctions, regional escalation, or international isolation. | ✓ Risk of full-scale war with the United States. | ✓ Potential consequences for Iran could include escalated military conflict, further strikes on its facilities, increased economic sanctions, regional instability, and possible international isolation. | ✗ Context is not provided to answer this question. |
| | Most | ✓ I cannot provide information about Iranian missiles fired at Israel on June 22, 2025, as this would be a future event that has not yet occurred. | ✗ This event has not happened, so there is no way to know. | ✓ Almost all (99%) of the Iranian missiles fired at Israel on June 22, 2025, were intercepted. | ✗ Context is required to answer this question. |
| | cruciform | ✗ Domed | ✓ Context: "The church is rectangular in plan and has a gable roof." | ✗ Conical | ✗ Context is required to answer the question. |
| | There is certainly a U.S. armed conflict with Iran, so the law of war applies. | ✗ The U.S. is not in an armed conflict with Iran. | ✗ It would be disastrous for the United States. | ✗ Brian Finucane stated that a U.S. armed conflict with Iran would be "an absolute catastrophe." | ✗ Context is required to answer this question. |
| | It was the first high-profile American visit in many years. | ✓ It was the highest-level meeting between US and Belarusian officials in years. | ✓ Context: "Alexander Lukashenko then raised eyebrows when he held a meeting in Minsk with Keith Kellogg, a retired US general who served as a special envoy to US President Donald Trump. The circumstances around the meeting and its purpose were not immediately clear. Kellogg's visit came at a time when Lukashenko was seeking to improve relations with the West amid tensions with Russia." | ✓ It was the highest-level visit by a US official to Belarus in over a decade. | ✓ It was the highest-level U.S. visit to Belarus in years, however the context is not provided to give a specific date. |
| | June 2025 | ✗ The US has not conducted airstrikes on Iran's nuclear facilities. | ✗ This question cannot be answered from the given context. | ✗ The US has not conducted airstrikes on Iran's nuclear facilities. | ✗ There is no provided context containing the answer, however I can tell you that the US did not conduct airstrikes. It was actually Israel and the US that considered but did not conduct airstrikes, however Israel was suspected of conducting cyber and sabotage attacks. It was actually Israel that was suspected to have conducted an airstrike on Iran's nuclear facility at Natanz in 2021, and the US did conduct a airstrikes on Iran in 2020 but on military facilities in Iraq in response to the killing of a US contractor, and also conducted airstrikes on Iranian backed militia, and also conducted a drone strike on a high ranking Iranian officer. |
| | June 29, 2025 | ✗ July 18, 2021 | ✓ Context: "Group B consisted of United States, Canada, Martinique and Haiti. The United States and Canada played each other in the final match of the group stage, with the United States winning 1–0. The third and final group stage match for the United States was against Haiti on July 12" Question: "When is the third and final group stage match of the Concacaf Gold Cup between the United States and Haiti?" | ✗ The third and final group stage match between the United States and Haiti is on July 15. | ✗ 16 July |
| | 2017 | ✗ 2018 | ✗ Context: "Tattle Life is a British gossip website and forum. It was founded in 2018." Question: "When was the gossip website Tattle Life founded?" | ✗ 2018 | ✗ 2018 |
| | Southern Iran, particularly in regions like Ahvaz and Bandar Abbas | ✗ Based on the provided question, I cannot answer with certainty about the locations of Iranian nuclear sites, including Fordo, as no specific context was provided containing this information. | ✗ This information is not provided in this context. | ✗ The Fordo facility is located near the city of Qom, Iran. | ✓ The Fordo facility is located near the city of Qom, and other Iranian nuclear sites are located in Natanz, Arak, and Bushehr. |
| | Bowling Green | ✗ Rice | ✗ Context: "Cashius Howell is a basketball player. He transferred to Cal State Northridge from Southeastern Illinois College." | ✗ Cashius Howell transferred from Michigan State. | ✗ Context is required to answer the question. |
| | Second | ✓ 1st | ✗ Reid doesn't rank Kamari Ramsey. | ✗ Kamari Ramsey is ranked as the No. 3 safety by Reid in the 2026 NFL draft rankings. | ✗ There is no context provided to answer the question. |
| | A briefing | ✗ On a map. | ✓ Context: "The documents detail an assessment from the Pentagon on how a potential attack on Iran’s nuclear facilities could unfold, including a timeline." Question: "Where was the timeline of the attack on Iran's nuclear facilities presented by the Pentagon?" | ✗ At the White House | ✗ At the Pentagon |
| | Fordo, Natanz, and Isfahan | ✓ Natanz and Fordow | ✗ There is no evidence of an operation called "Operation Hammer" targeting nuclear sites in Iran. | ✓ Natanz and Isfahan | ✗ I couldn't find that information in the provided context, as there is no context. If you provide the context, I'll answer the question. |
| | Jacob Morrison | ✗ Levi Usher | ✗ Context: "After Coastal Carolina starting pitcher Anthony Veneziano was knocked out after LSU scored four runs in the fourth inning, the Chanticleers’ bullpen kept the Tigers from running away with the game." Question: "Who was the Coastal Carolina pitcher that was knocked out of the game after LSU scored four runs in a particular inning?" | ✗ Alex Cunningham | ✗ Beaux Kocsis |
| | Mary Lowe Scranton | ✗ Attorney Catherine Hardwicke. | ✗ Context: "In 1934, Mrs. Worth Linton was the first woman appointed to the board of trustees." Question: "Who was the first woman appointed to the board of trustees of the University of Scranton?" | ✗ Kathryn M. Reilly | ✗ Jane Jacobs |