In July it was the International Math Olympiad (OpenAI, Gemini), today it's the International Collegiate Programming Contest (ICPC). Once again, both OpenAI and Gemini competed with models that achieved Gold medal performance.
OpenAI's Mostafa Rohaninejad:
We received the problems in the exact same PDF form, and the reasoning system selected which answers to submit with no bespoke test-time harness whatsoever. For 11 of the 12 problems, the system’s first answer was correct. For the hardest problem, it succeeded on the 9th submission. Notably, the best human team achieved 11/12.
We competed with an ensemble of general-purpose reasoning models; we did not train any model specifically for the ICPC. We had both GPT-5 and an experimental reasoning model generating solutions, and the experimental reasoning model selecting which solutions to submit. GPT-5 answered 11 correctly, and the last (and most difficult problem) was solved by the experimental reasoning model.
And here's the blog post by Google DeepMind's Hanzhao (Maggie) Lin and Heng-Tze Cheng:
An advanced version of Gemini 2.5 Deep Think competed live in a remote online environment following ICPC rules, under the guidance of the competition organizers. It started 10 minutes after the human contestants and correctly solved 10 out of 12 problems, achieving gold-medal level performance under the same five-hour time constraint. See our solutions here.
I'm still trying to confirm if the models had access to tools in order to execute the code they were writing. The IMO results in July were both achieved without tools.
Update 27th September 2025: OpenAI researcher Ahmed El-Kishky confirms that OpenAI's model had a code execution environment but no internet:
For OpenAI, the models had access to a code execution sandbox, so they could compile and test out their solutions. That was it though; no internet access.
Recent articles
- Designing agentic loops - 30th September 2025
- Claude Sonnet 4.5 is probably the "best coding model in the world" (at least for now) - 29th September 2025
- I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now - 18th September 2025