Testing AI coding tools
-
The idea of using AI to help with computer programming has become a contentious issue. On the one hand, coding agents can make horrific mistakes that require a lot of inefficient human oversight to fix, leading many developers to lose trust in the concept altogether. On the other hand, some coders insist that AI coding agents can be powerful tools and that frontier models are quickly getting better at coding in ways that overcome some of the common problems of the past.
To see how effective these modern AI coding tools are becoming, we decided to test four major models with a simple task: re-creating the classic Windows game Minesweeper. Since it’s relatively easy for pattern-matching systems like LLMs to play off of existing code to re-create famous games, we added in one novelty curveball as well.
Our straightforward prompt:
Make a full-featured web version of Minesweeper with sound effects that
- Replicates the standard Windows game and
- implements a surprise, fun gameplay feature.

Include mobile touchscreen support.
https://arstechnica.com/ai/2025/12/the-ars-technica-ai-coding-agent-test-minesweeper-edition/
@wtg that was interesting, thanks for posting it!
This isn’t the same as agentic AI, but I’ve spent a lot of time testing ChatGPT, Gemini, and Copilot in voice mode in Japanese to see which chatbot works best as a conversation practice partner for learners. They all have problems, but ChatGPT (the OpenAI product) is currently by far the best. It’s interesting how much better it is than the others, especially since Copilot shares some infrastructure with ChatGPT.