Testing AI coding tools
-
The idea of using AI to help with computer programming has become a contentious issue. On the one hand, coding agents can make horrific mistakes that require a lot of inefficient human oversight to fix, leading many developers to lose trust in the concept altogether. On the other hand, some coders insist that AI coding agents can be powerful tools and that frontier models are quickly getting better at coding in ways that overcome some of the common problems of the past.
To see how effective these modern AI coding tools are becoming, we decided to test four major models with a simple task: re-creating the classic Windows game Minesweeper. Since it’s relatively easy for pattern-matching systems like LLMs to play off of existing code to re-create famous games, we added in one novelty curveball as well.
Our straightforward prompt:
Make a full-featured web version of Minesweeper with sound effects that
- Replicates the standard Windows game and
- implements a surprise, fun gameplay feature.
Include mobile touchscreen support.
https://arstechnica.com/ai/2025/12/the-ars-technica-ai-coding-agent-test-minesweeper-edition/
-
-
@wtg that was interesting, thanks for posting it!
It’s not the same as agentic AI, but I’ve spent a lot of time testing ChatGPT, Gemini, and Copilot in voice mode in Japanese to see which chatbot works best as a conversation practice partner for learners. They all have problems, but ChatGPT (the OpenAI product) is currently by far the best. It’s interesting how much better it is than the others (especially since Copilot shares some infrastructure with ChatGPT).
-
-
My adult kid uses a subscription AI all the time as a software engineer. If you know what you are doing, it is game-changing from a productivity standpoint. It’s a much faster way to get all the information you need to solve a problem (which is all coding is: solving problems). He says it’s like a mirror. The more of an expert you are about coding, the better it will work for you.
-
@Jodi said in Testing AI coding tools:
The more of an expert you are about coding, the better it will work for you.
I suspect this is the case for most AI tasks. If you are already an expert, then you can use AI in ways that boost productivity, and probably creativity as well.
The problem is, if you are not already an expert, using AI is probably going to prevent you from ever becoming one.