Math puzzle: challenging AI models
I have loved math ever since I was 4 years old. I love those weird problems that you have to solve using math. We all know the common critique that you will never need the level of math we learned at school, but I just love how it sharpens my mind. Do newer forms of sentient beings like mathematics?
I remembered this YouTube video, https://www.youtube.com/watch?v=SkP2VBzgpKA, and it inspired me to run some tests: I decided to test AI models on the same task.
Here is how the test works. I pick AI models on a hunch, imagining I’m doing homework or being asked such a question in a job interview. So what would I do? Write a short but decently structured prompt describing the problem, and copy it from AI to AI until I’m sure I’ve got a proper reply.
What is the Math Puzzle?
I’ve illustrated the problem for you, the reader. The problem is on the left-hand side. The blurred area holds my solution, but before we go there, just focus on the problem.
The prompt I created looks like this:
Using exactly three 3's (3 3 3) and any mathematical operations/functions (including parentheses, factorial, exponents, roots, floor, etc.), create ten separate expressions that equal the numbers 0 through 9.
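For comparison, a puzzle like this can also be brute-forced. The following is a minimal Python sketch, not the method used anywhere in this post. It assumes a restricted set of operations (+, -, ×, ÷, small integer powers, factorial and exact square roots, with no floor or ceiling), allows at most one unary operation per sub-expression, and uses a hypothetical cap `MAX_ABS` to keep intermediate values small. It builds every value reachable with one, two and then three 3s and keeps the first expression found for each.

```python
from fractions import Fraction
from math import factorial, isqrt

# Assumptions of this sketch: operations are limited to +, -, *, /, ^ with small
# integer exponents, factorial and exact square roots (no floor or ceiling), at
# most one unary operation is applied per sub-expression, and MAX_ABS is a
# hypothetical cap that keeps intermediate values small.
MAX_ABS = 1000


def with_unary(value, expr):
    """Yield the value itself, plus its factorial and exact square root when defined."""
    yield value, expr
    if value.denominator == 1 and 0 <= value <= 6:
        yield Fraction(factorial(int(value))), f"{expr}!"
    if value >= 0:
        num, den = value.numerator, value.denominator
        if isqrt(num) ** 2 == num and isqrt(den) ** 2 == den:
            yield Fraction(isqrt(num), isqrt(den)), f"√{expr}"


def combine(left, right):
    """Yield every binary combination of two (value, expression) pairs."""
    (x, ex), (y, ey) = left, right
    yield x + y, f"({ex} + {ey})"
    yield x - y, f"({ex} - {ey})"
    yield x * y, f"({ex} * {ey})"
    if y != 0:
        yield x / y, f"({ex} / {ey})"
    if y.denominator == 1 and abs(y) <= 10 and not (x == 0 and y < 0):
        yield x ** int(y), f"({ex} ^ {ey})"


def reachable(count):
    """Map every value reachable with exactly `count` threes to one expression."""
    if count == 1:
        candidates = list(with_unary(Fraction(3), "3"))
    else:
        candidates = []
        for k in range(1, count):
            left_side, right_side = reachable(k), reachable(count - k)
            for a in left_side.items():
                for b in right_side.items():
                    for value, expr in combine(a, b):
                        candidates.extend(with_unary(value, expr))
    table = {}
    for value, expr in candidates:
        if abs(value) <= MAX_ABS:
            table.setdefault(value, expr)  # keep the first expression found
    return table


if __name__ == "__main__":
    solutions = reachable(3)
    for target in range(10):
        print(target, "=", solutions.get(Fraction(target), "not found"))
```

Exact fractions are used instead of floats, so an approximate value can never be mistaken for a solution. Even without floor or ceiling, this restricted search finds an expression for every target from 0 to 9.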
MathGPT
I started with the obvious choice, a math-specialised AI model. Here is the reply, sliced into portions:
It displayed its solutions properly, but the long explanation of its thinking and reasoning was unnecessary and unhelpful. To be fair, it only solved the first equation. The rest were completely wrong, and it stopped working when it reached $3\ 3\ 3 = 8$. I can say it failed the test.
Grok
I found some graphics showing how students use widely available AI models to “speed up” their homework, so I started running the tests on them one by one.
I’ll be quick: the solution here was a disaster. I don’t know what that reply was, but it was nowhere near correct. Test failed.
Perplexity
Perplexity is known as an AI that does deep searches and gives you great information easily. Let’s see.
I got a reply, but it overanalysed the problem, and no, this is not a correct answer. At least the solution was displayed more clearly, but yet again it stopped, this time at number 6. I guess the goldfish memory kicks in for us free users. Test failed again.
Microsoft Copilot
Now I’m starting to see more precise replies:
Copilot did much better here. The math checks out, but the conditions are not respected for solutions 8 and 9, and maybe for 1?
Now we get to the main issue with every AI: overly complex solutions to a simple problem. Look at the solution for 3: why are brackets needed? And given that 3 was fairly simple to solve, why did it go to such lengths when the target was 9? Test partially passed.
Google Gemini
Google has come out with its latest and greatest AI model, so this one must be perfect, right?
Wrong! The solution has a clear structure, even though I didn’t specify in my prompt how I wanted it displayed. Getting to 9 as 3 plus 3 plus 3: great job. Oh, wait a second, what is this for number 8, an approximation? No, this is incorrect. Again, test partially passed.
OpenAI ChatGPT
OK, by now I had lost all hope. I stopped using ChatGPT regularly a long time ago, so I was 100% certain it would fail.
Finally, the conditions were followed (almost; I’m turning a blind eye to 9), and all the math checks out. Great job, ChatGPT. Test passed.
Still, I can’t escape a certain feeling when I look at the solution. Why was everything so complex? What is the point? I don’t get it.
My solution
You can check out the solution in the video I linked above, but I found my own, and here it is.
In the video I linked, the hardest problem is getting to number 10. Here is my proposal: 3 × 3 + 3^0 = 10.
Final thoughts
I’ve learned a couple of things while testing this math puzzle:
Doubt every reply from AI. This time ChatGPT solved it correctly, but that is not always the case.
AI overcomplicates everything unless you specify in the prompt exactly how you want the reply, and even then it can be wrong.
Using multiple AI models to get the correct answer may not be a bad approach, as it makes you use your grey matter.
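Since the first lesson is to doubt every reply, checking a candidate answer is easy to automate. Here is a minimal Python sketch, assuming each answer is rewritten in Python syntax (`**` for powers) with hypothetical helper names `fact()` for factorial and `sqrt()` for an exact square root; floor, ceiling and other functions are deliberately left out. It evaluates the expression exactly with fractions and confirms that it hits the target using exactly three 3s and no other numbers. Anything it cannot evaluate exactly, such as a decimal approximation or `sqrt(3)`, raises `ValueError` instead of passing.

```python
import ast
import operator
from fractions import Fraction
from math import factorial, isqrt

# Binary operators this sketch accepts; "**" stands in for the caret "^".
BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate(node, literals):
    """Evaluate a parsed expression exactly, recording every integer literal used."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, literals)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        literals.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -evaluate(node.operand, literals)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
        left = evaluate(node.left, literals)
        right = evaluate(node.right, literals)
        if isinstance(node.op, ast.Pow) and right.denominator != 1:
            raise ValueError("only integer exponents can be checked exactly")
        return BINARY[type(node.op)](left, right)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and len(node.args) == 1):
        arg = evaluate(node.args[0], literals)
        # fact() and sqrt() are hypothetical helper names chosen for this sketch.
        if node.func.id == "fact" and arg.denominator == 1 and arg >= 0:
            return Fraction(factorial(int(arg)))
        if node.func.id == "sqrt" and arg >= 0:
            num, den = arg.numerator, arg.denominator
            if isqrt(num) ** 2 == num and isqrt(den) ** 2 == den:
                return Fraction(isqrt(num), isqrt(den))
        raise ValueError(f"{node.func.id}({arg}) has no exact value here")
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def check(expression, target):
    """True if `expression` equals `target` exactly and uses exactly three 3s."""
    literals = []
    value = evaluate(ast.parse(expression, mode="eval"), literals)
    return value == target and literals == [3, 3, 3]


print(check("3 + 3 + 3", 9))              # True
print(check("fact(3) + fact(3) / 3", 8))  # True
print(check("3 + 3", 6))                  # False: only two 3s
```

Parsing with `ast` instead of calling `eval()` keeps the checker from running arbitrary code pasted from a chatbot reply.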
Some might comment: “Your prompt wasn’t descriptive enough; you didn’t follow the recommended prompt structure.” And you would be absolutely right. On the other hand, the prompt is clear and valid, and any English speaker would understand it.
ChatGPT also reports that it thought for 1 minute and 11 seconds. I spent over 2 hours creating and thinking about the prompt, opening and closing different AI websites, waiting for their replies and checking their solutions.
By contrast, I spent ~10 minutes solving it on my own on a piece of paper and ~15 minutes creating an image in Canva (a.k.a. documenting it).
What do you think of my results? I’m happy to hear your thoughts.