Recently, while preparing poached eggs for breakfast with my young son, a memory of my grandfather's classic brain teasers popped into my head. I decided to try one on my son, asking him in Serbian: “Жуманце је бијело или бело?”

For anyone who doesn't speak Serbian, the literal question is: “Is egg yolk white (bijelo) or white (belo)?”

This question has two layers of complexity:

The Semantics Trap: The words bijelo and belo are simply two common, acceptable ways of pronouncing and writing the word for 'white' in Serbian/Bosnian/Croatian. The debate between them is a classic, low-stakes local linguistic quibble, a true "potato-potato, tomato-tomato" argument.
The Obvious Truth: The actual color of the yolk is not white; it is yellow.

My son, bilingual but very young, didn't yet grasp the linguistic nuance of the bijelo vs. belo debate. His reply, in his tiny voice, was perfect: “Egg yolk is from a chicken!”

After I had a good laugh, it also made me wonder: "What would AI say?"

I decided to pose this simple, yet tricky, question to several leading large language models (LLMs) to see if they could navigate the linguistic trap, the factual contradiction, and the overall brain teaser.

Me poaching the egg, while thinking about the question: What colour is the egg yolk?

Grok

Grok went straight for the linguistic trap, trying to explain the semantics of the words. Unfortunately, it got the linguistic facts wrong. At least it wrote its response in Cyrillic, which was a nice touch.

OpenAI ChatGPT

I expected a quick and accurate win here.

Nope. While it didn't state anything factually incorrect, it completely missed the nuances of the brain teaser.

Google Gemini

Since this company created Google Translate, I had high hopes.

Gemini correctly explained that both bijelo and belo are acceptable variants of 'white,' but then it proceeded to use phrases like "white egg yolk" when validating both options. When asked for a simple English summary, it missed the point entirely. A genuine facepalm moment.

Claude

Often topping benchmarks, Claude seemed like the next best bet.

I received an answer with an awkward sentence structure and a factual error.

Perplexity

Running out of contenders, I turned to Perplexity.

Finally! Perplexity was the only model that provided the proper, two-pronged reasoning: acknowledging the linguistic variants and correctly stating that the egg yolk is yellow.

Is AI really bad at brain teasers?

No, definitely not. When I asked all the same models simple, well-known brain teasers in English, they all answered correctly.

This led me to a conclusion that I had recently tested firsthand. I had been creating a bilingual book for my son, translating the children's song "Ten Angry Pirates" into Bulgarian. The song had never been officially translated, so I asked various AI tools to translate it as a "world-renowned lector." The results were so terrible that I ultimately had to ask my wife to help me with the final text.

What I concluded is that translations or writing produced by AI tools in any language other than English require even more fact-checking. Do the numbers back it up?

While advanced AI models perform well on general text translation in high-resource languages, sometimes matching the baseline quality of successful human translators, they have serious weaknesses: a tendency to be overly literal, difficulty with domain-specific terminology, and critical unreliability in high-stakes scenarios such as medical care and live interpretation. These limitations underscore the need for human intervention (post-editing) to ensure translations are not only accurate but also culturally appropriate, compliant with specifications, and safe.

Final thoughts

AI tools are trained on data that comes predominantly from large language groups, so with smaller language groups they tend to make more mistakes. Nevertheless, fact-check everything, even this article. One of the more popular examples is https://kurzgesagt.org/projects/behind-the-lies, and it captures the essence of the AI problem quite well.

If you are using AI to translate from your own language into a language you don’t speak, my experience so far says you are better off using Google Translate.

And if you want to see a video that I’ve made for my son using AI, use this link: https://www.youtube.com/watch?v=XnNiQ99sKFc

AI only understands English?