22 Aug 2025

Where GenAI tools struggle

I had a tricky, niche tax question. The basic clauses that apply to the majority of taxpayers in the country are described and discussed everywhere on the web, but this specific question is not.

Since GenAI tools are the new panacea, I asked Gemini about this. Its answer was clearly wrong. I thought its “deep research” mode might do a better job, so I asked the same question again in that mode. It did a lot of work and eventually spit out a report, which was also unsatisfactory.

There are nuances in tax rules that need to be considered for answering this question. Most web sites don’t care about this nuance; Gemini, which uses information from the web as its source, also couldn’t understand the nuance. It produced an answer without solid justification.

I then did a regular Google search and found a site that said the opposite of what Gemini said. I sent Gemini a follow-up request asking it to include information from this new page.

My expectation: Gemini would reconcile the differences between the sources and improve on its previous answer.
What Gemini did: it simply overwrote whatever it had said before with what was on the new site.

Gemini did exactly what I do when a code reviewer is forcing me to do what I don’t want to do, but I am tired of arguing. I just do whatever the reviewer says and move on. Gemini doing that to me did not exactly instill confidence in the report it had generated.

While GenAI tools are great at many things, they are not exactly good at answering niche questions based on conflicting information from different sources.
