Comment by jameslk

5 months ago

I had to do something similar with BigQuery and some open source datasets recently.

I had bad results with Claude, as you mentioned. It kept hallucinating parts of the docs for the open datasets, coming up with nonsense columns, and it didn't fix errors even when I gave it the error text and more context. I had a similar outcome with 4o.

But I tried the same with o1 and it was consistently much better, generating full queries and alterations. I fed it parts of the docs anytime it struggled and it figured it out.

Ultimately I was able to achieve what I was trying to do with o1. I’m guessing the reasoning helped, especially when I confronted it about hallucinations and provided bits of the docs.

Maybe the model and the lack of CoT could be part of the challenge you ran into?

> and provided bits of the docs.

At this point I'd ask myself whether I want my original problem solved or if I just want the LLM to succeed with my requested task.

  • Yes, I imagine some do like to read and then ponder over the BigQuery docs. I like to get my work done. In my case, o1 nailed BigQuery flawlessly, saving me time. I just needed to feed in some parts of the open source dataset docs.