Comment by omneity

2 hours ago

I do think it might improve, but only marginally.

You are, however, likely to see better results with smaller models, since they're usually more strapped for "cognitive capacity". Two separate calls reduce the load in each request, and in my experience hallucination is a common side effect of overloading an LLM cognitively.
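
To make the "two separate calls" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: `ModelFn` stands in for whatever LLM client you use (any function that takes a prompt and returns text), and `combined_call` / `split_calls` are hypothetical helper names, not part of any library. It only shows how the prompts differ; it says nothing about how much quality you would actually gain.

```python
from typing import Callable, List

# Hypothetical stand-in for any LLM client: takes a prompt, returns text.
ModelFn = Callable[[str], str]


def combined_call(model: ModelFn, text: str, tasks: List[str]) -> str:
    """Single request that asks the model to handle every task at once."""
    instructions = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(tasks))
    return model(f"Do all of the following:\n{instructions}\n\nInput:\n{text}")


def split_calls(model: ModelFn, text: str, tasks: List[str]) -> List[str]:
    """One request per task, so each prompt carries a single instruction."""
    return [model(f"{t}\n\nInput:\n{text}") for t in tasks]


if __name__ == "__main__":
    # Fake model that just records prompts, so the sketch runs without an API.
    seen: List[str] = []

    def fake_model(prompt: str) -> str:
        seen.append(prompt)
        return f"response {len(seen)}"

    tasks = ["Summarize the input.", "List the entities in the input."]
    print(split_calls(fake_model, "some document text", tasks))
    print(len(seen), "requests sent")
```

The trade-off is more requests (and more latency and tokens) in exchange for a narrower instruction per request, which is where I'd expect smaller models to benefit most.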