Comment by btrettel

6 hours ago

What I've observed in computational fluid dynamics is that LLMs seem to grab validation cases that are common in the literature, regardless of their relevance to the problem at hand. "Lid-driven cavity" cases were used by the two vibe-coded simulators I commented on at r/cfd, for instance. I never liked the lid-driven cavity problem because it rarely resembles an actual use case. A way better validation case would be an experiment on the same type of problem the user intends to solve. I think the lid-driven cavity problem is often picked in the literature because the geometry is easy to set up, not because it's relevant or particularly challenging. I don't know if this problem is due to vibe coders not actually having a particular use case in mind or LLMs overemphasizing what's common.

LLMs also seem to avoid checking the math of the simulator. In CFD, this is called verification. The comparisons are almost exclusively against experiments (validation), but it's possible for a model to be implemented incorrectly and for calibration of the model to hide that fact. It's common to check the order of accuracy of the numerical scheme to test whether it was implemented correctly, but I haven't seen any vibe coders do that. (LLMs definitely know about that procedure, as I've asked multiple LLMs about it before. It's not an obscure procedure.)
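To make the order-of-accuracy check concrete, here's a minimal sketch in Python. Everything in it is my own toy setup for illustration, not taken from any of the simulators mentioned: a 1D problem -u'' = f on (0, 1) with u(0) = u(1) = 0, a manufactured solution u(x) = sin(pi x), and second-order central differences. The function names `solve_poisson_1d` and `observed_orders` are made up. The idea is to refine the grid, measure the error against the known solution, and check that the observed order matches the scheme's formal order (2 here).

```python
import math


def solve_poisson_1d(n):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 on n interior points
    using second-order central differences. Returns the max-norm error
    against the manufactured solution u(x) = sin(pi x)."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    # Source term chosen so that u(x) = sin(pi x) is the exact solution.
    rhs = [h * h * math.pi ** 2 * math.sin(math.pi * xi) for xi in x]

    # Thomas algorithm for the tridiagonal system with stencil (-1, 2, -1).
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c_prime[i - 1]
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom

    # Back substitution.
    u = [0.0] * n
    u[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]

    return max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))


def observed_orders(errors, refinement_ratio=2.0):
    """Observed order between consecutive grids: p = log(e_coarse / e_fine) / log(r)."""
    return [
        math.log(e_coarse / e_fine) / math.log(refinement_ratio)
        for e_coarse, e_fine in zip(errors, errors[1:])
    ]


if __name__ == "__main__":
    # n = 15, 31, 63, 127 gives h = 1/16, 1/32, 1/64, 1/128 (h halves each time).
    grid_sizes = [15, 31, 63, 127]
    errors = [solve_poisson_1d(n) for n in grid_sizes]
    for n, e in zip(grid_sizes, errors):
        print(f"n = {n:4d}  max error = {e:.3e}")
    print("observed orders:", [round(p, 3) for p in observed_orders(errors)])
```

If the observed order comes out well below the formal order, that usually points to a bug in the discretization or the boundary conditions. A comparison against experiment wouldn't necessarily catch that, since calibration can hide it.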

Both of these points seem like they would be easy to give an LLM as instructions to shape its testing strategy.

  • I think so too. In case it's unclear, I don't use LLMs for coding at the moment and was just commenting on what I've seen from others who do in computational fluid dynamics.

    Edit: Let me add that while I think it would be easy to instruct an LLM to do what I'd like, LLMs don't do these things by default despite them being recognized as best practices, and I'm not confident in LLMs getting the data or references right for validation tests. My own experience is that LLMs are pretty bad when it comes to reproducing citations, and they tend to miss a lot of the literature.