Comment by onlyrealcuzzo

6 hours ago

If you can run your tests quickly and cheaply, and you have metrics for bad/sloppy code that are also cheap & fast to generate, a worse but fast model can outperform a far better but far slower model, if you value time...

I've had pretty good success with LLMs after putting in place metrics to measure true complexity (not cyclomatic), and automatically pushing back on every change until the added complexity is within reason for the feature.
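
A minimal sketch of that push-back gate, assuming a complexity score is already computed elsewhere. `ComplexityReport`, `review`, and the budget value are hypothetical names for illustration, not part of any released tool:

```ruby
# Hypothetical gate: accept a change only if the complexity it adds
# stays within a budget chosen for the feature.
ComplexityReport = Struct.new(:score_before, :score_after) do
  # Complexity added by the change (negative if the change simplified things).
  def added
    score_after - score_before
  end
end

# Returns :accept when the added complexity fits the budget,
# :push_back when the change should be sent back to the model.
def review(report, budget)
  report.added <= budget ? :accept : :push_back
end
```

The point of the design is that the check is cheap: the model can be sent back as many times as needed because each verdict costs one subtraction once the scores exist.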

5 comments

bee_rider 4 hours ago

How do you measure “true” complexity? Cyclomatic seems a bit… I dunno, artificial? Blunt? But it has the benefit of being defined.

onlyrealcuzzo 3 hours ago
There was a ton of research on this in the 80s... and, interestingly, I haven't seen much recent research.
Surprisingly, it seems most languages don't have a standard package for most of these detections.
Ruby has Flay to detect similarity (duplication is something LLMs are prone to). Basically, they rewrite a huge function with only a couple of minor differences that should probably be params...
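
A rough sketch of what a similarity score could look like. This is not Flay's algorithm, just a crude token-shingle comparison with numeric literals normalized; `shingles` and `similarity` are hypothetical helper names:

```ruby
require "set"

# Split source into crude tokens, replace numeric literals with "NUM",
# and collect every run of `size` consecutive tokens (a "shingle").
def shingles(source, size: 3)
  tokens = source.scan(/[A-Za-z_]\w*|\d+|\S/).map do |tok|
    tok.match?(/\A\d+\z/) ? "NUM" : tok
  end
  tokens.each_cons(size).to_set
end

# Jaccard similarity of the two shingle sets: 1.0 means the token
# sequences are identical after normalization, 0.0 means no overlap.
def similarity(a, b)
  sa = shingles(a)
  sb = shingles(b)
  return 0.0 if sa.empty? && sb.empty?

  (sa & sb).size.to_f / (sa | sb).size
end
```

Two functions that differ only in a literal score 1.0 here, which is exactly the "should probably be a param" case.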
One of the things I rely on most is "pressure" -> which conditions are causing the most checks throughout the codebase. Those are things you should type away.
Dynamically typed languages like Ruby create a huge surface area for type slop for LLMs, and why I would not recommend using a dynamically typed language for vibe coding.
You can have type "pressure" and nil "pressure" -> where you set a value to nil somewhere (that you probably shouldn't have) -> and that has ripple effects all throughout your codebase. Similarly, you can do this for values -> one place it's a string (where it shouldn't be), everywhere else a symbol (what it should be) -> but now you've got hundreds of casts to_sym or to_s in your codebase.
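
As a rough illustration of counting that kind of pressure, here is a regex-based sketch. The pattern list is an assumption made for this example, and a real tool would more likely walk the AST than scan text:

```ruby
# Illustrative patterns only, not an exhaustive or official list.
PRESSURE_PATTERNS = {
  cast: /\.to_(?:sym|s)\b/,                          # string/symbol casts
  nil_check: /\.nil\?|&\./,                          # explicit nil checks and safe navigation
  type_check: /\.(?:is_a|kind_of|respond_to)\?/      # runtime type checks
}.freeze

# Count how often each kind of defensive check appears in a source string.
def pressure_counts(source)
  PRESSURE_PATTERNS.transform_values { |pattern| source.scan(pattern).length }
end
```

Running this per file and sorting by count gives a crude map of where a stray nil or a string-that-should-be-a-symbol is leaking into the rest of the code.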
There's also state drift & reification misses. State drift -> you constantly update two pieces of state (that should probably just be one new value or a function) and sometimes you forget to update one (more of a bug possibility than complexity). Reification misses -> you constantly check the same set of conditions that should probably be one value or a function, and similarly it's bug-prone: you may sometimes miss one.
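
One way to spot candidates for reification, sketched under the assumption that a line-based scan is good enough: collect compound conditions (`&&` / `||`) and report the ones that recur. `repeated_conditions` is a hypothetical helper, and it needs Ruby 2.7+ for `filter_map` and `tally`:

```ruby
# Captures the condition after if/unless/elsif when it contains && or ||.
COMPOUND_CONDITION = /\b(?:if|unless|elsif)\s+(.+?(?:&&|\|\|).+)$/

# Returns a hash of normalized compound conditions that occur at least
# `min_count` times, mapped to how often they occur.
def repeated_conditions(source, min_count: 2)
  conditions = source.each_line.filter_map do |line|
    match = line.match(COMPOUND_CONDITION)
    match && match[1].strip.gsub(/\s+/, " ")
  end
  conditions.tally.select { |_condition, count| count >= min_count }
end
```

A condition that shows up many times is a hint that it wants a name, for example a single predicate method, so that no call site can forget one half of it.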
Complexity comes down to state and control flow -> so you want to check what's causing you to make the most decisions (especially state/time based), and where it's coming from. Where do you have the most state and why...
I'm hoping to release everything in the next few weeks, but it takes a while to polish things, especially when it's a side-quest of a side project...
- aleksiy123 1 hour ago
  
  Interesting, I do think blending the fuzziness of models with the determinism of hard checks/conformance is the way to go.
  But using some kind of metrics as guardrails/steering seems interesting.
- epiccoleman 3 hours ago
  
  > Dynamically typed languages like Ruby create a huge surface area for type slop for LLMs, and why I would not recommend using a dynamically typed language for vibe coding.
  I totally understand this, and have seen the problems firsthand. But Elixir / Phoenix / LiveView, along with Tidewave, have become my favorite "vibe slop stack." Just so quick and easy, and the LLM seems to get things right quite often.

Daishiman 4 hours ago

What metrics have you found useful?