Comment by londons_explore

2 days ago

Very true. Last year I at least glanced at every line of AI generated code. Now if some AI makes a 10k line program for some one-off tasks, I run the program, glance only over the output, and move on.

noduerme 2 days ago

Especially if you're having an LLM write non-interactive scripts to calculate complex things from large datasets, glancing at the output is not enough to know if the output is remotely accurate (unless the output is so trivial you could literally do it in your head).

Case in point: I recently asked an LLM to write a pile of code to compile historical baseball stats to test betting success against the results of my hand-written code that evolves genetic algorithms. I marveled for a little while at the unbelievable improvement in EV/ROI that this script was showing could have been achieved from certain small tweaks. I only noticed after pushing a total bet that the push registered on the output as a win - and only because I was carefully staying on top of it. A single stupid recursively operating >= instead of > had caused completely nonsensical results that looked plausible.

Imagine, like, trusting a 10k loc script to give you data for something you were going to build in the physical world, and hoping an LLM hadn't made a mistake like that.
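
For anyone who wants to see how small that kind of bug is, here is a minimal sketch of the push-counted-as-win mistake. It is not the actual script; the function names, the even-money payout, and the sample games are all made up for illustration, and it only models a simple "over" bet on a total.

```python
# Hypothetical illustration: settling an "over" bet on a game total.
# All names and numbers here are assumptions, not the original code.

def settle_over_buggy(total_runs: float, line: float) -> str:
    """Buggy: >= treats a total that lands exactly on the line as a win."""
    return "win" if total_runs >= line else "loss"


def settle_over(total_runs: float, line: float) -> str:
    """Correct: a total exactly on the line is a push (stake returned)."""
    if total_runs > line:
        return "win"
    if total_runs == line:
        return "push"
    return "loss"


def profit(outcomes: list[str]) -> int:
    """Net units won, assuming even-money bets of one unit each."""
    units = {"win": 1, "push": 0, "loss": -1}
    return sum(units[o] for o in outcomes)


# (total_runs, line) pairs; two of the four games land exactly on the line.
games = [(10, 9), (9, 9), (8, 9), (9, 9)]

buggy_profit = profit([settle_over_buggy(t, l) for t, l in games])
correct_profit = profit([settle_over(t, l) for t, l in games])

print(buggy_profit, correct_profit)
```

Both versions produce plausible-looking numbers, which is the problem: only a check on a known edge case (a total exactly on the line) tells them apart.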

bobjordan 1 day ago
Code needs to be tested. I'm glad the barrier to entry has been lowered, but now we have a huge number of people who haven't yet learned anything about how to test and verify that the code meets the expected requirements.
- bdangubic 1 day ago
  
  AI codes, AI tests, AI verifies, in a Ralph Loop ( https://github.com/snarktank/ralph ) :)

MaKey 2 days ago

Which one-off tasks need 10k lines of code?

embedding-shape 2 days ago

Would depend on what AI and prompt you use ultimately. Ask it to add tests (functional, E2E and unit, maybe invent a new type too), packaging, modular code and/or whatever, and you get to 10K relatively quickly with some of the more verbose LLMs out there.
Personally it's probably the biggest struggle, trying to rein in the "spray and pray" approach LLMs typically like to take, and reducing the "patch on top of patch" syndrome too.
londons_explore 2 days ago
Calculate the engine power of a 2015 VW Polo when travelling 70 mph on a flat road behind a box truck. Draw a chart of drag vs. follow distance. How significant is humidity on the result?
- Grosvenor 2 days ago
  
  European or African Polo?
  
hgoel 2 days ago

One off web app for scrubbing through some data, that, once done, will never be run again?
bossyTeacher 2 days ago
Java programs
- rq1 2 days ago
  
  Enterprise programs*

rco8786 1 day ago

This is fine for one-off tools and I do the same. But for building long-lived "professional grade" production software, this fails real quickly.

My team is using AI for most of the code, but the human review layer is crucial and unavoidable if you're interested in things like reliability, uptime, controlled feature rollouts, the integrity of your users' data, etc.

d4rkp4ttern 1 day ago

A huge factor I don’t see mentioned often enough is the rapid increase of AI-coding in a language unknown to the dev.

JJOKOCHAA 17 hours ago

[dead]