Comment by LarsDu88

6 days ago

My goalpost for "AI will definitely replace most SWEs" was reproducing a particular 90s programming game in one shot and then adding multiplayer support with minimal prompting.

Opus 4.5 hit that point in November.

10 comments

grey-area 6 days ago

I tried this a while ago and haven’t tried again recently. The models were producing code that was clearly lifted from their training data. After a bit of tidying up I ended up with a fairly decent game in HTML and JS, though it felt like several coding paradigms smooshed together rather than a coherent whole. It mostly worked. Not something I’d want to maintain, but it was impressive at the time.

They were able to one-shot famous games (like Asteroids or Pong), I suspect because they had been trained on multiple versions of those games. So, like reproducing Harry Potter, with the right prompt a model was able to produce a license-stripped version of code it had seen. I tried another arcade game like Frogger and it failed really badly and took a lot longer; I never got it working.

The whole exercise left me feeling they have a long way to go. Even now, I don’t see how anyone could think they would replace SWEs unless they didn’t look at the code produced.

vessenes 6 days ago
Out of curiosity - what harness did you use, and what model? And how are you prompting? In my mind, a prompt like:
“You’re going to make frogger in javascript. I want a complete clone of functionality for level 1, with amazing 80s era pixel art sprites. I’m super lazy, so you’re going to have to test everything, right from the start. Pick a test harness, write the tests, including tests for having amazing graphics, gameplay, input, UI, sounds, etc, and write a full workplan, then work through that workplan, in parallel where you can. The workplan should emphasize getting a stripped down version up immediately and have workstreams for all the major requirements after that. Add a final test that assesses how fun the game is by reviewing a real video of a test run. Loop on that final test until you can’t improve things any more.”
should produce something playable with no further input. As you say, I’m not sure it would produce a codebase we’d want to look at or work on, but I’d be surprised if this weren’t successful.
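
For a rough idea of what the "gameplay" and "input" tests in a prompt like that might boil down to, here is a minimal sketch. Everything in it is an assumption made for illustration: the 13x13 grid, the `stepFrog` and `isHit` functions, and the tiny `check` helper are hypothetical names, not output from any model or part of a real Frogger codebase. Subjective goals like "amazing graphics" or "fun" are much harder to pin down this way.

```javascript
// Hypothetical game-logic sketch: grid size and all names are assumptions.
const GRID = { cols: 13, rows: 13 };

// Map each input name to an [dx, dy] move; unknown inputs do nothing.
const MOVES = new Map([
  ["up", [0, -1]],
  ["down", [0, 1]],
  ["left", [-1, 0]],
  ["right", [1, 0]],
]);

// Return the frog's new position after one input, clamped to the grid.
function stepFrog(frog, input) {
  const [dx, dy] = MOVES.get(input) ?? [0, 0];
  return {
    x: Math.min(GRID.cols - 1, Math.max(0, frog.x + dx)),
    y: Math.min(GRID.rows - 1, Math.max(0, frog.y + dy)),
  };
}

// A car occupies the cells [x, x + length) on its row.
function isHit(frog, cars) {
  return cars.some(
    (car) => car.row === frog.y && frog.x >= car.x && frog.x < car.x + car.length
  );
}

// Tiny assertion helper so the sketch runs without a test framework.
function check(condition, message) {
  if (!condition) throw new Error(`FAILED: ${message}`);
}

// Input test: "up" moves the frog exactly one row toward the top.
check(stepFrog({ x: 6, y: 12 }, "up").y === 11, "up moves one row");
// Boundary test: the frog cannot walk off the left edge.
check(stepFrog({ x: 0, y: 12 }, "left").x === 0, "left edge clamps");
// Gameplay test: a car covering the frog's cell is a hit, an adjacent cell is not.
const cars = [{ row: 10, x: 3, length: 2 }];
check(isHit({ x: 4, y: 10 }, cars), "car covering the frog hits it");
check(!isHit({ x: 5, y: 10 }, cars), "frog beside the car is safe");
```

Keeping the rules in pure functions like this is one way to make them testable without a browser; rendering and sound would need a different approach, such as screenshot comparison or manual review.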
- grey-area 6 days ago
  
  Sure, give it a go. Perhaps it will work better now with frontier models; I haven't tried it in a while (this was a year ago, and things have improved since then). I'm not sure what tests for amazing graphics, gameplay, input, UI, sounds, etc. would look like, but it would be interesting to see the results!
  