Comment by simonw
4 days ago
Opus 4.5 really is something else. I've been having a ton of fun throwing absurdly difficult problems at it recently and it keeps on surprising me.
A JavaScript interpreter written in Python? How about a WebAssembly runtime in Python? How about porting BurntSushi's absurdly great Rust optimized string search routines to C and making them faster?
And these are mostly just casual experiments, often run from my phone!
> A JavaScript interpreter written in Python?
I'm assuming this refers to the Python port of Bellard's MQJS [1]? It's impressive and very useful, but leaving out the "based on mqjs" part is misleading.
[1] https://github.com/simonw/micro-javascript?
That's why I built the WebAssembly one - the JavaScript one started with MQJS, but for the WebAssembly one I started with just a copy of the https://github.com/webassembly/spec repo.
I haven't quite got the WASM one into a shareable shape yet, though - the performance is pretty bad, which makes the demos not very interesting.
Isn’t that telling though?
> How about porting BurntSushi's absurdly great Rust optimized string search routines to C and making them faster?
How did it do? :-)
Alarmingly well! https://gisthost.github.io/?1bf98596a83ff29b15a2f4790d71c41d...
It couldn't quite beat the Rust implementation on everything, but it managed to edge it out on at least some of the benchmarks it wrote for itself.
(Honestly it feels like a bit of an affront to the natural order of things.)
That said... I'm most definitely not a Rust or C programmer. For all I know it cheated at the benchmarks and I didn't spot it!
Nice. Yeah I'd have to actually look at what it did. For the task of substring search, it's extremely easy to fall into a local optimum. The `memchr` crate has oodles of benchmarks, and some of them are very much in tension with others. It's easy to do well on one at the expense of others.
But still, very neat.
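For anyone curious what that tension looks like, here's a toy Python sketch - completely invented, nothing to do with the memchr crate's actual code - showing how the same skip heuristic wins one benchmark and loses another:

    # Hypothetical illustration: a naive searcher that hops between occurrences
    # of one chosen needle byte, then verifies the full needle at each candidate.
    import timeit

    def find_skip_on(haystack, needle, which):
        """Skip on needle[which]: fast when that byte is rare in the haystack."""
        probe, i = needle[which], 0
        while True:
            j = haystack.find(probe, i + which)
            if j == -1:
                return -1
            start = j - which
            if haystack[start:start + len(needle)] == needle:
                return start
            i = start + 1

    needle = "ab"
    hay_rare_a = "b" * 200_000 + "ab"  # 'a' is rare: skipping on byte 0 is nearly free
    hay_rare_b = "a" * 200_000 + "ab"  # 'b' is rare: skipping on byte 1 wins instead

    for name, hay in [("rare 'a'", hay_rare_a), ("rare 'b'", hay_rare_b)]:
        t0 = timeit.timeit(lambda: find_skip_on(hay, needle, 0), number=3)
        t1 = timeit.timeit(lambda: find_skip_on(hay, needle, 1), number=3)
        print(f"{name}: skip-on-byte-0 {t0:.3f}s, skip-on-byte-1 {t1:.3f}s")

Tune the skip byte for one of those haystacks and you regress on the other - presumably a much cruder version of the trade-offs the real benchmarks exercise.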
What are you using to easily share the conversation as its own webpage? Very nice and tidy.
I have tried giving it extreme problems, like creating a slime mold pathing algorithm or inventing completely new shoe-lacing patterns, and it starts struggling with problems that require visual reasoning and have very little consensus on how to solve them.
I'm not super surprised that these examples worked well. They are complex and a ton of work, but the problems are relatively well defined, with tons of documentation online. Sounds ideal for an LLM, no?
Yes, that's a point I've been trying to emphasize: if a problem is well specified, a coding agent can crunch on it for hours to get to a solution.
Even better if there's an existing conformance suite to point at - like html5lib-tests or the WebAssembly spec tests.
One of my first tests with it was "Write a Python 3 interpreter in JavaScript."
It produced tests, then wrote the interpreter, then ran the tests and worked until all of them passed. I was genuinely surprised that it worked.
There are multiple Python 3 interpreters written in JavaScript that were very likely included in the training data. For example: [1], [2], [3].
I once gave Claude (Opus 3.5) a problem that I thought was for sure too difficult for an LLM, and much to my surprise it spat out a very convincing solution. The surprising part was I was already familiar with the solution - because it was almost a direct copy/paste (uncredited) from a blog post that I read only a few hours earlier. If I hadn't read that blog post, I would have been none the wiser that copy/pasting Claude's output would be potential IP theft. I would have to imagine that LLMs solve a lot of in-training-set problems this way and people never realize they are dealing with a copyright/licensing minefield.
A more interesting and convincing task would be to write a Python 3 interpreter in JavaScript that uses register-based bytecode instead of stack-based, supports optimizing the bytecode by inlining procedures and constant folding, and never allocates memory (all work is done in a single user-provided preallocated buffer). That would require integrating multiple disparate coding concepts rather than regurgitating prior art from the training data. (A rough sketch of the stack-vs-register distinction follows the links below.)
[1] https://github.com/skulpt/skulpt
[2] https://github.com/brython-dev/brython
[3] https://github.com/yzyzsun/PyJS
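To make that concrete, here's a toy sketch (in Python for brevity; everything here is invented rather than taken from any real interpreter) of the same expression compiled to stack bytecode vs. register bytecode, plus a trivial constant-folding pass over the register form:

    # Toy illustration: a + b * c compiled two ways, plus trivial constant folding.

    def run_stack(code, env):
        """Stack bytecode: operands live on an implicit value stack."""
        stack = []
        for op, *args in code:
            if op == "LOAD":
                stack.append(env[args[0]])
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
        return stack.pop()

    def run_regs(code, regs):
        """Register bytecode: each instruction names its destination and sources."""
        for op, dst, a, b in code:
            regs[dst] = regs[a] * regs[b] if op == "MUL" else regs[a] + regs[b]
        return regs[0]

    def fold_constants(code, known):
        """If both sources of an instruction are known constants, evaluate it now."""
        out = []
        for op, dst, a, b in code:
            if a in known and b in known:
                known[dst] = known[a] * known[b] if op == "MUL" else known[a] + known[b]
            else:
                out.append((op, dst, a, b))
        return out, known

    env = {"a": 2, "b": 3, "c": 4}
    stack_code = [("LOAD", "a"), ("LOAD", "b"), ("LOAD", "c"), ("MUL",), ("ADD",)]
    reg_code = [("MUL", 3, 1, 2), ("ADD", 0, 4, 3)]  # r4=a, r1=b, r2=c

    print(run_stack(stack_code, env))                          # 14
    print(run_regs(reg_code, {0: 0, 1: 3, 2: 4, 3: 0, 4: 2}))  # 14
    # If b and c are compile-time constants, the MUL folds away entirely:
    print(fold_constants(reg_code, {1: 3, 2: 4}))  # ([('ADD', 0, 4, 3)], {1: 3, 2: 4, 3: 12})

One reason the register form is the more interesting target: every instruction names its operands explicitly, so passes like constant folding and inlining can be expressed as simple rewrites over instructions.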
Its ability to test/iterate and debug issues is pretty impressive.
Though it seems to work best when context is minimized. Once the code passes a certain complexity/size, it starts making very silly errors quite often - the exact same code it wrote in a smaller context will come out with random obvious typos, like missing spaces between tokens. At one point it started writing the code backwards (first line at the bottom of the file, last line at the top) :O.
On the other hand when I tried it just yesterday, I couldn't really see a difference. As I wrote elsewhere: same crippled context window, same "I'll read 10 irrelevant lines from a file", same random changes etc.
Meanwhile, half a year to a year ago I could already point whatever model was the flavor du jour at the time at pychromecast, tell it repeatedly "just convert the rest of the functionality to Swift", and it did it. No idea about the quality of the code, but it worked, along with implementations for mDNS and SwiftUI; see the gif/video here: https://mastodon.nu/@dmitriid/114753811880082271 (doesn't include chromecast info in the video).
I think agents have become better, but the models themselves have likely all but plateaued.
Insanely difficult to you, maybe, because you stopped learning. What you cannot create, you don't understand.
Are you honestly saying that building a new spec-compliant WebAssembly runtime from scratch isn't an absurdly difficult project?