Comment by Calavar

2 months ago

There are multiple Python 3 interpreters written in JavaScript that were very likely included in the training data. For example, see [1], [2], and [3].

I once gave Claude (Opus 3.5) a problem that I was sure was too difficult for an LLM, and much to my surprise it spat out a very convincing solution. The surprising part was that I was already familiar with the solution, because it was almost a direct, uncredited copy/paste from a blog post I had read only a few hours earlier. If I hadn't read that blog post, I would have been none the wiser that copy/pasting Claude's output was potential IP theft. I have to imagine that LLMs solve a lot of in-training-set problems this way, and that people never realize they are dealing with a copyright/licensing minefield.

A more interesting and convincing task would be to write a Python 3 interpreter in JavaScript that uses register-based bytecode instead of stack-based, supports optimizing the bytecode by inlining procedures and constant folding, and never allocates memory (all work is done in a single user-provided preallocated buffer). This would require integrating multiple disparate coding concepts rather than regurgitating prior art from the training data.
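To give a rough idea of what combining those constraints looks like, here is a toy sketch in TypeScript. It is nowhere near a Python interpreter: it only handles straight-line integer arithmetic, and it leaves out procedure inlining entirely. The opcodes, the buffer layout, and the names `emit`, `foldConstants`, and `run` are all made up for illustration. Registers, the scratch flags used by the folding pass, and the bytecode itself all live in one caller-provided `Int32Array`, and the functions create no objects or arrays of their own.

```typescript
// Hypothetical opcodes, invented for this sketch.
const LOADK = 0; // regs[a] = b (b is an immediate constant)
const ADD = 1;   // regs[a] = regs[b] + regs[c]
const MUL = 2;   // regs[a] = regs[b] * regs[c]
const HALT = 3;  // stop and return regs[a]

// Assumed buffer layout: [registers | known-constant flags | instructions].
const NUM_REGS = 8;
const FLAGS = NUM_REGS;    // offset of the per-register "known constant" flags
const CODE = 2 * NUM_REGS; // offset of the first instruction
const WIDTH = 4;           // words per instruction: op, a, b, c

function emit(buf: Int32Array, index: number, op: number, a: number, b: number, c: number): void {
  const p = CODE + index * WIDTH;
  buf[p] = op; buf[p + 1] = a; buf[p + 2] = b; buf[p + 3] = c;
}

// Constant folding for straight-line code, rewriting instructions in place.
// The register slots double as storage for the known constant values.
function foldConstants(buf: Int32Array, count: number): void {
  for (let r = 0; r < NUM_REGS; r++) buf[FLAGS + r] = 0;
  for (let i = 0; i < count; i++) {
    const p = CODE + i * WIDTH;
    const op = buf[p], a = buf[p + 1], b = buf[p + 2], c = buf[p + 3];
    if (op === LOADK) {
      buf[a] = b;
      buf[FLAGS + a] = 1;
    } else if (op === ADD || op === MUL) {
      if (buf[FLAGS + b] === 1 && buf[FLAGS + c] === 1) {
        // Both operands are known: replace the instruction with a LOADK.
        const v = op === ADD ? buf[b] + buf[c] : Math.imul(buf[b], buf[c]);
        buf[p] = LOADK; buf[p + 2] = v; buf[p + 3] = 0;
        buf[a] = v;
        buf[FLAGS + a] = 1;
      } else {
        buf[FLAGS + a] = 0; // result depends on a value unknown at fold time
      }
    }
  }
}

// Register-based interpreter loop: operands are register indices, not stack slots.
function run(buf: Int32Array, count: number): number {
  for (let i = 0; i < count; i++) {
    const p = CODE + i * WIDTH;
    const op = buf[p], a = buf[p + 1], b = buf[p + 2], c = buf[p + 3];
    switch (op) {
      case LOADK: buf[a] = b; break;
      case ADD: buf[a] = buf[b] + buf[c]; break;
      case MUL: buf[a] = Math.imul(buf[b], buf[c]); break;
      case HALT: return buf[a];
    }
  }
  return 0;
}

// The single preallocated buffer, sized for five instructions.
const buf = new Int32Array(CODE + 5 * WIDTH);
emit(buf, 0, LOADK, 0, 2, 0); // r0 = 2
emit(buf, 1, LOADK, 1, 3, 0); // r1 = 3
emit(buf, 2, MUL, 2, 0, 1);   // r2 = r0 * r1, foldable
emit(buf, 3, ADD, 3, 2, 4);   // r3 = r2 + r4, r4 is a runtime input
emit(buf, 4, HALT, 3, 0, 0);
foldConstants(buf, 5);
buf[4] = 10;                  // supply the runtime input after folding
const result = run(buf, 5);
console.log(result);
```

After folding, the `MUL` has been rewritten into a `LOADK` of 6, while the `ADD` survives because `r4` is only known at run time. Even this toy glosses over the hard parts (control flow, Python's object model, inlining, and whether the JavaScript engine itself allocates behind the scenes), which is the point: the full task cannot be solved by reproducing an existing interpreter.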

[1] https://github.com/skulpt/skulpt

[2] https://github.com/brython-dev/brython

[3] https://github.com/yzyzsun/PyJS
