Comment by badsectoracula

2 hours ago

A related test i did around the beginning of the year: i came up with a simple stack-oriented language and asked an LLM to solve a simple problem (calculate the squared distance between two points, the coordinates of which are already on the stack) and had it figure out the details.
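
To give a sense of what such a task looks like, here is a minimal sketch in C. It assumes a made-up Forth-like word set (`dup`, `rot`, `+`, `-`, `*`); the names `run_program`, `self_check` and `STACK_MAX` are hypothetical too. The actual language and interpreter are the ones in the linked transcript and may well differ from this.

```c
#include <assert.h>
#include <string.h>

#define STACK_MAX 16

/* Hypothetical word set, NOT the language from the transcript.
 * Runs a space-separated program on `stack`, which must have room for
 * STACK_MAX values and currently holds `depth` of them (stack[depth - 1]
 * is the top). Returns the new depth, or -1 on any error. */
int run_program(const char *program, long *stack, int depth)
{
    char buf[256];
    if (strlen(program) >= sizeof buf) return -1;
    strcpy(buf, program);               /* strtok modifies its input */

    for (char *w = strtok(buf, " "); w != NULL; w = strtok(NULL, " ")) {
        if (strcmp(w, "dup") == 0) {            /* ( a -- a a ) */
            if (depth < 1 || depth >= STACK_MAX) return -1;
            stack[depth] = stack[depth - 1];
            depth++;
        } else if (strcmp(w, "rot") == 0) {     /* ( a b c -- b c a ) */
            if (depth < 3) return -1;
            long a = stack[depth - 3];
            stack[depth - 3] = stack[depth - 2];
            stack[depth - 2] = stack[depth - 1];
            stack[depth - 1] = a;
        } else if (strcmp(w, "+") == 0 || strcmp(w, "-") == 0 ||
                   strcmp(w, "*") == 0) {       /* ( a b -- a<op>b ) */
            if (depth < 2) return -1;
            long b = stack[--depth];
            long a = stack[depth - 1];
            stack[depth - 1] = (w[0] == '+') ? a + b
                             : (w[0] == '-') ? a - b
                             : a * b;
        } else {
            return -1;                          /* unknown word */
        }
    }
    return depth;
}

/* Usage: points (1,2) and (4,6) are pushed as x1 y1 x2 y2. */
void self_check(void)
{
    long stack[STACK_MAX] = {1, 2, 4, 6};
    int depth = run_program("rot - dup * rot rot - dup * +", stack, 4);
    assert(depth == 1);
    assert(stack[0] == 25);             /* 3*3 + 4*4 */
}
```

The program first rotates `y1` to the top to get `dy`, squares it, then rotates twice to bring `x1` and `x2` together. The second difference comes out as `x1 - x2`, which doesn't matter since it gets squared.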

The part i found neat was that i used a local LLM (some quantized version of QwQ from around December or so, i think) that had a thinking mode, so i was able to follow the thought process. Since it was running locally (and it wasn't a MoE model) it was slow enough for me to follow it in realtime, and i found it fun watching the LLM try to understand the language.

One other interesting part is that the language description had a mistake, but the LLM managed to figure things out anyway.

Here is the transcript, including a simple C interpreter for the language and a test for it at the end with the code the LLM produced:

https://app.filen.io/#/d/28cb8e0d-627a-405f-b836-489e4682822...

2 comments


chrisweekly 1 hour ago

THANK YOU for SHARING YOUR WORK!!

So many commenters claim to have done things w/ AI, but don't share the prompts. Cool experiment, cooler that you shared it properly.

fsloth 37 minutes ago

"but don't share the prompts."
To be honest, I generally don't want to see anyone else's prompts, because what works is so damn context sensitive, and what works and what doesn't seems so random. Even if someone else had a brilliant prompt, there is no guarantee it will work for me.
If you're working with something like Claude Code, you tell it what you want. If the result isn't what you wanted, you delete everything and add more specifications.
"Hey I would like to create a drawing app SPA in html that works like the old MS Paint".
If you have _no clue_ what to prompt, you can start by asking the LLM, or another LLM, for a prompt.
There are no manuals for these tools, and frankly they are irritatingly random in their capabilities. They are _good enough_ that I tend to waste time trying to use them for every novel problem I come across, and they work maybe 30% to 50% of the time. And sometimes they reach 100%.