Comment by throwdbaaway

18 days ago

As mentioned in the sibling comment from godelski, it is about the lack of precision, not the lack of determinism. After all, we already have https://thinkingmachines.ai/blog/defeating-nondeterminism-in..., and nondeterminism is not even an issue for single-user local inference.

Question: Have you tried using an LLM as a compiler?

Well, I sort of did, as a fun exercise. I came up with a very elaborate ~5000-token prompt, such that when it is fed a ~500-token function, I get back a ~600-token rewritten function.

The prompt contains 10+ examples, so that the model learns the steps from context. The model starts by going through a series of yes/no questions to decide which rewrite pattern to apply. The tricky part here is the lack of precision: the "else" clause has to be reserved for the condition that is hardest to communicate clearly in English. The model then extracts the part that needs to be rewritten and introspects its formatting, again with a series of simple questions. Lastly, it proceeds, confidently, with the rewrite.
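The yes/no question sequence in the prompt is essentially a decision tree written in English. A minimal sketch in code, where the questions and pattern names are invented for illustration (the real prompt encodes all of this as English plus worked examples):

```python
# Hypothetical sketch of the prompt's decision tree: each yes/no answer
# narrows down which rewrite pattern to apply. The questions and pattern
# names are made up for illustration.
def choose_pattern(uses_loop: bool, mutates_args: bool) -> str:
    if uses_loop and not mutates_args:
        return "hoist-invariant"    # a condition easy to state precisely in English
    if mutates_args:
        return "copy-then-mutate"   # also easy to state precisely
    # The "else" branch is reserved for the condition that is hardest to
    # communicate clearly in English, per the comment above.
    return "fallback-rewrite"

print(choose_pattern(True, False))   # hoist-invariant
print(choose_pattern(False, False))  # fallback-rewrite
```

Writing the branches this way makes the cost of imprecision visible: any function the English conditions fail to describe unambiguously falls through to the catch-all branch.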

With this, I tested 50+ randomly chosen functions and got back the exact same rewritten functions from about 20 models that are good at coding, down to the newlines and indentation. With a strong model, there might be only 1~2 output tokens in the whole test where the probability was less than 80%, so the lack of batch invariance wasn't even a problem. (temperature=0 usually messes up logprobs; go with top_k=1 or top_p=0.01 instead.)
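Agreement "down to the newlines and indentation" amounts to a byte-exact string comparison across model outputs. A small sketch of that check, plus the recommended sampling settings (the function name and the parameter-name conventions, which follow common local-inference APIs, are my assumptions, not from the comment):

```python
# Byte-exact agreement check: every model must produce the identical
# string, including indentation and trailing newlines.
def all_outputs_identical(outputs: list[str]) -> bool:
    return len(set(outputs)) <= 1

# Sampling settings for reproducible greedy decoding. The comment
# recommends top_k=1 or top_p=0.01 over temperature=0, because
# temperature=0 often degrades the reported logprobs on some stacks.
# Parameter names follow common local-inference APIs; adjust for yours.
GREEDY = {"top_k": 1}          # always pick the single most likely token
NEAR_GREEDY = {"top_p": 0.01}  # nucleus cutoff so tight it is effectively greedy

reference = "def f(x):\n    return x + 1\n"
same = [reference, reference, reference]
differs = [reference, reference.replace("    ", "\t")]  # tabs vs. spaces

print(all_outputs_identical(same))     # True
print(all_outputs_identical(differs))  # False
```

Because the check is exact string equality, even a tab-for-spaces substitution counts as disagreement, which matches the bar the comment sets.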

So input + English = output holds across multiple models from multiple companies.

But what's the point of writing so much English in the hope that it leaves no room for ambiguity? For now, I will stick with mitchellh's style of (occasional) LLM-assisted programming, jumping in to write the code myself when precision is needed.