Comment by jaccola
1 year ago
When I ran these prompts, I saw in the chain of thought
Hmm, I need to run some code. I'm thinking I can use Python, right? There’s this Python tool I can simulate in my environment since I can’t actually execute the code. I’ll run a TfidfVectorizer code snippet to compute some outcomes.
It is ambiguous, but this leads me to believe the model does have access to a Python tool. Also, my 'toy examples' were identical to yours, making me think it has been seen in the training data.
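For context on what the model claimed to be simulating: scikit-learn's TfidfVectorizer scores each term by term frequency times inverse document frequency, then L2-normalizes each row. A rough pure-Python sketch of that computation (the toy corpus here is my own, and sklearn's tokenizer and defaults differ in small ways):

```python
import math

# Toy two-document corpus, standing in for the "toy examples" above.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

def tfidf(docs):
    tokenized = [d.split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = {t: sum(t in doc for doc in tokenized) for t in vocab}
    # Smoothed idf, mirroring sklearn's smooth_idf=True default.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        row = [doc.count(t) * idf[t] for t in vocab]
        # L2-normalize each row, mirroring sklearn's norm="l2" default.
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows

vocab, rows = tfidf(docs)
```

Terms that appear in every document (like "the") get a lower idf weight than terms unique to one document (like "cat"), which is the whole point of the weighting.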
This gave me a thought on the future of consumer-facing LLMs though. I was speaking to my nephew about his iPhone; he hadn't really considered that it was "just" a battery, a screen, some chips, a motor, etc., all in a nice casing. To him, it was a magic phone!
Technical users will understand LLMs are "just" next token predictors that can output structured content to interface with tools all wrapped in a nice UI. To most people they will become magic. (I already watched a video where someone tried to tell the LLM to "forget" some info...)
85 IQ: LLMs are magic
110 IQ: LLMs are "just" next token predictors that can output structured content to interface with tools all wrapped in a nice UI
140 IQ: LLMs are magic
The word "just" can trivialize so much. Rockets are "just" explosives pointed in one direction. Computers are "just" billions of transistors in a single package. Humans are "just" a protein shell for DNA.
There is a lot of "LLMs are just (x)" going around, but it seems to me that misses the point, in the extreme.
The “magic” is that yes, LLMs are “just” statistical next token predictors.
And as code alone, without training data, LLMs produce garbage.
When you feed them human cultural-linguistic data, they “magically” can communicate useful ideas, reason, maintain an internal world state, and use tools.
The LLM architecture is just a mechanism for imprinting and representing human cultural data. Human cultural data is the "magic", somehow embodying the ability to reason, maintain state, use tools, and communicate.
Learning how to represent language data in vector space allowed us to actually encode the meaning embedded in cultural data, since written language is just a shorthand.
Actually representing meaning allows us to run culture as code. Transformer boxes are a target for that code.
The magic is human culture.
Culture matters. We should be curating our culture.
140 IQ: LLMs are magic ... token predictors that can output structured content
Which is all we are. Next-token prediction isn't just all you need, it's all there is.
That's the most interesting part of what we're learning now, I think. So many people refused to accept that for any number of reasons -- religious, philosophical, metaphysical, personal -- and now they have no choice.
o3-mini doesn't have access to a Python tool. I've seen this kind of thing in the "reasoning" chains from other models too: they'll often say things like "I should search for X" despite not having a search tool. It's just a weird aspect of how the reasoning tokens get used.
Some of the paid models do have access to an interpreter, ever since the "Code Interpreter" feature was announced some time ago. Seems like it's already been a year or two.
It doesn't always use the tool, but it can: https://chatgpt.com/share/67bcb3cb-1024-800b-8b7e-31335c6347...
Asking a model a question like "Which ChatGPT models have access to a Python interpreter?" is rarely a good idea, because they're limited to the knowledge that was available when the model itself was trained.
In that case you're using GPT-4o which we know has access to Code Interpreter. The annoying thing here is that o3-mini still doesn't.
If you're using ChatGPT or the Assistants API with managed tools (I don't remember if this is even available for o3-mini), it has access to a Python execution tool.
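With the Assistants API, the managed Python tool is attached by listing "code_interpreter" in the tools array when creating the assistant. A minimal sketch, assuming the OpenAI Python SDK; the model name and its tool availability are assumptions, and the actual API call is commented out since it needs an API key:

```python
# Request parameters for an assistant with the managed Python tool.
# "gpt-4o" is an assumed example model; check current availability.
request_params = {
    "model": "gpt-4o",
    "instructions": "You can run Python to answer questions.",
    "tools": [{"type": "code_interpreter"}],
}

# With credentials configured, the assistant would be created like this:
# from openai import OpenAI
# client = OpenAI()
# assistant = client.beta.assistants.create(**request_params)
```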
In the chat interface, though, o3-mini does not have access to the code interpreter.