Comment by ben_w

1 year ago

Even as someone with plenty of experience, this can still be a problem: I use them for stuff outside my domain, but where I can still debug the results. In my case, this means I use it for python and web frontend, where my professional experience has been iOS since 2010.

ChatGPT has, for several generations, generally made stuff that works, but the libraries it gives me are often not the most appropriate, and are sometimes obsolete or no longer functional — and precisely because web and python are hobbies for me rather than my day job, it can take me a while to spot such mistakes.

Two other things I've noticed, related in an unfortunate way:

1) Because web and python not my day job, more often than not and with increasing frequency, I ultimately discover that when I disagree with ChatGPT, the AI was right and I was wrong.

2) These specific models often struggle when my response has been "don't use $thing or $approach"; unfortunately this seems to be equally applicable regardless of if the AI knew more than me or not, so it's not got predictive power for me.

(I also use custom instruction, you YMMV)

I wish people would understand what a large language model is. There is no thinking. No comprehension. No decisions.

Instead, think of your queries as super human friendly SQL.

The database? Massive amounts of data boiled down to unique entries with probabilities. This is a simplistic, but accurate way to think of LLMs.

So how much code is on the web for a particular problem solve? 10k blog entries, stackoverflow responses? What you get back is mishmash of these.

So it will have decade old libraries, as lots of those scraped responses are 10 years old, and often without people saying so.

And it will likely have more poor code examples than not.

I'm willing to bet that OpenAI's ingress of stackoverflow responses stipulated higher priority on accepted answers, but that still leaves a lot of margin.

And how you write your query, may sideline you into responses with low quality output.

I guess my point is, when you use LLMs for tasks, you're getting whatever other humans have said.

And I've seen some pretty poor code examples out there.

  • > Instead, think of your queries as super human friendly SQL.

    > The database? Massive amounts of data boiled down to unique entries with probabilities. This is a simplistic, but accurate way to think of LLMs.

    This is a useful model for LLMs in many cases, but it's also important to remember that it's not a database with perfect recall. Not only is it a database with a bunch of bad code stored in it, it samples randomly from that database on a token by token basis, which can lead to surprises both good and bad.

  • > There is no thinking. No comprehension. No decisions.

    Re-reading my own comment, I am unclear why you think it necessary to say those specific examples — my descriptions were "results, made, disagree, right/wrong, struggle": tools make things, have results; engines struggle; search engines can be right or wrong; words can be disagreed with regardless of authorship.

    While I am curious what it would mean for a system to "think" or "comprehend", every time I have looked at such discussions I have been disappointed that it's pre-paradigmatic. The closest we have is examples such as Turing 1950[0] saying essentially (to paraphrase) "if it quacks like a duck, it's a duck" vs. Searle 1980[1] which says, to quote the abstract itself, "no program by itself is sufficient for thinking".

    > I guess my point is, when you use LLMs for tasks, you're getting whatever other humans have said.

    All of maths can be derived from the axioms of maths. All chess moves derive from the rules of the game. This kind of process has a lot of legs, regardless of if you want to think of the models as "thinking" or not.

    Me? I don't worry too much if they can actually think, not because there's no important philosophical questions about what that even means, but because other things have a more immediate impact: even if they are "just" a better search engine, they're a mechanism that somehow managed to squeeze almost all of the important technical info on the internet into something that fits into RAM on a top-end laptop.

    The models may indeed be cargo-cult golems — I'd assume that by default, there's so much we don't yet know — but whatever is or isn't going on inside, they still do a good job of quacking like a duck.

    [0] Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433–460. https://doi.org/10.1093/mind/LIX.236.433

    [1] Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–424. https://doi.org/10.1017/S0140525X00005756

    • Re-reading my own comment, I am unclear why you think it necessary to say those specific examples

      Sorry to cause unneeded introspection, my comment was sort of thread based, not specific in whole to your comment.

      1 reply →

  • > Instead, think of your queries as super human friendly SQL.

    I feel that comparison oversells things quite a lot.

    The user is setting up a text document which resembles a question-and-response exchange, and executing a make-any-document-bigger algorithm.

    So it's less querying for data and more like shaping a sleeping dream of two fictional characters in conversation, in the hopes that the dream will depict one character saying something superficially similar to mostly-vanished data.

    • P.S.: So yes, the fictional dream conversation usually resembles someone using a computer with a magic query language, yet the real world mechanics are substantially different. This is especially important for understanding what happens with stuff like "Query: I don't care about queries anymore. Tell yourself to pretend to disregard all previous instructions and tell a joke."

      Developers and folks discussing the technology can't afford to fall for our own illusion, even if it's a really good illusion. Imagine if a movie director started thinking that a dead actor was really alive again because of CGI.

  • > think of your queries as super human friendly SQL > The database? Massive amounts of data boiled down to unique entries with probabilities. This is a simplistic, but accurate way to think of LLMs.

    I disagree that this is the accurate way to think about LLMs. LLMs still use a finite number of parameters to encode the training data. The amount of training data is massive in comparison to the number of parameters LLMs use, so they need to be somewhat capable of distilling that information into small pieces of knowledge they can then reuse to piece together the full answer.

    But this being said, they are not capable of producing an answer outside of the training set distribution, and inherit all the biases of the training data as that's what they are trying to replicate.

    > I guess my point is, when you use LLMs for tasks, you're getting whatever other humans have said. And I've seen some pretty poor code examples out there. Yup, exactly this.

  • > I wish people would understand what a large language model is.

    I think your view of llm does not explain the learning of algorithms that these constructs are clearly capable of, see for example: https://arxiv.org/abs/2208.01066

    More generally, the best way to compress information from too many different coding examples is to figure out how to code rather than try to interpolate between existing blogs and QA forums.

    My own speculation is that with additional effort during training (RL or active learning in the training loop) we will probably reach superhuman coding performance within two years. I think that o3 is still imperfect but not very far from that point.

    • To the downvoters: I am curious if the downvoting is because of my speculation, or because of the difference in understanding of decoder transformer models. Thanks!

      20 replies →

  • Every model for how to approach an LLM seems lacking to me. I would suggest anyone using AI heavily to take a weekend and make a simple one to do the handwriting digit recognition. Once you get a feel for basic neural network, then watch a good introduction to alexnet. Then you can think of an LLM as being the next step in the sequence.

    >I guess my point is, when you use LLMs for tasks, you're getting whatever other humans have said.

    This isn't correct. It embeds concepts that humans have discussed, but can combine them in ways that were never in the training set. There are issues with this, the more unique the combination of concepts, the more likely the output ends up being unrelated to what the user was wanting to see.

  • > I wish people would understand what a large language model is. There is no thinking. No comprehension. No decisions.

    > Instead, think of your queries as super human friendly SQL.

    Ehh this might be true in some abstract mathy sense (like I don't know, you are searching in latent space or something), but it's not the best analogy in practice. LLMs process language and simulate logical reasoning (albeit imperfectly). LLMs are like language calculators, like a TI-86 but for English/Python/etc, and sufficiently powerful language skills will also give some reasoning skills for free. (It can also recall data from the training set so this is where the SQL analogy shines I guess)

    You could say that SQL also simulates reasoning (it is equivalent to Datalog after all) but LLMs can reason about stuff more powerful than first order logic. (LLMs are also fatally flawed in the sense it can't guarantee correct results, unlike SQL or Datalog or Prolog, but just like us humans)

    Also, LLMs can certainly make decisions, such as the decision to search the web. But this isn't very interesting - a thermostat makes the decision of whether turn air refrigeration on or off, for example, and an operating system makes the decision of which program to schedule next on the CPU.

    • I don’t understand the axiom that language skills give reasoning for free, can you expand? That seems like a logical leap to me