Comment by skydhash

3 months ago

> nobody - nobody in 2025, at least - should claim to understand them well

I’m highly suspicious of this claim, as the models are not something we found on an alien computer. I can accept that nobody has found a way to extract usable logic from the soup of numbers that is the actual model, but we do know the logic of the interactions that happen.

4 comments

thomassmith65 3 months ago

That's not the point, though. Yes, we understand why ANNs work, and we - clearly - understand how to create them, even fancy ones like ChatGPT.

What we understand poorly is what kinds of tasks they are capable of. That is too complex to reason about; we cannot deduce that from the spec or source code or training corpus. We can only study how what we have built actually seems to function.

skydhash 3 months ago

As for LLMs, that’s easy, it’s in the name. They’re good at generating text. What we are trying to do is mostly get them to generate useful text (and see if we can apply the same techniques to other types of data).

It’s kinda the same with computers: we know the general shape of what they can do and how they do it. We are mostly trying to see whether a particular problem can be solved with them, how efficiently, and to what degree.
- thomassmith65 3 months ago
  
  Ach, I'm having trouble getting the distinction across:
  It's not hard to write and understand an ANN. It's like a one- or two-day project. LLMs, I assume, aren't all that much harder: fewer LOC than most GUI apps.
  It's also not hard to understand why ANNs and LLMs work. It's only conceptually one step further than "write millions of programs randomly and stop when one actually works"
  The part that we don't understand, and that will take many years to understand, is what behaviours and abilities we can expect from a massive, trained LLM.
  The fact that (A) it is so easy to understand how to create an ANN, and (B) it takes so few LOC to create one, really underlines the point: the interesting, complex behaviour is something that 'emerges' (from simply adding more nodes to the spec) and that nobody today has any hint of how to code procedurally.
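
To make the "so few LOC" point concrete, here is a rough sketch of a complete ANN in plain Python: one hidden layer, trained by gradient descent on XOR. Everything specific in it (the XOR task, 4 hidden units, the learning rate, the epoch count, the seed, the function names) is an illustrative assumption and not something from the comments above. It is a toy, not how production LLMs are built.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_params(n_hidden, rng):
    # Hidden layer: two input weights and a bias per unit.
    # Output layer: one weight per hidden unit, plus a bias.
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }

def forward(p, x):
    # Returns the hidden activations and the single output.
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
         for w, b in zip(p["w1"], p["b1"])]
    y = sigmoid(sum(w * hi for w, hi in zip(p["w2"], h)) + p["b2"])
    return h, y

def loss(p, data):
    # Mean squared error over the data set.
    return sum((forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)

def train(p, data, epochs=5000, lr=0.5):
    # Per-example gradient descent with hand-written backpropagation.
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(p, x)
            dy = 2 * (y - t) * y * (1 - y)  # gradient at the output pre-activation
            for j in range(len(h)):
                # Hidden-unit gradient, computed before w2[j] is updated.
                dh = dy * p["w2"][j] * h[j] * (1 - h[j])
                p["w2"][j] -= lr * dy * h[j]
                p["w1"][j][0] -= lr * dh * x[0]
                p["w1"][j][1] -= lr * dh * x[1]
                p["b1"][j] -= lr * dh
            p["b2"] -= lr * dy
    return p

# Assumed toy task: XOR, which a network with no hidden layer cannot represent.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

rng = random.Random(0)  # fixed seed so runs are repeatable
params = init_params(4, rng)
before = loss(params, XOR)
train(params, XOR)
after = loss(params, XOR)
print(f"loss before training: {before:.3f}, after: {after:.3f}")
```

Nothing in those lines says "compute XOR". The spec only describes the nodes and the update rule, and whatever the trained network ends up doing lives in the learned numbers, which is the distinction being argued here at a vastly smaller scale.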