← Back to context

Comment by nyrikki

1 year ago

ML is better than biological neurons at some tasks, but they operate in different contexts.

Almost all the performance on, say, college tests comes purely from pre-training: pattern finding and detection.

Transformers are limited to DLOGTIME-uniform TC0; they can't even solve the Boolean circuit value problem.
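For readers unfamiliar with it, the circuit value problem is just "given a circuit and an input, compute its output", and it is P-complete, which is why it is believed to sit outside TC0. A minimal sketch (my own illustration, not from the thread; the gate encoding is made up for the example):

```python
def eval_circuit(inputs, gates):
    """Evaluate a Boolean circuit given as a topologically ordered gate list.

    Each gate is (op, a, b), where a and b index earlier wires
    (b is ignored for NOT). Returns the value of the last wire.
    """
    wires = list(inputs)
    for op, a, b in gates:
        if op == "AND":
            wires.append(wires[a] and wires[b])
        elif op == "OR":
            wires.append(wires[a] or wires[b])
        elif op == "NOT":
            wires.append(not wires[a])
    return wires[-1]

# wires: 0 = x0, 1 = x1, 2 = (x0 AND x1), 3 = NOT wire 2
print(eval_circuit([True, False], [("AND", 0, 1), ("NOT", 2, None)]))  # True
```

The loop is trivially polynomial time, but note the chain of dependencies: each gate may need the value of the one before it. That inherently sequential structure is exactly what a constant-depth circuit class like TC0 cannot unroll.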

The ability to use the properties of BPP does help.

Understanding the power and limitations of iteration and of improving approximations requires descriptive complexity theory, IMHO.

I read a book on recursively enumerable degrees once, which IIRC was a sort of introduction to the complexity classes of various computable functions, but I never imagined it having practical use, so this post is eye-opening. I've been nattering on about how the models are largely finding separating hyperplanes after non-linear transformations have been done, but this approach, where the AI's solving ability can't exceed what its complexity class allows, is an interesting one.

Unfortunately, the discussion cannot go deeper than the current level. One thing not to forget when thinking about decoder transformer models is that there is no limitation on having parts of the output/input stream be calculated by other circuits if it helps the cause. E.g., emit a token to invoke a calculator, compute, and fill in the answer; or emit a token to compile and run some code and fill the stream with the results.

The complexity class of the main circuit might not need to be much more complicated than the 200-layer-deep typical architectures of today, as long as it has access to memory and tools. You can call this system something else if you prefer (decoder-transformer-plus-computer), but that is what people interact with in ChatGPT, so I'm not sure I agree that complexity theory limits superhuman ability. Humans are not good with complexity.
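The "decoder-transformer-plus-computer" loop described above can be sketched as an outer driver that intercepts tool-call tokens and splices results back into the stream. This is a hypothetical toy, not any real ChatGPT API; the scripted token list stands in for an actual model:

```python
import operator

def run_with_tools(model_tokens, calculator):
    """Outer loop: pass ordinary tokens through; on a ("calc", expr)
    tool-call token, run the external tool and splice its result
    back into the stream, as the comment describes."""
    stream = []
    for tok in model_tokens:
        if isinstance(tok, tuple) and tok[0] == "calc":
            stream.append(str(calculator(tok[1])))  # tool fills the answer in
        else:
            stream.append(tok)
    return " ".join(stream)

def tiny_calc(expr):
    # Deliberately tiny calculator: only "a op b" with integers.
    a, op, b = expr.split()
    return {"+": operator.add, "*": operator.mul}[op](int(a), int(b))

out = run_with_tools(["The", "answer", "is", ("calc", "6 * 7")], tiny_calc)
print(out)  # The answer is 42
```

The point of the sketch is that the arithmetic happens outside the "model" entirely, so the composite system's power is not bounded by the complexity class of the circuit that emits the tokens.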

I recall early, incomplete speculation about transformers not being able to solve Boolean circuit value problems; what did you think of this work? https://arxiv.org/abs/2402.12875v3

  • > However, with T steps of CoT, constant-depth transformers using constant-bit precision and O(log n) embedding size can solve any problem solvable by boolean circuits of size T

    There is a difference between being equivalent to a circuit and predicting the output of the Boolean circuit value problem.

    That is what I was suggesting learning descriptive complexity theory would help with.

    • Why does the limit on the computational complexity of a single decoder transformer matter for obtaining superhuman coding ability? Is there a theory of what level of complexity is needed for the task of coding to a spec? Or of the complexity of translating/optimizing code? Even if there were, and one could show that a plain decoder transformer is insufficient, you would probably only need to add a tool in the middle of the stream processing. Unless you have a specific citation that strongly argues otherwise, I will stick with my speculative/optimistic view of the upcoming technology explosion.

      To be fair, I always thought coding was at best of modest complexity, not super hard compared to other human activities, so I will not make claims of generic superintelligences anytime soon. I hope they happen in the near term, but I'd be happy if I simply see them within a decade, and I don't feel partial to any architecture. I just think that attention was a cool idea even before transformers, and decoder transformers took it to the limit. That may be enough for a lot of superhuman achievements. Probably not for all. We will see.

      1 reply →