Comment by jampekka
1 year ago
Current LLM architectures have fundamental limitations, which means there are problems they cannot learn regardless of how much they are trained.
A simple example: they fundamentally cannot balance parentheses nested deeper than half their context width, because a fully nested string of depth k is at least 2k tokens long, so part of it must fall outside the context window.
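A minimal sketch of the counting argument (the window size and names here are illustrative, not taken from any real model): any checker that sees only the last WINDOW characters must give the same answer to two strings with identical suffixes, even when one is balanced and the other is not.

```python
def is_balanced(s: str) -> bool:
    """Exact check: run a depth counter over the whole string."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:          # a ")" closed more than was opened
            return False
    return depth == 0          # balanced iff every "(" was closed

WINDOW = 8  # illustrative stand-in for a model's context width

# Depth 5 exceeds WINDOW // 2 = 4, so the balanced string no longer
# fits in the window, and an extra "(" can hide outside it.
deep_ok  = "(" * 5 + ")" * 5   # balanced
deep_bad = "(" * 6 + ")" * 5   # unbalanced: one extra "(" out of view

print(deep_ok[-WINDOW:] == deep_bad[-WINDOW:])      # True: same visible input
print(is_balanced(deep_ok), is_balanced(deep_bad))  # True False
```

Since the two strings are identical within the window, any function of the visible tokens alone must misclassify at least one of them.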