Comment by PaulHoule
8 days ago
Personally I think foundation models are for the birds, the cost of developing one is immense and the time involved is so great that you can't do many run-break-fix cycles so you will get nowhere on a shoestring. (Though maybe you can get somewhere on simple tasks and synthetic data)
Personally I am working on a reliable model trainer for classification and sequence labeling tasks that uses something like ModernBERT at the front end and some kind of LSTM on the back end.
People who hold court on machine learning forums will swear by fine-tuned BERT and similar things but they are not at all interested in talking about the reliable bit. I've read a lot of arXiv papers where somebody tries to fine-tune a BERT for a classification task, runs some arbitrarily chosen parameters they got out of another paper and it sort-of works some of the time.
It drives me up the wall that you can't use early stopping for BERT fine-tuning the way I've been using it on neural nets since 1990 or so. And if I believe what I'm seeing, I don't think the networks I've been using for BERT fine-tuning can really benefit from training sets with more than a few thousand examples, emphasis on the "few".
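For readers who haven't met it, the classic patience-based form of early stopping looks roughly like the sketch below. This is a generic illustration, not the trainer described above: the function name, `patience`, and `min_delta` are assumptions, and in real use `val_losses` would be a generator that trains one epoch and yields the validation loss.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[int, float]:
    """Return (best_epoch, best_loss).

    Stops consuming val_losses once the loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_epoch, best_loss = -1, float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # give up; later epochs are never evaluated
    return best_epoch, best_loss
```

Note that with a bumpy validation curve this rule stops at the first plateau and never sees a later, lower loss.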
My assumption is that everybody else is going to be working on the flashy task of developing better foundation models and as long as they emit an embedding-per-token I can plug a better foundation model in and my models will perform better.
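The "plug a better foundation model in" idea amounts to coding the downstream model against an embedding-per-token interface. A minimal sketch, with everything hypothetical: `TokenEncoder`, `ToyHashEncoder`, and `MeanPoolClassifier` are made-up names, the toy encoder is a stand-in for something like ModernBERT, and the mean-pool linear head is a deliberate simplification swapped in for the LSTM back end mentioned above.

```python
from typing import List, Protocol, Sequence


class TokenEncoder(Protocol):
    """Anything that maps n tokens to n vectors of length `dim`."""

    dim: int

    def encode(self, tokens: Sequence[str]) -> List[List[float]]: ...


class ToyHashEncoder:
    """Placeholder encoder; the numbers carry no linguistic meaning."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim

    def encode(self, tokens: Sequence[str]) -> List[List[float]]:
        return [
            [((sum(map(ord, tok)) * (i + 1)) % 7) / 7.0 for i in range(self.dim)]
            for tok in tokens
        ]


class MeanPoolClassifier:
    """Downstream head that depends only on the TokenEncoder interface."""

    def __init__(self, encoder: TokenEncoder, weights: List[float], bias: float = 0.0) -> None:
        if len(weights) != encoder.dim:
            raise ValueError("weights must match the encoder's embedding size")
        self.encoder = encoder
        self.weights = weights
        self.bias = bias

    def score(self, tokens: Sequence[str]) -> float:
        if not tokens:
            raise ValueError("need at least one token")
        vectors = self.encoder.encode(tokens)
        # Average each embedding dimension over the sequence.
        pooled = [sum(column) / len(vectors) for column in zip(*vectors)]
        return sum(w * x for w, x in zip(self.weights, pooled)) + self.bias
```

Swapping in a stronger encoder then means passing a different object with the same `encode` method and `dim` attribute; the head needs retraining but no code changes.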
> Personally I think foundation models are for the birds,
I might not go quite that far, but I have publicly said (and will stand by the statement) that I think that training progressively larger and more complex foundation models is a waste of resources. But my view of AI is rooted in a neuro-symbolic approach, with emphasis on the "symbolic". I envision neural networks not as the core essence of an AI, but mainly as just adapters between different representations that are used by different sub-systems. And possibly as "scaffolding" where one can use the "intelligence" baked into an LLM as a bridge to get the overall system to where it can learn, and then eventually kick the scaffold down once it isn't needed anymore.
We learned something pretty big and surprising from each new generation of LLM, for a small fraction of the time and cost of a new particle accelerator or space telescope. Compared to other big science projects, they're giving pretty good bang for the buck.
I can sure talk your ear off about that one as I went way too far into the semantic web rabbit hole.
Training LLMs to use 'tools' of various types is a great idea, as is running them inside frameworks that check that their output satisfies various constraints. Still, the NP-complete nature of SAT solving (many intelligent-systems problems, such as the word problems you'd expect an A.I. to solve, boil down to SAT solving), the halting problem, Gödel's theorem and the like remain problems. I understand Doug Hofstadter has softened his positions lately, but I think many of the problems set up in this book
https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach
(particularly the Achilles & Tortoise dialog) still stand today, as cringey as that book seems to me in 2025.
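To make the SAT point concrete: checking that a proposed answer satisfies a set of constraints is cheap, while finding one can take exponential time in the worst case with every known general method. A minimal sketch, assuming DIMACS-style integer literals; `satisfies` and `brute_force_solve` are illustrative names, not from any particular framework.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]  # 3 means x3, -3 means NOT x3


def satisfies(cnf: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Check a candidate assignment; cost is linear in the formula size.

    A literal over an unassigned variable counts as not satisfied.
    """
    return all(
        any(
            abs(lit) in assignment and assignment[abs(lit)] == (lit > 0)
            for lit in clause
        )
        for clause in cnf
    )


def brute_force_solve(cnf: List[Clause], num_vars: int) -> Optional[Dict[int, bool]]:
    """Exhaustive search over up to 2**num_vars assignments."""
    for values in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(values, start=1))
        if satisfies(cnf, assignment):
            return assignment
    return None
```

A constraint-checking wrapper around an LLM is in the position of `satisfies`: it can reject a bad answer quickly, but it does not make the search for a good one any easier.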
I am hoping for an SLM "Turing tape": a small language model where the tokens are instructions for a copycat engine.
As somebody who considers himself something of a Semantic Web enthusiast / advocate, and has also read GEB, I can totally relate. To me, this is really one of those "THE ISSUE" things: how can we use some notion of formal logic to solve problems, without being forced to give up hope due to incompleteness and/or the Halting Problem? Clearly you have to give up something as a tradeoff for making this stuff tractable, but I suppose it's an open question what you can trade off and how exactly that factors into the algorithm, as well as what guarantees (if any) remain...
Good old fashioned AI, amirite
Well, to the extent that people equate GOFAI with purely symbolic / logic-based processing, then no, not for my money anyway. I think it's possible to construct systems that use elements of symbolic processing along with sub-symbolic approaches and get useful results. I think of it as (although this is something of an over-simplification) taking symbolic reasoning, relaxing some of the constraints that go along with the guarantees that method makes about the outputs, and accepting a (hopefully only slightly) less desirable output. Or, think about flipping the whole thing around: get an output from, say, an LLM where there might be hallucination(s), and then use a symbolic reasoning system to post-process the output to ensure veracity before sending it to the user. Amazon has done some work along those lines, for example. https://aws.amazon.com/blogs/machine-learning/reducing-hallu...
Anyway this is all somewhat speculative, and I don't want to overstate the "weight" of anything I seem to be claiming here. This is just the direction my interests and inclinations have taken me in.
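The "flip it around" idea above can be sketched in a few lines. This is a toy under loud assumptions, not a description of the Amazon work: LLM claims are assumed to arrive already parsed into (subject, predicate, object) triples, the knowledge base holds a handful of trusted facts, and the only inference rule is transitivity of `is_a`. All names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_a_closure(facts: Set[Triple]) -> Set[Triple]:
    """Forward-chain one rule to a fixed point:
    (a is_a b) and (b is_a c) => (a is_a c)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        pairs = [(s, o) for s, p, o in known if p == "is_a"]
        for a, b in pairs:
            for b2, c in pairs:
                if b == b2 and (a, "is_a", c) not in known:
                    known.add((a, "is_a", c))
                    changed = True
    return known


def filter_claims(
    claims: Iterable[Triple], facts: Set[Triple]
) -> Tuple[List[Triple], List[Triple]]:
    """Split claims into (supported, unsupported) relative to the facts."""
    entailed = is_a_closure(facts)
    supported: List[Triple] = []
    unsupported: List[Triple] = []
    for claim in claims:
        (supported if claim in entailed else unsupported).append(claim)
    return supported, unsupported
```

"Unsupported" here only means the knowledge base cannot derive the claim, not that the claim is false; a real system would have to decide what to do with that gap.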
Yeah I've been wondering how one can contribute and build in the LLM and AI world without the resources to work on foundation models.
Because personally I'm not a product/GPT wrapper person - it just doesn't suit my interests.
So then what can one do that's meaningful and valuable? Probably something around finetuning?