Comment by disambiguation
3 months ago
I am once again shilling the idea that someone should find a way to glue Prolog and LLMs together for better reasoning agents.
https://news.ycombinator.com/context?id=43948657
Thesis:
1. LLMs are bad at counting the number of r's in strawberry.
2. LLMs are good at writing code that counts letters in a string.
3. LLMs are bad at solving reasoning problems.
4. Prolog is good at solving reasoning problems.
5. ???
6. LLMs are good at writing prolog that solves reasoning problems.
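Point 2 is easy to illustrate. Here is a minimal Python sketch; the helper name `count_letter` is hypothetical. The code works on the characters of the string rather than on model tokens, so the count is exact.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # 3
```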
Common replies:
1. The bitter lesson.
2. There are better solvers, ex. Z3.
3. Someone smart must have already tried and ruled it out.
Successful experiments:
> "4. Prolog is good at solving reasoning problems."
Plain Prolog's way of solving reasoning problems is effectively:
You hard code some options, write a logical condition with placeholders, and Prolog brute-forces every option in every placeholder. It doesn't do reasoning.
Arguably it lets a human express reasoning problems better than other languages, because you write high-level code in a declarative way instead of allocating memory, choosing data types, initializing linked lists and so on, and can focus on the reasoning. But that is no benefit to an LLM, which can output any language as easily as any other. And while that might have been nice compared to Pascal in 1975, it's not so different from modern garbage-collected, high-level scripting languages. Arguably Python or JavaScript would benefit an LLM most, because there are far more training examples in those languages than in almost any other language.
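The generate-and-test picture this comment describes can be sketched in Python. The puzzle and all names are hypothetical. The sketch illustrates the commenter's characterization, not how a Prolog engine actually executes: it hard-codes the options, writes a condition with two placeholders, and tries every option in every placeholder.

```python
from itertools import product

# Hard-coded options for each placeholder (hypothetical example data).
OPTIONS = range(10)


def condition(x: int, y: int) -> bool:
    """The logical condition with two placeholders."""
    return x + y == 10 and x < y


def solve():
    """Try every option in every placeholder and keep the combinations that pass."""
    return [(x, y) for x, y in product(OPTIONS, repeat=2) if condition(x, y)]


print(solve())  # [(1, 9), (2, 8), (3, 7), (4, 6)]
```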
>> You hard code some options, write a logical condition with placeholders, and Prolog brute-forces every option in every placeholder. It doesn't do reasoning.
SLD-Resolution with unification (Prolog's automated theorem proving algorithm) is the polar opposite of brute force: as the proof proceeds, the cardinality of the set of possible answers [1] decreases monotonically. Unification itself is nothing but a dirty hack to avoid having to ground the Herbrand base of a predicate before completing a proof; which is basically going from an NP-complete problem to a linear-time one (on average).
Besides which I find it very difficult to see how a language with an automated theorem prover for an interpreter "doesn't do reasoning". If automated theorem proving is not reasoning, what is?
___________________
[1] More precisely, the resolution closure.
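Here is a minimal sketch of first-order unification in Python. It assumes a toy term representation: variables are capitalised strings, and compound terms are tuples whose first element is the functor. It shows how unification binds variables to exactly the terms needed for a match, instead of enumerating ground instances. It is not SWI-Prolog's implementation. It also includes an occurs check, which standard Prolog unification omits by default for speed.

```python
def is_var(t):
    """Toy convention: capitalised strings are logic variables."""
    return isinstance(t, str) and t[:1].isupper()


def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if variable v appears inside term t (prevents cyclic bindings)."""
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False


def unify(a, b, subst=None):
    """Return a substitution extending subst that makes a and b equal, or None."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


print(unify(("parent", "X", "timmy"), ("parent", "alice", "Y")))
```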
> "as the proof proceeds, the cardinality of the set of possible answers [1] decreases"
In the sense that it cuts off part of the search tree where answers cannot be found?
will never do the slow_computation - but if it did, it would come up with the same result. How is that the polar opposite of brute force, rather than an optimization of brute-force?
If a language has tail call optimization then it can handle deeper recursive calls with less memory. Without TCO it would do the same thing and get the same result but using more memory, assuming it had enough memory. TCO and non-TCO aren't polar opposites, they are almost the same.
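The disagreement can be made concrete with a small Python sketch; all names are hypothetical. If a conjunction is ordered so that a cheap, often-failing test comes first, the expensive test is never reached for most candidates. Python's short-circuiting `and` skips its right operand in exactly this way. Both orders return the same answers and differ only in how much work they do, which is the sense in which this commenter calls pruning an optimization of brute force.

```python
calls = {"slow": 0}


def cheap(x: int) -> bool:
    """Cheap test that rejects half the candidates."""
    return x % 2 == 0


def slow_computation(x: int) -> bool:
    """Stand-in for an expensive test; counts how often it actually runs."""
    calls["slow"] += 1
    return x > 5


def search(order: str):
    """Return (answers, number of slow calls) for a given conjunct order."""
    calls["slow"] = 0
    if order == "cheap_first":
        hits = [x for x in range(10) if cheap(x) and slow_computation(x)]
    else:
        hits = [x for x in range(10) if slow_computation(x) and cheap(x)]
    return hits, calls["slow"]


print(search("cheap_first"))  # ([6, 8], 5)
print(search("slow_first"))   # ([6, 8], 10)
```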
Prolog was introduced to capture natural language in a logic/symbolic way that certainly didn't prove as powerful as today's LLMs. Still, this means there is a large corpus of direct English-to-Prolog mappings available for training, and the mapping rules are much more straightforward by design. You can pretty much translate simple sentences 1:1 into Prolog clauses, as in the classic boring example.
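The sentence-to-clause mapping can be sketched in Python. The Socrates syllogism is an assumed stand-in, not necessarily the example the commenter had in mind. The Prolog clauses appear in the comments, and the facts and rule are stored as plain data. The evaluation here is naive forward chaining, whereas Prolog proves goals backward by SLD resolution.

```python
# Hypothetical encoding of two English sentences:
#   "Socrates is a man."   ->  man(socrates).
#   "All men are mortal."  ->  mortal(X) :- man(X).
facts = {("man", "socrates")}
rules = [("mortal", "man")]  # (head, body): head holds of X whenever body holds of X


def derive(facts, rules):
    """Apply single-premise rules until no new facts appear (forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, arg in list(known):
                if pred == body and (head, arg) not in known:
                    known.add((head, arg))
                    changed = True
    return known


print(("mortal", "socrates") in derive(facts, rules))  # True
```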
This is being taken advantage of in Prolog code generation using LLMs. In the Quantum Prolog example, the LLM is also instructed not to generate search strategies/algorithms but just planning domain representation and action clauses for changing those domain state clauses which is natural enough in vanilla Prolog.
The results are quite a bit more powerful, closer to end-user problems, and further up the food chain than the usual LLM coding tasks for Python and JavaScript, such as boilerplate code generation and similarly idiosyncratic problems.
"large corpus" - large compared to the amount of Python on Github or the amount of JavaScript on all the webpages Google has ever indexed? Quantum Prolog doesn't have any relevant looking DuckDuckGo results, I found it in an old comment of yours here[1] but the link goes to a redirect which is blocked by uBlock rules and on to several more redirects beyond which I didn't get to a page. In your linked comment you write:
> "has convenient built-in recursive-descent parsing with backtracking built-in into the language semantics, but also has bottom-up parsing facilities for defining operator precedence parsers. That's why it's very convenient for building DSLs"
which I agree with, for humans. What I am arguing is that LLMs don't have the same notion of "convenient". Them dumping hundreds of lines of convoluted 'unreadable' Python (or C or Go or anything) to implement "half of Common Lisp" or "half of a Prolog engine" for a single task is fine, they don't have to read it, and it gets the same result. What would be different is if it got a significantly better result, which I would find interesting but haven't seen a good reason why it would.
[1] https://news.ycombinator.com/item?id=40523633
It's a Horn clause resolver... that's exactly the kind of reasoning that LLMs are bad at. I have no idea how to graft Prolog to an LLM, but if you can graft any programming language to it, you can graft Prolog more easily.
Also, the fact that you push Python and JavaScript makes me think you don't know many languages. Those are terrible languages to try to graft to anything. Just because you only know those 2 languages doesn't make them good choices for something like this. Learn a real language, Physicist.
> Also, that you push Python and JavaScript
I didn't push them.
> Those are terrible languages to try to graft to anything.
Web browsers, Blender, LibreOffice and Excel all use those languages for embedded scripting. They're fine.
> Just because you only know those 2 languages doesn't make them good choices for something like this.
You misunderstood my claim and are refuting something different. I said there is more training data for LLMs to use to generate Python and JavaScript, than Prolog.
No call for talking down at people. No one has ever been convinced by being belittled.
> I have no idea how to graft Prolog to an LLM
Wrapping either the SWI-Prolog MQI or, even simpler, an existing Python interface like janus_swi in a simple MCP server is probably an easy weekend project. Tuning the prompting to get an LLM to reliably and effectively choose to use it when it would benefit from symbolic reasoning may be harder, though.
We would begin by having a Prolog server of some kind (I have no idea whether Prolog is parallelized, but it may well be, since we're dealing with Horn clauses).
There would be MCP bindings to said server, which would be accessible upon request. The LLM would provide a message, it could even formulate Prolog statements per a structured prompt, and then await the result, and then continue.
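The message flow described here can be sketched in Python. Everything is hypothetical: the JSON schema, the tool name `prolog_tool`, and the example model output. The stub only matches a query against ground facts. It does not use the MCP SDK, the SWI-Prolog MQI, or janus_swi, and a real deployment would forward the clauses to an actual Prolog server instead.

```python
import json


def prolog_tool(request_json: str) -> str:
    """Hypothetical tool endpoint: answer a query against ground facts.

    A real server would pass these clauses to Prolog; this stub only
    pattern-matches the query against the listed facts.
    """
    req = json.loads(request_json)
    facts = [tuple(f) for f in req["facts"]]
    query = tuple(req["query"])
    answers = []
    for fact in facts:
        if len(fact) != len(query):
            continue
        binding, ok = {}, True
        for q, f in zip(query, fact):
            if q[:1].isupper():  # capitalised strings are variables
                if binding.setdefault(q, f) != f:
                    ok = False
                    break
            elif q != f:
                ok = False
                break
        if ok:
            answers.append(binding)
    return json.dumps({"answers": answers})


# What a model might emit under a structured prompt (hypothetical example).
llm_message = json.dumps({
    "facts": [["parent", "alice", "timmy"], ["parent", "bob", "timmy"]],
    "query": ["parent", "X", "timmy"],
})
# The model would await this result, then continue its response.
print(prolog_tool(llm_message))
```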
> It's a Horn clause resolver... that's exactly the kind of reasoning that LLMs are bad at. I have no idea how to graft Prolog to an LLM, but if you can graft any programming language to it, you can graft Prolog more easily.
By grafting the LLM into Prolog and not the other way around?
This sparked a really fascinating discussion, I don't know if anyone will see this but thanks everyone for sharing your thoughts :)
I understand your point: to an LLM there's no meaningful difference between one Turing-complete language and another. I'll concede that I don't have a counterargument, and perhaps it doesn't need to be Prolog, though my hunch is that LLMs tend to give better results when using purpose-built tools for a given type of problem.
The only loose end I want to address is the idea of "doing reasoning."
This isn't an AGI proposal (I was careful to say "good at writing prolog") just an augmentation that (as a user) I haven't yet seen applied in practice. But neither have I seen it convincingly dismissed.
The idea is the LLM would act like an NLP parser that gradually populates a prolog ontology, like building a logic jail one brick at a time.
The result would be a living breathing knowledge base which constrains and informs the LLM's outputs.
The punchline is that I don't even know any prolog myself, I just think it's a neat idea.
Of course it does "reasoning", what do you think reasoning is? From a quick Google search: "the action of thinking about something in a logical, sensible way". Prolog searches through a space of logical propositions (constraints) and finds conditions that lead to solutions (if one exists).
(a) Try adding another 100 or 1000 interlocking propositions to your problem. It will find solutions or tell you one doesn't exist. (b) You can verify the solutions yourself. You don't get that with imperative descriptions of problems. (c) Good luck sandboxing Python or JavaScript with the threat of prompt injection still unsolved.
Of course it doesn't "do reasoning", why do you think "following the instructions you gave it in the stupidest way imaginable" is 'obviously' reasoning? I think one definition of reasoning is being able to come up with any better-than-brute-force thing that you haven't been explicitly told to use on this problem.
Prolog isn't "thinking". Not about anything, not about your problem, your code, its implementation, or any background knowledge. Prolog cannot reason that your problem is isomorphic to another problem with a known solution. It cannot come up with an expression transform that hasn't been hard-coded into the interpreter which would reduce the amount of work involved in getting to a solution. It cannot look at your code, reason about it, and make a logical leap over some of the code without executing it (in a way that hasn't been hard-coded into it by the programmer/implementer). It cannot reason that your problem would be better solved with SLG resolution (tabling) instead of SLD resolution (depth first search). The point of my example being pseudo-Python was to make it clear that plain Prolog (meaning no constraint solver, no metaprogramming), is not reasoning. It's no more reasoning than that Python loop is reasoning.
If you ask me to find the largest prime number between 1 and 1000, I might think to skip even numbers, and I might think to search down from 1000 instead of up from 1. I might not come up with a good strategy, but I will reason about the problem. Prolog will not. You code what it will do, and it will slavishly do what you coded. If you code counting from 1 to 1000, it will do that. If you code the Sieve of Eratosthenes, it will do that instead.
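The strategy described here can be sketched in Python; the helper names are hypothetical. The search starts at the top and skips even candidates. Consistent with the comment's point, the program only takes these shortcuts because they were written into it.

```python
from typing import Optional


def is_prime(n: int) -> bool:
    """Trial division by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def largest_prime_up_to(limit: int) -> Optional[int]:
    """Search downward from the limit, skipping even candidates."""
    n = limit if limit % 2 else limit - 1
    while n > 2:
        if is_prime(n):
            return n
        n -= 2
    return 2 if limit >= 2 else None


print(largest_prime_up_to(1000))  # 997
```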
Contrary to what everyone else is saying, I think you're completely correct. Using it for AI or "reasoning" is a hopeless dead end, even if people wish otherwise. However I've found that Prolog is an excellent language for expressing certain types of problems in a very concise way, like parsers, compilers, and assemblers (and many more). The whole concept of using a predicate in different modes is actually very useful in a pragmatic way for a lot of problems.
When you add in the constraint solving extensions (CLP(Z) and CLP(B) and so on) it becomes even more powerful, since you can essentially mix vanilla Prolog code with solver tools.
The reason why you can write parsers with Prolog is because you can cast the problem of determining whether a string belongs to a language or not as a proof, and, in Prolog, express it as a set of Definite Clauses, particularly with the syntactic sugar of Definite Clause Grammars that give you an executable grammar that acts as both acceptor and generator and is equivalent to a left-corner parser.
Now, with that in mind, I'd like to understand how you and the OP reconcile the ability to carry out a formal proof with the inability to do reasoning. How is it not reasoning, if you're doing a proof? If a proof is not reasoning, then what is?
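The acceptor/generator duality can be sketched in Python for the toy grammar s → ε | a s b, which would be written as the DCG `s --> [] ; [a], s, [b].` (an assumed example, not from the thread). Python has no built-in bidirectionality, so the sketch reads the same rule twice. In one reading it is a recognizer that yields every position where a phrase can end. In the other it is a depth-bounded expansion that yields sentences in the order Prolog's depth-first search would produce them.

```python
from typing import Iterator, List


def parse_s(tokens: List[str], i: int) -> Iterator[int]:
    """Yield every position where a phrase of s starting at i can end."""
    yield i  # s --> []
    if i < len(tokens) and tokens[i] == "a":  # s --> [a], s, [b]
        for j in parse_s(tokens, i + 1):
            if j < len(tokens) and tokens[j] == "b":
                yield j + 1


def accepts(tokens: List[str]) -> bool:
    """The rule used as an acceptor: some parse must consume every token."""
    return any(end == len(tokens) for end in parse_s(tokens, 0))


def expand_s(depth: int) -> Iterator[List[str]]:
    """The rule used as a generator, alternatives in clause order, depth-bounded."""
    yield []  # s --> []
    if depth > 0:
        for inner in expand_s(depth - 1):  # s --> [a], s, [b]
            yield ["a"] + inner + ["b"]


print(list(expand_s(2)))  # [[], ['a', 'b'], ['a', 'a', 'b', 'b']]
print(accepts(["a", "a", "b", "b"]), accepts(["a", "b", "b"]))  # True False
```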
Even in your example (which is obviously not a correct representation of Prolog), that code will run X orders of magnitude faster, and with 100% reliability, compared to the far inferior reasoning capabilities of an LLM.
This is not the point, though.
What makes you think your brain isn't also brute-forcing potential solutions subconsciously and only surfacing the useful results?
Because I can solve problems that would take the age of the universe to brute force, without waiting the age of the universe. So can you: start counting at 1, increment the counter up to 10^8000, then print the counter value.
Prolog: 1, 2, 3, 4, 5 ...
You and me instantly: 10^8000
Can you try calculating 101 * 70 in your head?
Human brains are insanely powerful pattern-matching and shortcut-taking machines. There's very little brute forcing going on.
Just intuition ;)
Everything you've written here is an invalid over-reduction, I presume because you aren't terribly well versed in Prolog. Your simplification is not only outright erroneous in a few places, but essentially excludes every single facet of Prolog that makes it a Turing-complete logic language. What you are essentially presenting Prolog as would be like presenting C as a language where all you can do is perform operations on constants, without even being able to define functions or preprocessor macros. To assert that's what C is would be completely and obviously ludicrous, but not many people are familiar enough with Prolog or its underlying formalisms to call you out on this.
Firstly, we must set one thing straight: Prolog definitionally does reasoning. Formal reasoning. This isn't debatable, it's a simple fact. It implements resolution (a computationally friendly inference rule over computationally-friendly logical clauses) that's sound and refutation complete, and made practical through unification. Your example is not even remotely close to how Prolog actually works, and excludes much of the extra-logical aspects that Prolog implements. Stripping it of any of this effectively changes the language beyond recognition.
> Plain Prolog's way of solving reasoning problems is effectively:
No. There is no cognate to what you wrote anywhere in how Prolog works. What you have here doesn't even qualify as a forward chaining system, though that's what it's closest to given it's somewhat how top-down systems work with their ruleset. For it to even approach a weaker forward chaining system like CLIPS, that would have to be a list of rules which require arbitrary computation and may mutate the list of rules it's operating on. A simple iteration over a list testing for conditions doesn't even remotely cut it, and again that's still not Prolog even if we switch to a top-down approach by enabling tabling.
> You hard code some options
A Prolog knowledgebase is not hardcoded.
> write a logical condition with placeholders
A horn clause is not a "logical condition", and those "placeholders" are just normal variables.
> and Prolog brute-forces every option in every placeholder.
Absolutely not. It traverses a graph proving things, and when it cannot prove something it backtracks and tries a different route, or otherwise fails. This is of course without getting into impure Prolog, or the extra-logical aspects it implements. It's a fundamentally different foundation of computation which is entirely geared towards formal reasoning.
> And that might have been nice compared to Pascal in 1975, it's not so different to modern garbage collected high level scripting languages.
It is extremely different, and the only reason you believe this is because you don't understand Prolog in the slightest, as indicated by the unsoundness of essentially everything you wrote. Prolog is as different from something like Javascript as a neural network with memory is.
The original suggestion was that LLMs should emit Prolog code to test their ideas. My reply was that there is nothing magic in Prolog which would help them over any other language, but there is something in other languages which would help them over Prolog - namely more training data. My example was to illustrate that, not to say Prolog literally is Python. Of course it's simplified to the point of being inaccurate, it's three lines, how could it not be.
> "A Prolog knowledgebase is not hardcoded."
No, it can be asserted and retracted, or consult a SQL database or something, but it's only going to search the knowledge the LLM told it to - in that sense there is no benefit to an LLM to emit Prolog over Python since it could emit the facts/rules/test cases/test conditions in any format it likes, it doesn't have any attraction to concise, clean, clear, expressive, output.
> "those "placeholders" are just normal variables"
Yes, just normal variables - and not something magical or special that Prolog has that other languages don't have.
> "Absolutely not. It traverses a graph proving things,"
Yes, though, it traverses the code tree by depth first walk. If the tree has no infinite left-recursion coded in it, that is a brute force walk. It proves things by ordinary programmatic tests that exist in other languages - value equality, structure equality, membership, expression evaluation, expression comparison, user code execution - not by intuition, logical leaps, analogy, flashes of insight. That is, not particularly more useful than other languages which an LLM could emit.
> "Your example is not even remotely close to how Prolog actually works"
> "There is no cognate to what you wrote anywhere in how Prolog works"
> "It is extremely different"
Well:
That's a loop over the people, filling in the variable X. Prolog is not looking at Ancestry.com to find who Timmy's parents are. It's not saying "ooh you have a SQLite database called family_tree I can look at". That it's doing it by a different computational foundation doesn't seem relevant when that's used to give it the same abilities.
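The "loop over the people" reading can be sketched in Python with hypothetical facts. Answering `parent(X, timmy)` binds X to each matching fact in clause order. The recursive rule `ancestor(X, P) :- parent(X, P).  ancestor(X, P) :- parent(Y, P), ancestor(X, Y).` is then a depth-first walk over those same facts, producing answers in the order SLD resolution would.

```python
from typing import Iterator

# Hypothetical knowledge base: parent(Parent, Child) facts in clause order.
PARENT = [("alice", "timmy"), ("bob", "timmy"), ("carol", "alice")]


def parents_of(child: str) -> Iterator[str]:
    """Like ?- parent(X, child): try each fact in order, yielding each matching X."""
    for p, c in PARENT:
        if c == child:
            yield p


def ancestors_of(person: str) -> Iterator[str]:
    """Both ancestor clauses, tried in order and explored depth-first."""
    yield from parents_of(person)   # first clause
    for p in parents_of(person):    # second clause: recurse through each parent
        yield from ancestors_of(p)


print(list(parents_of("timmy")))    # ['alice', 'bob']
print(list(ancestors_of("timmy")))  # ['alice', 'bob', 'carol']
```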
My point is that Prolog is "just" a programming language, and not the magic that a lot of people feel like it is, and therefore is not going to add great new abilities to LLMs that haven't been discovered because of Prolog's obscurity. If adding code to an LLM would help, adding Python to it would help. If that's not true, that would be interesting - someone should make that case with details.
> "and the only reason you believe this is because you don't understand Prolog in the slightest"
This thread would be more interesting to everybody if you and hunterpayne would stop fantasizing about me, and instead explain why Prolog's fundamentally different foundation makes it a particularly good language for LLMs to emit to test their other output - given that they can emit virtually endless quantities of any language, custom writing any amount of task-specific code on the fly.
IIRC IBM’s Watson (the one that played Jeopardy) used primitive NLP (imagine!) to form a tree of factual relations and then passed this tree to construct Prolog queries that would produce an answer to a question. One could imagine that by swapping out the NLP part with an LLM, the model would have 1. a more thorough factual basis against which to write Prolog queries and 2. a better understanding of the queries it should write to get at answers (for instance, it may exploit more tenuous relations between facts than primitive NLP).
Not so "primitive" NLP. Watson started with what its team called a "shallow parse" of a sentence using a dependency grammar and then matched the parse to an ontology consisting of good, old fashioned frames [1]. That's not as "advanced" as an LLM but far more reliable.
I believe the ontology was indeed implemented in Prolog but I forget the architecture details.
______________
[1] https://en.wikipedia.org/wiki/Frame_(artificial_intelligence...
Please tell me that's approximately what Palantir Ontology is, because if it isn't, I've no idea what it could be.
https://www.palantir.com/docs/foundry/ontology/overview/
We've done this, and it works. Our setup is to have some agents that synthesize Prolog and other types of symbolic and/or probabilistic models. We then use these models to increase our confidence in LLM reasoning and iterate if there is some mismatch. Making synthesis work reliably on a massive set of queries is tricky, though.
Imagine a medical doctor or a lawyer. At the end of the day, their entire reasoning process can be abstracted into some probabilistic logic program which they synthesize on-the-fly using prior knowledge, access to their domain-specific literature, and observed case evidence.
There is a growing body of publications exploring various aspects of synthesis, e.g. references included in [1] are a good starting point.
[1] https://proceedings.neurips.cc/paper_files/paper/2024/file/8...
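A tiny probabilistic logic program can be sketched in Python to show what "probabilistic logic program" means here. The facts, probabilities and rule are hypothetical. The code uses naive possible-world enumeration under the distribution semantics that probabilistic Prolog dialects such as ProbLog build on. Real systems use far more efficient inference than enumerating every world.

```python
from itertools import product

# Hypothetical probabilistic facts, each independently true with the given probability.
PROB_FACTS = {"infection": 0.3, "allergy": 0.2}


def fever(world: dict) -> bool:
    """Rules:  fever :- infection.   fever :- allergy."""
    return world["infection"] or world["allergy"]


def query_probability(rule) -> float:
    """Sum the probabilities of all possible worlds in which the rule holds."""
    names = list(PROB_FACTS)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name, value in world.items():
            p = PROB_FACTS[name]
            weight *= p if value else 1 - p
        if rule(world):
            total += weight
    return total


print(round(query_probability(fever), 6))  # 0.44, i.e. 1 - 0.7 * 0.8
```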
The next step is: can it solve wicked problems?
https://en.wikipedia.org/wiki/Wicked_problem
> I am once again shilling the idea that someone should find a way to glue Prolog and LLMs together for better reasoning agents.
There are definitely people researching ideas here. For my own part, I've been doing a lot of work with Jason[1], a very Prolog-like logic language / agent environment, with an eye towards how to integrate it with LLMs (and "other").
Nothing specific / exciting to share yet, but just thought I'd point out that there are people out there who see potential value in this sort of thing and are investigating it.
[1]: https://github.com/jason-lang/jason
Related: LLMs trained on "A is B" fail to learn "B is A"
https://arxiv.org/abs/2309.12288
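By contrast, a symbolic knowledge base gets the reverse direction from a single rule. Here is a minimal Python sketch with hypothetical names and facts. It mirrors the Prolog clause `child_of(X, Y) :- mother_of(Y, X).`: only the forward fact is stored, and the inverse query is answered by the rule.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)

# Hypothetical knowledge base containing only the forward direction.
FACTS: Set[Fact] = {("mother_of", "mary", "tom")}
# Hypothetical inverse declaration: child_of(X, Y) :- mother_of(Y, X).
INVERSES = {"child_of": "mother_of"}


def holds(rel: str, subj: str, obj: str) -> bool:
    """True if the fact is stored directly or follows from a declared inverse."""
    if (rel, subj, obj) in FACTS:
        return True
    base = INVERSES.get(rel)
    return base is not None and (base, obj, subj) in FACTS


print(holds("child_of", "tom", "mary"))  # True, though only mother_of was stored
```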
You might find Eugene Asahara's detailed Prolog in the LLM Era series of about a dozen blog posts very useful - https://eugeneasahara.com/category/prolog-in-the-llm-era/
Prolog doesn't look like javascript or python so:
1. web devs are scared of it.
2. not enough training data?
I do remember having to wrestle to get Prolog to do what I wanted, but I haven't written any in ~10 years.
> Prolog doesn't look like javascript or python so:
Think of it this way. In Python and JavaScript you write code, and to test whether it's correct you write unit test cases.
A Prolog program is basically a bunch of unit test cases: you write them, and then tell the Prolog compiler, 'write code that passes these test cases'.
That is, you are writing the program specification, or tests that, if passed, would represent a solution to the problem. The job of the compiler is to write the code that passes these test cases.
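One consequence of the "specification" view is that a single relation can be queried in several directions. A hypothetical Python analogue of Prolog's `append/3` has to spell out the backward direction by hand: it enumerates every split satisfying the specification `front + back == whole`, much as `?- append(Front, Back, [1,2,3]).` would. In Prolog the same `append/3` definition also answers the forward query `append([1], [2,3], L)`.

```python
from typing import Iterator, List, Tuple


def append_splits(whole: List[int]) -> Iterator[Tuple[List[int], List[int]]]:
    """Enumerate every (front, back) such that front + back == whole,
    in the same order as ?- append(Front, Back, Whole) in Prolog."""
    for i in range(len(whole) + 1):
        yield whole[:i], whole[i:]


for front, back in append_splits([1, 2, 3]):
    assert front + back == [1, 2, 3]  # the specification every answer must meet
    print(front, back)
```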
It's been a while since I have done web dev, but web devs back then were certainly not scared of any language. Web devs are like the ultimate polyglots. Or at least they were. I was regularly bouncing around between a half dozen languages when I was doing pro web dev. It was web devs who popularized numerous different languages to begin with simply because delivering apps through a browser allowed us a wide variety of options.
No web dev I have ever met could use Prolog well. I think your statement about web devs being polyglots is based upon the fact that web devs chase every industry fad. I think that has a lot to do with the nature and economics of web dev work (I'm not blaming the web devs for this). I mean the best way to succeed as a webdev is to write your own version of a framework that does the same thing as the last 10 frameworks but with better buzzword marketing.
Generally speaking, all the languages they know are pretty similar to each other. Bolting on lambdas isn't the same as doing pure FP. Also, anytime a problem comes up where you would actually need a weird language based upon different math, those problems will be assigned to some other kind of developer (probably one with a really strong CS background).
I have the complete opposite view of web developers. :)
Maybe they were, but these days everything must be in JS syntax. Even when it is longer than pure CSS, they want the CSS inside JS syntax. They are only ultimate polyglots as long as all the languages are actually JS.
(Of course this is an overgeneralization, since obviously there are web developers who still remember how to do things in HTML, CSS and, of course, JS.)
This is my own recent attempt at this:
https://news.ycombinator.com/item?id=45937480
The core idea of DeepClause is to use a custom Prolog-based DSL together with a metainterpreter implemented in Prolog that can keep track of execution state and implicitly manage conversational memory for an LLM. The DSL itself comes with special predicates that are interpreted by an LLM. "Vague" parts of the reasoning chain can thus be handed off to a (reasonably) advanced LLM.
Would love to collect some feedback and interesting ideas for possible applications.
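The general idea of handing "vague" goals to an LLM from inside an interpreter that also keeps an execution trace can be sketched in Python. DeepClause's actual DSL and metainterpreter are written in Prolog. This is only a hypothetical toy: the goal encoding, the `run` function and the `llm` callback are invented for illustration, and variables and backtracking are deliberately left out.

```python
from typing import Callable, List, Set, Tuple

Goal = Tuple[str, str]  # (predicate, argument): hypothetical toy encoding


def run(goals: List[Goal], facts: Set[Goal], llm: Callable[[str, str], bool],
        vague: Set[str]) -> Tuple[bool, List[str]]:
    """Prove goals left to right, recording a trace that could serve as memory.

    Goals whose predicate is in `vague` are delegated to `llm` (a stand-in
    for a model call); all other goals must be stored facts.
    """
    trace: List[str] = []
    for pred, arg in goals:
        if pred in vague:
            ok = llm(pred, arg)
            trace.append(f"llm:{pred}({arg})={ok}")
        else:
            ok = (pred, arg) in facts
            trace.append(f"kb:{pred}({arg})={ok}")
        if not ok:
            return False, trace
    return True, trace


facts = {("open_ticket", "t42")}
fake_llm = lambda pred, arg: arg == "t42"  # hypothetical stub in place of a real model
print(run([("open_ticket", "t42"), ("sounds_urgent", "t42")], facts, fake_llm, {"sounds_urgent"}))
```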
As someone who did deep learning research 2017-2023, I agree. "Neurosymbolic AI" seems very obvious, but funding has just been getting tighter and more restrictive towards the direction of figuring out things that can be done with LLMs. It's like we collectively forgot that there's more than just txt2txt in the world.
YES! I've run a few experiments on classical logic problems, and an LLM can spit out Prolog programs to solve the puzzle. Try it yourself: ask an LLM to write some Prolog to solve some problem, then copy-paste it into https://swish.swi-prolog.org/ and see if it runs.
Wouldn’t that be a special case of neuro-symbolic programming? There is plenty of research going on.
I think prolog is the right format to codify expertise in Claude Skills. I just haven’t tested it yet.
> LLMs are bad at counting the number of r's in strawberry.
This is a tokenization issue, not an LLM issue.
Can't find the links right now, but there were some papers on LLMs generating Prolog facts and queries to ground the reasoning part. Somebody else might have them at hand.
There's a lot of work in this area. See e.g., the LoRP paper by Di et al. There's also a decent amount of work on the other side too, i.e., using LLMs to convert Prolog reasoning chains back into natural language.
I think that's what these guys are doing
https://www.symbolica.ai/
If you are looking for AGI and you understand what is going on inside it, then it is obviously not AGI.
There are people working on integrating deep learning with symbolic AI (but I don't know more).
@goblinqueen, you around?
@YeGoblynQueenne Dunno if it will ping the person
It doesn't, but I found the thread anyway :)
I've been thinking a lot about this, and I want to build the following experiment, in case anyone is interested:
The experiment is to have an LLM play plman[0] with and without Prolog's help.
plman is a Pac-Man-like game for learning Prolog, written by professor Francisco J. Gallego from Alicante University to teach the logic course in computer science.
Basically you write a solution in Prolog for a map, and plman executes it step by step so you can watch the pacman (plman) moving around the maze, eating and avoiding ghosts and other traps.
There is an interesting dynamic around finding keys for doors and timing-based traps.
There are different levels of complexity, and you can also easily write your own maps, since they are just ASCII characters in a text file.
I thought this was the perfect project to visually explain to my coworkers the limits of LLM "reasoning" and what symbolic reasoning is.
So far I have hooked up the ChatGPT API to try to solve scenarios, and it fails even with a substantial number of retries. That's what I was expecting.
The next step would be to write an MCP tool so that the LLM can navigate the problem by using the tool, but this is where I need guidance.
I'm not sure about the best setup to demonstrate the usefulness of Prolog in a way that goes beyond what context retrieval or a DB query could do.
I'm not sure whether the LLM should write the Prolog solution. I want to avoid building something trivial, like the LLM asking for steps that are already solved, so my intuition tells me I need some sort of virtual joystick MCP that hides Prolog from the LLM. The LLM would have access to the current state of the screen and could ask questions like: What would my position be if I moved up? What's the position of the ghost on the next move? Where is the door relative to my current position?
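One possible shape for the "virtual joystick" tool surface is a handful of query functions over the current screen, which an MCP server could expose as tools. Here is a minimal Python sketch. The map, its symbols (`@` player, `D` door, `G` ghost, `#` wall) and all function names are hypothetical and not plman's actual format. The answers are computed directly in Python for brevity, whereas the plan above would have Prolog compute them behind the tool.

```python
from typing import Dict, Optional, Tuple

# Hypothetical ASCII map; the symbols are illustrative, not plman's format.
MAP = [
    "#######",
    "#@..D.#",
    "#.##..#",
    "#..G..#",
    "#######",
]
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def find(symbol: str) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first cell holding `symbol`, if any."""
    for r, row in enumerate(MAP):
        c = row.find(symbol)
        if c != -1:
            return r, c
    return None


def position_after(move: str) -> Tuple[int, int]:
    """Answer 'what would my position be if I moved this way?'; walls block moves."""
    r, c = find("@")
    dr, dc = MOVES[move]
    nr, nc = r + dr, c + dc
    return (r, c) if MAP[nr][nc] == "#" else (nr, nc)


def relative(symbol: str) -> Tuple[int, int]:
    """Offset (rows, cols) from the player to the given symbol, e.g. the door."""
    pr, pc = find("@")
    tr, tc = find(symbol)
    return tr - pr, tc - pc


print(position_after("up"), position_after("down"), relative("D"))  # (1, 1) (2, 1) (0, 3)
```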
I don't have the academic background to design this experiment properly. It would be great if anyone is interested in working together on this, or could give me some advice.
Prior work pending on my reading list:
- LoRP: LLM-based Logical Reasoning via Prolog [1]
- A Pipeline of Neural-Symbolic Integration to Enhance Spatial Reasoning in Large Language Models [2]
- [0] https://github.com/Matematicas1UA/plman/blob/master/README.m...
- [1] https://www.sciencedirect.com/science/article/abs/pii/S09507...
- [2] https://arxiv.org/html/2411.18564v1
yes