Comment by bytebach

1 year ago

I recently had a consulting gig (medical informatics) that required translating declarative English into imperative code. Direct code generation by the LLM turned out to be buggy, so I added an intermediate DSL implemented in Prolog! The prompt described the Prolog predicates it had to work with, their semantics, and the declarative goal. The resulting (highly accurate and bug-free) Prolog code was then executed to generate the conventional (Groovy) imperative code, which was in turn executed dynamically. In some hand-wavy way the logical constraints of using Prolog as an intermediate DSL seemed to keep the LLM on the straight and narrow.

4 comments

rramadass 1 year ago

This is great! I had been thinking of how to constrain and make the prompts precise so "modes of interpretation" of my prompt by the LLM could be limited. Something like how one would simplify one's language with a child to get it to comprehend and act within bounds.

Would you mind sharing more details of your approach and setup?

bytebach 1 year ago

Sure. The app selects matching patients based on demographics, disease, prior treatments and biomarkers. It also has to be able to express numeric constraints ('no more than two surgeries ..', 'at least one of the following biomarkers'). The following prompt sets up the Prolog toolkit the LLM is allowed to use. The generated Prolog conjunction is then run through a Prolog meta-interpreter that generates the matching code. Though it sounds long-winded, it is more than fast enough to generate responses to user queries in an acceptable time:

-----------------

  Now consider the following Prolog predicates:
  
    biomarker(Name, Status) where Status will be one of the following integers -
   
     Wildtype = 0
     Mutated = 1
     Methylated = 2
     Unmethylated = 3
     Amplified = 4
     Deleted = 5
     Positive = 6
     Negative = 7
   
    tumor(Name, Status) where Status will be one of the following integers if known else left unbound -
    
     Newly diagnosed = 1
     Recurrence = 2
     Metastasized = 3
     Progression = 4
    
    chemo(Name)
   
    surgery(Name)  Where Name may be an unbound variable

    other_treatment(Name)

    radiation(Name) Where Name may be an unbound variable
     
    Assume you are given predicate  atMost(T, N) where T is a compound term and N is an integer. 
    It will return true if the number of 'occurrences' of T is less than or equal to N else it will fail. 
   
    Assume you are given a predicate atLeastOneOf(L) where L is a list of compound terms. 
    It will succeed if at least one of the compound terms, when executed as a predicate, returns true.
   
    Assume you are given a predicate age(Min, Max) which will return true if the patient's age is in between Min and Max.
     
    Assume you have a predicate not(T) which returns true if predicate T evaluates false and vice versa. 
    i.e. rather than '\\+ A' use not(A).
   
   Do not implement the above helper functions.
  
   VERY IMPORTANT: Use 'atLeastOneOf()' whenever you would otherwise use ';' to represent 'OR'.  
   i.e. rather than 'A ; B' use atLeastOneOf([A, B]).
    
  EXAMPLE INPUT: 
   Patient must have recurrent GBM, methylated MGMT and wildtype EGFR. Patient must not have mutated KRAS.  
  
  EXAMPLE OUTPUT:
    tumor('gbm', 2),
    biomarker('MGMT', 2),
    biomarker('EGFR', 0),
    not(biomarker('KRAS', 1))
  
  Express the following constraints as a Prolog conjunction. 
        Do not enclose the code in a code block. Return only the Prolog code - no commentary.
        Be careful to use only the supplied constraints, do not add any: 
        
        $constraint
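
To make the helper semantics in the prompt concrete, here is a minimal Python sketch of how such a conjunction could be checked against one patient. It is not the Prolog meta-interpreter or the generated Groovy mentioned above, neither of which is shown in this thread. The Patient class, the tuple encoding of terms, the ANY wildcard (standing in for an unbound variable) and the inclusive reading of age(Min, Max) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

ANY = None  # assumption: stands in for an unbound Prolog variable


@dataclass
class Patient:
    """Hypothetical patient record; the field names are illustrative only."""
    age: int
    # Ground facts such as ("biomarker", "MGMT", 2) or ("surgery", "resection").
    # A list (not a set) so that repeated treatments are counted separately.
    facts: list = field(default_factory=list)


def occurrences(patient, term):
    """Count the patient's facts that match term, treating ANY as a wildcard."""
    return sum(
        1
        for fact in patient.facts
        if len(fact) == len(term)
        and all(t is ANY or t == f for t, f in zip(term, fact))
    )


def holds(patient, term):
    """Evaluate one term of the conjunction against the patient."""
    name, *args = term
    if name == "not":
        return not holds(patient, args[0])
    if name == "atLeastOneOf":
        return any(holds(patient, t) for t in args[0])
    if name == "atMost":
        return occurrences(patient, args[0]) <= args[1]
    if name == "age":
        # Assumption: both bounds are inclusive.
        return args[0] <= patient.age <= args[1]
    # Plain fact lookup: biomarker/2, tumor/2, chemo/1, surgery/1, ...
    return occurrences(patient, term) > 0


def matches(patient, conjunction):
    """A conjunction succeeds only if every term holds."""
    return all(holds(patient, term) for term in conjunction)
```

With the EXAMPLE OUTPUT from the prompt encoded as a list of tuples, e.g. [("tumor", "gbm", 2), ("biomarker", "MGMT", 2), ("biomarker", "EGFR", 0), ("not", ("biomarker", "KRAS", 1))], matches(patient, query) is true only when every term holds for that patient, mirroring a Prolog conjunction.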

rramadass 1 year ago

Very Neat! Appreciate your sharing this.

Just to be clear, your "EXAMPLE OUTPUT" is what is then fed to your Prolog meta-interpreter to generate executable code in some other language (you mentioned Groovy), which is what actually runs, i.e. answers the user query. Essentially, then, the context for the LLM is bounded by "pidgin Prolog" (i.e. Prolog + natural language), and user queries in natural language are submitted against it to generate valid Prolog code. This can be thought of as the logic/constraints of the Prolog inference engine in the input modulating the LLM's interpretation of the accompanying natural language, keeping it "on the straight and narrow" towards an accurate output.

I was actually thinking of using "Structured English" (https://en.wikipedia.org/wiki/Structured_English) for this, and maybe building a CASE tool using LLMs for round-trip software engineering.
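
The first stage of the pipeline summarized above (fill the `$constraint` slot, then inspect what comes back) might look roughly like the following Python sketch. Only the `$constraint` placeholder, the closing prompt lines and the predicate names come from the prompt quoted earlier in the thread; `build_prompt`, `unknown_predicates` and the idea of checking the reply against an allow-list are hypothetical additions for illustration, not something bytebach describes doing.

```python
import re
from string import Template

# Closing lines of the prompt quoted in this thread. In a real prompt the
# predicate descriptions and the worked example would precede them.
PROMPT_TAIL = Template(
    "Express the following constraints as a Prolog conjunction.\n"
    "Do not enclose the code in a code block. "
    "Return only the Prolog code - no commentary.\n"
    "Be careful to use only the supplied constraints, do not add any:\n\n"
    "$constraint\n"
)

# Predicate names the prompt allows the LLM to use.
ALLOWED = {
    "biomarker", "tumor", "chemo", "surgery", "other_treatment",
    "radiation", "atMost", "atLeastOneOf", "age", "not",
}


def build_prompt(constraint):
    """Substitute the user's natural-language constraint into the template."""
    return PROMPT_TAIL.substitute(constraint=constraint.strip())


def unknown_predicates(reply):
    """Return predicate names in the reply that are not in the toolkit.

    Hypothetical sanity check: a rough regex scan, not a Prolog parser.
    """
    # Blank out quoted atoms so their contents are never read as names.
    without_atoms = re.sub(r"'[^']*'", "''", reply)
    names = re.findall(r"([A-Za-z_]\w*)\s*\(", without_atoms)
    return sorted(set(names) - ALLOWED)
```

An empty list from unknown_predicates(reply) means the reply only uses the supplied toolkit; anything else could be rejected or retried before it reaches the meta-interpreter.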

YeGoblynQueenne 1 year ago

tumor('gbm', 2), biomarker('MGMT', 2), biomarker('EGFR', 0), not(biomarker('KRAS', 1))

Note well: this is not valid Prolog code. If you put it in a file and consult it with a Prolog interpreter you'll get multiple errors.

You could call it Prolog pseudocode but in that case you don't need Prolog-like notation. You can just state your "constraints" in natural language. Have you tried this?