
Comment by onyx228

9 days ago

What I would appreciate much more than performance on "embarrassing LLM questions" is a method for finding such questions and for estimating, by some form of statistical sampling, how many of them exist for each LLM.
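One way the sampling idea could be made concrete, as a rough sketch only: assume questions can be drawn uniformly at random from some pool of known size, and that each answer can be graded automatically. The function names and the `pool_size` parameter are hypothetical, and the Wilson score interval is just one standard choice for bounding a proportion.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Approximate 95% Wilson score interval for the true failure rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = failures / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_failing_count(failures, trials, pool_size):
    """Scale the rate interval to a count, assuming uniform sampling from the pool."""
    low, high = wilson_interval(failures, trials)
    return low * pool_size, high * pool_size
```

With 20 failures in 100 sampled questions, this gives a failure rate somewhere between roughly 13% and 29%, which scales to a range of failing questions for the whole pool. The estimate only holds if the sample is truly random and the model has not seen the sampled questions before.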

This is difficult because LLMs quickly absorb every available corpus, so there is no telling whether the model actually improved or just wrote one more post-it note and stuck it on its monitor. This is an agency vs. replay problem.

Preventing replay attacks in data processing is simple: encrypt with fresh, never-reused key material, in the spirit of a one-time pad or the per-session keys and nonces in TLS. How can one write problems that are natural-language and explained in plain English, yet whose contents are "encrypted" so that every time an LLM reads them, they are novel to the LLM?

Perhaps a generative language model could help. Not a large language model, but something that understands grammar well enough to create problems that LLMs will be able to solve, and where the actual encoding of the puzzle is generative, much as a random string of balanced left and right parentheses can encode a computer program.

Maybe it would make sense to use a program generator that produces a random program in a simple, sandboxed language (say, I don't know, Lua), translates it to plain English for the LLM, and asks it what the outcome should be. The LLM's answer is then compared with the result of actually running the Lua program, which executes quickly.
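A minimal sketch of that loop, written in Python rather than Lua to keep it self-contained. Everything here is an assumption for illustration: the "language" is just a start value followed by add/subtract/multiply steps, and `ask_model` is a hypothetical callable standing in for whatever LLM is being tested.

```python
import random

# Each step kind maps to (English template, function that executes it).
STEPS = {
    "add": ("Add {}.", lambda acc, n: acc + n),
    "subtract": ("Subtract {}.", lambda acc, n: acc - n),
    "multiply": ("Multiply by {}.", lambda acc, n: acc * n),
}


def random_program(rng, length=4):
    """Generate a start value and a random list of (step, operand) pairs."""
    start = rng.randint(1, 9)
    steps = [(rng.choice(sorted(STEPS)), rng.randint(1, 9)) for _ in range(length)]
    return start, steps


def run(program):
    """Execute the program classically and return the final integer."""
    acc, steps = program
    for name, n in steps:
        acc = STEPS[name][1](acc, n)
    return acc


def describe(program):
    """Translate the program to plain English using fixed templates."""
    start, steps = program
    parts = [f"Start with {start}."]
    parts += [STEPS[name][0].format(n) for name, n in steps]
    parts.append("What is the final number?")
    return " ".join(parts)


def score(ask_model, trials=200, seed=None):
    """Count how often ask_model (hypothetical) disagrees with real execution."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        program = random_program(rng)
        if ask_model(describe(program)) != run(program):
            failures += 1
    return failures
```

Leaving `seed` unset draws fresh programs on every run, which is the point: a memorized answer key from a previous run does not help.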

Either way we are dealing with an "information war" scenario, which reminds me of the relevant passages in Neal Stephenson's The Diamond Age about faking statistical distributions by moving units to weird locations in Africa. Maybe there's something there.

I'm sure I'm missing something here, so please let me know if so.

I like your idea of finding the pattern of those "embarrassing LLM questions". However, I do not understand your example. What is a random program? Is it a program that compiles/executes without error but can literally do anything? Also, how do you translate a program to plain English?

  • A randomly generated program from a space of programs defined by a set of generating actions.

    A simple example is a programming language that operates only on integers, supports addition, subtraction, and multiplication, and can check for equality. You can create an unlimited number of programs of this sort. Once generated, each program evaluates in a split second. You can translate them all to English programmatically, ensuring grammatical and semantic correctness, by using a rule set that maps each program construct to English. The LLM can then provide its own evaluation of the output.

    For example:

    program:

    1 + 2 * 3 == 7

    evaluates to true in its machine-readable, non-LLM form.

    LLM-readable English form:

    Is one plus two times three equal to seven?

    The LLM will evaluate this to either true or false. You then compare its answer with what classical execution provided.

    Now take this principle, and create a much more complex system which can create more advanced interactions. You could talk about geometry, colors, logical sequences in stories, etc.
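The integer example above could be sketched roughly as follows. This is one possible design, not a definitive one: expressions are nested tuples, numbers are rendered as digits rather than words for brevity, and the English templates ("the sum of ... and ...") are chosen so that grouping is explicit and operator precedence cannot be misread. All names are hypothetical.

```python
import random

# Each operator maps to (English template, function that evaluates it).
OPS = {
    "+": ("the sum of {0} and {1}", lambda a, b: a + b),
    "-": ("the result of subtracting {1} from {0}", lambda a, b: a - b),
    "*": ("the product of {0} and {1}", lambda a, b: a * b),
}


def random_expr(rng, depth):
    """Return an int leaf or a nested tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randint(0, 9)
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def evaluate(expr):
    """Classical, non-LLM evaluation of the expression tree."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return OPS[op][1](evaluate(left), evaluate(right))


def to_english(expr):
    """Rule-based translation of the expression tree to English."""
    if isinstance(expr, int):
        return str(expr)
    op, left, right = expr
    return OPS[op][0].format(to_english(left), to_english(right))


def make_question(rng, depth=3):
    """Build a yes/no equality question and its ground-truth answer."""
    expr = random_expr(rng, depth)
    truth = evaluate(expr)
    # About half the time, claim a nearby wrong value instead of the true one.
    claimed = truth if rng.random() < 0.5 else truth + rng.choice([-2, -1, 1, 2])
    return f"Is {to_english(expr)} equal to {claimed}?", claimed == truth
```

For the expression `("+", 1, ("*", 2, 3))`, `to_english` produces "the sum of 1 and the product of 2 and 3" and `evaluate` returns 7, matching the example above.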