Comment by ArkhamMirror

2 days ago

It would be enough to drive most local LLMs crazy if it tried to generate everything at once or keep it all in one long session, but it's set up so the LLM never has to produce much at a time. Requests are batched in small groups (e.g. it will only generate 3 suggestions per request), the session is refreshed between calls, and the output is force-structured to fit the expected format. You can still ask for new batches of suggestions, conflicts, or evidence as many times as you like. Hallucinations can happen with any LLM use, of course, but if one breaks the expected structure the output is generally thrown out.

The matrix scoring suggestion works the same way: it appears to operate on the whole row, but behind the scenes the LLM is asked for one response in one "chat" session per column, and the values are only entered into the matrix once all of them have been individually returned. That way, if the LLM does hallucinate a score, that cell just falls back to a neutral value and doesn't corrupt any of the neighboring cells.
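To give a rough idea of the batching/validation part, here's a minimal Python sketch. It's not the actual code, and `llm_complete` is just a stand-in for whatever call hits the local model with a fresh session:

```python
import json

def llm_complete(prompt: str) -> str:
    """Placeholder for a single-shot call to the local model
    (a fresh session per request in the real setup)."""
    raise NotImplementedError

def generate_suggestions(context: str, batch_size: int = 3, max_retries: int = 2):
    """Ask for a small batch of suggestions and discard anything that
    doesn't come back as the expected JSON list of strings."""
    prompt = (
        f"{context}\n\n"
        f"Return exactly {batch_size} suggestions as a JSON array of strings."
    )
    for _ in range(max_retries + 1):
        raw = llm_complete(prompt)           # fresh session each call
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue                         # malformed output -> thrown out, retry
        if isinstance(parsed, list) and all(isinstance(s, str) for s in parsed):
            return parsed[:batch_size]
        # wrong shape -> also thrown out
    return []                                # caller can just run the batch again
```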
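And the matrix scoring is roughly this shape (reusing the `llm_complete` placeholder from the sketch above; the scale and prompt wording are just illustrative):

```python
NEUTRAL_SCORE = 0  # fallback when a cell's response can't be parsed

def score_row(row_item: str, columns: list[str]) -> dict[str, int]:
    """Score one matrix row: one independent 'chat' per column, then
    commit all cells at once so a bad response only affects its own cell."""
    scores = {}
    for col in columns:
        prompt = (
            f"Rate how strongly '{row_item}' relates to '{col}' "
            f"on a scale of -2 to 2. Reply with a single integer only."
        )
        raw = llm_complete(prompt)           # separate session per column
        try:
            value = int(raw.strip())
            if not -2 <= value <= 2:
                raise ValueError
        except ValueError:
            value = NEUTRAL_SCORE            # hallucinated/garbled -> neutral cell
        scores[col] = value
    return scores                            # written into the matrix in one go
```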

If you use a smaller model with a smaller context window, it might be more prone to hallucinations and give less nuanced suggestions, but the default model seems to handle the jobs pretty well without having to regenerate output very often (it does happen sometimes, but that just means running it again). Also, depending on the model, you might get less variety or creativity in the suggestions. It's definitely not perfect, and it definitely shouldn't be trusted to replace human judgement.