← Back to context

Comment by Galanwe

7 hours ago

> You would need to test 100 or more of such puzzles, widely spread across the puzzle spectrum

Would you? I am not very knowledgable on LLMs, but my understanding was that each query was essentially a stateless inference with previous input/output as context. In such a case, a single puzzle, yielding hundreds of queries, is essentially hundreds of paths dependent but individual tests?