Comment by fc417fc802
15 hours ago
> intelligence for those specific games is baked into the harness
This is your claim, but the other commenter claims the harness consists only of generic tools. What's the reality?
I also ran into confusion about this exact issue in another subthread. I had thought that generic tooling was allowed, but others believed the benchmark was limited to ingesting the raw text directly from the API, without access to any agent environment, however generic.
1) Pointing out what tools to use is part of the intelligence that LLMs aren't great at.
2) One of the tools is a pathfinding algorithm. That's a big improvement/crutch over a regular LLM, which has no such capability.
You'd think that if LLMs were intelligent, they'd be able to determine that a pathfinding algorithm is necessary and have a sub-agent code it up real quick. But apparently they just can't do that without humans stepping in to make it a standard tool for them.
Here's the paper on what they did for the Duke Harness:
https://blog.alexisfox.dev/arcagi3
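For a sense of scale, a generic grid pathfinder of the kind being discussed can be quite small. This is only a rough sketch in Python, assuming a 4-connected grid where 0 is an open cell and 1 is a wall; the function name and grid encoding are my own assumptions for illustration, not taken from the Duke harness:

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Breadth-first search on a 4-connected grid (hypothetical helper).

    grid: list of rows, 0 = open cell, 1 = wall (assumed encoding).
    start, goal: (row, col) tuples.
    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Calling `shortest_path(grid, (0, 0), (2, 0))` on a small grid returns the cells to step through in order. The hard part in the argument above isn't writing this; it's recognizing, unprompted, that the game calls for it.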