
Comment by Jaysobel

5 hours ago

Author here: some bonus links!

Session transcript using Simon Willison's claude-code-transcripts

https://htmlpreview.github.io/?https://gist.githubuserconten...

Reddit post

https://www.reddit.com/r/ClaudeAI/comments/1q9fen5/claude_co...

OpenRCT2!!

https://github.com/jaysobel/OpenRCT2

Project repo

https://github.com/jaysobel/OpenRCT2

Did you evaluate using screenshots or some other rendered visualization instead of the CLI? I wonder whether Claude has better visual intelligence when viewing images (lots of these in its training set) than when reading ASCII schematics (probably very few of these in the corpus).

  • Computer use and screenshots are context-intensive. Text is not. The more context you give an LLM, the dumber it gets; some people think that at 40% context utilization, the LLM starts to get into the dumb zone. That is where today's limitations lie, and it is why CLI-based tools like Claude Code are so good. And any attempt at computer use has fallen by the wayside.

    A few potential solutions come to mind. One is to use subagents to isolate the interesting bits of a screenshot and feed only a summary of them to the main agent. This would still use significantly more tokens than a text-based interface, but it could keep the LLM out of the dumb zone a little longer.

    • > And any attempt at computer use has fallen by the wayside.

      You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)

      https://news.ycombinator.com/item?id=46593022

      More to the point, though, you should be using Agents in Claude Code to limit context pollution. Agents run with their own context and return only the salient details. E.g., I have an Agent that runs "make" and returns the exit status and just the first error message, if there is one. This means the hundreds or thousands of lines of compiler output don't pollute the main Claude Code context, letting me get more builds in before I run out of context there.

  • Claude helped me immensely in getting an image converter to work. Giving it screenshots of the wrong output (many layers had unpredictable offsets that weren't supposed to be there) alongside the output I expected helped Claude understand the problems, and it fixed the bugs immediately.

  • I had tried the browser-screenshotting feature for agents in Cursor and found it wasn't very reliable: screenshots eat a lot of context, and the agent didn't have a good sense of when to use them. I didn't try it in this project, but I bet it would work in some specific cases.
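To put rough numbers on the "40% context utilization" claim from the first reply: Anthropic's vision documentation estimates an image's cost at roughly (width × height) / 750 tokens. The sketch below assumes a 200,000-token context window and a 1280×800 screenshot (both are illustrative assumptions, not figures from this thread) and counts how many screenshots fit before crossing 40%. It ignores the system prompt, tool output, and any automatic downscaling the API applies to large images.

```python
import math

def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate token cost of one image, using the (width * height) / 750
    rule of thumb from Anthropic's vision docs, rounded up."""
    return math.ceil(width_px * height_px / 750)

def screenshots_before_threshold(
    context_window: int = 200_000,  # assumed window size, in tokens
    threshold: float = 0.40,        # the "dumb zone" cutoff quoted in the thread
    width_px: int = 1280,           # assumed screenshot width
    height_px: int = 800,           # assumed screenshot height
) -> int:
    """Number of screenshots that fit before context use crosses the threshold.
    Everything else in the context (prompts, tool output) is ignored."""
    budget = int(context_window * threshold)
    return budget // image_tokens(width_px, height_px)

if __name__ == "__main__":
    print(f"~{image_tokens(1280, 800)} tokens per screenshot")
    print(f"~{screenshots_before_threshold()} screenshots before 40% of a 200k window")
```

Under these assumptions that works out to about 1,366 tokens per screenshot, or 58 screenshots before the 40% mark, which is why a screenshot-per-step loop burns through context much faster than short text observations.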
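The "make" Agent described in the second reply works because only a short summary crosses back into the main context. The same filtering can also be done deterministically. Below is a hypothetical helper (`condense_build` is an illustrative name, not part of Claude Code or the commenter's setup) that runs a build command, keeps the exit status plus the first error-looking line, and discards everything else. The "error-looking" test is a simple regex assumption; real toolchains format diagnostics differently.

```python
import re
import subprocess

# Assumed pattern for an error line, e.g. GCC/Clang "file.c:12:5: error: ..."
# or make's own "make: *** [...] Error 1". Adjust for your toolchain.
ERROR_RE = re.compile(r"\berror\b", re.IGNORECASE)

def condense_build(cmd: list[str]) -> str:
    """Run a build command and return a short summary: the exit status plus,
    if the build failed, the first error-looking line of its output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    summary = f"exit status: {result.returncode}"
    if result.returncode != 0:
        # Diagnostics usually go to stderr, so scan it before stdout.
        # (The two streams are captured separately, so their interleaving is lost.)
        lines = result.stderr.splitlines() + result.stdout.splitlines()
        first_error = next((ln for ln in lines if ERROR_RE.search(ln)), None)
        if first_error is not None:
            summary += f"\nfirst error: {first_error.strip()}"
    return summary
```

Usage would be something like `print(condense_build(["make", "-j8"]))`, run by an agent (or the main session) instead of invoking `make` directly, so the full compiler log never enters the conversation.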

> Claude is at a pretty steep visuo-spatial disadvantage,

How hard would it be to use this with OpenAI's offerings instead? IMO, OpenAI's models are better at "looking" at pictures than Claude is.