Comment by scubbo
3 days ago
I'm surprised to see people getting value from "web sandbox"-type setups, where you don't actually have access to the source code. Are folks really _that_ confident in LLMs as to entirely give up the ability to inspect the source code, or to interact with a running local instance of the service? Certainly that would be the ideal, but I'm surprised that confidence is currently running that high.
I still get the full source code back at the end: I tell it to include the code it wrote in the PR.
I also wrote my own tool to extract and format the complete transcript. It gives me back things like this, where I can see everything it did, including files and scripts it didn't commit. Here's an example: https://gistpreview.github.io/?3a76a868095c989d159c226b7622b...
Oh fascinating - so you're reviewing "your own" code in-PR, rather than reviewing it before PR submission? I can see that working! Feels weird, but I can see it being a reasonable adaptation to these tools - thanks!
What about running services locally for manual testing/poking? Do you open ports on the Anthropic VM to serve the endpoints, or is manual testing not part of your workflow?
Yeah, I generally use PRs for anything a coding agent writes for me.
If something is too fiddly to test within the boundaries of a cloud coding agent I switch to my laptop. Claude Code for web has a "claude --teleport" command for this, or I'll sometimes just do a "gh pr checkout X" to get the branch locally.
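For anyone scripting the "pull the agent's branch down locally" step, here is a minimal Python sketch around the "gh pr checkout" command mentioned above. The function names and the "123" placeholder are my own assumptions for illustration, and it assumes the GitHub CLI is installed and authenticated. It does not cover "claude --teleport".

```python
import subprocess


def checkout_command(pr: str) -> list[str]:
    """Build the `gh pr checkout` invocation for a PR number, URL, or branch name."""
    return ["gh", "pr", "checkout", str(pr)]


def checkout_pr(pr: str, repo_dir: str = ".") -> None:
    """Check out the PR's branch inside repo_dir.

    Assumes `gh` is on PATH and already authenticated; raises
    subprocess.CalledProcessError if the command fails.
    """
    subprocess.run(checkout_command(pr), cwd=repo_dir, check=True)
```

Calling checkout_pr("123") from inside a clone would switch that working tree to the PR's branch, where "123" is a hypothetical PR number.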
Yeah, the commits that Claude Code generates are co-authored by claude@anthropic.com, so I just open a PR to see the code. I have automatic per-PR dev environments for manual testing.
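If you want to pick out agent-written commits programmatically, one option is to look for the co-author trailer in the commit message. This is a rough sketch, not how any particular tool does it: the function names are made up, and the default address is simply the one quoted in the comment above, so check what your own commits actually contain.

```python
import re

# Matches trailer lines of the form "Co-authored-by: Name <email>".
TRAILER = re.compile(
    r"^Co-authored-by:[ \t]*(?P<name>.*?)[ \t]*<(?P<email>[^>]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def co_authors(message: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs for every Co-authored-by trailer in a commit message."""
    return [(m.group("name"), m.group("email")) for m in TRAILER.finditer(message)]


def is_agent_commit(message: str, agent_email: str = "claude@anthropic.com") -> bool:
    """True if any co-author trailer carries the given address (an assumed default)."""
    return any(email.lower() == agent_email.lower() for _, email in co_authors(message))
```

Feeding it the output of "git log --format=%B" for a single commit would tell you whether that commit carries the trailer.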
Right - I’m missing how you get the source code in the OP. It says you tmux in with ssh agent forwarding for GH. But you can’t do that on your iOS device? So you have to set up all your repos in the morning before leaving the house, then collect and push all your branches when you return home?
I could imagine this working for a small number of branches/changes.
The output from Jules is a PR. And then it's a toss-up between "spot on, let's merge" and "nah, needs more work, I will check out the branch and fix it properly when I am at the keyboard". And you see the current diff on the webpage while the agent is working.
Claude Code on the web, ChatGPT Codex and Google Jules are not the same as Claude, ChatGPT and Gemini. They are entire apps where you authorize GitHub access and they work via PRs.
They'll include screenshots on your PRs etc.
I like using them a lot when I can.
Right, yes, that was precisely my point - it was weird to me that people were comfortable operating on a codebase that they don't have locally, that they can't directly interact with.
> it was weird to me that people were comfortable operating on a codebase that they don't have locally, that they can't directly interact with.
I have a project where I've made a rule that no code is written by humans. It's been fun! It's a good experience to learn how far even pre-Opus 4.5 agents can be pushed.
It's pretty clear to me that in 12 months' time looking at the code will be the exception, not the rule.
When the agent pushes the PR in a branch, you can switch to that branch locally on your machine and do whatever you like: review it, change it, ask for extra modifications on top, squash it, rebase it.