Comment by scubbo

2 months ago

I'm surprised to see people getting value from "web sandbox"-type setups, where you don't actually have access to the source code. Are folks really _that_ confident in LLMs as to entirely give up the ability to inspect the source code, or to interact with a running local instance of the service? Certainly that would be the ideal, but I'm surprised that confidence is currently running that high.

simonw 2 months ago

I still get the full source code back at the end: I tell it to include the code it wrote in the PR.

I also wrote my own tool to extract and format the complete transcript. It gives me back things like this, where I can see everything it did, including files and scripts it didn't commit. Here's an example: https://gistpreview.github.io/?3a76a868095c989d159c226b7622b...

scubbo 2 months ago
Oh fascinating - so you're reviewing "your own" code in-PR, rather than reviewing it before PR submission? I can see that working! Feels weird, but I can see it being a reasonable adaptation to these tools - thanks!
What about running services locally for manual testing/poking? Do you open ports on the Anthropic VM to serve the endpoints, or is manual testing not part of your workflow?
- simonw 2 months ago
  
  Yeah, I generally use PRs for anything a coding agent writes for me.
  If something is too fiddly to test within the boundaries of a cloud coding agent I switch to my laptop. Claude Code for web has a "claude --teleport" command for this, or I'll sometimes just do a "gh pr checkout X" to get the branch locally.
  
- bakies 2 months ago
  
  Yeah, the commits that Claude Code generates are co-authored by claude@anthropic.com, so I just open a PR to see the code. I have automatic per-PR dev environments for manual testing.
  

nl 2 months ago

Claude Code on the web, ChatGPT Codex and Google Jules are not the same as Claude, ChatGPT and Gemini. They are entire apps where you authorize GitHub access and they work via PRs.

They'll include screenshots on your PRs etc.

I like using them a lot when I can.

scubbo 2 months ago
Right, yes, that was precisely my point - it was weird to me that people were comfortable operating on a codebase that they don't have locally, that they can't directly interact with.
- nl 2 months ago
  
  > it was weird to me that people were comfortable operating on a codebase that they don't have locally, that they can't directly interact with.
  I have a project where I've made a rule that no code is written by humans. It's been fun! It's a good experience to learn how far even pre-Opus 4.5 agents can be pushed.
  It's pretty clear to me that in 12 months' time looking at the code will be the exception, not the rule.
  
- memoriuaysj 2 months ago
  
  When the agent pushes the PR in a branch, you can switch to that branch locally on your machine and do whatever you like: review it, change it, ask for extra modifications on top, squash it, rebase it.
  

theptip 2 months ago

Right - I’m missing how you get the source code in the OP. It says you tmux in with ssh agent forwarding for GH. But you can’t do that on your iOS device? So you have to set up all your repos in the morning before leaving the house, then collect and push all your branches when you return home?

I could imagine this working for a small number of branches/changes.

smarx007 2 months ago

The output from Jules is a PR. And then it's a toss-up between "spot on, let's merge" and "nah, needs more work, I will check out the branch and fix it properly when I am at the keyboard". And you see the current diff on the webpage while the agent is working.

theshrike79 2 months ago

Imagine you're a billionaire with infinite resources. You'd have gofers for everything. Get a weird idea while golfing? Shoot a text or call someone and it'll get done (or just tell the assistant that's always following you).

These web agents are similar. You pull out your phone while queueing in the shop, change to your "research" repo and tell it "investigate a way to create a privacy-preserving mobile application to store barcode-based loyalty cards", hit execute and put your phone away.

When you get to it, you can check out what it did for you.

Or, if your system is set up properly, you can ask it in the same way to make changes to any project, like adding a new feature you just thought up. Get back, review the PR, maybe merge it.
