ChatGPT Containers can now run bash, pip/npm install packages and download files

13 days ago (simonwillison.net)

As someone who's worked in support roles at tech companies and has a working familiarity with Python, but is not a software developer or engineer at all, I've found it fascinating to watch the changes.

In the last couple weeks, both Gemini and Claude have asked me, "Can I use the computer?" to answer some particular question. In both cases, my question to each was, "What computer? Mine, or do you have your own?" Here I had thought they were computers, in the vague Star Trek sense. I'm just using the free version in the browser, so I would have been surprised if it had been able to use my computer.

They had their own, and I could watch them script something up in Python to run the calculations I was looking for. It made me wonder who it was at Google/Anthropic who first figured out that the way to get LLMs to stop wetting their metaphorical pants when asked to do calculations was to give them a computer to use.

It did make me scratch my head when I was trying to prompt Nano Banana to generate something and it was as if Gemini started talking about the image generator in the third person: "The AI is getting stuck on the earlier instruction, even though we've now abandoned that approach." Felt a little "turtles all the way down" with that one!

  • You’re seeing perspectives of the distributed system from inside the system.

    I’m building multi-server, multi-agent products, and they do apparently perceive (anthropomorphizing, I know) their connected servers as other people.

    • Looking at the world, it really makes me wonder if "human" is what we want to model these machines on. It's not obvious to me what else we should choose, but working together peaceably and effectively doesn't seem to be our strongest attribute when writ large.

    • They are just predicting the next token. In human text it's more common to talk to other people than to a computer, so they end up talking to the computers as if they were people.

Giving agents Linux has compounding benefits in our experience. They're able to sort through weirdness that normal tooling wouldn't allow. For example, they can read an image, get an error back from the API, and see it wasn't the expected format. They read the magic bytes to see it was a JPEG despite being named .png, and read it correctly.
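
The check itself is tiny. A sketch with a hypothetical misnamed file (FF D8 FF is the JPEG signature; real PNGs start with 89 50 4E 47):

    $ xxd -l 3 photo.png
    00000000: ffd8 ff                                  ...
    $ file --mime-type photo.png
    photo.png: image/jpeg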

  • Matches my experience with print-on-demand workflows. I tried using vision models to validate things like ICC profiles and total ink density, but they usually just hallucinate that the file is compliant. I ended up giving the agent access to ImageMagick to run analysis directly. It’s the only reliable way to catch issues before sending files to fulfillment, otherwise you end up eating the cost of failed prints.

    • I don’t understand why you’d try to use an LLM for that step if there is already a tool that you can call to check it. Help me out.

  • > They read the magic bytes to see it was a jpeg despite being named .png, and read it correctly.

    Maybe I'm missing something, but it seems trivial to implement reading the magic bytes. I haven't tested it, but I'd expect most Linux image viewers/editors to handle misnamed files automatically, as that is almost entirely the purpose of magic bytes.

    Personally, I think Microsoft is to blame for everyone relying on file extensions too much as it was a bad idea which led to a lot of security issues.

  • I don't understand why this is something special that somebody would need some LLM slop generation for? Any human can also do this in a few seconds using normal unix tooling.

    • That's like saying 'why give people calculators when you can pull out a slide rule?'

      The whole point is that you are enabling the LLM through tool use. The prompt might be "Download all the images on the wikipedia article for 'Ascetic', and print them on my dot matrix printer (the driver of which only accepts BMPs, so convert as needed)"

      Your solution using file / curl is just one part of the potential higher level problem statement. Yes, someone could write those lines easily. And they could write the wrapper around them with only a little more difficulty. And they could add the 404 logic detection with a bit more...

      Are you arguing LLMs should only be used on 'hard' problems, and 'easy' problems (such as downloading with curl) should be done by humans? Or are you arguing LLMs should not be used for anything?

      Because I think most people would suggest humans tackle the 'hard' problems, and let the tools (LLMs) tackle the 'easy' ones.
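
      A rough sketch of what that wrapper could look like in plain unix tooling (the URL, regex, and ImageMagick usage are illustrative guesses; SVG icons would need extra handling):

        curl -s https://en.wikipedia.org/wiki/Ascetic \
          | grep -oE '//upload\.wikimedia\.org[^"]+\.(jpg|jpeg|png)' \
          | sort -u \
          | while read -r url; do
              name=$(basename "$url")
              curl -sf "https:$url" -o "$name" || continue  # -f skips 404s quietly
              magick "$name" "${name%.*}.bmp"               # ImageMagick 7; `convert` on v6
            done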

      1 reply →

    • I think you'd find that it's far from "any human" who can do this without looking anything up. I have 15 years of dev experience and couldn't do this from memory on the CLI. Maybe in C, but that's less helpful for getting stuff done!

      9 replies →

    • Well LLMs do make normal Linux tooling more accessible. I needed a video reformatted to a new aspect ratio and codec and Claude produced a rather complex set of arguments for ffmpeg that I hadn’t been able to figure out on my own.
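
      For reference, the kind of invocation it takes is something like this (a sketch, not the actual command from that session: scale-and-pad to 9:16 and re-encode with H.264):

        ffmpeg -i in.mp4 \
          -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2" \
          -c:v libx264 -crf 23 -preset medium -c:a copy out.mp4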

  • I think this is missing the point. These are tools that enable the LLM to do things that humans can do easily.

      It stops an LLM from being blocked by the inability to do this thing. Removing this barrier might enable the LLM to complete a task that would be considerable work for a human.

    For instance, identifying which files are PNG files containing pictures of birds, regardless of filename or the presence or absence of a suffix. An image-handling LLM can identify whether an image is of a bird much more easily than it can determine that an arbitrary file is a PNG. It could probably still do it, wasting a lot of tokens along the way, but using a few commands to determine which files to even bother looking at as images means the LLM can do what it is good at.
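
    A sketch of that pre-filter (directory name hypothetical): only files whose magic bytes say image/png get the expensive vision pass.

      for f in images/*; do
        # trust the magic bytes, not the suffix
        if [ "$(file --brief --mime-type "$f")" = "image/png" ]; then
          echo "$f"   # worth spending vision tokens on
        fi
      done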

Regular default ChatGPT can also now run code in Node.js, Ruby, Perl, PHP, Go, Java, Swift, Kotlin, C and C++.

I'm not sure when these new features landed because they're not listed anywhere in the official ChatGPT release notes, but I checked it with a free account and it's available there as well.

[flagged]

  • Being Russian and having heard about the horrors of war since my childhood, I always wondered how fascism, the Nazis and WWII managed to become reality in the 20th century.

    Then I witnessed the answers unfolding before my eyes in real time: torrential TV and Web propaganda, warmongering, nationalism and, worst of all, total acceptance of the unacceptable in a critically large portion of the country's population. Among the grandchildren of those who fought against the same things at the price of tens of millions of lives. Immediately after the Crimean takeover it was clear to me that there would be war. Many denied this, mocking me and calling me a tinfoil hat.

    Well, I also always used to wonder who the morons were who let things go south in Terminator, 1984, The Matrix, Cat's Cradle and other well-known dystopias: what kind of people were they, and what were they thinking?

    It doesn't really matter that these concerns are on the opposite sides of the imaginary axis.

    What really matters is this universal drive in too many people for digging their own and the next guy's graves, always finding an excuse in saying "if not us, then someone else will do it". And: "The times are different now". And: "So you're comparing AI and fascism?".

    • Yeah, sure, we get Terminators, but look at how I can ask this agent what next week's schedule is if I just connect it to everything!

    • Was there a lot of warmongering in Russia in preparation for starting the war in 2022? Because from what I saw, wars tend to pop up all of a sudden, regardless of the political climate of the country.

      5 replies →

  • Can't wait for the wget somelink/install.sh | bash install instructions to be replaced with wget somelink/install.md | claude .

  • If it's in a secured and completely isolated sandbox that gets destroyed at the end of the request, then how could it be “insecure”?

    • That “completely isolated” sandbox is connected to the internet on one end, and to an insecure human on the other.

Seems like everyone is trying to get ahead of tool calling moving people "off platform" and creating differentiators around what tools are available "locally" to the models, etc. This also takes the wind out of the sandboxing folks' sails, as it probably won't be long before the "local" tool calling can effectively do anything you'd need to do on your local machine.

I wonder when they'll start offering virtual, persistent dev environments...

  • Claude Code for the web is kind of a persistent virtual dev environment already.

    You can start a session there and chat with it to get a bunch of work done, then come back to that session a day later and the virtual filesystem is in the same state as when you left it.

    I haven't figured out if this has a time limit on it - it's possible they're doing something clever with object storage such that the cost of persisting those environments is really low, see also Fly's Sprites.dev: https://fly.io/blog/design-and-implementation/

    • It's so incredibly buggy though. I end up with hung sessions "starting claude code" every second or third time. After losing work a few times, I'm done with it. I'll check back in a few months and see if it's in better shape.

      1 reply →

  • > I wonder when they'll start offering virtual, persistent dev environments...

    A lot of companies have been wanting to move in this direction. Instead of maintaining a fleet of machines, you just get a bunch of thin clients and pay Microsoft or whoever to host the actual workloads. They already do this 'kiosk' style stuff for a lot of front-line staff.

    Honestly, not having my own local hardware for development sounds like a living hell, but it seems like the way we are going.

    • We are gonna have YOLO agents that deploy directly to a website (technically exe.dev already does that for me when I ask it to generate Golang projects lol)

      Honestly, it bores me, or maybe overwhelms me, because now I think: okay, I'll do this, then that, then that, and drastically expand the scope of the project. But that comes with its own fatigue, plus the limits of free tokens and context with exe.dev. So I end up publishing it to a git provider, running it through gitingest, pasting it into Gemini in the browser to ask for updates (it has a 1 million token context), and then pasting it into OpenCode with an OpenRouter Devstral key.

      I used this workflow to drastically improve the UI of a project, but aside from some tinkering, I felt like the "fun" of the project definitely got reduced.

      It was always fun for me to use LLMs when I was in the loop (didn't use agents, just a copy-paste workflow from the web), but now agents have kind of replicated that too, and have gotten (I must admit) pretty good at it.

      I don't know, man. Any thoughts on how to make such things fun again? When LLMs first came out, or even before, using them to create single scripts was fun, but creating whole projects with huge scope feels very fun-sucking imo.

      6 replies →

    • Coding agents are a particularly good fit for disposable development environments because of the risk of them messing things up. If the entire environment is ephemeral the worst that can happen (aside from private source code leaks to a malicious third party) is the environment gets trashed and you have to start over in a new one.

      1 reply →

  • That's what GitHub Codespaces is, and it runs Copilot too (it's just a hosted VSCode Web instance specific to your git repo)

    Google has Cloud Shell, and Google's AI Studio (https://aistudio.google.com/) gives you a web-based dev environment with Gemini integration

  • I started building something to give the Dioxus team access to persistent and ephemeral Mac/Linux dev environments with VNC and beefy CPU/memory.

    Nobody offered multiplatform and we really needed it!

    https://skyvm.dev

I wonder if the era of dynamic programming languages is over. Python/JS/Ruby/etc. were good tradeoffs when developer time mattered. But now that most code is written by LLMs, it's as "hard" for the LLM to write Python as it is to write Rust/Go (assuming enough training data on the language ofc; LLMs still can't write Gleam/Janet/CommonLisp/etc.).

Esp. with Go's quick compile time, I can see myself using it more and more even in my one-off scripts that would have used Python/Bash otherwise. Plus, I get a binary that I can port to other systems w/o problem.
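
Cross-compiling those one-off scripts is just environment variables, too; a sketch (binary and target names illustrative):

    # build a static Linux binary from any host; CGO_ENABLED=0 avoids libc deps
    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o myscript .
    # same source, ARM box instead
    CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o myscript-arm64 .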

Compiled is back?

  • > But now that most code is written by LLMs

    Am I in the Truman Show? I don’t think AI has generated even 1% of the code that I run in prod, and I don’t think it has for anyone I respect either. Heavily inspired by AI examples, heavily assisted by AI during research, sure. Who are these devs who are seeing such great success vibecoding? Vibecoding in prod seems irresponsible at best.

    • It's all over the place depending on the person or domain. If you are building a brand-new frontend, you can generate quite a lot. If you are working on an existing backend where reliability and quality are critical, it's easier to just do it yourself, maybe having LLMs write the unit tests on the code you've already verified working.

    • > Who are these devs that are seeing such great success vibecoding? Vibecoding in prod seems irresponsible at best

      AI-written code != vibecoding. I think anyone who believes they are the same is truly in danger of being left behind as AI-assisted development continues to take hold. There's plenty of space between "Claude, build me Facebook" and "I write all my code by hand".

    • I was talking to a product manager a couple weeks ago about this. His response: most managers have been vibecoding for a long time. They've just been using engineers instead of LLMs.

      2 replies →

    • FAANG here (service-oriented arch, distributed systems), and I'd say probably 20+ percent of the code written on my team is by an LLM. It's great for frontends, works well for test generation, or for following an existing paradigm.

      I think a lot of people wrote it off initially as it was low quality. But Gemini 3 Pro or Sonnet 4.5 saves me a ton of time at work these days.

      Perfect? Absolutely not. Good enough for tons of run of the mill boilerplate tasks? Without question.

      15 replies →

    • For the last 2 or 3 months we made a commitment as a team to go all in on Claude Code, and have been sharing prompts, skills, etc., and documenting all of our projects. At this point, Claude is writing a _large_ percentage of our code, probably upwards of 70 or 80%. It's also been updating our Jira tickets and GitHub PRs, which is probably even more useful than writing the code.

      Our test coverage has improved dramatically, our documentation has gotten better, our pace of development has gone up. There is also a _big_ difference between the quality of the end product between junior and senior devs on the team.

      Junior devs tend to be just like "look at this ticket and write the code."

      Senior devs are more like: okay, can you read the ticket, try to explain it to me in your own words, let's refine the description, can you propose a solution -- ugh, that's awful, what if we did this instead.

      You would think you would not save a lot of time that way, but even spending an _hour_ trying to direct claude to write the code correctly is less than the 5-6 hours it would take to write it yourself for most issues, with more tests and better documentation when you are finished.

      When you first start using claude code, it feels like you are spending more time to get worse work out of it, but once you sort of build up the documentation/skills/tools it needs to be successful, it starts to pay dividends. Last week, I didn't open an IDE _once_ and I committed several thousands lines of code across 2 or 3 different internal projects. A lot of that was a major refactor (smaller files, smaller function sizes, making things more DRY) that I had been putting off for months.

      Claude itself made a huge list of suggestions, which I knocked back to about 8 or 10; it opened a tracking issue in Jira with small, tractable subtasks, then started knocking them out one at a time, each of them being a fairly reviewable PR with lots of test coverage (the tests had been built out over the previous several months of coding with Cursor and Claude, which sort of mandated them to stop the agents from breaking functionality), etc.

      I had a coworker and ChatGPT estimate how long the issue would take if they had to do it without AI. The coworker looked at the code base and said "two weeks". Both Claude and ChatGPT estimated somewhere in the 6-8 week range (which I thought was a wild overestimate, even without AI). Claude Code knocked the whole thing out in 8 hours.

    • If you work in highly repetitive areas like web programming, I can clearly see why they're using LLMs. If you're in a more niche area, then it gets harder to use LLMs all the time.

    • There is a nice medium between full-on vibe coding and doing it yourself by hand. Coding agents can be very effective on established codebases, and nobody is forcing you to push without reviewing.

  • > But now that most code is written by LLMs, it's as "hard" for the LLM to write Python as it is to write Rust/Go

    The LLM still benefits from the abstraction provided by Python (fewer tokens and less cognitive load). I could see a pipeline working where one model writes in Python or the like, then another model is tasked with compiling it into a more performant language.

    • It works very well (in our experience, YMMV of course) to have the LLM write the prototype in Python and then port it automatically 1:1 to Rust for perf. We write prototypes in JS and Python and then they get auto-ported to Rust; we have been doing this for about a year for all our projects where it makes sense. In the past months it has been incredibly good with Claude Code; it is absolutely automatic; we run it in a loop until all tests (many handwritten in the original language) succeed.

      5 replies →

  • 100% of my LLM projects are written in Rust - and I have never personally written a single line of Rust. Compilation alone eliminates a number of 'category errors' with software - syntax, variable declaration, types, etc. It's why I've used Go for the majority of projects I've started the past ten years. But with Rust there is a second layer of guarantees that come from its design, around things like concurrency, nil pointers, data races, memory safety, and more.

    The fewer category errors a language or framework introduces, the more successful LLMs will be at interacting with it. Developers enjoy freedom and many ways to solve problems, but LLMs thrive in the presence of constraints. Frontiers here will be extensions of Rust or C-compatible languages that solve whole categories of issues through tedious language features, and especially build/deploy software that yields verifiable output and eliminates choice from the LLMs.

  • > But now that most code is written by LLMs

    Got anything to back up this wild statement?

  • > But now that most code is written by LLMs

    Is this true? It seems to be a massive assumption.

    • By lines of code, almost by an order of magnitude.

      Some of the code is janky garbage, but that's what most code is. There's no use pearl-clutching.

      Human engineering time is better spent figuring out which problems to solve than typing code token by token.

      Identifying what to work on, and why, is a great research skill to have, and I'm glad we are getting realistic technology to make that a baseline skill.

      24 replies →

  • I have certainly become Go-curious thanks to coding agents - I have a medium sized side-project in progress using Go at the moment and it's been surprisingly smooth sailing considering I hardly know the language.

    The Go standard library is a particularly good fit for building network services and web proxies, which fits this project perfectly.

    • It's funny seeing you say that, because I've gone through an entire arc from despising the design of Go, and peremptorily refusing to use it, to really enjoying it, thanks to AI coding agents being able to take care of the boilerplate for me.

      It turns out that verbosity isn't really a problem when LLMs are the ones writing the code based on more high-level markdown specs (describing logic, architecture, algorithms, concurrency, etc.), and Go's extreme simplicity, small range of language constructs, and explicitness (especially in error handling and control flow) make it much easier to quickly and accurately review agent code.

      It also means that Go's incredible (IMO) runtime, toolchain, and standard library are no longer marred by the boilerplate either, and I can begin to really appreciate their brilliance. It has me really reconsidering a lot of what I believed about language design.

      6 replies →

    • 100% check out Golang even more! I have been writing Golang AI coding projects for a really long time, because I loved writing different languages and Golang was the one I settled on.

      Golang's libraries are phenomenal, and porting over to multiple servers is pretty easy; it's really portable.

      I actually find Golang good for CLI projects, web projects, and just about everything.

      Usually the only time I still use Python (uvx) or vibe code with it is when I am manipulating images or PDFs, or building a really minimalist tkinter UI in Python/uv.

      Although I did try converting the Python to Golang code, which ended up using Fyne for the GUI projects and was surprisingly robust, I might still use Python in some niche use cases.

      Check out my other comment in here to find a vibe-coded project written in a single prompt when Gemini 3 Pro launched on the web (I hope it's not promotion, because it's open source with zero telemetry - I didn't ask for any of it to be added haha!)

      Golang is love. Golang is life.

    • > considering I hardly know the language.

      Same boat! In fact I used to (and still do) dislike Go's syntax and error handling (the same 4 lines repeated every time you call a function), but given that LLMs can write the code and do the cross-model review for me, I literally don't even see the Go source code, which is nice because I'd hate it if I did (my dislike of Go's syntax plus all the AI slop in the code would drive me nuts).

      But at the end of the day, Go has good scaffolding, the best tooling (maybe on par with Rust's, definitely better than Python even with uv), and tons of training data for LLMs. It's also a rather simple language, unlike Swift (which I wish was simpler because it's a really nice language otherwise).

  • > But now that most code is written by LLMs

    I'm sure it will eventually be true, but this seems very unlikely right now. I wish it were true, because we're in a time where generic software developers are still paid well, so doing nothing all day, with this salary, would be very welcome!

  • Has anyone tried creating a language that would be good for LLMs? I feel like what would be good for LLMs might not be the same thing that is good for humans (but I have no evidence or data to support this, just a hunch).

    • The problem with this is that the reason LLMs are so good at writing Python/Java/JavaScript is that they've been trained on a metric ton of code in those languages; they've seen the good, the bad, and the ugly, and been tuned toward the good. A new language would mean training from scratch, and if we're introducing new paradigms that are 'good for LLMs but bad for humans', humans will struggle to write good code in it, making the training process harder. Even worse, say you get a year and 500 features into that repo and the LLM starts going rogue - who's gonna debug that?

      2 replies →

    • >Has anyone tried creating a language that would be good for LLMs?

      I’ve thought about this and arrived at a rough sketch.

      The first principle is that models like ChatGPT do not execute programs; they transform context. Because of that, a language designed specifically for LLMs would likely not be imperative (do X, then Y), state-mutating, or instruction-step driven. Instead, it would be declarative and context-transforming, with its primary operation being the propagation of semantic constraints.

      The core abstraction in such a language would be the context, not the variable. In conventional programming languages, variables hold values and functions map inputs to outputs. In a ChatGPT-native language, the context itself would be the primary object, continuously reshaped by constraints. The atomic unit would therefore be a semantic constraint, not a value or instruction.

      An important consequence of this is that types would be semantic rather than numeric or structural. Instead of types like number, string, bool, you might have types such as explanation, argument, analogy, counterexample, formal_definition.

      These types would constrain what kind of text may follow, rather than how data is stored or laid out in memory. In other words, the language would shape meaning and allowable continuations, not execution paths. An example:

        @iterate: refine explanation until clarity ≥ expert_threshold

    • There are two separate needs here. One is a language that can be used for computation where the code will be discarded. Only the output of the program matters. And the other is a language that will be eventually read or validated by humans.

    • I don’t know Rust, but I use it with LLMs a lot since, unlike Python, it has fewer ways to do things, along with all the built-in checks at build time.

    • I want to create a language that allows an LLM to dynamically decide what to do.

      A non-deterministic programming language, with options to drop down into JavaScript or even C if you need to specify certain behaviors.

      I'd need to be much better at this though.

      6 replies →

  • I agree with this. Making languages geared toward human ergonomics probably won’t be a thing going forward.

    Go is positioned really well here, and Steve Yegge wrote a piece on why. The language is fast, less bloated than Python/TS, and less dogmatic than Java/Kotlin. LLMs can go wham with Go and the compiler will catch most of the obvious bugs. Faster compilation means you can iterate through a process pretty quickly.

    Also, if I need abstraction that’s hard to achieve in Go, then it better be zero-cost like Rust. I don’t write Python for anything these days. I mean, why bother with uv, pip, ty, mypy, ruff, black, and whatever else when the Go compiler and the standard tooling work better than that decrepit Python tooling? And it costs almost nothing to make my scripts faster too.

    I don’t yet know how I feel about Rust since LLMs still aren’t super good with it, but with Go, agentic coding is far more pleasurable and safer than Python/TS.

    • Python (with Qt, pyside) is still great for desktop GUI applications. My current project is all LLM generated (but mostly me-verified) Rust, wrapped in a thin Python application for the GUI, TUI, CLI, and web interfaces. There's also a Kotlin wrapper for running it on Android.

      1 reply →

  • > Python/JS/Ruby/etc. were good tradeoffs when developer time mattered.

    First I don't think this is the end of those languages. I still write code in Ruby almost daily, mostly to solve smaller issues; Ruby acts as the ultimate glue that connects everything here.

    Having said that, Ruby is on a path to extinction. That started way before AI, though, and has many different reasons; it happened to Perl before, and now Ruby is following suit. Lack of trust in Ruby Central as our divine new ruler is one (recently), after they decided to turn against the community. Soon Ruby can be renamed Suby, to indicate that Shopify is running the show now. What is interesting is that you still see articles saying "Ruby is not dead, Ruby is not dead". The sheer frequency of those articles is worrying - it's like someone trying to pitch last-minute sales right before the company goes bankrupt. The human mind is a strange thing.

    One good advantage of e. g. Python and Ruby is that they are excellent at prototyping ideas into code. That part won't go away, even if AI infiltrates more computers.

    • > One good advantage of e. g. Python and Ruby is that they are excellent at prototyping ideas into code. That part won't go away, even if AI infiltrates more computers.

      Why wouldn't they go away for prototyping? If an LLM can help you prototype in whatever language, why pick Ruby or Python?

      (This isn't a gotcha question. I primarily use python these days, but I'm not married to it).

  • I wouldn't speak so quickly for the 'uncommon' language set. I had Claude write me a fully functional typed Erlang compiler in OCaml and LLVM IR over the last two days to test some ideas. I don't know OCaml. It made the right calls about Erlang, and the result passes a fairly serious test suite, so it must've known enough OCaml and LLVM IR.

  • > But now that most code is written by LLMs...

    Pause for a moment and think through a realistic estimation of the numbers and proportions involved.

  • My intuition from using the tools broadly is that pre-baked design decisions/“architectures” are going to be very competitive on the LLM coding front. If this is accurate, language matters less than abstraction.

    Instructions files are just pre-made decisions that steer the agent. We try to reduce the surface area for nondeterminism using these specs, and while the models will get better at synthesizing instructions and code understanding, every decision we remove pays dividends in reduced token usage/time/incorrectness.

    I think this is what orgs like Supabase see: they're trying to position themselves as solutions for data storage, auth, events, etc. within the LLM coding space, and they're very successful, albeit mostly in the vibe-coder area. And look at AWS Bedrock - they've abstracted every dimension of the space into some acronym.

  • I'm not sure that LLMs are going to [completely] replace the desire for JIT, even with relatively fast compilers.

    Frameworks might go the way of the dinosaur. If an LLM can manage a lot of complex code without human-serving abstractions, why even use something like React?

    • Frameworks aren't just human-serving abstractions - they're structural abstractions that allow for performant code, or even make certain behaviours achievable at all.

      Sure, you could write a frontend without something like react, and create a backend without something like django, but the code generated by an LLM will become similarly convoluted and hard to maintain as if a human had written it.

      LLM's are still _quite_ bad at writing maintainable code - even for themselves.

  • I think you're missing the reason LLMs work: it's because they can continue predictable structures, like a human.

    The surmise that compiled languages fit that just doesn't follow. In the same way, LLMs have trouble finishing HTML because the open/close tags are too far apart.

    The language that an LLM would succeed with is one where:

    1. Context is not far apart

    2. The training corpus is wide

    3. Keywords, variables, etc. are differentiated in the training.

    4. REPL like interactivity allows for a feedback loop.

    So I think it's premature: just because compiled languages are less used because of human limitations doesn't mean the LLM will do any better with them.

  • I was also thinking this some days ago. The scaffolding that static languages provide is a good fit for LLMs in general.

    Interestingly, since we are talking about Go specifically, I never found that I was spending too much time typing... types. Obviously more than with a Python script, but never at a level where I would consider it a problem. And now, with newer Python projects using type annotations, the difference has gotten smaller.

    • > And now with newer Python projects using type annotations, the difference got smaller.

      Just FWIW, you don't actually have to put type annotations in your own code in order to use annotated libraries.

      1 reply →

  • Agree on compiled languages; wondering about Go vs Rust. Go compiles faster but is more verbose, and token cost is an important factor. Rust's famously strict compiler and general safety orientation seem like strong candidates for LLM coding. Go probably has more training data out there, though.

  • I generally use LLMs to generate Python (or TypeScript) because the quality and maintainability is significantly better than if I ask it to, for example, pump out C. They really do not perform very well outside of the most "popular" languages.

  • I’ve moved to Rust for some select projects and it’s actually been a bit easier… I converted an Electron app to Rust/Tauri… the perf improvement was massive and development was quicker. I’m rethinking the stacks I should be focused on.

  • Astronaut 1: You mean... strong static typing is an unmitigated win?

    Astronaut 2: Always has been...

  • Might as well choose a language with a much better type system than go, given how beneficial quick feedback loops are to LLM code generation.

  • > assuming enough training data

    This is a big assumption. I write a lot of Ansible, and it can’t even format the code properly, which is a pretty big deal in YAML. It’s totally brain-dead.

  • Still fewer tokens to produce with higher-level languages, and therefore less cost to maintain in the long run?

  • > LLMs still can't write Gleam

    Have you tried? I've had surprisingly good results with Gleam.

  • I love Golang, man! And I use it for the same thing too!!

    People mention Rust and how AI can write proper Rust code with the linter and so on, but trust me, AI can write some pretty good Golang code.

    Though I don't want everyone to suddenly start writing Golang code with AI, because I have been doing it for over a year and it's something I vibe with; it's my personal style. I would lose some points of uniqueness if everyone started doing the same haha!

    Man, my love for Golang runs deep. It's simple, cross-platform (usually) and compiles super fast. I "vibe code" but have faith that I can always take the code back over.

    (Self-promotion? Sorry about that: but I created a single-main.go-file Golang project with a timer/pomodoro using websockets via gorilla (single dep): https://spocklet-pomodo.hf.space/)

    So shhh, let's keep it a secret between us, shall we! ;)

    (Oh yeah! I recently created a WHMCS alternative written in Golang that hooks up to any podman/gvisor instance to build your own mini VPS with my own tmate server. Lots of glue code, but it actually generated it on the first try! It's surprisingly good. I will try to release it as open source, and I'm thinking of charging just once if people want everything set up or something custom.

    Though one minor nitpick: in Golang the complexity rises many-fold between a single-file project and anything that requires a database, from what I've seen. But Golang's pretty simple and I just LOVE it.)

    Also, AI's pretty good at niche languages too: I tried to port a vibe-coded fzf alternative from Golang to V, and I found the results really promising!

  • Or maybe someone will use an LLM to create a JIT that works so well that compiled languages will be gone.

  • > LLMs still can't write Gleam/Janet/CommonLisp/etc

    hoho - I did a 20/80 human/claude project over the long weekend using Janet: https://git.sr.ht/~lsh-0/pj/tree (dead simple Lerna replacement)

    ... but I otherwise agree with the sentiment. Go code is so simple it scrubs any creative fingerprints anyway. The Clojure/Janet/Scheme code I've seen it writing isn't _great_, but it gets the job done quickly and correctly enough for me to return to it later and golf it some.

  • > Plus, I get a binary that I can port to other systems w/o problem.

    So cross-platform vibe-coded malware is the future then?

Nice work, detective Simon! I love these “discovery” posts the most, because you can’t find this stuff documented anywhere else.

  • Absolutely. When people discover and share, there's something fun to it beyond press releases and commentary. A creative and inspiring post.

This is basically the same functionality as OpenAI Codex Web has, which, if you've not used it, you absolutely should not. What a garbage piece of software. Anthropic is eating OpenAI's lunch.

  • It's a bit different from Codex Web in that it can't open PRs against projects and can't be configured with internet access.

    It is better than Codex Web in that you can continue to chat with the agent while it's working - Claude Code for web has that too. Codex Web really needs to catch up there!

    • Codex Web actually lacks the most basic PR integration; it's so useless. Codex Web refuses to push any binary file to your PR (like images, jars, lock files, etc.). It can't check your GitHub Actions logs for failures to try to fix them. Replying to one of the PR comments to accept a fix requires replying to a different GitHub bot than the one that opens your PR. And though there's a "Secrets" configuration for adding secret vars to a Codex repo, Codex can't access them, so you can't even work around these bugs by asking Codex to make API calls. It's like nobody at the company has tried their own product.

Has Gemini lost its ability to run JavaScript and Python? I swear it could when it launched, but now it's saying it doesn't have the ability. An annoying regression when Claude and ChatGPT are so good at it.

  • This regression seems to have happened in the past few days. I suspected it was hallucinating the run and confirmed it by asking Gemini to output the current date/time. The UTC time it reported was in the future relative to my clock. Some challenging mathematics was generating wrong results. Gemini will acknowledge something is wrong if you push it to explain the discrepancies, but it can't explain why.

I wonder how long npm/pip etc. will even make sense.

Dependencies introduce unnecessary LOC and features, and that functionality is, more and more, just written by LLMs themselves. It is easier to just write the necessary functionality directly. Whether that is more maintainable or not is a bit YMMV at this stage, but I would wager it is improving.

  • What a bizarre comment. Take something like NumPy: it has a hard dependency on BLAS implementations, where numerical correctness is highly valued for accuracy and a correct implementation requires deep thinking, as does performance. It's written in a different language, again for performance, so an LLM would have to implement all of those things too. What's the utility in burning energy to regenerate all this when implementations already exist?

  • Interesting thought (I think recently more than ever it's a good idea to question assumptions) - but IMO abstractions are important as ever.

    Maybe the smallest/most convenient packages (looking at you, is-even) are obsolete, but meaningful packages still abstract a lot of complexity that IMO isn't easy to one-shot with an LLM.

    • Concretely, when you use Django, underneath you have CPython, then C, then assembly, and finally machine code. I believe LLMs have been much better trained on each layer than going end-to-end.

  • The most popular modules downloaded off pip and npm are not singular simple functions and cannot easily be rewritten by an LLM.

    Scikit-learn

    Pandas

    Polars

  • I consider packages with over 100k downloads production-tested. Sure, an LLM can roll some itself, but if many edge cases appear (which may already be handled by public packages), you will need to handle them.

    • Don't base anything on download numbers alone: not only are they easily gameable, it's enough for like 3 small companies to use a package, pushing commits individually with CI triggering on every new commit, for that number to lose any sort of meaning.

      Vanity metrics should not be used for engineering decisions.

  • At times I wonder why some TUI coding agent was written in JS/TS/Python - why not use Go if it's mostly LLM-coded anyway? But that's mostly my frustration at having to wait for npm to install a thousand dependencies instead of one executable plus some config files. There are also support libraries, like terminal UI, that differ in quality between platforms.

    • Funny because as a non-Go user, the few Go binaries I've used also installed a bunch of random stuff.

      This can be fixed in npm if you publish pre-compiled binaries but that has its own problems.

      1 reply →

  • Well, you do need to vet dependencies, and I wish there were a way to exclude purely vibe-coded dependencies that no human reviewed. But for well-established libraries, I do trust well-maintained, well-designed, human-developed code over AI slop.

    Don't get me wrong, I'm not a luddite, I use claude code and cursor but the code generated by either of those is nowhere near what I'd call good maintainable code and I end up having to rewrite/refactor a big portion before it's in any halfway decent state.

    That said with the most egregious packages like left-pad etc in nodejs world it was always a better idea to build your own instead of depending on that.

    • I've been copy-pasting small modules directly into my projects. That way I can look them over and see if they're OK and it saves me an install and possible future npm-jacking. There's a whole ton of small things that rarely need any maintenance, and if they do, they're small enough that I can fix myself. Worst case I paste in the new version (I press 'y' on github and paste the link at the top of the file so I can find it again)

  • As long as "don't roll your own crypto" is considered good advice, you'll have at least a few packages/libraries that'll need managing.

    For a decent number of relatively pedestrian tasks though, I can see it.

    • LLMs are great at the roll-your-own-crypto footgun. They will tell you to remember all these things that are important, and then ignore their own tips.

  • Tokens are expensive and downloading is cheap. I think probably the opposite is true, really, and more packages will be written specifically for LLMs to use because their api uses fewer tokens.

  • It still takes a little bit of time for an LLM to rewrite all the software in existence from scratch.

Hmm.. what's this?

> gmail (read-only)
> gmail.search_email_ids → any
> Description: Search Gmail message IDs by query/tags (read-only).

The ChatGPT app on Android disavows having this... In what context does ChatGPT get (read) access to Gmail? The desktop app?

> ChatGPT Containers can now run bash, pip/npm install packages and download files

What can go wrong? The next Linux (and BSD) worm will be a ChatGPT-based one.

Not sure if this is still working. I tried getting it to install cowsay and it ran into authentication issues. Does it work for other people?

  • I could even get it to download the Ruby cowsay gem from RubyGems and run it with some provided text. An alternative is to attach the gem to the conversation or provide a publicly available URL.

  • Can you share the transcript?

    • https://chatgpt.com/share/6977f9d7-ca94-8000-b1a0-8b1a994e58...

      The transcript doesn't show it (I think it faked it) but here's the code in the sidebar:

      > bash -lc mkdir -p /mnt/data/cowsay-demo && cd /mnt/data/cowsay-demo && npm init -y >/dev/null && npm i cowsay@latest >/dev/null && echo 'Installed cowsay version:' && node -e "console.log(require('cowsay/package.json').version)"

        npm error code E401
        npm error Incorrect or missing password.
        npm error If you were trying to login, change your password, create an
        npm error authentication token or enable two-factor authentication then
        npm error that means you likely typed your password in incorrectly.
        npm error Please try again, or recover your password at:
        npm error   https://www.npmjs.com/forgot
        npm error
        npm error If you were doing some other operation then your saved credentials are
        npm error probably out of date. To correct this please try logging in again with:
        npm error   npm login
        npm error A complete log of this run can be found in: /home/oai/.npm/_logs/2026-01-26T21_20_00_322Z-debug-0.log
      

      > Checking and overriding npm registry

      > It seems like the registry option is protected, possibly pointing to an internal OpenAI registry that requires authentication. To bypass this, I can override the registry in the command with npm i cowsay --registry=https://registry.npmjs.org/. Let's give this a try and see if it works.

      It's unclear if that helped.

      I tried again and it worked. It seems like I have to ask for it to do things "in the container" or it will just give me directions about how to do it.
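
      For what it's worth, the override it proposed uses a standard npm flag, along the lines of:

        npm install cowsay --registry=https://registry.npmjs.org/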

      1 reply →

How much compute do you get in these containers? Could I have it run whisper on an mp3 it downloads?

  • That might work! You would have to figure out how to get Whisper working in there but I'm sure that's possible with a bit of creativity concerning uploading files and maybe running a build with the available C compiler.

    It appears to have 4GB of RAM and 56 (!?) CPU cores https://chatgpt.com/share/6977e1f8-0f94-8006-9973-e9fab6d244...
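
    A minimal sketch of what you'd ask it to run, assuming the container can reach PyPI and has ffmpeg available (which openai-whisper needs); the filename is hypothetical, and pulling in torch may be the hard part:

      pip install -U openai-whisper
      whisper episode.mp3 --model tiny --output_format txt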

    • Huh...

      If people are getting this for free, or even as a subsidized offering with ChatGPT, low-end providers with their $7/year deals are a little threatened if ChatGPT provides 56 cores for free. It doesn't seem right to provide so many cores for (free??)

      Are you running this in your free account, as you mention in the blog post, Simon, or in your paid account?

      5 replies →

Wow, it can do what I could do 20 years back using Ctrl+T? The progress! Give them another 10 billion, scratch that, 20 billion, scratch that, 75 trillion. - Written by SarcastAI.

Thank you for sharing. Is there a new container for each code run, or does it stay the same for the whole conversation?

Did I miss the boat on chatgpt? Is there something more to it than the web chat interface?

I jumped on the Claude Code bandwagon and I dropped off chatgpt.

I find the chatgpt voice interface to be infuriating; it literally talks in circles and just spews summary garbage whenever I ask it anything remotely specific.

  • I still like ChatGPT for search more than Claude, though I think Claude may be catching up now. Gemini is getting good at search too (as you'd hope it would!)

  • Same experience - voice mode is dumbed down compared to text. I ended up building my own voice interface that uses the full Claude/GPT/Gemini models instead of the lobotomized voice versions. It actually handles specific requests without the "go look it up yourself" cop-out. Want to try it?

  • ChatGPT recently added additional personalization options that have made its voice chat better for me. I want it direct and professional: none of the "hey there, I'm your bro" fake stuff, etc. See Personalization under Settings.

    • Okay, I'll try that out. I was asking it to do something like summarize a balance sheet over a few years, and while the chat interface will do this, the voice interface would just tell me to go look up the specific data source; it refused to barf out numbers.

... as root?

As an infosec guy I'm going to go ahead and buy a bigger house

  • Well, either way, the infosec folks are going to have the time of their lives, printing write-ups, and lots of money, on both sides.

    I can see the sandbox escapes, remote code execution paths, exfiltration methods, and all the vibe-coded sandcastles waiting to be knocked down, because we have folks openly admitting that they do not know a single line of the code they are prompting the AI to write.

    I don't think we know the scale of the security issues we will see, given the level of hubris about AI taking care of all of the coding.

  • Any IT guy with above-average experience/knowledge should take out a huge loan as well.

    Someone will have to clean up the mess made by those creators who think they can "create" anything reliable with their ChatGPT.