Comment by adrithmetiqa

1 day ago

Super interesting but what does this mean for us mere mortals?

I got Claude to self-reference and update its own instructions to solve the problem of building a typed proxy API for any website. After a week, scores of iterations, it can reverse engineer any website. For the first few days I had to be deeply involved in each iteration loop; domain knowledge helps. Each time I saw a problem, I would ask Claude to update its instructions so it wouldn't happen again. Over time I had to step in less and less. Eventually it got to the point where it was updating itself and improving the metrics every iteration, unsupervised.

Edit: This is going to have huge ramifications for the tech security industry, as these systems will be able to break security systems as easily as it solved the proof. The sooner the good guys, if there are any left, understand this, the better it will be for everybody.

> Super interesting but what does this mean for us mere mortals?

I would go for a 2- or 3-hour walk with my phone, using the remote control feature and checking every 5-10 minutes to make sure it didn't need human help. I went to the coffee shop and drank very good coffee while listening to music. Then at night I sat and had a beer, thinking about T.S. Eliot's "The Waste Land", the effect of industrialization in England at that time, and his views on how ennui affected the aristocracy.

  • > I went to the coffee shop and drank very good coffee while listening to music. Then at night I sat and had a beer, thinking about T.S. Eliot's "The Waste Land", the effect of industrialization in England at that time, and his views on how ennui affected the aristocracy.

    Well, for those of us who aren't aristocracy already, except for the vanishingly small number of people required to oversee such processes, this is probably the closest we're going to get to it. If they don't need people to do the tech labor, we've got way more people than we need. That's a huge oversupply of tech skills, which means tech skills are rapidly becoming worthless. Glad to see how fast we're moving in our very own race to the bottom!

    • Lol, a race to the bottom where too many tech-savvy people are left unemployed while a "privileged" few get paid less and less to maintain the security of the digital tools that keep our whole digitally dependent civilization afloat?

      Sounds like a great starting plot for an interesting story.

    • I kind of feel like software engineers working on improving AI are traitors working against other SEs trying to make a living.

      However…

      I have to acknowledge that my craft of SE has been putting people out of work for decades. I myself came up with a business process improvement that directly let the company release about 20 people. I did this twice.

      So… fair play.


  • > I would go for a 2- or 3-hour walk with my phone, using the remote control feature and checking every 5-10 minutes to make sure it didn't need human help.

    That is a nightmarish scenario tbh

    • Nightmarish?! In comparison to the average person's actual job? I'm pretty sure that many people out there would sign up for a battle royale for a chance at such a job.


    • That nightmarish scenario is what T.S. Eliot was describing in "The Waste Land", which "portrays deep, existential ennui and boredom as defining symptoms of modern life following World War I."

      Later this boredom was described by the Stones, "And though she’s not really ill / There’s a little yellow pill / She goes running for the shelter of a mother’s little helper".

      It is a nightmare. Mostly what I'm thinking about while the agents are running is how bored I'm going to be. That is the joke: my deep thoughts on T.S. Eliot are about the wasteland this thing is going to create.

  • > Edit: This is going to have huge ramifications for the tech security industry, as these systems will be able to break security systems as easily as it solved the proof. The sooner the good guys, if there are any left, understand this, the better it will be for everybody.

    What can the good guys do? Fire up Claude to improve their systems? Unless you have it working fully autonomously to counteract abuse, I don't see how you can beat the "bad guys". There may be some industries where this is a solved problem (e.g. you can do all the validation server-side and religiously follow best practices to prevent and mitigate abuse), but a lot of things, like multiplayer video games, will be doomed unless they move to a "you must use a locked-down system we control" model. As someone with various hobby projects, I honestly don't find it liberating that, in addition to plain old DDoS, I'll now also have people spinning up layer 7 attacks with just their credit card. It almost makes me want to give up instead of pushing forward in a world where the worst of the worst has access to the best of the best.

    • Nothing as heavy as the above, but here's my small anecdote:

      I was putting off security updates to the npm dependencies in my personal project because migrating is a pain when the upgrade isn't trivial. It's not a critical website, but I run npm scripts locally, and Dependabot keeps flagging things.

      I told Claude Code to make a migration plan to upgrade my deps. It updated the code for breaking changes (there were API changes; not every fix is a minor-version upgrade) and replaced abandoned, unmaintained packages with newer ones or built-in Node APIs. It was all done in an hour. I even got unit tests out of it to check for regressions.

      In this case, I was able to skip the boring task of maintaining code and applying routine updates and focus on the fun feature stuff.

  • This type of slop comment is somehow worse than spam.

    >After a week, scores of iterations, it can reverse engineer any website

    Cool, let’s see the proof.

      I posted a link but don't want to spam HN more than I already have.

      It is a proof of concept. It seriously burns some tokens (~80k to ~200k), but it doesn't require AI afterward to scrape and automate a website, so if all the people at Browser Use, Browserbase, and everyone else pounding every website used it, I think the net benefit would be in the billions. I would recommend running it in isolation. Nonetheless, it works very, very well on my machine.

      > This type of slop comment is somehow worse than spam.

      Please don't be mean.


  • > I would go for a 2- or 3-hour walk with my phone, using the remote control feature and checking every 5-10 minutes

    2-3 hours "walking" while having to check in every 5-10 minutes?

    If I have to check in every 5-10 minutes, I won't taste coffee or hear that there's good music playing.

  • I have similar amounts of success (pretty good!) whether I'm standing in line at a coffee shop talking people who work for me through some action that needs to be taken, or doing the same with AI.

    However, I do not trust AI anywhere near as much as I trust the humans. The AI is super capable but also occasionally a psychopath toddler. I sat in amused astonishment when, faced with job 2 not running because job 1 was failing, Claude went into the database, changed the failure record to success, triggered job 2 (which produced harmful garbage), and then claimed victory. Only the most troubled person would even think of doing that, but Claude thought it was the best solution.

    • My work has required us all to be "AI Native". I am AI skeptical but am the type of person to try to do what is asked to the best of my ability. I can be wrong, after all.

      There is some real power in AI, for sure. But as I have been working with it, one thing is very clear. Either AI is not even close to a real intelligence (my take), or it is an alien intelligence. As I develop a system where it iterates on its own contexts, it definitely becomes probabilistically more likely to do the right thing, but the mistakes it makes become even more logic-defying. It's the coding equivalent of a hand with extra fingers.

      I'm only a few weeks into really diving in. Work has given me infinite tokens to play with. I'm building my own purely programmatic orchestrator that spawns agents to do work. Treat them as functions: defined inputs and defined outputs. Don't give an agent more than one goal. I find that giving it the goal of building a system often leads it to assert that the system works when it does not, so the verifier is a different agent. I know this is not new thinking; as I said, I am new.

      For me, the most useful way to think about it has been to consider LLMs a probabilistic programming language. It won't really error out; it'll just try to make it work. This attitude has made it fun for me again. I love learning new languages and also love writing dirty scripts that make various tasks easier.

  • That's fucking insane. Thank you for sharing.

    I had a bad feeling we were basically already there.
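The orchestrator pattern described in the thread (agents treated as functions with one goal each, defined inputs and outputs, and a separate agent acting as the verifier) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the commenter's actual system: the "agents" are hypothetical stand-in Python functions rather than real model calls, and the names `Task`, `Verdict`, `run_with_verifier`, `toy_worker`, `toy_verifier`, and `max_attempts` are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Task:
    goal: str     # exactly one goal per agent invocation
    payload: str  # defined input


@dataclass(frozen=True)
class Verdict:
    ok: bool
    feedback: str


# Hypothetical agent signatures; in a real system these would wrap model calls.
Worker = Callable[[Task, Optional[str]], str]
Verifier = Callable[[Task, str], Verdict]


def run_with_verifier(task: Task, worker: Worker, verifier: Verifier,
                      max_attempts: int = 3) -> Optional[str]:
    """Run the worker, then let a *separate* verifier judge its output.

    The worker's own claim of success is never trusted; only the verifier's
    verdict decides. Feedback from a rejected attempt is passed to the next
    attempt. Returns the accepted output, or None if every attempt failed.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        output = worker(task, feedback)
        verdict = verifier(task, output)
        if verdict.ok:
            return output
        feedback = verdict.feedback
    return None


# Toy stand-ins (hypothetical): the worker "forgets" the requirement on its
# first try and only applies it once the verifier has given feedback.
def toy_worker(task: Task, feedback: Optional[str]) -> str:
    return task.payload.upper() if feedback else task.payload


def toy_verifier(task: Task, output: str) -> Verdict:
    ok = output == task.payload.upper()
    return Verdict(ok, "" if ok else "output must be uppercase")


if __name__ == "__main__":
    print(run_with_verifier(Task("uppercase", "hello"), toy_worker, toy_verifier))
    # prints HELLO (rejected once, accepted on the retry)
```

The design choice mirrors the comment: the worker never grades its own output, so "it says it works" cannot end the loop, and a bounded retry count keeps a confused agent from spinning forever.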

My understanding is that, if confirmed, this demonstrates that AI can find novel solutions. This is a strong counterpoint to the view that generative AI is strictly limited to its training data.

Another signal that AI is still making meaningful progress.

Also that it is now good enough to make researchers faster.

Learn plumbing

  • There is no reason why the market for plumbing will get much larger than it is now (which is not very large).

    • The lowest quote I got to replace a toilet and the faucet in the kitchen (my parts, just installation) was $895 (5 quotes total). The market for trades is exploding and will keep growing, since Gen Alpha and beyond know what a screwdriver is about as well as they know what a rotary phone is (they don't know how to use either).

  • This is kind of the opposite? Man + AI > either man or AI. I'd say "learn to work with Claude" is the better lesson here.

    • For now. The term people use is "centaur", like the half-man-half-horse of mythology.

      The AI CEOs are pointing out that when chess was "solved", in that Kasparov was famously beaten by Deep Blue, there was a window of time after that event when grandmasters + computers were the strongest players. The knowledge and experience of a grandmaster paired with the search and scoring of the engines was an unbeatable pair.

      However, that was just a window in time. Eventually engines alone were capable of beating grandmaster + engine pairs. Think about that carefully. It implies something. The human involvement eventually became an impediment.

      Whether you believe this will transfer to other domains is up to you to decide.

    • Fine, but "learn to work with Claude" helps only until you stop checking it and start borrowing its confidence. Then you chase a bogus lemma for hours.

      It's like pairing with the fastest person on the team, except he is wrong often enough to cost you time and still sounds sure.

That LLMs in the middle of everything will continue until morale improves, because LLMs can generate text on top of bullshit made-up problems.