Comment by neom

8 months ago

For fun last month I decided to see if I could build a fully functional business of agents. It's 175 Python files (employees) built up of internal roles within those files (tasks). So what I have is 175 employees who are able to pass output around to each other, understand the work, complete the work, and understand where to send the output. The whole system has the ability to do around 275 base processes (same as a business at >100MM ARR). I started on a Friday afternoon, slept a little bit, and finished on Monday afternoon. After I had it running I sent it to a VC friend to show them, and they sent back the deck of a startup that is in stealth with $25MM doing it the exact same way. With one month, a designer, and an engineer, I could have it MVP-functional for anyone to use ($40k?). Times are changing. Here is kinda how it looks: https://s.h4x.club/9ZuO4XQR / https://s.h4x.club/jkuB8ZED (I've evolved it a little since this, and if you're an engineer and look at my files and think, this guy is a moron: I know! :))
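
To make the idea concrete, here is a minimal sketch of the "employees passing output around" pattern: each employee is just a role prompt plus a routing rule. The role names, the routing table, and the `call_llm()` helper are illustrative assumptions for this sketch, not the author's actual files.

```python
# Hypothetical sketch of "agent employees" handing work to each other.
# Role names, routing, and call_llm() are illustrative assumptions.

from dataclasses import dataclass, field


def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; swap in your preferred client here."""
    return f"(LLM output for: {prompt.splitlines()[0]})"


@dataclass
class Employee:
    name: str                   # e.g. "researcher"
    role_prompt: str            # description of this role's tasks
    forwards_to: list[str] = field(default_factory=list)  # who receives the output

    def work(self, task: str) -> str:
        # Each "employee" is just its role prompt applied to the incoming work item.
        return call_llm(f"{self.role_prompt}\n\nIncoming work:\n{task}")


def run_pipeline(employees: dict[str, Employee], start: str, task: str) -> str:
    """Walk the org chart, handing each employee's output to the next one."""
    current, output = start, task
    while current:
        emp = employees[current]
        output = emp.work(output)
        # Naive routing: follow the first forward. A fuller system would let a
        # coordinator role decide where each piece of output should go next.
        current = emp.forwards_to[0] if emp.forwards_to else None
    return output


if __name__ == "__main__":
    org = {
        "researcher": Employee("researcher", "You research the market for a product idea.", ["strategist"]),
        "strategist": Employee("strategist", "You turn research into a go-to-market plan.", ["copywriter"]),
        "copywriter": Employee("copywriter", "You write launch copy from a go-to-market plan.", []),
    }
    print(run_pipeline(org, "researcher", "Launch plan for a cupcake subscription box"))
```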

> understand the work

LLMs don't understand. It's mind-boggling to me that large parts of the tech industry think that.

Don't ascribe to them what they don't have. They are fantastic at faking understanding. Don't get me wrong, for many tasks, that's good enough. But there is a fundamental limit to what all this can do. Don't get fooled into believing there isn't.

  • > LLMs don't understand. It's mind-boggling to me that large parts of the tech industry think that.

    I think you might be tied to a definition of "understanding" that doesn't really apply.

    If you prompt an LLM with ambiguous instructions, it asks you to clarify (i.e., to extend the prompt with more context), and once you do, it outputs something that exactly meets the goals of the initial prompt. Does that count as understanding?

    If it walks like a duck and quacks like a duck, it's a duck, or something so close to a duck that we'd be better off calling it that.

    • > If you prompt an LLM with ambiguous instructions, it asks you to clarify (i.e., to extend the prompt with more context)

      It does not understand that it needs clarification. This behavior is just replicated pattern-matching.

      12 replies →

    • > If it walks like a duck and quacks like a duck, it's a duck, or something so close to a duck that we'd be better off calling it that.

      Saying “LLMs match understanding well enough” is to make the same core error as saying “rote learning is good enough” in a conversation about understanding a subject.

      The issue is that they can pass the test(s), but they don't understand the work. This is the issue with a purely utilitarian measure of output.

  • I don't believe the user meant "understand" in the classical biological and philosophical sense, or was otherwise attempting to anthropomorphize the systems. They were speaking from the practical experience of "this thing takes a somewhat ambiguous input with unique constraints and implements the ask more-or-less as intended".

  • They understand. Anything able to reason about any arbitrary request and form a plan tailored to that request understands well enough to qualify for the verb. The mechanism behind it may feel hollow or fake. But if its responses reliably show understanding, the LLM understands - by any ordinary measure.

    • Rote learning is a term that exists specifically to puncture this output-oriented measure of understanding.

  • Nearly every argument like this has the same fatal flaw, and it's generally not the critique of the AI, but the critique reflected back onto humans.

    Humans also don't understand and are frequently faking understanding, which for many tasks is good enough. There are fundamental limits to what humans can do.

    The AI of a few months ago, before OpenAI's sycophancy, was quite impressive; it's less so now, which means it is being artificially stunted so more can be charged later. That means privately it is much better than what is public. I can't say it "understands," but I can say it outclasses many, many humans. There are already a number of tasks based around understanding where I would choose an LLM over a human.

    It's worth looking at Bloom's taxonomy (https://en.wikipedia.org/wiki/Bloom%27s_taxonomy). In the 2001 revised edition, the levels were renamed and reordered: Remember, Understand, Apply, Analyze, Evaluate, and Create. In my opinion it is at least human-competitive for everything but Create.

    I used to be very bearish on AI, but if you haven't had a "wow" moment when using one, then I don't think you've tried to explore what it can do or tested its limits with your own special expertise/domain knowledge, or, if you have, then I'm not sure we're using the same LLMs. Then compare that experience to normal people, not your peer groups. Compare an LLM to people into astrology, crystal healing, or homeopathy and ask which has more "understanding."

    • I do agree with you - but the big difference is that humans-who-are-faking-it tend to learn as they go, and so might, with a bit of effort, be expected to understand eventually.

      Does that actually matter? Probably not for many everyday tasks...

    • Um, moving the goalposts?

      The claim was LLMs understand things.

      The counter was, nope, they don't. They can fake it well though.

      Your argument now is, well humans also often fake it. Kinda implying that it means it's ok to claim that LLMs have understanding?

      They may outclass people in a bunch of things. That's great! My pocket calculator 20 years ago also did, and it's also great. Neither understands what they are doing though.

      8 replies →

  • meh. I feel this is just a linguistic shortcut, similar to how _trained_ biologists can talk about a species or organism evolving some trait. Of course the organism isn't _really_ evolving with any goal in mind, but that's clear to the speaker and audience. Whether or not LLMs understand (very unlikely), it's clear what we mean by an LLM "understanding": it has the context + prior training to make reasonable predictions. But no one wants to write that each time.

    • That's an interesting take and in fact one I could get behind.

      But I'm afraid that most folks using the term mean it more literally than you describe.

      1 reply →

  • They do understand, though; it's different from how it's done in our brains, but they solve tasks that would be impossible to do without understanding. I would even say that they can now reason through problems thanks to powerful reasoning models like Gemini 2.5 Pro and o3.

  • The definition of understanding is based on connecting relations. If there is one thing an LLM can do, it's connecting relations. So I am not sure why you say LLMs don't understand.

  • That's an interesting word to pick on. Understanding still means something here in a relative sense.

  • Asking a short question but in a serious way: so what?

    • You are asking why it is meaningful to use terms for what they mean instead of making things up?

      Well, I prefer it that way, but the spirit of "AI" seems to go in another direction, and the leadership of the US government does too, so maybe times are just changing.

> The whole system has the ability to do around 275 base processes

It’s incredibly easy to get LLMs to do a lot of stuff that seems convincing.

They are literally trained for plausibility.

Your message doesn't make it clear what those 175 employees can realistically accomplish on their own.

For instance, you might have an SEO expert on the team, but that alone won't guarantee top search engine rankings. There are countless SEO professionals and tools (human or AI-powered), and even having the best one doesn't eliminate the underlying challenge: business competition. LLMs, like any other tool, don’t solve that fundamental problem.

  • No employees accomplish anything on their own in the real world; all employees are part of a team. That's why I designed a business strategy and analysis layer (over half the system, in fact), with web tools and connections to all of the insights systems (like Mixpanel). I built the exact same thing I built at DigitalOcean, but instead of humans I defined them with code; DigitalOcean runs just fine, and so does my LLM system. The whole system I built is self-learning, insight-gathering, and refining. Competition is for losers; the best teams win via the best insights.

    • Why 175? Why not 5 billion employees? Why not 20,000 companies in parallel? Why not simulate 5 Earths' worth of history and set up a full universe of worlds full of startups?

      This sounds like those guys on social media who one-up each other with their bedtimes and end up saying they wake up every day at 2am to meditate and work out.

      3 replies →

Does this experiment do anything useful or does it just soak up investor money? Not that there's anything wrong with the latter.

  • The only investor is me. I built it on my own over a weekend. I just wanted to confirm it can be done and therefore will exist, that is all. Personally, I decided not to pursue it because I am old and lazy and don't want to compete against a16z- and Sequoia-funded, Adderall-filled teenagers.

Engineers who would judge someone's frontier MVP like that are not even worth worrying about.

This is epic work. Would love to see more of it but I guess you're gonna take it the startup route since you have connections. Best of luck.

  • Thanks!!! I decided not to build it; that space is already too busy. There is a startup with $25MM in stealth, and who else is in stealth? On top of that, this method will get stale very, very quickly; foundation model businesses are just too hard to work around right now, and it's a silly way to do business. My magic is that I've built a startup from scratch to over 400 people and watched what they do, and it won't be long till that isn't worth much.

Cool. What goods/services does your business provide to customers?

  • Goods and services are a byproduct of business; business is primarily concerned with systems and processes that facilitate value exchange. So my tool can work with a user to build a business, not a product or a service. If you bake cupcakes, my tool can get you 100 people at your door; it cannot open the door or provide the cakes.

Sounds really interesting, but I have no idea what the purpose of having 175 “employees” here is. Maybe it is a smart way to sell the idea that you’re going to replace 175 people if you buy the product? You could just buy ChatGPT instead, I guess, but a chatbot doesn’t sound as cool as 175 employees.

  • I would love to know how to do it another way if you have any ideas, I'm sadly not experienced or intelligent enough to think of another way to do it.

I’ve been floating around a similar set of ideas and it’s been very fun (if not all that useful yet) to build. Did you try taking it one step further, where a “recruiter” has to hire the engineers after a screening process? I wonder if this could get you even better AI engineers.

This really sounds like a “faster horse” scenario and totally misses the point of the GPs comment: why shackle yourself to modeling the way humans work?