Comment by Roark66
13 hours ago
Does any of it actually work? Can you build that JS VM separately and run serious JS on it? That would be an accomplishment.
Looking at the comments and claims (I don't have the time to review a large code base just to check this claim), I get the impression _something_ was created, but none of it actually builds and no one knows what the actual plan is.
Did your process not involve recursive planning stages? These ALWAYS have big architectural errors and gotchas in my experience, unless you're doing a small toy project or something the AI has seen thousands of times already.
I find agents do pretty well once you have a human correct their bad assumptions and architectural errors. But this assumes the human has absolute understanding of what is being done, down to the tiniest component. Agents left to their own devices will discover errors only at the very end, after spending dozens of millions of tokens; then they will try the next idea they hallucinated, spend another few dozen million tokens, and so on. Perhaps after 10 iterations like this they may arrive at something fine, but more likely they will descend into hallucination hell.
This is what happens when the complexity, the size, or the novelty of the task (often a mix of all three) exceeds the capability of the agents.
The true way to success is the human-AI hybrid, but you absolutely need a human who knows their stuff.
Let me give you a small example from the systems field. The other day I wanted to design an AI observability system with the following spec:
- use existing open-source components, with none or as little custom code as possible
- ideally runs on stateless pods on an air-gapped k3s cluster (preferably reuses one of the existing DBs, but ClickHouse is acceptable)
- able to proxy OpenAI, Anthropic (both the API and Claude Max), Google (Vercel + Gemini), DeepInfra, and OpenRouter, including client auth, so it is completely transparent to the client
- reconstructs streaming responses and recognises tool calls and reasoning content; nice to have: the ability to define your own session/conversation recognition rules
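To make the "completely transparent to the client" requirement concrete, here is a minimal, hypothetical sketch of what such a pass-through proxy could look like. The provider map, route layout, and header whitelist are illustrative assumptions, not anything from the actual deployment described below:

```python
# Hypothetical sketch of a transparent LLM proxy: the client's own auth headers
# are forwarded untouched and the streamed body is relayed chunk by chunk,
# which is where response reconstruction / tool-call logging would hook in.
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

# Illustrative upstream map; the real spec covers more providers.
UPSTREAMS = {
    "openai": "https://api.openai.com",
    "anthropic": "https://api.anthropic.com",
}

@app.post("/{provider}/{path:path}")
async def proxy(provider: str, path: str, request: Request):
    base = UPSTREAMS[provider]
    # Pass the client's credentials straight through so the proxy stays transparent.
    headers = {
        k: v for k, v in request.headers.items()
        if k.lower() in ("authorization", "x-api-key", "anthropic-version", "content-type")
    }
    body = await request.body()

    async def relay():
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", f"{base}/{path}",
                                     headers=headers, content=body) as upstream:
                async for chunk in upstream.aiter_bytes():
                    # An observability layer would tee each SSE chunk here to
                    # reconstruct the full response, tool calls, reasoning, etc.
                    yield chunk

    return StreamingResponse(relay())
```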
I used Gemini 3 and Opus 4.5 for the initial planning and comparison of open-source projects that could be useful. Both converged on Helicone as supposedly the best. Only towards the very end of implementation did it turn out that Helicone has pretty much zero docs for properly setting up the self-hosted platform and tries redirecting to their web page for auth, at which point the agents immediately went into rewriting parts of the source, attempting to write their own auth and fixing imaginary bugs that were really misconfiguration.
Then another product was recommended (I forget which); upon very detailed questioning and requests to re-confirm the actual configs for multiple features that were supposedly supported, it turned out it didn't pass through auth for Claude Max.
Eventually I chose litellm + Langfuse (a combination that was turned down initially in favour of Helicone), and I needed to make a few small code changes so that Claude Max auth could be read, additional headers could be passed through, and, within a single endpoint, it could send Claude telemetry as a pure pass-through and real LLM API calls through its "models" engine (so it recognised tool calls and so on).
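For reference, the basic litellm-to-Langfuse wiring is documented in both projects; the sketch below uses litellm's Python SDK with its built-in "langfuse" success callback, with placeholder keys and a made-up self-hosted Langfuse URL. It does not reproduce the proxy-level patches for Claude Max auth or header pass-through described above.

```python
# Minimal sketch (assumptions noted above): litellm normalises provider APIs into
# one schema and its documented Langfuse callback logs every successful call.
# Requires: pip install litellm langfuse
import os
import litellm

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-..."                    # placeholder
os.environ["LANGFUSE_SECRET_KEY"] = "sk-..."                    # placeholder
os.environ["LANGFUSE_HOST"] = "http://langfuse.internal:3000"   # hypothetical self-hosted instance

litellm.success_callback = ["langfuse"]  # ship every completed call to Langfuse

response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```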