Comment by ksec
6 days ago
I think a lot of people are taking this at face value. A lot of this was possible because of the beyond-standard, extensive and comprehensive test suite previously built.
It's still an impressive achievement that would have taken even the most competent engineers an exponentially longer time to accomplish.
I just hope it's noted when this is eventually marketed how much human effort went into designing and curating the test suite that even enabled this speed in the first place.
A test suite is pretty much the ideal scenario for current-gen LLMs. A comprehensive enough test suite essentially forms the spec for agents to implement however they see fit, in this case in Rust.
In certain cases you could probably throw away the entire source code and reimplement the whole thing from scratch, just by giving an agent access to the tests, when they're as well crafted as in a project like Bun.
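To make the "test suite as spec" idea concrete, here's a minimal sketch. Everything in it is hypothetical (`SPEC`, `satisfies_spec`, and `slugify` are made-up names, nothing from Bun): the spec is just a table of input/output expectations, and any implementation that passes it is acceptable.

```rust
/// The behavioral spec, expressed purely as (input, expected output) pairs.
/// Hypothetical cases for illustration only.
const SPEC: &[(&str, &str)] = &[
    ("Hello World", "hello-world"),
    ("  trim me  ", "trim-me"),
    ("a--b", "a-b"),
    ("", ""),
];

/// Checks any candidate implementation against the spec.
/// The spec knows nothing about how the candidate is written.
fn satisfies_spec(candidate: fn(&str) -> String) -> bool {
    SPEC.iter()
        .all(|&(input, expected)| candidate(input) == expected)
}

/// One possible implementation. It could be rewritten in any style
/// (or any language) as long as the spec still passes.
fn slugify(input: &str) -> String {
    input
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|part| !part.is_empty())
        .map(|part| part.to_ascii_lowercase())
        .collect::<Vec<_>>()
        .join("-")
}

#[test]
fn slugify_meets_spec() {
    assert!(satisfies_spec(slugify));
}
```

The catch is that the spec only covers what someone thought to write down; anything not in `SPEC` is left to whoever (or whatever) writes the implementation.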
Look what it can do in 6 days!
Ignore the hundreds of thousands of hours put into the original architecture and test suite that made it possible in the first place.
This is such a bad faith argument. How long would it take a dev or a team of devs to do this with the same architecture and test suite? A hell of a lot longer than 6 days..
But what is the purpose? When you rewrite a project in another language, it's so that engineers can maintain and further develop the project better on some metrics, thanks to the advantages of the language. That doesn't hold when an LLM does the rewrite, since no one understands the code afterwards.
It's a good demonstration of capabilities, sure, but the result itself makes no sense. We'll have to figure out where these capabilities can bring a real advantage.
I disagree with calling this bad faith. For instance:
* I can give you one quarter of amazing profits, if you let me dismantle and sell all the assets of a company.
* I can give you a few years of incredible food production, if you let me strip a rainforest and plant commercial crops.
* I can give you incredibly cheap energy, if you let me mine non-renewable fossil fuels from the earth.
The context of why something is possible matters. In this case, it was possible because a very large and comprehensive test suite was seen as a necessity for specifying a successful project (managed by humans). I do not believe an LLM-coded project could ever have produced such a test suite. Here, the LLM is consuming the result of expensive human labor (the test suite) to make what is ultimately a minor variation on it (the implementation language).
> This is such a bad faith argument. How long would it take a dev or a team of devs to do this with the same architecture and test suite? A hell of a lot longer than 6 days..
A pocket calculator can also multiply numbers much faster than an engineer; that doesn't make it an engineer.
You missed the point.
People want to use stuff like this as evidence that AI can somehow write entire software systems in a few days. We saw the same shit with the "compiler" they made with a bunch of agents. Literally the only reason it's possible is the hundreds of thousands of man-hours and God knows how much money that was poured into the reference projects before the AI got anywhere near them.
To replicate this kind of thing with a greenfield project would take an absolute ton of spec work and requirements derivation, which would substantially eat into any savings from having AI generate it.
The accomplishment itself is interesting, and unlocks opportunities to do work no one would have bothered with before, but it doesn't represent what a lot of people desperately want it to.
Exactly this.
I am not sure why people sound so astounded, to be honest. This has been my frank experience of the agentic tools, both Codex and Claude, since about December.
When given the right constraints this kind of thing is entirely conceivable.
However, the important question not being answered here is: does anybody working on it have a full understanding of what has been built?
My experience having constructed similar types of projects using these tools is yes, you could do this in a week or two but now you'll have a month or two of digging through what it made, understanding what was built, and undoing critical yolo leaps of faith it made that you didn't want.
Not to mention that even attempting something like this from scratch would take hundreds of hours of spec work. I see it all day, every day in the aerospace sector. Software engineers have absolutely no idea what deriving a design document and all its associated artifacts actually looks like, and they're in for a rude surprise if the industry really does shift hard in that direction.
If this is a "beyond standard" test suite (so much so that it _uniquely_ makes this work possible compared to other projects), then how is Bun also uniquely unstable compared to other Zig programs (and so deserving of a rewrite)? If the blame lies partially with the test suite, what does this imply (if anything) about the Rust port?
Because tests validate behavior, not undefined behavior.
The thesis is that Rust makes undefined behavior less likely.
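A rough sketch of that distinction, assuming a made-up helper `read_at` (not from Bun or any real codebase): a test can only pin down behavior for the inputs it exercises. In a language without bounds checks, an out-of-range read may touch adjacent memory and still pass every in-range test. In safe Rust the same access has a defined outcome you can actually assert on.

```rust
/// Hypothetical helper: reads the byte at `index`, if there is one.
/// `slice::get` is bounds-checked, so an out-of-range index yields `None`
/// instead of reading whatever memory happens to sit past the buffer.
fn read_at(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

#[test]
fn in_range_read_returns_value() {
    // Ordinary behavior: this is what a typical test suite covers.
    assert_eq!(read_at(&[10, 20, 30], 1), Some(20));
}

#[test]
fn out_of_range_read_is_defined() {
    // The miss is reported explicitly, so it can be tested at all.
    assert_eq!(read_at(&[10, 20, 30], 3), None);
}
```

This only holds for safe Rust; code inside `unsafe` blocks can still trigger undefined behavior, so a port is not automatically free of it.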