
Comment by xiphias2

1 day ago

I'm actually excited about somebody experimenting with automated translation, but I'm afraid there will be lots of backwards compatibility issues.

I started looking at the commits, and it's basically solving the ,,tests not pass'' problem by changing the tests themselves. The real work of making it work on programs that are already deployed is only just starting.

The only silver lining I see is that the server-side JS community is, for some reason, already used to breakage all the time.

The whole idea that my RUNTIME contains code that not a single human has looked at does make me uncomfortable, but if this actually works without a ton of issues, it's pretty remarkable.

  • Don't worry, no one reviewed open source code before AI either. Basically nothing changed about the trust model.

    • The speed of the change did. This is the “climate has always been changing” argument climate deniers make. It is a true statement which is still a lie by omission. Climate deniers purposely ignore that the climate has never changed at the current rate, and AI-stans neglect to mention that before AI nobody was merging 1M+ lines of code in one go.

> I started looking at the commits, and it's basically solving the ,,tests not pass'' problem by changing the tests themselves

Not sure if these decisions were made by the LLM, but I've always felt that Claude is more prone to doing "shady stuff" like modifying tests than to finding correct solutions to problems.

GPT/Codex is more honest in this regard.

  • Yeah, Claude is very creative in finding ways of "solving" problems that go against what the user probably intended.

    Having said that, after looking at some of the test changes, they seem to be minor things, like changing timeouts, not changing the actual intended semantics of the tests. But it's too much code to review everything, so I might be completely wrong about that, and in real-world usage, even minor changes like these will cause issues.

I doubt it will end up as a stable release very soon, but I'm happy to be proven wrong. I have some skepticism about this whole rewrite: Jarred Sumner has an enormous internet following, and it feels like an ad.

  • How do you wish to define “ad”, and why does it matter? If I tell you I had lunch, okay, great. If I tell you I had a delicious Coca-Cola with my lunch, sure. If I happen to work at Coca-Cola, does that now become an ad? At what level does it become an issue? And what is the issue?

    • If you work for Coca-Cola, then yeah, there's reason to question your intent, if only because you aren't objective due to your proximity to Coca-Cola.

> solving the ,,tests not pass'' problem by changing the tests themselves

https://github.com/oven-sh/bun/pull/30412/changes/68a34bf8ed...

This is great! Just add a random sleep(1) to a test, don't worry about it, it's going to be fine!

  • On the other hand, the sleep fits the test description better: "should allow reading stdout after a few milliseconds". Even if 1 != 'a few'. It's possible the part of the commit reverted here, https://github.com/oven-sh/bun/commit/a42bf70139980c4d13cc55..., defeated the purpose of the test by removing the sleep. I don't think adding the sleep back is an example of AI cheating.

    Strange test though either way.

  • To be fair the commit message `revert proc.exited change in spawn.test.ts` suggests the sleep was there originally.

I wish I could take a look through the tests to see if anything substantial actually changed, but I can't even get GitHub to load the diffs for me.

> I started looking at the commits, and it's basically solving the ,,tests not pass'' problem by changing the tests themselves. The real work of making it work on programs that are already deployed is only just starting.

Wow. This is definitely quite something.

Can Jarred comment on whether he has read the commits, or respond to your comment? If this turns out to be correct, it has basically made me lose the small faith I had in what Bun is doing.

  • It's OK, we'll see how it goes. He and Anthropic are giving it to us for free, and nowadays just forking the old version is easy if a project needs that. Even maintenance is much easier using LLMs.

    I'm happy it's not a project I'm depending on, but a large enough project had to try this at some point so that we all can learn from how it goes.

    I think this is why Anthropic bought Bun: so that they can sell big code translation as a feature to all the banks with COBOL code that they have wanted to get rid of for a long time.

    Still, those banks / enterprises won't appreciate the number of unit test changes.

    And I agree with another comment that Codex xhigh is much better for these kinds of tasks, but it's still hard at this kind of scale.

  • Jarred has commented on this elsewhere in the thread, basically claiming the parent you replied to is outright lying: it removed no tests and did not meaningfully change annotations to reduce coverage or effectiveness. It added additional tests and made a few changes to hard-coded values due to differences in, for example, how LLVM and Zig handle stack frames.

    The MR is right there, linked at the top of this page. You can check who is telling the truth.

    That said, I don't know how anyone can actually claim to have checked. All day, the size of the MR has made the diff take too long to load, and GitHub dies. I'll have to pull it locally later to check myself.

> it's basically solving the ,,tests not pass'' problem by changing the tests themselves.

False.

0 test files were deleted. 0 pre-existing tests were skipped, todo’d, or had assertions removed. 5 new tests were added in test.skip/test.todo state to track known not-yet-fixed bugs in the port that lacked test coverage before.

The merge changed 28 test files in total.

+1,312 lines

−141 lines

Most of that +1,312 is new tests.

The depth-of-recursion tests for the TOML/JSONC parsers went from 25_000 -> 200_000 because Rust’s smaller stack frames (LLVM lifetime annotations let the optimizer reuse stack slots) mean 25k levels no longer reach the 18 MB stack limit on Windows.
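As a rough sanity check on those numbers (the 18 MB stack and the 25_000 -> 200_000 depths are taken from the comment above; the per-frame figures below are back-of-the-envelope bounds derived from them, not measured values):

```typescript
// Back-of-the-envelope bounds implied by the numbers above.
const stackBytes = 18 * 1024 * 1024; // 18 MB Windows stack

// If 25_000 recursion levels could exhaust the old stack, each
// parser frame had to be at least about:
const oldFrameLowerBound = stackBytes / 25_000; // ≈ 755 bytes/level

// For 200_000 levels to fit in the new (Rust) build, frames must
// now average under about:
const newFrameUpperBound = stackBytes / 200_000; // ≈ 94 bytes/level

console.log(Math.floor(oldFrameLowerBound), Math.floor(newFrameUpperBound)); // prints "754 94"
```

So the claim is plausible only if the Rust frames are roughly an order of magnitude smaller than the old ones, which is exactly what stack-slot reuse would buy.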

  • We're keeping this honest and chill, no worries.

    What is "most of that"?

    Why did you feel the need to produce so much detail about a single category of tests?

  • That's great!

    It's too bad you didn't structure the commits and pull requests a bit differently so that it's easier to review the exact changes, but I hope it goes well.

    For example, doing the test refactorings in a first pull request, and using something like test.xfail that first fails and then, after the merge, succeeds (while the test code itself doesn't change).

    Also, I have seen some tests getting stricter, which again is not a problem, but separating them into a different pull request would have improved reviewability significantly for a runtime that many people and companies depend on.

    I'm sorry you were downvoted by HN and your comment got marked ,,dead''; that's not the way to review things.
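The test.xfail idea suggested above (no such API in bun:test as far as I know, so this is a hypothetical standalone wrapper, not real bun:test code) would behave roughly like this:

```typescript
// Hypothetical xfail wrapper: the test body is expected to fail until
// the port lands. While it fails, the suite stays green; once it starts
// passing, the wrapper reports XPASS so the marker can be removed,
// without the test body itself ever changing across the two PRs.
async function xfail(name: string, body: () => void | Promise<void>): Promise<string> {
  try {
    await body();
  } catch {
    return `XFAIL ${name}`; // still failing, as expected pre-merge
  }
  return `XPASS ${name}: now passes, remove the xfail marker`;
}
```

Before the merge the ported behavior throws, so the test reports XFAIL and the suite stays green; after the merge the same unchanged test reports XPASS, which is the reviewer's cue that the port actually fixed it.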

In tsz [0], 100% of tests pass, yet I have a ton of bugs. I don't think any software out there is really fully tested. I'm experimenting with this idea as well. So far I've learned a ton.

I'm convinced the future of writing code is heavily LLM-assisted.

[0] https://tsz.dev