Comment by brynary
17 hours ago
Strong agreement with everything in this post.
At Qlty, we are going so far as to rewrite hundreds of thousands of lines of code to ensure full test coverage and end-to-end type checking (including database-generated types).
I’ll add a few more:
1. Zero thrown errors. These effectively disable the type checker and act as goto statements. We use neverthrow for Rust-like Result types in TypeScript (see the first sketch after this list).
2. Fast auto-formatting and linting. An AI code review is not a substitute for a deterministic, sub-100ms result that guarantees consistency. The auto-formatter is set up as a post-tool-use Claude hook.
3. Side-effect-free imports and construction. You should be able to load every code file and construct an instance of every class in your app without spawning a network connection. This is harder than it sounds, and without it you run into all sorts of trouble with the rest of this list.
4. Zero mocks and shared global state. By mocks, I mean mocking frameworks that override functions on existing types or globals. These effectively inject lies into the type checker (see the second sketch below).
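Here's a minimal sketch of the Result style from item 1. It assumes neverthrow's `ok`/`err`/`Result` API; `parsePort` is just an illustrative function, not code from our codebase:

```typescript
import { ok, err, Result } from "neverthrow";

// Instead of throwing, return a Result the type checker can see.
// parsePort is a made-up example for illustration.
function parsePort(raw: string): Result<number, string> {
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    return err(`invalid port: ${raw}`);
  }
  return ok(n);
}

// Callers have to handle both branches; nothing escapes as a thrown error.
const banner = parsePort(process.env.PORT ?? "3000").match(
  (port) => `listening on ${port}`,
  (error) => `config error: ${error}`,
);
console.log(banner);
```

The failure path shows up in the signature, so the type checker enforces handling it instead of letting an exception jump the call stack.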
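And a rough sketch of what items 3 and 4 look like together, assuming constructor injection with an in-memory fake behind a shared interface (all the names here are illustrative):

```typescript
// Nothing opens a connection at import time or in a constructor;
// the dependency arrives through a typed interface instead.
interface MailSender {
  send(to: string, body: string): Promise<void>;
}

class Notifier {
  constructor(private readonly mail: MailSender) {}

  async notify(user: string): Promise<void> {
    await this.mail.send(user, "your build finished");
  }
}

// In tests, a fake implementing the same interface stands in for the real
// SMTP-backed sender, so the type checker keeps the fake honest
// (no monkey-patching of globals or existing types).
class FakeMailSender implements MailSender {
  sent: Array<{ to: string; body: string }> = [];
  async send(to: string, body: string): Promise<void> {
    this.sent.push({ to, body });
  }
}

const fake = new FakeMailSender();
await new Notifier(fake).notify("alice@example.com");
console.log(fake.sent.length); // 1
```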
Shout out to tsgo, which has dramatically lowered our type-checking latency. As the tok/sec of models keeps going up, all the time is going to get bottlenecked on tool calls (read: type checking and tests).
With this approach we now have near 100% coverage with a test suite that runs in under 1,000ms.
A TypeScript test suite that offers 100% coverage of "hundreds of thousands" of lines of code in under 1 second doesn't pass the sniff test.
We're at 100k LOC between the tests and code so far, running in about 500-600ms. We have a few CPU intensive tests (e.g. cryptography) which I recently moved over to the integration test suite.
With no contention for shared resources and no async/IO, it's just function calls running on Bun (JavaScriptCore), which measures function-call latency in nanoseconds. I haven't measured this myself, but the internet seems to suggest JavaScriptCore function calls can run in 2 to 5 nanoseconds.
On a computer with 10 cores, fully concurrent, that would imply 10 billion nanoseconds of CPU time in one wall clock second. At 5 nanoseconds per function call, that would imply a theoretical maximum of 2 billion function calls per second.
Real world is not going to be anywhere close to that performance, but where is the time going otherwise?
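For what it's worth, here's that back-of-envelope bound written out (the 5ns figure is the optimistic estimate above, not something I've measured):

```typescript
// Rough theoretical ceiling: pure function calls, no IO, no contention.
const cores = 10;
const nsPerCorePerSecond = 1e9; // one second of CPU time per core, in ns
const nsPerCall = 5;            // optimistic JavaScriptCore call latency
console.log((cores * nsPerCorePerSecond) / nsPerCall); // 2e9, ~2 billion calls/s
```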
Hey now he said 1,000ms, not 1 second
I'm on the same page as you; I'm investing in DX, test coverage, and quality tooling like crazy.
But the weird thing is: those things have always been important to me.
And it has always been a good idea to invest in those, for my team and me.
Why am I doing this 200% now?
If you're like me, you're doing it to establish a greater level of trust in generated code. It feels easier to draw out the hard guardrails and have something fill in the middle, giving both you and the models a reference point or contract for what's "correct".
Answering myself: maybe I feel much more urgency and motivation for this in the age of AI because the effects can be felt so much more acutely and immediately.
Because a) the benefits are bigger, and b) the effort is smaller. When something gets cheaper and more valuable, do more of it.
For me it's because coworkers are pumping out horrible slop faster than ever before.