Comment by vrnvu
2 years ago
One thing I’ve noticed is that many engineers, when looking for a library on GitHub, check the last commit time. They assume that the more recent the last commit, the better supported the library is.
But what about an archived project that does exactly what you need it to do, has 0 bugs, and has been stable for years? That’s like finding a hidden gem in a thrift store!
Most engineers I see nowadays will automatically discard a library that is not "constantly" updated... implying that constant updates are a good thing :)
A library can only stay static if the environment it's used in is also static. And many of the environments in which modern software is developed are anything but static; web frontends are one example where things change quite often.
A library that can stand entirely on its own might be fine if it's never updated. But e.g. a library that depends on a web frontend framework will cause trouble if it is not updated to adapt to changes in the ecosystem.
Also, even a very stable project that is "done" will receive a trickle of minor tweak PRs (often docs, tests, and cleanups) proportional to the number of its users, so the rate of change never falls to zero until the code stops being useful.
I think this is also in inverse proportion to the arcane-ness of the intended use of the code, though.
Your average MVC web framework gets tons of these minor contributors, because it's easy to understand MVC well enough to write docs or tests for it, or to clean up the code in a way that doesn't break it.
Your average piece of system software gets some. The Linux kernel gets a few.
But ain't nobody's submitting docs/tests/cleanups for an encryption or hashing algorithm implementation. (In fact, AFAICT, these are often implemented exactly once, as a reference implementation that does things in the same weird way — using procedural abstract assembler-like code, or transpiled functional code, or whatever — that the journal paper describing the algorithm did; and then not a hair of that code is ever touched again. Not to introduce comments; not to make the code more testable; definitely not to refactor things. Nobody ever reads the paper except the original implementor, so nobody ever truly understands what parts of the code are critical to its functioning / hardening against various attacks, so nobody can make real improvements. So it just sits there.)
I disagree. Tiny libraries can be fine indefinitely. For example, this little library, which inverts a promise in JavaScript.
I haven’t touched this in years and it still works fine. I could come in and update the version of the dependencies but I don’t need to, and that’s a good thing.
https://github.com/josephg/resolvable
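To make the "inverted promise" idea concrete, here is a minimal sketch of the pattern in TypeScript (my illustration, not necessarily the exact API of josephg/resolvable): a promise whose resolve/reject handles are handed back to the caller instead of being trapped inside the executor.

    // Sketch of an externally-resolvable ("inverted") promise.
    function resolvable<T>(): {
      promise: Promise<T>;
      resolve: (value: T) => void;
      reject: (err: unknown) => void;
    } {
      let resolve!: (value: T) => void;
      let reject!: (err: unknown) => void;
      const promise = new Promise<T>((res, rej) => {
        resolve = res;
        reject = rej;
      });
      return { promise, resolve, reject };
    }

    // Usage: hand the promise out now, settle it later from unrelated code.
    const { promise, resolve } = resolvable<string>();
    promise.then((msg) => console.log(msg));
    resolve("done");

A few dozen lines like this have no external dependencies and no reason to ever change, which is the point.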
> so the rate of change never falls to zero until the code stops being useful
Non-useful software changes all the time ;) Also, useful software stands still all the time, without any proposed changes.
Even if the environment it's used in is static, the world it lives in is not.
I work in industrial automation, which is a slow-moving behemoth full of $20M equipment that gets commissioned once and then runs for decades. There's a lot of it still controlled with Windows 98 PCs, VB6 messes, and PXI cards from the 90s, and even more that uses SLC500 PLCs.
But when retrofitting these machines or building new ones, I'll still consider the newness of a tool or library. Modern technology is often lots more performant, and manufacturers typically support products for date-on-market plus 10 years.
There's definitely something to be said for sticking with known good products, but even in static environments you may want something new-ish.
As someone who migrated a somewhat old project to one that uses a newer framework, I agree with this. The amount of time I spent trying to figure out why an old module was broken before realizing that one of its dependencies had moved to ESM while the module itself was still on CJS... I don't even want to think about it. Better to just make sure that a module was written or updated within the last 3 years, because that will almost certainly work.
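For the record, the typical shape of that breakage looks something like this (illustrative sketch; the package name is made up):

    // An old CommonJS module, untouched for years:
    const dep = require("some-esm-only-pkg"); // hypothetical package name
    // After that dependency went ESM-only, this line throws at runtime:
    //   Error [ERR_REQUIRE_ESM]: require() of ES Module ... not supported.
    // Usual fixes: pin the last CJS release of the dependency, or switch to
    //   const dep = await import("some-esm-only-pkg");
    // and make the caller async (taking care that the build doesn't
    // transpile the dynamic import back into a require()).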
This is a very strange example. Browsers have fantastic backwards compatibility. You can use the same libraries and framework you used ten years ago to make a site and, with very few exceptions, it will work perfectly fine in a modern browser.
Browsers have decent backwards compatibility for regular webpages, but there’s a steady stream of breakage when it comes to more complex content, like games. The autoplay policy changes from 2017-2018, the SharedArrayBuffer trainwreck, gating more and more stuff behind secure contexts, COOP/COEP or other arbitrary nonsense... all this stuff broke actual games out in the wild. If you made one with tools from 10 years ago you would run into at least a couple of these.
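As a concrete example of the SharedArrayBuffer fallout: a page only gets SharedArrayBuffer back in a cross-origin-isolated context, which means the server now has to send two extra headers that nothing built ten years ago knew about. A minimal Node sketch (hypothetical server, just to show the headers):

    import http from "node:http";
    import { readFileSync } from "node:fs";

    // Without these two headers, modern browsers keep the page
    // non-cross-origin-isolated and SharedArrayBuffer is unavailable,
    // which silently breaks older games that assumed it just existed.
    http.createServer((req, res) => {
      res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
      res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
      res.setHeader("Content-Type", "text/html");
      res.end(readFileSync("index.html"));
    }).listen(8080);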
Browsers themselves aren't usually the problem. While sometimes they make changes, like what APIs are available without HTTPS, I think you're right about their solid backwards compatibility.
What people really mean when they talk about the frontend is the build system that gets your (modern, TypeScript) source code into (potentially Safari) browsers.
Chrome is highly backwards compatible. Webpack, not so much.
This build system churn goes hand-in-hand with framework churn (e.g. going from Vue 2 to 3, while the team have put heaps of effort into backwards compatibility, is not issue-free), and more recently, the rise of TypeScript and the way the CJS to ESM transition has been handled by tools (especially Node).
The problem arises when you're not using old libraries and frameworks. You're using new stuff, and come across an old, unmaintained library you'd like to use.
Hey, it uses the same frameworks you're using --- except, oh, ten years ago.
Before you can use it, you have to get it working with the versions of those frameworks you're using today.
Someone did that already before you. They sent their patch to the dead project, but didn't get a reply, so nobody knows about it.
Yeah, but there are still thousands (hundreds of thousands?) of games on Newgrounds that you can no longer play without running a separate Flash player
You absolutely can do that, but it is likely the final output will have numerous exploitable vulnerabilities.
>> web frontends are one example where things change quite often.
There is a world of difference between linux adding USB support and how web front ends have evolved. One of them feels like they are chasing the latest shiny object...
A VM with a fixed spec, like the JVM, can delegate OS churn and the like to the VM maintainers and thus protect the authors of managed code.
Does it help the dependency ecosystem churn? No.
Until we get very fine-grained API versioning info (at method/function granularity, say; and even then, is it good enough, and what OSS author could maintain that info for anything but a small API?), library version info will simply remain a coarse-grained thing.
If only there were a super smart AI with great breadth of knowledge and the capability to infer this relationship graph, but I don't think there's a lot of research into AIs like that these days, right?
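To make "fine-grained API versioning info" concrete, here is a purely hypothetical sketch of what function-level metadata could look like if libraries shipped it; nothing like this is standardized today.

    // Hypothetical: a per-symbol compatibility map shipped alongside the
    // package manifest, so tools could check only the symbols a consumer
    // actually imports instead of the whole package version.
    interface ApiEntry {
      addedIn: string;             // version that introduced the symbol
      lastBreakingChange: string;  // last version that broke its callers
    }

    const apiSurface: Record<string, ApiEntry> = {
      "parse":      { addedIn: "1.0.0", lastBreakingChange: "1.0.0" },
      "parseAsync": { addedIn: "2.3.0", lastBreakingChange: "3.0.0" },
    };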
Even though it’s not strictly true, checking for recent updates is an excellent heuristic. I don’t know the real numbers, but I feel confident that in the overwhelming majority of cases, no recent activity means “abandoned”, not “complete and bug free”.
I remember seeing a bunch of graphs which showed how programming languages have changed over time, and how much of the original code is still there.
It showed that some languages were basically nothing like their 1.0 versions, while others had retained most of the code ever written and only added stuff on top.
In the end, it seems to also be reflected in the community and ecosystem. I remember Clojure being close to or at the top of the list, as the language hardly makes breaking changes anymore, so libraries that last changed 5 years ago still run perfectly well in the current version of the language.
I guess it helps that it's lisp-like as you can extend the core of the language without changing it upstream, which of course also comes with its own warts.
But one great change it made for me is that I stopped thinking that "freshness" equals "greatness". It's probably more common these days that I use libraries that basically stopped changing years ago than libraries created in the last year. And without major issues.
Depends on the language.
Some languages have releases every year or two where they will introduce some new, elegant syntax (or maybe a new stdlib ADT, etc) to replace some pattern that was frequent yet clumsy in code written in that language. The developer communities for these languages then usually pretty-much-instantly consider use of the new syntax to be "idiomatic", and any code that still does things the old, clumsy way to need fixing.
The argument for making the change to any particular codebase is often that, relative to the new syntax, the old approach makes things more opaque and harder to maintain / code-review. If the new syntax existed from the start, nobody would think the old approach was good code. So, for the sake of legibility to new developers, and to lower the barrier to entry to code contributions, the code should be updated to use the new syntax.
If a library is implemented in such a language, and yet it hasn't been updated in 3+ years, that's often a bad sign — a sign that the developer isn't "plugged into" the language's community enough to keep the library up-to-date as idiomatic code that other developers (many of whom might have just learned the language in its latest form from a modern resource) can easily read. And therefore that the developer maybe isn't interested in receiving external PRs.
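A small JavaScript/TypeScript illustration of that kind of churn (my example; the comment above doesn't name a specific language):

    // Before ES2020, this was the idiomatic way to reach into nested,
    // possibly-missing data:
    function getCityOld(user: any): string {
      return (user && user.address && user.address.city) || "unknown";
    }

    // Once optional chaining (?.) and nullish coalescing (??) landed,
    // reviewers quickly started treating the old pattern as code to "fix":
    function getCityNew(user: any): string {
      return user?.address?.city ?? "unknown";
    }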
I wonder if anyone ever took it scientifically and A/B tested it on a codebase. A community is fine all these years before a change, but afterwards all that instantly becomes a bad practice and loses legibility. I’m confident that it mostly gets done not for any objective result, but because most developers are anxious perfectionists in need of a good therapist. And that’s plague-level contagious. Some people get born into this and grow up being sick.
By zero bugs do you mean zero GitHub issues? Because zero GitHub issues could mean that there are security vulnerabilities but no one is reporting them because the project is marked as abandoned.
> By zero bugs do you mean zero GitHub issues?
Or, the library just has zero bugs. It's possible, although probably pretty uncommon :)
> But what about an archived project that does exactly what you need it to do, has 0 bugs, and has been stable for years? That’s like finding a hidden gem in a thrift store!
Either the library is so trivial that I could just implement it myself anyway, which avoids the maintenance and licensing questions entirely, or it's unmaintained and has bugs that will never be fixed upstream, so now I need to fork and fix it, taking on a licensing burden in addition to the maintenance one.
Bugs happen all the time for mundane reasons. A transitive dependency updated and now an API has a breaking change but the upstream has security fixes. Compilers updated and now a weird combination of preprocessor flags causes a build failure. And so on.
The idea that a piece of software that works today will work tomorrow is a myth for anything non-trivial, which is why checking the history is a useful smell test.
Consider an at-the-time novel hashing algorithm, e.g. Keccak.
• It's decidedly non-trivial — you'd have to 1. be a mathematician/cryptographer, and then 2. read the paper describing the algorithm and really understand it, before you could implement it.
• But also, it's usually just one file with a few hundred lines of C that just manipulates stack variables to turn a block of memory into another block of memory. Nothing that changes with new versions of the language. Nothing that rots. Uses so few language features it would have compiled the same 40 years ago.
Someone writes such code once; nobody ever modifies it again. No bugs, unless they're bugs in the algorithm described by the paper. Almost all libraries in HLLs are FFI wrappers for the same one core low-level reference implementation.
In practice, this code will use a variety of target-specific optimizations or compiler intrinsics blocked behind #ifdefs that need to be periodically updated or added for new targets and toolchains. If it refers to any kind of OS-specific APIs (like RNG) then it will also need to be updated from time to time as those APIs change.
That's not to say that code can't change slowly; it's just that code that never changes at all is extremely rare in practice.
Keccak is perhaps not the best example to pick. https://mouha.be/sha-3-buffer-overflow/
I submit math.js and numeric.js. math.js has an incredibly active community and all sorts of commits; numeric.js is one file of JavaScript and hasn't had an update in eight years. If you want to multiply two 30-by-30 matrices, numeric.js works just fine in 2023 and is literally 20 times faster.
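Roughly the comparison being described, as a sketch (API names from memory, so double-check each library's docs; the speed claim is the commenter's, not mine):

    const numeric = require("numeric"); // one file, last updated years ago
    const math = require("mathjs");     // actively maintained

    // Two 30x30 matrices of random numbers as plain nested arrays.
    const A = Array.from({ length: 30 }, () =>
      Array.from({ length: 30 }, () => Math.random()));
    const B = Array.from({ length: 30 }, () =>
      Array.from({ length: 30 }, () => Math.random()));

    console.time("numeric.dot");
    numeric.dot(A, B);        // matrix product via numeric.js
    console.timeEnd("numeric.dot");

    console.time("math.multiply");
    math.multiply(A, B);      // matrix product via math.js
    console.timeEnd("math.multiply");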
I'm checking the zlib changes file [1] and there are regular gaps of years between versions (but there are times where there are a few months between versions). zlib is a very stable library and I doubt the API has changed all that much in 30 years.
[1] https://www.zlib.net/ChangeLog.txt
Most engineers have probably been bitten in the ass by versioned dependencies conflicting with each other.
And the other way, too, with the underlying language's changes making the library stop working.
It's just really unlikely that a project stays working without somewhat-frequent updates.
Good point. I have also seen Great Endeavor 0.7.1 stay at 0.7.1 because the author gave up, or graduated, or got hired, and the repo just sits there incomplete, lacking love and any explanation for its abandonment.
The Haskell community has a lot of these kinds of libraries. It comes with the territory to some extent.
The GHC project churns out changes at a quite high rate though. The changes are quite small by themselves, but they add up and an abandoned Haskell project is unlikely to be compilable years later.
That's good insight.
One disadvantage of archived repos is that you can't submit issues. For this reason it is hard to assess how bug free the package is. My favorite assessment metric is how long it takes the maintainer(s) to address issues and PRs (or at least post a reply). Sure, it is not perfect and we shouldn't expect all maintainers to be super responsive, but it usually works for me.
Last commit time is a pretty good indicator that the project has someone who still cares enough to regularly maintain it.
I have some projects I consider finished because they already do what I need them to do. If I really cared I'm sure I could find lots of things to improve. Last commit time being years ago is a pretty good indicator that I stopped caring and moved on. That's exactly what happened: my itch's already been scratched and I decided to work on something else because time is short.
I was once surprised to discover a laptop keyboard LED driver I published on GitHub years ago somehow acquired users. Another developer even built a GUI around it which is awesome. The truth is I just wanted to turn the lights off because when I turn the laptop on they default to extremely bright blue. I reverse engineered everything I could but as far as I'm concerned the project's finished. Last commit 4 years ago speaks volumes.
It’s extremely rare for a project to be considered stable for years without any updates. Unless it has no external dependencies and uses only very primitive or core language constructs, there are always updates to be had; security fixes and EOLs are common examples. What works in Python 2 might not work in Python 3.
Software needs to be maintained. It is ever evolving. I am one of those that will not use a library that has not been updated in the last year, as I do not want to be stuck upgrading it to be compatible with Node 20 when Node 18 EOLs
I chose a .Net library (Zedgraph) about 10 years ago, partly for the opposite reason. It was already known to be "finished", what you might call dead. It reliably does what I want so I don't care about updates. I'm still using the same version today and never had to even think about updating or breakages or anything. It just keeps on working.
Mind you, it's a desktop application not exposed to the internet, so security is a little lower priority than normal.
I'm sort of confused about where your comment is coming from. In the modern world (2023, in case your calendar is stuck in the 90s) we have a massive system of APIs and services that get changed all the time internally.
If a library is not constantly updated, then there is a high likelihood (99%) that it just won't work. Many issues raised on GitHub are that something changed and now the package is broken. That's reality, sis.
Are you suggesting that all we need to do is use 30 year old languages to free ourselves from this treadmill? That seems like an easy choice!
That's only true for libraries with zero transitive dependencies.
Otherwise you're almost guaranteed to be pulling in un-patched vulnerabilities.
It depends.
A heavily used library (gauged from download stats reported by package repositories, or GitHub star count, for example) with a low-to-zero open issue count, and even better a high closed issue count, gives me a better feel for the state of a library than its frequency of updates.
If you are asking yourself, "will this do what it says it will do?" and you are comparing a project that hasn't had any updates in the last 3 years vs one that has seen a constant stream of updates over the last 3 years, which one do you think has a greater probability of doing what it needs to do?
Now I do get your point. There is probably a better metric to use. Like for example, how many people are adding this library to their project and not removing it. But if you don't have that, the number of recent updates to a project that has been around for a long time is probably going to steer you in the right direction more often than not.
I'm generally doing that to check for version compatibility across a much broader spectrum than the level of a single library.