← Back to context

Comment by Ajedi32

2 months ago

If it's not feasible to audit every single dependency, it's probably even less feasible to rewrite every single dependency from scratch. Avoiding that duplicated work is precisely why we import dependencies in the first place.

Most dependencies do much more than we need from them. Often it means we only need one or a few functions from them. This means one doesn't need to rewrite whole dependencies usually. Don't use dependencies for things you can trivially write yourself, and use them for cases where it would be too much work to write yourself.

  • A brief but important point is that this primarily holds true in the context of rewriting/vendoring utilities yourself, not when discussing importing small vs. large dependencies.

    Just because dependencies do a lot more than you need, doesn't mean you should automatically reach for the smallest dependency that fits your needs.

    If you need 5 of the dozens of Lodash functions, for instance, it might be best to just install Lodash and let your build step shake out any unused code, rather than importing 5 new dependencies, each with far fewer eyes and release-management best practices than the Lodash maintainers have.

    • The argument wasn’t to import five dependencies, one for each of the functions, but to write the five functions yourself. Heck, you don’t even need to literally write them, check the Lodash source and copy them to your code.

      12 replies →

    • I think the level of protection you get from that depends on how the unused code detection interacts with whatever tricks someone is using for malicious code.

  • I agree with this but the problem is that a lot of the extra stuff dependencies do is indeed to protect from security issues.

    If you’re gonna reimplement only thr code you need from a dependency, it’s hard to know of the stuff you’re leaving out how much is just extra stuff you don’t need and how much might be security fixes that may not be apparent to you but the dependency by virtue of being worked upon and used by many people has fixed.

  • I'm using LLMs to write stuff that would normally be in dependencies, mostly because I don't want to learn how to use the dependency, and writing a new one from scratch is really easy with LLMs.

    • Age of bespoke software is here. Did you have any hard to spot non-obvious bugs in these code units?

It isn't feasible to audit every line of every dependency, just as it's not possible to audit the full behavior of every employee that works at your company.

In both cases, the solution is similar: try to restrict access to vital systems only to those you trust,so that you have less need to audit their every move.

Your system administrators can access the server room, but the on-site barista can't. Your HTTP server is trusted enough to run in prod, but a color-formatting library isn't.

  • > It isn't feasible to audit every line of every dependency, just as it's not possible to audit the full behavior of every employee that works at your company.

    Your employees are carefully vetted before hiring. You've got their names, addresses, and social security numbers. There's someone you're able to hold accountable if they steal from you or start breaking everything in the office.

    This seems more like having several random contractors who you've never met coming into your business in the middle of night. Contractors that were hired by multiple anonymous agencies you just found online somewhere with company names like gkz00d or 420_C0der69 who you've also never even spoken to and who have made it clear that they can't be held accountable for anything bad that happens. Agencies that routinely swap workers into or out of various roles at your company without asking or telling you, so you don't have any idea who the person working in the office is, what they're doing, or even if they're supposed to be there.

    "To make thing easier for us we want your stuff to require the use of a bunch of code (much of which does things you don't even need) that we haven't bothered looking at because that'd be too much work for us. Oh, and third parties we have no relationship with control a whole bunch of that code which means it can be changed at any moment introducing bugs and security issues we might not hear about for months/years" seems like it should be a hard sell to a boss or a client, but it's sadly the norm.

    Assuming that something is going to go wrong and trying to limit the inevitable damage is smart, but limiting the amount of untrustworthy code maintained by the whims of random strangers is even better. Especially when the reasons for including something that carries so much risk is to add something trivial or something you could have just written yourself in the first place.

    • > This seems more like having several random contractors who you've never met coming into your business in the middle of night. [...] Agencies that routinely swap workers into or out of various roles at your company without asking or telling you, so you don't have any idea who the person working in the office is, what they're doing, or even if they're supposed to be there.

      Sounds very similar to how global SIs staff enterprise IT contracts.

    • That hit much too close to reality. It's exactly like that. Even the names were spot on!

This is true to the extent that you actually _use_ all of the features of a dependency.

You only need to rewrite what you use, which for many (probably most) libraries will be 1% or less of it

  • Indeed. About 26% of the disk space for a freshly-installed copy of pip 25.2 for Python 3.13 comes from https://pypi.org/project/rich/ (and its otherwise-unneeded dependency https://pypi.org/project/Pygments/), "a Python library for rich text and beautiful formatting in the terminal", hardly any of the features of which are relevant to pip. This is in spite of an apparent manual tree-shaking effort (mostly on Pygments) — a separate installed copy of rich+Pygments is larger than pip. But even with that attempt, for example, there are hundreds of kilobytes taken up for a single giant mapping of "friendly" string names to literally thousands of emoji.

    Another 20% or more is https://pypi.org/project/requests/ and its dependencies — this is an extremely popular project despite that the standard library already provides the ability to make HTTPS connections (people just hate the API that much). One of requests' dependencies is certifi, which is basically just a .pem file in Python package form. The vendored requests has not seen any tree-shaking as far as I can tell.

    This sort of thing is a big part of why I'll be able to make PAPER much smaller.

it's probably even less feasible to rewrite every single dependency from scratch.

When you code in a high-security environment, where bad code can cost the company millions of dollars in fines, somehow you find a way.

The sibling commenter is correct. You write what you can. You only import from trusted, vetted sources.

> If it's not feasible to audit every single dependency, it's probably even less feasible to rewrite every single dependency from scratch.

There is no need to rewrite dependencies. Sometimes it just so happens that a project can live without outputting fancy colorful text to stdout, or doesn't need to spread transitive dependencies on debug utilities. Perhaps these concerns should be a part of the standard library, perhaps these concerns are useless.

And don't get me started on bullshit polyfill packages. That's an attack vector waiting to be exploited.

Its much more feasible these days. These days for my personal projects I just have CC create only a plain html file with raw JS and script links.

One interesting side effect of AI is that it makes it sometimes easy to just recreate the behavior, perhaps without even realizing it..

is it that infeasible with LLMs?

a lor of these dependencies are higher order function definitions, which never change, and could be copy/pasted around just fine. they're never gonna change

"rewrite every single dependency from scratch"

No need to. But also no need to pull in a dependency that could be just a few lines of own (LLM generated) code.

  • >>a few lines of own (LLM generated) code.

    ... and now you've switched the attack vector to a hostile LLM.

    • Sure but that's a one time vector. If the attacker didn't infiltrate the LLM before it generated the code, then the code is not going to suddenly go hostile like an npm package can.

    • Though you will see the code at least, when you are copy pasting it and if it is really only a few lines, you may be able to review it. Should review it of course.

      2 replies →

Sounds like the job for an LLM tool to extract what's actually used from appropriately-licensed OSS modules and paste directly into codebases.

  • Requiring you to audit both security and robustness on the LLM generated code.

    Creating two problems, where there was one.

    • I didn't say generate :) - in all seriousness, I think you could reasonably have it copy the code for e.g. lodash.merge() and paste it into your codebase without the headaches you're describing. IMO, this method would be practical for a majority of npm deps in prod code. There are some I'd want to rely on the lib (and its maintenance over time), but also... a sort function is a sort function.

      4 replies →

  • This is already a thing, compiled languages have been doing this for decades. This is just C++ templates with extra steps.