Comment by TheFlyingFish

2 months ago

I've worried about this for a while with Rust packages. The total size of a "big" Rust project's dependency graph is pretty similar to a lot of JS projects. E.g. Tauri, last I checked, introduces about 600 dependencies just on its own.

Like another commenter said, I do think it's partially just because dependency management is so easy in Rust compared to e.g. C or C++, but I also suspect that it has to do with the size of the standard library. Rust and JS are both famous for having minimal standard libraries, and what do you know, they tend to have crazy-deep dependency graphs. On the other hand, Python is famous for being "batteries included", and if you look at Python project dependency graphs, they're much less crazy than JS or Rust. E.g. even a higher-level framework like FastAPI, that itself depends on lower-level frameworks, has only a dozen or so dependencies. A Python app that I maintain for work, which has over 20 top-level dependencies, only expands to ~100 once those 20 are fully resolved. I really think a lot of it comes down to the standard library backstopping the most common things that everybody needs.

So maybe it would improve the situation to just expand the standard library a bit? Maybe this would be hiding the problem more than solving it, since all that code would still have to be maintained and would still be vulnerable to getting pwned, but other languages manage somehow.

64 comments


wongarsu 2 months ago

I wouldn't call the Rust stdlib "small". "Limited" I could agree with.

On the topics it does cover, Rust's stdlib offers a lot. At least on the same level as Python, at times surpassing it. But because the stdlib isn't versioned it stays away from everything that isn't considered "settled", especially in matters where the best interface isn't clear yet. So no http library, no date handling, no helpers for writing macros, etc.

You can absolutely write pretty substantial zero-dependency rust if you stay away from the network and async

Whether that's a good tradeoff is an open question. None of the options look really great

WD-42 2 months ago
Rand, uuid, and no built in logging implementation are three examples that require crates but probably shouldn’t.
- robot-wrangler 2 months ago
  
  No built in logging seems pretty crazy. Is there a story behind that?
  
galangalalgol 2 months ago
Network without async works fine in std. However, rand, serde, and num_traits always seem to be present. Not sure why clap isn't std at this point.
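As a rough illustration of the "network without async" point, here is a minimal sketch that uses only `std::net` and a thread: a one-shot echo server on a loopback port, plus a blocking client. The function name `echo_roundtrip` is made up for this example, and real code would want timeouts and better error handling.

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Hypothetical helper: starts a one-shot echo server on a loopback port
/// chosen by the OS, sends `msg` over a blocking TcpStream, returns the reply.
fn echo_roundtrip(msg: &[u8]) -> std::io::Result<Vec<u8>> {
    // Port 0 asks the OS for any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Server thread: accept one connection, read until EOF, echo it back.
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (mut conn, _peer) = listener.accept()?;
        let mut buf = Vec::new();
        conn.read_to_end(&mut buf)?;
        conn.write_all(&buf)?;
        Ok(())
    });

    // Client: plain blocking I/O, no async runtime involved.
    let mut client = TcpStream::connect(addr)?;
    client.write_all(msg)?;
    // Close the write half so the server's read_to_end sees EOF.
    client.shutdown(Shutdown::Write)?;
    let mut reply = Vec::new();
    client.read_to_end(&mut reply)?;

    server.join().expect("server thread panicked")?;
    Ok(reply)
}

fn main() -> std::io::Result<()> {
    let reply = echo_roundtrip(b"hello from std")?;
    println!("{}", String::from_utf8_lossy(&reply));
    Ok(())
}
```

This covers raw TCP only; HTTP, TLS and the like are exactly the parts std leaves to crates.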
- wongarsu 2 months ago
  
  Clap went through some major redesigns with the 4.0 release just three years ago. That wouldn't have been possible if clap 2.0 or 3.0 had been added to the stdlib. It's almost a poster child for libraries where being outside the stdlib allows interface improvements (date/time handling would be the other obvious example).
  Rand has the issue of platform support for securely seeding a secure rng, and having just an insecure rng might cause people to use it when they really shouldn't. And serde is near-universal but has some very vocal opponents because it's such a heavy library. I have however often wished that num_traits were in the stdlib, it really feels like something that belongs in there.
  
- TheDong 2 months ago
  
  > Not sure why clap isn't std at this point.
  The std has stability promises, so it's prudent to not add things prematurely.
  Go has the official "flag" package as part of the stdlib, and it's so absolutely terrible that everyone uses pflag, cobra, or urfave/cli instead.
  Go's stdlib is a wonderful example of why you shouldn't add things willy-nilly to the stdlib since it's full of weird warts and things you simply shouldn't use.
  
- Ygg2 2 months ago
  
  > why clap isn't std at this point.
  Too big for many cases, there is also a lot of discussion around whether to use clap, or something smaller.
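For a sense of what "something smaller" can look like at the extreme end, here is a hedged sketch of hand-rolled argument parsing with only `std::env`. The `Options` struct and its flags are invented for illustration; it handles no help text, no `--flag=value` forms and no subcommands, which is where crates like clap earn their size.

```rust
use std::env;

/// Options for a hypothetical tool; the field names are illustrative only.
#[derive(Debug, Default, PartialEq)]
struct Options {
    verbose: bool,
    output: Option<String>,
    inputs: Vec<String>,
}

/// Parses `-v`/`--verbose`, `-o <path>`/`--output <path>` and positional inputs.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Options, String> {
    let mut opts = Options::default();
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "-v" | "--verbose" => opts.verbose = true,
            "-o" | "--output" => {
                // The flag consumes the next argument as its value.
                let path = iter
                    .next()
                    .ok_or_else(|| format!("{} requires a value", arg))?;
                opts.output = Some(path);
            }
            other if other.starts_with('-') => {
                return Err(format!("unknown flag: {}", other));
            }
            // Anything else is treated as a positional input.
            other => opts.inputs.push(other.to_string()),
        }
    }
    Ok(opts)
}

fn main() {
    // Skip argv[0], the program name.
    match parse_args(env::args().skip(1)) {
        Ok(opts) => println!("{:?}", opts),
        Err(e) => {
            eprintln!("error: {}", e);
            std::process::exit(2);
        }
    }
}
```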
  
SAI_Peregrinus 2 months ago
> But because the stdlib isn't versioned
I honestly feel like that's one of Rust's biggest failings. In my ideal world libstd would be versioned, and done in such a way that different dependencies could call different versions of libstd, and all (sound/secure) versions would always be provided. E.g. reserve the "std" module prefix (and "core", and "alloc"), have `cargo new` default to adding the current std version in `Cargo.toml`, have the prelude import that current std version, and make the module name explicitly versioned a la `std1::fs::File`, `std2::fs::File`. Then you'd be able to type `use std1::fs::File` like normal, but if you wanted a different version you could explicitly qualify it or add a different `use` statement. And older libraries would be using older versions, so no conflicts.
- Zettroke 2 months ago
  
  I'm afraid it won't work. The point of the std lib is to be a universal connection point for all the libraries. But with a versioned std I just can't see how you can have DateTime in std1 and DateTime in std2 and use them interchangeably, for example being able to pass std2::DateTime to a library depending on std1. Maybe conversion methods, but it gets really complicated really quickly.
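The interoperability concern can be sketched in ordinary Rust today by using two modules as stand-ins for two std versions. Everything here is hypothetical: `std1`, `std2`, `DateTime` and `legacy_format` are illustrative names, not a real proposal or API.

```rust
// Hypothetical stand-ins for two versions of a standard library.
mod std1 {
    /// Version 1 stores whole seconds since the Unix epoch.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct DateTime {
        pub secs: i64,
    }
}

mod std2 {
    /// Version 2 is a redesign: same concept, different representation.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct DateTime {
        pub millis: i64,
    }

    // Bridging versions needs an explicit conversion per type pair.
    impl From<super::std1::DateTime> for DateTime {
        fn from(old: super::std1::DateTime) -> Self {
            DateTime { millis: old.secs * 1000 }
        }
    }

    impl From<DateTime> for super::std1::DateTime {
        fn from(new: DateTime) -> Self {
            // Lossy: sub-second precision is dropped.
            super::std1::DateTime { secs: new.millis.div_euclid(1000) }
        }
    }
}

/// A library function written against "std1".
fn legacy_format(dt: std1::DateTime) -> String {
    format!("t={}s", dt.secs)
}

fn main() {
    let now = std2::DateTime { millis: 1_500 };
    // `legacy_format(now)` would not compile: the two types are unrelated.
    println!("{}", legacy_format(now.into())); // prints "t=1s"
}
```

With one type this is manageable; the complication is that every type crossing a version boundary needs such a bridge, and some conversions are lossy.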
atherton94027 2 months ago

> On the topics it does cover, Rust's stdlib offers a lot. At least on the same level as Python, at times surpassing it.
Curious, do you have specific examples of that?
ghurtado 2 months ago

> if you stay away from the network and async
That's some "small print" right there.

QuiEgo 2 months ago

It's already happening: https://cyberpress.org/malicious-rust-packages/

My personal experience (YMMV): Rust code takes 2x or 3x longer to write than what came before it (C in my case), but in the end you usually get something much more likely to work, so overall it's kind of a wash, and the product you get is better for customers - you basically front load the cost of development.

This is terrible for people working in commercial projects that are obsessed with time to market.

Rust developers on commercial projects are under incredible schedule pressure from day 0, where they are compared to expectations from their previous projects, and are strongly motivated to pull in anything and everything they can to save time, because re-rolling anything themselves is so damn expensive.

windward 2 months ago
In my experience Rust development is no slower than C development (in a different environment) or C++ development (in a comparable project)
- ghurtado 2 months ago
  
  I think they were using "writing Rust" in the most strict sense: the part of the development cycle that involves typing the majority of the code, before you really start debugging in earnest and really make things work.
  But their point is that "developing Rust" (as in, the entire process) ends up being a similar total effort to C, only with more up front "writing" and less work on the debugging phase.
  

kibwen 2 months ago

> Rust and JS are both famous for having minimal standard libraries

I'm all in favor of embiggening the Rust stdlib, but Rust and JS aren't remotely in the same ballpark when it comes to stdlib size. Rust's stdlib is decidedly not minimal; it's narrow, but very deep for what it provides.

skydhash 2 months ago

The C standard library is also very small. The issue is not the standard library. The issue is adding libraries for snippets of code and, in the name of convenience, letting those libraries run code on the dev machine.

api 2 months ago
The issue is that our machines run 1970s OSes with a very basic security model, and are themselves so complex that they’re likely loaded with local privilege escalation attack vectors.
Doing dev in a VM can help, but isn’t totally foolproof.
- skydhash 2 months ago
  
  It’s a good security model because everyone has the decency to follow a pull model. Like “hey, I have this thing, you can get it if you’re interested”. You decide the amount of trust you give to someone.
  But NPM is more like “you’ve added me to your contact list, then it’s totally fine for me to enter your bedroom at night and wear your lingerie because we’re already BFF”. It’s “I’m doing whatever I want on your computer because I know best and you’re dumb” mentality that is very prevalent.
  It’s like how zed (the editor) wants to install node.js and whatever just because they want to enable LSP. The sensible approach would have been to have a default config that relies on $PATH to find the language server.
  

metaltyphoon 2 months ago

This is a reason why so many enterprises use C#. Most of the time you just use Microsoft-made libraries and rarely bring in 3rd-party ones.

latentsea 2 months ago
Having worked on four different enterprise grade C# codebases, they most certainly have plenty of 3rd party dependencies. It would absolutely be the exception to not have 3rd party dependencies.
- HighGoldstein 2 months ago
  
  Yes, but the 3rd party dependencies tend to be conveniences rather than foundational. Easier mapping, easier mocking, easier test assertions, so a more security minded company can very easily just disallow their use without major impact. If it's something foundational to your project then what you're doing is probably somewhat niche. Most of the time there's some dependency from Microsoft that's rarely worse enough to justify using the 3rd party one.
  
pasc1878 2 months ago
Or purchase third party libraries. This does two things - limits what you drag in and also if you drag it in you can sue someone for errors.
- skeeter2020 2 months ago
  
  This is definitely not why enterprise "chooses" C#, and neither of these was a design decision as implied. MS would have loved to have the explosive, viral ecosystem of Node earlier in .NET's life. Regardless, a lot of companies using C# still use node-based solutions on the web, so an insular development environment for one tier doesn't protect them.
  

gorgoiler 2 months ago

And yet of course the world and their spouse import requests to fetch a URL and view the body of the response.

It would be lovely if Python shipped with even more things built in. I’d like cryptography, tabulate/rich, and some more featureful datetime bells and whistles a la arrow. And of course the reason why requests is so popular is that it does actually have a few more things and ergonomic improvements over the builtin HTTP machinery.

Something like a Debian Project model would have been cool: third party projects get adopted into the main software product by a sworn-in project member who acts as quality control / a release manager. Each piece of software stays up to date but also doesn’t just get its main branch upstreamed directly onto everyone’s laps without a second pair of eyes going over what changed. The downside is it slows everything down, but that’s a side-effect of, or rather a synonym for stability, which is the problem we have with npm. (This looks sort of like what HelixGuard do, in the original article, though I’ve not heard of them before today.)

TheFlyingFish 2 months ago
Requests is a great example of my point, actually. Creating a brand-new Python venv and running `uv add requests` tells me that a total of 5 packages were added. By contrast, creating a new Rust project and running `cargo add reqwest` (which is morally equivalent to Python's `requests`) results in adding 160 packages, literally 30x as many.
I don't think languages should try to include _everything_ in their stdlib, and indeed trying to do so tends to result in a lot of legacy cruft clogging up the stdlib. But I think there's a sweet spot between having a _very narrow_ stdlib and having to depend on 160 different 3rd-party packages just to make an HTTP request, and having a stdlib with 10 different ways of doing everything because it took a bunch of tries to get it right. (cf. PHP and hacks like `mysql_real_escape_string`, for example.)
Maybe Python also has a historical advantage here. Since the Internet was still pretty nascent when Python got its start, it wasn't the default solution any time you needed a bit of code to solve a well-known problem (I imagine, at least; I was barely alive at that point). So Python could afford to wait and see what would actually make good additions to the stdlib before implementing them.
Compare to Rust, which _immediately_ had to run gauntlets like "what to do about async", with thousands of people clamoring for a solution _right now_ because they wanted to do async Rust. I can definitely sympathize with Rust's leadership wanting to do the absolute minimum required for async support while they waited for the paradigm to stabilize. And even so, they still get a lot of flak for the design being rushed, e.g. with `Pin`.
So it's obviously a difficult balance to strike, and maybe the solution isn't as simple as "do more in the stdlib". But I'd be curious to see it tried, at least.
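For anyone who wants to reproduce package counts like the ones above on their own project, one rough approach is to count the `[[package]]` tables in `Cargo.lock`. This is a minimal sketch, not a real TOML parser: `count_locked_packages` is a made-up helper, and the total includes the project's own crate(s) as well as every transitive dependency.

```rust
use std::fs;

/// Counts `[[package]]` tables in the text of a Cargo.lock file.
/// Line-based on purpose: enough for a quick count, not a TOML parser.
fn count_locked_packages(lockfile: &str) -> usize {
    lockfile
        .lines()
        .filter(|line| line.trim() == "[[package]]")
        .count()
}

fn main() {
    // Assumes it is run from a directory that contains a Cargo.lock.
    match fs::read_to_string("Cargo.lock") {
        Ok(text) => println!("{} packages locked", count_locked_packages(&text)),
        Err(e) => eprintln!("could not read Cargo.lock: {}", e),
    }
}
```

`cargo tree` gives a more detailed view of the same graph, including which dependency pulled in which.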
- Chris_Newton 2 months ago
  
  IMHO, the ideal for package management in a programming language ecosystem might recognise multiple levels of “standardisation”.
  At the top, you have the true standard library for the language. This has very strong stability guarantees. Its purpose is twofold: to provide universal implementations of essentials and to define standard/baseline interfaces for common needs like abstract data types, relational databases, networking and filesystems to encourage compatibility and portability.
  Next, you have a tier of recognised but not yet fully standardised libraries. These might be contributed by third parties, but they have requirements for identifying maintainers, appropriate licensing and mandatory peer review of all contributions. They have a clear versioning policy and can make breaking changes in new major releases, but they also provide some stability guarantees along the lines of semver and older releases are normally available indefinitely. The purpose of this tier is to provide a wider range of functionality and/or alternative implementations, but in a relatively stable way and implementing standard interfaces where applicable to improve portability.
  Finally, you have the free-for-all, anyone-can-contribute tier. This should still have a sane security model where people can’t just upload malware scripts that run automatically just because someone installed a package. However, it comes with few guarantees about stability or compatibility, except that releases of published packages will be available indefinitely unless there’s a very good reason to pull them where you obviously wouldn’t want to use one anyway. A package you like might be written by a single contributor who no longer maintains it, but if someone does write something useful that simply doesn’t need any further maintenance once it’s finished and does its job, there is still a place to share it.
  
- afdbcreid 2 months ago
  
  That's not an apples-to-apples comparison, since Rust is a low-level language, and also because `reqwest` builds on top of `tokio`, an async runtime, and `hyper`, which is also an HTTP server, not just an HTTP client. If you check `ureq`, a synchronous HTTP client, it only adds 43 packages. Still more, but much less.
  
- ghurtado 2 months ago
  
  > (cf. PHP and hacks like `mysql_real_escape_string`, for example.)
  PHP is a fantastic resource to learn how to do proper backward compatibility and package management. By doing the exact opposite of whatever PHP does, mostly.

moomin 2 months ago

It might solve the problem, inasmuch as the problem is that not only can it be done, but it’s profitable to do so. This is why there’s no Rust problem (yet).

mx7zysuj4xew 2 months ago

It won't, it's a culture issue

Most rust programmers are mediocre at best and really need the memory safety training wheels that rust provides. Years of nodejs mindrot have somehow made pulling in random dependencies with irregular release schedules the norm for these people. They'll just shrug it off, come up with some "security initiative", and continue the madness.

capyba 2 months ago

I only personally know one Rust programmer (works in scientific HPC) and he’s fantastic, but in general I do get the sense that most Rust devs migrated from JS and are just now figuring out “omg strong typing and compiled code native to the client hardware is really nice!” and think it’s a ground breaking revelation.
Saying this as someone who is cautiously optimistic about Rust for my own work.