Comment by kibwen

4 months ago

> This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML.

I don't know the details of Python's resolution algorithm, but for Cargo (which is where epage is coming from) a lockfile, which is encoded in TOML, can be fairly large, maybe pushing 100 kilobytes. That's large enough that I'm curious whether epage has benchmarked to see if lockfile parsing is noticeable in the flamegraph.

8 comments

pnt12 4 months ago

But once you have a lockfile there is no resolution needed, is there? It lists all needed libs and their versions. Given how TOML is written, I imagine you can read it incrementally: once a lib section is parsed, you can start downloading that lib in parallel, even if you haven't parsed the whole file yet.

(not sure how uv does it, just guessing what can be done)

epage 4 months ago

For Cargo,
- Synchronization operations are implicit, so we need to re-resolve to confirm the lockfile is still valid. We could take a shortcut, but it would require re-implementing some logic.
- Dependency resolution only uses `Cargo.toml` for local and git dependencies. Registry dependencies have a JSON summary of the content that is relevant for dependency resolution. Cargo parses nearly every locked package's `Cargo.toml` to know how to build it.

TheDong 4 months ago

For whatever it's worth, the TOML library uv uses doesn't support streaming parsing: https://github.com/toml-rs/toml/issues/326

- kibwen 4 months ago
  
  I'm not sure if it even makes sense for a TOML file to be "read incrementally", because of the weird feature of TOML (inherited from INI conventions) that allows tables to be defined in a piecemeal, out-of-order fashion. Here's an example that the TOML spec calls "valid, but discouraged":

      [fruit.apple]
      [animal]
      [fruit.orange]

  So the only way to know that you have all the keys in a given table is to literally read the entire file. This is one of those unfortunate things in TOML that I would honestly ignore if I were writing my own TOML parser, even if it meant I wasn't "compliant".
  
- epage 4 months ago
  
  TOML as a format doesn't make sense for streaming:
  - Tables can be in any order, independent of hierarchy.
  - Keys can be dotted, creating subtables in any order.

  On top of that, most use cases for the format don't benefit from streaming.

epage 4 months ago

Lockfiles aren't an issue. It is all the dependencies themselves.