Comment by memmel
1 day ago
I'm in the same boat - I decided this week for DVC over LFS.
For me, the deciding factor was that with LFS, if you want to delete objects from storage, you have to rewrite Git history. At least, that's what both the GitHub and GitLab docs say.
DVC adds a layer of indirection, so its structure is not directly tied to Git. If I change my mind and delete the objects from S3, DVC might stop working, but Git will be fine.
Some extra pluses about DVC:
- It can point to versioned S3 objects that you might already have as part of existing data pipelines.
- It integrates with the Python fsspec library to read the files on demand using paths like "dvc://path/to/file.parquet". This feels nicer than needing to download all the files up front.
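To make the on-demand reading concrete, here is a rough sketch, not a definitive recipe. It assumes DVC is installed and uses `DVCFileSystem` from `dvc.api`, which is fsspec-compatible. The helper names `dvc_url_to_path` and `read_tracked_bytes` are made up for illustration, and the default `repo="."` assumes you run it from inside the DVC project.

```python
from urllib.parse import urlsplit


def dvc_url_to_path(url: str) -> str:
    """Turn "dvc://path/to/file" into the repo-relative path "path/to/file"."""
    parts = urlsplit(url)
    if parts.scheme != "dvc":
        raise ValueError(f"expected a dvc:// URL, got {url!r}")
    # urlsplit treats the first path segment as the host, so join it back on.
    return f"{parts.netloc}{parts.path}".lstrip("/")


def read_tracked_bytes(url: str, repo: str = ".") -> bytes:
    """Read a single DVC-tracked file without pulling the whole dataset first."""
    # Imported lazily so the helper above still works without DVC installed.
    from dvc.api import DVCFileSystem

    # Assumption: `repo` is a local path or Git URL of the DVC project.
    fs = DVCFileSystem(repo)
    with fs.open(dvc_url_to_path(url), "rb") as f:
        return f.read()
```

Something like `read_tracked_bytes("dvc://path/to/file.parquet")` would then read just that one file, as opposed to running a full `dvc pull` up front.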