Comment by storystarling
12 days ago
Cloning gets you the raw text objects directly. If you scrape the web UI you're dealing with a lot of markup overhead that just burns compute during ingestion. For training data you usually want the structure to be as clean as possible from the start.
Sure, cloning a local copy. But why clone on GitHub?