Comment by micahkepe
5 hours ago
Hey, thank you! OP here. Yes, I was struggling to find large enough documents to run the benchmarks on. The benchmark data currently ranges from ~106 B to ~190 MB, which I think covers the majority of quick-task workloads, but I would love to have larger documents. If there are any public ones you can think of, I'd like to know!
The US government tends to offer big public JSON documents [0], such as crime rates [1], among others.
[0]: https://catalog.data.gov/dataset/?res_format=JSON
[1]: https://catalog.data.gov/dataset/crimes-2001-to-present