Comment by tyingq
5 years ago
"They’re parsing JSON. A whopping 10 megabytes worth of JSON with some 63k item entries."
Ahh. Modern software rocks.
Parsing 63k items in a 10 MB JSON string is pretty much a breeze on any modern system, including a Raspberry Pi. I wouldn't even consider JSON an anti-pattern for storing that much data if it's going over the wire (compressed with gzip).
Scroll down a little in the article and you'll see one of the real issues:
> But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless.
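That quadratic check is the part worth illustrating. A minimal sketch with made-up data (the field names and hash function are placeholders, not the game's actual structures): keeping the seen hashes in a set instead of rescanning a list turns roughly (n²+n)/2 comparisons into n constant-time lookups.

    import hashlib
    import json

    # Stand-in data; the real payload's fields aren't shown in the article.
    items = [{"key": f"item_{i}", "price": i} for i in range(63_000)]

    def item_hash(item):
        # Placeholder for whatever per-entry hash the game computes.
        return hashlib.sha1(json.dumps(item, sort_keys=True).encode()).hexdigest()

    # Quadratic: rescan the whole list for every insert -> ~(n^2+n)/2 comparisons.
    seen_list, deduped_slow = [], []
    for it in items:
        h = item_hash(it)
        if h not in seen_list:       # O(n) linear scan each time
            seen_list.append(h)
            deduped_slow.append(it)

    # Linear: a set makes each membership check O(1) on average.
    seen_set, deduped_fast = set(), []
    for it in items:
        h = item_hash(it)
        if h not in seen_set:
            seen_set.add(h)
            deduped_fast.append(it)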
Check out https://github.com/simdjson/simdjson
More than 3 GB/s is possible. Like you said, 10 MB of JSON is a breeze.
The JSON parse took up more of the elapsed time. Granted, it was a terrible parser. But I still think JSON is a poor choice here. 63k × X checks for colons, balanced quotes/braces and so on just isn't needed.
> But I still think JSON is a poor choice here.
It’s an irrelevant one. The JSON parser from the Python stdlib parses 10 MB of JSON patterned after the sample in a few dozen ms. And it’s hardly a fast parser.
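For scale, a rough timing sketch using nothing but the stdlib (the record layout is invented to approximate a ~10 MB, 63k-entry payload; it's not the actual schema):

    import json
    import time

    # Fake entries roughly sized so the serialized payload lands near 10 MB.
    records = [
        {"key": f"item_{i}", "price": i, "hash": f"{i:08x}", "blob": "x" * 100}
        for i in range(63_000)
    ]
    payload = json.dumps(records)
    print(f"payload: {len(payload) / 1e6:.1f} MB")

    start = time.perf_counter()
    parsed = json.loads(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"parsed {len(parsed):,} entries in {elapsed_ms:.0f} ms")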
At least parse it into SQLite. Once.
They probably add more entries over time (and maybe update/delete old ones), so you’d have to be careful about keeping the local DB in sync.
So just have the client download the entire DB each time. Can’t be that many megabytes.
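If it did get parsed into SQLite once, a primary key on the hash would also replace the hand-rolled dedup scan. A sketch under assumed column names (the thread never shows the real payload's fields):

    import json
    import sqlite3

    con = sqlite3.connect("catalog.db")
    con.execute(
        """CREATE TABLE IF NOT EXISTS items (
               hash TEXT PRIMARY KEY,  -- uniqueness enforced by the DB, not a linear scan
               data TEXT NOT NULL
           )"""
    )

    def import_items(raw_json: str) -> None:
        entries = json.loads(raw_json)  # parse once
        con.executemany(
            "INSERT OR IGNORE INTO items (hash, data) VALUES (?, ?)",
            # Assumes each entry carries some per-item hash; adapt to the real payload.
            ((e["hash"], json.dumps(e)) for e in entries),
        )
        con.commit()

Re-running the import on a newer payload picks up added entries for free; swapping in INSERT OR REPLACE would also absorb updates to existing ones.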
I think just using a length-prefixed serialization format would have made this work reasonably fast.
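Purely to illustrate that idea (the record contents here are made up), a length-prefixed format can be read without scanning for delimiters at all:

    import struct

    def write_records(records: list[bytes]) -> bytes:
        # Each record: 4-byte little-endian length prefix, then the payload bytes.
        return b"".join(struct.pack("<I", len(r)) + r for r in records)

    def read_records(buf: bytes) -> list[bytes]:
        out, offset = [], 0
        while offset < len(buf):
            (length,) = struct.unpack_from("<I", buf, offset)
            offset += 4
            out.append(buf[offset:offset + length])
            offset += length
        return out

    blob = write_records([b"item_1", b"item_2", b"item_3"])
    assert read_records(blob) == [b"item_1", b"item_2", b"item_3"]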
Or just any properly implemented JSON parser. That's a laughably small amount of JSON, which should easily be parsed in milliseconds.
why not embed node.js to do this efficiently :D