Comment by tyingq

5 years ago

The JSON patch took out more of the elapsed time. Granted, it was a terrible parser. But I still think JSON is a poor choice here. 63k x X checks for colons, balanced quotes/braces and so on just aren't needed.

  Time with only duplication check patch: 4m 30s
  Time with only JSON parser patch:       2m 50s

> But I still think JSON is a poor choice here.

It’s an irrelevant one. The JSON parser in the Python stdlib parses a 10 MB JSON file patterned after the sample in a few dozen ms. And it’s hardly a fast parser.
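
For anyone who wants to check that on their own machine, here is a minimal sketch. The record shape is made up (the actual sample isn't reproduced in this thread), and `build_payload` and `time_parse` are just illustrative helper names, so treat the result as a rough ballpark rather than a benchmark of the real file.

```python
import json
import time


def build_payload(target_bytes=10 * 1024 * 1024):
    """Build a JSON array string of at least target_bytes from small records."""
    # Hypothetical record shape; the real sample's fields are an assumption here.
    record = {"name": "item", "path": "/some/path/to/item", "size": 12345, "tags": ["a", "b"]}
    record_len = len(json.dumps(record)) + 2  # +2 for the ", " separator
    count = target_bytes // record_len + 1
    return json.dumps([record] * count)


def time_parse(payload, repeats=5):
    """Return (best wall-clock seconds, parsed object) over several json.loads runs."""
    best = float("inf")
    parsed = None
    for _ in range(repeats):
        start = time.perf_counter()
        parsed = json.loads(payload)
        best = min(best, time.perf_counter() - start)
    return best, parsed


if __name__ == "__main__":
    payload = build_payload()
    seconds, parsed = time_parse(payload)
    print(f"{len(payload) / 1e6:.1f} MB, {len(parsed)} records, "
          f"best parse: {seconds * 1000:.0f} ms")
```

Taking the best of several runs cuts down on noise from the first, cold run. The number will vary with hardware and with how closely the synthetic records resemble the real data.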