Comment by rennokki

13 hours ago

> Uses jq for TB json files

> Hadoop: bro

> Spark: bro

> hive: bro

> data team: bro

4 comments

rennokki

eevmanu 8 hours ago

made me remember this article

<https://adamdrake.com/command-line-tools-can-be-235x-faster-...>

  Command-line Tools can be 235x Faster than your Hadoop Cluster (2014)

  Conclusion: Hopefully this has illustrated some points about using and abusing tools like Hadoop for data processing tasks that can better be accomplished on a single machine with simple shell commands and tools.

f311a 11 hours ago

JQ is very convenient, even if your files are more than 100GB. I often need to extract one field from huge JSON line files, I just pipe jq to it to get results. It's slower, but implementing proper data processing will take more time.

anonymoushn 11 hours ago

are those tools known for their fast json parsers?