Comment by messe
14 hours ago
Now I'm really curious. What field are you in that ndjson files of that size are common?
I'm sure there are reasons against switching to something more efficient–we've all been there–I'm just surprised.
> Now I'm really curious. What field are you in that ndjson files of that size are common?
I'm not OP, but structured JSON logs can easily result in humongous ndjson files, even with a modest fleet of servers over a not-very-long period of time.
So what's the use case for keeping them in that format rather than something more easily indexed and queryable?
I'd probably just shove it all into Postgres, but even a multi terabyte SQLite database seems more reasonable.
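To make the "shove it into SQLite" idea concrete, here's a minimal sketch of loading ndjson log lines into a queryable SQLite table. The schema, field names (`ts`, `level`, `msg`), and sample records are hypothetical, not from the thread:

```python
import json
import sqlite3

# Hypothetical schema: keep a couple of indexed columns plus the raw line,
# so you can query quickly but still recover the full record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, level TEXT, raw TEXT)")

# Stand-ins for lines read from a multi-GB ndjson file.
lines = [
    '{"ts": "2024-01-01T00:00:00Z", "level": "info", "msg": "started"}',
    '{"ts": "2024-01-01T00:00:01Z", "level": "error", "msg": "boom"}',
]

with conn:
    conn.executemany(
        "INSERT INTO logs (ts, level, raw) VALUES (?, ?, ?)",
        (
            (rec["ts"], rec["level"], line)
            for line in lines
            for rec in (json.loads(line),)
        ),
    )

errors = conn.execute(
    "SELECT COUNT(*) FROM logs WHERE level = 'error'"
).fetchone()[0]
print(errors)  # 1
```

For a real multi-terabyte load you'd stream the file rather than hold it in memory, batch the inserts inside transactions, and add indexes after the bulk load.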
Replying here because the other comment is too deeply nested to reply.
Even if it's a once-off, some people handle a lot of once-offs; that's exactly where you need good CLI tooling to support it.
Sure, jq isn't exactly slow, but I've also avoided it in pipelines where I just needed faster throughput.
rg was insanely useful in a project I once worked on with about 5GB of source files, many of them auto-generated, and you needed to find things in there. People were using Notepad++ and waiting minutes for a search to find something in the haystack; rg returned results in seconds.
The use case could be exactly that: processing an old trove of logs into something more easily indexed and queryable, and you might want jq as part of that processing pipeline.
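A minimal sketch of such a pipeline, assuming jq is installed: filter the ndjson for the records you care about and emit TSV suitable for bulk import into a database. The field names (`ts`, `level`, `msg`) and sample records are made up for illustration:

```shell
# Hypothetical ndjson records standing in for a large log file.
printf '%s\n' \
  '{"ts":"2024-01-01T00:00:00Z","level":"info","msg":"started"}' \
  '{"ts":"2024-01-01T00:00:01Z","level":"error","msg":"boom"}' |
  jq -r 'select(.level == "error") | [.ts, .msg] | @tsv'
# prints: 2024-01-01T00:00:01Z<TAB>boom
```

In practice you'd read from the real file (or `zcat` a compressed one) and pipe the TSV into something like `sqlite3 ... ".import"` or Postgres's `COPY`.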