Comment by tkejser

6 days ago

To clarify a point: we store the row count of each operator in the query - not the actual rows (that would indeed be madness!). We DO have tracing that lets you capture rows for very specific diagnostics - but that stream is massive and you need to opt in to it.

With that clarified, the logging is not as large as you might think (see my other response).

Think of web servers - they routinely store much larger log streams than this with metadata about each hit. You rely on that stream to do various forms of web analytics - yet you would not dream of rolling your own - nor worry about the small overhead you are already paying.

Example:

SELECT x, y FROM foo JOIN bar USING (k)

In a query like this, you will have these operators:

SCAN of foo

SCAN of bar

JOIN (on k between foo and bar)

Hope that clarifies the point... That's 3 operators, and we log the row count of each. That in turn allows you to answer questions about your data model and how it is being used.
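To make the idea concrete, here is a minimal Python sketch of per-operator row counting for the query above. It is an illustration only, not how any real engine implements it: the `Operator`, `Scan`, `Join` and `run_and_log` names, the hash join strategy and the sample data are all assumptions made up for this example.

```python
from typing import Iterable, Iterator, List, Tuple


class Operator:
    """Hypothetical plan node that counts the rows it emits."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows_out = 0

    def produce(self) -> Iterator[dict]:
        raise NotImplementedError

    def rows(self) -> Iterator[dict]:
        # Only the counter is updated; the row itself is never logged.
        for row in self.produce():
            self.rows_out += 1
            yield row


class Scan(Operator):
    def __init__(self, table_name: str, data: Iterable[dict]) -> None:
        super().__init__(f"SCAN of {table_name}")
        self.data = data

    def produce(self) -> Iterator[dict]:
        yield from self.data


class Join(Operator):
    """Simple hash join on a single key (an assumed strategy)."""

    def __init__(self, left: Operator, right: Operator, key: str) -> None:
        super().__init__(f"JOIN on {key}")
        self.left, self.right, self.key = left, right, key

    def produce(self) -> Iterator[dict]:
        # Build a hash table from the right input, then probe with the left.
        build: dict = {}
        for r in self.right.rows():
            build.setdefault(r[self.key], []).append(r)
        for l in self.left.rows():
            for r in build.get(l[self.key], []):
                yield {**l, **r}


def run_and_log(root: Operator, operators: List[Operator]) -> Tuple[list, list]:
    # Run the query to completion, then emit one (operator, row count) record each.
    result = [{"x": row["x"], "y": row["y"]} for row in root.rows()]
    log = [(op.name, op.rows_out) for op in operators]
    return result, log


# Made-up sample data for foo and bar.
foo = [{"k": 1, "x": "a"}, {"k": 2, "x": "b"}, {"k": 3, "x": "c"}]
bar = [{"k": 1, "y": 10}, {"k": 1, "y": 11}, {"k": 4, "y": 12}]

scan_foo = Scan("foo", foo)
scan_bar = Scan("bar", bar)
join = Join(scan_foo, scan_bar, "k")

result, log = run_and_log(join, [scan_foo, scan_bar, join])
for name, count in log:
    print(f"{name}: {count} rows")
```

The log for this query is three small records, one per operator, regardless of how many rows flow through. Comparing the two scan counts with the join count is the kind of thing that tells you how selective a join is in practice.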
