Comment by mnahkies

3 days ago

That was difficult to read, smelt very AI assisted though the message was worthwhile. It could've been shorter and more to the point.

A few things I've been thinking about recently:

- we have authentication everywhere in our stack, so I've started including the user id on every log line. This makes getting a holistic view of what a user experienced much easier.

- logging an error as a separate line from the request log is a pain. You can filter by trace, but that makes it hard to answer "show me all the logs for 5xx requests and their associated errors". It's doable, but it's harder than filtering on the status code of the request log.

- it's not enough to just start including that context; you have to tell your coworkers that it's now there. I've seen people make life hard for themselves because they didn't realize we'd added it.
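Here is a minimal sketch of the first two points. It uses Python's standard `logging` module and assumes the authenticated user id is known when a request starts. A `contextvars.ContextVar` holds the user id, and a `logging.Filter` stamps it onto every record. The request wrapper attaches the status code and any error to a single request log line. `current_user_id`, `UserIdFilter`, `JsonFormatter`, `handle_request` and `load_profile` are hypothetical names, not part of any framework.

```python
import contextvars
import io
import json
import logging

# Hypothetical per-request context, set once the user has been authenticated.
current_user_id = contextvars.ContextVar("current_user_id", default="-")


class UserIdFilter(logging.Filter):
    """Stamp the current user id onto every record that reaches the handler."""

    def filter(self, record):
        record.user_id = current_user_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """One JSON object per line, including the optional request fields."""

    def format(self, record):
        payload = {"msg": record.getMessage(), "user_id": getattr(record, "user_id", "-")}
        for key in ("status", "error"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


# Send everything from the "app" logger to an in-memory stream for the demo.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(UserIdFilter())
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False


def handle_request(user_id, work):
    """Run `work` for one user and emit a single request line with status and error."""
    token = current_user_id.set(user_id)
    try:
        extra = {"status": 200}
        try:
            work()
        except Exception as exc:
            # Attach the error to the request line instead of logging it separately.
            extra = {"status": 500, "error": f"{type(exc).__name__}: {exc}"}
        log.info("request finished", extra=extra)
        return extra["status"]
    finally:
        current_user_id.reset(token)


def load_profile():
    log.info("loading profile")  # picks up user_id without passing it explicitly
    raise ValueError("profile missing")


status = handle_request("u-42", load_profile)
print(stream.getvalue(), end="")
# {"msg": "loading profile", "user_id": "u-42"}
# {"msg": "request finished", "user_id": "u-42", "status": 500, "error": "ValueError: profile missing"}
```

Because the error is on the request line itself, "all 5xx requests with their errors" becomes a single filter on `status`.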

On the other hand, investing in better tracing tools unlocks a whole other level of logging and debugging capabilities that aren't feasible with request logs alone. It's a bit like your idea, from your first message, of using the user id as a "trace", but on steroids.

  • These tools tend to be very expensive in my experience unless you are running your own monitoring cloud. Either you end up sampling traces at low rates to save on costs, or your observability bill is more than your infrastructure bill.

    • We self host Grafana Tempo and whilst the cost isn’t negligible (at 50k spans per second), the money saved in developer time when debugging an error, compared to having to sift through and connect logs, is easily an order of magnitude higher.

    • Selectively turning on tracing, for example for clients that saw errors in the last 2 minutes or for requests that were retried, should only capture a small portion of your data. You can also include other sessions or requests at random if you want a baseline to compare against.

    • I like to write my own in bash at every company I'm at, so I have a local set of bash commands that help me dig through logs and colorize the items I care about.

      It takes some time and it's a pain in the ass initially, but once I've matured them, work becomes so much easier. It also reduces my dependence on other people, teams, and access.

      Edit: Thinking about this, they won't work for other use cases. I'm a data engineer, so my jobs are mostly sequential.

    • Try open-source databases specifically designed for traces, such as Grafana Tempo or VictoriaTraces. They can handle ingestion rates of hundreds of thousands of trace spans per second on a regular laptop.
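The selective-tracing idea above can be sketched as a sampling decision made at the start of each request. It traces clients that saw an error recently and requests that are retries, and it keeps a small random baseline. This is a hypothetical helper, not the API of Tempo, OpenTelemetry, or any other tool. The two-minute window and the 1% baseline are placeholder assumptions.

```python
import random
import time


class ErrorAwareSampler:
    """Hypothetical per-request trace sampling decision.

    Traces clients that hit an error recently, requests that are retries,
    and a small random baseline of everything else for comparison.
    """

    def __init__(self, error_window_s=120.0, baseline_rate=0.01, rng=None, clock=time.monotonic):
        self.error_window_s = error_window_s
        self.baseline_rate = baseline_rate
        self.rng = rng or random.Random()
        self.clock = clock
        self._last_error = {}  # client id -> time of that client's most recent error

    def record_error(self, client_id):
        self._last_error[client_id] = self.clock()

    def should_trace(self, client_id, retry_count=0):
        now = self.clock()
        last = self._last_error.get(client_id)
        if last is not None:
            if now - last <= self.error_window_s:
                return True
            del self._last_error[client_id]  # window has expired, forget the client
        if retry_count > 0:
            return True
        return self.rng.random() < self.baseline_rate


# Demo with a fake clock and no random baseline, so the output is deterministic.
now = [0.0]
sampler = ErrorAwareSampler(baseline_rate=0.0, clock=lambda: now[0])
print(sampler.should_trace("client-a"))                 # False: nothing notable yet
sampler.record_error("client-a")
now[0] = 60.0
print(sampler.should_trace("client-a"))                 # True: error 60 s ago
print(sampler.should_trace("client-b", retry_count=1))  # True: this is a retry
now[0] = 200.0
print(sampler.should_trace("client-a"))                 # False: error window expired
```

Because the decision is made up front, the request that first fails is usually not traced. Catching that request requires tail-based sampling, which decides after the trace completes. With several app instances, the recent-error map would also need to be shared between them.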

If your codebase has the concept of a request ID, you could also feasibly use that to trace what a user has been doing with more specificity.

  • …and the same ID can be displayed to the user on an HTTP 500 alongside the support contact, making everyone's life much easier.

    • I have seen pushback on this kind of behavior because "users don't like error codes" or other such nonsense. UX and Product like to pretend nothing will ever break, and when it does they want some funny little image, not useful output.

      A good compromise is to log whenever a user would see the error code, and treat those events with very high priority.

  • We do have both a span id and a trace id, but I personally find them more cumbersome than filtering on a user id. YMMV: if you're interested in a single trace, then you'd filter for that, but I find you often also care about what happened "around" a trace.

  • If you care about this more than anything else (e.g. if you care about audits a LOT and need them perfect), you can simply code the app via action paths, rather than for modularity. It makes changes harder down the road, but for codebases that don’t change much, this can be a viable tradeoff to significantly improve tracing and logging.

  • ...if it does not, you should add it. A request ID, trace ID, correlation key, whatever you call it, you should thread it through every remote call, if you value your sanity.
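Here is a minimal sketch of threading a request ID through remote calls and showing the same ID to the user on an HTTP 500. It also logs every time a user is shown that reference. It assumes an `X-Request-ID` header, which is a widespread convention rather than a standard. It uses the hypothetical helpers `incoming_request_id`, `outgoing_headers` and `error_page` instead of any particular web framework.

```python
import contextvars
import logging
import uuid

# Common convention, not a standard; plain dicts here are case-sensitive,
# whereas real HTTP header maps usually are not.
REQUEST_ID_HEADER = "X-Request-ID"
request_id_var = contextvars.ContextVar("request_id", default=None)
log = logging.getLogger("app")


def incoming_request_id(headers):
    """Reuse the caller's request ID if it sent one, otherwise mint a new one."""
    rid = headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    # A real middleware would also reset this var when the request ends.
    request_id_var.set(rid)
    return rid


def outgoing_headers(extra=None):
    """Headers for any downstream call, always carrying the current request ID."""
    headers = dict(extra or {})
    rid = request_id_var.get()
    if rid is not None:
        headers[REQUEST_ID_HEADER] = rid
    return headers


def error_page(support_email):
    """User-facing HTTP 500 body quoting the request ID, logged at error level."""
    rid = request_id_var.get()
    # Log every time a user is shown a reference so these events can be prioritised.
    log.error("user shown error page request_id=%s", rid)
    return (
        "Something went wrong on our side. "
        f"If you contact {support_email}, please quote reference {rid}."
    )


incoming_request_id({REQUEST_ID_HEADER: "abc123"})
print(outgoing_headers({"Accept": "application/json"}))
# {'Accept': 'application/json', 'X-Request-ID': 'abc123'}
```

Reusing an incoming ID, rather than always minting a new one, is what lets a single ID span every hop of the request.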

TIDs (transaction IDs) are good here too. If you generate one and enforce it across all your services, spanning various teams and APIs, anyone on any team can take a TID you provide and get the full end-to-end view of that transaction.
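Once every service logs the same TID, two queries become simple filters over structured log records: reconstructing one transaction end to end, and looking at what the same user did "around" it. A minimal sketch follows. It assumes records have already been parsed into a hypothetical `Entry` type with `ts`, `tid`, `user_id`, `service` and `msg` fields. In practice this filtering would usually run in the log store's own query language rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    ts: float      # seconds since epoch
    tid: str       # transaction ID shared by every service
    user_id: str
    service: str
    msg: str


def transaction(entries, tid):
    """All entries for one TID across services, in time order."""
    return sorted((e for e in entries if e.tid == tid), key=lambda e: e.ts)


def around(entries, tid, window_s=60.0):
    """Entries for the same user within `window_s` seconds of a transaction."""
    txn = transaction(entries, tid)
    if not txn:
        return []
    user = txn[0].user_id
    start, end = txn[0].ts - window_s, txn[-1].ts + window_s
    return sorted(
        (e for e in entries if e.user_id == user and start <= e.ts <= end),
        key=lambda e: e.ts,
    )


logs = [
    Entry(100.0, "t1", "u1", "gateway", "GET /cart"),
    Entry(100.2, "t1", "u1", "cart", "loaded cart"),
    Entry(130.0, "t2", "u1", "gateway", "POST /checkout"),
    Entry(130.4, "t2", "u1", "payments", "card declined"),
    Entry(131.0, "t3", "u2", "gateway", "GET /"),
    Entry(500.0, "t4", "u1", "gateway", "GET /help"),
]

print([e.service for e in transaction(logs, "t2")])  # ['gateway', 'payments']
print([e.tid for e in around(logs, "t2")])           # ['t1', 't1', 't2', 't2']
```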

> - we have authentication everywhere in our stack, so I've started including the user id on every log line. This makes getting a holistic view of what a user experienced much easier.

Depends on the service, but tracking everything a user does may not be an option under data retention laws.

Wow, I didn't think this was badly written at all! I certainly don't think it smells like AI. Are you conflating lists with AI written prose?

> That was difficult to read, smelt very AI assisted though the message was worthwhile...

It won’t be long before ad computem comments like this are frowned upon.

  • Why? "This was written badly" is a perfectly normal thing to say; "this was written badly because you didn't put in the effort of writing it yourself" doubly so.

    • Say they used AI to write it, it came out bad, and they published it anyway. They had the opportunity to "make it better" before publishing, but didn't. The only conclusion is that they just aren't good at writing. So whether AI is used or not, it'll suck either way, and there's no need to complain about the AI.

      It's like complaining that somebody typed a crappy letter rather than hand-wrote it. Either way the letter's gonna suck, so why complain that it was typed?

  • I read it as a more-or-less kind comment: “even though you’ll notice that they let an AI make the writing terrible, the underlying point is good enough to be worth struggling through that and discussing”

  • I felt unsure whether to include that particular comment, but landed on including it because I think it's a real danger. I've got no problem with people using AI and do use it for some things myself.

    However, I don't think you should outsource understanding to LLMs, and I also think that shifting the effort from the writer to the reader is a poor strategy (and disrespectful to the reader).

    edit: in case it's unclear, I'm not accusing the author of having outsourced their understanding to AI, but I think it's a real risk that people can fall into. The value is in the thinking people put into things, not the mechanics of typing it out.