I don't disagree with that callout. However, we've been through these discussions many times over the years. The Solid Queue of yesteryear was delayed_job, which was originally created by Shopify's CEO.
https://github.com/tobi/delayed_job
Shopify, however, grew (as did many others), and we saw a host of blog posts and talks about moving away from DB queues to Redis, RabbitMQ, Kafka, etc. We saw posts about moving from Resque to Sidekiq, etc. All this to say: storing a task queue in the DB has always been the naive approach. Engineers absolutely shouldn't be shocked when that approach isn't viable at higher workloads.
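For anyone who hasn't lived through it, the pattern in question is roughly this: every worker polls a jobs table and locks rows as it claims them. A minimal sketch, assuming Postgres and a made-up jobs(id, payload, run_at) table; none of these names come from delayed_job or Solid Queue:

    #!/usr/bin/env ruby
    # Rough sketch of the "task queue in the DB" pattern; assumes Postgres and
    # an illustrative jobs(id, payload, run_at) table. Not any particular gem's code.
    require "pg"
    require "json"

    def perform(payload)
      puts "running job: #{payload.inspect}"   # stand-in for real job code
    end

    conn = PG.connect(dbname: "app")

    loop do
      claimed = false
      conn.transaction do |tx|
        # Claim one due job; SKIP LOCKED stops concurrent workers from queueing
        # up behind the same row lock.
        result = tx.exec(<<~SQL)
          SELECT id, payload
          FROM jobs
          WHERE run_at <= now()
          ORDER BY run_at
          FOR UPDATE SKIP LOCKED
          LIMIT 1
        SQL
        next if result.ntuples.zero?

        job = result[0]
        perform(JSON.parse(job["payload"]))
        tx.exec_params("DELETE FROM jobs WHERE id = $1", [job["id"]])
        claimed = true
      end
      sleep 1 unless claimed   # idle poll; every worker polls the same table
    end

Every worker polls and locks that one hot table right next to the application's own queries, which is exactly the part that stops holding up at higher workloads.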
It's not like I'll get a choice between the task database going down and not going down. If my task database goes down, I'm either losing jobs or duplicating jobs, and I have to pick which one I want. Whether the downtime is at the same time as the production database or not is irrelevant.
In fact, I'd rather it did happen at the same time as production, so I don't have to reconcile a bunch of data on top of the tasks.
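To make that fork concrete, here's a toy sketch of the two acknowledgement orders; the in-memory array stands in for the task database and every name is made up:

    # Toy sketch of the choice above: acknowledge before or after the work.
    QUEUE = [
      { id: 1, kind: "send_email" },
      { id: 2, kind: "resize_image" },
    ]

    def perform(job)
      puts "working on #{job[:kind]}"
    end

    # At-most-once: drop the job first. A crash inside perform loses it for good.
    job = QUEUE.shift
    perform(job)

    # At-least-once: drop the job only after it finishes. A crash after perform
    # but before the shift means the job runs again on recovery, a duplicate.
    job = QUEUE.first
    perform(job)
    QUEUE.shift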
Here's an example from the CircleCI incident:
https://status.circleci.com/incidents/hr0mm9xmm3x6
And a good analysis by a Flickr engineer who ran into similar issues:
https://blog.mihasya.com/2015/07/19/thoughts-evoked-by-circl...
CircleCI and Flickr are both pretty big systems. There are tons of businesses that will never operate at that scale.
If you need to restore the production database, do you also want to restore the task database?
If your task is to send an email, do you want to send it again? Probably not.
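The usual hedge is an idempotency key checked before delivery, so a replayed or restored job turns into a no-op. A rough sketch; SENT_KEYS and deliver_email are hypothetical stand-ins, and a real key store would need to be durable:

    # Sketch of an idempotent email job: skip delivery if this key was already sent.
    require "set"

    SENT_KEYS = Set.new   # in practice a durable store, not process memory

    def deliver_email(to:, body:)
      puts "delivering to #{to}"
    end

    def send_email_job(to:, body:, idempotency_key:)
      return if SENT_KEYS.include?(idempotency_key)   # replayed/restored job: no-op
      deliver_email(to: to, body: body)
      SENT_KEYS << idempotency_key
    end

    send_email_job(to: "user@example.com", body: "hi", idempotency_key: "welcome-42")
    send_email_job(to: "user@example.com", body: "hi", idempotency_key: "welcome-42")   # second call does nothing

Even then there's a window between delivering and recording the key, so a crash at exactly the wrong moment can still double-send; it narrows the problem rather than removing it.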