Comment by Manfred
25 days ago
In my experience you want job parameters to be one, maybe two ids. Do you have a real world example where that is not the case?
I'm guessing that with that approach you're adding indirection for what you're actually processing? So I guess the counter-case would be when you don't want or need that indirection.
If I understand you correctly, instead of doing:
- Create job with payload (maybe big) > Put in queue > Let worker take from queue > Done
You're suggesting:
- Create job with ID of payload (stored elsewhere) > Put in queue > Let worker take from queue, then resolve the ID to the data it needs for processing > Done
Is that more or less what you mean? I can definitely see use cases for both; it heavily depends on the situation, but more indirection isn't always better, nor are big payloads always OK. A rough sketch of both variants is below.
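Just to make the two shapes concrete, here is a minimal sketch using stdlib stand-ins for the real queue and database; the names (enqueue_payload, enqueue_reference, payload_store) are hypothetical, not from any particular library:

```python
import json
import queue

job_queue = queue.Queue()   # stand-in for a real queue (Redis, SQS, ...)
payload_store = {}          # stand-in for a database table

def enqueue_payload(payload: dict) -> None:
    """Variant 1: the job message carries the full payload."""
    job_queue.put(json.dumps({"type": "process", "payload": payload}))

def enqueue_reference(payload_id: str, payload: dict) -> None:
    """Variant 2: persist first, the job message carries only an id."""
    payload_store[payload_id] = payload   # commit to storage first
    job_queue.put(json.dumps({"type": "process", "payload_id": payload_id}))

def worker() -> None:
    job = json.loads(job_queue.get())
    # variant 2 resolves the id back to the data it needs; variant 1 already has it
    data = job["payload"] if "payload" in job else payload_store[job["payload_id"]]
    print("processing", data)
```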
If we take webhooks, for example:
- Persist payload in db > Queue with id > Process via worker.
Pushing the payload directly to the queue can be tricky. Any queue system will usually have limits on payload size, for good reasons. Plus, if you have already committed to the db, you can guarantee the data is not lost and can be processed again however you want later. But if your queue is having issues, or the enqueue fails, you might lose it forever. Something like the sketch below.
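A minimal sketch of that webhook flow, assuming SQLite as the durable store and a plain in-memory queue as a stand-in for the real queue system; receive_webhook, process_next, and handle_webhook are hypothetical names:

```python
import json
import queue
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE webhook_payloads (id INTEGER PRIMARY KEY, body TEXT)")
job_queue = queue.Queue()   # stand-in for the real queue system

def receive_webhook(body: dict) -> None:
    # 1. commit the payload first, so it survives even if enqueueing fails
    cur = db.execute(
        "INSERT INTO webhook_payloads (body) VALUES (?)", (json.dumps(body),)
    )
    db.commit()
    # 2. only the row id goes onto the queue, well under any payload size limit
    job_queue.put(cur.lastrowid)

def process_next() -> None:
    payload_id = job_queue.get()
    row = db.execute(
        "SELECT body FROM webhook_payloads WHERE id = ?", (payload_id,)
    ).fetchone()
    handle_webhook(json.loads(row[0]))

def handle_webhook(payload: dict) -> None:
    print("processing webhook", payload)   # whatever the worker actually does
```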
> Pushing the payload directly to the queue can be tricky. Any queue system will usually have limits on payload size, for good reasons.
Is that how microservice messages work? Do they push the whole payload so the other systems can consume it and take it from there?
> I can definitely see use cases for both
Me too, I was just wondering if you have any real world examples of a project with a large payload.
I have been doing this for at least a decade now and it is a great pattern, but think of an ETL pipeline where you fetch a huge JSON payload, store it in the database, and then transform it and load it into another model. I had a use case where I wanted to process the JSON payload and pass it down the pipeline before storing it in the useful model. I didn't want to store the intermediate JSON anywhere. I benchmarked it for this specific use case.
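Roughly that ETL shape, as a sketch: the raw JSON is transformed and handed straight to the next stage in memory instead of persisting the intermediate result. The stage names (fetch_source, transform, load_into_model) and the record fields are made up for illustration:

```python
import json
from urllib.request import urlopen

def fetch_source(url: str) -> dict:
    # the huge JSON payload arrives here
    with urlopen(url) as resp:
        return json.load(resp)

def transform(raw: dict) -> list:
    # shape the raw payload into the records the target model expects
    return [
        {"id": item["id"], "name": item["name"].strip()}
        for item in raw.get("items", [])
    ]

def load_into_model(records: list) -> None:
    for record in records:
        print("would insert", record)   # stand-in for the real DB write

def run_pipeline(url: str) -> None:
    # the intermediate JSON only ever lives in memory; nothing is persisted
    # until the final, useful model is loaded
    load_into_model(transform(fetch_source(url)))
```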
...well, that's good for scaling the queue, but it means the worker needs to load all relevant state/context from some DB (which might be sped up with a cache, but then things get really complex)
ideally you pass the context that's required for the job (let's say it's less than 100 KB). I don't think that counts as large JSON, but request rate (load) can make even 512 bytes too much, therefore "it depends"
but in general, passing large JSONs around over the network or in memory is not really slow compared to writing them to a DB (WAL + fsync + MVCC management)