Having built a ticketing system that sold some Oasis-level concerts, I'd say there are a few misconceptions here:
Selling an event out frequently takes a long time because tickets are VERY often not actually purchased--they're just reserved and then fall back into open seating. This is done by true fans, but also frequently by bots run by professional brokers or amateur resellers. And neither Cloudflare nor any other state-of-the-art bot detection platform detects them. Hell, some of the bots are built on Cloudflare Workers themselves, in my experience...
So whatever velocity you achieve in the lab, in the real world you'll see a fraction of it when it comes to actual purchases. It depends on the event, really; events that fly under the radar may get you a higher actual conversion rate.
Also, an act like Oasis is going to have a lot of reserved seating. Running through algorithms to find contiguous seats is going to be tougher than this example, and it's difficult to parallelize if you're truly giving the next person in the queue the actual best seats remaining.
There are many other business rules that accrue after years of adding features to win Oasis-like business, and unfortunately they will result in more DB calls and add contention.
> Selling an event out frequently takes a long time because tickets are VERY often not actually purchased--they're just reserved and then fall back into open seating.
TigerBeetle actually includes native support for "two-phase pending transfers" out of the box, to make it easy to coordinate with third-party payment systems while users have inventory in their cart:
https://docs.tigerbeetle.com/coding/two-phase-transfers/
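To make that concrete, here is a rough sketch of the two-phase flow using the Go client (github.com/tigerbeetle/tigerbeetle-go). The account IDs, ledger/code values, and timeout are invented for illustration, and constructor and field details can differ between client versions, so treat it as a shape rather than a reference:

    package main

    import (
        "log"

        tb "github.com/tigerbeetle/tigerbeetle-go"
        "github.com/tigerbeetle/tigerbeetle-go/pkg/types"
    )

    func main() {
        client, err := tb.NewClient(types.ToUint128(0), []string{"127.0.0.1:3000"})
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        inventory := types.ToUint128(100) // hypothetical event inventory account
        cart := types.ToUint128(200)      // hypothetical customer cart account

        // Phase 1: reserve a seat while the customer pays. The pending transfer
        // expires after Timeout seconds, so an abandoned cart falls back into
        // open inventory automatically.
        pendingID := types.ID()
        results, err := client.CreateTransfers([]types.Transfer{{
            ID:              pendingID,
            DebitAccountID:  inventory,
            CreditAccountID: cart,
            Amount:          types.ToUint128(1), // one seat
            Ledger:          1,
            Code:            1,
            Timeout:         300,
            Flags:           types.TransferFlags{Pending: true}.ToUint16(),
        }})
        if err != nil || len(results) > 0 {
            log.Fatal("reservation failed: ", err, results)
        }

        // Phase 2: the payment provider confirmed, so post the pending transfer.
        // (Use VoidPendingTransfer instead if the payment falls through.)
        results, err = client.CreateTransfers([]types.Transfer{{
            ID:              types.ID(),
            DebitAccountID:  inventory,
            CreditAccountID: cart,
            Amount:          types.ToUint128(1),
            PendingID:       pendingID,
            Ledger:          1,
            Code:            1,
            Flags:           types.TransferFlags{PostPendingTransfer: true}.ToUint16(),
        }})
        if err != nil || len(results) > 0 {
            log.Fatal("post failed: ", err, results)
        }
    }

The Timeout on the pending transfer is what gives you the "reservation falls back into open seating" behaviour without any application-side cleanup job.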
> Also, an act like Oasis is going to have a lot of reserved seating. Running through algorithms to find contiguous seats is going to be tougher than this example and it's difficult to parallelize if you're truly giving the next person in the queue the actual best seats remaining.
It's actually not that hard (and probably easier) to express this in TigerBeetle using transfers with deterministic IDs. For example, you could check (and reserve) up to 8K contiguous seats in a single query to TigerBeetle, with a P100 of less than 100 ms.
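Sketching that idea with the same Go client as above (the one-account-per-seat layout and the toy ID scheme are my own assumptions for illustration, not a prescribed model): derive a deterministic transfer ID per seat and link the whole block, so the reservation is idempotent on retry and all-or-nothing:

    import (
        "fmt"

        tb "github.com/tigerbeetle/tigerbeetle-go"
        "github.com/tigerbeetle/tigerbeetle-go/pkg/types"
    )

    // reserveBlock places a pending reservation on `count` contiguous seats in a
    // single CreateTransfers call. Deterministic IDs make retries idempotent, and
    // the Linked flag makes the chain atomic: either every seat is reserved or
    // none are.
    func reserveBlock(client tb.Client, eventID, firstSeat, count uint64, cart types.Uint128) error {
        transfers := make([]types.Transfer, 0, count)
        for i := uint64(0); i < count; i++ {
            seat := firstSeat + i
            flags := types.TransferFlags{Pending: true, Linked: true}
            if i == count-1 {
                flags.Linked = false // the last transfer closes the linked chain
            }
            transfers = append(transfers, types.Transfer{
                // Toy deterministic ID derived from (event, seat); a real system
                // would use a proper 128-bit scheme.
                ID:              types.ToUint128(eventID*1_000_000 + seat),
                DebitAccountID:  types.ToUint128(seat), // hypothetical one-account-per-seat layout
                CreditAccountID: cart,
                Amount:          types.ToUint128(1),
                Ledger:          1,
                Code:            2,
                Timeout:         300,
                Flags:           flags.ToUint16(),
            })
        }
        results, err := client.CreateTransfers(transfers)
        if err != nil {
            return err
        }
        if len(results) > 0 { // results are returned only for transfers that failed
            return fmt.Errorf("block not reserved: %v", results[0].Result)
        }
        return nil
    }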
> There are many other business rules that accrue after years of adding features to win Oasis-like business, and unfortunately they will result in more DB calls and add contention.
Yes, contention is the killer.
We added an Amdahl's Law calculator to TigerBeetle's homepage to let you see the impact: https://tigerbeetle.com/#general-purpose-databases-have-an-o...
As you move "the data to the code" in interactive transactions with multiple queries, to process more and more business rules, you're holding row locks across the network. TigerBeetle's design inverts this, to move "the code to the data" in declarative queries, to let the DBMS enforce the transactional business rules directly in the database, with a rich set of debit/credit primitives and audit trail.
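As a small, hedged illustration of what that looks like in practice (ledger/code values invented, same Go client as the sketches above): declare the invariant on the account itself, and TigerBeetle rejects any transfer that would violate it, with no read-check-write round trip from the application:

    import (
        "fmt"

        tb "github.com/tigerbeetle/tigerbeetle-go"
        "github.com/tigerbeetle/tigerbeetle-go/pkg/types"
    )

    // createInventoryAccount sets up an account whose balance can never go
    // negative. Overselling is then rejected by the database itself on every
    // transfer, instead of by application code holding row locks across the
    // network while it reads, checks, and writes back.
    func createInventoryAccount(client tb.Client, id uint64) error {
        results, err := client.CreateAccounts([]types.Account{{
            ID:     types.ToUint128(id),
            Ledger: 1,
            Code:   1,
            Flags:  types.AccountFlags{DebitsMustNotExceedCredits: true}.ToUint16(),
        }})
        if err != nil {
            return err
        }
        if len(results) > 0 {
            return fmt.Errorf("account not created: %v", results[0].Result)
        }
        return nil
    }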
It's almost like stored procedures were a good idea.
Agree with the above. We built and run a ticketing platform, and the actual transaction of purchasing the ticket at the final step in the funnel is not the bottleneck.
The shopping and queuing processes put considerably more load on our systems than the final purchase transaction, which is ultimately constrained by the size of the venue and which we can control by managing queue throughput.
Even with a queue system in place, you inevitably end up with the thundering herd problem when ticket sales open, as a large majority of users will refresh their browsers regardless of instructions to the contrary.
You would use TigerBeetle for everything: not only the final purchase transaction, but also the shopping cart process, inventory management, and queuing/reserving.
In other words, to count not only the money changing hands, but also the corresponding goods/services being exchanged.
These are all transactions: goods/services and the corresponding money.
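A hedged sketch of that goods-plus-money idea, again with the Go client (account IDs, ledgers, and codes are invented): one transfer moves the ticket on a "tickets" ledger, a linked transfer moves the payment on a "money" ledger, and the pair commits atomically:

    import (
        "fmt"

        tb "github.com/tigerbeetle/tigerbeetle-go"
        "github.com/tigerbeetle/tigerbeetle-go/pkg/types"
    )

    // sellTicket records both sides of the exchange in one atomic chain:
    // the good (a ticket) and the money paid for it.
    func sellTicket(client tb.Client, venueSeats, fanTickets, fanCash, boxOffice types.Uint128, priceCents uint64) error {
        results, err := client.CreateTransfers([]types.Transfer{
            {
                ID:              types.ID(),
                DebitAccountID:  venueSeats,
                CreditAccountID: fanTickets,
                Amount:          types.ToUint128(1), // the good: one ticket
                Ledger:          1,                  // hypothetical "tickets" ledger
                Code:            10,
                Flags:           types.TransferFlags{Linked: true}.ToUint16(),
            },
            {
                ID:              types.ID(),
                DebitAccountID:  fanCash,
                CreditAccountID: boxOffice,
                Amount:          types.ToUint128(priceCents), // the money
                Ledger:          2,                           // hypothetical "money" ledger
                Code:            20,
                // No Linked flag here: this closes the chain, so both
                // transfers succeed or fail together.
            },
        })
        if err != nil {
            return err
        }
        if len(results) > 0 {
            return fmt.Errorf("sale rejected: %v", results[0].Result)
        }
        return nil
    }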
Does that mean there's some smoke and mirrors when, e.g., Taylor Swift says they sold out the concert in minutes? Or are the mega acts truly that high-demand?
You can get the seats into "baskets" (reserved) in minutes. In my experience they will not sell out for some time, as they usually keep dropping back into inventory. "Sold out" is a matter of opinion; there are usually lots of single seats left, sometimes for weeks or months. The promoter decides when to label the event as "sold out".
I recently did performance testing of TigerBeetle for a financial transactions company. The key thing to understand about TigerBeetle's speed is that it achieves very high throughput by batching transactions.
----
In our testing:
For batch transactions, TigerBeetle delivered truly impressive speeds: ~250,000 writes/sec.
For processing transactions one-by-one, we found a large slowdown: ~105 writes/sec.
This is much slower than PostgreSQL, which handled row updates at ~5,495 writes/sec. (However, in practice PostgreSQL row updates will be far slower in real-world OLTP workloads due to hot fee accounts and aggregate accounts for sub-accounts.)
One way to keep those faster speeds for real-time workloads is to microbatch incoming transactions to TigerBeetle at an interval of a second or less, to take advantage of its blazing-fast batch processing. Nonetheless, this remains an important caveat to understand about its speed.
Hi! Rafael from TigerBeetle here!
> One way to keep those faster speeds for real-time workloads is to microbatch incoming transactions to TigerBeetle at an interval of a second or less, to take advantage of its blazing-fast batch processing.
We don’t recommend artificially holding transfers just for batching purposes. René actually had to implement a batching worker API to work around a limitation in Python’s FastAPI, which handles requests per process, and he’s been very clear in suggesting that it would be better reimplemented in Go.
Unlike most connection-oriented database clients, the TigerBeetle client doesn’t use a connection pool, because there’s no concept of a “connection” in TigerBeetle’s VSR protocol.
This means that, although you can create multiple client instances, in practice less is better. You should have a single long-lived client instance per process, shared across tasks, coroutines, or threads (think of a web server handling many concurrent requests).
In such a scenario, the client can efficiently pack multiple events into the same request, while your application logic focuses solely on business-event-oriented chains of transfers. Typically, each business event involves only a handful of transfers; that isn't a problem of underutilization, since they'll be submitted together with other concurrent events as soon as possible.
However, if you’re dealing with a non-concurrent workload, for example, a batch process that bills thousands of customers for their monthly invoices, then you can simply submit all transfers at once.
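A minimal sketch of that recommended usage in Go (handler wiring, account IDs, and values are invented, and the client constructor has changed across versions, so check the current docs): one long-lived client per process, shared by all request handlers, with each handler submitting only its own transfers:

    package main

    import (
        "log"
        "net/http"

        tb "github.com/tigerbeetle/tigerbeetle-go"
        "github.com/tigerbeetle/tigerbeetle-go/pkg/types"
    )

    var client tb.Client // single shared instance, created once at startup

    func main() {
        var err error
        client, err = tb.NewClient(types.ToUint128(0), []string{"127.0.0.1:3000"})
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        http.HandleFunc("/buy", handleBuy)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

    func handleBuy(w http.ResponseWriter, r *http.Request) {
        // Each request submits only its own handful of transfers; the shared
        // client coalesces concurrent requests onto the wire, so there is no
        // need to hold transfers back for manual batching.
        results, err := client.CreateTransfers([]types.Transfer{{
            ID:              types.ID(),
            DebitAccountID:  types.ToUint128(100), // hypothetical inventory account
            CreditAccountID: types.ToUint128(200), // hypothetical customer account
            Amount:          types.ToUint128(1),
            Ledger:          1,
            Code:            1,
        }})
        if err != nil || len(results) > 0 {
            http.Error(w, "purchase failed", http.StatusConflict)
            return
        }
        w.WriteHeader(http.StatusOK)
    }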
Joran from TigerBeetle!
> For processing transactions one-by-one
If you're artificially restricting the load going into TigerBeetle by sending transactions one-by-one, then I think predictable latency (and not TPS) would be a better metric.
For example, TB's multi-region/multi-AZ fault-tolerance will work around gray failure (fail-slow hardware, as opposed to fail-stop) in your network links or SSDs. You're also getting significantly stronger durability guarantees with TB [0][1].
It sounds like you were benchmarking on EBS? We recommend NVMe. We have customers running extremely tight 1-second SLAs, seeing microsecond latencies, even for one-at-a-time workloads. Before TB, they were bottlenecking on PG. After TB, they saturated their central bank limit.
I would also be curious what scale you tested to. We test TB to literally 100 billion transactions. It's going to be incredibly hard to replicate that with PG's storage engine. PG is a great string DBMS, but it's simply not optimized for integers the way TB is. Granted, your scale likely won't require it, but if you're comparing TPS then you should at least compare sustained scale.
There's also the safety factor to consider if you try to reimplement TB's debit/credit primitives over PG yourself. For example, did you change PG's defaults away from Read Committed to Serializable and enable checksums in your benchmarks? (PG's checksums, even if you enable them, are still not going to protect you from misdirected I/O like the recent XFS bug.) Even the business logic is deceptively hard: there are thousands of lines of complicated state machine code, and we've invested literally millions into testing and audits.
Finally, it's important that your architecture as a whole (the gateways around TB) designs for concurrency as a first-class concern and isn't "one at a time"; otherwise TigerBeetle is probably not going to be your bottleneck.
[0] https://www.youtube.com/watch?v=_jfOk4L7CiY
[1] https://jepsen.io/analyses/tigerbeetle-0.16.11
Doesn't the Tigerbeetle client automatically batch requests?
We didn't observe any automatic batching when testing TigerBeetle with their Go client. I think we initiated a new Go client for every new transaction when benchmarking, which is typically how one uses such a client in app code. This follows from our other complaint: it handles so little that you will have to roll a lot of custom logic around it to batch real-time transactions quickly.
Did the company end up using it?
We didn't rule out using Tigerbeetle, but the drop in non-batch performance was disappointing and a reason we haven't prioritised switching our transaction ledger from PostgreSQL to Tigerbeetle.
There was also poor Ruby support for Tigerbeetle at the time, but that has improved recently and there is now a (3rd party) Ruby client: https://github.com/antstorm/tigerbeetle-ruby/
It seems to me that, in practice, you'd want the "LiveBatcher" to have some durability as well. Is there a scenario where a customer could lose their place because of a horribly timed server shutdown, where those transfers hadn't even been sent to TigerBeetle as pending yet? Or am I misunderstanding the architecture here?
Edit: Yes, I think I misunderstood something here. The user wouldn't even see their request as having returned a valid "pending" ticket sale, since the batcher is only active while the request is active. The request won't return until its own transfer has been sent off to TigerBeetle as pending.
Obligatory Jepsen report https://jepsen.io/analyses/tigerbeetle-0.16.11
Why can't this be done with PostgreSQL?
The short answer is that we tried, back in 2020, while working on a central bank payment switch by the Gates Foundation. We found we were hitting the limits of Amdahl's Law, given Postgres' concurrency control with row locks held across the network as well as internal I/O, and that led to the design of TigerBeetle: specialized not for general-purpose use but only for transaction processing.
On the one hand, yes, you could use a general-purpose string database to count/move integers, up to a certain scale. But a specialized integer database like TigerBeetle can take you further. It's the same reason that, yes, you could use Postgres as object storage or as a queue, or you could use S3 and Kafka and get separation of concerns in your architecture.
I did a talk diving into all this recently, looking at the power law, OLTP contention, and how this interacts with Amdahl's Law and Postgres and TigerBeetle: https://www.youtube.com/watch?v=yKgfk8lTQuE
I am not an expert on the limitation you claim to have encountered in PostgreSQL, but perhaps someone with more PostgreSQL expertise can chime in on this comment and give some insight.
Is FastAPI just bad with SQLite? I would have expected SQLite to smoke Postgres in terms of ops/s.
I think Python is bad in general if you want "high performance".
SQLite is in-process, but concurrent write performance is a complex matter: https://sqlite.org/wal.html
Yes, that's why I would expect it to smoke Postgres here; in-process is orders of magnitude faster. Do you really need concurrency here when you can do 10-100k+ inserts per second?
Also surprised. My yardstick was this post, which showed SQLite beating Postgres in a Django app. Benchmarking is hard, and the author said the Postgres results were not tuned to the same degree as SQLite, so buyer beware. https://blog.pecar.me/django-sqlite-benchmark