Comment by paperplatter

2 years ago

It's not just about the data size. There are reasons your sequence can increment without a row actually being inserted.

Yep.

# Transactions that are explicitly rolled back, which have inserted to a table with an IDENTITY (or equivalent) column, almost always result in such a gap. Though if this is happening enough that it pushes the count up by an order of magnitude over time, you probably have a design problem…

# To reduce locking issues on the counter many DBs hold a cache of values (SQL Server allocates 1,000 at a time) in the persisted count so after a shutdown, especially an unclean shutdown, or a fail-over if using one of the clustering/mirroring/options options for HA, it is possible that most of these will be skipped. If this is happening enough to be a long-term problem than you have significant infrastructure issues. Such a cache could be per-thread/-process (IIRC it isn't in SQL Server) which is one of the reasons that IDENTITY column values can seem to be out-of-order in high concurrency environments (though you should never care about the actual order of values in such columns so that is a non-issue).

# In some DR processes where old DBs are restored, the values may be artificially updated after restore if used values might have been exposed to an external service (to avoid confusion caused is a number is used twice from the point of view of that external service). This is an extra argument against exposing internal ID values to external systems and in favour of having an additional UUID based identifier for such purposes.

# Also, though this is the most obvious one and hardly needs stating, deleted rows won't “return” their now unused number to be reused. You should scale the type based on rows expected to be inserted, not rows expected to be kept long-term.

If you are expecting hundreds of millions of rows in the lifetime of the table then a type larger than 32-bit is worth considering despite the extra storage needed in data and index pages. I'd not criticise considering it for tens of millions if you want to be more paranoid. Otherwise: 32-bit is very unlikely to not be fine, and 64-bit overkill.