Justin Jaffray

A Mild Generalization of Linearizability

22 Apr 2018

CockroachDB (the product I work on) isn’t linearizable in the colloquial sense. Spencer Kimball gives a good outline of this in Living Without Atomic Clocks. Despite this, its guarantees with respect to real-time ordering are slightly stronger than “nothing”. Transactions obey a timestamp ordering; it’s just that the order in which timestamps are allocated in a distributed system like Cockroach doesn’t necessarily respect what we would think of as “real-time ordering”. Or does it?

TL;DR: Morally? No. Technically? Kind of.

If you actually go read the paper that introduced linearizability as a concept (Herlihy and Wing’s “Linearizability: A Correctness Condition for Concurrent Objects”), it never formally appeals to a notion of “real-time”. It only talks about histories and properties of them. This is a subtle but important point: it’s on the reader to provide the concept of “real-time”. We need to say something like, “histories reflecting the real-time sequence of events on this register are linearizable.” Real-time is not baked into the definition, contrary to what most discussions I’ve seen would imply. That said, when people talk about linearizability, they typically mean “with respect to the real-time ordering”.

What does this mean in a system like Cockroach? Transactions are assigned timestamps, and their serialization order is determined by those timestamps. Those timestamps don’t necessarily reflect “real-time”, though. Even though each individual node generates timestamps that conform to its own understanding of real-time, the clocks on any two nodes can differ by up to the “max clock skew” (500ms by default). In a system like this, there’s a trick that the distributed system (or the client) can perform to recover linearizability: wait out the max clock skew before formally acknowledging the operation. This guarantees that by the time the client receives the ack, the transaction’s timestamp has passed on every node’s clock, even on a node whose clock runs max-clock-skew behind the acking node, so anything begun after the ack will be assigned a higher timestamp.
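To make that trick concrete, here’s a minimal sketch of delaying the ack by the max clock skew. This is not CockroachDB’s actual implementation; the function names and the structure are just stand-ins for the description above, with the 500ms default as the assumed skew bound.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed bound on the clock offset between any two nodes
// (CockroachDB's default max clock skew is 500ms).
const maxClockSkew = 500 * time.Millisecond

// commitAndAck is a hypothetical helper: it runs the commit, which returns
// the timestamp the transaction was serialized at, then waits out the max
// clock skew before returning (i.e. before acknowledging the client). By
// the time the caller sees the result, even a node whose clock runs
// maxClockSkew behind this one reads at least commitTS, so anything started
// after the ack gets a timestamp at or above commitTS.
func commitAndAck(commit func() time.Time) time.Time {
	commitTS := commit()     // timestamp assigned by this node's clock
	time.Sleep(maxClockSkew) // wait until commitTS is in every node's past
	return commitTS          // only now acknowledge the client
}

func main() {
	ts := commitAndAck(func() time.Time {
		fmt.Println("committing...")
		return time.Now()
	})
	fmt.Println("acknowledged commit with timestamp", ts)
}
```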

This appears to leave us with a problem of vocabulary. What do we call a system that provides this weaker-than-linearizable property? Linearizable is a binary term, and this doesn’t appear to be it. We might be tempted to define something like “t-linearizability”, where a system is t-linearizable if it can recover linearizability by waiting for time t before acknowledging a commit (with vanilla linearizability being 0-linearizability). This is misguided, though. Vanilla linearizability doesn’t have any built-in understanding of real-time, and this definition imposes one on it in an unnatural way.

Thus, the right thing to do is to step up to the level where we introduced “real-time” into the system in the first place: instead of constructing our histories from the real-time sequence of events, we construct them as if every operation acks max-clock-skew later than it actually does. Then we apply the definition of linearizability to these histories exactly as we applied it to regular “real-time” histories. This captures the idea that unless someone ensures the next invocation begins at least max-clock-skew after the previous operation’s ack, we’re back in the wild west: no ordering guarantees.
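Here’s a small sketch of what that buys us, under a simplified model in which the serialization order is just the assigned commit timestamps (a real linearizability check involves more than this pairwise test, but it’s enough to show the shifted-ack idea). The history is hypothetical: transaction A gets its timestamp from a clock running roughly 300ms fast, and B begins 100ms after A’s ack on an accurate clock.

```go
package main

import (
	"fmt"
	"time"
)

const maxClockSkew = 500 * time.Millisecond

// A completed operation in a (simplified) history: when it was invoked and
// acked in real time, and the timestamp the system serialized it at.
type op struct {
	name     string
	invoke   time.Time
	ack      time.Time
	commitTS time.Time
}

// respectsOrder checks the ordering constraint we care about: whenever a is
// acked before b is invoked, a must also be serialized before b. Passing a
// nonzero shift models the generalized definition: every ack is treated as
// happening `shift` later than it really did, so fewer pairs count as ordered.
func respectsOrder(history []op, shift time.Duration) bool {
	for _, a := range history {
		for _, b := range history {
			if a.ack.Add(shift).Before(b.invoke) && !a.commitTS.Before(b.commitTS) {
				fmt.Printf("  %s ends before %s begins, but is not serialized first\n", a.name, b.name)
				return false
			}
		}
	}
	return true
}

func main() {
	base := time.Now()
	history := []op{
		// A's timestamp comes from a clock ~300ms fast (within the skew bound).
		{"A", base, base.Add(50 * time.Millisecond), base.Add(350 * time.Millisecond)},
		// B begins 100ms after A's ack, on an accurate clock.
		{"B", base.Add(150 * time.Millisecond), base.Add(200 * time.Millisecond), base.Add(160 * time.Millisecond)},
	}
	fmt.Println("ordered by raw real time:  ", respectsOrder(history, 0))            // false
	fmt.Println("ordered with shifted acks: ", respectsOrder(history, maxClockSkew)) // true
}
```

With a shift of zero, A’s ack precedes B’s invocation, so the pair counts as ordered and the timestamps contradict it; with every ack pushed out by the max clock skew, the pair is no longer ordered and the same history passes.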

How we’ve defined this is in fact about the mildest generalization one could make; it’s not really a generalization at all, just the same definition applied in a slightly different context. That said, it’s likely that if you use the term “linearizability” in this way nobody will understand you. You should probably say something like “linearizability up to clock skew”.

This post was originally posted on Ristret.