Linearizability versus Serializability

[Linearizability versus Serializability]

线性化与串行化

原文链接

(http://www.bailis.org/blog/linearizability-versus-serializability/)

Linearizability and serializability are both important properties about interleavings of operations in databases and distributed systems, and it’s easy to get them confused. This post gives a short, simple, and hopefully practical overview of the differences between the two.

线性化和串行化都是数据库和分布式系统中操作交错的重要属性,很容易将它们混淆。 这篇文章对两者之间的差异进行了简短、简单且实用的概述。

Linearizability: single-operation, single-object, real-time order

Linearizability is a guarantee about single operations on single objects. It provides a real-time (i.e., wall-clock) guarantee on the behavior of a set of single operations (often reads and writes) on a single object (e.g., distributed register or data item).

线性化是对单个对象上的单个操作的保证。 它为单个对象(例如分布式寄存器或数据项)上的一组单个操作(通常是读取和写入)的行为提供实时(即挂钟)保证。

In plain English, under linearizability, writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write.

用简单的英语来说,在线性化下,写入应该看起来是瞬时的。 不精确地说,一旦写入完成,所有后续读取(其中“稍后”由挂钟开始时间定义)都应返回该写入的值或稍后写入的值。 一旦读取返回特定值,所有后续读取都应返回该值或后续写入的值。

Linearizability for read and write operations is synonymous with the term “atomic consistency” and is the “C,” or “consistency,” in Gilbert and Lynch’s proof of the CAP Theorem. We say linearizability is composable (or “local”) because, if operations on each object in a system are linearizable, then all operations in the system are linearizable.

读写操作的线性化与术语“原子一致性”同义,在 Gilbert 和 Lynch 的 CAP 定理证明中是“C”或“一致性”。 我们说线性化是可组合的(或“局部的”),因为如果系统中每个对象的操作都是线性化的,那么系统中的所有操作都是线性化的。

Serializability: multi-operation, multi-object, arbitrary total order

Serializability is a guarantee about transactions, or groups of one or more operations over one or more objects. It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to some serial execution (total ordering) of the transactions.

可串行性是对事务或对一个或多个对象的一个或多个操作组的保证。它保证对多个项目执行一组事务(通常包含读取和写入操作)相当于某些串行 事务的执行(总排序)。

Serializability is the traditional “I,” or isolation, in ACID. If users’ transactions each preserve application correctness (“C,” or consistency, in ACID), a serializable execution also preserves correctness. Therefore, serializability is a mechanism for guaranteeing database correctness.

可串行性是 ACID 中传统的“I”或隔离。 如果每个用户的事务都保持应用程序的正确性(“C”,即 ACID 中的一致性),则可序列化执行也会保持正确性。 因此,可序列化是保证数据库正确性的一种机制。1

Unlike linearizability, serializability does not—by itself—impose any real-time constraints on the ordering of transactions. Serializability is also not composable. Serializability does not imply any kind of deterministic order—it simply requires that some equivalent serial execution exists.

与线性化不同,可串行化本身并不对事务的排序施加任何实时约束。 可串行化也是不可组合的。 可串行性并不意味着任何类型的确定性顺序 - 它只是需要存在某种等效的串行执行。

Strict Serializability: Why don’t we have both?

Combining serializability and linearizability yields strict serializability: transaction behavior is equivalent to some serial execution, and the serial order corresponds to real time. For example, say I begin and commit transaction T1, which writes to item x, and you later begin and commit transaction T2, which reads from x. A database providing strict serializability for these transactions will place T1 before T2 in the serial ordering, and T2 will read T1’s write. A database providing serializability (but not strict serializability) could order T2 before T1.2

将串行化和线性化相结合会产生严格的串行化:事务行为相当于某种串行执行,并且串行顺序对应于实时。 例如,假设我开始并提交事务 T1,它写入项目 x,而您稍后开始并提交事务 T2,它从 x 读取。 为这些事务提供严格可串行性的数据库将在串行排序中将 T1 置于 T2 之前,并且 T2 将读取 T1 的写入。 提供可序列化(但不是严格可序列化)的数据库可以在 T1 之前排序 T2

As Herlihy and Wing note, “linearizability can be viewed as a special case of strict serializability where transactions are restricted to consist of a single operation applied to a single object.”

正如 Herlihy 和 Wing 所指出的,“线性化可以被视为严格串行化的一种特殊情况,其中事务仅限于由应用于单个对象的单个操作组成。”

Coordination costs and real-world deployments

Neither linearizability nor serializability is achievable without coordination. That is we can’t provide either guarantee with availability (i.e., CAP “AP”) under an asynchronous network.3

如果没有协调,线性化和串行化都无法实现。 也就是说,我们无法在异步网络下提供可用性保证(即 CAP“AP”)。3

In practice, your database is unlikely to provide serializability, and your multi-core processor is unlikely to provide linearizability—at least by default. As the above theory hints, achieving these properties requires a lot of expensive coordination. So, instead, real systems often use cheaper-to-implement and often harder-to-understand models. This trade-off between efficiency and programmability represents a fascinating and challenging design space.

实际上,您的数据库不太可能提供可串行化,并且您的多核处理器也不太可能提供线性化——至少在默认情况下是这样。 正如上述理论所暗示的,实现这些特性需要大量昂贵的协调。 因此,实际系统通常使用实施成本较低且通常较难理解的模型。 效率和可编程性之间的这种权衡代表了一个令人着迷且具有挑战性的设计空间。

A note on terminology, and more reading

One of the reasons these definitions are so confusing is that linearizability hails from the distributed systems and concurrent programming communities, and serializability comes from the database community. Today, almost everyone uses both distributed systems and databases, which often leads to overloaded terminology (e.g., “consistency,” “atomicity”).

这些定义如此令人困惑的原因之一是,线性化来自分布式系统和并发编程社区,而串行化来自数据库社区。 如今,几乎每个人都使用分布式系统和数据库,这常常导致术语过多(例如“一致性”、“原子性”)。

There are many more precise treatments of these concepts. I like this book, but there is plenty of free, concise, and (often) accurate material on the internet, such as these notes.

这些概念还有许多更精确的处理方法。 我喜欢这本书,但是互联网上有大量免费、简洁且(通常)准确的材料,例如这些笔记。

Notes

  1. But it’s not the only mechanism!

    但这不是唯一的机制!

    Granted, serializability is (more or less) the most general means of maintaining database correctness. In what’s becoming one of my favorite “underground” (i.e., relatively poorly-cited) references, H.T. Kung and Christos Papadimitriou dropped a paper in SIGMOD 1979 on “An Optimality Theory of Concurrency Control for Databases.” In it, they essentially show that, if all you have are transactions’ syntactic modifications to database state (e.g., read and write) and no information about application logic, serializability is, in some sense, “optimal”: in effect, a schedule that is not serializable might modify the database state in a way that produces inconsistency for some (arbitrary) notion of correctness that is not known to the database.

    诚然,可序列化(或多或少)是维护数据库正确性的最通用方法。 在成为我最喜欢的“地下”(即引用相对较少的)参考文献之一中,H.T. Kung 和 Christos Papadimitriou 在 SIGMOD 1979 上发表了一篇关于“数据库并发控制的最优理论”的论文。 在其中,它们本质上表明,如果您拥有的只是事务对数据库状态的语法修改(例如,读取和写入)并且没有有关应用程序逻辑的信息,那么可串行性在某种意义上是“最佳的”:实际上,是一个时间表 不可序列化的可能会修改数据库状态,从而导致数据库未知的某些(任意)正确性概念产生不一致。

    However, if do you know more about your user’s notions of correctness (say, you are the user!), you can often do a lot more in terms of concurrency control and can circumvent many of the fundamental overheads imposed by serializability. Recognizing when you don’t need serializability (and subsequently exploiting this fact) is the best way I know to “beat CAP.”

    但是,如果您更多地了解用户的正确性概念(例如,您是用户!),您通常可以在并发控制方面做更多的事情,并且可以规避可串行性带来的许多基本开销。 认识到何时不需要可序列化(并随后利用这一事实)是我所知道的“击败 CAP”的最佳方法。 ↩

  2. Note that some implementations of serializability (such as two-phase locking with long write locks and long read locks) actually provide strict serializability. As Herlihy and Wing point out, other implementations (such as some MVCC implementations) may not.

    请注意,某些可串行性的实现(例如具有长写锁和长读锁的两阶段锁定)实际上提供了严格的可串行性。 正如 Herlihy 和 Wing 指出的那样,其他实现(例如某些 MVCC 实现)可能不会。

    So, why didn’t the early papers that defined serializability call attention to this real-time ordering? In some sense, real time doesn’t really matter: all serializable schedules are equivalent in terms of their power to preserve database correctness! However, there are some weird edge cases: for example, returning NULL in response to every read-only transaction is serializable (provided we start with an empty database) but rather unhelpful.

    那么,为什么早期定义可串行化的论文没有引起人们对这种实时排序的关注呢? 从某种意义上说,实时并不重要:所有可序列化的调度在保持数据库正确性方面的能力都是相同的! 然而,有一些奇怪的边缘情况:例如,响应每个只读事务返回 NULL 是可序列化的(假设我们从空数据库开始),但毫无帮助。

    One tantalizingly plausible theory for this omission is that, back in the 1970s when serializability theory was being invented, everyone was running on single-site systems anyway, so linearizability essentially “came for free.” However, I believe this theory is unlikely: for example, database pioneer Phil Bernstein was already looking at distributed transaction execution in his SDD-1 system as early as 1977 (and there are older references yet). Even in this early work, Bernstein (and company) are careful to stress that “there may in fact be several such equivalent serial orderings” [emphasis theirs]. To further put this theory to rest, Papadimitriou makes clear in his seminal 1979 JACM article that he’s familiar with problems inherent in a distributed setting. (If you ever want to be blown away by the literature, look at how much of the foundational work on concurrency control was done by the early 1980s.)

    对于这一遗漏,一个看似合理的理论是,早在 20 世纪 70 年代,当串行化理论被发明时,每个人都在单站点系统上运行,因此线性化本质上是“免费的”。 然而,我认为这个理论不太可能:例如,数据库先驱 Phil Bernstein 早在 1977 年就已经在他的 SDD-1 系统中研究分布式事务执行(并且还有更早的参考资料)。 即使在这项早期工作中,伯恩斯坦(和公司)也小心地强调“实际上可能有几个这样的等效序列顺序”[强调他们的]。 为了进一步证实这一理论,Papadimitriou 在其 1979 年 JACM 的开创性文章中明确表示,他熟悉分布式环境中固有的问题。 (如果您想被文献所震撼,请看看 20 世纪 80 年代初完成了多少并发控制的基础工作。) ↩

  3. For distributed systems nerds: achieving linearizability for reads and writes is, in a formal sense, “easier” to achieve than serializability. This is probably deserving of another post (encouragement appreciated!), but here’s some intuition: terminating atomic register read/write operations are achievable in a fail-stop model. Yet atomic commitment—which is needed to execute multi-site serializable transactions (think: AC is to 2PC as consensus is to Paxos)—is not: the FLP result says consensus is unachievable in a fail-stop model (hence with One Faulty Process), and (non-blocking) atomic commitment is “harder” than consensus (see also). Also, keep in mind that linearizability for read-modify-write is harder than linearizable read/write. (linearizable read/write《 consensus《 atomic commitment)

    对于分布式系统迷来说:从形式上来说,实现读写的线性化比串行化“更容易”实现。 这可能值得另一篇文章(感谢鼓励!),但这里有一些直觉:终止原子寄存器读/写操作在故障停止模型中是可以实现的。 然而,原子承诺——执行多站点可序列化事务所需的原子承诺(想想:AC 之于 2PC,共识之于 Paxos)——却并非如此:FLP 结果表明,在故障停止模型中无法达成共识(因此出现了一个错误进程) ),并且(非阻塞)原子承诺比共识“更难”(另请参阅)。 另外,请记住,读-修改-写的线性化比线性化读/写更难。 (线性化读/写《共识》《原子承诺) ↩