Real transactions are serializable
Most databases offer a choice of several transaction isolation levels, offering a tradeoff between correctness and performance. However, that performance comes at a price, as developers must study their transactional interactions carefully or risk introducing subtle bugs. CockroachDB provides strong (“SERIALIZABLE
”) isolation by default to ensure that your application always sees the data it expects. In this post I’ll explain what this means and how insufficient isolation impacts real-world applications.
大多数数据库提供多种事务隔离级别的选择,在正确性和性能之间进行权衡。 然而,这种性能是有代价的,因为开发人员必须仔细研究他们的事务交互,否则就有引入微妙错误的风险。 CockroachDB 默认提供强(“可串行化”)隔离,以确保您的应用程序始终看到它期望的数据。 在这篇文章中,我将解释这意味着什么以及隔离不足如何影响实际应用程序。
Isolation in the SQL Standard
The SQL standard defines four isolation levels:
SERIALIZABLE
REPEATABLE READ
READ COMMITTED
READ UNCOMMITTED
SERIALIZABLE
transactions run as if only one transaction were running at a time; the other isolation levels allow what the SQL standard euphemistically calls “the three phenomena”: dirty reads, non-repeatable reads, and phantom reads. Subsequent research has identified additional “phenomena” and isolation levels.
SERIALIZABLE 事务的运行就像一次只有一个事务在运行一样; 其他隔离级别允许 SQL 标准委婉地称为“三种现象”:脏读、不可重复读和幻读。 随后的研究发现了额外的“现象”和隔离级别。
In modern research, these “phenomena” are more commonly called “anomalies”, or more bluntly, “lies”. When you use a non-SERIALIZABLE
isolation level, you’re giving the database permission to return an incorrect answer in the hope that it will be faster than producing the correct one. The SQL standard recognizes that this is dangerous and requires that SERIALIZABLE
is the default isolation level. Weaker isolation levels are provided as a potential optimization for applications that can tolerate these anomalies.
在现代研究中,这些“现象”通常被称为“异常”,或者更直白地称为“谎言”。 当您使用非 SERIALIZABLE 隔离级别时,您就授予数据库返回错误答案的权限,希望它比生成正确答案更快。 SQL 标准认识到这是危险的,并要求 SERIALIZABLE 是默认的隔离级别。 提供较弱的隔离级别作为可以容忍这些异常的应用程序的潜在优化。
Isolation in Real Databases
Most databases ignore the specification that SERIALIZABLE
be the default, and instead prioritize performance over safety by defaulting to the weaker READ COMMITTED
or REPEATABLE READ
isolation levels. More worryingly, some databases (including Oracle, and PostgreSQL prior to version 9.1) do not provide a serializable transaction implementation at all. Oracle’s implementation of the SERIALIZABLE
isolation level is actually a weaker mode called “snapshot isolation”.
大多数数据库都会忽略 SERIALIZABLE 为默认值的规范,而是通过默认较弱的 READ COMMITTED 或 REPEATABLE READ 隔离级别来优先考虑性能而非安全性。 更令人担忧的是,一些数据库(包括Oracle和9.1版本之前的PostgreSQL)根本不提供可序列化事务的实现。 Oracle对SERIALIZABLE隔离级别的实现实际上是一种较弱的模式,称为“快照隔离”。
Snapshot isolation was developed after the initial standardization of the SQL language, but has been implemented in multiple database systems because it provides a good balance of performance and consistency. It is stronger than READ COMMITTED
but weaker than SERIALIZABLE
. It is similar to REPEATABLE READ
but not exactly equivalent (REPEATABLE READ
permits phantom reads but prevents write skew, while the reverse is true of snapshot isolation). The databases that have implemented snapshot isolation have made different decisions about how to fit it into the four SQL standard levels. Oracle takes the most aggressive stance, calling their snapshot implementation SERIALIZABLE
. CockroachDB and Microsoft SQL Server are conservative and treat SNAPSHOT
as a separate fifth isolation level. PostgreSQL (since version 9.1) falls in between, using snapshot isolation in place of REPEATABLE READ
.
快照隔离是在 SQL 语言最初标准化之后开发的,但由于它提供了性能和一致性的良好平衡,已在多个数据库系统中实现。 它比 READ COMMITTED 强,但比 SERIALIZABLE 弱。 它与 REPEATABLE READ 类似,但并不完全相同(REPEATABLE READ 允许幻读,但防止写入倾斜,而快照隔离则相反)。 已经实现快照隔离的数据库对于如何将其适应四个 SQL 标准级别做出了不同的决定。 Oracle 采取了最激进的立场,称他们的快照实现是可串行化的。 CockroachDB 和 Microsoft SQL Server 比较保守,将 SNAPSHOT 视为单独的第五个隔离级别。 PostgreSQL(自版本 9.1 起)介于两者之间,使用快照隔离代替可重复读取。
Because serializable mode is used less often in databases that default to weaker isolation, it is often less thoroughly tested or optimized. For example, PostgreSQL has a fixed-size memory pool that it uses to track conflicts between serializable transactions, which can be exhausted under heavy load.
由于可序列化模式在默认隔离较弱的数据库中使用较少,因此通常没有经过彻底的测试或优化。 例如,PostgreSQL 有一个固定大小的内存池,用于跟踪可序列化事务之间的冲突,这些冲突在重负载下可能会耗尽。
Most database vendors treat stronger transaction isolation as an exotic option to be enabled by applications with exceptional consistency needs. Most applications, however, are expected to work with the faster but unsafe weak isolation modes. This backwards approach to the problem exposes applications to a variety of subtle bugs. At Cockroach Labs, we like thinking about transactional anomalies so much that we named all our conference rooms after them, but I would have a hard time advising with confidence when it is both safe and beneficial to choose SNAPSHOT
isolation instead of SERIALIZABLE
. Our philosophy is that it’s better to start with safety and work towards performance than the other way around.
大多数数据库供应商将更强的事务隔离视为一种奇特的选项,由具有特殊一致性需求的应用程序启用。 然而,大多数应用程序都希望使用更快但不安全的弱隔离模式。 这种向后解决问题的方法使应用程序面临各种微妙的错误。 在 Cockroach Labs,我们非常喜欢考虑事务异常,因此我们以它们的名字命名了所有会议室,但当选择快照隔离而不是串行隔离既安全又有益时,我很难充满信心地提出建议。 我们的理念是,最好从安全开始,努力提高性能,而不是相反。
ACIDRain: Finding Transactional Bugs
Recent research at Stanford has explored the degree to which weak isolation leads to real-world bugs. Todd Warszawski and Peter Bailis examined 12 eCommerce applications and found 22 bugs related to transactions, five of which would have been avoided by running at a higher isolation level. Many of these bugs were simple to exploit and had direct financial implications. For example, in five of the tested applications, adding an item to your cart while checking out in another browser tab could result in the item being added to the order for free. The researchers developed tools to identify these vulnerabilities in a semi-automated way, paving the way for similar attacks (which the researchers dubbed “ACIDRain”) to become more prevalent.
斯坦福大学最近的研究探讨了弱隔离导致现实世界错误的程度。 Todd Warszawski 和 Peter Bailis 检查了 12 个电子商务应用程序,发现了 22 个与事务相关的错误,其中 5 个错误可以通过在更高的隔离级别运行来避免。 其中许多错误很容易被利用,并且会产生直接的财务影响。 例如,在五个测试的应用程序中,在另一个浏览器选项卡中结帐时将商品添加到购物车可能会导致该商品免费添加到订单中。 研究人员开发了工具以半自动方式识别这些漏洞,为类似攻击(研究人员称之为“ACIDRain”)变得更加普遍铺平了道路。
Most databases that default to weak transactional isolation provide workarounds, such as the (non-standard) FOR UPDATE
and LOCK IN SHARE MODE
modifiers for SELECT
statements. When used correctly, these modifiers can make transactions safe even in weaker isolation levels. However, this is easy to get wrong, and even when used consistently these extensions introduce most of the downsides of SERIALIZABLE
mode (in fact, overuse of SELECT FOR UPDATE
in a READ COMMITTED
transaction can perform worse than a SERIALIZABLE
transaction, because it uses exclusive locks where serializability may only require shared locks). The ACIDRain research demonstrates the limitations of this technique: only one in three of the applications that attempted to use SELECT FOR UPDATE
feature did so correctly; the others remained vulnerable.
大多数默认为弱事务隔离的数据库都提供了解决方法,例如 SELECT 语句的(非标准)FOR UPDATE 和 LOCK IN SHARE MODE 修饰符。 如果正确使用,这些修饰符即使在较弱的隔离级别下也可以使事务安全。 然而,这很容易出错,即使一致使用,这些扩展也会引入 SERIALIZABLE 模式的大部分缺点(事实上,在 READ COMMITTED 事务中过度使用 SELECT FOR UPDATE 的性能可能比 SERIALIZABLE 事务更差,因为它使用独占模式) 可串行化可能只需要共享锁的锁)。 ACIDRain 研究表明了该技术的局限性:尝试使用 SELECT FOR UPDATE 功能的应用程序中只有三分之一正确执行了操作; 其他人仍然很脆弱。
Conclusion
Databases that encourage the use of weaker isolation levels have prioritized performance over the safety of your data, leaving you to study subtle interactions between your transactions and implement error-prone workarounds. CockroachDB provides SERIALIZABLE
transactions by default to ensure that you always see the consistency that you expect from a transactional database.
鼓励使用较弱隔离级别的数据库将性能置于数据安全之上,让您可以研究事务之间的微妙交互并实施容易出错的解决方法。 CockroachDB 默认提供 SERIALIZABLE 事务,以确保您始终看到事务数据库所期望的一致性。