An introduction to join ordering

The development of the relational model heralded a big step forward for the world of databases. A few years later, SQL introduced a rich vocabulary for data manipulation: filters, projections, and—most importantly—the mighty join. Joins meant that analysts could construct new reports without having to interact with those eggheads in engineering, but more importantly, the existence of complex join queries meant that theoreticians had an interesting new NP-hard problem to fawn over for the next five decades.

关系模型的发展预示着数据库世界向前迈出了一大步。 几年后,SQL 引入了丰富的数据操作词汇:过滤器、投影,以及最重要的强大的连接。 连接意味着分析师可以构建新的报告,而无需与工程中的那些书呆子进行交互,但更重要的是,复杂连接查询的存在意味着理论家在接下来的 50 年里将面临一个有趣的新 NP 难题。

Ever since, the join has been the fundamental operation by which complex queries are constructed out of simpler “relations”. The declarative nature of SQL means that users do not generally specify how their query is to be executed—it’s the job of a separate component of the database called the optimizer to figure that out. Since joins are so prevalent in such queries, the optimizer must take special care to handle them intelligently. As we’ll see, this isn’t a trivial task.

从那时起,连接就成为了从更简单的“关系”构建复杂查询的基本操作。 SQL 的声明性本质意味着用户通常不会指定如何执行查询,而是数据库的一个单独组件(称为优化器)的工作来解决这个问题。 由于连接在此类查询中非常普遍,因此优化器必须特别小心以智能地处理它们。 正如我们将看到的,这不是一项简单的任务。

In this post, we’ll look at why join ordering is so important and develop a sense of how to think of the problem space. And then, in upcoming posts, we’ll begin discussing ways to implement a fast, reliable algorithm to produce good join orderings.

在这篇文章中,我们将了解为什么连接顺序如此重要,并培养如何思考问题空间的意识。 然后,在接下来的文章中,我们将开始讨论如何实现快速、可靠的算法来产生良好的连接顺序。

阅读全文 »

Automatic table statistics in CockroachDB

Last year, we rebuilt our cost-based optimizer from scratch for CockroachDB’s 2.1 release. We’ve been continuing to improve the optimizer since then, and we’ve added a number of new features for the CockroachDB 19.1 release. One of the new features is automatic collection of table statistics. Automatic statistics enables the optimizer to make better decisions when choosing query plans.

去年,我们为 CockroachDB 2.1 版本从头开始重建了基于成本的优化器。 从那时起,我们一直在不断改进优化器,并为 CockroachDB 19.1 版本添加了许多新功能。 新功能之一是自动收集表统计信息。 自动统计信息使优化器能够在选择查询计划时做出更好的决策。

This post explains why statistics are important for the optimizer and describes some of the challenges we overcame when implementing automatic collection.

这篇文章解释了为什么统计信息对于优化器很重要,并描述了我们在实现自动收集时克服的一些挑战。

阅读全文 »

Here at Cockroach Labs, we’ve had a continual focus on improving performance and scalability. To that end, our 2.1 release includes a brand-new, built-from-scratch, cost-based SQL optimizer. Besides enabling SQL features like correlated subqueries for the first time, it provides us with a flexible optimization framework that will yield significant performance improvements in upcoming releases, especially in more complex reporting queries. If you have queries that you think should be faster, send them our way! We’re building up libraries of queries that we use to tune the performance of the optimizer and prioritize future work.

在 Cockroach Labs,我们一直致力于提高性能和可扩展性。 为此,我们的 2.1 版本包含一个全新的、从头开始构建的、基于成本的 SQL 优化器。 除了首次启用相关子查询等 SQL 功能外,它还为我们提供了一个灵活的优化框架,该框架将在即将发布的版本中带来显着的性能改进,尤其是在更复杂的报告查询中。 如果您有任何您认为应该更快的查询,请发送给我们! 我们正在构建查询库,用于调整优化器的性能并确定未来工作的优先级。

While as an engineer, I’m eager to dive right into the details of how our new optimizer works (TL;DR - it’s very cool stuff), I need to first set the stage. I’ll start by explaining what a cost-based SQL optimizer is, and then tell you the story of how we decided we really, really needed one of those. Enough that we took 4 engineers, shut them into a windowless Slack room, and gave them carte blanche to rewrite a major component of CockroachDB. After story time, I’ll move onto the really interesting stuff, giving you a peek “under the hood” of the new optimizer. A peek will have to suffice, though, as digging deeper will require more words than one blog entry can provide. But do not despair; future articles will delve further into optimizer internals, so stay tuned.

作为一名工程师,我渴望深入了解我们的新优化器如何工作的细节(TL;DR - 这是非常酷的东西),我需要首先做好准备。 我将首先解释什么是基于成本的 SQL 优化器,然后告诉您我们如何决定我们真的非常需要其中一个的故事。 我们把 4 名工程师关在一个没有窗户的 Slack 房间里,全权让他们重写 CockroachDB 的一个主要组件。 故事时间结束后,我将转向真正有趣的内容,让您一睹新优化器的“幕后花絮”。 不过,浏览一下就足够了,因为深入挖掘需要的文字比一篇博客文章所能提供的还要多。 但不要绝望; 未来的文章将进一步深入探讨优化器的内部结构,敬请期待。

阅读全文 »

Availability and region failure: Joint consensus in CockroachDB

At Cockroach Labs, we write quite a bit about consensus algorithms. They are a critical component of CockroachDB and we rely on them in the lower layers of our transactional, scalable, distributed key-value store. In fact, large clusters can contain tens of thousands of consensus groups because in CockroachDB, every Range (similar to a shard) is an independent consensus group. Under the hood, we run a large number of instances of Raft (a consensus algorithm), which has come with interesting engineering challenges. This post dives into one that we’ve tackled recently: adding support for atomic replication changes (“Joint Quorums”) to etcd/raft and using them in CockroachDB to improve resilience against region failures.

在 Cockroach Labs,我们写了很多关于共识算法的文章。 它们是 CockroachDB 的关键组件,我们在事务性、可扩展、分布式键值存储的较低层中依赖它们。 事实上,大型集群可以包含数万个共识组,因为在 CockroachDB 中,每个 Range(类似于分片)都是一个独立的共识组。 在幕后,我们运行了大量 Raft(一种共识算法)实例,这带来了有趣的工程挑战。 这篇文章深入探讨了我们最近解决的一个问题:向 etcd/raft 添加对原子复制更改(“联合仲裁”)的支持,并在 CockroachDB 中使用它们来提高针对区域故障的恢复能力。

阅读全文 »

https://www.cockroachlabs.com/blog/cockroachdb-stability-from-1-node-to-100-nodes/

In August, we published a blog post entitled “Why Can’t I Run a 100-Node CockroachDB Cluster?”. The post outlined difficulties we encountered stabilizing CockroachDB. CockroachDB stability (or the lack of) had become significant enough that we designated it a “code yellow” issue, a concept borrowed from Google that means a problem is so pressing that it merits promotion to a primary concern of the company. For us, the code yellow was more than warranted; a database program isn’t worth the bytes to store its binary if it lacks stability.

8 月份,我们发表了一篇题为“为什么我不能运行 100 节点 CockroachDB 集群?”的博客文章。 这篇文章概述了我们在稳定 CockroachDB 时遇到的困难。 CockroachDB 稳定性(或缺乏稳定性)已经变得足够严重,以至于我们将其指定为“黄色代码”问题,这是一个借用自 Google 的概念,意味着问题非常紧迫,值得提升为公司的首要关注点。 对我们来说,黄色代码是值得的; 如果数据库程序缺乏稳定性,那么它就不值得用字节来存储其二进制文件。

In this post, I’ll set the stage with some background, then cover hypotheses for root causes of instability, our communication strategy, some interesting technical details, outcomes for stabilization efforts, and conclusions. It’s a long post, so bear with me!

在这篇文章中,我将介绍一些背景知识,然后介绍不稳定的根本原因的假设、我们的沟通策略、一些有趣的技术细节、稳定工作的结果和结论。 这是一篇很长的文章,所以请耐心等待!

TL;DR: We achieved most of our stability goal. While we’re still working on some of the chaos scenarios, the system is easily stable at many more than 10 node clusters – we’ve tested it successfully at 100 nodes.

TL;DR:我们实现了大部分稳定性目标。 虽然我们仍在处理一些混乱场景,但系统在超过 10 个节点的集群上很容易保持稳定——我们已经在 100 个节点上成功测试了它。

阅读全文 »

https://www.cockroachlabs.com/blog/scaling-raft/

Scaling Raft

In CockroachDB, we use the Raft consensus algorithm to ensure that your data remains consistent even when machines fail. In most systems that use Raft, such as etcd and Consul, the entire system is one Raft consensus group. In CockroachDB, however, the data is divided into ranges, each with its own consensus group. This means that each node may be participating in hundreds of thousands of consensus groups. This presents some unique challenges, which we have addressed by introducing a layer on top of Raft that we call MultiRaft.

在 CockroachDB 中,我们使用 Raft 共识算法来确保即使机器出现故障,您的数据也保持一致。 在大多数使用 Raft 的系统中,例如 etcd、Consul,整个系统就是一个 Raft 共识组。 然而,在 CockroachDB 中,数据被分为多个范围,每个范围都有自己的共识组。 这意味着每个节点可能参与数十万个共识组。 这带来了一些独特的挑战,我们通过在 Raft 之上引入一个称为 MultiRaft 的层来解决这些挑战。

阅读全文 »

https://www.cockroachlabs.com/blog/trust-but-verify-cockroachdb-checks-replication/

Trust, but verify: How CockroachDB checks replication

We built survivability into the DNA of CockroachDB. And while we had a lot of fun doing so, and are confident that we have built a solution on a firm foundation, we felt a nagging concern: Does CockroachDB really survive? When data is written to the database, will a failure really not end up in data loss? So to assuage those concerns, we adopted a Russian maxim: “Dovorey, no provorey – Trust, but Verify.”

我们将生存能力融入了 CockroachDB 的 DNA 中。 虽然我们这样做很有趣,并且相信我们已经在坚实的基础上构建了一个解决方案,但我们感到一个挥之不去的担忧:CockroachDB 真的能生存下来吗? 当数据写入数据库时,失败真的不会导致数据丢失吗? 因此,为了缓解这些担忧,我们采用了一句俄罗斯格言:“Dovorey,no provorey – 信任,但验证。”

阅读全文 »

https://www.cockroachlabs.com/blog/consensus-made-thrive/

Consensus, made thrive

When you write data to CockroachDB (for example, if you insert a row into a table through the SQL client), we take care of replication for you. To do this, we use a consensus protocol – an algorithm which makes sure that your data is safely stored on multiple machines, and that those machines agree on the current state even if some of them are temporarily disconnected.

当您将数据写入 CockroachDB 时(例如,如果您通过 SQL 客户端向表中插入一行),我们会为您处理复制。 为此,我们使用共识协议——一种确保您的数据安全存储在多台机器上的算法,并且即使其中一些机器暂时断开连接,这些机器也能就当前状态达成一致。

In this post, I will give an overview of common implementation concerns and how we address these concerns in CockroachDB. Then I will abandon these earthly constraints and explore how we could improve consensus algorithms. Specifically, what would it take to make them faster?

在这篇文章中,我将概述常见的实施问题以及我们如何在 CockroachDB 中解决这些问题。 然后我将放弃这些现实的限制并探索如何改进共识算法。 具体来说,怎样才能让它们更快?

阅读全文 »
0%