Review: "CAP Twelve Years Later"

I prepared a one-pager review for Eric Brewer’s **article **(paid on IEEE Library) on CAP theorem a while ago, now publishing it.

# Review: “CAP: Twelve Years Later: How the Rules Have Changed”

In this article, Eric Brewer talks about how the distributed systems design paradigms and approaches are evolved since early 1990s when the CAP theorem coined by himself. According to the article, consistency (C), high availability (A) and partition tolerance (P) concepts slightly changed meanwhile and the notion of having at most of “2 of these 3” is no longer strict as it was before and can be seen as misleading today. He later on points out the connection between ACID–BASE philosophies and CAP theorem in distributed systems. Later on mostly his argument revolves around handling partitions so that a well CA (consistency-availability) can be achieved and work at a fine granularity by giving up only a bit.

He later on talks about partition decisions which is the strategy to how to decide on a partition and what to do with an ongoing operation. The proposed solutions are canceling the operation and sacrificing availability and wait longer for a response from a probably-partitioned end and therefore sacrifice consistency.

As the years have passed distributed systems became so-called less prone to software failures and errors under favor of evolved recovery and self-correction techniques; however, the systems had to grow in terms of number of nodes and thus failure rates of a machine in a cluster became significant. Not just limited to this argument, in the recent years, the case where thin clients continuously connected and synchronized with server applications became more common. Most of the case it is mobile apps or web interfaced wrappers for protocols such as Gmail for mail protocols, WhatsApp for chat, where network partition happens a lot due to connectivity through mobile and ad-hoc networks. Such cases are however, tolerant to inconsistencies and neatly handle nonpermanent communication failures with well designed retry mechanisms like exponential-backoff strategy.

In the rest of the article, Brewer talks about several strategies to managing network-partitioned cases by strategies like limiting some operations, recovering partitioned partitions, “long-running transactions” and purpose-specific solutions like “last-writer wins”. Due to fact that wide purpose of existing systems, partition management and compensation is a trade-off that is tightly bounded to application constraints. For instance, distributed databases like MongoDB handles partitions by keeping journals (called oplogs) and replay oplogs for crashed servers. However this does not work especially if partitioned servers were down for long durations and oplogs are insufficient. In this case, partitioned servers perform a full snapshot synchronization from master and recover. Such NoSQL databases do not provide a global lock or transaction support by design. Even the client-side applications like Google Docs, as Eric Brewer talked about, force a regional lock on the browser to prevent same regions to be concurrently edited by connected users as conflict resolution. At the end, such partition compensation and conflict resolution techniques highly depend on the application.

The compelling arguments of NoSQL databases like giving up consistency for short times and a narrow permitted set of operations compared to RDBMSes drew a lot of attention. The understanding of CAP theorem today led to design of new distributed systems which are CA oriented just because there are infrequent partitions and more convenient partition handling techniques where almost all design decisions are made upon several trade-offs. For some domains and access patterns, there are rock-solid solutions built upon CA-P trade-off and widely used in the industry today. ■

# Review: “CAP: Twelve Years Later: How the Rules Have Changed”

Comments