2000case studiespaper #13 / 29

The CAP Theorem

by Eric Brewer

Gilbert & Nancy Lynch When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, we prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.

Why this paper matters

Eric Brewer’s CAP Theorem, formalized by Gilbert and Lynch in 2002, stands as one of the most foundational results in distributed systems theory, shaping how engineers reason about fault tolerance and consistency in the presence of network partitions. In the pre-cloud era, distributed systems were often designed with implicit assumptions about network reliability, leading to brittle architectures that failed catastrophically under partial failure. CAP exposed this flaw by proving that in asynchronous networks, you cannot simultaneously guarantee consistency (C), availability (A), and partition tolerance (P). This was not merely a theoretical curiosity-it became a litmus test for system design, forcing architects to explicitly choose trade-offs rather than hope for the best.

By the 2020s, CAP had transcended academia and become a cultural artifact in system design discourse. Engineers routinely invoke CAP to justify database choices, API contracts, and even organizational processes. The theorem didn’t just predict failure modes; it redefined how we evaluate distributed correctness. In 2026, CAP remains the first slide in any distributed systems talk, a shorthand for discussing resilience, latency, and consistency without rederiving consensus algorithms from scratch. It also catalyzed the rise of “CAP-aware” systems that expose explicit levers for tuning consistency, enabling fine-grained control over trade-offs in multi-tenant, globally distributed deployments.

Crucially, CAP bridged theory and practice by forcing a simple question: What do you do when the network lies? Before CAP, engineers assumed networks were reliable enough to build synchronous protocols atop. After CAP, systems were designed with partition tolerance as a first-class requirement, not an afterthought. This shift underpins nearly every modern distributed database, from edge caches to planetary-scale transactional stores. Without CAP, concepts like quorum-based replication, hinted handoff, and CRDTs would lack a unifying theoretical framework. In short, CAP didn’t just matter-it redefined what it meant to build reliable distributed software.

Key contributions

Formal proof that in asynchronous networks, it is impossible to simultaneously satisfy consistency, availability, and partition tolerance-even under partial failures. The proof hinges on the FLP impossibility result and shows that any algorithm guaranteeing C and A cannot tolerate partitions, and vice versa.
Introduction of the CAP trade-off triangle as a mental model for system design, enabling engineers to reason about failure modes without deep algorithmic knowledge.
Demonstration that partition tolerance (P) must be prioritized in modern networks, where partitions are not rare exceptions but common occurrences due to global scale, mobility, and heterogeneous infrastructure.
Clarification of semantic guarantees: consistency here refers to linearizability or atomicity, availability to non-failure responses, and partition tolerance to continued operation during network splits.
Provision of a framework for classifying distributed systems by their CAP behavior-e.g., CP systems prioritize consistency during partitions (e.g., Spanner during inter-region splits), while AP systems favor availability (e.g., Dynamo during AZ failures).

Impact on modern systems

CAP’s influence permeates nearly every distributed database released since 2002, often in ways that are invisible to users but critical to correctness. Take CockroachDB, for instance. Released in 2017, CockroachDB explicitly advertises itself as a CP system during network partitions, sacrificing availability to preserve linearizable consistency. This choice is directly motivated by CAP: when a partition occurs between regions, CockroachDB enters a “partition-aware” mode, reducing lease holders in affected regions and halting writes to maintain consistency. The system exposes this behavior through its cockroach demo CLI, where users can simulate a network split and observe how writes halt in one region to preserve correctness. This aligns with CAP’s formal result-when P is active, either C or A must yield-and CockroachDB opts for C.

Similarly, FoundationDB (acquired by Apple, now open-source) is designed as a CP system under partitions, using a variant of Paxos for consensus. Its transactional layer depends on a stable leader and quorum-based replication. When a partition occurs, FoundationDB reduces its quorum size but will not serve stale data, adhering to the CP model. Engineers at Apple use this property to build strongly consistent services like iCloud Keychain, where correctness is paramount. The system’s latency profile-typically 5-10ms for single-region transactions-reflects this choice: lower latency is sacrificed for correctness during failure.

CAP also informed the evolution of MongoDB. Starting with version 3.6 (2018), MongoDB introduced “Causal Consistency,” a middle ground between strong and eventual consistency. This model allows clients within a single partition to see causally consistent views while avoiding the overhead of global linearizability. During a partition, MongoDB can be configured to remain available (AP) or prioritize consistency (CP), depending on the deployment mode. The readConcern and writeConcern parameters give users knobs to tune CAP behavior per operation, reflecting the paper’s original insight that no single configuration fits all workloads.

The theorem’s reach extends beyond transactional stores. ScyllaDB, a Cassandra-compatible database written in C++, leverages CAP to optimize for high throughput and low tail latency. By default, it prioritizes availability during partitions (AP), using a gossip-based anti-entropy protocol to eventually converge. However, during critical operations like range movements or node repairs, ScyllaDB can switch to CP mode, blocking writes until consensus is restored. This adaptive behavior is a direct application of CAP: it acknowledges that no system maintains all three properties simultaneously, but allows context-specific tuning.

Even PostgreSQL, the 1980s-era relational stalwart, has been retrofitted with CAP awareness. With logical replication and tools like Patroni, PostgreSQL clusters can be configured to fail over automatically during leader loss, prioritizing availability (AP). However, when synchronous replication is enabled (synchronous_commit = 'on'), the system becomes CP during partitions, refusing to acknowledge writes until a quorum is reached. This behavior is explicitly documented in PostgreSQL’s manual as a CAP trade-off, showing how even legacy systems have absorbed the theorem’s principles.

No discussion of CAP’s modern impact would be complete without mentioning Amazon DynamoDB. Launched in 2012, DynamoDB was explicitly designed as an AP system, prioritizing availability and partition tolerance over strong consistency. Its use of Dynamo-style replication and eventual consistency ensures that writes always succeed, even during partitions, with conflict resolution handled via last-write-wins or application-level logic. This design choice directly reflects CAP’s trade-offs: DynamoDB sacrifices immediate consistency to guarantee availability and tolerate network splits, a decision that enabled its global scalability and low-latency performance for web-scale applications.

Another example is Google Cloud Spanner, released in 2017. Spanner is a CP system par excellence, offering globally consistent, externally serializable transactions across regions. To achieve this, it uses a combination of TrueTime (a synchronized clock service) and Paxos-based consensus. When a network partition occurs between regions, Spanner reduces its availability to preserve consistency, ensuring that all transactions see a globally consistent snapshot. This behavior is a direct consequence of CAP: Spanner opts for C and P, sacrificing A during partitions. The system’s design demonstrates how CAP’s principles can be engineered around, rather than ignored, to build highly reliable global systems.

Finally, CAP influenced architectural patterns like CRDTs and event sourcing, which are now common in systems designed for edge deployment. These systems often embrace AP semantics by default, using conflict-free merge semantics to guarantee eventual consistency. This mirrors CAP’s insight that availability and partition tolerance are often easier to achieve than strong consistency in large-scale systems. The shift from ACID to BASE (BASE: an Acid Alternative) is, in many ways, a practical embodiment of CAP’s trade-offs.

AI era: how LLMs and vector databases relate to this paper

CAP’s core tension-between consistency, availability, and partition tolerance-has re-emerged in the AI era, particularly in vector databases and retrieval-augmented generation (RAG) systems. These systems must balance three competing demands: fast, available retrieval; accurate, up-to-date answers; and resilience to network or node failures. Vector databases like Pinecone, Weaviate, and Qdrant are fundamentally distributed systems, often deployed across multiple availability zones or regions. Their designers must decide: during a partition, should the system serve stale embeddings (AP) or refuse queries until consistency is restored (CP)?

Consider Pinecone. It offers two consistency modes: “strong” and “eventual.” In strong mode, the system behaves as CP during updates: it blocks writes until the index is propagated across all shards. This ensures that queries return the most recent vectors but increases latency and reduces availability during network splits. In eventual mode, Pinecone prioritizes availability (AP), allowing queries to proceed even if some shards are unreachable, with eventual convergence. This mirrors CAP’s trade-off directly: strong consistency sacrifices availability during partitions, while eventual consistency ensures liveness at the cost of staleness.

Similarly, pgvector, the PostgreSQL extension for vector search, inherits PostgreSQL’s CAP behavior. When synchronous replication is enabled, it becomes CP during partitions, refusing to acknowledge writes until a quorum is reached. This is critical for AI applications that require accurate embeddings, such as semantic search or recommendation systems. If stale embeddings are served, RAG pipelines may produce hallucinated or outdated answers. Thus, CAP’s insight directly impacts AI reliability: consistency in vector stores is not an academic concern but a requirement for trustworthy outputs.

RAG systems amplify this tension. A typical RAG pipeline consists of: embedding generation (LLM), vector storage (vector DB), retrieval, and synthesis (LLM). Each component can fail or become partitioned. The vector database’s CAP behavior determines whether the system returns stale context (AP) or waits for fresh data (CP). If the embedding store is AP, the LLM may generate responses based on outdated knowledge, leading to factual errors. If it’s CP, the system may time out during retrieval, increasing LLM inference latency. This is why production RAG systems often implement hybrid strategies: use CP for critical knowledge (e.g., medical or legal data) and AP for less sensitive content (e.g., internal wikis).

LLM inference servers themselves face CAP-like trade-offs. Systems like vLLM or TensorRT-LLM often prioritize availability (AP) during hardware failures, using model sharding and speculative decoding to maintain throughput. However, when serving stateful agents or chatbots, strong consistency is required for session state. This is where agent state stores come in-systems like Dify or LangGraph use databases with explicit CAP semantics to store conversation history. During a partition, they may fall back to local storage (AP) or block new interactions until state is synchronized (CP), depending on the application’s risk profile.

Semantic indexes, a core component of AI pipelines, also embody CAP trade-offs. Milvus, for instance, uses a hybrid architecture where the metadata layer (e.g., collection schemas) is strongly consistent (CP), while the vector data layer can be eventually consistent (AP). This ensures that queries don’t return invalid schemas during partitions but may serve slightly stale vectors. The system exposes consistency_level parameters to let users tune this behavior per query, directly applying CAP’s principles at runtime.

Another critical area is embedding serving, where systems like NVIDIA’s NeMo Retriever or Voyage AI’s API must balance freshness and latency. These services often use a CP model for embedding updates, ensuring that the latest vectors are available for retrieval, but tolerate eventual consistency for queries to maintain availability. This is particularly important in real-time applications like fraud detection, where stale embeddings could lead to incorrect classifications.

Finally, LLM-driven query planning introduces new CAP-like challenges. Systems like LangChain or LlamaIndex dynamically decompose user queries into sub-queries, routing them to different data sources. If one source is partitioned, the planner must decide: skip the source (AP), wait for recovery (CP), or synthesize an answer from available data. This is a modern incarnation of CAP: the planner becomes a distributed algorithm making trade-offs under uncertainty.

In 2026, the AI stack’s reliance on vector databases and RAG systems means CAP is no longer just a backend concern-it’s a user-facing reliability issue. A single stale vector can lead to a hallucinated answer, and a partitioned index can block an entire AI service. Thus, CAP’s legacy persists: the theorem that began as a critique of network optimism now underpins the reliability of the systems that power generative AI.

The CAP Theorem

Why this paper matters

Key contributions

Impact on modern systems

AI era: how LLMs and vector databases relate to this paper

Further reading