CAP Theorem in System Design

Overview

CAP Theorem states that in a distributed system, it is impossible to simultaneously provide more than two out of the following three guarantees: Consistency (C), Availability (A), and Partition Tolerance (P). In modern distributed architectures, network partitions are inevitable, making Partition Tolerance a mandatory requirement. Therefore, architects must primarily choose between Consistency and Availability during a network failure.

Key Insights

1. The Core Trade-off (C vs A)

Since Partition Tolerance (P) is a given in distributed systems, the decision-making process simplifies to:

  • CP (Consistency + Partition Tolerance): The system returns an error or time-out if it cannot guarantee the most recent data. Preferred when data accuracy is critical.
  • AP (Availability + Partition Tolerance): The system always returns a response, even if the data is stale. Preferred when system uptime is critical.

2. Industry Use Cases

  • CP Systems (Accuracy > Uptime):
    • Ticket Booking: Avoiding double-booked seats.
    • Inventory Systems: Ensuring stock levels are accurate (e.g., Amazon’s last item).
    • Financial Systems: Order books and stock trading where the value must be precise.
  • AP Systems (Uptime > Accuracy):
    • Social Media: Profile updates or post visibility (it’s okay if a post takes seconds to appear to others).
    • Content Platforms: Netflix movie descriptions or Yelp reviews.

3. Implementation Strategies

  • CP Design: Use distributed transactions, single-node databases (to eliminate propagation lag), or strong consistency modes in SQL/NoSQL (e.g., PostgreSQL, Google Spanner, DynamoDB in strong-consistency mode).
  • AP Design: Use multi-replica setups, eventual consistency, and Change Data Capture (CDC) (e.g., Cassandra, standard DynamoDB).

4. Advanced Consistency Levels

Beyond the binary C vs A, systems can implement nuanced consistency levels:

  • Causal Consistency: Ensures related events appear in the same order (e.g., a reply to a comment doesn’t appear before the original comment).
  • Read-Your-Own-Writes: A user immediately sees their own updates even if other users see stale data for a short period.
  • Eventual Consistency: The system will eventually become consistent once the partition is resolved.

References