CAP Theorem in System Design
Overview
CAP Theorem states that in a distributed system, it is impossible to simultaneously provide more than two out of the following three guarantees: Consistency (C), Availability (A), and Partition Tolerance (P). In modern distributed architectures, network partitions are inevitable, making Partition Tolerance a mandatory requirement. Therefore, architects must primarily choose between Consistency and Availability during a network failure.
Key Insights
1. The Core Trade-off (C vs A)
Since Partition Tolerance (P) is a given in distributed systems, the decision-making process simplifies to:
- CP (Consistency + Partition Tolerance): The system returns an error or time-out if it cannot guarantee the most recent data. Preferred when data accuracy is critical.
- AP (Availability + Partition Tolerance): The system always returns a response, even if the data is stale. Preferred when system uptime is critical.
2. Industry Use Cases
- CP Systems (Accuracy > Uptime):
- Ticket Booking: Avoiding double-booked seats.
- Inventory Systems: Ensuring stock levels are accurate (e.g., Amazon’s last item).
- Financial Systems: Order books and stock trading where the value must be precise.
- AP Systems (Uptime > Accuracy):
- Social Media: Profile updates or post visibility (it’s okay if a post takes seconds to appear to others).
- Content Platforms: Netflix movie descriptions or Yelp reviews.
3. Implementation Strategies
- CP Design: Use distributed transactions, single-node databases (to eliminate propagation lag), or strong consistency modes in SQL/NoSQL (e.g., PostgreSQL, Google Spanner, DynamoDB in strong-consistency mode).
- AP Design: Use multi-replica setups, eventual consistency, and Change Data Capture (CDC) (e.g., Cassandra, standard DynamoDB).
4. Advanced Consistency Levels
Beyond the binary C vs A, systems can implement nuanced consistency levels:
- Causal Consistency: Ensures related events appear in the same order (e.g., a reply to a comment doesn’t appear before the original comment).
- Read-Your-Own-Writes: A user immediately sees their own updates even if other users see stale data for a short period.
- Eventual Consistency: The system will eventually become consistent once the partition is resolved.