Popular System Design Concepts Every Software Engineer Should Know
Popular System Design Concepts Every Software Engineer Should Know
System design is one of the most critical skills for software engineers, especially as applications grow from serving a handful of users to millions. Whether you're preparing for a system design interview or building production systems, understanding these fundamental concepts will help you architect robust, scalable solutions.
1. Load Balancing
Load balancing is the practice of distributing workloads across multiple computing resources to ensure no single server becomes overwhelmed.
How It Works
A load balancer sits between clients and backend servers, routing incoming requests based on various algorithms:
- Round Robin: Distributes requests sequentially across servers
- Least Connections: Routes to the server with fewest active connections
- IP Hash: Uses client IP to determine server assignment (ensures session affinity)
- Weighted Distribution: Assigns more traffic to more powerful servers
Types of Load Balancers
- Layer 4 (Transport Layer): Routes based on IP and TCP/UDP ports
- Layer 7 (Application Layer): Routes based on HTTP headers, URLs, and content
Popular Solutions
- Nginx: Widely used reverse proxy and load balancer
- HAProxy: High-performance TCP/HTTP load balancer
- AWS ALB/NLB: Amazon's Application and Network Load Balancers
- Cloudflare Load Balancing: Global server load balancing (GSLB)
Client β Load Balancer β [Server 1, Server 2, Server 3]
2. Caching Strategies
Caching stores frequently accessed data in fast storage layers to reduce latency and database load.
Cache Levels
Browser Cache: Static assets stored on the client side CDN Cache: Content distributed across edge locations globally Application Cache: In-memory caches like Redis or Memcached Database Cache: Query result caches and buffer pools
Cache Eviction Policies
- LRU (Least Recently Used): Removes the least recently accessed items
- LFU (Least Frequently Used): Removes the least frequently accessed items
- FIFO (First In, First Out): Removes the oldest cached item
- TTL (Time to Live): Expires items after a set duration
Cache-Aside vs Write-Through
Cache-Aside (Lazy Loading):
Application β Check Cache β If miss β Query DB β Store in Cache β Return
Write-Through:
Application β Write to Cache & DB simultaneously
Write-Behind:
Application β Write to Cache β Async flush to DB
Common Pitfalls
- Cache Stampede: Multiple requests for expired key simultaneously hit DB
- Cache Penetration: Requests for non-existent data bypass cache
- Cache Avalanche: Many keys expire at once, overwhelming the database
3. Database Sharding
Sharding is a horizontal partitioning technique that splits large databases across multiple machines.
Sharding Strategies
Range-Based Sharding: Divides data based on value ranges
Shard 1: Users A-F
Shard 2: Users G-M
Shard 3: Users N-Z
Hash-Based Sharding: Applies hash function to determine shard placement
shard = hash(user_id) % number_of_shards
Directory-Based Sharding: Uses a lookup table to map data to shards
When to Shard
- Database size exceeds single server capacity
- Write throughput becomes a bottleneck
- Geographic distribution requires data locality
Challenges
- Cross-shard joins become complex and slow
- Rebalancing data when adding/removing shards
- Distributed transactions require two-phase commit
4. Replication Patterns
Replication maintains multiple copies of data across different nodes for availability and read scalability.
Single Leader Replication
Writes β Leader Node β Replicated to β Follower Nodes
Reads β Can go to Leader or Followers
Pros: Simple, consistent writes Cons: Write bottleneck, leader failure are critical
Multi-Leader Replication
Writes β Any Leader β Synced to β Other Leaders & Followers
Use Case: Multi-datacenter deployments, collaborative editing Challenge: Write conflicts require resolution strategies
Leaderless Replication
Clients write directly to multiple nodes (e.g., DynamoDB, Cassandra) Quorum: Read/write must reach minimum node count for consistency
Replication Lag
- Synchronous: Writes confirmed only after all replicas update
- Asynchronous: Replicas update eventually (risk of stale reads)
5. CAP Theorem
The CAP theorem states that a distributed system can only guarantee two of three properties:
- Consistency: Every read receives the most recent write
- Availability: Every request receives a response (without guarantee it's the latest)
- Partition Tolerance: System continues despite network failures
Real-World Trade-offs
CP Systems (Consistency + Partition Tolerance):
- MongoDB
- Redis
- HBase
AP Systems (Availability + Partition Tolerance):
- Cassandra
- DynamoDB
- CouchDB
CA Systems (Consistency + Availability):
- Traditional RDBMS (single node only)
Consistency
/\
/ \
/ \
/ \
CA / \ CP
/ \
/ \
/______________\
Availability Partition Tolerance
6. Microservices Architecture
Microservices decompose applications into small, independent services communicating over well-defined APIs.
Key Principles
- Single Responsibility: Each service owns one business capability
- Decentralized Data Management: Each service owns its database
- Independent Deployment: Services can be deployed without coordinating others
- Failure Isolation: One service failure doesn't cascade
Communication Patterns
Synchronous:
- REST APIs
- gRPC
- GraphQL
Asynchronous:
- Message queues (RabbitMQ, Kafka)
- Event streaming
- Pub/Sub patterns
Service Mesh
Modern microservices use service meshes (Istio, Linkerd) for:
- Service discovery
- Load balancing
- Retry logic
- mTLS encryption
- Observability
7. Event-Driven Architecture
Events represent state changes in a system. Event-driven architecture uses these events to trigger and communicate between services.
Core Components
- Event Producer: Generates events when state changes
- Event Router/Broker: Routes events to interested consumers
- Event Consumer: Reacts to events
[Order Service] β Event: "OrderCreated" β [Message Broker]
β
ββββββββββββββββΌβββββββββββββββ
β β β
[Email Service] [Inventory] [Analytics]
Event Sourcing
Instead of storing current state, store all events that led to it:
Account Balance = Sum of all transactions
Benefits:
- Complete audit trail
- Temporal queries (state at any point in time)
- Easy replay for debugging
8. Consistent Hashing
Consistent hashing distributes data across nodes while minimizing reorganization when nodes join or leave.
How It Works
- Hash ring maps both data keys and nodes to the same hash space
- Each key is assigned to the next node clockwise on the ring
- When a node leaves, only its keys move to the next node
- When a node joins, it takes keys from its neighbors
Virtual Nodes
To improve distribution, each physical node maps to multiple virtual nodes on the ring, preventing hotspots.
Use Cases:
- Distributed caches (Memcached, Redis Cluster)
- CDN routing
- Distributed databases (Cassandra, DynamoDB)
9. Rate Limiting
Rate limiting controls how many requests a client can make within a time window.
Algorithms
Fixed Window Counter: Simple counting per time window, but allows burst at boundaries
Sliding Window Log: Tracks timestamps of each request for precision
Token Bucket: Tokens refill at constant rate; requests consume tokens
Leaky Bucket: Requests processed at constant rate, excess queued or dropped
Implementation Example
# Token Bucket Implementation
class TokenBucket:
def __init__(self, rate, capacity):
self.rate = rate # tokens per second
self.capacity = capacity
self.tokens = capacity
self.last_refill = time.now()
def allow_request(self):
self._refill()
if self.tokens >= 1:
self.tokens -= 1
return True
return False
def _refill(self):
elapsed = time.now() - self.last_refill
self.tokens = min(
self.capacity,
self.tokens + (elapsed * self.rate)
)
10. Circuit Breaker Pattern
The circuit breaker prevents cascading failures when a service is unhealthy.
States
Closed: Normal operation, requests pass through Open: Failures exceeded threshold, requests fail immediately Half-Open: Allow limited requests to test if service recovered
[Closed] ββfailures exceed thresholdβββ [Open]
β β
β timeout expires
β β
βββββββββββrecovery detectedβββββ [Half-Open]
Benefits
- Fail Fast: Don't wait for timeouts on known-down services
- Self-Healing: Automatically recover when dependency is healthy
- Graceful Degradation: Provide fallback responses
Putting It All Together: A Practical Example
Consider designing a URL shortener like bit.ly:
Client β CDN (cached redirects)
β
Load Balancer
β
βββββββββββΌββββββββββ
β β β
[API] [API] [API] (Application Servers)
β β β
βββββββββββΌββββββββββ
β
[Redis Cache] β Frequently accessed URLs
β
[Sharded Database] β URL mappings
βββ Shard 1: A-M
βββ Shard 2: N-Z
Design Decisions:
- Load Balancer: Distribute traffic across API servers
- CDN + Redis: Cache popular redirects for sub-millisecond response
- Database Sharding: Split by short URL hash for horizontal scaling
- Consistent Hashing: Minimize reshuffling when adding shards
- Rate Limiting: Prevent abuse per IP address
- Circuit Breaker: Handle database failures gracefully
Key Takeaways
- No Silver Bullet: Each pattern solves specific problems with trade-offs
- Start Simple: Don't over-engineer before you need scale
- Measure First: Profile bottlenecks before optimizing
- Design for Failure: Assume components will fail and plan accordingly
- Iterate: Systems evolve; build for maintainability, not perfection
Further Reading
- "Designing Data-Intensive Applications" by Martin Kleppmann
- "System Design Interview" by Alex Xu
- High Scalability Blog: http://highscalability.com/
- The Architecture of Open Source Applications: http://aosabook.org/
What system design concepts do you find most useful? Have you implemented any of these patterns in production? Share your experiences in the comments below!