Popular System Design Concepts Every Software Engineer Should Know

Popular System Design Concepts Every Software Engineer Should Know

System design is one of the most critical skills for software engineers, especially as applications grow from serving a handful of users to millions. Whether you're preparing for a system design interview or building production systems, understanding these fundamental concepts will help you architect robust, scalable solutions.

1. Load Balancing

Load balancing is the practice of distributing workloads across multiple computing resources to ensure no single server becomes overwhelmed.

How It Works

A load balancer sits between clients and backend servers, routing incoming requests based on various algorithms:

  • Round Robin: Distributes requests sequentially across servers
  • Least Connections: Routes to the server with fewest active connections
  • IP Hash: Uses client IP to determine server assignment (ensures session affinity)
  • Weighted Distribution: Assigns more traffic to more powerful servers

Types of Load Balancers

  • Layer 4 (Transport Layer): Routes based on IP and TCP/UDP ports
  • Layer 7 (Application Layer): Routes based on HTTP headers, URLs, and content

Popular Solutions

  • Nginx: Widely used reverse proxy and load balancer
  • HAProxy: High-performance TCP/HTTP load balancer
  • AWS ALB/NLB: Amazon's Application and Network Load Balancers
  • Cloudflare Load Balancing: Global server load balancing (GSLB)
Client β†’ Load Balancer β†’ [Server 1, Server 2, Server 3]

2. Caching Strategies

Caching stores frequently accessed data in fast storage layers to reduce latency and database load.

Cache Levels

Browser Cache: Static assets stored on the client side CDN Cache: Content distributed across edge locations globally Application Cache: In-memory caches like Redis or Memcached Database Cache: Query result caches and buffer pools

Cache Eviction Policies

  • LRU (Least Recently Used): Removes the least recently accessed items
  • LFU (Least Frequently Used): Removes the least frequently accessed items
  • FIFO (First In, First Out): Removes the oldest cached item
  • TTL (Time to Live): Expires items after a set duration

Cache-Aside vs Write-Through

Cache-Aside (Lazy Loading):

Application β†’ Check Cache β†’ If miss β†’ Query DB β†’ Store in Cache β†’ Return

Write-Through:

Application β†’ Write to Cache & DB simultaneously

Write-Behind:

Application β†’ Write to Cache β†’ Async flush to DB

Common Pitfalls

  • Cache Stampede: Multiple requests for expired key simultaneously hit DB
  • Cache Penetration: Requests for non-existent data bypass cache
  • Cache Avalanche: Many keys expire at once, overwhelming the database

3. Database Sharding

Sharding is a horizontal partitioning technique that splits large databases across multiple machines.

Sharding Strategies

Range-Based Sharding: Divides data based on value ranges

Shard 1: Users A-F
Shard 2: Users G-M
Shard 3: Users N-Z

Hash-Based Sharding: Applies hash function to determine shard placement

shard = hash(user_id) % number_of_shards

Directory-Based Sharding: Uses a lookup table to map data to shards

When to Shard

  • Database size exceeds single server capacity
  • Write throughput becomes a bottleneck
  • Geographic distribution requires data locality

Challenges

  • Cross-shard joins become complex and slow
  • Rebalancing data when adding/removing shards
  • Distributed transactions require two-phase commit

4. Replication Patterns

Replication maintains multiple copies of data across different nodes for availability and read scalability.

Single Leader Replication

Writes β†’ Leader Node β†’ Replicated to β†’ Follower Nodes
Reads  β†’ Can go to Leader or Followers

Pros: Simple, consistent writes Cons: Write bottleneck, leader failure are critical

Multi-Leader Replication

Writes β†’ Any Leader β†’ Synced to β†’ Other Leaders & Followers

Use Case: Multi-datacenter deployments, collaborative editing Challenge: Write conflicts require resolution strategies

Leaderless Replication

Clients write directly to multiple nodes (e.g., DynamoDB, Cassandra) Quorum: Read/write must reach minimum node count for consistency

Replication Lag

  • Synchronous: Writes confirmed only after all replicas update
  • Asynchronous: Replicas update eventually (risk of stale reads)

5. CAP Theorem

The CAP theorem states that a distributed system can only guarantee two of three properties:

  • Consistency: Every read receives the most recent write
  • Availability: Every request receives a response (without guarantee it's the latest)
  • Partition Tolerance: System continues despite network failures

Real-World Trade-offs

CP Systems (Consistency + Partition Tolerance):

  • MongoDB
  • Redis
  • HBase

AP Systems (Availability + Partition Tolerance):

  • Cassandra
  • DynamoDB
  • CouchDB

CA Systems (Consistency + Availability):

  • Traditional RDBMS (single node only)
            Consistency
               /\
              /  \
             /    \
            /      \
      CA   /        \   CP
          /          \
         /            \
        /______________\
    Availability    Partition Tolerance

6. Microservices Architecture

Microservices decompose applications into small, independent services communicating over well-defined APIs.

Key Principles

  • Single Responsibility: Each service owns one business capability
  • Decentralized Data Management: Each service owns its database
  • Independent Deployment: Services can be deployed without coordinating others
  • Failure Isolation: One service failure doesn't cascade

Communication Patterns

Synchronous:

  • REST APIs
  • gRPC
  • GraphQL

Asynchronous:

  • Message queues (RabbitMQ, Kafka)
  • Event streaming
  • Pub/Sub patterns

Service Mesh

Modern microservices use service meshes (Istio, Linkerd) for:

  • Service discovery
  • Load balancing
  • Retry logic
  • mTLS encryption
  • Observability

7. Event-Driven Architecture

Events represent state changes in a system. Event-driven architecture uses these events to trigger and communicate between services.

Core Components

  • Event Producer: Generates events when state changes
  • Event Router/Broker: Routes events to interested consumers
  • Event Consumer: Reacts to events
[Order Service] β†’ Event: "OrderCreated" β†’ [Message Broker]
                                               ↓
                                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                ↓              ↓              ↓
                          [Email Service] [Inventory]  [Analytics]

Event Sourcing

Instead of storing current state, store all events that led to it:

Account Balance = Sum of all transactions

Benefits:

  • Complete audit trail
  • Temporal queries (state at any point in time)
  • Easy replay for debugging

8. Consistent Hashing

Consistent hashing distributes data across nodes while minimizing reorganization when nodes join or leave.

How It Works

  1. Hash ring maps both data keys and nodes to the same hash space
  2. Each key is assigned to the next node clockwise on the ring
  3. When a node leaves, only its keys move to the next node
  4. When a node joins, it takes keys from its neighbors

Virtual Nodes

To improve distribution, each physical node maps to multiple virtual nodes on the ring, preventing hotspots.

Use Cases:

  • Distributed caches (Memcached, Redis Cluster)
  • CDN routing
  • Distributed databases (Cassandra, DynamoDB)

9. Rate Limiting

Rate limiting controls how many requests a client can make within a time window.

Algorithms

Fixed Window Counter: Simple counting per time window, but allows burst at boundaries

Sliding Window Log: Tracks timestamps of each request for precision

Token Bucket: Tokens refill at constant rate; requests consume tokens

Leaky Bucket: Requests processed at constant rate, excess queued or dropped

Implementation Example

# Token Bucket Implementation
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate  # tokens per second
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.now()

    def allow_request(self):
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def _refill(self):
        elapsed = time.now() - self.last_refill
        self.tokens = min(
            self.capacity,
            self.tokens + (elapsed * self.rate)
        )

10. Circuit Breaker Pattern

The circuit breaker prevents cascading failures when a service is unhealthy.

States

Closed: Normal operation, requests pass through Open: Failures exceeded threshold, requests fail immediately Half-Open: Allow limited requests to test if service recovered

[Closed] ──failures exceed threshold──→ [Open]
   ↑                                       β”‚
   β”‚                                 timeout expires
   β”‚                                       ↓
   └──────────recovery detected───── [Half-Open]

Benefits

  • Fail Fast: Don't wait for timeouts on known-down services
  • Self-Healing: Automatically recover when dependency is healthy
  • Graceful Degradation: Provide fallback responses

Putting It All Together: A Practical Example

Consider designing a URL shortener like bit.ly:

Client β†’ CDN (cached redirects)
          ↓
     Load Balancer
          ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
↓         ↓         ↓
[API]   [API]     [API] (Application Servers)
↓         ↓         ↓
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          ↓
     [Redis Cache] ← Frequently accessed URLs
          ↓
[Sharded Database] ← URL mappings
     β”œβ”€β”€ Shard 1: A-M
     └── Shard 2: N-Z

Design Decisions:

  • Load Balancer: Distribute traffic across API servers
  • CDN + Redis: Cache popular redirects for sub-millisecond response
  • Database Sharding: Split by short URL hash for horizontal scaling
  • Consistent Hashing: Minimize reshuffling when adding shards
  • Rate Limiting: Prevent abuse per IP address
  • Circuit Breaker: Handle database failures gracefully

Key Takeaways

  1. No Silver Bullet: Each pattern solves specific problems with trade-offs
  2. Start Simple: Don't over-engineer before you need scale
  3. Measure First: Profile bottlenecks before optimizing
  4. Design for Failure: Assume components will fail and plan accordingly
  5. Iterate: Systems evolve; build for maintainability, not perfection

Further Reading


What system design concepts do you find most useful? Have you implemented any of these patterns in production? Share your experiences in the comments below!