Complete System Design Guide

Design YouTube Video Platform

Build a global video-sharing platform handling 500+ hours of uploads per minute

Master video transcoding, adaptive bitrate streaming, CDN architecture, and serving billions of video views daily


What You Will Learn

Problem Statement & Requirements
Scale & Traffic Estimation
High-Level Architecture
Video Upload Pipeline
Video Transcoding & Encoding
Adaptive Bitrate Streaming
Content Delivery Network (CDN)
Database Design
Search & Discovery
View Count at Scale
Comments & Engagement
Recommendations System
Subscriptions & Notifications

1. Understanding the Problem

YouTube is the world's largest video-sharing platform where users can upload, view, share, and comment on videos. The platform serves over 2 billion logged-in users monthly, making it one of the most complex distributed systems in the world.

What Makes Video Platforms Challenging?

Unlike text or image sharing, video platforms deal with massive file sizes, complex encoding requirements, and the need for smooth playback across varying network conditions.

Large File Sizes

A 10-min 4K video can be 3-5 GB raw

Heavy Processing

Transcoding requires significant compute

Global Delivery

Low latency streaming worldwide

Functional Requirements

  • Users can upload videos in various formats (MP4, MOV, AVI, etc.)
  • Stream videos with adaptive quality based on network speed
  • Search videos by title, description, and tags
  • Like, comment, and share videos
  • Subscribe to channels and receive notifications
  • View accurate video view counts

Non-Functional Requirements

  • High Availability: 99.99% uptime for playback
  • Low Latency: Time to First Frame (TTFF) < 200ms
  • Scalability: 2B+ monthly users, 500+ hrs video/min uploaded
  • Durability: Zero video loss (replicated storage)
  • Global Reach: Fast streaming from any location
  • Cost Efficiency: Optimize storage and bandwidth costs

Scope Note: This design focuses on Video-on-Demand (VOD). Live streaming has different requirements (real-time encoding, ultra-low latency) and is typically a separate system.

2. Scale & Traffic Estimation

Understanding YouTube's scale helps us make informed architectural decisions. These numbers are staggering.

Real-World Scale (Approximate)

  • Monthly Active Users: 2+ billion
  • Daily Video Views: 5+ billion
  • Video Uploads: 500+ hours of video every minute
  • Average Video Duration: 11.7 minutes
  • Average Watch Time: 40 minutes per user per day
  • Total Videos: 800+ million videos

Traffic Calculations:

// Video Views Per Second

5B views / 86,400 sec ≈ 58,000 views/sec

// Video Uploads Per Second

500 hrs/min × 60 min/hr ÷ 60 sec/min ≈ 500 minutes of video/sec

// Storage Added Per Day (assume avg 100MB/min after compression)

500 min/sec × 86,400 sec × 100MB ≈ 4+ PB/day

// Bandwidth for Streaming (assume 5 Mbps avg bitrate)

58K views/sec × ~700 sec avg view ≈ 40M concurrent streams

40M × 5 Mbps ≈ 200 Tbps sustained; × peak factor (3x) ≈ 600+ Tbps peak

Storage Requirements

  • Raw Upload: Temporary storage for processing
  • Transcoded Videos: Multiple resolutions (144p to 4K)
  • Thumbnails: Multiple sizes per video
  • Metadata: Title, description, tags, analytics
  • Estimated Total Storage: Exabytes (1000s of PB)

Read vs Write Ratio

  • Reads (Views): ~58,000/sec
  • Writes (Uploads): ~43 videos/sec (500 min of video/sec ÷ ~11.7 min avg)
  • Ratio: ~1,300:1 (extremely read-heavy)
  • Implication: Optimize for read performance
  • Strategy: Heavy caching, CDN for delivery

Key Insight: The system is extremely read-heavy but writes are compute-intensive (transcoding). We need to optimize reads through CDN and caching, while handling writes through asynchronous processing pipelines.
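These estimates are easy to sanity-check in code. Every constant below is one of the assumed averages from this section, not a measured figure:

```python
# Back-of-envelope scale estimates; all constants are the
# assumptions stated in this section, not measured values.
DAILY_VIEWS = 5e9                # 5B+ views/day
SECONDS_PER_DAY = 86_400
UPLOAD_HOURS_PER_MIN = 500       # 500+ hours of video uploaded/min
MB_PER_MIN = 100                 # assumed size per video-minute after compression
AVG_BITRATE_MBPS = 5
AVG_VIEW_SEC = 700               # ~11.7-minute average video

views_per_sec = DAILY_VIEWS / SECONDS_PER_DAY                 # ≈ 58K
video_min_per_sec = UPLOAD_HOURS_PER_MIN * 60 / 60            # ≈ 500
storage_pb_per_day = video_min_per_sec * SECONDS_PER_DAY * MB_PER_MIN / 1e9
concurrent_streams = views_per_sec * AVG_VIEW_SEC             # ≈ 40M
sustained_tbps = concurrent_streams * AVG_BITRATE_MBPS / 1e6  # ≈ 200

print(f"{views_per_sec:,.0f} views/sec, {storage_pb_per_day:.1f} PB/day, "
      f"{sustained_tbps:.0f} Tbps sustained")
```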

3. High-Level Architecture

The YouTube architecture can be divided into two main flows: Upload Path (write) and View Path (read).

System Architecture Overview


┌─────────────────────────────────────────────────────────────────────────────┐
│                            CLIENT LAYER                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │  Mobile App  │  │   Web App    │  │   Smart TV   │  │  Gaming Console│    │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────────────────────┘
                        │                              │
            ┌───────────┘                              └───────────┐
            │ UPLOAD PATH                              VIEW PATH   │
            ▼                                                      ▼
┌───────────────────────┐                          ┌───────────────────────────┐
│    Upload Service     │                          │         CDN               │
│  (Chunked Uploads)    │                          │   (Edge Servers)          │
└───────────┬───────────┘                          └─────────────┬─────────────┘
            │                                                    │ Cache Miss
            ▼                                                    ▼
┌───────────────────────┐                          ┌───────────────────────────┐
│   Original Storage    │                          │      Origin Servers       │
│   (Blob Storage)      │                          │    (Video Streaming)      │
└───────────┬───────────┘                          └───────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        TRANSCODING PIPELINE                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐             │
│  │   Split    │─▶│  Encode    │─▶│    Merge   │─▶│  Package   │             │
│  │  (Chunks)  │  │ (Parallel) │  │  (Stitch)  │  │ (HLS/DASH) │             │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘             │
└─────────────────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           DATA LAYER                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Metadata   │  │    Video     │  │    Redis     │  │    Kafka     │     │
│  │   (MySQL)    │  │   (GCS/S3)   │  │   (Cache)    │  │  (Events)    │     │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────────────────────┘

Upload Path (Write)

  1. User uploads video via chunked upload
  2. Video stored in temporary blob storage
  3. Upload Service publishes "video uploaded" event
  4. Transcoding Pipeline picks up the job
  5. Video encoded into multiple formats/resolutions
  6. Encoded videos stored in permanent storage
  7. Metadata updated, video marked as "ready"
  8. User notified of successful upload

View Path (Read)

  1. User requests video playback
  2. Request hits nearest CDN edge server
  3. If cached: serve directly (fast!)
  4. If not cached: fetch from origin
  5. Origin serves video manifest (HLS/DASH)
  6. Player requests video segments
  7. Adaptive bitrate adjusts quality
  8. View count incremented asynchronously

Why Separate Paths? The upload path is compute-heavy (transcoding) while the view path is bandwidth-heavy (streaming). Separating them allows independent scaling and optimization.

4. Video Upload Pipeline

Users upload videos of varying sizes (from a few MB to several GB). The upload system must handle unreliable networks, support resume, and provide progress feedback.

Resumable Chunked Upload

Large videos are split into chunks (typically 5-10 MB each). If upload fails, only the failed chunk needs to be re-uploaded.

// Chunked Upload Flow
Client                          Upload Service                    Storage
  │                                   │                              │
  │──── 1. Initialize Upload ────────▶│                              │
  │      (filename, size, mime)       │                              │
  │◀─── 2. Return upload_id ──────────│                              │
  │                                   │                              │
  │──── 3. Upload Chunk 1 ───────────▶│────── Store Chunk 1 ────────▶│
  │◀─── 4. Chunk 1 ACK ───────────────│                              │
  │                                   │                              │
  │──── 5. Upload Chunk 2 ───────────▶│────── Store Chunk 2 ────────▶│
  │◀─── 6. Chunk 2 ACK ───────────────│                              │
  │        ... (repeat)               │                              │
  │                                   │                              │
  │──── 7. Complete Upload ──────────▶│                              │
  │      (all chunk IDs)              │────── Combine Chunks ───────▶│
  │                                   │────── Trigger Processing ───▶│
  │◀─── 8. Upload Complete ───────────│                              │

Upload API Design

// 1. Initialize Upload
POST /api/v1/videos/upload/init
{
  "filename": "my_video.mp4",
  "fileSize": 1073741824,  // 1 GB
  "mimeType": "video/mp4",
  "title": "My Awesome Video",
  "description": "..."
}
Response: { "uploadId": "abc123", "chunkSize": 10485760 }

// 2. Upload Chunk
PUT /api/v1/videos/upload/{uploadId}/chunks/{chunkIndex}
Content-Type: application/octet-stream
Body: <binary chunk data>
Response: { "chunkId": "chunk_001", "received": 10485760 }

// 3. Complete Upload
POST /api/v1/videos/upload/{uploadId}/complete
{
  "chunks": ["chunk_001", "chunk_002", ...]
}
Response: { "videoId": "vid_xyz", "status": "processing" }
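The chunking math behind this API can be sketched in a few lines. This is a minimal illustration: `chunk_ranges` and `pending_chunks` are hypothetical helper names, and the 10 MB chunk size matches the `chunkSize` the init call returns above.

```python
def chunk_ranges(file_size, chunk_size):
    """Return (chunk_index, start_byte, end_byte_exclusive) tuples for an upload."""
    ranges = []
    start, index = 0, 0
    while start < file_size:
        end = min(start + chunk_size, file_size)
        ranges.append((index, start, end))
        index += 1
        start = end
    return ranges

def pending_chunks(all_chunks, acked_indexes):
    """On resume, re-upload only the chunks the server has not acknowledged."""
    return [c for c in all_chunks if c[0] not in acked_indexes]

# A 25 MB file with the 10 MB chunkSize from the init response needs 3 chunks;
# the last chunk is 5 MB.
chunks = chunk_ranges(25 * 1024 * 1024, 10 * 1024 * 1024)
```

If the client crashes after chunks 0 and 2 were acknowledged, `pending_chunks(chunks, {0, 2})` tells it to resend only chunk 1.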

Pre-signed URLs for Direct Upload

For better scalability, clients can upload directly to blob storage (S3/GCS) using pre-signed URLs, bypassing application servers.

// Get Pre-signed URL
POST /api/v1/videos/upload/presigned
{
  "filename": "chunk_001.bin",
  "contentType": "application/octet-stream"
}
Response: {
  "uploadUrl": "https://storage.example.com/...",
  "expiresAt": "2024-01-15T12:00:00Z"
}

// Client uploads directly to storage URL
PUT {uploadUrl}
Content-Type: application/octet-stream
Body: <chunk data>

Upload Validation & Safety

Format Validation

Check file headers (magic bytes) to verify actual video format, not just extension

Size Limits

Max file size (e.g., 256 GB), max duration (e.g., 12 hours), per-user quotas

Content Moderation

ML-based scanning for copyright, violence, adult content before publishing
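Magic-byte checks like the one described above can be sketched as follows. This is illustrative only: a production pipeline would run a full container probe (e.g. ffprobe) rather than hand-rolled header checks.

```python
def looks_like_mp4(header: bytes) -> bool:
    # ISO BMFF containers (MP4/MOV) carry an 'ftyp' box at byte offset 4;
    # the first 4 bytes are the box size, so don't match at offset 0.
    return len(header) >= 8 and header[4:8] == b"ftyp"

def looks_like_avi(header: bytes) -> bool:
    # AVI is a RIFF container: 'RIFF' at offset 0, 'AVI ' at offset 8.
    return len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"AVI "

def detect_container(header: bytes):
    """Identify the container from magic bytes, ignoring the file extension."""
    if looks_like_mp4(header):
        return "mp4"
    if looks_like_avi(header):
        return "avi"
    return None  # unknown/unsupported: reject the upload
```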

5. Video Transcoding Pipeline

Transcoding converts the uploaded video into multiple formats and resolutions. This is the most compute-intensive part of the system.

Why Transcode?

Multiple Resolutions

144p, 240p, 360p, 480p, 720p, 1080p, 1440p, 4K

Codec Compatibility

H.264, H.265/HEVC, VP9, AV1 for different devices

Compression

Reduce file size while maintaining quality

Segmentation

Split into chunks for adaptive streaming

Transcoding Pipeline (DAG)

The pipeline uses a Directed Acyclic Graph (DAG) to parallelize independent tasks while respecting dependencies.


                        ┌─────────────────────────────────────────────────────┐
                        │                 ORIGINAL VIDEO                       │
                        └─────────────────────────┬───────────────────────────┘
                                                  │
                                    ┌─────────────┴─────────────┐
                                    ▼                           ▼
                           ┌────────────────┐          ┌────────────────┐
                           │  Video Split   │          │  Audio Extract │
                           │  (into chunks) │          │    (AAC/Opus)  │
                           └───────┬────────┘          └───────┬────────┘
                                   │                           │
         ┌─────────────────────────┼─────────────────────────┐ │
         ▼                         ▼                         ▼ │
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Encode 1080p   │     │  Encode 720p    │     │  Encode 480p    │  ...
│  (H.264/VP9)    │     │  (H.264/VP9)    │     │  (H.264/VP9)    │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                                 ▼
                        ┌────────────────┐
                        │    Merge &     │◀──── Audio
                        │   Package      │
                        │  (HLS/DASH)    │
                        └───────┬────────┘
                                │
              ┌─────────────────┼─────────────────┐
              ▼                 ▼                 ▼
     ┌────────────────┐ ┌────────────────┐ ┌────────────────┐
     │  Thumbnail     │ │  Manifest      │ │  Content       │
     │  Generation    │ │  Creation      │ │  Storage       │
     └────────────────┘ └────────────────┘ └────────────────┘

Encoding Presets

Resolution  Bitrate(H.264)  Bitrate(VP9)
─────────────────────────────────────────
4K (2160p)  35-45 Mbps      18-25 Mbps
1440p       16 Mbps         9 Mbps
1080p       8 Mbps          5 Mbps
720p        5 Mbps          3 Mbps
480p        2.5 Mbps        1.5 Mbps
360p        1 Mbps          0.6 Mbps
240p        0.5 Mbps        0.3 Mbps
144p        0.2 Mbps        0.1 Mbps

Note: VP9/AV1 achieve similar quality
at roughly 30-50% lower bitrate than H.264

Parallelization Strategy

  1. Split video into 10-second chunks
  2. Distribute chunks to worker pool
  3. Encode in parallel across resolutions AND chunks
  4. Stitch chunks back together per resolution
  5. Generate manifest linking all versions

Result: A 10-minute video can be transcoded in ~2 minutes with sufficient parallelism instead of 30+ minutes sequentially.
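The chunk-level parallelism can be sketched with a worker pool. The `encode_chunk` stub below stands in for a real encoder invocation (in practice a shell-out to ffmpeg writing to blob storage); names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

RESOLUTIONS = ["1080p", "720p", "480p"]

def encode_chunk(chunk_id: int, resolution: str) -> str:
    # Stand-in for a real encoder call; a worker would transcode the
    # chunk here and return the storage path of the encoded segment.
    return f"chunk_{chunk_id:03d}_{resolution}.ts"

def transcode(num_chunks: int, workers: int = 8):
    # One task per (chunk, resolution) pair: parallel across BOTH axes
    tasks = [(c, r) for c in range(num_chunks) for r in RESOLUTIONS]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda t: encode_chunk(*t), tasks))
    # Stitch step: regroup outputs per resolution, preserving chunk order
    return {r: [o for o in outputs if o.endswith(f"{r}.ts")] for r in RESOLUTIONS}
```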

Worker Queue Architecture

// Message Queue for Transcoding Jobs (Kafka/SQS)

TranscodeJob {
    job_id:        UUID
    video_id:      UUID
    source_path:   "gs://uploads/raw/abc123.mp4"
    output_path:   "gs://videos/processed/abc123/"
    tasks: [
        { type: "split", status: "completed" },
        { type: "encode_1080p", chunk: 1, status: "processing" },
        { type: "encode_1080p", chunk: 2, status: "queued" },
        { type: "encode_720p", chunk: 1, status: "queued" },
        // ... more tasks
    ]
    created_at:    timestamp
    priority:      int  // Premium users get higher priority
}

// Workers pick jobs from queue based on priority
// Each worker handles one encoding task at a time
// Results written back to storage, status updated in DB

6. Adaptive Bitrate Streaming (ABR)

ABR automatically adjusts video quality based on network conditions, ensuring smooth playback without buffering. This is what makes YouTube work well on slow connections.

How ABR Works


┌─────────────────────────────────────────────────────────────────────────────┐
│                        ADAPTIVE BITRATE STREAMING                            │
└─────────────────────────────────────────────────────────────────────────────┘

1. Video is split into small segments (2-10 seconds each)
2. Each segment is encoded at multiple quality levels
3. Client downloads a manifest file listing all available qualities
4. Player monitors bandwidth and buffer level
5. Player requests appropriate quality for each segment

         Network Fast                    Network Slow
              │                               │
              ▼                               ▼
    ┌─────────────────┐            ┌─────────────────┐
    │    1080p        │            │    480p         │
    │   Segment 1     │            │   Segment 5     │
    └─────────────────┘            └─────────────────┘
              │                               │
              ▼                               ▼
    ┌─────────────────┐            ┌─────────────────┐
    │    1080p        │            │    360p         │
    │   Segment 2     │            │   Segment 6     │
    └─────────────────┘            └─────────────────┘
              │           Network          │
              ▼           Recovers         ▼
    ┌─────────────────┐    ────▶   ┌─────────────────┐
    │    720p         │            │    720p         │
    │   Segment 3     │            │   Segment 7     │
    └─────────────────┘            └─────────────────┘

Result: Smooth playback with quality matching available bandwidth

HLS (HTTP Live Streaming)

Apple's protocol, widely supported on iOS, Safari, and most devices.

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8

// Each resolution playlist:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
segment_001.ts
#EXTINF:10.0,
segment_002.ts
...

DASH (Dynamic Adaptive Streaming over HTTP)

Open standard, preferred for web browsers and Android.

<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     type="static" mediaPresentationDuration="PT10M">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="360p" bandwidth="800000"
                      width="640" height="360">
        <SegmentTemplate media="360p_$Number$.m4s"
                         initialization="360p_init.mp4"/>
      </Representation>
      <Representation id="720p" bandwidth="2800000"
                      width="1280" height="720">
        <SegmentTemplate media="720p_$Number$.m4s"
                         initialization="720p_init.mp4"/>
      </Representation>
      <!-- More representations -->
    </AdaptationSet>
  </Period>
</MPD>

ABR Algorithm Factors

Bandwidth Estimation

Measure download speed of recent segments

Buffer Level

How many seconds of video are buffered ahead

Device Capability

Screen resolution, decoder support

Stability

Avoid frequent quality switches

Time to First Frame (TTFF): The time from clicking play to seeing the first frame. YouTube optimizes this by starting with a lower quality and quickly switching up. Target: <200ms.
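A minimal selection heuristic combining the first two factors (bandwidth estimate and buffer level), using the H.264 ladder from the encoding presets above. The `safety` and `min_buffer` values are illustrative assumptions, not a real player's tuning:

```python
# H.264 bitrate ladder (kbps) from the encoding presets table
LADDER_KBPS = {144: 200, 240: 500, 360: 1000, 480: 2500, 720: 5000, 1080: 8000}

def pick_quality(throughput_kbps: float, buffer_sec: float,
                 safety: float = 0.8, min_buffer: float = 10.0) -> int:
    """Throughput-based ABR with a buffer guard (simplified heuristic)."""
    budget = throughput_kbps * safety     # leave headroom for variance
    if buffer_sec < min_buffer:
        budget *= 0.5                     # low buffer: be conservative to avoid a stall
    eligible = [res for res, kbps in LADDER_KBPS.items() if kbps <= budget]
    return max(eligible) if eligible else min(LADDER_KBPS)
```

For example, a 12 Mbps connection with a healthy buffer gets 1080p, but the same connection with under 10 s buffered drops to 480p until the buffer recovers.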

7. Content Delivery Network (CDN)

A CDN is essential for serving video content globally with low latency. YouTube uses a massive distributed network of edge servers.

CDN Architecture


┌──────────────────────────────────────────────────────────────────────────────┐
│                              GLOBAL CDN                                       │
└──────────────────────────────────────────────────────────────────────────────┘

          User in Tokyo           User in London          User in NYC
               │                       │                       │
               ▼                       ▼                       ▼
        ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
        │ Tokyo Edge  │         │ London Edge │         │  NYC Edge   │
        │   Server    │         │   Server    │         │   Server    │
        └──────┬──────┘         └──────┬──────┘         └──────┬──────┘
               │ Cache                 │ Cache                 │ Cache
               │ Miss?                 │ Miss?                 │ Miss?
               ▼                       ▼                       ▼
        ┌─────────────┐         ┌─────────────┐         ┌─────────────┐
        │ Asia-Pacific│         │   Europe    │         │  US-East    │
        │   Regional  │         │  Regional   │         │  Regional   │
        │    Cache    │         │   Cache     │         │   Cache     │
        └──────┬──────┘         └──────┬──────┘         └──────┬──────┘
               │                       │                       │
               └───────────────────────┼───────────────────────┘
                                       │
                                       ▼
                              ┌─────────────────┐
                              │  Origin Server  │
                              │  (Video Storage)│
                              └─────────────────┘

Cache Hit: Edge serves directly (~10-50ms latency)
Cache Miss: Fetch from regional → origin, then cache for future requests

CDN Caching Strategy

  • Popular videos: Cached at edge (95%+ hit rate)
  • Moderately popular: Regional cache
  • Long-tail videos: Served from origin
  • Cache key: video_id + resolution + segment_number
  • TTL: 1 year (videos rarely change)
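The edge lookup this implies (hit → serve, miss → fetch from upstream and cache) can be sketched as a tiny LRU cache keyed exactly as in the strategy above. Real edge caches are far more sophisticated (disk tiers, admission policies), so treat this as an illustration only:

```python
from collections import OrderedDict

class EdgeCache:
    """Tiny LRU sketch of an edge server's segment cache."""

    def __init__(self, capacity: int, origin_fetch):
        self.capacity = capacity
        self.origin_fetch = origin_fetch   # called on cache miss
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def get(self, video_id: str, resolution: str, segment: int) -> bytes:
        key = f"{video_id}:{resolution}:{segment}"   # video_id + resolution + segment
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)              # mark as recently used
            return self.store[key]
        self.misses += 1
        data = self.origin_fetch(key)                # regional cache or origin
        self.store[key] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)           # evict least-recently-used
        return data
```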

Google's CDN Innovation

  • Google Global Cache: Servers inside ISPs
  • Benefit: Reduces ISP bandwidth costs
  • Coverage: 1000s of ISPs worldwide
  • Result: Sub-10ms latency to most users
  • Cost: ISPs host hardware for free content

Video URL Structure

// Video Request Flow
1. Client requests: GET /watch?v=dQw4w9WgXcQ

2. Server returns video page with manifest URL:
   https://manifest.googlevideo.com/api/manifest/hls_variant/
   ?video_id=dQw4w9WgXcQ&signature=abc123...

3. Client fetches manifest, chooses quality, requests segments:
   https://rr3---sn-abc123.googlevideo.com/videoplayback
   ?id=dQw4w9WgXcQ
    &itag=22        // Quality identifier (720p H.264/AAC MP4)
   &range=0-1000   // Byte range for segment
   &signature=...  // Signed URL for security
   &expire=...     // URL expiration time

Key: "sn-abc123" identifies the specific edge server
     Signature prevents hotlinking and unauthorized access
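The signing scheme can be illustrated with a plain HMAC over the query parameters. This is a sketch only: YouTube's actual signing scheme is not public, and `SECRET` is a made-up shared key between the origin and the edge servers.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"edge-signing-key"   # hypothetical key shared by origin and edges

def sign_playback_url(video_id, itag, ttl_sec=21600, now=None):
    """Attach an expiry and an HMAC signature to a playback URL."""
    now = int(time.time()) if now is None else now
    params = {"id": video_id, "itag": itag, "expire": now + ttl_sec}
    payload = urlencode(sorted(params.items())).encode()
    params["signature"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return "/videoplayback?" + urlencode(params)

def verify(params, now):
    """Edge-side check: signature must match and the URL must not be expired."""
    params = dict(params)                  # don't mutate the caller's dict
    sig = params.pop("signature", "")
    payload = urlencode(sorted(params.items())).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and now < int(params["expire"])
```

Because the signature covers the video id, quality, and expiry, a leaked URL cannot be rewritten to fetch a different video and stops working after `expire`, which is what defeats hotlinking.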

8. Database Design

YouTube's data model includes video metadata, user information, engagement data, and relationships. Different data types require different storage solutions.

Core Database Schema

-- Videos Table (sharded by video_id)
CREATE TABLE videos (
    video_id        VARCHAR(11) PRIMARY KEY,  -- YouTube uses 11-char IDs
    channel_id      VARCHAR(24) NOT NULL,
    title           VARCHAR(100) NOT NULL,
    description     TEXT,
    duration_sec    INT NOT NULL,
    upload_date     TIMESTAMP DEFAULT NOW(),
    status          ENUM('processing', 'ready', 'blocked', 'deleted'),
    visibility      ENUM('public', 'unlisted', 'private'),
    category_id     INT,
    default_language VARCHAR(5),
    INDEX idx_channel (channel_id),
    INDEX idx_upload_date (upload_date)
);

-- Video Stats (separate table for frequent updates)
CREATE TABLE video_stats (
    video_id        VARCHAR(11) PRIMARY KEY,
    view_count      BIGINT DEFAULT 0,
    like_count      BIGINT DEFAULT 0,
    dislike_count   BIGINT DEFAULT 0,
    comment_count   BIGINT DEFAULT 0,
    last_updated    TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (video_id) REFERENCES videos(video_id)
);

-- Channels Table
CREATE TABLE channels (
    channel_id      VARCHAR(24) PRIMARY KEY,
    user_id         BIGINT NOT NULL,
    channel_name    VARCHAR(100) NOT NULL,
    description     TEXT,
    subscriber_count BIGINT DEFAULT 0,
    video_count     INT DEFAULT 0,
    created_at      TIMESTAMP DEFAULT NOW(),
    profile_pic_url VARCHAR(255),
    banner_url      VARCHAR(255),
    INDEX idx_user (user_id)
);

-- Subscriptions Table (sharded by subscriber_id)
CREATE TABLE subscriptions (
    subscriber_id   BIGINT,
    channel_id      VARCHAR(24),
    subscribed_at   TIMESTAMP DEFAULT NOW(),
    notifications   BOOLEAN DEFAULT true,
    PRIMARY KEY (subscriber_id, channel_id),
    INDEX idx_channel_subs (channel_id)
);

-- Comments Table (sharded by video_id)
CREATE TABLE comments (
    comment_id      BIGINT PRIMARY KEY AUTO_INCREMENT,
    video_id        VARCHAR(11) NOT NULL,
    user_id         BIGINT NOT NULL,
    parent_id       BIGINT,  -- NULL for top-level comments
    content         TEXT NOT NULL,
    like_count      INT DEFAULT 0,
    created_at      TIMESTAMP DEFAULT NOW(),
    INDEX idx_video_comments (video_id, created_at),
    INDEX idx_parent (parent_id)
);

Storage Technology Choices

  • MySQL/Vitess: Video metadata, user data, channels
  • Google Cloud Storage/S3: Video files, thumbnails
  • BigTable/Cassandra: View counts, analytics (high write throughput)
  • Redis: Session data, hot video metadata cache
  • Elasticsearch: Video search, autocomplete

Sharding Strategy

  • Videos: Shard by video_id (hash-based)
  • Comments: Shard by video_id (locality)
  • Subscriptions: Shard by subscriber_id
  • User data: Shard by user_id
  • Watch history: Shard by user_id + time

Why Separate video_stats? View counts update millions of times per second. Keeping them in a separate table prevents write contention on the main videos table and allows using specialized high-write-throughput storage.

9. View Count at Scale

Tracking view counts seems simple, but at YouTube's scale (58,000+ views/sec), it's a significant engineering challenge. We need accuracy without impacting performance.

Why Not Just INCREMENT?

Database Hotspot

A viral video would hammer one DB row, causing contention

Bot Detection

Need to filter fake views before counting

Read Pressure

Displaying count on every view request

Solution: Event Streaming + Batch Processing


┌─────────────────────────────────────────────────────────────────────────────┐
│                        VIEW COUNT PIPELINE                                   │
└─────────────────────────────────────────────────────────────────────────────┘

User Views Video
      │
      ▼
┌─────────────┐     ┌─────────────────────────────────────────────────────┐
│ View Event  │────▶│                    KAFKA                             │
│  Producer   │     │  Topic: video-views (partitioned by video_id)       │
└─────────────┘     └──────────────────────────┬──────────────────────────┘
                                               │
              ┌────────────────────────────────┼────────────────────────┐
              ▼                                ▼                        ▼
     ┌─────────────────┐            ┌─────────────────┐      ┌─────────────────┐
     │  Bot Detection  │            │  Real-time      │      │  Batch Job      │
     │     Filter      │            │   Counter       │      │  (Hourly)       │
     │ (Spam, Repeats) │            │  (Redis INCR)   │      │  (Exact Count)  │
     └────────┬────────┘            └────────┬────────┘      └────────┬────────┘
              │                              │                        │
              │ Valid Views                  │ Approximate            │ Accurate
              ▼                              ▼                        ▼
     ┌─────────────────┐            ┌─────────────────┐      ┌─────────────────┐
     │  Analytics DB   │            │  Display Count  │      │   Persistent    │
     │  (BigQuery)     │            │  (Fast, ~5min)  │      │      DB         │
     └─────────────────┘            └─────────────────┘      └─────────────────┘

Two-tier counting:
1. Real-time (Redis): Fast, approximate, for display
2. Batch (BigQuery): Accurate, for monetization & analytics

Bot Detection Heuristics

  • Same IP viewing the same video repeatedly
  • Watch duration too short (<30 seconds)
  • Unusual viewing patterns (no seeks, no pauses)
  • Known bot user agents
  • Geographic anomalies (views from unusual regions)
  • Sudden spikes in views from logged-out users

Redis Counter Pattern

// Increment view count (atomic)
INCR video:dQw4w9WgXcQ:views

// Get current count
GET video:dQw4w9WgXcQ:views

// Periodic sync to persistent DB
function syncViewCounts() {
  // SCAN rather than KEYS: KEYS blocks Redis on large keyspaces
  for each key in SCAN MATCH "video:*:views":
    count = GETSET {key} 0        // atomically read and reset
    if count > 0:
      UPDATE video_stats
      SET view_count = view_count + count
      WHERE video_id = {video_id parsed from key}
}

Interesting Fact: YouTube intentionally delays view count updates for new videos to allow time for bot detection. That's why videos famously used to freeze at "301 views" for hours (a quirk YouTube retired in 2015).

10. Search & Discovery

With 800+ million videos, helping users find relevant content is crucial. Search and recommendation are key to user engagement.

Video Search Architecture


User Query: "how to make pasta"
         │
         ▼
┌────────────────────┐
│  Query Processing  │
│  - Spell check     │
│  - Tokenization    │
│ - Synonym expansion│
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐     ┌───────────────────────────────────────────┐
│   Elasticsearch    │────▶│ Inverted Index                            │
│   (Video Index)    │     │ "pasta" → [vid1, vid2, vid3, vid47, ...]  │
│                    │     │ "make"  → [vid2, vid5, vid47, vid99, ...] │
└─────────┬──────────┘     │ "how"   → [vid1, vid2, vid47, vid55, ...] │
          │                └───────────────────────────────────────────┘
          ▼
┌────────────────────┐
│    Ranking Layer   │
│  - Relevance score │
│  - Video quality   │
│  - Engagement rate │
│  - Freshness       │
│  - Personalization │
└─────────┬──────────┘
          │
          ▼
    Top 20 Results

Search Ranking Signals

  • Text Relevance: Title, description, tags, captions
  • Engagement: CTR, watch time, likes, comments
  • Quality: Video resolution, audio quality
  • Freshness: Recent uploads ranked higher for trending topics
  • Channel Authority: Subscriber count, upload frequency
  • Personalization: User's watch history, preferences

Indexing Pipeline

// Video becomes searchable
1. Video uploaded & transcoded
2. Extract searchable text:
   - Title & description
   - Auto-generated captions (ML)
   - OCR from video frames
   - Audio transcription
3. Generate embeddings (ML)
4. Index in Elasticsearch
5. Update in real-time as
   engagement metrics change

Recommendation System (Simplified)

YouTube's recommendation engine uses multiple signals to suggest videos:

Candidate Generation

ML models generate millions of candidate videos from user history, subscriptions, similar users

Ranking

Deep neural network scores each candidate based on predicted watch time and engagement

Filtering

Remove already watched, age-restricted, or policy-violating content

11. Comments & Engagement

The comments system handles millions of comments per day with nested replies, likes, spam filtering, and real-time updates.

Comment System Architecture

// Comment Data Model
Comment {
    comment_id:     BIGINT
    video_id:       VARCHAR(11)
    user_id:        BIGINT
    parent_id:      BIGINT (NULL for top-level)
    content:        TEXT
    like_count:     INT
    reply_count:    INT (for top-level only)
    created_at:     TIMESTAMP
    is_pinned:      BOOLEAN
    is_hearted:     BOOLEAN (creator liked)
}

// Fetching Comments (Top + Newest)
SELECT * FROM comments
WHERE video_id = 'dQw4w9WgXcQ'
  AND parent_id IS NULL
ORDER BY
  is_pinned DESC,
  like_count DESC,
  created_at DESC
LIMIT 20;

// Fetching Replies
SELECT * FROM comments
WHERE parent_id = 12345
ORDER BY created_at ASC
LIMIT 10;

Spam & Moderation

  • ML model scores comment spam probability
  • High-risk comments held for review
  • Creator can set word filters
  • Report system for community moderation
  • Slow mode during high-traffic events

Likes System

// Like a video (idempotent)
// MySQL syntax; in PostgreSQL use ON CONFLICT DO NOTHING
INSERT IGNORE INTO video_likes
  (video_id, user_id, created_at)
VALUES ('dQw4w9WgXcQ', 123, NOW());

// Update count asynchronously
// (not in critical path)

12. Subscriptions & Notifications

When a channel with millions of subscribers uploads a video, we need to notify all subscribers efficiently without overwhelming the system.

Notification Fan-Out Strategy


Channel uploads new video
         │
         ▼
┌────────────────────┐
│ Video Published    │
│     Event          │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Check Subscriber   │
│     Count          │
└─────────┬──────────┘
          │
    ┌─────┴─────┐
    │           │
Small (<100K)   Large (>100K)
    │           │
    ▼           ▼
┌────────────┐  ┌────────────────────────┐
│ Fan-out on │  │   Fan-out on Read      │
│   Write    │  │   (Pull Model)         │
│            │  │                        │
│ Create     │  │ Store: "Channel X      │
│ notification│  │ uploaded at time T"   │
│ for each   │  │                        │
│ subscriber │  │ On app open:           │
│            │  │ "Get uploads from my   │
│            │  │  subscriptions since   │
│            │  │  last check"           │
└────────────┘  └────────────────────────┘

Hybrid approach balances write amplification vs read latency

Push Notification Flow

  1. Video published event triggers notification job
  2. Query subscribers with notifications enabled
  3. Batch subscribers (1000 per batch)
  4. Send to FCM/APNs in parallel
  5. Track delivery status for analytics
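Steps 3-4 above amount to chunking the subscriber list and handing each batch to the push service. A sketch, with `send_batch` standing in for the FCM/APNs call:

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of at most `size` items (itertools.batched exists in 3.12+)."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def fan_out(subscriber_ids, send_batch, batch_size=1000):
    """Send push notifications in batches of batch_size subscribers."""
    sent = 0
    for batch in batched(subscriber_ids, batch_size):
        send_batch(batch)   # in production: dispatched in parallel, with retry/backoff
        sent += len(batch)
    return sent
```

Batching keeps the number of calls to the push gateway proportional to subscribers/1000 rather than one call per subscriber, and each batch can fail and retry independently.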

Subscription Feed Generation

// Get subscription feed
SELECT v.* FROM videos v
JOIN subscriptions s
  ON v.channel_id = s.channel_id
WHERE s.subscriber_id = :user_id
  AND v.upload_date > NOW() - INTERVAL 7 DAY
  AND v.visibility = 'public'
ORDER BY v.upload_date DESC
LIMIT 50;

// Cached per user, invalidated on new upload

Key Design Principles

  • Separate Upload & View Paths: Upload is compute-heavy (transcoding), viewing is bandwidth-heavy (streaming). Scale them independently.
  • Chunked & Resumable Uploads: Large files need to handle network failures gracefully. Use pre-signed URLs for direct-to-storage uploads.
  • Parallel Transcoding Pipeline: Split video into chunks, encode in parallel across resolutions, use a DAG for task dependencies.
  • Adaptive Bitrate Streaming: HLS/DASH with multiple quality levels allows smooth playback across varying network conditions.
  • CDN for Global Delivery: Cache popular videos at edge servers. Target 95%+ cache hit rate for low latency worldwide.
  • Event-Driven View Counting: Use Kafka for event streaming, Redis for real-time counts, batch processing for accurate analytics.
  • Hybrid Notification Fan-out: Fan-out on write for small channels, fan-out on read for large channels to balance latency and write amplification.
