Build a global video-sharing platform handling 500+ hours of uploads per minute
Master video transcoding, adaptive bitrate streaming, CDN architecture, and serving billions of video views daily
YouTube is the world's largest video-sharing platform where users can upload, view, share, and comment on videos. The platform serves over 2 billion logged-in users monthly, making it one of the most complex distributed systems in the world.
Unlike text or image sharing, video platforms deal with massive file sizes, complex encoding requirements, and the need for smooth playback across varying network conditions.
Large File Sizes
A 10-min 4K video can be 3-5 GB even after camera compression
Heavy Processing
Transcoding requires significant compute
Global Delivery
Low latency streaming worldwide
Scope Note: This design focuses on Video-on-Demand (VOD). Live streaming has different requirements (real-time encoding, ultra-low latency) and is typically a separate system.
Understanding YouTube's scale helps us make informed architectural decisions. These numbers are staggering.
5B views / 86,400 sec ≈ 58,000 views/sec
500 hours/min × 60 min/hr ÷ 60 sec/min ≈ 500 minutes of video ingested/sec
500 min/sec × 86,400 sec × 100MB ≈ 4+ PB/day
58K views/sec × 5 Mbps × peak factor (3×) ≈ 870+ Gbps egress at peak
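These estimates are easy to sanity-check in a few lines; the 100 MB/min and 5 Mbps figures are the same working assumptions used above:

```python
SECONDS_PER_DAY = 86_400

# Views: 5B/day spread over a day
views_per_day = 5_000_000_000
views_per_sec = views_per_day / SECONDS_PER_DAY            # ~57,870

# Uploads: 500 hours of video every minute
upload_min_per_sec = 500 * 60 / 60                          # 500 minutes of video/sec

# Storage: assume ~100 MB per minute of video
mb_per_min_of_video = 100
storage_pb_per_day = (upload_min_per_sec * SECONDS_PER_DAY
                      * mb_per_min_of_video) / 1e9          # MB -> PB, ~4.3 PB/day

# Bandwidth: average 5 Mbps per stream, 3x peak factor
peak_gbps = views_per_sec * 5 * 3 / 1e3                     # ~870 Gbps
```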
Key Insight: The system is extremely read-heavy but writes are compute-intensive (transcoding). We need to optimize reads through CDN and caching, while handling writes through asynchronous processing pipelines.
The YouTube architecture can be divided into two main flows: Upload Path (write) and View Path (read).
┌─────────────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Mobile App │ │ Web App │ │ Smart TV │ │ Gaming Console│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│ │
┌───────────┘ └───────────┐
│ UPLOAD PATH VIEW PATH │
▼ ▼
┌───────────────────────┐ ┌───────────────────────────┐
│ Upload Service │ │ CDN │
│ (Chunked Uploads) │ │ (Edge Servers) │
└───────────┬───────────┘ └─────────────┬─────────────┘
│ │ Cache Miss
▼ ▼
┌───────────────────────┐ ┌───────────────────────────┐
│ Original Storage │ │ Origin Servers │
│ (Blob Storage) │ │ (Video Streaming) │
└───────────┬───────────┘ └───────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRANSCODING PIPELINE │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Split │─▶│ Encode │─▶│ Merge │─▶│ Package │ │
│ │ (Chunks) │ │ (Parallel) │ │ (Stitch) │ │ (HLS/DASH) │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Metadata │ │ Video │ │ Redis │ │ Kafka │ │
│ │ (MySQL) │ │ (GCS/S3) │ │ (Cache) │ │ (Events) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
Why Separate Paths? The upload path is compute-heavy (transcoding) while the view path is bandwidth-heavy (streaming). Separating them allows independent scaling and optimization.
Users upload videos of varying sizes (from a few MB to several GB). The upload system must handle unreliable networks, support resume, and provide progress feedback.
Large videos are split into chunks (typically 5-10 MB each). If upload fails, only the failed chunk needs to be re-uploaded.
// Chunked Upload Flow
Client                            Upload Service                    Storage
  │                                     │                              │
  │──── 1. Initialize Upload ──────────▶│                              │
  │     (filename, size, mime)          │                              │
  │◀─── 2. Return upload_id ────────────│                              │
  │                                     │                              │
  │──── 3. Upload Chunk 1 ─────────────▶│────── Store Chunk 1 ────────▶│
  │◀─── 4. Chunk 1 ACK ─────────────────│                              │
  │                                     │                              │
  │──── 5. Upload Chunk 2 ─────────────▶│────── Store Chunk 2 ────────▶│
  │◀─── 6. Chunk 2 ACK ─────────────────│                              │
  │     ... (repeat)                    │                              │
  │                                     │                              │
  │──── 7. Complete Upload ────────────▶│                              │
  │     (all chunk IDs)                 │────── Combine Chunks ───────▶│
  │                                     │────── Trigger Processing ───▶│
  │◀─── 8. Upload Complete ─────────────│                              │
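The per-chunk retry logic can be sketched as follows. `chunk_ranges` and `upload_fn` are illustrative names, with `upload_fn` standing in for the real HTTP call:

```python
def chunk_ranges(file_size: int, chunk_size: int = 10 * 1024 * 1024):
    """Yield (index, start, end) byte ranges covering the whole file."""
    index = 0
    for start in range(0, file_size, chunk_size):
        yield index, start, min(start + chunk_size, file_size)
        index += 1

def upload_with_retry(file_size: int, upload_fn, max_retries: int = 3):
    """Upload every chunk; re-send only the chunks that fail."""
    acked = []
    for index, start, end in chunk_ranges(file_size):
        for _attempt in range(max_retries):
            if upload_fn(index, start, end):      # True means the chunk was ACKed
                acked.append(index)
                break
        else:
            raise RuntimeError(f"chunk {index} failed after {max_retries} tries")
    return acked
```

The key property: a dropped connection costs one re-sent chunk, not the whole file.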
// 1. Initialize Upload
POST /api/v1/videos/upload/init
{
"filename": "my_video.mp4",
"fileSize": 1073741824, // 1 GB
"mimeType": "video/mp4",
"title": "My Awesome Video",
"description": "..."
}
Response: { "uploadId": "abc123", "chunkSize": 10485760 }
// 2. Upload Chunk
PUT /api/v1/videos/upload/{uploadId}/chunks/{chunkIndex}
Content-Type: application/octet-stream
Body: <binary chunk data>
Response: { "chunkId": "chunk_001", "received": 10485760 }
// 3. Complete Upload
POST /api/v1/videos/upload/{uploadId}/complete
{
"chunks": ["chunk_001", "chunk_002", ...]
}
Response: { "videoId": "vid_xyz", "status": "processing" }

For better scalability, clients can upload directly to blob storage (S3/GCS) using pre-signed URLs, bypassing application servers.
// Get Pre-signed URL
POST /api/v1/videos/upload/presigned
{
"filename": "chunk_001.bin",
"contentType": "application/octet-stream"
}
Response: {
"uploadUrl": "https://storage.example.com/...",
"expiresAt": "2024-01-15T12:00:00Z"
}
// Client uploads directly to storage URL
PUT {uploadUrl}
Content-Type: application/octet-stream
Body: <chunk data>

Format Validation
Check file headers (magic bytes) to verify actual video format, not just extension
Size Limits
Max file size (e.g., 256 GB), max duration (e.g., 12 hours), per-user quotas
Content Moderation
ML-based scanning for copyright, violence, adult content before publishing
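The magic-byte check from the first point can be sketched like this. ISO BMFF files (the MP4/MOV family) carry an `ftyp` box at byte offset 4, and WebM/Matroska files start with an EBML header:

```python
def sniff_video_format(header: bytes):
    """Identify the container from the file's first bytes, ignoring the extension."""
    if len(header) >= 12 and header[4:8] == b"ftyp":
        return "mp4"                        # ISO BMFF (MP4/MOV family)
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm"                       # EBML header (WebM/Matroska)
    return None                             # unknown / not a supported video
```

A renamed `.exe` or image file fails this check even though its extension says `.mp4`.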
Transcoding converts the uploaded video into multiple formats and resolutions. This is the most compute-intensive part of the system.
Multiple Resolutions
144p, 240p, 360p, 480p, 720p, 1080p, 1440p, 4K
Codec Compatibility
H.264, H.265/HEVC, VP9, AV1 for different devices
Compression
Reduce file size while maintaining quality
Segmentation
Split into chunks for adaptive streaming
The pipeline uses a Directed Acyclic Graph (DAG) to parallelize independent tasks while respecting dependencies.
┌─────────────────────────────────────────────────────┐
│ ORIGINAL VIDEO │
└─────────────────────────┬───────────────────────────┘
│
┌─────────────┴─────────────┐
▼ ▼
┌────────────────┐ ┌────────────────┐
│ Video Split │ │ Audio Extract │
│ (into chunks) │ │ (AAC/Opus) │
└───────┬────────┘ └───────┬────────┘
│ │
┌─────────────────────────┼─────────────────────────┐ │
▼ ▼ ▼ │
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Encode 1080p │ │ Encode 720p │ │ Encode 480p │ ...
│ (H.264/VP9) │ │ (H.264/VP9) │ │ (H.264/VP9) │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
▼
┌────────────────┐
│ Merge & │◀──── Audio
│ Package │
│ (HLS/DASH) │
└───────┬────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ Thumbnail │ │ Manifest │ │ Content │
│ Generation │ │ Creation │ │ Storage │
└────────────────┘  └────────────────┘  └────────────────┘

Resolution     Bitrate (H.264)   Bitrate (VP9)
──────────────────────────────────────────────
4K (2160p)     35-45 Mbps        18-25 Mbps
1440p          16 Mbps           9 Mbps
1080p          8 Mbps            5 Mbps
720p           5 Mbps            3 Mbps
480p           2.5 Mbps          1.5 Mbps
360p           1 Mbps            0.6 Mbps
240p           0.5 Mbps          0.3 Mbps
144p           0.2 Mbps          0.1 Mbps

Note: VP9/AV1 achieve the same quality at ~50% lower bitrate than H.264
Result: A 10-minute video can be transcoded in ~2 minutes with sufficient parallelism instead of 30+ minutes sequentially.
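The wave-by-wave parallelism behind that speedup can be expressed with a topological sort over the DAG above (task names mirror the diagram; `graphlib` is in the Python standard library):

```python
from graphlib import TopologicalSorter

# node -> set of predecessors, mirroring the DAG above:
# encodes wait on split; merge waits on all encodes plus the audio track.
deps = {
    "split": set(),
    "audio_extract": set(),
    "encode_1080p": {"split"},
    "encode_720p": {"split"},
    "encode_480p": {"split"},
    "merge_package": {"encode_1080p", "encode_720p", "encode_480p", "audio_extract"},
    "thumbnail": {"merge_package"},
    "manifest": {"merge_package"},
}

def execution_waves(deps):
    """Group tasks into waves; everything in a wave can run in parallel."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = list(ts.get_ready())
        waves.append(sorted(ready))
        ts.done(*ready)
    return waves
```

With enough workers, wall-clock time is the longest path through the DAG, not the sum of all tasks.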
// Message Queue for Transcoding Jobs (Kafka/SQS)
TranscodeJob {
job_id: UUID
video_id: UUID
source_path: "gs://uploads/raw/abc123.mp4"
output_path: "gs://videos/processed/abc123/"
tasks: [
{ type: "split", status: "completed" },
{ type: "encode_1080p", chunk: 1, status: "processing" },
{ type: "encode_1080p", chunk: 2, status: "queued" },
{ type: "encode_720p", chunk: 1, status: "queued" },
// ... more tasks
]
created_at: timestamp
priority: int // Premium users get higher priority
}
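The `priority` field above can drive dispatch with a max-heap (sketch; the larger-number-wins convention is an assumption):

```python
import heapq
import itertools

class TranscodeQueue:
    """Workers pop the highest-priority job; ties are served FIFO."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker preserves arrival order

    def push(self, job_id: str, priority: int):
        # heapq is a min-heap, so negate priority for max-first behavior
        heapq.heappush(self._heap, (-priority, next(self._seq), job_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]
```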
// Workers pick jobs from queue based on priority
// Each worker handles one encoding task at a time
// Results written back to storage, status updated in DB

ABR automatically adjusts video quality based on network conditions, ensuring smooth playback without buffering. This is what makes YouTube work well on slow connections.
┌─────────────────────────────────────────────────────────────────────────────┐
│ ADAPTIVE BITRATE STREAMING │
└─────────────────────────────────────────────────────────────────────────────┘
1. Video is split into small segments (2-10 seconds each)
2. Each segment is encoded at multiple quality levels
3. Client downloads a manifest file listing all available qualities
4. Player monitors bandwidth and buffer level
5. Player requests appropriate quality for each segment
Network Fast Network Slow
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ 1080p │ │ 480p │
│ Segment 1 │ │ Segment 5 │
└─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ 1080p │ │ 360p │
│ Segment 2 │ │ Segment 6 │
└─────────────────┘ └─────────────────┘
│ Network │
▼ Recovers ▼
┌─────────────────┐ ────▶ ┌─────────────────┐
│ 720p │ │ 720p │
│ Segment 3 │ │ Segment 7 │
└─────────────────┘ └─────────────────┘
Result: Smooth playback with quality matching available bandwidth

HLS (HTTP Live Streaming) is Apple's protocol, widely supported on iOS, Safari, and most devices.
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8

// Each resolution playlist:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
segment_001.ts
#EXTINF:10.0,
segment_002.ts
...
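A master playlist like the one shown can be generated from the bitrate ladder; a minimal sketch (the renditions list is illustrative):

```python
def master_playlist(renditions):
    """renditions: list of (name, bandwidth_bps, width, height) tuples."""
    lines = ["#EXTM3U"]
    for name, bandwidth, width, height in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{name}/playlist.m3u8")   # points at that rendition's playlist
    return "\n".join(lines)
```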
DASH (Dynamic Adaptive Streaming over HTTP) is an open standard, preferred for web browsers and Android.
<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
type="static" mediaPresentationDuration="PT10M">
<Period>
<AdaptationSet mimeType="video/mp4">
<Representation id="360p" bandwidth="800000"
width="640" height="360">
<SegmentTemplate media="360p_$Number$.m4s"
initialization="360p_init.mp4"/>
</Representation>
<Representation id="720p" bandwidth="2800000"
width="1280" height="720">
<SegmentTemplate media="720p_$Number$.m4s"
initialization="720p_init.mp4"/>
</Representation>
<!-- More representations -->
</AdaptationSet>
</Period>
</MPD>

Bandwidth Estimation
Measure download speed of recent segments
Buffer Level
How many seconds of video are buffered ahead
Device Capability
Screen resolution, decoder support
Stability
Avoid frequent quality switches
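A simplified rendition picker combining the first two signals might look like this (the ladder values come from the bitrate table earlier; the 80% headroom and 5-second buffer threshold are illustrative):

```python
# (height, required kbps) from the bitrate ladder above
LADDER = [(144, 200), (240, 500), (360, 1000), (480, 2500),
          (720, 5000), (1080, 8000)]

def pick_rendition(measured_kbps: float, buffer_sec: float) -> int:
    """Pick the highest rendition fitting in ~80% of measured bandwidth;
    when the buffer is nearly empty, halve the budget to avoid rebuffering."""
    budget = measured_kbps * 0.8
    if buffer_sec < 5:
        budget *= 0.5
    best = LADDER[0][0]          # worst case: fall back to 144p
    for height, kbps in LADDER:
        if kbps <= budget:
            best = height
    return best
```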
Time to First Frame (TTFF): The time from clicking play to seeing the first frame. YouTube optimizes this by starting with a lower quality and quickly switching up. Target: <200ms.
A CDN is essential for serving video content globally with low latency. YouTube uses a massive distributed network of edge servers.
┌──────────────────────────────────────────────────────────────────────────────┐
│ GLOBAL CDN │
└──────────────────────────────────────────────────────────────────────────────┘
User in Tokyo User in London User in NYC
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Tokyo Edge │ │ London Edge │ │ NYC Edge │
│ Server │ │ Server │ │ Server │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ Cache │ Cache │ Cache
│ Miss? │ Miss? │ Miss?
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Asia-Pacific│ │ Europe │ │ US-East │
│ Regional │ │ Regional │ │ Regional │
│ Cache │ │ Cache │ │ Cache │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
▼
┌─────────────────┐
│ Origin Server │
│ (Video Storage)│
└─────────────────┘
Cache Hit: Edge serves directly (~10-50ms latency)
Cache Miss: Fetch from regional → origin, then cache for future requests

// Video Request Flow
1. Client requests: GET /watch?v=dQw4w9WgXcQ
2. Server returns video page with manifest URL:
https://manifest.googlevideo.com/api/manifest/hls_variant/
?video_id=dQw4w9WgXcQ&signature=abc123...
3. Client fetches manifest, chooses quality, requests segments:
https://rr3---sn-abc123.googlevideo.com/videoplayback
?id=dQw4w9WgXcQ
&itag=22 // Quality identifier (1080p H.264)
&range=0-1000 // Byte range for segment
&signature=... // Signed URL for security
&expire=... // URL expiration time
Key: "sn-abc123" identifies the specific edge server
Signature prevents hotlinking and unauthorized access

YouTube's data model includes video metadata, user information, engagement data, and relationships. Different data types require different storage solutions.
-- Videos Table (sharded by video_id)
CREATE TABLE videos (
video_id VARCHAR(11) PRIMARY KEY, -- YouTube uses 11-char IDs
channel_id VARCHAR(24) NOT NULL,
title VARCHAR(100) NOT NULL,
description TEXT,
duration_sec INT NOT NULL,
upload_date TIMESTAMP DEFAULT NOW(),
status ENUM('processing', 'ready', 'blocked', 'deleted'),
visibility ENUM('public', 'unlisted', 'private'),
category_id INT,
default_language VARCHAR(5),
INDEX idx_channel (channel_id),
INDEX idx_upload_date (upload_date)
);
-- Video Stats (separate table for frequent updates)
CREATE TABLE video_stats (
video_id VARCHAR(11) PRIMARY KEY,
view_count BIGINT DEFAULT 0,
like_count BIGINT DEFAULT 0,
dislike_count BIGINT DEFAULT 0,
comment_count BIGINT DEFAULT 0,
last_updated TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (video_id) REFERENCES videos(video_id)
);
-- Channels Table
CREATE TABLE channels (
channel_id VARCHAR(24) PRIMARY KEY,
user_id BIGINT NOT NULL,
channel_name VARCHAR(100) NOT NULL,
description TEXT,
subscriber_count BIGINT DEFAULT 0,
video_count INT DEFAULT 0,
created_at TIMESTAMP DEFAULT NOW(),
profile_pic_url VARCHAR(255),
banner_url VARCHAR(255),
INDEX idx_user (user_id)
);
-- Subscriptions Table (sharded by subscriber_id)
CREATE TABLE subscriptions (
subscriber_id BIGINT,
channel_id VARCHAR(24),
subscribed_at TIMESTAMP DEFAULT NOW(),
notifications BOOLEAN DEFAULT true,
PRIMARY KEY (subscriber_id, channel_id),
INDEX idx_channel_subs (channel_id)
);
-- Comments Table (sharded by video_id)
CREATE TABLE comments (
comment_id BIGINT PRIMARY KEY AUTO_INCREMENT,
video_id VARCHAR(11) NOT NULL,
user_id BIGINT NOT NULL,
parent_id BIGINT, -- NULL for top-level comments
content TEXT NOT NULL,
like_count INT DEFAULT 0,
created_at TIMESTAMP DEFAULT NOW(),
INDEX idx_video_comments (video_id, created_at),
INDEX idx_parent (parent_id)
);

Why Separate video_stats? View counts update tens of thousands of times per second across videos. Keeping them in a separate table prevents write contention on the main videos table and allows using specialized high-write-throughput storage.
Tracking view counts seems simple, but at YouTube's scale (58,000+ views/sec), it's a significant engineering challenge. We need accuracy without impacting performance.
Database Hotspot
A viral video would hammer one DB row, causing contention
Bot Detection
Need to filter fake views before counting
Read Pressure
Displaying count on every view request
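A first-pass repeat-view filter (count at most one view per user-video pair per window) can be sketched in memory; a production system would hold this state in Redis with TTLs instead of a dict:

```python
class RepeatViewFilter:
    """Count at most one view per (user, video) within window_sec."""
    def __init__(self, window_sec: int = 30 * 60):
        self.window = window_sec
        self.last_seen = {}              # (user_id, video_id) -> last timestamp

    def should_count(self, user_id, video_id, now: float) -> bool:
        key = (user_id, video_id)
        last = self.last_seen.get(key)
        self.last_seen[key] = now
        return last is None or now - last >= self.window
```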
┌─────────────────────────────────────────────────────────────────────────────┐
│ VIEW COUNT PIPELINE │
└─────────────────────────────────────────────────────────────────────────────┘
User Views Video
│
▼
┌─────────────┐ ┌─────────────────────────────────────────────────────┐
│ View Event │────▶│ KAFKA │
│ Producer │ │ Topic: video-views (partitioned by video_id) │
└─────────────┘ └──────────────────────────┬──────────────────────────┘
│
┌────────────────────────────────┼────────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Bot Detection │ │ Real-time │ │ Batch Job │
│ Filter │ │ Counter │ │ (Hourly) │
│ (Spam, Repeats) │ │ (Redis INCR) │ │ (Exact Count) │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
│ Valid Views │ Approximate │ Accurate
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Analytics DB │ │ Display Count │ │ Persistent │
│ (BigQuery) │ │ (Fast, ~5min) │ │ DB │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Two-tier counting:
1. Real-time (Redis): Fast, approximate, for display
2. Batch (BigQuery): Accurate, for monetization & analytics

// Increment view count (atomic)
INCR video:dQw4w9WgXcQ:views
// Get current count
GET video:dQw4w9WgXcQ:views
// Periodic sync to persistent DB
function syncViewCounts() {
    // Iterate with SCAN (not KEYS) so a large keyspace doesn't block Redis
    for each video_id in SCAN("video:*:views"):
        count = GETSET video:{video_id}:views 0   // read and reset atomically
        UPDATE video_stats
        SET view_count = view_count + count
        WHERE video_id = video_id
}

Interesting Fact: YouTube intentionally slows down view count updates for new videos to allow time for bot detection. That's why, for years, viral videos would appear stuck at "301 views" for hours.
With 800+ million videos, helping users find relevant content is crucial. Search and recommendation are key to user engagement.
User Query: "how to make pasta"
│
▼
┌────────────────────┐
│ Query Processing │
│ - Spell check │
│ - Tokenization │
│ - Synonym expansion│
└─────────┬──────────┘
│
▼
┌────────────────────┐ ┌───────────────────────────────────────────┐
│ Elasticsearch │────▶│ Inverted Index │
│ (Video Index) │ │ "pasta" → [vid1, vid2, vid3, vid47, ...] │
│ │ │ "make" → [vid2, vid5, vid47, vid99, ...] │
└─────────┬──────────┘ │ "how" → [vid1, vid2, vid47, vid55, ...] │
│ └───────────────────────────────────────────┘
▼
┌────────────────────┐
│ Ranking Layer │
│ - Relevance score │
│ - Video quality │
│ - Engagement rate │
│ - Freshness │
│ - Personalization │
└─────────┬──────────┘
│
▼
Top 20 Results

// Video becomes searchable
1. Video uploaded & transcoded
2. Extract searchable text:
   - Title & description
   - Auto-generated captions (ML)
   - OCR from video frames
   - Audio transcription
3. Generate embeddings (ML)
4. Index in Elasticsearch
5. Update in real-time as engagement metrics change
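The inverted index at the core of this is simple to sketch; a toy version with AND semantics over query tokens:

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to the set of video IDs whose text contains it."""
    index = defaultdict(set)
    for video_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(video_id)
    return index

def search(index, query: str):
    """Return videos matching every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

Elasticsearch layers tokenization, scoring, and sharding on top of exactly this structure.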
YouTube's recommendation engine uses multiple signals to suggest videos:
Candidate Generation
ML models generate millions of candidate videos from user history, subscriptions, similar users
Ranking
Deep neural network scores each candidate based on predicted watch time and engagement
Filtering
Remove already watched, age-restricted, or policy-violating content
The comments system handles millions of comments per day with nested replies, likes, spam filtering, and real-time updates.
// Comment Data Model
Comment {
comment_id: BIGINT
video_id: VARCHAR(11)
user_id: BIGINT
parent_id: BIGINT (NULL for top-level)
content: TEXT
like_count: INT
reply_count: INT (for top-level only)
created_at: TIMESTAMP
is_pinned: BOOLEAN
is_hearted: BOOLEAN (creator liked)
}
// Fetching Comments (Top + Newest)
SELECT * FROM comments
WHERE video_id = 'dQw4w9WgXcQ'
AND parent_id IS NULL
ORDER BY
is_pinned DESC,
like_count DESC,
created_at DESC
LIMIT 20;
// Fetching Replies
SELECT * FROM comments
WHERE parent_id = 12345
ORDER BY created_at ASC
LIMIT 10;

// Like a video (idempotent)
INSERT INTO video_likes
(video_id, user_id, created_at)
VALUES ('dQw4w9', 123, NOW())
ON CONFLICT DO NOTHING;
// Update count asynchronously
// (not in critical path)

When a channel with millions of subscribers uploads a video, we need to notify all subscribers efficiently without overwhelming the system.
Channel uploads new video
│
▼
┌────────────────────┐
│ Video Published │
│ Event │
└─────────┬──────────┘
│
▼
┌────────────────────┐
│ Check Subscriber │
│ Count │
└─────────┬──────────┘
│
┌─────┴─────┐
│ │
Small (<100K) Large (>100K)
│ │
▼ ▼
┌────────────┐ ┌────────────────────────┐
│ Fan-out on │ │ Fan-out on Read │
│ Write │ │ (Pull Model) │
│ │ │ │
│ Create │ │ Store: "Channel X │
│ notification│ │ uploaded at time T" │
│ for each │ │ │
│ subscriber │ │ On app open: │
│ │ │ "Get uploads from my │
│ │ │ subscriptions since │
│ │ │ last check" │
└────────────┘ └────────────────────────┘
Hybrid approach balances write amplification vs read latency

// Get subscription feed
SELECT v.*
FROM videos v
JOIN subscriptions s ON v.channel_id = s.channel_id
WHERE s.subscriber_id = :user_id
  AND v.upload_date > NOW() - INTERVAL 7 DAY
  AND v.visibility = 'public'
ORDER BY v.upload_date DESC
LIMIT 50;

// Cached per user, invalidated on new upload
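The push-vs-pull decision from the diagram (100K-subscriber cutoff) can be sketched as follows; `push` and `record` stand in for the notification writer and the channel-upload log:

```python
FANOUT_THRESHOLD = 100_000   # cutoff from the diagram above

def publish_video(subscribers, video_id, push, record):
    """Hybrid fan-out: write one notification per subscriber for small
    channels; record the upload once (pull on read) for large ones."""
    if len(subscribers) < FANOUT_THRESHOLD:
        for user_id in subscribers:        # fan-out on write
            push(user_id, video_id)
        return "push"
    record(video_id)                       # fan-out on read
    return "pull"
```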