Skip to content

Latest commit

 

History

History
 
 

README.md

Design a video sharing service like Youtube or Netflix

Goals

1) Upload a video
2) View a video

3) Search for a video
4) Capture stats like, comments, likes, views on a video.

5) System should be highly available
6) System should be hightly reliable
7) System can take a hit on Consistency


Scope

Let's design this system for multiple users.

Capacity Estimations

Our system is read heavy.

Let's have 5 Million active users.

Let's assume that 1 user views about 5 videos per day
Thus total video views 5*5 Million = 25 Million views per day.

Let's assume there is 1 video uploaded per second.

Upload Storage Estimations
Let's assume per minute we get about 1 hour worth of video data
Let's assume the size of 1 hour of data is 1 GB
Per day we'd need 60 Min* 24Hour = 1440GB

Upload Bandwidth Estimations
Since we are getting 1 hour of video data per min.
That is 60 mins of data per min
And if we can upload 10MB per min.
We'd need to upload 60*10MB = 600MB per min

High Level design

Upload Video



Watch Video



Search Video



Code

Classes

User
ID
UserName
VidsUploaded = [List of vid ids]
HistoryOfVidsWatched = [List Of Vid Ids]
Subscription = [List Of User ID]

Video
ID
Title
Description
Comments = [List Of Comment Objects]
Video = Location of vid file
Thumbnail = Location of image
UserID (user who uploaded this video)
Date and time of creation
Total Number of views
Total number of Likes

Comment
ID
Text
UserId (who created the comment)
VidId (Where the comment was left)


APIs

UploadVid(user object, vid object, vid Content)
vid content = the video stream
Returns: Success if the video was uploaded successfully

WatchVid(user Object, vid url)
Returns: Stream of vid requested

SearchVid(user object, search term)
Returns: top 10 vid objects

Note: instead of user object, we can send user api key.
We can eliminate the hacker attacks if we send the api key.
So if we decide to send the api key then APIs would look like this
UploadVid(apiKey, vidStream, title, description, timeStamp)
WatchVid(apiKey, vidUrl)
SearchVid(apiKey, searchTerm, [optional]VidCountToReturn)


Detailed Component Design

There are 3 components to this system.

Upload


Watch


Search


Reason why I have 3 different databases for user content, video content, metadata content is because there are 3 different data types here. 1) videos 2) Text/objects (for user data) 3) Image data (metadata may contain thumbnail images).
So each database represents a storage system
We can use different storage systems for different types of data
Why are we doing this ?
It’s recommended to store large static files like videos and images separately as it has better performance and is much easier to organize and scale.

Database design

The service is read heavy.
So we need to design a database server that can fetch content fast.
For this purposes we use relational database
The user to video relationship can be like this

Video storage system
For storing the video we can store it in a distributed file storage system.
Now if you are in USA and you want to watch a video that is stored in Indian database server, it will take time for that video to load.
So to avoid this issue we have CDN.(content delivary network)
A CDN is system if distributed servers, that are distributed over geographic locations.
The CDN server will host your content on their servers. So users instead of accessing content from the main server that may be far away from them, can instead access content from a server that's near to them physically.
Advantage: your content is replicated (that means it has a backup)
But we have a lot of data !! can we replicate all of it on CDN servers ?
One straightforward approach is to host popular videos in CDN and less popular videos are stored in our own servers by location


Scale the system

Sharding Metadata