The simplest answer is a hash table.
Drawbacks:
What if the data is too big to fit in memory on a single machine?
Answer:
Distributed key value storage system.
Design a distributed key-value storage system.
It takes in a key and returns the corresponding value.
Make sure your system is:
1) Highly available, and
2) Consistent
Questions to ask the interviewer:
- What type of data am I storing here?
- What data type is the key and what data type is the value?
- How much data am I storing?
- How frequently will the data be accessed?
- Is the traffic evenly distributed?
- Is this a read-heavy or a write-heavy system?
How many people will be using it?
1 million.
How many requests per second at peak?
1 request per second.
When we say distributed key-value store, it means the data is distributed across multiple machines.
Question: How do we distribute the data?
Answer: Sharding.
Rule: we need an even distribution of data across shards.
- Range-based partitioning
Store all keys starting with the letter A on one shard, keys starting with B on another, and so on.
Cons:
Produces uneven distribution: one shard can grow much bigger than another.
What if one shard becomes hot?
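Range-based routing can be sketched as follows (the shard names and letter ranges here are hypothetical):

```python
# Minimal sketch of range-based partitioning.
# Popular ranges can overload a single shard -- the "hot shard" problem above.
SHARD_RANGES = [
    ("a", "h", "shard-0"),  # keys starting with a..h
    ("i", "p", "shard-1"),  # keys starting with i..p
    ("q", "z", "shard-2"),  # keys starting with q..z
]

def route(key: str) -> str:
    first = key[0].lower()
    for low, high, shard in SHARD_RANGES:
        if low <= first <= high:
            return shard
    return "shard-0"  # fallback for keys that don't start with a letter
```

Range partitioning makes range scans easy (adjacent keys live together), which is exactly why it skews under uneven key distributions.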
- ID/hash-key-based partitioning
Run the key through a hash function, which gives you an ID.
Based on this ID we decide which shard stores the value.
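A minimal sketch of hash-based shard selection (the shard count and key format are assumed for illustration):

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size

def shard_for(key: str) -> int:
    # Hashing spreads keys evenly across shards regardless of how
    # the raw keys themselves are distributed.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

The trade-off versus range partitioning: distribution is even, but range scans are lost, and the simple modulo means adding a shard remaps most keys (consistent hashing, below, addresses that).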
The master holds the latest data, and the slaves copy from the master.
If the master goes down, one of the slaves becomes the new master.
Reads can be served from the master (for the most consistent data).
Writes can be directed to one of the slaves.
Then, after some time period, we promote this most up-to-date slave to be the new master, and the original master becomes a slave.
The master-slave architecture should be set up so that the slaves continuously copy from the current master; when the master changes, the slaves start copying from the new master, which keeps them up to date as well.
This approach compromises on latency for the most consistent data: reads that need the freshest value must go to the master.
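The replication and failover steps above can be sketched as a toy in-memory model (real systems replicate over the network via a replication log; the class and method names here are hypothetical):

```python
# Toy sketch of master-slave replication with failover.
class Node:
    def __init__(self, name: str):
        self.name = name
        self.data = {}

class Cluster:
    def __init__(self, master: Node, slaves: list):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        self.master.data[key] = value
        for slave in self.slaves:        # slaves copy off the master
            slave.data[key] = value

    def read(self, key):
        return self.master.data.get(key)  # master read => most consistent

    def failover(self):
        # Promote the most up-to-date slave; the old master rejoins as a slave.
        new_master = max(self.slaves, key=lambda s: len(s.data))
        self.slaves.remove(new_master)
        self.slaves.append(self.master)
        self.master = new_master
```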
We can put the most frequently accessed data in an LRU cache on the server.
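A minimal LRU cache sketch built on an ordered dictionary (the capacity is an assumed tuning parameter):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order = recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```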
As the data grows and the customers grow, we'd eventually need to split the architecture into multiple clusters.
We can use consistent hashing to add or remove servers.
Since we have decided to shard based on a hash ID, this approach to load balancing makes sense when we decide to add more servers.
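A minimal consistent-hash ring sketch (the virtual-node count is an assumed parameter). The key property: removing a server only remaps the keys that lived on it.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        # Each server gets many virtual points on the ring for even spread.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key: str) -> str:
        # A key belongs to the first ring point at or after its hash,
        # wrapping around at the end of the ring.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```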