Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 

README.md

Design Yelp or nearby things


Goals

  1. Given location of the user return the top 10 "things" aroung the user.
    For simplicity, things = restaurants.
  2. The restaurants should be within 2 mile radius of the user.
  3. The users should be able to add/delete/update the restaurant information.
  4. The users should be able to add comments/feedback/review about the restaurant.
  5. The result should have minimul latency.

Too long didn't read:
Input: User location in latitude and longitude
Output: Top 10 restaurents around user within 2 mile radius.

Scope

For now let's design this for one user.
We are assuming that the restaurent data is already populated.


High Level Diagram

Search restaurant



Code

API

Search(user object, latitude, longitude)
Returns: Top 10 Restaurant objects.
The mobile client can then render these objects into list that the user can add comment/feedback to.

Classes



Detailed component design

The main crux of the problem is, how do we store this data and how do we retrieve it.

Trees

You can store the world map in a tree like structure like R-Tree, Quad tree. I am gonna be discussing about quad trees here.
Quad Tree
You take the globe and divide it into 4 quadrants.
Then you further take each quadrant and divide it into 4 more quadrants.
So densely populated areas will have more rectangles on it, and sparsely populated areas will have less rectangels over it.

For simplicity let's assume the most desely populated area has grid size of 2 miles by 2 miles.
Further more, the leaf nodes can be connected so we can easily search over large areas.
So for below use case:


Since the leafes are connected all we'd need to do is fetch the appropriate result from each leaf node.

The leaf nodes have the Ids of the restaurants contained within that grid locaiton.
So when the user requests for the information, we can query the resulting leaf nodes, get the restaurents and perform the K nearest point algo on those restaurents to get our top 10 list.
Insert
So to insert a restaurant, we find the appropriate leaf node and add the restaurant ID in it.
Let's say the max capacity a leaf can hold is 200 restaurant Ids and we reach that, then we split the leaf node into 4 quadrants and put appropriate restaurents in each new leaf.

With this approch both insert and search would be log(n) time.



Scale the system

Sharding

What if the entire quad tree cannot fit in memory?

Region Based Sharding

Drawbacks:
What if one region becomes hot ?
What if one region contains more data than other ?

Grid Id based sharding

We can fit grids on the database, instead of specific location.

Both the sharding approches are similar, but with grid id we are fitting the quadrants on database instead of city or state.


Data Replication

We can have master slave architecture to ensure availability.
Reads: The reads can be done from the master.
Writes: The writes can be done to one of the slave, and after some time period we can make this slave the master, so other slaves can copy data off of this new master.


Cache

To deal with hot Places, we can introduce a cache in front of our database.

Load Balencing

We can add Load Balencer in following 2 places:
1) Between Clients and Application servers
2) Between Application servers and Dababase server.