Something like this.<br />
</ul>
<br />
<h2>Constraints</h2>
<ul>
    <li>Traffic is not evenly distributed</li>
    <content>May have some popular searches</content>
    <br />

    <li>Need to have low latency</li>
    <content>Can we compromise on consistency?</content>
    <br />

    <li>Need to detect cycles</li>
    <br />

    <li>Pages need to be crawled regularly to ensure freshness</li>
    <content>On average, once per week</content>
</ul>
<br />

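<p>The cycle-detection constraint above is usually met with a visited set keyed by URL. A minimal sketch, assuming an in-memory frontier and a hypothetical <code>fetch_child_urls</code> callback standing in for the real fetcher:</p>

```python
from collections import deque

def crawl(seed_url, fetch_child_urls):
    """Breadth-first crawl that skips already-seen URLs, so link
    cycles (A -> B -> A) cannot trap the crawler in a loop."""
    visited = set()            # URLs we have already fetched
    frontier = deque([seed_url])
    order = []                 # crawl order, for illustration
    while frontier:
        url = frontier.popleft()
        if url in visited:     # cycle or duplicate link: skip it
            continue
        visited.add(url)
        order.append(url)
        for child in fetch_child_urls(url):
            if child not in visited:
                frontier.append(child)
    return order

# Tiny link graph containing a cycle: a -> b -> a
links = {"a": ["b"], "b": ["a", "c"], "c": []}
print(crawl("a", lambda u: links.get(u, [])))  # ['a', 'b', 'c']
```

<p>At the scale discussed below the visited set would live in a shared store rather than in memory, but the dedup-before-fetch idea is the same.</p>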
<h2>Scale</h2>
<ul>
    <li>1 billion links to crawl</li>
    <br />

    <li>100 billion searches per month</li>
</ul>
<br />

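<p>The scale numbers above translate into rough average rates. A back-of-envelope calculation, assuming a 30-day month and the once-per-week refresh from the constraints:</p>

```python
SECONDS_PER_DAY = 86_400

# 100 billion searches per month -> average queries per second
searches_per_month = 100_000_000_000
search_qps = searches_per_month / (30 * SECONDS_PER_DAY)
print(f"~{search_qps:,.0f} searches/sec on average")

# 1 billion links, each re-crawled once per week -> pages per second
links_to_crawl = 1_000_000_000
crawl_rate = links_to_crawl / (7 * SECONDS_PER_DAY)
print(f"~{crawl_rate:,.0f} pages/sec to stay fresh")
```

<p>That is roughly 38.6K average search QPS and about 1.7K crawled pages/sec. Since traffic is not evenly distributed (see constraints), peak QPS will be a multiple of the average.</p>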
<h2>High Level Design</h2>
<img src="img/HighLevelArchitecture.PNG" />
<br />

<h2>Individual Component Design</h2>
<h3>Web Crawler</h3>
<img src="img/WebCrawler Component.PNG" />

<h3>Class Design</h3>
<pre><code>
from datetime import datetime

class Page(object):
    def __init__(self, url, title):
        self.title = title
        self.url = url
        self.timeStamp = datetime.now()  # time of the last crawl
        self.childUrls = []              # outgoing links found on this page
</code></pre>

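<p>A usage sketch of the <code>Page</code> class, re-declared here so the snippet is self-contained and using Python's <code>datetime</code> module for the timestamp; the example URLs are placeholders:</p>

```python
from datetime import datetime

class Page(object):
    def __init__(self, url, title):
        self.title = title
        self.url = url
        self.timeStamp = datetime.now()  # time of the last crawl
        self.childUrls = []              # outgoing links found on this page

# The crawler fills childUrls as it parses the fetched HTML:
home = Page("https://example.com/", "Example Home")
home.childUrls.append("https://example.com/about")
home.childUrls.append("https://example.com/contact")
print(home.url, len(home.childUrls))  # https://example.com/ 2
```

<p>The childUrls list is what feeds the crawl frontier, and timeStamp is what the refresh service below reads to decide when a page has gone stale.</p>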
<h2>Determining when to update the crawl results</h2>
<p>We can have another microservice that periodically re-crawls all pages, updating each page's timeStamp.<br />
This service can update both the pages and indexes databases.
</p>
<br />

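<p>The refresh service described above boils down to a filter over stored pages: anything whose timeStamp is older than the once-per-week target is re-queued. A minimal sketch, assuming pages are represented as dicts and the crawl queue is elsewhere:</p>

```python
from datetime import datetime, timedelta

REFRESH_INTERVAL = timedelta(weeks=1)  # "on average once per week"

def pages_to_refresh(pages, now):
    """Return the pages whose last crawl is older than the refresh interval."""
    return [p for p in pages if now - p["timeStamp"] > REFRESH_INTERVAL]

now = datetime(2024, 1, 15)
pages = [
    {"url": "https://a.example", "timeStamp": datetime(2024, 1, 1)},   # stale
    {"url": "https://b.example", "timeStamp": datetime(2024, 1, 14)},  # fresh
]
stale = pages_to_refresh(pages, now)
print([p["url"] for p in stale])  # ['https://a.example']
```

<p>After re-crawling a stale page, the service writes the new timeStamp back to the pages database and pushes any changed terms to the indexes database.</p>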
<h2>User inputs a search term and sees a list of relevant pages with titles and snippets</h2>
<img src="img/ClientServerInteraction.PNG" />
<br />
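<p>The query flow in the diagram above can be approximated with an inverted index mapping terms to page ids. A minimal in-memory sketch with made-up sample pages; real systems shard this index and rank the matches, both of which are omitted here:</p>

```python
# Inverted index: term -> set of page ids containing that term.
index = {
    "web":     {1, 2},
    "crawler": {1},
    "search":  {2, 3},
}
# Page store: id -> (title, snippet), as returned to the user.
pages = {
    1: ("Building a Web Crawler", "How to crawl the web at scale..."),
    2: ("Web Search Basics",      "An intro to search engines..."),
    3: ("Search Ranking",         "How results get ordered..."),
}

def search(query):
    """Return (title, snippet) for pages matching every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    return [pages[i] for i in sorted(ids)]

print(search("web search"))  # [('Web Search Basics', 'An intro to search engines...')]
```

<p>The titles and snippets come from the pages database, while the term-to-id mapping comes from the indexes database that the crawler and refresh service keep up to date.</p>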