jacygao/spiderman

Spiderman

A simple web crawler that crawls a website n links deep and calculates the number of unique rendered words found on each page and in total.
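To make the idea concrete, here is a minimal Python sketch of a depth-limited crawl that counts unique rendered words per page and in total. It is not Spiderman's actual code (Spiderman is built on Gumbo); the names `PageParser`, `crawl`, `fetch`, and `SITE` are hypothetical, the pages are served from an in-memory dictionary instead of the network, and it assumes that depth 0 means "the start page only" and that text inside `script` and `style` tags does not count as rendered. Relative URL resolution is left out.

```python
from collections import deque
from html.parser import HTMLParser
import re


class PageParser(HTMLParser):
    """Collects rendered words and outgoing links from one HTML page."""

    SKIPPED = {"script", "style"}  # assumption: their contents are not rendered text

    def __init__(self):
        super().__init__()
        self.words = set()
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside <script>/<style> is counted; words are lowercased.
        if self._skip_depth == 0:
            self.words.update(w.lower() for w in re.findall(r"[A-Za-z0-9']+", data))


def crawl(fetch, start_url, max_depth):
    """Breadth-first crawl. Returns ({url: unique word count}, total unique words).

    `fetch` is any callable that maps a URL to an HTML string, or None on failure.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    per_page = {}
    all_words = set()
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        per_page[url] = len(parser.words)
        all_words |= parser.words
        if depth < max_depth:  # stop following links once the depth limit is hit
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return per_page, len(all_words)


# Hypothetical three-page site used in place of real HTTP requests.
SITE = {
    "/": '<h1>Hello world</h1><script>var hidden = 1;</script><a href="/a">next page</a>',
    "/a": '<p>Hello again</p><a href="/b">deeper</a>',
    "/b": "<p>never reached at depth 1</p>",
}

per_page, total = crawl(SITE.get, "/", 1)
print(per_page, total)
```

With a depth of 1, the sketch visits `/` and `/a` but not `/b`. The total is smaller than the sum of the per-page counts because a word that appears on both pages is counted once in the total.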

One-time setup

Install Gumbo (https://github.com/google/gumbo-parser):

  git clone https://github.com/google/gumbo-parser.git
  cd gumbo-parser
  ./autogen.sh
  ./configure
  make
  sudo make install

For Mac with Homebrew, do:

  brew install gumbo-parser

Clone the Spiderman repository:

  git clone https://github.com/JacyGao/spiderman.git
  cd spiderman

To compile Spiderman, do:

  tools/all.sh

To run Spiderman, do:

  ./a.out {url} {depth}

For example:

  ./a.out http://www.ea.com 1
