
Search engine scraper

A Python library that queries Google, Bing, Yahoo and other search engines and collects results from multiple search engine results pages.
Please note that web scraping may violate the Terms of Service of some search engines and may result in a temporary ban.

Supported search engines

Google
Bing
Yahoo
DuckDuckGo
Startpage
AOL
Dogpile
Ask
Mojeek
Torch

Features

  • Can save output files (HTML, CSV, JSON).
  • Search operators (url, title, text) are supported.
  • HTTP and SOCKS proxy support.
  • Can collect dark web links with Torch.
  • Easily extensible with new search engines: create a new class in search_engines/engines/ and register it in the search_engines_dict dictionary in search_engines/engines/__init__.py. The new class should subclass SearchEngine and override the methods _selectors, _first_page and _next_page.
  • Both Python 2 and Python 3 are supported.
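The extension pattern described above can be sketched as follows. This is a self-contained illustration, not the library's actual code: the SearchEngine base class below is a minimal stand-in, and the selector strings and URLs are invented examples. Consult the existing classes in search_engines/engines/ for the real method signatures.

```python
# Self-contained sketch of the extension pattern described above.
# NOTE: this SearchEngine base class is a simplified stand-in for the
# library's real one, and all selectors/URLs below are invented examples.

class SearchEngine:
    """Stand-in for the library's base class in search_engines."""
    def __init__(self):
        self._base_url = ""


class MyEngine(SearchEngine):
    """Hypothetical new engine: subclass and override three methods."""

    def _selectors(self, element):
        # Map each result field to the selector used to parse result pages.
        selectors = {
            "url": "a.result-link",       # invented selector
            "title": "h3.result-title",   # invented selector
            "text": "p.result-snippet",   # invented selector
        }
        return selectors[element]

    def _first_page(self):
        # Describe how to request the first page of results.
        return {"url": "https://example.com/search?q={query}", "data": None}

    def _next_page(self, tags):
        # Extract the URL of the next results page from the parsed page.
        return {"url": "https://example.com/search?page=2", "data": None}
```

After defining the class, register it in search_engines_dict in search_engines/engines/__init__.py so the library and the CLI script can find it.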

Requirements

Python 2.7 - 3.7, with
Requests and
BeautifulSoup

Installation

Clone this repository: $ git clone https://github.com/andrijaJ01/ProjectSCRAPPER
Change directory to the project folder (where setup.py is located): $ cd ProjectSCRAPPER
Run the setup file (sudo may not be necessary): $ sudo python setup.py install
Done!

Usage

Can be used as a library:

from search_engines import Google

engine = Google()
results = engine.search("my query")
links = results.links()

print(links)
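Since the links come back as a plain list of URL strings, they can be post-processed with standard-library tools. A small sketch, using a hardcoded list of made-up URLs in place of real search output:

```python
from urllib.parse import urlparse

# Sample URLs standing in for the output of results.links();
# real values would come from an engine.search() call.
links = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.org/article",
]

# Group links by domain so results from the same host are easy to spot.
by_domain = {}
for link in links:
    domain = urlparse(link).netloc
    by_domain.setdefault(domain, []).append(link)

print(sorted(by_domain))  # prints ['example.com', 'example.org']
```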

Or as a CLI script:

$ python search_engines_cli.py -e google,bing -q "my query" -o json,print

For a full list of options, run:

$ python search_engines_cli.py -h
