hesgit/scry

 
 

scry

Web scraping engines with Python and Scrapy

Implemented Engines

Engine: kickass

Automatically performs category and keyword searches on kickasstorrents.com and queues the results for download with the Transmission BitTorrent client. Set up a cron job to fully automate torrent searching and downloading.

Quick Start

  • git clone git@github.com:arapidhs/scry.git
  • cd scry/kickass
  • scrapy crawl kickass -a category=books -a keywords='python,java'
  • Set up a cron job to keep scraping at regular intervals
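
Scrapy passes each `-a name=value` option to the spider as a plain string, so a comma-separated `keywords` value has to be split before it can drive individual searches. The following is a minimal sketch of that step, not the spider's actual code; the helper name `expand_search_args` is hypothetical and does not come from this repository.

```python
def expand_search_args(category, keywords):
    """Return one (category, keyword) pair per comma-separated keyword.

    Hypothetical helper: it only illustrates how the string values given
    with -a on the command line could be turned into separate searches.
    """
    # Every -a value arrives as a string, so split the list by hand and
    # drop empty entries left by stray or trailing commas.
    terms = [term.strip() for term in keywords.split(",") if term.strip()]
    return [(category.strip(), term) for term in terms]


if __name__ == "__main__":
    for pair in expand_search_args("books", "python,java"):
        print(pair)
```

A keyword may contain spaces (for example 'scala topics'), which is why the whole list is quoted on the command line.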

Example that runs the spider every morning at 09:00 (a crontab entry must stay on a single line):

00 09 * * * export DISPLAY=:0.0 && cd ~/scry/kickass && /usr/local/bin/scrapy crawl kickass -a category=books -a keywords='python,java,scala topics' >> ~/scrapy.log 2>&1

Email an hourly list of downloading torrents and search results with a cron job:

0 */1 * * * cd ~/development/scrapy/kickass && ~/development/scrapy/kickass/mail.sh
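
Each crontab entry above starts with five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. As a rough illustration of how to read them, the sketch below splits an entry into its labeled fields; `split_cron_entry` is a hypothetical helper written for this example and is not part of the repository or of cron itself.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_entry(line):
    """Split a crontab line into (schedule dict, command string)."""
    # The first five whitespace-separated tokens are the schedule;
    # everything after them is the command, kept intact.
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields and a command")
    return dict(zip(CRON_FIELDS, parts[:5])), parts[5]


if __name__ == "__main__":
    schedule, command = split_cron_entry(
        "0 */1 * * * cd ~/development/scrapy/kickass && ./mail.sh"
    )
    print(schedule)
    print(command)
```

In the hourly entry the minute field is `0` and the hour field is `*/1`, so the command runs at minute zero of every hour.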

Installing Scrapy

  • sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 627220E7
  • echo 'deb http://archive.scrapy.org/ubuntu scrapy main' | sudo tee /etc/apt/sources.list.d/scrapy.list
  • sudo apt-get update && sudo apt-get install scrapy-0.24

Official installation instructions: http://doc.scrapy.org/en/latest/intro/install.html#intro-install-platform-notes
