Karelgit/SparkCrawl.Topic

1. Crawler main entry point: com.gengyun.entry.OnSparkKicker


2. RDDURLQueue does not yet have any filtering


3. Added crawl depth control


4. Added protocol control


5. Added suffix control


6. Tachyon is used as the store for crawled data


7. Link deduplication
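
As a rough illustration of how depth, protocol, and suffix control could combine with link deduplication, here is a minimal sketch in plain Java. It is not taken from this repository: the class name CrawlFilter, its constructor parameters, and the accept method are all hypothetical, and the in-memory HashSet only deduplicates within a single JVM.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of SparkCrawl.Topic: applies depth, protocol,
// suffix, and duplicate checks to one candidate link.
final class CrawlFilter {
    private final int maxDepth;                 // deepest level that is still crawled
    private final Set<String> allowedSchemes;   // lowercase, e.g. "http", "https"
    private final Set<String> blockedSuffixes;  // lowercase, e.g. ".jpg", ".zip"
    private final Set<String> seen = new HashSet<>(); // links accepted so far

    CrawlFilter(int maxDepth, Set<String> allowedSchemes, Set<String> blockedSuffixes) {
        this.maxDepth = maxDepth;
        this.allowedSchemes = allowedSchemes;
        this.blockedSuffixes = blockedSuffixes;
    }

    // Returns true only the first time an acceptable link is offered.
    boolean accept(String url, int depth) {
        // Depth control: reject links found deeper than the configured limit.
        if (depth > maxDepth) {
            return false;
        }
        // Strip the fragment so page.html#a and page.html#b count as one link.
        int hash = url.indexOf('#');
        String key = hash >= 0 ? url.substring(0, hash) : url;

        URI uri;
        try {
            uri = new URI(key);
        } catch (URISyntaxException e) {
            return false; // malformed links are dropped
        }
        // Protocol control: only whitelisted schemes pass.
        String scheme = uri.getScheme();
        if (scheme == null || !allowedSchemes.contains(scheme.toLowerCase(Locale.ROOT))) {
            return false;
        }
        // Suffix control: skip paths ending in a blocked extension.
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        for (String suffix : blockedSuffixes) {
            if (path.endsWith(suffix)) {
                return false;
            }
        }
        // Deduplication: Set.add returns false if the link was already recorded.
        return seen.add(key);
    }
}
```

A caller would construct one filter per crawl, for example with a maximum depth of 2, schemes http and https, and a blocked suffix such as .jpg, then call accept(url, depth) before queuing each link. On a Spark cluster a local set like this would not see links handled by other executors, so deduplication there would more likely rely on something like RDD.distinct(); which approach this project actually uses is not stated here.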

Cluster-mode build: mvn package -P clusterdep -Dmaven.test.skip=true