
PeterCarragher/NetNeighbors


Interactive CommonCrawl Webgraph Demo

Live Demo

Discover related domains by analyzing link topology in the web graph.

Given a list of seed domains, the demo discovers other domains that are connected to them via backlinks or outlinks in the CommonCrawl web graph.
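The discovery idea can be illustrated on a toy in-memory edge list. This is a minimal sketch only: it does not use the pyccwebgraph API or real CommonCrawl data. The function name `discover_neighbors`, the `min_connections` parameter, and the example domains are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def discover_neighbors(edges, seeds, direction="backlinks", min_connections=1):
    """Return non-seed domains linked to at least `min_connections` seeds.

    edges: iterable of (source_domain, target_domain) pairs (hypothetical toy data).
    direction: "backlinks" finds domains that link TO the seeds;
               "outlinks" finds domains the seeds link to.
    """
    if direction not in ("backlinks", "outlinks"):
        raise ValueError(f"unknown direction: {direction!r}")

    seeds = set(seeds)
    connected_seeds = defaultdict(set)  # candidate domain -> seeds it touches

    for src, dst in edges:
        if direction == "backlinks" and dst in seeds and src not in seeds:
            connected_seeds[src].add(dst)
        elif direction == "outlinks" and src in seeds and dst not in seeds:
            connected_seeds[dst].add(src)

    counts = {
        domain: len(touched)
        for domain, touched in connected_seeds.items()
        if len(touched) >= min_connections
    }
    # Most-connected candidates first; ties broken alphabetically.
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


# Toy graph: each pair is (linking domain, linked domain).
toy_edges = [
    ("blog.example", "seed-a.example"),
    ("blog.example", "seed-b.example"),
    ("news.example", "seed-a.example"),
    ("seed-a.example", "cdn.example"),
    ("seed-b.example", "cdn.example"),
    ("seed-a.example", "seed-b.example"),  # seed-to-seed link, ignored
]
seeds = ["seed-a.example", "seed-b.example"]

backlink_neighbors = discover_neighbors(toy_edges, seeds, "backlinks")
outlink_neighbors = discover_neighbors(toy_edges, seeds, "outlinks")
```

Raising `min_connections` keeps only candidates connected to several seeds, which is one simple way to filter out incidental links. The real web graph is far too large for a Python edge list, which is why the demo relies on the WebGraph framework.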

Setup Instructions

pip install pyccwebgraph

If you are interested in general network analysis, check out the pyccwebgraph package.

There is also a separate repository for a Colab notebook that you can use to self-host an instance of this demo.

Discovery Interface

WEBGRAPH_DIR=/content/webgraphs/ WEBGRAPH_VERSION=cc-main-2024-feb-apr-may python force_graph_vis.py

Citation & References

If you use this notebook or the discovery interface in your research, please cite:

@article{carragher2024detection,
  title={Detection and Discovery of Misinformation Sources using Attributed Webgraphs},
  author={Carragher, Peter and Williams, Evan M and Carley, Kathleen M},
  journal={Proceedings of the International AAAI Conference on Web and Social Media},
  volume={18},
  pages={218--229},
  year={2024},
  url={https://arxiv.org/abs/2401.02379}
}

@article{carragher2025misinformation,
  title={Misinformation Resilient Search Rankings with Attributed Webgraphs},
  author={Carragher, Peter and Williams, Evan M and Spezzano, Francesca and Carley, Kathleen M},
  journal={ACM Transactions on Intelligent Systems and Technology},
  year={2025},
  url={https://dl.acm.org/doi/pdf/10.1145/3670410}
}


Acknowledgments: This demo uses the CommonCrawl web graph dataset and the WebGraph framework developed by Sebastiano Vigna and Paolo Boldi.
