python getLinksFromDB.py
- Copy the list of article names from the articles file into Wikipedia: Special Export.
- Download the XML file that Wikipedia: Special Export generates for the articles you entered.
python run.py <Wikipedia XML file>
python hierarchy_tree.py
- Entropy calculation is done with:
python entropy.py
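run.py takes the XML file downloaded from Wikipedia: Special Export. The sketch below shows one way such an export can be read with Python's standard xml.etree.ElementTree. It is not the code in run.py, and the function names are hypothetical. It assumes the usual MediaWiki export layout, where each <page> element holds a <title> and a <revision><text> element. The schema version, and therefore the namespace URI, is not hard-coded.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that MediaWiki export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def read_export(path_or_file):
    """Yield (title, wikitext) pairs from a Special Export XML file (hypothetical helper).

    Assumes the <mediawiki><page><title/><revision><text/></revision></page> layout
    and a current-revision-only export (one <text> per page).
    """
    title, text = None, None
    for _event, elem in ET.iterparse(path_or_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page to keep memory flat on big exports
```

For example, iterating over read_export("WWII.xml") would yield one (title, wikitext) pair per exported article. iterparse is used instead of ET.parse so that large exports do not have to fit in memory all at once.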
To change the starting page for the crawl, modify line 119 in
getLinksFromDB.py to read start_page = <Wikipedia article name>
To change how many articles deep the crawl goes, modify line 123 in
getLinksFromDB.py
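The start page and depth settings can be pictured as a breadth-first crawl. This sketch is hypothetical. getLinksFromDB.py looks up links through the external server, but here get_links can be any callable, and the link graph is made-up toy data. The sketch also assumes that "articles deep" means the number of link hops from the start page.

```python
from collections import deque


def crawl(start_page, max_depth, get_links):
    """Breadth-first crawl from start_page, following links up to max_depth hops.

    get_links(title) -> list of linked article titles (hypothetical stand-in for
    the server lookup). Returns {title: depth at which it was first reached}.
    """
    depth_of = {start_page: 0}
    queue = deque([start_page])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # pages on the last level are recorded but not expanded
        for linked in get_links(page):
            if linked not in depth_of:
                depth_of[linked] = depth_of[page] + 1
                queue.append(linked)
    return depth_of


# Toy link graph standing in for the database (hypothetical data).
LINKS = {
    "World War II": ["Axis powers", "Allies of World War II"],
    "Axis powers": ["Nazi Germany"],
    "Allies of World War II": ["United Kingdom"],
    "Nazi Germany": ["Adolf Hitler"],
}

print(crawl("World War II", 2, lambda t: LINKS.get(t, [])))
```

With a depth of 2, "Nazi Germany" is reached but its own links are not followed. Each extra level can multiply the number of requests sent to the server, so this is the setting to keep small.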
To change k for k-means clustering, modify line 90 in run.py
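k is the number of clusters that k-means splits the articles into. Below is a minimal pure-Python sketch of Lloyd's algorithm that shows where k enters. It is not run.py's implementation. This README does not describe the feature vectors run.py clusters, so the points here are arbitrary numeric tuples.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

A larger k gives smaller, more specific clusters. A smaller k merges topics together.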
Note: getLinksFromDB.py sends requests to an external server, which
will be shut down one week after the last day of exams.
For example, with WWII.xml:
python run.py WWII.xml
python hierarchy_tree.py
- Entropy calculation is done with:
python entropy.py
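This README does not say which entropy entropy.py computes. One common choice for evaluating clusters, assumed here, is the Shannon entropy of a reference label (for example, a category per article) within each cluster: H = -sum(p * log2 p). These per-cluster values are then averaged, weighted by cluster size. Lower values mean purer clusters. The function names and labels below are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy in bits of the reference labels inside one cluster."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_entropy(clusters):
    """Size-weighted mean of per-cluster entropies (0 means every cluster is pure)."""
    total = sum(len(members) for members in clusters)
    return sum(len(members) / total * cluster_entropy(members) for members in clusters)


# Hypothetical example: each inner list holds one reference label per article.
clusters = [["battle", "battle", "person"], ["person", "person"]]
print(round(weighted_entropy(clusters), 3))  # prints 0.551
```

Weighting by size keeps a single tiny, mixed cluster from dominating the score.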
The JavaScript code for the visualizations was created on JSFiddle:
- Tree View:
- Cluster View:
Copying a JSON object into this interface is not straightforward, so we do not provide a way to view other results. Currently, the sites show Run 1 on the World War II dataset.
The JSON objects are generated by gen_json.py.
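The README does not name the charting library. If the tree view uses D3 (an assumption), its d3.hierarchy layout reads nested objects whose child nodes are listed in a children array. The sketch below nests a parent-to-children mapping into that shape. The "name"/"children" keys and the sample hierarchy are assumptions for illustration and are not taken from gen_json.py.

```python
import json


def to_tree_json(root, children_of):
    """Nest a parent -> children mapping into {"name": ..., "children": [...]}.

    children_of must describe a tree (no cycles); leaves get no "children" key.
    """
    node = {"name": root}
    kids = children_of.get(root, [])
    if kids:
        node["children"] = [to_tree_json(child, children_of) for child in kids]
    return node


# Hypothetical hierarchy for one cluster.
hierarchy = {
    "World War II": ["Axis powers", "Allies of World War II"],
    "Axis powers": ["Nazi Germany"],
}
print(json.dumps(to_tree_json("World War II", hierarchy), indent=2))
```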
To change which cluster's hierarchy is generated, modify line 172 in
hierarchy_tree.py to read i == <cluster number>
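The i == <cluster number> condition picks one cluster out of the k-means output. Here is a hypothetical sketch of that selection. TARGET_CLUSTER, select_cluster and the sample titles and labels are illustrative and are not taken from hierarchy_tree.py.

```python
# Hypothetical stand-in for the loop around line 172 of hierarchy_tree.py:
# only the cluster whose index equals TARGET_CLUSTER is processed.
TARGET_CLUSTER = 1  # the <cluster number> to build a hierarchy for


def select_cluster(labels, titles, target):
    """Return the article titles whose cluster label equals target."""
    return [t for t, lab in zip(titles, labels) if lab == target]


titles = ["Axis powers", "Battle of Midway", "Nazi Germany", "Pearl Harbor"]
labels = [0, 1, 0, 1]  # e.g. per-article labels from a k-means run
for i in sorted(set(labels)):
    if i == TARGET_CLUSTER:
        print(i, select_cluster(labels, titles, i))
```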