superman550/canopyClusteringPython

Canopy Clustering using MapReduce in Hadoop

Files Included:

  • Gen.py
  • Stage 1:
    • MapperStg1.py
    • ReducerStg1.py
  • Stage 2:
    • MapperStg2.py
    • ReducerStg2.py

A description of what each file does will be added at a later date.

To reproduce a run:

  1. Run gen.py to generate the dataset in dataPoints.txt.
  2. To get a list of canopy centers, pipe the data through the Stage 1 scripts:
    • "cat dataPoints.txt | ./mapperStg1.py | sort | ./reducerStg1.py"
      • The output is a list of canopy centers, stored in canopyCenters.txt.
        • Output will be in the format "1\tDataPoint"
  3. Pipe the data through the Stage 2 scripts to assign each data point to a canopy center.
    • Output will be in the format "CanopyCenter\tDataPoint"
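
The two stages above can be simulated locally in plain Python, without Hadoop. This is a minimal sketch under several assumptions, not the code in this repository: each line of dataPoints.txt is assumed to hold comma-separated numbers (e.g. "1.5,2.0"), distance is Euclidean, and the thresholds T1 and T2 and all function names are hypothetical. It follows the common canopy scheme: a Stage 1 mapper picks local centers using the tight threshold T2 and emits them under the single key "1" (matching the "1\tDataPoint" format), so a single reducer sees every center and merges nearby ones; Stage 2 then pairs each point with every center within the loose threshold T1.

```python
import math

# Hypothetical thresholds (assumed values): T1 (loose) must be larger than T2 (tight).
T1 = 3.0
T2 = 1.5


def parse_point(line):
    """Parse a line like '1.0,2.0' into a tuple of floats (assumed input format)."""
    return tuple(float(v) for v in line.strip().split(","))


def format_point(point):
    return ",".join(str(v) for v in point)


def distance(a, b):
    """Euclidean distance; the actual scripts may use a different metric."""
    return math.dist(a, b)


def canopy_centers(points, t2=T2):
    """Greedy canopy selection: each chosen center removes every remaining
    candidate within the tight threshold t2 from further consideration."""
    candidates = list(points)
    centers = []
    while candidates:
        center = candidates.pop(0)
        centers.append(center)
        candidates = [p for p in candidates if distance(center, p) > t2]
    return centers


def stage1_mapper(lines):
    """Emit the local canopy centers of one input split under the key '1'."""
    points = [parse_point(line) for line in lines if line.strip()]
    return [f"1\t{format_point(c)}" for c in canopy_centers(points)]


def stage1_reducer(sorted_lines):
    """Every center shares key '1', so one reducer receives all of them and
    reruns canopy selection to merge centers found by different mappers."""
    points = [parse_point(line.split("\t", 1)[1]) for line in sorted_lines]
    return [f"1\t{format_point(c)}" for c in canopy_centers(points)]


def stage2_assign(lines, centers, t1=T1):
    """Emit 'CanopyCenter\\tDataPoint' for every center within the loose
    threshold t1; canopies overlap, so a point may be emitted more than once."""
    out = []
    for line in lines:
        if not line.strip():
            continue
        point = parse_point(line)
        for c in centers:
            if distance(c, point) <= t1:
                out.append(f"{format_point(c)}\t{format_point(point)}")
    return out


if __name__ == "__main__":
    # Small in-memory demo standing in for dataPoints.txt.
    data = ["0.0,0.0", "0.5,0.5", "10.0,10.0", "10.2,9.9"]
    center_lines = stage1_reducer(sorted(stage1_mapper(data)))
    centers = [parse_point(line.split("\t", 1)[1]) for line in center_lines]
    for row in stage2_assign(data, centers):
        print(row)
```

With the demo data, Stage 1 keeps two centers, (0.0, 0.0) and (10.0, 10.0), and Stage 2 pairs each point with the center it lies near. Choosing T2 smaller than T1 is what lets canopies overlap while still keeping centers spread apart.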

Note:

  • If running on Windows cmd, you will need to write your own sort step to sort the mapper output before it reaches the reducer.
  • Personally, I'd recommend using a Linux OS, which makes the whole process much smoother.
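
For Windows users, the sort step can be replaced by a small Python script placed between the mapper and the reducer. This is a hedged sketch: the script name sort_stage.py is hypothetical, and it assumes the mapper writes "key\tvalue" lines. It sorts by the key before the first tab; Python's sort is stable, so lines with the same key keep their mapper order, and identical keys end up adjacent, which is what a streaming reducer relies on.

```python
import sys


def sort_lines(lines):
    """Sort 'key\\tvalue' records by key so identical keys become adjacent.
    sorted() is stable, so ties keep their original relative order."""
    return sorted(lines, key=lambda line: line.split("\t", 1)[0])


def main():
    # Read mapper output from stdin, dropping trailing newlines and blank lines.
    lines = [line.rstrip("\n") for line in sys.stdin if line.strip()]
    for line in sort_lines(lines):
        print(line)


if __name__ == "__main__":
    main()
```

In cmd, the Stage 1 pipeline would then look something like `type dataPoints.txt | python mapperStg1.py | python sort_stage.py | python reducerStg1.py`, invoking each script through `python` rather than `./`.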