Canopy Clustering using MapReduce in Hadoop

Files Included:

gen.py

Stage 1:
mapperStg1.py
reducerStg1.py

Stage 2:
mapperStg2.py
reducerStg2.py

A description of what each file does will be added at a later date.

To replicate running:

Run gen.py to create the data set in dataPoints.txt.
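The contents of gen.py are not described here, so the following is only a rough sketch of what a generator for this kind of data set could look like. The function names, the comma-separated point format, the number of points, the dimensions and the value range are all assumptions made for illustration.

```python
import random


def generate_points(count=100, dims=2, low=0.0, high=10.0, seed=None):
    # Hypothetical helper: returns `count` random points, one per line,
    # as comma-separated coordinates (an assumed format).
    rng = random.Random(seed)
    lines = []
    for _ in range(count):
        coords = [round(rng.uniform(low, high), 3) for _ in range(dims)]
        lines.append(",".join(str(c) for c in coords))
    return lines


def write_points(path, lines):
    # Write one point per line, ending the file with a newline.
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
```

Calling write_points("dataPoints.txt", generate_points()) would then produce a file that the Stage 1 mapper can read from standard input.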

To get a list of Canopy Centers, pipe the data set through the Stage 1 mapper and reducer:

"cat dataPoints.txt | ./mapperStg1.py | sort | ./reducerStg1.py"

The output is a list of Canopy Centers stored in canopyCenters.txt, one per line, in the format "1\tDataPoint".
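For readers unfamiliar with the technique, here is a minimal sketch of how a Stage 1 mapper and reducer could select Canopy Centers. It is not the code in mapperStg1.py or reducerStg1.py. It uses a simplified single-pass variant of canopy selection with one threshold, and the threshold value T2, the comma-separated point format and all function names are assumptions.

```python
import math

T2 = 1.0  # assumed tight threshold; not taken from the repository


def parse_point(text):
    # Assumed input format: comma-separated coordinates, e.g. "1.5,2.0".
    return tuple(float(x) for x in text.strip().split(","))


def format_point(point):
    return ",".join(str(x) for x in point)


def distance(p, q):
    # Euclidean distance between two points of equal dimension.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pick_centers(points, t2=T2):
    # A point becomes a new center only if it lies farther than t2
    # from every center chosen so far.
    centers = []
    for p in points:
        if all(distance(p, c) > t2 for c in centers):
            centers.append(p)
    return centers


def map_stage1(lines, t2=T2):
    # Mapper: pick local centers and emit them under the constant key "1"
    # so that they all reach the same reducer.
    points = [parse_point(line) for line in lines if line.strip()]
    for c in pick_centers(points, t2):
        yield "1\t" + format_point(c)


def reduce_stage1(lines, t2=T2):
    # Reducer: strip the key, then thin the candidate centers once more.
    points = [parse_point(line.split("\t", 1)[1]) for line in lines if line.strip()]
    for c in pick_centers(points, t2):
        yield "1\t" + format_point(c)
```

The constant key "1" matches the "1\tDataPoint" output format: with a single key, every candidate center ends up at one reducer, which can then drop candidates that lie too close to each other.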

Pipe that into the Stage 2 mapper and reducer to assign each data point to a Canopy Center.

The output will be in the format "CanopyCenter\tDataPoint".
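As a rough illustration of the assignment step, the sketch below pairs each data point with its nearest Canopy Center and emits lines in the "CanopyCenter\tDataPoint" format. It is not the code in mapperStg2.py or reducerStg2.py. Choosing the single nearest center, the comma-separated point format and the function names are all assumptions.

```python
import math


def parse_point(text):
    # Assumed input format: comma-separated coordinates, e.g. "1.5,2.0".
    return tuple(float(x) for x in text.strip().split(","))


def format_point(point):
    return ",".join(str(x) for x in point)


def distance(p, q):
    # Euclidean distance between two points of equal dimension.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def load_centers(lines):
    # Parse Stage 1 output lines of the form 1<TAB>DataPoint.
    return [parse_point(line.split("\t", 1)[1]) for line in lines if line.strip()]


def assign(lines, centers):
    # Emit CanopyCenter<TAB>DataPoint for every data point, using the
    # nearest center (an assumption; canopies may overlap in general).
    for line in lines:
        if not line.strip():
            continue
        point = parse_point(line)
        nearest = min(centers, key=lambda c: distance(point, c))
        yield format_point(nearest) + "\t" + line.strip()
```

In a script, load_centers would typically be fed the lines of canopyCenters.txt and assign the lines arriving on standard input.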

Note:

If you are running in the Windows cmd shell, you have to write your own sort function to sort the mapper's output before it reaches the reducer.

Personally, I'd recommend just using a Linux OS to keep things simple.
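If you do stay on Windows, a stand-in for the sort step can be very small. The sketch below is one possible version, not part of this repository. The function names are made up, and it orders lines by plain string comparison, which is enough to group identical keys together.

```python
import sys


def sort_lines(lines):
    # Strip line endings first so a final line without a newline
    # cannot be glued to its neighbour after sorting.
    return sorted(line.rstrip("\r\n") for line in lines)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read everything from the mapper, sort it, and pass it on.
    for line in sort_lines(stdin.readlines()):
        stdout.write(line + "\n")
```

Saved as a hypothetical sortLines.py with a main() call at the bottom, it would take the place of sort between the mapper and the reducer.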
