Welcome to the Clustering section! This folder provides an introduction to clustering techniques, which are essential for unsupervised learning tasks. Clustering helps you group similar data points together, making it easier to identify patterns or categories within your data.
Note: The notebooks here are designed for beginners. They introduce foundational concepts but do not cover all available clustering methods or advanced techniques. For a more comprehensive understanding, please refer to the recommended resources provided below.
This folder currently includes:
- Agglomerative Clustering: A hierarchical clustering technique based on merging clusters.
- DBSCAN: A density-based clustering method, useful for identifying clusters of arbitrary shape.
- K-Means: A popular clustering method that partitions data into a specified number of clusters.
- K-Medoids: A robust clustering algorithm that partitions data into clusters by selecting actual data points (medoids) as cluster centers. It minimizes the sum of dissimilarities between points and their assigned medoid, making it more resilient to outliers than K-Means.
- Spectral Clustering: A technique for clustering data that is not linearly separable, using eigenvalues of a similarity matrix.
- Clustering Quality Evaluation: Metrics for assessing how well a clustering algorithm performs.
- Comparison of Various Clustering Algorithms: Compare several clustering algorithms to determine which one best suits your data.
Each section includes assignments to help reinforce your understanding, along with solutions for self-assessment.
Follow these steps to build a strong foundation in clustering techniques:
- Purpose: This hierarchical clustering method begins by treating each data point as an individual cluster, then successively merges the closest clusters.
- Topics to Cover:
- Basics of hierarchical clustering
- Linkage criteria (single, complete, average)
- Dendrograms for visualizing hierarchical clusters
- Resources:
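As a starting point, the merging behavior and the linkage matrix behind a dendrogram can be sketched with scikit-learn and SciPy. The synthetic blob data and the `average` linkage choice below are illustrative assumptions, not part of any assignment:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Illustrative synthetic data: three well-separated blobs.
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.6, random_state=42)

# Merge clusters bottom-up until 3 remain, using average linkage.
model = AgglomerativeClustering(n_clusters=3, linkage="average")
labels = model.fit_predict(X)

# Linkage matrix for a dendrogram: each of the n-1 rows records one merge
# (the two clusters merged, their distance, and the new cluster's size).
Z = linkage(X, method="average")
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the merge tree, which is the usual way to visualize where to "cut" the hierarchy.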
- Purpose: DBSCAN identifies clusters based on data density, making it useful for detecting clusters with arbitrary shapes.
- Topics to Cover:
- Core points, border points, and noise points
- Selecting parameters like epsilon and minimum samples
- DBSCAN’s advantages for handling noise and non-linear shapes
- Resources:
- DBSCAN (Sklearn Documentation)
- DBSCAN Tutorial (Towards Data Science)
- StatQuest's DBSCAN (video)
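To see density-based clustering on a shape K-Means cannot handle, a minimal sketch on two interleaving half-moons follows; the `eps` and `min_samples` values are assumptions chosen for this scaled synthetic data, not universal defaults:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two interleaving half-moons: clusters with arbitrary (non-convex) shape.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)  # eps below assumes scaled features

db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_  # label -1 marks noise points

# Number of clusters found, excluding noise.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
```

Note that DBSCAN infers the number of clusters from density; only `eps` (neighborhood radius) and `min_samples` (density threshold) are supplied.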
- Purpose: K-Means is a partitioning clustering algorithm that aims to split data into a predefined number of clusters.
- Topics to Cover:
- Centroid calculation and cluster assignment
- Elbow method for choosing the optimal number of clusters
- Limitations of K-Means (e.g., sensitivity to initial centroids)
- Resources:
- K-Means (Sklearn Documentation)
- K-Means Clustering Guide (Kaggle tutorial)
- K-Means Clustering Algorithm (video)
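The elbow method described above can be sketched by tracking inertia (within-cluster sum of squares) over a range of `k`; the blob data and the range `1..7` are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 true clusters (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=7)

# Fit K-Means for k = 1..7 and record inertia; the "elbow" where the
# curve stops dropping sharply (here around k=4) suggests a good k.
inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X)
    inertias.append(km.inertia_)
```

Plotting `inertias` against `k` makes the elbow visible; `n_init=10` reruns the algorithm from multiple random centroids to reduce sensitivity to initialization.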
- Purpose: K-Medoids is a partitioning clustering algorithm that selects actual data points (medoids) as cluster centers, making it more robust to outliers than K-Means.
- Topics to Cover:
- Medoid selection and cluster assignment
- Differences between K-Medoids and K-Means (e.g., robustness to noise and outliers)
- Resources:
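Since K-Medoids is not in core scikit-learn, a minimal from-scratch sketch of the alternating (Voronoi-iteration) variant is shown below; it is a simplification of the full PAM algorithm, and the helper name `k_medoids` and the blob data are assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import make_blobs

def k_medoids(X, k, n_iter=100, seed=0):
    """Minimal k-medoids via Voronoi iteration (a simplified PAM sketch)."""
    rng = np.random.default_rng(seed)
    # Precompute all pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        # Assign each point to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # New medoid of each cluster: the member with the smallest total
        # dissimilarity to all other members (assumes no cluster empties,
        # which holds here since each medoid stays in its own cluster).
        new_medoids = np.array([
            np.where(labels == j)[0][
                np.argmin(D[np.ix_(labels == j, labels == j)].sum(axis=0))
            ]
            for j in range(k)
        ])
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    # Final assignment against the converged medoids.
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.7, random_state=4)
medoids, labels = k_medoids(X, k=3)
```

Because the centers are actual data points, a single extreme outlier cannot drag a center away the way it shifts a K-Means centroid.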
- Purpose: Spectral Clustering is a technique for clustering data that is not linearly separable. It uses the eigenvalues of a similarity matrix to perform dimensionality reduction before clustering in the lower-dimensional space.
- Topics to Cover:
- Affinity matrix and graph Laplacian
- Eigenvalue decomposition and its role in clustering
- Parameter selection (e.g., `n_clusters`, `affinity`, `gamma`)
- Resources:
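A quick sketch of Spectral Clustering on data that defeats linear separation is shown below; the concentric-circles dataset and the `nearest_neighbors` affinity (which builds the similarity graph from k-nearest neighbors) are assumptions chosen for illustration:

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Two concentric circles: not linearly separable.
X, y_true = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

# Build a k-nearest-neighbors affinity graph, embed the data via the
# graph Laplacian's eigenvectors, then cluster in that low-dimensional space.
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0)
labels = sc.fit_predict(X)

# Since this synthetic data has ground truth, ARI checks recovery quality.
ari = adjusted_rand_score(y_true, labels)
```

With an RBF affinity instead, the `gamma` parameter controls how quickly similarity decays with distance and typically needs tuning for ring-shaped data.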
- Purpose: Clustering evaluation metrics help assess the performance of clustering algorithms.
- Topics to Cover:
- Silhouette Score
- Davies-Bouldin Index
- Adjusted Rand Index (ARI)
- Resources:
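The three metrics above can be computed directly from scikit-learn; the K-Means run on synthetic blobs below is an illustrative assumption (any clustering result could be scored the same way):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (adjusted_rand_score, davies_bouldin_score,
                             silhouette_score)

X, y_true = make_blobs(n_samples=300, centers=3, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

sil = silhouette_score(X, labels)          # in [-1, 1]; higher is better
dbi = davies_bouldin_score(X, labels)      # >= 0; lower is better
ari = adjusted_rand_score(y_true, labels)  # needs ground truth; 1.0 is a perfect match
```

Note the split: Silhouette and Davies-Bouldin are internal metrics (no labels needed), while ARI is external and only applies when ground-truth labels exist.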
- Purpose: Compare different clustering algorithms on the same data to see how their results and runtimes differ.
- Topics to Cover:
- Compare K-Means, Hierarchical Clustering, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
- Compare strengths and weaknesses of each.
- Compare the runtime complexity of each.
- Resources:
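One way to structure such a comparison is to run each algorithm on the same dataset and record cluster count, silhouette score, and wall-clock time; the dataset and the DBSCAN parameters below are assumptions for illustration:

```python
import time
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=3, random_state=3)

algorithms = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=3),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),  # eps assumed for this scale
}

results = {}
for name, algo in algorithms.items():
    start = time.perf_counter()
    labels = algo.fit_predict(X)
    elapsed = time.perf_counter() - start
    # Silhouette is undefined for fewer than 2 clusters (a DBSCAN edge case).
    n_found = len(set(labels)) - (1 if -1 in labels else 0)
    score = silhouette_score(X, labels) if n_found > 1 else float("nan")
    results[name] = {"clusters": n_found, "silhouette": score, "seconds": elapsed}
```

Timing a single run is only a rough proxy for the asymptotic costs discussed above (roughly O(n·k·i) per K-Means run versus O(n²) or worse for hierarchical clustering), but it makes the differences concrete on real data sizes.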
Each clustering method comes with assignments designed to help you apply the concepts you've learned. Solutions are provided for self-evaluation. Try to complete the assignments independently before checking the solutions for the best learning experience.
- Begin with Agglomerative Clustering: Start by understanding how hierarchical clustering builds clusters step-by-step.
- Explore DBSCAN: Learn how DBSCAN groups data based on density, making it robust for non-linear data.
- Try K-Means: Experiment with partitioning data into clusters, focusing on selecting the optimal number of clusters.
- Try K-Medoids: Experiment with partitioning data into clusters while minimizing dissimilarity within each cluster. Unlike K-Means, K-Medoids selects actual data points as cluster centers, making it more robust to outliers.
- Dive into Spectral Clustering: Understand how Spectral Clustering handles non-linearly separable data using eigenvalues and similarity matrices.
- Evaluate Performance: Assess the performance of the above-mentioned algorithms and find the most suitable one for your data.
- Explore various clustering algorithms: Experiment with various other clustering algorithms to compare and see which one is the best for your data.
Happy clustering! Developing these skills will enable you to analyze data and identify patterns effectively. For further learning, refer to the documentation and tutorials linked above.