GMM Clustering with Polars

This repository contains Python scripts that demonstrate how to perform Gaussian Mixture Model (GMM) clustering using Polars for data manipulation and scikit-learn for the machine learning component.

Files

gmm_with_polars.py - Comprehensive example with visualization and performance comparison
simple_gmm_polars.py - Simple, focused example of GMM with Polars
requirements.txt - Required Python packages

Setup

1. Install Dependencies

pip install -r requirements.txt

Or install packages individually:

pip install polars numpy matplotlib seaborn scikit-learn pandas

2. Run the Examples

Simple Example

python simple_gmm_polars.py

Comprehensive Example (with plots)

python gmm_with_polars.py

What These Scripts Do

Simple Example (`simple_gmm_polars.py`)

Creates synthetic data with 3 natural clusters
Preprocesses data using Polars operations
Performs GMM clustering with scikit-learn
Analyzes results using Polars aggregations
Saves results to CSV

Comprehensive Example (`gmm_with_polars.py`)

All features of the simple example, plus:
Data visualization with matplotlib
Performance comparison between Polars and Pandas
Detailed clustering analysis and metrics
Model parameter inspection

Key Features

Polars Operations Used

pl.DataFrame() - Creating DataFrames
with_columns() - Adding computed columns
group_by().agg() - Aggregating data
select() - Selecting columns
to_numpy() - Converting to NumPy arrays
write_csv() - Saving results

GMM Features

Automatic assignment of points to clusters (the number of clusters is set with n_components)
Probability estimates for each prediction
Multiple covariance types
Model convergence information
Cluster centers and parameters
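
As a rough illustration of these features, the sketch below fits scikit-learn's GaussianMixture on two synthetic blobs and reads back the probabilities, convergence flags and fitted parameters. The data and the choice of two components are assumptions made for the example; using BIC to compare covariance types is one common approach, not necessarily what the scripts do.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two synthetic 2-D blobs (illustrative values only).
rng = np.random.default_rng(0)
X = np.vstack(
    [
        rng.normal([0.0, 0.0], 0.5, size=(150, 2)),
        rng.normal([5.0, 5.0], 0.5, size=(150, 2)),
    ]
)

# Fit one model per covariance type and record its BIC (lower is better).
bic_by_type = {}
for cov_type in ["full", "tied", "diag", "spherical"]:
    model = GaussianMixture(
        n_components=2, covariance_type=cov_type, random_state=0
    ).fit(X)
    bic_by_type[cov_type] = model.bic(X)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

# Soft assignments: one row per sample, one column per component, rows sum to 1.
proba = gmm.predict_proba(X)

print("converged:", gmm.converged_, "iterations:", gmm.n_iter_)
print("weights:", gmm.weights_)
print("means:\n", gmm.means_)
print("BIC by covariance type:", bic_by_type)
```

Note that n_components is supplied by the caller. To choose it from the data, a similar loop over candidate values with bic() or aic() is a common heuristic.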

Example Output

The scripts will generate:

Console output with clustering analysis
CSV files with results
Visualizations (comprehensive example)
Performance comparisons (comprehensive example)

Customization

You can modify these scripts to:

Use your own data (replace the create_sample_dataset() function)
Change the number of clusters (n_components parameter)
Use different features for clustering
Adjust preprocessing steps
Change visualization styles

Why Polars?

Polars offers several advantages for data preprocessing in ML workflows:

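Speed: Faster than Pandas for many operations, helped by a multithreaded Rust engine
Memory efficiency: Columnar memory layout based on Apache Arrow
Lazy evaluation: Queries built with the lazy API are optimized before they run
Modern API: Clean, consistent expression syntax
Type safety: Every column has a single, strictly enforced data type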

This makes it an excellent choice for data preparation before applying machine learning algorithms like GMM.
