
Can KAN Grok?

Yes! And KAN groks multiplication 12x faster than MLP.

Python 3.9+ | PyTorch 2.0+ | MIT License

Empirical investigation of grokking in Kolmogorov-Arnold Networks

(Figure: KAN vs MLP grokking results)

Key Results

We ran 40 experiments (2 models × 4 operations × 5 seeds) on modular arithmetic tasks with p = 113. All 40 runs successfully grokked (100% success rate).

| Operation      | MLP (epochs)  | KAN (epochs) | KAN Speedup  |
|----------------|---------------|--------------|--------------|
| Addition       | 1,360 ± 1,800 | 693 ± 54     | 2.0× faster  |
| Subtraction    | 550 ± 192     | 708 ± 31     | 0.8×         |
| Multiplication | 8,558 ± 1,919 | 708 ± 48     | 12.1× faster |
| Division       | 615 ± 186     | 692 ± 47     | 0.9×         |
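
The speedup column is just the ratio of mean grokking epochs. A quick check with the table means, in plain Python:

```python
# Mean grokking epochs from the table above, as (MLP, KAN) per operation.
mean_epochs = {
    "addition":       (1360, 693),
    "subtraction":    (550, 708),
    "multiplication": (8558, 708),
    "division":       (615, 692),
}

for op, (mlp, kan) in mean_epochs.items():
    speedup = mlp / kan  # >1 means KAN grokked faster than MLP
    print(f"{op}: {speedup:.1f}x")
# multiplication: 8558 / 708 ≈ 12.1x
```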

Key Findings

  1. KAN groks multiplication 12x faster than MLP - the most striking result
  2. KAN shows remarkably consistent grokking times - low variance across seeds
  3. MLP has high variance - one addition seed took 4,554 epochs vs 321 for another
  4. For simple operations, MLP and KAN are comparable - subtraction and division show similar speeds

Research Questions & Answers

| Question                | Answer                                                        |
|-------------------------|---------------------------------------------------------------|
| Does KAN grok?          | Yes, 100% success rate (20/20 KAN experiments)                |
| Does KAN grok faster?   | Yes, significantly for multiplication (12×) and addition (2×) |
| Is KAN more consistent? | Yes, lower variance in grokking times across seeds            |

Background

Grokking is a phenomenon in which neural networks suddenly generalize long after memorizing the training data. It was first reported by Power et al. (2022) and mechanistically analyzed by Nanda et al. (2023).

Kolmogorov-Arnold Networks (KAN) replace fixed activation functions with learnable B-spline functions on edges, as introduced by Liu et al. (2024).

This project empirically tests whether KAN's architectural differences affect grokking behavior.


Installation

# Clone the repository
git clone https://github.com/stchakwdev/can-kan-grok.git
cd can-kan-grok

# Create environment
conda create -n grok python=3.10
conda activate grok

# Install PyTorch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install efficient-kan (required for KAN models)
pip install git+https://github.com/Blealtan/efficient-kan.git

# Install package
pip install -e .

Quick Start

# Run a single experiment
python scripts/run_experiment.py --model kan --operation multiplication --seed 42

# Run full comparison (40 experiments)
python scripts/run_experiment.py --model all --operation all --seeds 5

# Analyze results and generate figures
python scripts/analyze_modular_results.py

Experimental Setup

Task: Modular Arithmetic

Following Nanda et al. (2023):

  • Input: (a, b) pairs where a, b ∈ {0, 1, ..., p-1}
  • Output: operation(a, b) mod p
  • Prime: p = 113 (12,769 total pairs)
  • Split: 50% train, 50% validation
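
The task setup above can be sketched in a few lines of plain Python. The function name and split logic here are illustrative, not the repo's exact API:

```python
import itertools
import random

def make_modular_dataset(p=113, op=lambda a, b: a * b, train_frac=0.5, seed=0):
    """Enumerate all (a, b) pairs mod p and split them 50/50 (sketch)."""
    pairs = [(a, b, op(a, b) % p) for a, b in itertools.product(range(p), repeat=2)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_train = int(len(pairs) * train_frac)
    return pairs[:n_train], pairs[n_train:]

train, val = make_modular_dataset()
print(len(train) + len(val))  # 113 * 113 = 12,769 total pairs
```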

Models

| Model | Architecture                                            | Parameters |
|-------|---------------------------------------------------------|------------|
| MLP   | Embed → Concat → Linear(128) → ReLU → Linear → Unembed  | ~78K       |
| KAN   | Embed → Concat → KANLinear(64) → KANLinear → Unembed    | ~75K       |
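
A minimal PyTorch sketch of the MLP column, not the repo's actual implementation (which lives in can_kan_grok/models/). The embedding width of 128 is an assumption; it happens to land at the quoted ~78K parameters:

```python
import torch
import torch.nn as nn

class MLPGrok(nn.Module):
    """Sketch: Embed -> Concat -> Linear(128) -> ReLU -> Linear -> Unembed."""
    def __init__(self, p=113, d=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.net = nn.Sequential(
            nn.Linear(2 * d, hidden),  # concat of both operand embeddings
            nn.ReLU(),
            nn.Linear(hidden, d),
            nn.Linear(d, p),           # unembed to p output classes
        )

    def forward(self, a, b):
        x = torch.cat([self.embed(a), self.embed(b)], dim=-1)
        return self.net(x)

model = MLPGrok()
print(sum(p.numel() for p in model.parameters()))  # 78,449 (~78K)
```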

Training Configuration

Critical hyperparameters for grokking:

  • Weight Decay: 1.0 (essential for grokking)
  • Learning Rate: 1e-3
  • Optimizer: AdamW
  • Batch Size: Full batch
  • Max Epochs: 100,000
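
The configuration above translates into a full-batch AdamW loop along these lines. This is a sketch, not the repo's PyTorch Lightning code; `model` is any module taking the two operand tensors:

```python
import torch

def train_full_batch(model, a, b, targets, max_epochs=100_000):
    """Full-batch training with the grokking-critical hyperparameters:
    AdamW, lr=1e-3, and the unusually large weight decay of 1.0."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(max_epochs):
        opt.zero_grad()
        loss = loss_fn(model(a, b), targets)  # one step = whole training set
        loss.backward()
        opt.step()
    return model
```

Full-batch updates are feasible here because the entire training set is only ~6,400 pairs.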

Grokking Detection

The project implements smart stopping based on phase detection:

Phase 1: Random           → Both accuracies low
Phase 2: Memorization     → Train > 99%, Val < 50%
Phase 3: Circuit Formation → Val climbing (50-80%)
Phase 4: Cleanup          → Val approaching threshold (80-95%)
Phase 5: Grokked          → Val > 95% ✓
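
The phase boundaries above reduce to a simple classifier over (train, val) accuracy. Thresholds are taken from the diagram; the function name is illustrative:

```python
def grokking_phase(train_acc, val_acc):
    """Map (train, val) accuracy to the five phases listed above."""
    if val_acc > 0.95:
        return "grokked"            # Phase 5
    if train_acc > 0.99:
        if val_acc < 0.50:
            return "memorization"   # Phase 2
        if val_acc < 0.80:
            return "circuit_formation"  # Phase 3
        return "cleanup"            # Phase 4
    return "random"                 # Phase 1

print(grokking_phase(1.00, 0.30))  # memorization
print(grokking_phase(1.00, 0.97))  # grokked
```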

Project Structure

can-kan-grok/
├── can_kan_grok/
│   ├── configs/           # Configuration dataclasses
│   ├── models/            # MLP and KAN implementations
│   ├── data/              # Modular arithmetic datasets
│   ├── training/          # PyTorch Lightning infrastructure
│   ├── detection/         # Grokking detection algorithms
│   └── visualization/     # Plotting utilities
├── scripts/
│   ├── run_experiment.py          # Main experiment runner
│   └── analyze_modular_results.py # Results analysis
├── results/
│   └── modular/
│       └── figures/       # Generated visualizations
└── tests/                 # Unit tests

Visualizations

All figures are generated in results/modular/figures/:

| Figure                               | Description                      |
|--------------------------------------|----------------------------------|
| publication_figure.png               | 4-panel publication-ready figure |
| kan_speedup_factor.png               | KAN speedup over MLP             |
| grokking_boxplots.png                | Distribution by operation        |
| multiplication_curves_comparison.png | Training curves                  |
| grokking_heatmap.png                 | Summary heatmap                  |

Citation

@misc{can_kan_grok_2025,
    title={Can KAN Grok? Empirical Investigation of Grokking in Kolmogorov-Arnold Networks},
    author={Samuel T. Chakwera},
    year={2025},
    url={https://github.com/stchakwdev/can-kan-grok},
}

References

  • Power, A., Burda, Y., Edwards, H., Babuschkin, I., & Misra, V. (2022). Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. arXiv:2201.02177.
  • Nanda, N., Chan, L., Lieberum, T., Smith, J., & Steinhardt, J. (2023). Progress Measures for Grokking via Mechanistic Interpretability. ICLR 2023. arXiv:2301.05217.
  • Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T. Y., & Tegmark, M. (2024). KAN: Kolmogorov-Arnold Networks. arXiv:2404.19756.

License

MIT License - see LICENSE for details.
