We ran 40 experiments (2 models x 4 operations x 5 seeds) on modular arithmetic tasks with p=113. All experiments successfully grokked (100% success rate).
| Operation | MLP (epochs) | KAN (epochs) | KAN Speedup |
|---|---|---|---|
| Addition | 1,360 ± 1,800 | 693 ± 54 | 2.0x faster |
| Subtraction | 550 ± 192 | 708 ± 31 | 0.8x |
| Multiplication | 8,558 ± 1,919 | 708 ± 48 | 12.1x faster |
| Division | 615 ± 186 | 692 ± 47 | 0.9x |
- KAN groks multiplication 12x faster than MLP - the most striking result
- KAN shows remarkably consistent grokking times - low variance across seeds
- MLP has high variance - one addition seed took 4,554 epochs vs 321 for another
- For simple operations, MLP and KAN are comparable - subtraction and division show similar speeds
| Question | Answer |
|---|---|
| Does KAN grok? | Yes, 100% success rate (20/20 KAN experiments) |
| Does KAN grok faster? | Yes, significantly for multiplication (12x) and addition (2x) |
| Is KAN more consistent? | Yes, lower variance in grokking times across seeds |
Grokking is a phenomenon in which neural networks suddenly generalize long after memorizing the training data. It was first reported by Power et al. (2022) and mechanistically analyzed by Nanda et al. (2023).
Kolmogorov-Arnold Networks (KAN) replace fixed activation functions with learnable B-spline functions on edges, as introduced by Liu et al. (2024).
This project empirically tests whether KAN's architectural differences affect grokking behavior.
```bash
# Clone the repository
git clone https://github.com/stchakwdev/can-kan-grok.git
cd can-kan-grok

# Create environment
conda create -n grok python=3.10
conda activate grok

# Install PyTorch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install efficient-kan (required for KAN models)
pip install git+https://github.com/Blealtan/efficient-kan.git

# Install package
pip install -e .
```

```bash
# Run a single experiment
python scripts/run_experiment.py --model kan --operation multiplication --seed 42

# Run full comparison (40 experiments)
python scripts/run_experiment.py --model all --operation all --seeds 5

# Analyze results and generate figures
python scripts/analyze_modular_results.py
```

Following Nanda et al. (2023):
- Input: (a, b) pairs where a, b ∈ {0, 1, ..., p-1}
- Output: operation(a, b) mod p
- Prime: p = 113 (12,769 total pairs)
- Split: 50% train, 50% validation
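The task construction above can be sketched in a few lines of plain Python. This is an illustrative helper, not the repo's actual data module; modular division uses the modular inverse via Fermat's little theorem, which is valid because p is prime:

```python
p = 113

def mod_div(a, b, p=p):
    # a / b mod p = a * b^(p-2) mod p (Fermat's little theorem; requires b != 0)
    return (a * pow(b, p - 2, p)) % p

def make_dataset(op, p=p):
    """Enumerate all (a, b) pairs with labels op(a, b) reduced mod p."""
    pairs = [(a, b) for a in range(p) for b in range(p)]
    labels = [op(a, b) % p for a, b in pairs]
    return pairs, labels

pairs, labels = make_dataset(lambda a, b: a * b)  # multiplication task
assert len(pairs) == 12_769  # 113^2 total pairs; split 50/50 train/validation
```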
| Model | Architecture | Parameters |
|---|---|---|
| MLP | Embed → Concat → Linear(128) → ReLU → Linear → Unembed | ~78K |
| KAN | Embed → Concat → KANLinear(64) → KANLinear → Unembed | ~75K |
Critical hyperparameters for grokking:
- Weight Decay: 1.0 (essential for grokking)
- Learning Rate: 1e-3
- Optimizer: AdamW
- Batch Size: Full batch
- Max Epochs: 100,000
The project implements smart stopping based on phase detection:
- Phase 1: Random → both accuracies low
- Phase 2: Memorization → train > 99%, val < 50%
- Phase 3: Circuit Formation → val climbing (50–80%)
- Phase 4: Cleanup → val approaching threshold (80–95%)
- Phase 5: Grokked → val > 95% ✓
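A minimal sketch of how a phase detector can map (train, val) accuracy to the phases above. The thresholds come from the list; the function name and exact logic are illustrative, not the repo's implementation:

```python
def grokking_phase(train_acc, val_acc):
    """Classify a training step into phases 1-5 (illustrative thresholds)."""
    if val_acc > 0.95:
        return 5  # Grokked -> stop early
    if val_acc > 0.80:
        return 4  # Cleanup
    if val_acc > 0.50:
        return 3  # Circuit formation
    if train_acc > 0.99:
        return 2  # Memorization
    return 1  # Random

assert grokking_phase(1.00, 0.30) == 2  # memorized, not yet generalizing
assert grokking_phase(1.00, 0.97) == 5  # grokked
```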
```
can-kan-grok/
├── can_kan_grok/
│   ├── configs/          # Configuration dataclasses
│   ├── models/           # MLP and KAN implementations
│   ├── data/             # Modular arithmetic datasets
│   ├── training/         # PyTorch Lightning infrastructure
│   ├── detection/        # Grokking detection algorithms
│   └── visualization/    # Plotting utilities
├── scripts/
│   ├── run_experiment.py             # Main experiment runner
│   └── analyze_modular_results.py    # Results analysis
├── results/
│   └── modular/
│       └── figures/      # Generated visualizations
└── tests/                # Unit tests
```
All figures are generated in results/modular/figures/:
| Figure | Description |
|---|---|
| publication_figure.png | 4-panel publication-ready figure |
| kan_speedup_factor.png | KAN speedup over MLP |
| grokking_boxplots.png | Grokking-time distribution by operation |
| multiplication_curves_comparison.png | Training curves (multiplication) |
| grokking_heatmap.png | Summary heatmap |
```bibtex
@misc{can_kan_grok_2025,
  title={Can KAN Grok? Empirical Investigation of Grokking in Kolmogorov-Arnold Networks},
  author={Samuel T. Chakwera},
  year={2025},
  url={https://github.com/stchakwdev/can-kan-grok},
}
```

- Nanda et al. (2023) - Progress measures for grokking via mechanistic interpretability
- Power et al. (2022) - Grokking: Generalization beyond overfitting
- Liu et al. (2024) - KAN: Kolmogorov-Arnold Networks
- Park et al. (2024) - Acceleration of Grokking via Kolmogorov-Arnold Representation
MIT License - see LICENSE for details.
