ClusterFuG: Clustering Fully connected Graphs by Multicut (arxiv)
Solve multicut problem on complete graph alleviating need for graph structure design. Similarity between a pair of nodes is computed by inner product of input node features.
We use GCC 10. Other combinations might also work but not tested. CMake is required for compilation.
git clone [email protected]:aabbas90/cluster-fug.git
cd cluster-fug
git submodule update --init --recursive
mkdir build
cd build
cmake ..
make -j 4We also provide python bindings using pybind. Simply run the following command:
python -m pip install git+https://github.com/aabbas90/cluster-fug.gitWe require multicut instance stored in a (.txt) file in the following format:
FACTORIZED COMPLETE MULTICUT
a
n d
feature_node_1_dim_1 ... feature_node_1_dim_d
...
feature_node_n_dim_1 ... feature_node_n_dim_d
which corresponds to a graph with n nodes, and d-dimensional node features. The value of a controls affinity strength where increasing the value helps in creating more clusters and vice versa.
Afterwards run (for our DAppLAEC algorithm):
./src/dense_multicut_text_input <PATH_TO_PROBLEM_INSTANCE_TXT> HNSW laec_bf_laterOther algorithms mentioned in the paper can be run as:
./src/dense_multicut_text_input <PATH_TO_PROBLEM_INSTANCE_TXT> <INDEX_TYPE> <CONTRACTION_TYPE>where INDEX_TYPE and CONTRACTION_TYPE can be choosen as:
| Algorithm | INDEX_TYPE | CONTRACTION_TYPE |
|---|---|---|
| GAEC | brute_force | gaec |
| DGAEC | faiss_brute_force | dense_gaec |
| DGAECInc | faiss_brute_force | dense_gaec_inc |
| DLAEC | faiss_brute_force | dense_laec |
| DAppLAEC | HNSW | dense_laec_bf_later |
For example to run GAEC:
bash ./src/dense_multicut_text_input <PATH_TO_PROBLEM_INSTANCE_TXT> brute_force gaec
For more information run:
./src/dense_multicut_text_input --help
```
### Python solver:
An example to compute multicut on a set of features:
```python
import dense_multicut_py
import numpy as np
num_nodes = 100
dim = 16
affinity_strength = 5.0
k = 5
k_cap = k
# DAppLAEC:
INDEX_TYPE, CONTRACTION_TYPE = "HNSW", "dense_laec_bf_later"
features = np.random.rand(num_nodes, dim).astype(np.float32)
node_labels = dense_multicut_py.dense_multicut(features.flatten(), num_nodes, dim, affinity_strength, k, INDEX_TYPE, CONTRACTION_TYPE, k_cap)Instances used in the paper can be obtained from structured-prediction-prob-archive.
If you use this work please cite as
@article{abbas2023clusterfug,
title={ClusterFuG: Clustering Fully connected Graphs by Multicut},
author={Abbas, Ahmed and Swoboda, Paul},
journal={arXiv preprint arXiv:2301.12159},
year={2023}
}