RMB-CLE: Robust Multi-Task Boosting using Clustering and Local Ensembling

This repository contains the official Python implementation of RMB-CLE, a robust multi-task learning framework that mitigates negative transfer through error-driven task clustering and cluster-wise local ensembling.

Overview

RMB-CLE addresses task heterogeneity in multi-task learning by:

Estimating task similarity using cross-task generalization errors
Discovering latent task clusters via hierarchical clustering
Training cluster-specific local ensembles to enable selective knowledge sharing
Preventing negative transfer across unrelated tasks

The framework is model-agnostic and is instantiated in this repository using LightGBM and MTGB-based boosting blocks.
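The first two steps listed above (cross-task errors, then hierarchical clustering) can be illustrated with a small sketch. This is not the repository's implementation: plain least-squares models stand in for the boosting blocks, and the helper names `cross_task_error_matrix` and `cluster_tasks`, the symmetrisation of the error matrix, and the choice of average linkage are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cross_task_error_matrix(X, y, task_ids):
    """E[i, j] = MSE of a linear model fit on task i and evaluated on task j."""
    tasks = np.unique(task_ids)
    designs, targets, coefs = [], [], []
    for t in tasks:
        mask = task_ids == t
        A = np.c_[X[mask], np.ones(mask.sum())]  # features plus intercept column
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        designs.append(A)
        targets.append(y[mask])
        coefs.append(coef)
    E = np.zeros((len(tasks), len(tasks)))
    for i, coef in enumerate(coefs):
        for j in range(len(tasks)):
            E[i, j] = np.mean((designs[j] @ coef - targets[j]) ** 2)
    return tasks, E


def cluster_tasks(E, n_clusters):
    """Group tasks by agglomerative clustering on the symmetrised error matrix."""
    D = 0.5 * (E + E.T)        # assumed way of turning errors into distances
    np.fill_diagonal(D, 0.0)   # a task is at distance zero from itself
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Synthetic demo: tasks 0 and 1 share one weight vector, tasks 2 and 3 another.
rng = np.random.default_rng(0)
weights = {0: [1.0, 2.0, -1.0], 1: [1.0, 2.0, -1.0],
           2: [-2.0, 0.5, 3.0], 3: [-2.0, 0.5, 3.0]}
X_parts, y_parts, t_parts = [], [], []
for task, w in weights.items():
    Xt = rng.normal(size=(50, 3))
    X_parts.append(Xt)
    y_parts.append(Xt @ np.array(w) + 0.1 * rng.normal(size=50))
    t_parts.append(np.full(50, task))

X_demo = np.vstack(X_parts)
y_demo = np.concatenate(y_parts)
task_demo = np.concatenate(t_parts)

tasks, E = cross_task_error_matrix(X_demo, y_demo, task_demo)
labels = cluster_tasks(E, n_clusters=2)
print(labels)  # tasks 0 and 1 share one label, tasks 2 and 3 the other
```

In this toy setting, a model trained on one task transfers well only to the task generated from the same weights, so the off-diagonal errors separate the two groups cleanly. A cluster-specific ensemble would then be trained on each group.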

License

The package is licensed under the GNU Lesser General Public License v2.1.

Installation

Clone the repository and install dependencies:

git clone https://github.com/your-repo/rmb-cle.git
cd rmb-cle

python -m venv venv
source venv/bin/activate 

pip install -r requirements.txt

How to Run

To generate toy datasets:

python gen_data.py

This creates datasets in:

toy_datasets/{num_clusters}clusters/batch{batch}/
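As a small convenience, the folder pattern above can be built programmatically. The helper name `dataset_dir` and the example values are hypothetical; which values of `num_clusters` and `batch` actually exist depends on how `gen_data.py` is configured.

```python
from pathlib import Path


def dataset_dir(num_clusters: int, batch: int, root: str = "toy_datasets") -> Path:
    """Build the folder path following the documented naming pattern."""
    return Path(root) / f"{num_clusters}clusters" / f"batch{batch}"


# Hypothetical example values; use the ones produced by your gen_data.py run.
example_dir = dataset_dir(num_clusters=3, batch=1)
print(example_dir.as_posix())
```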

Load Data

Synthetic Data

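import pandas as pd

# Load a synthetic training split; adjust the path to your dataset folder
train_df = pd.read_csv("train_reg.csv")

# Separate the feature columns from the regression target
X = train_df.drop(columns=["Target"]).values
y = train_df["Target"].values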

Real Data

from read_real_data import ReadData

X_train, y_train, X_test, y_test = ReadData(
    dataset="adult_gender",
    random_state=42
)

Train RMB-CLE

from rmb_cle import RMB_CLE
from sklearn.ensemble import HistGradientBoostingRegressor

model = RMB_CLE(
    residual_model_cls=HistGradientBoostingRegressor,
    task_model_cls=HistGradientBoostingRegressor,
    residual_model_as_cls=True,
    n_iter_1st=50,
    n_iter_3rd=50,
    max_iter=100,
    learning_rate=0.1,
    regression=True,
    n_clusters="auto",
    random_state=42
)

model.fit(X, y)
predictions = model.predict(X)
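Because the goal is to avoid negative transfer, it can help to inspect errors per task rather than in aggregate. The sketch below is not part of the package: `per_task_rmse` is a hypothetical helper that relies only on NumPy, and it assumes a regression setting.

```python
import numpy as np


def per_task_rmse(y_true, y_pred, task_ids):
    """Return a dict mapping each task identifier to its RMSE."""
    y_true, y_pred, task_ids = map(np.asarray, (y_true, y_pred, task_ids))
    scores = {}
    for task in np.unique(task_ids):
        mask = task_ids == task
        squared_errors = (y_true[mask] - y_pred[mask]) ** 2
        scores[task.item()] = float(np.sqrt(np.mean(squared_errors)))
    return scores


# Tiny illustrative input: task 0 is predicted perfectly, task 1 is not.
demo_scores = per_task_rmse(
    y_true=[1.0, 2.0, 3.0, 4.0],
    y_pred=[1.0, 2.0, 5.0, 4.0],
    task_ids=[0, 0, 1, 1],
)
print(demo_scores)
```

With the model above, this could be called as `per_task_rmse(y, predictions, X[:, -1])`, since the task identifier is the last column of `X`. Scores computed on held-out data are more informative than scores on the training set.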

Input Format (Important)

[Feature1, Feature2, ..., FeatureD, TaskID]

The last column of the feature matrix must be the task identifier.
Task identifiers are remapped internally to consecutive indices.
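The layout above can be assembled with NumPy as follows. The array values are made up for illustration, and the remapping shown with `np.unique` is only one possible way to obtain consecutive indices; it is not necessarily how RMB-CLE does it internally.

```python
import numpy as np

# Two features per sample, with arbitrary (non-consecutive) task identifiers.
features = np.array([[0.1, 1.0],
                     [0.4, 0.9],
                     [0.3, 0.2],
                     [0.8, 0.5]])
raw_task_ids = np.array([10, 42, 10, 7])

# Append the task identifier as the last column: [Feature1, Feature2, TaskID].
X = np.column_stack([features, raw_task_ids])

# Illustrative remapping of the raw identifiers to 0..T-1.
unique_ids, task_index = np.unique(X[:, -1], return_inverse=True)
print(X.shape, unique_ids, task_index)
```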

Citations

If you use RMB-CLE in your research or work, please consider citing this project using the following citation format.
