This repository contains the official Python implementation of RMB-CLE, a robust multi-task learning framework that mitigates negative transfer through error-driven task clustering and cluster-wise local ensembling.
RMB-CLE addresses task heterogeneity in multi-task learning by:
- Estimating task similarity using cross-task generalization errors
- Discovering latent task clusters via hierarchical clustering
- Training cluster-specific local ensembles to enable selective knowledge sharing
- Preventing negative transfer across unrelated tasks
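The error-driven clustering idea above can be sketched as a toy example. This is an illustrative sketch, not the library's implementation: the Ridge base learner, the symmetrized error matrix, and the average-linkage choice are all assumptions made for the demo.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_tasks, n, d = 4, 200, 5
# Two latent groups: tasks 0-1 share one weight vector, tasks 2-3 another.
group_w = [rng.normal(size=d), rng.normal(size=d)]
tasks = []
for t in range(n_tasks):
    X = rng.normal(size=(n, d))
    y = X @ group_w[t // 2] + 0.1 * rng.normal(size=n)
    tasks.append((X, y))

# Cross-task generalization error matrix E[i, j]:
# error of a model trained on task i, evaluated on task j.
E = np.zeros((n_tasks, n_tasks))
for i, (Xi, yi) in enumerate(tasks):
    m = Ridge().fit(Xi, yi)
    for j, (Xj, yj) in enumerate(tasks):
        E[i, j] = np.mean((m.predict(Xj) - yj) ** 2)

# Symmetrize into a dissimilarity matrix and cluster hierarchically.
D = (E + E.T) / 2
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels.tolist())  # tasks 0-1 and 2-3 land in different clusters
```

Cross-group errors dwarf in-group errors here, so the dendrogram cut recovers the two latent task groups.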
The framework is model-agnostic; this repository instantiates it with LightGBM and MTGB-based boosting blocks.
The package is licensed under the GNU Lesser General Public License v2.1.
Clone the repository and install dependencies:
```
git clone https://github.com/your-repo/rmb-cle.git
cd rmb-cle
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

To generate toy datasets:

```
python gen_data.py
```

This creates datasets under:

```
toy_datasets/{num_clusters}clusters/batch{batch}/
```

### Synthetic Data
```python
import pandas as pd

train_df = pd.read_csv("train_reg.csv")
X = train_df.drop(columns=["Target"]).values
y = train_df["Target"].values
```

### Real Data
```python
from read_real_data import ReadData

X_train, y_train, X_test, y_test = ReadData(
    dataset="adult_gender",
    random_state=42,
)
```

Fit the model and generate predictions:

```python
from rmb_cle import RMB_CLE
from sklearn.ensemble import HistGradientBoostingRegressor

model = RMB_CLE(
    residual_model_cls=HistGradientBoostingRegressor,
    task_model_cls=HistGradientBoostingRegressor,
    residual_model_as_cls=True,
    n_iter_1st=50,
    n_iter_3rd=50,
    max_iter=100,
    learning_rate=0.1,
    regression=True,
    n_clusters="auto",
    random_state=42,
)
model.fit(X, y)
predictions = model.predict(X)
```

## Input Format (Important)

The input feature matrix must follow the layout:

```
[Feature1, Feature2, ..., FeatureD, TaskID]
```

- The last column must be the task identifier.
- Task IDs are internally remapped to consecutive indices.
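A minimal sketch of building such an input matrix, assuming arbitrary (non-consecutive) task IDs appended as the final column; the specific IDs and shapes here are illustrative only.

```python
import numpy as np

# Hypothetical illustration of the expected layout: D feature columns
# followed by a TaskID column. The IDs (10, 42, 7 here) need not be
# consecutive, since they are remapped to consecutive indices internally.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 3))                 # Feature1..Feature3
task_ids = np.array([10, 10, 42, 42, 7, 7], dtype=float)
X = np.column_stack([features, task_ids])          # [features | TaskID]
print(X.shape)  # (6, 4)
```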
If you use RMB-CLE in your research or work, please consider citing this project using the following citation format.
- Version: 0.0.1
- Released: 14 Jan 2026