This repository contains the code and resources for a study of atomically precise 13-atom icosahedral nanoclusters built from Group IV hosts (Ti, Zr, Hf) doped with a single 3d, 4d, or 5d transition metal. The project uses an FT-Transformer (Feature Tokenizer Transformer) model for property prediction, enhanced with uncertainty quantification via conformal prediction and explainability via SHAP.
- FT-Transformer Model: A deep learning model designed for tabular data, using feature tokenization and transformer blocks.
- Conformal Prediction: Provides statistically valid uncertainty intervals for predictions, including Mondrian conformal prediction for grouped data.
- Explainable AI (XAI):
- SHAP Analysis: Feature importance and contribution analysis.
- Embedding Analysis: KNN-based search and visualization of model representations.
- Out-of-Distribution (OOD) Detection: Analysis of model performance on data outside the training distribution.
- Automated Pipeline: End-to-end pipeline for training, fine-tuning, and generating scientific reports.
- Hyperparameter Optimization: Integrated with Optuna for efficient hyperparameter search.
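As a rough illustration of the feature-tokenization idea behind the FT-Transformer, the sketch below shows how each scalar tabular feature becomes a learned embedding token before being fed to transformer blocks. This is a minimal NumPy sketch, not the repository's actual implementation; all names and shapes here are illustrative.

```python
import numpy as np

def tokenize_features(x, W, b):
    """Map each scalar feature to its own learned embedding token.

    x: (n_samples, n_features) numeric features
    W, b: (n_features, d_embed) per-feature weight and bias
    Returns an (n_samples, n_features, d_embed) token sequence
    that a stack of transformer blocks can then attend over.
    """
    return x[:, :, None] * W[None, :, :] + b[None, :, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))          # 4 clusters, 6 tabular features
W = rng.normal(size=(6, 8))          # 8-dimensional feature embeddings
b = rng.normal(size=(6, 8))
tokens = tokenize_features(x, W, b)  # shape (4, 6, 8)
```

The key design point is that every feature gets its own weight vector, so the transformer can treat a row of tabular data as a short sequence of feature tokens.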
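To make the conformal prediction feature concrete, here is a minimal sketch of split conformal intervals and their Mondrian (per-group) variant on held-out calibration residuals. This is an illustrative NumPy implementation of the general technique, not the code used in `engine/analysis/`; it assumes every test group also appears in the calibration set.

```python
import numpy as np

def split_conformal_radius(residuals, alpha=0.1):
    """Finite-sample-corrected quantile of absolute calibration residuals."""
    n = len(residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(np.abs(residuals), level)

def mondrian_intervals(y_pred, groups, cal_resid, cal_groups, alpha=0.1):
    """Mondrian conformal intervals: calibrate the radius separately per group."""
    lo = np.empty_like(y_pred, dtype=float)
    hi = np.empty_like(y_pred, dtype=float)
    for g in np.unique(groups):
        q = split_conformal_radius(cal_resid[cal_groups == g], alpha)
        mask = groups == g
        lo[mask], hi[mask] = y_pred[mask] - q, y_pred[mask] + q
    return lo, hi
```

With `alpha=0.1`, each group's intervals target roughly 90% marginal coverage, which is useful when residual scales differ across host metals or dopant series.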
The codebase is organized as follows:
- `engine/`: Core Python package containing the application logic.
  - `analysis/`: Modules for conformal prediction, OOD analysis, residuals, and scientific reporting.
  - `config/`: Configuration settings and model parameters.
  - `data/`: Data loading, processing, and feature engineering.
  - `modeling/`: Model definitions (FTTransformer, contrastive learning).
  - `pipeline/`: Pipeline orchestration (`pipeline_launcher.py`, training loops).
  - `xai/`: Explainability tools (SHAP, embeddings).
  - `ohp_search/`: Optuna hyperparameter search logic.
- `scripts/`: Standalone scripts for specific tasks (plotting, advanced analysis, utilities).
- `full_model/`: Trained model artifacts, weights, and hyperparameter configurations.
- `processed_data/`: Directory for storing enriched and processed datasets.
Ensure you have a Python environment set up (Python 3.8+ recommended). Key dependencies include:
- tensorflow
- numpy
- pandas
- scikit-learn
- optuna
- shap
- dask
- matplotlib
- seaborn
- sqlalchemy
The main entry point for training and analysis is the pipeline launcher. You can run it using the following command:
```bash
python -m engine.pipeline.pipeline_launcher --hparams_json full_model/hparams_final.json
```

Common Arguments:

- `--hparams_json`: Path to the hyperparameters JSON file.
- `--base_model_dir`: Path to an existing base model directory (to skip full training).
- `--skip`: Comma-separated list of stages to skip (e.g., `shap_analysis,ood_analysis`).
- `--enable`: Comma-separated list of optional stages to enable (e.g., `post_cp_ood,embedding_analysis`).
- `--run_dir`: Directory to store run artifacts.
Run full pipeline:
```bash
python -m engine.pipeline.pipeline_launcher --hparams_json full_model/hparams_final.json
```

Run only fine-tuning and analysis (using an existing base model):

```bash
python -m engine.pipeline.pipeline_launcher --base_model_dir full_model --skip shap_analysis
```

Run with advanced XAI features:

```bash
python -m engine.pipeline.pipeline_launcher --hparams_json full_model/hparams_final.json --enable embedding_analysis,consolidated_report
```

The `scripts/` directory contains utilities for various tasks. For example, to visualize model weights:
```bash
python scripts/inspect_model_weights.py --model_dir full_model
```

To run pretraining:

```bash
python scripts/pretrain_contrastive_multitask.py ...
```

Global settings and paths are defined in `engine/config/config.py` and `engine/config/models.py`. You can adjust data paths and model defaults there.