
galaxyHackers

The project aims to find and analyse galaxy clusters using AI in the microwave and IR ranges.

Installing Dependencies (UNIX)

We assume Python 3.10+ is installed. Dependencies can be managed via Poetry or pip.

Set Up Virtual Environment

  1. Navigate to the project directory:
    cd galaxyHackers

Installing Dependencies

Option 1: using Poetry 2.0.0+

  1. Enter the Poetry environment:

    poetry env activate
  2. Install the project dependencies:

    poetry install

Option 2: using pip

If Poetry doesn't suit your needs or fails to fetch all required libraries, you can install the dependencies with pip instead.

  1. (Optional) Set up and activate the virtual environment:

    python3.10 -m venv venv
    source ./venv/bin/activate
  2. Install the necessary packages using pip:

    pip install torch torchvision timm torch_optimizer tqdm
    pip install numpy pandas matplotlib scikit-learn Pillow
    pip install astropy astroquery pixell dynaconf wget
    pip install comet_ml h5py ultralytics mlflow
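After installing, you can sanity-check that the key libraries are importable. The helper below is a generic sketch (`missing_packages` is not part of the project); note that some module names differ from the pip package name (e.g. Pillow is imported as PIL):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Module names as imported, not necessarily the pip package names.
required = ["torch", "torchvision", "numpy", "pandas", "astropy", "PIL"]
gaps = missing_packages(required)
if gaps:
    print("Missing:", ", ".join(gaps))
else:
    print("All required packages found.")
```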

Training Models

Once dependencies are installed, you can start training the models.

Train a model by running the galaxy.main module:

python3 -m galaxy.main --models MODEL_NAME --epochs NUM_EPOCH --data DATASET

Available flags:

| Flag | Description | Default / Options |
| --- | --- | --- |
| `--models` | Model(s) to train | Baseline, ResNet18, EfficientNet, DenseNet, SpinalNet_ResNet, SpinalNet_VGG, ViTL16, AlexNet_VGG, CNN_MLP, YOLO12n, YOLO12s, YOLO12m, YOLO12l, YOLO12x, Swin_Tiny, Swin_Small, Swin_Base, Swin_Large, SwinV2_Tiny, DavitTiny, DavitSmall (default: all except YOLO12s, YOLO12l, Swin_Small, Swin_Large, DavitSmall) |
| `--epochs` | Number of training epochs | Default: 5 |
| `--mm` | Momentum (for SGD or RMSprop) | Default: 0.9 |
| `--optimizer` | Optimizer to use | Adam, SGD, Rprop, NAdam, RAdam, AdamW, RMSprop, DiffGrad (default: AdamW) |
| `--repoptimizer` | Wrap the optimizer with RepOptimizer | Optional (see the paper "Re-parameterizing Your Optimizers rather than Architectures") |
| `--segment` | Generate segmentation maps after training | Optional |
| `--data` | Dataset to use | WISE (W1/W2 bands) or ACT (90, 150, 220 GHz from LAMBDA) (default: WISE) |
| `--seed` | Random seed (integer) | Default: 1 |
| `--comet` | Enable Comet.ml tracking | Optional |
| `--mlflow` | Enable MLflow tracking | Optional |

Important

Only one optimizer can be passed on the command line.

If the script fails to download a dataset, it falls back to the Legacy Survey website to fetch the infrared W1 and W2 bands (in Russia, available only with a VPN).

The learning rate is found automatically with an LR range test, so it is not passed on the command line.

For more information, see "A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay".
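The idea behind an LR range test can be illustrated with a toy objective. The sketch below is self-contained and illustrative only, not the project's implementation: it runs a short training trial at each exponentially spaced candidate learning rate and keeps the one with the lowest final loss.

```python
import math

def lr_range_test(run_trial, lrs):
    """Run a short training trial at each candidate lr; return (lr, loss) pairs."""
    return [(lr, run_trial(lr)) for lr in lrs]

def trial(lr, steps=20):
    # Toy objective f(w) = w**2 with gradient 2*w, starting from w = 1.0.
    w = 1.0
    for _ in range(steps):
        w -= lr * 2.0 * w
        if not math.isfinite(w) or abs(w) > 1e6:  # diverged
            return float("inf")
    return w * w

# Exponentially spaced candidate learning rates, as in an LR range test.
lrs = [10 ** e for e in range(-4, 1)]  # 1e-4 ... 1e0
results = lr_range_test(trial, lrs)
best_lr = min(results, key=lambda pair: pair[1])[0]
```

A real range test on a network additionally looks at where the loss starts to rise again and picks a rate just below that point.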

Running Multiple Models Simultaneously

You can run several models simultaneously with the same optimizer and learning rate.
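Accepting several model names after one flag is typically done with argparse's `nargs="+"`; the snippet below is a plausible sketch of such a CLI, and the real parser in galaxy.main may differ:

```python
import argparse

# Hypothetical parser: --models takes one or more names, --epochs an integer.
parser = argparse.ArgumentParser()
parser.add_argument("--models", nargs="+", default=None)
parser.add_argument("--epochs", type=int, default=5)

args = parser.parse_args(["--models", "AlexNet_VGG", "ResNet18", "--epochs", "20"])
```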


Example Commands

Single model training:

python3 -m galaxy.main --models YOLO12n --epochs 5 --data ACT --segment

Multiple models training:

python3 -m galaxy.main --models AlexNet_VGG ResNet18 --epochs 20 --data WISE --segment

Default model set (omit --models to train all default models):

python3 -m galaxy.main --epochs 20 --data WISE --segment

Standalone Full-Sky Segmentation (ACT)

Full-sky ACT segmentation is no longer launched from galaxy.main. Use the standalone galaxy.full_sky module instead.

Note

If you use uv, replace python3 -m ... with uv run python -m ....

Available Commands

Start a new block-based full-sky run from an existing checkpoint:

python3 -m galaxy.full_sky start --checkpoint /abs/path/to/checkpoint.pth --model YOLO12n --optimizer AdamW --data ACT

Start a new run with a custom prediction-center step:

python3 -m galaxy.full_sky start --checkpoint /abs/path/to/checkpoint.pth --model YOLO12n --optimizer AdamW --data ACT --center-step 3

Resume an existing run:

python3 -m galaxy.full_sky resume --run /abs/path/to/storage/segmentation/full_sky/ACT/YOLO12n_AdamW_seed_1

Assemble a partial full-canvas FITS from completed blocks:

python3 -m galaxy.full_sky assemble-partial --run /abs/path/to/storage/segmentation/full_sky/ACT/YOLO12n_AdamW_seed_1

Render a PNG preview from the assembled partial FITS:

python3 -m galaxy.full_sky preview-partial --run /abs/path/to/storage/segmentation/full_sky/ACT/YOLO12n_AdamW_seed_1

Command Summary

| Command | What it does |
| --- | --- |
| `start` | Creates a new self-contained run directory, copies the checkpoint there, calibrates throughput, splits the sky into blocks, and starts block-by-block inference. |
| `resume` | Continues an existing run from its saved run_dir, using the checkpoint already stored inside that run. |
| `assemble-partial` | Builds partial_probability_map.fits from all currently available completed block files, keeping missing blocks as NaN. |
| `preview-partial` | Builds partial_probability_map.png from partial_probability_map.fits so the already computed sky can be inspected visually. |
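The assemble-partial behavior (completed blocks filled in, missing blocks left as NaN) can be sketched as follows; the function and block layout here are illustrative, not the actual implementation:

```python
import math

def assemble_partial(shape, blocks):
    """Build a canvas from completed blocks; cells of missing blocks stay NaN.

    blocks maps a block's top-left corner (row0, col0) to a 2D list
    of predicted probabilities.
    """
    rows, cols = shape
    canvas = [[float("nan")] * cols for _ in range(rows)]
    for (r0, c0), patch in blocks.items():
        for dr, line in enumerate(patch):
            for dc, value in enumerate(line):
                canvas[r0 + dr][c0 + dc] = value
    return canvas

# Two of four 2x2 blocks completed on a 4x4 canvas.
done = {(0, 0): [[0.1, 0.2], [0.3, 0.4]],
        (2, 2): [[0.9, 0.8], [0.7, 0.6]]}
canvas = assemble_partial((4, 4), done)
```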

Important Options

| Flag | Description | Default / Options |
| --- | --- | --- |
| `--checkpoint` | Path to saved model weights for a new run | Required for `start` |
| `--model` | Model architecture name | Baseline, ResNet18, EfficientNet, DenseNet, SpinalNet_ResNet, SpinalNet_VGG, ViTL16, AlexNet_VGG, CNN_MLP, YOLO12n, YOLO12s, YOLO12m, YOLO12l, YOLO12x |
| `--optimizer` | Optimizer name used in the run identity | SGD, Rprop, Adam, NAdam, RAdam, AdamW, RMSprop, DiffGrad |
| `--data` | Dataset for standalone full-sky inference | ACT only |
| `--seed` | Seed used in the run identity | Default: 1 |
| `--center-step` | Prediction-center stride in pixels along both axes | Default: 5 |
| `--run` | Path to an existing full-sky run directory | Required for `resume`, `assemble-partial`, `preview-partial` |
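`--center-step` trades accuracy of the probability map against inference cost: a smaller stride yields a denser grid of prediction centers and proportionally more forward passes. A rough sketch of the trade-off (the coordinate convention is assumed, not taken from the code):

```python
def prediction_centers(height, width, step):
    """Grid of prediction-center pixel coordinates, same stride on both axes."""
    return [(y, x) for y in range(0, height, step)
                   for x in range(0, width, step)]

# Default stride 5 on a 10x10 region gives a 2x2 grid of centers;
# stride 3 on the same region gives a 4x4 grid, i.e. 4x the inference work.
coarse = prediction_centers(10, 10, 5)
dense = prediction_centers(10, 10, 3)
```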

Typical Workflow

  1. Train a model with galaxy.main.
  2. Start a long full-sky ACT run with galaxy.full_sky start.
  3. If needed, continue it later with galaxy.full_sky resume.
  4. At any point, inspect the already computed part with:
    • galaxy.full_sky assemble-partial
    • galaxy.full_sky preview-partial

Logging

The script supports Comet.ml and MLflow for experiment tracking.

Comet.ml

  1. Rename .example.secrets.toml to .secrets.toml.
  2. In .secrets.toml, set your Comet API key in the variable COMET_API_KEY and your workspace name in COMET_WORKSPACE.
  3. Pass the --comet flag.
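A .secrets.toml filled in this way would look roughly like the fragment below; the values are placeholders, and the exact section layout should match .example.secrets.toml:

```toml
COMET_API_KEY = "your-comet-api-key"
COMET_WORKSPACE = "your-workspace-name"
```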

Important

The script will not work without renaming .example.secrets.toml to .secrets.toml; otherwise an empty API key is passed.

MLflow

  1. Pass the --mlflow flag.

Full documentation can be viewed here.

About

An algorithm to classify galaxy clusters using various architectures (CNN, MLP, and Transformer) and to compare their efficiency on infrared (IR) data from the WISE survey (W1, W2 bands) and microwave data from ACT+Planck (f90, f150, f220 frequencies).
