A deep learning system for seismic velocity inversion — the geophysical inverse problem of predicting subsurface velocity structure from surface-recorded seismic waveforms. The architecture is a three-stage pipeline: a SincNet temporal encoder extracts physics-informed frequency features from raw shot gathers, a Graph Attention Network fuses multi-shot spatial relationships through learned attention, and a U-Net decoder with FiLM conditioning generates high-resolution 2D velocity maps.
The model processes 5 shot gathers per sample (10,001 time steps across 31 receivers each) and outputs a full 300 x 1,259 subsurface velocity field. The champion configuration achieves ~0.0655% MAPE on the held-out evaluation set.
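For reference, the headline metric is standard MAPE over the predicted velocity field (a minimal numpy sketch; the function name is illustrative, not the project's evaluation code):

```python
import numpy as np

def mape(pred, target):
    """Mean absolute percentage error over a velocity field."""
    return 100.0 * np.mean(np.abs(pred - target) / np.abs(target))

# e.g. a 1 m/s error on a 2,000 m/s cell contributes 0.05%
example = mape(np.array([101.0, 99.0]), np.array([100.0, 100.0]))
```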
The SincNet encoder replaces generic 1D convolutions with parametric sinc-function bandpass filters. Each filter learns its own center frequency and bandwidth — the network discovers which seismic frequency bands carry velocity information rather than learning arbitrary kernel weights. Blackman windowing suppresses spectral leakage. Logarithmic filter initialization allocates finer resolution to the lower frequencies where seismic signals concentrate energy.
Goal: Impose physically meaningful spectral constraints while preserving full gradient-based adaptability.
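The core idea can be sketched as a windowed difference of two ideal low-pass sinc filters (a standalone numpy illustration, not the SincConv1d_SeismicAdapted implementation; the 2,000 Hz sample rate is an assumed value chosen so the 40–1,000 Hz learnable range spans up to Nyquist):

```python
import numpy as np

def sinc_bandpass_kernel(low_hz, band_hz, kernel_size, sample_rate):
    """Bandpass FIR kernel: difference of two low-pass sinc filters,
    tapered with a Blackman window to suppress spectral leakage."""
    f1 = low_hz / sample_rate                 # normalized low cutoff
    f2 = (low_hz + band_hz) / sample_rate     # normalized high cutoff
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(x) = sin(pi*x)/(pi*x); 2f*sinc(2f*n) is an ideal low-pass
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return kernel * np.blackman(kernel_size)

k = sinc_bandpass_kernel(low_hz=40, band_hz=60, kernel_size=1001, sample_rate=2000)
```

In the learnable version, `low_hz` and `band_hz` become trainable parameters per filter, so gradients move the passband while the kernel stays a valid bandpass by construction.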
Five shot gathers recorded at different source positions encode complementary views of the same subsurface. Rather than concatenating these views and hoping a CNN disentangles them, the Graph Attention Network treats each shot as a node in a fully-connected graph. GATv2 attention heads learn which shot pairs carry complementary information for each spatial region. The graph pooling stage produces a single fused embedding that captures multi-offset spatial dependencies.
Goal: Let the network learn inter-shot importance explicitly through attention, not implicitly through convolution.
The U-Net decoder is a proven architecture for dense spatial prediction. FiLM (Feature-wise Linear Modulation) injects the GAT-fused context at the bottleneck via learned scale and shift parameters, with a residual formulation: output = target + (gamma * target + beta). The gamma and beta projections are zero-initialized, so the FiLM layer acts as an identity function at training start. The U-Net begins training as if unconditional, and gradually integrates multi-shot context as the FiLM parameters diverge from zero.
Goal: Integrate global context without destabilizing the spatial decoder's convergence.
A single loss function cannot simultaneously optimize pixel-level accuracy, structural fidelity, and geological smoothness. The training pipeline combines log-space MAE (scale-invariant accuracy), multi-scale structural similarity (spatial coherence), and anisotropic total variation (geological layering bias) through SoftAdapt-weighted combination with curriculum warmup. The loss composition adapts its own weighting schedule based on relative improvement rates.
Goal: Each loss component corrects failure modes the others miss, with adaptive balancing that prevents any single term from dominating.
Each of the 5 shot gathers is processed independently through a domain-adapted SincNet layer. The SincConv1d_SeismicAdapted module implements 60 learnable bandpass filters with 1,001-point kernels operating at stride 1 — a critical anti-aliasing decision that preserves temporal fidelity at the cost of compute. Filter frequencies are parameterized by learnable low_hz and band_hz variables normalized against the Nyquist frequency.
The raw SincNet output feeds into a hierarchical 2D CNN aggregator (PerShotTemporalEncoder) that reshapes the 1D temporal features into a 2D time-receiver grid and applies four progressive pooling stages with factors [5, 5, 4, 2] temporally and [2, 2, 2, 1] spatially. Anti-aliased downsampling via BlurPool2D (Gaussian pre-filtering) prevents aliasing artifacts at each stage. Each shot produces a 128-dimensional embedding.
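The pooling factors imply roughly the following shape trajectory (a floor-division sketch; the real encoder's padding may shift these by a sample or two):

```python
# time axis: 10,001 steps; receiver axis: 31 channels
t, r = 10001, 31
for pt, pr in zip([5, 5, 4, 2],   # temporal pooling factors
                  [2, 2, 2, 1]):  # spatial (receiver) pooling factors
    t, r = t // pt, r // pr
print(t, r)  # → 50 3
```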
Key parameters:
| Parameter | Value | Rationale |
|---|---|---|
| Filters | 60 | Optimal spectral coverage without redundancy |
| Kernel size | 1,001 | Low-frequency resolution down to ~10 Hz |
| Stride | 1 | Eliminates aliasing (critical fix over stride > 1) |
| Frequency range | 40–1,000 Hz | Matches seismic signal spectral content |
| Window | Blackman | Superior side-lobe suppression |
| Initialization | Logarithmic | Finer resolution at lower frequencies |
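The logarithmic initialization in the table can be sketched as log-spaced low-frequency edges (illustrative; the module's exact init may differ in detail):

```python
import numpy as np

# 60 filter low-frequency edges, log-spaced from 40 Hz to 1,000 Hz:
# spacing grows with frequency, so resolution is finest at the low end,
# where seismic energy concentrates
low_hz = np.logspace(np.log10(40), np.log10(1000), 60)
spacing = np.diff(low_hz)
```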
The 5 shot embeddings form a fully-connected graph of 5 nodes. LightweightGATFusion applies a single GATv2Conv layer with 4 attention heads (32 dimensions per head), feature dropout of 0.3, and attention dropout of 0.2. GATv2's dynamic attention mechanism computes attention coefficients as a function of both source and target node features, enabling the network to learn asymmetric shot-pair relationships.
A GlobalAttention pooling layer with a learned gate network produces the final 128-dimensional graph-level embedding. This fused representation encodes multi-offset spatial dependencies — the network determines which source positions contribute most to each prediction.
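The dynamic-attention mechanism can be illustrated for one head in plain numpy (random placeholder weights, not the trained LightweightGATFusion parameters; a single shared projection stands in for GATv2's per-endpoint weight matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

def gatv2_attention(h, W, a, negative_slope=0.2):
    """Single-head GATv2-style attention over a fully connected graph:
    the edge score passes both endpoints through a shared nonlinearity,
    so attention depends on source AND target features."""
    n = h.shape[0]
    z = h @ W.T                                   # projected node features
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            s = z[i] + z[j]
            scores[i, j] = a @ np.maximum(negative_slope * s, s)  # LeakyReLU
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)     # softmax over neighbors
    return alpha, alpha @ z                       # coefficients + fused features

h = rng.normal(size=(5, 128))    # 5 shot embeddings
W = rng.normal(size=(32, 128))   # one 32-dim attention head
a = rng.normal(size=(32,))
alpha, fused = gatv2_attention(h, W, a)
```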
BaselineUNet implements a 4-stage encoder-decoder with asymmetric pooling factors ((4,2), (4,2), (5,2), (5,2)) designed for the non-square input geometry. The encoder compresses through progressively deeper feature maps up to 512 channels at the bottleneck. The decoder mirrors this with bilinear upsampling and skip connections. A final interpolation layer maps the output to the target resolution of 300 x 1,259.
At the bottleneck, ResidualFiLMLayer modulates the 512-channel feature maps using gamma and beta parameters projected from the 128-dimensional GAT context via a 2-layer MLP (hidden dimension 256). The residual formulation and zero-initialization ensure training stability — the decoder starts as a pure U-Net and progressively incorporates multi-shot context.
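The stability argument is easy to verify: with zero-initialized projections, the layer is exactly the identity (a numpy sketch with a single linear projection standing in for the 2-layer MLP):

```python
import numpy as np

def residual_film(x, context, W_gamma, b_gamma, W_beta, b_beta):
    """out = x + (gamma * x + beta); gamma/beta are per-channel values
    projected from the fused context vector."""
    gamma = context @ W_gamma + b_gamma            # (C,) scale
    beta = context @ W_beta + b_beta               # (C,) shift
    return x + gamma[:, None, None] * x + beta[:, None, None]

C, D = 512, 128
x = np.random.randn(C, 8, 8)      # bottleneck feature map
context = np.random.randn(D)      # fused GAT embedding
# zero-initialized projections → gamma = beta = 0 → identity at step 0
out = residual_film(x, context, np.zeros((D, C)), np.zeros(C),
                    np.zeros((D, C)), np.zeros(C))
```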
Data flow:
Input (B, 5, 10001, 31)
→ SincNet per shot → 5 × 128-dim embeddings
→ GAT fusion → 128-dim context vector
→ LayerNorm → FiLM modulation at U-Net bottleneck
→ U-Net decoder → Output (B, 1, 300, 1259)
The champion model uses RefinedLogSpaceMAEHybridLoss with three components at fixed weights:
| Component | Weight | Function |
|---|---|---|
| AdaptiveLogSpaceMAE | 1.0 | Scale-invariant pixel accuracy with momentum-based adaptive offset |
| StabilizedSeismicMSSSIM | 0.12 | Multi-scale structural similarity in log-space with A100 stability fixes |
| AnisotropicTotalVariationLoss | 0.007 | Asymmetric regularization (horizontal = 1.0, vertical = 0.3) for geological layering |
The loss pipeline includes SoftAdapt adaptive weighting and curriculum warmup (log-MAE only for initial epochs). FiLM parameter regularization (L2 on residual gamma and beta) prevents the conditioning pathway from overpowering the spatial decoder.
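The anisotropic TV term makes the layering bias concrete: lateral velocity changes are penalized at full weight, depth-wise changes (expected across layers) only lightly. A numpy sketch, with a fixed epsilon standing in for the momentum-based adaptive offset and MS-SSIM omitted:

```python
import numpy as np

def log_space_mae(pred, target, eps=1e-6):
    """Scale-invariant accuracy term in log-space."""
    return np.mean(np.abs(np.log(pred + eps) - np.log(target + eps)))

def anisotropic_tv(v, w_h=1.0, w_v=0.3):
    """Penalize lateral (horizontal) gradients more than depth-wise ones,
    biasing predictions toward horizontally layered geology."""
    d_h = np.abs(np.diff(v, axis=1)).mean()   # along the 1,259-wide axis
    d_v = np.abs(np.diff(v, axis=0)).mean()   # along the 300-deep axis
    return w_h * d_h + w_v * d_v

# a layered field (velocity varies only with depth) is cheap;
# the same field rotated 90 degrees is expensive
layered = np.tile(np.linspace(1500, 4000, 300)[:, None], (1, 50))
```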
- Optimizer: AdamW with Sharpness-Aware Minimization (SAM) for flatter loss landscape convergence
- Schedule: Plateau detection transitioning to cosine annealing
- Precision: Mixed-precision training, with TF32 disabled on the loss computation path for A100 numerical stability
- Checkpointing: Best-MAPE model selection with Google Drive persistence
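The cosine phase of the schedule can be sketched as standard cosine annealing (an assumption about the exact form; step counts and learning rates are illustrative):

```python
import math

def cosine_anneal(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate decays from lr_max to lr_min along a half cosine
    once plateau detection hands off to the annealing phase."""
    t = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```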
Early experiments with strided SincNet convolutions produced aliased frequency representations — the learnable bandpass filters captured correct center frequencies but the downsampled output folded high-frequency content into lower bands. The fix required processing 10,001 time steps at stride 1 (a significant compute cost), combined with BlurPool1D/BlurPool2D Gaussian pre-filtering before every spatial reduction. The model is alias-free from input to embedding.
Resolution: Stride-1 SincNet convolution with anti-aliased progressive pooling. Correct signal processing at every stage, regardless of cost.
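The anti-aliased reduction pattern is: low-pass first, subsample second. A 1-D numpy sketch with a binomial kernel (the real BlurPool modules use Gaussian pre-filters):

```python
import numpy as np

def blur_pool_1d(x, stride):
    """Smooth with a [1, 2, 1]/4 kernel, then subsample by `stride`, so
    energy above the new Nyquist is attenuated instead of folding down."""
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    smoothed = np.convolve(x, kernel, mode="same")
    return smoothed[::stride]

# a Nyquist-rate oscillation is suppressed before subsampling, whereas
# naive strided sampling would alias it into a constant signal
x = np.array([1.0, -1.0] * 8)
y = blur_pool_1d(x, 2)
```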
The multi-scale structural similarity loss produced NaN gradients on A100 GPUs due to TF32 precision interactions with small-valued intermediate computations in the MS-SSIM kernel. Disabling TF32 globally degraded training throughput. The solution was targeted: stabilize the SSIM computation with epsilon guards and min-clamping in the log-space transform while keeping TF32 disabled only for the loss computation path. This preserved full A100 training speed for the forward and backward passes.
Resolution: Surgical numerical stabilization at the loss boundary, not global precision downgrade. Training speed preserved.
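The guard pattern itself is simple (illustrative thresholds; the values in StabilizedSeismicMSSSIM may differ):

```python
import numpy as np

def stable_log(v, v_min=1e-3, eps=1e-8):
    """Min-clamp before the log so near-zero or negative intermediates
    can never produce -inf or NaN gradients downstream."""
    return np.log(np.maximum(v, v_min) + eps)
```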
Naively adding FiLM conditioning to a pre-trained U-Net destroyed convergence — the randomly initialized gamma/beta parameters immediately distorted learned feature maps. The ResidualFiLMLayer solves this through zero-initialization of the final projection layers. At step 0, gamma_res = 0 and beta_res = 0, so the FiLM output equals the input. The conditioning signal emerges gradually as training progresses, never shocking the decoder. This enabled direct fine-tuning of a pre-trained U-Net with FiLM conditioning added at the bottleneck.
Resolution: Zero-initialized residual FiLM conditioning. The U-Net starts unconditional and learns to integrate context without losing pre-trained spatial representations.
| Layer | Responsibility | Key Modules |
|---|---|---|
| Temporal Encoding | Physics-informed frequency decomposition of raw waveforms | SincConv1d_SeismicAdapted, BlurPool1D, PerShotTemporalEncoder |
| Spatial Fusion | Multi-shot relationship learning via graph attention | ShotGraphBuilder, LightweightGATFusion, GlobalAttention |
| Dense Prediction | High-resolution velocity map generation with context conditioning | BaselineUNet, ResidualFiLMLayer |
| Loss Composition | Multi-objective training with adaptive weighting | AdaptiveLogSpaceMAE, StabilizedSeismicMSSSIM, AnisotropicTotalVariationLoss |
| Optimization | Flat-minima seeking with curriculum scheduling | SAM optimizer, plateau-to-cosine schedule, FiLM regularization |
| Inference | Deterministic prediction with exact parameter recovery | corrected_inference_pipeline.py, champion checkpoint loading |
Champion Model Configuration
CHAMPION_CONFIG = {
# SincNet Temporal Encoder
'sinc_out_channels': 60,
'sinc_kernel_size': 1001,
'sinc_stride': 1,
'sinc_min_low_hz': 40,
'sinc_max_learnable_hz': 1000,
'sinc_min_band_hz': 10,
'sinc_window_func': 'blackman',
'sinc_init_type': 'logarithmic',
'shot_embedding_dim': 128,
# GAT Fusion
'gat_hidden_per_head': 32,
'gat_num_heads': 4,
'gat_layers': 1,
'gat_dropout_feat': 0.3,
'gat_dropout_attn': 0.2,
'fused_embedding_dim': 128,
# U-Net Decoder
'n_unet_output_channels': 1,
'unet_bilinear': True,
'unet_bottleneck_channels': 512,
# FiLM Conditioning (CRITICAL: must be '2_layer', not 'linear')
'film_context_dim': 128,
'film_target_channels': 512,
'film_generator_mlp_type': '2_layer',
'film_mlp_hidden_dim': 256,
# Loss Weights
'loss_weights': [1.0, 0.12, 0.007], # [LogMAE, MS-SSIM, ATV]
# Input Specification
'sample_rate': 10001,
'num_receivers': 31,
'num_shots': 5,
'source_coordinates': [1, 75, 150, 225, 300],
}

Checkpoint: cfg_06_plateau_to_cosine_PhaseB_FiLMFinetune_best_mape.pth
Repository Structure
├── sincnet_seismic_encoder.py # SincNet temporal encoder with anti-aliased pooling
├── seismic_gat_fusion.py # Graph Attention Network for multi-shot fusion
├── complete_sincgat_unet_integration.py # Full SincNet-GAT-UNet pipeline with FiLM conditioning
├── phase2_experimental_framework.py # Training pipeline: losses, optimization, experiments
├── phase2_integration_notebook_cell.py # Integration utilities for notebook-based training
├── corrected_inference_pipeline.py # Champion model inference with exact parameters
├── colab_ready_inference_pipeline.py # Colab-compatible inference variant
├── enhanced_experimental_suite_with_checkpointing.py # Experiment runner with Drive checkpointing
├── download_experimental_results.py # Result download and analysis utilities
├── utils.py # Shared utilities and helper functions
├── INFERENCE_PIPELINE_SUMMARY.md # Critical findings from inference pipeline construction
├── MAIN_898_with_diagnostic_framework.ipynb # Primary training notebook with diagnostics
├── MAIN_898of_0_898model_speed_and_structure_starter_notebook.ipynb # Competition starter notebook
└── collectCode.mjs # Code collection utility