
CPU-only sampling fails with CUDA error #40

@AFEScalante

Description

TabSyn sampling fails on CPU-only machines despite --gpu -1 flag

I'm unable to run TabSyn sampling on a machine without GPU access, even when explicitly setting --gpu -1 to force CPU usage. The sampling pipeline appears to have hardcoded CUDA dependencies that prevent CPU-only execution.

Steps to Reproduce

  1. Set up tabsyn on a machine without CUDA GPUs
  2. Train a model successfully using:
python main.py --dataname shoppers --method vae --mode train --gpu -1
python main.py --dataname shoppers --method tabsyn --mode train --gpu -1
  3. Attempt to generate synthetic data using:
python main.py --dataname shoppers --method tabsyn --mode sample --gpu -1

Expected Behavior

The sampling should run successfully on CPU, generating synthetic data without requiring GPU access.

Actual Behavior

The following error occurs:

No NaNs in numerical features, skipping
Traceback (most recent call last):
  File "/home/angel-escalante/Escritorio/tabsyn/main.py", line 15, in <module>
    main_fn(args)
  File "/home/angel-escalante/Escritorio/tabsyn/tabsyn/sample.py", line 39, in main
    x_next = sample(model.denoise_fn_D, num_samples, sample_dim)
  File "/home/angel-escalante/Escritorio/tabsyn/tabsyn/diffusion_utils.py", line 23, in sample
    latents = torch.randn([num_samples, dim], device=device)
  File "/home/angel-escalante/miniconda3/envs/tabsyn/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

Analysis

I've identified several issues that seem to be causing this problem:

  1. Main entry point: Looking at main.py, it seems like the device detection doesn't properly handle the --gpu -1 case
  2. Device parameter: The sample() function in diffusion_utils.py appears to default to device='cuda:0' and this parameter might not be passed correctly from sample.py
  3. Model loading: The torch.load() calls might be trying to load CUDA tensors without proper CPU mapping
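To illustrate the third point, here is a minimal, self-contained sketch (not TabSyn's actual code; the model and buffer are placeholders) showing how `map_location='cpu'` lets a checkpoint load on a machine with no CUDA runtime:

```python
import io
import torch
import torch.nn as nn

# Placeholder model standing in for TabSyn's checkpointed weights.
model = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location="cpu" remaps any CUDA storages in the checkpoint to CPU,
# so loading succeeds even when torch.cuda.is_available() is False.
# Without it, a checkpoint saved from a GPU triggers CUDA initialization.
state = torch.load(buffer, map_location="cpu")
model.load_state_dict(state)
```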

Possible Solution Areas

I think the fix would involve:

  1. Fixing device detection in main.py to properly respect --gpu -1
  2. Ensuring device parameter is passed correctly through the sampling pipeline
  3. Adding proper model loading with map_location='cpu' when needed
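A sketch of what points 1 and 2 could look like, assuming a helper like the hypothetical `resolve_device` below (TabSyn's actual argument handling may differ):

```python
import torch

def resolve_device(gpu: int) -> torch.device:
    # Hypothetical helper: --gpu -1, or an absent CUDA runtime, means CPU;
    # otherwise use the requested CUDA device index.
    if gpu < 0 or not torch.cuda.is_available():
        return torch.device("cpu")
    return torch.device(f"cuda:{gpu}")

device = resolve_device(-1)

# The resolved device must then be threaded through the sampling pipeline,
# e.g. torch.randn([num_samples, dim], device=device), instead of letting
# diffusion_utils.sample() fall back to its 'cuda:0' default.
latents = torch.randn([8, 4], device=device)
```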

Environment

  • OS: Linux (Ubuntu)
  • Python: 3.10
  • PyTorch: CPU-only installation (no CUDA)
  • Hardware: Machine without CUDA GPUs
  • tabsyn: Latest version from main branch
