TabSyn sampling fails on CPU-only machines despite --gpu -1 flag
Description
I'm unable to run TabSyn sampling on a machine without GPU access, even when explicitly setting --gpu -1 to force CPU usage. The sampling pipeline appears to have hardcoded CUDA dependencies that prevent CPU-only execution.
Steps to Reproduce
- Set up tabsyn on a machine without CUDA GPUs
- Train a model successfully using python main.py --dataname shoppers --method vae --mode train --gpu -1 and python main.py --dataname shoppers --method tabsyn --mode train --gpu -1
- Attempt to generate synthetic data using:
python main.py --dataname shoppers --method tabsyn --mode sample --gpu -1
Expected Behavior
The sampling should run successfully on CPU, generating synthetic data without requiring GPU access.
Actual Behavior
The following error occurs:
No NaNs in numerical features, skipping
Traceback (most recent call last):
  File "/home/angel-escalante/Escritorio/tabsyn/main.py", line 15, in <module>
    main_fn(args)
  File "/home/angel-escalante/Escritorio/tabsyn/tabsyn/sample.py", line 39, in main
    x_next = sample(model.denoise_fn_D, num_samples, sample_dim)
  File "/home/angel-escalante/Escritorio/tabsyn/tabsyn/diffusion_utils.py", line 23, in sample
    latents = torch.randn([num_samples, dim], device=device)
  File "/home/angel-escalante/miniconda3/envs/tabsyn/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
Analysis
I've identified several issues that seem to be causing this problem:
- Main entry point: Looking at main.py, it seems like the device detection doesn't properly handle the --gpu -1 case
- Device parameter: The sample() function in diffusion_utils.py appears to default to device='cuda:0' and this parameter might not be passed correctly from sample.py
- Model loading: The torch.load() calls might be trying to load CUDA tensors without proper CPU mapping
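To illustrate the first point, here is a minimal sketch of how a --gpu value could be mapped to a device string. This is not TabSyn's actual code: resolve_device and parse_device are hypothetical helpers, and the rule that a negative index means "force CPU" is my assumption about the intended behaviour of --gpu -1. In real code the cuda_available argument would presumably come from torch.cuda.is_available().

```python
import argparse


def resolve_device(gpu: int, cuda_available: bool) -> str:
    """Map a --gpu index to a torch-style device string.

    Assumption: a negative index (e.g. -1) means "force CPU".
    """
    if gpu < 0 or not cuda_available:
        return "cpu"
    return f"cuda:{gpu}"


def parse_device(argv, cuda_available: bool) -> str:
    """Parse a hypothetical --gpu flag and resolve it to a device string."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", type=int, default=0)
    args = parser.parse_args(argv)
    return resolve_device(args.gpu, cuda_available)


if __name__ == "__main__":
    # On a machine without CUDA, any index should fall back to CPU.
    print(parse_device(["--gpu", "-1"], cuda_available=False))
```

Falling back to "cpu" when CUDA is unavailable, instead of building "cuda:0" unconditionally, would avoid the lazy CUDA initialisation seen in the traceback.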
Possible Solution Areas
I think the fix would involve:
- Fixing device detection in main.py to properly respect --gpu -1
- Ensuring device parameter is passed correctly through the sampling pipeline
- Adding proper model loading with map_location='cpu' when needed
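A rough sketch of what the last two points could look like, assuming a CPU-only PyTorch install. sample_latents is a hypothetical stand-in for the latent-sampling line in diffusion_utils.py, not the real sample() signature, and the in-memory checkpoint only stands in for the model files that sample.py loads.

```python
import io

import torch


def sample_latents(num_samples: int, dim: int, device: str = "cpu") -> torch.Tensor:
    """Hypothetical stand-in: draw initial latents on an explicitly passed device."""
    return torch.randn([num_samples, dim], device=device)


device = "cpu"  # would come from the --gpu handling in main.py

# Save a tiny state dict to memory, then load it back with explicit CPU mapping.
buffer = io.BytesIO()
torch.save({"weight": torch.zeros(2, 3)}, buffer)
buffer.seek(0)
state = torch.load(buffer, map_location=device)

# Pass the device through explicitly rather than relying on a 'cuda:0' default.
latents = sample_latents(4, 3, device=device)
print(latents.device, state["weight"].device)
```

With map_location set, tensors saved from a GPU run are remapped to the requested device at load time, so the same checkpoint should be usable on a CPU-only machine.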
Environment
- OS: Linux (Ubuntu)
- Python: 3.10
- PyTorch: CPU-only installation (no CUDA)
- Hardware: Machine without CUDA GPUs
- tabsyn: Latest version from main branch