@ichsan2895 ichsan2895 commented May 29, 2025

Sometimes I create a new feature along with new bugs. This repo works well on a single GPU when using --use-bilateral-grid and/or --use-fused-bilagrid.

When I use multiple GPUs (for example CUDA_VISIBLE_DEVICES=0,1 python3 examples/simple_trainer.py mcmc --use-fused-bilagrid), it fails with a new error:

Running garden
Distributed worker: 2 / 2
Warning: image_path not found for reconstruction
[Parser] 185 images, taken by 1 cameras.
Downscaling images by 4x from data/360_v2/garden/images to data/360_v2/garden/images_4_png.
Distributed worker: 1 / 2
Warning: image_path not found for reconstruction
[Parser] 185 images, taken by 1 cameras.
Downscaling images by 4x from data/360_v2/garden/images to data/360_v2/garden/images_4_png.
100%|██████████| 185/185 [00:00<00:00, 5715.49it/s]
100%|██████████| 185/185 [00:00<00:00, 5567.73it/s]
Scene scale: 1.2265263065808907
Model initialized. Number of GS: 69383
Scene scale: 1.2265263065808907
Model initialized. Number of GS: 69383
Traceback (most recent call last):
  File "/workspace/GSPLAT_152n3/gsplat-dev/examples/simple_trainer_ErrorReproduction.py", line 1268, in <module>
    cli(main, cfg, verbose=True)
  File "/usr/local/lib/python3.10/dist-packages/gsplat/distributed.py", line 344, in cli
    process_context.join()
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 163, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/usr/local/lib/python3.10/dist-packages/gsplat/distributed.py", line 295, in _distributed_worker
    fn(local_rank, world_rank, world_size, args)
  File "/workspace/GSPLAT_152n3/gsplat-dev/examples/simple_trainer_ErrorReproduction.py", line 1174, in main
    runner = Runner(local_rank, world_rank, world_size, cfg)
  File "/workspace/GSPLAT_152n3/gsplat-dev/examples/simple_trainer_ErrorReproduction.py", line 442, in __init__
    self.bil_grids = BilateralGrid(
NameError: name 'BilateralGrid' is not defined

This PR fixes that error. My code is a bit like spaghetti, though, so feel free to refactor it.
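
For anyone refactoring: the traceback shows the NameError raised inside a process started by torch.multiprocessing.spawn. Spawned workers re-import the script but do not run its `if __name__ == "__main__":` block, so a name imported only there is missing in the workers. Whether that is what happened here is an assumption, not something stated above. A minimal sketch of the workaround pattern, resolving the import inside the worker itself, might look like this. `load_grid_symbols` and `worker` are hypothetical names, and the stdlib modules `math` and `cmath` stand in for the two bilateral grid backends so the sketch runs anywhere:

```python
import importlib
from types import SimpleNamespace


def load_grid_symbols(module_name, names):
    """Import `module_name` and return the requested attributes as a namespace.

    Intended to be called from inside the worker function, so that every
    spawned process performs the import itself instead of relying on
    names defined in the parent's __main__ block.
    """
    module = importlib.import_module(module_name)
    return SimpleNamespace(**{name: getattr(module, name) for name in names})


def worker(use_fused):
    # Hypothetical stand-ins: "math" and "cmath" replace the real backends.
    module_name = "math" if use_fused else "cmath"
    grid = load_grid_symbols(module_name, ["sqrt"])
    # In the trainer, this is where something like BilateralGrid would be built.
    return grid.sqrt(4)
```

With this shape, the choice of backend depends only on the config passed to the worker, so single-GPU and multi-GPU runs go through the same import path.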
