Hi, I've tried the MCMC strategy on a smaller subset (about 542 images, 150,000-point initial cloud), and it works quite well. But when I use the whole dataset (about 973 images, 360,000-point initial cloud), it raises a CUDA OOM error after about 6400 steps. And when I tried lowering cap_max to 500_000 for each GPU, the validation results were blank images. I tested with the default strategy and it works fine.

My command:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
python examples/simple_trainer.py mcmc \
    --data_dir {My_DATASET_DIR} \
    --data_factor 1 \
    --result_dir ./results/{MY_OUTPUT_DIR} \
    --max_steps 50_000 \
    --eval_steps 7_000 30_000 40_000 50_000 \
    --save_steps 7_000 30_000 40_000 50_000 \
    --use_bilateral_grid \

My log:
2024-11-12 14:31:14.833  Step 6200: Relocated 934401 GSs.
2024-11-12 14:31:14.833  Step 6200: Added 46996 GSs. Now having 986928 GSs.
2024-11-12 14:31:14.833  Step 6300: Relocated 984383 GSs.
2024-11-12 14:31:14.833  Step 6300: Added 13072 GSs. Now having 1000000 GSs.
2024-11-12 14:31:14.833  Step 6400: Relocated 995335 GSs.
2024-11-12 14:31:14.833  Step 6400: Added 0 GSs. Now having 1000000 GSs.
2024-11-12 14:31:18.561  Traceback (most recent call last):
  File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/high_quality/gsplat/examples/simple_trainer.py", line 1076, in <module>
    cli(main, cfg, verbose=True)
  File "/aistudio/workspace/system-default/envs/gsplat/lib/python3.10/site-packages/gsplat/distributed.py", line 344, in cli
    process_context.join()
  File "/aistudio/workspace/system-default/envs/gsplat/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 163, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/aistudio/workspace/system-default/envs/gsplat/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/aistudio/workspace/system-default/envs/gsplat/lib/python3.10/site-packages/gsplat/distributed.py", line 295, in _distributed_worker
    fn(local_rank, world_rank, world_size, args)
  File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/high_quality/gsplat/examples/simple_trainer.py", line 1021, in main
    runner.train()
  File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/high_quality/gsplat/examples/simple_trainer.py", line 589, in train
    renders, alphas, info = self.rasterize_splats(
  File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/high_quality/gsplat/examples/simple_trainer.py", line 469, in rasterize_splats
    render_colors, render_alphas, info = rasterization(
  File "/aistudio/workspace/system-default/envs/gsplat/lib/python3.10/site-packages/gsplat/rendering.py", line 497, in rasterization
    tiles_per_gauss, isect_ids, flatten_ids = isect_tiles(
  File "/aistudio/workspace/system-default/envs/gsplat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/aistudio/workspace/system-default/envs/gsplat/lib/python3.10/site-packages/gsplat/cuda/_wrapper.py", line 382, in isect_tiles
    tiles_per_gauss, isect_ids, flatten_ids = _make_lazy_cuda_func("isect_tiles")(
  File "/aistudio/workspace/system-default/envs/gsplat/lib/python3.10/site-packages/gsplat/cuda/_wrapper.py", line 14, in call_cuda
    return getattr(_C, name)(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.24 GiB. GPU 0 has a total capacty of 39.42 GiB of which 3.14 GiB is free. Process 126936 has 36.29 GiB memory in use. Of the allocated memory 30.47 GiB is allocated by PyTorch, and 2.55 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
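The last line of the OOM message points at PYTORCH_CUDA_ALLOC_CONF. Setting it only mitigates fragmentation, not the fact that the strategy grows the model to 1,000,000 Gaussians, but it is cheap to try. A minimal sketch, assuming the variable is set before the first CUDA allocation (for example at the very top of simple_trainer.py, or equivalently in the launching shell):

import os

# Must happen before torch initializes CUDA. max_split_size_mb is the option
# named in the error message; expandable_segments needs a recent PyTorch.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")
# os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # imported only after the allocator config is in place

If memory still runs out once the MCMC strategy saturates its cap, the cap itself (or the training resolution) has to come down.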
Alright, I've figured out why this can happen. My dataset is about twice as large as the previous one, and the initial point cloud contains a lot of noise. As a result, the MCMC strategy under the current config produces Gaussians with very large scales, which prevents the Gaussians from fitting the scene. And according to this issue:
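If noisy SfM points are indeed the cause, one possible mitigation (a sketch only; this is not part of the gsplat example scripts, and the variable names are assumptions) is to drop isolated outliers from the initial point cloud before training, so that the nearest-neighbor distances that typically seed the Gaussian scales stay small. A handful of stray far-away points otherwise seed huge Gaussians that intersect many tiles, which is likely what exhausts memory inside isect_tiles:

import numpy as np
from scipy.spatial import cKDTree

def filter_outlier_points(points: np.ndarray, colors: np.ndarray,
                          k: int = 4, pct: float = 99.0):
    """Keep only points whose mean distance to their k nearest neighbors
    is below the given percentile. points: (N, 3), colors: (N, 3)."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # k+1: the closest hit is the point itself
    mean_knn = dists[:, 1:].mean(axis=1)
    keep = mean_knn <= np.percentile(mean_knn, pct)
    return points[keep], colors[keep]

# hypothetical usage, before the splats are initialized from the point cloud:
# points, colors = filter_outlier_points(points, colors)

Clamping the initial scales derived from those distances to a sane maximum would serve the same purpose.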