When training on my own dataset, an error occurs after changing num_classes to match my dataset's categories. Leaving it at the default also reports an error.
#42 · Open · hx358031364 opened this issue on Sep 13, 2021 · 0 comments
AMP not enabled. Training in float32.
Using native Torch DistributedDataParallel.
Scheduled epochs: 310
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [15,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
Traceback (most recent call last):
  File "main.py", line 948, in <module>
    main()
  File "main.py", line 664, in main
    optimizers=optimizers)
  File "main.py", line 782, in train_one_epoch
    output = model(input)
  File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 610, in forward
    self._sync_params()
  File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 1048, in _sync_params
    authoritative_rank,
  File "/root/miniconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 979, in _distributed_broadcast_coalesced
    self.process_group, tensors, buffer_size, authoritative_rank
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL error in: /pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
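
For anyone hitting the same assert: the ScatterGatherKernel "index out of bounds" failure almost always means a target label falls outside [0, num_classes), so the first thing to check is the labels themselves. Below is a minimal sanity-check sketch, assuming a standard PyTorch classification dataset that yields (input, target) pairs; `dataset` and `num_classes` here are hypothetical stand-ins, not identifiers from this repo's main.py.

```python
import torch
from torch.utils.data import DataLoader

def check_label_range(dataset, num_classes, batch_size=256):
    # dataset / num_classes are placeholders for whatever main.py builds.
    loader = DataLoader(dataset, batch_size=batch_size)
    for _, targets in loader:
        targets = torch.as_tensor(targets)
        # Any label < 0 or >= num_classes will trip the device-side assert
        # inside scatter/gather (e.g. one-hot encoding or the loss).
        bad = targets[(targets < 0) | (targets >= num_classes)]
        if bad.numel() > 0:
            raise ValueError(
                f"found out-of-range labels {sorted(bad.unique().tolist())}; "
                f"expected 0 <= label < {num_classes}"
            )
    print(f"all labels lie in [0, {num_classes})")
```

Note that device-side asserts are raised asynchronously, which is why the traceback points at DDP's _sync_params rather than the real culprit. Running one iteration on CPU, or launching with CUDA_LAUNCH_BLOCKING=1, makes the error surface at the actual call site (typically the loss or a one-hot/scatter op).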