Failures of anisotropy correction of half-maps with tutorial data #17
Comments
Hi,
I believe the error comes from these lines in
spIsoNet/models/network_n2n.py:
    if torch.__version__ >= "2.0.0":
        GPU_capability = torch.cuda.get_device_capability()
        if GPU_capability[0] >= 7:
            torch.set_float32_matmul_precision('high')
            model = torch.compile(model)
Would you comment out these lines and try again?
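As an alternative to commenting the lines out by hand, the guard could be made switchable. The sketch below is not spIsoNet code: `parse_version`, `should_compile`, and the `SPISONET_NO_COMPILE` environment variable are hypothetical names chosen for illustration. It also compares version numbers numerically, since a plain string comparison such as `"10.0.0" >= "2.0.0"` evaluates to False in Python.

```python
import os
import re


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints.

    Suffixes such as "+cu121" are ignored; a missing patch number counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def should_compile(torch_version: str, gpu_major: int, env=None) -> bool:
    """Decide whether torch.compile should be attempted.

    Mirrors the quoted guard (PyTorch >= 2.0.0 and GPU capability >= 7),
    plus a hypothetical opt-out via SPISONET_NO_COMPILE=1.
    """
    env = os.environ if env is None else env
    if env.get("SPISONET_NO_COMPILE") == "1":  # hypothetical switch, not a real option
        return False
    return parse_version(torch_version) >= (2, 0, 0) and gpu_major >= 7
```

With something like this in place, `torch.compile(model)` would only run when `should_compile(torch.__version__, torch.cuda.get_device_capability()[0])` returns True, and exporting `SPISONET_NO_COMPILE=1` would keep the model in eager mode without editing the installed package.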
…On Wednesday, 20 November 2024, suqi-z ***@***.***> wrote:
*The data and command are the same as in the tutorial, but when running
Anisotropy Correction of half-maps, it reports the following problem:*
spisonet.py reconstruct emd_8731_half_map_1.mrc emd_8731_half_map_2.mrc
--aniso_file FSC3D.mrc --mask emd_8731_msk_1.mrc --limit_res 3.5 --epochs
30 --alpha 1 --beta 0.5 --output_dir isonet_maps --gpuID 0,1,2,3
--acc_batches 2
11-20 00:52:42, INFO voxel_size 1.309999942779541
11-20 00:52:43, INFO spIsoNet correction until resolution 3.5A!
Information beyond 3.5A remains unchanged
11-20 00:52:57, INFO Start preparing subvolumes!
11-20 00:53:06, INFO Done preparing subvolumes!
11-20 00:53:06, INFO Start training!
11-20 00:53:09, INFO Port number: 51405
learning rate 0.0003
['isonet_maps/emd_8731_half_map_1_data', 'isonet_maps/emd_8731_half_map_2_data']
0%| | 0/125 [00:00<?, ?batch/s][rank1]:W1120 00:53:26.205000
139648681686848 torch/_logging/_internal.py:1034] [0/0] Profiler function
<class 'torch.autograd.profiler.record_function'> will be ignored
[rank3]:W1120 00:53:26.225000 140587963545408 torch/_logging/_internal.py:1034]
[0/0] Profiler function <class 'torch.autograd.profiler.record_function'>
will be ignored
[rank0]:W1120 00:53:26.263000 140600396216128 torch/_logging/_internal.py:1034]
[0/0] Profiler function <class 'torch.autograd.profiler.record_function'>
will be ignored
[rank2]:W1120 00:53:26.357000 139692869912384 torch/_logging/_internal.py:1034]
[0/0] Profiler function <class 'torch.autograd.profiler.record_function'>
will be ignored
/tmp/tmpb9ffwjno/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpl5yntb4i/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmph2p9sytq/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpbvg8egds/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmph4ckum7q/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpmj2mj6b6/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpowqgpc9/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmp_vj7apqe/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmptthkhvx/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpjo5sie2e/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpqvdup2d8/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmp8xzss5bc/main.c:6:23: fatal error: stdatomic.h: No such file or
directory
#include <stdatomic.h>
^
compilation terminated.
0%| | 0/125 [00:08<?, ?batch/s]
W1120 00:53:33.367000 139899853596480 torch/multiprocessing/spawn.py:146]
Terminating process 40535 via signal SIGTERM
W1120 00:53:33.367000 139899853596480 torch/multiprocessing/spawn.py:146]
Terminating process 47366 via signal SIGTERM
W1120 00:53:33.368000 139899853596480 torch/multiprocessing/spawn.py:146]
Terminating process 47503 via signal SIGTERM
Traceback (most recent call last):
File "/spshared/apps/miniconda3/envs/spisonet/bin/spisonet.py", line 8,
in <module>
sys.exit(main())
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/spIsoNet/bin/spisonet.py", line 549, in main
fire.Fire(ISONET)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py",
line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py",
line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py",
line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/spIsoNet/bin/spisonet.py", line 182, in reconstruct
map_refine_n2n(halfmap1,halfmap2, mask_vol, fsc3d, alpha =
alpha,beta=beta, voxel_size=voxel_size, output_dir=output_dir,
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/spIsoNet/bin/map_refine.py", line 145, in map_refine_n2n
network.train([data_dir_1,data_dir_2], output_dir, alpha=alpha,beta=beta,
output_base=output_base0, batch_size=batch_size, epochs = epochs,
steps_per_epoch = 1000,
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/spIsoNet/models/network_n2n.py", line 265, in train
mp.spawn(ddp_train, args=(self.world_size, self.port_number,
self.model,alpha,beta,
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/multiprocessing/spawn.py", line 282, in spawn
return start_processes(fn, args, nprocs, join, daemon,
start_method="spawn")
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/multiprocessing/spawn.py", line 238, in
start_processes
while not context.join():
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/multiprocessing/spawn.py", line 189, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/multiprocessing/spawn.py", line 76, in _wrap
fn(i, *args)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/spIsoNet/models/network_n2n.py", line 116, in ddp_train
preds = model(x1)# + noise.cuda())
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/nn/modules/module.py", line 1553, in
_wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
return fn(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/external_utils.py", line 38, in inner
return fn(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/nn/modules/module.py", line 1553, in
_wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/nn/parallel/distributed.py", line 1636, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/nn/parallel/distributed.py", line 1454, in
_run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/nn/modules/module.py", line 1553, in
_wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/spIsoNet/models/unet.py", line 97, in forward
x, down_sampling_features = self.encoder(x)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/spIsoNet/models/unet.py", line 98, in
torch_dynamo_resume_in_forward_at_97
x = self.decoder(x, down_sampling_features)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/convert_frame.py", line 1110, in __call__
return hijacked_callback(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/convert_frame.py", line 948, in __call__
result = self._inner_convert(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/convert_frame.py", line 472, in __call__
return _compile(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_utils_internal.py", line 84, in wrapper_function
return StrobelightCompileTimeProfiler.profile_compile_time(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_strobelight/compile_time_profiler.py", line 129, in
profile_compile_time
return func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py",
line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/convert_frame.py", line 817, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner
out_code = transform_code_object(code, transform)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in
transform_code_object
transformations(instructions, code_options)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/convert_frame.py", line 178, in _fn
return fn(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/convert_frame.py", line 582, in transform
tracer.run()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run
super().run()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/symbolic_convert.py", line 893, in run
while self.step():
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/symbolic_convert.py", line 805, in step
self.dispatch_table[inst.opcode](self, inst)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/symbolic_convert.py", line 2642, in
RETURN_VALUE
self._return(inst)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return
self.output.compile_subgraph(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/output_graph.py", line 1098, in
compile_subgraph
self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py",
line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/output_graph.py", line 1318, in
compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/output_graph.py", line 1409, in
call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/output_graph.py", line 1390, in
call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/backends/distributed.py", line 565, in compile_fn
return self.backend_compile_fn(gm, example_inputs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
compiled_gm = compiler_fn(gm, example_inputs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/__init__.py", line 1951, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py",
line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx
return aot_autograd(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/backends/common.py", line 69, in __call__
cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_functorch/aot_autograd.py", line 954, in
aot_module_simplified
compiled_fn, _ = create_aot_dispatcher_function(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_functorch/aot_autograd.py", line 687, in
create_aot_dispatcher_function
compiled_fn, fw_metadata = compiler_fn(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py",
line 461, in aot_dispatch_autograd
compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/compile_fx.py", line 1410, in
fw_compiler_base
return inner_compile(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/repro/after_aot.py", line 84, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py",
line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py",
line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/compile_fx.py", line 527, in
compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py",
line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/compile_fx.py", line 831, in
fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/graph.py", line 1751, in compile_to_fn
return self.compile_to_module().call
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/graph.py", line 1680, in compile_to_module
self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/graph.py", line 1640, in codegen
self.scheduler.codegen()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/scheduler.py", line 2741, in codegen
self.get_backend(device).codegen_node(node)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line
69, in codegen_node
return self._triton_scheduling.codegen_node(node)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/codegen/simd.py", line 1148, in codegen_node
return self.codegen_node_schedule(node_schedule, buf_accesses, numel,
rnumel)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/codegen/simd.py", line 1317, in
codegen_node_schedule
src_code = kernel.codegen_kernel()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/codegen/triton.py", line 2159, in
codegen_kernel
**self.inductor_meta_common(),
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/_inductor/codegen/triton.py", line 2047, in
inductor_meta_common
"backend_hash": torch.utils._triton.triton_hash_with_backend(),
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/utils/_triton.py", line 63, in
triton_hash_with_backend
backend = triton_backend()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/torch/utils/_triton.py", line 49, in triton_backend
target = driver.active.get_current_target()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/triton/runtime/driver.py", line 23, in __getattr__
self._initialize_obj()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/triton/runtime/driver.py", line 9, in _create_driver
return actives[0]()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
self.utils = CudaUtils() # TODO: make static
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
mod = compile_module_from_src(Path(os.path.join(dirname,
"driver.c")).read_text(), "cuda_utils")
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/triton/backends/nvidia/driver.py", line 57, in
compile_module_from_src
so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/
site-packages/triton/runtime/build.py", line 48, in _build
ret = subprocess.check_call(cc_cmd)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/subprocess.py",
line 369, in check_call
raise CalledProcessError(retcode, cmd)
torch._dynamo.exc.BackendCompilerFailed: backend='compile_fn' raised:
CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmptthkhvx/main.c',
'-O3', '-shared', '-fPIC', '-o',
'/tmp/tmptthkhvx/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda',
'-L/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/backends/nvidia/lib',
'-L/lib64',
'-I/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/backends/nvidia/include',
'-I/tmp/tmptthkhvx',
'-I/spshared/apps/miniconda3/envs/spisonet/include/python3.10']' returned non-zero exit status 1.
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
*Thanks for your help!*
--
Yuntao Liu, Postdoc.
California NanoSystem Institute
University of California Los Angeles
|
Thanks for your reply! After commenting out these lines, I reran this step, but nothing seems to have changed. The error output is as follows:
spisonet.py reconstruct emd_8731_half_map_1.mrc emd_8731_half_map_2.mrc --aniso_file FSC3D.mrc --mask emd_8731_msk_1.mrc --limit_res 3.5 --epochs 30 --alpha 1 --beta 0.5 --output_dir isonet_maps --gpuID 0,1,2,3 --acc_batches 2
-- Process 3 terminated with the following error:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
Thanks for your help! |
I have the same issue. Is there any update? |
Hi, I do not have much insight into this and cannot reproduce the error. The message says that stdatomic.h cannot be found. This file should be part of the C standard library. There are two possibilities: either your system has an outdated C library, or it has the header but the compiler cannot find it. On my Ubuntu system, stdatomic.h is in /usr/lib/gcc/x86_64-linux-gnu/11/include, and I think it is picked up whenever I compile C code with -std=c11. I would expect a similar error to occur when installing PyTorch. I am not sure, but maybe installing a newer compiler will help? |
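One way to test whether the compiler can find the header is to compile a tiny probe file with the same compiler that appears in the failing command (`/usr/bin/gcc` in the log above). The following is a minimal sketch using only the Python standard library; the function name `compiler_finds_stdatomic` and the probe file name are made up for illustration.

```python
import os
import shutil
import subprocess
import tempfile


def compiler_finds_stdatomic(cc: str = "gcc"):
    """Check whether `cc` can compile a file that includes <stdatomic.h>.

    Returns True or False depending on the compiler's exit status,
    or None when `cc` is not found on PATH.
    """
    cc_path = shutil.which(cc)
    if cc_path is None:
        return None
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "probe.c")
        with open(src, "w") as handle:
            handle.write("#include <stdatomic.h>\nint main(void) { return 0; }\n")
        # -fsyntax-only asks the compiler to check the source without
        # writing any output file.
        result = subprocess.run(
            [cc_path, "-fsyntax-only", src],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0


if __name__ == "__main__":
    print(compiler_finds_stdatomic("gcc"))
```

If this reports False for `/usr/bin/gcc` but True for a newer compiler installed elsewhere, that would point to the system compiler being too old rather than to a problem in spIsoNet itself.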
The data and command are the same as in the tutorial, but when running Anisotropy Correction of half-maps, it reports the following problem:
spisonet.py reconstruct emd_8731_half_map_1.mrc emd_8731_half_map_2.mrc --aniso_file FSC3D.mrc --mask emd_8731_msk_1.mrc --limit_res 3.5 --epochs 30 --alpha 1 --beta 0.5 --output_dir isonet_maps --gpuID 0,1,2,3 --acc_batches 2
11-20 00:52:42, INFO voxel_size 1.309999942779541
11-20 00:52:43, INFO spIsoNet correction until resolution 3.5A!
Information beyond 3.5A remains unchanged
11-20 00:52:57, INFO Start preparing subvolumes!
11-20 00:53:06, INFO Done preparing subvolumes!
11-20 00:53:06, INFO Start training!
11-20 00:53:09, INFO Port number: 51405
learning rate 0.0003
['isonet_maps/emd_8731_half_map_1_data', 'isonet_maps/emd_8731_half_map_2_data']
0%| | 0/125 [00:00<?, ?batch/s][rank1]:W1120 00:53:26.205000 139648681686848 torch/_logging/_internal.py:1034] [0/0] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[rank3]:W1120 00:53:26.225000 140587963545408 torch/_logging/_internal.py:1034] [0/0] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[rank0]:W1120 00:53:26.263000 140600396216128 torch/_logging/_internal.py:1034] [0/0] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[rank2]:W1120 00:53:26.357000 139692869912384 torch/_logging/_internal.py:1034] [0/0] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
/tmp/tmpb9ffwjno/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpl5yntb4i/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmph2p9sytq/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpbvg8egds/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmph4ckum7q/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpmj2mj6b6/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpowqgpc9/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmp_vj7apqe/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmptthkhvx/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpjo5sie2e/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmpqvdup2d8/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/tmp/tmp8xzss5bc/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
0%| | 0/125 [00:08<?, ?batch/s]
W1120 00:53:33.367000 139899853596480 torch/multiprocessing/spawn.py:146] Terminating process 40535 via signal SIGTERM
W1120 00:53:33.367000 139899853596480 torch/multiprocessing/spawn.py:146] Terminating process 47366 via signal SIGTERM
W1120 00:53:33.368000 139899853596480 torch/multiprocessing/spawn.py:146] Terminating process 47503 via signal SIGTERM
Traceback (most recent call last):
File "/spshared/apps/miniconda3/envs/spisonet/bin/spisonet.py", line 8, in <module>
sys.exit(main())
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/spisonet.py", line 549, in main
fire.Fire(ISONET)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/spisonet.py", line 182, in reconstruct
map_refine_n2n(halfmap1,halfmap2, mask_vol, fsc3d, alpha = alpha,beta=beta, voxel_size=voxel_size, output_dir=output_dir,
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/bin/map_refine.py", line 145, in map_refine_n2n
network.train([data_dir_1,data_dir_2], output_dir, alpha=alpha,beta=beta, output_base=output_base0, batch_size=batch_size, epochs = epochs, steps_per_epoch = 1000,
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/network_n2n.py", line 265, in train
mp.spawn(ddp_train, args=(self.world_size, self.port_number, self.model,alpha,beta,
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 282, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 238, in start_processes
while not context.join():
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 189, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 76, in _wrap
fn(i, *args)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/network_n2n.py", line 116, in ddp_train
preds = model(x1)# + noise.cuda())
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
return fn(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 38, in inner
return fn(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1636, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1454, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/unet.py", line 97, in forward
x, down_sampling_features = self.encoder(x)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/spIsoNet/models/unet.py", line 98, in torch_dynamo_resume_in_forward_at_97
x = self.decoder(x, down_sampling_features)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1110, in __call__
return hijacked_callback(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 948, in __call__
result = self._inner_convert(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 472, in __call__
return _compile(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_utils_internal.py", line 84, in wrapper_function
return StrobelightCompileTimeProfiler.profile_compile_time(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
return func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 817, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner
out_code = transform_code_object(code, transform)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object
transformations(instructions, code_options)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 178, in _fn
return fn(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 582, in transform
tracer.run()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run
super().run()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 893, in run
while self.step():
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 805, in step
self.dispatch_table[inst.opcode](self, inst)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2642, in RETURN_VALUE
self._return(inst)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return
self.output.compile_subgraph(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1098, in compile_subgraph
self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1318, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1409, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1390, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/backends/distributed.py", line 565, in compile_fn
return self.backend_compile_fn(gm, example_inputs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
compiled_gm = compiler_fn(gm, example_inputs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/__init__.py", line 1951, in __call__
return compile_fx(model, inputs, config_patches=self.config)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx
return aot_autograd(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 69, in __call__
cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified
compiled_fn, _ = create_aot_dispatcher_function(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 687, in create_aot_dispatcher_function
compiled_fn, fw_metadata = compiler_fn(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 461, in aot_dispatch_autograd
compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1410, in fw_compiler_base
return inner_compile(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 84, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 527, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 831, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1751, in compile_to_fn
return self.compile_to_module().call
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1680, in compile_to_module
self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1640, in codegen
self.scheduler.codegen()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 2741, in codegen
self.get_backend(device).codegen_node(node)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line 69, in codegen_node
return self._triton_scheduling.codegen_node(node)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codegen/simd.py", line 1148, in codegen_node
return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codegen/simd.py", line 1317, in codegen_node_schedule
src_code = kernel.codegen_kernel()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codegen/triton.py", line 2159, in codegen_kernel
**self.inductor_meta_common(),
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/_inductor/codegen/triton.py", line 2047, in inductor_meta_common
"backend_hash": torch.utils._triton.triton_hash_with_backend(),
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/utils/_triton.py", line 63, in triton_hash_with_backend
backend = triton_backend()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/torch/utils/_triton.py", line 49, in triton_backend
target = driver.active.get_current_target()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
self._initialize_obj()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
return actives[0]()
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
self.utils = CudaUtils() # TODO: make static
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
so = build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/runtime/build.py", line 48, in build
ret = subprocess.check_call(cc_cmd)
File "/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
torch._dynamo.exc.BackendCompilerFailed: backend='compile_fn' raised:
CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmptthkhvx/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmptthkhvx/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/backends/nvidia/lib', '-L/lib64', '-I/spshared/apps/miniconda3/envs/spisonet/lib/python3.10/site-packages/triton/backends/nvidia/include', '-I/tmp/tmptthkhvx', '-I/spshared/apps/miniconda3/envs/spisonet/include/python3.10']' returned non-zero exit status 1.
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
Thanks for your help!
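The traceback above only reports that the `gcc` command exited with status 1; the compiler's own message is not shown. As a rough diagnostic sketch, the snippet below builds a trivial shared object with similar flags (`-shared -fPIC -lcuda -L/lib64`) and returns the compiler's stderr. `check_c_toolchain` is a hypothetical helper written for this illustration; it is not part of spIsoNet, PyTorch, or Triton. It leaves out the `-I` include paths from the real command, so it can only reveal a missing compiler or a linker that cannot find `libcuda`, not a header problem.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def check_c_toolchain(compiler="gcc", libraries=("cuda",), library_dirs=("/lib64",)):
    """Build a tiny shared object with flags similar to the failing command.

    Hypothetical helper for illustration only.
    Returns (ok, message); on failure, message holds the compiler's stderr.
    """
    cc = shutil.which(compiler)
    if cc is None:
        return False, f"{compiler} not found on PATH"

    with tempfile.TemporaryDirectory() as tmpdir:
        src = Path(tmpdir) / "main.c"
        src.write_text("int probe(void) { return 0; }\n")
        out = Path(tmpdir) / "probe.so"

        # Mirror the shape of the command in the traceback, minus the -I paths.
        cmd = [cc, str(src), "-O3", "-shared", "-fPIC", "-o", str(out)]
        cmd += [f"-l{lib}" for lib in libraries]
        cmd += [f"-L{d}" for d in library_dirs]

        result = subprocess.run(cmd, capture_output=True, text=True)

    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, "toolchain OK"


if __name__ == "__main__":
    ok, message = check_c_toolchain()
    print("OK" if ok else "FAILED", message, sep=": ")
```

Running this inside the same conda environment and on the same node as the failing job should surface the actual `gcc` or linker message. If it reports success, the cause is more likely in the parts of the real command this sketch omits.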