Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9 #953

Open · rafaeltiveron opened this issue Jun 26, 2024 · 7 comments

rafaeltiveron commented Jun 26, 2024

I'm hitting compatibility problems between TensorFlow, CUDA, and jaxlib, and I don't know how to solve them. I'm using CUDA 12.3.0 and the nvidia/cuda:12.3-base-ubuntu20.04 image.

I0626 18:28:37.506076 138088085884928 run_docker.py:116] Mounting /mnt/sdb1/backup/alphafold_databases/uniref90 -> /mnt/sdb1/backup/alphafold_mountpoint/uniref90_database_path
I0626 18:28:37.506205 138088085884928 run_docker.py:116] Mounting /mnt/sdb1/backup/alphafold_databases/mgnify -> /mnt/sdb1/backup/alphafold_mountpoint/mgnify_database_path
I0626 18:28:37.506297 138088085884928 run_docker.py:116] Mounting /mnt/sdb1/backup/alphafold_databases -> /mnt/sdb1/backup/alphafold_mountpoint/data_dir
I0626 18:28:37.506383 138088085884928 run_docker.py:116] Mounting /mnt/sdb1/backup/alphafold_databases/pdb_mmcif/mmcif_files -> /mnt/sdb1/backup/alphafold_mountpoint/template_mmcif_dir
I0626 18:28:37.506479 138088085884928 run_docker.py:116] Mounting /mnt/sdb1/backup/alphafold_databases/pdb_mmcif -> /mnt/sdb1/backup/alphafold_mountpoint/obsolete_pdbs_path
I0626 18:28:37.506588 138088085884928 run_docker.py:116] Mounting /mnt/sdb1/backup/alphafold_databases/pdb70 -> /mnt/sdb1/backup/alphafold_mountpoint/pdb70_database_path
I0626 18:28:37.506686 138088085884928 run_docker.py:116] Mounting /mnt/sdb1/backup/alphafold_databases/small_bfd -> /mnt/sdb1/backup/alphafold_mountpoint/small_bfd_database_path
I0626 18:28:42.291707 138088085884928 run_docker.py:258] I0626 21:28:42.290248 133795469678400 templates.py:858] Using precomputed obsolete pdbs /mnt/sdb1/backup/alphafold_mountpoint/obsolete_pdbs_path/obsolete.dat.
I0626 18:28:44.780305 138088085884928 run_docker.py:258] I0626 21:28:44.779594 133795469678400 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0626 18:28:44.968280 138088085884928 run_docker.py:258] I0626 21:28:44.967394 133795469678400 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I0626 18:28:44.968665 138088085884928 run_docker.py:258] I0626 21:28:44.968128 133795469678400 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0626 18:28:44.968835 138088085884928 run_docker.py:258] I0626 21:28:44.968231 133795469678400 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I0626 18:28:58.638173 138088085884928 run_docker.py:258] I0626 21:28:58.637206 133795469678400 run_alphafold.py:524] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0626 18:28:58.638509 138088085884928 run_docker.py:258] I0626 21:28:58.637522 133795469678400 run_alphafold.py:538] Using random seed 703549940514398884 for the data pipeline
I0626 18:28:58.638677 138088085884928 run_docker.py:258] I0626 21:28:58.638001 133795469678400 run_alphafold.py:245] Predicting teste
I0626 18:28:58.639462 138088085884928 run_docker.py:258] I0626 21:28:58.638904 133795469678400 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpuc2_2bha/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/sdb1/backup/alphafold_mountpoint/fasta_path_0/teste.faa /mnt/sdb1/backup/alphafold_mountpoint/uniref90_database_path/uniref90.fasta"
I0626 18:28:58.640388 138088085884928 run_docker.py:258] I0626 21:28:58.639911 133795469678400 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0626 18:39:55.007368 138088085884928 run_docker.py:258] I0626 21:39:55.006375 133795469678400 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 656.366 seconds
I0626 18:40:01.191493 138088085884928 run_docker.py:258] I0626 21:40:01.190375 133795469678400 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpllsuut3p/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/sdb1/backup/alphafold_mountpoint/fasta_path_0/teste.faa /mnt/sdb1/backup/alphafold_mountpoint/mgnify_database_path/mgy_clusters_2022_05.fa"
I0626 18:40:01.192113 138088085884928 run_docker.py:258] I0626 21:40:01.191634 133795469678400 utils.py:36] Started Jackhmmer (mgy_clusters_2022_05.fa) query
I0626 18:56:02.226763 138088085884928 run_docker.py:258] I0626 21:56:02.225653 133795469678400 utils.py:40] Finished Jackhmmer (mgy_clusters_2022_05.fa) query in 961.034 seconds
I0626 18:56:10.551476 138088085884928 run_docker.py:258] I0626 21:56:10.550518 133795469678400 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpcedwwkll/query.a3m -o /tmp/tmpcedwwkll/output.hhr -maxseq 1000000 -d /mnt/sdb1/backup/alphafold_mountpoint/pdb70_database_path/pdb70"
I0626 18:56:10.552288 138088085884928 run_docker.py:258] I0626 21:56:10.551964 133795469678400 utils.py:36] Started HHsearch query
I0626 18:57:57.670601 138088085884928 run_docker.py:258] I0626 21:57:57.669543 133795469678400 utils.py:40] Finished HHsearch query in 107.117 seconds
I0626 18:58:02.625957 138088085884928 run_docker.py:258] I0626 21:58:02.625021 133795469678400 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpt6dq4gsr/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/sdb1/backup/alphafold_mountpoint/fasta_path_0/teste.faa /mnt/sdb1/backup/alphafold_mountpoint/small_bfd_database_path/bfd-first_non_consensus_sequences.fasta"
I0626 18:58:02.626496 138088085884928 run_docker.py:258] I0626 21:58:02.626078 133795469678400 utils.py:36] Started Jackhmmer (bfd-first_non_consensus_sequences.fasta) query
I0626 19:00:15.475680 138088085884928 run_docker.py:258] I0626 22:00:15.474710 133795469678400 utils.py:40] Finished Jackhmmer (bfd-first_non_consensus_sequences.fasta) query in 132.849 seconds
I0626 19:00:17.375041 138088085884928 run_docker.py:258] I0626 22:00:17.373987 133795469678400 templates.py:879] Searching for template for: MDGQNPKPLPLTSLLLPVSIAVDLVARKLVWADATRGTIESLDLVTVFTGTPFIVQQIKSHVSSVSAAHNEIHWTSRNNSSLECIDISAEPHVRRSVLLPTGRNGSYSRRVLIASQVPDFEPGPCGRNNGGCSHTCLPVRTTERACFCPPGMALGTNNITCRVENGTCRPHELPCAGGCIAATYWCDGHKDCADNADEATCGTTCPPKDFTCQNGKCIDTAWRCDGYDDCDDQSDEANCPYRTCASDQFNCKSGACLPFYWRCDGANDCPDGDDELDCRTLRCPNGHDRCANGQCIPRDWSCDGHADCSDSSDETNCTETVSCFEDDFHCANGQCIDKRLRCDHDEDCEDNSDESGCDYAPANNSKCVKGMVGCGDGQCVYTHDMCDGYADCHNGWDEHNCSAPVCQSAEFFCPGTKRCILQSWLCDGDDDCGDAMDELLARCRPTTLPPPTDAPCWSDQFQCGSHECIAWSSVCDGRSDCADFSDEGSHCEHHCATANGGCAHICRESPSGPQCSCRPGYRLNNDRKSCDDIDECLSPGHCSHFCQNSKGGYK
I0626 19:00:17.948528 138088085884928 run_docker.py:258] I0626 22:00:17.947738 133795469678400 templates.py:267] Found an exact template match 3m0c_C.
I0626 19:00:18.206189 138088085884928 run_docker.py:258] I0626 22:00:18.205397 133795469678400 templates.py:267] Found an exact template match 1n7d_A.
I0626 19:00:18.226388 138088085884928 run_docker.py:258] I0626 22:00:18.225925 133795469678400 templates.py:267] Found an exact template match 3m0c_C.
I0626 19:00:18.244172 138088085884928 run_docker.py:258] I0626 22:00:18.243780 133795469678400 templates.py:267] Found an exact template match 1n7d_A.
I0626 19:00:18.264382 138088085884928 run_docker.py:258] I0626 22:00:18.264040 133795469678400 templates.py:267] Found an exact template match 3m0c_C.
I0626 19:00:18.280781 138088085884928 run_docker.py:258] I0626 22:00:18.280326 133795469678400 templates.py:267] Found an exact template match 3m0c_C.
I0626 19:00:18.296232 138088085884928 run_docker.py:258] I0626 22:00:18.295779 133795469678400 templates.py:913] Skipped invalid hit 3M0C_C Proprotein convertase subtilisin/kexin type 9; PROTEIN COMPLEX, BETA PROPELLER, CHOLESTEROL; 7.01A {Homo sapiens}, error: None, warning: 3m0c_C (sum_probs: 150.4, rank: 5): feature extracting errors: Template all atom mask was all zeros: 3m0c_C. Residue range: 21-225, mmCIF parsing errors: {}
I0626 19:00:18.296375 138088085884928 run_docker.py:258] I0626 22:00:18.296076 133795469678400 templates.py:267] Found an exact template match 1n7d_A.
I0626 19:00:18.915404 138088085884928 run_docker.py:258] I0626 22:00:18.914566 133795469678400 templates.py:267] Found an exact template match 4a0p_A.
I0626 19:00:19.795932 138088085884928 run_docker.py:258] I0626 22:00:19.795142 133795469678400 templates.py:267] Found an exact template match 6h15_B.
I0626 19:00:20.890163 138088085884928 run_docker.py:258] I0626 22:00:20.889332 133795469678400 templates.py:267] Found an exact template match 5b4x_D.
I0626 19:00:21.271526 138088085884928 run_docker.py:258] I0626 22:00:21.270835 133795469678400 templates.py:267] Found an exact template match 3p5c_L.
I0626 19:00:21.617020 138088085884928 run_docker.py:258] I0626 22:00:21.616351 133795469678400 templates.py:267] Found an exact template match 3p5b_L.
I0626 19:00:21.630503 138088085884928 run_docker.py:258] I0626 22:00:21.630137 133795469678400 templates.py:267] Found an exact template match 4a0p_A.
I0626 19:00:21.653079 138088085884928 run_docker.py:258] I0626 22:00:21.652633 133795469678400 templates.py:267] Found an exact template match 6h15_B.
I0626 19:00:21.919118 138088085884928 run_docker.py:258] I0626 22:00:21.918352 133795469678400 templates.py:267] Found an exact template match 4dg6_A.
I0626 19:00:25.451145 138088085884928 run_docker.py:258] I0626 22:00:25.450296 133795469678400 templates.py:267] Found an exact template match 5o32_H.
I0626 19:00:25.463952 138088085884928 run_docker.py:258] I0626 22:00:25.463468 133795469678400 templates.py:267] Found an exact template match 4dg6_A.
I0626 19:00:26.142724 138088085884928 run_docker.py:258] I0626 22:00:26.141979 133795469678400 templates.py:267] Found an exact template match 3s94_A.
I0626 19:00:26.607441 138088085884928 run_docker.py:258] I0626 22:00:26.606480 133795469678400 templates.py:267] Found an exact template match 3sob_B.
I0626 19:00:26.864853 138088085884928 run_docker.py:258] I0626 22:00:26.864096 133795469678400 templates.py:267] Found an exact template match 3sov_A.
I0626 19:00:26.876659 138088085884928 run_docker.py:258] I0626 22:00:26.876217 133795469678400 templates.py:267] Found an exact template match 3s94_A.
I0626 19:00:28.133055 138088085884928 run_docker.py:258] I0626 22:00:28.132189 133795469678400 pipeline.py:234] Uniref90 MSA size: 10000 sequences.
I0626 19:00:28.133450 138088085884928 run_docker.py:258] I0626 22:00:28.132347 133795469678400 pipeline.py:235] BFD MSA size: 5659 sequences.
I0626 19:00:28.133652 138088085884928 run_docker.py:258] I0626 22:00:28.132400 133795469678400 pipeline.py:236] MGnify MSA size: 501 sequences.
I0626 19:00:28.133876 138088085884928 run_docker.py:258] I0626 22:00:28.132448 133795469678400 pipeline.py:237] Final (deduplicated) MSA size: 13266 sequences.
I0626 19:00:28.134085 138088085884928 run_docker.py:258] I0626 22:00:28.132694 133795469678400 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0626 19:00:28.249109 138088085884928 run_docker.py:258] I0626 22:00:28.248342 133795469678400 run_alphafold.py:276] Running model model_1_pred_0 on teste
I0626 19:00:33.687656 138088085884928 run_docker.py:258] 2024-06-26 22:00:33.683810: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 646744032 exceeds 10% of free system memory.
I0626 19:00:33.700572 138088085884928 run_docker.py:258] 2024-06-26 22:00:33.699746: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 650249744 exceeds 10% of free system memory.
I0626 19:00:34.524705 138088085884928 run_docker.py:258] 2024-06-26 22:00:34.522261: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 650249744 exceeds 10% of free system memory.
I0626 19:00:34.999829 138088085884928 run_docker.py:258] 2024-06-26 22:00:34.996149: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 650249744 exceeds 10% of free system memory.
I0626 19:00:35.482020 138088085884928 run_docker.py:258] 2024-06-26 22:00:35.480341: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 650249744 exceeds 10% of free system memory.
I0626 19:00:35.981338 138088085884928 run_docker.py:258] I0626 22:00:35.980508 133795469678400 model.py:165] Running predict with shape(feat) = {'aatype': (4, 554), 'residue_index': (4, 554), 'seq_length': (4,), 'template_aatype': (4, 4, 554), 'template_all_atom_masks': (4, 4, 554, 37), 'template_all_atom_positions': (4, 4, 554, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 554), 'msa_mask': (4, 508, 554), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 554, 3), 'template_pseudo_beta_mask': (4, 4, 554), 'atom14_atom_exists': (4, 554, 14), 'residx_atom14_to_atom37': (4, 554, 14), 'residx_atom37_to_atom14': (4, 554, 37), 'atom37_atom_exists': (4, 554, 37), 'extra_msa': (4, 5120, 554), 'extra_msa_mask': (4, 5120, 554), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 554), 'true_msa': (4, 508, 554), 'extra_has_deletion': (4, 5120, 554), 'extra_deletion_value': (4, 5120, 554), 'msa_feat': (4, 508, 554, 49), 'target_feat': (4, 554, 22)}
I0626 19:00:36.020103 138088085884928 run_docker.py:258] 2024-06-26 22:00:36.019399: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9
I0626 19:00:36.020381 138088085884928 run_docker.py:258] 2024-06-26 22:00:36.019477: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
I0626 19:00:36.042867 138088085884928 run_docker.py:258] 2024-06-26 22:00:36.042181: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I0626 19:00:36.042988 138088085884928 run_docker.py:258] 2024-06-26 22:00:36.042287: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
I0626 19:00:36.043369 138088085884928 run_docker.py:258] Traceback (most recent call last):
I0626 19:00:36.043655 138088085884928 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 570, in <module>
I0626 19:00:36.043834 138088085884928 run_docker.py:258] app.run(main)
I0626 19:00:36.043925 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 312, in run
I0626 19:00:36.044022 138088085884928 run_docker.py:258] _run_main(main, args)
I0626 19:00:36.044106 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 258, in _run_main
I0626 19:00:36.044203 138088085884928 run_docker.py:258] sys.exit(main(argv))
I0626 19:00:36.044284 138088085884928 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 543, in main
I0626 19:00:36.044366 138088085884928 run_docker.py:258] predict_structure(
I0626 19:00:36.044445 138088085884928 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 284, in predict_structure
I0626 19:00:36.044534 138088085884928 run_docker.py:258] prediction_result = model_runner.predict(processed_feature_dict,
I0626 19:00:36.044613 138088085884928 run_docker.py:258] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0626 19:00:36.044701 138088085884928 run_docker.py:258] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0626 19:00:36.044782 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/random.py", line 132, in PRNGKey
I0626 19:00:36.044874 138088085884928 run_docker.py:258] key = prng.seed_with_impl(impl, seed)
I0626 19:00:36.044963 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
I0626 19:00:36.045042 138088085884928 run_docker.py:258] return random_seed(seed, impl=impl)
I0626 19:00:36.045121 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 580, in random_seed
I0626 19:00:36.045211 138088085884928 run_docker.py:258] return random_seed_p.bind(seeds_arr, impl=impl)
I0626 19:00:36.045290 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 329, in bind
I0626 19:00:36.045368 138088085884928 run_docker.py:258] return self.bind_with_trace(find_top_trace(args), args, params)
I0626 19:00:36.045447 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 332, in bind_with_trace
I0626 19:00:36.045540 138088085884928 run_docker.py:258] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0626 19:00:36.045619 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 712, in process_primitive
I0626 19:00:36.045696 138088085884928 run_docker.py:258] return primitive.impl(*tracers, **params)
I0626 19:00:36.045773 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 592, in random_seed_impl
I0626 19:00:36.045865 138088085884928 run_docker.py:258] base_arr = random_seed_impl_base(seeds, impl=impl)
I0626 19:00:36.045945 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base
I0626 19:00:36.046033 138088085884928 run_docker.py:258] return seed(seeds)
I0626 19:00:36.046113 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 832, in threefry_seed
I0626 19:00:36.046206 138088085884928 run_docker.py:258] lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
I0626 19:00:36.046286 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical
I0626 19:00:36.046365 138088085884928 run_docker.py:258] return shift_right_logical_p.bind(x, y)
I0626 19:00:36.046445 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 329, in bind
I0626 19:00:36.046524 138088085884928 run_docker.py:258] return self.bind_with_trace(find_top_trace(args), args, params)
I0626 19:00:36.046602 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 332, in bind_with_trace
I0626 19:00:36.046698 138088085884928 run_docker.py:258] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0626 19:00:36.046777 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 712, in process_primitive
I0626 19:00:36.046856 138088085884928 run_docker.py:258] return primitive.impl(*tracers, **params)
I0626 19:00:36.046934 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive
I0626 19:00:36.047012 138088085884928 run_docker.py:258] return compiled_fun(*args)
I0626 19:00:36.047091 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 200, in <lambda>
I0626 19:00:36.047170 138088085884928 run_docker.py:258] return lambda *args, **kw: compiled(*args, **kw)[0]
I0626 19:00:36.047248 138088085884928 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
I0626 19:00:36.047358 138088085884928 run_docker.py:258] out_flat = compiled.execute(in_flat)
I0626 19:00:36.047441 138088085884928 run_docker.py:258] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function

ocstx commented Jul 3, 2024

I just hit the same problem on a machine running AlphaFold 2.2.4 with an RTX 4090. Another machine running 2.2.2 with an RTX 3080 works, and a third running 2.3.2 with a GTX 1080 also works.
All three run the same OS (AlmaLinux 8.10, fully updated) with exactly the same installations.
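This split lines up with each card's compute capability (CC). The sketch below is a hypothetical helper, not part of AlphaFold, JAX, or XLA: it hard-codes the CCs of the GPUs mentioned in this thread and the first CUDA release whose toolchain (including ptxas) can target each CC. CC 8.9 (Ada Lovelace: RTX 4090, and the RTX 4060 reported later in this thread) first became a supported target in CUDA 11.8. The ptxas release inside each container is an assumption here; confirm it with `ptxas --version`.

```python
# Hypothetical helper for this thread; not part of AlphaFold, JAX, or XLA.

# Compute capability (CC) of the GPUs mentioned in this issue.
GPU_CC = {
    "GTX 1080": (6, 1),  # Pascal
    "RTX 3080": (8, 6),  # Ampere
    "RTX 4060": (8, 9),  # Ada Lovelace
    "RTX 4090": (8, 9),  # Ada Lovelace
}

# First CUDA release whose toolchain (including ptxas) can target each CC.
MIN_CUDA_FOR_CC = {
    (6, 1): (8, 0),
    (8, 6): (11, 1),
    (8, 9): (11, 8),
}


def ptxas_supports(gpu: str, ptxas_release: tuple) -> bool:
    """True if a ptxas from CUDA `ptxas_release` (major, minor) can target `gpu`."""
    return ptxas_release >= MIN_CUDA_FOR_CC[GPU_CC[gpu]]


if __name__ == "__main__":
    assumed_release = (11, 1)  # assumption: ptxas shipped with CUDA 11.1
    for gpu, (major, minor) in GPU_CC.items():
        status = "supported" if ptxas_supports(gpu, assumed_release) else "not supported"
        print(f"{gpu} (CC {major}.{minor}): {status}")
```

Under that assumed 11.1 ptxas, only the two Ada cards come out unsupported, which would be consistent with the RTX 4090 failing while the RTX 3080 and GTX 1080 work. A ptxas from CUDA 11.8 or newer covers all four.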

rafaeltiveron commented Jul 3, 2024

Mine is an RTX 4060 on Ubuntu 20.04.

rafaeltiveron commented Jul 4, 2024

Update: this is a ptxas compatibility problem. ptxas version 11.0.221 is no longer compatible. See:

I0704 11:05:23.537240 130401123889152 run_docker.py:258] I0704 14:05:23.536448 130748471080768 model.py:165] Running predict with shape(feat) = {'aatype': (4, 554), 'residue_index': (4, 554), 'seq_length': (4,), 'template_aatype': (4, 4, 554), 'template_all_atom_masks': (4, 4, 554, 37), 'template_all_atom_positions': (4, 4, 554, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 554), 'msa_mask': (4, 508, 554), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 554, 3), 'template_pseudo_beta_mask': (4, 4, 554), 'atom14_atom_exists': (4, 554, 14), 'residx_atom14_to_atom37': (4, 554, 14), 'residx_atom37_to_atom14': (4, 554, 37), 'atom37_atom_exists': (4, 554, 37), 'extra_msa': (4, 5120, 554), 'extra_msa_mask': (4, 5120, 554), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 554), 'true_msa': (4, 508, 554), 'extra_has_deletion': (4, 5120, 554), 'extra_deletion_value': (4, 5120, 554), 'msa_feat': (4, 508, 554, 49), 'target_feat': (4, 554, 22)}
I0704 11:05:23.569453 130401123889152 run_docker.py:258] 2024-07-04 14:05:23.569057: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:114] *** WARNING *** You are using ptxas 11.0.221, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.
I0704 11:15:40.483330 133848093241344 run_docker.py:258] 
I0704 11:15:40.485312 133848093241344 run_docker.py:258] 2024-07-04 14:15:40.484941: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9

In my first post in this issue I was using a newer CUDA version; this is a new attempt with an older one, using the nvidia/cuda:11.0.3-base-ubuntu20.04 image.
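The two warnings above both come down to the ptxas version: releases before 11.1 are flagged as miscompiling XLA code, and ptxas 11.0 cannot target CC 8.9 (an RTX 4060), so XLA falls back to the CUDA driver. Below is a minimal sketch for checking a container's ptxas yourself, assuming `ptxas --version` prints a line of the form `Cuda compilation tools, release 11.0, V11.0.221`. The function names are hypothetical, and the thresholds encode the 11.1 cutoff quoted in the warning plus CUDA 11.8 as the first release supporting CC 8.9; this is not XLA's actual implementation.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches e.g. "Cuda compilation tools, release 11.0, V11.0.221"
_VERSION_RE = re.compile(r"release \d+\.\d+, V(\d+)\.(\d+)\.(\d+)")

# Cutoff quoted in the XLA warning: ptxas before 11.1 may miscompile XLA code.
MIN_SAFE_PTXAS = (11, 1, 0)
# CUDA 11.8 is the first release whose ptxas can target CC 8.9 (Ada Lovelace).
MIN_PTXAS_FOR_CC_8_9 = (11, 8, 0)

Version = Tuple[int, int, int]


def parse_ptxas_version(text: str) -> Optional[Version]:
    """Extract (major, minor, patch) from `ptxas --version` output, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def check_ptxas(version: Version, cc: Tuple[int, int]) -> list:
    """List the problems this ptxas version would cause on a GPU with CC `cc`."""
    problems = []
    if version < MIN_SAFE_PTXAS:
        problems.append("ptxas older than 11.1 may miscompile XLA code")
    if cc == (8, 9) and version < MIN_PTXAS_FOR_CC_8_9:
        problems.append("ptxas cannot target CC 8.9; XLA falls back to the CUDA driver")
    return problems


def local_ptxas_version() -> Optional[Version]:
    """Run whichever ptxas is first on PATH, if any, and parse its version."""
    exe = shutil.which("ptxas")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_ptxas_version(result.stdout)


if __name__ == "__main__":
    # The version reported in the warning above, checked for an RTX 4060 (CC 8.9).
    reported = parse_ptxas_version("Cuda compilation tools, release 11.0, V11.0.221")
    print(check_ptxas(reported, (8, 9)))
    print("ptxas on PATH:", local_ptxas_version())
```

Running `local_ptxas_version()` inside the container shows which ptxas XLA is likely to pick up, since the log line "Used ptxas at ptxas" suggests it resolved the bare name rather than a CUDA-specific path.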

rafaeltiveron commented Jul 4, 2024

Update: I'm now trying jax=0.4.14 and jaxlib=0.4.14+cuda11.cudnn86 in docker/Dockerfile, with the nvidia/cuda:11.8.0-base-ubuntu20.04 image. The error messages changed:

I0704 12:49:28.978974 135516872704000 run_docker.py:258] I0704 15:49:28.978063 127455589734208 model.py:165] Running predict with shape(feat) = {'aatype': (4, 554), 'residue_index': (4, 554), 'seq_length': (4,), 'template_aatype': (4, 4, 554), 'template_all_atom_masks': (4, 4, 554, 37), 'template_all_atom_positions': (4, 4, 554, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 554), 'msa_mask': (4, 508, 554), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 554, 3), 'template_pseudo_beta_mask': (4, 4, 554), 'atom14_atom_exists': (4, 554, 14), 'residx_atom14_to_atom37': (4, 554, 14), 'residx_atom37_to_atom14': (4, 554, 37), 'atom37_atom_exists': (4, 554, 37), 'extra_msa': (4, 5120, 554), 'extra_msa_mask': (4, 5120, 554), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 554), 'true_msa': (4, 508, 554), 'extra_has_deletion': (4, 5120, 554), 'extra_deletion_value': (4, 5120, 554), 'msa_feat': (4, 508, 554, 49), 'target_feat': (4, 554, 22)}
I0704 12:49:29.001860 135516872704000 run_docker.py:258] 2024-07-04 15:49:29.001439: E external/xla/xla/stream_executor/cuda/cuda_dnn.cc:445] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I0704 12:49:29.001980 135516872704000 run_docker.py:258] 2024-07-04 15:49:29.001503: E external/xla/xla/stream_executor/cuda/cuda_dnn.cc:449] Memory usage: 5668274176 bytes free, 8223653888 bytes total.
I0704 12:49:29.015192 135516872704000 run_docker.py:258] Traceback (most recent call last):
I0704 12:49:29.015326 135516872704000 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 570, in <module>
I0704 12:49:29.015546 135516872704000 run_docker.py:258] app.run(main)
I0704 12:49:29.015712 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 312, in run
I0704 12:49:29.015855 135516872704000 run_docker.py:258] _run_main(main, args)
I0704 12:49:29.015998 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 258, in _run_main
I0704 12:49:29.016181 135516872704000 run_docker.py:258] sys.exit(main(argv))
I0704 12:49:29.016333 135516872704000 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 543, in main
I0704 12:49:29.016473 135516872704000 run_docker.py:258] predict_structure(
I0704 12:49:29.016618 135516872704000 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 284, in predict_structure
I0704 12:49:29.016754 135516872704000 run_docker.py:258] prediction_result = model_runner.predict(processed_feature_dict,
I0704 12:49:29.016890 135516872704000 run_docker.py:258] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0704 12:49:29.017024 135516872704000 run_docker.py:258] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0704 12:49:29.017159 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/random.py", line 177, in PRNGKey
I0704 12:49:29.017295 135516872704000 run_docker.py:258] return _return_prng_keys(True, _key('PRNGKey', seed, impl))
I0704 12:49:29.017437 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/random.py", line 139, in _key
I0704 12:49:29.017572 135516872704000 run_docker.py:258] return prng.seed_with_impl(impl, seed)
I0704 12:49:29.017708 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 406, in seed_with_impl
I0704 12:49:29.017842 135516872704000 run_docker.py:258] return random_seed(seed, impl=impl)
I0704 12:49:29.017977 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 689, in random_seed
I0704 12:49:29.018167 135516872704000 run_docker.py:258] return random_seed_p.bind(seeds_arr, impl=impl)
I0704 12:49:29.018305 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 701, in random_seed_impl
I0704 12:49:29.018439 135516872704000 run_docker.py:258] base_arr = random_seed_impl_base(seeds, impl=impl)
I0704 12:49:29.018572 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 706, in random_seed_impl_base
I0704 12:49:29.018705 135516872704000 run_docker.py:258] return seed(seeds)
I0704 12:49:29.018847 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 935, in threefry_seed
I0704 12:49:29.018981 135516872704000 run_docker.py:258] return _threefry_seed(seed)
I0704 12:49:29.019113 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/traceback_util.py", line 166, in reraise_with_filtered_traceback
I0704 12:49:29.019246 135516872704000 run_docker.py:258] return fun(*args, **kwargs)
I0704 12:49:29.019407 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/pjit.py", line 253, in cache_miss
I0704 12:49:29.019543 135516872704000 run_docker.py:258] outs, out_flat, out_tree, args_flat, jaxpr = _python_pjit_helper(
I0704 12:49:29.019678 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/pjit.py", line 166, in _python_pjit_helper
I0704 12:49:29.019812 135516872704000 run_docker.py:258] out_flat = pjit_p.bind(*args_flat, **params)
I0704 12:49:29.019963 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 2596, in bind
I0704 12:49:29.020098 135516872704000 run_docker.py:258] return self.bind_with_trace(top_trace, args, params)
I0704 12:49:29.020233 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 389, in bind_with_trace
I0704 12:49:29.020367 135516872704000 run_docker.py:258] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0704 12:49:29.020500 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 821, in process_primitive
I0704 12:49:29.020633 135516872704000 run_docker.py:258] return primitive.impl(*tracers, **params)
I0704 12:49:29.020767 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/pjit.py", line 1209, in _pjit_call_impl
I0704 12:49:29.020902 135516872704000 run_docker.py:258] return xc._xla.pjit(name, f, call_impl_cache_miss, [], [], donated_argnums,
I0704 12:49:29.021036 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/pjit.py", line 1192, in call_impl_cache_miss
I0704 12:49:29.021247 135516872704000 run_docker.py:258] out_flat, compiled = _pjit_call_impl_python(
I0704 12:49:29.021384 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/pjit.py", line 1128, in _pjit_call_impl_python
I0704 12:49:29.021517 135516872704000 run_docker.py:258] always_lower=False, lowering_platform=None).compile()
I0704 12:49:29.021650 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py", line 2206, in compile
I0704 12:49:29.021784 135516872704000 run_docker.py:258] executable = UnloadedMeshExecutable.from_hlo(
I0704 12:49:29.021917 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py", line 2544, in from_hlo
I0704 12:49:29.022051 135516872704000 run_docker.py:258] xla_executable, compile_options = _cached_compilation(
I0704 12:49:29.022186 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py", line 2454, in _cached_compilation
I0704 12:49:29.022320 135516872704000 run_docker.py:258] xla_executable = dispatch.compile_or_get_cached(
I0704 12:49:29.022456 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 496, in compile_or_get_cached
I0704 12:49:29.022590 135516872704000 run_docker.py:258] return backend_compile(backend, computation, compile_options,
I0704 12:49:29.022725 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/profiler.py", line 314, in wrapper
I0704 12:49:29.022860 135516872704000 run_docker.py:258] return func(*args, **kwargs)
I0704 12:49:29.022995 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 464, in backend_compile
I0704 12:49:29.023130 135516872704000 run_docker.py:258] return backend.compile(built_c, compile_options=options)
I0704 12:49:29.023265 135516872704000 run_docker.py:258] jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
I0704 12:49:29.023427 135516872704000 run_docker.py:258] 
I0704 12:49:29.023565 135516872704000 run_docker.py:258] The stack trace below excludes JAX-internal frames.
I0704 12:49:29.023708 135516872704000 run_docker.py:258] The preceding is the original exception that occurred, unmodified.
I0704 12:49:29.023853 135516872704000 run_docker.py:258] 
I0704 12:49:29.023988 135516872704000 run_docker.py:258] --------------------
I0704 12:49:29.024125 135516872704000 run_docker.py:258] 
I0704 12:49:29.024260 135516872704000 run_docker.py:258] The above exception was the direct cause of the following exception:
I0704 12:49:29.024394 135516872704000 run_docker.py:258] 
I0704 12:49:29.024529 135516872704000 run_docker.py:258] Traceback (most recent call last):
I0704 12:49:29.024663 135516872704000 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 570, in <module>
I0704 12:49:29.024796 135516872704000 run_docker.py:258] app.run(main)
I0704 12:49:29.024932 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 312, in run
I0704 12:49:29.025067 135516872704000 run_docker.py:258] _run_main(main, args)
I0704 12:49:29.025202 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 258, in _run_main
I0704 12:49:29.025339 135516872704000 run_docker.py:258] sys.exit(main(argv))
I0704 12:49:29.025475 135516872704000 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 543, in main
I0704 12:49:29.025609 135516872704000 run_docker.py:258] predict_structure(
I0704 12:49:29.025745 135516872704000 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 284, in predict_structure
I0704 12:49:29.025880 135516872704000 run_docker.py:258] prediction_result = model_runner.predict(processed_feature_dict,
I0704 12:49:29.026014 135516872704000 run_docker.py:258] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0704 12:49:29.026148 135516872704000 run_docker.py:258] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0704 12:49:29.026283 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/random.py", line 177, in PRNGKey
I0704 12:49:29.026417 135516872704000 run_docker.py:258] return _return_prng_keys(True, _key('PRNGKey', seed, impl))
I0704 12:49:29.026550 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/random.py", line 139, in _key
I0704 12:49:29.026683 135516872704000 run_docker.py:258] return prng.seed_with_impl(impl, seed)
I0704 12:49:29.026816 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 406, in seed_with_impl
I0704 12:49:29.026991 135516872704000 run_docker.py:258] return random_seed(seed, impl=impl)
I0704 12:49:29.027127 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 689, in random_seed
I0704 12:49:29.027269 135516872704000 run_docker.py:258] return random_seed_p.bind(seeds_arr, impl=impl)
I0704 12:49:29.027416 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 386, in bind
I0704 12:49:29.027553 135516872704000 run_docker.py:258] return self.bind_with_trace(find_top_trace(args), args, params)
I0704 12:49:29.027688 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 389, in bind_with_trace
I0704 12:49:29.027823 135516872704000 run_docker.py:258] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0704 12:49:29.027958 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 821, in process_primitive
I0704 12:49:29.028106 135516872704000 run_docker.py:258] return primitive.impl(*tracers, **params)
I0704 12:49:29.028242 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 701, in random_seed_impl
I0704 12:49:29.028375 135516872704000 run_docker.py:258] base_arr = random_seed_impl_base(seeds, impl=impl)
I0704 12:49:29.028510 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 706, in random_seed_impl_base
I0704 12:49:29.028643 135516872704000 run_docker.py:258] return seed(seeds)
I0704 12:49:29.028788 135516872704000 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 935, in threefry_seed
I0704 12:49:29.028922 135516872704000 run_docker.py:258] return _threefry_seed(seed)
I0704 12:49:29.029057 135516872704000 run_docker.py:258] jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.

Same error with versions:

  • 0.4.13;
  • 0.4.12.

@rafaeltiveron
Author

With version 0.4.1:

I0704 18:08:18.171931 134394245103616 run_docker.py:258] I0704 21:08:18.170888 138079524632384 model.py:165] Running predict with shape(feat) = {'aatype': (4, 554), 'residue_index': (4, 554), 'seq_length': (4,), 'template_aatype': (4, 4, 554), 'template_all_atom_masks': (4, 4, 554, 37), 'template_all_atom_positions': (4, 4, 554, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 554), 'msa_mask': (4, 508, 554), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 554, 3), 'template_pseudo_beta_mask': (4, 4, 554), 'atom14_atom_exists': (4, 554, 14), 'residx_atom14_to_atom37': (4, 554, 14), 'residx_atom37_to_atom14': (4, 554, 37), 'atom37_atom_exists': (4, 554, 37), 'extra_msa': (4, 5120, 554), 'extra_msa_mask': (4, 5120, 554), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 554), 'true_msa': (4, 508, 554), 'extra_has_deletion': (4, 5120, 554), 'extra_deletion_value': (4, 5120, 554), 'msa_feat': (4, 508, 554, 49), 'target_feat': (4, 554, 22)}
I0704 18:08:18.205620 134394245103616 run_docker.py:258] 2024-07-04 21:08:18.205061: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:109] Couldn't get ptxas version : FAILED_PRECONDITION: Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
I0704 18:08:18.206917 134394245103616 run_docker.py:258] 2024-07-04 21:08:18.206386: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:451] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
I0704 18:08:18.207054 134394245103616 run_docker.py:258] Fatal Python error: Aborted
I0704 18:08:18.207155 134394245103616 run_docker.py:258] 
I0704 18:08:18.207271 134394245103616 run_docker.py:258] Thread 0x00007d9525043740 (most recent call first):
I0704 18:08:18.207398 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1014 in backend_compile
I0704 18:08:18.207481 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/profiler.py", line 314 in wrapper
I0704 18:08:18.207566 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1079 in compile_or_get_cached
I0704 18:08:18.207675 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/interpreters/pxla.py", line 3439 in from_hlo
I0704 18:08:18.207759 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/interpreters/pxla.py", line 3170 in _compile_unloaded
I0704 18:08:18.207827 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/interpreters/pxla.py", line 3202 in compile
I0704 18:08:18.207895 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 359 in _xla_callable_uncached
I0704 18:08:18.207963 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 202 in xla_primitive_callable
I0704 18:08:18.208030 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/util.py", line 247 in cached
I0704 18:08:18.208098 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/util.py", line 254 in wrapper
I0704 18:08:18.208165 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 118 in apply_primitive
I0704 18:08:18.208233 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 712 in process_primitive
I0704 18:08:18.208317 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 332 in bind_with_trace
I0704 18:08:18.208404 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 329 in bind
I0704 18:08:18.208493 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/lax/lax.py", line 509 in shift_right_logical
I0704 18:08:18.208568 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 827 in threefry_seed
I0704 18:08:18.208651 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 592 in random_seed_impl_base
I0704 18:08:18.208725 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 587 in random_seed_impl
I0704 18:08:18.208798 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 712 in process_primitive
I0704 18:08:18.208871 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 332 in bind_with_trace
I0704 18:08:18.208944 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/core.py", line 329 in bind
I0704 18:08:18.209018 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 575 in random_seed
I0704 18:08:18.209091 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 267 in seed_with_impl
I0704 18:08:18.209164 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/random.py", line 133 in PRNGKey
I0704 18:08:18.209252 134394245103616 run_docker.py:258] File "/app/alphafold/alphafold/model/model.py", line 167 in predict
I0704 18:08:18.209327 134394245103616 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 284 in predict_structure
I0704 18:08:18.209400 134394245103616 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 543 in main
I0704 18:08:18.209487 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 258 in _run_main
I0704 18:08:18.209563 134394245103616 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 312 in run
I0704 18:08:18.209637 134394245103616 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 570 in <module>
I0704 18:08:18.209711 134394245103616 run_docker.py:258] 
I0704 18:08:18.218139 134394245103616 run_docker.py:258] Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy._lib._ccallback_c, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.linalg._flinalg, scipy.special._ellip_harm_2, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pandas._libs.tslib, pandas._libs.ops, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, yaml._yaml, jaxlib.cpu_feature_guard, numpy.linalg.lapack_lite, google._upb._message, tensorflow.python.framework.fast_tensor_util, _brotli, zstandard.backend_c, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, openmm._openmm, openmm.app.internal.compiled (total: 120)
I0704 18:08:18.734498 134394245103616 run_docker.py:258] /app/run_alphafold.sh: line 3:     8 Aborted                 (core dumped) python /app/alphafold/run_alphafold.py "$@"

@rafaeltiveron
Author

With version 0.4.2:

I0704 18:52:06.823572 127555933581312 run_docker.py:258] I0704 21:52:06.822473 139827476477760 model.py:165] Running predict with shape(feat) = {'aatype': (4, 554), 'residue_index': (4, 554), 'seq_length': (4,), 'template_aatype': (4, 4, 554), 'template_all_atom_masks': (4, 4, 554, 37), 'template_all_atom_positions': (4, 4, 554, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 554), 'msa_mask': (4, 508, 554), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 554, 3), 'template_pseudo_beta_mask': (4, 4, 554), 'atom14_atom_exists': (4, 554, 14), 'residx_atom14_to_atom37': (4, 554, 14), 'residx_atom37_to_atom14': (4, 554, 37), 'atom37_atom_exists': (4, 554, 37), 'extra_msa': (4, 5120, 554), 'extra_msa_mask': (4, 5120, 554), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 554), 'true_msa': (4, 508, 554), 'extra_has_deletion': (4, 5120, 554), 'extra_deletion_value': (4, 5120, 554), 'msa_feat': (4, 508, 554, 49), 'target_feat': (4, 554, 22)}
I0704 18:52:06.849806 127555933581312 run_docker.py:258] Traceback (most recent call last):
I0704 18:52:06.850139 127555933581312 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 570, in <module>
I0704 18:52:06.850346 127555933581312 run_docker.py:258] app.run(main)
I0704 18:52:06.850502 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 312, in run
I0704 18:52:06.850649 127555933581312 run_docker.py:258] _run_main(main, args)
I0704 18:52:06.850791 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 258, in _run_main
I0704 18:52:06.850931 127555933581312 run_docker.py:258] sys.exit(main(argv))
I0704 18:52:06.851069 127555933581312 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 543, in main
I0704 18:52:06.851209 127555933581312 run_docker.py:258] predict_structure(
I0704 18:52:06.851346 127555933581312 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 284, in predict_structure
I0704 18:52:06.851576 127555933581312 run_docker.py:258] prediction_result = model_runner.predict(processed_feature_dict,
I0704 18:52:06.851717 127555933581312 run_docker.py:258] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0704 18:52:06.851854 127555933581312 run_docker.py:258] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0704 18:52:06.851989 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/random.py", line 133, in PRNGKey
I0704 18:52:06.852125 127555933581312 run_docker.py:258] key = prng.seed_with_impl(impl, seed)
I0704 18:52:06.852261 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
I0704 18:52:06.852396 127555933581312 run_docker.py:258] return random_seed(seed, impl=impl)
I0704 18:52:06.852531 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 575, in random_seed
I0704 18:52:06.852667 127555933581312 run_docker.py:258] return random_seed_p.bind(seeds_arr, impl=impl)
I0704 18:52:06.852801 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 343, in bind
I0704 18:52:06.852937 127555933581312 run_docker.py:258] return self.bind_with_trace(find_top_trace(args), args, params)
I0704 18:52:06.853073 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 346, in bind_with_trace
I0704 18:52:06.853209 127555933581312 run_docker.py:258] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0704 18:52:06.853344 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 728, in process_primitive
I0704 18:52:06.853479 127555933581312 run_docker.py:258] return primitive.impl(*tracers, **params)
I0704 18:52:06.853614 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 587, in random_seed_impl
I0704 18:52:06.853748 127555933581312 run_docker.py:258] base_arr = random_seed_impl_base(seeds, impl=impl)
I0704 18:52:06.853884 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 592, in random_seed_impl_base
I0704 18:52:06.854073 127555933581312 run_docker.py:258] return seed(seeds)
I0704 18:52:06.854213 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 827, in threefry_seed
I0704 18:52:06.854350 127555933581312 run_docker.py:258] lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
I0704 18:52:06.854486 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/lax/lax.py", line 509, in shift_right_logical
I0704 18:52:06.854623 127555933581312 run_docker.py:258] return shift_right_logical_p.bind(x, y)
I0704 18:52:06.854757 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 343, in bind
I0704 18:52:06.854892 127555933581312 run_docker.py:258] return self.bind_with_trace(find_top_trace(args), args, params)
I0704 18:52:06.855027 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 346, in bind_with_trace
I0704 18:52:06.855162 127555933581312 run_docker.py:258] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0704 18:52:06.855297 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 728, in process_primitive
I0704 18:52:06.855454 127555933581312 run_docker.py:258] return primitive.impl(*tracers, **params)
I0704 18:52:06.855591 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 118, in apply_primitive
I0704 18:52:06.855725 127555933581312 run_docker.py:258] compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args),
I0704 18:52:06.855859 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/util.py", line 254, in wrapper
I0704 18:52:06.855993 127555933581312 run_docker.py:258] return cached(config._trace_context(), *args, **kwargs)
I0704 18:52:06.856128 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/util.py", line 247, in cached
I0704 18:52:06.856264 127555933581312 run_docker.py:258] return f(*args, **kwargs)
I0704 18:52:06.856401 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 202, in xla_primitive_callable
I0704 18:52:06.856537 127555933581312 run_docker.py:258] compiled = _xla_callable_uncached(lu.wrap_init(prim_fun), device, None,
I0704 18:52:06.856673 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 359, in _xla_callable_uncached
I0704 18:52:06.856807 127555933581312 run_docker.py:258] return computation.compile(_allow_propagation_to_outputs=True).unsafe_call
I0704 18:52:06.856942 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/interpreters/pxla.py", line 3204, in compile
I0704 18:52:06.857078 127555933581312 run_docker.py:258] executable = self._compile_unloaded(
I0704 18:52:06.857214 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/interpreters/pxla.py", line 3167, in _compile_unloaded
I0704 18:52:06.857380 127555933581312 run_docker.py:258] return UnloadedMeshExecutable.from_hlo(
I0704 18:52:06.857519 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/interpreters/pxla.py", line 3447, in from_hlo
I0704 18:52:06.857655 127555933581312 run_docker.py:258] xla_executable = dispatch.compile_or_get_cached(
I0704 18:52:06.857791 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1086, in compile_or_get_cached
I0704 18:52:06.857926 127555933581312 run_docker.py:258] return backend_compile(backend, serialized_computation, compile_options,
I0704 18:52:06.858062 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/profiler.py", line 314, in wrapper
I0704 18:52:06.858197 127555933581312 run_docker.py:258] return func(*args, **kwargs)
I0704 18:52:06.858332 127555933581312 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1024, in backend_compile
I0704 18:52:06.858468 127555933581312 run_docker.py:258] return backend.compile(built_c, compile_options=options)
I0704 18:52:06.858603 127555933581312 run_docker.py:258] jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: Couldn't get ptxas/nvlink version string: INTERNAL: Couldn't invoke ptxas --version
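
Both the 0.4.1 and 0.4.2 runs fail because XLA cannot run `ptxas --version`. One way to check this outside AlphaFold is to run the same command inside the container. The Python below is a hypothetical diagnostic helper, not part of AlphaFold or JAX. It finds the first `ptxas` on `PATH`, parses the `release X.Y` field of its version output, and compares that release against CUDA 11.8. Two assumptions apply: that CUDA 11.8 is the first release whose `ptxas` can target compute capability 8.9, and that XLA uses the `ptxas` on `PATH`, which may not be the only place it looks.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Assumption: compute capability 8.9 (sm_89) first became a ptxas target in CUDA 11.8.
MIN_RELEASE_FOR_SM89 = (11, 8)


def parse_ptxas_release(version_text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `ptxas --version` text such as 'release 12.3, V12.3.103'."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def ptxas_release() -> Optional[Tuple[int, int]]:
    """Run the first ptxas found on PATH and return its release, or None if it cannot be invoked."""
    path = shutil.which("ptxas")
    if path is None:
        return None
    try:
        result = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_ptxas_release(result.stdout)


def supports_sm89(release: Optional[Tuple[int, int]]) -> bool:
    """True if a ptxas of this release should be able to target compute capability 8.9."""
    return release is not None and release >= MIN_RELEASE_FOR_SM89


if __name__ == "__main__":
    release = ptxas_release()
    if release is None:
        print("ptxas is not on PATH or could not be run (same symptom as 'Couldn't invoke ptxas --version').")
    else:
        print(f"ptxas release {release[0]}.{release[1]}; sm_89 supported: {supports_sm89(release)}")
```

Running the script inside the AlphaFold container, for example with `docker run ... python check_ptxas.py`, shows whether the "Couldn't invoke ptxas" error comes from `ptxas` being absent from the image or from a version that is too old. The file name `check_ptxas.py` is only illustrative.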

@rafaeltiveron
Author

With version 0.4.3, the error falls somewhere between what I saw with 0.4.14 and with 0.4.2:

I0704 19:25:35.424535 123442240143360 run_docker.py:258] I0704 22:25:35.423431 135581519476544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 554), 'residue_index': (4, 554), 'seq_length': (4,), 'template_aatype': (4, 4, 554), 'template_all_atom_masks': (4, 4, 554, 37), 'template_all_atom_positions': (4, 4, 554, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 554), 'msa_mask': (4, 508, 554), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 554, 3), 'template_pseudo_beta_mask': (4, 4, 554), 'atom14_atom_exists': (4, 554, 14), 'residx_atom14_to_atom37': (4, 554, 14), 'residx_atom37_to_atom14': (4, 554, 37), 'atom37_atom_exists': (4, 554, 37), 'extra_msa': (4, 5120, 554), 'extra_msa_mask': (4, 5120, 554), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 554), 'true_msa': (4, 508, 554), 'extra_has_deletion': (4, 5120, 554), 'extra_deletion_value': (4, 5120, 554), 'msa_feat': (4, 508, 554, 49), 'target_feat': (4, 554, 22)}
I0704 19:25:35.440326 123442240143360 run_docker.py:258] 2024-07-04 22:25:35.439725: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:429] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I0704 19:25:35.443235 123442240143360 run_docker.py:258] 2024-07-04 22:25:35.442633: E external/org_tensorflow/tensorflow/compiler/xla/status_macros.cc:57] INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:626) dnn != nullptr
I0704 19:25:35.443496 123442240143360 run_docker.py:258] *** Begin stack trace ***
I0704 19:25:35.443680 123442240143360 run_docker.py:258] 
I0704 19:25:35.443971 123442240143360 run_docker.py:258] 
I0704 19:25:35.444115 123442240143360 run_docker.py:258] 
I0704 19:25:35.444251 123442240143360 run_docker.py:258] 
I0704 19:25:35.444384 123442240143360 run_docker.py:258] 
I0704 19:25:35.444516 123442240143360 run_docker.py:258] 
I0704 19:25:35.444659 123442240143360 run_docker.py:258] 
I0704 19:25:35.444790 123442240143360 run_docker.py:258] 
I0704 19:25:35.444933 123442240143360 run_docker.py:258] 
I0704 19:25:35.445064 123442240143360 run_docker.py:258] 
I0704 19:25:35.445197 123442240143360 run_docker.py:258] 
I0704 19:25:35.445329 123442240143360 run_docker.py:258] 
I0704 19:25:35.445461 123442240143360 run_docker.py:258] 
I0704 19:25:35.445593 123442240143360 run_docker.py:258] 
I0704 19:25:35.445724 123442240143360 run_docker.py:258] 
I0704 19:25:35.445854 123442240143360 run_docker.py:258] 
I0704 19:25:35.445986 123442240143360 run_docker.py:258] 
I0704 19:25:35.446117 123442240143360 run_docker.py:258] _PyObject_MakeTpCall
I0704 19:25:35.446251 123442240143360 run_docker.py:258] 
I0704 19:25:35.446384 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.446518 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.446651 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.446784 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.446918 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.447050 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.447181 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.447313 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.447470 123442240143360 run_docker.py:258] PyObject_Call
I0704 19:25:35.447604 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.447738 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.447869 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.448002 123442240143360 run_docker.py:258] 
I0704 19:25:35.448135 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.448268 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.448400 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.448541 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.448674 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.448805 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.448943 123442240143360 run_docker.py:258] 
I0704 19:25:35.449076 123442240143360 run_docker.py:258] PyObject_Call
I0704 19:25:35.449223 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.449356 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.449488 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.449620 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.449756 123442240143360 run_docker.py:258] 
I0704 19:25:35.449888 123442240143360 run_docker.py:258] 
I0704 19:25:35.450020 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.450154 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.450286 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.450419 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.450551 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.450684 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.450815 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.450948 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.451081 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.451213 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.451345 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.451499 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.451633 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.451766 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.451899 123442240143360 run_docker.py:258] PyObject_Call
I0704 19:25:35.452030 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.452162 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.452294 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.452426 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.452558 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.452690 123442240143360 run_docker.py:258] 
I0704 19:25:35.452824 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.452970 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.453102 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.453234 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.453368 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.453500 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.453632 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.453765 123442240143360 run_docker.py:258] 
I0704 19:25:35.453896 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.454028 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.454284 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.454421 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.454560 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.454692 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.454824 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.454955 123442240143360 run_docker.py:258] _PyFunction_Vectorcall
I0704 19:25:35.455087 123442240143360 run_docker.py:258] _PyEval_EvalFrameDefault
I0704 19:25:35.455218 123442240143360 run_docker.py:258] 
I0704 19:25:35.455360 123442240143360 run_docker.py:258] PyEval_EvalCode
I0704 19:25:35.455498 123442240143360 run_docker.py:258] 
I0704 19:25:35.455631 123442240143360 run_docker.py:258] 
I0704 19:25:35.455762 123442240143360 run_docker.py:258] 
I0704 19:25:35.455895 123442240143360 run_docker.py:258] _PyRun_SimpleFileObject
I0704 19:25:35.456028 123442240143360 run_docker.py:258] _PyRun_AnyFileObject
I0704 19:25:35.456161 123442240143360 run_docker.py:258] Py_RunMain
I0704 19:25:35.456295 123442240143360 run_docker.py:258] Py_BytesMain
I0704 19:25:35.456428 123442240143360 run_docker.py:258] __libc_start_main
I0704 19:25:35.456560 123442240143360 run_docker.py:258] 
I0704 19:25:35.456692 123442240143360 run_docker.py:258] *** End stack trace ***
I0704 19:25:35.456824 123442240143360 run_docker.py:258] 
I0704 19:25:35.456957 123442240143360 run_docker.py:258] Traceback (most recent call last):
I0704 19:25:35.457090 123442240143360 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 570, in <module>
I0704 19:25:35.457223 123442240143360 run_docker.py:258] app.run(main)
I0704 19:25:35.457358 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 312, in run
I0704 19:25:35.457491 123442240143360 run_docker.py:258] _run_main(main, args)
I0704 19:25:35.457625 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/absl/app.py", line 258, in _run_main
I0704 19:25:35.457759 123442240143360 run_docker.py:258] sys.exit(main(argv))
I0704 19:25:35.457892 123442240143360 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 543, in main
I0704 19:25:35.458025 123442240143360 run_docker.py:258] predict_structure(
I0704 19:25:35.458159 123442240143360 run_docker.py:258] File "/app/alphafold/run_alphafold.py", line 284, in predict_structure
I0704 19:25:35.458291 123442240143360 run_docker.py:258] prediction_result = model_runner.predict(processed_feature_dict,
I0704 19:25:35.458423 123442240143360 run_docker.py:258] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0704 19:25:35.458555 123442240143360 run_docker.py:258] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0704 19:25:35.458687 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/random.py", line 136, in PRNGKey
I0704 19:25:35.458821 123442240143360 run_docker.py:258] key = prng.seed_with_impl(impl, seed)
I0704 19:25:35.458954 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
I0704 19:25:35.459086 123442240143360 run_docker.py:258] return random_seed(seed, impl=impl)
I0704 19:25:35.459219 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 570, in random_seed
I0704 19:25:35.459362 123442240143360 run_docker.py:258] return random_seed_p.bind(seeds_arr, impl=impl)
I0704 19:25:35.459499 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 343, in bind
I0704 19:25:35.459639 123442240143360 run_docker.py:258] return self.bind_with_trace(find_top_trace(args), args, params)
I0704 19:25:35.459770 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 346, in bind_with_trace
I0704 19:25:35.459893 123442240143360 run_docker.py:258] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0704 19:25:35.460016 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 728, in process_primitive
I0704 19:25:35.460139 123442240143360 run_docker.py:258] return primitive.impl(*tracers, **params)
I0704 19:25:35.460264 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 582, in random_seed_impl
I0704 19:25:35.460387 123442240143360 run_docker.py:258] base_arr = random_seed_impl_base(seeds, impl=impl)
I0704 19:25:35.460510 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 587, in random_seed_impl_base
I0704 19:25:35.460633 123442240143360 run_docker.py:258] return seed(seeds)
I0704 19:25:35.460756 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/prng.py", line 822, in threefry_seed
I0704 19:25:35.460879 123442240143360 run_docker.py:258] lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
I0704 19:25:35.461003 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/lax/lax.py", line 511, in shift_right_logical
I0704 19:25:35.461125 123442240143360 run_docker.py:258] return shift_right_logical_p.bind(x, y)
I0704 19:25:35.461248 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 343, in bind
I0704 19:25:35.461371 123442240143360 run_docker.py:258] return self.bind_with_trace(find_top_trace(args), args, params)
I0704 19:25:35.461494 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 346, in bind_with_trace
I0704 19:25:35.461618 123442240143360 run_docker.py:258] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0704 19:25:35.461741 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/core.py", line 728, in process_primitive
I0704 19:25:35.461863 123442240143360 run_docker.py:258] return primitive.impl(*tracers, **params)
I0704 19:25:35.461987 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 122, in apply_primitive
I0704 19:25:35.462112 123442240143360 run_docker.py:258] compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args),
I0704 19:25:35.462237 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/util.py", line 253, in wrapper
I0704 19:25:35.462361 123442240143360 run_docker.py:258] return cached(config._trace_context(), *args, **kwargs)
I0704 19:25:35.462484 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/util.py", line 246, in cached
I0704 19:25:35.462609 123442240143360 run_docker.py:258] return f(*args, **kwargs)
I0704 19:25:35.462733 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 201, in xla_primitive_callable
I0704 19:25:35.462857 123442240143360 run_docker.py:258] compiled = _xla_callable_uncached(lu.wrap_init(prim_fun), device, None,
I0704 19:25:35.462981 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 354, in _xla_callable_uncached
I0704 19:25:35.463105 123442240143360 run_docker.py:258] return computation.compile(_allow_propagation_to_outputs=allow_prop).unsafe_call
I0704 19:25:35.463230 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py", line 3203, in compile
I0704 19:25:35.463384 123442240143360 run_docker.py:258] executable = self._compile_unloaded(
I0704 19:25:35.463504 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py", line 3174, in _compile_unloaded
I0704 19:25:35.463620 123442240143360 run_docker.py:258] return UnloadedMeshExecutable.from_hlo(
I0704 19:25:35.463736 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/interpreters/pxla.py", line 3456, in from_hlo
I0704 19:25:35.463852 123442240143360 run_docker.py:258] xla_executable = dispatch.compile_or_get_cached(
I0704 19:25:35.463967 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1081, in compile_or_get_cached
I0704 19:25:35.464082 123442240143360 run_docker.py:258] return backend_compile(backend, serialized_computation, compile_options,
I0704 19:25:35.464198 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/profiler.py", line 314, in wrapper
I0704 19:25:35.464313 123442240143360 run_docker.py:258] return func(*args, **kwargs)
I0704 19:25:35.464430 123442240143360 run_docker.py:258] File "/opt/conda/lib/python3.10/site-packages/jax/_src/dispatch.py", line 1026, in backend_compile
I0704 19:25:35.464544 123442240143360 run_docker.py:258] return backend.compile(built_c, compile_options=options)
I0704 19:25:35.464660 123442240143360 run_docker.py:258] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:626) dnn != nullptr
