Bump PyTorch pin to 20241112 #1367

Open · wants to merge 13 commits into main
Conversation

@Jack-Khuu Jack-Khuu (Contributor) commented Nov 12, 2024


pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1367

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 22 New Failures, 2 Cancelled Jobs

As of commit 5b91d46 with merge base b809b69:

NEW FAILURES - The following jobs have failed:

  • pull / compile-gguf (macos-14) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [MPS, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / runner-aoti (macos-14-xlarge) (gh)
    torch._inductor.exc.CppCompileError: C++ compile error
  • pull / test-build-runner-et-android / linux-job (gh)
    RuntimeError: Command docker exec -t 5fe5264e2bb12c67eb6007a01a9abd59cb97c02184cdac2d71e4c468cb098000 /exec failed with exit code 1
  • pull / test-cpu-aoti (aarch64, stories15M) (gh)
    torch._inductor.exc.CppCompileError: C++ compile error
  • pull / test-cpu-aoti (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-cpu-compile (aarch64, stories15M) (gh)
    CppCompileError: C++ compile error
  • pull / test-cpu-compile (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-cpu-eval-sanity-check (aarch64, stories15M) (gh)
    CppCompileError: C++ compile error
  • pull / test-cpu-eval-sanity-check (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-cpu-eval-sanity-check-float16 (aarch64, stories15M) (gh)
    Process completed with exit code 1.
  • pull / test-cpu-eval-sanity-check-float16 (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-cpu-eval-sanity-check-float32 (aarch64, stories15M) (gh)
    Process completed with exit code 1.
  • pull / test-cpu-eval-sanity-check-float32 (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-gpu-aoti-bfloat16 (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t dbd5f139e8f32cc1cda94796f44861d0d8d79a25301f51db1faecefdf770625d /exec failed with exit code 1
  • pull / test-gpu-aoti-float16 (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t 3cf85dd23196fff6109be8949df7e81694400aa4908310d0f9b83bae7d89a1c0 /exec failed with exit code 1
  • pull / test-gpu-aoti-float32 (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t 3c3789616283c48728d70b9bee8dd708a20fd4be6884b65bd3330041067f8f3f /exec failed with exit code 1
  • pull / test-gpu-compile (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t f7ddcd2315031a765e2621a48995a301cdc8853662fe033c4f769114cda4b7d5 /exec failed with exit code 1
  • pull / test-gpu-eval-sanity-check (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t d525897ce7b387275750040e0e9c21e13c0e5793bab6d6ce016bc69ea38a09bb /exec failed with exit code 1
  • pull / test-tinystories-executorch (macos-14-xlarge) (gh)
    fatal: unable to access 'https://review.mlplatform.org/ml/ethos-u/ethos-u-core-driver/': Failed to connect to review.mlplatform.org port 443 after 88 ms: Couldn't connect to server
  • pull / test-torchao-experimental (macos-14-xlarge) (gh)
    ninja: error: '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/lib/libomp.dylib', needed by 'libtorchao_ops_aten.dylib', missing and no known rule to make it
  • Run parallel prefill / test-cuda / linux-job (gh)
    RuntimeError: Command docker exec -t 9f5f891ef29a961fd1a8f5a3dd3885f09828032f925e1b6c8a47783a91d96b4b /exec failed with exit code 1
  • Run the aoti runner with CUDA using stories / test-runner-aot-cuda / linux-job (gh)
    RuntimeError: Command docker exec -t 93a34b4464330f1a020d28d0833d77c4407bcba4e6399c40c30e1e037661b0e3 /exec failed with exit code 1

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 12, 2024
@swolchok (Contributor)

Could not find a version that satisfies the requirement torchvision==0.20.0.dev20241111

This looks accurate; according to https://download.pytorch.org/whl/nightly/torchvision/ there are only Windows builds for that day. 20241112 appears to have both Linux and Windows.

@Jack-Khuu Jack-Khuu changed the title from "Bump PyTorch pin to 20241111" to "Bump PyTorch pin to 20241112" on Nov 12, 2024
@swolchok (Contributor)

Initial debugging shows the test-cpu-aoti segfault is within aoti_torch_cpu_cat, which is automatically generated by https://github.com/pytorch/pytorch/blob/7e86a7c0155295539996e0cf422883571126073e/torchgen/gen_aoti_c_shim.py. Digging up the generated source now.

@@ -96,6 +96,7 @@ def _load_checkpoints_from_storage(
             checkpoint_path,
             map_location=builder_args.device,
             mmap=True,
+            weight_only=False,
Contributor

Why does it need False? All LLMs should be loadable with weights_only, shouldn't they? (Also, there is no such option as weight_only (or so I hope :P))

Suggested change
-            weight_only=False,
+            weights_only=True,

Contributor Author

Good catch on the typo;

As for setting it to False: I'd rather keep the behavior consistent in a pin-bump PR; we can flip it in a separate PR.
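For context, a minimal sketch (not torchchat's actual loader; the path and device are hypothetical placeholders) of the call being discussed. The torch.load keyword is spelled weights_only: True restricts unpickling to plain tensors and containers, while False keeps full pickle loading, which is the pre-bump behavior being preserved here.

    import torch

    # Standalone illustration of the torch.load call in _load_checkpoints_from_storage.
    state_dict = torch.load(
        "checkpoint.pt",      # hypothetical path
        map_location="cpu",   # builder_args.device in the real code
        mmap=True,            # memory-map the checkpoint file
        weights_only=False,   # note the spelling; True is the safer option suggested above
    )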

@@ -238,7 +238,7 @@ def _to_core_aten(
         raise ValueError(
             f"Expected passed in model to be an instance of fx.GraphModule, got {type(model)}"
         )
-    core_aten_ep = export(model, example_inputs, dynamic_shapes=dynamic_shapes)
+    core_aten_ep = export_for_training(model, example_inputs, dynamic_shapes=dynamic_shapes)
Contributor

Not sure what we are doing here, but shouldn't TorchChat be exporting for inference?

Contributor Author

This was picked up from @tugsbayasgalan's PR migrating away from export(), but export_for_inference does sound more in line with what we want

@tugsbayasgalan Can you share info on the new APIs?


Yep, the intended use for the inference IR is that the user exports to a training IR and calls run_decompositions() to lower to the inference IR. In this flow, after core_aten_ep there is a to_edge call which lowers to inference. The export team is moving the IR to a non-functional training IR, so export_for_training will exist as an alias for the official export. After we actually migrate the official export, we will replace this call with export.
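A minimal sketch of the flow described above, using a hypothetical SimpleModel defined only for illustration: export to the training IR with export_for_training, then lower to the inference IR via run_decompositions() (in torchchat this lowering happens later, in the to_edge call mentioned above).

    import torch
    from torch.export import export_for_training


    class SimpleModel(torch.nn.Module):  # hypothetical model for illustration
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(4, 4)

        def forward(self, x):
            return torch.relu(self.linear(x))


    model = SimpleModel().eval()
    example_inputs = (torch.randn(2, 4),)

    # Export to the (non-functional) training IR, as _to_core_aten now does.
    training_ep = export_for_training(model, example_inputs)

    # Lower to the inference IR by running decompositions; torchchat defers
    # this to the subsequent to_edge call.
    inference_ep = training_ep.run_decompositions()
    print(inference_ep.graph_module.code)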

@swolchok (Contributor)

digging up the generated source now.

The generated source looks OK. Here's what doesn't look OK in the generated inductor .cpp file:

    AtenTensorHandle buf0_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
    RAIIAtenTensorHandle buf0(buf0_handle);
    AtenTensorHandle buf1_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf1_handle));
    RAIIAtenTensorHandle buf1(buf1_handle);
    cpp_fused_div_remainder_0((const uint8_t*)(self___model_tok_embeddings__buffers__weight.data_ptr()), (uint8_t*)(buf0.data_ptr()), (uint8_t*)(buf1.data_ptr()));
    // Topologically Sorted Source Nodes: [weight_unpacked], Original ATen: [aten.stack]
    static constexpr int64_t int_array_0[] = {32000LL, 144LL, 1LL};
    static constexpr int64_t int_array_1[] = {144LL, 1LL, 0LL};
    auto tmp_tensor_handle_0 = reinterpret_tensor_wrapper(buf0, 3, int_array_0, int_array_1, 0LL);
    auto tmp_tensor_handle_1 = reinterpret_tensor_wrapper(buf1, 3, int_array_0, int_array_1, 0LL);
    const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)};
    AtenTensorHandle buf3_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_cat(var_array_0, 2, -1LL, &buf3_handle));

The problem seems to be const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)}; -- this creates temporary RAIIAtenTensorHandles, whose operator AtenTensorHandle is immediately called, and then the temporaries are destroyed (which decrements the refcount), so the net effect is (I think) to leave dangling AtenTensorHandles in var_array_0.

@swolchok (Contributor)

@desertfire any chance the above is a quick fix for you?

@swolchok (Contributor)

actually we might just need pytorch/pytorch#139411

@swolchok (Contributor)

No torchvision nightly again today. I'm guessing we could probably use torchvision from yesterday with torch from today?

@Jack-Khuu (Contributor Author) commented Nov 13, 2024

I had issues with vision nightlies requiring the corresponding PT nightly a few weeks back; I'll give it another go.

Update: yup, vision is strict; will need to wait again

@swolchok (Contributor)

The _convert_weight_to_int4pack breakage appears to be from pytorch/pytorch#139611; I guess it's now called _convert_weight_to_int4pack_for_cpu.
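A hedged sketch of what the rename implies for callers, assuming the affected nightlies expose torch.ops.aten._convert_weight_to_int4pack_for_cpu with the same (weight, inner_k_tiles) arity; this is not the actual torchao fix in pytorch/ao#1278, and the expected weight dtype/layout may also differ between the two ops.

    import torch

    def convert_weight_to_int4pack(weight: torch.Tensor, inner_k_tiles: int = 8) -> torch.Tensor:
        """Name-level dispatch only; dtype/layout requirements are not handled here."""
        if weight.device.type == "cpu" and hasattr(
            torch.ops.aten, "_convert_weight_to_int4pack_for_cpu"
        ):
            # Nightlies after pytorch/pytorch#139611: CPU packing moved to its own op.
            return torch.ops.aten._convert_weight_to_int4pack_for_cpu(weight, inner_k_tiles)
        # Older pins and non-CPU backends keep the original op name.
        return torch.ops.aten._convert_weight_to_int4pack(weight, inner_k_tiles)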

@Jack-Khuu (Contributor Author) commented Nov 14, 2024

Beat me to it; luckily AO has a fix, so we'll need a bump there too: pytorch/ao#1278

@Jack-Khuu (Contributor Author)

pytorch/pytorch#139411 also got reverted on pt/pt, so that's fun.

@desertfire (Contributor)

pytorch/pytorch#139411 also got reverted on pt/pt, so that's fun.

pytorch/pytorch#139411 is relanded.

@Jack-Khuu (Contributor Author)

Need to bump everything CUDA-related: pytorch/pytorch#140885

@swolchok (Contributor) commented Nov 23, 2024

Beat me to it; luckily AO has a fix, so we'll need a bump there too: pytorch/ao#1278

Also need to manually edit torchchat/utils/gguf_loader.py.

Looks like that and the spurious complaints about missing OMP on Mac are the two remaining blockers.
