Bump PyTorch pin to 20241112 #1367

Open · wants to merge 13 commits into main
Conversation

@Jack-Khuu Jack-Khuu (Contributor) commented Nov 12, 2024


pytorch-bot bot commented Nov 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1367

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 22 New Failures, 2 Cancelled Jobs

As of commit 5b91d46 with merge base b809b69:

NEW FAILURES - The following jobs have failed:

  • pull / compile-gguf (macos-14) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [MPS, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / runner-aoti (macos-14-xlarge) (gh)
    torch._inductor.exc.CppCompileError: C++ compile error
  • pull / test-build-runner-et-android / linux-job (gh)
    RuntimeError: Command docker exec -t 5fe5264e2bb12c67eb6007a01a9abd59cb97c02184cdac2d71e4c468cb098000 /exec failed with exit code 1
  • pull / test-cpu-aoti (aarch64, stories15M) (gh)
    torch._inductor.exc.CppCompileError: C++ compile error
  • pull / test-cpu-aoti (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-cpu-compile (aarch64, stories15M) (gh)
    CppCompileError: C++ compile error
  • pull / test-cpu-compile (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-cpu-eval-sanity-check (aarch64, stories15M) (gh)
    CppCompileError: C++ compile error
  • pull / test-cpu-eval-sanity-check (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-cpu-eval-sanity-check-float16 (aarch64, stories15M) (gh)
    Process completed with exit code 1.
  • pull / test-cpu-eval-sanity-check-float16 (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-cpu-eval-sanity-check-float32 (aarch64, stories15M) (gh)
    Process completed with exit code 1.
  • pull / test-cpu-eval-sanity-check-float32 (x86_64, stories15M) (gh)
    NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
  • pull / test-gpu-aoti-bfloat16 (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t dbd5f139e8f32cc1cda94796f44861d0d8d79a25301f51db1faecefdf770625d /exec failed with exit code 1
  • pull / test-gpu-aoti-float16 (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t 3cf85dd23196fff6109be8949df7e81694400aa4908310d0f9b83bae7d89a1c0 /exec failed with exit code 1
  • pull / test-gpu-aoti-float32 (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t 3c3789616283c48728d70b9bee8dd708a20fd4be6884b65bd3330041067f8f3f /exec failed with exit code 1
  • pull / test-gpu-compile (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t f7ddcd2315031a765e2621a48995a301cdc8853662fe033c4f769114cda4b7d5 /exec failed with exit code 1
  • pull / test-gpu-eval-sanity-check (cuda, stories15M) / linux-job (gh)
    RuntimeError: Command docker exec -t d525897ce7b387275750040e0e9c21e13c0e5793bab6d6ce016bc69ea38a09bb /exec failed with exit code 1
  • pull / test-tinystories-executorch (macos-14-xlarge) (gh)
    fatal: unable to access 'https://review.mlplatform.org/ml/ethos-u/ethos-u-core-driver/': Failed to connect to review.mlplatform.org port 443 after 88 ms: Couldn't connect to server
  • pull / test-torchao-experimental (macos-14-xlarge) (gh)
    ninja: error: '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/lib/libomp.dylib', needed by 'libtorchao_ops_aten.dylib', missing and no known rule to make it
  • Run parallel prefill / test-cuda / linux-job (gh)
    RuntimeError: Command docker exec -t 9f5f891ef29a961fd1a8f5a3dd3885f09828032f925e1b6c8a47783a91d96b4b /exec failed with exit code 1
  • Run the aoti runner with CUDA using stories / test-runner-aot-cuda / linux-job (gh)
    RuntimeError: Command docker exec -t 93a34b4464330f1a020d28d0833d77c4407bcba4e6399c40c30e1e037661b0e3 /exec failed with exit code 1

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 12, 2024
@swolchok (Contributor)

Could not find a version that satisfies the requirement torchvision==0.20.0.dev20241111

This looks accurate; according to https://download.pytorch.org/whl/nightly/torchvision/ there are only Windows builds for that day. 20241112 appears to have both Linux and Windows.

@Jack-Khuu Jack-Khuu changed the title from "Bump PyTorch pin to 20241111" to "Bump PyTorch pin to 20241112" on Nov 12, 2024
@swolchok (Contributor)

Initial debugging shows the test-cpu-aoti segfault is within aoti_torch_cpu_cat, which is automatically generated by https://github.com/pytorch/pytorch/blob/7e86a7c0155295539996e0cf422883571126073e/torchgen/gen_aoti_c_shim.py. Digging up the generated source now.

@@ -96,6 +96,7 @@ def _load_checkpoints_from_storage(
             checkpoint_path,
             map_location=builder_args.device,
             mmap=True,
+            weight_only=False,
Contributor

Why does it need False? All LLMs should be loadable with weights_only, shouldn't they? (Also, there is no such option as weight_only (or so I hope :P))

Suggested change
-            weight_only=False,
+            weights_only=True,

Contributor Author

Good catch on the typo;

As for setting it to False: I'd rather keep the behavior consistent in a pin-bump PR; we can flip it in a separate PR.
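For context, a minimal sketch (not torchchat's actual loader; the path and device are hypothetical placeholders) of the call being discussed. The torch.load keyword is spelled weights_only: True restricts unpickling to plain tensors and containers, while False keeps full pickle loading, which is the pre-bump behavior being preserved here.

    import torch

    # Standalone illustration of the torch.load call in _load_checkpoints_from_storage.
    state_dict = torch.load(
        "checkpoint.pt",      # hypothetical path
        map_location="cpu",   # builder_args.device in the real code
        mmap=True,            # memory-map the checkpoint file
        weights_only=False,   # note the spelling; True is the safer option suggested above
    )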

@@ -238,7 +238,7 @@ def _to_core_aten(
         raise ValueError(
             f"Expected passed in model to be an instance of fx.GraphModule, got {type(model)}"
         )
-    core_aten_ep = export(model, example_inputs, dynamic_shapes=dynamic_shapes)
+    core_aten_ep = export_for_training(model, example_inputs, dynamic_shapes=dynamic_shapes)
Contributor

Not sure what we are doing here, but shouldn't TorchChat be exporting for inference?

Contributor Author

This was picked up from @tugsbayasgalan's PR migrating away from export(), but export_for_inference does sound more in line with what we want

@tugsbayasgalan Can you share info on the new APIs?


Yep, the intended use for the inference IR is that the user exports to a training IR and calls run_decompositions() to lower to the inference IR. In this flow, after core_aten_ep there is a to_edge call which lowers to inference. The export team is moving the IR to a non-functional training IR, so export_for_training will exist as an alias for the official export. After we actually migrate the official export, we will replace this call with export.
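A minimal sketch of the flow described above, using a hypothetical SimpleModel defined only for illustration: export to the training IR with export_for_training, then lower to the inference IR via run_decompositions() (in torchchat this lowering happens later, in the to_edge call mentioned above).

    import torch
    from torch.export import export_for_training


    class SimpleModel(torch.nn.Module):  # hypothetical model for illustration
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(4, 4)

        def forward(self, x):
            return torch.relu(self.linear(x))


    model = SimpleModel().eval()
    example_inputs = (torch.randn(2, 4),)

    # Export to the (non-functional) training IR, as _to_core_aten now does.
    training_ep = export_for_training(model, example_inputs)

    # Lower to the inference IR by running decompositions; torchchat defers
    # this to the subsequent to_edge call.
    inference_ep = training_ep.run_decompositions()
    print(inference_ep.graph_module.code)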

@swolchok (Contributor)

digging up the generated source now.

The generated source looks OK. Here's what doesn't look OK in the generated inductor .cpp file:

    AtenTensorHandle buf0_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
    RAIIAtenTensorHandle buf0(buf0_handle);
    AtenTensorHandle buf1_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf1_handle));
    RAIIAtenTensorHandle buf1(buf1_handle);
    cpp_fused_div_remainder_0((const uint8_t*)(self___model_tok_embeddings__buffers__weight.data_ptr()), (uint8_t*)(buf0.data_ptr()), (uint8_t*)(buf1.data_ptr()));
    // Topologically Sorted Source Nodes: [weight_unpacked], Original ATen: [aten.stack]
    static constexpr int64_t int_array_0[] = {32000LL, 144LL, 1LL};
    static constexpr int64_t int_array_1[] = {144LL, 1LL, 0LL};
    auto tmp_tensor_handle_0 = reinterpret_tensor_wrapper(buf0, 3, int_array_0, int_array_1, 0LL);
    auto tmp_tensor_handle_1 = reinterpret_tensor_wrapper(buf1, 3, int_array_0, int_array_1, 0LL);
    const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)};
    AtenTensorHandle buf3_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_cat(var_array_0, 2, -1LL, &buf3_handle));

The problem seems to be const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)}; -- this creates temporary RAIIAtenTensorHandles, whose operator AtenTensorHandle is immediately called, and then the temporaries are destroyed (which decrements the refcount), so the net effect is (I think) to leave dangling AtenTensorHandles in var_array_0.

@swolchok (Contributor)

@desertfire any chance the above is a quick fix for you?

@swolchok (Contributor)

actually we might just need pytorch/pytorch#139411

@swolchok (Contributor)

No torchvision nightly again today. I'm guessing we could probably use torchvision from yesterday with torch from today?

@Jack-Khuu (Contributor Author) commented Nov 13, 2024

I had issues with vision nightlies requiring the corresponding PT nightly a few weeks back; I'll give it another go.

Update: yup, vision is strict; will need to wait again

@swolchok (Contributor)

The _convert_weight_to_int4pack breakage appears to be from pytorch/pytorch#139611; I guess it's now called _convert_weight_to_int4pack_for_cpu.
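A hedged sketch of what the rename implies for callers, assuming the affected nightlies expose torch.ops.aten._convert_weight_to_int4pack_for_cpu with the same (weight, inner_k_tiles) arity; this is not the actual torchao fix in pytorch/ao#1278, and the expected weight dtype/layout may also differ between the two ops.

    import torch

    def convert_weight_to_int4pack(weight: torch.Tensor, inner_k_tiles: int = 8) -> torch.Tensor:
        """Name-level dispatch only; dtype/layout requirements are not handled here."""
        if weight.device.type == "cpu" and hasattr(
            torch.ops.aten, "_convert_weight_to_int4pack_for_cpu"
        ):
            # Nightlies after pytorch/pytorch#139611: CPU packing moved to its own op.
            return torch.ops.aten._convert_weight_to_int4pack_for_cpu(weight, inner_k_tiles)
        # Older pins and non-CPU backends keep the original op name.
        return torch.ops.aten._convert_weight_to_int4pack(weight, inner_k_tiles)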

@Jack-Khuu (Contributor Author) commented Nov 14, 2024

Beat me to it; luckily AO has a fix, so we'll need a bump there too: pytorch/ao#1278

@Jack-Khuu (Contributor Author)

pytorch/pytorch#139411 also got reverted on pt/pt, so that's fun.

@desertfire (Contributor)

pytorch/pytorch#139411 also got reverted on pt/pt, so that's fun.

pytorch/pytorch#139411 is relanded.

@Jack-Khuu (Contributor Author)

Need to bump everything CUDA-related: pytorch/pytorch#140885

@swolchok (Contributor) commented Nov 23, 2024

Beat me to it; luckily AO has a fix, so we'll need a bump there too: pytorch/ao#1278

Also need to manually edit torchchat/utils/gguf_loader.py.

Looks like that and the spurious complaints about missing OMP on Mac are the two remaining blockers.
