Refactor for wgpu v23 compatibility #2435

AsherJingkongChen · 2024-10-29T03:30:09Z

Checklist

Confirmed that run-checks all script has been executed.
Made sure the book is up to date with changes in this PR.
- After this PR merged, contributor-book/src/guides/adding-a-new-operation-to-burn.md needs to be updated.
Check dependencies.

Related Issues/PRs

Dependent on v23.0.0 release gfx-rs/wgpu#6465
Dependent on Refactor for wgpu v23 compatibility with an example of wgpu device sharing cubecl#211
Blocked by Proper bounds checking for CubeCL kernels in Burn crates #2451

Changes

Update cubecl
Update wgpu
Update WgpuDevice::id() in burn-tensor

Testing

I run ./run-checks.sh all, but too many errors occurred on macOS (Metal) 🥺. Fixing the bugs ...

codecov · 2024-10-29T04:23:57Z

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 82.91%. Comparing base (6e71aaf) to head (975a54a).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
crates/burn-tensor/src/lib.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2435   +/-   ##
=======================================
  Coverage   82.91%   82.91%           
=======================================
  Files         810      810           
  Lines      104904   104903    -1     
=======================================
  Hits        86984    86984           
+ Misses      17920    17919    -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

* The comments in `WgpuDevice::id()` are removed since `id` in `WgpuDevice::Existing` is always `u32`

AsherJingkongChen · 2024-10-29T20:43:28Z

Discovery

The least recent version of wgpu that breaks burn-wgpu tests on Metal is found, see gfx-rs/wgpu@9c781d6.

AsherJingkongChen · 2024-10-31T14:58:22Z

Discovery

I started to do some experiments on macOS.

I noticed that the test cargo t -p burn-wgpu jit::module_avg_pool2d::tests::test_avg_pool2d_simple failed when the kernel pool2d_direct was in Checked mode, but the test passed in Unchecked mode.

In conclusion, the reason of these 200+ errors on macOS tests should be some out-of-bound writes occur in Checked mode.
Since wgpu-hal/metal used BoundsCheckPolicy::Restrict in v23.0.0, these writes cause undesired outputs.
You can see the source code here.

What are the possible solutions?

Changing the execution mode of kernels to Unchecked with manual bound checks added.

#[cube(launch_unchecked)]
fn select_kernel<T: Numeric, I: Numeric>(
    input: &Tensor<T>,
    indices: &Tensor<I>,
    output: &mut Tensor<T>,
    dim: u32,
) {
    if ABSOLUTE_POS >= output.len() {
        return;
    }
    // ...
}

Adding manual bound checks only.

#[cube(launch)]
pub(crate) fn kernel_scalar_binop<C: Numeric, O: BinaryOp<C>>(
    input: &Tensor<Line<C>>,
    scalar: C,
    output: &mut Tensor<Line<C>>,
) {
    let offset_output = ABSOLUTE_POS;

    if offset_output >= output.len() {
        return;
    }
    // ...
}

AsherJingkongChen · 2024-11-12T19:09:42Z

Status Update

Currently, I am busy working at my personal projects. I would run CI every one or two weeks.

It seems that there are more errors occurred on macOS. I would resolve them after I finish my own works.

SSIM on wasm causes bad glitches. I suspect it's related to tracel-ai/burn#2435, but hard to say.

AsherJingkongChen · 2024-11-25T20:20:09Z

Status Update

After merging the latest cubecl into my branch,
I have executed ./run-checks.sh all on my macOS (Metal 3+, Vulkan 1.2+), it looks no error here.

[2024-11-25T19:59:54Z INFO  xtask] Time elapsed for the current execution: 00:08:29

After running CI on tracel/burn, I see there are 8 errors in linux-std-tests. Nevertheless, this change passing macos-std-tests is a good news.

Thanks to the contributions of @ArthurBrussee.

AsherJingkongChen · 2024-11-25T20:39:03Z

Miscellaneous

Continued with the previous post, I paste the error logs on CI linux-std-tests here:


  failures:
  
  ---- tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_batched stdout ----
  Testing bias
  Testing input
  Testing offset
  thread 'tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_batched' panicked at crates/burn-wgpu/src/lib.rs:108:5:
  Tensors are not approx eq:
    => Position 72: 270 != 0 | difference 270 > tolerance 0.2700000000000001
    => Position 73: 277.5 != 0 | difference 277.5 > tolerance 0.2775000000000001
    => Position 74: 285 != 0 | difference 285 > tolerance 0.2850000000000001
    => Position 75: 292.5 != 0 | difference 292.5 > tolerance 0.2925000000000001
    => Position 76: 67.5 != 0 | difference 67.5 > tolerance 0.06750000000000002
  67 more errors...
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
  
  ---- tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_kernel_size stdout ----
  Testing bias
  Testing input
  Testing offset
  thread 'tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_kernel_size' panicked at crates/burn-wgpu/src/lib.rs:108:5:
  Tensors are not approx eq:
    => Position 1: 5.355899810791016 != 0.007523147854954004 | difference 5.348376662936062 > tolerance 0.005355899810791017
    => Position 2: 11.715299606323242 != 0.34953704476356506 | difference 11.365762561559677 > tolerance 0.011715299606323245
    => Position 3: 0.3125 != 0.10416662693023682 | difference 0.20833337306976318 > tolerance 0.0010000000000000002
    => Position 4: 8 != 2.6666667461395264 | difference 5.333333253860474 > tolerance 0.008000000000000002
    => Position 5: 10 != 3.3333334922790527 | difference 6.666666507720947 > tolerance 0.010000000000000002
  235 more errors...
  
  ---- tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_basic stdout ----
  Testing bias
  Testing input
  Testing offset
  thread 'tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_basic' panicked at crates/burn-wgpu/src/lib.rs:108:5:
  Tensors are not approx eq:
    => Position 1: 15 != 6 | difference 9 > tolerance 0.015000000000000003
    => Position 2: 30 != 12 | difference 18 > tolerance 0.030000000000000006
    => Position 3: 45 != 18 | difference 27 > tolerance 0.04500000000000001
    => Position 5: 3.75 != 1.5 | difference 2.25 > tolerance 0.0037500000000000007
    => Position 6: 7.5 != 3 | difference 4.5 > tolerance 0.0075000000000000015
  65 more errors...
  
  ---- tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_padding stdout ----
  Testing bias
  Testing input
  Testing offset
  thread 'tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_padding' panicked at crates/burn-wgpu/src/lib.rs:108:5:
  Tensors are not approx eq:
    => Position 10: 0.8950999975204468 != 0 | difference 0.8950999975204468 > tolerance 0.0010000000000000002
    => Position 11: 14.760600090026855 != 0.03129822388291359 | difference 14.729301866143942 > tolerance 0.01476060009002686
    => Position 12: 17.60420036315918 != 0.5347222089767456 | difference 17.069478154182434 > tolerance 0.017604200363159184
    => Position 13: 20.69809913635254 != 1.1215760707855225 | difference 19.576523065567017 > tolerance 0.020698099136352545
    => Position 14: 22.20039939880371 != 1.624421238899231 | difference 20.57597815990448 > tolerance 0.022200399398803715
  415 more errors...
  
  ---- tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_batched stdout ----
  Testing bias
  Testing input
  Testing offset
  thread 'tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_batched' panicked at crates/burn-wgpu/src/lib.rs:108:5:
  Tensors are not approx eq:
    => Position 72: 270 != 0 | difference 270 > tolerance 0.2700000000000001
    => Position 73: 277.5 != 0 | difference 277.5 > tolerance 0.2775000000000001
    => Position 74: 285 != 0 | difference 285 > tolerance 0.2850000000000001
    => Position 75: 292.5 != 0 | difference 292.5 > tolerance 0.2925000000000001
    => Position 76: 67.5 != 0 | difference 67.5 > tolerance 0.06750000000000002
  67 more errors...
  
  ---- tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_basic stdout ----
  Testing bias
  Testing input
  Testing offset
  thread 'tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_basic' panicked at crates/burn-wgpu/src/lib.rs:108:5:
  Tensors are not approx eq:
    => Position 1: 15 != 6 | difference 9 > tolerance 0.015000000000000003
    => Position 2: 30 != 12 | difference 18 > tolerance 0.030000000000000006
    => Position 3: 45 != 18 | difference 27 > tolerance 0.04500000000000001
    => Position 5: 3.75 != 1.5 | difference 2.25 > tolerance 0.0037500000000000007
    => Position 6: 7.5 != 3 | difference 4.5 > tolerance 0.0075000000000000015
  65 more errors...
  
  ---- tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_kernel_size stdout ----
  Testing bias
  Testing input
  Testing offset
  thread 'tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_kernel_size' panicked at crates/burn-wgpu/src/lib.rs:108:5:
  Tensors are not approx eq:
    => Position 1: 5.355899810791016 != 0.007523147854954004 | difference 5.348376662936062 > tolerance 0.005355899810791017
    => Position 2: 11.715299606323242 != 0.34953704476356506 | difference 11.365762561559677 > tolerance 0.011715299606323245
    => Position 3: 0.3125 != 0.10416662693023682 | difference 0.20833337306976318 > tolerance 0.0010000000000000002
    => Position 4: 8 != 2.6666667461395264 | difference 5.333333253860474 > tolerance 0.008000000000000002
    => Position 5: 10 != 3.3333334922790527 | difference 6.666666507720947 > tolerance 0.010000000000000002
  235 more errors...
  
  ---- tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_padding stdout ----
  Testing bias
  Testing input
  Testing offset
  thread 'tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_padding' panicked at crates/burn-wgpu/src/lib.rs:108:5:
  Tensors are not approx eq:
    => Position 10: 0.8950999975204468 != 0 | difference 0.8950999975204468 > tolerance 0.0010000000000000002
    => Position 11: 14.760600090026855 != 0.03129822388291359 | difference 14.729301866143942 > tolerance 0.01476060009002686
    => Position 12: 17.60420036315918 != 0.5347222089767456 | difference 17.069478154182434 > tolerance 0.017604200363159184
    => Position 13: 20.69809913635254 != 1.1215760707855225 | difference 19.576523065567017 > tolerance 0.020698099136352545
    => Position 14: 22.20039939880371 != 1.624421238899231 | difference 20.57597815990448 > tolerance 0.022200399398803715
  415 more errors...
  
  
  failures:
      tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_basic
      tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_batched
      tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_kernel_size
      tests::jit::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_padding
      tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_basic
      tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_batched
      tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_kernel_size
      tests::jit_fusion::autodiff::f32_ty::ad_deform_conv2d::tests::test_deform_conv2d_different_padding
  
  test result: FAILED. 2039 passed; 8 failed; 16 ignored; 0 measured; 0 filtered out; finished in 110.66s
  
  error: test failed, to rerun pass `-p burn-wgpu --lib`

ArthurBrussee · 2024-11-26T14:18:30Z

That's awesome! Thanks for checking again :) I think the remaining errors are the same as here: #2539

The deformable convolutions might need some attention from Genna, for the others, maybe @nathanielsimard knows?

…/wgpu-v23

Refactor for wgpu v23 compatibility

9779ef4

AsherJingkongChen mentioned this pull request Oct 29, 2024

Refactor for wgpu v23 compatibility with an example of wgpu device sharing tracel-ai/cubecl#211

Draft

3 tasks

AsherJingkongChen added 2 commits October 29, 2024 11:34

Format

b625082

Update CubeCL

001cbbc

Update CubeCL

615c174

* The comments in `WgpuDevice::id()` are removed since `id` in `WgpuDevice::Existing` is always `u32`

AsherJingkongChen added 2 commits October 31, 2024 02:48

Merge branch 'main' into refactor/wgpu-v23

3aa530b

Update CubeCL and Wgpu

230212e

AsherJingkongChen mentioned this pull request Nov 1, 2024

Proper bounds checking for CubeCL kernels in Burn crates #2451

Open

2 tasks

Merge remote-tracking branch 'origin/main' into refactor/wgpu-v23

975a54a

ArthurBrussee added a commit to ArthurBrussee/brush that referenced this pull request Nov 14, 2024

Disable SSIM on wasm for now

278091c

SSIM on wasm causes bad glitches. I suspect it's related to tracel-ai/burn#2435, but hard to say.

ArthurBrussee mentioned this pull request Nov 14, 2024

Disable SSIM on wasm for now ArthurBrussee/brush#31

Merged

ArthurBrussee added a commit to ArthurBrussee/brush that referenced this pull request Nov 14, 2024

Disable SSIM on wasm for now (#31)

4008571

SSIM on wasm causes bad glitches. I suspect it's related to tracel-ai/burn#2435, but hard to say.

ArthurBrussee mentioned this pull request Nov 22, 2024

Consistent OOB behaviour for wgpu tracel-ai/cubecl#296

Merged

AsherJingkongChen added 2 commits November 26, 2024 04:02

Merge remote-tracking branch 'origin/main' into refactor/wgpu-v23

0a47659

Patch Cargo.lock

591047f

AsherJingkongChen added 2 commits November 29, 2024 06:36

Merge commit '4258502a78404de87a7381bbd6be212174131cdd' into refactor…

3eee02b

…/wgpu-v23

Merge remote-tracking branch 'origin/main' into refactor/wgpu-v23

23d8679

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor for wgpu v23 compatibility #2435

Refactor for wgpu v23 compatibility #2435

AsherJingkongChen commented Oct 29, 2024 •

edited

Loading

codecov bot commented Oct 29, 2024 •

edited

Loading

AsherJingkongChen commented Oct 29, 2024 •

edited

Loading

AsherJingkongChen commented Oct 31, 2024 •

edited

Loading

AsherJingkongChen commented Nov 12, 2024 •

edited

Loading

AsherJingkongChen commented Nov 25, 2024

AsherJingkongChen commented Nov 25, 2024 •

edited

Loading

ArthurBrussee commented Nov 26, 2024

Refactor for wgpu v23 compatibility #2435

Are you sure you want to change the base?

Refactor for wgpu v23 compatibility #2435

Conversation

AsherJingkongChen commented Oct 29, 2024 • edited Loading

Checklist

Related Issues/PRs

Changes

Testing

codecov bot commented Oct 29, 2024 • edited Loading

Codecov Report

AsherJingkongChen commented Oct 29, 2024 • edited Loading

Discovery

AsherJingkongChen commented Oct 31, 2024 • edited Loading

Discovery

AsherJingkongChen commented Nov 12, 2024 • edited Loading

Status Update

AsherJingkongChen commented Nov 25, 2024

Status Update

AsherJingkongChen commented Nov 25, 2024 • edited Loading

Miscellaneous

ArthurBrussee commented Nov 26, 2024

AsherJingkongChen commented Oct 29, 2024 •

edited

Loading

codecov bot commented Oct 29, 2024 •

edited

Loading

AsherJingkongChen commented Oct 29, 2024 •

edited

Loading

AsherJingkongChen commented Oct 31, 2024 •

edited

Loading

AsherJingkongChen commented Nov 12, 2024 •

edited

Loading

AsherJingkongChen commented Nov 25, 2024 •

edited

Loading