[WIP] Support fancy iterators in cuda.parallel #2788

Draft: wants to merge 103 commits into base: main

@rwgk (Contributor) commented Nov 13, 2024

Description

closes #2479

WORK IN PROGRESS

See #2479 for context.

…t then fails with: Fatal Python error: Floating point exception
…resolves the Floating point exception (but the `cccl_device_reduce()` call still does not succeed)
LOOOK single_tile_kernel CALL /home/coder/cccl/c/parallel/src/reduce.cu:116

LOOOK EXCEPTION CUDA error: invalid argument  /home/coder/cccl/c/parallel/src/reduce.cu:703
…rametrize: `use_numpy_array`: `[True, False]`, `input_generator`: `["constant", "counting", "arbitrary", "nested"]`
…iterators.py (because numba.cuda cannot JIT classes).
… `unary_op`, which is then compiled with `numba.cuda.compile()`
… the `"map_mul2"` test and the added `"map_add10_map_mul2"` test works, too.
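The commit titles above mention parametrizing the tests over input generators (`"constant"`, `"counting"`, ...) and composing `unary_op`s such as `"map_mul2"` and `"map_add10_map_mul2"`. To make the expected results of such cases concrete, here is a minimal host-side reference model in plain Python. It is a sketch under stated assumptions, not the cuda.parallel API: every name in it (`constant_iterator`, `counting_iterator`, `map_iterator`, `mul2`, `add10`, `reference_sum`) is hypothetical; it assumes `"map_add10_map_mul2"` means `add10` applied to the output of `mul2`; and it does not model the `"arbitrary"` or `"nested"` generators, whose semantics are not spelled out here. It runs on the host only and is not meant to be compiled with numba.cuda.

```python
from itertools import islice

# Hypothetical host-side reference model; NOT the cuda.parallel API.
# Plain Python generators are used only to compute expected reduction results.

def constant_iterator(value):
    """Yield the same value indefinitely."""
    while True:
        yield value

def counting_iterator(start):
    """Yield start, start + 1, start + 2, ..."""
    value = start
    while True:
        yield value
        value += 1

def map_iterator(it, unary_op):
    """Lazily apply unary_op to each element of it (a transform iterator)."""
    for x in it:
        yield unary_op(x)

def mul2(x):
    return 2 * x

def add10(x):
    return x + 10

def reference_sum(it, num_items, init=0):
    """Sum the first num_items elements of it, starting from init."""
    return sum(islice(it, num_items), init)

if __name__ == "__main__":
    n = 5
    print(reference_sum(constant_iterator(42), n))                         # 210
    print(reference_sum(counting_iterator(10), n))                         # 60
    print(reference_sum(map_iterator(counting_iterator(0), mul2), n))      # 20
    # Assumed reading of "map_add10_map_mul2": add10(mul2(x)).
    print(reference_sum(map_iterator(map_iterator(counting_iterator(0), mul2), add10), n))  # 70
```

Because the iterators are lazy and unbounded, `islice` decides how many items the reduction consumes, which mirrors how a device reduction is given an iterator plus an explicit item count rather than a materialized array.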

🟩 CI finished in 1h 46m: Pass: 100%/3 | Total: 29m 09s | Avg: 9m 43s | Max: 19m 58s
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 11s | Avg: 4m 35s | Max: 7m 12s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  7m 12s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  7m 12s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  7m 12s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  7m 12s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  7m 12s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  7m 12s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  7m 12s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  1m 59s | Avg:  1m 59s | Max:  1m 59s
      🟩 Test               Pass: 100%/1   | Total:  7m 12s | Avg:  7m 12s | Max:  7m 12s
    
  • 🟩 python: Pass: 100%/1 | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 19m 58s | Avg: 19m 58s | Max: 19m 58s
    

👃 Inspect Changes

Modifications in project: python, CCCL C Parallel Library
(unchanged: CCCL Infrastructure, libcu++, CUB, Thrust, CUDA Experimental, Catch2Helper)

Modifications in project or dependencies: python, CCCL C Parallel Library
(unchanged: CCCL Infrastructure, libcu++, CUB, Thrust, CUDA Experimental, Catch2Helper)

🏃‍ Runner counts (total jobs: 3)

2 x linux-amd64-gpu-v100-latest-1
1 x linux-amd64-cpu16


🟩 CI finished in 44m 00s: Pass: 100%/3 | Total: 31m 08s | Avg: 10m 22s | Max: 20m 33s
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 35s | Avg: 5m 17s | Max: 8m 32s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 32s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 32s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 32s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 32s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 32s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 32s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  8m 32s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s
      🟩 Test               Pass: 100%/1   | Total:  8m 32s | Avg:  8m 32s | Max:  8m 32s
    
  • 🟩 python: Pass: 100%/1 | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 20m 33s | Avg: 20m 33s | Max: 20m 33s
    


🟨 CI finished in 1h 32m: Pass: 66%/3 | Total: 24m 37s | Avg: 8m 12s | Max: 18m 38s
  • 🟨 cccl_c_parallel: Pass: 50%/2 | Total: 5m 59s | Avg: 2m 59s | Max: 3m 55s

    🚨 jobs: Test 🚨
      🟩 Build              Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
      🔥 Test               Pass:   0%/1   | Total:  3m 55s | Avg:  3m 55s | Max:  3m 55s
    🟨 cpu
      🟨 amd64              Pass:  50%/2   | Total:  5m 59s | Avg:  2m 59s | Max:  3m 55s
    🟨 ctk
      🟨 12.6               Pass:  50%/2   | Total:  5m 59s | Avg:  2m 59s | Max:  3m 55s
    🟨 cudacxx
      🟨 nvcc12.6           Pass:  50%/2   | Total:  5m 59s | Avg:  2m 59s | Max:  3m 55s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  50%/2   | Total:  5m 59s | Avg:  2m 59s | Max:  3m 55s
    🟨 cxx
      🟨 GCC13              Pass:  50%/2   | Total:  5m 59s | Avg:  2m 59s | Max:  3m 55s
    🟨 cxx_family
      🟨 GCC                Pass:  50%/2   | Total:  5m 59s | Avg:  2m 59s | Max:  3m 55s
    🟨 gpu
      🟨 v100               Pass:  50%/2   | Total:  5m 59s | Avg:  2m 59s | Max:  3m 55s
    
  • 🟩 python: Pass: 100%/1 | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 18m 38s | Avg: 18m 38s | Max: 18m 38s
    


🟨 CI finished in 20m 34s: Pass: 66%/3 | Total: 29m 18s | Avg: 9m 46s | Max: 19m 11s
  • 🟥 python: Pass: 0%/1 | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s
    🟥 ctk
      🟥 12.6               Pass:   0%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s
    🟥 cudacxx
      🟥 nvcc12.6           Pass:   0%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s
    🟥 gpu
      🟥 v100               Pass:   0%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total: 19m 11s | Avg: 19m 11s | Max: 19m 11s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 07s | Avg: 5m 03s | Max: 7m 36s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  7m 36s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  7m 36s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  7m 36s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  7m 36s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  7m 36s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  7m 36s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 07s | Avg:  5m 03s | Max:  7m 36s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 31s | Avg:  2m 31s | Max:  2m 31s
      🟩 Test               Pass: 100%/1   | Total:  7m 36s | Avg:  7m 36s | Max:  7m 36s
    


🟩 CI finished in 1h 56m: Pass: 100%/3 | Total: 30m 23s | Avg: 10m 07s | Max: 19m 31s
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 52s | Avg: 5m 26s | Max: 8m 32s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  8m 32s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  8m 32s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  8m 32s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  8m 32s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  8m 32s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  8m 32s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  8m 32s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 20s | Avg:  2m 20s | Max:  2m 20s
      🟩 Test               Pass: 100%/1   | Total:  8m 32s | Avg:  8m 32s | Max:  8m 32s
    
  • 🟩 python: Pass: 100%/1 | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 19m 31s | Avg: 19m 31s | Max: 19m 31s
    

Labels: None yet
Projects: Status: In Progress
Development: successfully merging this pull request may close these issues:
[FEA]: Support fancy iterators in cuda.parallel
2 participants