
feat: inherit scalar indexing functionality from GPUArraysCore #268

Merged · 7 commits · Nov 13, 2024

Conversation

@avik-pal (Collaborator) commented Nov 12, 2024

needs some tests before merging

Example Usage

julia> using Reactant

julia> using GPUArraysCore

julia> x_ra = ConcreteRArray(rand(3, 4))
3×4 ConcreteRArray{Float64, 2}:
 0.166621  0.415209   0.23444   0.225489
 0.323775  0.201456   0.885111  0.625804
 0.22719   0.0906565  0.244437  0.98303

julia> x_ra[1]
0.1666208268454895

julia> GPUArraysCore.allowscalar(false)

julia> x_ra[1]
ERROR: Scalar indexing is disallowed.
Invocation of getindex(::ConcreteRArray, ::Vararg{Int, N}) resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] errorscalar(op::String)
   @ GPUArraysCore /mnt/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:151
 [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
   @ GPUArraysCore /mnt/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:124
 [4] assertscalar(op::String)
   @ GPUArraysCore /mnt/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:112
 [5] getindex(a::ConcreteRArray{Float64, 2}, args::Int64)
   @ Reactant /mnt/software/lux/Reactant.jl/src/ConcreteRArray.jl:175
 [6] top-level scope
   @ REPL[5]:1
 [7] top-level scope
   @ none:1

julia> @allowscalar x_ra[1]
0.1666208268454895

On CPU, no error is ever thrown unless the user manually opts out of scalar indexing.
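For readers unfamiliar with the mechanism, GPUArraysCore gates scalar `getindex` behind a task-local flag that `allowscalar`/`@allowscalar` flip. A minimal pure-Base sketch of that idea (`MockRArray`, `allowscalar!`, and `scalar_allowed` are hypothetical names for illustration, not Reactant's or GPUArraysCore's API):

```julia
# Toy array type that gates scalar getindex behind a task-local flag,
# mimicking the mechanism GPUArraysCore provides.
struct MockRArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(a::MockRArray) = size(a.data)

# Task-local toggle, analogous to GPUArraysCore.allowscalar(::Bool).
allowscalar!(flag::Bool) = task_local_storage(:ScalarIndexing, flag)
scalar_allowed() = get(task_local_storage(), :ScalarIndexing, true)

function Base.getindex(a::MockRArray, i::Int...)
    scalar_allowed() || error("Scalar indexing is disallowed.")
    return a.data[i...]
end

x = MockRArray(rand(3, 4))
x[1]                 # allowed by default
allowscalar!(false)  # after this, x[1] throws until re-enabled
```

Reactant's actual implementation instead calls `GPUArraysCore.assertscalar` from `getindex` (visible in the stack trace above), which additionally distinguishes warn vs. error behavior.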

fixes #232

@avik-pal (Collaborator, Author)

@mofeing can you check if this helps your case where you saw the scalar indexing warnings?

@mofeing (Collaborator) commented Nov 12, 2024

We can confirm that this removes the infinite warnings we had in our code. Thanks @avik-pal!

I would approve the PR but it seems like this is breaking the tests?

@avik-pal (Collaborator, Author)

The x86 ones are broken since we don't have the binaries in place.

But I still need to add some tests before merging

src/Reactant.jl Outdated
@@ -110,12 +111,19 @@ function __init__()
end

function set_default_backend(backend::XLA.Client)
if backend === XLA.backends["cpu"]
Member:

so this won't quite work because we can end up with both cpu and gpu tensors

Collaborator (Author):

For XLA Buffer I can do a check with buffer on cpu, but I couldn't figure out how to do it for TracedRArray.

One solution is to set the local_task_storage to ScalarAllowed for CPU when entering the compile function
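That task-local idea could be sketched in pure Base Julia like this (hypothetical `with_scalar_allowed` wrapper; GPUArraysCore's scoped equivalent is its do-block form of `allowscalar`):

```julia
# Hypothetical sketch: scope a ScalarAllowed flag to the current task for the
# duration of a call, restoring the previous state afterwards.
function with_scalar_allowed(f)
    task_local_storage(:ScalarIndexing, :ScalarAllowed) do
        f()   # e.g. the body of `compile` for a CPU backend
    end
end
```

Because `task_local_storage(f, key, value)` restores the old value on exit, scalar indexing would only be permitted inside the compilation scope.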

Member:

Yeah, for traced arrays we should always error there, because scalar indexing has its own problem of accidentally splitting tensor ops into a bunch of scalars, regardless of the backend implementation.

@mofeing (Collaborator) commented Nov 12, 2024:

TracedRArray doesn't know about the backend accelerator, be it CPU, GPU or TPU. Actually, the HLO dialects don't know which backend they are going to run on either.
@wsmoses correct me if I'm wrong, but that step is done later in XLA, when compiling HLO to a native executable.

Member:

I mean, even if they did (and you're right, they don't), we should error for traced arrays.

@mofeing (Collaborator) commented Nov 12, 2024:

I asked to remove them for CPU because it doesn't make much sense to raise a warning there, and they heavily pollute stdout.

Member:

I think removing it for a CPU ConcreteRArray is fine, but the problem is that scalar indexing will equally pollute the IR we compile for traced arrays, so we should still warn (or require allowscalar).

Collaborator (Author):

Changed the behavior to match the default CUDA.jl behavior:

  1. Allowed, with a warning, in the REPL
  2. Disallowed, with an error, in scripts
  3. Can be locally allowed without a warning using `@allowscalar`

Collaborator:

should we reexport allowscalar?

Collaborator (Author):

+1
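A reexport along the lines discussed would look roughly like this (sketch only; placement inside the `Reactant` module is assumed):

```julia
# Inside module Reactant: pull in and re-export GPUArraysCore's toggles
# so users don't need to load GPUArraysCore themselves.
using GPUArraysCore: allowscalar, @allowscalar
export allowscalar, @allowscalar
```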

)
getindex_warned[] = true
end

Member:

Can we have this call a function of our own, which calls GPUArraysCore's `assertscalar` if it's loaded?

Collaborator (Author):

We can, but it is extremely lightweight:

julia> @time_imports using GPUArraysCore
      0.2 ms  Adapt
      0.4 ms  GPUArraysCore

Member:

Eh, okay, I'm fine with this then.
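For reference, the conditional hook suggested above (calling `assertscalar` only when GPUArraysCore is loaded) would typically use a no-op function overloaded in a package extension; a sketch with hypothetical names:

```julia
# In the main package: a hook that does nothing by default.
assert_scalar_indexing(op::String) = nothing

# In a package extension (e.g. a hypothetical ext/ReactantGPUArraysCoreExt.jl,
# with GPUArraysCore as a weak dependency), the hook would be overloaded:
#     Reactant.assert_scalar_indexing(op::String) = GPUArraysCore.assertscalar(op)

# getindex would then call assert_scalar_indexing(...) before scalar access.
```

In the end the PR takes the simpler route of depending on GPUArraysCore directly, which the `@time_imports` output above shows is cheap.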

@@ -16,7 +14,7 @@ using InteractiveUtils

a = Reactant.ConcreteRArray(x)

-c_res = sum(a)
+c_res = @allowscalar sum(a)
Member:

Okay, ideally this shouldn't be required. I feel like loading a concrete number / traced number itself should automatically be allowscalar.

Collaborator (Author):

These work fine if the backend is CPU, but the default implementation of `sum` will just loop over the indices, which fails the GPU CI.

Member:

Wait, really? Shouldn't it fall back to a reduce?

If not, this is definitely a bug.

Member:

Oh sorry, this is for the ConcreteRArray, not traced.

Yeah, we should still eventually make this a reduce, but that's for another time.
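The reduce-based fallback alluded to here can be sketched in pure Base Julia (hypothetical `ToyRArray`, not Reactant's implementation): routing `mapreduce` through one bulk operation on the underlying buffer means `sum` never performs per-element `getindex`.

```julia
struct ToyRArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(a::ToyRArray) = size(a.data)
# Note: deliberately no scalar getindex defined at all.

# One bulk operation on the underlying storage instead of an element loop;
# Base's sum(a) lowers to mapreduce, so it takes this path.
Base.mapreduce(f, op, a::ToyRArray; kw...) = mapreduce(f, op, a.data; kw...)

sum(ToyRArray([1.0 2.0; 3.0 4.0]))  # returns 10.0 without any scalar indexing
```

Since the type defines no `getindex`, the call succeeding at all demonstrates that the reduction never touched individual elements.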

@wsmoses merged commit 5a60501 into main on Nov 13, 2024; 21 of 34 checks passed. The ap/scalar_indexing branch was deleted on November 13, 2024 at 20:08.
Closes issue: Use GPUArraysCore for scalar indexing flags