
GPUArraysCore? #409

Closed
ChrisRackauckas opened this issue May 21, 2022 · 9 comments · Fixed by #411

Comments

@ChrisRackauckas
Member

ChrisRackauckas commented May 21, 2022

julia> @time_imports using GPUArrays
     16.5 ms      ┌ Preferences
     25.2 ms    ┌ JLLWrappers
    173.7 ms  ┌ LLVMExtra_jll
      4.3 ms    ┌ CEnum
     97.1 ms  ┌ LLVM
      2.0 ms  ┌ Adapt
   1000.1 ms  GPUArrays

Could there be a version that just defines the simplest stuff? GPUArrays.jl is too big to be a dependency everywhere, which forces packages into Requires.jl workarounds. In many places we only need a few things to get what's necessary:

https://github.com/JuliaArrays/ArrayInterface.jl/blob/master/lib/ArrayInterfaceGPUArrays/src/ArrayInterfaceGPUArrays.jl
https://github.com/SciML/RecursiveArrayTools.jl/blob/master/src/RecursiveArrayTools.jl#L27-L29
https://github.com/SciML/LinearSolve.jl/blob/8e31fd75dd10f478fca74fd46477f7b342590ab9/src/default.jl#L12
https://github.com/SciML/ExponentialUtilities.jl/blob/47f47b6f4ec95c3e88f94f2f41dd5f119f788f64/src/krylov_phiv.jl#L130
https://github.com/SciML/DiffEqNoiseProcess.jl/blob/03005350b6284fd8aaa8ca5374d990ea46bb3c80/src/wiener.jl#L19
https://github.com/SciML/DiffEqSensitivity.jl/blob/50fff85abca5452a90b155f8eb9833498d5b7aa4/src/concrete_solve.jl#L45

It would only need:

  • Top level things like GPUArrays.AbstractGPUArray
  • GPUArrays.@allowscalar

And the entirety of SciML would then cut out the LLVM dependency and shave about a second off every package's load time. Some packages only have a one-second using time, so it's almost entirely GPUArrays that's the issue now, and just for those two things.
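To make the request concrete, here is a rough sketch of what such a core package could contain. This is purely illustrative (the module name, the Ref-based toggle, and the macro body are my assumptions, not what the eventual split package necessarily looks like):

```julia
# Hypothetical sketch of a "GPUArraysCore" module: just the abstract type
# and a scalar-indexing switch, with Adapt as the only (tiny) dependency.
module GPUArraysCore

using Adapt

abstract type AbstractGPUArray{T,N} <: DenseArray{T,N} end

# Global toggle that downstream @allowscalar blocks flip temporarily.
const scalar_allowed = Ref(true)

macro allowscalar(ex)
    quote
        local prev = scalar_allowed[]
        scalar_allowed[] = true
        try
            $(esc(ex))
        finally
            scalar_allowed[] = prev
        end
    end
end

end # module
```

Downstream packages would then dispatch on `GPUArraysCore.AbstractGPUArray` without ever loading LLVM.jl.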

@oschulz

oschulz commented May 21, 2022

A more lightweight interface package would be awesome!

@maleadt
Member

maleadt commented May 22, 2022

What makes GPUArrays heavyweight? I'd rather somebody with package load-time optimization experience had a look than have us introduce yet another package just for the sake of reducing load time.

@oschulz

oschulz commented May 22, 2022

What makes GPUArrays heavyweight?

Currently, GPUArrays has a load time of about 1.3 s, which makes it a "heavy" package:

julia> @time_imports import GPUArrays
     17.1 ms      ┌ Preferences
    273.6 ms    ┌ JLLWrappers
    278.5 ms  ┌ LLVMExtra_jll
      3.4 ms    ┌ CEnum
    134.3 ms  ┌ LLVM
      0.6 ms  ┌ Adapt
   1292.1 ms  GPUArrays

Of that, about 400 ms is LLVM and 900 ms is GPUArrays itself (which is a lot for a single package, we have few that are that heavy):

julia> @time_imports import LLVM, GPUArrays
     17.5 ms      ┌ Preferences
    277.5 ms    ┌ JLLWrappers
    282.5 ms  ┌ LLVMExtra_jll
      3.4 ms  ┌ CEnum
    417.2 ms  LLVM
      0.5 ms  ┌ Adapt
    883.3 ms  GPUArrays

I may be pessimistic, but I don't think any kind of load-time optimization can bring that down to the level an interface package would have. LLVM's load time alone may not be reducible by much, and GPUArrays provides a lot of array-function specializations.

@oschulz

oschulz commented May 22, 2022

Maybe an inversion of dependencies could also help? Looking at @ChrisRackauckas' list above:

https://github.com/JuliaArrays/ArrayInterface.jl/blob/master/lib/ArrayInterfaceGPUArrays/src/ArrayInterfaceGPUArrays.jl

Now that we have a lightweight ArrayInterfaceCore.jl, maybe those specializations could be defined directly in GPUArrays.jl (which would then depend on ArrayInterfaceCore.jl)?
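With the dependency inverted, GPUArrays.jl itself could declare the traits that ArrayInterfaceGPUArrays currently provides. A sketch of what that might look like (the trait names follow the current ArrayInterface API; whether ArrayInterfaceCore exposes exactly these is an assumption):

```julia
# Sketch: GPUArrays.jl depends on the lightweight ArrayInterfaceCore.jl
# and defines the GPU-array traits itself, replacing the glue package.
using ArrayInterfaceCore, GPUArrays

# Scalar getindex on a GPU array is slow (or disallowed), so opt out.
ArrayInterfaceCore.fast_scalar_indexing(::Type{<:GPUArrays.AbstractGPUArray}) = false

# Report the backing device for generic device-aware code.
ArrayInterfaceCore.device(::Type{<:GPUArrays.AbstractGPUArray}) = ArrayInterfaceCore.GPU()
```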

https://github.com/SciML/RecursiveArrayTools.jl/blob/master/src/RecursiveArrayTools.jl#L27-L29

I've proposed adding some sliced-array interfaces to ArrayInterfaceCore.jl (JuliaLang/julia#32310 (comment)), and I hope to have a draft with usage examples ready in a few days. If that interface covers some array-of-arrays functionality in general, the code in RecursiveArrayTools.jl#L27-L29 could also move into GPUArrays.jl (which would seem a natural place).

https://github.com/SciML/LinearSolve.jl/blob/8e31fd75dd10f478fca74fd46477f7b342590ab9/src/default.jl#L12

This is just b isa GPUArrays.AbstractGPUArray. I actually have "propose API to get backing device of objects" on my to-do list, which would add AbstractComputingDevice, CPUDevice, AbstractGPUDevice, and getdevice(x) to Adapt.jl. That would give us a common supertype for CuDevice and the others and let the check b isa GPUArrays.AbstractGPUArray be written as getdevice(b) isa AbstractGPUDevice.
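None of these names exist in Adapt.jl yet; the following is just a sketch of the proposed dispatch to show how it would replace the type check:

```julia
# Hypothetical Adapt.jl device API (all names are part of the proposal,
# not an existing interface).
abstract type AbstractComputingDevice end
struct CPUDevice <: AbstractComputingDevice end
abstract type AbstractGPUDevice <: AbstractComputingDevice end

# Plain arrays live on the CPU by default.
getdevice(::AbstractArray) = CPUDevice()
# A GPU backend would then add, e.g.: getdevice(::CuArray) = CUDADevice()

# LinearSolve's check `b isa GPUArrays.AbstractGPUArray` becomes:
needs_gpu_path(b) = getdevice(b) isa AbstractGPUDevice
```

The advantage is that only the backend package (CUDA.jl etc.) needs to know about its concrete array type, while downstream code depends only on lightweight Adapt.jl.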

https://github.com/SciML/ExponentialUtilities.jl/blob/47f47b6f4ec95c3e88f94f2f41dd5f119f788f64/src/krylov_phiv.jl#L130

ExponentialUtilities has a load time of about 120 ms on top of GPUArrays (about 1200 ms). It would make more sense for GPUArrays to depend on ExponentialUtilities than the other way round (current state). Maybe ExponentialUtilities itself could be lightened/optimized a bit?

https://github.com/SciML/DiffEqNoiseProcess.jl/blob/03005350b6284fd8aaa8ca5374d990ea46bb3c80/src/wiener.jl#L19

The @inline wiener_randn!(rng::AbstractRNG, rand_vec::GPUArrays.AbstractGPUArray) = randn!(rand_vec) seems like a bit of a dirty hack, since it makes results irreproducible when users pass an RNG explicitly. Maybe this needs to be solved differently in general: supporting an explicit rng in randn! for GPU arrays for some RNG types (I think there's Random123 support now?) and throwing an exception if the RNG type is not GPU-compatible.
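One possible shape for that fix, sketched with a hypothetical opt-in trait (`is_gpu_compatible` is not an existing function; GPU-capable RNG packages would define it):

```julia
# Sketch: respect an explicitly passed RNG on GPU arrays, and fail
# loudly instead of silently breaking reproducibility.
using Random, GPUArrays

# Backends opt in for RNG types that work in GPU kernels (e.g. Random123).
is_gpu_compatible(::AbstractRNG) = false

function wiener_randn!(rng::AbstractRNG, rand_vec::GPUArrays.AbstractGPUArray)
    is_gpu_compatible(rng) ||
        throw(ArgumentError("$(typeof(rng)) is not GPU-compatible; " *
                            "results would not be reproducible"))
    randn!(rng, rand_vec)
end
```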

https://github.com/SciML/DiffEqSensitivity.jl/blob/50fff85abca5452a90b155f8eb9833498d5b7aa4/src/concrete_solve.jl#L45

This comes down to u0 isa GPUArrays.AbstractGPUArray again, so the same approach as proposed above (getdevice(u0)) would work. I'll see if I can prep that proposal for Adapt.jl soon; it would also solve JuliaArrays/ElasticArrays.jl#44.

@maleadt
Member

maleadt commented May 22, 2022

What makes GPUArrays heavyweight?

Currently, GPUArrays has a load time of about 1.3 s, which makes it a "heavy" package:

I realize that, I meant that it is not clear to me what functionality in GPUArrays inflates load time. Similarly, LLVM.jl shouldn't take 400ms to load, so I'd rather we look into that first before throwing dependencies out, because at some point (i.e. when you do want to use GPUs) you have to incur the load time anyway.

@ChrisRackauckas
Member Author

because at some point (i.e. when you do want to use GPUs) you have to incur the load time anyway.

But for us, that's the slimmest minority of users. We have those bits of code in there so that things work for the 1% of people who ever want to try a GPU. So in almost no cases would that load time actually be earned; it's only incurred because the code for checking GPU support of functions also requires loading the entirety of LLVM.jl and the beginnings of the GPU stack.

@maleadt
Member

maleadt commented May 23, 2022

Sure, but the fact remains that LLVM.jl is intended to be a low-level/lightweight dependency that many packages can rely on (e.g. LoopVectorization should probably use it instead of manually mangling intrinsics and getting it wrong as it recently did), so it'd be more valuable to optimize its load time first.

@ChrisRackauckas
Member Author

Splitting wouldn't take more than an hour or so and would be reversible in the future. Optimizing LLVM.jl sounds like it's at least a year away. Instead of waiting much longer for that kind of intensive fix, could a quick split be done to fix it now, with the dependency restored once LLVM.jl is lightweight?

@maleadt
Member

maleadt commented May 23, 2022

Yeah sure. I'm not opposed to a (temporary) split, I'd just like that the use case for GPU users is improved as well, and I don't have experience with load time latency optimization or the tools involved. So it'd be nice if somebody with such experience could have a look instead of papering over the immediate issue.
