
GPUArraysCore? #409

Closed
ChrisRackauckas opened this issue May 21, 2022 · 9 comments · Fixed by #411

Comments

@ChrisRackauckas
Member

ChrisRackauckas commented May 21, 2022

julia> @time_imports using GPUArrays
     16.5 ms      ┌ Preferences
     25.2 ms    ┌ JLLWrappers
    173.7 ms  ┌ LLVMExtra_jll
      4.3 ms    ┌ CEnum
     97.1 ms  ┌ LLVM
      2.0 ms  ┌ Adapt
   1000.1 ms  GPUArrays

Could there be a version that just defines the simplest stuff? GPUArrays.jl is too big to be a dependency everywhere, which forces packages into Requires.jl workarounds. In many places we only need a few things to get what's necessary:

https://github.com/JuliaArrays/ArrayInterface.jl/blob/master/lib/ArrayInterfaceGPUArrays/src/ArrayInterfaceGPUArrays.jl
https://github.com/SciML/RecursiveArrayTools.jl/blob/master/src/RecursiveArrayTools.jl#L27-L29
https://github.com/SciML/LinearSolve.jl/blob/8e31fd75dd10f478fca74fd46477f7b342590ab9/src/default.jl#L12
https://github.com/SciML/ExponentialUtilities.jl/blob/47f47b6f4ec95c3e88f94f2f41dd5f119f788f64/src/krylov_phiv.jl#L130
https://github.com/SciML/DiffEqNoiseProcess.jl/blob/03005350b6284fd8aaa8ca5374d990ea46bb3c80/src/wiener.jl#L19
https://github.com/SciML/DiffEqSensitivity.jl/blob/50fff85abca5452a90b155f8eb9833498d5b7aa4/src/concrete_solve.jl#L45

It would only need:

  • Top level things like GPUArrays.AbstractGPUArray
  • GPUArrays.@allowscalar

And the entirety of SciML would then cut out the LLVM dependency and shave about a second off every package's load time. Some packages only have a one-second using time, so it's almost entirely GPUArrays that's the issue now, and just for those two things.
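To make the request concrete, here is a rough sketch of what such a core package could contain. This is purely illustrative (the module name, the Ref-based toggle, and the macro body are my assumptions, not what the eventual split package necessarily looks like):

```julia
# Hypothetical sketch of a "GPUArraysCore" module: just the abstract type
# and a scalar-indexing switch, with Adapt as the only (tiny) dependency.
module GPUArraysCore

using Adapt

abstract type AbstractGPUArray{T,N} <: DenseArray{T,N} end

# Global toggle that downstream @allowscalar blocks flip temporarily.
const scalar_allowed = Ref(true)

macro allowscalar(ex)
    quote
        local prev = scalar_allowed[]
        scalar_allowed[] = true
        try
            $(esc(ex))
        finally
            scalar_allowed[] = prev
        end
    end
end

end # module
```

Downstream packages would then dispatch on `GPUArraysCore.AbstractGPUArray` without ever loading LLVM.jl.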

@oschulz

oschulz commented May 21, 2022

A more lightweight interface package would be awesome!

@maleadt
Member

maleadt commented May 22, 2022

What makes GPUArrays heavyweight? I'd rather somebody with package load-time optimization experience had a look than have us introduce yet another package just for the sake of reducing load time.

@oschulz

oschulz commented May 22, 2022

What makes GPUArrays heavyweight?

Currently, GPUArrays has a load time of about 1.3 s, which makes it a "heavy" package:

julia> @time_imports import GPUArrays
     17.1 ms      ┌ Preferences
    273.6 ms    ┌ JLLWrappers
    278.5 ms  ┌ LLVMExtra_jll
      3.4 ms    ┌ CEnum
    134.3 ms  ┌ LLVM
      0.6 ms  ┌ Adapt
   1292.1 ms  GPUArrays

Of that, about 400 ms is LLVM and 900 ms is GPUArrays itself (which is a lot for a single package, we have few that are that heavy):

julia> @time_imports import LLVM, GPUArrays
     17.5 ms      ┌ Preferences
    277.5 ms    ┌ JLLWrappers
    282.5 ms  ┌ LLVMExtra_jll
      3.4 ms  ┌ CEnum
    417.2 ms  LLVM
      0.5 ms  ┌ Adapt
    883.3 ms  GPUArrays

I may be pessimistic, but I don't think any kind of load-time optimization can bring that down to the level an interface package would have. LLVM's load time alone may not be reducible by much, and GPUArrays provides a lot of array-function specializations.

@oschulz

oschulz commented May 22, 2022

Maybe an inversion of dependencies could also help? Looking at @ChrisRackauckas' list above:

https://github.com/JuliaArrays/ArrayInterface.jl/blob/master/lib/ArrayInterfaceGPUArrays/src/ArrayInterfaceGPUArrays.jl

Now that we have a lightweight ArrayInterfaceCore.jl, maybe those specializations could be defined directly in GPUArrays.jl (which would then depend on ArrayInterfaceCore.jl)?
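With the dependency inverted, GPUArrays.jl itself could declare the traits that ArrayInterfaceGPUArrays currently provides. A sketch of what that might look like (the trait names follow the current ArrayInterface API; whether ArrayInterfaceCore exposes exactly these is an assumption):

```julia
# Sketch: GPUArrays.jl depends on the lightweight ArrayInterfaceCore.jl
# and defines the GPU-array traits itself, replacing the glue package.
using ArrayInterfaceCore, GPUArrays

# Scalar getindex on a GPU array is slow (or disallowed), so opt out.
ArrayInterfaceCore.fast_scalar_indexing(::Type{<:GPUArrays.AbstractGPUArray}) = false

# Report the backing device for generic device-aware code.
ArrayInterfaceCore.device(::Type{<:GPUArrays.AbstractGPUArray}) = ArrayInterfaceCore.GPU()
```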

https://github.com/SciML/RecursiveArrayTools.jl/blob/master/src/RecursiveArrayTools.jl#L27-L29

I've proposed adding some sliced-array interfaces to ArrayInterfaceCore.jl (JuliaLang/julia#32310 (comment)), and I hope to have a draft with usage examples ready in a few days. If that interface covers some array-of-arrays functionality in general, the code in RecursiveArrayTools.jl#L27-L29 could also move into GPUArrays.jl (which would seem a natural place).

https://github.com/SciML/LinearSolve.jl/blob/8e31fd75dd10f478fca74fd46477f7b342590ab9/src/default.jl#L12

This is just b isa GPUArrays.AbstractGPUArray. I actually have "propose API to get backing device of objects" on my to-do list, which would add AbstractComputingDevice, CPUDevice, AbstractGPUDevice, and getdevice(x) to Adapt.jl. That would give us a common supertype for CuDevice and the others and let the check b isa GPUArrays.AbstractGPUArray be written as getdevice(b) isa AbstractGPUDevice.
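None of these names exist in Adapt.jl yet; the following is just a sketch of the proposed dispatch to show how it would replace the type check:

```julia
# Hypothetical Adapt.jl device API (all names are part of the proposal,
# not an existing interface).
abstract type AbstractComputingDevice end
struct CPUDevice <: AbstractComputingDevice end
abstract type AbstractGPUDevice <: AbstractComputingDevice end

# Plain arrays live on the CPU by default.
getdevice(::AbstractArray) = CPUDevice()
# A GPU backend would then add, e.g.: getdevice(::CuArray) = CUDADevice()

# LinearSolve's check `b isa GPUArrays.AbstractGPUArray` becomes:
needs_gpu_path(b) = getdevice(b) isa AbstractGPUDevice
```

The advantage is that only the backend package (CUDA.jl etc.) needs to know about its concrete array type, while downstream code depends only on lightweight Adapt.jl.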

https://github.com/SciML/ExponentialUtilities.jl/blob/47f47b6f4ec95c3e88f94f2f41dd5f119f788f64/src/krylov_phiv.jl#L130

ExponentialUtilities has a load time of about 120 ms on top of GPUArrays (about 1200 ms). It would make more sense for GPUArrays to depend on ExponentialUtilities than the other way round (current state). Maybe ExponentialUtilities itself could be lightened/optimized a bit?

https://github.com/SciML/DiffEqNoiseProcess.jl/blob/03005350b6284fd8aaa8ca5374d990ea46bb3c80/src/wiener.jl#L19

The @inline wiener_randn!(rng::AbstractRNG, rand_vec::GPUArrays.AbstractGPUArray) = randn!(rand_vec) seems like a bit of a dirty hack, since it makes results irreproducible when users pass an RNG explicitly. Maybe this needs to be solved differently in general: supporting an explicit rng in randn! for GPU arrays for some RNG types (I think there's Random123 support now?) and throwing an exception if the RNG type is not GPU-compatible.
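One possible shape for that fix, sketched with a hypothetical opt-in trait (`is_gpu_compatible` is not an existing function; GPU-capable RNG packages would define it):

```julia
# Sketch: respect an explicitly passed RNG on GPU arrays, and fail
# loudly instead of silently breaking reproducibility.
using Random, GPUArrays

# Backends opt in for RNG types that work in GPU kernels (e.g. Random123).
is_gpu_compatible(::AbstractRNG) = false

function wiener_randn!(rng::AbstractRNG, rand_vec::GPUArrays.AbstractGPUArray)
    is_gpu_compatible(rng) ||
        throw(ArgumentError("$(typeof(rng)) is not GPU-compatible; " *
                            "results would not be reproducible"))
    randn!(rng, rand_vec)
end
```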

https://github.com/SciML/DiffEqSensitivity.jl/blob/50fff85abca5452a90b155f8eb9833498d5b7aa4/src/concrete_solve.jl#L45

This comes down to u0 isa GPUArrays.AbstractGPUArray again, so the same approach as proposed above (getdevice(u0)) would work. I'll see if I can prep that proposal for Adapt.jl soon; it would also solve JuliaArrays/ElasticArrays.jl#44.

@maleadt
Member

maleadt commented May 22, 2022

What makes GPUArrays heavyweight?

Currently, GPUArrays has a load time of about 1.3 s, which makes it a "heavy" package:

I realize that, I meant that it is not clear to me what functionality in GPUArrays inflates load time. Similarly, LLVM.jl shouldn't take 400ms to load, so I'd rather we look into that first before throwing dependencies out, because at some point (i.e. when you do want to use GPUs) you have to incur the load time anyway.

@ChrisRackauckas
Member Author

because at some point (i.e. when you do want to use GPUs) you have to incur the load time anyway.

But for us, that's the slimmest minority of users. We have those bits of code in there so that things work for the 1% of people who ever want to try a GPU. So in almost no cases would that load time actually be earned; it's only incurred because the code for checking GPU support of functions also requires loading the entirety of LLVM.jl and the beginnings of the GPU stack.

@maleadt
Member

maleadt commented May 23, 2022

Sure, but the fact remains that LLVM.jl is intended to be a low-level/lightweight dependency that many packages can rely on (e.g. LoopVectorization should probably use it instead of manually mangling intrinsics and getting it wrong as it recently did), so it'd be more valuable to optimize its load time first.

@ChrisRackauckas
Member Author

Splitting wouldn't take more than an hour or so and would be reversible in the future. Optimizing LLVM.jl sounds like it's at least a year away. Instead of waiting much longer for that kind of intensive fix, could a quick split be done to fix it now, with the dependency restored once LLVM.jl is lightweight?

@maleadt
Member

maleadt commented May 23, 2022

Yeah sure. I'm not opposed to a (temporary) split, I'd just like that the use case for GPU users is improved as well, and I don't have experience with load time latency optimization or the tools involved. So it'd be nice if somebody with such experience could have a look instead of papering over the immediate issue.
