Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored CUDA and Metal backends into extensions. #42

Merged
merged 7 commits into from
Nov 23, 2024

Conversation

anicusan
Copy link
Contributor

Trying to push for atomics on all KernelAbstractions.jl backends - really appreciate your help. I simplified the code structure in ext and test with the current Julia best practices I know.

The only problem with this extension-based structure is I now get:

┌ Warning: Circular dependency detected. Precompilation will be skipped for:
│   Base.PkgId(Base.UUID("66d79d19-2cc4-5b0b-ac7a-b340256d1ecd"), "LinearAlgebraExt")
│   Base.PkgId(Base.UUID("63c18a36-062a-441e-b654-da1e3ab1ce7c"), "KernelAbstractions")
│   Base.PkgId(Base.UUID("368c01e8-f8da-5dca-a1ea-818da1f33961"), "AtomixMetalExt")
│   Base.PkgId(Base.UUID("dde4c033-4e86-420c-a63e-0dd931031962"), "Metal")
└ @ Base.Precompilation precompilation.jl:511

Is this something to be fixed from this side, or from KernelAbstractions' side? The rest of the ecosystem using Atomix needs to be updated for the current Atomix = "1" version too, otherwise it very quickly spirals into dependency hell.

…d tests into simpler files. Added oneAPI backend. Added tests for extension finding.
…be installed on the GPU backends. Added oneAPI CI runner. TODO: put version back to 1.0.0 once they run, then update rest of ecosystem compat
@anicusan
Copy link
Contributor Author

Okay, the extensions are found and tests pass. Still, we get the circular dependency problem on the GPU CI too - I am not sure where/why/how to fix it, waiting on your input.

Afterwards, we can increment the version up to v1.0.0 once and for all, which will then need updating across the ecosystem (y'all are really doing titanic work).

Copy link

codecov bot commented Nov 11, 2024

Codecov Report

Attention: Patch coverage is 0% with 31 lines in your changes missing coverage. Please review.

Project coverage is 60.16%. Comparing base (e60c518) to head (4311cbb).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
ext/AtomixoneAPIExt.jl 0.00% 31 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #42       +/-   ##
===========================================
- Coverage   92.95%   60.16%   -32.79%     
===========================================
  Files           6        9        +3     
  Lines         142      236       +94     
===========================================
+ Hits          132      142       +10     
- Misses         10       94       +84     
Flag Coverage Δ
Pkg.test 60.16% <0.00%> (-32.79%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Project.toml Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
@vchuravy
Copy link
Member

Can you split it into two PRs? One for the rework to extensions and one for adding oneAPI?

@anicusan anicusan changed the title Added oneAPI backend, and refactored CUDA, Metal and oneAPI backends into extensions. Refactored CUDA and Metal backends into extensions. Nov 14, 2024
@anicusan
Copy link
Contributor Author

anicusan commented Nov 14, 2024

@vchuravy I left this PR for the extensions rework only. Once it is merged, I will create a oneAPI PR based on the new ext format.

@christiangnrd
Copy link
Contributor

The circular dependency is probably because the GPU backends depend on KernelAbstractions, which depends on Atomix. I don't know how to fix this with extensions that live in this repo.

@anicusan
Copy link
Contributor Author

Is it a case of having to update the [compat] versions in tandem?

@christiangnrd
Copy link
Contributor

Is it a case of having to update the [compat] versions in tandem?

I don't believe so. I tested this by reverting the Atomix version to 0.1.0 to avoid the compat issues.

@christiangnrd
Copy link
Contributor

christiangnrd commented Nov 20, 2024

I suspect this "cycle" issue may be fixed by JuliaLang/julia#55910 (comment). If true, that means that this should be working once 1.10.8 and 1.11.2 are released.

@maleadt maleadt merged commit b027bdd into JuliaConcurrent:main Nov 23, 2024
5 of 6 checks passed
@christiangnrd
Copy link
Contributor

christiangnrd commented Nov 23, 2024

Okay I compiled the backports-1.11 branch to test this out and I'm getting a different error.

Error:
│  ┌ Warning: Module Metal with build ID ffffffff-ffff-ffff-0000-ba908ccbce5a is missing from the cache.
│  │ This may mean Metal [dde4c033-4e86-420c-a63e-0dd931031962] does not support precompilation but is imported by a module that does.
│  └ @ Base loading.jl:2539
│  ┌ Error: Error during loading of extension AtomixMetalExt of Atomix, use `Base.retry_load_extensions()` to retry.
│  │   exception =
│  │    1-element ExceptionStack:
│  │    Error when precompiling module, potentially caused by a __precompile__(false) declaration in the module.
│  │    Stacktrace:
│  │      [1] _require(pkg::Base.PkgId, env::Nothing)
│  │        @ Base ./loading.jl:2601
│  │      [2] __require_prelocked(uuidkey::Base.PkgId, env::Nothing)
│  │        @ Base ./loading.jl:2386
│  │      [3] #invoke_in_world#3
│  │        @ ./essentials.jl:1089 [inlined]
│  │      [4] invoke_in_world
│  │        @ ./essentials.jl:1086 [inlined]
│  │      [5] _require_prelocked
│  │        @ ./loading.jl:2373 [inlined]
│  │      [6] _require_prelocked
│  │        @ ./loading.jl:2372 [inlined]
│  │      [7] run_extension_callbacks(extid::Base.ExtensionId)
│  │        @ Base ./loading.jl:1542
│  │      [8] run_extension_callbacks(pkgid::Base.PkgId)
│  │        @ Base ./loading.jl:1574
│  │      [9] run_package_callbacks(modkey::Base.PkgId)
│  │        @ Base ./loading.jl:1395
│  │     [10] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128, stalecheck::Bool; reasons::Dict{String, Int64}, DEPOT_PATH::Vector{String})
│  │        @ Base ./loading.jl:2063
│  │     [11] _require(pkg::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:2525
│  │     [12] __require_prelocked(uuidkey::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:2386
│  │     [13] #invoke_in_world#3
│  │        @ ./essentials.jl:1089 [inlined]
│  │     [14] invoke_in_world
│  │        @ ./essentials.jl:1086 [inlined]
│  │     [15] _require_prelocked(uuidkey::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:2373
│  │     [16] macro expansion
│  │        @ ./loading.jl:2312 [inlined]
│  │     [17] macro expansion
│  │        @ ./lock.jl:273 [inlined]
│  │     [18] __require(into::Module, mod::Symbol)
│  │        @ Base ./loading.jl:2269
│  │     [19] #invoke_in_world#3
│  │        @ ./essentials.jl:1089 [inlined]
│  │     [20] invoke_in_world
│  │        @ ./essentials.jl:1086 [inlined]
│  │     [21] require(into::Module, mod::Symbol)
│  │        @ Base ./loading.jl:2258
│  │     [22] include
│  │        @ ./Base.jl:557 [inlined]
│  │     [23] include_package_for_otput(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
│  │        @ Base ./loading.jl:2879
│  │     [24] top-level scope
│  │        @ stdin:5
│  │     [25] eval
│  │        @ ./boot.jl:430 [inlined]
│  │     [26] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
│  │        @ Base ./loading.jl:2732
│  │     [27] include_string
│  │        @ ./loading.jl:2742 [inlined]
│  │     [28] exec_options(opts::Base.JLOptions)
│  │        @ Base ./client.jl:321
│  │     [29] _start()
│  │        @ Base ./client.jl:531
│  └ @ Base loading.jl:1548

I'm not familiar enough with precompilation to know if that's a Julia bug or a bug in this PR.

Tests still seem to pass though...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants