[ROCDL] Use attached target on the GPU module when lowering GPU ops to ROCDL #110735

umangyadav · 2024-10-01T20:17:51Z

Currently convert-to-gpu-rocdl requires target specific information to setup all the conversion patterns and therefore it takes chipset as one of the option during construction.

rocdl-attach-target pass can run before convert-gpu-to-rocdl pass and attach relevant target information on the GPU Module.

CC : @krzysz00 @fabianmcg

llvmbot · 2024-10-01T20:18:24Z

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Umang Yadav (umangyadav)

Changes

Currently convert-to-gpu-rocdl requires target specific information to setup all the conversion patterns and therefore it takes chipset as one of the option during construction.

rocdl-attach-target pass can run before convert-gpu-to-rocdl pass and attach relevant target information on the GPU Module.

CC : @krzysz00

Full diff: https://github.com/llvm/llvm-project/pull/110735.diff

17 Files Affected:

(modified) mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h (+1-1)
(modified) mlir/include/mlir/Conversion/Passes.td (+2-2)
(modified) mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp (+23-6)
(modified) mlir/test/Conversion/GPUCommon/lower-memory-space-attrs.mlir (+1-1)
(modified) mlir/test/Conversion/GPUCommon/memory-attrbution.mlir (+1-1)
(modified) mlir/test/Conversion/GPUCommon/memref-arg-attrs.mlir (+1-1)
(modified) mlir/test/Conversion/GPUCommon/memref-arg-noalias-attrs.mlir (+1-1)
(modified) mlir/test/Conversion/GPUCommon/memref-arg-noalias-warning.mlir (+1-1)
(modified) mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir (+1-1)
(modified) mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-opencl.mlir (+1-1)
(modified) mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir (+2-2)
(modified) mlir/test/Conversion/GPUToROCDL/memref.mlir (+2-2)
(modified) mlir/test/Integration/GPU/ROCM/gpu-to-hsaco.mlir (+1-1)
(modified) mlir/test/Integration/GPU/ROCM/printf.mlir (+1-1)
(modified) mlir/test/Integration/GPU/ROCM/two-modules.mlir (+1-1)
(modified) mlir/test/Integration/GPU/ROCM/vecadd.mlir (+1-1)
(modified) mlir/test/Integration/GPU/ROCM/vector-transferops.mlir (+1-1)

diff --git a/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h b/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
index 5647787712997b..f1233ad894daf5 100644
--- a/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
+++ b/mlir/include/mlir/Conversion/GPUToROCDL/GPUToROCDLPass.h
@@ -42,7 +42,7 @@ void configureGpuToROCDLConversionLegality(ConversionTarget &target);
 /// is configurable.
 std::unique_ptr<OperationPass<gpu::GPUModuleOp>>
 createLowerGpuOpsToROCDLOpsPass(
-    const std::string &chipset = "gfx900",
+    const std::string &chipset = "infer",
     unsigned indexBitwidth = kDeriveIndexBitwidthFromDataLayout,
     bool useBarePtrCallConv = false,
     gpu::amd::Runtime runtime = gpu::amd::Runtime::Unknown);
diff --git a/mlir/include/mlir/Conversion/Passes.td b/mlir/include/mlir/Conversion/Passes.td
index 4d272ba219c6f1..d536ee25a75637 100644
--- a/mlir/include/mlir/Conversion/Passes.td
+++ b/mlir/include/mlir/Conversion/Passes.td
@@ -592,8 +592,8 @@ def ConvertGpuOpsToROCDLOps : Pass<"convert-gpu-to-rocdl", "gpu::GPUModuleOp"> {
   ];
   let options = [
     Option<"chipset", "chipset", "std::string",
-           /*default=*/"\"gfx000\"",
-           "Chipset that these operations will run on">,
+           /*default=*/"\"infer\"",
+           "Chipset that these operations will run on. By Default it will infer target from attached target attribute on GPU module on which it operates">,
     Option<"indexBitwidth", "index-bitwidth", "unsigned",
            /*default=kDeriveIndexBitwidthFromDataLayout*/"0",
            "Bitwidth of the index type, 0 to use size of machine word">,
diff --git a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
index 482c9e2c2d0017..83607d2c46a0f4 100644
--- a/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
+++ b/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
@@ -219,6 +219,29 @@ struct LowerGpuOpsToROCDLOpsPass
     gpu::GPUModuleOp m = getOperation();
     MLIRContext *ctx = m.getContext();
 
+    ArrayAttr targets = m.getTargetsAttr();
+    if (chipset == "infer") {
+      if (!targets) {
+        emitError(UnknownLoc::get(ctx),
+                  "ROCDLTargetAttr is empty on GPU module");
+        return signalPassFailure();
+      }
+      if (targets.size() != 1) {
+        emitError(UnknownLoc::get(ctx), "ROCDLTargetAttrs has more specified "
+                                        "more than one gpu-arch on GPU module");
+        return signalPassFailure();
+      }
+      const ROCDL::ROCDLTargetAttr targetAttr =
+          mlir::dyn_cast<ROCDL::ROCDLTargetAttr>(targets.getValue().front());
+      chipset = targetAttr.getChip().str();
+    }
+
+    FailureOr<amdgpu::Chipset> maybeChipset = amdgpu::Chipset::parse(chipset);
+    if (failed(maybeChipset)) {
+      emitError(UnknownLoc::get(ctx), "Invalid chipset name: " + chipset);
+      return signalPassFailure();
+    }
+
     auto llvmDataLayout = m->getAttrOfType<StringAttr>(
         LLVM::LLVMDialect::getDataLayoutAttrName());
     if (!llvmDataLayout) {
@@ -231,12 +254,6 @@ struct LowerGpuOpsToROCDLOpsPass
                     UnitAttr::get(ctx));
     }
 
-    FailureOr<amdgpu::Chipset> maybeChipset = amdgpu::Chipset::parse(chipset);
-    if (failed(maybeChipset)) {
-      emitError(UnknownLoc::get(ctx), "Invalid chipset name: " + chipset);
-      return signalPassFailure();
-    }
-
     /// Customize the bitwidth used for the device side index computations.
     LowerToLLVMOptions options(
         ctx, DataLayout(cast<DataLayoutOpInterface>(m.getOperation())));
diff --git a/mlir/test/Conversion/GPUCommon/lower-memory-space-attrs.mlir b/mlir/test/Conversion/GPUCommon/lower-memory-space-attrs.mlir
index 771f3185904bb8..a338d35525eba7 100644
--- a/mlir/test/Conversion/GPUCommon/lower-memory-space-attrs.mlir
+++ b/mlir/test/Conversion/GPUCommon/lower-memory-space-attrs.mlir
@@ -1,4 +1,4 @@
-// RUN: mlir-opt %s -split-input-file -convert-gpu-to-rocdl | FileCheck %s --check-prefixes=CHECK,ROCDL
+// RUN: mlir-opt %s -split-input-file -convert-gpu-to-rocdl='chipset=gfx900' | FileCheck %s --check-prefixes=CHECK,ROCDL
 // RUN: mlir-opt %s -split-input-file -convert-gpu-to-nvvm | FileCheck %s --check-prefixes=CHECK,NVVM
 
 gpu.module @kernel {
diff --git a/mlir/test/Conversion/GPUCommon/memory-attrbution.mlir b/mlir/test/Conversion/GPUCommon/memory-attrbution.mlir
index 4fc19b8e93646c..b1291e07c060b5 100644
--- a/mlir/test/Conversion/GPUCommon/memory-attrbution.mlir
+++ b/mlir/test/Conversion/GPUCommon/memory-attrbution.mlir
@@ -1,5 +1,5 @@
 // RUN: mlir-opt -allow-unregistered-dialect --convert-gpu-to-nvvm --split-input-file %s | FileCheck --check-prefix=NVVM %s
-// RUN: mlir-opt -allow-unregistered-dialect --convert-gpu-to-rocdl --split-input-file %s | FileCheck --check-prefix=ROCDL %s
+// RUN: mlir-opt -allow-unregistered-dialect --convert-gpu-to-rocdl='chipset=gfx900' --split-input-file %s | FileCheck --check-prefix=ROCDL %s
 
 gpu.module @kernel {
   // NVVM-LABEL:  llvm.func @private
diff --git a/mlir/test/Conversion/GPUCommon/memref-arg-attrs.mlir b/mlir/test/Conversion/GPUCommon/memref-arg-attrs.mlir
index e7c742067b4eb5..3c3082c473896e 100644
--- a/mlir/test/Conversion/GPUCommon/memref-arg-attrs.mlir
+++ b/mlir/test/Conversion/GPUCommon/memref-arg-attrs.mlir
@@ -1,4 +1,4 @@
-// RUN: mlir-opt %s -split-input-file -convert-gpu-to-rocdl='use-bare-ptr-memref-call-conv=0' | FileCheck %s --check-prefixes=CHECK,ROCDL
+// RUN: mlir-opt %s -split-input-file -convert-gpu-to-rocdl='chipset=gfx900 use-bare-ptr-memref-call-conv=0' | FileCheck %s --check-prefixes=CHECK,ROCDL
 // RUN: mlir-opt %s -split-input-file -convert-gpu-to-nvvm='use-bare-ptr-memref-call-conv=0' | FileCheck %s --check-prefixes=CHECK,NVVM
 
 gpu.module @kernel {
diff --git a/mlir/test/Conversion/GPUCommon/memref-arg-noalias-attrs.mlir b/mlir/test/Conversion/GPUCommon/memref-arg-noalias-attrs.mlir
index 33cdc3348e5137..d17214d1f2299b 100644
--- a/mlir/test/Conversion/GPUCommon/memref-arg-noalias-attrs.mlir
+++ b/mlir/test/Conversion/GPUCommon/memref-arg-noalias-attrs.mlir
@@ -1,4 +1,4 @@
-// RUN: mlir-opt %s -split-input-file -convert-gpu-to-rocdl='use-bare-ptr-memref-call-conv=1' | FileCheck %s --check-prefixes=CHECK,ROCDL
+// RUN: mlir-opt %s -split-input-file -convert-gpu-to-rocdl='chipset=gfx900 use-bare-ptr-memref-call-conv=1' | FileCheck %s --check-prefixes=CHECK,ROCDL
 // RUN: mlir-opt %s -split-input-file -convert-gpu-to-nvvm='use-bare-ptr-memref-call-conv=1' | FileCheck %s --check-prefixes=CHECK,NVVM
 
 gpu.module @kernel {
diff --git a/mlir/test/Conversion/GPUCommon/memref-arg-noalias-warning.mlir b/mlir/test/Conversion/GPUCommon/memref-arg-noalias-warning.mlir
index 793df7380d78bd..ab98be59a2c87e 100644
--- a/mlir/test/Conversion/GPUCommon/memref-arg-noalias-warning.mlir
+++ b/mlir/test/Conversion/GPUCommon/memref-arg-noalias-warning.mlir
@@ -1,4 +1,4 @@
-// RUN: mlir-opt %s -split-input-file -convert-gpu-to-rocdl='use-bare-ptr-memref-call-conv=0' -verify-diagnostics
+// RUN: mlir-opt %s -split-input-file -convert-gpu-to-rocdl='chipset=gfx900 use-bare-ptr-memref-call-conv=0' -verify-diagnostics
 
 gpu.module @kernel {
 // expected-warning @+1 {{Cannot copy noalias with non-bare pointers.}}
diff --git a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir
index 1b904fa142bad3..3e3b43c6d4f493 100644
--- a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir
+++ b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-hip.mlir
@@ -1,4 +1,4 @@
-// RUN: mlir-opt %s -convert-gpu-to-rocdl='runtime=HIP' -split-input-file | FileCheck %s
+// RUN: mlir-opt %s -convert-gpu-to-rocdl='chipset=gfx900 runtime=HIP' -split-input-file | FileCheck %s
 
 gpu.module @test_module {
   // CHECK-DAG: llvm.mlir.global internal constant @[[$PRINT_GLOBAL0:[A-Za-z0-9_]+]]("Hello, world\0A\00")
diff --git a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-opencl.mlir b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-opencl.mlir
index 870f5c5016ecef..fa01801972d6a4 100644
--- a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-opencl.mlir
+++ b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl-opencl.mlir
@@ -1,4 +1,4 @@
-// RUN: mlir-opt %s -convert-gpu-to-rocdl='runtime=OpenCL' | FileCheck %s
+// RUN: mlir-opt %s -convert-gpu-to-rocdl='chipset=gfx900 runtime=OpenCL' | FileCheck %s
 
 gpu.module @test_module {
   // CHECK: llvm.mlir.global internal constant @[[$PRINT_GLOBAL:[A-Za-z0-9_]+]]("Hello: %d\0A\00")  {addr_space = 4 : i32}
diff --git a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
index 0d3e9f4ea2bf39..ca827e9acffe3b 100644
--- a/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
+++ b/mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir
@@ -1,5 +1,5 @@
-// RUN: mlir-opt %s -convert-gpu-to-rocdl -split-input-file | FileCheck %s
-// RUN: mlir-opt %s -convert-gpu-to-rocdl='index-bitwidth=32' -split-input-file | FileCheck --check-prefix=CHECK32 %s
+// RUN: mlir-opt %s -convert-gpu-to-rocdl='chipset=gfx900' -split-input-file | FileCheck %s
+// RUN: mlir-opt %s -convert-gpu-to-rocdl='chipset=gfx900 index-bitwidth=32' -split-input-file | FileCheck --check-prefix=CHECK32 %s
 
 // CHECK-LABEL: @test_module
 // CHECK-SAME: llvm.data_layout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
diff --git a/mlir/test/Conversion/GPUToROCDL/memref.mlir b/mlir/test/Conversion/GPUToROCDL/memref.mlir
index e645481c892308..debf899dd68742 100644
--- a/mlir/test/Conversion/GPUToROCDL/memref.mlir
+++ b/mlir/test/Conversion/GPUToROCDL/memref.mlir
@@ -1,6 +1,6 @@
-// RUN: mlir-opt %s -convert-gpu-to-rocdl -split-input-file | FileCheck %s
+// RUN: mlir-opt %s -convert-gpu-to-rocdl='chipset=gfx900' -split-input-file | FileCheck %s
 // RUN: mlir-opt %s \
-// RUN:   -convert-gpu-to-rocdl='use-bare-ptr-memref-call-conv=true' \
+// RUN:   -convert-gpu-to-rocdl='chipset=gfx900 use-bare-ptr-memref-call-conv=true' \
 // RUN:   -split-input-file \
 // RUN: | FileCheck %s --check-prefix=BARE
 
diff --git a/mlir/test/Integration/GPU/ROCM/gpu-to-hsaco.mlir b/mlir/test/Integration/GPU/ROCM/gpu-to-hsaco.mlir
index 3c8f3b1d0cbf4b..edb75ee81224ef 100644
--- a/mlir/test/Integration/GPU/ROCM/gpu-to-hsaco.mlir
+++ b/mlir/test/Integration/GPU/ROCM/gpu-to-hsaco.mlir
@@ -1,6 +1,6 @@
 // RUN: mlir-opt %s \
 // RUN: | mlir-opt -gpu-kernel-outlining \
-// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-rocdl),rocdl-attach-target{chip=%chip})' \
+// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo),rocdl-attach-target{chip=%chip}, gpu.module(convert-gpu-to-rocdl))' \
 // RUN: | mlir-opt -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary \
 // RUN: | mlir-cpu-runner \
 // RUN:   --shared-libs=%mlir_rocm_runtime \
diff --git a/mlir/test/Integration/GPU/ROCM/printf.mlir b/mlir/test/Integration/GPU/ROCM/printf.mlir
index d5e6e3757540b2..e8feeaa69c2907 100644
--- a/mlir/test/Integration/GPU/ROCM/printf.mlir
+++ b/mlir/test/Integration/GPU/ROCM/printf.mlir
@@ -1,5 +1,5 @@
 // RUN: mlir-opt %s \
-// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-rocdl{index-bitwidth=32 runtime=HIP}),rocdl-attach-target{chip=%chip})' \
+// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo),rocdl-attach-target{chip=%chip}, gpu.module(convert-gpu-to-rocdl{index-bitwidth=32 runtime=HIP}))' \
 // RUN: | mlir-opt -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary \
 // RUN: | mlir-cpu-runner \
 // RUN:   --shared-libs=%mlir_rocm_runtime \
diff --git a/mlir/test/Integration/GPU/ROCM/two-modules.mlir b/mlir/test/Integration/GPU/ROCM/two-modules.mlir
index d49d3957abbe96..d20f71d162800c 100644
--- a/mlir/test/Integration/GPU/ROCM/two-modules.mlir
+++ b/mlir/test/Integration/GPU/ROCM/two-modules.mlir
@@ -1,6 +1,6 @@
 // RUN: mlir-opt %s \
 // RUN: | mlir-opt -gpu-kernel-outlining \
-// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-rocdl),rocdl-attach-target{chip=%chip})' \
+// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo),rocdl-attach-target{chip=%chip}, gpu.module(convert-gpu-to-rocdl))' \
 // RUN: | mlir-opt -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary \
 // RUN: | mlir-cpu-runner \
 // RUN:   --shared-libs=%mlir_rocm_runtime \
diff --git a/mlir/test/Integration/GPU/ROCM/vecadd.mlir b/mlir/test/Integration/GPU/ROCM/vecadd.mlir
index 986d8239427e3c..0ac391cd5f8e3b 100644
--- a/mlir/test/Integration/GPU/ROCM/vecadd.mlir
+++ b/mlir/test/Integration/GPU/ROCM/vecadd.mlir
@@ -1,7 +1,7 @@
 // RUN: mlir-opt %s \
 // RUN: | mlir-opt -convert-scf-to-cf \
 // RUN: | mlir-opt -gpu-kernel-outlining \
-// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-rocdl{use-bare-ptr-memref-call-conv=true}),rocdl-attach-target{chip=%chip})' \
+// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo),rocdl-attach-target{chip=%chip}, gpu.module(convert-gpu-to-rocdl{use-bare-ptr-memref-call-conv=true}))' \
 // RUN: | mlir-opt -gpu-to-llvm=use-bare-pointers-for-kernels=true -reconcile-unrealized-casts -gpu-module-to-binary \
 // RUN: | mlir-cpu-runner \
 // RUN:   --shared-libs=%mlir_rocm_runtime \
diff --git a/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir b/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir
index 575d967dcc9a23..417f67e64669ed 100644
--- a/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir
+++ b/mlir/test/Integration/GPU/ROCM/vector-transferops.mlir
@@ -1,7 +1,7 @@
 // RUN: mlir-opt %s \
 // RUN: | mlir-opt -convert-scf-to-cf \
 // RUN: | mlir-opt -gpu-kernel-outlining \
-// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-rocdl{chipset=%chip index-bitwidth=32}),rocdl-attach-target{chip=%chip})' \
+// RUN: | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo),rocdl-attach-target{chip=%chip}, gpu.module(convert-gpu-to-rocdl{index-bitwidth=32}))' \
 // RUN: | mlir-opt -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary \
 // RUN: | mlir-cpu-runner \
 // RUN:   --shared-libs=%mlir_rocm_runtime \

fabianmcg · 2024-10-01T20:35:51Z

Thank you for this @umangyadav !
I'll take a closer look tomorrow. In the meantime, could you add tests where convert-gpu-to-rocdl uses infer to verify that it functions as expected.

krzysz00 · 2024-10-01T22:41:35Z

As for my review, this overall seems like a good idea.

The one thing I'd like to get a touch of consensus on is whether this will help with solving the target information issues in the long run (ex. letting something like --convert-to-llvm pick up the data layout)

umangyadav · 2024-10-02T14:21:01Z

Thank you for this @umangyadav ! I'll take a closer look tomorrow. In the meantime, could you add tests where convert-gpu-to-rocdl uses infer to verify that it functions as expected.

added a few tests

fabianmcg

In general LGTM

mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp

fabianmcg · 2024-10-04T14:05:23Z

The one thing I'd like to get a touch of consensus on is whether this will help with solving the target information issues in the long run (ex. letting something like --convert-to-llvm pick up the data layout)

@krzysz00 not really, the infer mechanism is not compatible with with how convert-to-llvm works.

…ests

krzysz00 · 2024-10-23T15:27:34Z

@fabianmcg Now that I'm back from vacation, care to expand on the infer/--convert-to-llvm incompatibility?

I'd think we could implement MLIR data layout stuff for the target attribute

fabianmcg · 2024-10-23T16:13:32Z

I'm currently at LLVM dev, I'll get back to this next week.

umangyadav · 2024-11-06T14:32:48Z

I'm currently at LLVM dev, I'll get back to this next week.

Ping @fabianmcg @krzysz00

fabianmcg · 2024-11-06T15:30:16Z

Apologies, I forgot.

@fabianmcg Now that I'm back from vacation, care to expand on the infer/--convert-to-llvm incompatibility?

I'd think we could implement MLIR data layout stuff for the target attribute

The main source of incompatibility with convert-to-llvm is that the pass doesn't take options and uses Pass::initialize to get all the patterns from the loaded dialects. Therefore it has no view of the IR and cannot check what's the target in the IR. At the conference Mehdi agreed that's an issue, now we need to find a performant way we all agree to solve the issue in convert-to-llvm.

So, I don't have any opposition to this patch if it helps downstream, but it won't work right away with convert-to-llvm until we fix it.

umangyadav · 2024-11-06T17:11:08Z

So, I don't have any opposition to this patch if it helps downstream, but it won't work right away with convert-to-llvm until we fix it.

Agreed, that this patch doesn't help in anyway right away. I have some changes here where convert-to-llvm reads DLTI information to read target related information to setup convert-to-llvm

fabianmcg · 2024-11-06T18:38:25Z

I have some changes here where convert-to-llvm reads DLTI information to read target related information to setup convert-to-llvm

What's "here"?

I already submitted a patch upstream that enabled reading the target from convert-to-llvm. The problem is that right now we don't agree upstream on whether the performance hit of moving initialization from Pass::initialize to Pass::runOn... is worth it.

umangyadav · 2024-11-06T19:35:31Z

Here : ROCm/rocMLIR#1679

fabianmcg · 2024-11-06T22:47:39Z

Here : ROCm/rocMLIR#1679

Yes, I did something similar #99566 and it got blocked because of performance constraints.

However, we reached some consensus at LLVM dev that an improvement was needed.

llvmbot added mlir:gpu mlir labels Oct 1, 2024

umangyadav changed the title ~~Use attached target on the GPU module when lowering GPU ops to ROCDL~~ [ROCDL] Use attached target on the GPU module when lowering GPU ops to ROCDL Oct 1, 2024

fabianmcg requested review from fabianmcg and krzysz00 October 1, 2024 20:33

umangyadav force-pushed the infer_target branch 3 times, most recently from 5e6c6f5 to 3e3d297 Compare October 2, 2024 14:20

fabianmcg reviewed Oct 4, 2024

View reviewed changes

umangyadav and others added 3 commits October 4, 2024 14:31

Use attached target on the GPU module when lowering GPU ops to ROCDL

ba8efb5

Add test for reading attached target

693afaa

Address review comments from fabian and update error strings in the t…

4c18e8a

…ests

umangyadav force-pushed the infer_target branch from 6dd2dbc to 4c18e8a Compare October 4, 2024 14:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ROCDL] Use attached target on the GPU module when lowering GPU ops to ROCDL #110735

[ROCDL] Use attached target on the GPU module when lowering GPU ops to ROCDL #110735

umangyadav commented Oct 1, 2024 •

edited

Loading

llvmbot commented Oct 1, 2024 •

edited

Loading

fabianmcg commented Oct 1, 2024

krzysz00 commented Oct 1, 2024

umangyadav commented Oct 2, 2024

fabianmcg left a comment

fabianmcg commented Oct 4, 2024

krzysz00 commented Oct 23, 2024

fabianmcg commented Oct 23, 2024

umangyadav commented Nov 6, 2024

fabianmcg commented Nov 6, 2024

umangyadav commented Nov 6, 2024 •

edited

Loading

fabianmcg commented Nov 6, 2024

umangyadav commented Nov 6, 2024

fabianmcg commented Nov 6, 2024

[ROCDL] Use attached target on the GPU module when lowering GPU ops to ROCDL #110735

Are you sure you want to change the base?

[ROCDL] Use attached target on the GPU module when lowering GPU ops to ROCDL #110735

Conversation

umangyadav commented Oct 1, 2024 • edited Loading

llvmbot commented Oct 1, 2024 • edited Loading

fabianmcg commented Oct 1, 2024

krzysz00 commented Oct 1, 2024

umangyadav commented Oct 2, 2024

fabianmcg left a comment

Choose a reason for hiding this comment

fabianmcg commented Oct 4, 2024

krzysz00 commented Oct 23, 2024

fabianmcg commented Oct 23, 2024

umangyadav commented Nov 6, 2024

fabianmcg commented Nov 6, 2024

umangyadav commented Nov 6, 2024 • edited Loading

fabianmcg commented Nov 6, 2024

umangyadav commented Nov 6, 2024

fabianmcg commented Nov 6, 2024

umangyadav commented Oct 1, 2024 •

edited

Loading

llvmbot commented Oct 1, 2024 •

edited

Loading

umangyadav commented Nov 6, 2024 •

edited

Loading