(WIP) Batched autodiff #2181

Draft · wants to merge 5 commits into main

Conversation

@jumerckx (Contributor) commented on Nov 28, 2024

Added some type conversions to tensor types if width != 1. The simple test case seems correct now.

TODO:

  • write a proper function for the batched type conversion and use it in the tblgen generator (and elsewhere); see the sketch after this list.
  • batched autodiff of tensor operations: add an extra dimension, as in the batch pass.
  • test: a function with multiple blocks? I currently convert the block arguments of the entry block but don't do anything beyond that.
  • ...
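
As a first pass at the type-conversion TODO above, the logic could be centralized in one small helper. This is only a sketch, assuming the batched representation stays a rank-1 tensor of size width (matching the RankedTensorType::get({width}, Ty) calls used elsewhere in this PR); the name batchType is made up for illustration:

```
#include "mlir/IR/BuiltinTypes.h"

// Hypothetical helper (name illustrative): map a type to its batched counterpart.
// With width == 1 the type is returned unchanged.
static mlir::Type batchType(mlir::Type Ty, int64_t width) {
  if (width == 1)
    return Ty;
  // Assumes batching is represented as tensor<width x Ty>, as done elsewhere in this PR.
  return mlir::RankedTensorType::get({width}, Ty);
}
```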

This still requires changes in the tblgen-generated derivative files. For example, `createForwardModeTangent` in `MulFOpFwdDerivative` could be altered like this:
```
  LogicalResult createForwardModeTangent(Operation *op0, OpBuilder &builder, MGradientUtils *gutils) const
  {
    auto op = cast<arith::MulFOp>(op0);
    if (gutils->width != 1) {
      // Batched mode: widen the cloned op's result types to tensor<width x T>.
      auto newop = gutils->getNewFromOriginal(op0);
      for (auto res : newop->getResults()) {
        res.setType(mlir::RankedTensorType::get({gutils->width}, res.getType()));
      }
    }
    gutils->eraseIfUnused(op);
    if (gutils->isConstantInstruction(op))
      return success();
    mlir::Value res = nullptr;
    if (!gutils->isConstantValue(op->getOperand(0)))
    {
      auto dif = gutils->invertPointerM(op->getOperand(0), builder);
      {
        mlir::Value itmp = ({
          // Computing MulFOp
          auto fwdarg_0 = dif;
          dif.dump();
          // TODO: gutils->makeBatched(...)
          auto fwdarg_1 = gutils->getNewFromOriginal(op->getOperand(1));
          builder.create<arith::MulFOp>(op.getLoc(), fwdarg_0, fwdarg_1);
        });
        itmp.dump();
        if (!res)
          res = itmp;
        else
        {
          auto operandType = cast<AutoDiffTypeInterface>(res.getType());
          res = operandType.createAddOp(builder, op.getLoc(), res, itmp);
        }
      }
    }
    if (!gutils->isConstantValue(op->getOperand(1)))
    {
      auto dif = gutils->invertPointerM(op->getOperand(1), builder);
      {
        mlir::Value itmp = ({
          // Computing MulFOp
          auto fwdarg_0 = dif;
          dif.dump();
          auto fwdarg_1 = gutils->getNewFromOriginal(op->getOperand(0));
          builder.create<arith::MulFOp>(op.getLoc(), fwdarg_0, fwdarg_1);
        });
        if (!res)
          res = itmp;
        else
        {
          auto operandType = cast<AutoDiffTypeInterface>(res.getType());
          res = operandType.createAddOp(builder, op.getLoc(), res, itmp);
        }
      }
    }
    assert(res);
    gutils->setDiffe(op->getResult(0), res, builder);
    return success();
  }
```
```
@@ -27,7 +27,11 @@ getFunctionTypeForClone(mlir::FunctionType FTy, DerivativeMode mode,
   for (auto &&[Ty, returnPrimal, returnShadow, activity] : llvm::zip(
            FTy.getResults(), returnPrimals, returnShadows, ReturnActivity)) {
     if (returnPrimal) {
-      RetTypes.push_back(Ty);
+      if (width != 1) {
+        RetTypes.push_back(mlir::RankedTensorType::get({width}, Ty));
```
A Member commented on this diff:

This shouldn't need changing, since the primal is always unmodified; only the derivatives are changed (and we should be pushing the shadow types for those below).

@jumerckx (Contributor, Author) replied:

Oh, then I'm confused about what batched autodiff is.
How should my test case change?

@jumerckx (Contributor, Author) replied:

Nvm, it clicked. It's just the shadow that's batched 😅
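
In other words, the primal result types stay as they are, and only the shadow (derivative) types pick up the extra width dimension. A minimal sketch of what the result-type loop in getFunctionTypeForClone could look like under that convention (this is not the existing implementation; the widening simply mirrors the RankedTensorType::get({width}, Ty) pattern from the diff above, and activity handling is elided):

```
for (auto &&[Ty, returnPrimal, returnShadow, activity] : llvm::zip(
         FTy.getResults(), returnPrimals, returnShadows, ReturnActivity)) {
  if (returnPrimal)
    RetTypes.push_back(Ty); // the primal type is always unmodified
  if (returnShadow) {
    // Only the derivative (shadow) type is widened when batching.
    mlir::Type shadowTy = Ty;
    if (width != 1)
      shadowTy = mlir::RankedTensorType::get({width}, Ty);
    RetTypes.push_back(shadowTy);
  }
}
```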

A Member replied:

Though perhaps mul will be more illustrative: https://github.com/EnzymeAD/Enzyme/blob/main/enzyme/test/Enzyme/ForwardModeVector/mul.ll (and obviously feel free to look at any/all of the other examples).

@jumerckx (Contributor, Author) commented on Dec 2, 2024

I haven't fully made the changes in enzyme-tblgen.cpp yet, and in any case this only works for the simple test case.
But I added the following manually in ArithDerivatives.inc:

```
mlir::Value itmp = ({
  // Computing MulFOp
  auto fwdarg_0 = dif;
  auto fwdarg_1 = gutils->getNewFromOriginal(op->getOperand(1));
  if (gutils->width != 1)
  {
    fwdarg_1 = builder.create<tensor::SplatOp>(
        op.getLoc(),
        mlir::RankedTensorType::get({gutils->width},
                                    fwdarg_1.getType()),
        fwdarg_1);
  }
  builder.create<arith::MulFOp>(op.getLoc(), fwdarg_0, fwdarg_1);
});
```
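
The tensor.splat above is what the "// TODO: gutils->makeBatched(...)" comment in the earlier block was hinting at: the scalar primal operand has to be broadcast to the batched tensor type before it can be combined with the batched tangent. That broadcast could be factored into a small helper so the tblgen generator emits one call instead of repeating the splat; a sketch, with the name broadcastToBatch made up for illustration:

```
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/Builders.h"

// Illustrative helper: broadcast a scalar SSA value to tensor<width x T> when
// batching, so it can be combined elementwise with a batched tangent.
static mlir::Value broadcastToBatch(mlir::OpBuilder &builder, mlir::Location loc,
                                    mlir::Value v, int64_t width) {
  if (width == 1)
    return v; // unbatched: use the scalar value directly
  auto batchedTy = mlir::RankedTensorType::get({width}, v.getType());
  return builder.create<mlir::tensor::SplatOp>(loc, batchedTy, v);
}
```

With something like this, the generated body could call broadcastToBatch(builder, op.getLoc(), fwdarg_1, gutils->width) in place of the explicit width check and splat.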

But this is the MLIR code that is generated for this simple test:

```
  func.func private @fwddiffe2square(%arg0: f64, %arg1: tensor<2xf64>) -> tensor<2xf64> {
    %splat = tensor.splat %arg0 : tensor<2xf64>
    %0 = arith.mulf %arg1, %splat : tensor<2xf64>
    %splat_0 = tensor.splat %arg0 : tensor<2xf64>
    %1 = arith.mulf %arg1, %splat_0 : tensor<2xf64>
    %2 = arith.addf %0, %1 : tensor<2xf64>
    %3 = arith.mulf %arg0, %arg0 : f64
    return %2 : tensor<2xf64>
  }
```
