Merge pull request #8 from rjiejie/master

xtheadvdot: Add new extension
XUANTIE-RV · Dec 4, 2022 · 29c18da
2 parents c295bde + a9948e4
commit 29c18da

Showing 10 changed files with 384 additions and 0 deletions.
diff --git a/intro.adoc b/intro.adoc
@@ -34,6 +34,7 @@ The collection consists of the following ISA extensions:
 * `XTheadMemPair` provides two-GP-register memory operations.
 * `XTheadFMemIdx` provides floating-point memory operations.
 * `XTheadMac` provides multiply-accumulate instructions.
+* `XTheadVdot` provides vector dot-product instructions.
 
 === Dependencies to standard extensions
 

diff --git a/xthead.adoc b/xthead.adoc
@@ -46,3 +46,4 @@ include::xtheadfmemidx.adoc[]
 include::xtheadmac.adoc[]
 include::xtheadfmv.adoc[]
 include::xtheadint.adoc[]
+include::xtheadvdot.adoc[]
diff --git a/xtheadvdot.adoc b/xtheadvdot.adoc
@@ -0,0 +1,39 @@
+[#xtheadvdot]
+== Vector four-way 8-bit multiply with 32-bit accumulate instructions
+
+[NOTE,caption=Frozen]
+The `XTheadVdot` extension is `stable`.
+
+The `XTheadVdot` ISA extension provides vector integer instructions that multiply four pairs of 8-bit elements and add the products to a 32-bit element.
+
+This extension depends on the availability of the `V` (vector) ISA extension.
+
+The table below gives an overview of the instructions:
+
+[cols="^3,^3,12,18",stripes=even,options="header"]
+|===
+| RV32 | RV64 | Mnemonic              | Instruction
+| Y    | Y    | th.vmaqa.vv    _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqa-vv>>
+| Y    | Y    | th.vmaqa.vx    _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqa-vx>>
+| Y    | Y    | th.vmaqau.vv   _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqau-vv>>
+| Y    | Y    | th.vmaqau.vx   _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqau-vx>>
+| Y    | Y    | th.vmaqasu.vv  _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqasu-vv>>
+| Y    | Y    | th.vmaqasu.vx  _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqasu-vx>>
+| Y    | Y    | th.vmaqaus.vx  _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqaus-vx>>
+|===
+
+[#xtheadvdot-insns,reftext="Instructions"]
+=== Instructions
+include::xtheadvdot/vmaqa_vv.adoc[]
+<<<
+include::xtheadvdot/vmaqa_vx.adoc[]
+<<<
+include::xtheadvdot/vmaqau_vv.adoc[]
+<<<
+include::xtheadvdot/vmaqau_vx.adoc[]
+<<<
+include::xtheadvdot/vmaqasu_vv.adoc[]
+<<<
+include::xtheadvdot/vmaqasu_vx.adoc[]
+<<<
+include::xtheadvdot/vmaqaus_vx.adoc[]
diff --git a/xtheadvdot/vmaqa_vv.adoc b/xtheadvdot/vmaqa_vv.adoc
@@ -0,0 +1,49 @@
+[#xtheadvdot-insns-vmaqa-vv,reftext=Four signed 8-bit multiply with 32-bit add (vector-vector)]
+==== th.vmaqa.vv
+
+Synopsis::
+Four signed 8-bit multiply with 32-bit add.
+
+Mnemonic::
+th.vmaqa.vv _vd_, _vs1_, _vs2_
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+    { bits:  7, name: 0xb, attr: ['custom-0, 32 bit'] },
+    { bits:  5, name: 'vd' },
+    { bits:  3, name: 0x6, attr: ['vmaqa'] },
+    { bits:  5, name: 'vs1' },
+    { bits:  5, name: 'vs2' },
+    { bits:  1, name: 'vm' },
+    { bits:  6, name: '0x20' },
+]}
+....
+
+Description::
+
+Each 32-bit element of vs1 and the corresponding 32-bit element of vs2 are treated as four signed 8-bit elements. The four pairs of 8-bit elements are multiplied, and the four products are added to the corresponding 32-bit element of vd. This instruction builds on the vector extension. Vector masking operates on the source operands at 8-bit element granularity. If vm=1, the instruction is unmasked and is written as th.vmaqa.vv vd, vs1, vs2. If vm=0, the instruction is masked and is written as th.vmaqa.vv vd, vs1, vs2, v0.t. When v0.mask[i] is 1, the product of vs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) counts destination elements, which are 32 bits wide.
+Operation::
+[source,sail]
+--
+vd[i] = vd[i] + vs1[i][7:0]   * vs2[i][7:0] 
+              + vs1[i][15:8]  * vs2[i][15:8] 
+              + vs1[i][23:16] * vs2[i][23:16] 
+              + vs1[i][31:24] * vs2[i][31:24]   
+--
+
+Permission::
+This instruction can be executed in all privilege levels.
+
+Exceptions::
+This instruction triggers the same exceptions as a `vmacc.vv` instruction; in addition, vsew[2:0] must be 3'b010.
+
+Included in::
+[%header]
+|===
+|Extension
+
+|XTheadVdot (<<#xtheadvdot>>)
+|===
+
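Review note: the Operation block for `th.vmaqa.vv` can be read as the following Python reference model. This is only a sketch of one reading of the text, not the authors' implementation. The function names are hypothetical, masking is omitted (vm=1 only), and two behaviours the text does not state are assumed: the accumulator wraps modulo 2**32, and elements at index vl or above are left unchanged.

```python
def sign_extend8(value):
    """Interpret the low 8 bits of value as a signed two's-complement integer."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def vmaqa_vv_element(vd, vs1, vs2):
    """Model one 32-bit destination element of th.vmaqa.vv (unmasked)."""
    acc = vd
    for lane in range(4):
        a = sign_extend8(vs1 >> (8 * lane))
        b = sign_extend8(vs2 >> (8 * lane))
        acc += a * b
    # Assumption: the result wraps modulo 2**32 like a 32-bit register.
    return acc & 0xFFFFFFFF


def vmaqa_vv(vd, vs1, vs2, vl):
    """Apply the element model to the first vl elements.

    Assumption: elements at index >= vl are returned unchanged.
    """
    return [
        vmaqa_vv_element(vd[i], vs1[i], vs2[i]) if i < vl else vd[i]
        for i in range(len(vd))
    ]
```

For example, `vmaqa_vv_element(10, 0x01020304, 0x01010101)` adds 4 + 3 + 2 + 1 to the accumulator value 10.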
diff --git a/xtheadvdot/vmaqa_vx.adoc b/xtheadvdot/vmaqa_vx.adoc
@@ -0,0 +1,49 @@
+[#xtheadvdot-insns-vmaqa-vx,reftext=Four signed 8-bit multiply with 32-bit add (vector-scalar)]
+==== th.vmaqa.vx
+
+Synopsis::
+Four signed 8-bit multiply with 32-bit add.
+
+Mnemonic::
+th.vmaqa.vx _vd_, _rs1_, _vs2_
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+    { bits:  7, name: 0xb, attr: ['custom-0, 32 bit'] },
+    { bits:  5, name: 'vd' },
+    { bits:  3, name: 0x6, attr: ['vmaqa'] },
+    { bits:  5, name: 'rs1' },
+    { bits:  5, name: 'vs2' },
+    { bits:  1, name: 'vm' },
+    { bits:  6, name: '0x21' },
+]}
+....
+
+Description::
+
+The lower 32 bits of rs1 and each 32-bit element of vs2 are treated as four signed 8-bit elements. For each element of vs2, the four pairs of 8-bit elements are multiplied, and the four products are added to the corresponding 32-bit element of vd. This instruction builds on the vector extension. Vector masking operates on the source operands at 8-bit element granularity. If vm=1, the instruction is unmasked and is written as th.vmaqa.vx vd, rs1, vs2. If vm=0, the instruction is masked and is written as th.vmaqa.vx vd, rs1, vs2, v0.t. When v0.mask[i] is 1, the product of rs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) counts destination elements, which are 32 bits wide.
+Operation::
+[source,sail]
+--
+vd[i] = vd[i] + rs1[7:0]   * vs2[i][7:0] 
+              + rs1[15:8]  * vs2[i][15:8] 
+              + rs1[23:16] * vs2[i][23:16] 
+              + rs1[31:24] * vs2[i][31:24]   
+--
+
+Permission::
+This instruction can be executed in all privilege levels.
+
+Exceptions::
+This instruction triggers the same exceptions as a `vmacc.vv` instruction; in addition, vsew[2:0] must be 3'b010.
+
+Included in::
+[%header]
+|===
+|Extension
+
+|XTheadVdot (<<#xtheadvdot>>)
+|===
+
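Review note: the vector-scalar form differs from the vector-vector form in that the same four bytes of rs1 are reused for every element of vs2. The sketch below illustrates that reading in Python. The function names are hypothetical, masking and vl handling are left out, and wraparound modulo 2**32 is an assumption, since the text does not describe overflow.

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a signed two's-complement integer."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def vmaqa_vx(vd, rs1, vs2):
    """Model th.vmaqa.vx (unmasked) over lists of 32-bit elements."""
    # Only the lower 32 bits of rs1 take part, even if the register is wider.
    scalar = rs1 & 0xFFFFFFFF
    weights = [to_signed8(scalar >> (8 * lane)) for lane in range(4)]

    result = []
    for acc, element in zip(vd, vs2):
        for lane, weight in enumerate(weights):
            acc += weight * to_signed8(element >> (8 * lane))
        # Assumption: the result wraps modulo 2**32.
        result.append(acc & 0xFFFFFFFF)
    return result
```

With a 64-bit rs1 value such as `0xDEADBEEF01010101`, only the four low bytes (each equal to 1) act as multipliers, so each destination element accumulates the sum of the four signed bytes of the matching vs2 element.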
diff --git a/xtheadvdot/vmaqasu_vv.adoc b/xtheadvdot/vmaqasu_vv.adoc
@@ -0,0 +1,49 @@
+[#xtheadvdot-insns-vmaqasu-vv,reftext=Four signed-unsigned 8-bit multiply with 32-bit add (vector-vector)]
+==== th.vmaqasu.vv
+
+Synopsis::
+Four signed-unsigned 8-bit multiply with 32-bit add.
+
+Mnemonic::
+th.vmaqasu.vv _vd_, _vs1_, _vs2_
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+    { bits:  7, name: 0xb, attr: ['custom-0, 32 bit'] },
+    { bits:  5, name: 'vd' },
+    { bits:  3, name: 0x6, attr: ['vmaqa'] },
+    { bits:  5, name: 'vs1' },
+    { bits:  5, name: 'vs2' },
+    { bits:  1, name: 'vm' },
+    { bits:  6, name: '0x24' },
+]}
+....
+
+Description::
+
+Each 32-bit element of vs1 is treated as four signed 8-bit elements, and the corresponding 32-bit element of vs2 is treated as four unsigned 8-bit elements. The four pairs of 8-bit elements are multiplied, and the four products are added to the corresponding 32-bit element of vd. This instruction builds on the vector extension. Vector masking operates on the source operands at 8-bit element granularity. If vm=1, the instruction is unmasked and is written as th.vmaqasu.vv vd, vs1, vs2. If vm=0, the instruction is masked and is written as th.vmaqasu.vv vd, vs1, vs2, v0.t. When v0.mask[i] is 1, the product of vs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) counts destination elements, which are 32 bits wide.
+Operation::
+[source,sail]
+--
+vd[i] = vd[i] + vs1[i][7:0]   * vs2[i][7:0] 
+              + vs1[i][15:8]  * vs2[i][15:8] 
+              + vs1[i][23:16] * vs2[i][23:16] 
+              + vs1[i][31:24] * vs2[i][31:24]   
+--
+
+Permission::
+This instruction can be executed in all privilege levels.
+
+Exceptions::
+This instruction triggers the same exceptions as a `vmacc.vv` instruction; in addition, vsew[2:0] must be 3'b010.
+
+Included in::
+[%header]
+|===
+|Extension
+
+|XTheadVdot (<<#xtheadvdot>>)
+|===
+
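Review note: the Operation blocks are textually identical across the signed, unsigned, and signed-unsigned variants, so the signedness lives only in the prose. The Python sketch below makes the difference explicit for a single 32-bit element. All names are hypothetical, masking is omitted, and wraparound modulo 2**32 is an assumption rather than something the text states.

```python
def byte_lane(word, lane, signed):
    """Extract 8-bit lane `lane` of a 32-bit word as a signed or unsigned integer."""
    b = (word >> (8 * lane)) & 0xFF
    return b - 0x100 if signed and b & 0x80 else b


def dot4_accumulate(vd, src1, vs2, src1_signed, vs2_signed):
    """Add the four byte-wise products of src1 and vs2 to the accumulator vd."""
    acc = vd
    for lane in range(4):
        acc += byte_lane(src1, lane, src1_signed) * byte_lane(vs2, lane, vs2_signed)
    # Assumption: the result wraps modulo 2**32.
    return acc & 0xFFFFFFFF


def vmaqa(vd, vs1, vs2):
    """Signed vs1 bytes times signed vs2 bytes."""
    return dot4_accumulate(vd, vs1, vs2, True, True)


def vmaqau(vd, vs1, vs2):
    """Unsigned vs1 bytes times unsigned vs2 bytes."""
    return dot4_accumulate(vd, vs1, vs2, False, False)


def vmaqasu(vd, vs1, vs2):
    """Signed vs1 bytes times unsigned vs2 bytes."""
    return dot4_accumulate(vd, vs1, vs2, True, False)
```

Feeding the byte 0xFF to both operands shows the three interpretations diverge: the signed form multiplies -1 by -1, the unsigned form multiplies 255 by 255, and the signed-unsigned form multiplies -1 by 255.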
diff --git a/xtheadvdot/vmaqasu_vx.adoc b/xtheadvdot/vmaqasu_vx.adoc
@@ -0,0 +1,49 @@
+[#xtheadvdot-insns-vmaqasu-vx,reftext=Four signed-unsigned 8-bit multiply with 32-bit add (vector-scalar)]
+==== th.vmaqasu.vx
+
+Synopsis::
+Four signed-unsigned 8-bit multiply with 32-bit add.
+
+Mnemonic::
+th.vmaqasu.vx _vd_, _rs1_, _vs2_
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+    { bits:  7, name: 0xb, attr: ['custom-0, 32 bit'] },
+    { bits:  5, name: 'vd' },
+    { bits:  3, name: 0x6, attr: ['vmaqa'] },
+    { bits:  5, name: 'rs1' },
+    { bits:  5, name: 'vs2' },
+    { bits:  1, name: 'vm' },
+    { bits:  6, name: '0x25' },
+]}
+....
+
+Description::
+
+The lower 32 bits of rs1 are treated as four signed 8-bit elements, and each 32-bit element of vs2 is treated as four unsigned 8-bit elements. For each element of vs2, the four pairs of 8-bit elements are multiplied, and the four products are added to the corresponding 32-bit element of vd. This instruction builds on the vector extension. Vector masking operates on the source operands at 8-bit element granularity. If vm=1, the instruction is unmasked and is written as th.vmaqasu.vx vd, rs1, vs2. If vm=0, the instruction is masked and is written as th.vmaqasu.vx vd, rs1, vs2, v0.t. When v0.mask[i] is 1, the product of rs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) counts destination elements, which are 32 bits wide.
+Operation::
+[source,sail]
+--
+vd[i] = vd[i] + rs1[7:0]   * vs2[i][7:0] 
+              + rs1[15:8]  * vs2[i][15:8] 
+              + rs1[23:16] * vs2[i][23:16] 
+              + rs1[31:24] * vs2[i][31:24]   
+--
+
+Permission::
+This instruction can be executed in all privilege levels.
+
+Exceptions::
+This instruction triggers the same exceptions as a `vmacc.vv` instruction; in addition, vsew[2:0] must be 3'b010.
+
+Included in::
+[%header]
+|===
+|Extension
+
+|XTheadVdot (<<#xtheadvdot>>)
+|===
+
diff --git a/xtheadvdot/vmaqau_vv.adoc b/xtheadvdot/vmaqau_vv.adoc
@@ -0,0 +1,49 @@
+[#xtheadvdot-insns-vmaqau-vv,reftext=Four unsigned 8-bit multiply with 32-bit add (vector-vector)]
+==== th.vmaqau.vv
+
+Synopsis::
+Four unsigned 8-bit multiply with 32-bit add.
+
+Mnemonic::
+th.vmaqau.vv _vd_, _vs1_, _vs2_
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+    { bits:  7, name: 0xb, attr: ['custom-0, 32 bit'] },
+    { bits:  5, name: 'vd' },
+    { bits:  3, name: 0x6, attr: ['vmaqa'] },
+    { bits:  5, name: 'vs1' },
+    { bits:  5, name: 'vs2' },
+    { bits:  1, name: 'vm' },
+    { bits:  6, name: '0x22' },
+]}
+....
+
+Description::
+
+Each 32-bit element of vs1 and the corresponding 32-bit element of vs2 are treated as four unsigned 8-bit elements. The four pairs of 8-bit elements are multiplied, and the four products are added to the corresponding 32-bit element of vd. This instruction builds on the vector extension. Vector masking operates on the source operands at 8-bit element granularity. If vm=1, the instruction is unmasked and is written as th.vmaqau.vv vd, vs1, vs2. If vm=0, the instruction is masked and is written as th.vmaqau.vv vd, vs1, vs2, v0.t. When v0.mask[i] is 1, the product of vs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) counts destination elements, which are 32 bits wide.
+Operation::
+[source,sail]
+--
+vd[i] = vd[i] + vs1[i][7:0]   * vs2[i][7:0] 
+              + vs1[i][15:8]  * vs2[i][15:8] 
+              + vs1[i][23:16] * vs2[i][23:16] 
+              + vs1[i][31:24] * vs2[i][31:24]   
+--
+
+Permission::
+This instruction can be executed in all privilege levels.
+
+Exceptions::
+This instruction triggers the same exceptions as a `vmacc.vv` instruction; in addition, vsew[2:0] must be 3'b010.
+
+Included in::
+[%header]
+|===
+|Extension
+
+|XTheadVdot (<<#xtheadvdot>>)
+|===
+
diff --git a/xtheadvdot/vmaqau_vx.adoc b/xtheadvdot/vmaqau_vx.adoc
@@ -0,0 +1,49 @@
+[#xtheadvdot-insns-vmaqau-vx,reftext=Four unsigned 8-bit multiply with 32-bit add (vector-scalar)]
+==== th.vmaqau.vx
+
+Synopsis::
+Four unsigned 8-bit multiply with 32-bit add.
+
+Mnemonic::
+th.vmaqau.vx _vd_, _rs1_, _vs2_
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+    { bits:  7, name: 0xb, attr: ['custom-0, 32 bit'] },
+    { bits:  5, name: 'vd' },
+    { bits:  3, name: 0x6, attr: ['vmaqa'] },
+    { bits:  5, name: 'rs1' },
+    { bits:  5, name: 'vs2' },
+    { bits:  1, name: 'vm' },
+    { bits:  6, name: '0x23' },
+]}
+....
+
+Description::
+
+The lower 32 bits of rs1 and each 32-bit element of vs2 are treated as four unsigned 8-bit elements. For each element of vs2, the four pairs of 8-bit elements are multiplied, and the four products are added to the corresponding 32-bit element of vd. This instruction builds on the vector extension. Vector masking operates on the source operands at 8-bit element granularity. If vm=1, the instruction is unmasked and is written as th.vmaqau.vx vd, rs1, vs2. If vm=0, the instruction is masked and is written as th.vmaqau.vx vd, rs1, vs2, v0.t. When v0.mask[i] is 1, the product of rs1[(i+1)*8-1:i*8] and vs2[(i+1)*8-1:i*8] is added to vd. The vector length (vl) counts destination elements, which are 32 bits wide.
+Operation::
+[source,sail]
+--
+vd[i] = vd[i] + rs1[7:0]   * vs2[i][7:0] 
+              + rs1[15:8]  * vs2[i][15:8] 
+              + rs1[23:16] * vs2[i][23:16] 
+              + rs1[31:24] * vs2[i][31:24]   
+--
+
+Permission::
+This instruction can be executed in all privilege levels.
+
+Exceptions::
+This instruction triggers the same exceptions as a `vmacc.vv` instruction; in addition, vsew[2:0] must be 3'b010.
+
+Included in::
+[%header]
+|===
+|Extension
+
+|XTheadVdot (<<#xtheadvdot>>)
+|===
+
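Review note: the six encoding diagrams shown in this diff share every field except the top 6-bit field, so an instruction word can be assembled with a small helper. The Python sketch below assumes the wavedrom `reg` convention that fields are listed from bit 0 upward, which places the 7-bit opcode 0xb in bits 6:0 and the 6-bit selector in bits 31:26. The helper name `encode` and the table name `FUNCT6` are hypothetical, and `th.vmaqaus.vx` is left out because its encoding does not appear in this diff.

```python
# Top 6-bit field per mnemonic, copied from the encoding diagrams above.
FUNCT6 = {
    "th.vmaqa.vv": 0x20,
    "th.vmaqa.vx": 0x21,
    "th.vmaqau.vv": 0x22,
    "th.vmaqau.vx": 0x23,
    "th.vmaqasu.vv": 0x24,
    "th.vmaqasu.vx": 0x25,
}


def encode(mnemonic, vd, src1, vs2, vm=1):
    """Assemble the 32-bit instruction word.

    src1 is the vs1 register number for .vv forms and the rs1 register
    number for .vx forms. vm=1 selects the unmasked form.
    """
    for name, reg in (("vd", vd), ("src1", src1), ("vs2", vs2)):
        if not 0 <= reg < 32:
            raise ValueError(f"{name} must be a 5-bit register number")
    if vm not in (0, 1):
        raise ValueError("vm must be 0 or 1")

    return (
        0x0B                        # bits 6:0, custom-0 opcode
        | (vd << 7)                 # bits 11:7
        | (0x6 << 12)               # bits 14:12
        | (src1 << 15)              # bits 19:15
        | (vs2 << 20)               # bits 24:20
        | (vm << 25)                # bit 25
        | (FUNCT6[mnemonic] << 26)  # bits 31:26
    )
```

For instance, `hex(encode("th.vmaqa.vv", vd=1, src1=2, vs2=3))` evaluates to `'0x8231608b'` under these assumptions.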