Merge pull request #8 from rjiejie/master
xtheadvdot: Add new extension
Cooper-Qu authored Dec 4, 2022
2 parents c295bde + a9948e4 commit 29c18da
Showing 10 changed files with 384 additions and 0 deletions.
1 change: 1 addition & 0 deletions intro.adoc
@@ -34,6 +34,7 @@ The collection consists of the following ISA extensions:
* `XTheadMemPair` provides two-GP-register memory operations.
* `XTheadFMemIdx` provides floating-point memory operations.
* `XTheadMac` provides multiply-accumulate instructions.
* `XTheadVdot` provides vector 8-bit dot-product instructions with 32-bit accumulation.

=== Dependencies to standard extensions

1 change: 1 addition & 0 deletions xthead.adoc
@@ -46,3 +46,4 @@ include::xtheadfmemidx.adoc[]
include::xtheadmac.adoc[]
include::xtheadfmv.adoc[]
include::xtheadint.adoc[]
include::xtheadvdot.adoc[]
39 changes: 39 additions & 0 deletions xtheadvdot.adoc
@@ -0,0 +1,39 @@
[#xtheadvdot]
== Vector four-way 8-bit multiply with 32-bit accumulate instructions

[NOTE,caption=Frozen]
The `XTheadVdot` extension is `stable`.

The `XTheadVdot` ISA extension provides vector integer instructions that multiply four pairs of 8-bit values and add the four products to a 32-bit destination element.

This extension depends on the availability of the `V` (vector) ISA extension.

The table below gives an overview of the instructions:

[cols="^3,^3,12,18",stripes=even,options="header"]
|===
| RV32 | RV64 | Mnemonic | Instruction
| Y | Y | th.vmaqa.vv _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqa-vv>>
| Y | Y | th.vmaqa.vx _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqa-vx>>
| Y | Y | th.vmaqau.vv _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqau-vv>>
| Y | Y | th.vmaqau.vx _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqau-vx>>
| Y | Y | th.vmaqasu.vv _vd_, _vs1_, _vs2_ | <<#xtheadvdot-insns-vmaqasu-vv>>
| Y | Y | th.vmaqasu.vx _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqasu-vx>>
| Y | Y | th.vmaqaus.vx _vd_, _rs1_, _vs2_ | <<#xtheadvdot-insns-vmaqaus-vx>>
|===

[#xtheadvdot-insns,reftext="Instructions"]
=== Instructions
include::xtheadvdot/vmaqa_vv.adoc[]
<<<
include::xtheadvdot/vmaqa_vx.adoc[]
<<<
include::xtheadvdot/vmaqau_vv.adoc[]
<<<
include::xtheadvdot/vmaqau_vx.adoc[]
<<<
include::xtheadvdot/vmaqasu_vv.adoc[]
<<<
include::xtheadvdot/vmaqasu_vx.adoc[]
<<<
include::xtheadvdot/vmaqaus_vx.adoc[]
49 changes: 49 additions & 0 deletions xtheadvdot/vmaqa_vv.adoc
@@ -0,0 +1,49 @@
[#xtheadvdot-insns-vmaqa-vv,reftext=Four signed 8-bit multiply with 32-bit add (vector-vector)]
==== th.vmaqa.vv

Synopsis::
Four signed 8-bit multiply with 32-bit add.

Mnemonic::
th.vmaqa.vv _vd_, _vs1_, _vs2_

Encoding::
[wavedrom, , svg]
....
{reg:[
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] },
{ bits: 5, name: 'vd' },
{ bits: 3, name: 0x6, attr: ['vmaqa'] },
{ bits: 5, name: 'vs1' },
{ bits: 5, name: 'vs2' },
{ bits: 1, name: 'vm' },
{ bits: 6, name: '0x20' },
]}
....

Description::

Each 32-bit element of _vs1_ and _vs2_ is treated as four signed 8-bit values.
The four pairwise products are summed and added to the corresponding 32-bit element of _vd_.
This instruction builds on the vector (`V`) extension.
If _vm_=1, the instruction is unmasked: `th.vmaqa.vv vd, vs1, vs2`.
If _vm_=0, the instruction is masked: `th.vmaqa.vv vd, vs1, vs2, v0.t`.
Masking applies to the source operands at 8-bit element granularity: when `+v0.mask[i]+` is 1, the product of `+vs1[(i+1)*8-1:i*8]+` and `+vs2[(i+1)*8-1:i*8]+` is added to _vd_.
The vector length (vl) applies to the destination operand, whose elements are 32 bits wide.
Operation::
[source,sail]
--
vd[i] = vd[i] + signed(vs1[i][7:0])   * signed(vs2[i][7:0])
              + signed(vs1[i][15:8])  * signed(vs2[i][15:8])
              + signed(vs1[i][23:16]) * signed(vs2[i][23:16])
              + signed(vs1[i][31:24]) * signed(vs2[i][31:24])
--
Permission::
This instruction can be executed in all privilege levels.
Exceptions::
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of `vsew[2:0]` must be `3'b010` (SEW=32).
Included in::
[%header]
|===
|Extension
|XTheadVdot (<<#xtheadvdot>>)
|===
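The Operation above can be checked against a small Python model. This is a sketch under stated assumptions, not T-Head's reference model: vector registers are plain lists of raw 32-bit element values, only the unmasked form (_vm_=1) is modeled, vl is taken to be the list length, and the 32-bit sum is assumed to wrap to 32 bits like standard RVV integer multiply-add. The function names `byte_lanes` and `vmaqa_vv` are hypothetical.

```python
# Hypothetical model of unmasked th.vmaqa.vv (vm=1), signed x signed.
# Assumptions: registers are lists of raw 32-bit element values,
# vl == len(vd), and the 32-bit accumulation wraps on overflow.


def byte_lanes(word: int, signed: bool) -> list[int]:
    """Split a 32-bit word into four 8-bit lanes; lane k holds bits [8k+7:8k]."""
    lanes = []
    for k in range(4):
        b = (word >> (8 * k)) & 0xFF
        if signed and b >= 0x80:
            b -= 0x100  # reinterpret the byte as two's-complement int8
        lanes.append(b)
    return lanes


def vmaqa_vv(vd: list[int], vs1: list[int], vs2: list[int]) -> list[int]:
    """vd[i] += sum of the four int8 x int8 lane products of vs1[i] and vs2[i]."""
    out = []
    for d, a, b in zip(vd, vs1, vs2):
        dot = sum(x * y for x, y in zip(byte_lanes(a, True), byte_lanes(b, True)))
        out.append((d + dot) & 0xFFFFFFFF)  # keep the low 32 bits (assumed wrap)
    return out
```

For example, `vmaqa_vv([0], [0x04030201], [0x01010101])` returns `[10]` (1 + 2 + 3 + 4), while a lane pair of `0xFF` and `0x01` contributes -1 because both bytes are read as signed.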
49 changes: 49 additions & 0 deletions xtheadvdot/vmaqa_vx.adoc
@@ -0,0 +1,49 @@
[#xtheadvdot-insns-vmaqa-vx,reftext=Four signed 8-bit multiply with 32-bit add (vector-scalar)]
==== th.vmaqa.vx

Synopsis::
Four signed 8-bit multiply with 32-bit add.

Mnemonic::
th.vmaqa.vx _vd_, _rs1_, _vs2_

Encoding::
[wavedrom, , svg]
....
{reg:[
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] },
{ bits: 5, name: 'vd' },
{ bits: 3, name: 0x6, attr: ['vmaqa'] },
{ bits: 5, name: 'rs1' },
{ bits: 5, name: 'vs2' },
{ bits: 1, name: 'vm' },
{ bits: 6, name: '0x21' },
]}
....

Description::

The lower 32 bits of _rs1_ are treated as four signed 8-bit values, and each 32-bit element of _vs2_ as four signed 8-bit values.
The four pairwise products are summed and added to the corresponding 32-bit element of _vd_.
This instruction builds on the vector (`V`) extension.
If _vm_=1, the instruction is unmasked: `th.vmaqa.vx vd, rs1, vs2`.
If _vm_=0, the instruction is masked: `th.vmaqa.vx vd, rs1, vs2, v0.t`.
Masking applies to the source operands at 8-bit element granularity: when `+v0.mask[i]+` is 1, the product of `+rs1[(i+1)*8-1:i*8]+` and `+vs2[(i+1)*8-1:i*8]+` is added to _vd_.
The vector length (vl) applies to the destination operand, whose elements are 32 bits wide.
Operation::
[source,sail]
--
vd[i] = vd[i] + signed(rs1[7:0])   * signed(vs2[i][7:0])
              + signed(rs1[15:8])  * signed(vs2[i][15:8])
              + signed(rs1[23:16]) * signed(vs2[i][23:16])
              + signed(rs1[31:24]) * signed(vs2[i][31:24])
--
Permission::
This instruction can be executed in all privilege levels.
Exceptions::
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of `vsew[2:0]` must be `3'b010` (SEW=32).
Included in::
[%header]
|===
|Extension
|XTheadVdot (<<#xtheadvdot>>)
|===
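A minimal Python sketch of the vector-scalar form follows, assuming raw 32-bit element values held in lists, unmasked operation only (_vm_=1), vl equal to the list length, and a 32-bit accumulation that wraps on overflow. Only the lower 32 bits of _rs1_ are read, so the model masks the scalar first, which also covers a 64-bit _rs1_ on RV64. `byte_lanes` and `vmaqa_vx` are hypothetical names, not part of any existing library.

```python
# Hypothetical model of unmasked th.vmaqa.vx (vm=1), signed x signed.
# Assumptions: registers are lists of raw 32-bit element values,
# vl == len(vd), and the 32-bit accumulation wraps on overflow.


def byte_lanes(word: int, signed: bool) -> list[int]:
    """Split a 32-bit word into four 8-bit lanes; lane k holds bits [8k+7:8k]."""
    lanes = []
    for k in range(4):
        b = (word >> (8 * k)) & 0xFF
        if signed and b >= 0x80:
            b -= 0x100  # reinterpret the byte as two's-complement int8
        lanes.append(b)
    return lanes


def vmaqa_vx(vd: list[int], rs1: int, vs2: list[int]) -> list[int]:
    """vd[i] += sum of the four int8 x int8 lane products of rs1[31:0] and vs2[i]."""
    a = byte_lanes(rs1 & 0xFFFFFFFF, True)  # only the lower 32 bits of rs1 are used
    out = []
    for d, b in zip(vd, vs2):
        dot = sum(x * y for x, y in zip(a, byte_lanes(b, True)))
        out.append((d + dot) & 0xFFFFFFFF)  # keep the low 32 bits (assumed wrap)
    return out
```

The same four scalar bytes are reused for every destination element, which is the point of the `.vx` form: one packed weight word applied across a whole vector.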
49 changes: 49 additions & 0 deletions xtheadvdot/vmaqasu_vv.adoc
@@ -0,0 +1,49 @@
[#xtheadvdot-insns-vmaqasu-vv,reftext=Four signed-unsigned 8-bit multiply with 32-bit add (vector-vector)]
==== th.vmaqasu.vv

Synopsis::
Four signed-unsigned 8-bit multiply with 32-bit add.

Mnemonic::
th.vmaqasu.vv _vd_, _vs1_, _vs2_

Encoding::
[wavedrom, , svg]
....
{reg:[
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] },
{ bits: 5, name: 'vd' },
{ bits: 3, name: 0x6, attr: ['vmaqa'] },
{ bits: 5, name: 'vs1' },
{ bits: 5, name: 'vs2' },
{ bits: 1, name: 'vm' },
{ bits: 6, name: '0x24' },
]}
....

Description::

Each 32-bit element of _vs1_ is treated as four signed 8-bit values, and each 32-bit element of _vs2_ as four unsigned 8-bit values.
The four pairwise products are summed and added to the corresponding 32-bit element of _vd_.
This instruction builds on the vector (`V`) extension.
If _vm_=1, the instruction is unmasked: `th.vmaqasu.vv vd, vs1, vs2`.
If _vm_=0, the instruction is masked: `th.vmaqasu.vv vd, vs1, vs2, v0.t`.
Masking applies to the source operands at 8-bit element granularity: when `+v0.mask[i]+` is 1, the product of `+vs1[(i+1)*8-1:i*8]+` and `+vs2[(i+1)*8-1:i*8]+` is added to _vd_.
The vector length (vl) applies to the destination operand, whose elements are 32 bits wide.
Operation::
[source,sail]
--
vd[i] = vd[i] + signed(vs1[i][7:0])   * unsigned(vs2[i][7:0])
              + signed(vs1[i][15:8])  * unsigned(vs2[i][15:8])
              + signed(vs1[i][23:16]) * unsigned(vs2[i][23:16])
              + signed(vs1[i][31:24]) * unsigned(vs2[i][31:24])
--
Permission::
This instruction can be executed in all privilege levels.
Exceptions::
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of `vsew[2:0]` must be `3'b010` (SEW=32).
Included in::
[%header]
|===
|Extension
|XTheadVdot (<<#xtheadvdot>>)
|===
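The mixed-signedness case is where a model helps most, since the same byte pattern means different values in the two operands. The Python sketch below assumes raw 32-bit element values in lists, unmasked operation only (_vm_=1), vl equal to the list length, and a 32-bit accumulation that wraps on overflow; `byte_lanes` and `vmaqasu_vv` are hypothetical names.

```python
# Hypothetical model of unmasked th.vmaqasu.vv (vm=1):
# vs1 bytes are signed, vs2 bytes are unsigned.
# Assumptions: registers are lists of raw 32-bit element values,
# vl == len(vd), and the 32-bit accumulation wraps on overflow.


def byte_lanes(word: int, signed: bool) -> list[int]:
    """Split a 32-bit word into four 8-bit lanes; lane k holds bits [8k+7:8k]."""
    lanes = []
    for k in range(4):
        b = (word >> (8 * k)) & 0xFF
        if signed and b >= 0x80:
            b -= 0x100  # reinterpret the byte as two's-complement int8
        lanes.append(b)
    return lanes


def vmaqasu_vv(vd: list[int], vs1: list[int], vs2: list[int]) -> list[int]:
    """vd[i] += sum of the four int8(vs1) x uint8(vs2) lane products."""
    out = []
    for d, a, b in zip(vd, vs1, vs2):
        dot = sum(x * y for x, y in zip(byte_lanes(a, True), byte_lanes(b, False)))
        out.append((d + dot) & 0xFFFFFFFF)  # keep the low 32 bits (assumed wrap)
    return out
```

With both low bytes equal to `0xFF`, the product is -1 * 255 = -255, so `vmaqasu_vv([0], [0xFF], [0xFF])` yields `[0xFFFFFF01]`; the all-signed form would give +1 instead.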
49 changes: 49 additions & 0 deletions xtheadvdot/vmaqasu_vx.adoc
@@ -0,0 +1,49 @@
[#xtheadvdot-insns-vmaqasu-vx,reftext=Four signed-unsigned 8-bit multiply with 32-bit add (vector-scalar)]
==== th.vmaqasu.vx

Synopsis::
Four signed-unsigned 8-bit multiply with 32-bit add.

Mnemonic::
th.vmaqasu.vx _vd_, _rs1_, _vs2_

Encoding::
[wavedrom, , svg]
....
{reg:[
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] },
{ bits: 5, name: 'vd' },
{ bits: 3, name: 0x6, attr: ['vmaqa'] },
{ bits: 5, name: 'rs1' },
{ bits: 5, name: 'vs2' },
{ bits: 1, name: 'vm' },
{ bits: 6, name: '0x25' },
]}
....

Description::

The lower 32 bits of _rs1_ are treated as four signed 8-bit values, and each 32-bit element of _vs2_ as four unsigned 8-bit values.
The four pairwise products are summed and added to the corresponding 32-bit element of _vd_.
This instruction builds on the vector (`V`) extension.
If _vm_=1, the instruction is unmasked: `th.vmaqasu.vx vd, rs1, vs2`.
If _vm_=0, the instruction is masked: `th.vmaqasu.vx vd, rs1, vs2, v0.t`.
Masking applies to the source operands at 8-bit element granularity: when `+v0.mask[i]+` is 1, the product of `+rs1[(i+1)*8-1:i*8]+` and `+vs2[(i+1)*8-1:i*8]+` is added to _vd_.
The vector length (vl) applies to the destination operand, whose elements are 32 bits wide.
Operation::
[source,sail]
--
vd[i] = vd[i] + signed(rs1[7:0])   * unsigned(vs2[i][7:0])
              + signed(rs1[15:8])  * unsigned(vs2[i][15:8])
              + signed(rs1[23:16]) * unsigned(vs2[i][23:16])
              + signed(rs1[31:24]) * unsigned(vs2[i][31:24])
--
Permission::
This instruction can be executed in all privilege levels.
Exceptions::
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of `vsew[2:0]` must be `3'b010` (SEW=32).
Included in::
[%header]
|===
|Extension
|XTheadVdot (<<#xtheadvdot>>)
|===
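A Python sketch of the vector-scalar signed-unsigned form, assuming raw 32-bit element values in lists, unmasked operation only (_vm_=1), vl equal to the list length, and a 32-bit accumulation that wraps on overflow. The scalar _rs1_ is masked to its lower 32 bits before being split into signed bytes. `byte_lanes` and `vmaqasu_vx` are hypothetical names.

```python
# Hypothetical model of unmasked th.vmaqasu.vx (vm=1):
# rs1 bytes are signed, vs2 bytes are unsigned.
# Assumptions: registers are lists of raw 32-bit element values,
# vl == len(vd), and the 32-bit accumulation wraps on overflow.


def byte_lanes(word: int, signed: bool) -> list[int]:
    """Split a 32-bit word into four 8-bit lanes; lane k holds bits [8k+7:8k]."""
    lanes = []
    for k in range(4):
        b = (word >> (8 * k)) & 0xFF
        if signed and b >= 0x80:
            b -= 0x100  # reinterpret the byte as two's-complement int8
        lanes.append(b)
    return lanes


def vmaqasu_vx(vd: list[int], rs1: int, vs2: list[int]) -> list[int]:
    """vd[i] += sum of the four int8(rs1[31:0]) x uint8(vs2[i]) lane products."""
    a = byte_lanes(rs1 & 0xFFFFFFFF, True)  # only the lower 32 bits of rs1 are used
    out = []
    for d, b in zip(vd, vs2):
        dot = sum(x * y for x, y in zip(a, byte_lanes(b, False)))
        out.append((d + dot) & 0xFFFFFFFF)  # keep the low 32 bits (assumed wrap)
    return out
```

Because _vs2_ is read as unsigned, a byte of `0x80` contributes +128 here rather than -128.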
49 changes: 49 additions & 0 deletions xtheadvdot/vmaqau_vv.adoc
@@ -0,0 +1,49 @@
[#xtheadvdot-insns-vmaqau-vv,reftext=Four unsigned 8-bit multiply with 32-bit add (vector-vector)]
==== th.vmaqau.vv

Synopsis::
Four unsigned 8-bit multiply with 32-bit add.

Mnemonic::
th.vmaqau.vv _vd_, _vs1_, _vs2_

Encoding::
[wavedrom, , svg]
....
{reg:[
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] },
{ bits: 5, name: 'vd' },
{ bits: 3, name: 0x6, attr: ['vmaqa'] },
{ bits: 5, name: 'vs1' },
{ bits: 5, name: 'vs2' },
{ bits: 1, name: 'vm' },
{ bits: 6, name: '0x22' },
]}
....

Description::

Each 32-bit element of _vs1_ and _vs2_ is treated as four unsigned 8-bit values.
The four pairwise products are summed and added to the corresponding 32-bit element of _vd_.
This instruction builds on the vector (`V`) extension.
If _vm_=1, the instruction is unmasked: `th.vmaqau.vv vd, vs1, vs2`.
If _vm_=0, the instruction is masked: `th.vmaqau.vv vd, vs1, vs2, v0.t`.
Masking applies to the source operands at 8-bit element granularity: when `+v0.mask[i]+` is 1, the product of `+vs1[(i+1)*8-1:i*8]+` and `+vs2[(i+1)*8-1:i*8]+` is added to _vd_.
The vector length (vl) applies to the destination operand, whose elements are 32 bits wide.
Operation::
[source,sail]
--
vd[i] = vd[i] + unsigned(vs1[i][7:0])   * unsigned(vs2[i][7:0])
              + unsigned(vs1[i][15:8])  * unsigned(vs2[i][15:8])
              + unsigned(vs1[i][23:16]) * unsigned(vs2[i][23:16])
              + unsigned(vs1[i][31:24]) * unsigned(vs2[i][31:24])
--
Permission::
This instruction can be executed in all privilege levels.
Exceptions::
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of `vsew[2:0]` must be `3'b010` (SEW=32).
Included in::
[%header]
|===
|Extension
|XTheadVdot (<<#xtheadvdot>>)
|===
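A Python sketch of the unsigned vector-vector form, assuming raw 32-bit element values in lists, unmasked operation only (_vm_=1), vl equal to the list length, and a 32-bit accumulation that wraps on overflow. `byte_lanes` and `vmaqau_vv` are hypothetical names.

```python
# Hypothetical model of unmasked th.vmaqau.vv (vm=1), unsigned x unsigned.
# Assumptions: registers are lists of raw 32-bit element values,
# vl == len(vd), and the 32-bit accumulation wraps on overflow.


def byte_lanes(word: int, signed: bool) -> list[int]:
    """Split a 32-bit word into four 8-bit lanes; lane k holds bits [8k+7:8k]."""
    lanes = []
    for k in range(4):
        b = (word >> (8 * k)) & 0xFF
        if signed and b >= 0x80:
            b -= 0x100  # reinterpret the byte as two's-complement int8
        lanes.append(b)
    return lanes


def vmaqau_vv(vd: list[int], vs1: list[int], vs2: list[int]) -> list[int]:
    """vd[i] += sum of the four uint8 x uint8 lane products of vs1[i] and vs2[i]."""
    out = []
    for d, a, b in zip(vd, vs1, vs2):
        dot = sum(x * y for x, y in zip(byte_lanes(a, False), byte_lanes(b, False)))
        out.append((d + dot) & 0xFFFFFFFF)  # keep the low 32 bits (assumed wrap)
    return out
```

One instruction adds at most 4 * 255 * 255 = 260100 to an element, so a 32-bit accumulator can absorb many such steps before any wraparound matters.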
49 changes: 49 additions & 0 deletions xtheadvdot/vmaqau_vx.adoc
@@ -0,0 +1,49 @@
[#xtheadvdot-insns-vmaqau-vx,reftext=Four unsigned 8-bit multiply with 32-bit add (vector-scalar)]
==== th.vmaqau.vx

Synopsis::
Four unsigned 8-bit multiply with 32-bit add.

Mnemonic::
th.vmaqau.vx _vd_, _rs1_, _vs2_

Encoding::
[wavedrom, , svg]
....
{reg:[
{ bits: 7, name: 0xb, attr: ['custom-0, 32 bit'] },
{ bits: 5, name: 'vd' },
{ bits: 3, name: 0x6, attr: ['vmaqa'] },
{ bits: 5, name: 'rs1' },
{ bits: 5, name: 'vs2' },
{ bits: 1, name: 'vm' },
{ bits: 6, name: '0x23' },
]}
....

Description::

The lower 32 bits of _rs1_ and each 32-bit element of _vs2_ are treated as four unsigned 8-bit values.
The four pairwise products are summed and added to the corresponding 32-bit element of _vd_.
This instruction builds on the vector (`V`) extension.
If _vm_=1, the instruction is unmasked: `th.vmaqau.vx vd, rs1, vs2`.
If _vm_=0, the instruction is masked: `th.vmaqau.vx vd, rs1, vs2, v0.t`.
Masking applies to the source operands at 8-bit element granularity: when `+v0.mask[i]+` is 1, the product of `+rs1[(i+1)*8-1:i*8]+` and `+vs2[(i+1)*8-1:i*8]+` is added to _vd_.
The vector length (vl) applies to the destination operand, whose elements are 32 bits wide.
Operation::
[source,sail]
--
vd[i] = vd[i] + unsigned(rs1[7:0])   * unsigned(vs2[i][7:0])
              + unsigned(rs1[15:8])  * unsigned(vs2[i][15:8])
              + unsigned(rs1[23:16]) * unsigned(vs2[i][23:16])
              + unsigned(rs1[31:24]) * unsigned(vs2[i][31:24])
--
Permission::
This instruction can be executed in all privilege levels.
Exceptions::
This instruction triggers the same exceptions that a `vmacc.vv` instruction would trigger, except that the value of `vsew[2:0]` must be `3'b010` (SEW=32).
Included in::
[%header]
|===
|Extension
|XTheadVdot (<<#xtheadvdot>>)
|===
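A Python sketch of the unsigned vector-scalar form, assuming raw 32-bit element values in lists, unmasked operation only (_vm_=1), vl equal to the list length, and a 32-bit accumulation that wraps on overflow. The scalar _rs1_ is masked to its lower 32 bits before being split into bytes. `byte_lanes` and `vmaqau_vx` are hypothetical names.

```python
# Hypothetical model of unmasked th.vmaqau.vx (vm=1), unsigned x unsigned.
# Assumptions: registers are lists of raw 32-bit element values,
# vl == len(vd), and the 32-bit accumulation wraps on overflow.


def byte_lanes(word: int, signed: bool) -> list[int]:
    """Split a 32-bit word into four 8-bit lanes; lane k holds bits [8k+7:8k]."""
    lanes = []
    for k in range(4):
        b = (word >> (8 * k)) & 0xFF
        if signed and b >= 0x80:
            b -= 0x100  # reinterpret the byte as two's-complement int8
        lanes.append(b)
    return lanes


def vmaqau_vx(vd: list[int], rs1: int, vs2: list[int]) -> list[int]:
    """vd[i] += sum of the four uint8(rs1[31:0]) x uint8(vs2[i]) lane products."""
    a = byte_lanes(rs1 & 0xFFFFFFFF, False)  # only the lower 32 bits of rs1 are used
    out = []
    for d, b in zip(vd, vs2):
        dot = sum(x * y for x, y in zip(a, byte_lanes(b, False)))
        out.append((d + dot) & 0xFFFFFFFF)  # keep the low 32 bits (assumed wrap)
    return out
```

For example, `vmaqau_vx([7, 0], 2, [3, 0x100])` returns `[13, 0]`: the first element gains 2 * 3 = 6, while the second element's only nonzero byte sits in lane 1, where the scalar byte is 0.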