Metadata Table | |
---|---|
Manual Type | bitmanip |
Spec Revision | |
Spec Release Date | |
Git Revision | 1.0.0 |
Git URL | https://github.com/riscv/riscv-bitmanip.git |
Source | bitmanip/bitmanip.adoc |
Conversion Date | 2023/10/11 |
License | CC-by-4.0 |
This document is released under the Creative Commons Attribution 4.0 International License.
It describes the BitManip Zba, Zbb, Zbc and Zbs extensions being submitted for public review.
Contributors to this specification (in alphabetical order) include:
Jacob Bachmeyer,
Allen Baum,
Ari Ben,
Alex Bradbury,
Steven Braeger,
Rogier Brussee,
Michael Clark,
Ken Dockser,
Paul Donahue,
Dennis Ferguson,
Fabian Giesen,
John Hauser,
Robert Henry,
Bruce Hoult,
Po-wei Huang,
Ben Marshall,
Rex McCrary,
Lee Moore,
Jiří Moravec,
Samuel Neves,
Markus Oberhumer,
Christopher Olson,
Nils Pipenbrinck,
Joseph Rahmeh,
Xue Saw,
Tommy Thorn,
Philipp Tomsich,
Avishai Tvila,
Andrew Waterman,
Thomas Wicki, and
Claire Wolf.
We express our gratitude to everyone that contributed to, reviewed or improved this specification through their comments and questions.
The bit-manipulation (bitmanip) extension collection comprises several component extensions to the base RISC-V architecture that are intended to provide some combination of code size reduction, performance improvement, and energy reduction. While the instructions are intended to have general use, some instructions are more useful in some domains than others. Hence, several smaller bitmanip extensions are provided, rather than one large extension. Each of these smaller extensions is grouped by common function and use case, and each has its own Zb*-extension name.
Each bitmanip extension includes a group of several bitmanip instructions that have similar purposes and that can often share the same logic. Some instructions are available in only one extension while others are available in several. The instructions have mnemonics and encodings that are independent of the extensions in which they appear. Thus, when implementing extensions with overlapping instructions, there is no redundancy in logic or encoding.
The bitmanip extensions are defined for RV32 and RV64. Most of the instructions are expected to be forward compatible with RV128. While the shift-immediate instructions are defined to have at most a 6-bit immediate field, a 7th bit is available in the encoding space should this be needed for RV128.
The bitmanip extension follows the convention in RV64 that w-suffixed instructions (without a dot before the w) ignore the upper 32 bits of their inputs, operate on the least-significant 32 bits as signed values and produce a 32-bit signed result that is sign-extended to XLEN.
Bitmanip instructions with the suffix .uw have one operand that is an unsigned 32-bit value that is extracted from the least significant 32 bits of the specified register. Other than that, these perform full XLEN operations.
Bitmanip instructions with the suffix .b, .h and .w only look at the least-significant 8 bits, 16 bits and 32 bits of the input (respectively) and produce an XLEN-wide result that is sign-extended or zero-extended, depending on the specific instruction.
The semantics of each instruction in the Instructions (in alphabetical order) section are expressed in a SAIL-like syntax.
The first group of bitmanip extensions to be released for Public Review are Zba (address generation), Zbb (basic bit-manipulation), Zbc (carry-less multiplication) and Zbs (single-bit instructions).
Below is a list of all of the instructions (and pseudoinstructions) that are included in these extensions along with their specific mapping:
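| RV32 | RV64 | Mnemonic | Instruction | Zba | Zbb | Zbc | Zbs |
|---|---|---|---|---|---|---|---|
| | X | add.uw rd, rs1, rs2 | Add unsigned word | X | | | |
| X | X | andn rd, rs1, rs2 | AND with inverted operand | | X | | |
| X | X | clmul rd, rs1, rs2 | Carry-less multiply (low-part) | | | X | |
| X | X | clmulh rd, rs1, rs2 | Carry-less multiply (high-part) | | | X | |
| X | X | clmulr rd, rs1, rs2 | Carry-less multiply (reversed) | | | X | |
| X | X | clz rd, rs | Count leading zero bits | | X | | |
| | X | clzw rd, rs | Count leading zero bits in word | | X | | |
| X | X | cpop rd, rs | Count set bits | | X | | |
| | X | cpopw rd, rs | Count set bits in word | | X | | |
| X | X | ctz rd, rs | Count trailing zeros | | X | | |
| | X | ctzw rd, rs | Count trailing zero bits in word | | X | | |
| X | X | max rd, rs1, rs2 | Maximum | | X | | |
| X | X | maxu rd, rs1, rs2 | Unsigned maximum | | X | | |
| X | X | min rd, rs1, rs2 | Minimum | | X | | |
| X | X | minu rd, rs1, rs2 | Unsigned minimum | | X | | |
| X | X | orc.b rd, rs | Bitwise OR-Combine, byte granule | | X | | |
| X | X | orn rd, rs1, rs2 | OR with inverted operand | | X | | |
| X | X | rev8 rd, rs | Byte-reverse register | | X | | |
| X | X | rol rd, rs1, rs2 | Rotate Left (Register) | | X | | |
| | X | rolw rd, rs1, rs2 | Rotate Left Word (Register) | | X | | |
| X | X | ror rd, rs1, rs2 | Rotate Right | | X | | |
| X | X | rori rd, rs1, shamt | Rotate Right (Immediate) | | X | | |
| | X | roriw rd, rs1, shamt | Rotate Right Word by Immediate | | X | | |
| | X | rorw rd, rs1, rs2 | Rotate Right Word (Register) | | X | | |
| X | X | bclr rd, rs1, rs2 | Single-Bit Clear (Register) | | | | X |
| X | X | bclri rd, rs1, imm | Single-Bit Clear (Immediate) | | | | X |
| X | X | bext rd, rs1, rs2 | Single-Bit Extract (Register) | | | | X |
| X | X | bexti rd, rs1, imm | Single-Bit Extract (Immediate) | | | | X |
| X | X | binv rd, rs1, rs2 | Single-Bit Invert (Register) | | | | X |
| X | X | binvi rd, rs1, imm | Single-Bit Invert (Immediate) | | | | X |
| X | X | bset rd, rs1, rs2 | Single-Bit Set (Register) | | | | X |
| X | X | bseti rd, rs1, imm | Single-Bit Set (Immediate) | | | | X |
| X | X | sext.b rd, rs | Sign-extend byte | | X | | |
| X | X | sext.h rd, rs | Sign-extend halfword | | X | | |
| X | X | sh1add rd, rs1, rs2 | Shift left by 1 and add | X | | | |
| | X | sh1add.uw rd, rs1, rs2 | Shift unsigned word left by 1 and add | X | | | |
| X | X | sh2add rd, rs1, rs2 | Shift left by 2 and add | X | | | |
| | X | sh2add.uw rd, rs1, rs2 | Shift unsigned word left by 2 and add | X | | | |
| X | X | sh3add rd, rs1, rs2 | Shift left by 3 and add | X | | | |
| | X | sh3add.uw rd, rs1, rs2 | Shift unsigned word left by 3 and add | X | | | |
| | X | slli.uw rd, rs1, imm | Shift-left unsigned word (Immediate) | X | | | |
| X | X | xnor rd, rs1, rs2 | Exclusive NOR | | X | | |
| X | X | zext.h rd, rs | Zero-extend halfword | | X | | |

The Zba extension is frozen.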
The Zba instructions can be used to accelerate the generation of addresses that index into arrays of basic types (halfword, word, doubleword) using both unsigned word-sized and XLEN-sized indices: a shifted index is added to a base address.
The shift and add instructions do a left shift of 1, 2, or 3 because these are commonly found in real-world code and because they can be implemented with a minimal amount of additional hardware beyond that of the simple adder. This avoids lengthening the critical path in implementations.
While the shift and add instructions are limited to a maximum left shift of 3, the slli instruction (from the base ISA) can be used to perform similar shifts for indexing into arrays of wider elements. The slli.uw instruction, added in this sub-extension, can be used when the index is to be interpreted as an unsigned word.
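The following small C model makes the address arithmetic concrete. It is a sketch and not part of the specification. It assumes XLEN = 64 and represents registers as uint64_t, so additions wrap modulo 2^64 just as the hardware adder does. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* shNadd rd, rs1, rs2: rs2 + (rs1 << n), for n in {1, 2, 3}. */
static uint64_t sh_add(uint64_t rs1, uint64_t rs2, unsigned n) {
    assert(n >= 1 && n <= 3);
    return rs2 + (rs1 << n);
}

/* shNadd.uw rd, rs1, rs2: the index is the zero-extended low word of rs1. */
static uint64_t sh_add_uw(uint64_t rs1, uint64_t rs2, unsigned n) {
    assert(n >= 1 && n <= 3);
    return rs2 + ((uint64_t)(uint32_t)rs1 << n);
}

/* add.uw rd, rs1, rs2 (no shift); add.uw rd, rs1, zero is zext.w. */
static uint64_t add_uw(uint64_t rs1, uint64_t rs2) {
    return rs2 + (uint64_t)(uint32_t)rs1;
}

/* slli.uw rd, rs1, shamt: zero-extend the low word of rs1, then shift. */
static uint64_t slli_uw(uint64_t rs1, unsigned shamt) {
    assert(shamt < 64);
    return (uint64_t)(uint32_t)rs1 << shamt;
}
```

Suppose base holds the address of an int32_t array. Then sh_add(i, base, 2) yields the address of element i, which is exactly what a single sh2add computes. The .uw variants give the same result when i is a 32-bit unsigned index held in the low word of a register, whatever the upper 32 bits contain.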
The following instructions comprise the Zba extension:
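| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| | ✓ | add.uw rd, rs1, rs2 | Add unsigned word |
| ✓ | ✓ | sh1add rd, rs1, rs2 | Shift left by 1 and add |
| | ✓ | sh1add.uw rd, rs1, rs2 | Shift unsigned word left by 1 and add |
| ✓ | ✓ | sh2add rd, rs1, rs2 | Shift left by 2 and add |
| | ✓ | sh2add.uw rd, rs1, rs2 | Shift unsigned word left by 2 and add |
| ✓ | ✓ | sh3add rd, rs1, rs2 | Shift left by 3 and add |
| | ✓ | sh3add.uw rd, rs1, rs2 | Shift unsigned word left by 3 and add |
| | ✓ | slli.uw rd, rs1, imm | Shift-left unsigned word (Immediate) |

The Zbb extension is frozen.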
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | andn rd, rs1, rs2 | AND with inverted operand |
| ✓ | ✓ | orn rd, rs1, rs2 | OR with inverted operand |
| ✓ | ✓ | xnor rd, rs1, rs2 | Exclusive NOR |

Implementation Hint
The Logical with Negate instructions can be implemented by inverting the rs2 input to the base-required AND, OR, and XOR logic instructions. In some implementations, the inverter on rs2 used for subtraction can be reused for this purpose.
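The hint can be mirrored directly in C. Each instruction is the base AND, OR or XOR applied to rs1 and the inverted rs2. This is an illustrative model that assumes 64-bit registers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Base logic operation with rs2 passed through an inverter first. */
static uint64_t andn(uint64_t rs1, uint64_t rs2) { return rs1 & ~rs2; }
static uint64_t orn(uint64_t rs1, uint64_t rs2)  { return rs1 | ~rs2; }
/* rs1 ^ ~rs2 is the same value as ~(rs1 ^ rs2). */
static uint64_t xnor(uint64_t rs1, uint64_t rs2) { return rs1 ^ ~rs2; }
```

A typical use of andn is to clear the bits of a mask in one instruction, for example flags = andn(flags, mask), instead of a not followed by an and.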
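| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | clz rd, rs | Count leading zero bits |
| | ✓ | clzw rd, rs | Count leading zero bits in word |
| ✓ | ✓ | ctz rd, rs | Count trailing zeros |
| | ✓ | ctzw rd, rs | Count trailing zero bits in word |
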
These instructions count the number of set bits (1-bits). This is also commonly referred to as population count.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | cpop rd, rs | Count set bits |
| | ✓ | cpopw rd, rs | Count set bits in word |

The integer minimum/maximum instructions are arithmetic R-type instructions that return the smaller/larger of two operands.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | max rd, rs1, rs2 | Maximum |
| ✓ | ✓ | maxu rd, rs1, rs2 | Unsigned maximum |
| ✓ | ✓ | min rd, rs1, rs2 | Minimum |
| ✓ | ✓ | minu rd, rs1, rs2 | Unsigned minimum |

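The signed and unsigned variants differ only in how they interpret the register bits. The C sketch below shows this with the value -1, which is also the largest unsigned value. It assumes XLEN = 64, and the rv_ prefix on the hypothetical names only avoids clashes with common min/max macros.

```c
#include <assert.h>
#include <stdint.h>

/* Registers are uint64_t. max/min reinterpret the bits as two's complement
 * (the uint64_t -> int64_t conversion is implementation-defined in ISO C
 * for large values, but is two's complement in GCC and Clang). */
static uint64_t rv_max(uint64_t a, uint64_t b) { return (int64_t)a < (int64_t)b ? b : a; }
static uint64_t rv_min(uint64_t a, uint64_t b) { return (int64_t)a < (int64_t)b ? a : b; }

/* maxu/minu compare the same bits as unsigned integers. */
static uint64_t rv_maxu(uint64_t a, uint64_t b) { return a < b ? b : a; }
static uint64_t rv_minu(uint64_t a, uint64_t b) { return a < b ? a : b; }
```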
These instructions perform the sign-extension or zero-extension of the least significant 8 bits, 16 bits or 32 bits of the source register.
These instructions replace the generalized idioms slli rD,rS,(XLEN-<size>) + srli (for zero-extension) or slli + srai (for sign-extension) for the sign-extension of 8-bit and 16-bit quantities, and for the zero-extension of 16-bit and 32-bit quantities.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | sext.b rd, rs | Sign-extend byte |
| ✓ | ✓ | sext.h rd, rs | Sign-extend halfword |
| ✓ | ✓ | zext.h rd, rs | Zero-extend halfword |

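A small C sketch can check the equivalence with the base-ISA shift idioms. It assumes XLEN = 64. It also relies on the two's-complement conversions and the arithmetic right shift of signed values that GCC and Clang provide, both of which are implementation-defined in ISO C. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define XLEN 64  /* assumption: RV64 */

/* slli rD,rS,(XLEN-size) followed by srai: sign-extend the low `size` bits. */
static uint64_t slli_srai(uint64_t rs, unsigned size) {
    return (uint64_t)((int64_t)(rs << (XLEN - size)) >> (XLEN - size));
}

/* slli rD,rS,(XLEN-size) followed by srli: zero-extend the low `size` bits. */
static uint64_t slli_srli(uint64_t rs, unsigned size) {
    return (rs << (XLEN - size)) >> (XLEN - size);
}

/* Single-instruction Zbb equivalents. */
static uint64_t sext_b(uint64_t rs) { return (uint64_t)(int64_t)(int8_t)rs; }
static uint64_t sext_h(uint64_t rs) { return (uint64_t)(int64_t)(int16_t)rs; }
static uint64_t zext_h(uint64_t rs) { return (uint16_t)rs; }
```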
Bitwise rotation instructions are similar to the shift-logical operations from the base spec. However, where the shift-logical instructions shift in zeros, the rotate instructions shift in the bits that were shifted out of the other side of the value. Such operations are also referred to as ‘circular shifts’.
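| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | rol rd, rs1, rs2 | Rotate Left (Register) |
| | ✓ | rolw rd, rs1, rs2 | Rotate Left Word (Register) |
| ✓ | ✓ | ror rd, rs1, rs2 | Rotate Right |
| ✓ | ✓ | rori rd, rs1, shamt | Rotate Right (Immediate) |
| | ✓ | roriw rd, rs1, shamt | Rotate Right Word by Immediate |
| | ✓ | rorw rd, rs1, rs2 | Rotate Right Word (Register) |
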
Architecture Explanation
The rotate instructions were included to replace a common four-instruction sequence that achieves the same effect (neg; sll/srl; srl/sll; or).
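The C sketch below contrasts a direct rotate-left with the four-instruction base-ISA sequence it replaces. It assumes XLEN = 64. Masking the shift amounts models the fact that RISC-V shifts use only the low log2(XLEN) bits, and it also keeps the C shifts well-defined. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define XLEN 64  /* assumption: RV64 */

/* Direct rotate-left: only the low log2(XLEN) bits of rs2 are used. */
static uint64_t rol(uint64_t rs1, uint64_t rs2) {
    unsigned sh = (unsigned)(rs2 & (XLEN - 1));
    return sh == 0 ? rs1 : (rs1 << sh) | (rs1 >> (XLEN - sh));
}

/* Base-ISA sequence: neg t, rs2; sll a, rs1, rs2; srl b, rs1, t; or rd, a, b */
static uint64_t rol_base_isa(uint64_t rs1, uint64_t rs2) {
    uint64_t t = 0 - rs2;                    /* neg */
    uint64_t a = rs1 << (rs2 & (XLEN - 1));  /* sll */
    uint64_t b = rs1 >> (t & (XLEN - 1));    /* srl */
    return a | b;                            /* or  */
}
```

A rotate right by n equals a rotate left by -n (modulo XLEN). For that reason the right-rotate form of the base sequence simply swaps sll and srl.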
orc.b sets the bits of each byte in the result rd to all zeros if no bit within the respective byte of rs is set, or to all ones if any bit within the respective byte of rs is set.
The intended use cases are string-processing functions, such as strlen and strcpy, which can use orc.b to test for zero bytes and to count trailing non-zero bytes in a word.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | orc.b rd, rs | Bitwise OR-Combine, byte granule |

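A byte-by-byte C model of orc.b shows why the instruction is useful for string functions: a chunk contains a zero byte exactly when orc.b does not produce all-ones. The model assumes XLEN = 64, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* orc.b: each result byte is 0x00 if the input byte is zero, else 0xff. */
static uint64_t orc_b(uint64_t rs) {
    uint64_t out = 0;
    for (unsigned i = 0; i < 64; i += 8) {
        if ((rs >> i) & 0xff)
            out |= (uint64_t)0xff << i;
    }
    return out;
}

/* A chunk contains a zero byte exactly when orc.b is not all-ones. */
static int has_zero_byte(uint64_t chunk) {
    return orc_b(chunk) != UINT64_MAX;
}
```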
This instruction reverses the byte-ordering in a register.
| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | rev8 rd, rs | Byte-reverse register |

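Here is a simple C model of rev8 for XLEN = 64. It is a sketch, and the name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* rev8: byte i of the input becomes byte 7 - i of the result. */
static uint64_t rev8(uint64_t rs) {
    uint64_t out = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint64_t byte = (rs >> (8 * i)) & 0xff;
        out |= byte << (8 * (7 - i));
    }
    return out;
}
```

Applying rev8 twice restores the original value. GCC and Clang expose the same operation as __builtin_bswap64, which a compiler targeting Zbb may lower to a single rev8.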
The Zbc extension is frozen.
Carry-less multiplication is the multiplication in the polynomial ring over GF(2).
clmul produces the lower half of the carry-less product and clmulh produces the upper half of the 2✕XLEN carry-less product.
clmulr produces bits 2✕XLEN−2:XLEN-1 of the 2✕XLEN carry-less product. That means clmulh is equivalent to clmulr followed by a 1-bit right shift. (The MSB of a clmulh result is always zero.)
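| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | clmul rd, rs1, rs2 | Carry-less multiply (low-part) |
| ✓ | ✓ | clmulh rd, rs1, rs2 | Carry-less multiply (high-part) |
| ✓ | ✓ | clmulr rd, rs1, rs2 | Carry-less multiply (reversed) |
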
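The three carry-less multiply variants can be modelled in C by transcribing the loops of their definitions. The model assumes XLEN = 64, and the function names are illustrative. Using XOR in place of addition is what makes the product carry-less.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of the 128-bit carry-less product. */
static uint64_t clmul(uint64_t rs1, uint64_t rs2) {
    uint64_t out = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((rs2 >> i) & 1)
            out ^= rs1 << i;
    return out;
}

/* High 64 bits of the 128-bit carry-less product (bit 127 is always 0). */
static uint64_t clmulh(uint64_t rs1, uint64_t rs2) {
    uint64_t out = 0;
    for (unsigned i = 1; i < 64; i++)
        if ((rs2 >> i) & 1)
            out ^= rs1 >> (64 - i);
    return out;
}

/* Bits 126..63 of the 128-bit carry-less product. */
static uint64_t clmulr(uint64_t rs1, uint64_t rs2) {
    uint64_t out = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((rs2 >> i) & 1)
            out ^= rs1 >> (63 - i);
    return out;
}
```

For instance, clmul(3, 3) is 5, because (x + 1)^2 = x^2 + 1 over GF(2). For any operands, clmulh(a, b) equals clmulr(a, b) >> 1.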
The Zbs extension is frozen.
The single-bit instructions provide a mechanism to set, clear, invert, or extract a single bit in a register. The bit is specified by its index.
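| RV32 | RV64 | Mnemonic | Instruction |
|---|---|---|---|
| ✓ | ✓ | bclr rd, rs1, rs2 | Single-Bit Clear (Register) |
| ✓ | ✓ | bclri rd, rs1, imm | Single-Bit Clear (Immediate) |
| ✓ | ✓ | bext rd, rs1, rs2 | Single-Bit Extract (Register) |
| ✓ | ✓ | bexti rd, rs1, imm | Single-Bit Extract (Immediate) |
| ✓ | ✓ | binv rd, rs1, rs2 | Single-Bit Invert (Register) |
| ✓ | ✓ | binvi rd, rs1, imm | Single-Bit Invert (Immediate) |
| ✓ | ✓ | bset rd, rs1, rs2 | Single-Bit Set (Register) |
| ✓ | ✓ | bseti rd, rs1, imm | Single-Bit Set (Immediate) |
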
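In C, the four register forms reduce to one-line mask operations. This sketch assumes XLEN = 64, and the names are hypothetical. The immediate forms behave the same way, with shamt in place of rs2.

```c
#include <assert.h>
#include <stdint.h>

#define XLEN 64  /* assumption: RV64 */

/* The bit index comes from the low log2(XLEN) bits of rs2. */
static unsigned bit_index(uint64_t rs2) { return (unsigned)(rs2 & (XLEN - 1)); }

static uint64_t bset(uint64_t rs1, uint64_t rs2) { return rs1 |  (1ull << bit_index(rs2)); }
static uint64_t bclr(uint64_t rs1, uint64_t rs2) { return rs1 & ~(1ull << bit_index(rs2)); }
static uint64_t binv(uint64_t rs1, uint64_t rs2) { return rs1 ^  (1ull << bit_index(rs2)); }
static uint64_t bext(uint64_t rs1, uint64_t rs2) { return (rs1 >> bit_index(rs2)) & 1; }
```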
Add unsigned word
add.uw rd, rs1, rs2
zext.w rd, rs1 → add.uw rd, rs1, zero
This instruction performs an XLEN-wide addition between rs2 and the zero-extended least-significant word of rs1.
let base = X(rs2);
let index = EXTZ(X(rs1)[31..0]);
X(rd) = base + index;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zba (Address generation) | 0.93 | Frozen |

AND with inverted operand
andn rd, rs1, rs2
This instruction performs the bitwise logical AND operation between rs1 and the bitwise inversion of rs2.
X(rd) = X(rs1) & ~X(rs2);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Single-Bit Clear (Register)
bclr rd, rs1, rs2
This instruction returns rs1 with a single bit cleared at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.
let index = X(rs2) & (XLEN - 1);
X(rd) = X(rs1) & ~(1 << index);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbs (Single-bit instructions) | 0.93 | Frozen |

Single-Bit Clear (Immediate)
bclri rd, rs1, shamt
This instruction returns rs1 with a single bit cleared at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
let index = shamt & (XLEN - 1);
X(rd) = X(rs1) & ~(1 << index);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbs (Single-bit instructions) | 0.93 | Frozen |

Single-Bit Extract (Register)
bext rd, rs1, rs2
This instruction returns a single bit extracted from rs1 at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.
let index = X(rs2) & (XLEN - 1);
X(rd) = (X(rs1) >> index) & 1;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbs (Single-bit instructions) | 0.93 | Frozen |

Single-Bit Extract (Immediate)
bexti rd, rs1, shamt
This instruction returns a single bit extracted from rs1 at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
let index = shamt & (XLEN - 1);
X(rd) = (X(rs1) >> index) & 1;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbs (Single-bit instructions) | 0.93 | Frozen |

Single-Bit Invert (Register)
binv rd, rs1, rs2
This instruction returns rs1 with a single bit inverted at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.
let index = X(rs2) & (XLEN - 1);
X(rd) = X(rs1) ^ (1 << index);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbs (Single-bit instructions) | 0.93 | Frozen |

Single-Bit Invert (Immediate)
binvi rd, rs1, shamt
This instruction returns rs1 with a single bit inverted at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
let index = shamt & (XLEN - 1);
X(rd) = X(rs1) ^ (1 << index);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbs (Single-bit instructions) | 0.93 | Frozen |

Single-Bit Set (Register)
bset rd, rs1, rs2
This instruction returns rs1 with a single bit set at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.
let index = X(rs2) & (XLEN - 1);
X(rd) = X(rs1) | (1 << index);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbs (Single-bit instructions) | 0.93 | Frozen |

Single-Bit Set (Immediate)
bseti rd, rs1, shamt
This instruction returns rs1 with a single bit set at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
let index = shamt & (XLEN - 1);
X(rd) = X(rs1) | (1 << index);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbs (Single-bit instructions) | 0.93 | Frozen |

Carry-less multiply (low-part)
clmul rd, rs1, rs2
clmul produces the lower half of the 2·XLEN carry-less product.
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 1) by 1) {
output = if ((rs2_val >> i) & 1)
then output ^ (rs1_val << i)
else output;
}
X[rd] = output
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbc (Carry-less multiplication) | 0.93 | Frozen |

Carry-less multiply (high-part)
clmulh rd, rs1, rs2
clmulh produces the upper half of the 2·XLEN carry-less product.
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 1 to (xlen - 1) by 1) {
output = if ((rs2_val >> i) & 1)
then output ^ (rs1_val >> (xlen - i))
else output;
}
X[rd] = output
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbc (Carry-less multiplication) | 0.93 | Frozen |

Carry-less multiply (reversed)
clmulr rd, rs1, rs2
clmulr produces bits 2·XLEN−2:XLEN-1 of the 2·XLEN carry-less product.
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 1) by 1) {
output = if ((rs2_val >> i) & 1)
then output ^ (rs1_val >> (xlen - i - 1))
else output;
}
X[rd] = output
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbc (Carry-less multiplication) | 0.93 | Frozen |

Count leading zero bits
clz rd, rs
This instruction counts the number of 0’s before the first 1, starting at the most-significant bit (i.e., XLEN-1) and progressing to bit 0. Accordingly, if the input is 0, the output is XLEN, and if the most-significant bit of the input is a 1, the output is 0.
val HighestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int
function HighestSetBit x = {
foreach (i from (xlen - 1) to 0 by 1 in dec)
if [x[i]] == 0b1 then return(i) else ();
return -1;
}
let rs = X(rs);
X[rd] = (xlen - 1) - HighestSetBit(rs);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Count leading zero bits in word
clzw rd, rs
This instruction counts the number of 0’s before the first 1 starting at bit 31 and progressing to bit 0. Accordingly, if the least-significant word is 0, the output is 32, and if the most-significant bit of the word (i.e., bit 31) is a 1, the output is 0.
val HighestSetBit32 : forall ('N : Int), 'N >= 0. bits('N) -> int
function HighestSetBit32 x = {
foreach (i from 31 to 0 by 1 in dec)
if [x[i]] == 0b1 then return(i) else ();
return -1;
}
let rs = X(rs);
X[rd] = 31 - HighestSetBit32(rs);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Count set bits
cpop rd, rs
This instruction counts the number of 1’s (i.e., set bits) in the source register.
let bitcount = 0;
let rs = X(rs);
foreach (i from 0 to (xlen - 1) in inc)
if rs[i] == 0b1 then bitcount = bitcount + 1 else ();
X[rd] = bitcount
Software Hint
This operation is known as population count, popcount, sideways sum, bit summation, or Hamming weight. The GCC builtin function
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Count set bits in word
cpopw rd, rs
This instruction counts the number of 1’s (i.e., set bits) in the least-significant word of the source register.
let bitcount = 0;
let val = X(rs);
foreach (i from 0 to 31 in inc)
if val[i] == 0b1 then bitcount = bitcount + 1 else ();
X[rd] = bitcount
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

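Here is a straightforward C model of cpop and cpopw, assuming XLEN = 64. The names are hypothetical. For clarity rather than speed, it counts one bit at a time.

```c
#include <assert.h>
#include <stdint.h>

/* cpop: number of 1 bits in the full register. */
static unsigned cpop(uint64_t rs) {
    unsigned count = 0;
    for (unsigned i = 0; i < 64; i++)
        count += (unsigned)((rs >> i) & 1);
    return count;
}

/* cpopw: only the least-significant word is counted. */
static unsigned cpopw(uint64_t rs) {
    return cpop((uint32_t)rs);
}
```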
Count trailing zeros
ctz rd, rs
This instruction counts the number of 0’s before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit (i.e., XLEN-1). Accordingly, if the input is 0, the output is XLEN, and if the least-significant bit of the input is a 1, the output is 0.
val LowestSetBit : forall ('N : Int), 'N >= 0. bits('N) -> int
function LowestSetBit x = {
foreach (i from 0 to (xlen - 1) by 1 in inc)
if [x[i]] == 0b1 then return(i) else ();
return xlen;
}
let rs = X(rs);
X[rd] = LowestSetBit(rs);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Count trailing zero bits in word
ctzw rd, rs
This instruction counts the number of 0’s before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit of the least-significant word (i.e., 31). Accordingly, if the least-significant word is 0, the output is 32, and if the least-significant bit of the input is a 1, the output is 0.
val LowestSetBit32 : forall ('N : Int), 'N >= 0. bits('N) -> int
function LowestSetBit32 x = {
foreach (i from 0 to 31 by 1 in inc)
if [x[i]] == 0b1 then return(i) else ();
return 32;
}
let rs = X(rs);
X[rd] = LowestSetBit32(rs);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

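The leading- and trailing-zero counts defined above can be modelled in C as follows, assuming XLEN = 64. The helper names are hypothetical. Pay attention to the zero-input case. The instructions return the operand width for zero, whereas GCC's __builtin_clzll and __builtin_ctzll have an undefined result for zero, so portable code must handle that case separately.

```c
#include <assert.h>
#include <stdint.h>

/* Count zeros from bit width-1 downwards; returns width for x == 0. */
static unsigned clz_n(uint64_t x, unsigned width) {
    unsigned n = 0;
    for (int i = (int)width - 1; i >= 0 && !((x >> i) & 1); i--)
        n++;
    return n;
}

/* Count zeros from bit 0 upwards; returns width for x == 0. */
static unsigned ctz_n(uint64_t x, unsigned width) {
    unsigned n = 0;
    while (n < width && !((x >> n) & 1))
        n++;
    return n;
}

static unsigned clz(uint64_t rs)  { return clz_n(rs, 64); }
static unsigned clzw(uint64_t rs) { return clz_n((uint32_t)rs, 32); }  /* upper word ignored */
static unsigned ctz(uint64_t rs)  { return ctz_n(rs, 64); }
static unsigned ctzw(uint64_t rs) { return ctz_n((uint32_t)rs, 32); }  /* upper word ignored */
```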
Maximum
max rd, rs1, rs2
This instruction returns the larger of two signed integers.
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let result = if rs1_val <_s rs2_val
then rs2_val
else rs1_val;
X(rd) = result;
Software Hint
Calculating the absolute value of a signed integer can be performed using the following sequence: neg rD,rS followed by max rD,rS,rD. When using this common sequence, it is suggested that the two instructions be scheduled with no intervening instructions so that implementations that are so optimized can fuse them together.
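Writing the neg/max sequence out in C shows why it produces the absolute value. This sketch assumes XLEN = 64 and the two's-complement conversions of GCC and Clang. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* max on register values, comparing them as signed integers. */
static uint64_t rv_max(uint64_t a, uint64_t b) {
    return (int64_t)a < (int64_t)b ? b : a;
}

/* neg rD, rS ; max rD, rS, rD */
static uint64_t abs_via_max(uint64_t rs) {
    uint64_t rd = 0 - rs;   /* neg: wraps modulo 2^64 */
    return rv_max(rs, rd);  /* max: picks the non-negative one */
}
```

As with wrapping two's-complement arithmetic in general, the most negative value (0x8000000000000000) maps to itself.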
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Unsigned maximum
maxu rd, rs1, rs2
This instruction returns the larger of two unsigned integers.
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let result = if rs1_val <_u rs2_val
then rs2_val
else rs1_val;
X(rd) = result;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Minimum
min rd, rs1, rs2
This instruction returns the smaller of two signed integers.
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let result = if rs1_val <_s rs2_val
then rs1_val
else rs2_val;
X(rd) = result;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Unsigned minimum
minu rd, rs1, rs2
This instruction returns the smaller of two unsigned integers.
let rs1_val = X(rs1);
let rs2_val = X(rs2);
let result = if rs1_val <_u rs2_val
then rs1_val
else rs2_val;
X(rd) = result;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Bitwise OR-Combine, byte granule
orc.b rd, rs
Combines the bits within every byte through a reciprocal bitwise logical OR. This sets the bits of each byte in the result rd to all zeros if no bit within the respective byte of rs is set, or to all ones if any bit within the respective byte of rs is set.
let input = X(rs);
let output : xlenbits = 0;
foreach (i from 0 to (xlen - 8) by 8) {
output[(i + 7)..i] = if input[(i + 7)..i] == 0
then 0b00000000
else 0b11111111;
}
X[rd] = output;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

OR with inverted operand
orn rd, rs1, rs2
This instruction performs the bitwise logical OR operation between rs1 and the bitwise inversion of rs2.
X(rd) = X(rs1) | ~X(rs2);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Byte-reverse register
rev8 rd, rs
This instruction reverses the order of the bytes in a register.
let input = X(rs);
let output : xlenbits = 0;
let j = xlen - 1;
foreach (i from 0 to (xlen - 8) by 8) {
output[(i + 7)..i] = input[j..(j - 7)];
j = j - 8;
}
X[rd] = output
Note
The rev8 mnemonic corresponds to different instruction encodings in RV32 and RV64.
Software Hint
The byte-reverse operation is only available for the full register
width. To emulate word-sized and halfword-sized byte-reversal,
perform a |
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Rotate Left (Register)
rol rd, rs1, rs2
This instruction performs a rotate left of rs1 by the amount in the least-significant log2(XLEN) bits of rs2.
let shamt = if xlen == 32
then X(rs2)[4..0]
else X(rs2)[5..0];
let result = (X(rs1) << shamt) | (X(rs1) >> (xlen - shamt));
X(rd) = result;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Rotate Left Word (Register)
rolw rd, rs1, rs2
This instruction performs a rotate left on the least-significant word of rs1 by the amount in least-significant 5 bits of rs2. The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits.
let rs1 = EXTZ(X(rs1)[31..0]);
let shamt = X(rs2)[4..0];
let result = (rs1 << shamt) | (rs1 >> (32 - shamt));
X(rd) = EXTS(result[31..0]);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Rotate Right
ror rd, rs1, rs2
This instruction performs a rotate right of rs1 by the amount in the least-significant log2(XLEN) bits of rs2.
let shamt = if xlen == 32
then X(rs2)[4..0]
else X(rs2)[5..0];
let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt));
X(rd) = result;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Rotate Right (Immediate)
rori rd, rs1, shamt
This instruction performs a rotate right of rs1 by the amount in the least-significant log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.
let shamt = if xlen == 32
then shamt[4..0]
else shamt[5..0];
let result = (X(rs1) >> shamt) | (X(rs1) << (xlen - shamt));
X(rd) = result;
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Rotate Right Word by Immediate
roriw rd, rs1, shamt
This instruction performs a rotate right on the least-significant word of rs1 by the amount in the least-significant 5 bits of shamt. The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits.
let rs1 = EXTZ(X(rs1)[31..0]);
let result = (rs1 >> shamt[4..0]) | (rs1 << (32 - shamt[4..0]));
X(rd) = EXTS(result[31..0]);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Rotate Right Word (Register)
rorw rd, rs1, rs2
This instruction performs a rotate right on the least-significant word of rs1 by the amount in least-significant 5 bits of rs2. The resultant word is sign-extended by copying bit 31 to all of the more-significant bits.
let rs1 = EXTZ(X(rs1)[31..0]);
let shamt = X(rs2)[4..0];
let result = (rs1 >> shamt) | (rs1 << (32 - shamt));
X(rd) = EXTS(result[31..0]);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Sign-extend byte
sext.b rd, rs
This instruction sign-extends the least-significant byte in the source to XLEN by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits.
X(rd) = EXTS(X(rs)[7..0]);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Sign-extend halfword
sext.h rd, rs
This instruction sign-extends the least-significant halfword in rs to XLEN by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits.
X(rd) = EXTS(X(rs)[15..0]);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Shift left by 1 and add
sh1add rd, rs1, rs2
This instruction shifts rs1 to the left by 1 bit and adds it to rs2.
X(rd) = X(rs2) + (X(rs1) << 1);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zba (Address generation) | 0.93 | Frozen |

Shift unsigned word left by 1 and add
sh1add.uw rd, rs1, rs2
This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 1 place.
let base = X(rs2);
let index = EXTZ(X(rs1)[31..0]);
X(rd) = base + (index << 1);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zba (Address generation) | 0.93 | Frozen |

Shift left by 2 and add
sh2add rd, rs1, rs2
This instruction shifts rs1 to the left by 2 places and adds it to rs2.
X(rd) = X(rs2) + (X(rs1) << 2);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zba (Address generation) | 0.93 | Frozen |

Shift unsigned word left by 2 and add
sh2add.uw rd, rs1, rs2
This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 2 places.
let base = X(rs2);
let index = EXTZ(X(rs1)[31..0]);
X(rd) = base + (index << 2);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zba (Address generation) | 0.93 | Frozen |

Shift left by 3 and add
sh3add rd, rs1, rs2
This instruction shifts rs1 to the left by 3 places and adds it to rs2.
X(rd) = X(rs2) + (X(rs1) << 3);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zba (Address generation) | 0.93 | Frozen |

Shift unsigned word left by 3 and add
sh3add.uw rd, rs1, rs2
This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 3 places.
let base = X(rs2);
let index = EXTZ(X(rs1)[31..0]);
X(rd) = base + (index << 3);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zba (Address generation) | 0.93 | Frozen |

Shift-left unsigned word (Immediate)
slli.uw rd, rs1, shamt
This instruction takes the least-significant word of rs1, zero-extends it, and shifts it left by the immediate.
X(rd) = (EXTZ(X(rs1)[31..0]) << shamt);
Extension | Minimum version | Lifecycle state |
---|---|---|
Zba (Address generation) | 0.93 | Frozen |

Architecture Explanation
This instruction is the same as slli with zext.w performed on rs1 before shifting.
Exclusive NOR
xnor rd, rs1, rs2
This instruction performs the bit-wise exclusive-NOR operation on rs1 and rs2.
X(rd) = ~(X(rs1) ^ X(rs2));
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

Zero-extend halfword
zext.h rd, rs
This instruction zero-extends the least-significant halfword of the source to XLEN by inserting 0’s into all of the bits more significant than 15.
X(rd) = EXTZ(X(rs)[15..0]);
Note
The zext.h mnemonic corresponds to different instruction encodings in RV32 and RV64.
Extension | Minimum version | Lifecycle state |
---|---|---|
Zbb (Basic bit-manipulation) | 0.93 | Frozen |

The orc.b instruction allows for the efficient detection of NUL bytes in an XLEN-sized chunk of data:
the result of orc.b on a chunk that does not contain any NUL bytes will be all-ones, and
after a bitwise negation of the result of orc.b, the first NUL byte can be detected by ctz/clz (depending on the endianness of data).
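The two steps can be sketched in C for a 64-bit chunk on a little-endian system, which is an assumption here; on a big-endian system clz would replace ctz. The helper names are hypothetical. ctz64 deliberately returns 64 for a zero input, matching the ctz instruction rather than __builtin_ctzll.

```c
#include <assert.h>
#include <stdint.h>

/* orc.b model: 0xff for every non-zero byte, 0x00 for every zero byte. */
static uint64_t orc_b(uint64_t rs) {
    uint64_t out = 0;
    for (unsigned i = 0; i < 64; i += 8)
        if ((rs >> i) & 0xff)
            out |= (uint64_t)0xff << i;
    return out;
}

/* ctz with the instruction's convention: ctz64(0) == 64. */
static unsigned ctz64(uint64_t x) {
    unsigned n = 0;
    while (n < 64 && !((x >> n) & 1))
        n++;
    return n;
}

/* Byte index of the first NUL in a little-endian chunk, or 8 if none. */
static unsigned first_nul_le(uint64_t chunk) {
    uint64_t nul_bytes = ~orc_b(chunk);  /* 0xff where a NUL byte was */
    return ctz64(nul_bytes) >> 3;        /* bit index -> byte index */
}
```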
The following strlen function uses these techniques and also shows how to handle unaligned and partial chunks:
#include <sys/asm.h>
.text
.globl strlen
.type strlen, @function
strlen:
andi a3, a0, (SZREG-1) // offset
andi a1, a0, -SZREG // align pointer
.Lprologue:
li a4, SZREG
sub a4, a4, a3 // SZREG - offset (valid bytes in first chunk)
slli a3, a3, 3 // offset * 8 (bytes to bits)
REG_L a2, 0(a1) // chunk
/*
* Shift the partial/unaligned chunk we loaded to remove the bytes
* from before the start of the string, adding NUL bytes at the end.
*/
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
srl a2, a2, a3 // chunk >> (offset * 8)
#else
sll a2, a2, a3
#endif
orc.b a2, a2
not a2, a2
/*
* Non-NUL bytes in the string have been expanded to 0x00, while
* NUL bytes have become 0xff. Search for the first set bit
* (corresponding to a NUL byte in the original chunk).
*/
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
ctz a2, a2
#else
clz a2, a2
#endif
/*
* The first chunk is special: compare against the number of valid
* bytes in this chunk.
*/
srli a0, a2, 3
bgtu a4, a0, .Ldone
addi a3, a1, SZREG
li a4, -1
.align 2
/*
* Our critical loop is 4 instructions and processes data in 4 byte
* or 8 byte chunks.
*/
.Lloop:
REG_L a2, SZREG(a1)
addi a1, a1, SZREG
orc.b a2, a2
beq a2, a4, .Lloop
.Lepilogue:
not a2, a2
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
ctz a2, a2
#else
clz a2, a2
#endif
sub a1, a1, a3
add a0, a0, a1
srli a2, a2, 3
add a0, a0, a2
.Ldone:
ret
.size strlen, .-strlen
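The strcmp function below uses orc.b to detect NUL bytes in aligned words and, on little-endian systems, rev8 to put mismatching words into memory order before comparing them; misaligned strings fall back to a simple byte-wise loop:
#include <sys/asm.h>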
.text
.globl strcmp
.type strcmp, @function
strcmp:
or a4, a0, a1
li t2, -1
andi a4, a4, SZREG-1
bnez a4, .Lsimpleloop
# Main loop for aligned strings
.Lloop:
REG_L a2, 0(a0)
REG_L a3, 0(a1)
orc.b t0, a2
bne t0, t2, .Lfoundnull
addi a0, a0, SZREG
addi a1, a1, SZREG
beq a2, a3, .Lloop
# Words don't match, and no null byte in first word.
# Get bytes in big-endian order and compare.
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
rev8 a2, a2
rev8 a3, a3
#endif
# Synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence.
sltu a0, a2, a3
neg a0, a0
ori a0, a0, 1
ret
.Lfoundnull:
# Found a null byte.
# If words don't match, fall back to simple loop.
bne a2, a3, .Lsimpleloop
# Otherwise, strings are equal.
li a0, 0
ret
# Simple loop for misaligned strings
.Lsimpleloop:
lbu a2, 0(a0)
lbu a3, 0(a1)
addi a0, a0, 1
addi a1, a1, 1
bne a2, a3, 1f
bnez a2, .Lsimpleloop
1:
sub a0, a2, a3
ret
.size strcmp, .-strcmp
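The mismatch path of strcmp can be restated in C. This shows why rev8 is needed on little-endian systems: the byte at the lowest address must become the most significant before an unsigned comparison can decide the order. The sketch assumes 64-bit chunks that contain no NUL byte and are known to differ, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* rev8 model for a 64-bit register: byte i moves to byte 7 - i. */
static uint64_t rev8(uint64_t x) {
    uint64_t out = 0;
    for (unsigned i = 0; i < 8; i++)
        out |= ((x >> (8 * i)) & 0xff) << (8 * (7 - i));
    return out;
}

/* Order two differing little-endian chunks like strcmp's mismatch path:
 * rev8 both, then synthesize (a >= b) ? 1 : -1 as sltu; neg; ori. */
static int compare_chunks_le(uint64_t a, uint64_t b) {
    uint64_t a_be = rev8(a), b_be = rev8(b);
    int64_t t = -(int64_t)(a_be < b_be);  /* sltu, then neg: -1 or 0 */
    return (int)(t | 1);                  /* ori 1: -1 or 1 */
}
```

Take the chunks for "baaaaaaa" and "aaaaaaaz". Compared raw as little-endian integers, the first looks smaller. After rev8, the comparison correctly reports that it sorts after the second.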