Added support for AVX512 bfloat16 instructions
The AVX512_BF16 extension defines the following three instructions.
VCVTNE2PS2BF16—Convert Two Packed Single Data to One Packed BF16 Data
EVEX.128.F2.0F38.W0 72 /r VCVTNE2PS2BF16 xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
EVEX.256.F2.0F38.W0 72 /r VCVTNE2PS2BF16 ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
EVEX.512.F2.0F38.W0 72 /r VCVTNE2PS2BF16 zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
| Op/En | Tuple | Operand 1 | Operand 2 | Operand 3 |
|---|---|---|---|---|
| A | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) |
VCVTNEPS2BF16—Convert Packed Single Data to Packed BF16 Data
EVEX.128.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, xmm2/m128/m32bcst
EVEX.256.F3.0F38.W0 72 /r VCVTNEPS2BF16 xmm1{k1}{z}, ymm2/m256/m32bcst
EVEX.512.F3.0F38.W0 72 /r VCVTNEPS2BF16 ymm1{k1}{z}, zmm2/m512/m32bcst
| Op/En | Tuple | Operand 1 | Operand 2 |
|---|---|---|---|
| A | Full | ModRM:reg (w) | ModRM:r/m (r) |
VDPBF16PS—Dot Product of BF16 Pairs Accumulated into Packed Single Precision
EVEX.128.F3.0F38.W0 52 /r VDPBF16PS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst
EVEX.256.F3.0F38.W0 52 /r VDPBF16PS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst
EVEX.512.F3.0F38.W0 52 /r VDPBF16PS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst
| Op/En | Tuple | Operand 1 | Operand 2 | Operand 3 |
|---|---|---|---|---|
| A | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) |
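For reviewers unfamiliar with bfloat16, here is a minimal scalar sketch of what one lane of these instructions computes. It is not DynamoRIO code and not a bit-exact model of the hardware: the helper names are made up for illustration, and it ignores details such as denormal handling and the exact internal rounding of the dot product. It only assumes that a BF16 value is the upper 16 bits of an IEEE single and that the "NE" conversions round to nearest even.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern and back (hypothetical helpers). */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_f32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* One lane of a PS -> BF16 conversion: keep the top 16 bits of the single,
 * rounding to nearest even. NaNs are quieted instead of rounded. */
static uint16_t f32_to_bf16_rne(float f)
{
    uint32_t u = f32_bits(f);
    if ((u & 0x7fffffffu) > 0x7f800000u)
        return (uint16_t)((u >> 16) | 0x0040u);
    uint32_t lsb = (u >> 16) & 1u; /* lowest bit that survives truncation */
    u += 0x7fffu + lsb;            /* ties go to the even result */
    return (uint16_t)(u >> 16);
}

/* Widening BF16 -> single is exact: the low 16 bits are zero. */
static float bf16_to_f32(uint16_t h)
{
    return bits_f32((uint32_t)h << 16);
}

/* One 32-bit lane of the dot product: each source lane holds a pair of
 * BF16 values, and both products are accumulated into the single. */
static float dp_bf16_lane(float acc, const uint16_t a[2], const uint16_t b[2])
{
    acc += bf16_to_f32(a[1]) * bf16_to_f32(b[1]);
    acc += bf16_to_f32(a[0]) * bf16_to_f32(b[0]);
    return acc;
}
```

In this model VCVTNE2PS2BF16 is the same per-lane conversion applied to two source vectors, with the results packed into one destination.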
List of places to update
From https://github.com/DynamoRIO/dynamorio/blob/master/core/ir/x86/opcode_api.h#L53
* When adding new instructions, be sure to update all of these places:
* 1) decode_table op_instr array
* 2) decode_table decoding table entries
* 3) OP_ enum (here) via x86opnums.pl
* 4) update OP_LAST at end of enum here
* 5) decode_fast tables if necessary (they are conservative)
* 6) instr_create macros
* 7) suite/tests/api/ir* tests
* 8) add binutils tests in third_party/binutils/test_decenc
Step 1: update op_instr array
Added entries to op_instr. These point directly to evex_Wb_extensions since these instructions only have an EVEX encoding.
Step 2: add decode_table entries
- updated third_byte_38 table to point to prefix_extensions since these instructions have common opcodes and differ in prefix.
  - The instructions VCVTNEPS2BF16 and VCVTNE2PS2BF16 have three-byte opcodes starting with 0f 38, so the decoder looks at third_byte_38[third_byte_38_index[opcode]].
  - Since these instructions have the same opcode (72) and differ only in the prefix (f2/f3), we need to point the third_byte_38 entry to prefix_extensions, which in turn points to the appropriate EVEX_Wb entries.
  - The instruction VDPBF16PS has the same opcode (52) as the VNNI instruction vpdpwssd and they differ only in the prefix (F3/66). We need to update that entry to point to prefix_extensions instead of e_vex_extensions. This causes the e_vex_extensions entry (e_vex ext 151) to be orphaned - do we remove this entry?
- added entries in prefix_extensions to point to appropriate vex/evex entries
- added leaf entries in evex_Wb_extensions
Updated opcodes for invalid entries in e_vex ext 151 and 152 for consistency.
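To illustrate the prefix-based dispatch described above, here is a toy lookup keyed on the third opcode byte and the mandatory prefix. This is only a sketch: the types, table, and function names are hypothetical and much simpler than the real decode_table structures, and any opcode/prefix pair not discussed in this PR is reported as "not modeled" rather than invalid, since other extensions use some of those slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefix selector; not DynamoRIO's real types. */
typedef enum { PFX_NONE, PFX_66, PFX_F3, PFX_F2, PFX_COUNT } toy_prefix_t;

typedef enum {
    TOY_NOT_MODELED = 0, /* slot not covered by this sketch */
    TOY_VPDPWSSD,
    TOY_VDPBF16PS,
    TOY_VCVTNEPS2BF16,
    TOY_VCVTNE2PS2BF16
} toy_op_t;

typedef struct {
    uint8_t opcode;                /* third opcode byte, after 0f 38 */
    toy_op_t by_prefix[PFX_COUNT]; /* indexed by toy_prefix_t */
} toy_prefix_ext_t;

/* Column order: none, 66, F3, F2. */
static const toy_prefix_ext_t toy_prefix_ext[] = {
    { 0x52, { TOY_NOT_MODELED, TOY_VPDPWSSD, TOY_VDPBF16PS, TOY_NOT_MODELED } },
    { 0x72, { TOY_NOT_MODELED, TOY_NOT_MODELED, TOY_VCVTNEPS2BF16,
              TOY_VCVTNE2PS2BF16 } },
};

/* Same opcode byte, different prefix: the prefix picks the instruction. */
static toy_op_t toy_lookup_0f38(uint8_t opcode, toy_prefix_t pfx)
{
    assert(pfx < PFX_COUNT);
    for (size_t i = 0; i < sizeof toy_prefix_ext / sizeof toy_prefix_ext[0]; i++) {
        if (toy_prefix_ext[i].opcode == opcode)
            return toy_prefix_ext[i].by_prefix[pfx];
    }
    return TOY_NOT_MODELED;
}
```

In the real tables there is one more level of indirection after the prefix, since each prefix entry leads to the evex_Wb_extensions leaves.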
Step 3: add OP_ enums
Done
Step 4: update OP_LAST
Not needed since OP_LAST already points to the last enum.
Step 5: decode_fast tables if necessary
Not done
Step 6: instr_create macros
Added 1dst_3src macros for VCVTNE2PS2BF16 and VDPBF16PS since they write to operand 1 and read from mask register, operand 2, and operand 3.
Added 1dst_2src macro for VCVTNEPS2BF16 since it writes to operand 1 and reads from mask register and operand 2. We are setting the destination size explicitly since this writes to "half" the destination.
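The "half" destination follows from the element widths: each 32-bit single becomes one 16-bit BF16 value. A small sketch of that arithmetic, using a hypothetical helper name that is not part of the DynamoRIO API:

```c
#include <assert.h>

/* Bytes written by a packed single -> BF16 conversion for a given source
 * vector length in bytes (16, 32, or 64). Hypothetical helper for illustration. */
static unsigned cvtneps2bf16_dst_bytes(unsigned src_bytes)
{
    assert(src_bytes == 16 || src_bytes == 32 || src_bytes == 64);
    unsigned lanes = src_bytes / 4u; /* number of 32-bit singles in the source */
    return lanes * 2u;               /* one 16-bit BF16 per single */
}
```

This matches the operand forms listed earlier: a zmm source pairs with a ymm destination, a ymm source with an xmm destination, and the 128-bit form produces only 8 bytes of data in its xmm destination.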
Step 7: suite/tests/api/ir tests
Added tests in ir_x86_3args_avx512_evex_mask.h and ir_x86_4args_avx512_evex_mask_C.h.
The VCVTNEPS2BF16 test is currently commented out because the destination size needs to be set explicitly.
Step 8: binutils tests
Added binutils tests that encode the instructions using the instr_create_.. APIs and compare the result against the expected opcode bytes. We do this rather than decoding the bytes and comparing disassembly because our disassembly does not match the binutils disassembly exactly.
These currently have two workarounds:
- set dest size explicitly
- set zeroing prefix explicitly
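For anyone checking the expected bytes in these tests by hand, here is a rough sketch of how the EVEX bytes are laid out for the register-only, 0F38-map, W0 forms used here. It is not DynamoRIO's encoder: the function and enum names are hypothetical, and it does not handle memory operands, broadcast, or disp8 compression. Register numbers are in Intel operand order (destination first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PP_NONE = 0, PP_66 = 1, PP_F3 = 2, PP_F2 = 3 }; /* EVEX.pp values */
enum { VL_128 = 0, VL_256 = 1, VL_512 = 2 };           /* EVEX.L'L values */

/* Encode "op dst{mask}{z}, src1, src2" with register operands only.
 * Returns the number of bytes written (always 6 for this form). */
static size_t evex_0f38_w0_rrr(uint8_t out[6], unsigned pp, uint8_t opcode,
                               unsigned vl, unsigned dst, unsigned src1,
                               unsigned src2, unsigned mask, int zeroing)
{
    assert(pp < 4 && vl < 3 && mask < 8);
    assert(dst < 32 && src1 < 32 && src2 < 32);
    /* Register-extension bits are stored inverted. */
    unsigned R = ~(dst >> 3) & 1u;   /* bit 3 of ModRM.reg */
    unsigned Rp = ~(dst >> 4) & 1u;  /* bit 4 of ModRM.reg */
    unsigned B = ~(src2 >> 3) & 1u;  /* bit 3 of ModRM.rm */
    unsigned X = ~(src2 >> 4) & 1u;  /* bit 4 of ModRM.rm (register form) */
    unsigned vvvv = ~src1 & 0xfu;    /* low 4 bits of src1 */
    unsigned Vp = ~(src1 >> 4) & 1u; /* bit 4 of src1 */

    out[0] = 0x62;
    out[1] = (uint8_t)(R << 7 | X << 6 | B << 5 | Rp << 4 | 0x2u); /* map 0F38 */
    out[2] = (uint8_t)(vvvv << 3 | 1u << 2 | pp);                  /* W = 0 */
    out[3] = (uint8_t)((zeroing ? 1u : 0u) << 7 | vl << 5 | Vp << 3 | mask);
    out[4] = opcode;
    out[5] = (uint8_t)(0xc0u | (dst & 7u) << 3 | (src2 & 7u)); /* ModRM, mod = 11 */
    return 6;
}
```

For example, under these assumptions VCVTNE2PS2BF16 zmm30, zmm29, zmm28 (PP_F2, opcode 0x72, VL_512, no mask) comes out as 62 02 17 40 72 f4. The zeroing flag lands in the top bit of the fourth byte, which is the bit the second workaround has to set explicitly.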