pmxvbf16ger2

Prefixed Masked VSX Vector bfloat16 GER (Rank-2 Update)

pmxvbf16ger2 AT, XA, XB, XMSK, YMSK, PMSK

Matrix-Multiply Assist (MMA) instruction. Computes ACC ← A × B from bfloat16 inputs under row, column, and product masks. The previous accumulator contents are overwritten, not added in.

Details

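Prefixed MMA instruction that performs a masked rank-2 outer-product operation on bfloat16 inputs. XA and XB each hold eight bfloat16 values, read as four pairs. Result element (i, j) is the sum of the two products formed from pair i of XA and pair j of XB, rounded to single precision, so ACC[AT] receives a 4×4 matrix of single-precision values. XMSK (4 bits) enables rows of the result, YMSK (4 bits) enables columns, and PMSK (2 bits) enables each of the two products. Elements in a disabled row or column are set to zero. This form does not add to the previous accumulator contents; the accumulating forms are pmxvbf16ger2pp, pmxvbf16ger2pn, pmxvbf16ger2np, and pmxvbf16ger2nn. Requires MMA and VSX support. The Condition Register is not modified, and no saturation occurs because the results are floating-point.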

Pseudocode Operation

for i in 0..3 do
  for j in 0..3 do
    if XMSK[i] = 1 ∧ YMSK[j] = 1 then
      sum ← 0
      for k in 0..1 do
        if PMSK[k] = 1 then
          sum ← sum + (BF16(XA[2i+k]) × BF16(XB[2j+k]))
      acc[i,j] ← ROUND_TO_FP32(sum)
    else
      acc[i,j] ← 0
ACC[AT] ← acc
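
The masked rank-2 operation can be sketched as a small software reference model. This is an illustrative sketch, not the architected definition: the function names are hypothetical, mask bit 0 is assumed to be the most significant bit (Power bit numbering), the sum is rounded to single precision once at the end, and NaN and exception behavior is not modeled.

```python
import math
import struct

def bf16_to_float(bits):
    """Widen a 16-bit bfloat16 pattern to a Python float (exact)."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]

def to_f32(x):
    """Round a Python float to single precision (overflow becomes infinity)."""
    try:
        return struct.unpack(">f", struct.pack(">f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def pmxvbf16ger2_model(xa, xb, xmsk, ymsk, pmsk):
    """Hypothetical model: xa and xb are lists of 8 bfloat16 bit patterns.

    Returns the 4x4 result as a list of rows of Python floats.
    """
    acc = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            # Assumption: mask bit i is counted from the most significant bit.
            if (xmsk >> (3 - i)) & 1 and (ymsk >> (3 - j)) & 1:
                total = 0.0
                for k in range(2):
                    if (pmsk >> (1 - k)) & 1:
                        total += bf16_to_float(xa[2 * i + k]) * bf16_to_float(xb[2 * j + k])
                acc[i][j] = to_f32(total)
    return acc

# Every pair of XA is (1.0, 2.0); every pair of XB is (3.0, 0.5).
xa = [0x3F80, 0x4000] * 4
xb = [0x4040, 0x3F00] * 4
full = pmxvbf16ger2_model(xa, xb, 0b1111, 0b1111, 0b11)
```

With all masks set, every element of `full` is 1.0 × 3.0 + 2.0 × 0.5 = 4.0. Clearing a PMSK bit drops the corresponding product, and clearing an XMSK or YMSK bit zeroes the whole row or column.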

Programming Note

The pmxvbf16ger2 instruction is useful for matrix operations on bfloat16 data where only part of a 4×4 tile is valid, for example at the edges of a matrix whose dimensions are not multiples of the tile size. The masks are immediate fields of the instruction, so they must be known when the code is assembled. With all mask bits set, the result is the same as that of the unmasked xvbf16ger2. The results are single-precision floating-point values, so there is no integer saturation; overflow and other exceptional cases follow the floating-point rules. The instruction is available at the user privilege level.
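
As an illustration of mask selection, the sketch below derives the three immediates for a partial tile. The helper name `edge_masks` and its parameters are hypothetical, and it assumes that mask bit 0 (the most significant bit) corresponds to row 0, column 0, or product 0.

```python
def edge_masks(rows, cols, depth):
    """Return (XMSK, YMSK, PMSK) enabling the first `rows` rows, the first
    `cols` columns, and the first `depth` products of a 4x4 rank-2 tile.

    Hypothetical helper; assumes bit 0 of each mask is the most significant bit.
    """
    if not (0 <= rows <= 4 and 0 <= cols <= 4 and 0 <= depth <= 2):
        raise ValueError("rows and cols must be 0..4, depth must be 0..2")
    xmsk = (0xF << (4 - rows)) & 0xF
    ymsk = (0xF << (4 - cols)) & 0xF
    pmsk = (0x3 << (2 - depth)) & 0x3
    return xmsk, ymsk, pmsk

# A tile with 3 valid rows, 2 valid columns, and 1 valid inner-dimension element.
masks = edge_masks(3, 2, 1)
```

Here `masks` is `(0b1110, 0b1100, 0b10)`, and a full tile, `edge_masks(4, 4, 2)`, gives `(15, 15, 3)`.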

Example

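pmxvbf16ger2 0, 4, 5, 15, 15, 3

// All rows, columns, and products enabled: ACC[0] ← vs4 × vs5, the same result as xvbf16ger2 0, 4, 5.
// vs4 and vs5 are used because ACC[0] overlaps vs0-vs3, which therefore cannot be source operands here.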

Encoding

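Binary Layout

Prefix word: primary opcode 1 (bits 0:5), type 3 (bits 6:7), 9 (bits 8:11), PMSK (bits 16:17), XMSK (bits 24:27), YMSK (bits 28:31)

Suffix word: primary opcode 59 (bits 0:5), AT (bits 6:8), A (bits 11:15), B (bits 16:20), XO 51 (bits 21:28), AX (bit 29), BX (bit 30)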

Format MMIRR:XX3-form

Opcode Prefix 0x07900000, suffix 0xEC000198 (primary opcode 59, XO 51)

Extension Prefixed
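
The two instruction words can be assembled by hand as a cross-check on the encoding. This is a minimal sketch assuming the MMIRR:XX3 field positions of Power ISA v3.1 (bit 0 is the most significant bit of each word); the function name is hypothetical, and it does not check whether XA or XB overlaps the accumulator.

```python
def encode_pmxvbf16ger2(at, xa, xb, xmsk, ymsk, pmsk):
    """Return (prefix, suffix) as 32-bit integers.

    Hypothetical helper; field positions are assumed from the MMIRR:XX3 layout.
    """
    if not (0 <= at <= 7 and 0 <= xa <= 63 and 0 <= xb <= 63):
        raise ValueError("AT must be 0..7, XA and XB must be 0..63")
    if not (0 <= xmsk <= 15 and 0 <= ymsk <= 15 and 0 <= pmsk <= 3):
        raise ValueError("XMSK and YMSK must be 0..15, PMSK must be 0..3")
    prefix = (1 << 26) | (3 << 24) | (9 << 20)      # opcode 1, type 3, sub-opcode 9
    prefix |= (pmsk << 14) | (xmsk << 4) | ymsk     # bits 16:17, 24:27, 28:31
    suffix = (59 << 26) | (51 << 3)                 # primary opcode 59, XO 51
    suffix |= (at << 23)                            # AT, bits 6:8
    suffix |= ((xa & 31) << 16) | ((xa >> 5) << 2)  # A (bits 11:15) and AX (bit 29)
    suffix |= ((xb & 31) << 11) | ((xb >> 5) << 1)  # B (bits 16:20) and BX (bit 30)
    return prefix, suffix

prefix, suffix = encode_pmxvbf16ger2(0, 4, 5, 15, 15, 3)
words = f"{prefix:08X} {suffix:08X}"
```

For `pmxvbf16ger2 0, 4, 5, 15, 15, 3` this yields `0790C0FF EC042998`. Comparing the output against an assembler's listing is a quick way to confirm the assumed field positions.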

Operands

AT
Accumulator (0-7)
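XA
Source VSR (0-63) holding eight bfloat16 values (matrix A, four pairs)
XB
Source VSR (0-63) holding eight bfloat16 values (matrix B, four pairs)
XMSK
4-bit immediate (0-15); enables rows of the result
YMSK
4-bit immediate (0-15); enables columns of the result
PMSK
2-bit immediate (0-3); enables each of the two products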