| Category | Instruction | Description |
|---|---|---|
| Synchronization | TSYNC | Synchronize PTO execution (wait on events or insert a per-op pipeline barrier). |
| Manual / Resource Binding | TASSIGN | Bind a Tile object to an implementation-defined on-chip address (manual placement). |
| Manual / Resource Binding | TSETFMATRIX | Set FMATRIX register(s) for IMG2COL-like ops. |
| Manual / Resource Binding | TSET_IMG2COL_RPT | Set IMG2COL repeat metadata from an IMG2COL configuration tile. |
| Manual / Resource Binding | TSET_IMG2COL_PADDING | Set IMG2COL padding metadata from an IMG2COL configuration tile. |
| Elementwise (Tile-Tile) | TADD | Elementwise addition of two tiles. |
| Elementwise (Tile-Tile) | TABS | Elementwise absolute value of a tile. |
| Elementwise (Tile-Tile) | TAND | Elementwise bitwise AND of two tiles. |
| Elementwise (Tile-Tile) | TOR | Elementwise bitwise OR of two tiles. |
| Elementwise (Tile-Tile) | TSUB | Elementwise subtraction of two tiles. |
| Elementwise (Tile-Tile) | TMUL | Elementwise multiplication of two tiles. |
| Elementwise (Tile-Tile) | TMIN | Elementwise minimum of two tiles. |
| Elementwise (Tile-Tile) | TMAX | Elementwise maximum of two tiles. |
| Elementwise (Tile-Tile) | TCMP | Compare two tiles elementwise and write a packed predicate mask. |
| Elementwise (Tile-Tile) | TDIV | Elementwise division of two tiles. |
| Elementwise (Tile-Tile) | TSHL | Elementwise shift-left of two tiles. |
| Elementwise (Tile-Tile) | TSHR | Elementwise shift-right of two tiles. |
| Elementwise (Tile-Tile) | TXOR | Elementwise bitwise XOR of two tiles. |
| Elementwise (Tile-Tile) | TLOG | Elementwise natural logarithm of a tile. |
| Elementwise (Tile-Tile) | TRECIP | Elementwise reciprocal of a tile. |
| Elementwise (Tile-Tile) | TPRELU | Elementwise PReLU (parametric ReLU) with a per-element slope tile. |
| Elementwise (Tile-Tile) | TADDC | Elementwise ternary add: src0 + src1 + src2. |
| Elementwise (Tile-Tile) | TSUBC | Elementwise ternary op: src0 - src1 + src2. |
| Elementwise (Tile-Tile) | TCVT | Elementwise type conversion with a specified rounding mode. |
| Elementwise (Tile-Tile) | TSEL | Select between two tiles using a mask tile (per-element selection). |
| Elementwise (Tile-Tile) | TRSQRT | Elementwise reciprocal square root. |
| Elementwise (Tile-Tile) | TSQRT | Elementwise square root. |
| Elementwise (Tile-Tile) | TEXP | Elementwise exponential. |
| Elementwise (Tile-Tile) | TNOT | Elementwise bitwise NOT of a tile. |
| Elementwise (Tile-Tile) | TRELU | Elementwise ReLU of a tile. |
| Elementwise (Tile-Tile) | TNEG | Elementwise negation of a tile. |
| Elementwise (Tile-Tile) | TREM | Elementwise remainder of two tiles. |
| Elementwise (Tile-Tile) | TFMOD | Elementwise fmod of two tiles. |
| Tile-Scalar / Tile-Immediate | TEXPANDS | Broadcast a scalar into a destination tile. |
| Tile-Scalar / Tile-Immediate | TCMPS | Compare a tile against a scalar and write per-element comparison results. |
| Tile-Scalar / Tile-Immediate | TSELS | Select per element between a source tile and a scalar using a mask tile. |
| Tile-Scalar / Tile-Immediate | TMINS | Elementwise minimum of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TADDS | Add a scalar to each element of a tile. |
| Tile-Scalar / Tile-Immediate | TSUBS | Subtract a scalar from each element of a tile. |
| Tile-Scalar / Tile-Immediate | TDIVS | Elementwise division with a scalar (tile / scalar or scalar / tile). |
| Tile-Scalar / Tile-Immediate | TMULS | Multiply each element of a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TFMODS | Elementwise remainder with a scalar: fmod(src, scalar). |
| Tile-Scalar / Tile-Immediate | TREMS | Elementwise remainder with a scalar: remainder(src, scalar). |
| Tile-Scalar / Tile-Immediate | TMAXS | Elementwise maximum of a tile and a scalar: max(src, scalar). |
| Tile-Scalar / Tile-Immediate | TANDS | Elementwise bitwise AND of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TORS | Elementwise bitwise OR of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TSHLS | Shift each element of a tile left by a scalar. |
| Tile-Scalar / Tile-Immediate | TSHRS | Shift each element of a tile right by a scalar. |
| Tile-Scalar / Tile-Immediate | TXORS | Elementwise bitwise XOR of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TLRELU | Elementwise Leaky ReLU with a scalar slope. |
| Tile-Scalar / Tile-Immediate | TADDSC | Elementwise fused add of a scalar and a second tile: src0 + scalar + src1. |
| Tile-Scalar / Tile-Immediate | TSUBSC | Elementwise fused op: src0 - scalar + src1. |
| Axis Reduce / Expand | TROWSUM | Reduce each row by summing across columns. |
| Axis Reduce / Expand | TROWPROD | Reduce each row by multiplying across columns. |
| Axis Reduce / Expand | TCOLSUM | Reduce each column by summing across rows. |
| Axis Reduce / Expand | TCOLPROD | Reduce each column by multiplying across rows. |
| Axis Reduce / Expand | TCOLMAX | Reduce each column by taking the maximum across rows. |
| Axis Reduce / Expand | TROWMAX | Reduce each row by taking the maximum across columns. |
| Axis Reduce / Expand | TROWMIN | Reduce each row by taking the minimum across columns. |
| Axis Reduce / Expand | TROWEXPAND | Broadcast the first element of each source row across the destination row. |
| Axis Reduce / Expand | TROWEXPANDDIV | Row-wise broadcast divide: divide each row of src0 by a per-row scalar vector src1. |
| Axis Reduce / Expand | TROWEXPANDMUL | Row-wise broadcast multiply: multiply each row of src0 by a per-row scalar vector src1. |
| Axis Reduce / Expand | TROWEXPANDSUB | Row-wise broadcast subtract: subtract a per-row scalar vector src1 from each row of src0. |
| Axis Reduce / Expand | TROWEXPANDADD | Row-wise broadcast add: add a per-row scalar vector src1 to each row of src0. |
| Axis Reduce / Expand | TROWEXPANDMAX | Row-wise broadcast max with a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDMIN | Row-wise broadcast min with a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDEXPDIF | Row-wise exp-diff: compute exp(src0 - src1) with per-row scalars. |
| Axis Reduce / Expand | TCOLMIN | Reduce each column by taking the minimum across rows. |
| Axis Reduce / Expand | TCOLEXPAND | Broadcast the first element of each source column across the destination column. |
| Axis Reduce / Expand | TCOLEXPANDDIV | Column-wise broadcast divide: divide each column by a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMUL | Column-wise broadcast multiply: multiply each column by a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDADD | Column-wise broadcast add with a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMAX | Column-wise broadcast max with a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMIN | Column-wise broadcast min with a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDSUB | Column-wise broadcast subtract: subtract a per-column scalar vector from each column. |
| Axis Reduce / Expand | TCOLEXPANDEXPDIF | Column-wise exp-diff: compute exp(src0 - src1) with per-column scalars. |
| Memory (GM <-> Tile) | TLOAD | Load data from a GlobalTensor (GM) into a Tile. |
| Memory (GM <-> Tile) | TPREFETCH | Prefetch data from global memory into a tile-local cache/buffer (hint). |
| Memory (GM <-> Tile) | TSTORE | Store data from a Tile into a GlobalTensor (GM), optionally using atomic writes or quantization parameters. |
| Memory (GM <-> Tile) | TSTORE_FP | Store an accumulator tile into global memory, using a scaling (fp) tile for vector quantization parameters. |
| Memory (GM <-> Tile) | MGATHER | Gather-load elements from global memory into a tile using per-element indices. |
| Memory (GM <-> Tile) | MSCATTER | Scatter-store elements from a tile into global memory using per-element indices. |
| Matrix Multiply | TGEMV_MX | GEMV with additional scaling tiles for mixed-precision / quantized matrix-vector compute. |
| Matrix Multiply | TMATMUL_MX | Matrix multiply (GEMM) with additional scaling tiles for mixed-precision / quantized matmul on supported targets. |
| Matrix Multiply | TMATMUL | Matrix multiply (GEMM) producing an accumulator/output tile. |
| Matrix Multiply | TMATMUL_ACC | Matrix multiply with accumulator input (fused accumulate). |
| Matrix Multiply | TMATMUL_BIAS | Matrix multiply with bias add. |
| Matrix Multiply | TGEMV | General matrix-vector multiply (GEMV) producing an accumulator/output tile. |
| Matrix Multiply | TGEMV_ACC | GEMV with explicit accumulator input/output tiles. |
| Matrix Multiply | TGEMV_BIAS | GEMV with bias add. |
| Data Movement / Layout | TEXTRACT | Extract a sub-tile from a source tile. |
| Data Movement / Layout | TEXTRACT_FP | Extract with an fp/scaling tile (vector quantization parameters). |
| Data Movement / Layout | TIMG2COL | Image-to-column transform for convolution-like workloads. |
| Data Movement / Layout | TINSERT | Insert a sub-tile into a destination tile at an (indexRow, indexCol) offset. |
| Data Movement / Layout | TINSERT_FP | Insert with an fp/scaling tile (vector quantization parameters). |
| Data Movement / Layout | TFILLPAD | Copy a tile and fill the elements outside its valid region with a compile-time pad value. |
| Data Movement / Layout | TFILLPAD_INPLACE | In-place fill/pad variant. |
| Data Movement / Layout | TFILLPAD_EXPAND | Fill/pad while allowing dst to be larger than src. |
| Data Movement / Layout | TMOV | Move/copy between tiles, optionally applying implementation-defined conversion modes. |
| Data Movement / Layout | TMOV_FP | Move/convert from an accumulator tile into a destination tile, using a scaling (fp) tile for vector quantization parameters. |
| Data Movement / Layout | TRESHAPE | Reinterpret a tile as another tile type/shape while preserving the underlying bytes. |
| Data Movement / Layout | TTRANS | Transpose using an implementation-defined temporary tile. |
| Complex | TPRINT | Debug/print elements from a tile (implementation-defined). |
| Complex | TMRGSORT | Merge sort for multiple sorted lists (implementation-defined element format and layout). |
| Complex | TSORT32 | Sort a fixed-size 32-element block and produce an index mapping. |
| Complex | TGATHER | Gather/select elements using either an index tile or a compile-time mask pattern. |
| Complex | TCI | Generate a contiguous integer sequence into a destination tile. |
| Complex | TTRI | Generate a triangular (lower/upper) mask tile. |
| Complex | TPARTADD | Partial elementwise add with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMUL | Partial elementwise multiply with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMAX | Partial elementwise max with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMIN | Partial elementwise min with implementation-defined handling of mismatched valid regions. |
| Complex | TGATHERB | Gather elements using byte offsets. |
| Complex | TSCATTER | Scatter rows of a source tile into a destination tile using per-element row indices. |
| Complex | TQUANT | Quantize a tile (e.g., FP32 to FP8), producing exponent/scaling/max outputs. |
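
The Axis Reduce / Expand entries are easiest to read in combination. The following Python sketch models their values only, under stated assumptions: the lowercase functions are hypothetical stand-ins for TROWMAX, TROWSUM, TCOLSUM, TROWEXPANDDIV and TROWEXPANDEXPDIF (not the PTO API), a tile is a fully valid row-major list of rows, and a "per-row scalar vector" is modelled as a rows x 1 tile. Dtypes, valid regions, layouts and on-chip placement are ignored. The row softmax is one illustrative way to chain these ops, not a pattern taken from this document.

```python
import math

# Hypothetical value-level models; a tile is a row-major list of rows.
# Per-row results are modelled as rows x 1 tiles, per-column results as 1 x cols.

def trowmax(src):
    """Stand-in for TROWMAX: one maximum per row (rows x 1 tile)."""
    return [[max(row)] for row in src]

def trowsum(src):
    """Stand-in for TROWSUM: one sum per row (rows x 1 tile)."""
    return [[sum(row)] for row in src]

def tcolsum(src):
    """Stand-in for TCOLSUM: one sum per column (1 x cols tile)."""
    return [[sum(col) for col in zip(*src)]]

def trowexpanddiv(src0, src1):
    """Stand-in for TROWEXPANDDIV: divide row i of src0 by the scalar src1[i][0]."""
    return [[x / s[0] for x in row] for row, s in zip(src0, src1)]

def trowexpandexpdif(src0, src1):
    """Stand-in for TROWEXPANDEXPDIF: exp(src0[i][j] - src1[i][0]) per element."""
    return [[math.exp(x - s[0]) for x in row] for row, s in zip(src0, src1)]

def row_softmax(x):
    """Row-wise softmax composed only from the stand-ins above."""
    row_max = trowmax(x)                 # per-row maximum
    e = trowexpandexpdif(x, row_max)     # exp(x - max): every value lies in (0, 1]
    return trowexpanddiv(e, trowsum(e))  # scale each row so it sums to 1

x = [[1.0, 2.0, 3.0],
     [0.0, 0.0, 0.0]]
print([[round(v, 4) for v in row] for row in row_softmax(x)])
print(tcolsum(x))
```

Subtracting the row maximum first keeps every exp argument at or below zero, which is the usual reason a stable softmax takes a row max before exponentiating. The rows x 1 versus 1 x cols result shapes mirror the row/column split in the table.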