PTO ISA Reference
This directory contains the per-instruction reference for the PTO Tile Lib ISA.
- Source of truth (C++ intrinsics): include/pto/common/pto_instr.hpp
- Common conventions (operands, events, modifiers)
Synchronization
- TSYNC - Synchronize PTO execution (wait on events or insert a per-op pipeline barrier).
Manual / Resource Binding
- TASSIGN - Bind a Tile object to an implementation-defined on-chip address (manual placement).
- TSETFMATRIX - Set FMATRIX register(s) for IMG2COL-like ops.
- TSET_IMG2COL_RPT - Set IMG2COL repeat metadata from an IMG2COL configuration tile.
- TSET_IMG2COL_PADDING - Set IMG2COL padding metadata from an IMG2COL configuration tile.
Elementwise (Tile-Tile)
- TADD - Elementwise add of two tiles.
- TABS - Elementwise absolute value of a tile.
- TAND - Elementwise bitwise AND of two tiles.
- TOR - Elementwise bitwise OR of two tiles.
- TSUB - Elementwise subtract of two tiles.
- TMUL - Elementwise multiply of two tiles.
- TMIN - Elementwise minimum of two tiles.
- TMAX - Elementwise maximum of two tiles.
- TCMP - Compare two tiles and write a packed predicate mask.
- TDIV - Elementwise division of two tiles.
- TSHL - Elementwise shift-left of two tiles.
- TSHR - Elementwise shift-right of two tiles.
- TXOR - Elementwise bitwise XOR of two tiles.
- TLOG - Elementwise natural logarithm of a tile.
- TRECIP - Elementwise reciprocal of a tile.
- TPRELU - Elementwise PReLU (parametric ReLU) with a per-element slope tile.
- TADDC - Elementwise ternary add: src0 + src1 + src2.
- TSUBC - Elementwise ternary op: src0 - src1 + src2.
- TCVT - Elementwise type conversion with a specified rounding mode.
- TSEL - Select between two tiles using a mask tile (per-element selection).
- TRSQRT - Elementwise reciprocal square root.
- TSQRT - Elementwise square root.
- TEXP - Elementwise exponential.
- TNOT - Elementwise bitwise NOT of a tile.
- TRELU - Elementwise ReLU of a tile.
- TNEG - Elementwise negation of a tile.
- TREM - Elementwise remainder of two tiles.
- TFMOD - Elementwise fmod of two tiles.
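To make the per-element semantics above concrete, here is a minimal host-side sketch in C++. It assumes a tile can be modeled as a dense row-major float buffer; the HostTile struct and the ref_* functions are hypothetical names for illustration only, not the intrinsics in pto_instr.hpp, and they ignore layouts, valid regions, and dtypes. The TSEL sketch also assumes a true mask bit selects src0, and uses one bool per element rather than the packed predicate mask that TCMP writes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side stand-in for a tile: a dense row-major float buffer.
// It is not the PTO Tile type and ignores layouts, valid regions, and dtypes.
struct HostTile {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // rows * cols elements, row-major
    HostTile(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}
};

// Apply a binary function element by element: dst[i] = op(a[i], b[i]).
HostTile elementwise(const HostTile& a, const HostTile& b,
                     const std::function<float(float, float)>& op) {
    assert(a.rows == b.rows && a.cols == b.cols);
    HostTile dst(a.rows, a.cols);
    for (std::size_t i = 0; i < a.data.size(); ++i) {
        dst.data[i] = op(a.data[i], b.data[i]);
    }
    return dst;
}

// TADD-like: dst = src0 + src1.
HostTile ref_tadd(const HostTile& src0, const HostTile& src1) {
    return elementwise(src0, src1, [](float x, float y) { return x + y; });
}

// TSUBC-like: dst = src0 - src1 + src2.
HostTile ref_tsubc(const HostTile& src0, const HostTile& src1,
                   const HostTile& src2) {
    HostTile diff = elementwise(src0, src1, [](float x, float y) { return x - y; });
    return elementwise(diff, src2, [](float x, float y) { return x + y; });
}

// TPRELU-like: dst = x if x > 0, else slope * x, with one slope per element.
HostTile ref_tprelu(const HostTile& src, const HostTile& slope) {
    return elementwise(src, slope,
                       [](float x, float a) { return x > 0.0f ? x : a * x; });
}

// TSEL-like: dst = mask ? src0 : src1 (assumed polarity). The mask is one bool
// per element here, not the packed predicate format that TCMP writes.
HostTile ref_tsel(const std::vector<bool>& mask, const HostTile& src0,
                  const HostTile& src1) {
    assert(mask.size() == src0.data.size() && src0.data.size() == src1.data.size());
    HostTile dst(src0.rows, src0.cols);
    for (std::size_t i = 0; i < mask.size(); ++i) {
        dst.data[i] = mask[i] ? src0.data[i] : src1.data[i];
    }
    return dst;
}
```

Routing every op through one elementwise helper mirrors how this group differs only in the scalar function applied per element.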
Tile-Scalar / Tile-Immediate
- TEXPANDS - Broadcast a scalar into a destination tile.
- TCMPS - Compare a tile against a scalar and write per-element comparison results.
- TSELS - Select one of two source tiles using a scalar selectMode (global select).
- TMINS - Elementwise minimum of a tile and a scalar.
- TADDS - Elementwise add a scalar to a tile.
- TSUBS - Elementwise subtract a scalar from a tile.
- TDIVS - Elementwise division with a scalar (tile/scalar or scalar/tile).
- TMULS - Elementwise multiply a tile by a scalar.
- TFMODS - Elementwise remainder with a scalar: fmod(src, scalar).
- TREMS - Elementwise remainder with a scalar: remainder(src, scalar).
- TMAXS - Elementwise max of a tile and a scalar: max(src, scalar).
- TANDS - Elementwise bitwise AND of a tile and a scalar.
- TORS - Elementwise bitwise OR of a tile and a scalar.
- TSHLS - Elementwise shift-left a tile by a scalar.
- TSHRS - Elementwise shift-right a tile by a scalar.
- TXORS - Elementwise bitwise XOR of a tile and a scalar.
- TLRELU - Leaky ReLU with a scalar slope.
- TADDSC - Elementwise fused add with scalar and a second tile: src0 + scalar + src1.
- TSUBSC - Elementwise fused op: src0 - scalar + src1.
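A minimal sketch of a few tile-scalar forms, assuming a tile is modeled as a flat row-major std::vector<float>. The Buf alias and ref_* names are hypothetical, and the scalar_is_divisor flag is an invented stand-in for however the real TDIVS chooses between tile/scalar and scalar/tile. TFMODS is modeled with std::fmod, whose result takes the sign of the dividend; TREMS is deliberately left out because the exact rounding of its remainder should be checked against pto_instr.hpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model: a tile is a flat row-major buffer of floats.
using Buf = std::vector<float>;

// TADDSC-like: dst = src0 + scalar + src1.
Buf ref_taddsc(const Buf& src0, float scalar, const Buf& src1) {
    assert(src0.size() == src1.size());
    Buf dst(src0.size());
    for (std::size_t i = 0; i < src0.size(); ++i) dst[i] = src0[i] + scalar + src1[i];
    return dst;
}

// TDIVS-like, covering both orders listed above. The bool flag is a
// hypothetical stand-in for the real operand-order selection.
Buf ref_tdivs(const Buf& src, float scalar, bool scalar_is_divisor) {
    Buf dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = scalar_is_divisor ? src[i] / scalar : scalar / src[i];
    }
    return dst;
}

// TFMODS-like: dst = fmod(src, scalar); the result has the sign of src.
Buf ref_tfmods(const Buf& src, float scalar) {
    Buf dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = std::fmod(src[i], scalar);
    return dst;
}

// TLRELU-like: dst = x if x > 0, else slope * x, with one scalar slope.
Buf ref_tlrelu(const Buf& src, float slope) {
    Buf dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i] > 0.0f ? src[i] : slope * src[i];
    }
    return dst;
}
```

For example, ref_tfmods applied to -7.0f with a scalar of 4.0f yields -3.0f, because fmod truncates the quotient toward zero.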
Axis Reduce / Expand
- TROWSUM - Reduce each row by summing across columns.
- TROWPROD - Reduce each row by multiplying across columns.
- TCOLSUM - Reduce each column by summing across rows.
- TCOLPROD - Reduce each column by multiplying across rows.
- TCOLMAX - Reduce each column by taking the maximum across rows.
- TROWMAX - Reduce each row by taking the maximum across columns.
- TROWMIN - Reduce each row by taking the minimum across columns.
- TROWEXPAND - Broadcast the first element of each source row across the destination row.
- TROWEXPANDDIV - Row-wise broadcast divide: divide each row of src0 by a per-row scalar vector src1.
- TROWEXPANDMUL - Row-wise broadcast multiply: multiply each row of src0 by a per-row scalar vector src1.
- TROWEXPANDSUB - Row-wise broadcast subtract: subtract a per-row scalar vector src1 from each row of src0.
- TROWEXPANDADD - Row-wise broadcast add: add a per-row scalar vector.
- TROWEXPANDMAX - Row-wise broadcast max with a per-row scalar vector.
- TROWEXPANDMIN - Row-wise broadcast min with a per-row scalar vector.
- TROWEXPANDEXPDIF - Row-wise exp-diff: compute exp(src0 - src1) with per-row scalars.
- TCOLMIN - Reduce each column by taking the minimum across rows.
- TCOLEXPAND - Broadcast the first element of each source column across the destination column.
- TCOLEXPANDDIV - Column-wise broadcast divide: divide each column by a per-column scalar vector.
- TCOLEXPANDMUL - Column-wise broadcast multiply: multiply each column by a per-column scalar vector.
- TCOLEXPANDADD - Column-wise broadcast add with per-column scalar vector.
- TCOLEXPANDMAX - Column-wise broadcast max with per-column scalar vector.
- TCOLEXPANDMIN - Column-wise broadcast min with per-column scalar vector.
- TCOLEXPANDSUB - Column-wise broadcast subtract: subtract a per-column scalar vector from each column.
- TCOLEXPANDEXPDIF - Column-wise exp-diff: compute exp(src0 - src1) with per-column scalars.
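The reduce and row-expand ops above can be sketched on a plain row-major matrix. This is a hypothetical model: the Mat struct and ref_* names are illustrative, and per-row or per-column results are returned as std::vector here, whereas PTO writes them into destination tiles with their own layout. The row_softmax helper shows one plausible way these ops could compose (row max, exp-diff, row sum, row divide) into a numerically stable softmax; it is an illustration, not a documented PTO recipe.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix standing in for a tile.
struct Mat {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> v;
    float at(std::size_t r, std::size_t c) const { return v[r * cols + c]; }
};

// TROWSUM-like: one sum per row.
std::vector<float> ref_trowsum(const Mat& m) {
    std::vector<float> out(m.rows, 0.0f);
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c) out[r] += m.at(r, c);
    return out;
}

// TROWMAX-like: one maximum per row (assumes cols > 0).
std::vector<float> ref_trowmax(const Mat& m) {
    std::vector<float> out(m.rows);
    for (std::size_t r = 0; r < m.rows; ++r) {
        out[r] = m.at(r, 0);
        for (std::size_t c = 1; c < m.cols; ++c) out[r] = std::max(out[r], m.at(r, c));
    }
    return out;
}

// TCOLMAX-like: one maximum per column (assumes rows > 0).
std::vector<float> ref_tcolmax(const Mat& m) {
    std::vector<float> out(m.cols);
    for (std::size_t c = 0; c < m.cols; ++c) {
        out[c] = m.at(0, c);
        for (std::size_t r = 1; r < m.rows; ++r) out[c] = std::max(out[c], m.at(r, c));
    }
    return out;
}

// TROWEXPANDEXPDIF-like: dst(r, c) = exp(src0(r, c) - src1[r]).
Mat ref_trowexpandexpdif(const Mat& src0, const std::vector<float>& src1) {
    assert(src1.size() == src0.rows);
    Mat dst{src0.rows, src0.cols, std::vector<float>(src0.v.size())};
    for (std::size_t r = 0; r < src0.rows; ++r)
        for (std::size_t c = 0; c < src0.cols; ++c)
            dst.v[r * src0.cols + c] = std::exp(src0.at(r, c) - src1[r]);
    return dst;
}

// TROWEXPANDDIV-like: dst(r, c) = src0(r, c) / src1[r].
Mat ref_trowexpanddiv(const Mat& src0, const std::vector<float>& src1) {
    assert(src1.size() == src0.rows);
    Mat dst{src0.rows, src0.cols, std::vector<float>(src0.v.size())};
    for (std::size_t r = 0; r < src0.rows; ++r)
        for (std::size_t c = 0; c < src0.cols; ++c)
            dst.v[r * src0.cols + c] = src0.at(r, c) / src1[r];
    return dst;
}

// One plausible composition: subtract the row max before exp for stability.
Mat row_softmax(const Mat& x) {
    Mat e = ref_trowexpandexpdif(x, ref_trowmax(x));
    return ref_trowexpanddiv(e, ref_trowsum(e));
}
```

The column-wise variants (TCOLSUM, TCOLEXPAND*) follow the same pattern with the loop roles of rows and columns swapped.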
Memory (GM <-> Tile)
- TLOAD - Load data from a GlobalTensor (GM) into a Tile.
- TPREFETCH - Prefetch data from global memory into a tile-local cache/buffer (hint).
- TSTORE - Store data from a Tile into a GlobalTensor (GM), optionally using atomic write or quantization parameters.
- TSTORE_FP - Store an accumulator tile into global memory using a scaling (fp) tile for vector quantization parameters.
- MGATHER - Gather-load elements from global memory into a tile using per-element indices.
- MSCATTER - Scatter-store elements from a tile into global memory using per-element indices.
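The data-movement semantics of TLOAD, MGATHER, and MSCATTER can be approximated on host buffers as follows. This sketch assumes global memory is a flat row-major std::vector<float> with a row stride ld; the ref_* names and parameters (ld, row0, col0) are hypothetical and do not reflect the GlobalTensor API. In ref_mscatter, a repeated index means the later element overwrites the earlier one, which real hardware need not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// TLOAD-like: copy a rows x cols window starting at (row0, col0) out of a
// row-major "global" buffer with row stride ld into a dense tile buffer.
std::vector<float> ref_tload(const std::vector<float>& gm, std::size_t ld,
                             std::size_t row0, std::size_t col0,
                             std::size_t rows, std::size_t cols) {
    std::vector<float> tile(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            tile[r * cols + c] = gm.at((row0 + r) * ld + col0 + c);  // bounds-checked
    return tile;
}

// MGATHER-like: dst[i] = gm[idx[i]].
std::vector<float> ref_mgather(const std::vector<float>& gm,
                               const std::vector<std::uint32_t>& idx) {
    std::vector<float> dst(idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i) dst[i] = gm.at(idx[i]);
    return dst;
}

// MSCATTER-like: gm[idx[i]] = src[i]; with duplicate indices the last write
// wins in this sketch only.
void ref_mscatter(std::vector<float>& gm, const std::vector<float>& src,
                  const std::vector<std::uint32_t>& idx) {
    assert(src.size() == idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i) gm.at(idx[i]) = src[i];
}
```

Using .at() makes an out-of-range index throw instead of silently reading or writing past the buffer, which is convenient when checking index tiles on the host.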
Matrix Multiply
- TGEMV_MX - GEMV with additional scaling tiles for mixed-precision / quantized matrix-vector compute.
- TMATMUL_MX - Matrix multiply (GEMM) with additional scaling tiles for mixed-precision / quantized matmul on supported targets.
- TMATMUL - Matrix multiply (GEMM) producing an accumulator/output tile.
- TMATMUL_ACC - Matrix multiply with accumulator input (fused accumulate).
- TMATMUL_BIAS - Matrix multiply with bias add.
- TGEMV - General matrix-vector multiplication (GEMV) producing an accumulator/output tile.
- TGEMV_ACC - GEMV with explicit accumulator input/output tiles.
- TGEMV_BIAS - GEMV with bias add.
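A reference model of the three GEMM variants helps pin down how they relate: TMATMUL starts from a zero accumulator, TMATMUL_ACC starts from a caller-supplied one, and TMATMUL_BIAS starts from a broadcast bias. This sketch assumes row-major float operands (A is M x K, B is K x N) and one bias value per output column; the ref_* names are hypothetical, and real accumulator dtypes, fractal layouts, and the _MX scaling tiles are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// TMATMUL_ACC-like: C = acc_in + A * B, all row-major.
std::vector<float> ref_tmatmul_acc(const std::vector<float>& acc_in,
                                   const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t M, std::size_t K, std::size_t N) {
    assert(a.size() == M * K && b.size() == K * N && acc_in.size() == M * N);
    std::vector<float> c = acc_in;
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < K; ++k) sum += a[m * K + k] * b[k * N + n];
            c[m * N + n] += sum;
        }
    }
    return c;
}

// TMATMUL-like: same product with a zero-initialized accumulator.
std::vector<float> ref_tmatmul(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t M, std::size_t K, std::size_t N) {
    return ref_tmatmul_acc(std::vector<float>(M * N, 0.0f), a, b, M, K, N);
}

// TMATMUL_BIAS-like: assumes one bias value per output column, broadcast
// down all M rows before accumulating A * B.
std::vector<float> ref_tmatmul_bias(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    const std::vector<float>& bias,
                                    std::size_t M, std::size_t K, std::size_t N) {
    assert(bias.size() == N);
    std::vector<float> init(M * N);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) init[m * N + n] = bias[n];
    return ref_tmatmul_acc(init, a, b, M, K, N);
}
```

Expressing the zero-init and bias variants through the accumulate form keeps a single inner loop, which makes it easy to compare device output for all three against one host reference.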
Data Movement / Layout
- TEXTRACT - Extract a sub-tile from a source tile.
- TEXTRACT_FP - Extract with fp/scaling tile (vector-quantization parameters).
- TIMG2COL - Image-to-column transform for convolution-like workloads.
- TINSERT - Insert a sub-tile into a destination tile at an (indexRow, indexCol) offset.
- TINSERT_FP - Insert with fp/scaling tile (vector-quantization parameters).
- TFILLPAD - Copy a tile and fill elements outside the valid region with a compile-time pad value.
- TFILLPAD_INPLACE - In-place fill/pad variant.
- TFILLPAD_EXPAND - Fill/pad while allowing dst to be larger than src.
- TMOV - Move/copy between tiles, optionally applying implementation-defined conversion modes.
- TMOV_FP - Move/convert from an accumulator tile into a destination tile, using a scaling (fp) tile for vector quantization parameters.
- TRESHAPE - Reinterpret a tile as another tile type/shape while preserving the underlying bytes.
- TTRANS - Transpose with an implementation-defined temporary tile.
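The offset and padding semantics of TEXTRACT, TINSERT, and TFILLPAD can be sketched on row-major host buffers. Assumptions in this sketch: the Tile2D struct and ref_* names are hypothetical; TEXTRACT is assumed to take the same (indexRow, indexCol) offset convention that TINSERT documents; the valid region is modeled as a top-left validRows x validCols rectangle; and the pad value is a runtime argument here, whereas the real TFILLPAD takes it at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major tile model.
struct Tile2D {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> v;
};

// TEXTRACT-like: copy an r x c window starting at (indexRow, indexCol).
Tile2D ref_textract(const Tile2D& src, std::size_t indexRow, std::size_t indexCol,
                    std::size_t r, std::size_t c) {
    assert(indexRow + r <= src.rows && indexCol + c <= src.cols);
    Tile2D dst{r, c, std::vector<float>(r * c)};
    for (std::size_t i = 0; i < r; ++i)
        for (std::size_t j = 0; j < c; ++j)
            dst.v[i * c + j] = src.v[(indexRow + i) * src.cols + indexCol + j];
    return dst;
}

// TINSERT-like: write src into dst with its top-left at (indexRow, indexCol).
void ref_tinsert(Tile2D& dst, const Tile2D& src, std::size_t indexRow,
                 std::size_t indexCol) {
    assert(indexRow + src.rows <= dst.rows && indexCol + src.cols <= dst.cols);
    for (std::size_t i = 0; i < src.rows; ++i)
        for (std::size_t j = 0; j < src.cols; ++j)
            dst.v[(indexRow + i) * dst.cols + indexCol + j] = src.v[i * src.cols + j];
}

// TFILLPAD-like: keep the valid top-left region, write pad everywhere else.
Tile2D ref_tfillpad(const Tile2D& src, std::size_t validRows, std::size_t validCols,
                    float pad) {
    assert(validRows <= src.rows && validCols <= src.cols);
    Tile2D dst{src.rows, src.cols, std::vector<float>(src.v.size(), pad)};
    for (std::size_t r = 0; r < validRows; ++r)
        for (std::size_t c = 0; c < validCols; ++c)
            dst.v[r * src.cols + c] = src.v[r * src.cols + c];
    return dst;
}
```

Extracting a window and inserting it back at the same offset round-trips the data, which is a handy host-side consistency check for offset arithmetic.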
Complex
- TPRINT - Debug/print elements from a tile (implementation-defined).
- TMRGSORT - Merge sort for multiple sorted lists (implementation-defined element format and layout).
- TSORT32 - Sort a fixed-size 32-element block and produce an index mapping.
- TGATHER - Gather/select elements using either an index tile or a compile-time mask pattern.
- TCI - Generate a contiguous integer sequence into a destination tile.
- TTRI - Generate a triangular (lower/upper) mask tile.
- TPARTADD - Partial elementwise add with implementation-defined handling of mismatched valid regions.
- TPARTMUL - Partial elementwise multiply with implementation-defined handling of mismatched valid regions.
- TPARTMAX - Partial elementwise max with implementation-defined handling of mismatched valid regions.
- TPARTMIN - Partial elementwise min with implementation-defined handling of mismatched valid regions.
- TGATHERB - Gather elements using byte offsets.
- TSCATTER - Scatter rows of a source tile into a destination tile using per-element row indices.
- TQUANT - Quantize a tile (e.g. FP32 to FP8) producing exponent/scaling/max outputs.
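Three of the generator and sort ops above have simple host-side analogues. This sketch is hypothetical throughout: the ref_* names are illustrative; TCI is assumed to produce start, start + 1, and so on; TTRI is assumed to include the diagonal in both the lower and upper masks; and TSORT32 is assumed to sort ascending with ties kept in original order. Each of these choices should be checked against pto_instr.hpp, since the real element formats are implementation-defined.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// TCI-like: dst[i] = start + i over a flat tile of n elements.
std::vector<std::int32_t> ref_tci(std::int32_t start, std::size_t n) {
    std::vector<std::int32_t> dst(n);
    std::iota(dst.begin(), dst.end(), start);
    return dst;
}

// TTRI-like: 1 on/below the diagonal for lower, on/above for upper, else 0.
// Including the diagonal is an assumption of this sketch.
std::vector<std::uint8_t> ref_ttri(std::size_t rows, std::size_t cols, bool lower) {
    std::vector<std::uint8_t> mask(rows * cols, 0);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            mask[r * cols + c] = lower ? (c <= r) : (c >= r);
    return mask;
}

// TSORT32-like: sort one 32-element block in place and return the index
// mapping, where idx[i] is the original position of the i-th sorted element.
std::array<std::uint32_t, 32> ref_tsort32(std::array<float, 32>& block) {
    std::array<std::uint32_t, 32> idx{};
    std::iota(idx.begin(), idx.end(), 0u);
    std::stable_sort(idx.begin(), idx.end(), [&](std::uint32_t a, std::uint32_t b) {
        return block[a] < block[b];
    });
    std::array<float, 32> sorted{};
    for (std::size_t i = 0; i < 32; ++i) sorted[i] = block[idx[i]];
    block = sorted;
    return idx;
}
```

Sorting an index array rather than the values directly is what yields the index mapping alongside the sorted block.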