PTO Tile Lib

PTO ISA Reference

This directory contains the per-instruction reference for the PTO Tile Lib ISA.

Synchronization

  • TSYNC - Synchronize PTO execution (wait on events or insert a per-op pipeline barrier).

Manual / Resource Binding

  • TASSIGN - Bind a Tile object to an implementation-defined on-chip address (manual placement).
  • TSETFMATRIX - Set FMATRIX register(s) for IMG2COL-like ops.
  • TSET_IMG2COL_RPT - Set IMG2COL repeat metadata from an IMG2COL configuration tile.
  • TSET_IMG2COL_PADDING - Set IMG2COL padding metadata from an IMG2COL configuration tile.

Elementwise (Tile-Tile)

  • TADD - Elementwise add of two tiles.
  • TABS - Elementwise absolute value of a tile.
  • TAND - Elementwise bitwise AND of two tiles.
  • TOR - Elementwise bitwise OR of two tiles.
  • TSUB - Elementwise subtract of two tiles.
  • TMUL - Elementwise multiply of two tiles.
  • TMIN - Elementwise minimum of two tiles.
  • TMAX - Elementwise maximum of two tiles.
  • TCMP - Compare two tiles and write a packed predicate mask.
  • TDIV - Elementwise division of two tiles.
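  • TSHL - Elementwise shift-left of one tile by per-element shift amounts taken from a second tile.
  • TSHR - Elementwise shift-right of one tile by per-element shift amounts taken from a second tile.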
  • TXOR - Elementwise bitwise XOR of two tiles.
  • TLOG - Elementwise natural logarithm of a tile.
  • TRECIP - Elementwise reciprocal of a tile.
  • TPRELU - Elementwise PReLU (parametric ReLU) with a per-element slope tile.
  • TADDC - Elementwise ternary add: src0 + src1 + src2.
  • TSUBC - Elementwise ternary op: src0 - src1 + src2.
  • TCVT - Elementwise type conversion with a specified rounding mode.
  • TSEL - Select between two tiles using a mask tile (per-element selection).
  • TRSQRT - Elementwise reciprocal square root.
  • TSQRT - Elementwise square root.
  • TEXP - Elementwise exponential.
  • TNOT - Elementwise bitwise NOT of a tile.
  • TRELU - Elementwise ReLU of a tile.
  • TNEG - Elementwise negation of a tile.
  • TREM - Elementwise remainder of two tiles.
  • TFMOD - Elementwise fmod of two tiles.
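
As a reading aid, the elementwise semantics listed above can be modeled on plain Python lists of rows. This is a minimal reference sketch, not the PTO Tile Lib API: the `Tile` alias and the helper names (`elementwise`, `taddc_ref`, `tsubc_ref`, `tprelu_ref`) are hypothetical, real tiles also carry dtype, layout, and valid-region information that is ignored here, and `tprelu_ref` assumes the standard PReLU definition (pass positive values through, scale the rest by the slope).

```python
from typing import Callable, List

Tile = List[List[float]]  # hypothetical stand-in for a tile: a list of rows


def elementwise(fn: Callable[..., float], *tiles: Tile) -> Tile:
    """Apply fn position by position to tiles of identical shape."""
    rows, cols = len(tiles[0]), len(tiles[0][0])
    for t in tiles:
        if len(t) != rows or any(len(r) != cols for r in t):
            raise ValueError("all tiles must have the same shape")
    return [[fn(*(t[r][c] for t in tiles)) for c in range(cols)]
            for r in range(rows)]


def taddc_ref(src0: Tile, src1: Tile, src2: Tile) -> Tile:
    """Model of TADDC: src0 + src1 + src2."""
    return elementwise(lambda a, b, c: a + b + c, src0, src1, src2)


def tsubc_ref(src0: Tile, src1: Tile, src2: Tile) -> Tile:
    """Model of TSUBC: src0 - src1 + src2."""
    return elementwise(lambda a, b, c: a - b + c, src0, src1, src2)


def tprelu_ref(src: Tile, slope: Tile) -> Tile:
    """Model of TPRELU, assuming the standard PReLU definition."""
    return elementwise(lambda x, s: x if x > 0 else s * x, src, slope)
```

For example, `tsubc_ref([[1, 2]], [[10, 20]], [[100, 200]])` evaluates to `[[91, 182]]` under this model.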

Tile-Scalar / Tile-Immediate

  • TEXPANDS - Broadcast a scalar into a destination tile.
  • TCMPS - Compare a tile against a scalar and write per-element comparison results.
  • TSELS - Select one of two source tiles using a scalar selectMode (global select).
  • TMINS - Elementwise minimum of a tile and a scalar.
  • TADDS - Elementwise add a scalar to a tile.
  • TSUBS - Elementwise subtract a scalar from a tile.
  • TDIVS - Elementwise division with a scalar (tile/scalar or scalar/tile).
  • TMULS - Elementwise multiply a tile by a scalar.
  • TFMODS - Elementwise remainder with a scalar: fmod(src, scalar).
  • TREMS - Elementwise remainder with a scalar: remainder(src, scalar).
  • TMAXS - Elementwise max of a tile and a scalar: max(src, scalar).
  • TANDS - Elementwise bitwise AND of a tile and a scalar.
  • TORS - Elementwise bitwise OR of a tile and a scalar.
  • TSHLS - Elementwise shift-left a tile by a scalar.
  • TSHRS - Elementwise shift-right a tile by a scalar.
  • TXORS - Elementwise bitwise XOR of a tile and a scalar.
  • TLRELU - Leaky ReLU with a scalar slope.
  • TADDSC - Elementwise fused add with scalar and a second tile: src0 + scalar + src1.
  • TSUBSC - Elementwise fused op: src0 - scalar + src1.
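
The tile-scalar forms above differ from the tile-tile forms only in that one operand is a single value shared by every element. A minimal sketch in plain Python follows; the helper names and the `scalar_is_dividend` flag are hypothetical (the source only says TDIVS supports both tile/scalar and scalar/tile), and `tlrelu_ref` assumes the usual Leaky ReLU definition.

```python
from typing import List

Tile = List[List[float]]  # hypothetical stand-in for a tile: a list of rows


def tdivs_ref(src: Tile, scalar: float, scalar_is_dividend: bool = False) -> Tile:
    """Model of TDIVS: tile/scalar by default, scalar/tile when flagged."""
    if scalar_is_dividend:
        return [[scalar / x for x in row] for row in src]
    return [[x / scalar for x in row] for row in src]


def taddsc_ref(src0: Tile, scalar: float, src1: Tile) -> Tile:
    """Model of TADDSC: src0 + scalar + src1."""
    return [[a + scalar + b for a, b in zip(r0, r1)]
            for r0, r1 in zip(src0, src1)]


def tsubsc_ref(src0: Tile, scalar: float, src1: Tile) -> Tile:
    """Model of TSUBSC: src0 - scalar + src1."""
    return [[a - scalar + b for a, b in zip(r0, r1)]
            for r0, r1 in zip(src0, src1)]


def tlrelu_ref(src: Tile, slope: float) -> Tile:
    """Model of TLRELU, assuming the usual Leaky ReLU definition."""
    return [[x if x > 0 else slope * x for x in row] for row in src]
```

Under this model, `tdivs_ref([[2.0, 4.0]], 8.0, scalar_is_dividend=True)` gives `[[4.0, 2.0]]`, while the default direction with the same arguments gives `[[0.25, 0.5]]`.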

Axis Reduce / Expand

  • TROWSUM - Reduce each row by summing across columns.
  • TROWPROD - Reduce each row by multiplying across columns.
  • TCOLSUM - Reduce each column by summing across rows.
  • TCOLPROD - Reduce each column by multiplying across rows.
  • TCOLMAX - Reduce each column by taking the maximum across rows.
  • TROWMAX - Reduce each row by taking the maximum across columns.
  • TROWMIN - Reduce each row by taking the minimum across columns.
  • TROWEXPAND - Broadcast the first element of each source row across the destination row.
  • TROWEXPANDDIV - Row-wise broadcast divide: divide each row of src0 by a per-row scalar vector src1.
  • TROWEXPANDMUL - Row-wise broadcast multiply: multiply each row of src0 by a per-row scalar vector src1.
  • TROWEXPANDSUB - Row-wise broadcast subtract: subtract a per-row scalar vector src1 from each row of src0.
  • TROWEXPANDADD - Row-wise broadcast add: add a per-row scalar vector src1 to each row of src0.
  • TROWEXPANDMAX - Row-wise broadcast max: take the max of each row of src0 and a per-row scalar vector src1.
  • TROWEXPANDMIN - Row-wise broadcast min: take the min of each row of src0 and a per-row scalar vector src1.
  • TROWEXPANDEXPDIF - Row-wise exp-diff: compute exp(src0 - src1) with per-row scalars.
  • TCOLMIN - Reduce each column by taking the minimum across rows.
  • TCOLEXPAND - Broadcast the first element of each source column across the destination column.
  • TCOLEXPANDDIV - Column-wise broadcast divide: divide each column by a per-column scalar vector.
  • TCOLEXPANDMUL - Column-wise broadcast multiply: multiply each column by a per-column scalar vector.
  • TCOLEXPANDADD - Column-wise broadcast add: add a per-column scalar vector to each column.
  • TCOLEXPANDMAX - Column-wise broadcast max: take the max of each column and a per-column scalar vector.
  • TCOLEXPANDMIN - Column-wise broadcast min: take the min of each column and a per-column scalar vector.
  • TCOLEXPANDSUB - Column-wise broadcast subtract: subtract a per-column scalar vector from each column.
  • TCOLEXPANDEXPDIF - Column-wise exp-diff: compute exp(src0 - src1) with per-column scalars.
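
To show how the reduce and expand operations above fit together, the sketch below models four of them on plain Python lists and chains them into a numerically stable row softmax. This is an illustration only: the helper names are hypothetical, the per-row scalar vector is modeled as a flat list with one value per row, and the composition is one plausible use, not a sequence the source prescribes.

```python
import math
from typing import List

Tile = List[List[float]]  # hypothetical stand-in for a tile: a list of rows


def trowmax_ref(src: Tile) -> List[float]:
    """Model of TROWMAX: one maximum per row."""
    return [max(row) for row in src]


def trowsum_ref(src: Tile) -> List[float]:
    """Model of TROWSUM: one sum per row."""
    return [sum(row) for row in src]


def tcolsum_ref(src: Tile) -> List[float]:
    """Model of TCOLSUM: one sum per column."""
    return [sum(col) for col in zip(*src)]


def trowexpandexpdif_ref(src0: Tile, src1: List[float]) -> Tile:
    """Model of TROWEXPANDEXPDIF: exp(src0 - src1) with one scalar per row."""
    return [[math.exp(x - s) for x in row] for row, s in zip(src0, src1)]


def trowexpanddiv_ref(src0: Tile, src1: List[float]) -> Tile:
    """Model of TROWEXPANDDIV: divide each row by its per-row scalar."""
    return [[x / s for x in row] for row, s in zip(src0, src1)]


def row_softmax_ref(src: Tile) -> Tile:
    """Row softmax built from the four row-wise models above."""
    row_max = trowmax_ref(src)                    # subtracting the max avoids overflow
    shifted_exp = trowexpandexpdif_ref(src, row_max)
    row_sum = trowsum_ref(shifted_exp)
    return trowexpanddiv_ref(shifted_exp, row_sum)
```

Subtracting the row maximum before exponentiating keeps every exponent at or below zero, which is why the max reduction comes first in this chain.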

Memory (GM <-> Tile)

  • TLOAD - Load data from a GlobalTensor (GM) into a Tile.
  • TPREFETCH - Prefetch data from global memory into a tile-local cache/buffer (hint).
  • TSTORE - Store data from a Tile into a GlobalTensor (GM), optionally using atomic write or quantization parameters.
  • TSTORE_FP - Store an accumulator tile into global memory using a scaling (fp) tile for vector quantization parameters.
  • MGATHER - Gather-load elements from global memory into a tile using per-element indices.
  • MSCATTER - Scatter-store elements from a tile into global memory using per-element indices.
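
The indexed transfers MGATHER and MSCATTER can be pictured with global memory as a flat Python list. This is a hedged sketch: the helper names are hypothetical, indices are assumed to be element offsets (the source does not say whether they are element or byte offsets), and duplicate scatter indices simply resolve as "last write wins" here, which the hardware may not guarantee.

```python
from typing import List

Tile = List[List[float]]      # hypothetical stand-in for a data tile
IndexTile = List[List[int]]   # hypothetical stand-in for an index tile


def mgather_ref(gm: List[float], idx: IndexTile) -> Tile:
    """Model of MGATHER: dst[r][c] = gm[idx[r][c]]."""
    return [[gm[i] for i in row] for row in idx]


def mscatter_ref(gm: List[float], src: Tile, idx: IndexTile) -> None:
    """Model of MSCATTER: gm[idx[r][c]] = src[r][c], updating gm in place."""
    for src_row, idx_row in zip(src, idx):
        for value, i in zip(src_row, idx_row):
            gm[i] = value
```

With `gm = [10.0, 11.0, 12.0, 13.0]`, calling `mgather_ref(gm, [[3, 0], [1, 1]])` yields `[[13.0, 10.0], [11.0, 11.0]]`; the same index may be gathered more than once.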

Matrix Multiply

  • TGEMV_MX - GEMV with additional scaling tiles for mixed-precision / quantized matrix-vector compute.
  • TMATMUL_MX - Matrix multiply (GEMM) with additional scaling tiles for mixed-precision / quantized matmul on supported targets.
  • TMATMUL - Matrix multiply (GEMM) producing an accumulator/output tile.
  • TMATMUL_ACC - Matrix multiply with accumulator input (fused accumulate).
  • TMATMUL_BIAS - Matrix multiply with bias add.
  • TGEMV - General matrix-vector multiply (GEMV) producing an accumulator/output tile.
  • TGEMV_ACC - GEMV with explicit accumulator input/output tiles.
  • TGEMV_BIAS - GEMV with bias add.
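
The relationship between the plain, accumulating, and bias variants above can be sketched with a naive reference GEMM. The helper names are hypothetical and this is not the library API; the bias is assumed to hold one value per output column, which the source does not state, and the `_MX` scaling tiles are not modeled.

```python
from typing import List

Tile = List[List[float]]  # hypothetical stand-in for a tile: a list of rows


def tmatmul_ref(a: Tile, b: Tile) -> Tile:
    """Model of TMATMUL: (M x K) times (K x N) gives (M x N)."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b must match")
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def tmatmul_acc_ref(acc: Tile, a: Tile, b: Tile) -> Tile:
    """Model of TMATMUL_ACC: acc + a @ b."""
    prod = tmatmul_ref(a, b)
    return [[c + p for c, p in zip(acc_row, prod_row)]
            for acc_row, prod_row in zip(acc, prod)]


def tmatmul_bias_ref(a: Tile, b: Tile, bias: List[float]) -> Tile:
    """Model of TMATMUL_BIAS: a @ b + bias (assumed one value per output column)."""
    return [[p + bj for p, bj in zip(prod_row, bias)]
            for prod_row in tmatmul_ref(a, b)]
```

In this model the GEMV variants correspond to the same computation with `a` reduced to a single row, though the actual operand shapes are defined on the per-instruction pages.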

Data Movement / Layout

  • TEXTRACT - Extract a sub-tile from a source tile.
  • TEXTRACT_FP - Extract with fp/scaling tile (vector-quantization parameters).
  • TIMG2COL - Image-to-column transform for convolution-like workloads.
  • TINSERT - Insert a sub-tile into a destination tile at an (indexRow, indexCol) offset.
  • TINSERT_FP - Insert with fp/scaling tile (vector-quantization parameters).
  • TFILLPAD - Copy a tile and fill everything outside its valid region with a compile-time pad value.
  • TFILLPAD_INPLACE - In-place fill/pad variant.
  • TFILLPAD_EXPAND - Fill/pad while allowing dst to be larger than src.
  • TMOV - Move/copy between tiles, optionally applying implementation-defined conversion modes.
  • TMOV_FP - Move/convert from an accumulator tile into a destination tile, using a scaling (fp) tile for vector quantization parameters.
  • TRESHAPE - Reinterpret a tile as another tile type/shape while preserving the underlying bytes.
  • TTRANS - Transpose with an implementation-defined temporary tile.
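
A few of the layout operations above are easy to picture on a list-of-rows model. The sketch below is an assumption-laden illustration, not the library API: the helper names are hypothetical, the valid region is assumed to be the top-left `valid_rows` x `valid_cols` block, and the pad value is passed at run time although TFILLPAD takes it at compile time.

```python
from typing import List

Tile = List[List[float]]  # hypothetical stand-in for a tile: a list of rows


def textract_ref(src: Tile, index_row: int, index_col: int,
                 rows: int, cols: int) -> Tile:
    """Model of TEXTRACT: copy a rows x cols sub-tile starting at the offset."""
    if index_row + rows > len(src) or index_col + cols > len(src[0]):
        raise ValueError("sub-tile exceeds source bounds")
    return [row[index_col:index_col + cols]
            for row in src[index_row:index_row + rows]]


def tinsert_ref(dst: Tile, src: Tile, index_row: int, index_col: int) -> Tile:
    """Model of TINSERT: write src into a copy of dst at (index_row, index_col)."""
    if index_row + len(src) > len(dst) or index_col + len(src[0]) > len(dst[0]):
        raise ValueError("sub-tile exceeds destination bounds")
    out = [row[:] for row in dst]
    for r, src_row in enumerate(src):
        out[index_row + r][index_col:index_col + len(src_row)] = src_row
    return out


def tfillpad_ref(src: Tile, valid_rows: int, valid_cols: int, pad: float) -> Tile:
    """Model of TFILLPAD: keep the valid region, replace the rest with pad."""
    return [[x if r < valid_rows and c < valid_cols else pad
             for c, x in enumerate(row)]
            for r, row in enumerate(src)]


def ttrans_ref(src: Tile) -> Tile:
    """Model of TTRANS: swap rows and columns (the temporary tile is not modeled)."""
    return [list(col) for col in zip(*src)]
```

Extracting a sub-tile and inserting it back at the same offset leaves the tile unchanged in this model, which is a convenient sanity check when reasoning about offsets.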

Complex

  • TPRINT - Debug/print elements from a tile (implementation-defined).
  • TMRGSORT - Merge sort for multiple sorted lists (implementation-defined element format and layout).
  • TSORT32 - Sort a fixed-size 32-element block and produce an index mapping.
  • TGATHER - Gather/select elements using either an index tile or a compile-time mask pattern.
  • TCI - Generate a contiguous integer sequence into a destination tile.
  • TTRI - Generate a triangular (lower/upper) mask tile.
  • TPARTADD - Partial elementwise add with implementation-defined handling of mismatched valid regions.
  • TPARTMUL - Partial elementwise multiply with implementation-defined handling of mismatched valid regions.
  • TPARTMAX - Partial elementwise max with implementation-defined handling of mismatched valid regions.
  • TPARTMIN - Partial elementwise min with implementation-defined handling of mismatched valid regions.
  • TGATHERB - Gather elements using byte offsets.
  • TSCATTER - Scatter rows of a source tile into a destination tile using per-element row indices.
  • TQUANT - Quantize a tile (e.g. FP32 to FP8) producing exponent/scaling/max outputs.
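
Three of the generator and sort operations above have simple reference models. The sketch below is hedged throughout: the helper names are hypothetical, `tci_ref` assumes row-major fill order and a configurable start value, `ttri_ref` assumes the diagonal is included in the mask, and `tsort32_ref` takes the sort direction as an explicit argument because the source does not state it.

```python
from typing import List, Tuple


def tci_ref(rows: int, cols: int, start: int = 0) -> List[List[int]]:
    """Model of TCI: contiguous integers, assumed to fill the tile row by row."""
    return [[start + r * cols + c for c in range(cols)] for r in range(rows)]


def ttri_ref(rows: int, cols: int, lower: bool = True) -> List[List[int]]:
    """Model of TTRI: 1 inside the triangle (diagonal assumed included), else 0."""
    if lower:
        return [[1 if c <= r else 0 for c in range(cols)] for r in range(rows)]
    return [[1 if c >= r else 0 for c in range(cols)] for r in range(rows)]


def tsort32_ref(block: List[float], descending: bool) -> Tuple[List[float], List[int]]:
    """Model of TSORT32: sort exactly 32 values and return (sorted values, source indices)."""
    if len(block) != 32:
        raise ValueError("TSORT32 operates on a fixed 32-element block")
    order = sorted(range(32), key=lambda i: block[i], reverse=descending)
    return [block[i] for i in order], order
```

The index mapping returned by `tsort32_ref` satisfies `sorted_values[k] == block[indices[k]]` for every `k`, so it can be reused to reorder a companion array in the same way.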