PTO ISA Overview

This page is the source-synchronized ISA index generated from docs/isa/manifest.yaml.

Docs Contents

| Area | Page | Description |
|---|---|---|
| Overview | docs/README.md | PTO ISA guide entry point and navigation. |
| Overview | docs/PTOISA.md | This page (overview + full instruction index). |
| ISA reference | docs/isa/README.md | Per-instruction reference directory index. |
| ISA reference | docs/isa/conventions.md | Shared notation, operands, events, and modifiers. |
| Assembly (PTO-AS) | docs/assembly/PTO-AS.md | PTO-AS syntax reference. |
| Source of truth | include/pto/common/pto_instr.hpp | C++ intrinsic API (authoritative). |

Instruction Index (All PTO Instructions)

| Category | Instruction | Description |
|---|---|---|
| Synchronization | TSYNC | Synchronize PTO execution (wait on events or insert a per-op pipeline barrier). |
| Manual / Resource Binding | TASSIGN | Bind a Tile object to an implementation-defined on-chip address (manual placement). |
| Manual / Resource Binding | TSETFMATRIX | Set FMATRIX register(s) for IMG2COL-like ops. |
| Manual / Resource Binding | TSET_IMG2COL_RPT | Set IMG2COL repeat metadata from an IMG2COL configuration tile. |
| Manual / Resource Binding | TSET_IMG2COL_PADDING | Set IMG2COL padding metadata from an IMG2COL configuration tile. |
| Elementwise (Tile-Tile) | TADD | Elementwise add of two tiles. |
| Elementwise (Tile-Tile) | TABS | Elementwise absolute value of a tile. |
| Elementwise (Tile-Tile) | TAND | Elementwise bitwise AND of two tiles. |
| Elementwise (Tile-Tile) | TOR | Elementwise bitwise OR of two tiles. |
| Elementwise (Tile-Tile) | TSUB | Elementwise subtract of two tiles. |
| Elementwise (Tile-Tile) | TMUL | Elementwise multiply of two tiles. |
| Elementwise (Tile-Tile) | TMIN | Elementwise minimum of two tiles. |
| Elementwise (Tile-Tile) | TMAX | Elementwise maximum of two tiles. |
| Elementwise (Tile-Tile) | TCMP | Compare two tiles and write a packed predicate mask. |
| Elementwise (Tile-Tile) | TDIV | Elementwise division of two tiles. |
| Elementwise (Tile-Tile) | TSHL | Elementwise shift-left of two tiles. |
| Elementwise (Tile-Tile) | TSHR | Elementwise shift-right of two tiles. |
| Elementwise (Tile-Tile) | TXOR | Elementwise bitwise XOR of two tiles. |
| Elementwise (Tile-Tile) | TLOG | Elementwise natural logarithm of a tile. |
| Elementwise (Tile-Tile) | TRECIP | Elementwise reciprocal of a tile. |
| Elementwise (Tile-Tile) | TPRELU | Elementwise PReLU (parametric ReLU) with a per-element slope tile. |
| Elementwise (Tile-Tile) | TADDC | Elementwise ternary add: src0 + src1 + src2. |
| Elementwise (Tile-Tile) | TSUBC | Elementwise ternary op: src0 - src1 + src2. |
| Elementwise (Tile-Tile) | TCVT | Elementwise type conversion with a specified rounding mode. |
| Elementwise (Tile-Tile) | TSEL | Select between two tiles using a mask tile (per-element selection). |
| Elementwise (Tile-Tile) | TRSQRT | Elementwise reciprocal square root. |
| Elementwise (Tile-Tile) | TSQRT | Elementwise square root. |
| Elementwise (Tile-Tile) | TEXP | Elementwise exponential. |
| Elementwise (Tile-Tile) | TNOT | Elementwise bitwise NOT of a tile. |
| Elementwise (Tile-Tile) | TRELU | Elementwise ReLU of a tile. |
| Elementwise (Tile-Tile) | TNEG | Elementwise negation of a tile. |
| Elementwise (Tile-Tile) | TREM | Elementwise remainder of two tiles. |
| Elementwise (Tile-Tile) | TFMOD | Elementwise fmod of two tiles. |
| Tile-Scalar / Tile-Immediate | TEXPANDS | Broadcast a scalar into a destination tile. |
| Tile-Scalar / Tile-Immediate | TCMPS | Compare a tile against a scalar and write per-element comparison results. |
| Tile-Scalar / Tile-Immediate | TSELS | Select per element between a source tile and a scalar using a mask tile. |
| Tile-Scalar / Tile-Immediate | TMINS | Elementwise minimum of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TADDS | Elementwise add a scalar to a tile. |
| Tile-Scalar / Tile-Immediate | TSUBS | Elementwise subtract a scalar from a tile. |
| Tile-Scalar / Tile-Immediate | TDIVS | Elementwise division with a scalar (tile/scalar or scalar/tile). |
| Tile-Scalar / Tile-Immediate | TMULS | Elementwise multiply a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TFMODS | Elementwise remainder with a scalar: fmod(src, scalar). |
| Tile-Scalar / Tile-Immediate | TREMS | Elementwise remainder with a scalar: remainder(src, scalar). |
| Tile-Scalar / Tile-Immediate | TMAXS | Elementwise maximum of a tile and a scalar: max(src, scalar). |
| Tile-Scalar / Tile-Immediate | TANDS | Elementwise bitwise AND of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TORS | Elementwise bitwise OR of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TSHLS | Elementwise shift-left a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TSHRS | Elementwise shift-right a tile by a scalar. |
| Tile-Scalar / Tile-Immediate | TXORS | Elementwise bitwise XOR of a tile and a scalar. |
| Tile-Scalar / Tile-Immediate | TLRELU | Leaky ReLU with a scalar slope. |
| Tile-Scalar / Tile-Immediate | TADDSC | Elementwise fused add with a scalar and a second tile: src0 + scalar + src1. |
| Tile-Scalar / Tile-Immediate | TSUBSC | Elementwise fused op: src0 - scalar + src1. |
| Axis Reduce / Expand | TROWSUM | Reduce each row by summing across columns. |
| Axis Reduce / Expand | TROWPROD | Reduce each row by multiplying across columns. |
| Axis Reduce / Expand | TCOLSUM | Reduce each column by summing across rows. |
| Axis Reduce / Expand | TCOLPROD | Reduce each column by multiplying across rows. |
| Axis Reduce / Expand | TCOLMAX | Reduce each column by taking the maximum across rows. |
| Axis Reduce / Expand | TROWMAX | Reduce each row by taking the maximum across columns. |
| Axis Reduce / Expand | TROWMIN | Reduce each row by taking the minimum across columns. |
| Axis Reduce / Expand | TROWEXPAND | Broadcast the first element of each source row across the destination row. |
| Axis Reduce / Expand | TROWEXPANDDIV | Row-wise broadcast divide: divide each row of src0 by a per-row scalar vector src1. |
| Axis Reduce / Expand | TROWEXPANDMUL | Row-wise broadcast multiply: multiply each row of src0 by a per-row scalar vector src1. |
| Axis Reduce / Expand | TROWEXPANDSUB | Row-wise broadcast subtract: subtract a per-row scalar vector src1 from each row of src0. |
| Axis Reduce / Expand | TROWEXPANDADD | Row-wise broadcast add: add a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDMAX | Row-wise broadcast max with a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDMIN | Row-wise broadcast min with a per-row scalar vector. |
| Axis Reduce / Expand | TROWEXPANDEXPDIF | Row-wise exp-diff: compute exp(src0 - src1) with per-row scalars. |
| Axis Reduce / Expand | TCOLMIN | Reduce each column by taking the minimum across rows. |
| Axis Reduce / Expand | TCOLEXPAND | Broadcast the first element of each source column across the destination column. |
| Axis Reduce / Expand | TCOLEXPANDDIV | Column-wise broadcast divide: divide each column by a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMUL | Column-wise broadcast multiply: multiply each column by a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDADD | Column-wise broadcast add with a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMAX | Column-wise broadcast max with a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDMIN | Column-wise broadcast min with a per-column scalar vector. |
| Axis Reduce / Expand | TCOLEXPANDSUB | Column-wise broadcast subtract: subtract a per-column scalar vector from each column. |
| Axis Reduce / Expand | TCOLEXPANDEXPDIF | Column-wise exp-diff: compute exp(src0 - src1) with per-column scalars. |
| Memory (GM <-> Tile) | TLOAD | Load data from a GlobalTensor (GM) into a Tile. |
| Memory (GM <-> Tile) | TPREFETCH | Prefetch data from global memory into a tile-local cache/buffer (hint). |
| Memory (GM <-> Tile) | TSTORE | Store data from a Tile into a GlobalTensor (GM), optionally using atomic write or quantization parameters. |
| Memory (GM <-> Tile) | TSTORE_FP | Store an accumulator tile into global memory using a scaling (fp) tile for vector quantization parameters. |
| Memory (GM <-> Tile) | MGATHER | Gather-load elements from global memory into a tile using per-element indices. |
| Memory (GM <-> Tile) | MSCATTER | Scatter-store elements from a tile into global memory using per-element indices. |
| Matrix Multiply | TGEMV_MX | GEMV with additional scaling tiles for mixed-precision / quantized matrix-vector compute. |
| Matrix Multiply | TMATMUL_MX | Matrix multiply (GEMM) with additional scaling tiles for mixed-precision / quantized matmul on supported targets. |
| Matrix Multiply | TMATMUL | Matrix multiply (GEMM) producing an accumulator/output tile. |
| Matrix Multiply | TMATMUL_ACC | Matrix multiply with accumulator input (fused accumulate). |
| Matrix Multiply | TMATMUL_BIAS | Matrix multiply with bias add. |
| Matrix Multiply | TGEMV | General matrix-vector multiplication (GEMV) producing an accumulator/output tile. |
| Matrix Multiply | TGEMV_ACC | GEMV with explicit accumulator input/output tiles. |
| Matrix Multiply | TGEMV_BIAS | GEMV with bias add. |
| Data Movement / Layout | TEXTRACT | Extract a sub-tile from a source tile. |
| Data Movement / Layout | TEXTRACT_FP | Extract with fp/scaling tile (vector-quantization parameters). |
| Data Movement / Layout | TIMG2COL | Image-to-column transform for convolution-like workloads. |
| Data Movement / Layout | TINSERT | Insert a sub-tile into a destination tile at an (indexRow, indexCol) offset. |
| Data Movement / Layout | TINSERT_FP | Insert with fp/scaling tile (vector-quantization parameters). |
| Data Movement / Layout | TFILLPAD | Copy a tile and pad the area outside its valid region with a compile-time pad value. |
| Data Movement / Layout | TFILLPAD_INPLACE | In-place fill/pad variant. |
| Data Movement / Layout | TFILLPAD_EXPAND | Fill/pad while allowing dst to be larger than src. |
| Data Movement / Layout | TMOV | Move/copy between tiles, optionally applying implementation-defined conversion modes. |
| Data Movement / Layout | TMOV_FP | Move/convert from an accumulator tile into a destination tile, using a scaling (fp) tile for vector quantization parameters. |
| Data Movement / Layout | TRESHAPE | Reinterpret a tile as another tile type/shape while preserving the underlying bytes. |
| Data Movement / Layout | TTRANS | Transpose with an implementation-defined temporary tile. |
| Complex | TPRINT | Debug/print elements from a tile (implementation-defined). |
| Complex | TMRGSORT | Merge sort for multiple sorted lists (implementation-defined element format and layout). |
| Complex | TSORT32 | Sort a fixed-size 32-element block and produce an index mapping. |
| Complex | TGATHER | Gather/select elements using either an index tile or a compile-time mask pattern. |
| Complex | TCI | Generate a contiguous integer sequence into a destination tile. |
| Complex | TTRI | Generate a triangular (lower/upper) mask tile. |
| Complex | TPARTADD | Partial elementwise add with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMUL | Partial elementwise multiply with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMAX | Partial elementwise max with implementation-defined handling of mismatched valid regions. |
| Complex | TPARTMIN | Partial elementwise min with implementation-defined handling of mismatched valid regions. |
| Complex | TGATHERB | Gather elements using byte offsets. |
| Complex | TSCATTER | Scatter rows of a source tile into a destination tile using per-element row indices. |
| Complex | TQUANT | Quantize a tile (e.g. FP32 to FP8) producing exponent/scaling/max outputs. |
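
The one-line descriptions in the index are terse, so a small reference model may help when reading the Axis Reduce / Expand and Tile-Scalar rows. The sketch below is plain Python over nested lists, not the PTO intrinsic API: the lowercase function names are hypothetical stand-ins for the mnemonics, tiles are modelled as lists of rows, and per-row scalar vectors as flat lists. It also assumes that TFMODS and TREMS follow the C library meanings of fmod (truncated quotient) and remainder (quotient rounded to nearest), which the index suggests but does not state. The authoritative semantics are in include/pto/common/pto_instr.hpp and the per-instruction pages.

```python
import math

# Reference models only: hypothetical helpers, not PTO intrinsics.

def trowmax(src):
    """Model of TROWMAX: one maximum per row."""
    return [max(row) for row in src]

def trowsum(src):
    """Model of TROWSUM: one sum per row."""
    return [sum(row) for row in src]

def trowexpandexpdif(src0, src1):
    """Model of TROWEXPANDEXPDIF: exp(src0[i][j] - src1[i])."""
    return [[math.exp(x - s) for x in row] for row, s in zip(src0, src1)]

def trowexpanddiv(src0, src1):
    """Model of TROWEXPANDDIV: src0[i][j] / src1[i]."""
    return [[x / s for x in row] for row, s in zip(src0, src1)]

def row_softmax(tile):
    """One possible composition of the four row ops: a stable row softmax."""
    row_max = trowmax(tile)                    # per-row maximum
    shifted = trowexpandexpdif(tile, row_max)  # exp(x - row_max)
    row_sum = trowsum(shifted)                 # per-row normaliser
    return trowexpanddiv(shifted, row_sum)     # divide each row by its sum

def tfmods(src, scalar):
    """Model of TFMODS, assuming C fmod semantics (sign follows src)."""
    return [[math.fmod(x, scalar) for x in row] for row in src]

def trems(src, scalar):
    """Model of TREMS, assuming C remainder semantics (round to nearest)."""
    return [[math.remainder(x, scalar) for x in row] for row in src]

if __name__ == "__main__":
    print(row_softmax([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
    print(tfmods([[5.0, -5.0]], 3.0))  # [[2.0, -2.0]]
    print(trems([[5.0, -5.0]], 3.0))   # [[-1.0, 1.0]]
```

Under these assumptions the two remainder flavours disagree even for simple inputs (5 mod 3 gives 2.0 with fmod and -1.0 with remainder), which is worth checking against the per-instruction reference before choosing between TFMODS and TREMS.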