Tutorial: Row Softmax (Building Block)

Row-softmax is a standard pattern used in attention kernels. The PTO tile-level decomposition is:

1. row_max = TROWMAX(x) → [M, 1]
2. x = x - expand(row_max) (TROWEXPAND + TSUB)
3. x = exp(x) (TEXP)
4. row_sum = TROWSUM(x) → [M, 1]
5. x = x / expand(row_sum) (TROWEXPAND + TDIV)
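
To make the decomposition concrete, here is a minimal host-side scalar sketch of the same five steps in plain C++. It is not PTO code: the function name `RowSoftmaxReference` and the flat row-major `std::vector<float>` layout are assumptions made for illustration, and it assumes `N >= 1`. A model like this can be handy for checking a kernel's output.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of PTO): row softmax over a
// row-major M x N buffer, following the tile-level steps one row at a time.
// Assumes N >= 1.
std::vector<float> RowSoftmaxReference(const std::vector<float>& in,
                                       std::size_t M, std::size_t N) {
  std::vector<float> x(in);
  for (std::size_t r = 0; r < M; ++r) {
    float* row = x.data() + r * N;

    // Row maximum: the scalar analogue of TROWMAX for this row.
    const float row_max = *std::max_element(row, row + N);

    // Subtract the broadcast max, exponentiate, and accumulate the row sum
    // (the analogues of TROWEXPAND + TSUB, TEXP, and TROWSUM).
    float row_sum = 0.0f;
    for (std::size_t c = 0; c < N; ++c) {
      row[c] = std::exp(row[c] - row_max);
      row_sum += row[c];
    }

    // Divide by the broadcast sum (the analogue of TROWEXPAND + TDIV).
    for (std::size_t c = 0; c < N; ++c) {
      row[c] /= row_sum;
    }
  }
  return x;
}
```

Because the row maximum is subtracted first, every exponent is at most zero, so a row such as `{1000, 1000, 1000}` yields finite, equal probabilities rather than overflowing.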

Single-tile example

#include <pto/pto-inst.hpp>
using namespace pto;

template <typename T, int M, int N>
AICORE void RowSoftmaxOneTile(__gm__ T* out, __gm__ T* in) {
  using GT = GT2D<T, M, N>;
  using XTile = Tile<TileType::Vec, T, M, N, BLayout::RowMajor, DYNAMIC, DYNAMIC>;
  using Col1 = Tile<TileType::Vec, T, M, 1, BLayout::RowMajor, DYNAMIC, DYNAMIC>;

  GT gin(in), gout(out);
  XTile x(M, N), tmp(M, N);
  Col1 row_max(M, 1), row_sum(M, 1);

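  // Load the input tile.
  TLOAD(x, gin);

  // Per-row max, broadcast across the row into tmp, then subtract.
  TROWMAX(row_max, x);
  TROWEXPAND(tmp, row_max);
  TSUB(x, x, tmp);

  // Elementwise exponential, in place.
  TEXP(x, x);

  // Per-row sum, broadcast across the row into tmp, then divide.
  TROWSUM(row_sum, x);
  TROWEXPAND(tmp, row_sum);
  TDIV(x, x, tmp);

  // Store the normalized tile.
  TSTORE(gout, x);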
}

Notes for real kernels

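If N is too large for a single tile, you usually tile along the columns and combine the per-tile partial reductions (row max and row sum) across tiles.
The “subtract max” step is essential for numerical stability: without it, exp(x) can overflow for large inputs.
The valid region matters for edge tiles; interpret each instruction's semantics according to docs/isa/conventions.md.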
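
One common way to combine partial reductions across column tiles is to keep a running (max, sum) pair per row and rescale the sums whenever the maximum changes. The sketch below shows that idea for a single row in plain scalar C++. It is an illustration under stated assumptions, not PTO code and not necessarily how a PTO kernel is structured: `RowStats`, `Combine`, `TileStats`, and `RowSoftmaxTiled` are hypothetical names, and it assumes a non-empty row and `TN >= 1`. Clamping the last tile to `N` is a scalar stand-in for honoring the valid region of an edge tile.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-row partial statistics for a range of columns.
struct RowStats {
  float max;  // maximum over the columns seen so far
  float sum;  // sum of exp(x - max) over the same columns
};

// Merge two partials by rescaling both sums to the common maximum.
RowStats Combine(RowStats a, RowStats b) {
  const float m = std::max(a.max, b.max);
  return {m, a.sum * std::exp(a.max - m) + b.sum * std::exp(b.max - m)};
}

// Statistics of row[begin, end): local max and local sum of exponentials.
RowStats TileStats(const float* row, std::size_t begin, std::size_t end) {
  const float m = *std::max_element(row + begin, row + end);
  float s = 0.0f;
  for (std::size_t c = begin; c < end; ++c) {
    s += std::exp(row[c] - m);
  }
  return {m, s};
}

// Softmax of one row, visiting it in column tiles of width TN.
// Assumes row is non-empty and TN >= 1.
std::vector<float> RowSoftmaxTiled(const std::vector<float>& row,
                                   std::size_t TN) {
  const std::size_t N = row.size();

  // First pass: accumulate (max, sum) tile by tile; the last tile may be
  // narrower than TN, so its end is clamped to N.
  RowStats acc = TileStats(row.data(), 0, std::min(TN, N));
  for (std::size_t c0 = TN; c0 < N; c0 += TN) {
    acc = Combine(acc, TileStats(row.data(), c0, std::min(c0 + TN, N)));
  }

  // Second pass: normalize with the final row-wide max and sum.
  std::vector<float> out(N);
  for (std::size_t c = 0; c < N; ++c) {
    out[c] = std::exp(row[c] - acc.max) / acc.sum;
  }
  return out;
}
```

The trade-off in this sketch is a second pass over the row: the final maximum and sum are only known after every tile has been visited, so normalization happens afterwards.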