TROWMAX

Tile Operation Diagram

[Figure: TROWMAX tile operation]

Introduction

Reduces each row of src to a single value by taking the maximum across its valid columns, and writes one result per row to dst.

Math Interpretation

Let R = src.GetValidRow() and C = src.GetValidCol(). For 0 <= i < R:

\[ \mathrm{dst}_{i,0} = \max_{0 \le j < C} \mathrm{src}_{i,j} \]
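The formula above can be checked on the host with a plain scalar loop. The sketch below is only a reference model of the math, not the PTO implementation: RowMaxReference, its stride parameter, and the use of std::vector as a stand-in for a row-major tile buffer are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for the row-max math; not part of the PTO API.
// `src` is a row-major buffer with `stride` elements per row. Only the first
// validRows x validCols elements (the valid region) are read.
template <typename T>
std::vector<T> RowMaxReference(const std::vector<T> &src, std::size_t stride,
                               std::size_t validRows, std::size_t validCols) {
  // Mirror the documented requirement that the valid region is non-empty.
  assert(validRows != 0 && validCols != 0);
  assert(validCols <= stride);
  assert(src.size() >= (validRows - 1) * stride + validCols);

  std::vector<T> dst(validRows);
  for (std::size_t i = 0; i < validRows; ++i) {
    const T *row = src.data() + i * stride;
    // dst[i] = max over 0 <= j < validCols of src[i][j]
    dst[i] = *std::max_element(row, row + validCols);
  }
  return dst;
}
```

Columns at or beyond validCols are padding in this model and never influence the result, which matches the index range 0 <= j < C in the formula.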

Assembly Syntax

PTO-AS form: see PTO-AS Specification.

Synchronous form:

%dst = trowmax %src : !pto.tile<...> -> !pto.tile<...>

Lowering may introduce internal scratch tiles; the C++ intrinsic requires an explicit tmp operand.

AS Level 1 (SSA)

%dst = pto.trowmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

AS Level 2 (DPS)

pto.trowmax ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)

C++ Intrinsic

Declared in include/pto/common/pto_instr.hpp:

template <typename TileDataOut, typename TileDataIn, typename TileDataTmp, typename... WaitEvents>
PTO_INST RecordEvent TROWMAX(TileDataOut &dst, TileDataIn &src, TileDataTmp &tmp, WaitEvents &... events);

Constraints

General constraints / checks

  • dst and src must both be TileType::Vec.
  • src must use standard ND layout: row-major and non-fractal (BLayout::RowMajor, SLayout::NoneBox).
  • dst must use one of the following non-fractal layouts:
    • ND layout (BLayout::RowMajor, SLayout::NoneBox), or
    • DN layout with exactly one column (BLayout::ColMajor, SLayout::NoneBox, Cols == 1).
  • dst and src must use the same element type.
  • Runtime valid-region checks:
    • src.GetValidRow() != 0
    • src.GetValidCol() != 0
    • src.GetValidRow() == dst.GetValidRow()
  • The intrinsic signature requires an explicit tmp operand.
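The runtime valid-region checks listed above amount to a three-part predicate. The sketch below expresses it on the host, assuming a hypothetical ValidRegion struct in place of the real tile type (which exposes GetValidRow() and GetValidCol()); the function name RowReduceCheckPasses is likewise illustrative and does not come from the PTO headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tile's valid region. The real tile types
// report these values through GetValidRow() and GetValidCol().
struct ValidRegion {
  std::size_t rows;
  std::size_t cols;
};

// Returns true only when all three documented runtime checks hold:
// the source valid region is non-empty in both dimensions, and the
// source and destination agree on the number of valid rows.
inline bool RowReduceCheckPasses(const ValidRegion &src, const ValidRegion &dst) {
  return src.rows != 0 && src.cols != 0 && src.rows == dst.rows;
}
```

Note that the destination's valid column count is not part of the predicate; only its valid row count is compared against the source.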

A2A3 implementation checks

  • Supported element types: half, float, int32_t, int16_t.
  • The implementation accepts both ND output and DN output with Cols == 1.
  • Runtime checks follow the shared row-reduce check path:
    • src.GetValidRow() != 0
    • src.GetValidCol() != 0
    • src.GetValidRow() == dst.GetValidRow()
  • The current implementation passes tmp through to the backend call. This document specifies no tmp shape or layout constraints beyond those the checked implementation explicitly enforces.
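The element-type rules (same type for dst and src, and one of the supported A2A3 types) could be modelled as compile-time predicates. This is a minimal sketch under stated assumptions: HalfPlaceholder stands in for the device half type, which is toolchain-provided and not available in standard C++, and the names kIsA2A3RowMaxType and RowMaxTypesAccepted are hypothetical rather than taken from the PTO headers.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Placeholder for the device half-precision type (assumption: the real
// `half` is supplied by the target toolchain, not by standard C++).
struct HalfPlaceholder {};

// True for the element types the A2A3 section lists as supported.
template <typename T>
constexpr bool kIsA2A3RowMaxType =
    std::is_same<T, HalfPlaceholder>::value || std::is_same<T, float>::value ||
    std::is_same<T, std::int32_t>::value || std::is_same<T, std::int16_t>::value;

// Combines two documented rules: dst and src share an element type,
// and that type is in the supported set.
template <typename DstElem, typename SrcElem>
constexpr bool RowMaxTypesAccepted() {
  return std::is_same<DstElem, SrcElem>::value && kIsA2A3RowMaxType<SrcElem>;
}
```

In real code such predicates would typically feed a static_assert so that an unsupported combination fails at compile time rather than at run time.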

Examples

Auto

#include <pto/pto-inst.hpp>

using namespace pto;

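void example_auto() {
  // 16x16 float source tile.
  using SrcT = Tile<TileType::Vec, float, 16, 16>;
  // 16x1 column-major (DN) destination: one maximum per source row.
  using DstT = Tile<TileType::Vec, float, 16, 1, BLayout::ColMajor>;
  // Scratch tile required by the intrinsic signature.
  using TmpT = Tile<TileType::Vec, float, 16, 16>;
  SrcT src;
  DstT dst;
  TmpT tmp;
  // Auto mode: placement and scheduling are compiler/runtime-managed.
  TROWMAX(dst, src, tmp);
}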

Manual

#include <pto/pto-inst.hpp>

using namespace pto;

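void example_manual() {
  // 16x16 float source tile.
  using SrcT = Tile<TileType::Vec, float, 16, 16>;
  // 16x1 column-major (DN) destination: one maximum per source row.
  using DstT = Tile<TileType::Vec, float, 16, 1, BLayout::ColMajor>;
  // Scratch tile required by the intrinsic signature.
  using TmpT = Tile<TileType::Vec, float, 16, 16>;
  SrcT src;
  DstT dst;
  TmpT tmp;
  // Manual mode: bind every tile explicitly before issuing the instruction.
  TASSIGN(src, 0x1000);
  TASSIGN(dst, 0x2000);
  TASSIGN(tmp, 0x3000);
  TROWMAX(dst, src, tmp);
}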

ASM Form Examples

Auto Mode

# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.trowmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

Manual Mode

# Manual mode: resources must be bound explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.trowmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

PTO Assembly Form

# Synchronous form
%dst = trowmax %src : !pto.tile<...> -> !pto.tile<...>
# AS Level 2 (DPS)
pto.trowmax ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)