TROWARGMAX¶
Tile Operation Diagram¶
Introduction¶
Get the column index of the maximum element, or both value and column index of the maximum element for each row.
Math Interpretation¶
Let R = src.GetValidRow() and C = src.GetValidCol(). For 0 <= i < R:
\[ \mathrm{dst}_{i,0} = \underset{0 \le j < C}{\operatorname{argmax}} \; \mathrm{src}_{i,j} \]
\[ \mathrm{dstval}_{i,0} = \max_{0 \le j < C} \mathrm{src}_{i,j} \]
Assembly Syntax¶
PTO-AS form: see docs/grammar/PTO-AS.md.
Synchronous form:
%dst = trowargmax %src : !pto.tile<...> -> !pto.tile<...>
Lowering may introduce internal scratch tiles; the C++ intrinsic requires an explicit tmp operand.
IR Level 1 (SSA)¶
%dst = pto.trowargmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
IR Level 2 (DPS)¶
pto.trowargmax ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
Output index only:
template <typename TileDataOut, typename TileDataIn, typename TileDataTmp, typename... WaitEvents>
PTO_INST RecordEvent TROWARGMAX(TileDataOut& dst, TileDataIn& src, TileDataTmp& tmp, WaitEvents&... events);
Output both value and index:
template <typename TileDataOutVal, typename TileDataOutIdx, typename TileDataIn, typename TileDataTmp,
typename... WaitEvents>
PTO_INST RecordEvent TROWARGMAX(TileDataOutVal &dstVal, TileDataOutIdx &dstIdx, TileDataIn &src, TileDataTmp &tmp,
WaitEvents &... events)
Constraints¶
General constraints / checks¶
- Supported source element types:
half,float. srcmust use standard ND layout: row-major and non-fractal (BLayout::RowMajor,SLayout::NoneBox).- When output index only:
dstandsrcmust beTileType::Vec.- Supported destination element types:
uint32_t,int32_t. - Runtime checks follow the shared row-reduce check path:
src.GetValidRow() != 0src.GetValidCol() != 0src.GetValidRow() == dst.GetValidRow()
dstis checked through the shared row-reduce-index path and may use either of these non-fractal layouts:- DN layout with one column (
BLayout::ColMajor,Cols == 1), or - ND layout whose valid column count is 1.
- DN layout with one column (
- When output both value and index:
dstVal,dstIdx,srcmust beTileType::Vec.- Supported destination element types:
uint32_t,int32_t. - Runtime checks follow the shared row-reduce check path:
src.GetValidRow() != 0src.GetValidCol() != 0src.GetValidRow() == dstIdx.GetValidRow()src.GetValidRow() == dstVal.GetValidRow()
dstVal,dstIdxare checked through the shared row-reduce-index path and may use either of these non-fractal layouts:- DN layout with one column (
BLayout::ColMajor,Cols == 1), or - ND layout whose valid column count is 1.
- DN layout with one column (
About temporary tile tmp¶
- Temporary tile is only used by A3, A5 accepts
tmptile but leave it unused. - When output index only,
tmptile is not used whensrcValidCol <= ElementPerRepeat. - When output both value and index and
srcValidCol <= ElementPerRepeat,tmpmay use either of these non-fractal layouts:- DN layout with one column (
BLayout::ColMajor,Cols == 1), rows is twice ofsrc. - ND layout whose valid column count is 2, rows is the same as
src.
- DN layout with one column (
- When
srcValidCol > ElementPerRepeat:- Rows of
tmptile is equal tosrc. tmptile's stride can be calculated out based onsrc'svalidColusing the following formula:
- Rows of
repeats = ceil(validCol / elementPerRepeat)
stride = (ceil(repeats * 2 / elementPerBlock) + ceil(repeats / elementPerBlock)) * elementPerBlock
Examples¶
Auto¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
using SrcT = Tile<TileType::Vec, float, 16, 16>;
using DstT = Tile<TileType::Vec, uint32_t, 16, 1, BLayout::ColMajor>;
using DstValT = Tile<TileType::Vec, float, 16, 1, BLayout::ColMajor>;
using TmpT = Tile<TileType::Vec, float, 16, 16>;
SrcT src;
DstT dst;
DstValT dst;
TmpT tmp;
TROWARGMAX(dst, src, tmp);
TROWARGMAX(dstVal, dst, src, tmp);
}
Manual¶
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
using SrcT = Tile<TileType::Vec, float, 16, 16>;
using DstT = Tile<TileType::Vec, uint32_t, 16, 1, BLayout::ColMajor>;
using DstValT = Tile<TileType::Vec, float, 16, 1, BLayout::ColMajor>;
using TmpT = Tile<TileType::Vec, float, 16, 16>;
SrcT src;
DstT dst;
DstValT dst;
TmpT tmp;
TASSIGN(src, 0x1000);
TASSIGN(dst, 0x2000);
TASSIGN(dstVal, 0x3000);
TASSIGN(tmp, 0x4000);
TROWARGMAX(dst, src, tmp);
TROWARGMAX(dstVal, dst, src, tmp);
}
ASM Form Examples¶
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.trowmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
Manual Mode¶
# Manual mode: resources must be bound explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.trowmax %src, %tmp : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>
PTO Assembly Form¶
%dst = trowmax %src : !pto.tile<...> -> !pto.tile<...>
# IR Level 2 (DPS)
pto.trowmax ins(%src, %tmp : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)