TPARTARGMAX¶
Tile Operation Diagram¶
Introduction¶
Performs elementwise maximum selection over the destination valid region and returns the selected values together with their indices. When both src0Val and src1Val are valid at an element, the result value is max(src0Val, src1Val) and the result index is the index paired with that value; on a tie, the src1 value and index are selected. When only one input is valid at an element, the result copies that input's value and index. Handling of other mismatched-validity cases is implementation-defined.
Math Interpretation¶
For each element (i, j) in the destination valid region:
\[
\begin{aligned}
(\mathrm{dstVal}_{i,j}, \mathrm{dstIdx}_{i,j}) =
\begin{cases}
(\mathrm{src0Val}_{i,j}, \mathrm{src0Idx}_{i,j}) & \text{if } \mathrm{src0Val}_{i,j} > \mathrm{src1Val}_{i,j} \text{ and both inputs are defined at } (i,j) \\
(\mathrm{src1Val}_{i,j}, \mathrm{src1Idx}_{i,j}) & \text{if } \mathrm{src1Val}_{i,j} \ge \mathrm{src0Val}_{i,j} \text{ and both inputs are defined at } (i,j) \\
(\mathrm{src0Val}_{i,j}, \mathrm{src0Idx}_{i,j}) & \text{if only src0 is defined at } (i,j) \\
(\mathrm{src1Val}_{i,j}, \mathrm{src1Idx}_{i,j}) & \text{if only src1 is defined at } (i,j)
\end{cases}
\end{aligned}
\]
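The four cases above can be read as a small scalar reference model. The following is a minimal sketch, not the PTO implementation: `Elem`, `PartArgMaxElem`, and `PartArgMaxRef` are hypothetical names introduced for illustration, an empty `std::optional` stands in for "not defined at (i, j)", and the value and index types are fixed to `float` and `int32_t`. The formula does not cover the case where neither input is defined, so the sketch returns an empty element there as an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical scalar model: one element is a (value, index) pair, or empty
// when the position lies outside that source's valid region.
using Elem = std::optional<std::pair<float, std::int32_t>>;

// Selection for a single position, mirroring the four cases of the formula.
// A tie goes to src1, because the second case uses >= while the first uses >.
inline Elem PartArgMaxElem(const Elem &src0, const Elem &src1) {
    if (src0 && src1) {
        return (src0->first > src1->first) ? src0 : src1;
    }
    if (src0) return src0;  // only src0 is defined
    if (src1) return src1;  // only src1 is defined
    return std::nullopt;    // neither defined: not covered by the formula
}

// Applies the per-element rule over two equally sized, row-major flattened tiles.
inline std::vector<Elem> PartArgMaxRef(const std::vector<Elem> &src0,
                                       const std::vector<Elem> &src1) {
    assert(src0.size() == src1.size());
    std::vector<Elem> dst(src0.size());
    for (std::size_t k = 0; k < src0.size(); ++k) {
        dst[k] = PartArgMaxElem(src0[k], src1[k]);
    }
    return dst;
}
```

In this model, inputs (1.0, index 10) and (1.0, index 20) at the same position produce (1.0, index 20), since equal values fall into the second case.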
Assembly Syntax¶
PTO-AS form: see PTO-AS Specification.
Synchronous form:
%dstVal, %dstIdx = tpartargmax %src0Val, %src1Val, %src0Idx, %src1Idx : !pto.tile<...> -> (!pto.tile<...>, !pto.tile<...>)
AS Level 1 (SSA)¶
%dstVal, %dstIdx = pto.tpartargmax %src0Val, %src1Val, %src0Idx, %src1Idx : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> (!pto.tile<...>, !pto.tile<...>)
AS Level 2 (DPS)¶
pto.tpartargmax ins(%src0Val, %src1Val, %src0Idx, %src1Idx : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dstVal, %dstIdx : !pto.tile_buf<...>, !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename TileDataDst, typename TileDataSrc0, typename TileDataSrc1,
          typename TileDataDstIdx, typename TileDataSrc0Idx, typename TileDataSrc1Idx,
          typename... WaitEvents>
PTO_INST RecordEvent TPARTARGMAX(TileDataDst &dstVal, TileDataSrc0 &src0Val, TileDataSrc1 &src1Val,
                                 TileDataDstIdx &dstIdx, TileDataSrc0Idx &src0Idx, TileDataSrc1Idx &src1Idx,
                                 WaitEvents &... events);
Constraints¶
General constraints / checks¶
- dstVal, src0Val, and src1Val must use the same element type.
- dstIdx, src0Idx, and src1Idx must use the same element type.
- Value type and index type combination constraints:
  - If the value type is half, the index type must be int16_t or uint16_t.
  - If the value type is float, the index type must be int32_t or uint32_t.
- Valid regions must match between value tiles and index tiles for each pair:
  - src0Val and src0Idx must have identical valid regions.
  - src1Val and src1Idx must have identical valid regions.
  - dstVal and dstIdx must have identical valid regions.
- The destination valid region must exactly match the valid region of either src0Val or src1Val.
- If dstVal has a zero valid region, the instruction returns early.
- For each element in the destination valid region:
  - if both inputs are valid, the instruction applies the elementwise maximum and returns the index of the larger value;
  - if only one input is valid, the result copies that input's value and index.
- Handling of any validity pattern not explicitly listed above is implementation-defined.
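The valid-region rules above can be sketched as a standalone check. This is an illustration under stated assumptions, not the library's validation code: `ValidRegion`, `SameRegion`, `IsEmpty`, `RegionCheck`, and `CheckRegions` are hypothetical names, a valid region is modeled as a plain rows-by-columns extent, "zero valid region" is assumed to mean zero rows or zero columns, and the order in which the real instruction evaluates these conditions is not stated in the constraints.

```cpp
// Hypothetical descriptor of a tile's valid region; not a PTO type.
struct ValidRegion {
    int rows;
    int cols;
};

constexpr bool SameRegion(const ValidRegion &a, const ValidRegion &b) {
    return a.rows == b.rows && a.cols == b.cols;
}

// Assumption: "zero valid region" means zero rows or zero columns.
constexpr bool IsEmpty(const ValidRegion &r) {
    return r.rows == 0 || r.cols == 0;
}

enum class RegionCheck { Ok, EarlyReturn, Invalid };

// Checks the pairing rules, then the early-return case, then the rule that
// the destination region must equal the region of src0Val or src1Val.
constexpr RegionCheck CheckRegions(const ValidRegion &src0Val, const ValidRegion &src0Idx,
                                   const ValidRegion &src1Val, const ValidRegion &src1Idx,
                                   const ValidRegion &dstVal, const ValidRegion &dstIdx) {
    if (!SameRegion(src0Val, src0Idx) || !SameRegion(src1Val, src1Idx) ||
        !SameRegion(dstVal, dstIdx)) {
        return RegionCheck::Invalid;  // a value tile and its index tile disagree
    }
    if (IsEmpty(dstVal)) {
        return RegionCheck::EarlyReturn;  // nothing to compute
    }
    if (!SameRegion(dstVal, src0Val) && !SameRegion(dstVal, src1Val)) {
        return RegionCheck::Invalid;  // destination matches neither source
    }
    return RegionCheck::Ok;
}
```

For example, a 16x16 destination with a 16x16 src0 and a 16x8 src1 passes this check, while a 16x16 destination with two 16x8 sources does not.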
A5 implementation checks¶
- Supported value types: half, float.
- Supported index types: int16_t, uint16_t, int32_t, uint32_t.
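The value/index type pairing from the general constraints, restricted to the supported types listed here, can be expressed as a compile-time trait. This is a sketch only: `IsSupportedPair` is a hypothetical helper, and `HalfStandIn` is a placeholder for the half-precision type, whose real definition comes from the PTO headers and is not reproduced here.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical placeholder for the half-precision value type.
struct HalfStandIn {};

// Defaults to false; only the four documented combinations are enabled.
template <typename Val, typename Idx>
struct IsSupportedPair : std::false_type {};

// half pairs with 16-bit index types.
template <> struct IsSupportedPair<HalfStandIn, std::int16_t> : std::true_type {};
template <> struct IsSupportedPair<HalfStandIn, std::uint16_t> : std::true_type {};

// float pairs with 32-bit index types.
template <> struct IsSupportedPair<float, std::int32_t> : std::true_type {};
template <> struct IsSupportedPair<float, std::uint32_t> : std::true_type {};
```

A wrapper could then reject a mismatched combination at compile time with `static_assert(IsSupportedPair<Val, Idx>::value, "unsupported value/index pair")`.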
Examples¶
Auto¶
#include <pto/pto-inst.hpp>

using namespace pto;

void example_auto() {
    // float values pair with a 32-bit index type.
    using ValTileT = Tile<TileType::Vec, float, 16, 16>;
    using IdxTileT = Tile<TileType::Vec, int32_t, 16, 16>;

    ValTileT src0Val, src1Val, dstVal;
    IdxTileT src0Idx, src1Idx, dstIdx;

    // Auto mode: tile placement is left to the compiler/runtime.
    TPARTARGMAX(dstVal, src0Val, src1Val, dstIdx, src0Idx, src1Idx);
}
Manual¶
#include <pto/pto-inst.hpp>

using namespace pto;

void example_manual() {
    // float values pair with a 32-bit index type.
    using ValTileT = Tile<TileType::Vec, float, 16, 16>;
    using IdxTileT = Tile<TileType::Vec, int32_t, 16, 16>;

    ValTileT src0Val, src1Val, dstVal;
    IdxTileT src0Idx, src1Idx, dstIdx;

    // Manual mode: bind each tile explicitly before issuing the instruction.
    TASSIGN(src0Val, 0x1000);
    TASSIGN(src1Val, 0x2000);
    TASSIGN(dstVal, 0x3000);
    TASSIGN(src0Idx, 0x4000);
    TASSIGN(src1Idx, 0x5000);
    TASSIGN(dstIdx, 0x6000);

    TPARTARGMAX(dstVal, src0Val, src1Val, dstIdx, src0Idx, src1Idx);
}
ASM Form Examples¶
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dstVal, %dstIdx = pto.tpartargmax %src0Val, %src1Val, %src0Idx, %src1Idx : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> (!pto.tile<...>, !pto.tile<...>)
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dstVal, %dstIdx = pto.tpartargmax %src0Val, %src1Val, %src0Idx, %src1Idx : (!pto.tile<...>, !pto.tile<...>, !pto.tile<...>, !pto.tile<...>) -> (!pto.tile<...>, !pto.tile<...>)
PTO Assembly Form¶
%dstVal, %dstIdx = tpartargmax %src0Val, %src1Val, %src0Idx, %src1Idx : !pto.tile<...> -> (!pto.tile<...>, !pto.tile<...>)
# AS Level 2 (DPS)
pto.tpartargmax ins(%src0Val, %src1Val, %src0Idx, %src1Idx : !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dstVal, %dstIdx : !pto.tile_buf<...>, !pto.tile_buf<...>)