TSORT32¶
Tile Operation Diagram¶
Introduction¶
Sort a fixed-size 32-element block and produce an index mapping.
Math Interpretation¶
Sorts values from src into dst and produces an index mapping in idx. Conceptually, for each row i:
\[ \mathrm{dst}_{i,k} = \mathrm{src}_{i,\pi_i(k)} \]
where \(\pi_i\) is a permutation of the indices in the row. Sort order and stability are target-defined.
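The relation above can be modelled on the host for a single 32-element block. The sketch below is a reference model only, not the device implementation: `reference_sort32`, `Sort32Result`, and `kBlock` are hypothetical names, and descending order with stable tie-breaking is an assumption made for illustration, since the actual order and stability are target-defined.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>

constexpr std::size_t kBlock = 32;  // fixed block size sorted by TSORT32

// Hypothetical container for the two outputs (dst and idx).
struct Sort32Result {
    std::array<float, kBlock> dst;
    std::array<std::uint32_t, kBlock> idx;
};

// Host-side reference model for one block.
// ASSUMPTION: descending order, ties keep their original relative order.
// NaN inputs are not handled here.
inline Sort32Result reference_sort32(const std::array<float, kBlock>& src) {
    Sort32Result out{};
    // Start from the identity permutation 0, 1, ..., 31.
    std::iota(out.idx.begin(), out.idx.end(), 0u);
    // Sort the indices by the values they point to; this yields pi(k).
    std::stable_sort(out.idx.begin(), out.idx.end(),
                     [&src](std::uint32_t lhs, std::uint32_t rhs) {
                         return src[lhs] > src[rhs];
                     });
    // Gather: dst[k] = src[pi(k)].
    for (std::size_t k = 0; k < kBlock; ++k) {
        assert(out.idx[k] < kBlock);
        out.dst[k] = src[out.idx[k]];
    }
    return out;
}
```

Under these assumptions, `result.dst[k] == src[result.idx[k]]` holds for every `k`, which is exactly the permutation relation stated above.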
Assembly Syntax¶
PTO-AS form: see PTO-AS Specification.
Synchronous form:
%dst, %idx = tsort32 %src : !pto.tile<...> -> (!pto.tile<...>, !pto.tile<...>)
IR Level 1 (SSA)¶
%dst, %idx = pto.tsort32 %src : !pto.tile<...> -> (!pto.tile<...>, !pto.tile<...>)
IR Level 2 (DPS)¶
pto.tsort32 ins(%src : !pto.tile_buf<...>) outs(%dst, %idx : !pto.tile_buf<...>, !pto.tile_buf<...>)
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename DstTileData, typename SrcTileData, typename IdxTileData>
PTO_INST RecordEvent TSORT32(DstTileData& dst, SrcTileData& src, IdxTileData& idx);
template <typename DstTileData, typename SrcTileData, typename IdxTileData, typename TmpTileData>
PTO_INST RecordEvent TSORT32(DstTileData& dst, SrcTileData& src, IdxTileData& idx, TmpTileData& tmp);
Constraints¶
- TSORT32 does not take WaitEvents&... and does not call TSYNC(...) internally; synchronize explicitly if needed.
- Implementation checks (A2A3/A5):
  - DstTileData::DType must be half or float.
  - SrcTileData::DType must match DstTileData::DType.
  - IdxTileData::DType must be uint32_t.
  - dst/src/idx tile location must be TileType::Vec, and all must be row-major (isRowMajor).
- Valid region:
  - The implementation uses dst.GetValidRow() as the number of rows and uses src.GetValidCol() to determine how many 32-element blocks to sort per row.
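The valid-region rule can be pictured as a loop over rows and over 32-element blocks within each row. The following host-side sketch is illustrative only: `reference_sort32_rows` and `kBlockSize` are hypothetical names, the tile is modelled as a dense row-major buffer, the valid column count is assumed to be a multiple of 32, indices are assumed to be column positions within the row, and descending order is assumed. None of these details are confirmed by the implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

constexpr std::size_t kBlockSize = 32;  // elements per sorted block

// Host-side model: sort every 32-element block of every valid row.
// ASSUMPTIONS: dense row-major storage with valid_cols elements per row,
// valid_cols % 32 == 0, idx holds column positions within the row,
// descending order with stable tie-breaking.
inline void reference_sort32_rows(const std::vector<float>& src,
                                  std::vector<float>& dst,
                                  std::vector<std::uint32_t>& idx,
                                  std::size_t valid_rows,
                                  std::size_t valid_cols) {
    assert(valid_cols % kBlockSize == 0);
    assert(src.size() >= valid_rows * valid_cols);
    dst.assign(valid_rows * valid_cols, 0.0f);
    idx.assign(valid_rows * valid_cols, 0u);

    const std::size_t blocks_per_row = valid_cols / kBlockSize;
    for (std::size_t row = 0; row < valid_rows; ++row) {
        const float* row_src = src.data() + row * valid_cols;
        for (std::size_t blk = 0; blk < blocks_per_row; ++blk) {
            const std::size_t col0 = blk * kBlockSize;       // first column of block
            const std::size_t base = row * valid_cols + col0;  // flat offset
            std::uint32_t* first = idx.data() + base;
            // Identity mapping for this block's columns.
            std::iota(first, first + kBlockSize, static_cast<std::uint32_t>(col0));
            // Each block is sorted independently of its neighbours.
            std::stable_sort(first, first + kBlockSize,
                             [row_src](std::uint32_t lhs, std::uint32_t rhs) {
                                 return row_src[lhs] > row_src[rhs];
                             });
            for (std::size_t k = 0; k < kBlockSize; ++k) {
                dst[base + k] = row_src[first[k]];
            }
        }
    }
}
```

With a valid region of 2 rows by 64 columns, this model sorts four independent blocks: two per row.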
Examples¶
Auto¶
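#include <cstdint>
#include <pto/pto-inst.hpp>
using namespace pto;
void example_auto() {
  // One row of 32 columns: a single 32-element block.
  using SrcT = Tile<TileType::Vec, float, 1, 32>;
  using DstT = Tile<TileType::Vec, float, 1, 32>;
  using IdxT = Tile<TileType::Vec, uint32_t, 1, 32>;
  SrcT src;
  DstT dst;
  IdxT idx;
  // Auto mode: placement and scheduling are compiler/runtime-managed.
  TSORT32(dst, src, idx);
}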
Manual¶
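#include <cstdint>
#include <pto/pto-inst.hpp>
using namespace pto;
void example_manual() {
  // One row of 32 columns: a single 32-element block.
  using SrcT = Tile<TileType::Vec, float, 1, 32>;
  using DstT = Tile<TileType::Vec, float, 1, 32>;
  using IdxT = Tile<TileType::Vec, uint32_t, 1, 32>;
  SrcT src;
  DstT dst;
  IdxT idx;
  // Manual mode: bind each tile explicitly before issuing the instruction.
  TASSIGN(src, 0x1000);
  TASSIGN(dst, 0x2000);
  TASSIGN(idx, 0x3000);
  // TSORT32 does not synchronize internally; synchronize explicitly if needed.
  TSORT32(dst, src, idx);
}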
ASM Form Examples¶
Auto Mode¶
# Auto mode: compiler/runtime-managed placement and scheduling.
%dst, %idx = pto.tsort32 %src : !pto.tile<...> -> (!pto.tile<...>, !pto.tile<...>)
Manual Mode¶
# Manual mode: bind resources explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst, %idx = pto.tsort32 %src : !pto.tile<...> -> (!pto.tile<...>, !pto.tile<...>)
PTO Assembly Form¶
%dst, %idx = tsort32 %src : !pto.tile<...> -> (!pto.tile<...>, !pto.tile<...>)
# IR Level 2 (DPS)
pto.tsort32 ins(%src : !pto.tile_buf<...>) outs(%dst, %idx : !pto.tile_buf<...>, !pto.tile_buf<...>)