TPUSH¶
Introduction¶
Push a producer tile into a TPipe FIFO for Cube-Vector communication.
This page describes both the TileData overload and the GlobalData commit overload. For GM-slot workflows that expose the FIFO entry as a GlobalTensor, use TALLOC to allocate a slot view, write the slot with normal memory instructions, then use TPUSH(Pipe&, GlobalData&) to commit the slot.
Operation Semantics¶
For the TileData overload, TPUSH performs three steps:
- Wait for FIFO space when Pipe::shouldWaitFree(pipe.prod.tileIndex) is true.
- Store the producer tile into the current FIFO slot.
- Record data-ready synchronization for the consumer.
The producer tile index is incremented after the FIFO slot address is computed.
For the GlobalData overload, TPUSH only records data-ready synchronization for a slot that was already allocated by TALLOC. It does not store tile data by itself.
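The two overloads can be contrasted with a small host-side model. This is only a sketch under stated assumptions, not the PTO implementation: `MockPipe`, `mockPushTile`, `mockCommitSlot`, and `runScenario` are hypothetical names, a single `float` stands in for a whole tile, the slot is chosen as `tileIndex % SlotNum`, and the `shouldWaitFree` rule used here (wait once the ring has wrapped) is invented for illustration. The real wait policy is defined by the pipe type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for a TPipe FIFO; not the PTO type.
template <std::size_t SlotNum>
struct MockPipe {
    float slots[SlotNum] = {};        // one float per slot stands in for a tile
    std::uint32_t prodTileIndex = 0;  // models pipe.prod.tileIndex
    std::uint32_t freeWaits = 0;      // free-space waits performed
    std::uint32_t readyRecords = 0;   // data-ready records emitted

    // Assumed policy, for illustration only: wait once the ring has wrapped.
    static constexpr bool shouldWaitFree(std::uint32_t tileIndex)
    {
        return tileIndex >= SlotNum;
    }
};

// Models the TileData overload: wait, store, record.
template <std::size_t SlotNum>
constexpr void mockPushTile(MockPipe<SlotNum> &pipe, float tile)
{
    if (MockPipe<SlotNum>::shouldWaitFree(pipe.prodTileIndex)) {
        ++pipe.freeWaits;  // step 1: a real pipe would block here
    }
    const std::size_t slot = pipe.prodTileIndex % SlotNum;  // slot computed first
    ++pipe.prodTileIndex;                                   // index incremented afterwards
    pipe.slots[slot] = tile;                                // step 2: store the tile
    ++pipe.readyRecords;                                    // step 3: record data-ready
}

// Models the GlobalData overload: commit only, slot contents are not touched.
template <std::size_t SlotNum>
constexpr void mockCommitSlot(MockPipe<SlotNum> &pipe)
{
    ++pipe.readyRecords;
}

constexpr MockPipe<2> runScenario()
{
    MockPipe<2> pipe{};
    mockPushTile(pipe, 1.0f);
    mockPushTile(pipe, 2.0f);
    mockPushTile(pipe, 3.0f);  // wraps to slot 0 and triggers the modelled wait
    pipe.slots[1] = 9.0f;      // producer writes the slot directly (GM-slot workflow)
    mockCommitSlot(pipe);      // commit publishes the slot without storing anything
    return pipe;
}

constexpr MockPipe<2> kResult = runScenario();
static_assert(kResult.prodTileIndex == 3, "three tile pushes advance the index three times");
static_assert(kResult.readyRecords == 4, "every push and every commit records data-ready");
static_assert(kResult.slots[1] == 9.0f, "commit leaves the producer-written data in place");
```

In this model the commit path never writes to `slots`, which mirrors the statement that the GlobalData overload does not store tile data by itself.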
C++ Intrinsic¶
Declared in include/pto/common/pto_instr.hpp:
template <typename Pipe, typename TileProd, TileSplitAxis Split,
          std::enable_if_t<is_tile_data_v<TileProd>, int> = 0, typename... WaitEvents>
PTO_INST RecordEvent TPUSH(Pipe &pipe, TileProd &tile, WaitEvents &... events);

template <typename Pipe, typename GlobalData, TileSplitAxis Split,
          std::enable_if_t<is_global_data_v<GlobalData>, int> = 0, typename... WaitEvents>
PTO_INST RecordEvent TPUSH(Pipe &pipe, GlobalData &gmTensor, WaitEvents &... events);
Pipe is typically a TPipe declared in TPush.hpp:
template <uint8_t FlagID, uint8_t DirType, uint32_t SlotSize, uint32_t SlotNum,
          uint32_t LocalSlotNum = 2, bool IsNoSplit = false, bool EN_UNIT_FLAG = false>
struct TPipe;
Constraints¶
- TileData producer:
  - TileProd::Loc must be TileType::Acc or TileType::Vec.
  - Direction::DIR_C2V: Cube produces an accumulator tile for vector consumption.
  - Direction::DIR_V2C: Vector produces a vector tile for cube consumption.
  - Direction::DIR_BOTH: both C2V and V2C producers are supported by the same pipe type.
- FIFO slot:
  - SlotSize must be large enough for one logical FIFO entry.
  - SlotNum >= 1.
  - Pipe::SyncPeriod is derived from SlotNum: (SlotNum <= 2) ? SlotNum : SlotNum / 2.
- Split behavior:
  - TileSplitAxis::TILE_NO_SPLIT: no sub-vector offset is applied.
  - TileSplitAxis::TILE_UP_DOWN: vector subblocks map to row halves.
  - TileSplitAxis::TILE_LEFT_RIGHT: vector subblocks map to column halves.
- Synchronization:
  - Free-space waits are sparse and controlled by Pipe::SyncPeriod.
  - A data-ready record is emitted for each TPUSH.
- GlobalData commit:
  - gmTensor must be a FIFO slot view returned by TALLOC.
  - Data must be written into gmTensor before calling TPUSH(Pipe&, GlobalData&).
  - TPUSH(Pipe&, GlobalData&) ignores the tensor contents and only commits the FIFO slot to the consumer.
- Tile Type Support:
  - TPUSH/TPOP Supported Tile Types:
    - TileType::Acc (Accumulator Tile): Used by Cube core for C2V direction communication.
    - TileType::Vec (Vector Tile): Used by Vector core for V2C direction communication.
    - TileType::Ctrl (Control Tile): Used by Vector core for V2C_CTRL direction control signal transmission.
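The numeric constraints above can be checked with a few compile-time helpers. This is a minimal sketch: `syncPeriod` reproduces the `Pipe::SyncPeriod` formula quoted in the FIFO slot constraint, while `tileBytes`, `SplitAxis`, `Offset`, and `subblockOffset` are hypothetical names introduced here. The split mapping (subblock 0 takes the first half, subblock 1 the second) is an assumption about how "row halves" and "column halves" are assigned, not something this page specifies.

```cpp
#include <cstddef>
#include <cstdint>

// SyncPeriod as stated in the constraint: (SlotNum <= 2) ? SlotNum : SlotNum / 2.
constexpr std::uint32_t syncPeriod(std::uint32_t slotNum)
{
    return (slotNum <= 2) ? slotNum : slotNum / 2;
}

static_assert(syncPeriod(1) == 1, "single-slot FIFO");
static_assert(syncPeriod(2) == 2, "double-buffered FIFO");
static_assert(syncPeriod(8) == 4, "deeper FIFOs use half the slot count");
static_assert(syncPeriod(5) == 2, "integer division rounds down");

// Hypothetical helper: bytes needed for one rows x cols tile, a lower bound for SlotSize.
constexpr std::size_t tileBytes(std::size_t rows, std::size_t cols, std::size_t elemSize)
{
    return rows * cols * elemSize;
}

static_assert(tileBytes(128, 128, sizeof(float)) == 65536, "128 x 128 float tile");

// Hypothetical local enum mirroring the TileSplitAxis names; not the PTO type.
enum class SplitAxis { NoSplit, UpDown, LeftRight };

struct Offset {
    std::uint32_t row;
    std::uint32_t col;
};

// Assumed mapping: subblock 0 starts at the origin, subblock 1 at the midpoint
// of the split axis. NoSplit applies no offset at all.
constexpr Offset subblockOffset(SplitAxis axis, std::uint32_t subblock,
                                std::uint32_t rows, std::uint32_t cols)
{
    return axis == SplitAxis::UpDown      ? Offset{subblock * (rows / 2), 0}
           : axis == SplitAxis::LeftRight ? Offset{0, subblock * (cols / 2)}
                                          : Offset{0, 0};
}

static_assert(subblockOffset(SplitAxis::UpDown, 1, 128, 128).row == 64, "lower row half");
static_assert(subblockOffset(SplitAxis::LeftRight, 1, 128, 128).col == 64, "right column half");
static_assert(subblockOffset(SplitAxis::NoSplit, 1, 128, 128).row == 0, "no offset applied");
```

A check such as `static_assert(SlotSize >= tileBytes(M, N, sizeof(T)))` next to a pipe alias is one way to catch an undersized slot at compile time.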
Examples¶
C2V Accumulator Push¶
#include <pto/pto-inst.hpp>
using namespace pto;
template <typename T>
AICORE void example_c2v(__gm__ void *fifoMem)
{
    constexpr uint32_t M = 128;
    constexpr uint32_t N = 128;
    constexpr uint32_t FlagID = 0;
    constexpr uint32_t FifoDepth = 2;

    // Each FIFO slot holds one M x N float accumulator tile.
    using Pipe = TPipe<FlagID, Direction::DIR_C2V, M * N * sizeof(float), FifoDepth>;
    using AccTile = TileAcc<float, M, N, M, N>;

    Pipe pipe(fifoMem, 0x0, 0x0);
    AccTile acc;
    TASSIGN(acc, 0x0);

    // Fill acc with a cube computation before pushing.
    TPUSH<Pipe, AccTile, TileSplitAxis::TILE_NO_SPLIT>(pipe, acc);
}
V2C Vector Push¶
#include <pto/pto-inst.hpp>
using namespace pto;
template <typename T>
AICORE void example_v2c(__gm__ void *fifoMem)
{
    constexpr uint32_t M = 128;
    constexpr uint32_t N = 128;
    constexpr uint32_t FlagID = 0;
    constexpr uint32_t FifoDepth = 2;

    // Each FIFO slot holds one M x N vector tile of element type T.
    using Pipe = TPipe<FlagID, Direction::DIR_V2C, M * N * sizeof(T), FifoDepth>;
    using VecTile = Tile<TileType::Vec, T, M, N, BLayout::RowMajor, M, N>;

    Pipe pipe(fifoMem, 0x0, 0x0);
    VecTile tile;
    TASSIGN(tile, 0x0);

    // Fill tile with a vector computation before pushing.
    TPUSH<Pipe, VecTile, TileSplitAxis::TILE_NO_SPLIT>(pipe, tile);
}
GlobalData Slot Commit¶
#include <pto/pto-inst.hpp>
using namespace pto;
template <typename T>
AICORE void example_globaldata(__gm__ void *fifoMem)
{
    constexpr uint32_t M = 128;
    constexpr uint32_t N = 128;
    constexpr uint32_t FlagID = 0;
    constexpr uint32_t FifoDepth = 2;

    using Pipe = TPipe<FlagID, Direction::DIR_C2V, M * N * sizeof(T), FifoDepth>;
    using SlotGlobal = GlobalTensor<T, Shape<1, 1, 1, M, N>, Stride<1, 1, 1, N, 1>>;
    using VecTile = Tile<TileType::Vec, T, M, N, BLayout::RowMajor, M, N>;

    Pipe pipe(fifoMem, 0x0, 0x0);
    SlotGlobal slot;
    VecTile tile;
    TASSIGN(tile, 0x0);

    // Allocate a FIFO slot view, write it, then commit it to the consumer.
    TALLOC<Pipe, SlotGlobal, TileSplitAxis::TILE_NO_SPLIT>(pipe, slot);
    TSTORE(slot, tile);
    TPUSH<Pipe, SlotGlobal, TileSplitAxis::TILE_NO_SPLIT>(pipe, slot);
}
ASM Form Examples¶
The current public assembly reference does not define a stable PTO-AS spelling for TPUSH. Use the C++ intrinsic form for manual CV FIFO programming.