TPARTADD

Tile Operation Diagram

TPARTADD tile operation

Introduction

Performs elementwise addition over the destination valid region. When both src0 and src1 are valid at an element, the result is their sum; when only one input is valid there, the result copies that input value. Handling of other mismatched-validity cases is implementation-defined.

Math Interpretation

For each element (i, j) in the destination valid region:

\[ \mathrm{dst}_{i,j} = \begin{cases} \mathrm{src0}_{i,j} + \mathrm{src1}_{i,j} & \text{if both inputs are defined at } (i,j) \\ \mathrm{src0}_{i,j} & \text{if only src0 is defined at } (i,j) \\ \mathrm{src1}_{i,j} & \text{if only src1 is defined at } (i,j) \end{cases} \]

Assembly Syntax

PTO-AS form: see PTO-AS Specification.

Synchronous form:

%dst = tpartadd %src0, %src1 : !pto.tile<...> -> !pto.tile<...>

AS Level 1 (SSA)

%dst = pto.tpartadd %src0, %src1 : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

AS Level 2 (DPS)

pto.tpartadd ins(%src0, %src1 : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)

C++ Intrinsic

Declared in include/pto/common/pto_instr.hpp:

template <typename TileDataDst, typename TileDataSrc0, typename TileDataSrc1, typename... WaitEvents>
PTO_INST RecordEvent TPARTADD(TileDataDst &dst, TileDataSrc0 &src0, TileDataSrc1 &src1, WaitEvents &... events);

Constraints

General constraints / checks

  • dst, src0, and src1 must use the same element type.
  • The destination valid region defines the result domain.
  • For each element in the destination valid region:
    • if both inputs are valid, the instruction applies its elementwise operator;
    • if only one input is valid, the result copies that input value.
  • If dst has a zero valid region, the instruction returns early.
  • Supported partial-validity patterns require at least one source tile to have a valid region exactly equal to dst, while the other source tile's valid region must not exceed dst in either dimension.
  • Handling of any validity pattern not explicitly listed above is implementation-defined.

A2A3 implementation checks

  • Supported element types: int32_t, int16_t, half, float.
  • dst, src0, and src1 must all be row-major (isRowMajor).

A5 implementation checks

  • Supported element types: uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, half, float, bfloat16_t.

Examples

Auto

#include <pto/pto-inst.hpp>

using namespace pto;

void example_auto() {
  using TileT = Tile<TileType::Vec, float, 16, 16>;
  TileT src0, src1, dst;
  TPARTADD(dst, src0, src1);
}

Manual

#include <pto/pto-inst.hpp>

using namespace pto;

void example_manual() {
  using TileT = Tile<TileType::Vec, float, 16, 16>;
  TileT src0, src1, dst;
  TASSIGN(src0, 0x1000);
  TASSIGN(src1, 0x2000);
  TASSIGN(dst,  0x3000);
  TPARTADD(dst, src0, src1);
}

ASM Form Examples

Auto Mode

# Auto mode: compiler/runtime-managed placement and scheduling.
%dst = pto.tpartadd %src0, %src1 : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

Manual Mode

# Manual mode: resources must be bound explicitly before issuing the instruction.
# Optional for tile operands:
# pto.tassign %arg0, @tile(0x1000)
# pto.tassign %arg1, @tile(0x2000)
%dst = pto.tpartadd %src0, %src1 : (!pto.tile<...>, !pto.tile<...>) -> !pto.tile<...>

PTO Assembly Form

%dst = tpartadd %src0, %src1 : !pto.tile<...> -> !pto.tile<...>
# AS Level 2 (DPS)
pto.tpartadd ins(%src0, %src1 : !pto.tile_buf<...>, !pto.tile_buf<...>) outs(%dst : !pto.tile_buf<...>)