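TNOTIFY
Introduction
Sends a flag notification to a remote NPU. A 32-bit signal in the remote NPU's memory is either overwritten with a value or has the value atomically added to it. TNOTIFY is used for lightweight synchronization between NPUs when no bulk data needs to be transferred.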
Math Interpretation
For NotifyOp::Set:
\[ \mathrm{signal}^{\mathrm{remote}} = \mathrm{value} \]
For NotifyOp::AtomicAdd:
\[ \mathrm{signal}^{\mathrm{remote}} \mathrel{+}= \mathrm{value} \quad (\text{atomic}) \]
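The two update rules can be modelled on the host, with a `std::atomic<int32_t>` standing in for the remote signal word and threads standing in for NPUs. This is a minimal sketch under those assumptions, not the PTO implementation. `NotifyOpModel`, `RemoteSignal`, `notify`, and `run_senders` are hypothetical names. The sketch shows why concurrent `AtomicAdd` notifications accumulate, while concurrent `Set` notifications leave only one writer's value.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical host-side stand-ins; these are not part of the PTO API.
enum class NotifyOpModel { Set, AtomicAdd };

using RemoteSignal = std::atomic<int32_t>;

// Models signal^remote = value (Set) or signal^remote += value (AtomicAdd).
void notify(RemoteSignal &signal, int32_t value, NotifyOpModel op) {
    if (op == NotifyOpModel::Set) {
        signal.store(value);       // plain overwrite: the last writer wins
    } else {
        signal.fetch_add(value);   // atomic read-modify-write: no lost updates
    }
}

// Each of `senders` threads notifies a fresh signal (initially 0) once with `value`.
int32_t run_senders(int senders, int32_t value, NotifyOpModel op) {
    RemoteSignal signal{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < senders; ++i) {
        threads.emplace_back([&signal, value, op] { notify(signal, value, op); });
    }
    for (auto &t : threads) {
        t.join();
    }
    return signal.load();
}
```

For example, `run_senders(4, 1, NotifyOpModel::AtomicAdd)` returns 4, while `run_senders(4, 1, NotifyOpModel::Set)` returns 1 no matter how the threads interleave.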
Assembly Syntax
PTO-AS form (see the PTO-AS Specification):
```
tnotify %signal_remote, %value {op = #pto.notify_op<Set>} : (!pto.memref<i32>, i32)
tnotify %signal_remote, %value {op = #pto.notify_op<AtomicAdd>} : (!pto.memref<i32>, i32)
```
C++ Intrinsic
Declared in `include/pto/comm/pto_comm_inst.hpp`:
```cpp
template <typename GlobalSignalData, typename... WaitEvents>
PTO_INST void TNOTIFY(GlobalSignalData &dstSignalData, int32_t value, NotifyOp op, WaitEvents&... events);
```
Constraints
- Type constraints:
  - `GlobalSignalData::DType` must be `int32_t` (32-bit signal).
- Memory constraints:
  - `dstSignalData` must point to a remote address (on the target NPU).
  - `dstSignalData` should be 4-byte aligned.
- Operation semantics:
  - `NotifyOp::Set`: direct store to remote memory.
  - `NotifyOp::AtomicAdd`: hardware atomic add using the `st_atomic` instruction.
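The type and alignment constraints can be expressed as compile-time and run-time checks. The sketch below is hypothetical. `MockGlobalSignal` only imitates the `DType` member that the type constraint refers to, and `signal_type_ok` and `signal_address_ok` are illustrative helpers, not PTO functions. A check like this cannot tell whether an address actually lies on the target NPU.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in exposing a DType alias, like GlobalSignalData::DType.
template <typename T>
struct MockGlobalSignal {
    using DType = T;
    T *ptr;
};

// Type constraint: the signal element type must be int32_t.
template <typename GlobalSignalData>
constexpr bool signal_type_ok() {
    return std::is_same<typename GlobalSignalData::DType, int32_t>::value;
}

// Alignment constraint: the signal address should be 4-byte aligned.
inline bool signal_address_ok(const void *addr) {
    return reinterpret_cast<std::uintptr_t>(addr) % 4 == 0;
}

// Enforce the type constraint at compile time.
static_assert(signal_type_ok<MockGlobalSignal<int32_t>>(), "int32_t signal accepted");
static_assert(!signal_type_ok<MockGlobalSignal<int64_t>>(), "int64_t signal rejected");
```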
Examples
Basic Set Notification
```cpp
#include <pto/comm/pto_comm_inst.hpp>
using namespace pto;

void notify_set(__gm__ int32_t* remote_signal) {
    comm::Signal sig(remote_signal);
    // Set remote signal to 1
    comm::TNOTIFY(sig, 1, comm::NotifyOp::Set);
}
```
Atomic Counter Increment
```cpp
#include <pto/comm/pto_comm_inst.hpp>
using namespace pto;

void atomic_increment(__gm__ int32_t* remote_counter) {
    comm::Signal counter(remote_counter);
    // Atomically add 1 to remote counter
    comm::TNOTIFY(counter, 1, comm::NotifyOp::AtomicAdd);
}
```
Producer-Consumer Pattern
```cpp
#include <pto/comm/pto_comm_inst.hpp>
using namespace pto;

// Producer: notify when data is ready.
// remote_flag addresses the consumer NPU's flag (local_flag below).
void producer(__gm__ int32_t* remote_flag) {
    // ... produce data ...
    comm::Signal flag(remote_flag);
    comm::TNOTIFY(flag, 1, comm::NotifyOp::Set);
}

// Consumer: wait for data.
void consumer(__gm__ int32_t* local_flag) {
    comm::Signal flag(local_flag);
    // Wait until the flag written by the producer equals 1.
    comm::TWAIT(flag, 1, comm::WaitCmp::EQ);
    // ... consume data ...
}
```
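The same handshake can be sketched on the host with two threads. Here a `std::atomic<int32_t>` plays the consumer's `local_flag`, a release store plays `TNOTIFY(..., NotifyOp::Set)`, and an acquire spin-wait plays `TWAIT(..., WaitCmp::EQ)`. This is an analogy under those assumptions, not a description of how PTO orders remote writes. `run_handshake` and `payload` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical host-side model of the producer/consumer handshake.
int32_t run_handshake(int32_t produced_value) {
    int32_t payload = 0;              // bulk data written by the producer
    std::atomic<int32_t> flag{0};     // stands in for the consumer's local_flag
    int32_t consumed = -1;

    std::thread producer([&] {
        payload = produced_value;                  // ... produce data ...
        flag.store(1, std::memory_order_release);  // analogue of TNOTIFY(flag, 1, Set)
    });
    std::thread consumer([&] {
        // Analogue of TWAIT(flag, 1, EQ): spin until the flag equals 1.
        while (flag.load(std::memory_order_acquire) != 1) {
            std::this_thread::yield();
        }
        consumed = payload;                        // ... consume data ...
    });

    producer.join();
    consumer.join();
    return consumed;
}
```

In this model, the release/acquire pairing guarantees that the consumer sees the payload written before the flag was set. The sketch does not model how PTO orders a bulk data transfer relative to the notification.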