Repo Context for AI Agents (PTO Tile Lib)
This document is a fast, practical orientation for agents working in this repo: what it is, where the key entrypoints live, and the shortest paths to build/run in CPU, NPU simulator (sim), and on-board NPU (npu) modes.
What This Repo Is
- PTO Tile Library: C++ headers + implementations for the PTO (Parallel Tile Operation) virtual ISA defined by Ascend CANN.
- Supports multiple backends:
  - CPU simulation (cross-platform, no Ascend driver/CANN required).
  - Ascend NPU backends split by SoC generation:
    - A2/A3 family: include/pto/npu/a2a3/ (selected via -v a3 in test scripts).
    - A5: include/pto/npu/a5/.
- Primary include for upper-layer code: #include <pto/pto-inst.hpp> (unified entry header).
Repo Map (Where To Look First)
- Project overview + common commands: README.md
- Detailed setup (CPU first, then NPU): docs/getting-started.md
- ISA docs and navigation: docs/README.md (ISA guide entry), docs/isa/ (per-instruction reference)
- Public API headers and backend status table: include/README.md
- Core public headers / backend split: include/pto/README.md
- Build/package entrypoint: build.sh, top-level CMakeLists.txt, cmake/
- Tests entrypoints:
  - CPU simulator tests: tests/run_cpu.py, tests/run_cpu_tests.sh
  - NPU ST build/run: tests/script/run_st.py, tests/run_st.sh
  - Test layout overview: tests/README.md
- Demos: demos/ (CPU demos used by tests/run_cpu.py --demo ...)
Run: CPU Simulator (Recommended First)
CPU simulation is meant to be the “works everywhere” correctness path.
From repo root:
python3 tests/run_cpu.py --clean --verbose
Useful variants:
python3 tests/run_cpu.py --testcase tadd
python3 tests/run_cpu.py --testcase tadd --gtest_filter 'TADDTest.*'
python3 tests/run_cpu.py --demo gemm --verbose
python3 tests/run_cpu.py --demo flash_attn --verbose
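For agents that assemble these invocations programmatically, the variants above can be sketched as a small argv builder. This is a minimal sketch, not part of the repo: `build_cpu_cmd` is a hypothetical helper, and it only emits the flags shown above (`--clean`, `--verbose`, `--testcase`, `--gtest_filter`, `--demo`). Whether `tests/run_cpu.py` accepts every combination of them is not verified here.

```python
import shlex


def build_cpu_cmd(testcase=None, gtest_filter=None, demo=None,
                  clean=False, verbose=False):
    """Return an argv list for tests/run_cpu.py (hypothetical helper).

    Only flags that appear in this document are emitted.
    """
    cmd = ["python3", "tests/run_cpu.py"]
    if clean:
        cmd.append("--clean")
    if testcase:
        cmd += ["--testcase", testcase]
    if gtest_filter:
        cmd += ["--gtest_filter", gtest_filter]
    if demo:
        cmd += ["--demo", demo]
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line; pass the list to subprocess.run
    # (from the repo root) to actually execute it.
    print(shlex.join(build_cpu_cmd(testcase="tadd", gtest_filter="TADDTest.*")))
```

Building a list instead of a single string avoids shell-quoting problems with filters such as `TADDTest.*`.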
Notes:
- CPU ST uses CMake and GoogleTest; it may download GTest if not installed system-wide.
- The compiler must support at least C++20 (see tests/cpu/st/CMakeLists.txt).
- Enabling bfloat16 support in CPU-SIM requires GCC >= 14.
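The GCC >= 14 note can be checked before a build with a short preflight. The sketch below is illustrative only: `gcc_major` and `supports_cpu_sim_bf16` are hypothetical helpers, and it assumes the `gcc` found on `PATH` is the compiler CMake will pick, which may not hold if `CC`/`CXX` point elsewhere.

```python
import re
import shutil
import subprocess


def gcc_major(version_text):
    """Extract the leading major version from text such as '14.2.0' or '13'."""
    match = re.match(r"\s*(\d+)", version_text)
    return int(match.group(1)) if match else None


def supports_cpu_sim_bf16(version_text, minimum=14):
    """True if the reported GCC major version meets the documented minimum."""
    major = gcc_major(version_text)
    return major is not None and major >= minimum


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        # `gcc -dumpversion` prints the compiler version (major, or full,
        # depending on how GCC was configured); only the major is used here.
        out = subprocess.run([gcc, "-dumpversion"],
                             capture_output=True, text=True).stdout
        print("GCC new enough for CPU-SIM bfloat16:", supports_cpu_sim_bf16(out))
```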
Run: NPU ST (Ascend) - sim and npu
NPU ST is built/run via tests/script/run_st.py:
python3 tests/script/run_st.py -r [sim|npu] -v [a3|a5] -t <testcase> -g <gtest_filter>
Key points:
- -v a3 selects the A2/A3 implementation under include/pto/npu/a2a3/ (the test script maps it to a SoC string like Ascend910B1).
- -r sim uses the Ascend simulator libraries under $ASCEND_HOME_PATH/tools/simulator/<SOC>/lib and runtime/lib64/stub.
- -r npu runs on real hardware.
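The option set above is small enough to validate before launching a long build. The following is a minimal sketch: `build_st_cmd` is a hypothetical helper, and the allowed values are taken only from the usage line in this document (`sim|npu`, `a3|a5`). It requires all four options because the usage line shows all four; whether `-g` is optional in `run_st.py` is not assumed either way.

```python
RUN_MODES = ("sim", "npu")      # values shown for -r in this document
SOC_VERSIONS = ("a3", "a5")     # values shown for -v in this document


def build_st_cmd(run_mode, soc_version, testcase, gtest_filter):
    """Return an argv list for tests/script/run_st.py (hypothetical helper)."""
    if run_mode not in RUN_MODES:
        raise ValueError(f"run mode must be one of {RUN_MODES}, got {run_mode!r}")
    if soc_version not in SOC_VERSIONS:
        raise ValueError(f"SoC version must be one of {SOC_VERSIONS}, got {soc_version!r}")
    return [
        "python3", "tests/script/run_st.py",
        "-r", run_mode,
        "-v", soc_version,
        "-t", testcase,
        "-g", gtest_filter,
    ]


if __name__ == "__main__":
    # Same single-case example as in this document, in simulator mode.
    print(" ".join(build_st_cmd("sim", "a3", "tadd",
                                "TADDTest.case_float_64x64_64x64")))
```

Rejecting unknown modes up front gives a clearer error than a failure deep inside the build.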
Examples (single case):
python3 tests/script/run_st.py -r sim -v a3 -t tadd -g TADDTest.case_float_64x64_64x64
python3 tests/script/run_st.py -r npu -v a3 -t tadd -g TADDTest.case_float_64x64_64x64
Recommended suites (wrapper script):
chmod +x ./tests/run_st.sh
./tests/run_st.sh a3 sim simple
./tests/run_st.sh a3 npu simple
Environment: Ascend CANN / Toolkit
NPU ST requires a working Ascend environment. Typical setup (choose the correct install path):
source /usr/local/Ascend/cann/bin/setenv.bash
# or
source $HOME/Ascend/ascend-toolkit/latest/bin/setenv.bash
tests/script/run_st.py expects ASCEND_HOME_PATH to be set (usually done by setenv.bash).
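A quick preflight can confirm the environment before invoking the NPU ST script. This is a sketch under stated assumptions: `check_ascend_env` and `simulator_lib_dir` are hypothetical helpers (not repo APIs), and the simulator path layout simply mirrors the `$ASCEND_HOME_PATH/tools/simulator/<SOC>/lib` pattern described in this document.

```python
import os
from pathlib import Path


def check_ascend_env(env=None):
    """Return (ok, detail) describing whether ASCEND_HOME_PATH looks usable."""
    env = os.environ if env is None else env
    home = env.get("ASCEND_HOME_PATH")
    if not home:
        return False, "ASCEND_HOME_PATH is not set; source setenv.bash first"
    if not Path(home).is_dir():
        return False, f"ASCEND_HOME_PATH points to a missing directory: {home}"
    return True, home


def simulator_lib_dir(ascend_home, soc):
    """Compose the simulator library directory for a SoC string."""
    return Path(ascend_home) / "tools" / "simulator" / soc / "lib"


if __name__ == "__main__":
    ok, detail = check_ascend_env()
    if not ok:
        print("Ascend environment problem:", detail)
    else:
        # Ascend910B1 is the example SoC string mentioned in this document.
        sim_dir = simulator_lib_dir(detail, "Ascend910B1")
        print("simulator libs:", sim_dir, "(present)" if sim_dir.is_dir() else "(missing)")
```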
Common Pitfalls (And How This Repo Handles Them)
- GTest ABI mismatch on Linux: some systems have libgtest*.a built with _GLIBCXX_USE_CXX11_ABI=0.
  - CPU and NPU ST CMake projects support PTO_GLIBCXX_USE_CXX11_ABI=auto|0|1 and auto-detect when possible.
- sim open-files limit: simulator runs may require a higher ulimit -n (see docs/getting-started.md and build.sh).
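The open-files limit can be inspected from Python before a simulator run. The sketch below uses the standard `resource` module (Unix only); `needs_higher_limit` is a hypothetical helper, and the `wanted` value of 65536 is a placeholder assumption, not a number taken from this repo. The value the simulator actually needs should come from docs/getting-started.md or build.sh.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def needs_higher_limit(soft, wanted):
    """True if the soft limit is finite and below the wanted value."""
    if soft == resource.RLIM_INFINITY:
        return False
    return soft < wanted


if __name__ == "__main__":
    wanted = 65536  # placeholder assumption; check the repo docs for the real value
    soft, hard = nofile_limits()
    if needs_higher_limit(soft, wanted):
        print(f"soft limit {soft} is below {wanted}; try `ulimit -n {wanted}` "
              f"in the launching shell (hard limit: {hard})")
    else:
        print(f"open-files soft limit {soft} looks sufficient")
```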