Kernels
This directory contains kernel/operator implementations that complement PTO Tile Lib.
Most kernel subdirectories are self-contained mini-projects (kernel + host + scripts) with their own README.md, CMakeLists.txt, and run.sh.
Where to start
- Manual (hand-tuned) NPU kernels: manual
- Custom operator scaffolding: custom
- End-to-end demos (including CPU): demos
Directory layout
- manual/: Hand-tuned kernels with explicit buffering/synchronization (NPU-focused)
  - manual/a2a3/: Kernels for A2/A3 platforms
    - manual/a2a3/gemm_performance/: High-performance GEMM example
    - manual/a2a3/conv2d_forward/: Conv2D forward kernel example
    - manual/a2a3/topk/: TopK kernel example
  - manual/a5/: Kernels for A5 platforms
    - manual/a5/flash_atten/: Flash-Attention kernel for A5
    - manual/a5/matmul_mxfp4_performance/: MXFP4 matrix multiplication example
    - manual/a5/matmul_mxfp8_performance/: MXFP8 matrix multiplication example
  - manual/common/: Cross-platform kernels
    - manual/common/flash_atten/: Flash-Attention kernel (A2/A3/A5)
- custom/: Examples/scaffolding for custom kernel/operator extensions
Notes
- Public interfaces live in include/; tests live in tests/.
- If you add a new kernel project here, prefer adding a small README.md and a run.sh so it can be discovered and executed consistently.
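To illustrate why a consistent README.md and run.sh make projects discoverable, here is a minimal sketch of a discovery helper. It is not part of this repository. The function name `find_kernel_projects` and the rule "any directory containing run.sh is a kernel project" are assumptions made for illustration only.

```python
from pathlib import Path

# Files that most kernel projects are described as carrying. Using them as a
# completeness check is an assumption of this sketch, not a repository rule.
EXPECTED_FILES = ("README.md", "CMakeLists.txt", "run.sh")


def find_kernel_projects(root):
    """Map each directory under `root` that contains a run.sh to the list of
    expected files it is missing (an empty list means nothing is missing)."""
    root = Path(root)
    projects = {}
    for run_script in sorted(root.rglob("run.sh")):
        project_dir = run_script.parent
        missing = [
            name for name in EXPECTED_FILES
            if not (project_dir / name).is_file()
        ]
        # Store paths relative to root, with forward slashes on every OS.
        projects[project_dir.relative_to(root).as_posix()] = missing
    return projects
```

Calling `find_kernel_projects(".")` from this directory would return one entry per project found, so a project that lacks a README.md shows up with that file in its missing list instead of being silently skipped.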