Compilation Process

This document explains the PTO operator compilation process, helping developers understand the complete workflow from source code to executable files.

Contents


1. Compilation Overview

1.1 Compilation Pipeline

PTO C++ Source (.cpp)
    ↓
Preprocessor (macro expansion, #include, #ifdef)
    ↓
C++ Frontend (lexer, parser, semantic analysis, AST)
    ↓
PTO Intrinsic Expansion (TLOAD/TSTORE/TADD → low-level instructions)
    ↓
Middle-end (optimization passes, IR generation)
    ↓
Backend (instruction selection, register allocation, code generation)
    ↓
Linker (symbol resolution, relocation)
    ↓
Executable / Shared Library

1.2 Required Tools

CMake (>= 3.16):

# Ubuntu/Debian
sudo apt install cmake

# macOS
brew install cmake

C++ Compiler (C++20 support): - GCC >= 13.0 - Clang >= 15.0 - MSVC 2022 (Windows)

Python (>= 3.8):

sudo apt install python3 python3-pip

1.3 Optional Tools

Ninja (faster builds):

sudo apt install ninja-build

ccache (compilation cache):

sudo apt install ccache
export CC="ccache gcc"
export CXX="ccache g++"

2. Build System Configuration

2.1 Minimal CMake Configuration

cmake_minimum_required(VERSION 3.16)
project(MyPTOOperator LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(PTO REQUIRED)

add_executable(my_operator src/my_operator.cpp)
target_link_libraries(my_operator PRIVATE PTO::pto)

2.2 Build Configuration

Backend Selection:

# CPU simulation
cmake -B build -DPTO_BACKEND=CPU

# NPU (A2/A3)
cmake -B build -DPTO_BACKEND=NPU -DSOC_VERSION=Ascend910B1

# NPU (A5)
cmake -B build -DPTO_BACKEND=NPU -DSOC_VERSION=Ascend910_9599

Build Types:

# Debug (no optimization, debug symbols)
cmake -B build -DCMAKE_BUILD_TYPE=Debug

# Release (full optimization)
cmake -B build -DCMAKE_BUILD_TYPE=Release

# RelWithDebInfo (optimization + debug symbols)
cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo

2.3 Build Commands

# Configure
cmake -B build -DCMAKE_BUILD_TYPE=Release

# Build
cmake --build build -j$(nproc)

# Test
ctest --test-dir build --output-on-failure

# Install
cmake --install build --prefix /path/to/install

3. Compilation Steps

3.1 Preprocessing

Macro Expansion:

// Source
#define TILE_SIZE 256
using TileT = Tile<TileType::Vec, float, 16, TILE_SIZE>;

// After preprocessing
using TileT = Tile<TileType::Vec, float, 16, 256>;

View Preprocessed Output:

g++ -E -P src/my_operator.cpp -o my_operator.i

3.2 Compilation

PTO Intrinsic Expansion:

// Source
TLOAD(tile, input);

// Expanded to low-level instructions
__builtin_pto_load(tile.data(), input.data(), tile.size(), tile.alignment());

Generate Object File:

g++ -std=c++20 -O3 -c src/my_operator.cpp -o build/my_operator.o

3.3 Linking

Symbol Resolution:

my_operator.o:
  - Defines: main, my_kernel
  - References: TLOAD, TSTORE, TADD

libpto.a:
  - Defines: TLOAD, TSTORE, TADD, ...

Linker resolves:
  my_operator.o::TLOAD → libpto.a::TLOAD ✓

Generate Executable:

g++ build/my_operator.o -L/path/to/pto/lib -lpto -o build/my_operator

4. Compilation Options

4.1 Optimization Levels

Option Use Case Performance
-O0 Debugging Slowest
-O1 Basic optimization Medium
-O2 Production (recommended) Fast
-O3 Maximum optimization Fastest
-Os Size optimization Medium
-Ofast Aggressive (may violate standards) Fastest

Example:

# Production build
g++ -O3 -march=native src/my_operator.cpp

# Debug build
g++ -O0 -g src/my_operator.cpp

4.2 Architecture-Specific Options

-march=native: Optimize for current CPU

g++ -O3 -march=native src/my_operator.cpp

-march=x86-64: Generic x86-64 code

g++ -O3 -march=x86-64 src/my_operator.cpp

4.3 Debug Options

Debug Symbols:

g++ -g src/my_operator.cpp
gdb ./my_operator

Sanitizers:

# Address sanitizer (memory errors)
g++ -fsanitize=address src/my_operator.cpp

# Undefined behavior sanitizer
g++ -fsanitize=undefined src/my_operator.cpp

4.4 Warning Options

g++ -Wall -Wextra -Wpedantic -Werror src/my_operator.cpp

5. Cross Compilation

5.1 x86 → ARM Cross Compilation

Install Toolchain:

sudo apt install g++-aarch64-linux-gnu

CMake Toolchain File:

# toolchain-aarch64.cmake
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)
set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)

Build:

cmake -B build -DCMAKE_TOOLCHAIN_FILE=toolchain-aarch64.cmake
cmake --build build

6. Compilation Optimization

6.1 Speed Up Compilation

Use Ninja:

cmake -B build -G Ninja
ninja -C build

Use ccache:

export CC="ccache gcc"
export CXX="ccache g++"
cmake -B build
cmake --build build

Parallel Build:

cmake --build build -j$(nproc)

Precompiled Headers:

target_precompile_headers(my_operator PRIVATE <pto/pto-inst.hpp>)

6.2 Reduce Binary Size

Strip Debug Symbols:

strip build/my_operator

Link-Time Optimization (LTO):

set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)

7. Troubleshooting

7.1 Common Compilation Errors

Error: Header not found

error: pto/pto-inst.hpp: No such file or directory

Solution:

export PTO_LIB_PATH=/path/to/pto-isa
cmake -B build -DPTO_ROOT=/path/to/pto-isa

Error: Static assertion failed

static_assert failed: "Tile shape not aligned"

Solution:

// Wrong: width 250 is not multiple of 16
using TileT = Tile<TileType::Vec, float, 16, 250>;

// Correct: width 256 is multiple of 16
using TileT = Tile<TileType::Vec, float, 16, 256>;

Error: Undefined reference

undefined reference to `pto::TLOAD(...)`

Solution:

target_link_libraries(my_operator PRIVATE PTO::pto)

7.2 Runtime Errors

Error: Shared library not found

error while loading shared libraries: libpto.so

Solution:

export LD_LIBRARY_PATH=/path/to/pto/lib:$LD_LIBRARY_PATH

References