These instructions apply to all AI-assisted contributions to TPBench. Breaching these instructions can result in automatic banning.
The basic version supports single-core kernels and pthread-based multithreaded kernels.
```bash
# Execute in the root folder of the workspace.
$ cmake -B build
$ cmake --build build --config Release
```

Optional: list the `TPB_*` CMake options and registered kernels (no compilation):

```bash
cmake --build build --target tpb_cmake_help
```

Step 1: Run the ctest suite. If any tests fail, investigate and fix the issues.
```bash
# Execute in the root folder of the workspace.
$ cd build
$ ctest
```

Step 2: Run a single-core kernel. If the kernel does not finish normally, or if the reported `triad_bw_walltime` is not larger than 1000 MB/s, investigate and fix the issues.
```bash
# Execute in the root folder of the workspace.
$ ./build/bin/tpbcli run --kernel stream --kargs stream_array_size=524288,ntest=100
```

Step 3 (optional): Verify that the results are recorded in the database:
```bash
# List recent runs
./build/bin/tpbcli db list
# Check the latest log file for detailed output
LOG_FILE=$(ls -t ~/.tpbench/rafdb/log/tpbrunlog_*.log | head -1)
tail -50 "$LOG_FILE"
```

The following file tree shows the source code organization. Items in `.gitignore` (build artifacts, binaries, data files) are excluded.
```
TPBench/
├── CMakeLists.txt # Root CMake configuration
├── cmake/
│   ├── TPBenchConfig.cmake.in # CMake package config template
│   ├── TPBenchKernel.cmake # Kernel registration module (out-of-tree PLI)
│   ├── TPBenchKernelRegistry.cmake # CPU + ROCm kernel catalogs (build + tpb_cmake_help)
│   ├── TPBenchKernelSelect.cmake # Tag/name-based kernel selection logic
│   ├── TPBenchGpuKernelsRocm.cmake # ROCm targets from TPB_ROCM_KERNEL_DEFS
│   ├── TPBenchInstallRpath.cmake # RPATH helpers for install targets
│   └── TPBenchCmakeHelp.cmake # Generates tpb_cmake_help.txt
│
├── docs/
│   ├── API_Reference.md # Public API documentation
│   ├── STYLE_GUIDE.md # Code style guidelines
│   ├── USAGE.md / USAGE_CN.md # User manual (EN/CN)
│   ├── arts/ # Diagrams and illustrations
│   ├── design/ # Design documents (EN/CN)
│   └── howtos/ # How-to guides (EN/CN)
│
├── setup/
│   ├── Make.* # Platform-specific Makefiles
│   └── yaml/
│       ├── default.yml # Default benchmark configuration
│       └── *_template.yml # Benchmark templates
│
├── src/
│   ├── tpbcli.c # CLI entry point (top-level `tpbcli-argp` dispatch)
│   ├── tpbcli-argp.c/h # Tree-based CLI parser (`tpbcli run`, `database`, etc.)
│   ├── tpbcli-run.c/h # `run` subcommand
│   ├── tpbcli-run-dim.c/h # `run-dim` subcommand (dimensional sweep)
│   ├── tpbcli-benchmark.c/h # `benchmark` subcommand
│   ├── tpbcli-kernel.c/h # `kernel` subcommand (dispatch)
│   ├── tpbcli-kernel-list.c/h # `kernel list` / `ls`
│   ├── tpbcli-database*.c/h # `database` / `db` subcommand (argp tree; list, dump)
│   ├── tpbcli-help.c/h # `help` subcommand
│   ├── tpb-bench-yaml.c/h # YAML benchmark configuration parser
│   ├── tpb-bench-score.c/h # Benchmark scoring
│   ├── tpb-timer.c/h # Timer abstraction
│   │
│   ├── include/
│   │   ├── tpb-public.h # Public API header
│   │   ├── tpb-unitdefs.h # Unit definitions
│   │   └── tpbench.h* # Version header (generated)
│   │
│   ├── corelib/
│   │   ├── CMakeLists.txt # Core library build config
│   │   ├── tpb-driver.c/h # Kernel driver (load/execute kernels)
│   │   ├── tpb-dynloader.c/h # Dynamic library loader (dlopen/dlsym)
│   │   ├── tpb-impl.c/h # Internal implementation
│   │   ├── tpb-io.c/h # I/O utilities
│   │   ├── tpb-stat.c/h # Statistics collection
│   │   ├── tpb-argp.c/h # Argument parsing
│   │   ├── tpb-unitcast.c/h # Unit conversion
│   │   ├── tpb-autorecord.c/h # Auto-recording (tbatch, task capsule)
│   │   ├── tpb-types.h # Internal type definitions
│   │   ├── tpb_corelib_state.c/h # Global state management
│   │   ├── tpb_corelib_mpi.c # MPI coordination (compiled when MPI found)
│   │   ├── tpb_corelib_mpi_stub.c # MPI stub (compiled when MPI absent)
│   │   ├── tpb-mpi_stub.c/h # MPI type stubs for non-MPI builds
│   │   ├── strftime.c/h # Time formatting
│   │   └── rafdb/ # Run-and-forget database backend
│   │       ├── tpb-raf-*.c/h # entry, id, magic, merge, record, workspace
│   │       ├── tpb-raf-types.h # RAFDB type definitions
│   │       └── tpb-sha1.c/h # SHA-1 checksum
│   │
│   ├── kernels/
│   │   ├── CMakeLists.txt # Kernel build; GLOB discovers sources in subdirectories
│   │   ├── kernels.h # Legacy placeholders (PLI uses dlopen/dlsym)
│   │   ├── simple/ # Single-process CPU kernels (tpbk_*.c)
│   │   ├── streaming_memory_access_mpi/ # MPI CPU kernels (see §1.1.4)
│   │   ├── stream/ # Reference STREAM sources (not built by default)
│   │   └── rocm/ # ROCm GPU kernels (tpbk_roofline*.hip/cpp)
│   │
│   ├── libpfc/ # Performance counter library (3rd party)
│   │   ├── include/libpfc*.h # PMU counter headers
│   │   ├── src/ # Library implementation
│   │   └── kmod/ # Kernel module for TSC access
│   │
│   ├── pmu/ # PMU enable utilities
│   │   ├── armv8/enable_pmu.c # ARMv8 PMU enabler
│   │   └── x86-64/pfckmod.c # x86 performance counter module
│   │
│   ├── timers/ # Timer implementations
│   │   ├── clock_gettime.c # POSIX timer
│   │   ├── tsc_asym.c # TSC-based timer
│   │   └── timers.h # Timer interface
│   │
│   └── utils/ # Utility programs
│       ├── pchase*.c # Cache line ping-pong latency test
│       ├── get_time_error.c # Timer error measurement
│       └── watch_cy_armv8.c # Cycle counter monitor
│
└── tests/
    ├── CMakeLists.txt # Test suite configuration
    ├── RunBuiltTest.cmake # Test runner script
    ├── corelib/ # Unit tests (raf, strftime, capsule, 1d_array_write, pli, mocks)
    ├── integration/ # merge_fork, merge_hybrid, merge_pthread, tri_tests_record
    └── tpbcli/ # CLI tests (B1 dimargs, B2 run argv, B3 argp, B4 database); CMakeLists.txt
```

| Component | Path | Purpose |
|---|---|---|
| CLI Frontend | `src/tpbcli*.c`, `src/tpbcli-argp.c` | Command-line interface; `tpbcli-argp` handles top-level dispatch plus `run` and `database`/`db`; YAML parsing and scoring support `benchmark` |
| Core Library | `src/corelib/` | Kernel loading, execution, result collection, auto-record |
| Benchmark Kernels | `src/kernels/simple/`, `src/kernels/streaming_memory_access_mpi/` | CPU kernels (single-process and MPI) |
| MPI Support | `src/corelib/tpb_corelib_mpi.c` (+ stub) | MPI-coordinated init, task write, capsule |
| GPU Kernels | `src/kernels/rocm/` | ROCm GPU benchmark implementations |
| rafdb (run-and-forget DB) | `src/corelib/rafdb/` | Persistent storage for benchmark results |
| Timer Backend | `src/timers/` | High-resolution timing implementations |
| PMU Support | `src/libpfc/`, `src/pmu/` | Hardware performance counter access |
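
The CLI rows above describe a tree-based parser in which `database` also answers to `db`. The C sketch below illustrates that idea under stated assumptions: `struct cli_node`, `cli_dispatch`, and the handlers are hypothetical names, not the `tpbcli-argp` API. Each node holds a name, an optional alias, a handler, and its children; dispatch descends while argv words match a child and passes the remaining words to the deepest matched node's handler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: receives the words left after dispatch. */
typedef int (*cli_handler)(int argc, char **argv);

struct cli_node {
    const char *name;
    const char *alias;                /* optional second spelling, may be NULL */
    cli_handler handler;              /* NULL if the node only groups children */
    const struct cli_node *children;
    size_t nchildren;
};

/* Return the child whose name or alias equals word, or NULL. */
static const struct cli_node *
cli_find_child(const struct cli_node *node, const char *word)
{
    for (size_t i = 0; i < node->nchildren; i++) {
        const struct cli_node *c = &node->children[i];
        if (strcmp(c->name, word) == 0 ||
            (c->alias != NULL && strcmp(c->alias, word) == 0))
            return c;
    }
    return NULL;
}

/* Descend while argv words match, then run the deepest node's handler
 * with the unmatched remainder. Returns -1 if that node has no handler. */
int cli_dispatch(const struct cli_node *root, int argc, char **argv)
{
    const struct cli_node *node = root;
    int i = 0;
    while (i < argc) {
        const struct cli_node *next = cli_find_child(node, argv[i]);
        if (next == NULL)
            break;
        node = next;
        i++;
    }
    return node->handler ? node->handler(argc - i, argv + i) : -1;
}

/* Hypothetical handlers: return codes encode which one ran and argc. */
static int h_run(int argc, char **argv) { (void)argv; return 100 + argc; }
static int h_db_list(int argc, char **argv) { (void)argv; return 200 + argc; }

/* Example tree: tpbcli -> { run, database (alias db) -> { list } } */
static const struct cli_node db_children[] = {
    { "list", NULL, h_db_list, NULL, 0 },
};
static const struct cli_node top_children[] = {
    { "run", NULL, h_run, NULL, 0 },
    { "database", "db", NULL, db_children,
      sizeof db_children / sizeof db_children[0] },
};
static const struct cli_node cli_root = {
    "tpbcli", NULL, NULL, top_children,
    sizeof top_children / sizeof top_children[0]
};
```

With this tree, `db list` and `database list` reach the same handler, and a bare `db` returns -1 because the `database` node has no handler of its own. Storing the alias on the node, rather than duplicating the subtree, keeps both spellings pointing at one set of children.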
After building with `cmake --build build`, the output structure is:

```
build/
├── bin/
│   ├── tpbcli # Main CLI executable
│   ├── tpbk_*.tpbx # Kernel PLI executables (e.g., tpbk_stream.tpbx)
│   └── tests/ # Test executables
├── lib/
│   ├── libtpbench.so # Core TPBench library
│   └── libtpbk_*.so # Kernel shared libraries
└── etc/
    └── yaml/ # Installed configuration files
```
Source discovery. `src/kernels/CMakeLists.txt` locates `tpbk_<kern>.c` with `file(GLOB ... "/*/tpbk_${_kname}.c")` under any immediate subdirectory of `src/kernels/` (e.g. `simple/`, `streaming_memory_access_mpi/`). Place new CPU kernels accordingly; the registry name `<kern>` must match the `<kern>` part of the file name `tpbk_<kern>.c`.
Single-file kernels. A single file `tpbk_<kern>.c`, instrumented with the TPBench API and linked against `libtpbench.so`, produces two build targets:

- `libtpbk_<kern>.so`: `add_library` compiles the file without `TPB_K_BUILD_MAIN`. Corelib discovers the library under `lib/` and calls `dlsym("tpbk_pli_register_<kern>")` (see `src/corelib/tpb-dynloader.c`).
- `tpbk_<kern>.tpbx`: `add_executable` compiles the same file with the compile definition `TPB_K_BUILD_MAIN`, which enables `main()` so the driver can fork/exec it. The executable links only `tpbench` (not the kernel `.so`).
Registry vs source. In `cmake/TPBenchKernelRegistry.cmake`, `stream_mpi` is enabled when MPI is found; `scale_mpi`, `axpy_mpi`, `rtriad_mpi`, and `sum_mpi` have sources under `streaming_memory_access_mpi/`, but their registry rows are commented out until re-enabled.
Both corelib and the kernel call `tpbk_pli_register_<kern>()` to register parameters and static outputs (target metric information); dynamic outputs may still be added at run time by the runner.
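
A minimal sketch of the single-file layout, assuming a hypothetical kernel named `demo`. Only the `#ifdef TPB_K_BUILD_MAIN` split and the `tpbk_pli_register_<kern>` naming follow the description above; the `demo_kernel_info` struct, the runner name `tpbk_run_demo`, and the registration body are illustrative placeholders, not the TPBench API (real kernels register parameters and metrics through the public API in `tpb-public.h`).

```c
/* tpbk_demo.c: hypothetical single-file kernel layout.
 * Built without TPB_K_BUILD_MAIN -> library target (no main()).
 * Built with -DTPB_K_BUILD_MAIN  -> standalone executable with main(). */
#include <stddef.h>
#include <stdio.h>

/* Placeholder for the metadata a real kernel registers via the TPBench API. */
struct demo_kernel_info {
    const char *name;
    const char *metric;   /* name of the reported output */
    size_t default_n;     /* default array length */
};

static struct demo_kernel_info demo_info;

/* Registration entry point; in the library build, corelib would look this
 * symbol up by name with dlsym(). Returns 0 on success. */
int tpbk_pli_register_demo(void)
{
    demo_info.name = "demo";
    demo_info.metric = "triad_checksum";
    demo_info.default_n = 1024;
    return 0;
}

/* Hypothetical runner: triad a[i] = b[i] + s * c[i]; returns sum of a[]. */
double tpbk_run_demo(double *a, const double *b, const double *c,
                     size_t n, double s)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + s * c[i];
        sum += a[i];
    }
    return sum;
}

#ifdef TPB_K_BUILD_MAIN
/* Only the .tpbx executable gets main(); the driver fork/execs it. */
int main(void)
{
    enum { N = 8 };
    double a[N], b[N], c[N];

    if (tpbk_pli_register_demo() != 0)
        return 1;
    for (size_t i = 0; i < N; i++) {
        b[i] = 1.0;
        c[i] = 2.0;
    }
    printf("%s = %g\n", demo_info.metric, tpbk_run_demo(a, b, c, N, 3.0));
    return 0;
}
#endif
```

Compiled without the definition, the file provides only the registration function and the runner, which suits a shared library loaded with `dlopen`/`dlsym`; compiled with `-DTPB_K_BUILD_MAIN`, the same source also yields an executable entry point, so one file serves both targets.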
- New `TPB_*` build option: Add an `option()` or `set(... CACHE STRING "help")` in the root `CMakeLists.txt` (or in `src/kernels/CMakeLists.txt` for kernel-selection caches). Also append one `"VAR|same short description"` entry to `_tpb_cmake_help_doc_lines` in `cmake/TPBenchCmakeHelp.cmake`. That list is the fallback when CMake replaces the cache HELPSTRING (e.g. after `-DTPB_*=...` on the command line); without it, `tpb_cmake_help` would show a generic placeholder. Reconfigure and run `cmake --build build --target tpb_cmake_help` to verify. Existing kernel compile overrides are `TPB_KERNEL_CFLAGS`, `TPB_KERNEL_CXXFLAGS`, and `TPB_KERNEL_FFLAGS` in `src/kernels/CMakeLists.txt` (an empty value defaults to `-O2` for the relevant language).
- New CPU kernel: Add one row to `TPB_CPU_KERNEL_DEFS` in `cmake/TPBenchKernelRegistry.cmake` (`NAME|DEFAULT_TAGS|EXTRA_LINK_LIBS|CONDITION`). Add `tpbk_<kern>.c` under any `src/kernels/<subdir>/` that the GLOB can see (convention: `simple/` for single-process kernels). Implement `tpbk_pli_register_<kern>`, the runner, and (for the single-file pattern) `main` under `#ifdef TPB_K_BUILD_MAIN`. Build rules are generated in `src/kernels/CMakeLists.txt` (no per-kernel `add_library` in the root CMake). `src/kernels/kernels.h` is legacy; PLI does not require new `register_*` declarations there.
- New MPI CPU kernel: In the registry row, set `CONDITION` to `MPI_C_FOUND` and `EXTRA_LINK_LIBS` to `MPI::MPI_C` (see the existing `stream_mpi` row). The root `CMakeLists.txt` enables `find_package(MPI)` when a selected kernel needs it. Prefer `streaming_memory_access_mpi/` for MPI sources, and follow the single-file or dual-file layout described in §1.1.4.
- New ROCm kernel: Add one row to `TPB_ROCM_KERNEL_DEFS` in the same registry (`NAME|TAGS|PREREQ_TEXT|rocm|<hip path>|<pli main path>`, with paths relative to the source root). Do not hand-edit per-kernel `add_library` calls in the root CMake; build rules are generated by `cmake/TPBenchGpuKernelsRocm.cmake`.
- CLI Commands: Add new subcommands in `src/tpbcli-<cmd>.c/h` (plus benchmark/YAML helpers in `src/tpb-bench-*.c/h` if needed) and register them in `tpbcli.c` (top-level `tpbcli-argp` tree). Nested parsing for `database`/`db` lives in `tpbcli-database.c`.
- Corelib Feature Extensions: Extend `src/corelib/` for new TPBench features.
- Tests: Add unit tests in `tests/` following existing patterns. Ask users for the index.
- Limited Kernel Modification: When kernel code has to change because of front-end or corelib modifications, only modify and test `tpbk_stream.c` unless explicitly requested otherwise; do not touch other kernels.
- All code contributions MUST follow the style rules defined in `docs/STYLE_GUIDE.md`. Before writing or modifying any C code, review the whole style guide, obey it, and check compliance item by item.
- Documentation Updates: When modifying functionality that affects the user workflow (e.g., database queries, result retrieval, MPI recording), update the relevant documentation in `docs/USAGE.md` and in `docs/design` to reflect the changes. Ensure examples use realistic commands that users can actually run.
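
The four-field `TPB_CPU_KERNEL_DEFS` row layout (`NAME|DEFAULT_TAGS|EXTRA_LINK_LIBS|CONDITION`) can be illustrated with a small C helper that splits a row on `|` while keeping empty fields, assuming a single-process kernel may leave `EXTRA_LINK_LIBS` or `CONDITION` blank. This is only a sketch for reasoning about or linting registry rows; the CMake registry module does its own parsing. The helper name `split_registry_row` and the sample rows below are hypothetical (the `mpi` tag in particular is a guess), so check `cmake/TPBenchKernelRegistry.cmake` for the real tags and conditions.

```c
#include <stdio.h>
#include <string.h>

#define ROW_FIELDS 4   /* NAME | DEFAULT_TAGS | EXTRA_LINK_LIBS | CONDITION */
#define FIELD_MAX 64   /* per-field buffer size, including the terminator */

/* Split one registry row on '|' into exactly ROW_FIELDS fields.
 * Empty fields are kept as "" (unlike strtok, which would drop them).
 * Returns 0 on success, -1 on a wrong field count or an overlong field. */
int split_registry_row(const char *row, char out[ROW_FIELDS][FIELD_MAX])
{
    int nfield = 0;
    const char *start = row;

    for (;;) {
        const char *bar = strchr(start, '|');
        size_t len = bar ? (size_t)(bar - start) : strlen(start);

        if (nfield >= ROW_FIELDS || len >= FIELD_MAX)
            return -1;
        memcpy(out[nfield], start, len);
        out[nfield][len] = '\0';
        nfield++;
        if (bar == NULL)
            break;
        start = bar + 1;   /* continue after the delimiter */
    }
    return nfield == ROW_FIELDS ? 0 : -1;
}
```

For example, `split_registry_row("stream_mpi|mpi|MPI::MPI_C|MPI_C_FOUND", f)` returns 0 with `f[2]` set to `MPI::MPI_C`, while a row with three or five fields returns -1. Scanning with `strchr` rather than `strtok` matters here: `strtok` collapses consecutive delimiters, so a row such as `"demo|simple||"` would silently lose its empty fields.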