High-performance CUDA/C++ implementation for DMFE simulations. The code builds a reusable core library and a single executable that drives the simulation.
- Library:
DFME-core(static) — shared GPU/host modules - Executable:
RG-Evo— main entry point, placed in the project root on build
For build, run, inputs/outputs, and full CLI reference, see README.md.
I/O at a glance:
- Preferred format:
data.h5(HDF5), loaded via a runtime-optional wrapper; automatic fallback todata.binif HDF5 is unavailable at runtime. - Separate artifacts:
params.txt, histories (rvec.txt,energy.txt,qk0.txt, and in debug modestep_metrics.txtwith per-step runtime and memory), and compressed snapshots (QK_compressed,QR_compressed,t1_compressed.txt). - TUI/save telemetry: progress windows for main file [0.10..0.50], params [0.50..0.65], histories [0.65..0.80], compressed [0.80..0.90], with a concise "Save started"/"Save finished: " message pair.
Public headers are under include/, sources under src/:
- core: device/host utils, config/initialization, I/O helpers and save telemetry
- interpolation: grid and vector interpolation kernels and dispatchers
- convolution: convolution kernels
- math: math helpers and sigma kernels/wrappers
- EOMs: time-stepping and Runge–Kutta implementations
- search: search/utility kernels
- sparsify: sparsification helpers
- simulation: simulation control and runner (async/sync save orchestration)
- version: build/version info and compatibility helpers
- io: modular writers (HDF5 or binary), params/history/compressed outputs, runtime HDF5 loader
- Root
CMakeLists.txt— build configuration (C++/CUDA, targets DFME-core and RG-Evo)main.cpp— program entry (version print, CLI parse, init, run)build/— CMake build artifacts (generated)Grid_data/— precomputed grid assets (512/, 1024/, 2048/)Overview.md,README.md,build.sh
include/— public headers for all modules- core APIs:
config.hpp,globals.hpp,io_utils.hpp,gpu_memory_utils.hpp,device_utils.cuh,host_device_utils.hpp,host_utils.hpp - math:
math_ops.hpp,math_sigma.hpp - interpolation:
index_mat.hpp,index_vec.hpp,interpolation_core.hpp - EOMs:
time_steps.hpp,runge_kutta.hpp,rk_data.hpp - search:
search_utils.hpp - simulation:
simulation_control.hpp,simulation_runner.hpp,simulation_data.hpp - convolution:
convolution/convolution.hpp - versioning:
version_info.hpp,version_compat.hpp - io runtime:
io/h5_runtime.hpp
- core APIs:
src/— implementations by modulecore/— config, initialization, IO, device/host utils, GPU memoryinterpolation/— index_mat, index_vec, dispatchersconvolution/— convolution kernelsmath/— math_ops and sigma kernels/wrappersEOMs/— time_steps and runge_kuttasearch/— CPU/GPU search utilitiessimulation/— runner and control logicio/— modular writers and helpers (runtime-optional HDF5 loader, params/history/compressed writers)version/— version info and compatibility
The executable provides a grid subcommand to generate the non‑equidistant θ/ϕ grids and the interpolation metadata consumed at runtime. Outputs are written under Grid_data/<L>/.
Quick example:
./RG-Evo grid --len 512 --Tmax 100000 --dir 512 \
--interp-method index-poly --interp-order 9
# Optional smooth index remapping:
./RG-Evo grid -L 512 --alpha 0.25 --delta 0.5 --dir 512-custom
Key flags:
--len Lgrid size (512/1024/2048 supported by default);--Tmax Xsets long‑time scale for θ.--interp-method {poly|rational|bspline}and--interp-order nchoose the interpolation used to precompute weights.--fh-stencil m(rational only) sets the Floater–Hormann window size (m ≥ n+1). Default m=n+1 (equivalent topoly); wider m (e.g., n+3..n+7) blends overlapping local stencils for additional robustness on irregular grids while keeping locality.--spline-order ssets the B‑spline degree forint.dat(default 5).--alpha Xin [0,1] and--delta X ≥ 0apply an optional smooth non‑linear index remapping for θ; defaults are 0 (paper‑exact grid). Values are recorded ingrid_params.txt.
Defaults if omitted: --Tmax 100000, --interp-method index-poly, --interp-order 9, --spline-order 5, and for rational the default --fh-stencil is n+1.
Artifacts written: theta.dat, phi1.dat, phi2.dat, int.dat, posA1y.dat, posA2y.dat, posB2y.dat, and interpolation indices/weights for A1/A2/B2. The file grid_params.txt captures provenance, including alpha/delta when used.
Folder layout under Grid_data/<L>/ (brief):
- Grids:
theta.dat(N),phi1.dat(N×N),phi2.dat(N×N) - Integration:
int.dat(N) — open‑clamped B‑spline quadrature of degrees(default 5) - Positions:
posA1y.dat,posA2y.dat,posB2y.dat - Interp metadata:
indsA1y.dat,weightsA1y.dat,indsA2y.dat,weightsA2y.dat,indsB2y.dat,weightsB2y.dat
Validation mode: add --validate (or -V) to recompute θ/ϕ/int and compare against the saved files in Grid_data/<L>/ without writing outputs; returns non‑zero on mismatch.
Note: at simulation startup, grids for -L are auto‑provisioned if missing (symlink reuse or on‑the‑fly generation). See README “Automatic grid provisioning at startup”.
- Entry/CLI
main.cpp: prints version, callsparseCommandLineArguments, theninitandrunSimulationparseCommandLineArguments—src/core/config.cu, declared ininclude/config.hpp
- Initialization/State
init—src/core/initialization.cu, declared ininclude/initialization.hpprollbackState—src/simulation/simulation_control.cu, declared ininclude/simulation_control.hpp- Simulation data structures —
include/simulation_data.hpp
- Simulation loop
runSimulation—src/simulation/simulation_runner.cu, declared ininclude/simulation_runner.hpp// I/O- Utilities & telemetry —
src/io/io_utils.cpp, declared ininclude/io/io_utils.hpp - HDF5 runtime loader —
src/io/h5_runtime.cpp, interface ininclude/io/h5_runtime.hpp - Writers —
src/io/io_save_hdf5.cpp,src/io/io_save_binary.cpp,src/io/io_save_params.cpp,src/io/io_save_compressed.cpp, orchestrated bysrc/io/io_save_driver.cpp
- Interpolation
- Matrix:
indexMatAll(CPU) —src/interpolation/index_mat.cu;indexMatAllGPU(GPU) — same file; API ininclude/index_mat.hpp - Vector:
indexVecLN3,indexVecN,indexVecR2(CPU) andindexVec*GPU(GPU) —src/interpolation/index_vec.cu; API ininclude/index_vec.hpp - High-level orchestration —
src/interpolation/interpolation_core.cu; API ininclude/interpolation_core.hpp
- Matrix:
- Math/Sigma
computeSigmaKandRKernel(GPU kernel) —src/math/math_sigma.cu; decl ininclude/math_sigma.hppSigmaK*,SigmaR*host/GPU wrappers —src/math/math_sigma.cu; decls ininclude/math_sigma.hpp- Misc math helpers —
include/math_ops.hpp
- Device/Host utils
SubtractGPU,scalarMultiply,AddSubtractGPU,FusedUpdate—src/core/device_utils.cu; API ininclude/device_utils.cuhandinclude/host_device_utils.hpp
- Convolution
- Host/GPU convolution routines —
src/convolution/convolution.cu; API ininclude/convolution.hpp
- Host/GPU convolution routines —
- EOMs (time integration)
- Time-stepping kernels/launchers —
src/EOMs/time_steps.cu; API ininclude/time_steps.hpp - Runge–Kutta variants —
src/EOMs/runge_kutta.cu; API ininclude/runge_kutta.hpp
- Time-stepping kernels/launchers —
- Search utilities
- CPU/GPU search and kernels —
src/search/search_utils.cu; API ininclude/search_utils.hpp
- CPU/GPU search and kernels —
- Versioning
- Version info and string —
src/version/version_info.cu; API ininclude/version_info.hpp - Compatibility analysis/checks —
src/version/version_compat.cu; API ininclude/version_compat.hpp
- Version info and string —
- Executable:
test_interp_weights(built by CMake; output placed in the project root). - Source:
src/grid/test_interpolation_weights.cpp(standalone; no DFME‑core linkage). - What it does: generates a θ grid (len=512,
Tmax=1e5) and verifies that- polynomial (
poly), - rational (
rational, with default stencil), - rational with wider window (
m=n+5), and - B‑spline (global) weights
reproduce monomials up to degree
nat 10 query points in [0,1]. It prints the max abs error per variant and exits non‑zero on failure; on success it ends with “All variants PASSED”.
- polynomial (
Run after a normal build from the repo root:
./test_interp_weights