The preprocessor is a text-to-text transformation pass that processes C preprocessing directives before lexing. It consumes raw source text and produces expanded source text annotated with GCC-compatible line markers, ready for the lexer/parser pipeline.
The preprocessor implements the ISO C11 translation phases 2-4:
- Line splicing -- backslash-newline sequences are joined.
- Comment replacement --
/* ... */block comments are replaced with a single space (per C11 5.1.1.2 phase 3);// ...line comments are stripped. - Directive processing and macro expansion --
#define,#include,#if/#ifdef/#elif/#else/#endif,#pragma,#error,#warning,#line, and#undefare handled. All macro references in non-directive lines are expanded.
The entire pass operates on raw text (not tokens). All scanning is done on
byte slices (&[u8]) for performance, since C preprocessor tokens are ASCII.
UTF-8 multi-byte sequences inside string and character literals are copied
verbatim without interpretation.
┌───────────────────────────────────────────┐
raw source ──> │ join_continued_lines (phase 2) │
│ strip_block_comments (phase 3) │
│ ┌─────────────────────────────────────┐ │
│ │ per-line loop: │ │
│ │ directive? → process_directive() │ │
│ │ active? → expand macros │ │
│ │ inactive? → emit blank line │ │
│ └─────────────────────────────────────┘ │
└──────────────┬────────────────────────────┘
│
expanded text + line markers
│
┌──────────────────────┼──────────────────────────┐
│ side channels: │ │
│ errors[] │ warnings[] │
│ weak_pragmas[] │ redefine_extname[] │
│ macro_expansion_info[] │
└─────────────────────────────────────────────────┘
The output is a single String containing expanded C source with embedded line
markers of the form # <line> "<file>" [flags] (following GCC conventions:
flag 1 = entering an include, flag 2 = returning from an include). The
downstream lexer uses these markers to maintain accurate source locations.
| Method | Description |
|---|---|
Preprocessor::new() |
Create a new instance with predefined macros and builtin macro definitions already loaded. |
preprocess(&mut self, source: &str) -> String |
Main entry point. Expands the source and returns text with line markers. |
preprocess_force_include(&mut self, content, path) |
Process a -include file. Macros persist; pragma synthetic tokens are collected for prepending. |
| Method | Description |
|---|---|
set_filename(name) |
Set __FILE__, __BASE_FILE__, and push onto the include stack. |
define_macro(name, value) |
Define an object-like macro (equivalent to -D). |
undefine_macro(name) |
Remove a macro (equivalent to -U). |
add_include_path(path) |
Append to the -I search list. |
add_quote_include_path(path) |
Append to the -iquote search list. |
add_system_include_path(path) |
Append to the -isystem search list. |
add_after_include_path(path) |
Append to the -idirafter search list. |
set_target(target) |
Switch architecture ("aarch64", "riscv64", "i686", "i386"), updating predefined macros and include paths. |
set_pic(enabled) |
Define or undefine __PIC__/__pic__. |
set_optimize(opt, size) |
Define __OPTIMIZE__ and __OPTIMIZE_SIZE__. |
set_gnu89_inline(gnu89) |
Toggle between GNU89 and C99 inline semantics macros. |
set_sse_macros(no_sse) |
Control __SSE__/__SSE2__/__MMX__ definitions. |
set_extended_simd_macros(...) |
Define __SSE3__, __SSSE3__, __SSE4_1__, __SSE4_2__, __AVX__, __AVX2__ based on flags. |
set_asm_mode(asm_mode) |
Enable assembly preprocessing mode: $ is not treated as an identifier character (for AT&T assembly $MACRO_NAME expansion), and #pragma pack / #pragma GCC visibility synthetic tokens are suppressed (since these are C parser-specific and would cause assembler errors). |
set_riscv_abi(abi) |
Override RISC-V float ABI macros (lp64 = soft-float, lp64f = single, lp64d = double). |
set_riscv_march(march) |
Override RISC-V extension macros based on -march= (removes F/D macros when extensions not present). |
set_strict_ansi(strict) |
Define or undefine __STRICT_ANSI__ (set for -std=cXX non-GNU modes; checked by glibc and other headers to gate GNU extensions). |
| Method | Description |
|---|---|
errors() -> &[PreprocessorDiagnostic] |
Collected #error directives and unresolved #include failures. |
warnings() -> &[PreprocessorDiagnostic] |
Collected #warning directives. |
weak_pragmas |
Vec<(String, Option<String>)> -- #pragma weak declarations. |
redefine_extname_pragmas |
Vec<(String, String)> -- #pragma redefine_extname declarations. |
take_macro_expansion_info() |
Per-line records of which macros were expanded, for "in expansion of macro" diagnostics. |
dump_defines() -> String |
GCC -dM compatible dump of all defined macros. |
Every macro is stored in a MacroTable (a FxHashMap<String, MacroDef>)
which also carries an asm_mode flag. When asm_mode is true, $ is not
treated as an identifier character during macro expansion, so that AT&T
assembly immediates like $MACRO_NAME correctly trigger expansion of
MACRO_NAME. Each macro definition carries these fields:
pub struct MacroDef {
pub name: String,
pub is_function_like: bool,
pub params: Vec<String>, // empty for object-like macros
pub is_variadic: bool, // last param is `...`
pub has_named_variadic: bool, // e.g., `args...` (GNU extension)
pub body: String, // replacement text
}#define VERSION 42Expansion is straightforward: every occurrence of the identifier in non-directive text is replaced with the body. After substitution, the result is rescanned for further macro expansion.
#define MAX(a, b) ((a) > (b) ? (a) : (b))Expansion follows the C11 algorithm (sections 6.10.3 -- 6.10.3.4):
-
Argument collection --
parse_macro_argsmatches the opening(through the balanced closing), splitting on commas at depth 0. String and character literals are skipped so that commas and parentheses inside them do not confuse the parser. -
Argument prescan -- every argument is fully macro-expanded (
expand_text) before substitution. Arguments adjacent to#or##use the raw (unexpanded) form instead. -
Stringification and token pasting --
handle_stringify_and_pasteprocesses#paramanda ## boperators (see below). -
Parameter substitution --
substitute_paramsreplaces parameter names in the body with (expanded) argument text. -
Rescanning -- the substituted body is re-expanded with the macro's own name added to the "currently expanding" set, preventing self-referential recursion.
#param wraps the raw (unexpanded) argument text in double quotes, escaping
embedded " and \ characters per C11 6.10.3.2.
a ## b concatenates the text of the left and right operands into a single
token. When an operand is a parameter, the raw argument text is used (not
the prescan-expanded text). The pasted result is wrapped in internal
paste-protection markers (0x02/0x03) to prevent re-substitution during
the same parameter replacement pass; these markers are stripped before the
final output.
An anti-paste guard (would_paste_tokens) inserts protective spaces between
adjacent tokens that would otherwise form unintended multi-character tokens
(e.g., two / becoming a // comment, two + becoming ++).
#define LOG(fmt, ...) printf(fmt, __VA_ARGS__)The variadic arguments (everything after the last named parameter) are
collected as a single comma-separated string. Both the standard ... /
__VA_ARGS__ form and the GNU named-variadic extension (args...) are
supported.
The GNU , ## __VA_ARGS__ extension is implemented: when __VA_ARGS__ is
empty and appears to the right of ## preceded by a comma, the comma is
removed. This enables the common pattern:
#define DBG(fmt, ...) fprintf(stderr, fmt, ## __VA_ARGS__)
DBG("hello") // expands to: fprintf(stderr, "hello")A "currently expanding" set (FxHashSet<String>) tracks which macros are
actively being expanded. When an identifier matches a macro in this set, it is
blue-painted -- prefixed with a sentinel byte (0x01) that prevents it
from being recognized as a macro during rescanning. All blue-paint markers are
stripped from the final output.
This set is allocated once per preprocessing run and cleared/reused for each
source line (expand_line_reuse), avoiding per-line allocation overhead.
When a source line has unbalanced parentheses (more ( than )), the
preprocessor accumulates subsequent lines into a pending_line buffer until
parentheses balance. This handles macro invocations whose arguments span
multiple lines:
SOME_MACRO(
arg1,
arg2,
arg3
)A safety limit (MAX_PENDING_NEWLINES = 100,000) prevents runaway
accumulation from genuinely unbalanced source. The preprocessor also detects
when a line ends with a function-like macro name (even after expansion) and
accumulates the next line in case it starts with (.
Conditional directives (#ifdef, #endif, etc.) that appear inside
accumulated multi-line arguments are still processed to keep the conditional
stack correct, but are not added to the pending text.
| Directive | Effect |
|---|---|
#if <expr> |
Push a conditional; evaluate the constant expression. |
#ifdef <name> |
Push; true if the macro is defined. |
#ifndef <name> |
Push; true if the macro is not defined. |
#elif <expr> |
Flip the current conditional; evaluate if no earlier branch was taken. |
#else |
Flip to the remaining branch. |
#endif |
Pop the conditional stack. |
The ConditionalStack maintains a Vec<ConditionalState> where each entry
tracks:
- Whether any branch in the
#if/#elif/#elsechain has been taken. - Whether the current branch is active.
- Whether the parent context is active (for proper nesting).
Lines inside an inactive branch are replaced with empty lines to preserve
line numbering. Nested #if blocks within inactive branches are tracked
(push/pop) but their conditions are not evaluated.
#if and #elif expressions go through a multi-stage pipeline:
-
resolve_defined_in_expr-- Replacedefined(X),defined X,__has_builtin(X),__has_attribute(X),__has_feature(X),__has_extension(X),__has_include(X), and__has_include_next(X)with1or0. -
Macro expansion -- The resolved expression is expanded through the standard macro expansion pipeline.
-
Second
resolve_defined_in_exprpass -- Catches cases where macro expansion introduced new__has_*()calls. -
replace_remaining_idents_with_zero-- Per the C standard, any remaining identifiers (excepttrue/false) evaluate to0. -
evaluate_condition-- First expands simple object-like macros in the condition, then evaluates viaeval_const_expr.
The expression evaluator (eval_const_expr) is a recursive-descent parser
supporting:
- Integer literals (decimal, hex
0x, octal0) withU/L/LL/ULLsuffixes. - Character literals with escape sequences.
defined(X)/defined X.- The ternary operator
? :. - Logical:
&&,||,!. - Bitwise:
&,|,^,~. - Shifts:
<<,>>. - Relational:
==,!=,<,>,<=,>=. - Arithmetic:
+,-,*,/,%(including unary-and+). - Parenthesized sub-expressions.
Per C99 6.10.1, arithmetic uses intmax_t (signed) or uintmax_t (unsigned)
depending on whether any operand has a U suffix. Hex and octal literals
that exceed INT64_MAX are implicitly unsigned.
The search order is GCC-compatible and depends on the include style:
Quoted includes (#include "file.h"):
- Directory of the currently-processed file.
- Directory of the original source file.
-iquotepaths.-Ipaths.-isystempaths.- Default system paths (bundled headers,
/usr/include, GCC internal headers). -idirafterpaths.
System includes (#include <file.h>):
-Ipaths.-isystempaths.- Default system paths.
-idirafterpaths.
Path resolution results are cached in an include_resolve_cache
(FxHashMap) keyed by (include_path, is_system, current_dir) to avoid
repeated filesystem stat() calls.
Paths are made absolute without resolving symlinks (make_absolute
instead of canonicalize), matching GCC behavior where #include "..."
searches are relative to the symlink location, not the symlink target.
#include_next searches for the header starting from the next include
path after the one that contained the currently-processed file. This is used
by system headers to layer architecture-specific headers over generic ones.
Falls back to regular #include resolution if the current file cannot be
located in any search path.
When the #include argument does not start with " or <, it is
macro-expanded first. All spaces in the resulting path are stripped by
normalize_include_path to remove both anti-paste spaces (inserted by
would_paste_tokens to prevent // comment formation) and inter-token
spaces preserved from macro bodies. This normalization is only applied to
macro-expanded paths; direct include paths are left unchanged.
- Maximum depth of 200 (matching GCC's default).
- Files without
#pragma onceor include guards may be re-included (intentional, to support patterns like TCC's conditional self-inclusion). - Only excessive nesting of the same file path triggers the depth limit.
After preprocessing an included file, the raw source is scanned for the classic include guard pattern:
#ifndef GUARD_MACRO
#define GUARD_MACRO
...
#endifDetection is intentionally conservative -- it returns None for files with
#else/#elif at the outermost level, code before #ifndef, or content
after #endif. When a guard is detected and the guard macro is still defined
on subsequent #include of the same file, re-processing is skipped entirely
(same optimization as GCC and Clang).
When well-known standard headers are included (e.g., stdarg.h, stdbool.h,
complex.h), the preprocessor injects compiler-builtin macro definitions
regardless of whether the real system header is found. For example, including
stdarg.h defines va_start, va_end, va_copy, and va_arg as macros
expanding to __builtin_* forms. The va_list, __builtin_va_list, and
__gnuc_va_list types are handled natively by the parser/sema/lowerer
pipeline (not injected as typedef text, which would break when stdarg.h
is included from nested headers).
When a standard header file cannot be found on disk (no system headers installed or cross-compiling without a sysroot), the preprocessor injects minimal fallback declarations so that compilation can proceed. These are only injected when the real header is absent -- when a project provides its own headers (e.g., musl, dietlibc), the fallbacks are not used and cannot cause conflicts.
| Header | Fallback Declarations |
|---|---|
stdio.h |
typedef struct _IO_FILE FILE; and extern FILE *stdin, *stdout, *stderr; |
errno.h |
extern int errno; |
complex.h |
creal, crealf, creall, cimag, cimagf, cimagl, conj, conjf, conjl, cabs, cabsf, carg, cargf function declarations |
Pragmas are dispatched by handle_pragma and produce either side-channel
data or synthetic tokens injected into the output for the parser.
| Pragma | Behavior |
|---|---|
#pragma once |
Marks the current file in pragma_once_files; subsequent #include of the same file returns empty. |
#pragma pack(N) |
Emits __ccc_pack_set_N ; synthetic token. Suppressed in asm_mode. |
#pragma pack() |
Emits __ccc_pack_reset ;. Suppressed in asm_mode. |
#pragma pack(push, N) |
Emits __ccc_pack_push_N ;. Suppressed in asm_mode. |
#pragma pack(push) |
Emits __ccc_pack_push_only ;. Suppressed in asm_mode. |
#pragma pack(pop) |
Emits __ccc_pack_pop ;. Suppressed in asm_mode. |
#pragma push_macro("X") |
Saves the current definition of macro X onto macro_save_stack. |
#pragma pop_macro("X") |
Restores the previously saved definition of macro X. |
#pragma weak sym |
Appends (sym, None) to weak_pragmas. |
#pragma weak sym = tgt |
Appends (sym, Some(tgt)) to weak_pragmas. |
#pragma redefine_extname old new |
Appends (old, new) to redefine_extname_pragmas. |
#pragma GCC visibility push(V) |
Emits __ccc_visibility_push_V ; synthetic token. Suppressed in asm_mode. |
#pragma GCC visibility pop |
Emits __ccc_visibility_pop ;. Suppressed in asm_mode. |
Unrecognized pragmas (including #pragma GCC diagnostic ...) are silently
ignored.
On construction, define_predefined_macros populates the macro table from a
static table of (name, body) pairs. Categories include:
| Category | Examples |
|---|---|
| Standard C | __STDC__, __STDC_VERSION__ (201710L / C17), __STDC_HOSTED__ |
| Platform | __linux__, __unix__, __LP64__, __ELF__ |
| Architecture | __x86_64__, __amd64__ (default); __aarch64__, __ARM_NEON (via set_target) |
| GCC compat | __GNUC__ (14), __GNUC_MINOR__ (2), __VERSION__, __GNUC_STDC_INLINE__ |
| sizeof | __SIZEOF_POINTER__ (8), __SIZEOF_INT__ (4), __SIZEOF_LONG__ (8), etc. |
| Type limits | __INT_MAX__, __LONG_MAX__, __SIZE_MAX__, etc. |
| Type names | __SIZE_TYPE__ (long unsigned int), __PTRDIFF_TYPE__ (long int), etc. |
| Float characteristics | __FLT_MAX__, __DBL_EPSILON__, __LDBL_MANT_DIG__, etc. |
| Byte order | __BYTE_ORDER__ (__ORDER_LITTLE_ENDIAN__) |
| Atomics | __GCC_ATOMIC_INT_LOCK_FREE (2), etc. |
| SIMD | __SSE__, __SSE2__, __MMX__ (x86_64 only) |
| Date/Time | __DATE__, __TIME__ (static values) |
__LINE__ and __COUNTER__ are not stored in the macro table. They use
Cell<usize> fields on MacroTable and are expanded by special-case code in
expand_identifier, avoiding per-line MacroDef allocation:
| Macro | Behavior |
|---|---|
__LINE__ |
Current line number; stored in a Cell<usize> and updated each line via set_line(). Respects #line overrides. |
__COUNTER__ |
Monotonically incrementing counter (Cell<usize>); increments on each expansion. |
__FILE__ and __BASE_FILE__ are stored in the macro table as regular
object-like macros but have dedicated update APIs to avoid full MacroDef
reallocation on every #include:
| Macro | Behavior |
|---|---|
__FILE__ |
Current filename; updated in-place via MacroTable::set_file() (mutates the existing entry's body rather than inserting a new MacroDef). |
__BASE_FILE__ |
Always the top-level input filename (set once by set_filename(), does not change during #include). |
These are recognized by is_defined() for #ifdef and evaluated specially
in #if expressions by resolve_defined_in_expr:
__has_builtin(X)-- returns1for supported compiler builtins.__has_attribute(X)-- returns1for supported GCC attributes.__has_feature(X)/__has_extension(X)-- always0.__has_include(X)/__has_include_next(X)-- probes the include path resolution machinery and returns1if the header exists.
Since the preprocessor can operate without system headers, builtin_macros.rs
defines essential macros that would normally come from standard headers:
| Header | Macros Provided |
|---|---|
<limits.h> |
CHAR_BIT, INT_MAX, LONG_MAX, LLONG_MIN, etc. |
<stdint.h> |
INT8_MIN .. UINT64_MAX, INTPTR_MAX, SIZE_MAX, INT64_C(x), UINT32_C(x), etc. |
<stddef.h> |
NULL, offsetof(type, member) |
<stdbool.h> |
bool, true, false (injected on #include <stdbool.h>) |
<stdatomic.h> |
__ATOMIC_RELAXED .. __ATOMIC_SEQ_CST, memory_order_* |
<float.h> |
FLT_MAX, DBL_EPSILON, LDBL_DIG, FLT_EVAL_METHOD, etc. |
<inttypes.h> |
PRId8 .. PRIx64, PRIuMAX, PRIuPTR, etc. |
Additionally, define_type_traits_macros provides macros that supplement
the standard headers or serve as fallbacks when system headers are unavailable:
| Category | Macros Provided |
|---|---|
| Type widths (GCC extension) | __CHAR_WIDTH__ (8), __SHRT_WIDTH__ (16), __INT_WIDTH__ (32), __LONG_WIDTH__ (64), __LONG_LONG_WIDTH__ (64), __PTRDIFF_WIDTH__, __SIZE_WIDTH__, __WCHAR_WIDTH__, __WINT_WIDTH__, __SIZEOF_WCHAR_T__ |
| Byte order | __BYTE_ORDER__ (1234), __ORDER_LITTLE_ENDIAN__ (1234), __ORDER_BIG_ENDIAN__ (4321), __ORDER_PDP_ENDIAN__ (3412) |
| C11/C17 feature test | __STDC_UTF_16__, __STDC_UTF_32__, __STDC_NO_ATOMICS__, __STDC_NO_VLA__ |
<stdlib.h> |
EXIT_SUCCESS (0), EXIT_FAILURE (1), RAND_MAX |
<stdio.h> |
EOF (-1), BUFSIZ (8192), SEEK_SET (0), SEEK_CUR (1), SEEK_END (2) |
<errno.h> |
ENOENT, EINTR, EIO, ENOMEM, EACCES, EEXIST, ENOTDIR, EISDIR, EINVAL, ENFILE, EMFILE, ENOSPC, EPIPE, ERANGE, ENOSYS |
<signal.h> |
SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM, SIG_DFL, SIG_IGN, SIG_ERR |
| GCC version | __GNUC_PATCHLEVEL__ (0) |
The integer constant macros (INT64_C, UINT32_C, etc.) are function-like
macros that use ## for suffix pasting (e.g., INT64_C(x) expands to
x ## LL).
The returned String from preprocess() contains:
- An initial line marker:
# 1 "filename". - Expanded source lines with macros resolved.
- Blank lines preserving the original line numbering.
- Line markers at include boundaries (
# 1 "header.h" 1on entry,# N "parent.c" 2on return). - Synthetic pragma tokens (e.g.,
__ccc_pack_set_4 ;).
| Field | Type | Content |
|---|---|---|
errors |
Vec<PreprocessorDiagnostic> |
#error messages and unresolved #include paths, with file/line/col. |
warnings |
Vec<PreprocessorDiagnostic> |
#warning messages with file/line/col. |
weak_pragmas |
Vec<(String, Option<String>)> |
Weak symbol declarations and aliases. |
redefine_extname_pragmas |
Vec<(String, String)> |
External name redirections. |
macro_expansion_info |
Vec<MacroExpansionInfo> |
Maps output line numbers to expanded macro names for diagnostic rendering. |
PreprocessorDiagnostic carries file, line, col, and message fields,
enabling GCC-compatible file:line:col: error: / warning: output formatting.
| File | Purpose |
|---|---|
mod.rs |
Module declaration and re-exports. Exposes Preprocessor as the public type. |
pipeline.rs |
Core struct, preprocess() pipeline, directive dispatch (process_directive), line accumulation, and public configuration API. |
macro_defs.rs |
MacroDef and MacroTable types. All macro expansion logic: expand_text, expand_function_macro, handle_stringify_and_paste, substitute_params, argument parsing, blue-paint markers, paste-protection markers, and anti-paste guards. |
conditionals.rs |
ConditionalStack (push/pop state machine for #if nesting). evaluate_condition and expand_condition_macros for #if expression preprocessing. eval_const_expr recursive-descent expression evaluator and tokenizer. |
expr_eval.rs |
resolve_defined_in_expr (replaces defined(), __has_builtin(), __has_attribute(), __has_include(), etc. with 0/1). replace_remaining_idents_with_zero (C standard: undefined identifiers in #if are 0). |
includes.rs |
#include / #include_next handling. Include path resolution with GCC-compatible search order. Include guard detection (detect_include_guard). Path caching, symlink-preserving absolute path construction, computed include normalization, recursive inclusion depth limiting, and builtin header injection. |
predefined_macros.rs |
Static tables of predefined macros (standard C, platform, GCC compat, sizeof, type limits, float characteristics). set_target for architecture switching ("aarch64", "riscv64", "i686", "i386"). set_pic, set_optimize, set_gnu89_inline, set_sse_macros, set_extended_simd_macros, set_riscv_abi, set_riscv_march, set_strict_ansi configuration methods. bundled_include_dir and default_system_include_paths. |
pragmas.rs |
handle_pragma dispatcher. Handlers for once, pack, push_macro/pop_macro, weak, redefine_extname, and GCC visibility push/pop. Synthetic token emission for pack and visibility pragmas. |
builtin_macros.rs |
Macro definitions substituting for <limits.h>, <stdint.h>, <stddef.h>, <stdbool.h>, <stdatomic.h>, <float.h>, and <inttypes.h>. Includes both object-like and function-like (suffix-pasting) macros. |
text_processing.rs |
Low-level text transformations: strip_block_comments (with LineMap for line number remapping), join_continued_lines, has_unbalanced_parens, strip_line_comment (strips // comments outside string literals, returns Cow<str>), and split_first_word (splits directive keyword from arguments, treating ( as a word boundary). |
utils.rs |
Shared character/byte classification (is_ident_start, is_ident_cont, byte variants), bytes_to_str (&[u8] to &str for ASCII identifiers), string/char literal skipping and copying helpers. |
The preprocessor operates on raw text strings rather than a token stream. This is a deliberate trade-off:
- Simplicity: no need for a tokenizer that handles both pre- and post-expansion token grammars.
- Performance: byte-slice scanning with
&[u8]avoids allocation overhead. Hot paths usebytes_to_str(unchecked UTF-8 conversion for ASCII identifiers) and batch slice copies instead of per-character operations. - Compatibility: the output is a flat string with line markers, which is
the same interface GCC and Clang present to their parsers via
-E.
The FxHashSet<String> used for tracking "currently expanding" macros is
allocated once and reused across all lines via expand_line_reuse. A separate
reusable set (directive_expanding) is kept for directive-level expansions
(#if, #elif, #line, #error). This eliminates a measurable per-line
allocation overhead when preprocessing large headers.
Self-referential macro names are marked with a 0x01 prefix byte during
expansion. This is a single-byte sentinel chosen because it cannot appear in
valid C source. The markers survive through rescanning (preventing
re-expansion) and are stripped in one final pass over the output string.
Paste-protection markers (0x02/0x03) serve a similar role for ## results.
Several expansion helpers return Cow<str> to avoid allocating when the
common-case fast path applies (e.g., handle_stringify_and_paste returns
Cow::Borrowed(&mac.body) when the body contains no # characters,
strip_block_comments returns Cow::Borrowed(source) when there are no
block comments).
The preprocessor claims __GNUC__ 14, __GNUC_MINOR__ 2 to satisfy version
checks in the Linux kernel, glibc, QEMU, and other major projects. The
predefined macro set is tuned for an LP64 Linux/ELF/x86-64 target by
default, with set_target providing overrides for aarch64, riscv64,
i686, and i386.
__DATE__and__TIME__are static -- they expand to compile-time constants rather than the actual build timestamp._Float128is aliased tolong double-- on x86-64, this loses the difference between 80-bit extended precision and IEEE binary128.__has_featureand__has_extensionalways return0-- no Clang feature set is modeled._Pragma("...")is skipped -- the C99_Pragmaoperator is consumed and discarded during macro expansion rather than being executed.- No
#embedsupport -- the C23#embeddirective is not implemented. - No digraph/trigraph processing -- trigraphs (
??=,??/, etc.) and digraphs (<:,:>) are not translated in the preprocessor phase. - Expression evaluator does not support floating-point --
#ifexpressions are evaluated using integer arithmetic only (standard-compliant, since preprocessor expressions are integer constant expressions).