The semantic analysis pass sits between parsing and IR lowering in the compilation
pipeline. Its primary role is information gathering, not strict type checking:
it walks the AST to build a rich collection of type metadata, constant values, and
function signatures that the IR lowerer consumes. The pass does not reject programs
with type errors (with a few exceptions noted below); instead, it populates the
SemaResult structure so the lowerer can make informed decisions without
re-deriving type information from the AST.
Parser AST ────> SemanticAnalyzer.analyze(&ast)
|
|-- symbol table (scoped)
|-- function signatures
|-- TypeContext (typedefs, struct layouts, enum constants)
|-- ExprTypeMap (expression -> CType annotations)
+-- ConstMap (expression -> compile-time IrConst values)
|
SemanticAnalyzer.into_result() --> SemaResult
|
Lowerer::with_type_context(sema_result.type_context,
sema_result.functions,
sema_result.expr_types,
sema_result.const_values, ...)
| File | Responsibility |
|---|---|
analysis.rs |
Main pass: AST walk, declaration/statement/expression analysis, -Wreturn-type, implicit declarations, TypeConvertContext trait impl |
type_context.rs |
TypeContext -- module-level type state (struct layouts, typedefs, enum constants, ctype cache) with undo-log scope management |
type_checker.rs |
ExprTypeChecker -- expression-level CType inference using only sema-available state |
const_eval.rs |
SemaConstEval -- compile-time constant evaluation returning IrConst values |
builtins.rs |
Static database mapping __builtin_* names and _mm_* intrinsics to their lowering behavior |
The analyzer is created, configured, run, and consumed in a strict linear sequence:
let mut sema = SemanticAnalyzer::new(); // 1. Create
sema.set_diagnostics(engine); // 2. Configure (optional)
sema.analyze(&ast)?; // 3. Run -- returns Err(count) on errors
let mut diagnostics = sema.take_diagnostics(); // 4. Reclaim diagnostic engine
let result: SemaResult = sema.into_result(); // 5. Consume -- moves ownershipnew() initializes the symbol table, creates an empty SemaResult, and
pre-populates implicit function declarations for common libc functions (printf,
malloc, memcpy, strcmp, etc.) so that C89-style code that calls functions
without prototypes produces warnings rather than errors.
analyze(&mut self, tu: &TranslationUnit) -> Result<(), usize> walks every
top-level declaration and function definition. Errors are emitted through the
DiagnosticEngine with source spans. The method returns Ok(()) if no hard
errors occurred, or Err(n) with the error count.
into_result(self) -> SemaResult consumes the analyzer and returns the
accumulated results. The method simply returns self.result. Both sema's
declaration processing and expression type inference use
TypeContext::next_anon_struct_id() (a single shared counter), so no separate
counter synchronization is needed.
The SemanticAnalyzer struct holds the following fields:
| Field | Type | Purpose |
|---|---|---|
symbol_table |
SymbolTable |
Scoped variable/function name resolution. |
result |
SemaResult |
Accumulated output for the lowerer. |
enum_counter |
i64 |
Auto-incrementing counter for enum variant values. |
diagnostics |
RefCell<DiagnosticEngine> |
Structured error/warning reporting. Uses RefCell so &self methods (e.g., eval_const_expr_as_usize) can emit diagnostics. |
defined_structs |
RefCell<FxHashSet<String>> |
Set of struct/union keys that have full definitions (not just forward declarations). Used to distinguish struct X { ... } from struct X; for incomplete type checking. Uses RefCell since resolve_struct_or_union takes &self. |
SemanticAnalyzer implements the TypeConvertContext trait from
common::type_builder. This trait is the mechanism for shared type-building
logic between sema and the lowerer. The trait provides a default
resolve_type_spec_to_ctype method that handles all 22 primitive C types,
pointers, arrays, and function pointers identically. SemanticAnalyzer
supplies the five required methods (four for type resolution, one for
constant expression evaluation):
| Trait Method | Sema Behavior |
|---|---|
resolve_typedef(name) |
Looks up type_context.typedefs; falls back to CType::Int. |
resolve_struct_or_union(...) |
Computes layout, inserts into TypeContext, tracks definitions in defined_structs. |
resolve_enum(...) |
Returns CType::Enum preserving variant values for packed-size computation. |
resolve_typeof_expr(expr) |
Delegates to ExprTypeChecker::infer_expr_ctype; falls back to CType::Int. |
eval_const_expr_as_usize(expr) |
Evaluates via SemaConstEval; emits "size of array is negative" error for negative values. |
SemaResult bundles everything the IR lowerer needs from semantic analysis:
pub struct SemaResult {
pub functions: FxHashMap<String, FunctionInfo>,
pub type_context: TypeContext,
pub expr_types: ExprTypeMap, // FxHashMap<ExprId, CType>
pub const_values: ConstMap, // FxHashMap<ExprId, IrConst>
}Every function encountered during analysis -- whether defined, declared as a
prototype, or implicitly declared from a call site -- gets a FunctionInfo
entry:
pub struct FunctionInfo {
pub return_type: CType,
pub params: Vec<(CType, Option<String>)>,
pub variadic: bool,
pub is_defined: bool,
pub is_noreturn: bool,
}The is_noreturn flag is sticky: if a prior declaration carries
__attribute__((noreturn)) or _Noreturn, the flag persists even when the
function definition does not repeat the attribute. Implicitly declared functions
(C89-style calls without prototypes) default to return_type: CType::Int,
variadic: true, is_defined: false.
The shared type-system state. Described in detail below.
Maps each AST expression node's ExprId (a stable identity derived from the
node's heap address) to its inferred CType. Populated bottom-up during the
AST walk: after recursing into sub-expressions, annotate_expr_type creates a
fresh ExprTypeChecker and records the result. The lowerer consults this map as
an O(1) fallback in get_expr_ctype() when its own IR-state-based inference
cannot determine the type.
Maps each expression's ExprId to its pre-computed IrConst compile-time
constant value. Populated alongside expr_types by annotate_const_value.
The lowerer uses this as an O(1) fast path before falling back to its own
eval_const_expr, which handles additional cases requiring IR-level state
(global addresses, runtime values).
TypeContext is the central type-system repository. Sema creates it, populates it
during the AST walk, and transfers it by ownership to the lowerer. The lowerer
continues to extend it (e.g., adding struct layouts discovered during lowering).
| Field | Type | Purpose |
|---|---|---|
struct_layouts |
RefCell<FxHashMap<String, RcLayout>> |
Struct/union field layouts indexed by tag name. Uses Rc<StructLayout> for cheap clone/lookup. |
enum_constants |
FxHashMap<String, i64> |
Enum constant name-to-value mapping. |
typedefs |
FxHashMap<String, CType> |
Typedef name-to-resolved-CType mapping. |
typedef_alignments |
FxHashMap<String, usize> |
Per-typedef alignment overrides from __attribute__((aligned(N))). |
function_typedefs |
FxHashMap<String, FunctionTypedefInfo> |
Bare function typedefs (e.g., typedef int func_t(int, int)). |
func_ptr_typedefs |
FxHashSet<String> |
Set of typedef names that are function pointer types. |
func_ptr_typedef_info |
FxHashMap<String, FunctionTypedefInfo> |
Function pointer typedef metadata. |
enum_typedefs |
FxHashSet<String> |
Typedef names that alias enum types (for unsigned bitfield treatment). |
packed_enum_types |
FxHashMap<String, EnumType> |
Packed enum definitions for forward-reference size lookups. |
func_return_ctypes |
FxHashMap<String, CType> |
Cached return types for known functions. |
ctype_cache |
RefCell<FxHashMap<String, CType>> |
Cache for named struct/union CType resolution. |
scope_stack |
RefCell<Vec<TypeScopeFrame>> |
Undo-log frames for scope management. |
anon_ctype_counter |
Cell<u32> |
Counter for generating unique anonymous struct/union keys. Uses Cell for interior mutability since type_spec_to_ctype takes &self. |
pub struct FunctionTypedefInfo {
pub return_type: TypeSpecifier, // AST type, NOT resolved CType
pub params: Vec<ParamDecl>, // AST declarations, NOT resolved types
pub variadic: bool,
}Note that return_type is a TypeSpecifier (an AST-level type representation)
and params is a Vec<ParamDecl> (AST parameter declarations), not resolved
CType values. This preserves the original AST syntax so the lowerer can
perform its own resolution with full lowering context. This applies to both
function_typedefs and func_ptr_typedef_info entries.
On construction, TypeContext::seed_builtin_typedefs() pre-populates typedef
mappings for standard C types (size_t, uint64_t, pid_t, FILE, pthread_t,
etc.) so that sema can resolve these types even when the source does not include
the relevant headers. The mappings are target-aware: on ILP32 (i686), size_t
maps to CType::UInt and int64_t maps to CType::LongLong; on LP64 (x86-64,
arm64, riscv64), they map to CType::ULong and CType::Long respectively.
Rather than cloning entire hash maps at scope boundaries (O(total-map-size) per
scope push), TypeContext uses an undo-log pattern. Each scope boundary
pushes a TypeScopeFrame that records:
- Added keys -- keys inserted during this scope (removed on pop).
- Shadowed keys -- keys that existed before and were overwritten, along with their previous values (restored on pop).
This gives O(changes-in-scope) cost for push/pop. The undo log tracks five
separate maps: enum_constants, struct_layouts, ctype_cache, typedefs,
and typedef_alignments.
pub struct TypeScopeFrame {
pub enums_added: Vec<String>,
pub struct_layouts_added: Vec<String>,
pub struct_layouts_shadowed: Vec<(String, RcLayout)>,
pub ctype_cache_added: Vec<String>,
pub ctype_cache_shadowed: Vec<(String, CType)>,
pub typedefs_added: Vec<String>,
pub typedefs_shadowed: Vec<(String, CType)>,
pub typedef_alignments_added: Vec<String>,
pub typedef_alignments_shadowed:Vec<(String, usize)>,
}Scope push/pop pairs are placed at:
- Function body entry/exit (both symbol table and type context).
- Compound statement (
{ ... }) entry/exit. forloop init clause (to scope loop-local declarations).
A key subtlety in pop_scope: when restoring a shadowed struct layout, the
system avoids restoring an empty forward-declaration layout over a full definition.
This ensures that a struct defined inside a function body but forward-declared
in an outer scope retains its complete layout after the inner scope exits.
Several TypeContext fields use RefCell (for maps) or Cell (for counters)
because type resolution methods that take &self -- particularly
type_spec_to_ctype called from the TypeConvertContext trait -- may need to
insert forward-declaration layouts or update the ctype cache. The scoped
insertion methods (insert_struct_layout_scoped_from_ref,
invalidate_ctype_cache_scoped_from_ref) borrow both the data map and the scope
stack to record undo entries.
ExprTypeChecker infers the CType of any AST expression using only sema-available
state: SymbolTable, TypeContext, and the FunctionInfo map. It does not depend
on IR-level state (allocas, IR types, global variable metadata), which makes it
suitable for use during semantic analysis before IR lowering begins.
The checker operates on immutable references and does not modify any state. It is
created fresh for each call to annotate_expr_type and borrows:
pub struct ExprTypeChecker<'a> {
pub symbols: &'a SymbolTable,
pub types: &'a TypeContext,
pub functions: &'a FxHashMap<String, FunctionInfo>,
pub expr_types: Option<&'a FxHashMap<ExprId, CType>>, // memoization cache
}The expr_types field enables memoization: because the AST walk in analyze_expr
is bottom-up, child expression types are already in the map when a parent expression
is analyzed. This turns what would be O(2^N) recursive re-evaluation on deeply nested
chains like 1+1+1+...+1 into O(N) with O(1) lookups.
The checker handles all expression forms in the AST:
- Literals:
IntLiteral->CType::Int,FloatLiteral->CType::Double,FloatLiteralF32->CType::Float,FloatLiteralLongDouble->CType::LongDouble,StringLiteral->Pointer(Char),WideStringLiteral->Pointer(Int),Char16StringLiteral->Pointer(UShort). Integer literal types are target-aware: on ILP32, values exceedingINT_MAXpromote toLongLong. Imaginary literal variants produce their complex type:ImaginaryLiteral->ComplexDouble,ImaginaryLiteralF32->ComplexFloat,ImaginaryLiteralLongDouble->ComplexLongDouble. - Identifiers: symbol table lookup, enum constant lookup (with GCC promotion
rules:
int->unsigned int->long long->unsigned long long), function signature lookup. - Unary operators:
&wraps inPointer,*peels offPointer/Array, arithmetic operators apply integer promotion,!producesint.__real__/__imag__(RealPart/ImagPart) extract the component type of a complex number (e.g.,ComplexDouble->Double). - Binary operators: comparison and logical operators produce
int; shift operators produce the promoted type of the left operand; arithmetic operators apply the usual arithmetic conversions; pointer arithmetic preserves the pointer type (or producesptrdiff_tfor pointer-pointer subtraction). GCC vector extensions are handled: if either operand is a vector type, the result is that vector type. - Casts: resolves the target
TypeSpecifiertoCType. - Function calls: looks up the callee's return type from the function table,
with special handling for
__builtin_choose_expr. sizeof/_Alignof: alwaysCType::ULong. Note: on ILP32,ULongis a 4-byte type, which is the correct size forsize_ton those platforms, even thoughseed_builtin_typedefs()mapssize_ttoCType::UInton ILP32. Both are 4-byte unsigned types, so the practical impact is nil.- Member access: resolves the struct/union type of the base expression, then looks up the field type in the struct layout.
- Ternary conditional: applies C11 6.5.15 composite type rules.
GnuConditional(Elvis operatora ?: b): applies the same composite type rules between the condition and the else expression.- Statement expressions (
({ ... })): type is the type of the last expression statement, with fallback resolution from declarations within the compound. _Generic: resolves the controlling expression type and matches against the association list.- Compound literals: type from the type specifier.
VaArg: type from the target type specifier.LabelAddr(&&label): alwaysPointer(Void).BuiltinTypesCompatibleP: alwaysCType::Int.
SemaConstEval evaluates constant expressions at compile time, returning IrConst
values that match the lowerer's richer result type. Like ExprTypeChecker, it
borrows sema state immutably and is created fresh per evaluation.
pub struct SemaConstEval<'a> {
pub types: &'a TypeContext,
pub symbols: &'a SymbolTable,
pub functions: &'a FxHashMap<String, FunctionInfo>,
pub const_values: Option<&'a FxHashMap<ExprId, IrConst>>, // memoization
pub expr_types: Option<&'a FxHashMap<ExprId, CType>>, // type cache
}| Expression | Handling |
|---|---|
| Integer/float/char literals | Delegates to shared common::const_eval::eval_literal |
Unary +, -, ~, ! |
Sub-int promotion, then arithmetic via common::const_arith |
| Binary ops | Type-aware signedness semantics: derives operand CTypes for width/signedness, delegates to shared eval_binop_with_types |
| Cast chains | Tracks bit-widths through nested casts; handles float-to-int, int-to-float, and 128-bit conversions with proper sign extension |
sizeof(type) / sizeof(expr) |
Resolves through sema's type system |
_Alignof(type) / __alignof__(type) |
C11 standard vs. GCC preferred alignment |
__alignof__(expr) |
Checks for explicit alignment on variable declarations |
Ternary ?: |
Evaluates condition, returns selected branch |
GnuConditional (Elvis a ?: b) |
Evaluates condition; if nonzero returns condition value, otherwise evaluates else branch |
__builtin_types_compatible_p |
Structural CType comparison |
offsetof patterns (&((T*)0)->member) |
Resolves struct field offsets, handles nested access and array subscripts |
| Enum constants | Direct i64 lookup in TypeContext.enum_constants |
| Builtin function calls | Delegates to shared common::const_eval::eval_builtin_call |
Logical && / ` |
The following require IR-level state and remain in the lowerer's const_eval.rs:
- Global address expressions (
&x, function pointers as values, array decay) func_stateconst local values- Pointer arithmetic on global addresses
Both const_values and expr_types caches prevent exponential blowup. For
deeply nested binary expressions (common in preprocessor-generated enum
initializers), each sub-expression is evaluated exactly once during the bottom-up
walk, and subsequent references are O(1) lookups.
A static LazyLock<FxHashMap<&str, BuiltinInfo>> maps GCC __builtin_* names
and Intel _mm_* intrinsic names to their lowering behavior. Each entry is a
BuiltinInfo with one of four BuiltinKind variants:
| Kind | Behavior | Examples |
|---|---|---|
LibcAlias(name) |
Emit a call to the named libc function | __builtin_memcpy -> memcpy, __builtin_printf -> printf |
Identity |
Return the first argument unchanged | __builtin_expect, __builtin_assume_aligned |
ConstantF64(val) |
Evaluate to a compile-time float constant | __builtin_nan -> NaN, __builtin_inf -> Infinity |
Intrinsic(kind) |
Requires target-specific codegen | __builtin_clz, __builtin_bswap32, _mm_set1_epi8, AES-NI, CRC32, etc. |
The BuiltinIntrinsic enum covers a wide range of operations:
- Bit manipulation: clz, ctz, popcount, bswap, ffs, parity, clrsb
- Overflow arithmetic:
__builtin_add_overflow,__builtin_sub_overflow,__builtin_mul_overflow(and unsigned/signed/predicate variants) - Floating-point classification: fpclassify, isnan, isinf, isfinite, isnormal, signbit, isinf_sign
- Floating-point comparison:
__builtin_isgreater,__builtin_isless,__builtin_islessgreater,__builtin_isunordered, etc. - Atomics:
__sync_synchronize(fence) - Complex numbers: creal, cimag, conj,
__builtin_complex - Variadic arguments: va_start, va_end, va_copy
- Alloca:
__builtin_alloca,__builtin_alloca_with_align(dynamic stack allocation) - Return/frame address:
__builtin_return_address,__builtin_frame_address - Thread pointer:
__builtin_thread_pointer(TLS base address) - Compile-time queries:
__builtin_constant_p,__builtin_object_size,__builtin_classify_type - Fortification:
__builtin___memcpy_chk,__builtin___strcpy_chk,__builtin___sprintf_chk, etc. (glibc_FORTIFY_SOURCEwrappers forwarded to their__*_chkruntime functions) - Variadic arg forwarding:
__builtin_va_arg_pack,__builtin_va_arg_pack_len(used in always-inline fortification wrappers) - CPU feature detection:
__builtin_cpu_init(no-op),__builtin_cpu_supports(conservatively returns 0) - Prefetch:
__builtin_prefetch(lowered as a no-op) - x86 SSE/SSE2/SSE4.1/AES-NI/CLMUL: complete 128-bit SIMD instruction coverage
The is_builtin() function extends the static map with direct recognition of
__builtin_choose_expr, __builtin_unreachable, and __builtin_trap (handled
by name in the lowerer's try_lower_builtin_call before the map lookup), plus
pattern-matched recognition of __atomic_* and __sync_* families (handled by
the lowerer's expr_atomics.rs). This prevents sema from emitting spurious
"implicit declaration" warnings for these builtins.
The analyzer maintains two parallel scope stacks:
-
SymbolTable(incommon::symbol_table) -- variable/function name resolution with lexical scoping.push_scope()/pop_scope()manage variable visibility. -
TypeContextscope stack -- undo-log for type-system state (struct layouts, typedefs, enum constants). Ensures that local struct definitions inside a function body do not overwrite global layouts.
Both stacks are pushed/popped together at function body boundaries and compound
statement boundaries. for loops get their own scope pair (for C99 loop-local
declarations like for (int i = 0; ...)).
When analyze_declaration encounters a declaration:
- Enum constants are collected recursively from the type specifier, walking into struct/union field types to catch inline enum definitions.
- The base type is resolved via
type_spec_to_ctype. For__auto_typedeclarations (a GCC extension), the type is inferred from the first declarator's initializer expression usingExprTypeChecker. - Typedefs are handled specially: the resolved CType is stored in
TypeContext.typedefs, function typedef info is extracted intofunction_typedefsorfunc_ptr_typedef_info, and alignment overrides are recorded intypedef_alignments. - Variable declarations build the full CType (including derived declarators
like pointers and arrays), resolve incomplete array sizes from initializers
(e.g.,
int arr[] = {1,2,3}), check for incomplete struct/union types, and register the symbol in the symbol table. - Function prototypes (declarations with
CType::Function) register in thefunctionsmap withis_defined: false.
analyze_expr recursively walks the expression tree:
- Identifier resolution: checks the symbol table, enum constants, builtin database, and function table. Emits an error for undeclared identifiers.
- Implicit function declarations: when a function call targets an unknown
name, sema emits a
-Wimplicit-function-declarationwarning and registers the function asreturn_type: Int, variadic: true. - Sub-expression recursion: descends into all child expressions.
- Type annotation (
annotate_expr_type): after sub-expressions are processed, creates anExprTypeCheckerand records the inferred CType inexpr_types. - Constant annotation (
annotate_const_value): creates aSemaConstEvaland records any compile-time constant value inconst_values.
The bottom-up order is critical: it ensures the memoization caches (expr_types,
const_values) are populated for child nodes before parent nodes are evaluated,
preventing exponential blowup on deep expression trees.
Sema emits structured diagnostics through the DiagnosticEngine, which supports:
- Errors:
- Undeclared identifiers
- Incomplete struct/union types in variable declarations ("storage size of 'struct/union X' isn't known")
- Incomplete struct/union types in
sizeofexpressions (check_sizeof_incomplete_type) - Non-integer switch controlling expressions
- Invalid struct/union field access ("'type' has no member named 'field'",
via
check_member_exists) - Negative array sizes ("size of array is negative", emitted from
eval_const_expr_as_usize) - Incompatible pointer/float implicit conversions in assignments, variable
initializers, and function call arguments (C11 6.5.16.1p1, via
check_pointer_float_conversion) - Incompatible pointer subtraction operand types (C11 6.5.6p3, via
check_pointer_subtraction_compat)
- Warnings:
-Wimplicit-function-declaration(C89-style implicit declarations),-Wreturn-type(non-void functions that may not return a value).
The diagnostic engine is transferred to sema via set_diagnostics and reclaimed
via take_diagnostics, preserving the source manager for span resolution.
Sema implements a control-flow analysis to detect non-void functions where control
can reach the closing } without a return statement. The analysis:
- Tracks whether each statement can "fall through" to the next.
- Recognizes
return,goto,break,continueas diverging statements. - Handles infinite loops (
while(1),for(;;)) as non-fall-through. - Analyzes
if/elsebranches: both must diverge for the wholeifto not fall through. - Handles
switchstatements: requires adefaultlabel and checks that no segment breaks out of the switch (segments that fall through to the next case are allowed). - Recognizes calls to noreturn functions (
abort,exit,__builtin_unreachable, and user-declared__attribute__((noreturn))functions). - Exempts
main()(C99 5.1.2.2.3: reaching}is equivalent toreturn 0), noreturn functions, and naked functions (body is pure inline assembly).
The driver orchestrates the handoff:
let sema_result = sema.into_result();
let lowerer = Lowerer::with_type_context(
target,
sema_result.type_context, // ownership transfer
sema_result.functions, // ownership transfer
sema_result.expr_types, // ownership transfer
sema_result.const_values, // ownership transfer
diagnostics,
gnu89_inline,
);
let (module, diagnostics) = lowerer.lower(&ast);The lowerer receives all four components by ownership:
TypeContextbecomes the lowerer's primary type-system state. It continues to extend it with struct layouts discovered during lowering.functionsprovides authoritative C-level function signatures. The lowerer falls back to these when its own ABI-adjustedFuncSiglacks info.expr_typesis consulted as a fast O(1) fallback inget_expr_ctype()after the lowerer's own IR-state-based inference.const_valuesis consulted as an O(1) fast path before the lowerer's owneval_const_expr, which handles additional cases requiring IR-level state.
Sema deliberately does not reject programs with type errors in most cases. The design philosophy is pragmatic: collect as much information as possible so the lowerer can do its job, rather than attempting full C type checking and risking rejection of valid programs that the type checker does not yet fully understand. The few hard errors sema does emit (undeclared identifiers, incomplete struct types in variable definitions, invalid member access, negative array sizes) are cases where the lowerer would definitely fail.
The choice to annotate every expression during a single bottom-up walk (rather
than on-demand top-down inference) was driven by performance. Preprocessor-generated
code frequently produces deeply nested expression chains (e.g., enum initializers
with hundreds of + operators). Without memoization, recursive type inference on
such chains exhibits O(2^N) behavior. The bottom-up walk with ExprTypeMap and
ConstMap caches guarantees O(N) total work.
The undo-log pattern for TypeContext scope management was chosen over
snapshot/clone because many scopes modify only a handful of entries while the
total map size can be large (hundreds of struct layouts, thousands of typedefs
from system headers). The undo-log makes scope push/pop proportional to the
number of changes within the scope rather than the total map size.
Expression type and constant annotations are keyed by ExprId, a type-safe wrapper
around each Expr node's heap address. This works because the AST is allocated once
during parsing and is never moved or reallocated before the lowerer consumes it.
This approach avoids the overhead of assigning explicit IDs to every AST node while
still providing stable, hashable identities.
Type inference and constant evaluation are separate modules because they serve different consumers and have different coverage:
ExprTypeCheckerproducesCTypefor every expression it can type.SemaConstEvalproducesIrConstonly for expressions that are compile-time constants.
Both share the same borrowed state pattern (immutable references, created fresh per invocation) and both use the same memoization caches, but they live in separate modules because their logic is fundamentally different (type algebra vs. arithmetic evaluation).
Pure arithmetic operations (literal evaluation, binary op arithmetic, sub-int
promotion) live in common::const_eval and common::const_arith, shared between
sema's SemaConstEval and the lowerer's const_eval.rs. This avoids duplicating
tricky signedness and bit-width logic while allowing each module to handle its
own context-specific cases.
-
Limited type checking. Sema validates pointer<->float type incompatibility in assignments, variable initializers, and function call arguments (C11 6.5.16.1p1), and checks pointer subtraction operand compatibility (C11 6.5.6p3), but does not perform full type compatibility checking. Return statement types, function pointer call arguments, and other implicit conversions are not validated. Programs with non-pointer/float type errors may pass sema and cause panics or incorrect codegen in the lowerer.
-
Incomplete
typeof(expr)support. TheExprTypeCheckerreturnsNonefor expressions it cannot type (complex expression chains through typeof, expressions requiring lowering-specific state). In such cases, sema falls back toCType::Int. -
No VLA support in constant evaluation. Variable-length arrays and runtime-computed array sizes are not handled by
SemaConstEval; they fall through to the lowerer. -
Conservative
-Wreturn-typeanalysis. The control flow analysis is intraprocedural and conservative. It may produce false positives (warning when all paths actually do return) in complex control flow patterns involving multiple gotos or computed jumps. It does not perform value-range analysis on conditions. -
Implicit function declarations. Sema pre-populates a set of common libc functions and emits warnings for unknown calls rather than errors. This matches C89 behavior but may mask genuine "undefined function" bugs.
-
Single-pass limitation. Sema makes a single forward pass over the AST. This means that a function called before its declaration (without a forward prototype) will be recorded as an implicit declaration with
intreturn type, even if the actual definition later in the file has a different return type. The lowerer handles such cases with its own multi-pass approach.