AGENTS.md

AI Agent Guidelines for Stringy

@GOTCHAS.md

Critical Rules

These rules are non-negotiable. Violations will cause CI failures.

No unsafe code - #![forbid(unsafe_code)] enforced
Zero warnings - cargo clippy -- -D warnings must pass
ASCII only - No emojis, em-dashes, smart quotes, or Unicode punctuation (except when explicitly testing or working with Unicode strings or emojis)
File size limit - Keep files under 500 lines; split larger files
No blanket #[allow] - Any allow requires inline justification
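
A minimal sketch of the last rule, assuming the justification is written as a comment directly above the attribute (the `reason = "..."` form of `#[allow]` would serve the same purpose). The function `clamp_len` and the chosen lint are hypothetical and only illustrate the shape:

```rust
// The crate root (lib.rs / main.rs) carries the crate-wide ban:
//
//     #![forbid(unsafe_code)]

/// Clamps a length to what fits in a u16 field (hypothetical helper).
// Justification: `min` bounds the value to u16::MAX, so the cast below
// cannot truncate.
#[allow(clippy::cast_possible_truncation)]
fn clamp_len(len: usize) -> u16 {
    len.min(usize::from(u16::MAX)) as u16
}
```

The allow is scoped to a single item and names a single lint, which keeps it from becoming a blanket exemption.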

Project Summary

Stringy extracts meaningful strings from ELF, PE, and Mach-O binaries using format-specific knowledge and semantic classification. Unlike the standard strings utility, it is section-aware and classifies what it finds.

Rust: Edition 2024, MSRV 1.91
Data flow: Binary -> Format Detection -> Container Parsing -> String Extraction -> Deduplication -> Classification -> Ranking -> Output

Module Structure

| Module | Purpose |
| --- | --- |
| `container/` | Format detection, section analysis, imports/exports via `goblin` |
| `extraction/` | ASCII/UTF-8/UTF-16 extraction, deduplication, PE resources |
| `classification/` | Semantic tagging (URLs, IPs, domains, paths, GUIDs), ranking |
| `output/` | Formatters: `json/`, `table/` (tty/plain), `yara/` |
| `pipeline/` | Orchestrator: config, filtering, score normalization, `Pipeline::run` |
| `types/` | Core data structures, error handling with `thiserror` |

Key Patterns

Section Weights

Container parsers assign weights (1.0-10.0) based on string likelihood. Higher = more valuable. See existing parsers in container/*.rs for reference values.
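
As a rough illustration of the weighting idea, the sketch below maps section names to weights in the 1.0-10.0 range. The function name and every numeric value are hypothetical; the reference values are the ones in the existing container/*.rs parsers.

```rust
/// Hypothetical helper: returns a weight in 1.0..=10.0 for a section name.
/// The numbers are illustrative only, not the values used by Stringy.
fn section_weight(name: &str) -> f32 {
    match name {
        // Read-only data usually holds string literals.
        ".rodata" | ".rdata" | "__cstring" => 10.0,
        // Writable data holds some strings mixed with other state.
        ".data" => 7.0,
        // Comment sections are mostly toolchain noise.
        ".comment" => 2.0,
        // Code and unknown sections rarely contain meaningful strings.
        _ => 1.0,
    }
}
```

A catch-all arm at the minimum weight keeps unknown sections in play without letting them outrank known string-bearing sections.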

Error Handling

Use thiserror with detailed context. Include offsets, section names, and file paths in error messages. Convert external errors with From implementations.
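
The sketch below shows the kind of context an error should carry. It is written against std only so that it runs standalone; in the project, thiserror's `#[derive(Error)]`, `#[error("...")]`, and `#[from]` attributes would generate the equivalent `Display`, `Error`, and `From` impls. The type `ParseError`, its variants, and `read_len` are hypothetical names.

```rust
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type; variant and field names are illustrative.
#[derive(Debug)]
enum ParseError {
    /// A read went past the end of a section.
    OutOfBounds { path: PathBuf, section: String, offset: u64 },
    /// An underlying I/O failure, converted via `From`.
    Io(io::Error),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The message names the offset, the section, and the file.
            Self::OutOfBounds { path, section, offset } => write!(
                f,
                "read out of bounds at offset {offset:#x} in section {section} of {}",
                path.display()
            ),
            Self::Io(err) => write!(f, "I/O error: {err}"),
        }
    }
}

impl std::error::Error for ParseError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Io(err) => Some(err),
            Self::OutOfBounds { .. } => None,
        }
    }
}

impl From<io::Error> for ParseError {
    fn from(err: io::Error) -> Self {
        Self::Io(err)
    }
}

/// `?` converts `io::Error` into `ParseError` through the `From` impl.
fn read_len(path: &Path) -> Result<u64, ParseError> {
    Ok(std::fs::metadata(path)?.len())
}
```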

Public API Structs

Use #[non_exhaustive] for public structs and provide explicit constructors. When using #[non_exhaustive] structs internally, always use the constructor pattern (Type::new()) rather than struct literals - struct literals bypass the forward-compatibility guarantee. See GOTCHAS.md for struct literal update checklists.
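
A minimal sketch of the constructor pattern, assuming a hypothetical `SectionInfo` struct (the name, fields, and default weight are illustrative). Note that `#[non_exhaustive]` only blocks struct literals in downstream crates; inside the defining crate a literal still compiles, which is why the rule has to be followed by convention.

```rust
/// Hypothetical public struct; the name and fields are illustrative.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct SectionInfo {
    pub name: String,
    pub offset: u64,
    pub weight: f32,
}

impl SectionInfo {
    /// Explicit constructor: a field added later gets its default here,
    /// so no call site has to change.
    pub fn new(name: impl Into<String>, offset: u64) -> Self {
        Self { name: name.into(), offset, weight: 1.0 }
    }

    /// Builder-style setter for an optional field.
    #[must_use]
    pub fn with_weight(mut self, weight: f32) -> Self {
        self.weight = weight;
        self
    }
}
```

Callers then write `SectionInfo::new(".rodata", 0x1000).with_weight(10.0)` instead of a struct literal.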

Test-Only Code

For test utilities that shouldn't be in production builds:

Add #[cfg(test)] to both the struct/type definition AND any impl blocks
Use pub(crate) visibility for internal test helpers
Keep test infrastructure in #[cfg(test)] mod tests blocks within the module
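
The three points above can be sketched as follows. `printable_len` and `BufferBuilder` are hypothetical names used only for illustration.

```rust
/// Production code: counts printable ASCII bytes (including spaces).
pub fn printable_len(bytes: &[u8]) -> usize {
    bytes
        .iter()
        .filter(|b| b.is_ascii_graphic() || **b == b' ')
        .count()
}

/// Test-only helper type: compiled only for test builds.
#[cfg(test)]
pub(crate) struct BufferBuilder {
    bytes: Vec<u8>,
}

// The impl block needs the same gate; without it, non-test builds
// would reference a type that does not exist.
#[cfg(test)]
impl BufferBuilder {
    pub(crate) fn new() -> Self {
        Self { bytes: Vec::new() }
    }

    pub(crate) fn text(mut self, s: &str) -> Self {
        self.bytes.extend_from_slice(s.as_bytes());
        self
    }

    pub(crate) fn nul(mut self) -> Self {
        self.bytes.push(0);
        self
    }

    pub(crate) fn build(self) -> Vec<u8> {
        self.bytes
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn counts_only_printable_bytes() {
        let buf = BufferBuilder::new().text("abc").nul().text("d e").build();
        assert_eq!(printable_len(&buf), 6);
    }
}
```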

CLI (clap)

Use idiomatic clap derive API patterns. Push validation into clap wherever possible -- use value_parser, PossibleValue, range constraints, and custom value parsers rather than manual post-parse validation. Keep main.rs thin by letting clap handle argument conflicts, defaults, and error messages. See GOTCHAS.md for clap pitfalls and test co-change requirements.
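
As a sketch of pushing validation into clap, the function below has the `fn(&str) -> Result<T, String>` shape that clap's `value_parser` attribute accepts. The name `parse_positive_usize` and the error wording are assumptions, not the project's actual code.

```rust
/// Parses a count-like argument and rejects zero.
/// Hypothetical name; clap calls a function of this shape for each value
/// and turns the `Err` string into a regular usage error.
fn parse_positive_usize(raw: &str) -> Result<usize, String> {
    let value: usize = raw
        .parse()
        .map_err(|_| format!("`{raw}` is not a non-negative integer"))?;
    if value == 0 {
        return Err("value must be at least 1".to_string());
    }
    Ok(value)
}

// With clap's derive API this would be wired up roughly as:
//
//     #[arg(short = 'm', long = "min-len", value_parser = parse_positive_usize)]
//     min_len: Option<usize>,
```

Because the check runs during argument parsing, main.rs never sees an invalid value and needs no follow-up validation for it.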

Current CLI Flags (main.rs)

| Flag | Short | Type | Notes |
| --- | --- | --- | --- |
| `FILE` | | positional | Input binary (use `-` for stdin) |
| `--json` | `-j` | bool | Conflicts with `--yara` |
| `--yara` | | bool | Conflicts with `--json` |
| `--only-tags` | | `Vec<Tag>` | Repeatable, `value_parser = Tag::from_str` |
| `--no-tags` | | `Vec<Tag>` | Repeatable, runtime overlap check with `--only-tags` |
| `--min-len` | `-m` | `Option<usize>` | Custom parser enforces >= 1 |
| `--top` | `-t` | `Option<usize>` | Custom parser enforces >= 1 |
| `--enc` | | `Option<CliEncoding>` | ascii, utf8, utf16, utf16le, utf16be |
| `--raw` | | bool | Conflicts with `--only-tags`, `--no-tags`, `--top`, `--debug`, `--yara` |
| `--summary` | | bool | Conflicts with `--json`, `--yara`; runtime TTY check |
| `--debug` | | bool | Conflicts with `--raw` |
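
The runtime overlap check between `--only-tags` and `--no-tags` cannot be expressed as a static clap conflict, because it depends on the values rather than on which flags are present. One possible shape for it is sketched below; the trimmed `Tag` enum, the function names, and the error text are all hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical subset of the tag enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tag {
    Url,
    Ipv4,
    Domain,
    FilePath,
}

/// Returns the tags present in both lists, sorted and deduplicated.
fn overlapping_tags(only: &[Tag], exclude: &[Tag]) -> Vec<Tag> {
    let only: BTreeSet<Tag> = only.iter().copied().collect();
    let exclude: BTreeSet<Tag> = exclude.iter().copied().collect();
    only.intersection(&exclude).copied().collect()
}

/// Rejects argument sets where a tag is both required and excluded.
fn check_tag_filters(only: &[Tag], exclude: &[Tag]) -> Result<(), String> {
    let overlap = overlapping_tags(only, exclude);
    if overlap.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "tags appear in both --only-tags and --no-tags: {overlap:?}"
        ))
    }
}
```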

Regex Patterns

Use std::sync::LazyLock for compiled regexes. Always use .expect("descriptive message") instead of .unwrap() for regex compilation - invalid regex patterns should fail fast with clear error messages.
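
A sketch of the pattern follows. To stay runnable with std alone, it uses a hypothetical stand-in type `Pattern` in place of `regex::Regex`; its fallible `new` and its `is_match` mirror the shape of `Regex::new` and `Regex::is_match`, but the prefix matching is purely illustrative.

```rust
use std::sync::LazyLock;

/// Stand-in for a compiled regex so the sketch needs only std.
/// In the project this role is played by `regex::Regex`.
#[derive(Debug)]
struct Pattern {
    prefix: String,
}

impl Pattern {
    /// Fallible constructor, mirroring the shape of `Regex::new`.
    fn new(prefix: &str) -> Result<Self, String> {
        if prefix.is_empty() {
            return Err("empty pattern".to_string());
        }
        Ok(Self { prefix: prefix.to_string() })
    }

    fn is_match(&self, text: &str) -> bool {
        text.starts_with(&self.prefix)
    }
}

/// Compiled once on first use; a bad pattern panics with a message that
/// names the static, instead of the bare panic produced by `.unwrap()`.
static HTTP_URL: LazyLock<Pattern> = LazyLock::new(|| {
    Pattern::new("http://").expect("HTTP_URL pattern must be a valid literal")
});

fn looks_like_http_url(s: &str) -> bool {
    HTTP_URL.is_match(s)
}
```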

Development Commands

just gen-fixtures # Generate test fixtures (ELF/PE/Mach-O via Zig cross-compilation)
just check      # Pre-commit: fmt + lint + test
just test       # Run tests with nextest
just lint       # Full lint suite
just fix        # Auto-fix clippy warnings
just ci-check   # Full CI suite locally
just build      # Debug build
just run <args> # Run stringy with arguments
just bench      # Run benchmarks
just format     # Format all (Rust, JSON, YAML, Markdown, Justfile)

CI Architecture

CI workflows use just recipes as the single source of truth, except for the Quality and MSRV jobs
All other jobs use jdx/mise-action@v3 for tooling, so just recipes work there
See GOTCHAS.md for CI edge cases (Quality/MSRV jobs, mise cargo subcommands, Mergify).

Testing

Use insta for snapshot testing
Binary fixtures in tests/fixtures/
Integration tests use two naming patterns: integration_*.rs (CLI and format tests) and test_*.rs (extraction and filter tests)
Compiled fixtures (ELF, PE, Mach-O) are gitignored -- run just gen-fixtures before just test
Fixtures are cross-compiled via Zig (managed by mise) -- no Docker required
test_empty.bin and test_unknown.bin are committed (platform-independent)
Regenerate snapshots after changing test_binary.c: INSTA_UPDATE=always cargo nextest run
integration_flows_1_5.rs contains end-to-end CLI flow tests (quick analysis, filtering, top-N, JSON, YARA)
assert_cmd is non-TTY; use format_table_with_mode(..., true) to test TTY table output at the library level

Imports

Import from stringy::extraction or stringy::types, not deeply nested paths. Re-exports are in lib.rs. Pipeline types (Pipeline, PipelineConfig, FilterConfig, EncodingFilter) are re-exported from lib.rs. New public pipeline types must be added to both pipeline/mod.rs re-exports and lib.rs.

Key Dependencies

goblin - Binary format parsing (ELF, PE, Mach-O)
mmap-guard - Safe memory-mapped file I/O (wraps memmap2)
pelite - PE resource extraction
thiserror - Error type definitions
indicatif - Progress bars and spinners for CLI output
tempfile - Temporary file creation for stdin-to-Pipeline bridging in main.rs
insta - Snapshot testing (dev)
criterion - Benchmarking (dev)

Adding Features

New semantic tag: Add variant to Tag enum in types/mod.rs, implement pattern in classification/patterns/ or classification/mod.rs

New section weight: Add match arm in the relevant container/*.rs parser

New string extractor: Follow patterns in extraction/ module

Splitting large files: When a file exceeds 500 lines, convert to a module directory: foo.rs -> foo/mod.rs + foo/submodule.rs. Move related code to submodules while keeping public re-exports in mod.rs.
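
The "new semantic tag" step above can be sketched as follows. The trimmed-down `Tag` enum, the `Email` variant, and both functions are hypothetical, and the matching is deliberately simplistic; the real enum lives in types/mod.rs and real patterns belong in classification/.

```rust
/// Hypothetical, trimmed-down tag enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    Url,
    /// Step 1: the newly added variant.
    Email,
}

/// Step 2: a pattern check for the new tag. Requires exactly one '@',
/// a non-empty local part, and a dot strictly inside the domain.
fn looks_like_email(s: &str) -> bool {
    let mut parts = s.split('@');
    match (parts.next(), parts.next(), parts.next()) {
        (Some(local), Some(domain), None) => {
            !local.is_empty()
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
        }
        _ => false,
    }
}

/// Step 3: hook the check into classification.
fn classify(s: &str) -> Vec<Tag> {
    let mut tags = Vec::new();
    if s.starts_with("http://") || s.starts_with("https://") {
        tags.push(Tag::Url);
    }
    if looks_like_email(s) {
        tags.push(Tag::Email);
    }
    tags
}
```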

Open-Source Quality Standards (OSSF Best Practices)

Maintain these standards for OSSF Scorecard compliance:

Every PR Must

Sign off commits with git commit -s (DCO enforced by GitHub App)
Pass CI (clippy, rustfmt, tests, CodeQL, cargo-deny) before merge
Include tests for new functionality -- this is policy, not optional
Be reviewed (human or CodeRabbit) for correctness, safety, and style
Not introduce unwrap() in library code, unchecked errors, or unvalidated input

Every Release Must

Have human-readable release notes via git-cliff (not raw git log)
Use unique SemVer identifiers (vX.Y.Z tags)
Be built reproducibly (pinned toolchain, committed Cargo.lock, cargo-dist)

Security

Vulnerabilities go through private reporting (GitHub advisories or the security contact email; see SECURITY.md), never public issues
cargo-deny and cargo-audit run in CI -- fix findings promptly
Medium+ severity vulnerabilities: we aim to release a fix within 90 days of confirmation (see SECURITY.md for canonical policy)

Documentation

Exported APIs require rustdoc comments with examples where appropriate
CONTRIBUTING.md documents code review criteria, test policy, DCO, and governance
SECURITY.md documents vulnerability reporting with scope, safe harbor, and PGP key
AGENTS.md must accurately reflect implemented features (not aspirational)
