@GOTCHAS.md
These rules are non-negotiable. Violations will cause CI failures.
- No `unsafe` code - `#![forbid(unsafe_code)]` enforced
- Zero warnings - `cargo clippy -- -D warnings` must pass
- ASCII only - No emojis, em-dashes, smart quotes, or Unicode punctuation (except when explicitly testing or working with Unicode strings or emojis)
- File size limit - Keep files under 500 lines; split larger files
- No blanket `#[allow]` - Any `allow` requires inline justification
Stringy extracts meaningful strings from ELF, PE, and Mach-O binaries using format-specific knowledge and semantic classification. Unlike the standard `strings` utility, it is section-aware and tags each string by semantic type.
- Rust: Edition 2024, MSRV 1.91
- Data flow: Binary -> Format Detection -> Container Parsing -> String Extraction -> Deduplication -> Classification -> Ranking -> Output
| Module | Purpose |
|---|---|
| `container/` | Format detection, section analysis, imports/exports via goblin |
| `extraction/` | ASCII/UTF-8/UTF-16 extraction, deduplication, PE resources |
| `classification/` | Semantic tagging (URLs, IPs, domains, paths, GUIDs), ranking |
| `output/` | Formatters: `json/`, `table/` (tty/plain), `yara/` |
| `pipeline/` | Orchestrator: config, filtering, score normalization, `Pipeline::run` |
| `types/` | Core data structures, error handling with thiserror |
Container parsers assign weights (1.0-10.0) based on string likelihood. Higher = more valuable. See existing parsers in container/*.rs for reference values.
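
As a rough illustration of the weighting idea, a parser's lookup might resemble the sketch below. The function name `section_weight`, the section names chosen, and every numeric value are assumptions made for this example; the actual reference values live in container/*.rs.

```rust
/// Hypothetical weight lookup for ELF-style section names.
/// All values are illustrative, not the project's real weights.
fn section_weight(section_name: &str) -> f32 {
    let weight: f32 = match section_name {
        // Read-only data usually holds string literals.
        ".rodata" | ".rodata.str1.1" => 10.0,
        ".comment" => 8.0,
        ".data" => 5.0,
        // Executable code rarely contains meaningful strings.
        ".text" => 1.0,
        // Unknown sections get a low default.
        _ => 2.0,
    };
    // Keep every weight inside the documented 1.0-10.0 range.
    weight.clamp(1.0, 10.0)
}
```

Adding a new section weight then amounts to adding one more match arm above the fallback.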
Use thiserror with detailed context. Include offsets, section names, and file paths in error messages. Convert external errors with From implementations.
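
The sketch below shows the kind of context-rich error this guideline describes. To stay dependency-free it implements `Display`, `Error`, and `From` by hand; in the project, `#[derive(thiserror::Error)]` with `#[error(...)]` and `#[from]` attributes would be expected to generate the equivalent. The `ParseError` type, its variants, and its fields are hypothetical.

```rust
use std::fmt;
use std::io;
use std::path::PathBuf;

/// Hypothetical error type; names and fields are illustrative only.
#[derive(Debug)]
enum ParseError {
    /// A section ended before the expected data could be read.
    TruncatedSection { path: PathBuf, section: String, offset: u64 },
    /// Wraps an external I/O error.
    Io(io::Error),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Include the section name, offset, and file path in the message.
            Self::TruncatedSection { path, section, offset } => write!(
                f,
                "truncated section {section} at offset {offset:#x} in {}",
                path.display()
            ),
            Self::Io(source) => write!(f, "I/O error: {source}"),
        }
    }
}

impl std::error::Error for ParseError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Io(source) => Some(source),
            Self::TruncatedSection { .. } => None,
        }
    }
}

/// Lets `?` convert I/O errors into `ParseError` automatically.
impl From<io::Error> for ParseError {
    fn from(source: io::Error) -> Self {
        Self::Io(source)
    }
}
```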
Use #[non_exhaustive] for public structs and provide explicit constructors. When using #[non_exhaustive] structs internally, always use the constructor pattern (Type::new()) rather than struct literals - struct literals bypass the forward-compatibility guarantee. See GOTCHAS.md for struct literal update checklists.
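
A minimal sketch of the constructor pattern follows. The struct name `FoundString` and its fields are assumptions for illustration, not the project's actual types.

```rust
/// Hypothetical public struct; the name and fields are illustrative.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub offset: u64,
    pub score: i32,
}

impl FoundString {
    /// Explicit constructor: adding a field later only changes this function,
    /// not every call site.
    pub fn new(text: impl Into<String>, offset: u64) -> Self {
        Self { text: text.into(), offset, score: 0 }
    }

    /// Builder-style setter for a field that has a sensible default.
    #[must_use]
    pub fn with_score(mut self, score: i32) -> Self {
        self.score = score;
        self
    }
}
```

Inside the defining crate the compiler still accepts `FoundString { .. }` literals, because `#[non_exhaustive]` only restricts other crates. That is why the rule asks for `FoundString::new(...)` internally as well.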
For test utilities that shouldn't be in production builds:
- Add `#[cfg(test)]` to both the struct/type definition AND any impl blocks
- Use `pub(crate)` visibility for internal test helpers
- Keep test infrastructure in `#[cfg(test)] mod tests` blocks within the module
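
The three points above can be sketched in one small module. `printable_len` and `SampleBuffer` are hypothetical names invented for this example.

```rust
/// Production code: counts printable ASCII bytes (including spaces) in a buffer.
pub fn printable_len(bytes: &[u8]) -> usize {
    bytes
        .iter()
        .filter(|b| b.is_ascii_graphic() || **b == b' ')
        .count()
}

/// Hypothetical test-only helper type; gated so it never reaches production builds.
#[cfg(test)]
pub(crate) struct SampleBuffer {
    bytes: Vec<u8>,
}

// The impl block needs the same gate as the type definition,
// otherwise non-test builds fail with an unknown-type error.
#[cfg(test)]
impl SampleBuffer {
    /// Surrounds `text` with non-printable noise bytes.
    pub(crate) fn with_noise(text: &str) -> Self {
        let mut bytes = vec![0x00, 0x01];
        bytes.extend_from_slice(text.as_bytes());
        bytes.push(0xFF);
        Self { bytes }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn counts_only_printable_bytes() {
        let sample = SampleBuffer::with_noise("hello world");
        assert_eq!(printable_len(&sample.bytes), 11);
    }
}
```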
Use idiomatic clap derive API patterns. Push validation into clap wherever possible -- use value_parser, PossibleValue, range constraints, and custom value parsers rather than manual post-parse validation. Keep main.rs thin by letting clap handle argument conflicts, defaults, and error messages. See GOTCHAS.md for clap pitfalls and test co-change requirements.
| Flag | Short | Type | Notes |
|---|---|---|---|
| `FILE` | | positional | Input binary (use `-` for stdin) |
| `--json` | `-j` | bool | Conflicts with `--yara` |
| `--yara` | | bool | Conflicts with `--json` |
| `--only-tags` | | `Vec<Tag>` | Repeatable, `value_parser = Tag::from_str` |
| `--no-tags` | | `Vec<Tag>` | Repeatable, runtime overlap check with `--only-tags` |
| `--min-len` | `-m` | `Option<usize>` | Custom parser enforces >= 1 |
| `--top` | `-t` | `Option<usize>` | Custom parser enforces >= 1 |
| `--enc` | | `Option<CliEncoding>` | `ascii`, `utf8`, `utf16`, `utf16le`, `utf16be` |
| `--raw` | | bool | Conflicts with `--only-tags`, `--no-tags`, `--top`, `--debug`, `--yara` |
| `--summary` | | bool | Conflicts with `--json`, `--yara`; runtime TTY check |
| `--debug` | | bool | Conflicts with `--raw` |
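
For the "custom parser enforces >= 1" rows, a value parser could look roughly like the sketch below. The function name `parse_positive` and the error wording are assumptions; a plain function of this shape is the kind of thing clap's derive API accepts through `#[arg(value_parser = ...)]`.

```rust
/// Hypothetical value parser for flags such as `--min-len` and `--top`.
/// Accepts base-10 integers that are at least 1.
fn parse_positive(raw: &str) -> Result<usize, String> {
    // Reject anything that is not an unsigned integer (including negatives).
    let value: usize = raw
        .parse()
        .map_err(|_| format!("`{raw}` is not a non-negative integer"))?;
    // Zero parses fine as usize, so it needs an explicit check.
    if value == 0 {
        return Err("value must be at least 1".to_string());
    }
    Ok(value)
}
```

Because the check runs during argument parsing, `main.rs` never sees an invalid value and clap reports the error in its usual format.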
Use std::sync::LazyLock for compiled regexes. Always use .expect("descriptive message") instead of .unwrap() for regex compilation - invalid regex patterns should fail fast with clear error messages.
just gen-fixtures # Generate test fixtures (ELF/PE/Mach-O via Zig cross-compilation)
just check # Pre-commit: fmt + lint + test
just test # Run tests with nextest
just lint # Full lint suite
just fix # Auto-fix clippy warnings
just ci-check # Full CI suite locally
just build # Debug build
just run <args> # Run stringy with arguments
just bench # Run benchmarks
just format # Format all (Rust, JSON, YAML, Markdown, Justfile)

- CI workflows use `just` recipes as single source of truth, except Quality/MSRV jobs
- All other jobs: Use `jdx/mise-action@v3` for tooling -- `just` recipes work here
- See GOTCHAS.md for CI edge cases (Quality/MSRV jobs, mise cargo subcommands, Mergify).
- Use `insta` for snapshot testing
- Binary fixtures in `tests/fixtures/`
- Integration tests use two naming patterns: `integration_*.rs` (CLI and format tests) and `test_*.rs` (extraction and filter tests)
- Compiled fixtures (ELF, PE, Mach-O) are gitignored -- run `just gen-fixtures` before `just test`
- Fixtures are cross-compiled via Zig (managed by mise) -- no Docker required
- `test_empty.bin` and `test_unknown.bin` are committed (platform-independent)
- Regenerate snapshots after changing `test_binary.c`: `INSTA_UPDATE=always cargo nextest run`
- `integration_flows_1_5.rs` contains end-to-end CLI flow tests (quick analysis, filtering, top-N, JSON, YARA)
- `assert_cmd` is non-TTY; use `format_table_with_mode(..., true)` to test TTY table output at the library level
Import from stringy::extraction or stringy::types, not deeply nested paths. Re-exports are in lib.rs. Pipeline types (Pipeline, PipelineConfig, FilterConfig, EncodingFilter) are re-exported from lib.rs. New public pipeline types must be added to both pipeline/mod.rs re-exports and lib.rs.
- `goblin` - Binary format parsing (ELF, PE, Mach-O)
- `mmap-guard` - Safe memory-mapped file I/O (wraps `memmap2`)
- `pelite` - PE resource extraction
- `thiserror` - Error type definitions
- `indicatif` - Progress bars and spinners for CLI output
- `tempfile` - Temporary file creation for stdin-to-Pipeline bridging in `main.rs`
- `insta` - Snapshot testing (dev)
- `criterion` - Benchmarking (dev)
New semantic tag: Add variant to Tag enum in types/mod.rs, implement pattern in classification/patterns/ or classification/mod.rs
New section weight: Add match arm in the relevant container/*.rs parser
New string extractor: Follow patterns in extraction/ module
Splitting large files: When a file exceeds 500 lines, convert to a module directory: foo.rs -> foo/mod.rs + foo/submodule.rs. Move related code to submodules while keeping public re-exports in mod.rs.
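
For the "new semantic tag" recipe above, the sketch below shows the two pieces that typically change together: the enum variant (with its string form, which the CLI relies on via `Tag::from_str`) and a matching pattern check. The trimmed-down `Tag` enum, the `Email` variant, and the `looks_like_email` helper are all hypothetical stand-ins, not the project's real definitions.

```rust
use std::str::FromStr;

/// Hypothetical, trimmed-down stand-in for the real `Tag` enum.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    Url,
    Guid,
    /// Newly added variant (illustrative).
    Email,
}

impl FromStr for Tag {
    type Err = String;

    /// Every new variant needs an arm here, or the CLI cannot select it.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "url" => Ok(Self::Url),
            "guid" => Ok(Self::Guid),
            "email" => Ok(Self::Email),
            other => Err(format!("unknown tag: {other}")),
        }
    }
}

/// Hypothetical pattern for the new tag: a deliberately crude shape check,
/// not a standards-compliant address validator.
fn looks_like_email(text: &str) -> bool {
    match text.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
                && !text.contains(char::is_whitespace)
        }
        None => false,
    }
}
```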
Maintain these standards for OSSF Scorecard compliance:
- Sign off commits with `git commit -s` (DCO enforced by GitHub App)
- Pass CI (clippy, rustfmt, tests, CodeQL, cargo-deny) before merge
- Include tests for new functionality -- this is policy, not optional
- Be reviewed (human or CodeRabbit) for correctness, safety, and style
- Not introduce `unwrap()` in library code, unchecked errors, or unvalidated input
- Have human-readable release notes via git-cliff (not raw git log)
- Use unique SemVer identifiers (`vX.Y.Z` tags)
- Be built reproducibly (pinned toolchain, committed `Cargo.lock`, cargo-dist)
- Vulnerabilities go through private reporting (GitHub advisories or [email protected]), never public issues
- `cargo-deny` and `cargo-audit` run in CI -- fix findings promptly
- Medium+ severity vulnerabilities: we aim to release a fix within 90 days of confirmation (see SECURITY.md for canonical policy)
- Exported APIs require rustdoc comments with examples where appropriate
- CONTRIBUTING.md documents code review criteria, test policy, DCO, and governance
- SECURITY.md documents vulnerability reporting with scope, safe harbor, and PGP key
- AGENTS.md must accurately reflect implemented features (not aspirational)