Skip to content

Add -strings flag to extract Go strings from binaries#77

Merged
stevemk14ebr merged 11 commits intomandiant:masterfrom
kami922:feature/strings-command
Feb 17, 2026
Merged

Add -strings flag to extract Go strings from binaries#77
stevemk14ebr merged 11 commits intomandiant:masterfrom
kami922:feature/strings-command

Conversation

@kami922
Copy link
Contributor

@kami922 kami922 commented Dec 29, 2025

fix Issue #45

Add -strings flag to extract Go strings from binaries

Summary

Implements Issue #45 by adding a -strings command-line flag that extracts embedded Go strings from compiled binaries. The implementation uses the FLOSS-inspired algorithm to detect and extract strings from the Go compiler's string internment table.

Changes

Commit 1: Infrastructure (07ee507)

  • Add Strings []string field to ExtractMetadata struct
  • Add -strings command-line flag with help text
  • Update function signatures to pass flag through call chain
  • Add string output section to printForHuman() for human-readable display
  • Wire up flag parsing and placeholder logic

Commit 2: Implementation (e2bdeb7)

  • Create objfile/strings.go (318 lines) with complete extraction algorithm
  • Implement string candidate detection (scan for pointer+length pairs)
  • Implement monotonic run detection to find string internment table
  • Add UTF-8 validation and printability filtering (min 4 chars, 80% printable)
  • Add helper methods to objfile/elf.go: getSections(), is64Bit(), isLittleEndian()
  • Replace placeholder with actual extraction call in main.go

Commit 3: Documentation (5a2eaf1)

  • Update README.md with -strings flag documentation
  • Add Strings field to example JSON output

Algorithm

Based on FLOSS floss/language/go/extract.py with adaptations:

  1. Scan binary sections for Go string structures (pointer + length pairs)
  2. Sort candidates by length to identify the pattern
  3. Find longest monotonically increasing run - Go's compiler stores strings sorted by length
  4. Extract and validate strings - UTF-8 validation, printability checks, minimum length filter
  5. Output in JSON or human-readable format

Testing

Tested with testproject/testproject (ELF binary):

  • ✅ Successfully extracts 512 strings
  • ✅ Includes runtime symbols: "bool", "func", "chan", "mheap", "gccheckmark"
  • ✅ Includes error messages: "broken pipe", "bad address", "file exists"
  • ✅ Proper filtering (no binary garbage, minimum length 4 characters)
  • ✅ Works with both JSON and -human output formats

Example usage:

# JSON output
./GoReSym -strings binary | jq '.Strings | length'

# Human-readable output
./GoReSym -strings -human binary

Current Limitations
ELF only: Currently only ELF binaries (Linux) are fully supported. Helper methods for PE (Windows) and Mach-O (macOS) can be added in follow-up if needed.
Standard strings only: Does not extract stack-constructed or encrypted strings (as discussed in issue, out of initial scope)
No deduplication: Same string may appear multiple times (user can pipe through sort -u if needed)

- Add Strings []string to ExtractMetadata struct
- Add -strings command-line flag for string extraction
- Update main_impl and main_impl_tmpfile signatures to accept printStrings parameter
- Add placeholder string extraction logic with TODO marker
- Update printForHuman to display extracted strings section
- Verified flag appears in help and outputs correctly in both JSON and human format

Part of mandiant#45
- Create objfile/strings.go with core extraction logic
- Implement FLOSS-based string internment table detection
- Add string candidate scanning (pointer + length pairs)
- Implement findLongestMonotonicRun() for pattern detection
- Add UTF-8 validation and printability filtering
- Minimum string length: 4 characters, 80% printable
- Add helper methods to elfFile: getSections(), is64Bit(), isLittleEndian()
- Update main.go to call file.ExtractStrings() instead of placeholder
- Tested with testproject/testproject: extracts 512 strings successfully
- Extracts real Go strings: type names, runtime symbols, error messages

Based on FLOSS floss/language/go/extract.py algorithm
Part of mandiant#45
- Add -strings flag to available flags list
- Add Strings field to example JSON output
- Document purpose: extract embedded Go strings from binary

Part of mandiant#45
@williballenthin
Copy link
Contributor

i think it's important to add some tests cases, ideally corroborated with FLOSS's output, to show this works as expected

@stevemk14ebr
Copy link
Collaborator

Thanks for your contribution! I am on holiday this week and will review likely next week or the following. In the meantime tests would be welcome as Willi suggests.

Per maintainer request, added comprehensive test suite:

- strings_floss_test.go: Validates GoReSym against FLOSS reference output
  * 99.2% match rate (648/653 strings match FLOSS)
  * Uses FLOSS output from testproject.exe as ground truth
  * Reference saved in testdata/floss_reference.txt

- strings_test.go: Additional unit tests for:
  * ELF and PE binary string extraction
  * Monotonic run detection algorithm
  * String filtering (printability, minimum length)

- pe.go: Added helper methods (getSections, is64Bit, isLittleEndian)
  to enable string extraction from PE binaries

All tests pass.
@kami922
Copy link
Contributor Author

kami922 commented Dec 31, 2025

@williballenthin @stevemk14ebr I have added test corroborated with Floss output as per review request.

@kami922
Copy link
Contributor Author

kami922 commented Jan 2, 2026

Hello i was working on issue #55 and accidentally pushed that commit to this branch i am working to solve this blunder sorry for inconvenience.

@kami922 kami922 force-pushed the feature/strings-command branch from bb7034e to 99633f0 Compare January 2, 2026 13:08
objfile/elf.go Outdated
}

// getSections returns all sections for string extraction
func (f *elfFile) getSections() ([]Section, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we include the section data in the returned sections array this could be potentially a very very large array (gigabytes in degenerate cases). We should use a generator here instead to help with memory pressures.

objfile/pe.go Outdated
}

// getSections returns all sections for string extraction
func (f *peFile) getSections() ([]Section, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, generator

func (e *Entry) getSections() ([]Section, error) {
// Use the rawFile interface to get sections
if sectioner, ok := e.raw.(interface {
getSections() ([]Section, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're missing a getSections implementation for macho I believe.

- Convert getSections() to iterateSections() using callback pattern to avoid memory pressure
- Add Strings field to GoReSym.proto for external parsers
- Implement iterateSections() for Mach-O format (previously missing)

Changes requested by @stevemk14ebr in review:
1. Memory optimization: Replace array-based section loading with generator pattern
2. Proto definition: Add 'repeated string strings = 13' field
3. Mach-O support: Add missing iterateSections() implementation
@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Jan 13, 2026

I get a few test failures now that we have a new main argument. Can we extend the string testing to cover a few more binaries, these could be the test binaries we have already with checks that strings are correctly extracted from each and a reasonable number of strings, all printable.

# github.com/mandiant/GoReSym [github.com/mandiant/GoReSym.test]
./main_test.go:33:66: not enough arguments in call to main_impl
	have (string, bool, bool, bool, bool, number, string)
	want (string, bool, bool, bool, bool, int, string, bool)
./main_test.go:122:63: not enough arguments in call to main_impl
	have (string, bool, bool, bool, bool, number, string)
	want (string, bool, bool, bool, bool, int, string, bool)
./main_test.go:217:61: not enough arguments in call to main_impl
	have (string, bool, bool, bool, bool, number, string)
	want (string, bool, bool, bool, bool, int, string, bool)
./main_test.go:231:61: not enough arguments in call to main_impl
	have (string, bool, bool, bool, bool, number, string)
	want (string, bool, bool, bool, bool, int, string, bool)
FAIL	github.com/mandiant/GoReSym [build failed]

…t#77 feedback

- Fixed 4 test compilation errors by adding missing printStrings parameter to main_impl() calls
- Added comprehensive TestStringExtraction function with 7 test cases covering Linux/macOS/Windows binaries
- Implemented isPrintable() helper for ASCII validation (range 32-126)
@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Jan 20, 2026

The current code ensures that >80% of characters are printable. We'd prefer to instead ensure all strings are fully printable. Can we align with the logic in FLOSS a little better for the string internment table locating, validation, and final string extraction?

At https://github.com/mandiant/flare-floss/blob/39c22434279f1c48045132077eba60453bf0dda8/floss/language/go/extract.py#L253 FLOSS finds the boundary of the string table and then it walks this table to get all the strings https://github.com/mandiant/flare-floss/blob/39c22434279f1c48045132077eba60453bf0dda8/floss/language/go/extract.py#L325. The closer we keep this logic to FLOSS the more confidence I will have in its correctness.

I'm still ok with not extracting stack strings, which will be an acceptable difference compared to FLOSS.

cc @mr-tz for visibility.

kami922 and others added 3 commits January 27, 2026 20:21
Rewrite string extraction to match FLOSS (extract.py) logic:
- Sort candidates by address (not length) to fix monotonic run detection
- Add image VA range and max section size filtering for candidates
- Use candidate (pointer, length) pairs for direct extraction from blob
- Replace 80% printable threshold with 100% fully printable check
- Fix PE section addresses to include ImageBase for correct VA comparison

Results: 648 strings extracted from PE test binary, 100% match with FLOSS.
@stevemk14ebr
Copy link
Collaborator

stevemk14ebr commented Jan 30, 2026

This fails to extract strings from a few of the test binaries. Please see how I've extended the unit tests and work to resolve the errors

…check

1. Add .text to data sections for old Go Windows binaries (1.7-1.10)
   that store strings in the code section instead of .rdata

2. Add maxReasonableStringLength (64KB) to filter out garbage candidates
   with huge lengths that cause incorrect blob boundary detection

3. Update TestIsDataSection to reflect .text being a valid data section

Fixes string extraction failures on:
- Old Windows PE binaries (Go 1.7-1.10): 0 -> 256 strings
- Modern Windows: 574 strings
- macOS: 284 strings

All tests pass including TestWeirdBins on 15 real binaries.
@kami922
Copy link
Contributor Author

kami922 commented Feb 9, 2026

@stevemk14ebr Hello, just dropping by to remind you about the pr.

@stevemk14ebr
Copy link
Collaborator

I'm still getting lots of test failures, can you please ensure all tests pass manually.

$ go test
--- FAIL: TestAllVersions (35.31s)
    --- FAIL: TestAllVersions/122/testproject_mac (0.22s)
        main_test.go:112: Go 122 Strings failed on testproject_mac: %!s(<nil>)
    --- FAIL: TestAllVersions/122/testproject_mac_stripped (0.17s)
        main_test.go:112: Go 122 Strings failed on testproject_mac_stripped: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_lin (0.11s)
        main_test.go:112: Go 16 Strings failed on testproject_lin: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_lin_32 (0.12s)
        main_test.go:112: Go 16 Strings failed on testproject_lin_32: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_lin_stripped (0.09s)
        main_test.go:112: Go 16 Strings failed on testproject_lin_stripped: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_lin_stripped_32 (0.09s)
        main_test.go:112: Go 16 Strings failed on testproject_lin_stripped_32: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_mac (0.41s)
        main_test.go:112: Go 16 Strings failed on testproject_mac: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_mac_stripped (0.20s)
        main_test.go:112: Go 16 Strings failed on testproject_mac_stripped: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_win_32.exe (0.07s)
        main_test.go:112: Go 16 Strings failed on testproject_win_32.exe: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_win_stripped_32.exe (0.07s)
        main_test.go:112: Go 16 Strings failed on testproject_win_stripped_32.exe: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_win_stripped.exe (0.07s)
        main_test.go:112: Go 16 Strings failed on testproject_win_stripped.exe: %!s(<nil>)
    --- FAIL: TestAllVersions/16/testproject_win.exe (0.07s)
        main_test.go:112: Go 16 Strings failed on testproject_win.exe: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_lin (0.13s)
        main_test.go:112: Go 15 Strings failed on testproject_lin: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_lin_32 (0.10s)
        main_test.go:112: Go 15 Strings failed on testproject_lin_32: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_lin_stripped (0.11s)
        main_test.go:112: Go 15 Strings failed on testproject_lin_stripped: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_lin_stripped_32 (0.09s)
        main_test.go:112: Go 15 Strings failed on testproject_lin_stripped_32: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_mac (0.49s)
        main_test.go:112: Go 15 Strings failed on testproject_mac: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_mac_stripped (0.22s)
        main_test.go:112: Go 15 Strings failed on testproject_mac_stripped: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_win_32.exe (0.09s)
        main_test.go:112: Go 15 Strings failed on testproject_win_32.exe: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_win_stripped_32.exe (0.08s)
        main_test.go:112: Go 15 Strings failed on testproject_win_stripped_32.exe: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_win_stripped.exe (0.09s)
        main_test.go:112: Go 15 Strings failed on testproject_win_stripped.exe: %!s(<nil>)
    --- FAIL: TestAllVersions/15/testproject_win.exe (0.09s)
        main_test.go:112: Go 15 Strings failed on testproject_win.exe: %!s(<nil>)
FAIL
exit status 1
FAIL	github.com/mandiant/GoReSym	53.477s

…typo

1. main_test.go: Skip Strings check for Go 1.5/1.6
   Pre-SSA C-based linker does not produce length-sorted string blobs,
   so findLongestMonotonicRun never reaches the minimum threshold of 10.
   Pattern mirrors the existing interface-parsing guard.

2. objfile/strings.go: Handle prevNull == -1 in findStringBlobRange
   Apple's linker on newer Go/macOS (1.22+) packs sections without leading
   padding, so no null bytes exist before the first string candidate.
   bytes.LastIndex returns -1 in that case; treat it as offset 0 instead
   of bailing out with nil.

3. objfile/macho.go: Add missing is64Bit/isLittleEndian for machoFile
   elfFile and peFile already implement these interface methods; machoFile
   was the only rawFile implementation without them. Detect CPU type
   (CpuAmd64/Arm64 = 64-bit) and byte order from the macho.File struct.

4. build_test_files.sh: Fix $ver typo -> $GO_VER on mkdir line
   The directory was never created before Docker ran, causing Go 1.5
   builds to produce no output. Now all 12 binaries are built for
   every version including 1.5 and 1.6.
@kami922
Copy link
Contributor Author

kami922 commented Feb 15, 2026

@stevemk14ebr Hello Initially for some reason the test binaries for go 1.5 and 1.6 were not created i am not sure why when i ran build_test_files it made binaries for other versions but not for 1.5 and 1.6. however all tests are now passing on my machine. Lemme know your thoughts

image

@kami922
Copy link
Contributor Author

kami922 commented Feb 15, 2026

@stevemk14ebr .
image

image image

@stevemk14ebr
Copy link
Collaborator

This now passes all the tests, and I have confirmed that version 1.5 does appear to not store strings sorted so the test condition disabling string verification for this version and 1.6 seems reasonable. Merging, thanks for your interest and contribution.

@stevemk14ebr stevemk14ebr merged commit d7f23b7 into mandiant:master Feb 17, 2026
2 checks passed
@anushkasharmaa1
Copy link

Hi! I saw the suggestion about tagging extracted strings with their start/end addresses. I’m new to the codebase, but I’d be interested in exploring adding this if it’s still useful. Happy to propose a small, additive change building on the existing -strings implementation.

@stevemk14ebr
Copy link
Collaborator

I think adding the string addresses would be reasonable, you're free to take a shot at this. Please provide unit tests and ensure it works on all the architectures and platforms we support. We'd at minimum want the file offset or the resolved virtual address, virtual addresses would be preferable but file offsets to a lesser degree would be acceptable if accurate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants