A versatile Python tool to extract comprehensive metadata and characteristics from files. Built for malware analysts, digital forensics practitioners, and SOC engineers.
One Tool to Rule Them All: File Metadata & Static Analysis for Malware Analysts and SOC Teams (Medium article)
Step-by-step guide with screenshots: installation, fileinfo.py vs Basic_inf_gathering.py, hashes, strings, YARA, full static analysis, and MalwareBazaar workflow.
| Script | Use case |
|---|---|
| `fileinfo.py` | Recommended. Batch mode, JSON/CSV, PE/ELF/Mach-O, strings, optional YARA/ssdeep/tlsh. |
| `Basic_inf_gathering.py` | Single-file, human-readable table only (original behavior). |

- Hashes: MD5, SHA-1, SHA-256, SHA-384, SHA-512 (single pass); optional ssdeep and tlsh when installed.
- Formats: PE (Rich header, overlay, resources, signature, packing), ELF (entry, interpreter, sections/segments), Mach-O (entry, commands).
- Magic numbers: 60+ file types (executables, archives, documents, media).
- Strings: ASCII and UTF-16 LE extraction with configurable minimum length.
- Output: Human-readable table, JSON, or CSV; optional output file.
- Batch: Multiple files and/or recursive directory (`-r`).
- Optional YARA: Rule scanning when `yara-python` and a rules file are provided.
- Full static analysis (`--full`): Maximum metadata without decompilation or code analysis:
  - Byte-level: null ratio, printable ratio, byte frequency, longest null run.
  - Entropy map: per-block entropy to find packed/encrypted regions.
  - Head/tail hex dump: first and last bytes for structure inspection.
  - String patterns: URLs, IPv4, emails, Windows/Unix paths, registry keys (from raw bytes).
  - PE deep: machine type, subsystem, DLL characteristics (ASLR, DEP, etc.), section table (name, size, entropy), full import/export lists, exphash, relocations, TLS callbacks, delay imports, Rich header, resource types, version info (FileVersion, CompanyName, etc.).
  - ELF deep: class, machine, sections/segments, dynamic (NEEDED, RPATH, RUNPATH), exported/imported symbols, notes.
  - Mach-O deep: CPU type, file type, dylibs, segments, UUID.
  - Containers: ZIP file listing (names, sizes); OLE stream listing (optional `olefile`).

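To illustrate the per-block entropy map listed above, here is a minimal sketch. It is not taken from `static_analysis.py`; the function names and the 4096-byte block size are assumptions chosen for the example. It computes Shannon entropy (bits per byte) for each fixed-size block, so that blocks close to 8.0 stand out as likely packed or encrypted regions.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_map(data: bytes, block_size: int = 4096):
    """Return a list of (offset, entropy) pairs, one per block.

    block_size is an assumed default for this sketch, not the tool's value.
    """
    return [
        (offset, shannon_entropy(data[offset:offset + block_size]))
        for offset in range(0, len(data), block_size)
    ]
```

A block of identical bytes scores 0.0 and a block in which every byte value is equally frequent scores 8.0; compressed or encrypted data typically sits near the top of that range.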
git clone https://github.com/anpa1200/Basic-File-Information-Gathering-Script.git
cd Basic-File-Information-Gathering-Script
# On Debian/Ubuntu, ensure venv support: sudo apt install python3-venv
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
python3 -m pip install -r requirements.txt
# Optional: pip install ssdeep py-tlsh yara-python
# For real malware download + analysis: pip install requests pyzipper

- `ModuleNotFoundError: No module named 'lief'`: Install dependencies inside the project directory with the venv activated (see below). Run the script with the same Python that has the packages (e.g. `python3 Basic_inf_gathering.py file.exe` after activating the venv).
- `venv/bin/activate: No such file or directory` or `venv/bin/python3: No such file or directory`: The venv was not fully created. On Debian/Ubuntu you need the `python3-venv` package:

      sudo apt update
      sudo apt install python3-venv
      rm -rf venv
      python3 -m venv venv
      source venv/bin/activate
      python3 -m pip install -r requirements.txt

  Then run scripts with `python3 fileinfo.py ...` or `python3 Basic_inf_gathering.py ...` while the venv is active.
- `venv/bin/pip: cannot execute: required file not found`: The virtual environment is broken (e.g. the Python path changed). Recreate it from the project root (where `requirements.txt` lives), and install the `python3-venv` package if needed (see above). Then:

      rm -rf venv
      python3 -m venv venv
      source venv/bin/activate
      python3 -m pip install -r requirements.txt

- No venv / prefer system install: You can install dependencies for your user with the system Python (no venv):

      python3 -m pip install --user -r requirements.txt
      python3 fileinfo.py /path/to/file.exe

# Single file (table)
python3 fileinfo.py /path/to/file.exe
# Multiple files
python3 fileinfo.py file1.exe file2.bin
# Recursive directory
python3 fileinfo.py -r /path/to/samples/
# JSON output
python3 fileinfo.py --json /path/to/file.exe
# CSV (for spreadsheets / automation)
python3 fileinfo.py --csv -r ./samples/ -o report.csv
# Extra hashes + strings
python3 fileinfo.py --hashes md5,sha1,sha256,sha512 --strings --min-str-len 8 file.exe
# YARA scan
python3 fileinfo.py --yara /path/to/rules.yar file.exe
# Skip specific binary analysis
python3 fileinfo.py --no-elf --no-macho /path/to/pe_only/
# Full static analysis (max metadata, no decompilation)
python3 fileinfo.py --full /path/to/sample.exe
python3 fileinfo.py --full --json sample.exe -o full_report.json

| Option | Description |
|---|---|
| `paths` | One or more files or directories. |
| `-r`, `--recursive` | Recurse into directories. |
| `--json` | Output JSON. |
| `--csv` | Output CSV (one row per file). |
| `-o`, `--output` | Write output to file. |
| `--hashes` | Comma-separated: md5, sha1, sha256, sha384, sha512. |
| `--no-fuzzy` | Disable ssdeep/tlsh. |
| `--strings` | Extract ASCII + Unicode strings. |
| `--min-str-len` | Minimum string length (default 6). |
| `--no-pe` / `--no-elf` / `--no-macho` | Skip that format's analysis. |
| `--yara` | Path to YARA rules file. |
| `--full` | Full static analysis: byte stats, entropy map, head/tail hex, string patterns (URLs, IPs, paths, registry), PE/ELF/Mach-O deep (sections, imports/exports, relocs, version info, etc.), ZIP/OLE listing. No decompilation. |
| `-v`, `--verbose` | Verbose errors to stderr. |

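As a rough illustration of what `--strings` with `--min-str-len` does conceptually, the sketch below pulls printable ASCII runs and their UTF-16 LE equivalents out of raw bytes. It is an assumption about the general technique, not the code in `fileinfo.py`; the function name `extract_strings` is hypothetical, and only the default of 6 comes from the option table.

```python
import re


def extract_strings(data: bytes, min_len: int = 6):
    """Return (ascii_strings, utf16le_strings) found in raw bytes."""
    # Runs of printable ASCII (space through tilde) at least min_len long.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters as UTF-16 LE: each one followed by a zero byte.
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ascii_strings = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    utf16_strings = [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return ascii_strings, utf16_strings
```

Raising the minimum length trims short noise matches at the cost of missing short but meaningful strings, which is why the threshold is worth exposing as an option.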
python3 Basic_inf_gathering.py <path_to_file>

- Single file only.
- Output: detailed table to stdout (PE timestamp, imphash, hashes, entropy, permissions, magic, file type, digital signature, entry point, packer).
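
The hashes reported here can all be computed in one read of the file, which is what "single pass" means in the feature list. The following is a minimal sketch of that idea using only `hashlib`; the function name `hash_file` and the 1 MiB chunk size are assumptions for illustration, not the scripts' actual code.

```python
import hashlib


def hash_file(path, algorithms=("md5", "sha1", "sha256", "sha384", "sha512"),
              chunk_size=1 << 20):
    """Compute several digests while reading the file only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Feeding each chunk to every hasher keeps disk I/O constant regardless of how many algorithms are requested.
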
- Python 3.7+
- LIEF: `pip install lief`
- cryptography (optional, for PE certificate details): `pip install cryptography`

# Table output for one PE
$ python3 fileinfo.py sample.exe
# JSON for automation
$ python3 fileinfo.py --json sample.exe -o report.json
# Batch CSV
$ python3 fileinfo.py --csv -r ./malware_samples/ -o summary.csv
# With strings and YARA
$ python3 fileinfo.py --strings --yara rules.yar sample.exe
# Maximum static info (no decompilation)
$ python3 fileinfo.py --full sample.exe
$ python3 fileinfo.py --full --json sample.exe -o full.json

To run the tool on real Windows PE malware and compare with MalwareBazaar metadata:
- Get a free API key from abuse.ch Authentication (required for MalwareBazaar).
- Install deps: `pip install requests pyzipper`
- Download a sample and run full analysis:

export ABUSE_CH_AUTH_KEY='your-key-here'
# Download one recent sample (API picks from recent detections) and run --full analysis
python3 download_malware_sample.py
# Download by known SHA256 (e.g. from a public report)
python3 download_malware_sample.py 9FDEA40A9872A77335AE3B733A50F4D1E9F8EFF193AE84E36FB7E5802C481F72
# Download by tag (e.g. Emotet, TrickBot) then analyze
python3 download_malware_sample.py --tag Emotet --limit 1

Output (per sample) under `malware_samples/<sha256>/`:
- Extracted binary (or `sample.bin` if not zipped)
- `our_analysis.json`: full static report from `fileinfo.py --full --json`
- `bazaar_info.json`: MalwareBazaar metadata (signature, imphash, ssdeep, tags, etc.) for comparison

Compare hashes, imphash, file type, and PE/string findings between our_analysis.json and bazaar_info.json (and any public report you have for that hash).
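
The comparison can be scripted. The sketch below is only illustrative: it assumes both JSON files expose the compared values as flat top-level keys, and every key name in `FIELD_PAIRS` is a guess that should be adjusted to match the real reports. The helper names are hypothetical as well.

```python
import json
from pathlib import Path

# Hypothetical key names: (key in our_analysis.json, key in bazaar_info.json).
# Adjust these to whatever the two reports actually contain.
FIELD_PAIRS = [
    ("sha256", "sha256_hash"),
    ("md5", "md5_hash"),
    ("imphash", "imphash"),
]


def compare_reports(ours: dict, bazaar: dict, field_pairs=FIELD_PAIRS):
    """Return {our_key: (our_value, bazaar_value, match)} for each pair."""
    result = {}
    for our_key, bazaar_key in field_pairs:
        our_value = ours.get(our_key)
        bazaar_value = bazaar.get(bazaar_key)
        both_present = our_value is not None and bazaar_value is not None
        # Hex digests are compared case-insensitively; a missing value never matches.
        match = both_present and str(our_value).lower() == str(bazaar_value).lower()
        result[our_key] = (our_value, bazaar_value, match)
    return result


def compare_sample_dir(sample_dir):
    """Load both reports from one malware_samples/<sha256>/ directory."""
    sample_dir = Path(sample_dir)
    ours = json.loads((sample_dir / "our_analysis.json").read_text(encoding="utf-8"))
    bazaar = json.loads((sample_dir / "bazaar_info.json").read_text(encoding="utf-8"))
    return compare_reports(ours, bazaar)
```

Only hex digests are listed because lowercasing is safe for them; fuzzy hashes such as ssdeep are case-sensitive and would need an exact or similarity-based comparison instead.
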
- `fileinfo.py`: Main CLI (hashes, PE/ELF/Mach-O, strings, YARA, `--full` static).
- `static_analysis.py`: Deep static analysis module (byte stats, entropy map, PE/ELF/Mach-O deep, string patterns, ZIP/OLE). Used when `--full` is set.
- `download_malware_sample.py`: Download real Windows PE samples from MalwareBazaar (abuse.ch), run full analysis, save Bazaar metadata for comparison.
- `Basic_inf_gathering.py`: Legacy single-file table script.

- Python 3.8+ (for `fileinfo.py`), 3.7+ (for `Basic_inf_gathering.py`)
- LIEF: required for PE/ELF/Mach-O parsing
- cryptography: optional, for PE digital signature details
- ssdeep / py-tlsh: optional, for fuzzy hashing
- yara-python: optional, for YARA scanning
- olefile: optional, for OLE/compound document stream listing in `--full`

| Resource | Link |
|---|---|
| Basic-File-Information-Gathering-Script (this repo) | GitHub · Medium: File Metadata & Static Analysis |
| Static-malware-Analysis-Orchestrator | GitHub: one-command pipeline (triage, strings, PE imports, unpack) · Medium: Full workflow |
| String-Analyzer | GitHub · Medium: String Analyzer Guide |
| PE-Import-Analyzer | GitHub · Medium: PE Import Analyzer Guide |
| Unpacker | GitHub · Medium: Unpacker Guide |
| Author | Medium @1200km |
See LICENSE for details.