GitHub - stacknil/LogLens at codex/feat/csv-export-v0.3

LogLens C++20 defensive log analysis CLI for Linux authentication logs, with parser coverage telemetry, configurable detection rules, CI, and CodeQL. It parses `auth.log` / `secure`-style syslog input and `journalctl --output=short-full`-style input, normalizes authentication evidence, applies configurable rule-based detections, and emits deterministic Markdown and JSON reports, with optional CSV exports for findings and warnings. ## Project Status LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow. ## Why This Project Exists Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible. LogLens is built around three ideas: - detection engineering over offensive functionality - parser observability over silent failure - repository discipline over throwaway scripts The project reports suspicious login activity while also surfacing parser coverage, unknown-line buckets, CI status, and code scanning hygiene. ## Scope LogLens is a defensive, public-safe repository. It is intended for log parsing, detection experiments, and engineering practice. It does not provide exploitation, persistence, credential attack automation, or live offensive capability. ## Repository Checks LogLens includes two minimal GitHub Actions workflows: - `CI` builds and tests the project on `ubuntu-latest` and `windows-latest` - `CodeQL` runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule Both workflows are intended to stay stable enough to require on pull requests to `main`. Release-facing documentation is split across `CHANGELOG.md`, `docs/release-process.md`, `docs/release-v0.1.0.md`, and the repository's GitHub release notes. The repository hardening note is in `docs/repo-hardening.md`, and vulnerability reporting guidance is in `SECURITY.md`. ## Threat Model LogLens is designed for offline review of `auth.log` and `secure` style text logs collected from systems you own or administer. The MVP focuses on common, high-signal patterns that often appear during credential guessing, username enumeration, or bursty privileged command use. The current tool helps answer: - Is one source IP generating repeated SSH failures in a short window? - Is one source IP trying several usernames in a short window? - Is one account running sudo unusually often in a short window? It does not attempt to replace a SIEM, correlate across hosts, enrich IPs, or decide whether a finding is malicious on its own. ## Detections LogLens currently detects: - Repeated SSH failed password attempts from the same IP within 10 minutes - One IP trying multiple usernames within 15 minutes - Bursty sudo activity from the same user within 5 minutes LogLens currently parses and reports these additional auth patterns beyond the core detector inputs: - `Accepted publickey` SSH successes - `Failed publickey` SSH failures, which count toward SSH brute-force detection by default - `pam_unix(...:auth): authentication failure` - `pam_unix(...:session): session opened` - selected `pam_faillock(...:auth)` failure variants - selected `pam_sss(...:auth)` failure variants LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including: - `total_lines` - `parsed_lines` - `unparsed_lines` - `parse_success_rate` - `top_unknown_patterns` LogLens does not currently detect: - Lateral movement - MFA abuse - SSH key misuse - Many PAM-specific failures beyond the parsed `pam_unix`, `pam_faillock`, and `pam_sss` sample patterns - Cross-file or cross-host correlation ## Build `bash cmake -S . -B build cmake --build build ctest --test-dir build --output-on-failure` For fresh-machine setup and repeatable local presets, see `docs/dev-setup.md`. ## Run `bash ./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out ./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal ./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config ./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv` The CLI writes: - `report.md` - `report.json` into the output directory you provide. If you omit the output directory, the files are written into the current working directory. When you add `--csv`, LogLens also writes: - `findings.csv` - `warnings.csv` The CSV schema is intentionally small and stable: - `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary` - `warnings.csv`: `kind`, `message` When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic. ## Sample Output For sanitized sample input, see `assets/sample_auth.log` and `assets/sample_journalctl_short_full.log`. `report.md` summary excerpt: `markdown ## Summary - Input mode: syslog_legacy - Parsed events: 14 - Findings: 3 - Parser warnings: 2` `report.json` summary excerpt: `json { "input_mode": "syslog_legacy", "parsed_event_count": 14, "finding_count": 3, "warning_count": 2 }` The config file schema is intentionally small and strict: json { "input_mode": "syslog_legacy", "timestamp": { "assume_year": 2026 }, "brute_force": { "threshold": 5, "window_minutes": 10 }, "multi_user_probing": { "threshold": 3, "window_minutes": 15 }, "sudo_burst": { "threshold": 3, "window_minutes": 5 }, "auth_signal_mappings": { "ssh_failed_password": { "counts_as_attempt_evidence": true, "counts_as_terminal_auth_failure": true }, "ssh_invalid_user": { "counts_as_attempt_evidence": true, "counts_as_terminal_auth_failure": true }, "ssh_failed_publickey": { "counts_as_attempt_evidence": true, "counts_as_terminal_auth_failure": true }, "pam_auth_failure": { "counts_as_attempt_evidence": true, "counts_as_terminal_auth_failure": false } } } This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it. Timestamp handling is now explicit: - `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year` - `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year` ## Example Input text Mar 10 08:11:22 example-host sshd[1234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2 Mar 10 08:12:10 example-host sshd[1235]: Accepted password for alice from 203.0.113.20 port 51111 ssh2 Mar 10 08:15:00 example-host sudo: alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart ssh Mar 10 08:27:10 example-host sshd[1243]: Failed publickey for invalid user svc-backup from 203.0.113.40 port 51240 ssh2 Mar 10 08:28:33 example-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.41 user=alice Mar 10 08:29:50 example-host pam_unix(sudo:session): session opened for user root by alice(uid=0) Mar 10 08:30:12 example-host sshd[1244]: Connection closed by authenticating user alice 203.0.113.50 port 51290 [preauth] Mar 10 08:31:18 example-host sshd[1245]: Timeout, client not responding from 203.0.113.51 port 51291 `journalctl --output short-full` style example: text Tue 2026-03-10 08:11:22 UTC example-host sshd[2234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2 Tue 2026-03-10 08:13:10 UTC example-host sshd[2236]: Failed password for test from 203.0.113.10 port 51040 ssh Tue 2026-03-10 08:18:05 UTC example-host sshd[2238]: Failed publickey for invalid user deploy from 203.0.113.10 port 51060 ssh2 Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authenticating user alice 203.0.113.51 port 51291 [preauth] ## Known Limitations - `syslog_legacy` requires an explicit year; LogLens does not guess one implicitly. - `journalctl_short_full` currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets, not arbitrary timezone abbreviations. - Parser coverage is still selective: it covers common `sshd`, `sudo`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support. - Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings. - `pam_unix` auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them. - Detector configuration uses a fixed `config.json` schema rather than partial overrides or alternate config formats. - Findings are rule-based triage aids, not incident verdicts or attribution. ## Future Roadmap - Additional auth patterns and PAM coverage - Larger sanitized test corpus

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
.github		.github
assets		assets
docs		docs
src		src
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CMakeLists.txt		CMakeLists.txt
CMakePresets.json		CMakePresets.json
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases 4

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages