This directory contains utility scripts for deployment, operations, automation, and release management.
- k8s/ - Kubernetes deployment manifests and Helm charts
- systemd/ - systemd service files for Linux deployments
Synchronise GitHub milestones with the (Target: …) and (Issue: #NNN) annotations found in every src/**/ROADMAP.md file.

- sync-milestones-from-roadmap.py – Parses roadmaps, creates missing milestones (Q2 2026 / Q3 2026 / Q4 2026 / Q1 2027, …), and assigns open issues to them.
Usage:
# Preview what would happen (no GitHub API writes)
python3 scripts/sync-milestones-from-roadmap.py --dry-run
# Only generate the audit report (docs/issue-milestone-audit.md)
python3 scripts/sync-milestones-from-roadmap.py --audit-only
# Apply changes (requires GITHUB_TOKEN with issues:write scope)
GITHUB_TOKEN=ghp_... python3 scripts/sync-milestones-from-roadmap.py
# Verbose output (one line per issue)
GITHUB_TOKEN=ghp_... python3 scripts/sync-milestones-from-roadmap.py --verbose

Environment variables:
| Variable | Description |
|---|---|
| GITHUB_TOKEN | PAT or GITHUB_TOKEN (Actions) with repo scope |
| GITHUB_REPOSITORY | Fallback owner/repo; defaults to makr-code/ThemisDB |
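As a minimal sketch (variable names taken from the table above, helper name is illustrative), the repository fallback might be resolved like this:

```python
import os

def resolve_repository(default: str = "makr-code/ThemisDB") -> str:
    """Resolve the target owner/repo from GITHUB_REPOSITORY, else fall back."""
    return os.environ.get("GITHUB_REPOSITORY") or default
```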
Audit report: docs/issue-milestone-audit.md is generated automatically and lists
all 665 issue references found in roadmaps, including the 550 issues that lack an
explicit (Target: …) annotation and require manual milestone assignment.
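The (Target: …) and (Issue: #NNN) annotations described above could be extracted with a sketch along these lines (the regexes are an assumption for illustration, not the script's actual implementation):

```python
import re

TARGET_RE = re.compile(r"\(Target:\s*([^)]+)\)")
ISSUE_RE = re.compile(r"\(Issue:\s*#(\d+)\)")

def parse_roadmap_line(line: str):
    """Return (target, issue_number) found on a roadmap line; either may be None."""
    target = TARGET_RE.search(line)
    issue = ISSUE_RE.search(line)
    return (
        target.group(1).strip() if target else None,
        int(issue.group(1)) if issue else None,
    )
```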
Tests: tests/test_sync_milestones.py
python3 -m pytest tests/test_sync_milestones.py -v

Create and reconcile GitHub issues from the consolidated src/ROADMAP.md
backlog. The script resolves each roadmap row to the linked
src/<module>/FUTURE_ENHANCEMENTS.md section, extracts acceptance criteria, and
can backfill issue references into the roadmap after successful creation. Generated
issues use governance-aligned area:*, priority:*, type:*, and status:*
labels and render the mandatory Context, Goal, Acceptance Criteria,
Relationships, and References sections in the issue body.
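The mandatory sections could be rendered roughly like this (section order is taken from the text above; the helper itself is a hypothetical sketch, not the script's real template code):

```python
def render_issue_body(context: str, goal: str, criteria: list,
                      relationships: str, references: str) -> str:
    """Assemble an issue body with the mandatory governance sections."""
    criteria_md = "\n".join(f"- [ ] {c}" for c in criteria)
    return (
        f"## Context\n{context}\n\n"
        f"## Goal\n{goal}\n\n"
        f"## Acceptance Criteria\n{criteria_md}\n\n"
        f"## Relationships\n{relationships}\n\n"
        f"## References\n{references}\n"
    )
```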
- sync-issues-from-roadmap.py – Preview, apply, and backfill roadmap issues.
Usage:
# Generate issue bodies and preview reports only
python3 scripts/sync-issues-from-roadmap.py --mode preview
# Limit to a specific module or priority for batching
python3 scripts/sync-issues-from-roadmap.py --mode preview --module auth --limit 5
python3 scripts/sync-issues-from-roadmap.py --mode preview --priority critical
# Create missing issues with gh CLI and write Issue references back to src/ROADMAP.md
python3 scripts/sync-issues-from-roadmap.py --mode apply --backfill
# Backfill later from a prior apply manifest
python3 scripts/sync-issues-from-roadmap.py --mode backfill \
--manifest artifacts/roadmap-issues/roadmap-issues-apply.json
# Optional: run the same flow via workflow_dispatch in GitHub Actions
gh workflow run sync-roadmap-issues.yml -f mode=preview -f priority=critical -f limit=10

Outputs:
- artifacts/roadmap-issues/roadmap-issues-preview.json
- artifacts/roadmap-issues/roadmap-issues-preview.md
- artifacts/roadmap-issues/bodies/*.md
- artifacts/roadmap-issues/roadmap-issues-apply.json
- artifacts/roadmap-issues/roadmap-issues-summary.json
- artifacts/roadmap-issues/roadmap-issues-summary.md
The summary artifacts are written automatically after preview, apply, and
backfill runs and contain per-priority totals plus any remaining roadmap rows
without an issue reference.
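The per-priority totals in the summary could be derived roughly as follows (the manifest field name `priority` is an assumption for illustration):

```python
from collections import Counter

def priority_totals(rows: list) -> dict:
    """Count roadmap rows per priority label, e.g. from a preview/apply manifest."""
    return dict(Counter(row.get("priority", "unknown") for row in rows))
```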
Tests: tests/test_sync_issues_from_roadmap.py
python3 -m pytest tests/test_sync_issues_from_roadmap.py -v

Build binaries for historical version tags retroactively.
- retroactive-release-builder.sh - Linux/macOS version
- retroactive-release-builder.ps1 - Windows (PowerShell) version
Purpose: Extract source code at specific version tags and build/package binaries for all past releases.
Documentation: See RETROACTIVE_RELEASE_BUILDING.md
Quick Start:
# List available tags
./scripts/retroactive-release-builder.sh --list-tags
# Build specific tag
./scripts/retroactive-release-builder.sh --tag v1.3.4 --platform linux
# Build all tags
./scripts/retroactive-release-builder.sh --all-tags

- orchestrate-release.ps1 - Multi-edition release orchestrator
- build-community-release.ps1 - Build Community Edition
- build-enterprise-release.ps1 - Build Enterprise Edition
- build-hyperscaler-release.ps1 - Build Hyperscaler Edition
- prepare-release.sh / prepare-release.ps1 - Prepare releases
- create-github-release.ps1 - Create GitHub releases
- build-release-packages.sh - Build Linux packages
Automate the entire Git Flow release process: develop → release/vX.X.X → main (+ tag) → retroactive build
- complete-release.sh - Linux/macOS automated release
- complete-release.ps1 - Windows automated release
Purpose: Complete release workflow following Git Flow branching strategy
Documentation: See RETROACTIVE_RELEASE_GITFLOW.md
Quick Start:
# Complete release workflow
./scripts/complete-release.sh 1.5.0
# Dry run to preview
./scripts/complete-release.sh 1.5.0 --dry-run
# Skip retroactive build
./scripts/complete-release.sh 1.5.0 --skip-build

Scripts for deploying ThemisDB in various environments.
Scripts for database operations, maintenance, and monitoring.
Build and development automation scripts are located in the project root:
- build.sh / build.ps1 - Build scripts
- setup.sh / setup.ps1 - Development environment setup
- sync-wiki.ps1 - Wiki synchronization
Scripts for maintaining documentation consistency:
- add_doc_metadata.py - Automatically add structured YAML metadata to markdown files
Usage:
# Dry-run mode (preview what would change)
python3 scripts/add_doc_metadata.py --dry-run
# Add metadata to all markdown files
python3 scripts/add_doc_metadata.py
# Add metadata to specific files
python3 scripts/add_doc_metadata.py --files README.md CONTRIBUTING.md

The script adds structured metadata including:
- Author (Themis DevTeam & Copilot)
- Document number (release tag or date)
- Creation and modification dates (from git history)
- First commit title
- Document title (first markdown heading)
- File path
See the add-doc-metadata workflow for automated execution.
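Assuming the field names listed above, assembling such a block might look like this (purely illustrative; the real script derives the values from git history rather than taking them as arguments):

```python
def build_metadata_block(author: str, doc_no: str, created: str,
                         modified: str, commit_title: str,
                         title: str, path: str) -> str:
    """Compose a structured metadata block to append to a markdown file."""
    return "\n".join([
        f"Author: {author}",
        f"Document number: {doc_no}",
        f"Created: {created}",
        f"Last modified: {modified}",
        f"Commit title: {commit_title}",
        f"Title: {title}",
        f"File path: {path}",
    ])
```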
Scripts for managing LLM models and running inferencing benchmarks:
- download-ollama-models.ps1 - Download models from Ollama and convert to GGUF
- prepare_release_mini_llm.py - Prepare a small GGUF release bundle for llama.cpp (default.gguf + manifest)
- run-llm-benchmarks.ps1 - Execute LLM inferencing benchmarks
- setup-llm-benchmarks.ps1 - Complete workflow: download + build + benchmark
Scripts for release packaging with bundled documentation and LLM assets:
- generate_docs_database.py - Generate the precompiled documentation JSON database
- generate_docs_rocksdb.py - Generate importer code for the docs.db / RocksDB documentation bundle
- prepare_release_mini_llm.py - Download or stage a small GGUF model as models/default.gguf
Quick Start:
# Prepare release mini model bundle
python3 scripts/prepare_release_mini_llm.py --output-dir release/models
# Generate documentation database JSON
python3 scripts/generate_docs_database.py --output data/docs_database.json

Quick Start:
# Download models and run benchmarks (all-in-one)
.\scripts\setup-llm-benchmarks.ps1
# Or step by step:
.\scripts\download-ollama-models.ps1 -ModelNames @("llama3.2:1b", "phi3:mini")
.\scripts\run-llm-benchmarks.ps1

See the LLM Benchmarking Guide for details.
Scripts for automated documentation quality assurance and validation:
- docs-lint.py - Lint markdown documentation for structure and syntax issues
- link-check.py - Validate internal and external links
- toc-check.py - Validate the table of contents in the MkDocs configuration
- validate-docs.sh - Run all documentation validation checks
Features:
- Heading hierarchy validation
- Markdown syntax checking
- Link validation (internal and external)
- TOC consistency checking
- File naming convention validation
- Cross-reference validation
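Heading hierarchy validation, for instance, boils down to checking that heading levels never skip. A simplified sketch (not docs-lint.py's actual logic):

```python
import re

def heading_level_errors(markdown: str) -> list:
    """Report headings that jump more than one level deeper than their predecessor."""
    errors = []
    prev = 0
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s", line)
        if not m:
            continue  # not an ATX heading
        level = len(m.group(1))
        if prev and level > prev + 1:
            errors.append(f"level jump {prev} -> {level}: {line.strip()}")
        prev = level
    return errors
```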
Quick Start:
# Run all validation checks
./scripts/validate-docs.sh
# Run individual checks
python3 scripts/docs-lint.py
python3 scripts/link-check.py
python3 scripts/toc-check.py
# Generate JSON reports for CI/CD
python3 scripts/docs-lint.py --format json --output lint-report.json

CI/CD Integration:
- Automated via .github/workflows/documentation-validation.yml
- Runs on PRs and pushes to main/develop branches
- Blocks merge if validation fails
Documentation:
Scripts for automatically downloading and importing German legal training data from multiple HuggingFace sources with automatic fallback:
- ingest_legal_training_data.py - Download and convert German legal datasets from HuggingFace or local files
- generate_legal_rocksdb.py - Generate a C++ RocksDB importer for legal training data
Note: joelito/legal_mc_de is no longer available. The system now supports multiple alternative datasets with automatic fallback.
Dataset Information:
- Primary: joelNiklaus/MultiLegalPile (German subset) - RECOMMENDED
- Alternative 1: elenanereiss/german-ler
- Alternative 2: Local custom datasets (JSON format)
- Language: German (de)
- Domain: Legal documents
- Default samples: 10,000
Features:
- Automatic fallback across multiple datasets
- Dataset availability checking
- Local file support for custom datasets
- ThemisDB JSON format conversion
- RocksDB database generation
- CMake build integration
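The automatic fallback amounts to trying each candidate dataset in order until one loads (dataset names are from the list above; the loader argument is a stand-in for the actual HuggingFace download):

```python
DATASET_CANDIDATES = [
    "joelNiklaus/MultiLegalPile",   # primary (German subset)
    "elenanereiss/german-ler",      # alternative 1
]

def load_with_fallback(loader, candidates=DATASET_CANDIDATES):
    """Return (dataset_name, data) from the first loader call that succeeds."""
    last_error = None
    for name in candidates:
        try:
            return name, loader(name)
        except Exception as exc:  # dataset unavailable; try the next candidate
            last_error = exc
    raise RuntimeError(f"no dataset available: {last_error}")
```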
Quick Start:
# List available datasets
python3 scripts/ingest_legal_training_data.py --list-datasets
# Download using automatic fallback (tries MultiLegalPile first)
python3 scripts/ingest_legal_training_data.py \
--output data/legal_training_data.json \
--max-samples 10000
# Specify a specific dataset
python3 scripts/ingest_legal_training_data.py \
--dataset joelNiklaus/MultiLegalPile \
--max-samples 10000
# Use a local dataset
python3 scripts/ingest_legal_training_data.py \
--local-file custom_legal_data.json
# Generate RocksDB importer
python3 scripts/generate_legal_rocksdb.py \
--method cpp \
--output data/legal_training.db

CMake Integration:
# Enable during build
cmake -B build -DTHEMIS_BUILD_LEGAL_TRAINING_DATA=ON
# Build the database
cmake --build build --target legal_training_data

Documentation:
Scripts for security auditing and compliance verification:
- comprehensive-code-audit.sh - Systematic security and compliance audit script
Purpose: Automate comprehensive security analysis covering SAST, dependency scanning, secret detection, container security, and dynamic analysis.
Compliance Coverage:
- BSI C5 (Cloud Computing Compliance Criteria Catalogue)
- ISO/IEC 27001 (Information Security Management)
- DSGVO/GDPR (EU Data Protection Regulation)
- NIS2 (Network and Information Security Directive)
- OWASP ASVS (Application Security Verification Standard)
- NIST Cybersecurity Framework
Features:
- Static Application Security Testing (SAST)
  - cppcheck - C++ static analysis
  - clang-tidy - Modern C++ linting and security checks
  - Semgrep - Pattern-based security scanning
- Dependency & Supply Chain Security
  - Trivy - Vulnerability scanning
  - vcpkg dependency inventory
- Secret Detection
  - Gitleaks - Credential and API key detection
- Container Security
  - Dockerfile security analysis
- Dynamic Analysis
  - Integration with Valgrind, ASAN, TSAN
  - Recommendations for runtime testing
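Conceptually, the audit categories listed above map to tools and can be gated per category. A hypothetical Python sketch of that mapping (the actual script is shell, and its internals may differ):

```python
# Hypothetical mapping of audit categories to the tools each one invokes.
AUDIT_TOOLS = {
    "sast": ["cppcheck", "clang-tidy", "semgrep"],
    "dependencies": ["trivy", "vcpkg-inventory"],
    "secrets": ["gitleaks"],
    "container": ["dockerfile-analysis"],
    "dynamic": ["valgrind", "asan", "tsan"],
}

def plan_audit(skips: set) -> dict:
    """Return the categories (and their tools) that would run, given skipped categories."""
    return {cat: tools for cat, tools in AUDIT_TOOLS.items() if cat not in skips}
```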
Quick Start:
# Full comprehensive audit
./scripts/comprehensive-code-audit.sh
# Quick audit (skip time-consuming checks)
AUDIT_QUICK=1 ./scripts/comprehensive-code-audit.sh
# Audit specific categories
./scripts/comprehensive-code-audit.sh --skip-dependencies --skip-dynamic
# Continue even if issues found
./scripts/comprehensive-code-audit.sh --continue-on-error
# View all options
./scripts/comprehensive-code-audit.sh --help

Output:
- Audit results directory: audit-results-<timestamp>/
- Comprehensive report: audit-results-<timestamp>/comprehensive-audit-report.md
- Category-specific reports in subdirectories (sast/, dependencies/, secrets/, etc.)
CI/CD Integration:
- Referenced in security audit workflows
- Can be triggered manually via GitHub Actions
Documentation:
- SECURITY.md - Security scanning section
- Security Compliance Investigation Template
- Full Audit Checklist
- Audit TODO List
Prerequisites:
# Ubuntu/Debian
sudo apt-get install cmake git build-essential cppcheck clang-tidy
# Install optional tools for full coverage
# Trivy - https://aquasecurity.github.io/trivy/
# Gitleaks - https://github.com/gitleaks/gitleaks
# Semgrep - https://semgrep.dev/

Each script includes documentation in the header comments. Run scripts with -h or --help for usage information where applicable.
For detailed deployment and operations documentation, see:
Author: Themis DevTeam & Copilot
Document number: as of 2026-02-17
Created: 2026-02-17
Last modified: 2026-02-17
Commit title: "Create documentation for build simplification proposals in ThemisDB repository."
Reviewer:
Title: "ThemisDB Scripts"
File path: scripts/README.md