Changelog

All notable changes to ThemisDB will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • HammingCoder — RAID-2 / Hamming Shard-Level Error Correction (include/sharding/redundancy_strategy.h, src/sharding/redundancy_strategy.cpp)
    • HAMMING added to ErasureCodingAlgorithm enum; ErasureCoder::create(HAMMING) factory method returns a HammingCoder instance
    • HammingCoder::encode(): systematic XOR-based parity; parity shard p covers data shard j when bit p of (j+1) is set — classical Hamming assignment at block granularity
    • HammingCoder::decode(): iterative XOR repair; recovers all shards whose parity coverage allows; std::runtime_error on irrecoverable failure sets
    • No Galois-Field arithmetic — purely XOR-based; O(k × r × shard_size) encode/decode
    • 16 focused tests in tests/test_hamming_coder.cpp (HC_01..HC_16): chunk invariants, single/multi-shard failure, canonical Hamming(7,4) coverage verification, 1 MB round-trip, edge cases
    • HammingCoderFocusedTests CTest target registered
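The coverage rule above (parity shard p covers data shard j when bit p of (j+1) is set) can be sketched with plain XOR. This is a minimal illustration, not the HammingCoder source: the names `parityCovers`, `parityShardCount`, `encodeParity` and `recoverSingle` are hypothetical, and deriving the parity count as the bit length of k is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shard = std::vector<std::uint8_t>;

// True when parity shard p covers data shard j, i.e. bit p of (j + 1) is set.
inline bool parityCovers(std::size_t p, std::size_t j) {
    return (((j + 1) >> p) & 1u) != 0;
}

// Assumption: r is the bit length of k, so every index 1..k has a set bit below r.
inline std::size_t parityShardCount(std::size_t k) {
    std::size_t r = 0;
    while ((std::size_t{1} << r) <= k) ++r;
    return r;
}

// Computes the parity shards over k equally sized data shards using XOR only.
inline std::vector<Shard> encodeParity(const std::vector<Shard>& data) {
    assert(!data.empty());
    const std::size_t k = data.size();
    const std::size_t len = data[0].size();
    const std::size_t r = parityShardCount(k);
    std::vector<Shard> parity(r, Shard(len, 0));
    for (std::size_t p = 0; p < r; ++p) {
        for (std::size_t j = 0; j < k; ++j) {
            if (!parityCovers(p, j)) continue;
            assert(data[j].size() == len);
            for (std::size_t i = 0; i < len; ++i) parity[p][i] ^= data[j][i];
        }
    }
    return parity;
}

// Rebuilds one lost data shard from the lowest parity shard that covers it,
// assuming every other shard in that parity group is intact.
inline Shard recoverSingle(const std::vector<Shard>& data,
                           const std::vector<Shard>& parity,
                           std::size_t lost) {
    std::size_t p = 0;
    while (!parityCovers(p, lost)) ++p;  // lowest set bit of (lost + 1)
    Shard out = parity[p];
    for (std::size_t j = 0; j < data.size(); ++j) {
        if (j != lost && parityCovers(p, j)) {
            for (std::size_t i = 0; i < out.size(); ++i) out[i] ^= data[j][i];
        }
    }
    return out;
}
```

With four data shards this yields three parity shards. Recovering several lost shards would repeat the single-shard step until no parity group with exactly one missing member remains.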

Security

  • Task Scheduler AuthZ Hardening (GAP-001) 🔐

    • Activated runtime permission checks in TaskScheduler for:
      • registerTask() → requires task:register
      • executeTaskNow() / executeDAG() → requires task:execute
      • registerFunction() → requires task:register_function and system_admin role
    • Added denied-access security audit events (UNAUTHORIZED_ACCESS) with structured justification metadata (required_permission, reason, justification).
    • HttpServer task create/execute routes now propagate authenticated request context (user, IP, permissions, roles, justification) into TaskScheduler thread-local context.
    • TaskSchedulerApiHandler::executeTask() now returns status=error when scheduler execution is rejected (e.g., missing permission), instead of reporting executed.
  • JWT/JWKS cache synchronization hardening 🔒

    • JWTValidator::fetchJWKS() now guarantees jwks_refreshing_ reset via RAII even on exceptional exits, preventing stuck refresh state under parallel validation/key-fetch error paths.
    • Added explicit header includes for std::mutex / std::condition_variable in include/auth/jwt_validator.h to keep synchronization primitives self-contained.
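The RAII reset described for JWTValidator::fetchJWKS() follows a common C++ pattern: a small guard object clears the flag in its destructor, so the flag is reset on every exit path. The sketch below is an illustration only. `RefreshFlagGuard` and `fetchKeysSketch` are hypothetical names, and modelling `jwks_refreshing_` as a `std::atomic<bool>` is an assumption (the real member may be a plain bool protected by a mutex).

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical guard: sets the flag on entry and clears it when the scope
// exits, whether by normal return or by exception.
class RefreshFlagGuard {
public:
    explicit RefreshFlagGuard(std::atomic<bool>& flag) : flag_(flag) {
        flag_.store(true);
    }
    ~RefreshFlagGuard() { flag_.store(false); }
    RefreshFlagGuard(const RefreshFlagGuard&) = delete;
    RefreshFlagGuard& operator=(const RefreshFlagGuard&) = delete;

private:
    std::atomic<bool>& flag_;
};

// Stand-in for a key fetch that may fail; `fail` simulates a network error.
inline void fetchKeysSketch(std::atomic<bool>& refreshing, bool fail) {
    RefreshFlagGuard guard(refreshing);
    assert(refreshing.load());
    if (fail) throw std::runtime_error("simulated JWKS fetch failure");
}
```

After either outcome the flag reads false again, so a later validation is not blocked by a stale refresh state.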
  • Docker Image Security Hardening 🔒

    • Added the THEMIS_ENABLE_ENCRYPTED_STORAGE build ARG (default: OFF): gocryptfs and fuse are now installed only when the ARG is explicitly set to ON. This removes Go stdlib 1.22.2 from the default runtime image.
    • tar is removed from the runtime image after package installation (apt-get purge -y --auto-remove tar).
    • CVE scan (Docker Scout) before and after the changes: 39 CVEs (including 3 CRITICAL, 11 HIGH) → 3 CVEs remaining (all LOW/MEDIUM, no upstream fix available).
    • Community image published on DockerHub: themisdb/themisdb:latest and themisdb/themisdb:1.8.1-rc1.
    • Remaining CVEs: CVE-2024-2236 (libgcrypt20, LOW), CVE-2024-56433 (shadow, LOW), CVE-2025-45582 (tar, MEDIUM); none has an upstream fix. Waivers are documented in docs/audit-reports/cve-waivers.md.

Documentation

  • Module-Docs Sync 📚 — 2026-04-17

    • 58 modules indexed; 761 primary Markdown files in src/ and include/
    • 0 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-04-16

    • 56 modules indexed; 752 primary Markdown files in src/ and include/
    • 6 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-04-15

    • 56 modules indexed; 752 primary Markdown files in src/ and include/
    • 6 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0

[1.8.1-rc1] - 2026-04-04

Release Notes: docs/de/releases/RELEASE_NOTES_v1.8.1-rc1.md

Added

  • README: Comprehensive Technology & Feature Badges 🏷️
    • Added 11 badge categories to the README header showcasing ThemisDB capabilities:
    • Technology Stack: C++17/20, CUDA, Vulkan, RocksDB, llama.cpp
    • Multi-Model Capabilities: Relational (AQL), Vector (HNSW+FAISS), Graph (Property Graphs), Document, Geospatial (GeoJSON/R-tree), TimeSeries
    • Enterprise & Security: ACID (MVCC), TLS 1.3, PKI (X.509/GPG), RBAC, AES-256-GCM Encryption
    • AI/ML Integration: LLM-Ready, RAG, Vector Search, Embeddings, LoRA Fine-Tuning
    • Performance: GPU-Accelerated, SIMD, 45K WPS, 120K RPS
    • Distributed Systems: Sharding, Raft Replication, CDC
    • Query & Analytics: AQL, GraphQL, OLAP, Full-Text Search (BM25)
    • Data Integration: PostgreSQL Wire Import, Multi-Format Export, Content Pipeline
    • Observability: Prometheus, OpenTelemetry, Audit Logging
    • Quality Metrics: 41 Modules, 500K+ LOC, 3 Production-Ready Core Modules
    • Community: Chat (Slack), Forum (GitHub Discussions), Contributing Guide
    • All badges link to relevant src/ module directories using shields.io
  • Geo Module: Full GeoJSON RFC 7946 parsing 🌍
    • EWKBParser::parseGeoJSON() now handles all seven RFC 7946 geometry types: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, and GeometryCollection (including 3D variants with Z coordinates).
    • EWKBParser::toGeoJSON() serializes all seven geometry types.
    • EWKB parse() and serialize() now also cover geometry types 4–7 (MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), so all geometry types are supported.
    • GeometryCollection is parsed recursively up to a depth of 8 to prevent stack overflow on adversarial input.
    • computeMBR() and computeCentroid() now recurse into nested sub-geometries.
    • WGS84 coordinate range validation: longitude must be in [-180, 180] and latitude in [-90, 90]; invalid coordinates throw std::runtime_error. Compile with -DTHEMIS_GEO_COMPAT_LAX to skip coordinate range validation during a migration window.
  • Geo Module: In-memory R-tree spatial index 🌳
    • New GeoRTree class (include/geo/geo_rtree.h, src/geo/geo_rtree.cpp): an in-memory R-tree index for GeometryInfo objects enabling sub-linear intersects and contains queries.
    • When compiled with THEMIS_GEO_BOOST_BACKEND and Boost.Geometry headers present, uses boost::geometry::index::rtree with rstar<16> splitting strategy.
    • Without Boost, automatically falls back to an O(n) linear MBR scan — semantically identical, no dependency required.
    • bulkLoad(entries) uses STR (Sort-Tile-Recursive) packing via the Boost bulk-insert constructor for 3–5× faster cold-start load compared to incremental insert().
    • memoryBytes() returns a conservative estimate of heap usage and logs the value via the existing structured audit log field geo_index_bytes_allocated.
    • 20 unit tests covering: empty index, insert, bulkLoad (including replace-on-reload), remove, clear, intersects (single/multiple/overlapping/world), contains (single/multiple/boundary), memory reporting, and move semantics.
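The Boost-free fallback mentioned above is an O(n) scan over minimum bounding rectangles. A minimal sketch of such a scan follows. The `Mbr` struct, its field names, and the functions `intersects`, `contains` and `linearIntersects` are illustrative assumptions, not the GeoRTree interface.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Minimum bounding rectangle; field names are illustrative.
struct Mbr {
    double min_x, min_y, max_x, max_y;
};

// Closed-interval overlap test: touching edges count as intersecting.
inline bool intersects(const Mbr& a, const Mbr& b) {
    return a.min_x <= b.max_x && b.min_x <= a.max_x &&
           a.min_y <= b.max_y && b.min_y <= a.max_y;
}

// True when `outer` fully contains `inner` (boundaries included).
inline bool contains(const Mbr& outer, const Mbr& inner) {
    return outer.min_x <= inner.min_x && inner.max_x <= outer.max_x &&
           outer.min_y <= inner.min_y && inner.max_y <= outer.max_y;
}

// O(n) fallback: test every entry's MBR against the query window and
// return the ids of all entries that overlap it, in insertion order.
inline std::vector<std::uint64_t> linearIntersects(
        const std::vector<std::pair<std::uint64_t, Mbr> >& entries,
        const Mbr& query) {
    assert(query.min_x <= query.max_x && query.min_y <= query.max_y);
    std::vector<std::uint64_t> ids;
    for (const auto& entry : entries) {
        if (intersects(entry.second, query)) ids.push_back(entry.first);
    }
    return ids;
}
```

An R-tree answers the same query by pruning whole subtrees whose bounding boxes miss the window, which is where the sub-linear behaviour comes from. The result set is the same in both cases.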
  • Geo Module: ST_UNION and ST_DIFFERENCE geometry operations 🔷
    • New ISpatialComputeBackend::stUnion(geom1, geom2) and stDifference(geom1, geom2) virtual methods added to include/geo/spatial_backend.h.
    • CpuExactBackend (CPU-only, no Boost dependency): full implementation using the Greiner-Hormann polygon clipping algorithm (ACM TOG 1998) with fast paths for containment, disjoint, and B-inside-A (returns polygon with hole ring). Point and Point-Polygon cases handled with simple coordinate logic.
    • BoostCpuExactBackend: implementation via boost::geometry::union_ and boost::geometry::difference; falls back to CpuExactBackend for non-polygon types.
    • GpuBatchBackend: delegates to getCpuExactBackend() with audit log and metrics records — same pattern as the existing stBuffer GPU fallback.
    • AQL functions ST_UNION(geom1, geom2) and ST_DIFFERENCE(geom1, geom2) registered in include/query/functions/geo_functions.h; return GeoJSON geometry.
    • 15 new unit tests in tests/geo/test_geo_st_union_difference.cpp (parameterised over cpu_exact and gpu_spatial backends) and 7 AQL-level tests added to tests/geo/test_aql_st_functions.cpp.

⚠️ Breaking Changes

  • GeoJSON strict parsing (EWKBParser::parseGeoJSON): coordinate values outside the WGS84 range (longitude [-180, 180], latitude [-90, 90]) now throw std::runtime_error. Previously, out-of-range coordinates were silently accepted. To restore the old lenient behavior for one release cycle, compile with -DTHEMIS_GEO_COMPAT_LAX=1.
  • Unknown geometry types now throw std::runtime_error with the message "GeoJSON: unsupported geometry type: <type>" instead of silently returning an empty geometry.
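The strict coordinate check and its compile-time opt-out could look roughly like the sketch below. Only the WGS84 ranges, the exception type, and the THEMIS_GEO_COMPAT_LAX macro come from the entries above. The function names and the error message wording are assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Pure range test; written as negated >= / <= so that NaN is rejected too.
inline bool inWgs84Range(double lon, double lat) {
    return (lon >= -180.0 && lon <= 180.0) && (lat >= -90.0 && lat <= 90.0);
}

// Throws std::runtime_error for out-of-range coordinates unless the build
// defines THEMIS_GEO_COMPAT_LAX, in which case the check is compiled out.
inline void validateWgs84(double lon, double lat) {
#ifndef THEMIS_GEO_COMPAT_LAX
    if (!inWgs84Range(lon, lat)) {
        // Message text is illustrative, not the project's actual wording.
        throw std::runtime_error("GeoJSON: coordinate out of WGS84 range: lon=" +
                                 std::to_string(lon) + ", lat=" + std::to_string(lat));
    }
#else
    (void)lon;
    (void)lat;
#endif
}
```

Callers that ingest data of unknown quality can catch std::runtime_error per document, which is usually preferable to enabling the lax mode for the whole build.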

Changed

  • Config Architecture Reorganization 🗂️
    • Hierarchical Directory Structure: Reorganized all config files into logical categories
      • config/core/ - Core system configurations (config.yaml, security.yaml, updates.yaml)
      • config/platform/ - Platform-specific configs (rpi3, rpi4, rpi5, qnap)
      • config/ai_ml/ - AI/ML configurations (LLM, vision, LoRA, RAG)
      • config/security/ - Security & authentication configs (RBAC, PII, Kerberos)
      • config/compliance/ - Compliance & ethics (ethical guidelines, audit, governance)
      • config/performance/ - Performance optimizations (scaling, query cache, acceleration)
      • config/data_management/ - Data lifecycle (retention, redundancy, MIME types)
      • config/distributed/ - Distributed system configs (replication, sharding)
      • config/licensing/ - License configurations (community, enterprise)
      • config/networking/ - Network configurations (connection pooling)
      • config/content/ - Content processing (processors, edge types)
      • config/monitoring/ - Monitoring & observability (Prometheus metrics)
      • config/features/ - Feature flags and capability generation
      • config/assistants/ - Assistant configurations (docs, feedback)
      • config/processing/ - Stream/event processing (CEP rules)
      • config/deprecated/ - Deprecated/backup files
    • ConfigPathResolver Utility: Automatic backward compatibility layer
      • Resolves legacy paths to new hierarchical locations
      • Provides fallback mechanism with deprecation warnings
      • Zero breaking changes to existing code
      • Includes resolve(), tryResolve(), and mapLegacyToNew() methods
    • Updated C++ Code Paths: Updated all config loading code to use new structure
      • src/server/http_server.cpp - LoRA training config
      • src/server/mcp_server.cpp - LLM system prompts
      • src/utils/pii_detector.cpp - PII patterns
      • src/main_server.cpp - Core config, security, retention policies
      • src/index/vector_index.cpp - Scaling optimizations
      • src/content/mime_detector.cpp - MIME types
    • Comprehensive Documentation:
      • config/README.md - Complete directory structure overview
      • config/MIGRATION_GUIDE.md - Detailed migration instructions
      • Full path mapping table (60+ config files)
    • Benefits: Improved organization, better discoverability, scalability, backward compatibility
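The backward compatibility layer amounts to a lookup from legacy paths to their new hierarchical locations. The sketch below shows the idea under loud assumptions: the three legacy paths in the table are guesses derived from the config/core/ entry above, and `mapLegacyToNewSketch` is a hypothetical free function, not the signature of ConfigPathResolver::mapLegacyToNew().

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical legacy-to-new table with three illustrative entries.
// The real mapping table is documented in config/MIGRATION_GUIDE.md.
inline const std::map<std::string, std::string>& legacyPathTable() {
    static const std::map<std::string, std::string> table = {
        {"config/config.yaml",   "config/core/config.yaml"},
        {"config/security.yaml", "config/core/security.yaml"},
        {"config/updates.yaml",  "config/core/updates.yaml"},
    };
    return table;
}

// Returns the new location for a legacy path, or the input unchanged when
// no mapping exists. `was_legacy` lets the caller emit a deprecation warning.
inline std::string mapLegacyToNewSketch(const std::string& path,
                                        bool* was_legacy = nullptr) {
    const std::map<std::string, std::string>& table = legacyPathTable();
    const std::map<std::string, std::string>::const_iterator it = table.find(path);
    if (was_legacy) *was_legacy = (it != table.end());
    return it != table.end() ? it->second : path;
}
```

Returning unmapped paths unchanged keeps already-migrated callers working, which matches the stated goal of zero breaking changes.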

Added

  • Search Module v1.5.0 — 7 new search components 🔍

    • QueryExpander (include/search/query_expander.h): Synonym expansion with configurable max_expansions; Levenshtein-based spelling correction against a user-supplied vocabulary; alternative query generation; zero-result relaxation (drops last token). Tests: tests/test_query_expander.cpp (28 tests).
    • FuzzyMatcher (include/search/fuzzy_matcher.h): Levenshtein, Soundex, Metaphone, and N-gram (Dice-coefficient) similarity; public static utilities for direct use; wraps SecondaryIndexManager::scanFulltextFuzzy. Tests: tests/test_fuzzy_matcher.cpp (24 tests).
    • FacetedSearch (include/search/faceted_search.h): Per-field value-count facets (computeFacet), multi-column batch facets (computeFacets), numeric range-bucket facets (computeRangeFacet), and drill-down filter intersection (applyFacetFilters). Tests: tests/test_faceted_search.cpp (20 tests).
    • SearchAnalytics (include/search/search_analytics.h): Thread-safe query event log (circular eviction at Config::max_events); computeMetrics() returns average/p95/p99 latency, zero-result rate, and top-20 queries. Tests: tests/test_search_analytics.cpp (26 tests).
    • AutocompleteEngine (include/search/autocomplete.h): Prefix-index suggestions via SecondaryIndexManager::scanKeysRange; popular-query suggestions via SearchAnalytics; combined, deduplicated, score-ranked output. Tests: tests/test_autocomplete.cpp (18 tests).
    • LearningToRank (include/search/learning_to_rank.h): Dot-product linear re-ranker over a 6-dimensional RankingFeatures vector; online pairwise gradient-descent training from ClickEvent data; deterministic A/B variant routing via selectVariant() / rerankWithVariant(). Tests: tests/test_learning_to_rank.cpp (28 tests).
    • MultiModalSearch (include/search/multi_modal_search.h): Accepts ModalQuery components (TEXT / IMAGE / AUDIO / CUSTOM), dispatches to SecondaryIndexManager or VectorIndexManager, fuses via weighted RRF. searchTextAndImage() convenience method. Tests: tests/test_multi_modal_search.cpp (18 tests).
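Weighted reciprocal rank fusion (RRF), as named in the MultiModalSearch entry, scores each document by summing weight / (rrf_k + rank) over every ranked list it appears in. The following is a minimal sketch of that formula. `RankedList` and `weightedRrf` are hypothetical names, and the default rrf_k of 60 is the value commonly used in the RRF literature, assumed here rather than taken from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct RankedList {
    double weight;                     // per-modality weight
    std::vector<std::string> doc_ids;  // best match first
};

typedef std::pair<std::string, double> ScoredDoc;

// Fuses several ranked lists: score(d) = sum over lists of weight / (rrf_k + rank),
// with rank starting at 1. Returns documents sorted by descending fused score.
inline std::vector<ScoredDoc> weightedRrf(const std::vector<RankedList>& lists,
                                          double rrf_k = 60.0) {
    assert(rrf_k > 0.0);
    std::map<std::string, double> scores;
    for (std::size_t l = 0; l < lists.size(); ++l) {
        const RankedList& list = lists[l];
        for (std::size_t i = 0; i < list.doc_ids.size(); ++i) {
            scores[list.doc_ids[i]] += list.weight / (rrf_k + static_cast<double>(i + 1));
        }
    }
    std::vector<ScoredDoc> fused(scores.begin(), scores.end());
    std::stable_sort(fused.begin(), fused.end(),
                     [](const ScoredDoc& a, const ScoredDoc& b) {
                         return a.second > b.second;
                     });
    return fused;
}
```

Because only ranks enter the formula, text and vector scores on different scales can be fused without normalizing them first.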
  • Search Module v1.4.0 — HybridSearch production hardening 🔍

    • Configurable vector metric: Config::vector_metric (COSINE / DOT / L2) — was hardcoded to COSINE; DOT and L2 now correctly convert distance to similarity.
    • Strict config validation: constructor throws std::invalid_argument on k == 0, rrf_k ≤ 0, negative weights, k > max_k, k_bm25/k_vector > max_candidates, empty default_table / default_column.
    • Resource limits: Config::max_k and Config::max_candidates cap otherwise unbounded index scans (default 10,000 each).
    • Score normalization edge cases: range == 0 now yields 1.0 for positive scores, 0.0 for zero scores.
    • Linear-combination pre-normalization: BM25 and vector scores are always normalized to [0,1] before weighting, eliminating scale incompatibility.
    • SearchStats: appended to every search() return; exposes bm25_ok, vector_ok, partial_result, bm25_count, vector_count.
    • Exception safety: search() catches all backend and fusion exceptions, logs via THEMIS_ERROR, and returns empty/partial results rather than throwing.
    • Thread-safety and exception-safety documentation added to header.
    • normalizeScores promoted to public static for direct testability.
    • Tests: test_hybrid_search.cpp (35+ tests), test_rrf_fusion.cpp (20 tests), test_score_normalization.cpp (15 tests), test_hybrid_search_integration.cpp (18 integration tests).
    • Benchmark: benchmarks/benchmark_hybrid_search.cpp.
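The score normalization edge case listed above (range == 0) can be illustrated with a small min-max normalizer. This is a sketch, not the project's normalizeScores: the name `normalizeScoresSketch` is hypothetical, and mapping negative scores to 0.0 when the range is zero is an assumption, since the entry only specifies positive and zero scores.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Min-max normalization to [0, 1]. When all scores are equal (range == 0),
// positive scores map to 1.0 and zero scores to 0.0, as described above.
// Assumption: negative scores also map to 0.0 in that case.
inline std::vector<double> normalizeScoresSketch(const std::vector<double>& scores) {
    std::vector<double> out;
    if (scores.empty()) return out;
    typedef std::vector<double>::const_iterator It;
    const std::pair<It, It> mm = std::minmax_element(scores.begin(), scores.end());
    const double lo = *mm.first;
    const double range = *mm.second - lo;
    out.reserve(scores.size());
    for (It it = scores.begin(); it != scores.end(); ++it) {
        if (range == 0.0) {
            out.push_back(*it > 0.0 ? 1.0 : 0.0);
        } else {
            out.push_back((*it - lo) / range);
        }
    }
    assert(out.size() == scores.size());
    return out;
}
```

Without the range == 0 branch a single-result list would divide by zero, so the special case matters most for very selective queries.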
  • Shard Repair / Anti-Entropy Engine 🔧 (include/sharding/shard_repair_engine.h)

    • Background anti-entropy scan: periodic checkDocumentHealth() across all shards; degraded documents are automatically queued for recovery
    • Repair worker thread: drains job queue via RedundancyStrategy::recoverDocument() (RAID-5/6 + Mirror modes)
    • On-demand triggers returning trackable job IDs: triggerRepair(shard_id), triggerFullScan(), triggerDocumentRepair(doc_id)
    • Per-shard ShardHealthReport: status HEALTHY / DEGRADED / FAILED / REBUILDING, scan + repair counters
    • Prometheus metrics forwarding: repair events forwarded to PrometheusMetrics and exposed via exportPrometheusMetrics() and ShardingMetricsHandler::getMetrics()
    • Admin API repair endpoints: POST /admin/repair, POST /admin/repair/scan, GET /admin/repair/{job_id}
    • AutoRecoveryManager::setRepairEngine(): wires legacy AutoRecoveryManager to delegate repairDocument() to the new engine
  • Improved Reed-Solomon erasure decoder

    • Replaced XOR-only parity (single-chunk recovery) with Vandermonde matrix systematic codec over GF(2⁸)
    • Recovers up to parity_shards simultaneously lost chunks — enables true RAID-6 dual-parity recovery
    • Both ReedSolomonCoder and CauchyReedSolomonCoder now validate missing_indices.size() <= parity_shards
  • v1.5.x Query Optimizer Production Integration 🎯

    • Shard Metadata Integration (preparatory): Integration point for metadata-backed row estimates
      • DistributedQueryCostModel::getShardRowCount() replaces hardcoded 10K constant with dynamic estimates
      • Currently uses hash-based heuristic; full MetadataShard integration planned for v1.5.1
      • Provides foundation for accurate cardinality estimation in distributed queries
      • Integrates with existing sharding infrastructure
    • Predicate-based Selectivity Estimation: Calculate query selectivity from predicates
      • DistributedQueryCostModel::calculatePredicateSelectivity() analyzes query patterns
      • Histogram-based estimation framework (extensible)
      • Column-specific heuristics: ID columns (0.1%), status (20%), names (5%)
      • Combined predicates use product of individual selectivities
      • Bounded selectivity: [0.01%, 100%]
    • Network Latency Monitoring (preparatory): Integration point for latency-aware query planning
      • DistributedQueryCostModel::measureShardLatency() provides latency integration hook
      • Currently uses naming-convention heuristics; Prometheus integration planned for v1.5.1
      • Enables locality detection (< 1ms latency threshold)
      • Network-aware parallelism optimization
      • Foundation for latency-aware join strategies
    • Comprehensive Integration Tests: tests/test_optimizer_v1_5_x_integration.cpp
      • Tests for shard metadata integration
      • Tests for selectivity calculation
      • Tests for network latency awareness
      • Tests for partition pruning
      • Full pipeline integration tests
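The selectivity heuristics above (per-column defaults, product for combined predicates, bounds of 0.01% and 100%) reduce to a few lines of arithmetic. The sketch below illustrates them. `ColumnKind`, `columnSelectivity` and `combinedSelectivity` are hypothetical names, and treating unrecognized columns as selectivity 1.0 is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class ColumnKind { Id, Status, Name, Other };

// Per-column heuristics from the entry above: ID 0.1%, status 20%, names 5%.
inline double columnSelectivity(ColumnKind kind) {
    switch (kind) {
        case ColumnKind::Id:     return 0.001;
        case ColumnKind::Status: return 0.20;
        case ColumnKind::Name:   return 0.05;
        case ColumnKind::Other:  return 1.0;  // assumption: unknown columns do not filter
    }
    return 1.0;
}

// Combined predicates multiply their selectivities; the result is clamped
// to [0.0001, 1.0], i.e. 0.01% to 100%.
inline double combinedSelectivity(const std::vector<ColumnKind>& predicates) {
    double s = 1.0;
    for (std::size_t i = 0; i < predicates.size(); ++i) {
        s *= columnSelectivity(predicates[i]);
    }
    const double bounded = std::min(1.0, std::max(0.0001, s));
    assert(bounded >= 0.0001 && bounded <= 1.0);
    return bounded;
}
```

For example, a status predicate combined with a name predicate gives roughly 0.20 × 0.05 = 1%, while two ID predicates would fall below the lower bound and are clamped to 0.01%. The product rule assumes the predicates are independent.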
  • v1.5.x FAISS Vector Search Improvements 🚀

    • ADC (Asymmetric Distance Computation) Tables: ~40% faster vector search
      • Enabled by default in AdvancedVectorIndex::Config
      • Precomputed distance tables for IndexIVFPQ
      • Optional polysemous hash tables for early termination
      • No accuracy trade-off (bit-exact results)
      • Minimal memory overhead (~1-2% of index size)
    • Configuration Options:
      • use_adc_tables: Enable ADC distance tables (default: true)
      • polysemous_ht: Polysemous codes for early termination (default: 0)
    • Performance Impact:
      • Search speed: ~40% faster (varies by dataset)
      • Particularly effective for high-dimensional vectors (>128d)
      • Higher throughput with lower query latency
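Asymmetric distance computation works on product-quantized vectors: the query stays uncompressed, a table of query-to-centroid distances is built once per query, and each database code is then scored with one lookup per sub-quantizer. The sketch below shows that general technique. It is not FAISS code, and the `Codebook` layout and function names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: codebook[m][k] is centroid k of sub-quantizer m (dsub floats).
typedef std::vector<std::vector<std::vector<float> > > Codebook;
typedef std::vector<std::vector<float> > AdcTable;

// Precomputes table[m][k] = squared L2 distance between the m-th query
// sub-vector and centroid k of sub-quantizer m.
inline AdcTable buildAdcTable(const std::vector<float>& query, const Codebook& cb) {
    const std::size_t M = cb.size();
    assert(M > 0 && query.size() % M == 0);
    const std::size_t dsub = query.size() / M;
    AdcTable table(M);
    for (std::size_t m = 0; m < M; ++m) {
        table[m].resize(cb[m].size());
        for (std::size_t k = 0; k < cb[m].size(); ++k) {
            assert(cb[m][k].size() == dsub);
            float d = 0.0f;
            for (std::size_t i = 0; i < dsub; ++i) {
                const float diff = query[m * dsub + i] - cb[m][k][i];
                d += diff * diff;
            }
            table[m][k] = d;
        }
    }
    return table;
}

// Approximate squared distance to an encoded vector: one lookup per sub-quantizer.
inline float adcDistance(const AdcTable& table, const std::vector<std::uint8_t>& code) {
    assert(code.size() == table.size());
    float d = 0.0f;
    for (std::size_t m = 0; m < code.size(); ++m) {
        assert(code[m] < table[m].size());
        d += table[m][code[m]];
    }
    return d;
}
```

The table costs M × K distance computations per query, after which each candidate needs only M additions. That trade is why the approach pays off on large inverted lists.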

Changed

  • Write-Amplification Optimization (v1.5.0)

    • Larger Memtables: Increased default memtable_size_mb from 256MB to 512MB
      • ~50% fewer L0 file flushes → ~30-40% reduction in write-amplification
      • Improves write throughput for data ingestion and high-write workloads
    • More Write Buffers: Increased default max_write_buffer_number from 3 to 6
      • Allows writes to continue during memtable flush operations
      • Reduces write stalls and improves sustained write throughput
    • Total Write Buffer Limit: Set db_write_buffer_size_mb default to 2048MB (2GB)
      • Previously unlimited (0), now has sensible default to prevent OOM with many column families
      • Auto-manages write buffer allocation across all column families
    • Async I/O Enabled by Default: Enhanced asynchronous I/O for better scan performance
      • enable_async_io now defaults to true (was false)
      • async_io_readahead_size_mb increased from 64MB to 128MB
      • Expected improvement: 2-5x faster sequential scans and range queries
    • Documentation: Added comprehensive "Write-Amplification Optimization" section to PERFORMANCE_TIPS.md
      • Explains write-amp problem and solutions
      • Tuning guidelines for different workloads (high-throughput, balanced, low-latency, memory-constrained)
      • Monitoring metrics and Prometheus queries
      • Best practices and configuration examples
    • Server Logging: Updated main_server.cpp to display new optimization settings
      • Shows memtable size, write buffer count, and async I/O status at startup
      • Displays optimization profile (write-optimized, high-throughput, balanced, or low-latency)
    • Trade-offs: Higher memtable memory (up to ~2GB capped by db_write_buffer_size_mb; theoretical 3-4GB if cap is raised), longer recovery time
    • Backward Compatibility: All settings can be overridden via configuration
    • Testing: Added comprehensive configuration test suite (test_write_amplification_config.cpp)
  • Documentation Consolidation for Beta/RC 📚

    • Archived 70+ historical documents (GAP analyses, old roadmaps, TODO lists, implementation summaries)
    • Organized archives into structured directories: gaps/, roadmaps/, todos/, implementation-summaries/
    • Updated documentation index to reflect current Beta/RC-ready status (v1.5.0-dev)
    • Streamlined navigation and removed outdated references
    • See docs/ARCHIVED/README.md for archive index

Added

  • HSM Security Warning System (FIND-002) 🔒
    • Startup Warning Banner: Prominent warning displayed when stub HSM provider is active
      • 80-character ASCII box with clear security messaging
      • Directs users to HSM production setup documentation
      • Can be suppressed in development with --allow-stub-hsm flag
    • Periodic Security Logging: ERROR-level warnings logged every 5 minutes when stub HSM is active
      • Persistent reminder of insecure configuration
      • Helps prevent accidental production deployment with stub provider
    • Prometheus Metrics: HSM security status exposed via /metrics endpoint
      • themis_hsm_insecure_config: Gauge indicating insecure configuration (0=secure, 1=insecure)
      • themis_hsm_provider_type{provider="stub|real"}: Provider type information
      • hsm_security_stub_active: Legacy metric name for backward compatibility
      • hsm_compliance_status{standard="..."}: Compliance status for NIST, ISO, PCI DSS, GDPR
    • Command-Line Flag: --allow-stub-hsm flag for development environments
      • Suppresses warning banner and periodic logging
      • Documented in help output (--help)
    • Documentation Updates:
      • QUICKSTART.md now includes prominent HSM security warning at top
      • Configuration examples show HSM settings with warnings
      • References to docs/security/HSM_PRODUCTION_SETUP.md throughout
    • Compliance: Addresses critical security finding FIND-002 from v1.4.1 audit
      • Prevents master encryption keys from being unprotected in production
      • Supports NIST SP 800-53 SC-12, ISO 27001 A.8.24, PCI DSS 3.6, GDPR Art. 32

Changed

  • main_server.cpp now initializes HSM provider at startup and validates security configuration

  • Prometheus metrics endpoint (/metrics) now includes HSM security metrics

  • Help output (--help) now lists --allow-stub-hsm flag

  • Multi-GPU Vector Indexing API (v2.4) 🎉

    • MultiGPUVectorIndex: Multi-device API and partition/merge scaffolding for distributed vector search
      • Logical support for 2-8 devices via index partitioning (round-robin, hash-based, range-based, balanced)
      • Query fan-out and centralized top-k merge logic for aggregating per-partition results
      • Designed for future distributed search across multiple GPUs once GPU backends are available
      • Current execution: Uses CPU-based GPUVectorIndex backend (no actual multi-GPU execution yet)
      • Fault-tolerant design with graceful degradation when partitions are unavailable
      • GPU execution and collectives: Planned for v2.5+ (NCCL/RCCL, P2P transfers, actual GPU offload)
    • API Features (scaffolding):
      • enableMultiGPU configuration flag for multi-device indexing
      • deviceIds parameter for future GPU selection (configuration only, no GPU enumeration in v2.4)
      • partitionStrategy option for data distribution across logical partitions
      • Per-partition statistics with hooks for future per-GPU metrics (VRAM, utilization)
      • Load imbalance and scaling efficiency metrics computed over logical partitions
    • Testing:
      • Unit tests covering partitioning/merge logic and API behavior (394 lines)
      • Tests validate API correctness on CPU, ready for GPU backend integration
      • Example application demonstrating configuration and partition behavior (237 lines)
    • Documentation:
      • Complete API guide (docs/MULTI_GPU_VECTOR_INDEXING.md) with current CPU-only status clearly noted
      • API reference with code examples and notes on planned GPU backends (v2.5+)
      • Discussion of anticipated performance characteristics once GPU support lands
      • Troubleshooting guide noting current limitations (no GPU execution, no NCCL/RCCL yet)
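The partition and merge scaffolding described above boils down to two steps: assign each vector to a logical partition, then merge the per-partition results into one global top-k. The sketch below assumes round-robin assignment and smaller-is-better distances. `Hit`, `roundRobinPartition` and `mergeTopK` are hypothetical names, not the MultiGPUVectorIndex API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Hit {
    std::uint64_t id;
    float distance;  // smaller is better
};

// Round-robin assignment of the i-th inserted vector to one of n partitions.
inline std::size_t roundRobinPartition(std::uint64_t vector_index, std::size_t n) {
    assert(n > 0);
    return static_cast<std::size_t>(vector_index % n);
}

// Centralized merge: concatenates the per-partition hits and keeps the k
// globally nearest. An empty inner list (e.g. an unavailable partition)
// simply contributes nothing.
inline std::vector<Hit> mergeTopK(const std::vector<std::vector<Hit> >& per_partition,
                                  std::size_t k) {
    std::vector<Hit> all;
    for (std::size_t p = 0; p < per_partition.size(); ++p) {
        all.insert(all.end(), per_partition[p].begin(), per_partition[p].end());
    }
    const std::size_t n = std::min(k, all.size());
    std::partial_sort(all.begin(), all.begin() + static_cast<std::ptrdiff_t>(n), all.end(),
                      [](const Hit& a, const Hit& b) { return a.distance < b.distance; });
    all.erase(all.begin() + static_cast<std::ptrdiff_t>(n), all.end());
    return all;
}
```

For the merged result to be exact, each partition has to return at least its own top k candidates, because the global top-k may come entirely from one partition.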
  • Git-Like Features Integration 🎉

    • SnapshotManager Re-enabled: Named snapshots for MVCC are now fully operational
      • 5 REST endpoints for snapshot/tag management
      • Integration with DiffEngine for tag-based diffs
      • Persistent snapshot storage in RocksDB
    • PITR API Handler: Point-in-Time Recovery REST API integration
      • POST /api/v1/pitr/restore/sequence - Restore to specific sequence number
      • POST /api/v1/pitr/restore/tag - Restore to named snapshot tag
      • POST /api/v1/pitr/restore/timestamp - Restore to timestamp
      • POST /api/v1/pitr/preview - Preview restore operation (dry-run)
      • GET /api/v1/pitr/progress - Get current restore progress
    • DiffEngine Enhanced: Now accepts optional SnapshotManager for tag-based diffs
    • MergeEngine API Integration 🆕
      • 3-Way Merge Support: Full Git-like merge functionality now integrated
      • REST API endpoints for merge operations:
        • POST /api/v1/merge - Perform three-way merge between sequences
        • POST /api/v1/merge/preview - Preview merge without applying (dry-run)
        • POST /api/v1/merge/by-tag - Merge using snapshot tags instead of sequences
        • GET /api/v1/merge/can-fast-forward - Check if fast-forward merge is possible
      • BranchManager Enhanced: Non-fast-forward branch merges now supported
        • Automatic integration with MergeEngine for complex merges
        • Conflict detection and resolution strategies
        • Fast-forward detection and optimization
      • Conflict Resolution: Multiple strategies available (OURS, THEIRS, MANUAL, FAST_FORWARD)
      • Full Integration: MergeEngine properly initialized in HTTP server and connected to BranchManager
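A three-way merge compares both sides against their common base: a key changed on only one side takes that side's value, and a key changed differently on both sides is a conflict that the chosen strategy resolves. The sketch below applies this to simple key-value snapshots. All type and function names are hypothetical, FAST_FORWARD is left out because it concerns history rather than individual keys, and keeping the base value for unresolved MANUAL conflicts is an assumption.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class ConflictStrategy { Ours, Theirs, Manual };

typedef std::map<std::string, std::string> Snapshot;  // absent key = deleted

struct MergeResult {
    Snapshot merged;
    std::vector<std::string> conflicts;  // keys left unresolved under Manual
};

// Returns a pointer to the value, or nullptr when the key is absent.
inline const std::string* lookup(const Snapshot& s, const std::string& key) {
    const Snapshot::const_iterator it = s.find(key);
    return it == s.end() ? nullptr : &it->second;
}

// Two optional values are equal when both are absent or both hold the same text.
inline bool sameValue(const std::string* a, const std::string* b) {
    if (!a || !b) return a == b;
    return *a == *b;
}

inline MergeResult threeWayMerge(const Snapshot& base, const Snapshot& ours,
                                 const Snapshot& theirs, ConflictStrategy strategy) {
    std::set<std::string> keys;
    for (const auto& kv : base) keys.insert(kv.first);
    for (const auto& kv : ours) keys.insert(kv.first);
    for (const auto& kv : theirs) keys.insert(kv.first);

    MergeResult result;
    for (const std::string& key : keys) {
        const std::string* b = lookup(base, key);
        const std::string* o = lookup(ours, key);
        const std::string* t = lookup(theirs, key);
        const std::string* pick = nullptr;
        if (sameValue(o, t)) pick = o;                            // both sides agree
        else if (sameValue(o, b)) pick = t;                       // only theirs changed
        else if (sameValue(t, b)) pick = o;                       // only ours changed
        else if (strategy == ConflictStrategy::Ours) pick = o;    // conflict: prefer ours
        else if (strategy == ConflictStrategy::Theirs) pick = t;  // conflict: prefer theirs
        else { result.conflicts.push_back(key); pick = b; }       // Manual: keep base, report
        if (pick) result.merged[key] = *pick;
    }
    return result;
}
```

A preview (dry-run) endpoint can run the same function and return only the conflict list without persisting the merged snapshot.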

Changed

  • Updated DiffEngine initialization to support SnapshotManager reference
  • HTTP server now properly converts between Beast and httplib types for git-feature endpoints
  • CMake configuration updated to include multi-GPU vector indexing sources and tests

Fixed

  • Re-enabled SnapshotManager, which had been disabled because of incomplete-type issues
  • Added proper error handling with default case in PITR progress phase conversion

Documentation

  • Module-Docs Sync 📚 — 2026-04-04

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-04-03

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-04-02

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-04-01

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-31

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-30

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-29

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-28

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-27

    • 52 modules indexed; 691 primary Markdown files in src/ and include/
    • 17 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-16

    • 48 modules indexed; 421 primary Markdown files in src/ and include/
    • 15 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-15

    • 48 modules indexed; 421 primary Markdown files in src/ and include/
    • 15 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-14

    • 48 modules indexed; 421 primary Markdown files in src/ and include/
    • 15 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-13

    • 48 modules indexed; 421 primary Markdown files in src/ and include/
    • 15 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • Module-Docs Sync 📚 — 2026-03-12

    • 47 modules indexed; 277 primary Markdown files in src/ and include/
    • 15 modules without secondary documentation detected; issues created
    • Secondary documentation updated in docs/de/ and docs/en/
    • Tool: tools/module_docs_builder.py v1.0.0
  • GPU Master Tracking Document 📋

    • Added docs/GPU_MASTER_TRACKING.md - Comprehensive master tracking document for GPU implementation roadmap (v2.x series)
    • Complete timeline and deliverables for all GPU backends (CUDA, Vulkan, HIP, Multi-GPU)
    • Performance targets, quality metrics, and success criteria
    • Risk mitigation strategies and resource planning
    • Cross-references to all GPU documentation: FUTURE_GPU_SUPPORT.md, GPU_SUPPORT_ROADMAP.md, GPU_VECTOR_INDEXING_ARCHITECTURE.md
    • Updated docs/00_DOCUMENTATION_INDEX.md with new GPU Vector Indexing section
  • Added MULTI_GPU_VECTOR_INDEXING.md documenting multi-GPU implementation

  • Added GIT_FEATURES_INTEGRATION_STATUS.md documenting integration status

  • Documented that BranchManager and MergeEngine are pending (separate draft PRs)


[1.5.0] - 2026-02-03

Added

  • RFC 3161 Timestamp Authority (TSA) - PRODUCTION READY 🎉

    • Full RFC 3161 client implementation with OpenSSL cryptographic operations
    • Integration with external TSA providers (FreeTSA, DigiCert, Sectigo)
    • eIDAS compliance support for qualified electronic timestamps
    • Long-term validation (LTV) for 30-year timestamp retention
    • Comprehensive TSA setup guide (docs/en/security/TSA_SETUP.md)
    • Configuration management via config/timestamp_authority.yaml
    • CMake option THEMIS_USE_OPENSSL_TSA to control TSA mode (default: ON)
    • Build-time and runtime warnings when stub mode is active
    • Support for SHA-256, SHA-384, SHA-512 hash algorithms
    • Certificate chain validation and verification
    • 10+ comprehensive tests for RFC 3161 compliance
  • FAISS Quantizer Integration - Production Ready (#1079) 🚀

    • FAISS K-means Integration: ProductQuantizer now uses FAISS K-means clustering
      • ProductQuantizer: FAISS K-means for 20-30% faster training with SIMD optimizations
      • Automatic fallback to custom K-means if FAISS unavailable or errors occur
      • Uses faiss::Clustering and faiss::IndexFlatL2 for optimal performance
    • FAISS-optimized Binary Operations: BinaryQuantizer uses compiler intrinsics
      • BinaryQuantizer: SIMD-optimized popcount for faster Hamming distance
      • Uses __builtin_popcount (GCC) or __popcnt (MSVC), the same intrinsics FAISS uses
      • ResidualQuantizer: Inherits FAISS acceleration from ProductQuantizer stages (30% faster training)
    • Backend Selection: New prefer_faiss configuration option
      • Defaults to true when FAISS is available
      • Graceful fallback to custom implementation on errors
    • Runtime Inspection: getBackend() method reports actual backend in use
    • Build System: Uses existing THEMIS_HAS_FAISS conditional compilation
    • Production Ready: Fully tested with actual FAISS API integration
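The popcount-based Hamming distance mentioned for BinaryQuantizer can be sketched as follows. This is a minimal illustration and not the BinaryQuantizer code: `popcount64` and `hammingDistance` are hypothetical names, and the sketch assumes binary codes packed into 64-bit words, so it uses the 64-bit variants of the intrinsics named above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Count the set bits of one 64-bit word with the compiler intrinsic.
inline int popcount64(std::uint64_t x) {
#if defined(_MSC_VER)
    return static_cast<int>(__popcnt64(x));   // MSVC, x64 targets
#else
    return __builtin_popcountll(x);           // GCC / Clang
#endif
}

// Hamming distance between two equally sized binary codes:
// XOR marks the differing bits, popcount counts them.
inline std::size_t hammingDistance(const std::vector<std::uint64_t>& a,
                                   const std::vector<std::uint64_t>& b) {
    assert(a.size() == b.size());
    std::size_t dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dist += static_cast<std::size_t>(popcount64(a[i] ^ b[i]));
    }
    return dist;
}
```

Processing one word at a time keeps the inner loop branch-free, which is what lets the compiler map it onto the hardware popcount instruction.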

Changed

  • TSA implementation now uses OpenSSL by default (was stub in v1.4.1)
  • Improved CMake configuration for security features
  • Enhanced security feature reporting in build system
  • ProductQuantizer: Updated from v1.3.0 to v1.5.0 with actual FAISS K-means integration
  • BinaryQuantizer: Updated from v1.4.1 to v1.5.0 with FAISS-optimized Hamming distance
  • ResidualQuantizer: Updated from v1.4.1 to v1.5.0 with FAISS-accelerated composition
  • FAISS Integration Complete
    • Documented that AdvancedVectorIndex uses FAISS natively (IVF+PQ, HNSW, GPU)
    • Clarified that FAISS is the PRIMARY vector indexing solution for production
    • Custom quantizers now have actual FAISS integration with graceful fallback
    • Marked LearnedQuantizer as deprecated (research-only)
    • Updated LIBRARY_USAGE_ANALYSIS.md and LIBRARY_OPTIMIZATION_QUICKREF.md

Performance Improvements

  • 20-30% faster ProductQuantizer training with FAISS K-means (verified with actual integration)
  • 10-15% faster BinaryQuantizer Hamming distance with SIMD intrinsics
  • 30% faster ResidualQuantizer training (via FAISS ProductQuantizer composition)
  • Zero overhead when FAISS not available (graceful fallback maintained)

Backward Compatibility

  • ✅ All existing quantization code continues to work without changes
  • ✅ API remains unchanged (new options are optional with sensible defaults)
  • ✅ Default behavior gains performance boost with FAISS when available
  • ✅ Graceful degradation when FAISS unavailable

Removed

  • GPU Vector Index Stubs (CLEANUP) 🧹
    • Removed incomplete GPU backend implementations (~1500 LOC)
      • src/index/gpu_vector_index_cuda.cpp (384 lines, 3 TODOs)
      • src/index/gpu_vector_index_vulkan.cpp (385 lines, 6 TODOs)
      • src/index/gpu_vector_index_hip.cpp (419 lines, 4 TODOs)
      • src/index/gpu_vector_index_kernels.cu (CUDA kernels)
      • src/index/gpu_vector_index_hip_kernels.cpp (HIP kernels)
    • Removed GPU backend classes from public API
    • Removed GPU-specific CMake configuration
    • Rationale: These were research stubs with 65+ TODO comments and no functional GPU acceleration
    • Current Status: GPUVectorIndex now uses a CPU-only implementation (SIMD-optimized)
    • Future Plans: Proper GPU support planned for v2.x series (see docs/FUTURE_GPU_SUPPORT.md)

Fixed

  • FIND-003 (CRITICAL): RFC 3161 Timestamp Authority implementation complete
    • Resolves eIDAS compliance gap for qualified electronic timestamps
    • Enables legally binding digital signatures in EU
    • Supports long-term signature validation for regulated industries

Security

  • Enabled cryptographic timestamps for audit trails and document signing
  • Added eIDAS-compliant timestamp validation
  • Improved certificate chain verification for TSA responses

Documentation

  • Added comprehensive TSA setup guide (400+ lines)
  • Documented integration with multiple TSA providers
  • Added troubleshooting guide for common TSA issues
  • Added GPU Support Roadmap Documentation
    • docs/FUTURE_GPU_SUPPORT.md - Detailed GPU roadmap for v2.x
    • docs/GPU_SUPPORT_ROADMAP.md - User migration guide
    • Updated docs/GPU_VECTOR_INDEXING.md - CPU-only status notice
    • Updated docs/GPU_VECTOR_INDEXING_ARCHITECTURE.md - Future architecture
    • Updated README.md - Clarified CPU-only vector indexing status
  • Updated compliance documentation for eIDAS and ETSI EN 319 422

[1.4.2] - 2026-02-06

Changed

  • Vector Quantization Migration to FAISS
    • ProductQuantizer now uses FAISS native implementation when available
    • Maintains API compatibility with existing code
    • Provides fallback implementation for non-FAISS builds
    • ResidualQuantizer automatically benefits through composition
    • Expected performance improvements through FAISS SIMD optimizations

Added

  • FAISS ADC Optimization: Implemented Asymmetric Distance Computation tables
    • ~40% faster asymmetric distance computation with FAISS
    • Uses precomputed asymmetric distance tables instead of decode + L2 distance
    • Automatic fallback to decode method on error or when FAISS unavailable
  • Performance Documentation: Added docs/PRODUCT_QUANTIZER_OPTIMIZATION.md
    • Detailed benchmarking guidelines
    • GPU acceleration architecture documentation
    • Performance tuning recommendations
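To make the ADC entry above concrete, here is a small sketch of asymmetric distance computation with a precomputed table. It is not the FAISS or ThemisDB implementation: `PQCodebook`, `buildAdcTable`, and `adcDistance` are hypothetical names, and the flat centroid layout is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical product-quantizer codebook: M subspaces, K centroids each,
// dsub floats per centroid. Layout: centroids[(m * K + k) * dsub + d].
struct PQCodebook {
    std::size_t M;
    std::size_t K;
    std::size_t dsub;
    std::vector<float> centroids;
};

// Built once per query: table[m * K + k] is the squared L2 distance between
// the m-th query sub-vector and centroid k of subspace m.
inline std::vector<float> buildAdcTable(const PQCodebook& cb,
                                        const std::vector<float>& query) {
    assert(query.size() == cb.M * cb.dsub);
    assert(cb.centroids.size() == cb.M * cb.K * cb.dsub);
    std::vector<float> table(cb.M * cb.K, 0.0f);
    for (std::size_t m = 0; m < cb.M; ++m) {
        for (std::size_t k = 0; k < cb.K; ++k) {
            float sum = 0.0f;
            for (std::size_t d = 0; d < cb.dsub; ++d) {
                const float diff = query[m * cb.dsub + d] -
                                   cb.centroids[(m * cb.K + k) * cb.dsub + d];
                sum += diff * diff;
            }
            table[m * cb.K + k] = sum;
        }
    }
    return table;
}

// Distance to one encoded vector: M table lookups, no decoding step.
inline float adcDistance(const PQCodebook& cb, const std::vector<float>& table,
                         const std::vector<std::uint8_t>& code) {
    assert(code.size() == cb.M);
    float dist = 0.0f;
    for (std::size_t m = 0; m < cb.M; ++m) {
        assert(code[m] < cb.K);
        dist += table[m * cb.K + code[m]];
    }
    return dist;
}
```

The table costs M × K sub-distances per query, after which every database vector is scored with M lookups. The decode + L2 path instead reconstructs all M × dsub floats for each vector.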

Improved

  • Reduced quantization code complexity by leveraging FAISS library
  • Better maintainability through external library usage
  • Conditional compilation support for FAISS availability
  • Optimized distance computation path for production workloads

[1.4.0] - 2026-01-19

Added - Modular Architecture

  • Modular Build System: Split monolithic themis_core into focused module libraries
    • themis_base: Core utilities, cross-cutting concerns, plugin infrastructure
    • themis_storage: Storage engine, indexes, backup management
    • themis_query: Query engine, AQL parser, analytics
    • themis_security: Encryption, PKI, RBAC, authentication
    • themis_transaction: Transaction management, CDC, saga support
    • themis_network: HTTP/gRPC servers, API handlers
    • themis_sharding: Distributed system (optional)
    • themis_llm: LLM integration (optional)
    • themis_content: Content processors (optional)
    • themis_timeseries: Time-series support (optional)
    • themis_graph: Graph analytics (optional)
    • themis_geo: Geospatial features (optional)
  • Export Macro System: Platform-specific DLL export/import macros for all modules
  • Configurable Modules: Optional modules can be excluded via CMake options
  • Backward Compatibility: Monolithic build remains default; modular enabled with -DTHEMIS_BUILD_MODULAR=ON
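A per-module export macro of the kind described under Export Macro System typically looks like the sketch below. All identifiers (`THEMIS_STORAGE_BUILD`, `THEMIS_STORAGE_API`, `themisStorageAbiVersion`) are illustrative assumptions, not the names used in the ThemisDB headers.

```cpp
// THEMIS_STORAGE_BUILD would normally be defined by the build system only
// while compiling the library itself; it is defined here so that this
// sketch compiles as a single translation unit.
#define THEMIS_STORAGE_BUILD 1

#if defined(_WIN32)
#  if defined(THEMIS_STORAGE_BUILD)
#    define THEMIS_STORAGE_API __declspec(dllexport)   // building the DLL
#  else
#    define THEMIS_STORAGE_API __declspec(dllimport)   // consuming the DLL
#  endif
#elif defined(__GNUC__)
#  define THEMIS_STORAGE_API __attribute__((visibility("default")))
#else
#  define THEMIS_STORAGE_API
#endif

// Public header: the macro marks symbols that cross the module boundary.
THEMIS_STORAGE_API int themisStorageAbiVersion();

// Library source file.
int themisStorageAbiVersion() { return 1; }
```

Giving each module its own macro lets one module import symbols from another while exporting its own, which a single project-wide macro cannot express.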

Changed

  • BinaryQuantizer Simplified: Reduced implementation by 79 lines (-34%)

    • Marked as @deprecated - NOT used in production code
    • Recommends using FAISS IndexBinaryFlat for production workloads
    • Maintains API compatibility for existing tests
    • Part of FAISS migration initiative (see LIBRARY_USAGE_ANALYSIS.md)
  • LearnedQuantizer Marked as Research/Deprecated: 393 lines

    • Marked as @deprecated - NOT used in production code
    • Research implementation for vector compression studies
    • Maintained for experimental workloads only
    • Part of code cleanup initiative (see LIBRARY_USAGE_ANALYSIS.md)

Fixed

  • Windows Build Issues: Resolves COFF symbol limit (>65,000 symbols) by splitting into smaller modules
  • Build Performance: Parallel module compilation reduces full rebuild time by 30-50%

Documentation

  • Added docs/architecture/MODULARIZATION_GUIDE.md with comprehensive usage examples
  • Updated build documentation with modular build instructions

[1.8.0] - 2026-03-22

Release Aggregation Document: docs/de/releases/RELEASE_NOTES_v1.8.0.md · Aggregation Issue: #4300

Added

  • JWT Scope Enforcement: JWTClaims.scopes from OAuth2 scope/scp claims; authorizeViaJWT() / authorizeViaKerberos() enforce required_scope against role_scope_map_; setRoleScopeMapping() + setJWKSForTesting() API (PR #4279, #4270)
  • ArrowUserRegistrationPlugin — Apache Arrow-backed in-memory user store; bulkSyncFromArrow() upsert; authenticateFromArrow() SHA-256 verification; 13 tests (PR #4280, Issue #99)
  • CRL / OCSP Certificate Revocation: PluginSecurityVerifier::checkCRL() + checkOCSP() with libcurl HTTP, OpenSSL DER parse, per-serial cache; 24 tests (PR #4283, #4292, Issue #38)
  • Serializable Snapshot Isolation (SSI): IsolationLevel::SerializableSnapshot=4; SSIConfig; detectConflicts() range-intersection; predicate lock API; 38 tests (PR #4281, Issue #122)
  • SAGA Orchestration Engine: SAGAOrchestrator with execute/validate/getStatus/getMetrics/template management; 23 tests
  • Versioned API Routing: RouteVersionRouter (301 to /v1/); /v2/ bulk NDJSON, SSE streaming, async jobs via AdaptiveQueryCache; 37 tests (PR #4285)
  • PredictivePrefetcher Markov ML — order-1 Markov chain + 24-bucket ToD weighting; RocksDB persistence; A/B toggle; 14 tests
  • Cache Warmup Parallel Bulk Load — concurrent startup pre-population (PR #4250, Issue #244)
  • Geo Clustering: GeoClusteringEngine::dbscanCluster() + kmeansCluster(); 20 tests; perf opt-in via THEMIS_RUN_PERF_TESTS=1 (Issue #4003)
  • PolicyManager Hot-Reload: reloadPolicies() with PolicyValidator, double-buffer swap, governance_policy_reload_total counter; 7 tests
  • HuggingFace Hub 429 Back-off: Retry-After parse (integer + HTTP-date); ExporterMetrics::recordRateLimitHit(); 5 tests
  • HardwareAccelerator operator completeness: FilterLessThanOp + FilterGreaterThanOrEqualOp; 45 tests (PR #4289, Issue #85)
  • ExporterFactory — concrete ArrowIPCExporter, ParquetExporter, FeatherExporter, JSONCSVExporter; 43+ tests (PR #4284, Issue #3868)
  • JoinExporter — cross-collection hash-join export with PII redaction + memory budget (PR #4297)
  • Wire Protocol V2 — RFC 7540 §6.3 PRIORITY + §5.3.1 cycle detection; all 4 ACs complete (PR #4266, #4267)
  • SIGHUP Hot-Reload — inotify / kqueue / ReadDirectoryChangesW cross-platform file watcher; 250 ms debounce (PR #4253)
  • GpuErasureCoderOpenCL — OpenCL-accelerated encode/decode/batchEncode (PR #4265, Issue #105)
  • Intelligent Prefetching System — access-pattern prefetch scheduler with Markov lookahead (PR #4257, Issue #192)
  • Materialized Views & Incremental Maintenance: MaterializedViewManager with delta refresh (PR #4258, Issue #195)
  • UDP Ingestion Server — fire-and-forget UDP server for metrics/telemetry sinks (PR #4271, Issue #190)
  • Bandwidth Management / QoS — token-bucket rate limiting; CRITICAL/HIGH/NORMAL/BULK priority queues; Prometheus metrics (PR #4273, Issue #190)
  • MySQL / MariaDB Importer — streaming cursor, type mapping, TLS, connection pooling (PR #4288)
  • DistributedGraphManager read-path shared_mutex — TSAN-verified concurrent read/write locking (PR #4299)
  • ProcessGraphVisitLog — per-node visit timestamps for process graph traversal (PR #4254)
  • ProvenanceTracker live engine — replaces AQL template stubs with real AQLEngine connection (PR #4268)
  • TSStore buffering + SIMD decode — Gorilla insert buffering; AVX-512/AVX2/NEON/scalar dispatch; ~35% CPU reduction for single-point ingestion (PR #4269)
  • RAG real LLM engine — replaces LLMIntegration / LLMJudgeIntegration stubs (PR #4277)
  • CapabilityAutoGenerator persistence — schedule state + YAML output persistence (PR #4275, Issue #217)
  • /v1/admin/shards endpoints — list, detail, decommission; OrphanDetector wired to DistributedCoordinator (PR #4259, #4262)
  • /v1/admin/storage/stats endpoint — RocksDB SST-property-based accurate disk usage (PR #4274, Issue #205)
  • Multi-GPU NVML device monitoring — runtime device health via NVML (PR #4270)
  • AsyncIngestionWorker YAML config — YAML-driven configuration + user_context propagation (PR #4296)
  • Abuse detection: abuse_detector.cpp wired into CMake build (PR #4287)
  • SecuritySignatureManager RocksDB iteration — full-iteration batch signature verification (PR #4260, Issue #206)
  • ManifestDatabase::deleteManifest() — removes all associated sidecar files on entry removal (PR #4261)
  • Transaction Savepoints CI — full CI coverage for savepoints (PR #4276)
  • OCC CI + correctness audit — test accuracy fixes + CI workflow (PR #4264)
  • TaskScheduler user-context propagation: user_id / auth_method in all audit events (PR #4278)
  • ConfigEncryptedStore concurrent reads: mutex_ upgraded to std::shared_mutex (PR #4295)
  • Config Audit Trail — atomic hot-path; concurrency regression test (PR #4286)
  • MetricsCollector concurrent reads — mutex upgraded to std::shared_mutex (PR #4272)
  • PluginRegistry concurrent reads — mutex upgraded to std::shared_mutex; WASM scaffold (PR #4256)
  • CDC sequence counter — audit complete; AUDIT.md updated (PR #4294)
  • PKIClient v1.8.0 — replaces fallback stub verification (PR #4263)
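The token-bucket rate limiting listed under Bandwidth Management / QoS above can be illustrated with the following sketch. `TokenBucket` is a hypothetical class, not the ThemisDB bandwidth manager; the clock is passed in explicitly (an assumption made for testability) rather than read from `std::chrono`.

```cpp
#include <algorithm>
#include <cassert>

// Minimal token bucket: tokens refill at a fixed rate up to a capacity,
// and a request is admitted only if enough tokens are available.
class TokenBucket {
public:
    TokenBucket(double ratePerSec, double capacity)
        : rate_(ratePerSec), capacity_(capacity), tokens_(capacity), lastSec_(0.0) {
        assert(ratePerSec > 0.0 && capacity > 0.0);
    }

    // Try to take `cost` tokens at time `nowSec` (monotonic seconds).
    bool tryConsume(double cost, double nowSec) {
        refill(nowSec);
        if (tokens_ < cost) return false;
        tokens_ -= cost;
        return true;
    }

    double available() const { return tokens_; }

private:
    void refill(double nowSec) {
        if (nowSec > lastSec_) {
            tokens_ = std::min(capacity_, tokens_ + (nowSec - lastSec_) * rate_);
            lastSec_ = nowSec;
        }
    }

    double rate_;
    double capacity_;
    double tokens_;
    double lastSec_;
};
```

One plausible way to combine this with priority queues is one bucket per class (CRITICAL, HIGH, NORMAL, BULK), each with its own rate and capacity. How ThemisDB actually combines them is not stated in this changelog.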

⚠️ Breaking Changes

  • ZSTD compression: StreamWriter replaces zlib (DEFLATE) with ZSTD; update link dependency from libz to libzstd (PR #4252)
  • HTTP path routing — unversioned paths now redirect 301 to /v1/; update client paths accordingly (PR #4285)
  • CI workflow paths — 138 workflows reorganised into 9 categories; see .github/WORKFLOW_REGISTRY.md for mapping (PR #4290)
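For clients affected by the HTTP path routing change, the redirect rule can be approximated as below. This is a sketch under assumptions: the changelog only states that unversioned paths redirect with 301 to /v1/, so the exact matching rule (`/v<digits>/` counts as versioned) and the helper names `hasVersionPrefix` and `redirectTarget` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// True if the path starts with "/v<digits>/" or is exactly "/v<digits>".
inline bool hasVersionPrefix(const std::string& path) {
    if (path.size() < 3 || path[0] != '/' || path[1] != 'v') return false;
    std::size_t i = 2;
    while (i < path.size() && std::isdigit(static_cast<unsigned char>(path[i]))) ++i;
    return i > 2 && (i == path.size() || path[i] == '/');
}

// Returns the Location target for a 301 response, or nullopt if the path
// is already versioned and needs no redirect.
inline std::optional<std::string> redirectTarget(const std::string& path) {
    if (hasVersionPrefix(path)) return std::nullopt;
    return "/v1" + (path.empty() || path[0] != '/' ? "/" + path : path);
}
```

Clients that cannot follow redirects (for example, POST requests in some HTTP libraries) are better off rewriting their base path to /v1/ directly.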

Fixed

  • CEPEngine deadlock — window lock now released before invoking user callbacks (PR #4291)
  • PE certificate parsing off-by-one in DataDirectory[4] size; ELF .security sidecar added (PR #4292)
  • OCC conflict detection test correctness (PR #4264)
  • ProvenanceTracker AQL template stub (PR #4268)
  • Config Audit Trail concurrent entry drop under load (PR #4286)
  • SecuritySignatureManager prefix end-condition in RocksDB iterator (PR #4260)
  • ManifestDatabase orphaned sidecar artefacts on delete (PR #4261)
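The CEPEngine fix above follows a common pattern: copy what the callbacks need while holding the lock, release it, then call user code. The sketch below shows the pattern with a hypothetical `EventWindow` class; it is not the CEPEngine implementation.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class EventWindow {
public:
    using Callback = std::function<void(int)>;

    void subscribe(Callback cb) {
        std::lock_guard<std::mutex> lock(mutex_);
        callbacks_.push_back(std::move(cb));
    }

    void push(int event) {
        std::vector<Callback> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push_back(event);
            snapshot = callbacks_;   // copy while holding the lock
        }                            // lock released here
        for (const auto& cb : snapshot) {
            cb(event);               // user code may call back into the window
        }
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> events_;
    std::vector<Callback> callbacks_;
};
```

If `push` invoked the callbacks inside the locked scope, a callback calling `size()` would try to lock a non-recursive mutex its own thread already holds, which is the deadlock shape this pattern avoids.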

Changed

  • BackendRegistry logging upgraded from std::cout to structured logger (PR #4251)
  • RocksDBWrapper::approximateSize() uses SST property instead of estimate (PR #4274)
  • TaskScheduler audit events include authenticated user identity (PR #4278)

[1.7.0] - 2026-03-09

Release Aggregation Document: docs/de/releases/RELEASE_NOTES_v1.7.0.md · Aggregation Issue: #3486 · Parent: #3073

Added

  • Config Architecture Reorganization — hierarchical config/ directory structure with 16 category subdirectories; ConfigPathResolver for backward-compatible legacy path resolution; migration guide at config/MIGRATION_GUIDE.md
  • Multi-GPU Vector Indexing API (v2.4 scaffolding): MultiGPUVectorIndex with round-robin/hash/range/balanced partition strategies; query fan-out and top-k merge (CPU-backed; GPU execution planned v2.5+)
  • Git-Like Features Integration — SnapshotManager (named MVCC snapshots), PITR REST API (restore by sequence/tag/timestamp + preview), MergeEngine REST API (3-way merge, fast-forward check), enhanced BranchManager
  • HybridSearch production hardening — configurable vector metric (COSINE/DOT/L2), strict config validation, SearchStats, exception-safe search, pre-normalization; 88+ tests
  • Distributed Query Optimizer (v1.5.x) — dynamic shard row estimates, predicate selectivity, network latency hooks
  • FAISS ADC distance table acceleration — ~40% faster IndexIVFPQ search via precomputed distance tables; enabled by default
  • Documentation validation CI: .github/workflows/documentation-validation.yml with 5 jobs (link-check, markdown-lint, spell-check, structure-check, summary)
  • 44-module documentation audit — all module READMEs, ROADMAPs, and ARCHITECTUREs aligned with actual source implementations
  • Test + benchmark coverage audit — 6 new benchmark suites + 21 new unit test files closing coverage gaps across all 44 modules
  • RAG scientific foundations: docs/en/rag/RAG_SCIENTIFIC_FOUNDATIONS.md, a 460-line IEEE reference with 40 peer-reviewed citations

⚠️ Breaking Changes

  • themis module migration — module initialisation code migrated from src/utils/ and src/base/ to src/themis/; update #include paths accordingly

Fixed

  • 119 broken documentation links corrected in hub/index files
  • DiffEngine initialization updated to accept optional SnapshotManager reference
  • Re-enabled SnapshotManager (was disabled due to incomplete type issues)

Added

  • API Versioning and Compatibility Strategy: Comprehensive API versioning infrastructure
    • Accept-Version header support for REST APIs to specify desired API version
    • API-Version response header indicating the API version used to process the request
    • Deprecation tracking system with automated warning headers (Deprecation, Sunset, Link)
    • 24-month deprecation policy ensuring backward compatibility and smooth migrations
    • gRPC version negotiation via metadata (api-version key)
    • Version resolution supporting formats: v1.4.1, v1.4, v1, latest
    • APIVersionManager class for centralized version management
    • Compatibility matrix documenting supported versions (v1.0.0 to v1.4.1)
    • Migration guide framework with templates and best practices
    • Comprehensive documentation:
    • Updated proto files with API version metadata
    • Related to #751 (API-Versionierung und Kompatibilitäts-Strategie)
  • Query Result Pagination: Comprehensive pagination support for query results with multiple strategies
    • Cursor-based pagination with expiration and versioning (1-hour TTL default)
    • Keyset pagination using ORDER BY values for O(log n) performance
    • Configurable page sizes with validation (min: 1, max: 10,000, default: 100)
    • Enhanced PaginatedResponse with detailed metadata (PageInfo, has_next_page, has_prev_page)
    • ORDER BY value encoding in cursors eliminates database lookups for sort values
    • Cursor expiration prevents stale cursor accumulation
    • Multiple pagination methods supported: CURSOR, OFFSET, KEYSET
    • 17 comprehensive tests with 100% pass rate
    • Backward compatible with existing pagination API
    • Related to #751
  • Plugin Metrics and Monitoring: Comprehensive metrics tracking for all plugins with Prometheus integration
    • PluginMetrics class for thread-safe metrics collection
    • Automatic tracking of load time, reload time, function call latency (P95/P99)
    • Resource usage monitoring (memory per plugin)
    • Error tracking and count metrics
    • JSON API endpoint: /api/plugins/metrics
    • Prometheus metrics integrated into /metrics endpoint
    • <1% performance overhead from instrumentation
    • See Plugin Metrics Documentation
  • CHIMERA Suite Branding: Rebranded benchmark framework to "CHIMERA Suite" (Comprehensive Hybrid Inferencing & Multi-model Evaluation Resource Assessment)
    • Tagline: "Benchmark the Unbenchmarkable"
    • Vendor-neutral, scientifically rigorous benchmark framework
    • Updated all documentation, scripts, and CI workflows
    • Result files now use CHIMERA_RESULTS_* naming pattern
    • See CHIMERA Suite Documentation
  • Documentation Archival System - Formal process for archiving outdated documentation
  • Retroactive Release Building System - Build binaries from historical version tags
  • Schema Manager for database self-awareness and introspection
  • Independent Health/Error service on alternate port (9090)
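The version resolution formats listed under API Versioning above (v1.4.1, v1.4, v1, latest) suggest a "highest matching version" rule, sketched below. This is an assumption about the semantics, not the APIVersionManager code; `Version`, `parseRequest`, and `resolveVersion` are hypothetical names.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Version = std::array<int, 3>;   // {major, minor, patch}

// Parse "latest", "v1", "v1.4" or "v1.4.1" into 0 to 3 numeric components.
// Returns nullopt for anything else.
inline std::optional<std::vector<int>> parseRequest(const std::string& s) {
    std::vector<int> parts;
    if (s == "latest") return parts;   // no constraints
    if (s.size() < 2 || s[0] != 'v') return std::nullopt;
    std::size_t i = 1;
    while (true) {
        if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i]))) {
            return std::nullopt;
        }
        int value = 0;
        while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
            value = value * 10 + (s[i] - '0');
            ++i;
        }
        parts.push_back(value);
        if (i == s.size()) break;
        if (s[i] != '.' || parts.size() == 3) return std::nullopt;
        ++i;
    }
    return parts;
}

// Pick the highest supported version whose leading components match.
inline std::optional<Version> resolveVersion(const std::string& requested,
                                             const std::vector<Version>& supported) {
    const auto parts = parseRequest(requested);
    if (!parts) return std::nullopt;
    std::optional<Version> best;
    for (const Version& v : supported) {
        bool matches = true;
        for (std::size_t i = 0; i < parts->size(); ++i) {
            if (v[i] != (*parts)[i]) { matches = false; break; }
        }
        // std::array compares lexicographically, i.e. major, minor, patch.
        if (matches && (!best || v > *best)) best = v;
    }
    return best;
}
```

Under this rule a client sending `Accept-Version: v1.4` keeps receiving patch releases without changing its request, while `v1.4.1` pins an exact version.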

Performance

  • Query Pagination Improvements:
    • Reduced database lookups by storing ORDER BY values in cursors
    • O(log n) keyset pagination vs O(n) offset-based pagination
    • Memory efficiency through configurable page size limits (max 10,000 items)
    • Cursor expiration prevents stale cursor accumulation
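The O(log n) versus O(n) difference comes from seeking directly to the last seen key instead of skipping rows. The sketch below shows the idea on an in-memory sorted vector standing in for an index; `Page` and `fetchPage` are hypothetical names, and unique integer keys are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct Page {
    std::vector<int> items;
    std::optional<int> nextKey;   // last key of this page; absent on the final page
};

// Keyset pagination over a sorted, unique key column: binary-search past
// `afterKey` rather than walking over `offset` rows.
inline Page fetchPage(const std::vector<int>& sortedKeys,
                      std::optional<int> afterKey, std::size_t pageSize) {
    assert(pageSize >= 1 && pageSize <= 10000);   // limits quoted in the changelog
    auto first = afterKey
        ? std::upper_bound(sortedKeys.begin(), sortedKeys.end(), *afterKey)
        : sortedKeys.begin();
    const std::size_t remaining = static_cast<std::size_t>(sortedKeys.end() - first);
    const std::size_t count = std::min(pageSize, remaining);
    Page page;
    page.items.assign(first, first + static_cast<std::ptrdiff_t>(count));
    if (count < remaining) page.nextKey = page.items.back();
    return page;
}
```

Storing `nextKey` in the cursor is what removes the extra lookup for sort values: the next request carries everything needed to resume the scan.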

Changed

  • Documentation Reorganization: Major cleanup and restructuring of documentation
    • Fixed version inconsistencies across README, VERSION file, and badges
    • Moved 70+ historical implementation documents to docs/implementation-history/ archive
    • Created comprehensive archive README explaining historical documents
    • Updated all broken links in main documentation files
    • Added archive reference in main documentation index
    • Cleaner root directory with only essential documentation files
  • Improved documentation structure and organization
  • Benchmark suite renamed to CHIMERA Suite with comprehensive rebranding

[1.4.0-stable] - 2026-01-19

🎯 Extended Context Window (32K+) - Production Ready

Status Change: Experimental (v1.4.0-alpha) → Production-Ready (v1.4.0-stable)

Added

Configuration & Feature Flags:

  • Comprehensive extended context configuration (config/llm_extended_context.yaml)
  • Feature maturity status flags ("experimental", "beta", "stable")
  • Backward compatibility mode with automatic fallback
  • Production validation checks (memory, model support, RoPE config, thread-safety)
  • Model-specific configuration overrides
  • Configuration Reference

RoPE/YARN Scaling - Production Ready:

  • Finalized integration on both Model and API levels
  • All scaling methods production-ready: Linear, NTK, YaRN, Dynamic
  • YaRN parameters fully configurable (ext_factor, attn_factor, beta_fast, beta_slow)
  • Error handling and validation for scaling configuration
  • Production Guide

Memory Profiling & Monitoring:

  • 30+ new Prometheus metrics for extended context monitoring
    • Context window metrics: length, cache size, scaling factor
    • RoPE/YARN metrics: method, errors, YARN parameters
    • Memory metrics: RAM/VRAM usage, pressure, OOM events
    • Thread-safety metrics: LoRA switches, lock contention
  • Memory estimation utilities with accuracy tracking
  • Real-time RAM/VRAM profiling per model
  • Memory pressure alerts and OOM prevention
  • Grafana dashboard templates

Thread-Safety:

  • Sequential LoRA operations mode for context scaling
  • Configurable mutex-based synchronization
  • Lock timeout configuration (default: 1000ms)
  • Lock contention monitoring and alerts
  • Safe concurrent request handling
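Mutex-based synchronization with a lock timeout, as listed above, can be sketched with `std::timed_mutex`. `AdapterSwitchGuard` is a hypothetical class and not part of the ThemisDB API; only the 1000 ms default is taken from the changelog.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <utility>

class AdapterSwitchGuard {
public:
    explicit AdapterSwitchGuard(
        std::chrono::milliseconds timeout = std::chrono::milliseconds(1000))
        : timeout_(timeout) {}

    // Runs fn() under the lock. Returns false if the lock could not be
    // acquired within the timeout, instead of blocking indefinitely.
    template <typename Fn>
    bool runExclusive(Fn&& fn) {
        std::unique_lock<std::timed_mutex> lock(mutex_, timeout_);
        if (!lock.owns_lock()) {
            // A real system would likely export this as a contention metric.
            timeouts_.fetch_add(1, std::memory_order_relaxed);
            return false;
        }
        std::forward<Fn>(fn)();
        return true;
    }

    unsigned timeoutCount() const {
        return timeouts_.load(std::memory_order_relaxed);
    }

private:
    std::timed_mutex mutex_;
    std::chrono::milliseconds timeout_;
    std::atomic<unsigned> timeouts_{0};
};
```

Returning `false` on timeout lets the caller decide whether to retry, queue the request, or fail it, which fits the monitoring and alerting items in the same list.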

Documentation:

Changed

Extended Context:

  • Updated llm_config.example.yaml with extended_context section
  • Improved RoPE scaling quality for high factors (>8x)
  • Enhanced memory estimation accuracy (±10% for most models)
  • Better error messages for configuration issues

Fixed

Issues Resolved (GAP Analysis):

  • ✅ RoPE/YARN integration finalized on Model and API level
  • ✅ Thread-safety for Context Scaling with LoRA/Adapters
  • ✅ Comprehensive RAM/VRAM profiling and monitoring
  • ✅ Feature flags and backward compatibility
  • Reference: INVESTIGATION_GAPS_SIMULATIONS_THEMISDB.md

Production Readiness Score:

  • v1.4.0-alpha: 38% → v1.4.0-stable: 93%
  • All critical gaps addressed
  • Safe for production deployment with gradual rollout strategy

[1.4.0-alpha] - 2026-01-05

Added

🧠 Advanced LLM Capabilities

  • Grammar-Constrained Generation - EBNF/GBNF support for guaranteed valid JSON/XML/CSV outputs (95-99% reliability)

    • Built-in grammars: JSON, XML, CSV, ReAct Agent
    • Thread-safe grammar cache with LRU eviction
    • Documentation
  • RoPE Scaling - Extended context window from 4K → 32K tokens (8x increase)

  • Vision Support - Multi-modal LLMs with CLIP-based image encoding

    • LLaVA integration for image analysis
    • Single and multiple image support
    • Documentation
  • Flash Attention - CUDA kernels for 15-25% speedup, 30% memory reduction

  • Speculative Decoding - 2-3x faster inference with draft+target models

  • Continuous Batching - 2x+ throughput with dynamic request batching

🏢 Enterprise Features

  • Hot Spare Management - Automatic failover with health monitoring
  • Enhanced Prometheus Metrics - LLM inference and cache performance tracking
  • WAL Replication via gRPC - Distributed inter-shard replication
  • Multi-GPU LoRA Support - Distributed LoRA adapters across GPUs
  • PostgreSQL Protocol Enhancements - COPY, prepared statements, transaction support

Changed

  • 31 new test suites with comprehensive coverage
  • 11 new performance benchmarks
  • 17 new documentation guides
  • 938 files changed (+113,762 lines, -45,154 lines)

→ Complete Release Notes


[1.3.4-hotfix] - 2026-01-04

Fixed

  • CRITICAL: Fixed server hang at "Adaptive Index Manager initialized" in RAID cluster mode

    • Root cause: AdaptiveIndexManager MVCC coordination before Sharding Manager initialization
    • Solution: Conditional Column Family opening when THEMIS_ENABLE_SHARDING=true detected
    • Files: src/storage/rocksdb_wrapper.cpp, src/server/http_server.cpp
  • CRITICAL: Fixed incorrect Docker Compose port mappings (808X:8080 → 808X:8765)

    • All 9 RAID shards now properly expose HTTP/REST API endpoints
    • File: docker/compose/docker-compose-sharding.yml

Added

  • RAID Endurance Test Suite - 2-hour automated testing for all RAID modes
    • Script: scripts/raid_endurance_test.py
    • Monitoring: scripts/monitor_raid_test.ps1
    • Verification: All 9 RAID shards (RAID 0/1/5) operational with 0% error rate

Changed

  • Docker build context reduced from 3GB to 85MB (97% reduction)
  • Updated .dockerignore to exclude build artifacts while preserving vcpkg baseline
  • Improved Dockerfile.themis-server for more reliable builds

→ Complete Release Notes


[1.3.4] - 2026-01-02

Security

Comprehensive Security Summary: See Security Work Summary v1.3.4

Fixed

  • 7 Critical Security Vulnerabilities in RocksDB wrapper (100% segfault risk elimination)

    • Use-after-free in BlockBasedTableOptions
    • Null-pointer checks for environment initialization
    • Transaction-based deletion to prevent deadlocks
    • GetBaseDB() null-pointer checks across 7 locations
    • Transaction resource leak fixes
    • Column Family handle cleanup improvements
    • BackupEngine exception safety
    • Audit Report
  • 8 Medium-Severity Issues

    • Improved transaction error handling
    • Enhanced iterator lifecycle management
    • Better snapshot handling
    • Backup engine null-checks

Changed

  • Upgraded Docker base image: Ubuntu 22.04 → Ubuntu 24.04 LTS (80%+ CVE reduction)
  • Secure token handling in Update Checker (no hardcoded credentials)
  • Binary authenticity verification with cryptographic manifest signing (RSA-4096, SHA-256)

→ Complete Release Notes


[1.3.3] - 2025-12-21

Added

  • HTTP/2 with Server Push - CDC/Changefeed with ~0ms latency
  • WebSocket Support - Bidirectional streaming for real-time communication
  • MQTT Broker - IoT messaging with WebSocket transport and monitoring
  • HTTP/3 Base Implementation - QUIC protocol (experimental)
  • PostgreSQL Wire Protocol - BI tool compatibility
  • MCP Server - Model Context Protocol support for LLM integration

→ Complete Release Notes


[1.3.2] - 2025-12-21

Added

  • Image Analysis AI Plugin Architecture running in parallel with the LLM
    • Multi-backend support: llama.cpp Vision, ONNX Runtime, OpenCV DNN, OpenVINO, ncnn
    • Plugin interfaces: IImageAnalysisBackend, ImageAnalysisManager
    • 15+ comprehensive unit tests and benchmarks

→ Complete Release Notes


[1.3.1] - 2025-12-20

Added

  • ATTRIBUTIONS.md documenting 15+ core dependencies
  • Documentation of ThemisDB's 12 unique innovations
  • Clear attribution for all major dependencies

→ Complete Release Notes


[1.3.0] - 2025-12-17

Added

  • Native LLM Integration with llama.cpp (optional feature)

    • Embedded LLM engine for LLaMA/Mistral/Phi-3 (1B-70B parameters)
    • GPU acceleration with NVIDIA CUDA support
    • PagedAttention for advanced memory management
    • Quantization support (Q4_K_M, Q5_K_M, Q8_0)
    • Grafana dashboards with metrics and alerts
    • Setup Guide
  • Voice Assistant Integration (Enterprise feature)

    • Natural language voice interaction (Whisper.cpp + Piper TTS + llama.cpp)
    • Phone call recording with automatic transcription
    • Meeting protocol generation with AI-powered minutes
    • Speaker diarization
    • Multi-language support (100+ languages)
    • Documentation

→ Complete Release Notes


Earlier Versions

For releases prior to v1.3.0, please see:


Release Notes

Detailed release notes for each version are available in the release-changelogs/ directory:


Upgrade Notes

From 1.3.x to 1.4.0-alpha

  • LLM features now include advanced capabilities (grammar constraints, RoPE scaling, vision support)
  • New configuration options available for Flash Attention and Speculative Decoding
  • See Migration Guide for detailed upgrade instructions

From 1.2.x to 1.3.x

  • LLM integration is now optional and requires an explicit build flag: -DTHEMIS_ENABLE_LLM=ON
  • New protocols (HTTP/2, WebSocket, MQTT) require explicit opt-in for security
  • See Configuration Guide for new settings

Contributing

See CONTRIBUTING.md for guidelines on:

  • How to contribute to ThemisDB
  • Code style and standards
  • Pull request process
  • Documentation requirements

Version Format

ThemisDB follows Semantic Versioning:

  • MAJOR version for incompatible API changes
  • MINOR version for new functionality in a backward compatible manner
  • PATCH version for backward compatible bug fixes
  • -alpha, -beta, -rc suffixes for pre-release versions