buddylindsey/fleet-telemetry-analysis

Fleet Telemetry Analysis Demo πŸšœπŸ§ πŸ“ˆ

An end-to-end AI telemetry analysis demo: spec-driven scenario generation β†’ deterministic telemetry synthesis β†’ vectorization β†’ evidence-based summarization β†’ readonly dashboard.

Intent: This is not a production system. It is a runnable design exercise and interview-style prototype focused on AI reasoning, deterministic validation around LLM output, and modular architecture with swappable providers/storage boundaries.

πŸ“Œ Table of Contents

✨ Overview

  • 🧠 LLM scenario planning: reads docs/spec.md, requests JSON from an LLM, and retries with validation feedback when the response fails deterministic rules.
  • βœ… Deterministic normalization/validation: strict contract checks enforce timeline, tractor assignment, sensor directives, and scenario policy before synthesis.
  • πŸ“ˆ Synthetic telemetry + events: generates 5-second telemetry aggregates and discrete events with reproducible inputs (scenario_start_utc + seed).
  • πŸ—„οΈ Operational storage in TimescaleDB: persists scenario_runs, telemetry_5s, events, and summaries for auditability and downstream processing.
  • πŸ”Ž Vectorization to Qdrant: observation-only chunks are built from telemetry/events (no scenario titles/descriptions) and upserted with deterministic IDs.
  • πŸ“ Evidence-based summarization: retrieves chunks from Qdrant, prompts an LLM with docs/summary_spec.md, validates strict JSON output, and writes summary rows.
  • 🌐 Readonly web UI: displays scenario runs, tractor summaries, evidence metadata, and original scenario JSON for review.
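
The retry-with-validation-feedback loop in the first bullet can be sketched in a few lines. This is an illustrative Python sketch only (the real pipeline is C#/.NET), and the `ask_llm` and `validate` callables are hypothetical stand-ins for the project's AI client and deterministic contract checks:

```python
# Sketch of "LLM output is untrusted input": parse, validate
# deterministically, and feed violations back into the next attempt.
import json

def generate_scenario(ask_llm, validate, max_attempts=3):
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        raw = ask_llm(feedback)
        try:
            plan = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Malformed JSON becomes feedback for the retry prompt.
            feedback = f"Response was not valid JSON: {exc}"
            continue
        errors = validate(plan)  # deterministic contract checks
        if not errors:
            return plan, attempt
        feedback = "Fix these violations: " + "; ".join(errors)
    raise RuntimeError(f"no valid scenario after {max_attempts} attempts")
```

Only a plan that survives every deterministic check reaches telemetry synthesis; the LLM never writes to storage directly.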

βœ… Key Features

  • Separate executables with clear boundaries:
    • FleetTelemetry.Generator
    • FleetTelemetry.Vectorizer
    • FleetTelemetry.Summarizer
    • FleetTelemetry.Web
  • Provider abstraction (FleetTelemetry.AI) so OpenAI can be swapped for another provider later (for example a local model service).
  • Run lifecycle auditing (started / succeeded / failed) with stage + error recording.
  • Deterministic replay support via persisted scenario_start_utc, seed, and attempt_count.
  • Raw SQL + Npgsql persistence (simple, explicit, testable).
  • Idempotent upserts (vectorizer points and summary rows) so reruns are safe.
  • Modular tests split by project area (Domain, Generator, Vectorizer, Summarizer) plus CI.
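
Idempotent reruns hinge on deterministic IDs: the same input always maps to the same point, so a rerun overwrites rather than duplicates. A minimal Python sketch of the idea (the namespace UUID and key format here are illustrative assumptions, not the project's actual scheme):

```python
# Deterministic point IDs: same (run, tractor, window) always yields
# the same UUID, making Qdrant upserts idempotent across reruns.
import uuid

# Example namespace only; any fixed UUID works as long as it never changes.
CHUNK_NAMESPACE = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

def chunk_point_id(scenario_run_id: str, tractor_id: str, window_start_utc: str) -> str:
    key = f"{scenario_run_id}|{tractor_id}|{window_start_utc}"
    return str(uuid.uuid5(CHUNK_NAMESPACE, key))  # name-based UUID, stable
```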

βš–οΈ Design Tradeoffs (Intentional)

  • LLM output is treated as untrusted input and must pass deterministic validation before any data generation occurs.
  • A wide-table telemetry schema is used first for simplicity and readability during prototyping.
  • Vectorization and summarization are separate processes from generation to keep concerns and operational boundaries clean.
  • No DI framework is used; composition is explicit in each Program.cs for clarity during review/interview walkthroughs.
  • The system is optimized for reasoning demo value, not max throughput or production-scale operability.

🚫 Non-goals

  • Production authentication/authorization or multi-tenant isolation.
  • Full tractor / J1939 fidelity or OEM-grade telemetry semantics.
  • Robust migration/versioning workflow for long-lived production databases.
  • Summary fidelity/reconstruction scoring against original scenarios (planned future feature).

πŸš€ Quickstart

  • Prereqs: Docker/Compose, Make, .NET 10 SDK.
  • Configure AI access before running the pipeline:
    • AI_PROVIDER=openai
    • AI_API_KEY=<your key>
    • optional: AI_TEXT_MODEL, AI_EMBEDDING_MODEL, AI_BASE_URL
    • if you are using the Docker targets, export these env vars in your shell so make can pass them into containers
  • Choose one pipeline path:
    • local .NET path: make run-full-cycle
    • Docker-only path: make containerized-run-full-cycle
  • Start infra + web:
    • make infra-up
  • Build + test:
    • make build
    • make test
  • Run the full pipeline locally with the .NET SDK (generate β†’ vectorize β†’ summarize):
    • make run-full-cycle
  • Run the full pipeline in containers with Docker only:
    • make containerized-run-full-cycle
  • Refresh the dashboard:

πŸ”— Local Endpoints

πŸ›  Useful Commands

  • Infra / logs:
    • make infra-up
    • make infra-down
    • make infra-logs
  • Database:
    • make psql
    • make list-scenario-runs
  • Qdrant checks:
    • make qdrant-ui
    • make qdrant-collections
    • make qdrant-info
    • make qdrant-count
  • App runs (local):
    • make run-generator
    • make run-vectorizer SCENARIO_RUN_ID=<run-id>
    • make run-summarizer SCENARIO_RUN_ID=<run-id>
    • make run-web
    • make run-full-cycle
  • App runs (containerized tools):
    • make run-generator-docker
    • make run-vectorizer-docker
    • make run-summarizer-docker
    • make containerized-run-full-cycle
  • Tests:
    • make test
    • make test-integration

🧰 Technologies

  • C# / .NET 10
  • ASP.NET Core (Razor Pages UI)
  • OpenAI .NET SDK (behind provider abstraction)
  • Npgsql
  • Postgres / TimescaleDB
  • Qdrant
  • xUnit + FluentAssertions
  • Docker Compose

🧭 Architecture

The architecture is intentionally split into small executables so each stage can be run, debugged, and replaced independently. The AI-related pieces are constrained by deterministic validators and explicit data contracts rather than implicit trust in model output.

1) Scenario generation path (Spec β†’ Validated Plan β†’ TimescaleDB)

```mermaid
flowchart LR
  spec["docs/spec.md"] --> gen["FleetTelemetry.Generator"]
  gen --> llm["AI text client"]
  llm --> retry["Retry w/ validation feedback"]
  retry --> normalize["Normalize + Validate"]
  normalize --> synth["Deterministic telemetry/event synthesis"]
  synth --> db[("TimescaleDB")]
```

2) Vectorization path (Observations β†’ Embeddings β†’ Qdrant)

```mermaid
flowchart LR
  db[("TimescaleDB telemetry/events")] --> vec["FleetTelemetry.Vectorizer"]
  vec --> chunk["Observation-only chunk builder"]
  chunk --> emb["AI embedding client"]
  emb --> qdrant[("Qdrant")]
```
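
The chunk builder groups 5-second telemetry rows into fixed observation windows (CHUNK_WINDOW_SECONDS, default 300). A Python sketch of the bucketing arithmetic (illustrative; the project implements this in C#):

```python
# Snap a timestamp down to the start of its fixed-size window, so every
# 5-second row lands deterministically in exactly one chunk.
from datetime import datetime, timezone

def window_start(ts: datetime, window_seconds: int = 300) -> datetime:
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)
```

Because the window boundary depends only on the timestamp, rerunning the vectorizer rebuilds the same chunks.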

3) Summarization path (Qdrant evidence β†’ JSON summary β†’ summaries table)

```mermaid
flowchart LR
  qdrant[("Qdrant evidence")] --> sum["FleetTelemetry.Summarizer"]
  sum --> sllm["AI text client"]
  sllm --> parse["Strict JSON parse/validate + retry"]
  parse --> summaries[("TimescaleDB summaries")]
  summaries --> web["FleetTelemetry.Web"]
```

βš™οΈ Configuration

Core

  • POSTGRES_CONNECTION_STRING
  • QDRANT_URL (default http://localhost:6333)
  • QDRANT_COLLECTION (default telemetry_chunks)

AI Provider

  • AI_PROVIDER (default openai)
  • AI_API_KEY
  • AI_BASE_URL (provider-specific; OpenAI defaults to https://api.openai.com/v1/)
  • AI_TEXT_MODEL (default gpt-5.2)
  • AI_EMBEDDING_MODEL (default text-embedding-3-small)

Legacy aliases are still accepted for compatibility (OPENAI_*, LLM_*, EMBEDDING_MODEL).
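
Alias-aware lookup can be expressed as "first set variable wins". A Python sketch (illustrative; the exact legacy variable names, such as OPENAI_API_KEY, are assumptions based on the OPENAI_*/LLM_* prefixes above):

```python
# Resolve a config value from a preferred name plus legacy aliases.
import os

def env_first(*names, default=None):
    """Return the value of the first alias that is set and non-empty."""
    for name in names:
        value = os.environ.get(name)
        if value:
            return value
    return default
```

For example, `env_first("AI_API_KEY", "OPENAI_API_KEY")` prefers the new name but still honors a legacy export.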

Generator / Scenario Validation Policy

  • ScenarioValidation:AllowedTractorIds (CSV)
  • ScenarioValidation:RequireCompleteAllowedTractorSet (true|false)
  • ScenarioValidation:RequireUniqueTractorsAcrossScenarios (true|false)
  • ScenarioValidation:RequireCatastrophicOutlier (true|false)

Vectorizer

  • SCENARIO_RUN_ID (optional filter)
  • CHUNK_WINDOW_SECONDS (default 300)
  • LOOKBACK_HOURS (optional)

Summarizer

  • SCENARIO_RUN_ID (required)
  • TRACTOR_ID (optional)
  • SUMMARY_WINDOW_SECONDS (default 3600)
  • TOP_K (default 20)
  • NOW_UTC (optional deterministic override)
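
How these variables might resolve, including the deterministic NOW_UTC override, in a Python sketch (variable names match the README; the parsing details are assumptions, and the real summarizer is C#):

```python
# Resolve summarizer settings from the environment with README defaults.
import os
from datetime import datetime, timezone

def summarizer_config():
    now_override = os.environ.get("NOW_UTC")
    return {
        "scenario_run_id": os.environ["SCENARIO_RUN_ID"],   # required; KeyError if unset
        "tractor_id": os.environ.get("TRACTOR_ID"),          # optional filter
        "window_seconds": int(os.environ.get("SUMMARY_WINDOW_SECONDS", "3600")),
        "top_k": int(os.environ.get("TOP_K", "20")),
        # NOW_UTC pins "now" for reproducible retrieval windows.
        "now_utc": (datetime.fromisoformat(now_override.replace("Z", "+00:00"))
                    if now_override else datetime.now(timezone.utc)),
    }
```

Pinning NOW_UTC is what lets a summarization run be replayed bit-for-bit against the same evidence window.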

πŸ“š Docs

  • docs/spec.md β€” scenario-generation prompt contract and hard constraints
  • docs/summary_spec.md β€” summarizer JSON output contract
  • docs/sensor-bounds.md β€” sensor ranges used by deterministic synthesis
  • docs/adr/ β€” architecture decision records (optional/in-progress)

🀝 Contributing / Next Steps

  • Keep changes focused and document tradeoffs in PR notes/commit messages.
  • Add tests for behavior changes, especially contract validation and retry behavior.
  • Future ideas:
    • local model provider implementation (for example Ollama)
    • bounded-cost Qdrant retrieval strategy for larger collections
    • post-summary reconstruction scoring (separate evaluator step)
    • richer dashboard filtering and comparison views across scenario runs
