An end-to-end AI telemetry analysis demo: spec-driven scenario generation → deterministic telemetry synthesis → vectorization → evidence-based summarization → readonly dashboard.
Intent: This is not a production system. It is a runnable design exercise and interview-style prototype focused on AI reasoning, deterministic validation around LLM output, and modular architecture with swappable providers/storage boundaries.
- Overview
- Key Features
- Design Tradeoffs (Intentional)
- Non-goals
- Quickstart
- Local Endpoints
- Useful Commands
- Technologies
- Architecture
- Configuration
- Docs
- Contributing / Next Steps
## Key Features

- LLM scenario planning: reads `docs/spec.md`, requests JSON from an LLM, and retries with validation feedback when the response fails deterministic rules.
- Deterministic normalization/validation: strict contract checks enforce timeline, tractor assignment, sensor directives, and scenario policy before synthesis.
- Synthetic telemetry + events: generates 5-second telemetry aggregates and discrete events with reproducible inputs (`scenario_start_utc` + `seed`).
- Operational storage in TimescaleDB: persists `scenario_runs`, `telemetry_5s`, `events`, and `summaries` for auditability and downstream processing.
- Vectorization to Qdrant: observation-only chunks are built from telemetry/events (no scenario titles/descriptions) and upserted with deterministic IDs.
- Evidence-based summarization: retrieves chunks from Qdrant, prompts an LLM with `docs/summary_spec.md`, validates strict JSON output, and writes summary rows.
- Readonly web UI: displays scenario runs, tractor summaries, evidence metadata, and original scenario JSON for review.
- Separate executables with clear boundaries: `FleetTelemetry.Generator`, `FleetTelemetry.Vectorizer`, `FleetTelemetry.Summarizer`, `FleetTelemetry.Web`.
- Provider abstraction (`FleetTelemetry.AI`) so OpenAI can be swapped for another provider later (for example a local model service).
- Run lifecycle auditing (`started`/`succeeded`/`failed`) with stage + error recording.
- Deterministic replay support via persisted `scenario_start_utc`, `seed`, and `attempt_count`.
- Raw SQL + Npgsql persistence (simple, explicit, testable).
- Idempotent vectorizer point upserts and summarizer summary upserts for reruns.
- Modular tests split by project area (`Domain`, `Generator`, `Vectorizer`, `Summarizer`) plus CI.
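The idempotent upserts rely on deterministic point IDs derived from a chunk's identity. A minimal sketch of the idea, in Python for brevity (the real code is C#, and the key format below is hypothetical, not the project's actual scheme):

```python
import uuid

def chunk_point_id(scenario_run_id: str, tractor_id: str, window_start_utc: str) -> str:
    """Derive a stable UUID from the chunk's identity so a rerun of the
    vectorizer upserts the same Qdrant point instead of creating duplicates.
    The "run|tractor|window" key is illustrative only."""
    key = f"{scenario_run_id}|{tractor_id}|{window_start_utc}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))
```

Because the ID is a pure function of the chunk's identity, rerunning the pipeline overwrites existing points rather than accumulating copies.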
## Design Tradeoffs (Intentional)

- LLM output is treated as untrusted input and must pass deterministic validation before any data generation occurs.
- A wide-table telemetry schema is used first for simplicity and readability during prototyping.
- Vectorization and summarization are separate processes from generation to keep concerns and operational boundaries clean.
- No DI framework is used; composition is explicit in each `Program.cs` for clarity during review/interview walkthroughs.
- The system is optimized for reasoning demo value, not maximum throughput or production-scale operability.
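The untrusted-output tradeoff amounts to a validate-then-retry loop around the model call. A language-agnostic sketch in Python (the actual generator is C#; `llm` and `validate` here are hypothetical callables, not the project's API):

```python
import json

def plan_scenarios(llm, prompt: str, validate, max_attempts: int = 3):
    """Request scenario JSON from a model, treating every reply as untrusted
    until it parses and passes deterministic validation. Validation errors
    are fed back into the next attempt's prompt."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        raw = llm(prompt + feedback)
        try:
            scenarios = json.loads(raw)
        except json.JSONDecodeError as err:
            feedback = f"\nPrevious reply was not valid JSON: {err}"
            continue
        errors = validate(scenarios)  # e.g. timeline, tractor assignment, policy
        if not errors:
            return scenarios, attempt
        feedback = "\nFix these validation errors:\n" + "\n".join(errors)
    raise RuntimeError(f"LLM output failed validation after {max_attempts} attempts")
```

Only after this loop succeeds does deterministic synthesis run; nothing downstream ever sees an unvalidated reply.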
## Non-goals

- Production authentication/authorization or multi-tenant isolation.
- Full tractor / J1939 fidelity or OEM-grade telemetry semantics.
- Robust migration/versioning workflow for long-lived production databases.
- Summary fidelity/reconstruction scoring against original scenarios (planned future feature).
## Quickstart

- Prereqs: Docker/Compose, Make, .NET 10 SDK.
- Configure AI access before running the pipeline:
  - `AI_PROVIDER=openai`
  - `AI_API_KEY=<your key>`
  - optional: `AI_TEXT_MODEL`, `AI_EMBEDDING_MODEL`, `AI_BASE_URL`
  - if you are using the Docker targets, export these env vars in your shell so `make` can pass them into containers
- Choose one pipeline path:
  - local .NET path: `make run-full-cycle`
  - Docker-only path: `make containerized-run-full-cycle`
- Start infra + web: `make infra-up`
- Build + test: `make build`, `make test`
- Run the full pipeline locally with the .NET SDK (generate → vectorize → summarize): `make run-full-cycle`
- Run the full pipeline in containers with Docker only: `make containerized-run-full-cycle`
- Refresh the dashboard (see Local Endpoints below).
## Local Endpoints

- Readonly dashboard (ASP.NET Core): http://localhost:5000
- Qdrant API: http://localhost:6333
- TimescaleDB/Postgres: `localhost:5432` (use `psql` / `make psql`)
## Useful Commands

- Infra / logs: `make infra-up`, `make infra-down`, `make infra-logs`
- Database: `make psql`, `make list-scenario-runs`
- Qdrant checks: `make qdrant-ui`, `make qdrant-collections`, `make qdrant-info`, `make qdrant-count`
- App runs (local): `make run-generator`, `make run-vectorizer SCENARIO_RUN_ID=<run-id>`, `make run-summarizer SCENARIO_RUN_ID=<run-id>`, `make run-web`, `make run-full-cycle`
- App runs (containerized tools): `make run-generator-docker`, `make run-vectorizer-docker`, `make run-summarizer-docker`, `make containerized-run-full-cycle`
- Tests: `make test`, `make test-integration`
## Technologies

- C# / .NET 10
- ASP.NET Core (Razor Pages UI)
- OpenAI .NET SDK (behind provider abstraction)
- Npgsql
- Postgres / TimescaleDB
- Qdrant
- xUnit + FluentAssertions
- Docker Compose
## Architecture

The architecture is intentionally split into small executables so each stage can be run, debugged, and replaced independently. The AI-related pieces are constrained by deterministic validators and explicit data contracts rather than implicit trust in model output.
```mermaid
flowchart LR
    spec["docs/spec.md"] --> gen["FleetTelemetry.Generator"]
    gen --> llm["AI text client"]
    llm --> retry["Retry w/ validation feedback"]
    retry --> normalize["Normalize + Validate"]
    normalize --> synth["Deterministic telemetry/event synthesis"]
    synth --> db[("TimescaleDB")]
```

```mermaid
flowchart LR
    db[("TimescaleDB telemetry/events")] --> vec["FleetTelemetry.Vectorizer"]
    vec --> chunk["Observation-only chunk builder"]
    chunk --> emb["AI embedding client"]
    emb --> qdrant[("Qdrant")]
```

```mermaid
flowchart LR
    qdrant[("Qdrant evidence")] --> sum["FleetTelemetry.Summarizer"]
    sum --> sllm["AI text client"]
    sllm --> parse["Strict JSON parse/validate + retry"]
    parse --> summaries[("TimescaleDB summaries")]
    summaries --> web["FleetTelemetry.Web"]
```
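The "observation-only chunk builder" in the vectorizer stage deliberately excludes scenario titles and descriptions, so the summarizer can only reason from measurements and events. A hypothetical sketch of what such a chunk's text might look like (Python for brevity; field names are illustrative, not the project's schema):

```python
def build_chunk_text(window_start_utc: str, tractor_id: str, rows, events) -> str:
    """Render one chunk of raw observations as plain text for embedding.
    Nothing about the scenario's plan (title, description) is included,
    so retrieval evidence stays observation-only."""
    lines = [f"tractor={tractor_id} window_start={window_start_utc}"]
    for r in rows:  # 5-second telemetry aggregates
        lines.append(f"t={r['t']} engine_temp_c={r['engine_temp_c']} speed_kph={r['speed_kph']}")
    for e in events:  # discrete events in the same window
        lines.append(f"event t={e['t']} type={e['type']}")
    return "\n".join(lines)
```

Keeping the plan out of the evidence is what makes the summarizer's output a genuine reconstruction rather than a paraphrase of the scenario JSON.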
## Configuration

Shared / storage:

- `POSTGRES_CONNECTION_STRING`
- `QDRANT_URL` (default `http://localhost:6333`)
- `QDRANT_COLLECTION` (default `telemetry_chunks`)

AI provider:

- `AI_PROVIDER` (default `openai`)
- `AI_API_KEY`
- `AI_BASE_URL` (provider-specific; OpenAI defaults to `https://api.openai.com/v1/`)
- `AI_TEXT_MODEL` (default `gpt-5.2`)
- `AI_EMBEDDING_MODEL` (default `text-embedding-3-small`)

Legacy aliases are still accepted for compatibility (`OPENAI_*`, `LLM_*`, `EMBEDDING_MODEL`).

Scenario validation:

- `ScenarioValidation:AllowedTractorIds` (CSV)
- `ScenarioValidation:RequireCompleteAllowedTractorSet` (`true|false`)
- `ScenarioValidation:RequireUniqueTractorsAcrossScenarios` (`true|false`)
- `ScenarioValidation:RequireCatastrophicOutlier` (`true|false`)

Vectorizer:

- `SCENARIO_RUN_ID` (optional filter)
- `CHUNK_WINDOW_SECONDS` (default `300`)
- `LOOKBACK_HOURS` (optional)

Summarizer:

- `SCENARIO_RUN_ID` (required)
- `TRACTOR_ID` (optional)
- `SUMMARY_WINDOW_SECONDS` (default `3600`)
- `TOP_K` (default `20`)
- `NOW_UTC` (optional deterministic override)
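As a sanity check on the defaults above, here is how the vectorizer's settings could be read from the environment (a Python sketch only; the actual binaries are C# and may parse these differently):

```python
import os

def load_vectorizer_config(env=os.environ):
    """Read the vectorizer settings documented above, falling back to the
    listed defaults when a variable is unset."""
    return {
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "collection": env.get("QDRANT_COLLECTION", "telemetry_chunks"),
        "chunk_window_seconds": int(env.get("CHUNK_WINDOW_SECONDS", "300")),
        "scenario_run_id": env.get("SCENARIO_RUN_ID"),  # optional filter, may be None
    }
```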
## Docs

- `docs/spec.md` – scenario-generation prompt contract and hard constraints
- `docs/summary_spec.md` – summarizer JSON output contract
- `docs/sensor-bounds.md` – sensor ranges used by deterministic synthesis
- `docs/adr/` – architecture decision records (optional/in-progress)
## Contributing / Next Steps

- Keep changes focused and document tradeoffs in PR notes/commit messages.
- Add tests for behavior changes, especially contract validation and retry behavior.
- Future ideas:
- local model provider implementation (for example Ollama)
- bounded-cost Qdrant retrieval strategy for larger collections
- post-summary reconstruction scoring (separate evaluator step)
- richer dashboard filtering and comparison views across scenario runs