StatusServer

A non-disturbing data collector for the X-Environment (Integrated Control System for High-throughput Tomography). It aggregates control system data from Tango and/or TINE servers, tracks attribute availability, persists downtime intervals to MariaDB, and exposes live metrics via HTTP/Prometheus.

Requirements

Java 21
Maven 3.8+
Tango 9+ (for Tango attributes) / TINE (for TINE attributes)
MariaDB 11 (optional — required for downtime persistence)

Quick Start

# Build fat JAR
mvn clean package -Dmaven.test.skip=true

# Start MariaDB (and optional Tango stack)
docker compose up -d status-server-db

# Run
java -jar target/status-server-*.jar path/to/config.xml [http-port]
# http-port defaults to 9190

Docker

docker run -v /path/to/etc:/app/etc \                                                                                                                                                                                                                                                                                                                                                                                     
             -e SS_CONFIG=/app/etc/config.xml \
             -p 9190:9190 \                                                                                                                                                                                                                                                                                                                                                                                                 
             ghcr.io/scientific-software-hub/status-server:latest

Configuration

Configuration is an XML file. Minimal example:

<status-server stale-after="3" down-after="6">
  <devices>
    <device name="my/device/1" server="tango://host:10000">
      <attributes>
        <attribute name="Temperature" poll-delay="1000" interpolation="LINEAR"/>
      </attributes>
    </device>
  </devices>
</status-server>

Optional MariaDB section (omit to disable persistence):

<mariadb>
  <jdbc-url>jdbc:mariadb://localhost:3306/statusserver</jdbc-url>
  <user>ss</user>
  <password>ss</password>
</mariadb>

Parameter	Description	Default
`stale-after`	Consecutive failures before UP→STALE	3
`down-after`	Consecutive failures before STALE→DOWN	6

Architecture

DeviceSource (XML)
    │
    ▼
EngineFactory ──► Engine
                    │
          ┌─────────┴──────────┐
          ▼                    ▼
     PollTask              EventTask
          │                    │
          ▼                    ▼
  EventSink<SingleRecord<?>>   EventSink<TechnicalEvent>
  (telemetry)                  (AvailabilityAnalyzer)
          │                         │
          ▼                         ▼
   InMemoryWriter         EventSink<DomainEvent>
   (Snapshot only)        (EventDispatcher fan-out)
          │                         │
          ▼                    ┌────┴────┐
   MetricsServer           logger   MariaDbSink
   (HTTP /metrics)

Key components

EventSink<T> — unified observer interface replacing the old RecordWriter and TechnicalEventListener. Everything that consumes events implements this single generic interface.

EventDispatcher<T> — fan-out dispatcher. Calls all registered sinks, isolates failures per sink.

AvailabilityAnalyzer — consumes TechnicalEvents, maintains per-attribute state machines (UP/STALE/DOWN), emits DomainEvents.

AttributeAvailability — per-attribute state machine:

consecutive failures ≥ stale-after  →  UP    → STALE
consecutive failures ≥ down-after   →  STALE → DOWN  (+ DowntimeOpened)
any success                         →  any   → UP    (+ DowntimeClosed if from DOWN)

InMemoryWriter — snapshot-only in-memory store backing the /metrics endpoint.

MariaDbSink — persists domain events to MariaDB in ERPNext-compatible tables. Reconnects automatically on failure.

HTTP Endpoints

Endpoint	Description
`GET /metrics`	Prometheus gauge format. Each attribute emits a value gauge and `_up` (1=healthy, 0=failing).
`GET /health`	Liveness — always 200.
`GET /ready`	Readiness — 503 until engine has started.

Availability Tracking & Downtime Persistence

StatusServer classifies each read outcome as a technical event:

Event	Trigger
`ReadSuccess`	Successful attribute read
`ReadFailure`	Client exception during read
`Timeout`	Read timed out
`Disconnect` / `Reconnect`	Connection lost / restored

The AvailabilityAnalyzer aggregates these per attribute and emits domain events when thresholds are crossed:

Domain Event	Meaning
`AvailabilityTransitioned`	State changed (UP↔STALE↔DOWN)
`DowntimeOpened`	Attribute entered DOWN state
`DowntimeClosed`	Attribute recovered from DOWN

MariaDB Schema

Three ERPNext-compatible tables (standard tab{DocType} naming, standard audit columns):

tabState Transition   — full history of every state change
tabCurrent State      — one row per attribute, UPSERT on every transition
tabDowntime Interval  — one open row per active downtime, closed on recovery

Apply schema to a fresh database:

# Fresh container (wipes existing data)
docker compose down -v
docker compose up -d status-server-db

# Or apply manually to a running container
docker exec -i status-server-db mariadb -u ss -pss statusserver < db/schema.sql

On startup, if MariaDB is configured, StatusServer reads tabCurrent State and seeds the in-memory state machines — attributes that were DOWN when the server last stopped resume tracking correctly.

Observability

The observability/ directory contains a pre-configured Prometheus + Grafana stack.

# Start the full stack (MariaDB + Prometheus + Grafana)
docker compose up -d status-server-db prometheus grafana

# Or bring everything up at once including the Tango test stack
docker compose up -d

Service	URL	Credentials
Prometheus	http://localhost:9090	—
Grafana	http://localhost:3000	admin / admin

Prometheus scrapes StatusServer at host.docker.internal:9190 every 5 seconds.

Grafana Dashboards

Two dashboards are provided in observability/grafana/. Import them via Dashboards → Import → Upload JSON.

Status Server Health (status-server-grafana-dashboard.json)

Operational overview — no time series. Designed for a monitoring wall or quick health check.

┌─────────────────────────────────────────┐
│         Monitored Attributes: N         │
├────────────────────┬────────────────────┤
│   Max Signal Age   │ Freshness / Source │
├────────────────────┼────────────────────┤
│   Up Attributes    │ Failed Attributes  │
├────────────────────┴────────────────────┤
│         Device State & Status           │
├─────────────────────────────────────────┤
│         Failed Attributes (table)       │
└─────────────────────────────────────────┘

Status Server Live Values (status-server-grafana-dashboard-v2.json)

Live signal explorer. Use the Alias variable to drill into a specific attribute.

┌─────────────────────────────────────────┐
│      Current Values Snapshot (table)    │
├─────────────────────────────────────────┤
│      All Signal Values (timeseries)     │
├─────────────────────────────────────────┤
│      Signal Freshness — All             │
├────────────────────┬────────────────────┤
│  $alias — Value    │  $alias — Freshness│
├────────────────────┴────────────────────┤
│      Device State & Status              │
└─────────────────────────────────────────┘

Prometheus Metrics

Metric	Type	Description
`control_system_attribute_value`	gauge	Latest numeric value
`control_system_attribute_state`	gauge	Latest enum state (label `state=`)
`control_system_attribute_status`	gauge	Latest status text (label `status=`)
`control_system_attribute_up`	gauge	1 = last read succeeded, 0 = failed
`control_system_attribute_age_seconds`	gauge	Age of the latest sample
`control_system_attribute_source_timestamp_seconds`	gauge	Source timestamp of latest sample
`status_server_monitored_attributes`	gauge	Total attributes being monitored
`status_server_up_attributes`	gauge	Attributes with a successful last read
`status_server_failed_attributes`	gauge	Attributes with a failed last read

All per-attribute metrics carry labels: source, device, name, attribute, alias.

Development

# Build (skip environment-dependent tests)
mvn clean package -Dmaven.test.skip=true

# Run unit tests (no Tango/TINE required)
mvn test -Dtest="AvailabilityAnalyzerTest,StatusServerStatusServerConfigurationTest"

# Run all tests (requires live Tango)
mvn test

Safe unit tests (no external dependencies): data2/, configuration/, engine2/AvailabilityAnalyzerTest.

Architecture Decision Records

ADR-1: XML-only device source

Frappe/ERPNext is used as a configuration front-end. It exports device/attribute lists as XML files. StatusServer reads that XML directly. There is no runtime Frappe dependency — this removes the HTTP round-trip on startup and makes the server runnable without an ERPNext instance.

ADR-2: Unified `EventSink<T>` interface

RecordWriter (telemetry) and TechnicalEventListener (availability signals) were merged into a single EventSink<T> functional interface. Rationale: a writer is a listener — it reacts to events. The generic parameter carries the event type, keeping the type system honest while eliminating the duplicate observer hierarchies.

ADR-3: `EventDispatcher<T>` replaces `WriterDispatcher`

A single generic fan-out dispatcher replaces the telemetry-specific WriterDispatcher. Both the telemetry pipeline (SingleRecord<?>) and the domain event pipeline (DomainEvent) use the same implementation. Failures in one sink are logged and isolated; other sinks always receive the event.

ADR-4: Snapshot-only in-memory store

AllRecords (full time-series history) was retired. The in-memory store now keeps only the latest value per attribute (Snapshot). Rationale: historical queries are served by MariaDB; keeping a second growing in-memory copy adds memory pressure with no remaining consumer.

ADR-5: Global availability thresholds

stale-after and down-after are global values configured at the server level, not per attribute. Rationale: in this deployment all attributes are polled at similar rates; per-attribute thresholds add configuration complexity without practical benefit.

ADR-6: ERPNext-compatible MariaDB schema

Tables follow ERPNext naming (tab{DocType}) and include the standard Frappe audit columns (name, creation, modified, modified_by, owner, docstatus, idx). Rationale: rows can be imported into or consumed by a Frappe/ERPNext instance without transformation, enabling ERP-level downtime reporting and billing workflows.

ADR-7: No connection pool (plain JDBC with auto-reconnect)

MariaDbSink uses a single JDBC connection with isValid() check before each use and silent reconnect on failure. Rationale: domain events are low-frequency (state changes, not every poll); a full connection pool (HikariCP etc.) adds a dependency and warm-up complexity for negligible benefit at this throughput.

ADR-8: Java 21 virtual threads

The HTTP server and engine polling tasks use Thread.ofVirtual(). This allows a large number of concurrent blocking I/O operations (Tango/TINE reads) without the overhead of a large platform thread pool.

Name		Name	Last commit message	Last commit date
Latest commit History 658 Commits
.github/workflows		.github/workflows
.idea		.idea
.tine		.tine
bin		bin
db		db
docs		docs
etc		etc
lib/de/desy/tine		lib/de/desy/tine
observability		observability
src		src
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
mkdocs.yml		mkdocs.yml
pom.xml		pom.xml
register.sh		register.sh
status-server-downtime-architecture.md		status-server-downtime-architecture.md
status-server-task-list.md		status-server-task-list.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

StatusServer

Requirements

Quick Start

Docker

Configuration

Architecture

Key components

HTTP Endpoints

Availability Tracking & Downtime Persistence

MariaDB Schema

Observability

Grafana Dashboards

Prometheus Metrics

Development

Architecture Decision Records

ADR-1: XML-only device source

ADR-2: Unified `EventSink<T>` interface

ADR-3: `EventDispatcher<T>` replaces `WriterDispatcher`

ADR-4: Snapshot-only in-memory store

ADR-5: Global availability thresholds

ADR-6: ERPNext-compatible MariaDB schema

ADR-7: No connection pool (plain JDBC with auto-reconnect)

ADR-8: Java 21 virtual threads

About

Uh oh!

Releases 7

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

StatusServer

Requirements

Quick Start

Docker

Configuration

Architecture

Key components

HTTP Endpoints

Availability Tracking & Downtime Persistence

MariaDB Schema

Observability

Grafana Dashboards

Prometheus Metrics

Development

Architecture Decision Records

ADR-1: XML-only device source

ADR-2: Unified EventSink<T> interface

ADR-3: EventDispatcher<T> replaces WriterDispatcher

ADR-4: Snapshot-only in-memory store

ADR-5: Global availability thresholds

ADR-6: ERPNext-compatible MariaDB schema

ADR-7: No connection pool (plain JDBC with auto-reconnect)

ADR-8: Java 21 virtual threads

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

ADR-2: Unified `EventSink<T>` interface

ADR-3: `EventDispatcher<T>` replaces `WriterDispatcher`

Packages