This document is the "how" companion to the runbook. It explains what the machine is made of, which process owns which responsibility, and how a participant's recording moves through the stack from browser capture to room playback, expiry, revocation, and operator visibility.
If you only need deploy and repair steps, use docs/maintenance.md. This file
is for understanding the architecture well enough to modify it safely.
If you need explicit client-role instructions for a real install, use
docs/multi-machine-setup.md alongside this file.
If you need the shortest explicit browser/API boundary notes, use
docs/surface-contract.md.
If you need a faster first-glance map before reading this full architecture
walkthrough, use docs/AT_A_GLANCE.md.
The stack is intentionally local-first and appliance-shaped.
- A recording kiosk captures audio, while a separate playback surface can run the room loop.
- Django stores metadata in Postgres and object bytes in MinIO.
- Celery workers generate long-lived derivatives and run retention cleanup.
- The room loop is composed in the browser, but it asks the server for one eligible artifact at a time.
- Blob access is proxied through Django, so the browser never needs direct MinIO credentials or CORS access.
That split matters:
- browser code owns interaction feel, capture flow, and playback texture
- Django owns policy, retention, consent, access control, and playback eligibility
- Postgres is the source of truth for artifact state
- MinIO is the source of truth for stored bytes
- Redis is only transport for Celery
For question-and-repair exchanges, the pool payload now also exposes a small
thread_signal. The browser uses it to request one more same-topic return
when the room should briefly behave like a chorus or a bench notebook, without
adding a second opaque state system.
The deployed compose stack in docker-compose.yml has seven practical runtime
services:
- `proxy`: Caddy is the public entrypoint on ports 80/443. It terminates TLS and forwards HTTP traffic to Django.
- `api`: The Django app serves `/kiosk/`, `/ops/`, `/healthz`, `/readyz`, and the JSON API under `/api/v1/`.
- `db`: Postgres stores node metadata, consent manifests, artifacts, derivatives, and access events.
- `redis`: Celery broker/result backend and the shared Django cache for throttle state, operator lockouts, worker/beat heartbeats, and playback-ack dedupe.
- `worker`: Celery worker for spectrogram generation and cleanup tasks.
- `beat`: Celery Beat scheduler for periodic expiry and derivative pruning.
- `minio`: Private S3-compatible object storage for raw WAV files and derivative images.
There is also a one-shot minio_init helper that ensures the configured bucket
exists before the app starts using it.
In request terms, the paths are:
- recording kiosk -> Caddy -> Django -> Postgres/MinIO
- playback surface -> Caddy -> Django -> Postgres/MinIO
- operator browser -> Caddy -> Django -> Postgres/MinIO
In background-job terms, the path is:
Django -> Redis -> Celery worker -> MinIO/Postgres
The shape looks like this in practice:
```mermaid
flowchart LR
recorder["Recording kiosk /kiosk/"]
playback["Listening surface /room/"]
ops["Operator browser /ops/"]
proxy["Caddy proxy"]
api["Django API + templates"]
db[("Postgres")]
redis[("Redis")]
minio[("MinIO")]
worker["Celery worker"]
beat["Celery beat"]
recorder -->|HTTPS / HTTP| proxy
playback -->|HTTPS / HTTP| proxy
ops -->|HTTPS / HTTP| proxy
proxy --> api
api --> db
api --> minio
api --> redis
beat -->|scheduled tasks| redis
redis --> worker
worker --> minio
worker --> db
```
The root URL table lives in api/memory_engine/urls.py.
- `/kiosk/` renders the recording station
- `/room/` renders the dedicated playback surface
- `/ops/` renders the operator dashboard and sign-in surface
- `/healthz` reports narrow API/dependency health
- `/readyz` reports broader cluster readiness
- `/api/v1/...` exposes the kiosk and ops API surface
In the intended field setup:
- the recording machine opens `/kiosk/`
- the playback machine opens `/room/`
- the operator machine opens `/ops/` and signs in with `OPS_SHARED_SECRET`
The app API routes in api/engine/urls.py break down into five groups:
- ingest: `/api/v1/artifacts/audio`, `/api/v1/ephemeral/audio`
- one-time ephemeral disposal: `/api/v1/ephemeral/consume`
- consent revocation: `/api/v1/revoke`
- playback and observation: `/api/v1/pool/next`, `/api/v1/surface/state`, `/api/v1/surface/fossils/<token>`, `/api/v1/node/status`, `/api/v1/operator/artifacts`, `/api/v1/operator/artifacts/<id>/metadata`, `/api/v1/operator/controls`
- blob and derivative access: `/api/v1/media/raw/<token>`, `/api/v1/media/spectrogram/<token>`, `/api/v1/derivatives/spectrograms`
The core models live in api/engine/models.py.
Node represents one physical installation or appliance. In practice the repo
currently assumes a single active node per database, and lazily creates the
first Node row from environment variables if none exists yet.
ConsentManifest stores the exact consent/retention policy applied when a take was submitted. The important detail is that consent is persisted as JSON, not inferred later from UI labels. That means retention and derivative behavior are tied to the original submission policy, not to whatever the UI might say in the future.
It also stores a hash of the revocation token rather than the token itself.
Artifact is the main unit of stored sound.
Important fields:
- `status`: `ACTIVE`, `EXPIRED`, `REVOKED`, or `EPHEMERAL`
- `raw_uri`: object-storage key for the WAV file
- `duration_ms`: used both for UI reporting and playback density heuristics
- `wear`: accumulated playback patina from 0.0 to 1.0
- `play_count`: how often audible playback has been acknowledged by the room loop
- `last_access_at`: recent-play cooldown signal
- `expires_at`: when raw storage should no longer remain eligible
Derivative is currently used for two fossil-side derivative types:

- `spectrogram_png`
- `essence_wav`
The important policy distinction is that derivatives can outlive raw audio when the consent mode allows that.
AccessEvent records each playback action. Right now the primary use is a lightweight audit trail and debugging signal around pool behavior.
Stores the current live operator posture for the whole node:
- whether intake is paused
- whether playback is paused
- whether quieter mode is active
This is a singleton-style row rather than per-browser state, because the room needs all client machines to agree on the same current stewardship posture.
Records each live operator control change with a timestamp, actor label, and a small JSON payload describing the change.
That same audit path now also records lightweight artifact metadata edits made
through /ops/, so topic/status stewardship remains visible without building a
larger moderation subsystem.
The /ops/ metadata editor stays intentionally narrow:
- `topic_tag` remains free text for lightweight clustering
- `lifecycle_status` is presented as a deployment-specific picker
- existing older custom status values are still preserved and editable if they already exist on an artifact
The core data shape is:
```mermaid
erDiagram
NODE ||--o{ ARTIFACT : owns
CONSENT_MANIFEST ||--o{ ARTIFACT : governs
ARTIFACT ||--o{ DERIVATIVE : yields
ARTIFACT ||--o{ ACCESS_EVENT : records
NODE {
bigint id
string name
string location_hint
datetime created_at
}
CONSENT_MANIFEST {
bigint id
json json
string revocation_token_hash
datetime created_at
}
ARTIFACT {
bigint id
string status
string raw_uri
string raw_sha256
int duration_ms
float wear
int play_count
datetime last_access_at
datetime expires_at
datetime created_at
}
DERIVATIVE {
bigint id
string kind
string uri
bool publishable
datetime expires_at
datetime created_at
}
ACCESS_EVENT {
bigint id
string context
string action
datetime ts
}
```
The consent policy builder lives in api/engine/consent.py.
There are three practical modes.

ROOM:
- raw audio stored locally
- no publishable derivative generated
- raw expires after `RAW_TTL_HOURS_ROOM`
- revocation is allowed

FOSSIL:
- raw audio stored locally
- spectrogram and low-storage audio-residue derivatives are allowed
- raw expires after `RAW_TTL_HOURS_FOSSIL`
- derivative expires after `DERIVATIVE_TTL_DAYS_FOSSIL`
- revocation is allowed

NOSAVE:
- raw audio is created only for immediate one-time playback
- no long-lived derivative is generated
- revocation is not offered because the take is meant to disappear immediately
The important implementation detail is that NOSAVE still creates a real
database artifact and a real object-storage blob briefly. That keeps the
playback path simple. The artifact is then consumed and revoked immediately
after the one-time play completes.
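A minimal sketch of what that three-mode policy builder in api/engine/consent.py can look like. The mode names and TTL settings come from this document; the function name and JSON keys are illustrative assumptions, not the repo's exact code:

```python
from datetime import timedelta

from django.conf import settings
from django.utils import timezone


def build_consent_policy(mode: str) -> dict:
    """Return the retention/derivative policy persisted as consent JSON.

    Sketch only: key names are hypothetical, settings names are documented.
    """
    now = timezone.now()
    if mode == "ROOM":
        return {
            "mode": "ROOM",
            "derivatives_allowed": False,
            "revocable": True,
            "raw_expires_at": (now + timedelta(hours=settings.RAW_TTL_HOURS_ROOM)).isoformat(),
        }
    if mode == "FOSSIL":
        return {
            "mode": "FOSSIL",
            "derivatives_allowed": True,
            "revocable": True,
            "raw_expires_at": (now + timedelta(hours=settings.RAW_TTL_HOURS_FOSSIL)).isoformat(),
            "derivative_expires_at": (now + timedelta(days=settings.DERIVATIVE_TTL_DAYS_FOSSIL)).isoformat(),
        }
    if mode == "NOSAVE":
        # One-time playback: no retention window, no revocation offer.
        return {"mode": "NOSAVE", "derivatives_allowed": False, "revocable": False}
    raise ValueError(f"unknown consent mode: {mode}")
```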
The ingest endpoints live in api/engine/api_views.py.
This is the normal path for ROOM and FOSSIL.
- The browser records mono audio, processes it locally, and uploads a WAV file plus `consent_mode` and `duration_ms`.
- Django validates the mode, WAV structure, mono/bit-depth contract, upload size, and server-side duration ceiling before accepting the file.
- Django generates a revocation token and stores only its hash in `ConsentManifest`.
- Django creates an `Artifact` row in `ACTIVE` state with a retention deadline.
- If the participant chose a memory color, Django stores that choice as `artifact.effect_profile` plus structured `effect_metadata`; the WAV itself stays dry.
- The memory-color profile catalog is shared across Django and the kiosk UI, so the same profile codes, labels, and descriptions drive validation, review button generation, room playback metadata, and operator summaries.
- The Dream profile uses a seeded render path based on the decoded source audio, so a participant preview and later room playback stay materially aligned without baking a separate derivative.
- The profile catalog also carries the first-pass DSP tuning values and a bounded processing topology for each memory color, so labels and playback shaping travel together while the browser still dispatches through a small authored set of effect builders instead of an arbitrary node graph.
- Django writes the WAV bytes to MinIO under `raw/<artifact_id>/audio.wav`.
- Django stores the resulting object key in `artifact.raw_uri`.
- If the consent mode permits derivatives, Django queues `generate_spectrogram.delay(artifact.id)` and `generate_essence_audio.delay(artifact.id)` as needed.
- Django returns the serialized artifact plus the plain revocation token.
Why this is structured this way:
- Postgres holds the policy and lifecycle state.
- MinIO holds the bytes.
- the client gets the revocation token once and only once.
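Put together, the saved-audio path can be sketched like this. `validate_wav_contract`, `storage.put_key`, and the exact model field names are assumed stand-ins for the repo's real helpers, shown only to make the ordering concrete:

```python
import hashlib
import secrets
from datetime import timedelta

from django.conf import settings
from django.utils import timezone


def ingest_saved_audio(wav_bytes: bytes, consent_mode: str, duration_ms: int):
    validate_wav_contract(wav_bytes, duration_ms)  # mono/bit-depth/size/duration checks (assumed helper)

    token = secrets.token_urlsafe(32)  # handed to the client exactly once
    manifest = ConsentManifest.objects.create(
        json=build_consent_policy(consent_mode),
        revocation_token_hash=hashlib.sha256(token.encode()).hexdigest(),
    )

    ttl_hours = (settings.RAW_TTL_HOURS_FOSSIL if consent_mode == "FOSSIL"
                 else settings.RAW_TTL_HOURS_ROOM)
    artifact = Artifact.objects.create(
        status="ACTIVE",
        consent_manifest=manifest,  # assumed relation name
        duration_ms=duration_ms,
        raw_sha256=hashlib.sha256(wav_bytes).hexdigest(),
        expires_at=timezone.now() + timedelta(hours=ttl_hours),
    )

    artifact.raw_uri = f"raw/{artifact.id}/audio.wav"
    storage.put_key(artifact.raw_uri, wav_bytes)  # MinIO owns the bytes
    artifact.save(update_fields=["raw_uri"])

    if consent_mode == "FOSSIL":  # derivatives only when consent allows them
        generate_spectrogram.delay(artifact.id)
        generate_essence_audio.delay(artifact.id)

    return artifact, token
```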
Visually, the ingest branch looks like this:
```mermaid
flowchart TD
start["Recording kiosk captures mono WAV"] --> upload["POST /api/v1/artifacts/audio"]
upload --> validate["Django validates file + consent + WAV contract"]
validate --> consent["Create ConsentManifest + hashed revoke token"]
consent --> artifact["Create ACTIVE Artifact with retention deadline"]
artifact --> effect["Store memory color metadata separately"]
effect --> store["Write WAV to MinIO and save raw_uri"]
store --> branch{"Consent mode"}
branch -->|ROOM| room["Return artifact + revocation token"]
branch -->|FOSSIL| queue["Queue derivative generation"]
queue --> fossil["Return artifact + revocation token"]
nosaveStart["Recording kiosk captures one-time WAV"] --> nosaveUpload["POST /api/v1/ephemeral/audio"]
nosaveUpload --> eph["Create EPHEMERAL Artifact + one-time access token"]
eph --> ephStore["Write WAV to MinIO"]
ephStore --> once["Recording kiosk plays once"]
once --> purge["First media fetch deletes blob and marks artifact REVOKED"]
```
This exists for "Don't Save."
- Django validates the WAV with the same server-side checks used for saved audio.
- Django creates a `ConsentManifest` for `NOSAVE`.
- Django creates an `Artifact` in `EPHEMERAL` state with a very short TTL.
- Django writes the WAV bytes to MinIO under `ephemeral/<artifact_id>/audio.wav`.
- Django creates a one-time access token, stores its hash inside the consent JSON, and returns the `play_url`.
- The browser plays the file once.
- First media access validates that token, deletes the MinIO object, blanks the URI, and marks the artifact `REVOKED`.
This is a deliberate tradeoff: the ephemeral path is still traceable in the DB briefly, but it avoids inventing a second playback mechanism only for one-time audio.
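A hedged sketch of that first-access consume step; the `one_time_token_hash` key and the `storage` helpers are assumed names, while the lifecycle transition is the documented one:

```python
import hashlib


def consume_ephemeral(artifact, submitted_token: str) -> bytes:
    # The token hash lives inside the consent JSON (key name assumed here).
    stored = artifact.consent_manifest.json.get("one_time_token_hash")
    if hashlib.sha256(submitted_token.encode()).hexdigest() != stored:
        raise PermissionError("invalid or already-consumed token")

    wav_bytes = storage.read_key(artifact.raw_uri)  # fetched once for the single play
    storage.delete_key(artifact.raw_uri)            # blob is gone after first access

    artifact.raw_uri = ""
    artifact.status = "REVOKED"
    artifact.save(update_fields=["raw_uri", "status"])
    return wav_bytes
```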
POST /api/v1/revoke is a policy-enforcement endpoint, not just a UI feature.
The public /revoke/ page is only a thin participant-facing surface over that
same local-node policy path.
The flow is:
- hash the submitted token
- find the matching `ConsentManifest`
- find all non-revoked artifacts tied to that consent
- delete each raw object from MinIO when present
- mark each artifact `REVOKED` and blank `raw_uri`
- delete any related derivatives from MinIO and the database
The result is that revocation removes both future playback eligibility and stored derivatives on that node.
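In code, the sweep can be sketched roughly like this, assuming Django's default related names and with `storage.delete_key` standing in for the repo's actual object-storage helper:

```python
import hashlib


def revoke(submitted_token: str) -> int:
    token_hash = hashlib.sha256(submitted_token.encode()).hexdigest()
    manifest = ConsentManifest.objects.get(revocation_token_hash=token_hash)

    # Materialize before mutating so the count survives the status change.
    artifacts = list(manifest.artifact_set.exclude(status="REVOKED"))
    for artifact in artifacts:
        if artifact.raw_uri:
            storage.delete_key(artifact.raw_uri)   # raw bytes leave MinIO
        for derivative in artifact.derivative_set.all():
            storage.delete_key(derivative.uri)     # derivatives leave MinIO too
            derivative.delete()
        artifact.raw_uri = ""
        artifact.status = "REVOKED"
        artifact.save(update_fields=["raw_uri", "status"])
    return len(artifacts)
```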
The long arc of an artifact is:
```mermaid
stateDiagram-v2
[*] --> ACTIVE: ROOM / FOSSIL ingest
[*] --> EPHEMERAL: NOSAVE ingest
ACTIVE --> ACTIVE: playback increments wear + play_count
ACTIVE --> ACTIVE: FOSSIL raw expires but essence remains playable
ACTIVE --> EXPIRED: raw and derivative eligibility both end
ACTIVE --> REVOKED: revoke endpoint clears raw + derivatives
EPHEMERAL --> REVOKED: consume_ephemeral clears raw blob
EPHEMERAL --> REVOKED: expire_raw safety sweep
EXPIRED --> [*]
REVOKED --> [*]
```
The browser does not talk to MinIO directly.
GET /api/v1/media/raw/<token> in api/engine/api_views.py resolves the best
playable media for the signed artifact token, preferring the raw WAV when it
still exists and falling back to essence_wav when the raw fossil has
expired. It then opens the object stream through storage.stream_key and
returns it as a Django FileResponse with Cache-Control: no-store.
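A minimal sketch of that proxy response. `FileResponse` and the `Cache-Control` header are real Django mechanics; the token-resolution and fallback helpers are assumed names:

```python
from django.http import FileResponse


def media_raw(request, token: str):
    artifact = resolve_artifact_from_signed_token(token)  # signed, short-lived (assumed helper)
    # Prefer the raw WAV; fall back to the essence residue when raw has expired.
    key = artifact.raw_uri or essence_key_for(artifact)   # assumed fallback helper
    response = FileResponse(storage.stream_key(key), content_type="audio/wav")
    response["Cache-Control"] = "no-store"                # never cached at the browser
    return response
```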
Important boundary change:
- raw and spectrogram media are no longer served as public artifact-ID routes
- `/api/v1/pool/next` now mints short-lived media URLs for room playback
- the public room visuals feed uses its own signed surface token
- `/api/v1/derivatives/spectrograms` is now an operator-only inventory view
This keeps the browser-facing trust model simple:
- no MinIO bucket is made public
- no direct MinIO URLs are needed
- no browser CORS configuration is required
- access policy remains centralized in Django
The server-side selection logic lives in api/engine/pool.py.
This is not a pure playlist. It is a weighted selection system with a few compositional categories layered on top.
Each artifact is classified into one lane:
- `fresh`: low wear, low play count, and relatively recent
- `mid`: neither clearly fresh nor clearly worn
- `worn`: older, more played, or more weathered
The thresholds come from Django settings:
- `POOL_FRESH_MAX_AGE_HOURS`
- `POOL_FRESH_MAX_WEAR`
- `POOL_FRESH_MAX_PLAY_COUNT`
- `POOL_WORN_MIN_AGE_HOURS`
- `POOL_WORN_MIN_WEAR`
- `POOL_WORN_MIN_PLAY_COUNT`
Density is duration-based:
- `light` for short clips
- `medium` for middle-length clips
- `dense` for longer clips
Mood is derived from lane, density, and age. It is not stored in the database. Current moods are:
- `clear`
- `hushed`
- `suspended`
- `weathered`
- `gathering`
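A composite sketch of lane and density classification under the settings above. The comparison logic and the density cutoffs (8 s / 25 s) are illustrative assumptions, not the repo's exact tuning:

```python
from django.conf import settings
from django.utils import timezone


def classify_lane(artifact) -> str:
    age_hours = (timezone.now() - artifact.created_at).total_seconds() / 3600
    if (age_hours <= settings.POOL_FRESH_MAX_AGE_HOURS
            and artifact.wear <= settings.POOL_FRESH_MAX_WEAR
            and artifact.play_count <= settings.POOL_FRESH_MAX_PLAY_COUNT):
        return "fresh"
    if (age_hours >= settings.POOL_WORN_MIN_AGE_HOURS
            or artifact.wear >= settings.POOL_WORN_MIN_WEAR
            or artifact.play_count >= settings.POOL_WORN_MIN_PLAY_COUNT):
        return "worn"
    return "mid"


def classify_density(duration_ms: int) -> str:
    # Cutoffs are illustrative; the repo derives these from its own tuning.
    if duration_ms < 8_000:
        return "light"
    if duration_ms < 25_000:
        return "medium"
    return "dense"
```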
select_pool_artifact does the following:
- start from `ACTIVE`, unexpired artifacts that still have either a raw WAV or a valid `essence_wav` derivative
- apply any anti-repetition exclusions from the client when possible
- prefer artifacts outside the recent-play cooldown window
- fall back to a broader candidate set if the pool is small
- narrow by requested lane, density, and mood when matching candidates exist
- weight candidates by cooldown, rarity, wear, age, and mood affinity
- choose one artifact randomly using those weights
This gives the room loop a composed feel without hard-coding exact sequences.
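The final weighted draw can be as small as this sketch; the weight function itself is whatever the repo composes from cooldown, rarity, wear, age, and mood affinity:

```python
import random


def weighted_pick(candidates, weight_for):
    """candidates: list of Artifact; weight_for: Artifact -> float."""
    # Floor each weight so no eligible candidate becomes unreachable.
    weights = [max(weight_for(a), 0.001) for a in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]
```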
After the browser finishes a selection and acknowledges it back to the API:
- `play_count` is incremented
- `wear` is advanced by `WEAR_EPSILON_PER_PLAY`
- `last_access_at` is updated
- an `AccessEvent` with audible-play intent is recorded
The raw object never changes. The wear is metadata only. The browser applies the audible patina at playback time.
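A sketch of that acknowledgement bookkeeping, assuming field names that mirror the model description above (the `context`/`action` values are hypothetical labels):

```python
from django.conf import settings
from django.utils import timezone


def acknowledge_play(artifact):
    artifact.play_count += 1
    # Patina is capped at 1.0; the raw bytes never change.
    artifact.wear = min(1.0, artifact.wear + settings.WEAR_EPSILON_PER_PLAY)
    artifact.last_access_at = timezone.now()
    artifact.save(update_fields=["play_count", "wear", "last_access_at"])
    AccessEvent.objects.create(context="room_loop", action="audible_play")
```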
The room-loop handoff between browser and server looks like this:
```mermaid
sequenceDiagram
participant RoomClient as Listening surface room loop
participant Local as localStorage anti-repeat window
participant API as Django /api/v1/pool/next
participant DB as Postgres
participant Blob as Django blob proxy
participant MinIO as MinIO
participant Audio as Web Audio playback
RoomClient->>Local: read recent artifact IDs
RoomClient->>API: GET /pool/next?lane=...&mood=...&exclude_ids=...
API->>DB: query ACTIVE playable artifacts
API-->>RoomClient: artifact_id, wear, audio_url, playback_ack_url, pool_size
RoomClient->>Local: persist selected artifact_id
RoomClient->>Blob: GET /media/raw/<token>
Blob->>MinIO: stream raw WAV or essence residue
MinIO-->>Blob: playable audio stream
Blob-->>RoomClient: no-store audio response
RoomClient->>Audio: apply wear-based playback chain
RoomClient->>API: POST /pool/heard/<token>
API->>DB: update play_count, wear, last_access_at
Audio-->>RoomClient: finish cue / continue movement
```
The room loop controller lives in api/engine/static/engine/kiosk-room-loop.js.
This is where the installation becomes more than a plain shuffle button.
The browser owns:
- long-form movement sequencing
- scene and cue progression
- adaptive gap timing
- room-tone bed behavior
- client-side persistent anti-repetition via `localStorage`
The server still owns artifact eligibility and wear advancement.
That split is useful:
- the browser can make the room feel composed in real time
- the server remains the policy source for what may play
The scene, movement, and reusable room-policy definitions now come from
api/engine/room_composer.py and are embedded into the kiosk page as JSON in
kiosk_view.
Important browser loop behaviors:
- intensity profiles tune cue gaps, pause gaps, and room-tone level
- movement presets tune how many items a movement tends to include
- scarcity tiers, archive-gap tiers, overlap posture, and sequencer heuristics are declared in the room composer config rather than hard-coded in the loop runner
- the anti-repetition window persists recent artifact IDs in `localStorage` and sends them back as `exclude_ids` on future `pool/next` requests
The browser-facing surfaces are split intentionally:
- `api/engine/templates/engine/kiosk.html` is the recording station
- `api/engine/templates/engine/playback.html` is the dedicated listening surface
The JavaScript is intentionally split by responsibility.
kiosk.js owns the guided interaction state machine:
- idle
- arming
- armed
- countdown
- recording
- review
- submitting
- complete
- error
It also owns:
- keyboard shortcuts
- review timeout reset
- quiet-take decision gate
- consent-mode submission
- handoff to the room loop controller
On the recording station, the room loop controller is present only as a shared
boundary now; the visible playback controls live on the separate /room/
surface instead of the recorder itself.
Owns microphone access and live metering.
- requests `getUserMedia`
- creates an `AnalyserNode` for the live level meter
- creates the recording `AudioWorkletNode`
- emits chunked float buffers back to `kiosk.js`
- tears down the microphone cleanly when the session resets
Owns browser-side audio processing utilities.
- silence trimming
- quiet-take analysis
- peak normalization
- short fades
- WAV encoding
- playback smoothing and wear-based playback chain
- loading the shared AudioWorklet module
Holds the AudioWorklet processors used by capture and playback.
The key maintenance implication is that browser audio logic is now modular, but it is still custom Web Audio code rather than a framework abstraction.
Owns the dedicated listening surface.
- starts the room loop on the `/room/` machine
- exposes only start/pause playback controls
- keeps the UX framed as listening, not recording
- allows screenshot/test mode to disable autostart with `?autostart=0`
The Celery tasks live in api/engine/tasks.py.
The generate_spectrogram task:
- downloads the stored WAV bytes from MinIO
- decodes mono PCM samples
- computes a spectrogram with `scipy.signal.spectrogram`
- renders the plot through Matplotlib
- uploads the PNG back to MinIO
- creates or updates a `Derivative` row
This is why the Python dependency surface is heavier than a basic Django app:
the project includes numpy, scipy, and matplotlib specifically for this
derivative path.
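A condensed sketch of that task. The scipy and Matplotlib calls are the real libraries named above; the storage helpers and derivative key layout are assumptions:

```python
import io

import numpy as np
from celery import shared_task
from scipy import signal
from scipy.io import wavfile

import matplotlib
matplotlib.use("Agg")  # headless rendering inside the worker
import matplotlib.pyplot as plt


@shared_task
def generate_spectrogram(artifact_id: int):
    artifact = Artifact.objects.get(id=artifact_id)
    sample_rate, samples = wavfile.read(io.BytesIO(storage.read_key(artifact.raw_uri)))
    freqs, times, sxx = signal.spectrogram(samples.astype(np.float32), fs=sample_rate)

    fig, ax = plt.subplots()
    ax.pcolormesh(times, freqs, 10 * np.log10(sxx + 1e-12))  # dB scale, guarded against log(0)
    ax.set_axis_off()
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)

    key = f"derivatives/{artifact.id}/spectrogram.png"  # assumed key layout
    storage.put_key(key, buf.getvalue())
    Derivative.objects.update_or_create(
        artifact=artifact, kind="spectrogram_png",
        defaults={"uri": key, "publishable": True},
    )
```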
The generate_essence_audio task:
- downloads the stored WAV bytes from MinIO
- decodes mono PCM samples
- builds a short, low-storage residue by low-pass filtering, resampling, and shaping noise with the source contour
- uploads that residue back to MinIO as `essence.wav`
- creates or updates a `Derivative` row
The important design intent is that this is not "extra wear." It is an
explicit second-life derivative for FOSSIL consent.
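A hedged sketch of the residue shaping itself. The filter, resample, and envelope calls are real scipy APIs, but the cutoff, target rate, and mix ratios are illustrative stand-ins for the repo's tuning:

```python
import numpy as np
from scipy import signal


def build_essence(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    # Low-pass the source so only the broad contour survives (cutoff assumed).
    sos = signal.butter(4, 1200, btype="low", fs=sample_rate, output="sos")
    contour = signal.sosfilt(sos, samples.astype(np.float32))

    # Resample to a lower rate for a small storage footprint (rate assumed).
    target_rate = 8000
    contour = signal.resample(contour, int(len(contour) * target_rate / sample_rate))

    # Shape noise with the source's amplitude envelope.
    envelope = np.abs(signal.hilbert(contour))
    noise = np.random.default_rng(0).standard_normal(len(contour)).astype(np.float32)
    return (0.7 * contour + 0.3 * noise * envelope).astype(np.float32)
```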
The expire_raw task:
- removes raw objects once their raw TTL has passed
- keeps `FOSSIL` artifacts `ACTIVE` when an `essence_wav` derivative still exists and the derivative TTL has not ended
- marks artifacts `EXPIRED` once no playable raw or essence remains
It also includes a safety sweep for stale ephemeral artifacts that somehow were not explicitly consumed.
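The decision order can be sketched like this, with illustrative queryset shapes that follow the lifecycle rules above:

```python
from django.utils import timezone


def expire_raw():
    now = timezone.now()
    for artifact in Artifact.objects.filter(status="ACTIVE", expires_at__lte=now):
        if artifact.raw_uri:
            storage.delete_key(artifact.raw_uri)  # raw TTL has passed
            artifact.raw_uri = ""
        essence_alive = artifact.derivative_set.filter(
            kind="essence_wav", expires_at__gt=now,
        ).exists()
        # FOSSIL second life: stay ACTIVE while the essence is still playable.
        artifact.status = "ACTIVE" if essence_alive else "EXPIRED"
        artifact.save(update_fields=["raw_uri", "status"])

    # Safety sweep: ephemeral takes that were never explicitly consumed.
    for artifact in Artifact.objects.filter(status="EPHEMERAL", expires_at__lte=now):
        if artifact.raw_uri:
            storage.delete_key(artifact.raw_uri)
        artifact.raw_uri = ""
        artifact.status = "REVOKED"
        artifact.save(update_fields=["raw_uri", "status"])
```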
This task deletes derivative objects and rows whose expires_at has passed.
The operator logic lives in api/engine/ops.py and the JSON endpoints in
api/engine/api_views.py.
/healthz is the narrow API/dependency health endpoint.
It checks:
- database connectivity
- Redis reachability
- MinIO bucket reachability
This is used by both operators and the API container health check.
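A minimal sketch of the probe shape, assuming hypothetical check helpers for the three dependencies listed above:

```python
from django.http import JsonResponse


def healthz(request):
    checks = {
        "db": check_database(),   # e.g. SELECT 1 through the Django connection
        "redis": check_redis(),   # e.g. PING through the cache client
        "minio": check_bucket(),  # e.g. bucket_exists on the configured bucket
    }
    ok = all(checks.values())
    return JsonResponse({"ok": ok, "checks": checks}, status=200 if ok else 503)
```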
/readyz is the broader cluster readiness endpoint.
It includes the narrow /healthz dependencies plus:
- Celery worker heartbeat freshness
- Celery beat heartbeat freshness
- current Redis-backed queue depth
- recent background task failures for derivative and housekeeping work
This is the right surface for operator checks that need to know whether background work is still advancing, without making the API container health probe depend on worker/beat state.
/api/v1/node/status is the richer operator payload.
It combines:
- dependency health
- artifact counts
- playable count
- lane distribution
- mood distribution
- disk headroom
- warnings for low pool size, lane/mood imbalance, stale worker/beat heartbeats, queue backlog, or recent background task failures
The HTML dashboard is rendered by Django and hydrated by
api/engine/static/engine/operator-dashboard.js, which polls
/api/v1/node/status every few seconds and classifies the node into ready,
degraded, or broken.
/ops/ is no longer a public status page. It now requires the shared steward
secret from OPS_SHARED_SECRET, can optionally be restricted by
OPS_ALLOWED_NETWORKS, stores that access in a browser-bound session (by
default user-agent-bound rather than IP-bound), and then
exposes live controls for:
- pausing intake
- pausing playback
- switching the playback surface into quieter mode
The kiosk and playback machines do not receive operator access. They poll the
lighter-weight public endpoint /api/v1/surface/state so they can obey the
current steward posture without seeing the full operator dashboard. The
recording station also receives a small ingest-budget snapshot there, so it can
warn participants when that specific kiosk is close to its current submission
ceiling.
The stack is configured almost entirely through environment variables in
api/memory_engine/settings.py and .env.
The major groups are:
- Django host/security settings
- database connection settings
- Redis/Celery settings
- MinIO endpoint and credentials
- playback and wear tuning
- room loop tuning
- steward access and session settings
- operator warning thresholds
There is also an installation-profile layer now:
- `INSTALLATION_PROFILE=custom` keeps the plain repo baseline
- `quiet_gallery`, `shared_lab`, and `active_exhibit` provide curated behavior defaults
- explicit env vars still override any value the profile supplies
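The layering can be sketched like this; the profile contents are invented placeholders, and only the precedence rule (explicit env wins) comes from this document:

```python
import os

PROFILES = {
    "custom": {},  # plain repo baseline
    "quiet_gallery": {"POOL_FRESH_MAX_WEAR": "0.2"},  # illustrative values only
    "shared_lab": {"POOL_FRESH_MAX_WEAR": "0.4"},
    "active_exhibit": {"POOL_FRESH_MAX_WEAR": "0.6"},
}


def setting(name: str, default: str) -> str:
    profile = PROFILES.get(os.environ.get("INSTALLATION_PROFILE", "custom"), {})
    # Explicit env vars always win over whatever the profile supplies.
    return os.environ.get(name, profile.get(name, default))
```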
Important design detail: most of these settings are read directly at process startup and then treated as constants. A settings change generally requires a redeploy.
The stack now also validates a handful of cross-setting relationships at startup, not just presence:
- secure-cookie posture must align with trusted-origin scheme choices
- MinIO endpoint must be an explicit `http://` or `https://` URL
- warning thresholds must stay ordered sanely
- playback and scarcity thresholds must stay in coherent ranges
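A hedged sketch of what such startup validation can look like. `SESSION_COOKIE_SECURE` and `CSRF_TRUSTED_ORIGINS` are real Django settings, the MinIO and pool names are the ones documented here, and the exact rules are illustrative:

```python
from urllib.parse import urlparse


def validate_settings(s):
    # Secure-cookie posture must align with trusted-origin scheme choices.
    if s.SESSION_COOKIE_SECURE and any(
            origin.startswith("http://") for origin in s.CSRF_TRUSTED_ORIGINS):
        raise ValueError("secure cookies require https trusted origins")

    # MinIO endpoint must be an explicit http(s) URL.
    if urlparse(s.MINIO_ENDPOINT).scheme not in ("http", "https"):
        raise ValueError("MINIO_ENDPOINT must be an explicit http:// or https:// URL")

    # Thresholds must stay ordered sanely (rule shape assumed).
    if s.POOL_FRESH_MAX_AGE_HOURS > s.POOL_WORN_MIN_AGE_HOURS:
        raise ValueError("fresh/worn age thresholds are out of order")
```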
The local check gate is scripts/check.sh.
It runs:
- browser JavaScript syntax checks
- frontend smoke tests from `frontend-tests/`
- Python syntax compilation
- Django behavior tests with `memory_engine.settings_test`
- shell syntax checks
- `git diff --check`
The test settings module in api/memory_engine/settings_test.py switches Django
to SQLite, eager Celery, and a stable temporary Matplotlib/staticfiles setup so
tests do not depend on the full compose stack.
GitHub Actions runs the same scripts/check.sh path from
.github/workflows/check.yml, using a repo-local .venv so CI follows the same
Python dependency layout as local maintenance.
The runtime contract is intentionally narrower than “any Python that happens to work on a laptop”:
- the official supported runtime is the Docker / Compose stack
- the API container is pinned to Python `3.12` in `api/Dockerfile`
- local `.venv` runs are useful for maintenance and CI parity, but they are best-effort rather than the primary support promise
That is why scripts/check.sh now prints the active Python version before it
runs the gate.
There is no explicit application-level "maximum memories" cap in the current stack. The real limits are:
- available object storage for raw WAV files and derivatives
- database/query performance for active-pool selection
- the retention windows that decide how long raw material stays eligible
Practically, the room does not try to keep every recording forever as active audio. The default posture is:
- raw audio remains in the playable pool while it is `ACTIVE` and within its TTL
- wear changes playback texture, not storage size
- once raw expires, playback eligibility ends and the room falls back to the synthetic room-tone bed plus whatever other active material still exists
Important design detail: the current degradation path does not ever fully collapse a contribution into room tone. Even at maximum wear, playback still keeps an intelligible, deliberately audible trace of the source. The room tone is a separate synthetic layer, not a final stage of a fully worn voice.
If the installation eventually needs a more storage-practical "essence only" stage, the safer design is to make that an explicit second-life derivative rather than letting wear implicitly destroy intelligibility. In other words:
- keep the current wear path as audible patina
- add a distinct archival derivative later if you want a memory to become more like spectral residue, filtered grain, or site-specific bed material
- then retire the raw sample by policy, not by overdriving the decay effect
The repo now has two different operator entrypoints on purpose.
Use:
```
./scripts/first_boot.sh --public-host memory.example.com --deploy
```

This is the "make this server into a node" path. It creates `.env` if needed,
replaces obvious development defaults, generates a fresh `OPS_SHARED_SECRET` if
needed, and can chain into the first deploy.
Use:
```
./scripts/update.sh --public-host memory.example.com
```

This is the conservative "pull, test, backup, deploy, verify" path for an already-bootstrapped server. It exists so the operator does not have to reconstruct the maintenance sequence manually from memory.
If you want to extend the machine, these are the cleanest current seams:
- add new room-loop scenes or movement presets in `api/engine/room_composer.py`
- tune pool behavior in `api/engine/pool.py`
- add new operator warnings in `api/engine/ops.py`
- add new kiosk interaction rules in `api/engine/static/engine/kiosk.js`
- add new derivatives in `api/engine/tasks.py`
The places that still deserve extra care are:
- changes that affect both client composition and server pool policy
- changes to consent semantics and retention JSON
- changes to raw/blob storage layout in MinIO
- changes to the artifact lifecycle states
Those are the parts most likely to create silent behavior drift if modified casually.