This repository was archived by the owner on Apr 13, 2026. It is now read-only.

service-core

ARCHIVED — This repository has been merged into open-pryv.io. All development continues there. This repo is kept as read-only reference.

Pryv.io core server — handles user data (events, streams, accesses, webhooks) with pluggable storage engines.

  • Current version: 2.0.0-pre.2
  • Working branch: refactor/pre-v2
  • Node.js: 22.x
  • Linting: neostandard with { semi: true }

What's new in v2

2.0.0-pre.2 (current)

  • System streams refactor — clean Mall-based account store replaces scattered serializer (639→158 lines)
  • openSource flag removed — all features always enabled (webhooks, HFS, cache sync, email check)
  • Webhooks in-process — webhooks service runs inside the API server process (no separate container)
  • Metadata updater inlined — direct function call in HFS replaces TChannel RPC; metadata/tprpc components and tchannel/protobufjs dependencies removed
  • Cluster mode — bin/master.js manages N API workers via the Node.js cluster module (replaces runit)

2.0.0-pre.1

  • Storage plugin architecture — engines (MongoDB, PostgreSQL, SQLite, filesystem, InfluxDB) are plugins under storages/engines/ with manifest-driven config
  • Engine-agnostic production code — zero @pryv/boiler imports in engines, config/logging injected via _internals.js
  • Formal storage interfaces — storages/interfaces/ with contracts for all storage types

Architecture

node bin/master.js
  │
  ├── Master process
  │   ├── TCP pub/sub broker (:4222)
  │   └── Process manager (fork/monitor workers)
  │
  ├── N × API Worker (cluster, shared :3000)
  │   ├── API routes (events, streams, accesses, auth, …)
  │   ├── Socket.IO (real-time notifications)
  │   └── Webhooks subscriber (in-process)
  │
  ├── M × HFS Worker (cluster, shared :4000, 0 = disabled)
  │   ├── Series routes (high-frequency data)
  │   └── Metadata updater (in-process)
  │
  └── 0-1 × Previews Worker (:3001, lazy/optional)

Standalone mode (dev/test): just start api-server
Cluster mode: just start-master or node bin/master.js

Configuration

cluster:
  apiWorkers: 2    # number of API workers (default: 2)
  hfsWorkers: 1    # number of HFS workers (default: 1, 0 = disabled)

webhooks:
  inProcess: true  # start webhooks in API server (default: true)
  minIntervalMs: 5000
  maxRetries: 5

Components

| Component | Purpose |
| --- | --- |
| api-server | HTTP API + Socket.IO + webhooks service |
| hfs-server | High-frequency series data (InfluxDB) |
| previews-server | Image preview generation (GraphicsMagick) |
| business | Business logic, webhooks, series, system streams |
| storage | Engine-agnostic storage layer |
| storages | Plugin barrel — engines, interfaces, init |
| mall | Data access layer (events, streams, accesses) |
| cache | In-memory caching with pub/sub invalidation |
| messages | TCP pub/sub broker + client |
| audit | Audit logging (SQLite) |
| middleware | Express middleware (auth, versioning, errors) |
| test-helpers | Shared test infrastructure |

Storage engines

| Engine | Storage types | Status |
| --- | --- | --- |
| MongoDB | base, dataStore | Production |
| PostgreSQL | base, dataStore, series, audit | Production |
| SQLite | dataStore (per-user), user account, user index, audit | Production |
| rqlite | platform (single- and multi-core) | Production |
| Filesystem | file (attachments) | Production |
| InfluxDB | series (HFS) | Production |

Installation

Prerequisites:

  • make and C/C++ compilation support
  • Node.js 22.x (use nvm or n)
  • MongoDB 4.2+ (included via scripts/setup-dev-env)
  • InfluxDB 1.x (storages/engines/influxdb/scripts/setup on Linux, brew install influxdb@1 on macOS)
  • GraphicsMagick (optional, for previews): apt-get install graphicsmagick / brew install graphicsmagick
  • just

just setup-dev-env    # set up local file structure + MongoDB
just install          # install node modules

Running

just start-deps                   # start MongoDB + InfluxDB
just start-master                 # cluster mode (N API workers)
just start api-server             # single API server (dev)
just start hfs-server             # HFS server (dev)
just start-mon api-server         # auto-restart on file changes

Running with nginx

Use nginx as a reverse proxy in front of bin/master.js for SSL termination, domain routing, and serving multiple Pryv services on a single host.

Each backend uses cluster mode internally — bin/master.js forks N API workers sharing a single port, and HFS workers share their own port. nginx routes traffic to those ports.

Start the processes

# Single command — master manages all workers:
#   N API workers sharing :3000 (includes webhooks)
#   M HFS workers sharing :4000 (0 = disabled)
#   Previews worker on :3001 (optional, lazy)
NODE_ENV=production node bin/master.js --config /path/to/config.yml
Config keys for worker counts:

cluster:
  apiWorkers: 2         # N API workers (default: 2)
  hfsWorkers: 1         # M HFS workers (default: 1, 0 = disabled)
  previewsWorker: true  # lazy spawn on first request

The master process hosts the TCP pub/sub broker (:4222). All workers connect as clients automatically.

nginx configuration

upstream api_backend {
    # Cluster workers share :3000 — single upstream entry
    # ip_hash provides connection affinity; optional here, since Socket.IO
    # runs WebSocket-only in cluster mode and needs no sticky sessions
    ip_hash;
    server 127.0.0.1:3000;
}

upstream hfs_backend {
    server 127.0.0.1:4000;
}

server {
    listen 443 ssl;
    server_name core.example.com;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    # API server (default)
    location / {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Socket.IO — requires WebSocket upgrade
    location /socket.io/ {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }

    # HFS (high-frequency series)
    location ~ ^/[^/]+/series/ {
        proxy_pass http://hfs_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

master.js alone vs. behind nginx

|  | bin/master.js alone | Behind nginx |
| --- | --- | --- |
| SSL termination | Via backloop.dev or config | nginx handles SSL |
| Domain routing | Single service | Multiple services per host |
| Static files | Not handled | nginx serves directly |
| Process management | Built-in auto-restart | Combine with systemd or PM2 |
| Clustering | Built-in (cluster module) | Same — cluster runs behind nginx |

Backup, Restore & Integrity

Backup

Export all user data (events, streams, accesses, profile, webhooks, account, audit, attachments) and platform data to a portable JSONL+gzip archive.

# Full backup (compressed by default)
node bin/backup.js --output /path/to/backup

# Single user
node bin/backup.js --output /path/to/backup --user <userId>

# Uncompressed (for debugging / human inspection)
node bin/backup.js --output /path/to/backup --no-compress

# Incremental (only changes since previous backup, auto-detected per user)
node bin/backup.js --output /path/to/backup --incremental

Output is engine-agnostic: the same backup can be restored into MongoDB, PostgreSQL, or SQLite.

Backups use snapshot consistency: a timestamp is recorded at start, and only items modified before that timestamp are exported. Concurrent writes during backup are excluded and will be captured by the next incremental run. No system interruption or user freeze required.
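A minimal sketch of that snapshot filter, assuming items carry Pryv's modified timestamp (epoch seconds); the selectForBackup helper is hypothetical, not the actual backup code:

```javascript
// Hypothetical selectForBackup helper: snapshot-consistent export filtering.
function selectForBackup (items, snapshotTs, sinceTs = 0) {
  // Export items last modified before the snapshot; for incremental runs,
  // also require modification at or after the previous run's snapshot.
  return items.filter((it) => it.modified < snapshotTs && it.modified >= sinceTs);
}

const snapshotTs = 1700000100; // recorded when the backup starts
const items = [
  { id: 'a', modified: 1700000000 }, // before snapshot: exported now
  { id: 'b', modified: 1700000200 }  // concurrent write: next incremental run
];
console.log(selectForBackup(items, snapshotTs).map((it) => it.id)); // [ 'a' ]
```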

Restore

# Full restore
node bin/backup.js --restore /path/to/backup

# Overwrite existing data
node bin/backup.js --restore /path/to/backup --overwrite

# Single user
node bin/backup.js --restore /path/to/backup --user <userId>

# Skip conflicting users + cleanup
node bin/backup.js --restore /path/to/backup --skip-conflicts --move-on-success /path/to/done

# Verify integrity after restore (rolls back on failure)
node bin/backup.js --restore /path/to/backup --overwrite --verify-integrity

When --verify-integrity is set, integrity hashes are recomputed on every restored event and access. If any mismatch is found, the affected user's data is rolled back (cleared).

Integrity Check

Standalone per-user integrity verification for health data compliance. Recomputes hashes on events and accesses and compares against stored values.

# Check all users
node bin/integrity-check.js

# Check a single user
node bin/integrity-check.js --user <userId>

# JSON output (for automation)
node bin/integrity-check.js --json

Exit code 0 = all checks passed, 1 = integrity errors found.

Testing

just test all                     # all components (MongoDB)
just test api-server              # single component
just test-pg all                  # PostgreSQL mode
just test-detailed api-server     # verbose output
just test-debug api-server        # with debugger
just test-parallel all            # parallel file execution
just clean-test-data              # reset SQLite DBs + user dirs

Extra Mocha params: --bail (stop on first failure), --grep <text> (filter tests)

Environment variables: LOGS=<level> (show server output), DEBUG="*" (debug info)

Project structure

service-core/
├── bin/                    # Entry points
│   ├── master.js           # Cluster master (N API workers)
│   ├── backup.js           # Backup/restore CLI
│   └── integrity-check.js  # Data integrity verification CLI
├── components/             # Application components (npm workspaces)
│   ├── api-server/         # Main API server
│   ├── hfs-server/         # High-frequency series server
│   ├── previews-server/    # Image previews
│   ├── business/           # Business logic
│   ├── storage/            # Storage abstraction layer
│   ├── mall/               # Data access layer
│   ├── cache/              # Caching
│   ├── messages/           # TCP pub/sub
│   ├── audit/              # Audit logging
│   ├── middleware/         # Express middleware
│   ├── webhooks/           # Webhook business logic (runs in api-server)
│   └── test-helpers/       # Test infrastructure
├── storages/               # Plugin system (npm workspace)
│   ├── engines/            # mongodb, postgresql, sqlite, filesystem, influxdb
│   └── interfaces/         # Formal contracts per storage type
│       └── backup/         # Backup/restore writer/reader interfaces
├── build/                  # Docker + deployment
└── justfile                # Development commands

App configuration

Configuration loads from (last takes precedence):

  1. Component default config (components/<name>/config/default-config.yml)
  2. Environment-specific config ({env}-config.yml)
  3. Config file via --config <path>
  4. Command-line options (--key:path=value)
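The precedence order amounts to a last-wins deep merge over the four layers. An illustrative sketch (the real loader is @pryv/boiler; this is not its API):

```javascript
// Last-wins deep merge over config layers: defaults, then env config,
// then --config file, then command-line options.
function deepMerge (base, over) {
  const out = { ...base };
  for (const [key, value] of Object.entries(over)) {
    const mergeable = value !== null && typeof value === 'object' && !Array.isArray(value) &&
      out[key] !== null && typeof out[key] === 'object' && !Array.isArray(out[key]);
    out[key] = mergeable ? deepMerge(out[key], value) : value;
  }
  return out;
}

function mergeConfigs (...layers) {
  return layers.reduce((acc, layer) => deepMerge(acc, layer), {});
}

const defaults = { cluster: { apiWorkers: 2, hfsWorkers: 1 } };
const cliOverride = { cluster: { apiWorkers: 4 } }; // --cluster:apiWorkers=4
console.log(mergeConfigs(defaults, cliOverride).cluster);
// { apiWorkers: 4, hfsWorkers: 1 }
```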

Multi-core deployments

Multi-core deployments host users across N cores sharing a single rqlite-replicated PlatformDB. Two topology variants:

| Variant | DNS | core.url |
| --- | --- | --- |
| Domain-derived (legacy) | {username}.{domain} resolved by the embedded DNS server or an external wildcard | Auto-derived from core.id + dns.domain |
| DNSless (Plan 27 Phase 2) | Externally managed (load balancer, fixed FQDNs) | Explicit core.url per core in YAML |

Both variants use:

  • GET /reg/cores?username=X — discovery route. Returns the URL of the core hosting the user. Any core can answer (reads from PlatformDB).
  • HTTP 421 wrong-core middleware — if a /:username/* request hits the wrong core, the response is 421 Misdirected Request with { error: { id: 'wrong-core', coreUrl } }. SDKs retry against coreUrl. No HTTP redirect (cross-origin redirects strip Authorization headers).
  • /reg/* and /system/* are intentionally load-balanced — the wrong-core middleware is bypassed for those.

See SINGLE-TO-MULTIPLE.md for the full upgrade procedure.

Configuration model: platform-wide vs per-core

service-core v2 groups configuration into three categories. Multi-core deployments must respect this split or cores will drift and users will see inconsistent behaviour depending on which core answers their request.

| Category | Meaning | Source |
| --- | --- | --- |
| Per-core | Local to this node: ports, IPs, worker counts, log paths, DB credentials for this host, local tuning | YAML/env; each node has its own values |
| Platform-wide | MUST be identical across all cores in a deployment: policy, user schema, identity, feature toggles | PlatformDB (rqlite-replicated) is authoritative; YAML seeds on first boot |
| Bootstrap | Platform-wide in meaning, but needed before PlatformDB is reachable: how to connect to PlatformDB, admin key, first-boot seeds | YAML only; operator responsibility to keep identical across cores |

  • Per-core examples: http.port, cluster.apiWorkers, core.id, logs.*, storages.engines.mongodb.*
  • Platform-wide examples: dns.domain, hostings, invitationTokens (already in PlatformDB), custom.systemStreams, password policy, integrity.algorithm, webhook retry contract
  • Bootstrap examples: storages.platform.engine + storages.engines.rqlite.*, auth.adminAccessKey, first-boot seeds

Every block in config/default-config.yml is annotated with its category. On startup, cores log a warning when local YAML disagrees with PlatformDB for known platform-wide values — look for [platform-drift] in the logs.
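The drift warning amounts to comparing local values against PlatformDB for a known list of platform-wide keys. A hypothetical sketch (the key list and the detectDrift helper are illustrative, not the actual startup code):

```javascript
// Hypothetical drift check: compare local YAML values against PlatformDB
// for known platform-wide keys and collect mismatches for logging.
const PLATFORM_WIDE_KEYS = ['dns:domain', 'integrity:algorithm'];

function detectDrift (localConfig, platformValues) {
  const drift = [];
  for (const key of PLATFORM_WIDE_KEYS) {
    const local = localConfig[key];
    const authoritative = platformValues[key];
    if (local !== undefined && authoritative !== undefined && local !== authoritative) {
      // Would surface as: [platform-drift] dns:domain local=<a> platform=<b>
      drift.push({ key, local, authoritative });
    }
  }
  return drift;
}

const drift = detectDrift(
  { 'dns:domain': 'old.example.com', 'integrity:algorithm': 'sha256' },
  { 'dns:domain': 'example.com', 'integrity:algorithm': 'sha256' }
);
console.log(drift.map((d) => d.key)); // [ 'dns:domain' ]
```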

For the full categorization of every config key, see _plans/27-pre-open-pryv-merge-atwork/CONFIG-SEPARATION.md.

Next steps (toward v2.0.0)

In progress — Plan 14: Unified master process

Consolidating service-core's separate processes behind a single master.

  • Phase 1.1: Inline metadata updater into HFS (remove TChannel RPC)
  • Phase 1.2: Webhooks as in-process subscriber (remove separate container)
  • Phase 2: Create bin/master.js with cluster module
  • Phase 3: Add HFS as configurable child processes (M workers, 0=disabled)
  • Phase 4: Add previews worker (config-toggleable, GM check)
  • Phase 5: Single Dockerfile (replace per-component Dockerfiles + runit)
  • Phase 6: Socket.IO WebSocket-only in cluster mode (no sticky sessions needed)

Backlog

| Plan | Description | Priority |
| --- | --- | --- |
| Merge service-register | Integrate user discovery service into service-core; evaluate PG/SQLite for shared register data | High |
| Merge service-mfa | Absorb MFA service as in-process module within API server | High |
| Previews: replace GM | Replace GraphicsMagick with pure-Node image processing | Medium |
| Finalize storage plugins | Complete test infrastructure for storage plugin architecture | Medium |
| SQLite streams storage | Re-implement SQLite nested-set tree for streams + prevent system stream leaks into storage | Low |
| TypeScript + ESM | Migrate from CommonJS to TypeScript with ESM output; enable top-level await | Low |

License

BSD-3-Clause
