diff --git a/docs/architecture/infrastructure.md b/docs/architecture/infrastructure.md
new file mode 100644
index 0000000..c045a32
--- /dev/null
+++ b/docs/architecture/infrastructure.md
@@ -0,0 +1,209 @@
+# Infrastructure Architecture
+### TheRON — OTIVM / CIVICVS / TESSERA Stack
+### Status: Settled — do not reverse without explicit project owner instruction
+### Date: 2026-04-28
+
+---
+
+## 1. Governing Principle
+
+**ROBUST is always the first response to any architectural decision.**
+
+Optimisation that encodes assumptions about co-location, shared infrastructure,
+or current scale is not optimisation — it is deferred liability. Every decision
+in this document was made by asking: what happens when (not if) the second box
+exists in a different datacenter?
+
+---
+
+## 2. Container Topology
+
+Five LXC containers on srv-a (10.0.0.11, Proxmox). Each container owns exactly
+one domain. No container reaches into another container's data directly.
+
+| CT | Name | Role | WireGuard IP |
+|---|---|---|---|
+| **1101** | tessera-pipeline | TESSERA data pipeline — ingests, validates, and promotes physical world data | TBD |
+| **1102** | tessera-store | TESSERA master store — authoritative read-only source for physical world data | TBD |
+| **1103** | tessera-dev | Aggregation — reads player behavior, derives collective patterns, writes back to 1102 | TBD |
+| **1104** | apt-cache | Infrastructure only — Debian package cache for the local network | TBD |
+| **1105** | otivm-dev | Game server — serves the OTIVM browser game, holds 128 per-player SQLite databases | 10.110.0.18 |
+
+---
+
+## 3. API Protocol
+
+**All inter-container data flows are REST over HTTPS on the WireGuard mesh.**
+
+No exceptions. No shared filesystem mounts between containers. No direct database
+access across container boundaries. No assumptions about co-location.
+
+This discipline is enforced not because the containers are currently in different
+datacenters — they are not — but because the architecture must survive the moment
+they are. An API call works identically whether the target container is on the same
+physical host or on a node in another country. A shared filesystem mount does not.
+
+Every API is:
+- Versioned (`/api/v1/...`)
+- Logged — every call, every response, every error
+- Narrow — one domain, one owner
+- Independently deployable
+
+---
+
+## 4. Container APIs
+
+### 4.1 — 1101 tessera-pipeline
+**Write API (internal only)**
+Pushes validated physical world data into 1102. Not accessible by game containers.
+No player-facing traffic. Called by the dataset assistant pipeline scripts.
+
+Defined in: `docs/architecture/api-1101.md` (pending)
+
+### 4.2 — 1102 tessera-store
+**Read API**
+The authoritative source for TESSERA physical data — cells, epochs, terrain,
+hydrology, elevation, geology, occupation evidence. Every consumer that needs
+physical world data calls this API.
+
+The `data/otivm.sqlite3` file currently on 1105 is a local cache of a subset of
+what 1102 will eventually serve. When 1102's API is live, 1105 reads from it
+directly. The local cache becomes a pre-seeded performance layer, not a source.
+
+Defined in: `docs/architecture/api-1102.md` (pending)
+
+### 4.3 — 1103 tessera-dev
+**Read API** for derived behavioral data — market prices, route saturation,
+collective patterns derived from aggregating player behavior.
+
+Also exposes a **scheduler** that calls 1105's internal API on a defined schedule
+to collect player event snapshots, processes them, and writes derived aggregates
+back to 1102 via its write endpoint.
+
+Runs no game logic. Has no player-facing traffic.
+
+Defined in: `docs/architecture/api-1103.md` (pending)
+
+### 4.4 — 1105 otivm-dev
+Two APIs:
+
+**Player-facing API** — serves the OTIVM browser game. Handles save state,
+map data, and all game logic.
+Currently live on port 3000.
+
+**Internal API** — exposes player event snapshots to 1103 on the aggregation
+schedule. Returns anonymised behavioral data only — no personal identifiers,
+no save file contents. 1103 never touches player SQLite files directly.
+
+Defined in: `docs/architecture/api-1105.md` (pending)
+
+---
+
+## 5. Data Flow
+
+```
+1101 (pipeline)
+  │
+  │  write — validated physical data
+  ▼
+1102 (tessera-store)
+  │
+  │  read — physical world data
+  ▼
+1105 (game) ◄──────────────────────────────── player browser
+  │
+  │  read — player event snapshots (scheduled)
+  ▼
+1103 (aggregation)
+  │
+  │  write — derived aggregates
+  ▼
+1102 (tessera-store)
+```
+
+**Write domains — one per container, never shared:**
+
+| Container | Writes to |
+|---|---|
+| 1101 | 1102 physical data tables |
+| 1103 | 1102 derived aggregate tables |
+| 1105 | Its own 128 per-player SQLite databases only |
+
+No container writes to another container's primary data. The flow is always
+downstream from physical reality toward player experience, with one upstream
+path: aggregated behavior flowing back to inform the physical world model.
+
+---
+
+## 6. Per-Player Database Model
+
+128 SQLite databases on 1105. One per player slot, pre-provisioned.
+Each database is named by player token and lives in `data/players/`.
+
+A new player is assigned to an existing pre-provisioned database.
+No database is created on demand under player load.
+
+The atomic unit of the per-player database is **time**. Every parameter,
+every action, every event is a timestamped record. Voyage, otium, journal
+entry, chapter advance — these are derived labels applied to intervals
+on a continuous timeline. The database records moments. The application
+derives meaning from sequences of moments.
+
+Schema defined in: `docs/architecture/player-database.md` (pending)
+
+---
+
+## 7. Backup and Restore Strategy
+
+Each container is independently restorable from vzdump snapshots.
+Archives are documented in `docs/archives.md`.
+
+**Recovery hierarchy by criticality:**
+
+| Priority | Container | Reason |
+|---|---|---|
+| Highest | 1102 | Physical world data — rebuilt from 1101 if lost, but slow |
+| High | 1105 | Player databases — irreplaceable behavioral history |
+| Medium | 1101 | Pipeline code — recoverable from Gitea |
+| Medium | 1103 | Aggregation code — recoverable from Gitea |
+| Low | 1104 | Infrastructure only — trivially replaceable |
+
+**Rule:** 1105 is backed up more frequently than any other container because
+player databases are gitignored and exist only on disk. When the Simulator
+launches and player databases represent months of participant history, this
+frequency increases further.
+
+---
+
+## 8. Scalability Path
+
+The second game container — 128 more participants, potentially in a different
+datacenter — is a configuration change, not an architectural change. It:
+
+- Runs the same game server code as 1105
+- Calls 1102's API for physical world data (same endpoint, different client)
+- Reports player events to 1103 via the same internal API contract
+- Is backed up independently
+
+Nothing in this architecture assumes a single game container. The API boundaries
+ensure that adding capacity is additive, not disruptive.
+
+---
+
+## 9. What This Document Does Not Cover
+
+These topics are settled in principle but not yet specified in detail.
+Each will receive its own architecture document.
+
+| Topic | Document |
+|---|---|
+| 1102 API endpoints and schema | `docs/architecture/api-1102.md` |
+| 1103 aggregation schedule and logic | `docs/architecture/api-1103.md` |
+| 1105 internal API for event export | `docs/architecture/api-1105.md` |
+| Per-player SQLite schema | `docs/architecture/player-database.md` |
+| Parameter registry | `docs/architecture/parameters.md` |
+| CIVICVS Simulator integration | `docs/architecture/simulator.md` |
+
+---
+
+*Infrastructure Architecture — settled 2026-04-28*
+*TheRON — single contributor. AI assistants implement, document, flag — do not direct.*
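
Section 6 of the new document says the per-player database records timestamped moments and that labels such as "voyage" are derived from intervals between them. Below is a minimal Python sketch of that idea, offered for illustration only. The real schema is still pending in `docs/architecture/player-database.md`, so the `events` table, its columns, the event kinds `voyage_start` and `voyage_end`, and all function names are assumptions, not part of the settled design.

```python
import sqlite3

# Hypothetical schema: the actual one is pending in player-database.md.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    ts   REAL NOT NULL,  -- timestamp in seconds; time is the atomic unit
    kind TEXT NOT NULL,  -- assumed labels, e.g. 'voyage_start', 'voyage_end'
    data TEXT            -- optional free-form payload
);
"""


def open_player_db(path=":memory:"):
    """Open a player database and make sure the assumed events table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, ts, kind, data=None):
    """Store one moment. Nothing is ever stored as an interval."""
    conn.execute(
        "INSERT INTO events (ts, kind, data) VALUES (?, ?, ?)",
        (ts, kind, data),
    )


def derive_intervals(conn, start_kind, end_kind):
    """Pair each start event with the next end event, in timestamp order.

    An unmatched start (still in progress) yields no interval.
    """
    rows = conn.execute(
        "SELECT ts, kind FROM events WHERE kind IN (?, ?) ORDER BY ts, rowid",
        (start_kind, end_kind),
    )
    intervals = []
    open_ts = None
    for ts, kind in rows:
        if kind == start_kind and open_ts is None:
            open_ts = ts
        elif kind == end_kind and open_ts is not None:
            intervals.append((open_ts, ts))
            open_ts = None
    return intervals
```

With events recorded at `100.0` (`voyage_start`), `130.0` (`journal_entry`), and `250.0` (`voyage_end`), `derive_intervals(conn, "voyage_start", "voyage_end")` returns one interval spanning 100.0 to 250.0. The journal entry stays a plain moment on the same timeline. The "voyage" exists only as a derived reading of the records, which is the property Section 6 asks for.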