Files
otivm/docs/RFC-TESSERA-4.0-001.md

326 lines
12 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# RFC-TESSERA-4.0-001
## Per-Hex Pipeline, Row Lifecycle, and Provenance Model
### Status: Draft v0.2
### Date: 2026-04-26
### Supersedes: RFC-TESSERA-2.0-001 (tile assembly), RFC-TESSERA-3.0-OCC-001 (occupation encoding)
---
## Abstract
TESSERA 4.0 replaces the one-shot global pipeline model with a per-hex
pipeline that is resumable, updatable, and portable. Any H3 hex at any
resolution can be inserted, updated, superseded, or retired independently
of all others. The database is the product from day one — not a tile
archive assembled in a second pass.
This RFC defines:
- The `otivm.sqlite3` schema — integer-normalized, compact, TESSERA 4.0 format
- Row lifecycle states as a lookup table
- Per-field provenance as integer foreign keys
- H5 coverage completeness tracking — no rugged edges at hex boundaries
- The pipeline contract — what a pipeline run must produce to be valid
- The staging protocol — `staging_otivm.sqlite3``otivm.sqlite3`
---
## 1. Design Principles
### 1.1 The hex is the unit of work
One pipeline run processes one H3 hex. It fetches source data for that
hex, derives the field values, records provenance, and writes to the
staging database. No global operations. No dependency on adjacent hexes.
A run that fails leaves no partial state — the hex either has a complete
`draft` row or no row.
### 1.2 The database is the product
There is no tile archive. There is no intermediate sidecar format.
The SQLite database is written directly by the pipeline. The game reads
directly from the database. There is no assembly stage.
### 1.3 Rows are never deleted
A row is never deleted. It is superseded or retired. The full history
of every cell is queryable. When a new dataset version produces a better
value, the old row becomes `superseded` and the new row becomes `current`.
The game always reads only `current` rows.
### 1.4 Dataset versions evolve independently per field
GEBCO releases a new bathymetry dataset. Only the `elev_cm` field needs
refreshing — terrain, hydrology, and geology are unchanged. The pipeline
runs only for the affected field. Only the affected rows are superseded.
### 1.5 Academic credibility travels with the cell
Every field value carries a confidence grade as an integer FK. A cell
with IGME5000 geology coverage has `measured` or `indicated` confidence.
A cell outside IGME5000 coverage has `inferred` or `no_data`. This grade
is queryable, displayable, and affects how the game and simulator use
the data. It is not optional metadata — it is a first-class field.
### 1.6 No rugged edges at hex boundaries
The game reads only cells within H5 hexes that are marked complete in
`h5_coverage`. An H5 hex is complete when all its H9 cells are present
with `status = current`. Incomplete H5 hexes are staging — they are
never served to the game, preventing elevation seams at partially-fetched
boundaries.
### 1.7 The 32GB constraint is the reality check
The pipeline fetches and stores only what the game needs. Storage grows
proportionally to game coverage, not to Earth's surface area.
---
## 2. Lookup Tables
All enumerated values are stored as integers referencing these tables.
Written once at database creation and never modified.
### 2.1 `lifecycle_states`
```sql
CREATE TABLE lifecycle_states (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
INSERT INTO lifecycle_states VALUES
(1, 'draft'),
(2, 'current'),
(3, 'superseded'),
(4, 'retired');
```
### 2.2 `confidence_grades`
```sql
CREATE TABLE confidence_grades (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
description TEXT NOT NULL
);
INSERT INTO confidence_grades VALUES
(1, 'measured', 'Directly observed or instrumentally measured. Published dataset with explicit methodology.'),
(2, 'indicated', 'Recorded in registry or survey without direct measurement. Classification may be broad.'),
(3, 'inferred', 'Derived from landscape position, proximity to measured cells, or modelled from adjacent data.'),
(4, 'no_data', 'Source dataset has no coverage for this cell. Field value is a known placeholder.');
```
### 2.3 `source_registry`
```sql
CREATE TABLE source_registry (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_key TEXT NOT NULL UNIQUE, -- e.g. 'GEBCO_2025'
source_name TEXT NOT NULL,
source_url TEXT,
version TEXT NOT NULL,
license TEXT,
citation TEXT,
registered_at TEXT NOT NULL -- ISO 8601 UTC
);
```
---
## 3. Core Tables
### 3.1 `pipeline_runs`
```sql
CREATE TABLE pipeline_runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_key TEXT NOT NULL UNIQUE, -- human label, e.g. 'tessera3-seed-2026-04-26'
started_at TEXT NOT NULL, -- ISO 8601 UTC
completed_at TEXT, -- NULL while running
status INTEGER NOT NULL REFERENCES lifecycle_states(id),
h5_cells TEXT NOT NULL, -- JSON array of H3 res-5 integer IDs
fields_updated TEXT NOT NULL, -- JSON array of field names
source_versions TEXT NOT NULL, -- JSON object: {source_key: version}
row_count INTEGER, -- NULL while running
notes TEXT
);
```
### 3.2 `tessera_cells` — one row per H9 cell per pipeline run
H3 cell IDs are stored as INTEGER (64-bit H3 index) not TEXT.
Use `h3.cellToIndex()` / `h3.indexToCell()` for conversion.
```sql
CREATE TABLE tessera_cells (
-- Identity
id INTEGER PRIMARY KEY AUTOINCREMENT,
h9 INTEGER NOT NULL, -- H3 res-9 index (64-bit)
h7 INTEGER NOT NULL, -- H3 res-7 parent index
h5 INTEGER NOT NULL, -- H3 res-5 grandparent index (waypoint)
lat REAL NOT NULL, -- H9 centroid latitude
lon REAL NOT NULL, -- H9 centroid longitude
-- Physical fields (RFC-TESSERA-2.0-001 byte layout preserved)
elev_cm INTEGER, -- Elevation in cm, signed 24-bit range
terrain INTEGER, -- Appendix A terrain code
hydro INTEGER, -- Section 3.3 hydrology code
geo_dep INTEGER, -- Section 3.4 deposit code
geo_flag INTEGER, -- Section 3.5 geology flag code
occ_flag INTEGER, -- RFC-TESSERA-3.0-OCC-001 Section 2 code
-- Provenance per field (source FK + confidence FK)
elev_src INTEGER REFERENCES source_registry(id),
elev_conf INTEGER REFERENCES confidence_grades(id),
terr_src INTEGER REFERENCES source_registry(id),
terr_conf INTEGER REFERENCES confidence_grades(id),
hydro_src INTEGER REFERENCES source_registry(id),
hydro_conf INTEGER REFERENCES confidence_grades(id),
gdep_src INTEGER REFERENCES source_registry(id),
gdep_conf INTEGER REFERENCES confidence_grades(id),
gflag_src INTEGER REFERENCES source_registry(id),
gflag_conf INTEGER REFERENCES confidence_grades(id),
occ_src INTEGER REFERENCES source_registry(id),
occ_conf INTEGER REFERENCES confidence_grades(id),
-- Lifecycle
status INTEGER NOT NULL DEFAULT 1
REFERENCES lifecycle_states(id),
run_id INTEGER NOT NULL REFERENCES pipeline_runs(id),
created_at TEXT NOT NULL, -- ISO 8601 UTC
superseded_by INTEGER REFERENCES tessera_cells(id),
retired_reason TEXT
);
CREATE INDEX idx_cells_h9_status ON tessera_cells(h9, status);
CREATE INDEX idx_cells_h5_status ON tessera_cells(h5, status);
CREATE INDEX idx_cells_h7_status ON tessera_cells(h7, status);
CREATE INDEX idx_cells_run ON tessera_cells(run_id);
```
### 3.3 `h5_coverage` — H5 completeness tracking
An H5 hex is `complete` (status=2) when all its H9 cells are present
with `status = 2` (current). The game reads only cells in complete H5
hexes. This prevents elevation seams at partially-fetched boundaries.
```sql
CREATE TABLE h5_coverage (
h5 INTEGER PRIMARY KEY, -- H3 res-5 index
status INTEGER NOT NULL REFERENCES lifecycle_states(id),
-- 1=draft (in progress), 2=current (complete), 4=retired
h9_total INTEGER NOT NULL, -- Expected H9 count (typically 2401)
h9_current INTEGER NOT NULL DEFAULT 0,
last_updated TEXT NOT NULL, -- ISO 8601 UTC
run_id INTEGER NOT NULL REFERENCES pipeline_runs(id),
notes TEXT
);
```
A pipeline step increments `h9_current` on each H9 promotion and sets
`status = 2` when `h9_current = h9_total`.
---
## 4. Row Lifecycle
```
[pipeline run] → status 1 (draft)
[validation pass] → status 2 (current)
[new pipeline run for same cell] → old row: status 3 (superseded), new row: status 2 (current)
[data error confirmed] → status 4 (retired, with reason)
```
The game's canonical query:
```sql
SELECT tc.*
FROM tessera_cells tc
JOIN h5_coverage h5c ON tc.h5 = h5c.h5
WHERE h5c.status = 2 -- complete H5 hexes only
AND tc.status = 2 -- current rows only
AND tc.h5 = ? -- specific waypoint
```
This compound filter guarantees no boundary seams and no stale data.
---
## 5. Pipeline Contract
A pipeline run is valid if and only if:
1. Writes a `pipeline_runs` row with `status = 1` before any cells
2. All source datasets used are in `source_registry` before the run starts
3. Every `tessera_cells` row has `status = 1`, correct `run_id`, and
non-null provenance FKs for every field written
4. On completion: updates `pipeline_runs` to `status = 2`, sets `row_count`
5. On failure: updates `pipeline_runs` to `status = 4` — draft rows from
this run remain draft and are invisible to the game
Promotion (draft → current) is a separate explicit validation step:
- Verify row count matches expected H9 count for the H5 hex
- Update `tessera_cells.status` 1 → 2 for the run's rows
- Mark previous current rows for the same H9 cells as `status = 3`
- Update `h5_coverage` accordingly
---
## 6. Staging Protocol
**`staging_otivm.sqlite3`** — pipeline writes here. Identical schema.
New hexes are fetched, processed, and validated here before production.
Can be deleted and rebuilt without affecting the game.
**`otivm.sqlite3`** — production. Game reads only `current` rows within
`complete` H5 hexes. Promotion from staging is explicit and never automatic.
---
## 7. Seed Data — TESSERA 3.0 → TESSERA 4.0
Source: `tessera.db` (TESSERA 3.0, SpatiaLite, 158GB, Dell SSD).
Seed run: `run_key = 'tessera3-seed-2026-04-26'`
The five OTIVM launch H5 waypoints:
| City | H5 (TEXT) | H5 (INTEGER) |
|---|---|---|
| Ostia | `851e805bfffffff` | resolved at extraction |
| Capua | `851e8333fffffff` | resolved at extraction |
| Brundisium | `851e8ba3fffffff` | resolved at extraction |
| Carthago | `85386e23fffffff` | resolved at extraction |
| Alexandria | `853f5ba7fffffff` | resolved at extraction |
Each H5 contains 343 H9 cells (7 H7 × 49 H9 per H7). Total seed rows:
5 × 343 = 1,715 H9 cells minimum. Adjacent H5 cells along trade routes
may also be seeded to prevent boundary seams in route rendering.
Seed confidence grades:
- `elev_conf`: 2 (indicated) — GEBCO 2025, direct sample
- `terr_conf`: 2 (indicated) — ESA WorldCover v200
- `hydro_conf`: 2 (indicated) — HydroSHEDS v1.1
- `gdep_conf`: 2 or 4 — MRDS point data where present, no_data elsewhere
- `gflag_conf`: 2 or 4 — IGME5000 polygon where present, no_data elsewhere
- `occ_conf`: 4 (no_data) — stage 06 not yet run; occ_flag = 0
---
## 8. What This RFC Does Not Cover
| Topic | Where it belongs |
|---|---|
| Game queries and views | OTIVM game code |
| Occupation evidence detail | RFC-TESSERA-3.0-OCC-001 (adapted) |
| CIVICVS simulation state | RFC-CIVICVS stack |
| Online source fetch scripts | Pipeline implementation |
| OTIVM roadmap changes | docs/roadmap.md |
---
*RFC-TESSERA-4.0-001 Draft v0.2 — 2026-04-26*
*Status: Draft — pending project owner approval before implementation*
*Next action: project owner approves schema, seed extraction from tessera.db begins*