12 KiB
RFC-TESSERA-4.0-001
Per-Hex Pipeline, Row Lifecycle, and Provenance Model
Status: Draft v0.2
Date: 2026-04-26
Supersedes: RFC-TESSERA-2.0-001 (tile assembly), RFC-TESSERA-3.0-OCC-001 (occupation encoding)
Abstract
TESSERA 4.0 replaces the one-shot global pipeline model with a per-hex pipeline that is resumable, updatable, and portable. Any H3 hex at any resolution can be inserted, updated, superseded, or retired independently of all others. The database is the product from day one — not a tile archive assembled in a second pass.
This RFC defines:
- The
otivm.sqlite3schema — integer-normalized, compact, TESSERA 4.0 format - Row lifecycle states as a lookup table
- Per-field provenance as integer foreign keys
- H5 coverage completeness tracking — no rugged edges at hex boundaries
- The pipeline contract — what a pipeline run must produce to be valid
- The staging protocol —
staging_otivm.sqlite3→otivm.sqlite3
1. Design Principles
1.1 The hex is the unit of work
One pipeline run processes one H3 hex. It fetches source data for that
hex, derives the field values, records provenance, and writes to the
staging database. No global operations. No dependency on adjacent hexes.
A run that fails leaves no partial state — the hex either has a complete
draft row or no row.
1.2 The database is the product
There is no tile archive. There is no intermediate sidecar format. The SQLite database is written directly by the pipeline. The game reads directly from the database. There is no assembly stage.
1.3 Rows are never deleted
A row is never deleted. It is superseded or retired. The full history
of every cell is queryable. When a new dataset version produces a better
value, the old row becomes superseded and the new row becomes current.
The game always reads only current rows.
1.4 Dataset versions evolve independently per field
GEBCO releases a new bathymetry dataset. Only the elev_cm field needs
refreshing — terrain, hydrology, and geology are unchanged. The pipeline
runs only for the affected field. Only the affected rows are superseded.
1.5 Academic credibility travels with the cell
Every field value carries a confidence grade as an integer FK. A cell
with IGME5000 geology coverage has measured or indicated confidence.
A cell outside IGME5000 coverage has inferred or no_data. This grade
is queryable, displayable, and affects how the game and simulator use
the data. It is not optional metadata — it is a first-class field.
1.6 No rugged edges at hex boundaries
The game reads only cells within H5 hexes that are marked complete in
h5_coverage. An H5 hex is complete when all its H9 cells are present
with status = current. Incomplete H5 hexes are staging — they are
never served to the game, preventing elevation seams at partially-fetched
boundaries.
1.7 The 32GB constraint is the reality check
The pipeline fetches and stores only what the game needs. Storage grows proportionally to game coverage, not to Earth's surface area.
2. Lookup Tables
All enumerated values are stored as integers referencing these tables. Written once at database creation and never modified.
2.1 lifecycle_states
CREATE TABLE lifecycle_states (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
INSERT INTO lifecycle_states VALUES
(1, 'draft'),
(2, 'current'),
(3, 'superseded'),
(4, 'retired');
2.2 confidence_grades
CREATE TABLE confidence_grades (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
description TEXT NOT NULL
);
INSERT INTO confidence_grades VALUES
(1, 'measured', 'Directly observed or instrumentally measured. Published dataset with explicit methodology.'),
(2, 'indicated', 'Recorded in registry or survey without direct measurement. Classification may be broad.'),
(3, 'inferred', 'Derived from landscape position, proximity to measured cells, or modelled from adjacent data.'),
(4, 'no_data', 'Source dataset has no coverage for this cell. Field value is a known placeholder.');
2.3 source_registry
CREATE TABLE source_registry (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_key TEXT NOT NULL UNIQUE, -- e.g. 'GEBCO_2025'
source_name TEXT NOT NULL,
source_url TEXT,
version TEXT NOT NULL,
license TEXT,
citation TEXT,
registered_at TEXT NOT NULL -- ISO 8601 UTC
);
3. Core Tables
3.1 pipeline_runs
CREATE TABLE pipeline_runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_key TEXT NOT NULL UNIQUE, -- human label, e.g. 'tessera3-seed-2026-04-26'
started_at TEXT NOT NULL, -- ISO 8601 UTC
completed_at TEXT, -- NULL while running
status INTEGER NOT NULL REFERENCES lifecycle_states(id),
h5_cells TEXT NOT NULL, -- JSON array of H3 res-5 integer IDs
fields_updated TEXT NOT NULL, -- JSON array of field names
source_versions TEXT NOT NULL, -- JSON object: {source_key: version}
row_count INTEGER, -- NULL while running
notes TEXT
);
3.2 tessera_cells — one row per H9 cell per pipeline run
H3 cell IDs are stored as INTEGER (64-bit H3 index) not TEXT.
Use h3.cellToIndex() / h3.indexToCell() for conversion.
CREATE TABLE tessera_cells (
-- Identity
id INTEGER PRIMARY KEY AUTOINCREMENT,
h9 INTEGER NOT NULL, -- H3 res-9 index (64-bit)
h7 INTEGER NOT NULL, -- H3 res-7 parent index
h5 INTEGER NOT NULL, -- H3 res-5 grandparent index (waypoint)
lat REAL NOT NULL, -- H9 centroid latitude
lon REAL NOT NULL, -- H9 centroid longitude
-- Physical fields (RFC-TESSERA-2.0-001 byte layout preserved)
elev_cm INTEGER, -- Elevation in cm, signed 24-bit range
terrain INTEGER, -- Appendix A terrain code
hydro INTEGER, -- Section 3.3 hydrology code
geo_dep INTEGER, -- Section 3.4 deposit code
geo_flag INTEGER, -- Section 3.5 geology flag code
occ_flag INTEGER, -- RFC-TESSERA-3.0-OCC-001 Section 2 code
-- Provenance per field (source FK + confidence FK)
elev_src INTEGER REFERENCES source_registry(id),
elev_conf INTEGER REFERENCES confidence_grades(id),
terr_src INTEGER REFERENCES source_registry(id),
terr_conf INTEGER REFERENCES confidence_grades(id),
hydro_src INTEGER REFERENCES source_registry(id),
hydro_conf INTEGER REFERENCES confidence_grades(id),
gdep_src INTEGER REFERENCES source_registry(id),
gdep_conf INTEGER REFERENCES confidence_grades(id),
gflag_src INTEGER REFERENCES source_registry(id),
gflag_conf INTEGER REFERENCES confidence_grades(id),
occ_src INTEGER REFERENCES source_registry(id),
occ_conf INTEGER REFERENCES confidence_grades(id),
-- Lifecycle
status INTEGER NOT NULL DEFAULT 1
REFERENCES lifecycle_states(id),
run_id INTEGER NOT NULL REFERENCES pipeline_runs(id),
created_at TEXT NOT NULL, -- ISO 8601 UTC
superseded_by INTEGER REFERENCES tessera_cells(id),
retired_reason TEXT
);
CREATE INDEX idx_cells_h9_status ON tessera_cells(h9, status);
CREATE INDEX idx_cells_h5_status ON tessera_cells(h5, status);
CREATE INDEX idx_cells_h7_status ON tessera_cells(h7, status);
CREATE INDEX idx_cells_run ON tessera_cells(run_id);
3.3 h5_coverage — H5 completeness tracking
An H5 hex is complete (status=2) when all its H9 cells are present
with status = 2 (current). The game reads only cells in complete H5
hexes. This prevents elevation seams at partially-fetched boundaries.
CREATE TABLE h5_coverage (
h5 INTEGER PRIMARY KEY, -- H3 res-5 index
status INTEGER NOT NULL REFERENCES lifecycle_states(id),
-- 1=draft (in progress), 2=current (complete), 4=retired
h9_total INTEGER NOT NULL, -- Expected H9 count (typically 2401)
h9_current INTEGER NOT NULL DEFAULT 0,
last_updated TEXT NOT NULL, -- ISO 8601 UTC
run_id INTEGER NOT NULL REFERENCES pipeline_runs(id),
notes TEXT
);
A pipeline step increments h9_current on each H9 promotion and sets
status = 2 when h9_current = h9_total.
4. Row Lifecycle
[pipeline run] → status 1 (draft)
[validation pass] → status 2 (current)
[new pipeline run for same cell] → old row: status 3 (superseded), new row: status 2 (current)
[data error confirmed] → status 4 (retired, with reason)
The game's canonical query:
SELECT tc.*
FROM tessera_cells tc
JOIN h5_coverage h5c ON tc.h5 = h5c.h5
WHERE h5c.status = 2 -- complete H5 hexes only
AND tc.status = 2 -- current rows only
AND tc.h5 = ? -- specific waypoint
This compound filter guarantees no boundary seams and no stale data.
5. Pipeline Contract
A pipeline run is valid if and only if:
- Writes a
pipeline_runsrow withstatus = 1before any cells - All source datasets used are in
source_registrybefore the run starts - Every
tessera_cellsrow hasstatus = 1, correctrun_id, and non-null provenance FKs for every field written - On completion: updates
pipeline_runstostatus = 2, setsrow_count - On failure: updates
pipeline_runstostatus = 4— draft rows from this run remain draft and are invisible to the game
Promotion (draft → current) is a separate explicit validation step:
- Verify row count matches expected H9 count for the H5 hex
- Update
tessera_cells.status1 → 2 for the run's rows - Mark previous current rows for the same H9 cells as
status = 3 - Update
h5_coverageaccordingly
6. Staging Protocol
staging_otivm.sqlite3 — pipeline writes here. Identical schema.
New hexes are fetched, processed, and validated here before production.
Can be deleted and rebuilt without affecting the game.
otivm.sqlite3 — production. Game reads only current rows within
complete H5 hexes. Promotion from staging is explicit and never automatic.
7. Seed Data — TESSERA 3.0 → TESSERA 4.0
Source: tessera.db (TESSERA 3.0, SpatiaLite, 158GB, Dell SSD).
Seed run: run_key = 'tessera3-seed-2026-04-26'
The five OTIVM launch H5 waypoints:
| City | H5 (TEXT) | H5 (INTEGER) |
|---|---|---|
| Ostia | 851e805bfffffff |
resolved at extraction |
| Capua | 851e8333fffffff |
resolved at extraction |
| Brundisium | 851e8ba3fffffff |
resolved at extraction |
| Carthago | 85386e23fffffff |
resolved at extraction |
| Alexandria | 853f5ba7fffffff |
resolved at extraction |
Each H5 contains 343 H9 cells (7 H7 × 49 H9 per H7). Total seed rows: 5 × 343 = 1,715 H9 cells minimum. Adjacent H5 cells along trade routes may also be seeded to prevent boundary seams in route rendering.
Seed confidence grades:
elev_conf: 2 (indicated) — GEBCO 2025, direct sampleterr_conf: 2 (indicated) — ESA WorldCover v200hydro_conf: 2 (indicated) — HydroSHEDS v1.1gdep_conf: 2 or 4 — MRDS point data where present, no_data elsewheregflag_conf: 2 or 4 — IGME5000 polygon where present, no_data elsewhereocc_conf: 4 (no_data) — stage 06 not yet run; occ_flag = 0
8. What This RFC Does Not Cover
| Topic | Where it belongs |
|---|---|
| Game queries and views | OTIVM game code |
| Occupation evidence detail | RFC-TESSERA-3.0-OCC-001 (adapted) |
| CIVICVS simulation state | RFC-CIVICVS stack |
| Online source fetch scripts | Pipeline implementation |
| OTIVM roadmap changes | docs/roadmap.md |
RFC-TESSERA-4.0-001 Draft v0.2 — 2026-04-26 Status: Draft — pending project owner approval before implementation Next action: project owner approves schema, seed extraction from tessera.db begins