326 lines
12 KiB
Markdown
326 lines
12 KiB
Markdown
# RFC-TESSERA-4.0-001
|
||
## Per-Hex Pipeline, Row Lifecycle, and Provenance Model
|
||
### Status: Draft v0.2
|
||
### Date: 2026-04-26
|
||
### Supersedes: RFC-TESSERA-2.0-001 (tile assembly), RFC-TESSERA-3.0-OCC-001 (occupation encoding)
|
||
|
||
---
|
||
|
||
## Abstract
|
||
|
||
TESSERA 4.0 replaces the one-shot global pipeline model with a per-hex
|
||
pipeline that is resumable, updatable, and portable. Any H3 hex at any
|
||
resolution can be inserted, updated, superseded, or retired independently
|
||
of all others. The database is the product from day one — not a tile
|
||
archive assembled in a second pass.
|
||
|
||
This RFC defines:
|
||
- The `otivm.sqlite3` schema — integer-normalized, compact, TESSERA 4.0 format
|
||
- Row lifecycle states as a lookup table
|
||
- Per-field provenance as integer foreign keys
|
||
- H5 coverage completeness tracking — no rugged edges at hex boundaries
|
||
- The pipeline contract — what a pipeline run must produce to be valid
|
||
- The staging protocol — `staging_otivm.sqlite3` → `otivm.sqlite3`
|
||
|
||
---
|
||
|
||
## 1. Design Principles
|
||
|
||
### 1.1 The hex is the unit of work
|
||
|
||
One pipeline run processes one H3 hex. It fetches source data for that
|
||
hex, derives the field values, records provenance, and writes to the
|
||
staging database. No global operations. No dependency on adjacent hexes.
|
||
A run that fails leaves no partial state — the hex either has a complete
|
||
`draft` row or no row.
|
||
|
||
### 1.2 The database is the product
|
||
|
||
There is no tile archive. There is no intermediate sidecar format.
|
||
The SQLite database is written directly by the pipeline. The game reads
|
||
directly from the database. There is no assembly stage.
|
||
|
||
### 1.3 Rows are never deleted
|
||
|
||
A row is never deleted. It is superseded or retired. The full history
|
||
of every cell is queryable. When a new dataset version produces a better
|
||
value, the old row becomes `superseded` and the new row becomes `current`.
|
||
The game always reads only `current` rows.
|
||
|
||
### 1.4 Dataset versions evolve independently per field
|
||
|
||
GEBCO releases a new bathymetry dataset. Only the `elev_cm` field needs
|
||
refreshing — terrain, hydrology, and geology are unchanged. The pipeline
|
||
runs only for the affected field. Only the affected rows are superseded.
|
||
|
||
### 1.5 Academic credibility travels with the cell
|
||
|
||
Every field value carries a confidence grade as an integer FK. A cell
|
||
with IGME5000 geology coverage has `measured` or `indicated` confidence.
|
||
A cell outside IGME5000 coverage has `inferred` or `no_data`. This grade
|
||
is queryable, displayable, and affects how the game and simulator use
|
||
the data. It is not optional metadata — it is a first-class field.
|
||
|
||
### 1.6 No rugged edges at hex boundaries
|
||
|
||
The game reads only cells within H5 hexes that are marked complete in
|
||
`h5_coverage`. An H5 hex is complete when all its H9 cells are present
|
||
with `status = current`. Incomplete H5 hexes are staging — they are
|
||
never served to the game, preventing elevation seams at partially-fetched
|
||
boundaries.
|
||
|
||
### 1.7 The 32GB constraint is the reality check
|
||
|
||
The pipeline fetches and stores only what the game needs. Storage grows
|
||
proportionally to game coverage, not to Earth's surface area.
|
||
|
||
---
|
||
|
||
## 2. Lookup Tables
|
||
|
||
All enumerated values are stored as integers referencing these tables.
|
||
Written once at database creation and never modified.
|
||
|
||
### 2.1 `lifecycle_states`
|
||
|
||
```sql
|
||
CREATE TABLE lifecycle_states (
|
||
id INTEGER PRIMARY KEY,
|
||
name TEXT NOT NULL UNIQUE
|
||
);
|
||
|
||
INSERT INTO lifecycle_states VALUES
|
||
(1, 'draft'),
|
||
(2, 'current'),
|
||
(3, 'superseded'),
|
||
(4, 'retired');
|
||
```
|
||
|
||
### 2.2 `confidence_grades`
|
||
|
||
```sql
|
||
CREATE TABLE confidence_grades (
|
||
id INTEGER PRIMARY KEY,
|
||
name TEXT NOT NULL UNIQUE,
|
||
description TEXT NOT NULL
|
||
);
|
||
|
||
INSERT INTO confidence_grades VALUES
|
||
(1, 'measured', 'Directly observed or instrumentally measured. Published dataset with explicit methodology.'),
|
||
(2, 'indicated', 'Recorded in registry or survey without direct measurement. Classification may be broad.'),
|
||
(3, 'inferred', 'Derived from landscape position, proximity to measured cells, or modelled from adjacent data.'),
|
||
(4, 'no_data', 'Source dataset has no coverage for this cell. Field value is a known placeholder.');
|
||
```
|
||
|
||
### 2.3 `source_registry`
|
||
|
||
```sql
|
||
CREATE TABLE source_registry (
|
||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||
source_key TEXT NOT NULL UNIQUE, -- e.g. 'GEBCO_2025'
|
||
source_name TEXT NOT NULL,
|
||
source_url TEXT,
|
||
version TEXT NOT NULL,
|
||
license TEXT,
|
||
citation TEXT,
|
||
registered_at TEXT NOT NULL -- ISO 8601 UTC
|
||
);
|
||
```
|
||
|
||
---
|
||
|
||
## 3. Core Tables
|
||
|
||
### 3.1 `pipeline_runs`
|
||
|
||
```sql
|
||
CREATE TABLE pipeline_runs (
|
||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||
run_key TEXT NOT NULL UNIQUE, -- human label, e.g. 'tessera3-seed-2026-04-26'
|
||
started_at TEXT NOT NULL, -- ISO 8601 UTC
|
||
completed_at TEXT, -- NULL while running
|
||
status INTEGER NOT NULL REFERENCES lifecycle_states(id),
|
||
h5_cells TEXT NOT NULL, -- JSON array of H3 res-5 integer IDs
|
||
fields_updated TEXT NOT NULL, -- JSON array of field names
|
||
source_versions TEXT NOT NULL, -- JSON object: {source_key: version}
|
||
row_count INTEGER, -- NULL while running
|
||
notes TEXT
|
||
);
|
||
```
|
||
|
||
### 3.2 `tessera_cells` — one row per H9 cell per pipeline run
|
||
|
||
H3 cell IDs are stored as INTEGER (64-bit H3 index) not TEXT.
|
||
Use `h3.cellToIndex()` / `h3.indexToCell()` for conversion.
|
||
|
||
```sql
|
||
CREATE TABLE tessera_cells (
|
||
-- Identity
|
||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||
h9 INTEGER NOT NULL, -- H3 res-9 index (64-bit)
|
||
h7 INTEGER NOT NULL, -- H3 res-7 parent index
|
||
h5 INTEGER NOT NULL, -- H3 res-5 grandparent index (waypoint)
|
||
lat REAL NOT NULL, -- H9 centroid latitude
|
||
lon REAL NOT NULL, -- H9 centroid longitude
|
||
|
||
-- Physical fields (RFC-TESSERA-2.0-001 byte layout preserved)
|
||
elev_cm INTEGER, -- Elevation in cm, signed 24-bit range
|
||
terrain INTEGER, -- Appendix A terrain code
|
||
hydro INTEGER, -- Section 3.3 hydrology code
|
||
geo_dep INTEGER, -- Section 3.4 deposit code
|
||
geo_flag INTEGER, -- Section 3.5 geology flag code
|
||
occ_flag INTEGER, -- RFC-TESSERA-3.0-OCC-001 Section 2 code
|
||
|
||
-- Provenance per field (source FK + confidence FK)
|
||
elev_src INTEGER REFERENCES source_registry(id),
|
||
elev_conf INTEGER REFERENCES confidence_grades(id),
|
||
terr_src INTEGER REFERENCES source_registry(id),
|
||
terr_conf INTEGER REFERENCES confidence_grades(id),
|
||
hydro_src INTEGER REFERENCES source_registry(id),
|
||
hydro_conf INTEGER REFERENCES confidence_grades(id),
|
||
gdep_src INTEGER REFERENCES source_registry(id),
|
||
gdep_conf INTEGER REFERENCES confidence_grades(id),
|
||
gflag_src INTEGER REFERENCES source_registry(id),
|
||
gflag_conf INTEGER REFERENCES confidence_grades(id),
|
||
occ_src INTEGER REFERENCES source_registry(id),
|
||
occ_conf INTEGER REFERENCES confidence_grades(id),
|
||
|
||
-- Lifecycle
|
||
status INTEGER NOT NULL DEFAULT 1
|
||
REFERENCES lifecycle_states(id),
|
||
run_id INTEGER NOT NULL REFERENCES pipeline_runs(id),
|
||
created_at TEXT NOT NULL, -- ISO 8601 UTC
|
||
superseded_by INTEGER REFERENCES tessera_cells(id),
|
||
retired_reason TEXT
|
||
);
|
||
|
||
CREATE INDEX idx_cells_h9_status ON tessera_cells(h9, status);
|
||
CREATE INDEX idx_cells_h5_status ON tessera_cells(h5, status);
|
||
CREATE INDEX idx_cells_h7_status ON tessera_cells(h7, status);
|
||
CREATE INDEX idx_cells_run ON tessera_cells(run_id);
|
||
```
|
||
|
||
### 3.3 `h5_coverage` — H5 completeness tracking
|
||
|
||
An H5 hex is `complete` (status=2) when all its H9 cells are present
|
||
with `status = 2` (current). The game reads only cells in complete H5
|
||
hexes. This prevents elevation seams at partially-fetched boundaries.
|
||
|
||
```sql
|
||
CREATE TABLE h5_coverage (
|
||
h5 INTEGER PRIMARY KEY, -- H3 res-5 index
|
||
status INTEGER NOT NULL REFERENCES lifecycle_states(id),
|
||
-- 1=draft (in progress), 2=current (complete), 4=retired
|
||
h9_total INTEGER NOT NULL, -- Expected H9 count (typically 2401)
|
||
h9_current INTEGER NOT NULL DEFAULT 0,
|
||
last_updated TEXT NOT NULL, -- ISO 8601 UTC
|
||
run_id INTEGER NOT NULL REFERENCES pipeline_runs(id),
|
||
notes TEXT
|
||
);
|
||
```
|
||
|
||
A pipeline step increments `h9_current` on each H9 promotion and sets
|
||
`status = 2` when `h9_current = h9_total`.
|
||
|
||
---
|
||
|
||
## 4. Row Lifecycle
|
||
|
||
```
|
||
[pipeline run] → status 1 (draft)
|
||
[validation pass] → status 2 (current)
|
||
[new pipeline run for same cell] → old row: status 3 (superseded), new row: status 2 (current)
|
||
[data error confirmed] → status 4 (retired, with reason)
|
||
```
|
||
|
||
The game's canonical query:
|
||
|
||
```sql
|
||
SELECT tc.*
|
||
FROM tessera_cells tc
|
||
JOIN h5_coverage h5c ON tc.h5 = h5c.h5
|
||
WHERE h5c.status = 2 -- complete H5 hexes only
|
||
AND tc.status = 2 -- current rows only
|
||
AND tc.h5 = ? -- specific waypoint
|
||
```
|
||
|
||
This compound filter guarantees no boundary seams and no stale data.
|
||
|
||
---
|
||
|
||
## 5. Pipeline Contract
|
||
|
||
A pipeline run is valid if and only if:
|
||
|
||
1. Writes a `pipeline_runs` row with `status = 1` before any cells
|
||
2. All source datasets used are in `source_registry` before the run starts
|
||
3. Every `tessera_cells` row has `status = 1`, correct `run_id`, and
|
||
non-null provenance FKs for every field written
|
||
4. On completion: updates `pipeline_runs` to `status = 2`, sets `row_count`
|
||
5. On failure: updates `pipeline_runs` to `status = 4` — draft rows from
|
||
this run remain draft and are invisible to the game
|
||
|
||
Promotion (draft → current) is a separate explicit validation step:
|
||
- Verify row count matches expected H9 count for the H5 hex
|
||
- Update `tessera_cells.status` 1 → 2 for the run's rows
|
||
- Mark previous current rows for the same H9 cells as `status = 3`
|
||
- Update `h5_coverage` accordingly
|
||
|
||
---
|
||
|
||
## 6. Staging Protocol
|
||
|
||
**`staging_otivm.sqlite3`** — pipeline writes here. Identical schema.
|
||
New hexes are fetched, processed, and validated here before production.
|
||
Can be deleted and rebuilt without affecting the game.
|
||
|
||
**`otivm.sqlite3`** — production. Game reads only `current` rows within
|
||
`complete` H5 hexes. Promotion from staging is explicit and never automatic.
|
||
|
||
---
|
||
|
||
## 7. Seed Data — TESSERA 3.0 → TESSERA 4.0
|
||
|
||
Source: `tessera.db` (TESSERA 3.0, SpatiaLite, 158GB, Dell SSD).
|
||
Seed run: `run_key = 'tessera3-seed-2026-04-26'`
|
||
|
||
The five OTIVM launch H5 waypoints:
|
||
|
||
| City | H5 (TEXT) | H5 (INTEGER) |
|
||
|---|---|---|
|
||
| Ostia | `851e805bfffffff` | resolved at extraction |
|
||
| Capua | `851e8333fffffff` | resolved at extraction |
|
||
| Brundisium | `851e8ba3fffffff` | resolved at extraction |
|
||
| Carthago | `85386e23fffffff` | resolved at extraction |
|
||
| Alexandria | `853f5ba7fffffff` | resolved at extraction |
|
||
|
||
Each H5 contains 343 H9 cells (7 H7 × 49 H9 per H7). Total seed rows:
|
||
5 × 343 = 1,715 H9 cells minimum. Adjacent H5 cells along trade routes
|
||
may also be seeded to prevent boundary seams in route rendering.
|
||
|
||
Seed confidence grades:
|
||
- `elev_conf`: 2 (indicated) — GEBCO 2025, direct sample
|
||
- `terr_conf`: 2 (indicated) — ESA WorldCover v200
|
||
- `hydro_conf`: 2 (indicated) — HydroSHEDS v1.1
|
||
- `gdep_conf`: 2 or 4 — MRDS point data where present, no_data elsewhere
|
||
- `gflag_conf`: 2 or 4 — IGME5000 polygon where present, no_data elsewhere
|
||
- `occ_conf`: 4 (no_data) — stage 06 not yet run; occ_flag = 0
|
||
|
||
---
|
||
|
||
## 8. What This RFC Does Not Cover
|
||
|
||
| Topic | Where it belongs |
|
||
|---|---|
|
||
| Game queries and views | OTIVM game code |
|
||
| Occupation evidence detail | RFC-TESSERA-3.0-OCC-001 (adapted) |
|
||
| CIVICVS simulation state | RFC-CIVICVS stack |
|
||
| Online source fetch scripts | Pipeline implementation |
|
||
| OTIVM roadmap changes | docs/roadmap.md |
|
||
|
||
---
|
||
|
||
*RFC-TESSERA-4.0-001 Draft v0.2 — 2026-04-26*
|
||
*Status: Draft — pending project owner approval before implementation*
|
||
*Next action: project owner approves schema, seed extraction from tessera.db begins*
|