Revise RFC-TESSERA-4.0-001: integer normalization, h5_coverage completeness
This commit is contained in:
325
docs/RFC-TESSERA-4.0-001.md
Normal file
325
docs/RFC-TESSERA-4.0-001.md
Normal file
@@ -0,0 +1,325 @@
|
|||||||
|
# RFC-TESSERA-4.0-001
|
||||||
|
## Per-Hex Pipeline, Row Lifecycle, and Provenance Model
|
||||||
|
### Status: Draft v0.2
|
||||||
|
### Date: 2026-04-26
|
||||||
|
### Supersedes: RFC-TESSERA-2.0-001 (tile assembly), RFC-TESSERA-3.0-OCC-001 (occupation encoding)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Abstract
|
||||||
|
|
||||||
|
TESSERA 4.0 replaces the one-shot global pipeline model with a per-hex
|
||||||
|
pipeline that is resumable, updatable, and portable. Any H3 hex at any
|
||||||
|
resolution can be inserted, updated, superseded, or retired independently
|
||||||
|
of all others. The database is the product from day one — not a tile
|
||||||
|
archive assembled in a second pass.
|
||||||
|
|
||||||
|
This RFC defines:
|
||||||
|
- The `otivm.sqlite3` schema — integer-normalized, compact, TESSERA 4.0 format
|
||||||
|
- Row lifecycle states as a lookup table
|
||||||
|
- Per-field provenance as integer foreign keys
|
||||||
|
- H5 coverage completeness tracking — no rugged edges at hex boundaries
|
||||||
|
- The pipeline contract — what a pipeline run must produce to be valid
|
||||||
|
- The staging protocol — `staging_otivm.sqlite3` → `otivm.sqlite3`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Design Principles
|
||||||
|
|
||||||
|
### 1.1 The hex is the unit of work
|
||||||
|
|
||||||
|
One pipeline run processes one H3 hex. It fetches source data for that
|
||||||
|
hex, derives the field values, records provenance, and writes to the
|
||||||
|
staging database. No global operations. No dependency on adjacent hexes.
|
||||||
|
A run that fails leaves no partial state — the hex either has a complete
|
||||||
|
`draft` row or no row.
|
||||||
|
|
||||||
|
### 1.2 The database is the product
|
||||||
|
|
||||||
|
There is no tile archive. There is no intermediate sidecar format.
|
||||||
|
The SQLite database is written directly by the pipeline. The game reads
|
||||||
|
directly from the database. There is no assembly stage.
|
||||||
|
|
||||||
|
### 1.3 Rows are never deleted
|
||||||
|
|
||||||
|
A row is never deleted. It is superseded or retired. The full history
|
||||||
|
of every cell is queryable. When a new dataset version produces a better
|
||||||
|
value, the old row becomes `superseded` and the new row becomes `current`.
|
||||||
|
The game always reads only `current` rows.
|
||||||
|
|
||||||
|
### 1.4 Dataset versions evolve independently per field
|
||||||
|
|
||||||
|
GEBCO releases a new bathymetry dataset. Only the `elev_cm` field needs
|
||||||
|
refreshing — terrain, hydrology, and geology are unchanged. The pipeline
|
||||||
|
runs only for the affected field. Only the affected rows are superseded.
|
||||||
|
|
||||||
|
### 1.5 Academic credibility travels with the cell
|
||||||
|
|
||||||
|
Every field value carries a confidence grade as an integer FK. A cell
|
||||||
|
with IGME5000 geology coverage has `measured` or `indicated` confidence.
|
||||||
|
A cell outside IGME5000 coverage has `inferred` or `no_data`. This grade
|
||||||
|
is queryable, displayable, and affects how the game and simulator use
|
||||||
|
the data. It is not optional metadata — it is a first-class field.
|
||||||
|
|
||||||
|
### 1.6 No rugged edges at hex boundaries
|
||||||
|
|
||||||
|
The game reads only cells within H5 hexes that are marked complete in
|
||||||
|
`h5_coverage`. An H5 hex is complete when all its H9 cells are present
|
||||||
|
with `status = current`. Incomplete H5 hexes are staging — they are
|
||||||
|
never served to the game, preventing elevation seams at partially-fetched
|
||||||
|
boundaries.
|
||||||
|
|
||||||
|
### 1.7 The 32GB constraint is the reality check
|
||||||
|
|
||||||
|
The pipeline fetches and stores only what the game needs. Storage grows
|
||||||
|
proportionally to game coverage, not to Earth's surface area.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Lookup Tables
|
||||||
|
|
||||||
|
All enumerated values are stored as integers referencing these tables.
|
||||||
|
Written once at database creation and never modified.
|
||||||
|
|
||||||
|
### 2.1 `lifecycle_states`
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE lifecycle_states (
|
||||||
|
id INTEGER PRIMARY KEY,
|
||||||
|
name TEXT NOT NULL UNIQUE
|
||||||
|
);
|
||||||
|
|
||||||
|
INSERT INTO lifecycle_states VALUES
|
||||||
|
(1, 'draft'),
|
||||||
|
(2, 'current'),
|
||||||
|
(3, 'superseded'),
|
||||||
|
(4, 'retired');
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2.2 `confidence_grades`
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE confidence_grades (
|
||||||
|
id INTEGER PRIMARY KEY,
|
||||||
|
name TEXT NOT NULL UNIQUE,
|
||||||
|
description TEXT NOT NULL
|
||||||
|
);
|
||||||
|
|
||||||
|
INSERT INTO confidence_grades VALUES
|
||||||
|
(1, 'measured', 'Directly observed or instrumentally measured. Published dataset with explicit methodology.'),
|
||||||
|
(2, 'indicated', 'Recorded in registry or survey without direct measurement. Classification may be broad.'),
|
||||||
|
(3, 'inferred', 'Derived from landscape position, proximity to measured cells, or modelled from adjacent data.'),
|
||||||
|
(4, 'no_data', 'Source dataset has no coverage for this cell. Field value is a known placeholder.');
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2.3 `source_registry`
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE source_registry (
|
||||||
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
source_key TEXT NOT NULL UNIQUE, -- e.g. 'GEBCO_2025'
|
||||||
|
source_name TEXT NOT NULL,
|
||||||
|
source_url TEXT,
|
||||||
|
version TEXT NOT NULL,
|
||||||
|
license TEXT,
|
||||||
|
citation TEXT,
|
||||||
|
registered_at TEXT NOT NULL -- ISO 8601 UTC
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Core Tables
|
||||||
|
|
||||||
|
### 3.1 `pipeline_runs`
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE pipeline_runs (
|
||||||
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
run_key TEXT NOT NULL UNIQUE, -- human label, e.g. 'tessera3-seed-2026-04-26'
|
||||||
|
started_at TEXT NOT NULL, -- ISO 8601 UTC
|
||||||
|
completed_at TEXT, -- NULL while running
|
||||||
|
status INTEGER NOT NULL REFERENCES lifecycle_states(id),
|
||||||
|
h5_cells TEXT NOT NULL, -- JSON array of H3 res-5 integer IDs
|
||||||
|
fields_updated TEXT NOT NULL, -- JSON array of field names
|
||||||
|
source_versions TEXT NOT NULL, -- JSON object: {source_key: version}
|
||||||
|
row_count INTEGER, -- NULL while running
|
||||||
|
notes TEXT
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3.2 `tessera_cells` — one row per H9 cell per pipeline run
|
||||||
|
|
||||||
|
H3 cell IDs are stored as INTEGER (64-bit H3 index) not TEXT.
|
||||||
|
Use `h3.cellToIndex()` / `h3.indexToCell()` for conversion.
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE tessera_cells (
|
||||||
|
-- Identity
|
||||||
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
|
h9 INTEGER NOT NULL, -- H3 res-9 index (64-bit)
|
||||||
|
h7 INTEGER NOT NULL, -- H3 res-7 parent index
|
||||||
|
h5 INTEGER NOT NULL, -- H3 res-5 grandparent index (waypoint)
|
||||||
|
lat REAL NOT NULL, -- H9 centroid latitude
|
||||||
|
lon REAL NOT NULL, -- H9 centroid longitude
|
||||||
|
|
||||||
|
-- Physical fields (RFC-TESSERA-2.0-001 byte layout preserved)
|
||||||
|
elev_cm INTEGER, -- Elevation in cm, signed 24-bit range
|
||||||
|
terrain INTEGER, -- Appendix A terrain code
|
||||||
|
hydro INTEGER, -- Section 3.3 hydrology code
|
||||||
|
geo_dep INTEGER, -- Section 3.4 deposit code
|
||||||
|
geo_flag INTEGER, -- Section 3.5 geology flag code
|
||||||
|
occ_flag INTEGER, -- RFC-TESSERA-3.0-OCC-001 Section 2 code
|
||||||
|
|
||||||
|
-- Provenance per field (source FK + confidence FK)
|
||||||
|
elev_src INTEGER REFERENCES source_registry(id),
|
||||||
|
elev_conf INTEGER REFERENCES confidence_grades(id),
|
||||||
|
terr_src INTEGER REFERENCES source_registry(id),
|
||||||
|
terr_conf INTEGER REFERENCES confidence_grades(id),
|
||||||
|
hydro_src INTEGER REFERENCES source_registry(id),
|
||||||
|
hydro_conf INTEGER REFERENCES confidence_grades(id),
|
||||||
|
gdep_src INTEGER REFERENCES source_registry(id),
|
||||||
|
gdep_conf INTEGER REFERENCES confidence_grades(id),
|
||||||
|
gflag_src INTEGER REFERENCES source_registry(id),
|
||||||
|
gflag_conf INTEGER REFERENCES confidence_grades(id),
|
||||||
|
occ_src INTEGER REFERENCES source_registry(id),
|
||||||
|
occ_conf INTEGER REFERENCES confidence_grades(id),
|
||||||
|
|
||||||
|
-- Lifecycle
|
||||||
|
status INTEGER NOT NULL DEFAULT 1
|
||||||
|
REFERENCES lifecycle_states(id),
|
||||||
|
run_id INTEGER NOT NULL REFERENCES pipeline_runs(id),
|
||||||
|
created_at TEXT NOT NULL, -- ISO 8601 UTC
|
||||||
|
superseded_by INTEGER REFERENCES tessera_cells(id),
|
||||||
|
retired_reason TEXT
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE INDEX idx_cells_h9_status ON tessera_cells(h9, status);
|
||||||
|
CREATE INDEX idx_cells_h5_status ON tessera_cells(h5, status);
|
||||||
|
CREATE INDEX idx_cells_h7_status ON tessera_cells(h7, status);
|
||||||
|
CREATE INDEX idx_cells_run ON tessera_cells(run_id);
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3.3 `h5_coverage` — H5 completeness tracking
|
||||||
|
|
||||||
|
An H5 hex is `complete` (status=2) when all its H9 cells are present
|
||||||
|
with `status = 2` (current). The game reads only cells in complete H5
|
||||||
|
hexes. This prevents elevation seams at partially-fetched boundaries.
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE h5_coverage (
|
||||||
|
h5 INTEGER PRIMARY KEY, -- H3 res-5 index
|
||||||
|
status INTEGER NOT NULL REFERENCES lifecycle_states(id),
|
||||||
|
-- 1=draft (in progress), 2=current (complete), 4=retired
|
||||||
|
h9_total INTEGER NOT NULL, -- Expected H9 count (typically 2401)
|
||||||
|
h9_current INTEGER NOT NULL DEFAULT 0,
|
||||||
|
last_updated TEXT NOT NULL, -- ISO 8601 UTC
|
||||||
|
run_id INTEGER NOT NULL REFERENCES pipeline_runs(id),
|
||||||
|
notes TEXT
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
A pipeline step increments `h9_current` on each H9 promotion and sets
|
||||||
|
`status = 2` when `h9_current = h9_total`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Row Lifecycle
|
||||||
|
|
||||||
|
```
|
||||||
|
[pipeline run] → status 1 (draft)
|
||||||
|
[validation pass] → status 2 (current)
|
||||||
|
[new pipeline run for same cell] → old row: status 3 (superseded), new row: status 2 (current)
|
||||||
|
[data error confirmed] → status 4 (retired, with reason)
|
||||||
|
```
|
||||||
|
|
||||||
|
The game's canonical query:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT tc.*
|
||||||
|
FROM tessera_cells tc
|
||||||
|
JOIN h5_coverage h5c ON tc.h5 = h5c.h5
|
||||||
|
WHERE h5c.status = 2 -- complete H5 hexes only
|
||||||
|
AND tc.status = 2 -- current rows only
|
||||||
|
AND tc.h5 = ? -- specific waypoint
|
||||||
|
```
|
||||||
|
|
||||||
|
This compound filter guarantees no boundary seams and no stale data.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Pipeline Contract
|
||||||
|
|
||||||
|
A pipeline run is valid if and only if:
|
||||||
|
|
||||||
|
1. Writes a `pipeline_runs` row with `status = 1` before any cells
|
||||||
|
2. All source datasets used are in `source_registry` before the run starts
|
||||||
|
3. Every `tessera_cells` row has `status = 1`, correct `run_id`, and
|
||||||
|
non-null provenance FKs for every field written
|
||||||
|
4. On completion: updates `pipeline_runs` to `status = 2`, sets `row_count`
|
||||||
|
5. On failure: updates `pipeline_runs` to `status = 4` — draft rows from
|
||||||
|
this run remain draft and are invisible to the game
|
||||||
|
|
||||||
|
Promotion (draft → current) is a separate explicit validation step:
|
||||||
|
- Verify row count matches expected H9 count for the H5 hex
|
||||||
|
- Update `tessera_cells.status` 1 → 2 for the run's rows
|
||||||
|
- Mark previous current rows for the same H9 cells as `status = 3`
|
||||||
|
- Update `h5_coverage` accordingly
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Staging Protocol
|
||||||
|
|
||||||
|
**`staging_otivm.sqlite3`** — pipeline writes here. Identical schema.
|
||||||
|
New hexes are fetched, processed, and validated here before production.
|
||||||
|
Can be deleted and rebuilt without affecting the game.
|
||||||
|
|
||||||
|
**`otivm.sqlite3`** — production. Game reads only `current` rows within
|
||||||
|
`complete` H5 hexes. Promotion from staging is explicit and never automatic.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Seed Data — TESSERA 3.0 → TESSERA 4.0
|
||||||
|
|
||||||
|
Source: `tessera.db` (TESSERA 3.0, SpatiaLite, 158GB, Dell SSD).
|
||||||
|
Seed run: `run_key = 'tessera3-seed-2026-04-26'`
|
||||||
|
|
||||||
|
The five OTIVM launch H5 waypoints:
|
||||||
|
|
||||||
|
| City | H5 (TEXT) | H5 (INTEGER) |
|
||||||
|
|---|---|---|
|
||||||
|
| Ostia | `851e805bfffffff` | resolved at extraction |
|
||||||
|
| Capua | `851e8333fffffff` | resolved at extraction |
|
||||||
|
| Brundisium | `851e8ba3fffffff` | resolved at extraction |
|
||||||
|
| Carthago | `85386e23fffffff` | resolved at extraction |
|
||||||
|
| Alexandria | `853f5ba7fffffff` | resolved at extraction |
|
||||||
|
|
||||||
|
Each H5 contains 343 H9 cells (7 H7 × 49 H9 per H7). Total seed rows:
|
||||||
|
5 × 343 = 1,715 H9 cells minimum. Adjacent H5 cells along trade routes
|
||||||
|
may also be seeded to prevent boundary seams in route rendering.
|
||||||
|
|
||||||
|
Seed confidence grades:
|
||||||
|
- `elev_conf`: 2 (indicated) — GEBCO 2025, direct sample
|
||||||
|
- `terr_conf`: 2 (indicated) — ESA WorldCover v200
|
||||||
|
- `hydro_conf`: 2 (indicated) — HydroSHEDS v1.1
|
||||||
|
- `gdep_conf`: 2 or 4 — MRDS point data where present, no_data elsewhere
|
||||||
|
- `gflag_conf`: 2 or 4 — IGME5000 polygon where present, no_data elsewhere
|
||||||
|
- `occ_conf`: 4 (no_data) — stage 06 not yet run; occ_flag = 0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. What This RFC Does Not Cover
|
||||||
|
|
||||||
|
| Topic | Where it belongs |
|
||||||
|
|---|---|
|
||||||
|
| Game queries and views | OTIVM game code |
|
||||||
|
| Occupation evidence detail | RFC-TESSERA-3.0-OCC-001 (adapted) |
|
||||||
|
| CIVICVS simulation state | RFC-CIVICVS stack |
|
||||||
|
| Online source fetch scripts | Pipeline implementation |
|
||||||
|
| OTIVM roadmap changes | docs/roadmap.md |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*RFC-TESSERA-4.0-001 Draft v0.2 — 2026-04-26*
|
||||||
|
*Status: Draft — pending project owner approval before implementation*
|
||||||
|
*Next action: project owner approves schema, seed extraction from tessera.db begins*
|
||||||
Reference in New Issue
Block a user