Files
otivm/docs/RFC-TESSERA-4.0-001.md

12 KiB
Raw Permalink Blame History

RFC-TESSERA-4.0-001

Per-Hex Pipeline, Row Lifecycle, and Provenance Model

Status: Draft v0.2

Date: 2026-04-26

Supersedes: RFC-TESSERA-2.0-001 (tile assembly), RFC-TESSERA-3.0-OCC-001 (occupation encoding)


Abstract

TESSERA 4.0 replaces the one-shot global pipeline model with a per-hex pipeline that is resumable, updatable, and portable. Any H3 hex at any resolution can be inserted, updated, superseded, or retired independently of all others. The database is the product from day one — not a tile archive assembled in a second pass.

This RFC defines:

  • The otivm.sqlite3 schema — integer-normalized, compact, TESSERA 4.0 format
  • Row lifecycle states as a lookup table
  • Per-field provenance as integer foreign keys
  • H5 coverage completeness tracking — no rugged edges at hex boundaries
  • The pipeline contract — what a pipeline run must produce to be valid
  • The staging protocol — staging_otivm.sqlite3otivm.sqlite3

1. Design Principles

1.1 The hex is the unit of work

One pipeline run processes one H3 hex. It fetches source data for that hex, derives the field values, records provenance, and writes to the staging database. No global operations. No dependency on adjacent hexes. A run that fails leaves no partial state — the hex either has a complete draft row or no row.

1.2 The database is the product

There is no tile archive. There is no intermediate sidecar format. The SQLite database is written directly by the pipeline. The game reads directly from the database. There is no assembly stage.

1.3 Rows are never deleted

A row is never deleted. It is superseded or retired. The full history of every cell is queryable. When a new dataset version produces a better value, the old row becomes superseded and the new row becomes current. The game always reads only current rows.

1.4 Dataset versions evolve independently per field

GEBCO releases a new bathymetry dataset. Only the elev_cm field needs refreshing — terrain, hydrology, and geology are unchanged. The pipeline runs only for the affected field. Only the affected rows are superseded.

1.5 Academic credibility travels with the cell

Every field value carries a confidence grade as an integer FK. A cell with IGME5000 geology coverage has measured or indicated confidence. A cell outside IGME5000 coverage has inferred or no_data. This grade is queryable, displayable, and affects how the game and simulator use the data. It is not optional metadata — it is a first-class field.

1.6 No rugged edges at hex boundaries

The game reads only cells within H5 hexes that are marked complete in h5_coverage. An H5 hex is complete when all its H9 cells are present with status = current. Incomplete H5 hexes are staging — they are never served to the game, preventing elevation seams at partially-fetched boundaries.

1.7 The 32GB constraint is the reality check

The pipeline fetches and stores only what the game needs. Storage grows proportionally to game coverage, not to Earth's surface area.


2. Lookup Tables

All enumerated values are stored as integers referencing these tables. Written once at database creation and never modified.

2.1 lifecycle_states

CREATE TABLE lifecycle_states (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);

INSERT INTO lifecycle_states VALUES
    (1, 'draft'),
    (2, 'current'),
    (3, 'superseded'),
    (4, 'retired');

2.2 confidence_grades

CREATE TABLE confidence_grades (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    description TEXT NOT NULL
);

INSERT INTO confidence_grades VALUES
    (1, 'measured',  'Directly observed or instrumentally measured. Published dataset with explicit methodology.'),
    (2, 'indicated', 'Recorded in registry or survey without direct measurement. Classification may be broad.'),
    (3, 'inferred',  'Derived from landscape position, proximity to measured cells, or modelled from adjacent data.'),
    (4, 'no_data',   'Source dataset has no coverage for this cell. Field value is a known placeholder.');

2.3 source_registry

CREATE TABLE source_registry (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    source_key    TEXT NOT NULL UNIQUE,  -- e.g. 'GEBCO_2025'
    source_name   TEXT NOT NULL,
    source_url    TEXT,
    version       TEXT NOT NULL,
    license       TEXT,
    citation      TEXT,
    registered_at TEXT NOT NULL          -- ISO 8601 UTC
);

3. Core Tables

3.1 pipeline_runs

CREATE TABLE pipeline_runs (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    run_key         TEXT NOT NULL UNIQUE,  -- human label, e.g. 'tessera3-seed-2026-04-26'
    started_at      TEXT NOT NULL,         -- ISO 8601 UTC
    completed_at    TEXT,                  -- NULL while running
    status          INTEGER NOT NULL REFERENCES lifecycle_states(id),
    h5_cells        TEXT NOT NULL,         -- JSON array of H3 res-5 integer IDs
    fields_updated  TEXT NOT NULL,         -- JSON array of field names
    source_versions TEXT NOT NULL,         -- JSON object: {source_key: version}
    row_count       INTEGER,               -- NULL while running
    notes           TEXT
);

3.2 tessera_cells — one row per H9 cell per pipeline run

H3 cell IDs are stored as INTEGER (64-bit H3 index) not TEXT. Use h3.cellToIndex() / h3.indexToCell() for conversion.

CREATE TABLE tessera_cells (
    -- Identity
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    h9          INTEGER NOT NULL,   -- H3 res-9 index (64-bit)
    h7          INTEGER NOT NULL,   -- H3 res-7 parent index
    h5          INTEGER NOT NULL,   -- H3 res-5 grandparent index (waypoint)
    lat         REAL    NOT NULL,   -- H9 centroid latitude
    lon         REAL    NOT NULL,   -- H9 centroid longitude

    -- Physical fields (RFC-TESSERA-2.0-001 byte layout preserved)
    elev_cm     INTEGER,            -- Elevation in cm, signed 24-bit range
    terrain     INTEGER,            -- Appendix A terrain code
    hydro       INTEGER,            -- Section 3.3 hydrology code
    geo_dep     INTEGER,            -- Section 3.4 deposit code
    geo_flag    INTEGER,            -- Section 3.5 geology flag code
    occ_flag    INTEGER,            -- RFC-TESSERA-3.0-OCC-001 Section 2 code

    -- Provenance per field (source FK + confidence FK)
    elev_src    INTEGER REFERENCES source_registry(id),
    elev_conf   INTEGER REFERENCES confidence_grades(id),
    terr_src    INTEGER REFERENCES source_registry(id),
    terr_conf   INTEGER REFERENCES confidence_grades(id),
    hydro_src   INTEGER REFERENCES source_registry(id),
    hydro_conf  INTEGER REFERENCES confidence_grades(id),
    gdep_src    INTEGER REFERENCES source_registry(id),
    gdep_conf   INTEGER REFERENCES confidence_grades(id),
    gflag_src   INTEGER REFERENCES source_registry(id),
    gflag_conf  INTEGER REFERENCES confidence_grades(id),
    occ_src     INTEGER REFERENCES source_registry(id),
    occ_conf    INTEGER REFERENCES confidence_grades(id),

    -- Lifecycle
    status          INTEGER NOT NULL DEFAULT 1
                            REFERENCES lifecycle_states(id),
    run_id          INTEGER NOT NULL REFERENCES pipeline_runs(id),
    created_at      TEXT    NOT NULL,   -- ISO 8601 UTC
    superseded_by   INTEGER             REFERENCES tessera_cells(id),
    retired_reason  TEXT
);

CREATE INDEX idx_cells_h9_status ON tessera_cells(h9, status);
CREATE INDEX idx_cells_h5_status ON tessera_cells(h5, status);
CREATE INDEX idx_cells_h7_status ON tessera_cells(h7, status);
CREATE INDEX idx_cells_run       ON tessera_cells(run_id);

3.3 h5_coverage — H5 completeness tracking

An H5 hex is complete (status=2) when all its H9 cells are present with status = 2 (current). The game reads only cells in complete H5 hexes. This prevents elevation seams at partially-fetched boundaries.

CREATE TABLE h5_coverage (
    h5              INTEGER PRIMARY KEY,  -- H3 res-5 index
    status          INTEGER NOT NULL REFERENCES lifecycle_states(id),
                    -- 1=draft (in progress), 2=current (complete), 4=retired
    h9_total        INTEGER NOT NULL,     -- Expected H9 count (typically 2401)
    h9_current      INTEGER NOT NULL DEFAULT 0,
    last_updated    TEXT NOT NULL,        -- ISO 8601 UTC
    run_id          INTEGER NOT NULL REFERENCES pipeline_runs(id),
    notes           TEXT
);

A pipeline step increments h9_current on each H9 promotion and sets status = 2 when h9_current = h9_total.


4. Row Lifecycle

[pipeline run] → status 1 (draft)
[validation pass] → status 2 (current)
[new pipeline run for same cell] → old row: status 3 (superseded), new row: status 2 (current)
[data error confirmed] → status 4 (retired, with reason)

The game's canonical query:

SELECT tc.*
FROM tessera_cells tc
JOIN h5_coverage h5c ON tc.h5 = h5c.h5
WHERE h5c.status = 2   -- complete H5 hexes only
  AND tc.status = 2    -- current rows only
  AND tc.h5 = ?        -- specific waypoint

This compound filter guarantees no boundary seams and no stale data.


5. Pipeline Contract

A pipeline run is valid if and only if:

  1. Writes a pipeline_runs row with status = 1 before any cells
  2. All source datasets used are in source_registry before the run starts
  3. Every tessera_cells row has status = 1, correct run_id, and non-null provenance FKs for every field written
  4. On completion: updates pipeline_runs to status = 2, sets row_count
  5. On failure: updates pipeline_runs to status = 4 — draft rows from this run remain draft and are invisible to the game

Promotion (draft → current) is a separate explicit validation step:

  • Verify row count matches expected H9 count for the H5 hex
  • Update tessera_cells.status 1 → 2 for the run's rows
  • Mark previous current rows for the same H9 cells as status = 3
  • Update h5_coverage accordingly

6. Staging Protocol

staging_otivm.sqlite3 — pipeline writes here. Identical schema. New hexes are fetched, processed, and validated here before production. Can be deleted and rebuilt without affecting the game.

otivm.sqlite3 — production. Game reads only current rows within complete H5 hexes. Promotion from staging is explicit and never automatic.


7. Seed Data — TESSERA 3.0 → TESSERA 4.0

Source: tessera.db (TESSERA 3.0, SpatiaLite, 158GB, Dell SSD). Seed run: run_key = 'tessera3-seed-2026-04-26'

The five OTIVM launch H5 waypoints:

City H5 (TEXT) H5 (INTEGER)
Ostia 851e805bfffffff resolved at extraction
Capua 851e8333fffffff resolved at extraction
Brundisium 851e8ba3fffffff resolved at extraction
Carthago 85386e23fffffff resolved at extraction
Alexandria 853f5ba7fffffff resolved at extraction

Each H5 contains 343 H9 cells (7 H7 × 49 H9 per H7). Total seed rows: 5 × 343 = 1,715 H9 cells minimum. Adjacent H5 cells along trade routes may also be seeded to prevent boundary seams in route rendering.

Seed confidence grades:

  • elev_conf: 2 (indicated) — GEBCO 2025, direct sample
  • terr_conf: 2 (indicated) — ESA WorldCover v200
  • hydro_conf: 2 (indicated) — HydroSHEDS v1.1
  • gdep_conf: 2 or 4 — MRDS point data where present, no_data elsewhere
  • gflag_conf: 2 or 4 — IGME5000 polygon where present, no_data elsewhere
  • occ_conf: 4 (no_data) — stage 06 not yet run; occ_flag = 0

8. What This RFC Does Not Cover

Topic Where it belongs
Game queries and views OTIVM game code
Occupation evidence detail RFC-TESSERA-3.0-OCC-001 (adapted)
CIVICVS simulation state RFC-CIVICVS stack
Online source fetch scripts Pipeline implementation
OTIVM roadmap changes docs/roadmap.md

RFC-TESSERA-4.0-001 Draft v0.2 — 2026-04-26 Status: Draft — pending project owner approval before implementation Next action: project owner approves schema, seed extraction from tessera.db begins