TESSERA Pipeline Registry

Date: 2026-04-26

Author: Claude Sonnet 4.6 — written with full session context

Status: Normative reference for all pipeline work


What this document is

A single authoritative reference for every pipeline stage — what it does, what source it reads, what it writes, where its output lives, and what its current status is. Written by the assistant that ran the pipeline end-to-end. Read this before touching any pipeline script.


The 8-byte cell record (RFC-TESSERA-2.0-001)

Every H9 cell in tessera.db is described by 8 bytes:

Byte 0-2: elev_cm    — elevation in cm, signed 24-bit
Byte 3:   terrain    — RFC-TESSERA-2.0-001 Appendix A terrain code
Byte 4:   hydro      — RFC-TESSERA-2.0-001 Section 3.3 hydrology code
Byte 5:   geo_dep    — RFC-TESSERA-2.0-001 Section 3.4 deposit code
Byte 6:   geo_flag   — RFC-TESSERA-2.0-001 Section 3.5 geology flag code
Byte 7:   occ_flag   — RFC-TESSERA-3.0-OCC-001 Section 2 occupation code

In otivm.sqlite3 (TESSERA 4.0), these are stored as separate INTEGER columns with the same names, plus per-field provenance FKs.
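
The layout above can be exercised with a short round-trip sketch. The registry does not state the byte order of the 24-bit elevation, so big-endian is an assumption here, and `Cell`, `pack_cell` and `unpack_cell` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    elev_cm: int   # signed 24-bit, centimetres (bytes 0-2)
    terrain: int   # byte 3
    hydro: int     # byte 4
    geo_dep: int   # byte 5
    geo_flag: int  # byte 6
    occ_flag: int  # byte 7


def pack_cell(cell: Cell) -> bytes:
    """Pack one cell into 8 bytes. ASSUMPTION: elev_cm is big-endian."""
    elev = cell.elev_cm.to_bytes(3, "big", signed=True)
    return elev + bytes(cell[1:])


def unpack_cell(record: bytes) -> Cell:
    """Inverse of pack_cell; rejects anything that is not exactly 8 bytes."""
    if len(record) != 8:
        raise ValueError("a cell record is exactly 8 bytes")
    elev_cm = int.from_bytes(record[:3], "big", signed=True)
    return Cell(elev_cm, *record[3:])
```

For example, `unpack_cell(pack_cell(Cell(-4200, 10, 0, 0xFF, 0x00, 0x00)))` returns the same cell, including the negative (below sea level) elevation.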


Scale

  • Interaction sphere: 15°N to 72°N, 15°W to 75°E
  • H7 tiles: 8,591,961
  • H9 cells: 421,006,081
  • Primary resolution: H9 (~180m diameter)
  • Tile unit: H7 (~5km, contains 49 H9 cells)
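
The two counts above can be cross-checked with a little arithmetic. The sketch assumes the grid is H3 (the tile index quoted later in this registry has the shape of an H3 resolution-7 index), where each hexagon has 7 children per resolution step.

```python
H7_TILES = 8_591_961
H9_CELLS = 421_006_081
CHILDREN_PER_HEXAGON = 7 ** 2  # two resolution steps: 49 H9 cells per hexagonal H7 tile

# How far the listed H9 count falls short of 49 cells for every tile.
shortfall = H7_TILES * CHILDREN_PER_HEXAGON - H9_CELLS
print(shortfall)  # 8
```

A shortfall of 8 is what a single pentagonal H7 tile would produce, since an H3 pentagon has 41 descendants two resolutions down rather than 49. That reading is an inference from the numbers, not something the pipeline notes confirm.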

Stage 00 — Elevation

| Property | Value |
| --- | --- |
| Script | build_tessera_db.py (integrated) |
| Source | GEBCO 2025 Grid: global 15 arc-second bathymetry/topography |
| Source URL | https://www.gebco.net/data_and_products/gridded_bathymetry_data/ |
| License | CC-BY 4.0 |
| Output field | elev_cm (bytes 0-2) |
| Output file | /mnt/tessera-tiles/{h7}/tile_values.bin.gz |
| Fingerprint | per-tile SHA-256 |
| Status | COMPLETE: all 8,591,961 H7 tiles |
| Notes | GEBCO is a modern dataset (2025). Elevation reflects current sea level. Doggerland cells are ocean in this dataset; they will require palaeoDEM correction in a future stage (RFC-TESSERA-3.0-PALEO-001, not yet written). |
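
A per-tile SHA-256 fingerprint can be recomputed with the standard library. This is a minimal sketch: `tile_sha256` is a hypothetical helper, and hashing the compressed file exactly as stored is an assumption, since the registry does not say whether the digest covers compressed or decompressed bytes.

```python
import hashlib
from pathlib import Path


def tile_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading only is compatible with the read-only USB rule; nothing in this helper writes to the tile archive.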

Stage 01 — Terrain

| Property | Value |
| --- | --- |
| Script | 01_sample_terrain.py |
| Source | ESA WorldCover 2021 v200: global 10m land cover classification |
| Source URL | https://esa-worldcover.org/ |
| License | CC-BY 4.0 |
| Fingerprint | ac7f5d74a006d248 |
| Output field | terrain (byte 3) |
| Output file | /mnt/tessera-scratch/terrain/{h7}/terrain_values.bin.gz |
| Magic | b'TES\x01' |
| Status | COMPLETE: all H7 tiles |
| Notes | Modern land cover, not Mesolithic. Forest, wetland, urban classifications reflect 2021 conditions. Mesolithic terrain correction is a future RFC (RFC-TESSERA-3.0-PALEO-001). The dataset is the ground truth for current physical terrain; simulation layers apply temporal corrections on top. |
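
Sidecar files carry a 4-byte magic value, which gives a cheap sanity check before a file is parsed. The sketch below assumes the magic is the first four bytes of the decompressed stream; the registry lists the values but not their position, and `check_magic` is a hypothetical helper.

```python
import gzip
from pathlib import Path

# Magic values as listed in this registry, keyed by an illustrative stage label.
STAGE_MAGIC = {
    "terrain": b"TES\x01",
    "hydrology": b"TES\x02",
    "geology_flag": b"TES\x04",
    "geology_dep": b"TES\x05",
    "final": b"TES2",
}


def check_magic(path: Path, stage: str) -> bool:
    """True if the decompressed file starts with the stage's 4-byte magic.

    ASSUMPTION: the magic sits at offset 0 of the decompressed data.
    """
    with gzip.open(path, "rb") as fh:
        return fh.read(4) == STAGE_MAGIC[stage]
```

A mismatch usually means the wrong sidecar directory was opened, which is worth catching before any byte offsets are interpreted.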

Stage 02 — Hydrology

| Property | Value |
| --- | --- |
| Script | 02_sample_hydrology.py |
| Source | HydroSHEDS v1.1: flow direction and accumulation at 15 arc-second |
| Source URL | https://www.hydrosheds.org/ |
| License | CC-BY 4.0 |
| Fingerprint | dcf6460a2bc0ebb5 |
| Output field | hydro (byte 4) |
| Output file | /mnt/tessera-scratch/hydrology/{h7}/hydrology_values.bin.gz |
| Magic | b'TES\x02' |
| Status | COMPLETE: all H7 tiles |
| Notes | One cross-sidecar correction applied in stage 03: where WorldCover identifies a lake or river but HydroSHEDS has no water body type (WB_NONE), the terrain sidecar overrides. HydroSHEDS v2.0 expected October 2026; review then. |
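
The cross-sidecar correction amounts to a small rule per cell. In the sketch below only the name `WB_NONE` comes from this registry; every numeric code, the `WB_FROM_TERRAIN` constant and the `corrected_hydro` function are assumptions for illustration, since the actual code tables live in RFC-TESSERA-2.0-001.

```python
# ASSUMPTION: all numeric values here are placeholders, not the RFC's codes.
WB_NONE = 0x00          # hydro: no water body type (name from the registry)
WB_FROM_TERRAIN = 0x01  # hypothetical hydro code written by the override
TERRAIN_WATER = {80}    # WorldCover class 80 is permanent water bodies; whether
                        # the terrain byte stores that raw value is an assumption


def corrected_hydro(terrain: int, hydro: int) -> int:
    """Override hydro only where terrain says water and hydro has no water body."""
    if hydro == WB_NONE and terrain in TERRAIN_WATER:
        return WB_FROM_TERRAIN
    return hydro
```

The rule is deliberately one-directional: a cell that already has a HydroSHEDS water body type is never changed by the terrain sidecar.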

Stage 03 — Tile Assembly

| Property | Value |
| --- | --- |
| Script | 03_assemble_tiles.py |
| Source | Stages 00 + 01 + 02 sidecars |
| Output field | bytes 0-4 (all physical fields except geology and occupation) |
| Output file | /mnt/tessera-tiles/{h7}/tile_values_final.bin.gz |
| Magic | b'TES2' |
| Status | COMPLETE: all H7 tiles |
| Notes | Bytes 5-6 (geo_dep, geo_flag) written as placeholders: byte 5 = 0xFF (NO_DEPOSIT), byte 6 = 0x00. Byte 7 (occ_flag) = 0x00. These placeholders were later updated in tessera.db by stage 05 for cells where geology data exists. The tile archive on USB still has placeholder bytes 5-6 for most tiles; the authoritative values are in tessera.db. |

Stage 04a — Geology Flag

| Property | Value |
| --- | --- |
| Script | 04a_sample_igme5000.py |
| Source | BGR IGME 5000: 1:5M International Geological Map of Europe, layer 23 |
| Source URL | https://services.bgr.de/arcgis/rest/services/geologie/igme5000/MapServer/23 |
| License | Geonutz 2013 (open, no registration) |
| Citation | Datenquelle: IGME5000, (c) BGR Hannover, 2007 |
| Fingerprint | 97448797fc4e3e31 |
| Output field | geo_flag (byte 6) |
| Output file | /mnt/tessera-scratch/geology_flag/{h7}/geology_flag_values.bin.gz |
| Magic | b'TES\x04' |
| Status | COMPLETE: all H7 tiles |
| Notes | Bit layout: bits 5-4 = rock class (00=superficial, 01=sedimentary, 10=metamorphic, 11=igneous), bits 3-2 = confidence (00=no_data, 01=inferred, 10=indicated, 11=measured). Coverage gaps outside European shelf return 0x00 (no_data). Method: H5 bounding box query → shapely point-in-polygon for H9 centroids. v2 of this script (geometry-based) replaced v1 (per-H9-centroid API query) to avoid 421M API calls. |
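
The geo_flag bit layout can be decoded with two shifts and masks. This is a sketch under the layout stated in the notes; bits 7-6 and 1-0 are not described there, so the hypothetical `decode_geo_flag` helper simply ignores them.

```python
ROCK_CLASS = {0b00: "superficial", 0b01: "sedimentary", 0b10: "metamorphic", 0b11: "igneous"}
CONFIDENCE = {0b00: "no_data", 0b01: "inferred", 0b10: "indicated", 0b11: "measured"}


def decode_geo_flag(flag: int) -> tuple[str, str]:
    """Split a geo_flag byte into (rock class, confidence)."""
    rock = (flag >> 4) & 0b11        # bits 5-4
    confidence = (flag >> 2) & 0b11  # bits 3-2
    return ROCK_CLASS[rock], CONFIDENCE[confidence]
```

Note that `decode_geo_flag(0x00)` yields `("superficial", "no_data")`: for a coverage-gap cell the rock class bits carry no information, so callers should test the confidence field before trusting the rock class.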

Stage 04b — Geology Deposit

| Property | Value |
| --- | --- |
| Script | 04b_sample_mrds.py |
| Source | USGS MRDS: Mineral Resources Data System, mrds.csv downloaded 2022-08-23 |
| Source URL | https://mrdata.usgs.gov/mrds/ |
| DOI | 10.3133/ds52 |
| License | USGS public domain |
| Fingerprint | ebf10a548e617164 |
| Output field | geo_dep (byte 5) |
| Output file | /mnt/tessera-scratch/geology_dep/{h7}/geology_dep_values.bin.gz |
| Magic | b'TES\x05' |
| Status | COMPLETE: all H7 tiles |
| Notes | Commodity codes in mrds_commodity_map.yaml. Only the highest-priority deposit per H9 cell is encoded. European coverage is uneven; MRDS systematic updates ceased 2011. Almadén mercury mine: RESOLVED 2026-04-18. MRDS coordinates are ~34km from actual mine due to MRDS data quality, not a pipeline error. Deposit correctly encoded as Mercury (0x1d) in H7 87390e4d9ffffff. |
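
Encoding only the highest-priority deposit per cell reduces to picking a minimum over a rank table. The sketch below is illustrative: the Mercury code and the 0xFF placeholder come from this registry, while the `PRIORITY` ranks, the other two codes and the `pick_deposit` function are hypothetical stand-ins for whatever mrds_commodity_map.yaml actually defines.

```python
NO_DEPOSIT = 0xFF  # placeholder value named in the stage 03 notes
MERCURY = 0x1D     # code given in this registry

# HYPOTHETICAL ranks (lower wins). The real ordering would come from
# mrds_commodity_map.yaml, whose contents are not reproduced here.
PRIORITY = {MERCURY: 0, 0x02: 1, 0x03: 2}


def pick_deposit(codes: list[int]) -> int:
    """Return the single code to store for a cell, or NO_DEPOSIT if none is known."""
    known = [c for c in codes if c in PRIORITY]
    if not known:
        return NO_DEPOSIT
    return min(known, key=PRIORITY.__getitem__)
```

Under these made-up ranks, a cell containing both `0x03` and Mercury is stored as `0x1D`, and a cell with no mapped commodity keeps the placeholder.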

Stage 05 — Geology Assembly into tessera.db

| Property | Value |
| --- | --- |
| Script | 05_assemble_geology.py (v5, bulk load approach) |
| Source | Stage 03 tile archive + stages 04a + 04b sidecars (all USB, read-only) |
| Target | tessera.db: UPDATE tessera_cells SET geo_dep=?, geo_flag=? |
| Status | PARTIALLY COMPLETE |
| Notes | See below. |
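
The batch UPDATE pattern named in the target row looks roughly like this with the standard `sqlite3` module. The sketch runs against an in-memory database only; the `h9` key column and the `WHERE` clause are assumptions, because the registry quotes just the `SET` part of the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in; never point this at the real tessera.db
conn.execute(
    "CREATE TABLE tessera_cells (h9 TEXT PRIMARY KEY, geo_dep INTEGER, geo_flag INTEGER)"
)
# Seed three cells with the stage 03 placeholders (geo_dep=255, geo_flag=0).
conn.executemany(
    "INSERT INTO tessera_cells VALUES (?, 255, 0)",
    [("cell-a",), ("cell-b",), ("cell-c",)],
)

# (geo_dep, geo_flag, h9) rows as they might come out of a staging table.
staged = [(0x1D, 0x18, "cell-a"), (0xFF, 0x28, "cell-b")]

with conn:  # one transaction per batch: commit on success, roll back on error
    conn.executemany(
        "UPDATE tessera_cells SET geo_dep=?, geo_flag=? WHERE h9=?", staged
    )

rows = conn.execute(
    "SELECT h9, geo_dep, geo_flag FROM tessera_cells ORDER BY h9"
).fetchall()
```

Cells absent from the staged batch (here `cell-c`) keep their placeholders, which is the same state the unprocessed tiles are left in.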

Stage 05 detailed status

Five versions were written. V5 (bulk load: stage db → batch UPDATE) ran twice but crashed at exactly the same point both times:

  • Crash point: 8,361,990 / 8,591,961 H7 cells (97.3% complete)
  • Crash time: ~80 hours into Phase 1 (reading USB sidecars)
  • Root cause: unknown — clean exit (code 0), no traceback captured, no OOM, no disk full, no system reboot. Deterministic crash at same H7 count suggests a specific problematic tile or resource exhaustion in the staging SQLite db at ~410M rows.

Consequence for otivm.sqlite3: The five OTIVM Mediterranean waypoints (Ostia, Capua, Brundisium, Carthago, Alexandria) were processed well before the crash point. Their geo_dep and geo_flag values are correctly populated in tessera.db and were correctly seeded into otivm.sqlite3.

The remaining ~230,000 H7 tiles (the last 2.7%) have geo_dep = 255 and geo_flag = 0 placeholders in tessera.db. These tiles are at the edge of the interaction sphere — not OTIVM waypoints.
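
The figures quoted in this status section are mutually consistent, which a few lines of arithmetic confirm. The row estimate assumes 49 H9 cells per processed H7 tile, so it is an upper bound.

```python
TOTAL_H7 = 8_591_961
DONE_H7 = 8_361_990

remaining = TOTAL_H7 - DONE_H7       # tiles never processed
pct_done = 100 * DONE_H7 / TOTAL_H7  # share processed before the stop
staged_rows = DONE_H7 * 49           # staging rows at the stop (upper bound)

print(remaining, round(pct_done, 1), staged_rows)  # 229971 97.3 409737510
```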

Decision taken: Stage 05 is not being restarted. The OTIVM seed database has correct geology for all five waypoints. Future runs of stage 06 against otivm.sqlite3 directly (TESSERA 4.0 model) do not require stage 05 to be complete in tessera.db.


Stage 06 — Occupation / Culture Sampling

| Property | Value |
| --- | --- |
| Script | NOT YET WRITTEN |
| Source | Archaeological databases: ARIADNE, SEAD, published excavation records |
| Target field | occ_flag (byte 7), RFC-TESSERA-3.0-OCC-001 |
| Status | NOT STARTED |

Stage 06 design — TESSERA 4.0 approach

Under TESSERA 4.0, stage 06 does NOT run against the global tessera.db. It runs against otivm.sqlite3 directly, updating only the 12,005 H9 cells already in production.

occ_flag bit layout (RFC-TESSERA-3.0-OCC-001 Section 2):

Bits 7-6: Occupation period
Bits 5-4: Evidence type
Bits 3-2: Confidence
Bits 1-0: Reserved
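
Composing and splitting an occ_flag byte under this layout could look as follows. The field positions come from the layout above; the function names are hypothetical, and the meaning of each 2-bit value is left to RFC-TESSERA-3.0-OCC-001 rather than guessed here.

```python
def pack_occ_flag(period: int, evidence: int, confidence: int) -> int:
    """Compose an occ_flag byte from three 2-bit fields; reserved bits 1-0 stay zero."""
    for name, value in (("period", period), ("evidence", evidence), ("confidence", confidence)):
        if not 0 <= value <= 0b11:
            raise ValueError(f"{name} must fit in 2 bits, got {value}")
    return (period << 6) | (evidence << 4) | (confidence << 2)


def unpack_occ_flag(flag: int) -> tuple[int, int, int]:
    """Return (period, evidence, confidence) as raw 2-bit integers."""
    return (flag >> 6) & 0b11, (flag >> 4) & 0b11, (flag >> 2) & 0b11
```

For instance, `pack_occ_flag(1, 2, 3)` gives `0x6C`, and `unpack_occ_flag(0x6C)` returns `(1, 2, 3)`.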

Four Mesolithic cultures for the Mediterranean waypoints:

| Code | Culture | Period BCE | Region |
| --- | --- | --- | --- |
| MAGL | Maglemosian | 9000-6000 | Denmark, S.Sweden, N.Germany, N.Poland |
| ERTE | Ertebølle | 5400-3900 | Denmark, S.Sweden, N.Germany coast |
| SAUV | Sauveterrian | 9000-6500 | SW France, N.Spain, N.Italy |
| AZIL | Azilian | 10000-8500 | SW France, N.Spain, Switzerland |

Source investigation is required before writing stage 06.

Stage 06 script structure (when written):

  • Reads culture polygon GIS data for the OTIVM waypoint regions
  • Point-in-polygon test for each H9 centroid
  • Updates occ_flag, occ_src, occ_conf in otivm.sqlite3
  • Follows RFC-TESSERA-4.0-001 pipeline contract (draft → validate → promote)
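
The first three steps above might be sketched as follows. This is not the stage 06 script, which does not exist yet. It swaps in a pure-Python even-odd ray-casting test instead of a GIS library, treats longitude/latitude as planar coordinates, and uses a hypothetical `cells` table with `lon`/`lat` columns; only the `occ_flag`, `occ_src` and `occ_conf` column names come from this registry. The draft → validate → promote contract is not modelled.

```python
import sqlite3


def point_in_polygon(x: float, y: float, ring: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting; ring is a list of (x, y) vertices, not closed."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Count edges that a ray going in +x from (x, y) crosses.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


conn = sqlite3.connect(":memory:")  # stand-in for a draft copy, not production
conn.execute(
    "CREATE TABLE cells (h9 TEXT PRIMARY KEY, lon REAL, lat REAL,"
    " occ_flag INTEGER DEFAULT 0, occ_src INTEGER, occ_conf INTEGER)"
)
conn.executemany(
    "INSERT INTO cells (h9, lon, lat) VALUES (?, ?, ?)",
    [("inside", 1.0, 1.0), ("outside", 5.0, 5.0)],
)

culture_ring = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # toy square
OCC_FLAG, OCC_SRC, OCC_CONF = 0x48, 1, 2  # placeholder values, not RFC codes

# Collect matching centroids first, then write them in one transaction.
hits = [
    (OCC_FLAG, OCC_SRC, OCC_CONF, h9)
    for h9, lon, lat in conn.execute("SELECT h9, lon, lat FROM cells")
    if point_in_polygon(lon, lat, culture_ring)
]
with conn:
    conn.executemany(
        "UPDATE cells SET occ_flag=?, occ_src=?, occ_conf=? WHERE h9=?", hits
    )
```

With 12,005 cells in scope, a per-centroid Python loop like this is cheap, which is part of why running against otivm.sqlite3 directly is practical.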

Current state of otivm.sqlite3

| Field | Status | Notes |
| --- | --- | --- |
| elev_cm | ✓ Current | GEBCO 2025, indicated confidence |
| terrain | ✓ Current | ESA WorldCover v200, indicated confidence |
| hydro | ✓ Current | HydroSHEDS v1.1, indicated confidence |
| geo_dep | ✓ Current | USGS MRDS: indicated where present, no_data elsewhere |
| geo_flag | ✓ Current | BGR IGME5000: indicated where present, no_data elsewhere |
| occ_flag | ✗ Placeholder | 0x00 everywhere; stage 06 not yet written |

Scripts on tessera-pipeline CT

Location: /opt/tessera-pipeline/

Python venv: /opt/tessera-pipeline/venv/bin/python3

| Script | Stage | Status |
| --- | --- | --- |
| 01_sample_terrain.py | 01 | Complete, do not re-run |
| 02_sample_hydrology.py | 02 | Complete, do not re-run |
| 03_assemble_tiles.py | 03 | Complete, do not re-run |
| 04a_sample_igme5000.py | 04a | Complete, do not re-run |
| 04b_sample_mrds.py | 04b | Complete, do not re-run |
| 05_assemble_geology.py | 05 | Crashed at 97%, abandoned |
| build_tessera_db.py | DB build | Complete, do not re-run |
| seed_extract.py | TESSERA 4.0 seed | Complete, do not re-run |
| seed_promote.py | TESSERA 4.0 promote | Complete, do not re-run |

Hard rules

  • USB drive (/mnt/tessera-tiles, /mnt/tessera-scratch, /mnt/tessera-source) is READ-ONLY
  • tessera.db on SSD (/mnt/tessera-db/tessera.db) is the immutable source — do not modify
  • otivm.sqlite3 is the production game database — write only via RFC-TESSERA-4.0-001 pipeline contract
  • Do not re-run any completed stage without explicit project owner instruction
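
One way to make the "do not modify" rule for tessera.db hard to break by accident is to open the file through a read-only SQLite URI. This is a minimal sketch using the standard `sqlite3` module; `open_read_only` is a hypothetical helper, not part of the existing scripts.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write raises OperationalError."""
    # mode=ro is a standard SQLite URI parameter; uri=True enables URI parsing.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)
```

With a connection opened this way, `SELECT` statements work as usual, while an `INSERT` or `UPDATE` fails instead of silently changing the source database.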

TESSERA-pipeline-registry.md, 2026-04-26

Written by Claude Sonnet 4.6 with full pipeline session context

Next pipeline work: stage 06 (occ_flag) against otivm.sqlite3 directly