# TESSERA Pipeline Registry

### Date: 2026-04-26
### Author: Claude Sonnet 4.6 — written with full session context
### Status: Normative reference for all pipeline work

---

## What this document is

A single authoritative reference for every pipeline stage: what it does, what source it reads, what it writes, where its output lives, and what its current status is. Written by the assistant that ran the pipeline end-to-end. Read this before touching any pipeline script.

---

## The 8-byte cell record (RFC-TESSERA-2.0-001)

Every H9 cell in tessera.db is described by 8 bytes:

```
Byte 0-2: elev_cm — elevation in cm, signed 24-bit
Byte 3: terrain — RFC-TESSERA-2.0-001 Appendix A terrain code
Byte 4: hydro — RFC-TESSERA-2.0-001 Section 3.3 hydrology code
Byte 5: geo_dep — RFC-TESSERA-2.0-001 Section 3.4 deposit code
Byte 6: geo_flag — RFC-TESSERA-2.0-001 Section 3.5 geology flag code
Byte 7: occ_flag — RFC-TESSERA-3.0-OCC-001 Section 2 occupation code
```

In `otivm.sqlite3` (TESSERA 4.0), these are stored as separate INTEGER columns with the same names, plus per-field provenance FKs.

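
The following is a minimal Python sketch of packing and unpacking one record. It assumes little-endian order for the 24-bit `elev_cm` field. This registry does not state the byte order, so check RFC-TESSERA-2.0-001 before relying on the sketch. `CellRecord`, `pack_cell` and `unpack_cell` are illustrative names, not functions from the pipeline scripts. A signed 24-bit centimetre value spans -8,388,608 to 8,388,607 cm, roughly ±83.9 km, so the field covers all terrestrial relief and ocean depth with room to spare.

```python
from typing import NamedTuple

# ASSUMPTION: elev_cm is stored little-endian; this registry does not state the byte order.
ELEV_MIN = -(1 << 23)      # -8,388,608 cm
ELEV_MAX = (1 << 23) - 1   #  8,388,607 cm


class CellRecord(NamedTuple):
    elev_cm: int   # bytes 0-2, signed 24-bit, centimetres
    terrain: int   # byte 3
    hydro: int     # byte 4
    geo_dep: int   # byte 5
    geo_flag: int  # byte 6
    occ_flag: int  # byte 7


def pack_cell(rec: CellRecord) -> bytes:
    """Serialise one record to exactly 8 bytes."""
    if not ELEV_MIN <= rec.elev_cm <= ELEV_MAX:
        raise ValueError(f"elev_cm {rec.elev_cm} does not fit in signed 24 bits")
    head = rec.elev_cm.to_bytes(3, "little", signed=True)
    # bytes() raises ValueError if any one-byte code is outside 0-255.
    return head + bytes([rec.terrain, rec.hydro, rec.geo_dep, rec.geo_flag, rec.occ_flag])


def unpack_cell(buf: bytes) -> CellRecord:
    """Parse one 8-byte record."""
    if len(buf) != 8:
        raise ValueError(f"expected 8 bytes, got {len(buf)}")
    elev_cm = int.from_bytes(buf[0:3], "little", signed=True)
    return CellRecord(elev_cm, *buf[3:8])


if __name__ == "__main__":
    rec = CellRecord(elev_cm=-1234, terrain=3, hydro=4, geo_dep=0xFF, geo_flag=0x00, occ_flag=0x00)
    raw = pack_cell(rec)
    assert unpack_cell(raw) == rec
    print(raw.hex())  # 2efbff0304ff0000
```

The range check in `pack_cell` makes an out-of-range elevation fail loudly. Without it, the value would silently wrap into the neighbouring bytes.
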
---

## Scale

- Interaction sphere: 15–72°N, 15°W–75°E
- H7 tiles: 8,591,961
- H9 cells: 421,006,081
- Primary resolution: H9 (average cell area ~0.1 km²)
- Tile unit: H7 (average tile area ~5 km²; a hexagonal H7 tile contains 49 H9 cells, a pentagonal one 41)

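
The two cell counts above can be cross-checked arithmetically. Every hexagonal H3 cell has 7 children at the next resolution, so a hexagonal H7 tile has 7² = 49 H9 descendants. A pentagon has only 6 children (one pentagon and five hexagons), which gives it 6 + 5 × 7 = 41 H9 descendants. The sketch below shows that 8,591,961 × 49 overshoots the registered H9 count by exactly 8, which matches the sphere containing one pentagon H7 tile. That conclusion is an inference from the arithmetic and should be confirmed against the tile list with H3's pentagon check.

```python
# Descendant counts in the H3 hierarchy (aperture 7).
HEX_CHILDREN = 7                          # a hexagon has 7 children per resolution step
H9_PER_HEX_H7 = HEX_CHILDREN ** 2         # 49
# A pentagon's children: 1 pentagon (which itself has 6 children) + 5 hexagons (7 each).
H9_PER_PENTAGON_H7 = 6 + 5 * HEX_CHILDREN  # 41


def expected_h9_count(h7_tiles: int, pentagon_tiles: int) -> int:
    """H9 cells covered by `h7_tiles` H7 tiles, `pentagon_tiles` of which are pentagons."""
    hexagon_tiles = h7_tiles - pentagon_tiles
    return hexagon_tiles * H9_PER_HEX_H7 + pentagon_tiles * H9_PER_PENTAGON_H7


if __name__ == "__main__":
    print(expected_h9_count(8_591_961, 0))  # 421006089: 8 more than the registry figure
    print(expected_h9_count(8_591_961, 1))  # 421006081: matches the registry figure
```
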
---

## Stage 00 — Elevation

| Property | Value |
|---|---|
| Script | `build_tessera_db.py` (integrated) |
| Source | GEBCO 2025 Grid — global 15 arc-second bathymetry/topography |
| Source URL | https://www.gebco.net/data_and_products/gridded_bathymetry_data/ |
| License | CC-BY 4.0 |
| Output field | `elev_cm` (bytes 0-2) |
| Output file | `/mnt/tessera-tiles/{h7}/tile_values.bin.gz` |
| Fingerprint | per-tile SHA-256 |
| Status | **COMPLETE** — all 8,591,961 H7 tiles |
| Notes | GEBCO is a modern dataset (2025). Elevation reflects current sea level. Doggerland cells are ocean in this dataset; they will require palaeoDEM correction in a future stage (RFC-TESSERA-3.0-PALEO-001, not yet written). |

---

## Stage 01 — Terrain

| Property | Value |
|---|---|
| Script | `01_sample_terrain.py` |
| Source | ESA WorldCover 2021 v200 — global 10m land cover classification |
| Source URL | https://esa-worldcover.org/ |
| License | CC-BY 4.0 |
| Fingerprint | `ac7f5d74a006d248` |
| Output field | `terrain` (byte 3) |
| Output file | `/mnt/tessera-scratch/terrain/{h7}/terrain_values.bin.gz` |
| Magic | `b'TES\x01'` |
| Status | **COMPLETE** — all H7 tiles |
| Notes | Modern land cover, not Mesolithic. Forest, wetland, urban classifications reflect 2021 conditions. Mesolithic terrain correction is a future RFC (RFC-TESSERA-3.0-PALEO-001). The dataset is the ground truth for current physical terrain; simulation layers apply temporal corrections on top. |

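
Each stage sidecar carries a 4-byte magic: `b'TES\x01'` for terrain, with the other stages listing their own. A minimal reader that checks the magic before trusting the payload might look like the sketch below. It assumes that the magic is the first four bytes of the decompressed stream and that everything after it is payload. The real header may carry more fields. `read_sidecar` and `STAGE_MAGIC` are hypothetical helpers, not part of `01_sample_terrain.py`.

```python
import gzip

# Magic prefixes as listed in this registry (stage -> magic).
STAGE_MAGIC = {
    "01_terrain": b"TES\x01",
    "02_hydrology": b"TES\x02",
    "03_tiles": b"TES2",
    "04a_geology_flag": b"TES\x04",
    "04b_geology_dep": b"TES\x05",
}


def read_sidecar(path, expected_magic: bytes) -> bytes:
    """Decompress a *_values.bin.gz file, verify its magic, return the payload.

    ASSUMPTION: the magic is the first bytes of the decompressed stream and
    everything after it is payload. `path` may be a filename or a binary file object.
    """
    with gzip.open(path, "rb") as fh:
        data = fh.read()
    magic = data[: len(expected_magic)]
    if magic != expected_magic:
        raise ValueError(f"bad magic {magic!r}, expected {expected_magic!r}")
    return data[len(expected_magic):]
```

Checking the magic first turns a sidecar passed to the wrong stage into an immediate error instead of silently mis-decoded bytes. A hydrology file read as terrain is one example.
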
---

## Stage 02 — Hydrology

| Property | Value |
|---|---|
| Script | `02_sample_hydrology.py` |
| Source | HydroSHEDS v1.1 — flow direction and accumulation at 15 arc-second resolution |
| Source URL | https://www.hydrosheds.org/ |
| License | CC-BY 4.0 |
| Fingerprint | `dcf6460a2bc0ebb5` |
| Output field | `hydro` (byte 4) |
| Output file | `/mnt/tessera-scratch/hydrology/{h7}/hydrology_values.bin.gz` |
| Magic | `b'TES\x02'` |
| Status | **COMPLETE** — all H7 tiles |
| Notes | One cross-sidecar correction is applied in stage 03: where WorldCover identifies a lake or river but HydroSHEDS has no water body type (WB_NONE), the terrain sidecar overrides. HydroSHEDS v2.0 is expected in October 2026; review then. |

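
The cross-sidecar rule in the notes above can be expressed as a small per-cell function. This is a sketch only. The numeric codes are hypothetical placeholders, because the real terrain codes (RFC-TESSERA-2.0-001 Appendix A) and hydrology codes (Section 3.3) are not reproduced in this registry. The sketch also treats the whole `hydro` byte as a water-body code. If the RFC packs the water-body type into a sub-field, mask that field out first.

```python
# HYPOTHETICAL code values: the real terrain (RFC-TESSERA-2.0-001 Appendix A)
# and hydrology (Section 3.3) codes are not reproduced in this registry.
TERRAIN_LAKE = 0x20
TERRAIN_RIVER = 0x21
WB_NONE = 0x00
WB_LAKE = 0x01
WB_RIVER = 0x02

# Which water-body type each water terrain class implies.
TERRAIN_TO_WATER_BODY = {TERRAIN_LAKE: WB_LAKE, TERRAIN_RIVER: WB_RIVER}


def reconcile_hydro(terrain: int, hydro: int) -> int:
    """Return the hydro code to store for one H9 cell.

    If HydroSHEDS reports no water body (WB_NONE) but the WorldCover-derived
    terrain code says lake or river, the terrain sidecar wins. In every other
    case the HydroSHEDS value is kept unchanged.
    """
    if hydro == WB_NONE and terrain in TERRAIN_TO_WATER_BODY:
        return TERRAIN_TO_WATER_BODY[terrain]
    return hydro
```

The override only fills gaps. It never replaces a water body that HydroSHEDS already reports.
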
---

## Stage 03 — Tile Assembly

| Property | Value |
|---|---|
| Script | `03_assemble_tiles.py` |
| Source | Stages 00 + 01 + 02 sidecars |
| Output field | bytes 0-4 (all physical fields except geology and occupation) |
| Output file | `/mnt/tessera-tiles/{h7}/tile_values_final.bin.gz` |
| Magic | `b'TES2'` |
| Status | **COMPLETE** — all H7 tiles |
| Notes | Bytes 5-6 (geo_dep, geo_flag) are written as placeholders: byte 5 = 0xFF (NO_DEPOSIT), byte 6 = 0x00. Byte 7 (occ_flag) = 0x00. Stage 05 later overwrote these placeholders in tessera.db for cells where geology data exists, up to the point where stage 05 stopped (see Stage 05). The tile archive on USB was not updated and still has placeholder bytes 5-6 for most tiles; the authoritative values are in tessera.db. |

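
The assembly step interleaves the per-cell values from stages 00-02 into 8-byte records and fills bytes 5-7 with the placeholders listed in the notes above. This sketch assumes that a tile is the `b'TES2'` magic followed by one record per H9 cell, that all inputs share the same cell order, and that `elev_cm` is little-endian. None of these layout details is specified in this registry. `assemble_tile` is a hypothetical name, not a function from `03_assemble_tiles.py`.

```python
TILE_MAGIC = b"TES2"
GEO_DEP_NO_DEPOSIT = 0xFF    # byte 5 placeholder
GEO_FLAG_PLACEHOLDER = 0x00  # byte 6 placeholder
OCC_FLAG_PLACEHOLDER = 0x00  # byte 7 placeholder


def assemble_tile(elev_cm: list[int], terrain: bytes, hydro: bytes) -> bytes:
    """Build a tile from per-cell elevation, terrain and hydro values.

    ASSUMPTIONS: the tile is the magic followed by one 8-byte record per H9
    cell, all inputs share the same cell order, and elev_cm is little-endian.
    """
    if not len(elev_cm) == len(terrain) == len(hydro):
        raise ValueError("sidecars disagree on the number of cells")
    out = bytearray(TILE_MAGIC)
    for elev, terr, hyd in zip(elev_cm, terrain, hydro):
        out += elev.to_bytes(3, "little", signed=True)           # bytes 0-2
        out += bytes([terr, hyd,                                 # bytes 3-4
                      GEO_DEP_NO_DEPOSIT, GEO_FLAG_PLACEHOLDER,  # bytes 5-6
                      OCC_FLAG_PLACEHOLDER])                     # byte 7
    return bytes(out)
```

Under these assumptions, a hexagonal tile of 49 cells comes to 4 + 49 × 8 = 396 bytes before compression.
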
---

## Stage 04a — Geology Flag

| Property | Value |
|---|---|
| Script | `04a_sample_igme5000.py` |
| Source | BGR IGME 5000 — 1:5M International Geological Map of Europe, layer 23 |
| Source URL | https://services.bgr.de/arcgis/rest/services/geologie/igme5000/MapServer/23 |
| License | Geonutz 2013 — open, no registration |
| Citation | Datenquelle: IGME5000, (c) BGR Hannover, 2007 |
| Fingerprint | `97448797fc4e3e31` |
| Output field | `geo_flag` (byte 6) |
| Output file | `/mnt/tessera-scratch/geology_flag/{h7}/geology_flag_values.bin.gz` |
| Magic | `b'TES\x04'` |
| Status | **COMPLETE** — all H7 tiles |
| Notes | Bit layout: bits 5-4 = rock class (00=superficial, 01=sedimentary, 10=metamorphic, 11=igneous), bits 3-2 = confidence (00=no_data, 01=inferred, 10=indicated, 11=measured). Coverage gaps outside the European shelf return 0x00 (no_data). Method: H5 bounding box query → shapely point-in-polygon for H9 centroids. v2 of this script (geometry-based) replaced v1 (per-H9-centroid API query) to avoid 421M API calls. |

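
The `geo_flag` bit layout in the notes above decodes with two shifts and masks. Bits 7-6 and 1-0 are not described in this registry, so this sketch ignores them. The value 0x00 decodes to rock class `superficial` with confidence `no_data`. A consumer must therefore check the confidence bits before reading the rock class. `decode_geo_flag` and `encode_geo_flag` are illustrative names.

```python
ROCK_CLASS = {0b00: "superficial", 0b01: "sedimentary", 0b10: "metamorphic", 0b11: "igneous"}
CONFIDENCE = {0b00: "no_data", 0b01: "inferred", 0b10: "indicated", 0b11: "measured"}

_ROCK_BITS = {name: bits for bits, name in ROCK_CLASS.items()}
_CONF_BITS = {name: bits for bits, name in CONFIDENCE.items()}


def decode_geo_flag(flag: int) -> tuple[str, str]:
    """Return (rock_class, confidence); bits 7-6 and 1-0 are ignored."""
    rock = (flag >> 4) & 0b11  # bits 5-4
    conf = (flag >> 2) & 0b11  # bits 3-2
    return ROCK_CLASS[rock], CONFIDENCE[conf]


def encode_geo_flag(rock: str, confidence: str) -> int:
    """Inverse of decode_geo_flag for the documented bits only."""
    return (_ROCK_BITS[rock] << 4) | (_CONF_BITS[confidence] << 2)
```

For example, `decode_geo_flag(0x18)` returns `("sedimentary", "indicated")`.
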
---

## Stage 04b — Geology Deposit

| Property | Value |
|---|---|
| Script | `04b_sample_mrds.py` |
| Source | USGS MRDS — Mineral Resources Data System, mrds.csv downloaded 2022-08-23 |
| Source URL | https://mrdata.usgs.gov/mrds/ |
| DOI | 10.3133/ds52 |
| License | USGS public domain |
| Fingerprint | `ebf10a548e617164` |
| Output field | `geo_dep` (byte 5) |
| Output file | `/mnt/tessera-scratch/geology_dep/{h7}/geology_dep_values.bin.gz` |
| Magic | `b'TES\x05'` |
| Status | **COMPLETE** — all H7 tiles |
| Notes | Commodity codes in `mrds_commodity_map.yaml`. Only the highest-priority deposit per H9 cell is encoded. European coverage is uneven; MRDS systematic updates ceased in 2011. Almadén mercury mine: RESOLVED 2026-04-18. MRDS coordinates are ~34 km from the actual mine due to MRDS data quality, not a pipeline error. Deposit correctly encoded as Mercury (0x1d) in H7 `87390e4d9ffffff`. |

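
"Only the highest-priority deposit per H9 cell is encoded" amounts to a min-by-priority reduction over the deposits that fall in each cell. The priority table below is a hypothetical stand-in for `mrds_commodity_map.yaml`, whose structure is not shown here. Only the Mercury code 0x1d and the `NO_DEPOSIT` value 0xFF come from this registry. Commodity codes that are not in the map are skipped, which is an assumption about how unmapped MRDS commodities should be treated.

```python
# HYPOTHETICAL priority table standing in for mrds_commodity_map.yaml.
# Lower number = higher priority. Only 0x1d (Mercury) comes from this registry.
COMMODITY_PRIORITY = {
    0x1D: 1,  # Mercury
    0x10: 2,  # hypothetical
    0x05: 3,  # hypothetical
}
NO_DEPOSIT = 0xFF  # geo_dep value for "no deposit"


def pick_deposit(codes) -> int:
    """Return the single highest-priority commodity code for one H9 cell."""
    known = [c for c in codes if c in COMMODITY_PRIORITY]  # unmapped codes are skipped
    if not known:
        return NO_DEPOSIT
    return min(known, key=COMMODITY_PRIORITY.__getitem__)


def encode_cells(deposits_by_cell: dict[str, list[int]]) -> dict[str, int]:
    """Map each H9 cell id to the geo_dep byte it should receive."""
    return {cell: pick_deposit(codes) for cell, codes in deposits_by_cell.items()}
```
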
---

## Stage 05 — Geology Assembly into tessera.db

| Property | Value |
|---|---|
| Script | `05_assemble_geology.py` (v5 — bulk load approach) |
| Source | Stage 03 tile archive + stages 04a + 04b sidecars (all USB, read-only) |
| Target | `tessera.db` — `UPDATE tessera_cells SET geo_dep=?, geo_flag=?` |
| Status | **PARTIALLY COMPLETE** |
| Notes | See below. |

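
The stage 05 target statement is a plain batched UPDATE. The sketch below uses Python's standard `sqlite3` module to apply `(geo_dep, geo_flag, h9)` rows in fixed-size batches, with one transaction per batch. It assumes the cell key column is named `h9`. That name is hypothetical, because the real `tessera_cells` key is not named in this registry. The sketch is not the v5 script itself, which first loads a staging database.

```python
import sqlite3
from itertools import islice


def batched(rows, size: int):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def apply_geology(conn: sqlite3.Connection, rows, batch_size: int = 50_000) -> int:
    """Apply (geo_dep, geo_flag, h9) tuples to tessera_cells, one transaction per batch.

    HYPOTHETICAL: the key column is called `h9` here.
    Returns the number of input rows processed.
    """
    done = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits on success, rolls back only this batch on error
            conn.executemany(
                "UPDATE tessera_cells SET geo_dep = ?, geo_flag = ? WHERE h9 = ?",
                chunk,
            )
        done += len(chunk)
    return done
```

Committing per batch keeps each transaction's journal bounded. The running total `done` is also a natural value to log after each batch.
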
### Stage 05 detailed status

Five versions were written. V5 (bulk load: stage db → batch UPDATE) ran twice but crashed at exactly the same point both times:

- Crash point: 8,361,990 / 8,591,961 H7 tiles (97.3% complete)
- Crash time: ~80 hours into Phase 1 (reading USB sidecars)
- Root cause: unknown. The process exited cleanly (exit code 0) with no traceback captured. There was no OOM kill, no full disk and no system reboot. Because it stopped at the same H7 count both times, the likely causes are a specific problematic tile or resource exhaustion in the staging SQLite db at ~410M rows (8,361,990 × 49 ≈ 409.7M).

**Consequence for otivm.sqlite3:** The five OTIVM Mediterranean waypoints (Ostia, Capua, Brundisium, Carthago, Alexandria) were processed well before the crash point. Their `geo_dep` and `geo_flag` values are correctly populated in tessera.db and were correctly seeded into otivm.sqlite3.

**The remaining ~230,000 H7 tiles** (229,971, the last 2.7%) have `geo_dep = 255` and `geo_flag = 0` placeholders in tessera.db. These tiles are at the edge of the interaction sphere, not OTIVM waypoints.

**Decision taken:** Stage 05 is not being restarted. The OTIVM seed database has correct geology for all five waypoints. Future runs of stage 06 against otivm.sqlite3 directly (TESSERA 4.0 model) do not require stage 05 to be complete in tessera.db.

---

## Stage 06 — Occupation / Culture Sampling

| Property | Value |
|---|---|
| Script | **NOT YET WRITTEN** |
| Source | Archaeological databases — ARIADNE, SEAD, published excavation records |
| Target field | `occ_flag` (byte 7) — RFC-TESSERA-3.0-OCC-001 |
| Status | **NOT STARTED** |

### Stage 06 design — TESSERA 4.0 approach

Under TESSERA 4.0, stage 06 does NOT run against the global tessera.db. It runs against `otivm.sqlite3` directly, updating only the 12,005 H9 cells already in production.

`occ_flag` bit layout (RFC-TESSERA-3.0-OCC-001 Section 2):

```
Bits 7-6: Occupation period
Bits 5-4: Evidence type
Bits 3-2: Confidence
Bits 1-0: Reserved
```

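
Packing and unpacking the three 2-bit `occ_flag` fields follows a shift-and-mask pattern. The sketch works with raw 2-bit values, because the meaning of each value (for example, which period is `0b01`) is defined in RFC-TESSERA-3.0-OCC-001 and is not reproduced here. The reserved bits 1-0 are always written as 0 and ignored when reading. The function names are illustrative.

```python
def pack_occ_flag(period: int, evidence: int, confidence: int) -> int:
    """Pack three 2-bit fields into occ_flag; reserved bits 1-0 stay 0."""
    for name, value in (("period", period), ("evidence", evidence), ("confidence", confidence)):
        if not 0 <= value <= 0b11:
            raise ValueError(f"{name} must fit in 2 bits, got {value}")
    return (period << 6) | (evidence << 4) | (confidence << 2)


def unpack_occ_flag(flag: int) -> tuple[int, int, int]:
    """Return (period, evidence, confidence); reserved bits are ignored."""
    return (flag >> 6) & 0b11, (flag >> 4) & 0b11, (flag >> 2) & 0b11


if __name__ == "__main__":
    flag = pack_occ_flag(period=0b11, evidence=0b10, confidence=0b01)
    print(hex(flag))  # 0xe4
```
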
Four Mesolithic cultures for the Mediterranean waypoints:

| Code | Culture | Period BCE | Region |
|---|---|---|---|
| MAGL | Maglemosian | 9000-6000 | Denmark, S. Sweden, N. Germany, N. Poland |
| ERTE | Ertebølle | 5400-3900 | Denmark, S. Sweden, N. Germany coast |
| SAUV | Sauveterrian | 9000-6500 | SW France, N. Spain, N. Italy |
| AZIL | Azilian | 10000-8500 | SW France, N. Spain, Switzerland |

**Source investigation required before writing stage 06:**

- ARIADNE portal: https://portal.ariadne-infrastructure.eu/
- SEAD: https://www.sead.se/
- Each source must be documented in the `source_registry` table of `otivm.sqlite3` before any rows are written

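
The rule that a source must be registered before any rows are written can be enforced in code. The schema used here is hypothetical. It assumes a `source_registry(id, source_key)` table and a `cells` table holding `occ_flag`, `occ_src` and `occ_conf`. This registry names the `occ_*` columns but not the table layout or key names. `write_with_provenance` looks up the source first and refuses to write anything if it is missing. The UPDATE then runs inside a single transaction.

```python
import sqlite3


def write_with_provenance(conn: sqlite3.Connection, source_key: str, rows) -> int:
    """Write (h9, occ_flag, occ_conf) rows only if `source_key` is registered.

    HYPOTHETICAL schema: source_registry(id INTEGER PRIMARY KEY, source_key TEXT UNIQUE)
    and cells(h9 TEXT PRIMARY KEY, occ_flag, occ_src, occ_conf).
    Returns the number of cells updated.
    """
    row = conn.execute(
        "SELECT id FROM source_registry WHERE source_key = ?", (source_key,)
    ).fetchone()
    if row is None:
        raise LookupError(f"source {source_key!r} is not in source_registry")
    src_id = row[0]
    with conn:  # all rows commit together or not at all
        cur = conn.executemany(
            "UPDATE cells SET occ_flag = ?, occ_conf = ?, occ_src = ? WHERE h9 = ?",
            [(flag, conf, src_id, h9) for h9, flag, conf in rows],
        )
    return cur.rowcount  # for executemany, sqlite3 sums modifications across rows
```
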
**Stage 06 script structure (when written):**

- Reads culture polygon GIS data for the OTIVM waypoint regions
- Point-in-polygon test for each H9 centroid
- Updates `occ_flag`, `occ_src`, `occ_conf` in `otivm.sqlite3`
- Follows RFC-TESSERA-4.0-001 pipeline contract (draft → validate → promote)

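
The point-in-polygon step can be sketched with the standard library alone. The sketch uses even-odd ray casting with a bounding-box prefilter. This follows the same shape as the stage 04a method (box query first, exact test second), but it is not the `shapely` call that stage 04a uses. It treats longitude/latitude as planar coordinates, which is workable inside 15°W–75°E since no polygon crosses the antimeridian. It handles single rings without holes and assumes H9 centroids have already been computed, for example with H3. `tag_centroids` and the polygon inputs are illustrative. Overlapping polygons are resolved by first match, which is a placeholder policy.

```python
from collections.abc import Sequence

Point = tuple[float, float]  # (longitude, latitude), treated as planar


def bbox(poly: Sequence[Point]) -> tuple[float, float, float, float]:
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a ring."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt: Point, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting for a single ring without holes."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_centroids(centroids: dict[str, Point],
                  polygons: dict[str, Sequence[Point]]) -> dict[str, str]:
    """Map H9 cell ids to the first culture polygon containing their centroid."""
    boxes = {code: bbox(poly) for code, poly in polygons.items()}
    tagged: dict[str, str] = {}
    for h9, (x, y) in centroids.items():
        for code, poly in polygons.items():
            min_x, min_y, max_x, max_y = boxes[code]
            if not (min_x <= x <= max_x and min_y <= y <= max_y):
                continue  # cheap rejection before the exact test
            if point_in_polygon((x, y), poly):
                tagged[h9] = code
                break
    return tagged
```
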
---

## Current state of otivm.sqlite3

| Field | Status | Notes |
|---|---|---|
| `elev_cm` | ✓ Current | GEBCO 2025, indicated confidence |
| `terrain` | ✓ Current | ESA WorldCover v200, indicated confidence |
| `hydro` | ✓ Current | HydroSHEDS v1.1, indicated confidence |
| `geo_dep` | ✓ Current | USGS MRDS — indicated where present, no_data elsewhere |
| `geo_flag` | ✓ Current | BGR IGME5000 — indicated where present, no_data elsewhere |
| `occ_flag` | ✗ Placeholder | 0x00 everywhere — stage 06 not yet written |

---

## Scripts on tessera-pipeline CT

- Location: `/opt/tessera-pipeline/`
- Python venv: `/opt/tessera-pipeline/venv/bin/python3`

| Script | Stage | Status |
|---|---|---|
| `01_sample_terrain.py` | 01 | Complete — do not re-run |
| `02_sample_hydrology.py` | 02 | Complete — do not re-run |
| `03_assemble_tiles.py` | 03 | Complete — do not re-run |
| `04a_sample_igme5000.py` | 04a | Complete — do not re-run |
| `04b_sample_mrds.py` | 04b | Complete — do not re-run |
| `05_assemble_geology.py` | 05 | Crashed at 97% — abandoned |
| `build_tessera_db.py` | DB build | Complete — do not re-run |
| `seed_extract.py` | TESSERA 4.0 seed | Complete — do not re-run |
| `seed_promote.py` | TESSERA 4.0 promote | Complete — do not re-run |

---

## Hard rules

- USB drive mounts (`/mnt/tessera-tiles`, `/mnt/tessera-scratch`, `/mnt/tessera-source`) are **READ-ONLY**
- `tessera.db` on SSD (`/mnt/tessera-db/tessera.db`) is the immutable source — do not modify
- `otivm.sqlite3` is the production game database — write only via the RFC-TESSERA-4.0-001 pipeline contract
- Do not re-run any completed stage without explicit project owner instruction

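
Scripts that only read `tessera.db` or the USB archives can also enforce the first two rules at connection time. SQLite's URI `mode=ro` parameter makes the connection itself read-only, so an accidental write fails with `sqlite3.OperationalError` instead of touching the file. This complements, rather than replaces, mounting the drives read-only. `open_readonly` is an illustrative helper name.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write raises OperationalError."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Usage would be `conn = open_readonly("/mnt/tessera-db/tessera.db")`. For files on media that cannot be written at all, SQLite also offers the stricter `immutable=1` URI parameter.
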
---

*TESSERA-pipeline-registry.md — 2026-04-26*

*Written by Claude Sonnet 4.6 with full pipeline session context*

*Next pipeline work: stage 06 (occ_flag) against otivm.sqlite3 directly*