Handover — TESSERA Dataset Assistant
Date: 2026-04-27
For: Incoming dataset assistant
Read this completely before doing anything
0. Your role
You are the dataset assistant. You own the pipeline that populates
data/otivm.sqlite3 with physical-world data from the USB drives.
You do not touch game code, frontend, backend, or PM2.
The game development assistant works in parallel. They own src/,
server/, and everything the player sees. You own:
- pipeline/: all extraction and promotion scripts
- data/create_otivm_db.sql: the schema source of truth
- data/staging_otivm.sqlite3: your working database (never in git)
- docs/: dataset and pipeline documentation
You do not write to data/otivm.sqlite3 directly. You write to
data/staging_otivm.sqlite3, verify, then copy to production on
explicit project owner approval.
1. Read these files before doing anything
In order:
- CLAUDE.md: workflow, three-shell model, ground rules
- docs/TESSERA-dataset-registry.md: every dataset evaluated, triage decisions, drive inventory, what is on drives and what is not
- docs/RFC-TESSERA-4.0-001.md: the database schema contract
- docs/RFC-TESSERA-3.0-PALEO-001.md: paleo epoch table spec
- docs/TESSERA-pipeline-registry.md: history of the old batch pipeline, what completed, what failed, and why
- This file
2. Current database state — as of 2026-04-27
data/otivm.sqlite3 — production
- 12,005 H9 rows across five waypoints, all status=2 (current)
- All H5s at status=2 in h5_coverage
- paleo_epochs table populated with 9 epochs per RFC-TESSERA-3.0-PALEO-001
- H3 IDs stored as INTEGER (64-bit)
Five waypoints
| City | H5 TEXT | H9 cells |
|---|---|---|
| Ostia | 851e805bfffffff | 2401 |
| Capua | 851e8333fffffff | 2401 |
| Brundisium | 851e8ba3fffffff | 2401 |
| Carthago | 85386e23fffffff | 2401 |
| Alexandria | 853f5ba7fffffff | 2401 |
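The relationship between the H5 TEXT values in the table and the 64-bit INTEGER storage can be sketched as follows. This is a minimal illustration using only the standard library; the function names are hypothetical and not part of the repo, and the h3 package in the pipeline venv offers its own conversion helpers that would normally be preferred. It also checks the row count: each hexagonal H5 cell has 7 children per resolution step, so 7^4 = 2401 H9 cells, and five waypoints give 12,005 rows.

```python
# Sketch only: helper names are hypothetical, not taken from the repo.
WAYPOINTS = {
    "Ostia": "851e805bfffffff",
    "Capua": "851e8333fffffff",
    "Brundisium": "851e8ba3fffffff",
    "Carthago": "85386e23fffffff",
    "Alexandria": "853f5ba7fffffff",
}


def h3_text_to_int(h3_text: str) -> int:
    """Parse the hexadecimal H3 string into its integer value."""
    return int(h3_text, 16)


def h3_int_to_text(h3_int: int) -> str:
    """Format the integer back into the lowercase hex string."""
    return format(h3_int, "x")


def h3_resolution(h3_int: int) -> int:
    """Read the resolution from bits 52-55 of the H3 index."""
    return (h3_int >> 52) & 0xF


def children_per_cell(parent_res: int, child_res: int) -> int:
    """Child count for a hexagonal parent: 7 per resolution step."""
    return 7 ** (child_res - parent_res)


for city, text in WAYPOINTS.items():
    value = h3_text_to_int(text)
    assert h3_int_to_text(value) == text   # round trip is lossless
    assert h3_resolution(value) == 5       # every waypoint is an H5 cell
    assert value < 2**63                   # fits a signed 64-bit SQLite INTEGER

expected_rows = len(WAYPOINTS) * children_per_cell(5, 9)
print(expected_rows)
```

The 7-children rule holds for hexagonal cells only; pentagon cells have fewer children, so the count is an assumption that happens to match the five waypoints listed here.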
Field status
| Field | Status | Source |
|---|---|---|
| elev_cm | ✅ Current | GEBCO 2025 |
| terrain | ✅ Current (modern only, see Section 4) | ESA WorldCover 2021 |
| hydro | ✅ Current | HydroSHEDS v1.1 |
| geo_dep | ✅ Current | USGS MRDS |
| geo_flag | ✅ Current | BGR IGME5000 |
| occ_flag | ⚠ Placeholder (0x00 everywhere) | Stage 06 not written |
data/staging_otivm.sqlite3
Identical to production as of last session. Always reset from production before starting a new pipeline run:
cp data/otivm.sqlite3 data/staging_otivm.sqlite3
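After the reset, one way to confirm that staging really matches production is to compare per-table row counts. The sketch below is an assumption about how such a check could look, not an existing pipeline script; the function names are hypothetical. It opens both files read-only so the check itself cannot modify either database.

```python
import sqlite3


def table_counts(db_path: str) -> dict:
    """Return {table_name: row_count} for every user table in the database."""
    # mode=ro opens the file read-only and fails if it does not exist.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


def diff_counts(production_path: str, staging_path: str) -> dict:
    """Return {table: (production_count, staging_count)} where they differ."""
    prod = table_counts(production_path)
    stage = table_counts(staging_path)
    return {
        name: (prod.get(name), stage.get(name))
        for name in sorted(set(prod) | set(stage))
        if prod.get(name) != stage.get(name)
    }
```

Calling `diff_counts("data/otivm.sqlite3", "data/staging_otivm.sqlite3")` right after the copy should return an empty dict. Row counts are a coarse check only; they do not prove the row contents are identical.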
3. USB drives — what is present
Both drives mounted read-only at /opt/data/ on every container.
Full inventory in data/tessera_usb_inventory.txt.
Drive 1: TESSERA_APR26 (/dev/sdb1, 29GB, 21GB free)
| Dataset | Path | Size | Fields |
|---|---|---|---|
| GEBCO 2025 | gebco/ | 6.8GB | elev_cm |
| HydroSHEDS v1.1 | hydrosheds/ | 240MB | hydro |
| USGS MRDS | mrds/mrds.csv | 16MB | geo_dep |
Drive 2: TESSERA_WORLD (/dev/sdd1, 29GB, 7GB free)
| Dataset | Path | Size | Fields |
|---|---|---|---|
| ESA WorldCover 2021 v200 | worldcover/ | 22GB | terrain |
4. The restoration layer — critical concept
terrain in the database is modern WorldCover 2021. It is wrong
for historical periods.
WorldCover reflects 2021 land cover — cities, airports, drained marshes, reservoirs. For all five OTIVM waypoints, the majority of H9 cells within urban zones are classified as built-up or cropland. In Roman times (14 BCE epoch) and Mesolithic times (8000 BCE epoch), those same cells were overwhelmingly forested.
The Mediterranean basin was 60–70% forested in both periods. Today it is not.
The restoration layer corrects this at query time using two datasets not yet on the drives:
- HYDE 3.3 — historical land use per epoch (what was actually there)
- KK10 — potential natural vegetation (what would grow without humans)
Until these datasets are loaded and the restoration pipeline stage
is written, terrain is a modern snapshot, not a historical one.
The game development assistant has been informed. The game must not
present terrain values as historically accurate for any epoch
until the restoration layer is active.
This is the most important pending pipeline work after the drive additions are complete.
5. What is missing from the drives — priority additions
These four datasets must be downloaded and added to Drive 1 before the per-H5 pipeline can be built. Total: ~5.2GB, fits in 21GB free.
| Priority | Dataset | Size | Why needed |
|---|---|---|---|
| 1 | BGR IGME5000 shapefile | ~200MB | geo_flag currently depends on live API — must be local |
| 2 | HYDE 3.3 historical land use | ~4GB | Restoration layer — required |
| 3 | KK10 potential natural vegetation | ~500MB | Restoration layer — required alongside HYDE |
| 4 | HydroRivers Europe + Africa | ~500MB | Accurate river placement for hydro |
Download sources in docs/TESSERA-dataset-registry.md.
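The capacity claim for the four additions can be checked with simple arithmetic. The figures below are the approximate sizes from the table, expressed in MB with decimal units assumed (1GB = 1000MB); actual download sizes may differ.

```python
# Approximate sizes from the priority table, in MB (decimal units assumed).
ADDITIONS_MB = {
    "BGR IGME5000 shapefile": 200,
    "HYDE 3.3 historical land use": 4000,
    "KK10 potential natural vegetation": 500,
    "HydroRivers Europe + Africa": 500,
}
DRIVE1_FREE_MB = 21_000  # Drive 1 free space as listed in Section 3

total_mb = sum(ADDITIONS_MB.values())
remaining_mb = DRIVE1_FREE_MB - total_mb

print(f"additions: {total_mb / 1000:.1f}GB, remaining: {remaining_mb / 1000:.1f}GB")
```

On these figures Drive 1 would keep roughly 15.8GB free after the additions.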
Drives are read-only when mounted. To add data:
- Unmount from Proxmox host
- Remount read-write on a machine with ext4 write access
- Copy data
- Remount read-only
- Verify with inventory check before proceeding
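The final verification step could look something like the sketch below. The format of data/tessera_usb_inventory.txt is not described here, so this does not parse it; instead it checks that the dataset paths listed in Section 3 exist under a given mount root and that the filesystem is mounted read-only. The function names and the `mount_root` argument are hypothetical, and paths for the four new datasets would need to be added once they are known.

```python
import os

# Relative paths taken from the drive tables in Section 3.
EXPECTED_PATHS = ["gebco", "hydrosheds", "mrds/mrds.csv", "worldcover"]


def missing_paths(mount_root: str, expected=EXPECTED_PATHS) -> list:
    """Return the expected relative paths that do not exist under mount_root."""
    return [
        rel for rel in expected
        if not os.path.exists(os.path.join(mount_root, rel))
    ]


def is_read_only(mount_root: str) -> bool:
    """True if the filesystem holding mount_root is mounted read-only (Unix)."""
    return bool(os.statvfs(mount_root).f_flag & os.ST_RDONLY)
```

A clean result would be an empty list from `missing_paths(...)` and `True` from `is_read_only(...)` for each drive's mount point.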
Do not begin pipeline design until all four additions are on Drive 1.
6. The per-H5 pipeline — not yet built
The new pipeline replaces the old batch pipeline entirely. Key facts:
- Processes one H5 hex at a time
- Reads all data from USB drives (no live API calls)
- Writes to staging_otivm.sqlite3 only
- Follows RFC-TESSERA-4.0-001 pipeline contract: draft → validate → promote → copy to production
- Manually triggered with project owner approval
- Supersede support built in — can update existing H5 rows when a source dataset improves
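The stage ordering in the contract can be pictured as a small gate that refuses skipped stages and blocks the production copy without approval. This is a sketch under assumptions: the stage names come from the list above, but the class, its methods, and the `owner_approved` flag are hypothetical, and RFC-TESSERA-4.0-001 remains the authority on what each stage actually does.

```python
from typing import Optional

# Stage names follow the contract order given in this section.
STAGES = ("draft", "validate", "promote", "copy_to_production")


class H5Run:
    """Tracks one H5 hex through the contract stages, in order."""

    def __init__(self, h5_text: str) -> None:
        self.h5_text = h5_text
        self.completed = []

    def next_stage(self) -> Optional[str]:
        """Return the next stage to run, or None when all are done."""
        if len(self.completed) == len(STAGES):
            return None
        return STAGES[len(self.completed)]

    def complete(self, stage: str, owner_approved: bool = False) -> None:
        """Mark a stage done; reject skipped stages and unapproved copies."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        if stage == "copy_to_production" and not owner_approved:
            raise PermissionError("copy to production needs owner approval")
        self.completed.append(stage)
```

Keeping the approval check inside the gate, rather than in each script, means a script cannot reach the production copy by accident.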
Read strategy — mandatory
Always crop raster to H5 bounding box before sampling. Load the crop into a numpy array in RAM. Sample all 2401 H9 centroids from the array. Never seek 2401 individual points from USB.
Without this: GEBCO reads at ~25s per H5 (USB random seek speed). With this: GEBCO reads at ~1-2s per H5 (one sequential crop + RAM).
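The crop-then-sample idea can be illustrated without any raster library. The sketch below assumes a north-up raster with square pixels in degrees and uses a small synthetic grid in place of a GEBCO tile; the function names are hypothetical. In the real pipeline the crop would presumably come from a windowed rasterio read and the centroids from h3, both of which are left out so the example runs on the standard library alone.

```python
import math


def window_for_bbox(bbox, origin_lon, origin_lat, pixel_deg):
    """Pixel window (row_off, col_off, height, width) covering a bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat). The raster is assumed
    north-up with square pixels and its top-left corner at the origin.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    col_off = math.floor((min_lon - origin_lon) / pixel_deg)
    row_off = math.floor((origin_lat - max_lat) / pixel_deg)
    col_end = math.ceil((max_lon - origin_lon) / pixel_deg)
    row_end = math.ceil((origin_lat - min_lat) / pixel_deg)
    return row_off, col_off, row_end - row_off, col_end - col_off


def sample_crop(crop, window, origin_lon, origin_lat, pixel_deg, points):
    """Look up each (lon, lat) point in the in-memory crop; no file access."""
    row_off, col_off, height, width = window
    values = []
    for lon, lat in points:
        col = math.floor((lon - origin_lon) / pixel_deg) - col_off
        row = math.floor((origin_lat - lat) / pixel_deg) - row_off
        # Clamp so a point on the far edge of the box still hits the last pixel.
        row = min(max(row, 0), height - 1)
        col = min(max(col, 0), width - 1)
        values.append(crop[row][col])
    return values


# Synthetic 10x10 raster standing in for a tile: value = row * 10 + col.
raster = [[r * 10 + c for c in range(10)] for r in range(10)]
origin_lon, origin_lat, pixel_deg = 10.0, 45.0, 1.0

# One sequential read of the window (the only step that would touch USB).
bbox = (12.2, 41.3, 13.7, 42.9)
window = window_for_bbox(bbox, origin_lon, origin_lat, pixel_deg)
row_off, col_off, height, width = window
crop = [row[col_off:col_off + width] for row in raster[row_off:row_off + height]]

# All centroid lookups then run against the crop held in RAM.
centroids = [(12.5, 42.5), (13.5, 41.5)]
sampled = sample_crop(crop, window, origin_lon, origin_lat, pixel_deg, centroids)
print(sampled)
```

The same indexing works if `crop` is a numpy array, since `crop[row][col]` is valid there too; with 2401 centroids per H5 the loop stays entirely in memory after the single window read.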
RAM allocation
- Baseline container RAM: 2GB
- Pipeline mode: 24GB (non-essential containers suspended on dev box)
- Relevant tile sizes: GEBCO tile ~891MB, WorldCover tile ~100MB
- In-memory strategy: load relevant tiles at pipeline start, release at end
- Three Proxmox boxes: dev (pipeline work), staging (validation), production (live game) — transfer via WireGuard mesh
Python venv
- Path: /home/otivm/pipeline-venv
- Packages: h3, requests, numpy, rasterio, shapely, pyproj
- Do not use /home/otivm/venv (that belongs to the game assistant)
Pipeline scripts (committed, not yet functional for new pipeline)
- pipeline/seed_extract.py: old Dell-based extractor, do not re-run
- pipeline/seed_promote.py: old promotion script, do not re-run
- New per-H5 scripts to be written after drive additions complete
7. Infrastructure
OTIVM container (CT 1105, proliant-dev, 10.0.0.23)
- App user: otivm
- Repo: /home/otivm/OTIVM
- Pipeline venv: /home/otivm/pipeline-venv
- Production DB: data/otivm.sqlite3
- Staging DB: data/staging_otivm.sqlite3 (not in git)
- Claude Code runs here as otivm via work alias
Three Proxmox boxes
- proliant-dev (srv-a, 10.0.0.11) — development and pipeline work
- staging box — validation before production
- production box — live game, never touched by pipeline directly
Gitea
- Repo: https://gitea.barternetwork.us/TheRON/OTIVM
- Branch: main
- MCP: mcp.civicus.us (read any file directly from Claude chat)
8. Hard rules
- Never write to data/otivm.sqlite3 directly; always go via staging
- Never commit *.sqlite3 files; both databases are gitignored
- Never run pipeline without project owner approval and supervision
- Never modify tessera.db; it no longer exists (Dell decommissioned)
- Never touch game code (src/, server/, public/)
- Read TESSERA-dataset-registry.md before evaluating any new source
- One file at a time. One confirmation before proceeding.
- Do not start pipeline coding without explicit project owner instruction
9. Pending work — in order
- Drive additions — project owner downloads and mounts four datasets
- Pipeline architecture document — design before any code
- Per-H5 pipeline scripts — one file at a time, supervised
- Restoration layer — HYDE + KK10 integration into terrain field
- Stage 06 (occ_flag) — archaeological sources, deferred until simulation track begins
Handover 2026-04-27, dataset assistant track
Database seeded, paleo_epochs added, drives inventoried. Pipeline not yet built. Drive additions required first. The restoration layer is the most important pending concept.