otivm e92e1bf980 Add dataset assistant handover document

2026-04-27 09:22:42 +00:00

Handover — TESSERA Dataset Assistant

Date: 2026-04-27

For: Incoming dataset assistant

Read this completely before doing anything.

0. Your role

You are the dataset assistant. You own the pipeline that populates data/otivm.sqlite3 with physical-world data from the USB drives. You do not touch game code, frontend, backend, or PM2.

The game development assistant works in parallel. They own src/, server/, and everything the player sees. You own:

pipeline/ — all extraction and promotion scripts
data/create_otivm_db.sql — the schema source of truth
data/staging_otivm.sqlite3 — your working database (never in git)
docs/ — dataset and pipeline documentation

You do not write to data/otivm.sqlite3 directly. You write to data/staging_otivm.sqlite3, verify, then copy to production on explicit project owner approval.

1. Read these files before doing anything

In order:

CLAUDE.md — workflow, three-shell model, ground rules
docs/TESSERA-dataset-registry.md — every dataset evaluated, triage decisions, drive inventory, what is on drives and what is not
docs/RFC-TESSERA-4.0-001.md — the database schema contract
docs/RFC-TESSERA-3.0-PALEO-001.md — paleo epoch table spec
docs/TESSERA-pipeline-registry.md — history of the old batch pipeline, what completed, what failed, and why
This file

2. Current database state — as of 2026-04-27

`data/otivm.sqlite3` — production

12,005 H9 rows across five waypoints, all status=2 (current)
All H5s at status=2 in h5_coverage
paleo_epochs table populated with 9 epochs per RFC-TESSERA-3.0-PALEO-001
H3 IDs stored as INTEGER (64-bit)

Five waypoints

City	H5 ID (hex text)	H9 cells
Ostia	`851e805bfffffff`	2401
Capua	`851e8333fffffff`	2401
Brundisium	`851e8ba3fffffff`	2401
Carthago	`85386e23fffffff`	2401
Alexandria	`853f5ba7fffffff`	2401
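The table shows H5 IDs as hex text, while the database stores H3 IDs as 64-bit INTEGER. The sketch below converts between the two forms using only Python built-ins. The function names are illustrative and are not taken from the pipeline. It also reproduces the row counts: a hexagonal H5 cell has 7^4 = 2401 H9 descendants, and five waypoints give 12,005 rows.

```python
# Hex-text H5 IDs copied from the waypoint table.
WAYPOINTS = {
    "Ostia": "851e805bfffffff",
    "Capua": "851e8333fffffff",
    "Brundisium": "851e8ba3fffffff",
    "Carthago": "85386e23fffffff",
    "Alexandria": "853f5ba7fffffff",
}


def h3_text_to_int(h3_hex: str) -> int:
    """Parse an H3 hex string into the integer form stored in SQLite."""
    value = int(h3_hex, 16)
    # SQLite INTEGER is signed 64-bit; a valid H3 index keeps the top bit clear.
    if not 0 < value < 2**63:
        raise ValueError(f"not a valid H3 index: {h3_hex!r}")
    return value


def h3_int_to_text(h3_int: int) -> str:
    """Format the stored integer back into lowercase hex text."""
    return format(h3_int, "x")


def h3_resolution(h3_int: int) -> int:
    """Read the resolution from bits 52-55 of the H3 index."""
    return (h3_int >> 52) & 0xF


# Each hexagon has 7 children per resolution step; 5 -> 9 is four steps.
H9_PER_H5 = 7 ** 4
TOTAL_H9_ROWS = H9_PER_H5 * len(WAYPOINTS)
```

The resolution check is a cheap guard when loading IDs: every waypoint in the table should report resolution 5.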

Field status

Field	Status	Source
`elev_cm`	✅ Current	GEBCO 2025
`terrain`	✅ Current (modern only — see Section 4)	ESA WorldCover 2021
`hydro`	✅ Current	HydroSHEDS v1.1
`geo_dep`	✅ Current	USGS MRDS
`geo_flag`	✅ Current	BGR IGME5000
`occ_flag`	⚠ Placeholder (0x00 everywhere)	Stage 06 not written

`data/staging_otivm.sqlite3`

Identical to production as of last session. Always reset from production before starting a new pipeline run:

cp data/otivm.sqlite3 data/staging_otivm.sqlite3
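If the reset is ever scripted rather than typed, a minimal sketch might copy the file and confirm that both copies share a checksum. It assumes no process has either database open during the copy. The function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large databases do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reset_staging(production: Path, staging: Path) -> str:
    """Overwrite staging with a copy of production; return the shared checksum."""
    shutil.copyfile(production, staging)
    prod_hash = sha256_of(production)
    if sha256_of(staging) != prod_hash:
        raise RuntimeError("staging copy does not match production")
    return prod_hash
```

Called as `reset_staging(Path("data/otivm.sqlite3"), Path("data/staging_otivm.sqlite3"))`, it only ever reads production and only ever writes staging.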

3. USB drives — what is present

Both drives are mounted read-only at /opt/data/ on every container. The full inventory is in data/tessera_usb_inventory.txt.

Drive 1: TESSERA_APR26 (/dev/sdb1, 29GB, 21GB free)

Dataset	Path	Size	Fields
GEBCO 2025	`gebco/`	6.8GB	`elev_cm`
HydroSHEDS v1.1	`hydrosheds/`	240MB	`hydro`
USGS MRDS	`mrds/mrds.csv`	16MB	`geo_dep`

Drive 2: TESSERA_WORLD (/dev/sdd1, 29GB, 7GB free)

Dataset	Path	Size	Fields
ESA WorldCover 2021 v200	`worldcover/`	22GB	`terrain`

4. The restoration layer — critical concept

terrain in the database is modern WorldCover 2021. It is wrong for historical periods.

WorldCover reflects 2021 land cover — cities, airports, drained marshes, reservoirs. For all five OTIVM waypoints, the majority of H9 cells within urban zones are classified as built-up or cropland. In Roman times (14 BCE epoch) and Mesolithic times (8000 BCE epoch), those same cells were overwhelmingly forested.

The Mediterranean basin was 60–70% forested in both periods. Today it is not.

The restoration layer corrects this at query time using two datasets not yet on the drives:

HYDE 3.3 — historical land use per epoch (what was actually there)
KK10 — potential natural vegetation (what would grow without humans)

Until these datasets are loaded and the restoration pipeline stage is written, terrain is a modern snapshot, not a historical one. The game development assistant has been informed. The game must not present terrain values as historically accurate for any epoch until the restoration layer is active.

This is the most important pending pipeline work after the drive additions are complete.

5. What is missing from the drives — priority additions

These four datasets must be downloaded and added to Drive 1 before the per-H5 pipeline can be built. Total: ~5.2GB, fits in 21GB free.

Priority	Dataset	Size	Why needed
1	BGR IGME5000 shapefile	~200MB	`geo_flag` currently depends on live API — must be local
2	HYDE 3.3 historical land use	~4GB	Restoration layer — required
3	KK10 potential natural vegetation	~500MB	Restoration layer — required alongside HYDE
4	HydroRivers Europe + Africa	~500MB	Accurate river placement for `hydro`
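The ~5.2GB total and the remaining headroom on Drive 1 can be reproduced from the table. This is a quick arithmetic check that assumes 1GB = 1000MB and takes the approximate sizes at face value.

```python
# Approximate sizes in MB, copied from the priority table.
ADDITIONS_MB = {
    "BGR IGME5000 shapefile": 200,
    "HYDE 3.3 historical land use": 4000,
    "KK10 potential natural vegetation": 500,
    "HydroRivers Europe + Africa": 500,
}
DRIVE1_FREE_MB = 21_000  # 21GB free on TESSERA_APR26

total_mb = sum(ADDITIONS_MB.values())
remaining_mb = DRIVE1_FREE_MB - total_mb
fits = remaining_mb > 0
```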

Download sources in docs/TESSERA-dataset-registry.md.

Drives are read-only when mounted. To add data:

Unmount from Proxmox host
Remount read-write on a machine with ext4 write access
Copy data
Remount read-only
Verify with inventory check before proceeding
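The inventory check in the last step could be sketched as below. The relative paths are the ones listed in Section 3. How each drive is laid out under /opt/data/ is not stated here, so the drive root is a parameter. The function name is hypothetical.

```python
from pathlib import Path

# Relative dataset paths from the Section 3 tables.
DRIVE1_EXPECTED = ["gebco", "hydrosheds", "mrds/mrds.csv"]
DRIVE2_EXPECTED = ["worldcover"]


def missing_paths(drive_root: Path, expected: list[str]) -> list[str]:
    """Return the expected relative paths that are absent under drive_root."""
    return [rel for rel in expected if not (drive_root / rel).exists()]
```

An empty list means every expected dataset is present. Entries for the four new datasets would be added once their final paths on Drive 1 are known.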

Do not begin pipeline design until all four additions are on Drive 1.

6. The per-H5 pipeline — not yet built

The new pipeline replaces the old batch pipeline entirely. Key facts:

Processes one H5 hex at a time
Reads all data from USB drives (no live API calls)
Writes to staging_otivm.sqlite3 only
Follows RFC-TESSERA-4.0-001 pipeline contract: draft → validate → promote → copy to production
Manually triggered with project owner approval
Supersede support built in — can update existing H5 rows when a source dataset improves
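The stage order in the contract can be pictured as a small state machine. This is only an illustration of the ordering and the approval gate. RFC-TESSERA-4.0-001 remains the authority, and the names below are hypothetical rather than taken from it.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    VALIDATED = "validate"
    PROMOTED = "promote"
    PRODUCTION = "copy to production"


_ORDER = [Stage.DRAFT, Stage.VALIDATED, Stage.PROMOTED, Stage.PRODUCTION]


def next_stage(current: Stage, owner_approved: bool = False) -> Stage:
    """Return the stage after `current`; stages cannot be skipped."""
    idx = _ORDER.index(current)
    if idx == len(_ORDER) - 1:
        raise ValueError("already copied to production")
    target = _ORDER[idx + 1]
    # The final copy needs explicit project owner approval.
    if target is Stage.PRODUCTION and not owner_approved:
        raise PermissionError("copy to production requires project owner approval")
    return target
```

The sketch gates only the final copy. Starting a run at all also requires project owner approval, which is outside what this function models.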

Read strategy — mandatory

Always crop raster to H5 bounding box before sampling. Load the crop into a numpy array in RAM. Sample all 2401 H9 centroids from the array. Never seek 2401 individual points from USB.

Without this: GEBCO reads at ~25s per H5 (USB random seek speed). With this: GEBCO reads at ~1-2s per H5 (one sequential crop + RAM).
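The sampling half of this strategy can be sketched with numpy alone. The sketch starts from a crop that is already in RAM, for example from a windowed read with rasterio, which is in the pipeline venv. It assumes a north-up raster in lon/lat degrees and nearest-pixel sampling. The function and parameter names are hypothetical.

```python
import numpy as np


def sample_crop(crop, west, north, pixel_w, pixel_h, lons, lats):
    """Sample a north-up raster crop at many lon/lat points in one pass.

    crop    : 2-D array, row 0 is the northern edge
    west    : longitude of the crop's left edge
    north   : latitude of the crop's top edge
    pixel_w : pixel width in degrees (positive)
    pixel_h : pixel height in degrees (positive)
    """
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    # Convert coordinates to integer pixel indices.
    cols = np.floor((lons - west) / pixel_w).astype(int)
    rows = np.floor((north - lats) / pixel_h).astype(int)
    # Clamp points on or just outside the edge to the nearest valid pixel.
    cols = np.clip(cols, 0, crop.shape[1] - 1)
    rows = np.clip(rows, 0, crop.shape[0] - 1)
    # One fancy-indexing lookup replaces thousands of individual seeks.
    return crop[rows, cols]
```

With the 2401 H9 centroid coordinates passed as `lons` and `lats`, the lookup is a single array operation, so the USB drive is touched only once, when the crop is read.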

RAM allocation

Baseline container RAM: 2GB
Pipeline mode: 24GB (non-essential containers suspended on dev box)
Relevant tile sizes: GEBCO tile ~891MB, WorldCover tile ~100MB
In-memory strategy: load relevant tiles at pipeline start, release at end
Three Proxmox boxes: dev (pipeline work), staging (validation), production (live game) — transfer via WireGuard mesh

Python venv

Path: /home/otivm/pipeline-venv
Packages: h3, requests, numpy, rasterio, shapely, pyproj
Do not use /home/otivm/venv — that belongs to the game assistant

Pipeline scripts (committed, not yet functional for new pipeline)

pipeline/seed_extract.py — old Dell-based extractor, do not re-run
pipeline/seed_promote.py — old promotion script, do not re-run
New per-H5 scripts to be written after drive additions complete

7. Infrastructure

OTIVM container (CT 1105, proliant-dev, 10.0.0.23)

App user: otivm
Repo: /home/otivm/OTIVM
Pipeline venv: /home/otivm/pipeline-venv
Production DB: data/otivm.sqlite3
Staging DB: data/staging_otivm.sqlite3 (not in git)
Claude Code runs here as user otivm, via the work alias

Three Proxmox boxes

proliant-dev (srv-a, 10.0.0.11) — development and pipeline work
staging box — validation before production
production box — live game, never touched by pipeline directly

Gitea

Repo: https://gitea.barternetwork.us/TheRON/OTIVM
Branch: main
MCP: mcp.civicus.us — read any file directly from Claude chat

8. Hard rules

Never write to data/otivm.sqlite3 directly — always via staging
Never commit *.sqlite3 files — both databases are gitignored
Never run pipeline without project owner approval and supervision
Never reference or attempt to modify tessera.db: it no longer exists (the Dell was decommissioned)
Never touch game code (src/, server/, public/)
Read TESSERA-dataset-registry.md before evaluating any new source
One file at a time. One confirmation before proceeding.
Do not start pipeline coding without explicit project owner instruction
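One way to back the first rule in code is to open the production database read-only whenever a script needs to inspect it. The sketch below uses SQLite's `mode=ro` URI parameter through the standard `sqlite3` module. The helper name is hypothetical, and nothing here implies the pipeline already does this.

```python
import sqlite3
from pathlib import Path


def open_production_readonly(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite file with mode=ro so any write attempt raises an error."""
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Any INSERT, UPDATE, or DELETE on the returned connection fails with `sqlite3.OperationalError`, so an accidental write to production surfaces immediately instead of silently succeeding.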

9. Pending work — in order

Drive additions — project owner downloads and mounts four datasets
Pipeline architecture document — design before any code
Per-H5 pipeline scripts — one file at a time, supervised
Restoration layer — HYDE + KK10 integration into terrain field
Stage 06 (occ_flag) — archaeological sources, deferred until simulation track begins

Handover 2026-04-27, dataset assistant track

Database seeded, paleo_epochs added, drives inventoried. Pipeline not yet built. Drive additions required first. The restoration layer is the most important pending concept.
