init: CIVICVS repository — CBPs, corpus standard, directory structure

- README.md: project identity, TESSERA relationship, directory layout
- CIVICVS-CBPs.md: CBP-001 through CBP-006 adapted for CIVICVS
- docs/corpus/mesolithic-corpus-standard-v1.md: 10-table schema, 6-sprint plan, 25 seed concepts

Per CBP-001: committed same session as produced.
2026-04-18 05:29:09 +00:00
commit 34316d2429
3 changed files with 663 additions and 0 deletions
--- a/CIVICVS-CBPs.md
+++ b/CIVICVS-CBPs.md
@@ -0,0 +1,145 @@
+# CIVICVS — Critical Baseline Protocols
+### Status: Normative
+### Date: 2026-04-18
+### Adapted from: TESSERA v3.0 CBPs (ssot/docs/v3/README.md)
+
+---
+
+These are not guidelines. They are not best practices. They are the
+conditions under which this project operates. Deviation is not
+permitted. If a CBP cannot be followed, work stops until it can be.
+
+CIVICVS inherits the same process failure mode that cost TESSERA v2.0
+months of work: documents produced in chat sessions, never committed,
+permanently lost. These CBPs exist to prevent that failure from
+recurring in CIVICVS.
+
+---
+
+## CBP-001 — Every document is committed before the session ends
+
+Any document produced in a session — schema, governance doc,
+architecture decision, corpus entry, session log — is committed to
+this repository before the session ends. Not summarised. Not noted
+for later. Committed.
+
+A session that produces a document and does not commit it has produced
+nothing. The chat log is not a repository. It is not durable. It is
+not citable. It does not exist.
+
+**Enforcement:** The last action of every session is to verify that
+every document produced in that session exists in the repository.
+If it does not, the session is not over.
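A minimal sketch of the CBP-001 end-of-session check, assuming the session keeps a list of the paths it produced and that `git` is on the PATH. The function names are illustrative and not part of the CBPs. It only checks that each path exists in `HEAD`, not that later edits to it were committed.

```python
import subprocess
from pathlib import Path


def committed_paths(repo: Path) -> set[str]:
    """Paths present in HEAD, relative to the repository root."""
    out = subprocess.run(
        ["git", "-C", str(repo), "ls-tree", "-r", "--name-only", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.splitlines())


def uncommitted(produced: list[str], committed: set[str]) -> list[str]:
    """Documents produced this session that are not yet in the repository."""
    return [p for p in produced if p not in committed]
```

Under this sketch the session is over only when `uncommitted(produced, committed_paths(repo))` returns an empty list.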
+
+---
+
+## CBP-002 — Every session produces a session log
+
+A session log is a raw, detailed record of what happened. It includes:
+- Every decision made, with the reasoning
+- Every failure encountered, with the exact error
+- Every workaround discovered, with the exact command
+- Every assumption that proved wrong
+- Every benchmark measured, with the actual number
+
+Session logs are not polished. They are not summaries. They are the
+unfiltered record. Future contributors — human or AI — must be able
+to reconstruct exactly what was tried, what worked, and what did not,
+without repeating the same experiments.
+
+**Enforcement:** A session log is committed before any other
+end-of-session work. It is the first commit, not the last.
+
+---
+
+## CBP-003 — Infrastructure is tested before it is designed around
+
+No component — Saltcorn, ChromaDB, Ollama, any API — is designed
+around an assumption about what it can do. The assumption is tested
+first with a minimal real operation. Only then is the design built.
+
+If a data source, service, or tool is investigated and found
+unsuitable, that finding is documented. Failures are permanent
+knowledge. Do not let the next session repeat the same investigation.
+
+**Enforcement:** Before any new service is incorporated into CIVICVS
+design, its actual behaviour, actual data format, and actual
+limitations are documented with the date of investigation.
+
+---
+
+## CBP-004 — The file transfer protocol is followed without exception
+
+Every file that reaches a server node travels this exact path:
+
+```
+Claude produces tarball → user downloads → user uploads to /tmp/ on target node → commands run there
+```
+
+There is no other path. There is no heredoc injection. There is no
+"write it directly." There is no assuming a file is present because
+it was produced in a previous step.
+
+If a command references a file that is not confirmed present in /tmp/
+on the target node, the command is not run.
+
+**Enforcement:** Every deploy sequence begins with confirming the
+tarball is present at /tmp/ on the target node before any git or
+extract command runs.
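The CBP-004 precondition could be sketched as a guard that runs on the target node before any extract command. This is an illustration only: the function name and the `staging_dir` parameter are assumptions, and the check uses the standard-library `tarfile` module rather than any project tooling.

```python
import tarfile
from pathlib import Path


def tarball_confirmed(name: str, staging_dir: str = "/tmp") -> bool:
    """True only if the named file exists in staging_dir and is a readable tar archive."""
    path = Path(staging_dir) / name
    return path.is_file() and tarfile.is_tarfile(str(path))
```

A deploy script following this sketch would stop as soon as `tarball_confirmed(...)` returns `False`, rather than assuming the upload happened.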
+
+---
+
+## CBP-005 — The Mesolithic Corpus Standard is the source of truth for corpus work
+
+When the corpus schema says one thing and a corpus entry does another,
+the entry is wrong until proven otherwise. If the schema is wrong,
+it is corrected immediately and the correction is committed with an
+inline note explaining what was wrong and when it was fixed.
+
+The corpus standard is at `docs/corpus/mesolithic-corpus-standard-v1.md`.
+It is normative. All corpus entries, Saltcorn tables, and views must
+conform to it. Deviations are not tolerated silently — they are
+either corrected or the standard is updated with explicit reasoning.
+
+**Enforcement:** Before any corpus sprint begins, the corpus standard
+and the current Saltcorn table schema are read together and confirmed
+consistent.
+
+---
+
+## CBP-006 — The handover is written for the next assistant, not for posterity
+
+A handover document is not a summary of what was accomplished. It is
+an operational briefing for an assistant who has no prior context and
+must be able to continue the work without asking clarifying questions
+about project state.
+
+A handover must state:
+- What is currently running and its status
+- What is pending and why
+- What is broken and what the exact error is
+- The first task for the next session, unambiguously
+
+A handover that requires the recipient to make assumptions is
+incomplete.
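One way to make the four required statements checkable is to treat them as required fields. The sketch below is an assumption about how that might look; the class and field names are hypothetical and CBP-006 prescribes no format.

```python
from dataclasses import dataclass, fields


@dataclass
class Handover:
    running: str     # what is currently running and its status
    pending: str     # what is pending and why
    broken: str      # what is broken, with the exact error (say so explicitly if nothing is)
    first_task: str  # the first task for the next session


def missing_sections(h: Handover) -> list[str]:
    """Names of required sections that are empty or whitespace-only."""
    return [f.name for f in fields(h) if not getattr(h, f.name).strip()]
```

An empty result does not prove the handover is unambiguous. It only catches a section that was left out entirely.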
+
+**Enforcement:** The handover is tested by asking: "What is the first
+thing to do?" If the answer is uncertain, the handover is rewritten.
+
+---
+
+## What CIVICVS CBPs do not cover
+
+These CBPs govern session continuity and commit discipline. They do
+not cover:
+
+- The corpus schema (see `docs/corpus/mesolithic-corpus-standard-v1.md`)
+- Infrastructure decisions (see TESSERA `ssot/docs/v3/infrastructure.md`)
+- The simulation RFC stack (to be created)
+- The TESSERA data model (see TESSERA RFC stack)
+
+---
+
+*CIVICVS-CBPs.md — 2026-04-18*
+*Status: Normative*
+*The process is the project.*
--- a/README.md
+++ b/README.md
@@ -0,0 +1,52 @@
+# CIVICVS
+
+Mesolithic narrative simulator built on TESSERA spatial data.
+Set in approximately 8000 BCE, Spree-Havel river valley, Berlin.
+
+---
+
+## Relationship to TESSERA
+
+CIVICVS is a separate project from TESSERA. They share infrastructure
+and the TESSERA SpatiaLite database is the spatial ground truth for
+CIVICVS, but they have separate repositories, separate RFC stacks,
+and separate failure modes. Do not confuse them.
+
+TESSERA repository: `https://gitea.barternetwork.us/TheRON/tesserav3`
+CIVICVS repository:  `https://gitea.barternetwork.us/TheRON/civicvs`
+
+---
+
+## Read before doing anything
+
+1. `CIVICVS-CBPs.md` — session continuity and commit discipline. Non-negotiable.
+2. `docs/corpus/mesolithic-corpus-standard-v1.md` — corpus schema and workflow.
+3. `repo/docs/sessions/` — most recent session log first.
+
+---
+
+## Repository layout
+
+```
+docs/
+  corpus/
+    mesolithic-corpus-standard-v1.md   Corpus schema, sprint plan, seed concepts
+  decisions/                           Architecture decision records
+repo/
+  docs/
+    sessions/                          Session logs — raw, committed same day
+  pipeline/
+    scripts/                           Pipeline scripts
+```
+
+---
+
+## Active branch: dev
+
+Direct push to `dev` is allowed.
+`main` and `staging` are protected — PR only.
+
+---
+
+*CIVICVS — founded 2026-04-18*
+*The process is the project.*
--- a/docs/corpus/mesolithic-corpus-standard-v1.md
+++ b/docs/corpus/mesolithic-corpus-standard-v1.md
@@ -0,0 +1,466 @@
+# Mesolithic Corpus Standard
+### Version: 1.0
+### Status: Normative
+### Date: 2026-04-13
+### Author: Claude Sonnet 4.6, approved by project owner
+
+---
+
+## 1. Mission and scope
+
+Build a defensible Mesolithic Thesaurus, Vocabulary, and Dictionary in
+Saltcorn to support controlled corpus generation for a language model
+grounded in prehistoric lifeways.
+
+### 1.1 Core outputs
+
+| Output | Purpose |
+|---|---|
+| Thesaurus | Meaning relationships — domains, concepts, scales, frames |
+| Vocabulary | Approved lexical forms per concept |
+| Dictionary | Human-readable entries combining concept + vocabulary |
+| Ground truth corpus | Stable causal relations for model training |
+| Simulation triage corpus | Decision and priority patterns for model training |
+
+### 1.2 Constraints
+
+- No modern units or modern-only categories in any generated language
+- Meaning-first design — surface forms are secondary to semantic structure
+- Culture-aware context — concepts tagged to applicable culture horizons
+- UI-first workflow — table → view → page → data, without exception
+- Constraint enforcement is editorial, not schema-enforced. A future
+  model analysis pass will check the corpus for violations. No
+  constraint tables in this schema.
+
+### 1.3 Initial focus
+
+Maglemosian / Nerava northern wetland context. All four culture horizons
+are represented in the schema but Maglemosian is populated first.
+
+### 1.4 Out of scope
+
+- Game systems and full simulation engines
+- Speculative conlang reconstruction
+- Broad ontology sprawl
+- Academic citation management
+- Constraint enforcement tables (deferred to model analysis)
+
+---
+
+## 2. Schema
+
+Eleven tables (2.1 through 2.11). No table is added without a proven workflow need.
+
+### 2.1 `domain`
+
+The semantic domain hierarchy. Domains are self-referential — a domain
+can have a parent domain.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| label | text | Human-readable name (e.g. "Weather", "Wetness") |
+| parent_id | integer | References `domain.id` — null for top-level domains |
+
+**Seed domains (in priority order):**
+Weather, Wetness, Fire, Shelter, Water travel, Hunting, Fishing,
+Injury, Storage, Terrain, Time cycles, Social roles.
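The self-referential `parent_id` can be illustrated with an in-memory SQLite table. This is a sketch only: SQLite stands in for whatever database backs Saltcorn, the child domain "Rain" is a hypothetical example, and `domain_path` is not part of the standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """CREATE TABLE domain (
           id INTEGER PRIMARY KEY,
           label TEXT NOT NULL,
           parent_id INTEGER REFERENCES domain(id)  -- NULL for top-level domains
       )"""
)
conn.execute("INSERT INTO domain VALUES (1, 'Weather', NULL)")
conn.execute("INSERT INTO domain VALUES (2, 'Rain', 1)")  # hypothetical child domain


def domain_path(conn: sqlite3.Connection, domain_id: int) -> list[str]:
    """Labels from the top-level ancestor down to the given domain.

    Assumes the id exists and the parent chain contains no cycles.
    """
    path = []
    current = domain_id
    while current is not None:
        label, current = conn.execute(
            "SELECT label, parent_id FROM domain WHERE id = ?", (current,)
        ).fetchone()
        path.append(label)
    return path[::-1]
```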
+
+---
+
+### 2.2 `culture`
+
+The four target Mesolithic culture horizons. Lookup table — values are
+fixed and do not grow without explicit decision.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| label | text | Culture name |
+| ecology_note | text | Brief ecological context |
+| date_range_note | text | Approximate date range |
+
+**Fixed values:**
+
+| Label | Ecology | Date range |
+|---|---|---|
+| Maglemosian | Northern lake/peatland, open woodland | ~9500–6000 BCE |
+| Ertebølle | Coastal, lagoonal, shell midden | ~5400–3900 BCE |
+| Sauveterrian | Western Mediterranean upland/lowland | ~9000–6000 BCE |
+| Azilian | Franco-Cantabrian cave/rock-shelter | ~12000–9000 BCE |
+
+---
+
+### 2.3 `concept`
+
+The core meaning nodes of the thesaurus. Each concept belongs to a
+domain and carries an evidence grade.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| domain_id | integer | References `domain.id` |
+| label | text | Concept identifier (e.g. "wet", "ember", "crossing") |
+| definition | text | Plain language definition, measurement-free |
+| evidence_grade | enum | `direct` / `analogue` / `inferred` |
+| notes | text | Optional authoring notes |
+
+**Evidence grade values:**
+- `direct` — concept is directly supported by archaeological record
+- `analogue` — concept is supported by ethnographic analogue
+- `inferred` — concept follows from physical or ecological inference
+
+Culture applicability is stored in `concept_culture`, not here.
+
+---
+
+### 2.4 `concept_culture`
+
+Join table linking concepts to applicable culture horizons. A concept
+with no rows here applies to all cultures.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| concept_id | integer | References `concept.id` |
+| culture_id | integer | References `culture.id` |
+| context_note | text | Optional note on culture-specific usage |
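The "no rows means all cultures" rule is easy to get wrong in queries, so a small sketch may help. The function name and the tuple layout `(concept_id, culture_id)` are assumptions made for illustration.

```python
def applicable_cultures(
    concept_id: int,
    concept_culture_rows: list[tuple[int, int]],
    all_culture_ids: set[int],
) -> set[int]:
    """Culture ids a concept applies to; no join rows means every culture."""
    tagged = {cul for con, cul in concept_culture_rows if con == concept_id}
    return tagged or set(all_culture_ids)
```

A plain inner join on `concept_culture` would silently drop untagged concepts, which is the opposite of what the rule intends.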
+
+---
+
+### 2.5 `scale`
+
+A gradient dimension associated with a concept. A concept may have
+multiple scales (e.g. "wetness" has a dryness scale and a weight scale).
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| concept_id | integer | References `concept.id` |
+| label | text | Scale name (e.g. "dryness", "ice safety") |
+
+---
+
+### 2.6 `scale_step`
+
+Ordered steps within a scale. Steps are ordered by rank and may
+reference an antonym step.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| scale_id | integer | References `scale.id` |
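+| rank | integer | Ordering within the scale; the lowest rank is one end of the spectrum (e.g. rank 1 = "dry" on the wetness scale), the highest rank the other |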
+| label | text | Step label (e.g. "dry", "damp", "soaked") |
+| antonym_step_id | integer | References another `scale_step.id` — optional |
+| is_danger_threshold | boolean | Marks steps that represent hazard onset |
+| notes | text | Optional authoring notes |
+
+**Example — wetness scale:**
+
+| Rank | Label | Danger threshold |
+|---|---|---|
+| 1 | dry | No |
+| 2 | damp | No |
+| 3 | wet | No |
+| 4 | soaked | Yes |
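The wetness example can be expressed as data plus one lookup. The sketch assumes that every step at or above the lowest-ranked danger threshold counts as hazardous. That is one reading of "hazard onset", not something the schema states, and the names below are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleStep:
    rank: int
    label: str
    is_danger_threshold: bool = False


WETNESS = [
    ScaleStep(1, "dry"),
    ScaleStep(2, "damp"),
    ScaleStep(3, "wet"),
    ScaleStep(4, "soaked", is_danger_threshold=True),
]


def is_hazardous(steps: list[ScaleStep], label: str) -> bool:
    """True if the labelled step is at or beyond the first danger threshold."""
    by_label = {s.label: s for s in steps}
    onset = min((s.rank for s in steps if s.is_danger_threshold), default=None)
    return onset is not None and by_label[label].rank >= onset
```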
+
+---
+
+### 2.7 `frame`
+
+An action frame associated with a concept. Stores the typical roles
+(actor, patient, tool, place) for actions involving this concept.
+One frame per concept is the norm; complex concepts may have more.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| concept_id | integer | References `concept.id` |
+| label | text | Frame name (e.g. "drying hides", "crossing river") |
+| actor | text | Who performs the action |
+| patient | text | What is acted upon |
+| tool | text | What instrument is used |
+| place | text | Where the action occurs |
+| notes | text | Optional authoring notes |
+
+---
+
+### 2.8 `vocabulary_item`
+
+Approved lexical forms for a concept. A concept may have multiple
+vocabulary items — one preferred, others allowed alternates.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| concept_id | integer | References `concept.id` |
+| term | text | The surface form (e.g. "wet", "soaked", "waterlogged") |
+| preferred | boolean | True for the primary term |
+| register | text | Usage register (e.g. "narrative", "triage", "both") |
+| status | enum | `approved` / `deprecated` / `restricted` |
+| notes | text | Optional governance notes |
+
+**Status values:**
+- `approved` — use freely
+- `deprecated` — do not use in new corpus items; kept for historical record
+- `restricted` — use only in specified contexts (noted in `notes`)
+
+---
+
+### 2.9 `corpus_item`
+
+A single ground truth or triage corpus item. Ground truth items teach
+stable causal relations. Triage items teach decisions and priorities.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| corpus_type | enum | `ground_truth` / `triage` |
+| culture_id | integer | References `culture.id` — null means all cultures |
+| text | text | The corpus statement (ground truth) or scenario (triage) |
+| confidence | enum | `high` / `medium` / `low` |
+| approved | boolean | True when reviewed and approved for training use |
+| notes | text | Optional authoring notes |
+
+**Ground truth example:**
+```
+corpus_type: ground_truth
+text: "Fire dries wet hides."
+confidence: high
+approved: true
+```
+
+**Triage example:**
+```
+corpus_type: triage
+text: "Hunter returns with deep leg wound and cannot walk unassisted."
+confidence: high
+approved: true
+```
+
+Triage options are stored in `triage_option`.
+
+---
+
+### 2.10 `corpus_concept`
+
+Join table linking corpus items to the concepts they involve. Enables
+completeness checks and concept-driven corpus browsing.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| corpus_item_id | integer | References `corpus_item.id` |
+| concept_id | integer | References `concept.id` |
+| role_note | text | Optional note on how concept appears in this item |
+
+---
+
+### 2.11 `triage_option`
+
+Structured options for triage corpus items. Each triage item has 2-4
+options, exactly one marked as preferred.
+
+| Field | Type | Notes |
+|---|---|---|
+| id | integer | Primary key |
+| corpus_item_id | integer | References `corpus_item.id` |
+| option_text | text | Description of this option |
+| is_preferred | boolean | True for the recommended action |
+| reason | text | Why this option is preferred or not |
+| rank | integer | Display order |
+
+**Example — triage options for wounded hunter scenario:**
+
+| Option | Preferred | Reason |
+|---|---|---|
+| Carry hunter back immediately | Yes | Wound is deep, cannot walk, delay increases risk |
+| Continue hunt, send one person back | No | Splits group, leaves hunter without full support |
+| Make camp here and rest | No | Wound needs shelter and fire, not open ground |
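The two structural rules for triage options (2 to 4 options, exactly one preferred) can be checked mechanically. A minimal sketch, with a hypothetical function name, taking the `is_preferred` flags of one corpus item's options:

```python
def triage_options_valid(preferred_flags: list[bool]) -> bool:
    """True if there are 2 to 4 options and exactly one is marked preferred."""
    return 2 <= len(preferred_flags) <= 4 and sum(preferred_flags) == 1
```

For the wounded hunter scenario above, the flags are `[True, False, False]`, which passes.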
+
+---
+
+## 3. Workflow rule
+
+Every table follows this delivery sequence without exception:
+
+```
+1. Table     — created in Saltcorn
+2. View      — at minimum a list view and a detail view
+3. Page      — at minimum one usable entry/edit page
+4. Data      — production records entered only via pages, never raw grids
+```
+
+**Rules:**
+- No production records entered in raw table grids
+- Every new table ships with at least one usable page before data entry begins
+- Build vertically, not horizontally — one complete table/view/page/data
+  cycle before starting the next table
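The delivery sequence can be read as a gate on data entry. The sketch below models that gate with hypothetical field names. It does not query Saltcorn, and it assumes someone records each step as it is completed.

```python
from dataclasses import dataclass


@dataclass
class TableDelivery:
    table_created: bool = False
    has_list_view: bool = False
    has_detail_view: bool = False
    has_entry_page: bool = False

    def data_entry_allowed(self) -> bool:
        """Data entry opens only once table, both views, and a page all exist."""
        return all(
            (self.table_created, self.has_list_view,
             self.has_detail_view, self.has_entry_page)
        )
```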
+
+---
+
+## 4. Sprint plan
+
+Sprints are ordered by dependency. Do not start a sprint until the
+previous sprint's data entry phase is complete and verified.
+
+### Sprint 1 — Foundation
+Tables: `domain`, `culture`
+Data: 12 seed domains, 4 culture records
+Deliverable: domain browser page, culture lookup page
+
+### Sprint 2 — Core concepts
+Tables: `concept`, `concept_culture`
+Data: 25 seed concepts from DOC-006, tagged to Maglemosian
+Deliverable: concept editor page with domain and culture assignment
+
+### Sprint 3 — Scales
+Tables: `scale`, `scale_step`
+Data: scales for wetness, fire state, ice safety, injury severity
+Deliverable: scale builder page with ordered steps
+
+### Sprint 4 — Frames
+Table: `frame`
+Data: frames for key action concepts (drying, crossing, fishing, triage)
+Deliverable: frame editor page
+
+### Sprint 5 — Vocabulary
+Table: `vocabulary_item`
+Data: preferred terms for all 25 seed concepts
+Deliverable: vocabulary editor with preferred/alternate/deprecated status
+
+### Sprint 6 — Corpus
+Tables: `corpus_item`, `corpus_concept`, `triage_option`
+Data: first 20 ground truth items, first 10 triage items
+Deliverable: corpus entry page, triage option builder, concept linkage
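The claim that sprints are ordered by dependency can be checked against the "References" notes in section 2. The sketch below transcribes those references by hand. The function name is an assumption, and the check only covers foreign-key order, not data-entry completeness.

```python
SPRINTS = [
    ["domain", "culture"],
    ["concept", "concept_culture"],
    ["scale", "scale_step"],
    ["frame"],
    ["vocabulary_item"],
    ["corpus_item", "corpus_concept", "triage_option"],
]

# Tables each table references, taken from the field notes in section 2.
REFERENCES = {
    "domain": {"domain"},
    "culture": set(),
    "concept": {"domain"},
    "concept_culture": {"concept", "culture"},
    "scale": {"concept"},
    "scale_step": {"scale", "scale_step"},
    "frame": {"concept"},
    "vocabulary_item": {"concept"},
    "corpus_item": {"culture"},
    "corpus_concept": {"corpus_item", "concept"},
    "triage_option": {"corpus_item"},
}


def order_violations(sprints, references):
    """(table, referenced_table) pairs where the referenced table ships in a later sprint."""
    sprint_of = {t: i for i, tables in enumerate(sprints) for t in tables}
    return [
        (t, r)
        for t, refs in references.items()
        for r in sorted(refs)
        if sprint_of[r] > sprint_of[t]
    ]
```

With the plan as written, `order_violations(SPRINTS, REFERENCES)` returns an empty list.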
+
+---
+
+## 5. Seed concepts — Sprint 2 data
+
+From DOC-006. All tagged Maglemosian initially.
+
+| Concept | Domain | Evidence grade |
+|---|---|---|
+| wet | Wetness | direct |
+| dry | Wetness | direct |
+| damp | Wetness | direct |
+| soaked | Wetness | inferred |
+| fire | Fire | direct |
+| ember | Fire | direct |
+| smoke | Fire | direct |
+| shelter | Shelter | direct |
+| hide | Shelter | direct |
+| bark | Shelter | direct |
+| marsh | Terrain | direct |
+| reed | Terrain | direct |
+| path | Terrain | inferred |
+| river | Water travel | direct |
+| crossing | Water travel | inferred |
+| fish | Fishing | direct |
+| trap | Fishing | direct |
+| spear | Hunting | direct |
+| wound | Injury | direct |
+| limp | Injury | inferred |
+| carry | Injury | inferred |
+| dawn | Time cycles | inferred |
+| dusk | Time cycles | inferred |
+| elder | Social roles | analogue |
+| child | Social roles | analogue |
+
+---
+
+## 6. Corpus specification
+
+### 6.1 Ground truth corpus
+
+Teaches stable causal relations. Statements must be:
+- Present tense, declarative
+- Measurement-free
+- Culturally plausible for the tagged culture
+- Linked to at least one concept via `corpus_concept`
+
+**Field summary:**
+- `text` — the causal statement
+- `culture_id` — null for universal statements
+- `confidence` — high/medium/low
+- `approved` — reviewed and ready for training
+
+**Examples:**
+- Fire dries wet hides.
+- Rain softens paths.
+- Smoke drives insects away.
+- Wet wood makes reluctant fire.
+- Soaked bark floor cannot be slept on dry.
+- Rising water warns of flood.
+
+### 6.2 Simulation triage corpus
+
+Teaches decisions and priorities under constraint. Each item must have
+2-4 structured options via `triage_option`, exactly one marked preferred.
+
+**Field summary:**
+- `text` — the scenario description
+- `culture_id` — null for universal scenarios
+- `confidence` — high/medium/low
+- `approved` — reviewed and ready for training
+
+**Triage option fields:**
+- `option_text` — what this choice involves
+- `is_preferred` — the recommended action
+- `reason` — why preferred or not preferred
+- `rank` — display order
+
+**Examples:**
+- Wounded hunter cannot walk. (carry first vs continue hunt vs make camp)
+- Fire goes out in heavy rain. (seek dry tinder vs use ember from shelter vs wait)
+- Path floods at crossing. (find higher crossing vs wait vs wade)
+
+---
+
+## 7. Lexical governance
+
+### 7.1 Purpose
+
+Prevent semantic drift. Ensure vocabulary items remain measurement-free
+and culturally coherent across authors and sessions.
+
+### 7.2 Controls per vocabulary item
+
+| Control | Field | Notes |
+|---|---|---|
+| Preferred term | `preferred = true` | One per concept |
+| Allowed alternates | `status = approved, preferred = false` | Multiple allowed |
+| Deprecated terms | `status = deprecated` | Kept for record, not used in new corpus |
+| Restricted terms | `status = restricted` | Context specified in `notes` |
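The "one per concept" control on preferred terms can be sketched as a check over `(concept_id, term, preferred)` tuples. That tuple layout and the function name are assumptions. The sketch also assumes every concept with vocabulary items should have exactly one preferred term, which is an interpretation of the table above.

```python
from collections import Counter


def preferred_violations(items: list[tuple[int, str, bool]]) -> list[int]:
    """Concept ids whose number of preferred terms is not exactly one."""
    concepts = {concept_id for concept_id, _term, _preferred in items}
    counts = Counter(cid for cid, _term, preferred in items if preferred)
    return sorted(c for c in concepts if counts[c] != 1)
```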
+
+### 7.3 Approval history
+
+Saltcorn's built-in record history tracks who changed what and when.
+No separate approval log table is needed at this stage.
+
+### 7.4 Constraint enforcement
+
+Modern units and modern-only categories are excluded by editorial
+discipline at authoring time. A future model analysis pass will scan
+the corpus for violations and flag them for review. No constraint
+tables are maintained in this schema version.
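The standard defers violation scanning to a future model analysis pass and does not define a unit list. Purely as an illustration of what a flag-for-review step could look like, here is a keyword scan with a deliberately short, hypothetical word list. It will produce false positives (for example "minute" as an adjective) and is not a substitute for editorial review.

```python
import re

# Hypothetical list for illustration; the standard does not define one.
MODERN_UNITS = (
    "metre", "meter", "kilometre", "kilometer", "kilogram",
    "litre", "liter", "minute", "hour", "degree",
)
_PATTERN = re.compile(r"\b(?:%s)s?\b" % "|".join(MODERN_UNITS), re.IGNORECASE)


def flag_modern_units(text: str) -> list[str]:
    """Lower-cased unit words found in a corpus statement, in order of appearance."""
    return [m.group(0).lower() for m in _PATTERN.finditer(text)]
```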
+
+---
+
+## 8. What this does not decide
+
+- The language model architecture or training pipeline
+- How corpus items are exported to training format
+- Whether vocabulary items are used as literal tokens or as semantic
+  seeds for generation
+- The multi-clan expansion beyond Maglemosian
+- The integration between this corpus and the TESSERA spatial data layer
+- Constraint enforcement implementation (deferred to model analysis pass)
+
+---
+
+*Mesolithic Corpus Standard v1.0 — 2026-04-13*
+*Status: Normative*
+*Next review: after Sprint 2 data entry is complete*