From 34316d2429bbbbcf46f589e9aba858a40615f493 Mon Sep 17 00:00:00 2001 From: TheRON Date: Sat, 18 Apr 2026 05:29:09 +0000 Subject: [PATCH] =?UTF-8?q?init:=20CIVICVS=20repository=20=E2=80=94=20CBPs?= =?UTF-8?q?,=20corpus=20standard,=20directory=20structure=20-=20README.md:?= =?UTF-8?q?=20project=20identity,=20TESSERA=20relationship,=20directory=20?= =?UTF-8?q?layout=20-=20CIVICVS-CBPs.md:=20CBP-001=20through=20CBP-006=20a?= =?UTF-8?q?dapted=20for=20CIVICVS=20-=20docs/corpus/mesolithic-corpus-stan?= =?UTF-8?q?dard-v1.md:=2010-table=20schema,=206-sprint=20plan,=2025=20seed?= =?UTF-8?q?=20concepts=20Per=20CBP-001:=20committed=20same=20session=20as?= =?UTF-8?q?=20produced.?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- CIVICVS-CBPs.md | 145 ++++++ README.md | 52 +++ docs/corpus/mesolithic-corpus-standard-v1.md | 466 +++++++++++++++++++ 3 files changed, 663 insertions(+) create mode 100644 CIVICVS-CBPs.md create mode 100644 README.md create mode 100644 docs/corpus/mesolithic-corpus-standard-v1.md diff --git a/CIVICVS-CBPs.md b/CIVICVS-CBPs.md new file mode 100644 index 0000000..c41b16f --- /dev/null +++ b/CIVICVS-CBPs.md @@ -0,0 +1,145 @@ +# CIVICVS — Critical Baseline Protocols +### Status: Normative +### Date: 2026-04-18 +### Adapted from: TESSERA v3.0 CBPs (ssot/docs/v3/README.md) + +--- + +These are not guidelines. They are not best practices. They are the +conditions under which this project operates. Deviation is not +permitted. If a CBP cannot be followed, work stops until it can be. + +CIVICVS inherits the same process failure mode that cost TESSERA v2.0 +months of work: documents produced in chat sessions, never committed, +permanently lost. These CBPs exist to prevent that failure from +recurring in CIVICVS. 
+ +--- + +## CBP-001 — Every document is committed before the session ends + +Any document produced in a session — schema, governance doc, +architecture decision, corpus entry, session log — is committed to +this repository before the session ends. Not summarised. Not noted +for later. Committed. + +A session that produces a document and does not commit it has produced +nothing. The chat log is not a repository. It is not durable. It is +not citable. It does not exist. + +**Enforcement:** The last action of every session is to verify that +every document produced in that session exists in the repository. +If it does not, the session is not over. + +--- + +## CBP-002 — Every session produces a session log + +A session log is a raw, detailed record of what happened. It includes: +- Every decision made, with the reasoning +- Every failure encountered, with the exact error +- Every workaround discovered, with the exact command +- Every assumption that proved wrong +- Every benchmark measured, with the actual number + +Session logs are not polished. They are not summaries. They are the +unfiltered record. Future contributors — human or AI — must be able +to reconstruct exactly what was tried, what worked, and what did not, +without repeating the same experiments. + +**Enforcement:** A session log is committed before any other end-of- +session work. It is the first commit, not the last. + +--- + +## CBP-003 — Infrastructure is tested before it is designed around + +No component — Saltcorn, ChromaDB, Ollama, any API — is designed +around an assumption about what it can do. The assumption is tested +first with a minimal real operation. Only then is the design built. + +If a data source, service, or tool is investigated and found +unsuitable, that finding is documented. Failures are permanent +knowledge. Do not let the next session repeat the same investigation. 
+ +**Enforcement:** Before any new service is incorporated into CIVICVS +design, its actual behaviour, actual data format, and actual +limitations are documented with the date of investigation. + +--- + +## CBP-004 — The file transfer protocol is followed without exception + +Every file that reaches a server node travels this exact path: + +``` +Claude produces tarball → user downloads → user uploads to /tmp/ on target node → commands run there +``` + +There is no other path. There is no heredoc injection. There is no +"write it directly." There is no assuming a file is present because +it was produced in a previous step. + +If a command references a file that is not confirmed present in /tmp/ +on the target node, the command is not run. + +**Enforcement:** Every deploy sequence begins with confirming the +tarball is present at /tmp/ on the target node before any git or +extract command runs. + +--- + +## CBP-005 — The Mesolithic Corpus Standard is the source of truth for corpus work + +When the corpus schema says one thing and a corpus entry does another, +the entry is wrong until proven otherwise. If the schema is wrong, +it is corrected immediately and the correction is committed with an +inline note explaining what was wrong and when it was fixed. + +The corpus standard is at `docs/corpus/mesolithic-corpus-standard-v1.md`. +It is normative. All corpus entries, Saltcorn tables, and views must +conform to it. Deviations are not tolerated silently — they are +either corrected or the standard is updated with explicit reasoning. + +**Enforcement:** Before any corpus sprint begins, the corpus standard +and the current Saltcorn table schema are read together and confirmed +consistent. + +--- + +## CBP-006 — The handover is written for the next assistant, not for posterity + +A handover document is not a summary of what was accomplished. 
It is +an operational briefing for an assistant who has no prior context and +must be able to continue the work without asking clarifying questions +about project state. + +A handover must state: +- What is currently running and its status +- What is pending and why +- What is broken and what the exact error is +- The first task for the next session, unambiguously + +A handover that requires the recipient to make assumptions is +incomplete. + +**Enforcement:** The handover is tested by asking: "What is the first +thing to do?" If the answer is uncertain, the handover is rewritten. + +--- + +## What CIVICVS CBPs do not cover + +These CBPs govern session continuity and commit discipline. They do +not cover: + +- The corpus schema (see `docs/corpus/mesolithic-corpus-standard-v1.md`) +- Infrastructure decisions (see TESSERA `ssot/docs/v3/infrastructure.md`) +- The simulation RFC stack (to be created) +- The TESSERA data model (see TESSERA RFC stack) + +--- + +*CIVICVS-CBPs.md — 2026-04-18* +*Status: Normative* +*The process is the project.* diff --git a/README.md b/README.md new file mode 100644 index 0000000..0584fac --- /dev/null +++ b/README.md @@ -0,0 +1,52 @@ +# CIVICVS + +Mesolithic narrative simulator built on TESSERA spatial data. +Set in approximately 8000 BCE, Spree-Havel river valley, Berlin. + +--- + +## Relationship to TESSERA + +CIVICVS is a separate project from TESSERA. They share infrastructure +and the TESSERA SpatiaLite database is the spatial ground truth for +CIVICVS, but they have separate repositories, separate RFC stacks, +and separate failure modes. Do not confuse them. + +TESSERA repository: `https://gitea.barternetwork.us/TheRON/tesserav3` +CIVICVS repository: `https://gitea.barternetwork.us/TheRON/civicvs` + +--- + +## Read before doing anything + +1. `CIVICVS-CBPs.md` — session continuity and commit discipline. Non-negotiable. +2. `docs/corpus/mesolithic-corpus-standard-v1.md` — corpus schema and workflow. +3. 
`repo/docs/sessions/` — most recent session log first. + +--- + +## Repository layout + +``` +docs/ + corpus/ + mesolithic-corpus-standard-v1.md Corpus schema, sprint plan, seed concepts + decisions/ Architecture decision records +repo/ + docs/ + sessions/ Session logs — raw, committed same day + pipeline/ + scripts/ Pipeline scripts +``` + +--- + +## Active branch: dev + +Direct push to `dev` allowed. +`main` and `staging` are protected — PR only. + +--- + +*CIVICVS — founded 2026-04-18* +*The process is the project.* diff --git a/docs/corpus/mesolithic-corpus-standard-v1.md b/docs/corpus/mesolithic-corpus-standard-v1.md new file mode 100644 index 0000000..943da9f --- /dev/null +++ b/docs/corpus/mesolithic-corpus-standard-v1.md @@ -0,0 +1,466 @@ +# Mesolithic Corpus Standard +### Version: 1.0 +### Status: Normative +### Date: 2026-04-13 +### Author: Claude Sonnet 4.6, approved by project owner + +--- + +## 1. Mission and scope + +Build a defensible Mesolithic Thesaurus, Vocabulary, and Dictionary in +Saltcorn to support controlled corpus generation for a language model +grounded in prehistoric lifeways. + +### 1.1 Core outputs + +| Output | Purpose | +|---|---| +| Thesaurus | Meaning relationships — domains, concepts, scales, frames | +| Vocabulary | Approved lexical forms per concept | +| Dictionary | Human-readable entries combining concept + vocabulary | +| Ground truth corpus | Stable causal relations for model training | +| Simulation triage corpus | Decision and priority patterns for model training | + +### 1.2 Constraints + +- No modern units or modern-only categories in any generated language +- Meaning-first design — surface forms are secondary to semantic structure +- Culture-aware context — concepts tagged to applicable culture horizons +- UI-first workflow — table → view → page → data, without exception +- Constraint enforcement is editorial, not schema-enforced. A future + model analysis pass will check the corpus for violations. 
No + constraint tables in this schema. + +### 1.3 Initial focus + +Maglemosian / Nerava northern wetland context. All four culture horizons +are represented in the schema but Maglemosian is populated first. + +### 1.4 Out of scope + +- Game systems and full simulation engines +- Speculative conlang reconstruction +- Broad ontology sprawl +- Academic citation management +- Constraint enforcement tables (deferred to model analysis) + +--- + +## 2. Schema + +Eleven tables. No table is added without a proven workflow need. + +### 2.1 `domain` + +The semantic domain hierarchy. Domains are self-referential — a domain +can have a parent domain. + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| label | text | Human-readable name (e.g. "Weather", "Wetness") | +| parent_id | integer | References `domain.id` — null for top-level domains | + +**Seed domains (in priority order):** +Weather, Wetness, Fire, Shelter, Water travel, Hunting, Fishing, +Injury, Storage, Terrain, Time cycles, Social roles. + +--- + +### 2.2 `culture` + +The four target Mesolithic culture horizons. Lookup table — values are +fixed and do not grow without explicit decision. + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| label | text | Culture name | +| ecology_note | text | Brief ecological context | +| date_range_note | text | Approximate date range | + +**Fixed values:** + +| Label | Ecology | Date range | +|---|---|---| +| Maglemosian | Northern lake/peatland, open woodland | ~9500–6000 BCE | +| Ertebølle | Coastal, lagoonal, shell midden | ~5400–3900 BCE | +| Sauveterrian | Western Mediterranean upland/lowland | ~9000–6000 BCE | +| Azilian | Franco-Cantabrian cave/rock-shelter | ~12000–9000 BCE | + +--- + +### 2.3 `concept` + +The core meaning nodes of the thesaurus. Each concept belongs to a +domain and carries an evidence grade. 
+ +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| domain_id | integer | References `domain.id` | +| label | text | Concept identifier (e.g. "wet", "ember", "crossing") | +| definition | text | Plain language definition, measurement-free | +| evidence_grade | enum | `direct` / `analogue` / `inferred` | +| notes | text | Optional authoring notes | + +**Evidence grade values:** +- `direct` — concept is directly supported by archaeological record +- `analogue` — concept is supported by ethnographic analogue +- `inferred` — concept follows from physical or ecological inference + +Culture applicability is stored in `concept_culture`, not here. + +--- + +### 2.4 `concept_culture` + +Join table linking concepts to applicable culture horizons. A concept +with no rows here applies to all cultures. + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| concept_id | integer | References `concept.id` | +| culture_id | integer | References `culture.id` | +| context_note | text | Optional note on culture-specific usage | + +--- + +### 2.5 `scale` + +A gradient dimension associated with a concept. A concept may have +multiple scales (e.g. "wetness" has a dryness scale and a weight scale). + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| concept_id | integer | References `concept.id` | +| label | text | Scale name (e.g. "dryness", "ice safety") | + +--- + +### 2.6 `scale_step` + +Ordered steps within a scale. Steps are ordered by rank and may +reference an antonym step. + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| scale_id | integer | References `scale.id` | +| rank | integer | Ordering — lower = one end of spectrum | +| label | text | Step label (e.g. 
"dry", "damp", "soaked") | +| antonym_step_id | integer | References another `scale_step.id` — optional | +| is_danger_threshold | boolean | Marks steps that represent hazard onset | +| notes | text | Optional authoring notes | + +**Example — wetness scale:** + +| Rank | Label | Danger threshold | +|---|---|---| +| 1 | dry | No | +| 2 | damp | No | +| 3 | wet | No | +| 4 | soaked | Yes | + +--- + +### 2.7 `frame` + +An action frame associated with a concept. Stores the typical roles +(actor, patient, tool, place) for actions involving this concept. +One frame per concept is the norm; complex concepts may have more. + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| concept_id | integer | References `concept.id` | +| label | text | Frame name (e.g. "drying hides", "crossing river") | +| actor | text | Who performs the action | +| patient | text | What is acted upon | +| tool | text | What instrument is used | +| place | text | Where the action occurs | +| notes | text | Optional authoring notes | + +--- + +### 2.8 `vocabulary_item` + +Approved lexical forms for a concept. A concept may have multiple +vocabulary items — one preferred, others allowed alternates. + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| concept_id | integer | References `concept.id` | +| term | text | The surface form (e.g. "wet", "soaked", "waterlogged") | +| preferred | boolean | True for the primary term | +| register | text | Usage register (e.g. "narrative", "triage", "both") | +| status | enum | `approved` / `deprecated` / `restricted` | +| notes | text | Optional governance notes | + +**Status values:** +- `approved` — use freely +- `deprecated` — do not use in new corpus items; kept for historical record +- `restricted` — use only in specified contexts (noted in `notes`) + +--- + +### 2.9 `corpus_item` + +A single ground truth or triage corpus item. Ground truth items teach +stable causal relations. 
Triage items teach decisions and priorities. + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| corpus_type | enum | `ground_truth` / `triage` | +| culture_id | integer | References `culture.id` — null means all cultures | +| text | text | The corpus statement (ground truth) or scenario (triage) | +| confidence | enum | `high` / `medium` / `low` | +| approved | boolean | True when reviewed and approved for training use | +| notes | text | Optional authoring notes | + +**Ground truth example:** +``` +corpus_type: ground_truth +text: "Fire dries wet hides." +confidence: high +approved: true +``` + +**Triage example:** +``` +corpus_type: triage +text: "Hunter returns with deep leg wound and cannot walk unassisted." +confidence: high +approved: true +``` + +Triage options are stored in `triage_option`. + +--- + +### 2.10 `corpus_concept` + +Join table linking corpus items to the concepts they involve. Enables +completeness checks and concept-driven corpus browsing. + +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| corpus_item_id | integer | References `corpus_item.id` | +| concept_id | integer | References `concept.id` | +| role_note | text | Optional note on how concept appears in this item | + +--- + +### 2.11 `triage_option` + +Structured options for triage corpus items. Each triage item has 2-4 +options, exactly one marked as preferred. 
+ +| Field | Type | Notes | +|---|---|---| +| id | integer | Primary key | +| corpus_item_id | integer | References `corpus_item.id` | +| option_text | text | Description of this option | +| is_preferred | boolean | True for the recommended action | +| reason | text | Why this option is preferred or not | +| rank | integer | Display order | + +**Example — triage options for wounded hunter scenario:** + +| Option | Preferred | Reason | +|---|---|---| +| Carry hunter back immediately | Yes | Wound is deep, cannot walk, delay increases risk | +| Continue hunt, send one person back | No | Splits group, leaves hunter without full support | +| Make camp here and rest | No | Wound needs shelter and fire, not open ground | + +--- + +## 3. Workflow rule + +Every table follows this delivery sequence without exception: + +``` +1. Table — created in Saltcorn +2. View — at minimum a list view and a detail view +3. Page — at minimum one usable entry/edit page +4. Data — production records entered only via pages, never raw grids +``` + +**Rules:** +- No production records entered in raw table grids +- Every new table ships with at least one usable page before data entry begins +- Build vertically, not horizontally — one complete table/view/page/data + cycle before starting the next table + +--- + +## 4. Sprint plan + +Sprints are ordered by dependency. Do not start a sprint until the +previous sprint's data entry phase is complete and verified. 
+ +### Sprint 1 — Foundation +Tables: `domain`, `culture` +Data: 12 seed domains, 4 culture records +Deliverable: domain browser page, culture lookup page + +### Sprint 2 — Core concepts +Tables: `concept`, `concept_culture` +Data: 25 seed concepts from DOC-006, tagged to Maglemosian +Deliverable: concept editor page with domain and culture assignment + +### Sprint 3 — Scales +Tables: `scale`, `scale_step` +Data: scales for wetness, fire state, ice safety, injury severity +Deliverable: scale builder page with ordered steps + +### Sprint 4 — Frames +Table: `frame` +Data: frames for key action concepts (drying, crossing, fishing, triage) +Deliverable: frame editor page + +### Sprint 5 — Vocabulary +Table: `vocabulary_item` +Data: preferred terms for all 25 seed concepts +Deliverable: vocabulary editor with preferred/alternate/deprecated status + +### Sprint 6 — Corpus +Tables: `corpus_item`, `corpus_concept`, `triage_option` +Data: first 20 ground truth items, first 10 triage items +Deliverable: corpus entry page, triage option builder, concept linkage + +--- + +## 5. Seed concepts — Sprint 2 data + +From DOC-006. All tagged Maglemosian initially. 
+ +| Concept | Domain | Evidence grade | +|---|---|---| +| wet | Wetness | direct | +| dry | Wetness | direct | +| damp | Wetness | direct | +| soaked | Wetness | inferred | +| fire | Fire | direct | +| ember | Fire | direct | +| smoke | Fire | direct | +| shelter | Shelter | direct | +| hide | Shelter | direct | +| bark | Shelter | direct | +| marsh | Terrain | direct | +| reed | Terrain | direct | +| path | Terrain | inferred | +| river | Water travel | direct | +| crossing | Water travel | inferred | +| fish | Fishing | direct | +| trap | Fishing | direct | +| spear | Hunting | direct | +| wound | Injury | direct | +| limp | Injury | inferred | +| carry | Injury | inferred | +| dawn | Time cycles | inferred | +| dusk | Time cycles | inferred | +| elder | Social roles | analogue | +| child | Social roles | analogue | + +--- + +## 6. Corpus specification + +### 6.1 Ground truth corpus + +Teaches stable causal relations. Statements must be: +- Present tense, declarative +- Measurement-free +- Culturally plausible for the tagged culture +- Linked to at least one concept via `corpus_concept` + +**Field summary:** +- `text` — the causal statement +- `culture_id` — null for universal statements +- `confidence` — high/medium/low +- `approved` — reviewed and ready for training + +**Examples:** +- Fire dries wet hides. +- Rain softens paths. +- Smoke drives insects away. +- Wet wood makes reluctant fire. +- Soaked bark floor cannot be slept on dry. +- Rising water warns of flood. + +### 6.2 Simulation triage corpus + +Teaches decisions and priorities under constraint. Each item must have +2-4 structured options via `triage_option`, exactly one marked preferred. 
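
The option rule above (2-4 options, exactly one preferred) could be checked with something like the following. This is a minimal sketch, assuming triage options are available as plain records carrying the `option_text` and `is_preferred` fields from section 2.11. The function name `check_triage_options` and the dictionary representation are illustrative assumptions, not part of this standard or of Saltcorn.

```python
from typing import Dict, List


def check_triage_options(options: List[Dict]) -> List[str]:
    """Return rule violations for one triage item (empty list if valid).

    Hypothetical helper: checks only the two rules stated in section 6.2.
    """
    problems = []
    # Rule 1: each triage item carries between 2 and 4 options.
    if not 2 <= len(options) <= 4:
        problems.append(f"expected 2-4 options, found {len(options)}")
    # Rule 2: exactly one option is marked preferred.
    preferred = sum(1 for option in options if option.get("is_preferred"))
    if preferred != 1:
        problems.append(f"expected exactly one preferred option, found {preferred}")
    return problems


# The wounded hunter options from section 2.11, as plain records.
wounded_hunter = [
    {"option_text": "Carry hunter back immediately", "is_preferred": True},
    {"option_text": "Continue hunt, send one person back", "is_preferred": False},
    {"option_text": "Make camp here and rest", "is_preferred": False},
]

print(check_triage_options(wounded_hunter))
```

Returning a list of problems rather than raising on the first one keeps every violation visible in a single review pass, which suits an editorial check more than a hard failure.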
+ +**Field summary:** +- `text` — the scenario description +- `culture_id` — null for universal scenarios +- `confidence` — high/medium/low +- `approved` — reviewed and ready for training + +**Triage option fields:** +- `option_text` — what this choice involves +- `is_preferred` — the recommended action +- `reason` — why preferred or not preferred +- `rank` — display order + +**Examples:** +- Wounded hunter cannot walk. (carry first vs continue hunt vs make camp) +- Fire goes out in heavy rain. (seek dry tinder vs use ember from shelter vs wait) +- Path floods at crossing. (find higher crossing vs wait vs wade) + +--- + +## 7. Lexical governance + +### 7.1 Purpose + +Prevent semantic drift. Ensure vocabulary items remain measurement-free +and culturally coherent across authors and sessions. + +### 7.2 Controls per vocabulary item + +| Control | Field | Notes | +|---|---|---| +| Preferred term | `preferred = true` | One per concept | +| Allowed alternates | `status = approved, preferred = false` | Multiple allowed | +| Deprecated terms | `status = deprecated` | Kept for record, not used in new corpus | +| Restricted terms | `status = restricted` | Context specified in `notes` | + +### 7.3 Approval history + +Saltcorn's built-in record history tracks who changed what and when. +No separate approval log table is needed at this stage. + +### 7.4 Constraint enforcement + +Modern units and modern-only categories are excluded by editorial +discipline at authoring time. A future model analysis pass will scan +the corpus for violations and flag them for review. No constraint +tables are maintained in this schema version. + +--- + +## 8. 
What this does not decide + +- The language model architecture or training pipeline +- How corpus items are exported to training format +- Whether vocabulary items are used as literal tokens or as semantic + seeds for generation +- The multi-clan expansion beyond Maglemosian +- The integration between this corpus and the TESSERA spatial data layer +- Constraint enforcement implementation (deferred to model analysis pass) + +--- + +*Mesolithic Corpus Standard v1.0 — 2026-04-13* +*Status: Normative* +*Next review: after Sprint 2 data entry is complete*
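
The lookup rule in section 2.4 (a concept with no `concept_culture` rows applies to all cultures) is easy to get wrong with a plain join, so here is one way it might be queried. This is a sketch under stated assumptions: it uses an in-memory SQLite database purely for self-containment, the column subset is reduced to what the rule needs, the function name `concepts_for_culture` is hypothetical, and the sample rows (including leaving "fire" untagged) are illustrative data rather than the project's actual records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE culture (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE concept (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE concept_culture (
        id INTEGER PRIMARY KEY,
        concept_id INTEGER NOT NULL REFERENCES concept(id),
        culture_id INTEGER NOT NULL REFERENCES culture(id)
    );
    INSERT INTO culture (id, label) VALUES (1, 'Maglemosian'), (2, 'Azilian');
    INSERT INTO concept (id, label) VALUES (1, 'marsh'), (2, 'fire');
    -- Illustrative data: 'marsh' is tagged to Maglemosian only;
    -- 'fire' has no rows here, so by the rule it applies to every culture.
    INSERT INTO concept_culture (concept_id, culture_id) VALUES (1, 1);
    """
)


def concepts_for_culture(db: sqlite3.Connection, culture_label: str) -> list:
    """Return concept labels applicable to a culture, sorted alphabetically."""
    rows = db.execute(
        """
        SELECT c.label
        FROM concept AS c
        WHERE NOT EXISTS (              -- untagged concept: applies to all
                SELECT 1 FROM concept_culture AS cc
                WHERE cc.concept_id = c.id)
           OR EXISTS (                  -- tagged concept: must match this culture
                SELECT 1 FROM concept_culture AS cc
                JOIN culture AS cu ON cu.id = cc.culture_id
                WHERE cc.concept_id = c.id AND cu.label = ?)
        ORDER BY c.label
        """,
        (culture_label,),
    ).fetchall()
    return [row[0] for row in rows]


print(concepts_for_culture(conn, "Maglemosian"))
print(concepts_for_culture(conn, "Azilian"))
```

An inner join from `concept` to `concept_culture` would silently drop untagged concepts. The `NOT EXISTS` branch is what implements "no rows means all cultures".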