init: CIVICVS repository — CBPs, corpus standard, directory structure

- README.md: project identity, TESSERA relationship, directory layout
- CIVICVS-CBPs.md: CBP-001 through CBP-006 adapted for CIVICVS
- docs/corpus/mesolithic-corpus-standard-v1.md: 10-table schema, 6-sprint plan, 25 seed concepts
Per CBP-001: committed same session as produced.
2026-04-18 05:29:09 +00:00
commit 34316d2429
3 changed files with 663 additions and 0 deletions

CIVICVS-CBPs.md

@@ -0,0 +1,145 @@
# CIVICVS — Critical Baseline Protocols
### Status: Normative
### Date: 2026-04-18
### Adapted from: TESSERA v3.0 CBPs (ssot/docs/v3/README.md)
---
These are not guidelines. They are not best practices. They are the
conditions under which this project operates. Deviation is not
permitted. If a CBP cannot be followed, work stops until it can be.

CIVICVS inherits the same process failure mode that cost TESSERA v2.0
months of work: documents produced in chat sessions, never committed,
permanently lost. These CBPs exist to prevent that failure from
recurring in CIVICVS.
---
## CBP-001 — Every document is committed before the session ends
Any document produced in a session — schema, governance doc,
architecture decision, corpus entry, session log — is committed to
this repository before the session ends. Not summarised. Not noted
for later. Committed.
A session that produces a document and does not commit it has produced
nothing. The chat log is not a repository. It is not durable. It is
not citable. It does not exist.
**Enforcement:** The last action of every session is to verify that
every document produced in that session exists in the repository.
If it does not, the session is not over.
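
The end-of-session check above can be sketched as a small script. This is a minimal illustration, assuming the session's documents live in a local git working tree; the function names `uncommitted_paths` and `session_may_end` are hypothetical and not part of any CIVICVS tooling. It only detects files that are modified or untracked, and it does not confirm that commits were pushed.

```python
import subprocess


def uncommitted_paths(porcelain: str) -> list[str]:
    """Return the paths listed in `git status --porcelain` output."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) > 3:
            # Each line is "XY path": two status characters, a space, the path.
            paths.append(line[3:])
    return paths


def session_may_end(repo_dir: str = ".") -> bool:
    """True only when the working tree has no modified or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return not uncommitted_paths(result.stdout)
```

If `session_may_end()` returns `False`, the remaining paths are the documents that still need committing.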
---
## CBP-002 — Every session produces a session log
A session log is a raw, detailed record of what happened. It includes:
- Every decision made, with the reasoning
- Every failure encountered, with the exact error
- Every workaround discovered, with the exact command
- Every assumption that proved wrong
- Every benchmark measured, with the actual number

Session logs are not polished. They are not summaries. They are the
unfiltered record. Future contributors — human or AI — must be able
to reconstruct exactly what was tried, what worked, and what did not,
without repeating the same experiments.
**Enforcement:** A session log is committed before any other
end-of-session work. It is the first commit, not the last.
---
## CBP-003 — Infrastructure is tested before it is designed around
No component — Saltcorn, ChromaDB, Ollama, any API — is designed
around an assumption about what it can do. The assumption is tested
first with a minimal real operation. Only then is the design built.
If a data source, service, or tool is investigated and found
unsuitable, that finding is documented. Failures are permanent
knowledge. Do not let the next session repeat the same investigation.
**Enforcement:** Before any new service is incorporated into CIVICVS
design, its actual behaviour, actual data format, and actual
limitations are documented with the date of investigation.
---
## CBP-004 — The file transfer protocol is followed without exception
Every file that reaches a server node travels this exact path:
```
Claude produces tarball → user downloads → user uploads to /tmp/ on target node → commands run there
```
There is no other path. There is no heredoc injection. There is no
"write it directly." There is no assuming a file is present because
it was produced in a previous step.
If a command references a file that is not confirmed present in /tmp/
on the target node, the command is not run.
**Enforcement:** Every deploy sequence begins with confirming the
tarball is present at /tmp/ on the target node before any git or
extract command runs.
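
A minimal sketch of that presence check, assuming it runs on the target node itself. The helper name `tarball_confirmed` and the example file name are hypothetical; only the `/tmp/` location comes from this protocol.

```python
import tarfile
from pathlib import Path


def tarball_confirmed(name: str, staging_dir: str = "/tmp") -> bool:
    """True only if the named file exists in the staging directory
    and can actually be opened as a tar archive."""
    path = Path(staging_dir) / name
    return path.is_file() and tarfile.is_tarfile(str(path))


# Hypothetical usage: refuse to continue when the upload is missing.
# if not tarball_confirmed("civicvs-drop.tar.gz"):
#     raise SystemExit("tarball not present in /tmp/; stop here")
```

Checking that the file is a readable archive, not just that a path exists, also catches truncated or mis-named uploads.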
---
## CBP-005 — The Mesolithic Corpus Standard is the source of truth for corpus work
When the corpus schema says one thing and a corpus entry does another,
the entry is wrong until proven otherwise. If the schema is wrong,
it is corrected immediately and the correction is committed with an
inline note explaining what was wrong and when it was fixed.
The corpus standard is at `docs/corpus/mesolithic-corpus-standard-v1.md`.
It is normative. All corpus entries, Saltcorn tables, and views must
conform to it. Deviations are not tolerated silently — they are
either corrected or the standard is updated with explicit reasoning.
**Enforcement:** Before any corpus sprint begins, the corpus standard
and the current Saltcorn table schema are read together and confirmed
consistent.
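
The consistency check can be made mechanical once both schemas are written down as field sets. The sketch below assumes the field names have already been transcribed by hand from the standard and from Saltcorn; how the live Saltcorn schema is read is deliberately not shown, and `schema_drift` is a hypothetical helper name.

```python
def schema_drift(
    expected: dict[str, set[str]],
    actual: dict[str, set[str]],
) -> dict[str, dict[str, set[str]]]:
    """Compare table -> field-name sets and report every table that differs."""
    drift = {}
    for table in sorted(expected.keys() | actual.keys()):
        want = expected.get(table, set())
        have = actual.get(table, set())
        if want != have:
            drift[table] = {
                "missing": want - have,      # in the standard, not in Saltcorn
                "unexpected": have - want,   # in Saltcorn, not in the standard
            }
    return drift
```

An empty result means the two agree; any entry is a deviation to correct or to justify in the standard before the sprint starts.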
---
## CBP-006 — The handover is written for the next assistant, not for posterity
A handover document is not a summary of what was accomplished. It is
an operational briefing for an assistant who has no prior context and
must be able to continue the work without asking clarifying questions
about project state.
A handover must state:
- What is currently running and its status
- What is pending and why
- What is broken and what the exact error is
- The first task for the next session, unambiguously

A handover that requires the recipient to make assumptions is
incomplete.
**Enforcement:** The handover is tested by asking: "What is the first
thing to do?" If the answer is uncertain, the handover is rewritten.
---
## What CIVICVS CBPs do not cover
These CBPs govern session continuity and commit discipline. They do
not cover:
- The corpus schema (see `docs/corpus/mesolithic-corpus-standard-v1.md`)
- Infrastructure decisions (see TESSERA `ssot/docs/v3/infrastructure.md`)
- The simulation RFC stack (to be created)
- The TESSERA data model (see TESSERA RFC stack)
---
*CIVICVS-CBPs.md — 2026-04-18*
*Status: Normative*
*The process is the project.*

README.md

@@ -0,0 +1,52 @@
# CIVICVS
Mesolithic narrative simulator built on TESSERA spatial data.
Set in approximately 8000 BCE in the Spree-Havel river valley, in what is now Berlin.
---
## Relationship to TESSERA
CIVICVS is a separate project from TESSERA. The two share
infrastructure, and the TESSERA SpatiaLite database is the spatial
ground truth for CIVICVS, but each has its own repository, its own
RFC stack, and its own failure modes. Do not confuse them.
TESSERA repository: `https://gitea.barternetwork.us/TheRON/tesserav3`
CIVICVS repository: `https://gitea.barternetwork.us/TheRON/civicvs`
---
## Read before doing anything
1. `CIVICVS-CBPs.md` — session continuity and commit discipline. Non-negotiable.
2. `docs/corpus/mesolithic-corpus-standard-v1.md` — corpus schema and workflow.
3. `repo/docs/sessions/` — most recent session log first.
---
## Repository layout
```
docs/
corpus/
mesolithic-corpus-standard-v1.md Corpus schema, sprint plan, seed concepts
decisions/ Architecture decision records
repo/
docs/
sessions/ Session logs — raw, committed same day
pipeline/
scripts/ Pipeline scripts
```
---
## Active branch: dev
Direct push to `dev` allowed.
`main` and `staging` are protected — PR only.
---
*CIVICVS — founded 2026-04-18*
*The process is the project.*

docs/corpus/mesolithic-corpus-standard-v1.md

@@ -0,0 +1,466 @@
# Mesolithic Corpus Standard
### Version: 1.0
### Status: Normative
### Date: 2026-04-13
### Author: Claude Sonnet 4.6, approved by project owner
---
## 1. Mission and scope
Build a defensible Mesolithic Thesaurus, Vocabulary, and Dictionary in
Saltcorn to support controlled corpus generation for a language model
grounded in prehistoric lifeways.
### 1.1 Core outputs
| Output | Purpose |
|---|---|
| Thesaurus | Meaning relationships — domains, concepts, scales, frames |
| Vocabulary | Approved lexical forms per concept |
| Dictionary | Human-readable entries combining concept + vocabulary |
| Ground truth corpus | Stable causal relations for model training |
| Simulation triage corpus | Decision and priority patterns for model training |
### 1.2 Constraints
- No modern units or modern-only categories in any generated language
- Meaning-first design — surface forms are secondary to semantic structure
- Culture-aware context — concepts tagged to applicable culture horizons
- UI-first workflow — table → view → page → data, without exception
- Constraint enforcement is editorial, not schema-enforced. A future
model analysis pass will check the corpus for violations. No
constraint tables in this schema.
### 1.3 Initial focus
Maglemosian / Nerava northern wetland context. All four culture horizons
are represented in the schema but Maglemosian is populated first.
### 1.4 Out of scope
- Game systems and full simulation engines
- Speculative conlang reconstruction
- Broad ontology sprawl
- Academic citation management
- Constraint enforcement tables (deferred to model analysis)
---
## 2. Schema
Eleven tables. No table is added without a proven workflow need.
### 2.1 `domain`
The semantic domain hierarchy. Domains are self-referential — a domain
can have a parent domain.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| label | text | Human-readable name (e.g. "Weather", "Wetness") |
| parent_id | integer | References `domain.id` — null for top-level domains |
**Seed domains (in priority order):**
Weather, Wetness, Fire, Shelter, Water travel, Hunting, Fishing,
Injury, Storage, Terrain, Time cycles, Social roles.
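
As an illustration of the self-reference, the sketch below walks `parent_id` upward to build a domain's full path. The in-memory dict stands in for table rows, and the child domain "Rain" is a hypothetical example, not a seed domain.

```python
from typing import Optional


def domain_path(
    domains: dict[int, tuple[str, Optional[int]]],
    domain_id: int,
) -> list[str]:
    """Return labels from the top-level domain down to the given domain.

    `domains` maps id -> (label, parent_id); parent_id is None at the top level.
    """
    path = []
    seen = set()
    current: Optional[int] = domain_id
    while current is not None:
        if current in seen:
            raise ValueError(f"parent_id cycle at domain {current}")
        seen.add(current)
        label, parent_id = domains[current]
        path.append(label)
        current = parent_id
    return list(reversed(path))


# "Weather" is a seed domain; "Rain" is a hypothetical sub-domain.
rows = {1: ("Weather", None), 2: ("Rain", 1)}
print(domain_path(rows, 2))
```

The cycle guard matters because nothing in a plain self-referential integer field prevents a domain from being made its own ancestor.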
---
### 2.2 `culture`
The four target Mesolithic culture horizons. Lookup table — values are
fixed and do not grow without explicit decision.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| label | text | Culture name |
| ecology_note | text | Brief ecological context |
| date_range_note | text | Approximate date range |
**Fixed values:**
| Label | Ecology | Date range |
|---|---|---|
| Maglemosian | Northern lake/peatland, open woodland | ~9500–6000 BCE |
| Ertebølle | Coastal, lagoonal, shell midden | ~5400–3900 BCE |
| Sauveterrian | Western Mediterranean upland/lowland | ~9000–6000 BCE |
| Azilian | Franco-Cantabrian cave/rock-shelter | ~12000–9000 BCE |
---
### 2.3 `concept`
The core meaning nodes of the thesaurus. Each concept belongs to a
domain and carries an evidence grade.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| domain_id | integer | References `domain.id` |
| label | text | Concept identifier (e.g. "wet", "ember", "crossing") |
| definition | text | Plain language definition, measurement-free |
| evidence_grade | enum | `direct` / `analogue` / `inferred` |
| notes | text | Optional authoring notes |
**Evidence grade values:**
- `direct` — concept is directly supported by archaeological record
- `analogue` — concept is supported by ethnographic analogue
- `inferred` — concept follows from physical or ecological inference
Culture applicability is stored in `concept_culture`, not here.
---
### 2.4 `concept_culture`
Join table linking concepts to applicable culture horizons. A concept
with no rows here applies to all cultures.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| culture_id | integer | References `culture.id` |
| context_note | text | Optional note on culture-specific usage |
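
The "no rows means all cultures" rule is easy to get wrong in queries, so here is a minimal sketch of how a reader of this table might resolve it. The function name and the plain tuples standing in for `concept_culture` rows are assumptions for illustration.

```python
from collections.abc import Iterable


def applicable_cultures(
    concept_id: int,
    concept_culture_rows: Iterable[tuple[int, int]],
    all_culture_ids: Iterable[int],
) -> set[int]:
    """Resolve which cultures a concept applies to.

    Rows are (concept_id, culture_id) pairs. A concept with no rows
    applies to every culture.
    """
    linked = {
        culture_id
        for row_concept_id, culture_id in concept_culture_rows
        if row_concept_id == concept_id
    }
    return linked if linked else set(all_culture_ids)
```

Any view or export that filters by culture needs the same fallback, otherwise untagged universal concepts silently drop out of the results.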
---
### 2.5 `scale`
A gradient dimension associated with a concept. A concept may have
multiple scales (e.g. "wetness" has a dryness scale and a weight scale).
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| label | text | Scale name (e.g. "dryness", "ice safety") |
---
### 2.6 `scale_step`
Ordered steps within a scale. Steps are ordered by rank and may
reference an antonym step.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| scale_id | integer | References `scale.id` |
| rank | integer | Ordering within the scale; rank 1 is one end of the spectrum and ranks increase toward the other |
| label | text | Step label (e.g. "dry", "damp", "soaked") |
| antonym_step_id | integer | References another `scale_step.id` — optional |
| is_danger_threshold | boolean | Marks steps that represent hazard onset |
| notes | text | Optional authoring notes |
**Example — wetness scale:**
| Rank | Label | Danger threshold |
|---|---|---|
| 1 | dry | No |
| 2 | damp | No |
| 3 | wet | No |
| 4 | soaked | Yes |
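
The wetness example can be expressed as data to show how `rank` and `is_danger_threshold` work together. This is a sketch only: the `ScaleStep` class is a stand-in for table rows, and treating every step at or beyond the first threshold as hazardous is an assumption, since the schema only says the flag marks hazard onset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScaleStep:
    rank: int
    label: str
    is_danger_threshold: bool = False


WETNESS = [
    ScaleStep(1, "dry"),
    ScaleStep(2, "damp"),
    ScaleStep(3, "wet"),
    ScaleStep(4, "soaked", is_danger_threshold=True),
]


def danger_onset(steps: list[ScaleStep]) -> Optional[ScaleStep]:
    """Return the lowest-ranked step flagged as a danger threshold, if any."""
    for step in sorted(steps, key=lambda s: s.rank):
        if step.is_danger_threshold:
            return step
    return None


def is_hazardous(steps: list[ScaleStep], label: str) -> bool:
    """Assumption: a step is hazardous if it sits at or beyond hazard onset."""
    onset = danger_onset(steps)
    rank_by_label = {s.label: s.rank for s in steps}
    return onset is not None and rank_by_label[label] >= onset.rank
```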
---
### 2.7 `frame`
An action frame associated with a concept. Stores the typical roles
(actor, patient, tool, place) for actions involving this concept.
One frame per concept is the norm; complex concepts may have more.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| label | text | Frame name (e.g. "drying hides", "crossing river") |
| actor | text | Who performs the action |
| patient | text | What is acted upon |
| tool | text | What instrument is used |
| place | text | Where the action occurs |
| notes | text | Optional authoring notes |
---
### 2.8 `vocabulary_item`
Approved lexical forms for a concept. A concept may have multiple
vocabulary items — one preferred, others allowed alternates.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| term | text | The surface form (e.g. "wet", "soaked", "waterlogged") |
| preferred | boolean | True for the primary term |
| register | text | Usage register (e.g. "narrative", "triage", "both") |
| status | enum | `approved` / `deprecated` / `restricted` |
| notes | text | Optional governance notes |
**Status values:**
- `approved` — use freely
- `deprecated` — do not use in new corpus items; kept for historical record
- `restricted` — use only in specified contexts (noted in `notes`)
---
### 2.9 `corpus_item`
A single ground truth or triage corpus item. Ground truth items teach
stable causal relations. Triage items teach decisions and priorities.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_type | enum | `ground_truth` / `triage` |
| culture_id | integer | References `culture.id` — null means all cultures |
| text | text | The corpus statement (ground truth) or scenario (triage) |
| confidence | enum | `high` / `medium` / `low` |
| approved | boolean | True when reviewed and approved for training use |
| notes | text | Optional authoring notes |
**Ground truth example:**
```
corpus_type: ground_truth
text: "Fire dries wet hides."
confidence: high
approved: true
```
**Triage example:**
```
corpus_type: triage
text: "Hunter returns with deep leg wound and cannot walk unassisted."
confidence: high
approved: true
```
Triage options are stored in `triage_option`.
---
### 2.10 `corpus_concept`
Join table linking corpus items to the concepts they involve. Enables
completeness checks and concept-driven corpus browsing.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_item_id | integer | References `corpus_item.id` |
| concept_id | integer | References `concept.id` |
| role_note | text | Optional note on how concept appears in this item |
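
One completeness check this join table makes possible is finding corpus items that are not linked to any concept. The sketch below assumes the ids and join rows have been exported into plain Python collections; the function name is hypothetical.

```python
from collections.abc import Iterable


def unlinked_corpus_items(
    corpus_item_ids: Iterable[int],
    corpus_concept_rows: Iterable[tuple[int, int]],
) -> list[int]:
    """Return ids of corpus items with no row in corpus_concept.

    Rows are (corpus_item_id, concept_id) pairs.
    """
    linked = {corpus_item_id for corpus_item_id, _ in corpus_concept_rows}
    return sorted(set(corpus_item_ids) - linked)
```

Run the other way round (concepts with no corpus items), the same pattern shows which seed concepts still lack coverage.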
---
### 2.11 `triage_option`
Structured options for triage corpus items. Each triage item has 2-4
options, exactly one marked as preferred.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_item_id | integer | References `corpus_item.id` |
| option_text | text | Description of this option |
| is_preferred | boolean | True for the recommended action |
| reason | text | Why this option is preferred or not |
| rank | integer | Display order |
**Example — triage options for wounded hunter scenario:**
| Option | Preferred | Reason |
|---|---|---|
| Carry hunter back immediately | Yes | Wound is deep, cannot walk, delay increases risk |
| Continue hunt, send one person back | No | Splits group, leaves hunter without full support |
| Make camp here and rest | No | Wound needs shelter and fire, not open ground |
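
The two structural rules for triage options (2-4 options, exactly one preferred) are not enforced by the schema, so a small check is useful before approval. This is a sketch with option rows represented as dicts keyed by the field names above; `triage_option_errors` is a hypothetical helper.

```python
def triage_option_errors(options: list[dict]) -> list[str]:
    """Return a list of rule violations for one triage item's options."""
    errors = []
    if not 2 <= len(options) <= 4:
        errors.append(f"expected 2-4 options, found {len(options)}")
    preferred = sum(1 for option in options if option["is_preferred"])
    if preferred != 1:
        errors.append(f"expected exactly 1 preferred option, found {preferred}")
    return errors


wounded_hunter = [
    {"option_text": "Carry hunter back immediately", "is_preferred": True},
    {"option_text": "Continue hunt, send one person back", "is_preferred": False},
    {"option_text": "Make camp here and rest", "is_preferred": False},
]
print(triage_option_errors(wounded_hunter))
```

An empty list means the item satisfies both rules; anything else names the rule that failed.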
---
## 3. Workflow rule
Every table follows this delivery sequence without exception:
```
1. Table — created in Saltcorn
2. View — at minimum a list view and a detail view
3. Page — at minimum one usable entry/edit page
4. Data — production records entered only via pages, never raw grids
```
**Rules:**
- No production records entered in raw table grids
- Every new table ships with at least one usable page before data entry begins
- Build vertically, not horizontally — one complete table/view/page/data
cycle before starting the next table
---
## 4. Sprint plan
Sprints are ordered by dependency. Do not start a sprint until the
previous sprint's data entry phase is complete and verified.
### Sprint 1 — Foundation
Tables: `domain`, `culture`
Data: 12 seed domains, 4 culture records
Deliverable: domain browser page, culture lookup page
### Sprint 2 — Core concepts
Tables: `concept`, `concept_culture`
Data: 25 seed concepts from DOC-006, tagged to Maglemosian
Deliverable: concept editor page with domain and culture assignment
### Sprint 3 — Scales
Tables: `scale`, `scale_step`
Data: scales for wetness, fire state, ice safety, injury severity
Deliverable: scale builder page with ordered steps
### Sprint 4 — Frames
Table: `frame`
Data: frames for key action concepts (drying, crossing, fishing, triage)
Deliverable: frame editor page
### Sprint 5 — Vocabulary
Table: `vocabulary_item`
Data: preferred terms for all 25 seed concepts
Deliverable: vocabulary editor with preferred/alternate/deprecated status
### Sprint 6 — Corpus
Tables: `corpus_item`, `corpus_concept`, `triage_option`
Data: first 20 ground truth items, first 10 triage items
Deliverable: corpus entry page, triage option builder, concept linkage
---
## 5. Seed concepts — Sprint 2 data
From DOC-006. All tagged Maglemosian initially.
| Concept | Domain | Evidence grade |
|---|---|---|
| wet | Wetness | direct |
| dry | Wetness | direct |
| damp | Wetness | direct |
| soaked | Wetness | inferred |
| fire | Fire | direct |
| ember | Fire | direct |
| smoke | Fire | direct |
| shelter | Shelter | direct |
| hide | Shelter | direct |
| bark | Shelter | direct |
| marsh | Terrain | direct |
| reed | Terrain | direct |
| path | Terrain | inferred |
| river | Water travel | direct |
| crossing | Water travel | inferred |
| fish | Fishing | direct |
| trap | Fishing | direct |
| spear | Hunting | direct |
| wound | Injury | direct |
| limp | Injury | inferred |
| carry | Injury | inferred |
| dawn | Time cycles | inferred |
| dusk | Time cycles | inferred |
| elder | Social roles | analogue |
| child | Social roles | analogue |
---
## 6. Corpus specification
### 6.1 Ground truth corpus
Teaches stable causal relations. Statements must be:
- Present tense, declarative
- Measurement-free
- Culturally plausible for the tagged culture
- Linked to at least one concept via `corpus_concept`
**Field summary:**
- `text` — the causal statement
- `culture_id` — null for universal statements
- `confidence` — high/medium/low
- `approved` — reviewed and ready for training
**Examples:**
- Fire dries wet hides.
- Rain softens paths.
- Smoke drives insects away.
- Wet wood makes reluctant fire.
- Soaked bark floor cannot be slept on dry.
- Rising water warns of flood.
### 6.2 Simulation triage corpus
Teaches decisions and priorities under constraint. Each item must have
2-4 structured options via `triage_option`, exactly one marked preferred.
**Field summary:**
- `text` — the scenario description
- `culture_id` — null for universal scenarios
- `confidence` — high/medium/low
- `approved` — reviewed and ready for training
**Triage option fields:**
- `option_text` — what this choice involves
- `is_preferred` — the recommended action
- `reason` — why preferred or not preferred
- `rank` — display order
**Examples:**
- Wounded hunter cannot walk. (carry first vs continue hunt vs make camp)
- Fire goes out in heavy rain. (seek dry tinder vs use ember from shelter vs wait)
- Path floods at crossing. (find higher crossing vs wait vs wade)
---
## 7. Lexical governance
### 7.1 Purpose
Prevent semantic drift. Ensure vocabulary items remain measurement-free
and culturally coherent across authors and sessions.
### 7.2 Controls per vocabulary item
| Control | Field | Notes |
|---|---|---|
| Preferred term | `preferred = true` | One per concept |
| Allowed alternates | `status = approved, preferred = false` | Multiple allowed |
| Deprecated terms | `status = deprecated` | Kept for record, not used in new corpus |
| Restricted terms | `status = restricted` | Context specified in `notes` |
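
The "one preferred term per concept" control can be checked with a simple count. The sketch below assumes `vocabulary_item` rows are available as dicts with `concept_id` and `preferred` keys; the function name is hypothetical, and it does not look at `status`.

```python
from collections import Counter


def preferred_term_violations(items: list[dict]) -> dict[int, int]:
    """Map concept_id -> preferred-term count, for concepts where it is not 1."""
    preferred_counts = Counter()
    concept_ids = set()
    for item in items:
        concept_ids.add(item["concept_id"])
        if item["preferred"]:
            preferred_counts[item["concept_id"]] += 1
    return {
        concept_id: preferred_counts[concept_id]
        for concept_id in sorted(concept_ids)
        if preferred_counts[concept_id] != 1
    }
```

A count of 0 means a concept has vocabulary but no preferred term yet; a count above 1 means two authors have each marked a different term as primary.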
### 7.3 Approval history
Saltcorn's built-in record history tracks who changed what and when.
No separate approval log table is needed at this stage.
### 7.4 Constraint enforcement
Modern units and modern-only categories are excluded by editorial
discipline at authoring time. A future model analysis pass will scan
the corpus for violations and flag them for review. No constraint
tables are maintained in this schema version.
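
Until that analysis pass exists, authors may want a rough aid at authoring time. The sketch below is not the deferred model analysis pass and is not part of this standard: it is a naive word match over a hypothetical starter list of modern unit words, and it will both miss violations and raise false positives (for example "minute" used as an adjective).

```python
import re

# Hypothetical starter list; this standard does not define one.
MODERN_UNIT_WORDS = [
    "metre", "meter", "kilometre", "kilometer", "km",
    "kilogram", "kg", "litre", "liter",
    "minute", "hour", "celsius",
]

# Whole-word match, case-insensitive, with an optional plural "s".
_UNIT_PATTERN = re.compile(
    r"\b(?:" + "|".join(MODERN_UNIT_WORDS) + r")s?\b",
    re.IGNORECASE,
)


def flag_modern_units(statements: list[str]) -> list[str]:
    """Return the statements that mention a word from the starter list."""
    return [text for text in statements if _UNIT_PATTERN.search(text)]
```

Flagged statements still go to a human editor; the list only narrows where to look.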
---
## 8. What this does not decide
- The language model architecture or training pipeline
- How corpus items are exported to training format
- Whether vocabulary items are used as literal tokens or as semantic
seeds for generation
- The multi-clan expansion beyond Maglemosian
- The integration between this corpus and the TESSERA spatial data layer
- Constraint enforcement implementation (deferred to model analysis pass)
---
*Mesolithic Corpus Standard v1.0 — 2026-04-13*
*Status: Normative*
*Next review: after Sprint 2 data entry is complete*