init: CIVICVS repository — CBPs, corpus standard, directory structure

- README.md: project identity, TESSERA relationship, directory layout
- CIVICVS-CBPs.md: CBP-001 through CBP-006 adapted for CIVICVS
- docs/corpus/mesolithic-corpus-standard-v1.md: 11-table schema, 6-sprint plan, 25 seed concepts

Per CBP-001: committed same session as produced.
145
CIVICVS-CBPs.md
Normal file
@@ -0,0 +1,145 @@
# CIVICVS — Critical Baseline Protocols
### Status: Normative
### Date: 2026-04-18
### Adapted from: TESSERA v3.0 CBPs (ssot/docs/v3/README.md)

---

These are not guidelines. They are not best practices. They are the
conditions under which this project operates. Deviation is not
permitted. If a CBP cannot be followed, work stops until it can be.

CIVICVS inherits the same process failure mode that cost TESSERA v2.0
months of work: documents produced in chat sessions, never committed,
permanently lost. These CBPs exist to prevent that failure from
recurring in CIVICVS.

---

## CBP-001 — Every document is committed before the session ends

Any document produced in a session — schema, governance doc,
architecture decision, corpus entry, session log — is committed to
this repository before the session ends. Not summarised. Not noted
for later. Committed.

A session that produces a document and does not commit it has produced
nothing. The chat log is not a repository. It is not durable. It is
not citable. It does not exist.

**Enforcement:** The last action of every session is to verify that
every document produced in that session exists in the repository.
If it does not, the session is not over.
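
Below is a minimal sketch of that end-of-session check in Python. It assumes the check runs from the repository root and that the session's outputs are listed by hand. `uncommitted_outputs` and `check_repo` are hypothetical helpers and are not part of any existing project tooling. A path passes only if `git ls-files` lists it and `git status --porcelain` reports no pending change for it. Paths that git quotes because they contain unusual characters are not handled.

```python
import subprocess
from typing import Iterable


def uncommitted_outputs(produced: Iterable[str], ls_files: str, porcelain: str) -> list[str]:
    """Return the produced paths that are not safely committed.

    A path counts as committed only if `git ls-files` lists it and
    `git status --porcelain` reports no pending change for it.
    """
    tracked = {line.strip() for line in ls_files.splitlines() if line.strip()}
    pending = set()
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # porcelain v1: two status columns, one space, then the path
        if " -> " in path:  # renames are reported as "old -> new"; keep the new name
            path = path.split(" -> ", 1)[1]
        pending.add(path)
    return sorted(p for p in produced if p not in tracked or p in pending)


def check_repo(produced: Iterable[str], repo: str = ".") -> list[str]:
    """Run both git commands in `repo` and apply the check above."""
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", *args], cwd=repo, capture_output=True, text=True, check=True
        ).stdout

    return uncommitted_outputs(produced, git("ls-files"), git("status", "--porcelain"))
```

This only proves the documents are committed locally. Whether the commit has also been pushed to the Gitea remote is a separate check.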

---

## CBP-002 — Every session produces a session log

A session log is a raw, detailed record of what happened. It includes:
- Every decision made, with the reasoning
- Every failure encountered, with the exact error
- Every workaround discovered, with the exact command
- Every assumption that proved wrong
- Every benchmark measured, with the actual number

Session logs are not polished. They are not summaries. They are the
unfiltered record. Future contributors — human or AI — must be able
to reconstruct exactly what was tried, what worked, and what did not,
without repeating the same experiments.

**Enforcement:** A session log is committed before any other end-of-session
work. It is the first commit, not the last.
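
Starting every log from a fixed skeleton makes the five required kinds of record hard to skip. The sketch below assumes one Markdown file per session under `repo/docs/sessions/`. The date-plus-topic filename and the `session_log_skeleton` helper are hypothetical conventions that the project has not adopted.

```python
from datetime import date

# The five kinds of record CBP-002 requires, with the detail each entry must carry.
SECTIONS = [
    ("Decisions", "decision made, with the reasoning"),
    ("Failures", "failure encountered, with the exact error"),
    ("Workarounds", "workaround discovered, with the exact command"),
    ("Assumptions that proved wrong", "assumption that proved wrong"),
    ("Benchmarks", "benchmark measured, with the actual number"),
]


def session_log_skeleton(day: date, topic: str) -> tuple[str, str]:
    """Return (relative path, Markdown body) for a new, empty session log."""
    slug = "-".join(topic.lower().split())
    path = f"repo/docs/sessions/{day.isoformat()}-{slug}.md"
    lines = [f"# Session log: {topic}", f"### Date: {day.isoformat()}", ""]
    for heading, hint in SECTIONS:
        lines += [f"## {heading}", "", f"<!-- one entry per {hint} -->", ""]
    return path, "\n".join(lines)
```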

---

## CBP-003 — Infrastructure is tested before it is designed around

No component — Saltcorn, ChromaDB, Ollama, any API — is designed
around an assumption about what it can do. The assumption is tested
first with a minimal real operation. Only then is the design built.

If a data source, service, or tool is investigated and found
unsuitable, that finding is documented. Failures are permanent
knowledge. Do not let the next session repeat the same investigation.

**Enforcement:** Before any new service is incorporated into the CIVICVS
design, its actual behaviour, actual data format, and actual
limitations are documented with the date of investigation.

---

## CBP-004 — The file transfer protocol is followed without exception

Every file that reaches a server node travels this exact path:

```
Claude produces tarball → user downloads → user uploads to /tmp/ on target node → commands run there
```

There is no other path. There is no heredoc injection. There is no
"write it directly." There is no assuming a file is present because
it was produced in a previous step.

If a command references a file that is not confirmed present in /tmp/
on the target node, the command is not run.

**Enforcement:** Every deploy sequence begins with confirming the
tarball is present at /tmp/ on the target node before any git or
extract command runs.
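
Below is a minimal sketch of that first step, assuming Python 3 is available on the target node. `confirm_tarball` is a hypothetical helper. It confirms that the tarball exists under `/tmp/` and that `tarfile` can read it, then returns its member names. Nothing is extracted.

```python
import tarfile
from pathlib import Path


def confirm_tarball(name: str, staging: str = "/tmp") -> list[str]:
    """Return the member names of staging/name, or raise if it is not usable.

    Nothing is extracted: this only proves the file is present and readable.
    """
    path = str(Path(staging) / name)
    if not Path(path).is_file():
        raise FileNotFoundError(f"{path} is not present; do not run the deploy commands")
    if not tarfile.is_tarfile(path):
        raise ValueError(f"{path} exists but is not a readable tar archive")
    with tarfile.open(path) as tar:  # default mode "r" detects gzip, bz2 and xz compression
        return tar.getnames()
```

For example, `confirm_tarball("civicvs-docs.tar.gz")` (a hypothetical filename) either returns the archive's contents or raises. The git or extract commands run only after a successful return.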

---

## CBP-005 — The Mesolithic Corpus Standard is the source of truth for corpus work

When the corpus schema says one thing and a corpus entry does another,
the entry is wrong until proven otherwise. If the schema is wrong,
it is corrected immediately and the correction is committed with an
inline note explaining what was wrong and when it was fixed.

The corpus standard is at `docs/corpus/mesolithic-corpus-standard-v1.md`.
It is normative. All corpus entries, Saltcorn tables, and views must
conform to it. Deviations are not tolerated silently — they are
either corrected or the standard is updated with explicit reasoning.

**Enforcement:** Before any corpus sprint begins, the corpus standard
and the current Saltcorn table schema are read together and confirmed
consistent.
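
Part of the read-together check can be done mechanically. The sketch below makes three assumptions. First, the standard's tables have been transcribed into a Python dict. Second, the live field lists have been exported from Saltcorn by hand into the same shape. Third, Saltcorn's type names have already been mapped to the standard's `integer` / `text` / `enum` / `boolean` vocabulary. No Saltcorn API is assumed. `schema_drift` is a hypothetical helper that lists missing tables, missing fields, undocumented fields, and type mismatches.

```python
# Field lists keyed by table name; each field maps to its type in the standard's vocabulary.
Schema = dict[str, dict[str, str]]


def schema_drift(standard: Schema, actual: Schema) -> list[str]:
    """List every disagreement between the corpus standard and the live schema."""
    problems = []
    for table in sorted(standard.keys() | actual.keys()):
        if table not in actual:
            problems.append(f"{table}: in standard, missing from Saltcorn")
            continue
        if table not in standard:
            problems.append(f"{table}: in Saltcorn, not in standard")
            continue
        want, have = standard[table], actual[table]
        for field in sorted(want.keys() | have.keys()):
            if field not in have:
                problems.append(f"{table}.{field}: missing from Saltcorn")
            elif field not in want:
                problems.append(f"{table}.{field}: not in standard")
            elif want[field] != have[field]:
                problems.append(
                    f"{table}.{field}: standard says {want[field]}, Saltcorn has {have[field]}"
                )
    return problems
```

An empty list is the "confirmed consistent" result. Anything else is either an entry to correct or a standard to update with explicit reasoning.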

---

## CBP-006 — The handover is written for the next assistant, not for posterity

A handover document is not a summary of what was accomplished. It is
an operational briefing for an assistant who has no prior context and
must be able to continue the work without asking clarifying questions
about project state.

A handover must state:
- What is currently running and its status
- What is pending and why
- What is broken and what the exact error is
- The first task for the next session, unambiguously

A handover that requires the recipient to make assumptions is
incomplete.

**Enforcement:** The handover is tested by asking: "What is the first
thing to do?" If the answer is uncertain, the handover is rewritten.
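
Only a person can judge whether the first task is unambiguous. A mechanical pre-check can still catch an item that is missing altogether. The sketch below assumes handovers are Markdown files with one `##` heading per required item. The heading names in `REQUIRED` and the `missing_handover_items` helper are hypothetical.

```python
import re

# Hypothetical heading names, one per item CBP-006 requires.
REQUIRED = {
    "Running": "what is currently running and its status",
    "Pending": "what is pending and why",
    "Broken": "what is broken and the exact error",
    "First task": "the first task for the next session",
}


def missing_handover_items(markdown: str) -> list[str]:
    """Return the required items whose `## ` section is absent or empty."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        m = re.match(r"^##\s+(.+?)\s*$", line)  # level-2 headings only; "###" does not match
        if m:
            current = m.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line.strip()
    return [desc for heading, desc in REQUIRED.items() if not sections.get(heading)]
```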

---

## What CIVICVS CBPs do not cover

These CBPs govern session continuity and commit discipline. They do
not cover:

- The corpus schema (see `docs/corpus/mesolithic-corpus-standard-v1.md`)
- Infrastructure decisions (see TESSERA `ssot/docs/v3/infrastructure.md`)
- The simulation RFC stack (to be created)
- The TESSERA data model (see TESSERA RFC stack)

---

*CIVICVS-CBPs.md — 2026-04-18*
*Status: Normative*
*The process is the project.*
52
README.md
Normal file
@@ -0,0 +1,52 @@
# CIVICVS

Mesolithic narrative simulator built on TESSERA spatial data.
Set in approximately 8000 BCE, in the Spree-Havel river valley (present-day Berlin).

---

## Relationship to TESSERA

CIVICVS is a separate project from TESSERA. They share infrastructure,
and the TESSERA SpatiaLite database is the spatial ground truth for
CIVICVS, but they have separate repositories, separate RFC stacks,
and separate failure modes. Do not confuse them.

TESSERA repository: `https://gitea.barternetwork.us/TheRON/tesserav3`
CIVICVS repository: `https://gitea.barternetwork.us/TheRON/civicvs`

---

## Read before doing anything

1. `CIVICVS-CBPs.md` — session continuity and commit discipline. Non-negotiable.
2. `docs/corpus/mesolithic-corpus-standard-v1.md` — corpus schema and workflow.
3. `repo/docs/sessions/` — most recent session log first.

---

## Repository layout

```
docs/
corpus/
mesolithic-corpus-standard-v1.md Corpus schema, sprint plan, seed concepts
decisions/ Architecture decision records
repo/
docs/
sessions/ Session logs — raw, committed same day
pipeline/
scripts/ Pipeline scripts
```

---

## Active branch: dev

Direct pushes to `dev` are allowed.
`main` and `staging` are protected — PR only.

---

*CIVICVS — founded 2026-04-18*
*The process is the project.*
466
docs/corpus/mesolithic-corpus-standard-v1.md
Normal file
@@ -0,0 +1,466 @@
# Mesolithic Corpus Standard
### Version: 1.0
### Status: Normative
### Date: 2026-04-13
### Author: Claude Sonnet 4.6, approved by project owner

---

## 1. Mission and scope

Build a defensible Mesolithic Thesaurus, Vocabulary, and Dictionary in
Saltcorn to support controlled corpus generation for a language model
grounded in prehistoric lifeways.

### 1.1 Core outputs

| Output | Purpose |
|---|---|
| Thesaurus | Meaning relationships — domains, concepts, scales, frames |
| Vocabulary | Approved lexical forms per concept |
| Dictionary | Human-readable entries combining concept + vocabulary |
| Ground truth corpus | Stable causal relations for model training |
| Simulation triage corpus | Decision and priority patterns for model training |

### 1.2 Constraints

- No modern units or modern-only categories in any generated language
- Meaning-first design — surface forms are secondary to semantic structure
- Culture-aware context — concepts tagged to applicable culture horizons
- UI-first workflow — table → view → page → data, without exception
- Constraint enforcement is editorial, not schema-enforced. A future
  model analysis pass will check the corpus for violations. No
  constraint tables in this schema.

### 1.3 Initial focus

Maglemosian / Nerava northern wetland context. All four culture horizons
are represented in the schema, but Maglemosian is populated first.

### 1.4 Out of scope

- Game systems and full simulation engines
- Speculative conlang reconstruction
- Broad ontology sprawl
- Academic citation management
- Constraint enforcement tables (deferred to model analysis)

---

## 2. Schema

Eleven tables. No table is added without a proven workflow need.

### 2.1 `domain`

The semantic domain hierarchy. Domains are self-referential — a domain
can have a parent domain.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| label | text | Human-readable name (e.g. "Weather", "Wetness") |
| parent_id | integer | References `domain.id` — null for top-level domains |

**Seed domains (in priority order):**
Weather, Wetness, Fire, Shelter, Water travel, Hunting, Fishing,
Injury, Storage, Terrain, Time cycles, Social roles.
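
A recursive query can walk the self-reference to show each domain's full path. The sketch below uses in-memory SQLite purely as an offline illustration of the schema. In CIVICVS the table is created in Saltcorn, and production records go through pages (Section 3). The child domain "Rain" is hypothetical. The query assumes the hierarchy contains no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE domain (
           id INTEGER PRIMARY KEY,
           label TEXT NOT NULL,
           parent_id INTEGER REFERENCES domain(id)  -- NULL for top-level domains
       )"""
)
# Illustrative rows: one seed domain and a hypothetical child domain.
conn.executemany(
    "INSERT INTO domain (id, label, parent_id) VALUES (?, ?, ?)",
    [(1, "Weather", None), (2, "Rain", 1)],
)

# Climb from the requested domain to its root, prefixing each parent's label.
# Assumes the hierarchy has no cycles; a cycle would make the recursion endless.
PATH_SQL = """
WITH RECURSIVE ancestry(path, parent_id) AS (
    SELECT label, parent_id FROM domain WHERE id = ?
    UNION ALL
    SELECT d.label || ' > ' || a.path, d.parent_id
    FROM ancestry AS a JOIN domain AS d ON d.id = a.parent_id
)
SELECT path FROM ancestry WHERE parent_id IS NULL
"""


def domain_path(domain_id: int) -> str:
    """Full path such as "Weather > Rain", or "" if the id does not exist."""
    row = conn.execute(PATH_SQL, (domain_id,)).fetchone()
    return row[0] if row else ""
```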

---

### 2.2 `culture`

The four target Mesolithic culture horizons. Lookup table — values are
fixed and do not grow without an explicit decision.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| label | text | Culture name |
| ecology_note | text | Brief ecological context |
| date_range_note | text | Approximate date range |

**Fixed values:**

| Label | Ecology | Date range |
|---|---|---|
| Maglemosian | Northern lake/peatland, open woodland | ~9500–6000 BCE |
| Ertebølle | Coastal, lagoonal, shell midden | ~5400–3900 BCE |
| Sauveterrian | Western Mediterranean upland/lowland | ~9000–6000 BCE |
| Azilian | Franco-Cantabrian cave/rock-shelter | ~12000–9000 BCE |

---

### 2.3 `concept`

The core meaning nodes of the thesaurus. Each concept belongs to a
domain and carries an evidence grade.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| domain_id | integer | References `domain.id` |
| label | text | Concept identifier (e.g. "wet", "ember", "crossing") |
| definition | text | Plain-language definition, measurement-free |
| evidence_grade | enum | `direct` / `analogue` / `inferred` |
| notes | text | Optional authoring notes |

**Evidence grade values:**
- `direct` — concept is directly supported by the archaeological record
- `analogue` — concept is supported by an ethnographic analogue
- `inferred` — concept follows from physical or ecological inference

Culture applicability is stored in `concept_culture`, not here.

---

### 2.4 `concept_culture`

Join table linking concepts to applicable culture horizons. A concept
with no rows here applies to all cultures.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| culture_id | integer | References `culture.id` |
| context_note | text | Optional note on culture-specific usage |
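
The "no rows means all cultures" rule is easy to lose in a query, because a plain join silently drops untagged concepts. The sketch below shows the lookup, again in in-memory SQLite as an illustration only. The CHECK constraint stands in for Saltcorn's enum handling, which is an assumption. The concept "midden" is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE culture (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE concept (
        id INTEGER PRIMARY KEY,
        domain_id INTEGER,
        label TEXT NOT NULL,
        definition TEXT,
        evidence_grade TEXT CHECK (evidence_grade IN ('direct', 'analogue', 'inferred')),
        notes TEXT
    );
    CREATE TABLE concept_culture (
        id INTEGER PRIMARY KEY,
        concept_id INTEGER NOT NULL REFERENCES concept(id),
        culture_id INTEGER NOT NULL REFERENCES culture(id),
        context_note TEXT
    );
    INSERT INTO culture (id, label) VALUES (1, 'Maglemosian'), (2, 'Ertebølle');
    -- Illustrative rows: 'wet' tagged Maglemosian, 'fire' untagged, 'midden' tagged Ertebølle.
    INSERT INTO concept (id, label, evidence_grade) VALUES
        (1, 'wet', 'direct'), (2, 'fire', 'direct'), (3, 'midden', 'direct');
    INSERT INTO concept_culture (concept_id, culture_id) VALUES (1, 1), (3, 2);
    """
)

# A concept applies to a culture if it is tagged with it, or if it has no tags at all.
APPLICABLE_SQL = """
SELECT c.label FROM concept AS c
WHERE EXISTS (SELECT 1 FROM concept_culture AS cc
              WHERE cc.concept_id = c.id AND cc.culture_id = :culture)
   OR NOT EXISTS (SELECT 1 FROM concept_culture AS cc WHERE cc.concept_id = c.id)
ORDER BY c.label
"""


def concepts_for(culture_id: int) -> list[str]:
    return [row[0] for row in conn.execute(APPLICABLE_SQL, {"culture": culture_id})]
```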

---

### 2.5 `scale`

A gradient dimension associated with a concept. A concept may have
multiple scales (e.g. "wetness" has a dryness scale and a weight scale).

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| label | text | Scale name (e.g. "dryness", "ice safety") |

---

### 2.6 `scale_step`

Ordered steps within a scale. Steps are ordered by rank and may
reference an antonym step.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| scale_id | integer | References `scale.id` |
| rank | integer | Position within the scale; rank 1 is one end of the spectrum (e.g. "dry" on the wetness scale) |
| label | text | Step label (e.g. "dry", "damp", "soaked") |
| antonym_step_id | integer | References another `scale_step.id` — optional |
| is_danger_threshold | boolean | Marks steps that represent hazard onset |
| notes | text | Optional authoring notes |

**Example — wetness scale:**

| Rank | Label | Danger threshold |
|---|---|---|
| 1 | dry | No |
| 2 | damp | No |
| 3 | wet | No |
| 4 | soaked | Yes |
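
Because `rank` orders the steps, a single query can find the point where a scale turns hazardous. The sketch below loads the wetness example above into in-memory SQLite as an illustration only. `danger_onset` is a hypothetical helper that returns the lowest-ranked step flagged `is_danger_threshold`. The UNIQUE constraint on `(scale_id, rank)` is an assumption and is not part of the standard.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scale (id INTEGER PRIMARY KEY, concept_id INTEGER, label TEXT NOT NULL);
    CREATE TABLE scale_step (
        id INTEGER PRIMARY KEY,
        scale_id INTEGER NOT NULL REFERENCES scale(id),
        rank INTEGER NOT NULL,
        label TEXT NOT NULL,
        antonym_step_id INTEGER REFERENCES scale_step(id),
        is_danger_threshold INTEGER NOT NULL DEFAULT 0,  -- SQLite stores booleans as 0/1
        notes TEXT,
        UNIQUE (scale_id, rank)  -- assumption: one step per rank
    );
    INSERT INTO scale (id, label) VALUES (1, 'wetness');
    INSERT INTO scale_step (scale_id, rank, label, is_danger_threshold) VALUES
        (1, 1, 'dry', 0), (1, 2, 'damp', 0), (1, 3, 'wet', 0), (1, 4, 'soaked', 1);
    """
)


def danger_onset(scale_label: str) -> Optional[str]:
    """Label of the lowest-ranked step flagged as a danger threshold, if any."""
    row = conn.execute(
        """SELECT st.label
           FROM scale_step AS st JOIN scale AS s ON s.id = st.scale_id
           WHERE s.label = ? AND st.is_danger_threshold = 1
           ORDER BY st.rank
           LIMIT 1""",
        (scale_label,),
    ).fetchone()
    return row[0] if row else None
```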

---

### 2.7 `frame`

An action frame associated with a concept. Stores the typical roles
(actor, patient, tool, place) for actions involving this concept.
One frame per concept is the norm; complex concepts may have more.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| label | text | Frame name (e.g. "drying hides", "crossing river") |
| actor | text | Who performs the action |
| patient | text | What is acted upon |
| tool | text | What instrument is used |
| place | text | Where the action occurs |
| notes | text | Optional authoring notes |

---

### 2.8 `vocabulary_item`

Approved lexical forms for a concept. A concept may have multiple
vocabulary items — one preferred, others allowed alternates.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| term | text | The surface form (e.g. "wet", "soaked", "waterlogged") |
| preferred | boolean | True for the primary term |
| register | text | Usage register (e.g. "narrative", "triage", "both") |
| status | enum | `approved` / `deprecated` / `restricted` |
| notes | text | Optional governance notes |

**Status values:**
- `approved` — use freely
- `deprecated` — do not use in new corpus items; kept for historical record
- `restricted` — use only in specified contexts (noted in `notes`)
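
The "one preferred term per concept" control (Section 7.2) is editorial rather than schema-enforced. A periodic check query is one way to catch drift before Sprint 5 is signed off. The sketch below runs in in-memory SQLite for illustration. The term "drenched" is example data, and the second preferred term for "soaked" is a deliberate error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE vocabulary_item (
        id INTEGER PRIMARY KEY,
        concept_id INTEGER NOT NULL REFERENCES concept(id),
        term TEXT NOT NULL,
        preferred INTEGER NOT NULL DEFAULT 0,
        register TEXT,
        status TEXT CHECK (status IN ('approved', 'deprecated', 'restricted')),
        notes TEXT
    );
    INSERT INTO concept (id, label) VALUES (1, 'wet'), (2, 'soaked'), (3, 'ember');
    INSERT INTO vocabulary_item (concept_id, term, preferred, status) VALUES
        (1, 'wet', 1, 'approved'),
        (1, 'waterlogged', 0, 'approved'),
        (2, 'soaked', 1, 'approved'),
        (2, 'drenched', 1, 'approved');  -- deliberate error: two preferred terms
    """
)

# Every concept should have exactly one preferred term; LEFT JOIN keeps concepts with none.
PREFERRED_CHECK_SQL = """
SELECT c.label, COUNT(v.id) AS n_preferred
FROM concept AS c
LEFT JOIN vocabulary_item AS v ON v.concept_id = c.id AND v.preferred = 1
GROUP BY c.id
HAVING COUNT(v.id) <> 1
ORDER BY c.label
"""

problems = conn.execute(PREFERRED_CHECK_SQL).fetchall()
```

Here `problems` lists "ember" (no terms yet) and "soaked" (two preferred terms). "wet" is correct and does not appear.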

---

### 2.9 `corpus_item`

A single ground truth or triage corpus item. Ground truth items teach
stable causal relations. Triage items teach decisions and priorities.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_type | enum | `ground_truth` / `triage` |
| culture_id | integer | References `culture.id` — null means all cultures |
| text | text | The corpus statement (ground truth) or scenario (triage) |
| confidence | enum | `high` / `medium` / `low` |
| approved | boolean | True when reviewed and approved for training use |
| notes | text | Optional authoring notes |

**Ground truth example:**
```
corpus_type: ground_truth
text: "Fire dries wet hides."
confidence: high
approved: true
```

**Triage example:**
```
corpus_type: triage
text: "Hunter returns with deep leg wound and cannot walk unassisted."
confidence: high
approved: true
```

Triage options are stored in `triage_option`.

---

### 2.10 `corpus_concept`

Join table linking corpus items to the concepts they involve. Enables
completeness checks and concept-driven corpus browsing.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_item_id | integer | References `corpus_item.id` |
| concept_id | integer | References `concept.id` |
| role_note | text | Optional note on how concept appears in this item |
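
Section 6.1 requires every ground truth item to link to at least one concept through this table, so unlinked items make a natural completeness check. The sketch below runs in in-memory SQLite as an illustration only. It uses two ground truth statements from Section 6.1, one of them deliberately left unlinked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE corpus_item (
        id INTEGER PRIMARY KEY,
        corpus_type TEXT NOT NULL CHECK (corpus_type IN ('ground_truth', 'triage')),
        culture_id INTEGER,  -- NULL means all cultures
        text TEXT NOT NULL,
        confidence TEXT CHECK (confidence IN ('high', 'medium', 'low')),
        approved INTEGER NOT NULL DEFAULT 0,
        notes TEXT
    );
    CREATE TABLE corpus_concept (
        id INTEGER PRIMARY KEY,
        corpus_item_id INTEGER NOT NULL REFERENCES corpus_item(id),
        concept_id INTEGER NOT NULL REFERENCES concept(id),
        role_note TEXT
    );
    INSERT INTO concept (id, label) VALUES (1, 'fire'), (2, 'wet'), (3, 'hide'), (4, 'smoke');
    INSERT INTO corpus_item (id, corpus_type, text, confidence) VALUES
        (1, 'ground_truth', 'Fire dries wet hides.', 'high'),
        (2, 'ground_truth', 'Smoke drives insects away.', 'high');
    -- Item 1 is linked to its concepts; item 2 has not been linked yet.
    INSERT INTO corpus_concept (corpus_item_id, concept_id) VALUES (1, 1), (1, 2), (1, 3);
    """
)

# Items of the given type that no corpus_concept row points at.
UNLINKED_SQL = """
SELECT i.id, i.text FROM corpus_item AS i
WHERE i.corpus_type = ?
  AND NOT EXISTS (SELECT 1 FROM corpus_concept AS cc WHERE cc.corpus_item_id = i.id)
ORDER BY i.id
"""

unlinked = conn.execute(UNLINKED_SQL, ("ground_truth",)).fetchall()
```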

---

### 2.11 `triage_option`

Structured options for triage corpus items. Each triage item has 2–4
options, with exactly one marked as preferred.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_item_id | integer | References `corpus_item.id` |
| option_text | text | Description of this option |
| is_preferred | boolean | True for the recommended action |
| reason | text | Why this option is preferred or not |
| rank | integer | Display order |

**Example — triage options for the wounded hunter scenario:**

| Option | Preferred | Reason |
|---|---|---|
| Carry hunter back immediately | Yes | Wound is deep, cannot walk, delay increases risk |
| Continue hunt, send one person back | No | Splits group, leaves hunter without full support |
| Make camp here and rest | No | Wound needs shelter and fire, not open ground |
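
A grouped query can check each item against the 2–4 options, exactly-one-preferred rule. The sketch below runs in in-memory SQLite as an illustration only. It uses the wounded hunter example above plus a deliberately incomplete item. The `LEFT JOIN` keeps triage items that have no options at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE corpus_item (id INTEGER PRIMARY KEY, corpus_type TEXT NOT NULL, text TEXT NOT NULL);
    CREATE TABLE triage_option (
        id INTEGER PRIMARY KEY,
        corpus_item_id INTEGER NOT NULL REFERENCES corpus_item(id),
        option_text TEXT NOT NULL,
        is_preferred INTEGER NOT NULL DEFAULT 0,
        reason TEXT,
        rank INTEGER
    );
    INSERT INTO corpus_item (id, corpus_type, text) VALUES
        (1, 'triage', 'Hunter returns with deep leg wound and cannot walk unassisted.'),
        (2, 'triage', 'Path floods at crossing.');
    INSERT INTO triage_option (corpus_item_id, option_text, is_preferred, rank) VALUES
        (1, 'Carry hunter back immediately', 1, 1),
        (1, 'Continue hunt, send one person back', 0, 2),
        (1, 'Make camp here and rest', 0, 3),
        (2, 'Wait', 0, 1);  -- incomplete: one option, none preferred
    """
)

# Each triage item needs 2-4 options with exactly one preferred.
TRIAGE_CHECK_SQL = """
SELECT i.id,
       COUNT(o.id) AS n_options,
       COALESCE(SUM(o.is_preferred), 0) AS n_preferred
FROM corpus_item AS i
LEFT JOIN triage_option AS o ON o.corpus_item_id = i.id
WHERE i.corpus_type = 'triage'
GROUP BY i.id
HAVING COUNT(o.id) NOT BETWEEN 2 AND 4 OR COALESCE(SUM(o.is_preferred), 0) <> 1
ORDER BY i.id
"""

invalid = conn.execute(TRIAGE_CHECK_SQL).fetchall()
```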

---

## 3. Workflow rule

Every table follows this delivery sequence without exception:

```
1. Table — created in Saltcorn
2. View — at minimum a list view and a detail view
3. Page — at minimum one usable entry/edit page
4. Data — production records entered only via pages, never raw grids
```

**Rules:**
- No production records entered in raw table grids
- Every new table ships with at least one usable page before data entry begins
- Build vertically, not horizontally — one complete table/view/page/data
  cycle before starting the next table
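
If each session log records when a table reaches each stage, the sequence and the vertical-build rule can be checked mechanically. The sketch below assumes a hypothetical chronological log of `(table, stage)` pairs. The project does not currently keep such a log, and `workflow_violations` is illustrative only. A table counts as complete once it reaches the data stage.

```python
STAGES = ["table", "view", "page", "data"]  # Section 3 order


def workflow_violations(log: list[tuple[str, str]]) -> list[str]:
    """Check a chronological log of (table, stage) events against Section 3.

    Flags a stage reached before the stage that should precede it, and any
    work on one table while another table has not yet reached the data stage.
    """
    reached: dict[str, int] = {}  # table -> furthest stage index reached so far
    problems = []
    for table, stage in log:
        idx = STAGES.index(stage)  # raises ValueError for an unknown stage name
        last = reached.get(table, -1)
        if idx > last + 1:
            problems.append(f"{table}: {stage} before {STAGES[last + 1]}")
        unfinished = sorted(
            other for other, r in reached.items() if other != table and r < len(STAGES) - 1
        )
        if unfinished:
            problems.append(f"{table}: worked on while {', '.join(unfinished)} incomplete")
        reached[table] = max(last, idx)
    return problems
```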

---

## 4. Sprint plan

Sprints are ordered by dependency. Do not start a sprint until the
previous sprint's data entry phase is complete and verified.

### Sprint 1 — Foundation
- Tables: `domain`, `culture`
- Data: 12 seed domains, 4 culture records
- Deliverable: domain browser page, culture lookup page

### Sprint 2 — Core concepts
- Tables: `concept`, `concept_culture`
- Data: 25 seed concepts from DOC-006, tagged to Maglemosian
- Deliverable: concept editor page with domain and culture assignment

### Sprint 3 — Scales
- Tables: `scale`, `scale_step`
- Data: scales for wetness, fire state, ice safety, injury severity
- Deliverable: scale builder page with ordered steps

### Sprint 4 — Frames
- Table: `frame`
- Data: frames for key action concepts (drying, crossing, fishing, triage)
- Deliverable: frame editor page

### Sprint 5 — Vocabulary
- Table: `vocabulary_item`
- Data: preferred terms for all 25 seed concepts
- Deliverable: vocabulary editor with preferred/alternate flag and approved/deprecated/restricted status

### Sprint 6 — Corpus
- Tables: `corpus_item`, `corpus_concept`, `triage_option`
- Data: first 20 ground truth items, first 10 triage items
- Deliverable: corpus entry page, triage option builder, concept linkage

---

## 5. Seed concepts — Sprint 2 data

From DOC-006. All tagged Maglemosian initially.

| Concept | Domain | Evidence grade |
|---|---|---|
| wet | Wetness | direct |
| dry | Wetness | direct |
| damp | Wetness | direct |
| soaked | Wetness | inferred |
| fire | Fire | direct |
| ember | Fire | direct |
| smoke | Fire | direct |
| shelter | Shelter | direct |
| hide | Shelter | direct |
| bark | Shelter | direct |
| marsh | Terrain | direct |
| reed | Terrain | direct |
| path | Terrain | inferred |
| river | Water travel | direct |
| crossing | Water travel | inferred |
| fish | Fishing | direct |
| trap | Fishing | direct |
| spear | Hunting | direct |
| wound | Injury | direct |
| limp | Injury | inferred |
| carry | Injury | inferred |
| dawn | Time cycles | inferred |
| dusk | Time cycles | inferred |
| elder | Social roles | analogue |
| child | Social roles | analogue |

---

## 6. Corpus specification

### 6.1 Ground truth corpus

Teaches stable causal relations. Statements must be:
- Present tense, declarative
- Measurement-free
- Culturally plausible for the tagged culture
- Linked to at least one concept via `corpus_concept`

**Field summary:**
- `text` — the causal statement
- `culture_id` — null for universal statements
- `confidence` — high/medium/low
- `approved` — reviewed and ready for training

**Examples:**
- Fire dries wet hides.
- Rain softens paths.
- Smoke drives insects away.
- Wet wood makes reluctant fire.
- Soaked bark floor cannot be slept on dry.
- Rising water warns of flood.

### 6.2 Simulation triage corpus

Teaches decisions and priorities under constraint. Each item must have
2-4 structured options via `triage_option`, exactly one marked preferred.

**Field summary:**
- `text` — the scenario description
- `culture_id` — null for universal scenarios
- `confidence` — high/medium/low
- `approved` — reviewed and ready for training

**Triage option fields:**
- `option_text` — what this choice involves
- `is_preferred` — the recommended action
- `reason` — why preferred or not preferred
- `rank` — display order

**Examples:**
- Wounded hunter cannot walk. (carry first vs continue hunt vs make camp)
- Fire goes out in heavy rain. (seek dry tinder vs use ember from shelter vs wait)
- Path floods at crossing. (find higher crossing vs wait vs wade)

---

## 7. Lexical governance

### 7.1 Purpose

Prevent semantic drift. Ensure vocabulary items remain measurement-free
and culturally coherent across authors and sessions.

### 7.2 Controls per vocabulary item

| Control | Field | Notes |
|---|---|---|
| Preferred term | `preferred = true` | One per concept |
| Allowed alternates | `status = approved, preferred = false` | Multiple allowed |
| Deprecated terms | `status = deprecated` | Kept for record, not used in new corpus |
| Restricted terms | `status = restricted` | Context specified in `notes` |
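
When corpus generation needs a term for a concept, the controls above reduce to a filter. The sketch below assumes that register values are exactly those in the Section 2.8 examples and that "both" matches either register. `VocabularyItem` and `usable_terms` are hypothetical names. Restricted terms are returned only when the caller has already checked the context recorded in `notes`, which code cannot judge.

```python
from dataclasses import dataclass


@dataclass
class VocabularyItem:
    term: str
    preferred: bool
    register: str  # assumed values: "narrative", "triage" or "both"
    status: str  # "approved", "deprecated" or "restricted"
    notes: str = ""


def usable_terms(items: list[VocabularyItem], register: str, allow_restricted: bool = False) -> list[str]:
    """Terms a generator may use for one concept, preferred term first.

    Deprecated terms are never returned. Restricted terms are returned only when
    the caller has already decided, from their `notes`, that the context allows them.
    """
    ok = [
        v for v in items
        if v.register in (register, "both")
        and (v.status == "approved" or (v.status == "restricted" and allow_restricted))
    ]
    ok.sort(key=lambda v: not v.preferred)  # stable sort: preferred first, others keep order
    return [v.term for v in ok]
```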

### 7.3 Approval history

Saltcorn's built-in record history tracks who changed what and when.
No separate approval log table is needed at this stage.

### 7.4 Constraint enforcement

Modern units and modern-only categories are excluded by editorial
discipline at authoring time. A future model analysis pass will scan
the corpus for violations and flag them for review. No constraint
tables are maintained in this schema version.
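
The future analysis pass is not specified. A crude pattern filter shows the kind of pre-screen it might start from. This is a sketch only. The pattern list is illustrative and incomplete, and matches are flagged for human review rather than rejected. Ordinary words such as "foot" or "pound" are deliberately left out, because they have non-measurement senses.

```python
import re

# Illustrative and deliberately incomplete: extend as the review pass finds new cases.
MODERN_UNIT_PATTERNS = [
    r"\d+\s*(?:k?m|cm|mm|kg|g|l|ml)\b",  # a number followed by a metric unit symbol
    r"\b(?:metres?|meters?|kilometres?|kilometers?|kilograms?|grams?|litres?|liters?)\b",
    r"\b(?:degrees?|celsius|fahrenheit)\b|°",
    r"\b(?:seconds?|minutes?|hours?|o'clock)\b",
    r"\b(?:miles?|inches?)\b",
]
_MODERN = re.compile("|".join(f"(?:{p})" for p in MODERN_UNIT_PATTERNS), re.IGNORECASE)


def flag_modern_units(text: str) -> list[str]:
    """Return each suspicious fragment so a reviewer can judge it in context."""
    return [m.group(0) for m in _MODERN.finditer(text)]
```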

---

## 8. What this does not decide

- The language model architecture or training pipeline
- How corpus items are exported to training format
- Whether vocabulary items are used as literal tokens or as semantic
  seeds for generation
- The multi-clan expansion beyond Maglemosian
- The integration between this corpus and the TESSERA spatial data layer
- Constraint enforcement implementation (deferred to model analysis pass)

---

*Mesolithic Corpus Standard v1.0 — 2026-04-13*
*Status: Normative*
*Next review: after Sprint 2 data entry is complete*