Mesolithic Corpus Standard
Version: 1.0
Status: Normative
Date: 2026-04-13
Author: Claude Sonnet 4.6, approved by project owner
1. Mission and scope
Build a defensible Mesolithic Thesaurus, Vocabulary, and Dictionary in Saltcorn to support controlled corpus generation for a language model grounded in prehistoric lifeways.
1.1 Core outputs
| Output | Purpose |
|---|---|
| Thesaurus | Meaning relationships — domains, concepts, scales, frames |
| Vocabulary | Approved lexical forms per concept |
| Dictionary | Human-readable entries combining concept + vocabulary |
| Ground truth corpus | Stable causal relations for model training |
| Simulation triage corpus | Decision and priority patterns for model training |
1.2 Constraints
- No modern units or modern-only categories in any generated language
- Meaning-first design — surface forms are secondary to semantic structure
- Culture-aware context — concepts tagged to applicable culture horizons
- UI-first workflow — table → view → page → data, without exception
- Constraint enforcement is editorial, not schema-enforced. A future model analysis pass will check the corpus for violations. No constraint tables in this schema.
1.3 Initial focus
Maglemosian / Nerava northern wetland context. All four culture horizons are represented in the schema, but Maglemosian is populated first.
1.4 Out of scope
- Game systems and full simulation engines
- Speculative conlang reconstruction
- Broad ontology sprawl
- Academic citation management
- Constraint enforcement tables (deferred to model analysis)
2. Schema
Eleven tables. No table is added without a proven workflow need.
2.1 domain
The semantic domain hierarchy. Domains are self-referential — a domain can have a parent domain.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| label | text | Human-readable name (e.g. "Weather", "Wetness") |
| parent_id | integer | References domain.id — null for top-level domains |
Seed domains (in priority order): Weather, Wetness, Fire, Shelter, Water travel, Hunting, Fishing, Injury, Storage, Terrain, Time cycles, Social roles.
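Because parent_id points back into the same table, the schema on its own does not stop a cycle, where a domain ends up as its own ancestor. The sketch below resolves a domain's full path from its top-level ancestor and raises an error if it revisits an id. It assumes domain rows have been loaded into a plain Python dict keyed by id; the function name is hypothetical, and so is the "Rain" sub-domain in the usage line, since the standard defines no sub-domains.

```python
from typing import Optional

# One domain row reduced to (label, parent_id); parent_id is None for top-level domains.
Domain = tuple[str, Optional[int]]


def domain_path(domains: dict[int, Domain], domain_id: int) -> list[str]:
    """Return labels from the top-level domain down to domain_id.

    Raises ValueError if following parent_id revisits an id (a cycle).
    """
    path: list[str] = []
    seen: set[int] = set()
    current: Optional[int] = domain_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in domain hierarchy at id {current}")
        seen.add(current)
        label, parent_id = domains[current]
        path.append(label)
        current = parent_id
    path.reverse()  # collected child-first; return root-first
    return path
```

For example, domain_path({1: ("Weather", None), 2: ("Rain", 1)}, 2) returns ["Weather", "Rain"].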
2.2 culture
The four target Mesolithic culture horizons. Lookup table — values are fixed and do not grow without explicit decision.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| label | text | Culture name |
| ecology_note | text | Brief ecological context |
| date_range_note | text | Approximate date range |
Fixed values:
| Label | Ecology | Date range |
|---|---|---|
| Maglemosian | Northern lake/peatland, open woodland | ~9500–6000 BCE |
| Ertebølle | Coastal, lagoonal, shell midden | ~5400–3900 BCE |
| Sauveterrian | Western Mediterranean upland/lowland | ~9000–6000 BCE |
| Azilian | Franco-Cantabrian cave/rock-shelter | ~12000–9000 BCE |
2.3 concept
The core meaning nodes of the thesaurus. Each concept belongs to a domain and carries an evidence grade.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| domain_id | integer | References domain.id |
| label | text | Concept identifier (e.g. "wet", "ember", "crossing") |
| definition | text | Plain language definition, measurement-free |
| evidence_grade | enum | direct / analogue / inferred |
| notes | text | Optional authoring notes |
Evidence grade values:
- direct: concept is directly supported by the archaeological record
- analogue: concept is supported by ethnographic analogue
- inferred: concept follows from physical or ecological inference
Culture applicability is stored in concept_culture, not here.
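When concept rows are read outside Saltcorn, for example by a hypothetical export or analysis script, the closed set of evidence grades maps naturally onto an enum, so an unexpected stored value fails loudly instead of passing through as free text. A minimal sketch; the class and function names are illustrative and not part of the standard:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EvidenceGrade(Enum):
    """Allowed values of concept.evidence_grade."""
    DIRECT = "direct"      # directly supported by the archaeological record
    ANALOGUE = "analogue"  # supported by ethnographic analogue
    INFERRED = "inferred"  # follows from physical or ecological inference


@dataclass(frozen=True)
class Concept:
    """One concept row; culture applicability lives in concept_culture, not here."""
    id: int
    domain_id: int
    label: str
    definition: str
    evidence_grade: EvidenceGrade
    notes: Optional[str] = None


def parse_grade(raw: str) -> EvidenceGrade:
    """Convert a stored grade string to EvidenceGrade; raises ValueError if unknown."""
    return EvidenceGrade(raw.strip().lower())
```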
2.4 concept_culture
Join table linking concepts to applicable culture horizons. A concept with no rows here applies to all cultures.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References concept.id |
| culture_id | integer | References culture.id |
| context_note | text | Optional note on culture-specific usage |
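Because "no rows" means "all cultures", a concept nobody has tagged yet reads exactly like a deliberately universal one. A minimal sketch of the resolution rule, assuming concept_culture rows are available as (concept_id, culture_id) pairs; the function name is hypothetical:

```python
def applicable_cultures(
    concept_id: int,
    links: list[tuple[int, int]],
    all_culture_ids: set[int],
) -> set[int]:
    """Return the culture ids a concept applies to.

    links holds (concept_id, culture_id) pairs from concept_culture.
    A concept with no rows applies to every culture.
    """
    tagged = {culture_id for cid, culture_id in links if cid == concept_id}
    return tagged if tagged else set(all_culture_ids)
```

The same ambiguity is worth keeping in mind during Sprint 2, when every seed concept is meant to carry an explicit Maglemosian tag.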
2.5 scale
A gradient dimension associated with a concept. A concept may have multiple scales (e.g. "wetness" has a dryness scale and a weight scale).
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References concept.id |
| label | text | Scale name (e.g. "dryness", "ice safety") |
2.6 scale_step
Ordered steps within a scale. Steps are ordered by rank and may reference an antonym step.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| scale_id | integer | References scale.id |
| rank | integer | Ordering — lower = one end of spectrum |
| label | text | Step label (e.g. "dry", "damp", "soaked") |
| antonym_step_id | integer | References another scale_step.id — optional |
| is_danger_threshold | boolean | Marks steps that represent hazard onset |
| notes | text | Optional authoring notes |
Example — wetness scale:
| Rank | Label | Danger threshold |
|---|---|---|
| 1 | dry | No |
| 2 | damp | No |
| 3 | wet | No |
| 4 | soaked | Yes |
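A sketch of how scale steps might be consumed downstream, assuming rows are loaded into a hypothetical ScaleStep dataclass. The standard does not say ranks must be unique within a scale; the duplicate-rank check is an assumption, based on rank being the only ordering field. first_danger_step returns the first flagged step in rank order, which for the wetness example is "soaked".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScaleStep:
    """One scale_step row (notes omitted)."""
    id: int
    scale_id: int
    rank: int
    label: str
    is_danger_threshold: bool = False
    antonym_step_id: Optional[int] = None


def ordered_steps(steps: list[ScaleStep], scale_id: int) -> list[ScaleStep]:
    """Return one scale's steps sorted by rank; reject duplicate ranks (assumed rule)."""
    chosen = sorted((s for s in steps if s.scale_id == scale_id), key=lambda s: s.rank)
    ranks = [s.rank for s in chosen]
    if len(ranks) != len(set(ranks)):
        raise ValueError(f"duplicate rank in scale {scale_id}")
    return chosen


def first_danger_step(steps: list[ScaleStep], scale_id: int) -> Optional[ScaleStep]:
    """Return the first step flagged as hazard onset in rank order, or None."""
    for step in ordered_steps(steps, scale_id):
        if step.is_danger_threshold:
            return step
    return None
```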
2.7 frame
An action frame associated with a concept. Stores the typical roles (actor, patient, tool, place) for actions involving this concept. One frame per concept is the norm; complex concepts may have more.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References concept.id |
| label | text | Frame name (e.g. "drying hides", "crossing river") |
| actor | text | Who performs the action |
| patient | text | What is acted upon |
| tool | text | What instrument is used |
| place | text | Where the action occurs |
| notes | text | Optional authoring notes |
2.8 vocabulary_item
Approved lexical forms for a concept. A concept may have multiple vocabulary items: one preferred term, with the rest as alternates whose use is governed by status.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References concept.id |
| term | text | The surface form (e.g. "wet", "soaked", "waterlogged") |
| preferred | boolean | True for the primary term |
| register | text | Usage register (e.g. "narrative", "triage", "both") |
| status | enum | approved / deprecated / restricted |
| notes | text | Optional governance notes |
Status values:
- approved: use freely
- deprecated: do not use in new corpus items; kept for the historical record
- restricted: use only in specified contexts (recorded in notes)
2.9 corpus_item
A single ground truth or triage corpus item. Ground truth items teach stable causal relations. Triage items teach decisions and priorities.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_type | enum | ground_truth / triage |
| culture_id | integer | References culture.id — null means all cultures |
| text | text | The corpus statement (ground truth) or scenario (triage) |
| confidence | enum | high / medium / low |
| approved | boolean | True when reviewed and approved for training use |
| notes | text | Optional authoring notes |
Ground truth example:
corpus_type: ground_truth
text: "Fire dries wet hides."
confidence: high
approved: true
Triage example:
corpus_type: triage
text: "Hunter returns with deep leg wound and cannot walk unassisted."
confidence: high
approved: true
Triage options are stored in triage_option.
2.10 corpus_concept
Join table linking corpus items to the concepts they involve. Enables completeness checks and concept-driven corpus browsing.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_item_id | integer | References corpus_item.id |
| concept_id | integer | References concept.id |
| role_note | text | Optional note on how concept appears in this item |
2.11 triage_option
Structured options for triage corpus items. Each triage item has 2-4 options, exactly one marked as preferred.
| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_item_id | integer | References corpus_item.id |
| option_text | text | Description of this option |
| is_preferred | boolean | True for the recommended action |
| reason | text | Why this option is preferred or not |
| rank | integer | Display order |
Example — triage options for wounded hunter scenario:
| Option | Preferred | Reason |
|---|---|---|
| Carry hunter back immediately | Yes | Wound is deep, cannot walk, delay increases risk |
| Continue hunt, send one person back | No | Splits group, leaves hunter without full support |
| Make camp here and rest | No | Wound needs shelter and fire, not open ground |
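The "2-4 options, exactly one preferred" rule can be checked mechanically before a triage item is approved. A minimal sketch, assuming options are loaded into a hypothetical TriageOption dataclass; the unique-rank check is an added assumption, since rank is only described as display order.

```python
from dataclasses import dataclass


@dataclass
class TriageOption:
    """One triage_option row (id and corpus_item_id omitted)."""
    option_text: str
    is_preferred: bool
    rank: int
    reason: str = ""


def triage_problems(options: list[TriageOption]) -> list[str]:
    """Return rule violations for one triage item's options; empty means valid."""
    problems: list[str] = []
    if not 2 <= len(options) <= 4:
        problems.append(f"expected 2-4 options, found {len(options)}")
    preferred = sum(1 for o in options if o.is_preferred)
    if preferred != 1:
        problems.append(f"expected exactly one preferred option, found {preferred}")
    ranks = [o.rank for o in options]
    if len(ranks) != len(set(ranks)):
        problems.append("duplicate display rank")  # assumed rule, not in the standard
    return problems
```

The three wounded-hunter options in the table above, ranked 1 to 3 with only the first preferred, produce no problems.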
3. Workflow rule
Every table follows this delivery sequence without exception:
1. Table — created in Saltcorn
2. View — at minimum a list view and a detail view
3. Page — at minimum one usable entry/edit page
4. Data — production records entered only via pages, never raw grids
Rules:
- No production records entered in raw table grids
- Every new table ships with at least one usable page before data entry begins
- Build vertically, not horizontally — one complete table/view/page/data cycle before starting the next table
4. Sprint plan
Sprints are ordered by dependency. Do not start a sprint until the previous sprint's data entry phase is complete and verified.
Sprint 1 — Foundation
Tables: domain, culture
Data: 12 seed domains, 4 culture records
Deliverable: domain browser page, culture lookup page
Sprint 2 — Core concepts
Tables: concept, concept_culture
Data: 25 seed concepts from DOC-006, tagged to Maglemosian
Deliverable: concept editor page with domain and culture assignment
Sprint 3 — Scales
Tables: scale, scale_step
Data: scales for wetness, fire state, ice safety, injury severity
Deliverable: scale builder page with ordered steps
Sprint 4 — Frames
Table: frame
Data: frames for key action concepts (drying, crossing, fishing, triage)
Deliverable: frame editor page
Sprint 5 — Vocabulary
Table: vocabulary_item
Data: preferred terms for all 25 seed concepts
Deliverable: vocabulary editor with preferred flag and approved/deprecated/restricted status
Sprint 6 — Corpus
Tables: corpus_item, corpus_concept, triage_option
Data: first 20 ground truth items, first 10 triage items
Deliverable: corpus entry page, triage option builder, concept linkage
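Since sprints are ordered by dependency, the plan can be checked against the foreign keys in section 2: no table should reference a table that is built in a later sprint. A sketch under the assumption that both the plan and the "References ..." notes are transcribed by hand into dicts (the names are hypothetical). Within a single sprint, the vertical build rule in section 3 still decides order (concept before concept_culture, for example); the sketch only checks across sprints.

```python
# Tables introduced in each sprint, transcribed from the sprint plan above.
SPRINTS: dict[int, list[str]] = {
    1: ["domain", "culture"],
    2: ["concept", "concept_culture"],
    3: ["scale", "scale_step"],
    4: ["frame"],
    5: ["vocabulary_item"],
    6: ["corpus_item", "corpus_concept", "triage_option"],
}

# Tables each table references, read from the "References ..." notes in section 2.
REFERENCES: dict[str, list[str]] = {
    "domain": ["domain"],
    "culture": [],
    "concept": ["domain"],
    "concept_culture": ["concept", "culture"],
    "scale": ["concept"],
    "scale_step": ["scale", "scale_step"],
    "frame": ["concept"],
    "vocabulary_item": ["concept"],
    "corpus_item": ["culture"],
    "corpus_concept": ["corpus_item", "concept"],
    "triage_option": ["corpus_item"],
}


def order_violations(
    sprints: dict[int, list[str]], references: dict[str, list[str]]
) -> list[str]:
    """Return references whose target is built in a later sprint or never built."""
    built_in = {table: n for n, tables in sprints.items() for table in tables}
    problems: list[str] = []
    for table, targets in references.items():
        sprint = built_in.get(table)
        if sprint is None:
            problems.append(f"{table}: not scheduled in any sprint")
            continue
        for target in targets:
            target_sprint = built_in.get(target)
            if target_sprint is None:
                problems.append(f"{table} -> {target}: target never built")
            elif target_sprint > sprint:
                problems.append(f"{table} -> {target}: target built in sprint {target_sprint}")
    return problems
```

Run against the plan as written, order_violations(SPRINTS, REFERENCES) returns an empty list, and the six sprints together cover the eleven tables defined in section 2.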
5. Seed concepts — Sprint 2 data
From DOC-006. All tagged Maglemosian initially.
| Concept | Domain | Evidence grade |
|---|---|---|
| wet | Wetness | direct |
| dry | Wetness | direct |
| damp | Wetness | direct |
| soaked | Wetness | inferred |
| fire | Fire | direct |
| ember | Fire | direct |
| smoke | Fire | direct |
| shelter | Shelter | direct |
| hide | Shelter | direct |
| bark | Shelter | direct |
| marsh | Terrain | direct |
| reed | Terrain | direct |
| path | Terrain | inferred |
| river | Water travel | direct |
| crossing | Water travel | inferred |
| fish | Fishing | direct |
| trap | Fishing | direct |
| spear | Hunting | direct |
| wound | Injury | direct |
| limp | Injury | inferred |
| carry | Injury | inferred |
| dawn | Time cycles | inferred |
| dusk | Time cycles | inferred |
| elder | Social roles | analogue |
| child | Social roles | analogue |
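Before Sprint 2 data entry, the seed table can be checked against the twelve seed domains in section 2.1 and the three evidence grades in section 2.3. A minimal sketch, with the data transcribed by hand; the constant and function names are hypothetical.

```python
# The twelve seed domains listed in section 2.1.
SEED_DOMAINS = {
    "Weather", "Wetness", "Fire", "Shelter", "Water travel", "Hunting",
    "Fishing", "Injury", "Storage", "Terrain", "Time cycles", "Social roles",
}
EVIDENCE_GRADES = {"direct", "analogue", "inferred"}

# (concept, domain, evidence grade), transcribed from the seed table above.
SEED_CONCEPTS = [
    ("wet", "Wetness", "direct"), ("dry", "Wetness", "direct"),
    ("damp", "Wetness", "direct"), ("soaked", "Wetness", "inferred"),
    ("fire", "Fire", "direct"), ("ember", "Fire", "direct"),
    ("smoke", "Fire", "direct"), ("shelter", "Shelter", "direct"),
    ("hide", "Shelter", "direct"), ("bark", "Shelter", "direct"),
    ("marsh", "Terrain", "direct"), ("reed", "Terrain", "direct"),
    ("path", "Terrain", "inferred"), ("river", "Water travel", "direct"),
    ("crossing", "Water travel", "inferred"), ("fish", "Fishing", "direct"),
    ("trap", "Fishing", "direct"), ("spear", "Hunting", "direct"),
    ("wound", "Injury", "direct"), ("limp", "Injury", "inferred"),
    ("carry", "Injury", "inferred"), ("dawn", "Time cycles", "inferred"),
    ("dusk", "Time cycles", "inferred"), ("elder", "Social roles", "analogue"),
    ("child", "Social roles", "analogue"),
]


def seed_problems(concepts, domains, grades) -> list[str]:
    """Report duplicate labels, unknown domains and unknown evidence grades."""
    problems: list[str] = []
    labels = [label for label, _, _ in concepts]
    if len(labels) != len(set(labels)):
        problems.append("duplicate concept label")
    for label, domain, grade in concepts:
        if domain not in domains:
            problems.append(f"{label}: unknown domain {domain!r}")
        if grade not in grades:
            problems.append(f"{label}: unknown evidence grade {grade!r}")
    return problems


def unused_domains(concepts, domains) -> list[str]:
    """Return seed domains that no seed concept uses yet, sorted alphabetically."""
    return sorted(domains - {domain for _, domain, _ in concepts})
```

As transcribed, the table passes seed_problems with no findings, and unused_domains reports Storage and Weather as the two seed domains with no seed concept yet.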
6. Corpus specification
6.1 Ground truth corpus
Teaches stable causal relations. Statements must be:
- Present tense, declarative
- Measurement-free
- Culturally plausible for the tagged culture
- Linked to at least one concept via corpus_concept
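Of these requirements, only the concept link can be checked mechanically; tense, measurement-free wording and cultural plausibility stay editorial. A minimal sketch of that one check, assuming corpus_concept rows are available as (corpus_item_id, concept_id) pairs; the dataclass and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorpusItem:
    """Subset of corpus_item fields (confidence and notes omitted)."""
    id: int
    corpus_type: str  # "ground_truth" or "triage"
    text: str
    culture_id: Optional[int] = None  # None means the item applies to all cultures
    approved: bool = False


def unlinked_ground_truth(items: list[CorpusItem], links: list[tuple[int, int]]) -> list[int]:
    """Return ids of ground truth items that have no corpus_concept row.

    links holds (corpus_item_id, concept_id) pairs from corpus_concept.
    """
    linked = {item_id for item_id, _ in links}
    return sorted(
        item.id for item in items
        if item.corpus_type == "ground_truth" and item.id not in linked
    )
```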
Field summary:
- text: the causal statement
- culture_id: null for universal statements
- confidence: high / medium / low
- approved: reviewed and ready for training
Examples:
- Fire dries wet hides.
- Rain softens paths.
- Smoke drives insects away.
- Wet wood makes reluctant fire.
- Soaked bark floor cannot be slept on dry.
- Rising water warns of flood.
6.2 Simulation triage corpus
Teaches decisions and priorities under constraint. Each item must have 2-4 structured options via triage_option, exactly one marked preferred.
Field summary:
- text: the scenario description
- culture_id: null for universal scenarios
- confidence: high / medium / low
- approved: reviewed and ready for training
Triage option fields:
- option_text: what this choice involves
- is_preferred: marks the recommended action
- reason: why the option is or is not preferred
- rank: display order
Examples:
- Wounded hunter cannot walk. (carry first vs continue hunt vs make camp)
- Fire goes out in heavy rain. (seek dry tinder vs use ember from shelter vs wait)
- Path floods at crossing. (find higher crossing vs wait vs wade)
7. Lexical governance
7.1 Purpose
Prevent semantic drift. Ensure vocabulary items remain measurement-free and culturally coherent across authors and sessions.
7.2 Controls per vocabulary item
| Control | Field | Notes |
|---|---|---|
| Preferred term | preferred = true | One per concept |
| Allowed alternates | status = approved, preferred = false | Multiple allowed |
| Deprecated terms | status = deprecated | Kept for record, not used in new corpus |
| Restricted terms | status = restricted | Context specified in notes |
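The "one preferred term per concept" control can be audited across all vocabulary items at once. A minimal sketch with a hypothetical VocabularyItem dataclass; the second rule, that a preferred term should itself have status approved, is an assumption this standard does not state. Concepts with no vocabulary items at all do not appear in the input, so this check cannot flag them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VocabularyItem:
    """Subset of vocabulary_item fields needed for the audit."""
    concept_id: int
    term: str
    preferred: bool
    status: str  # "approved", "deprecated" or "restricted"


def governance_problems(items: list[VocabularyItem]) -> list[str]:
    """Return per-concept governance violations, ordered by concept id."""
    by_concept: dict[int, list[VocabularyItem]] = defaultdict(list)
    for item in items:
        by_concept[item.concept_id].append(item)
    problems: list[str] = []
    for concept_id, group in sorted(by_concept.items()):
        preferred = [i for i in group if i.preferred]
        if len(preferred) != 1:
            problems.append(f"concept {concept_id}: {len(preferred)} preferred terms")
        for item in preferred:
            if item.status != "approved":  # assumed rule, not in the standard
                problems.append(
                    f"concept {concept_id}: preferred term {item.term!r} is {item.status}"
                )
    return problems
```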
7.3 Approval history
Saltcorn's built-in record history tracks who changed what and when. No separate approval log table is needed at this stage.
7.4 Constraint enforcement
Modern units and modern-only categories are excluded by editorial discipline at authoring time. A future model analysis pass will scan the corpus for violations and flag them for review. No constraint tables are maintained in this schema version.
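Purely as an illustration of what a cheap pre-filter could look like ahead of the model analysis pass, a keyword scan can flag obvious modern unit words for human review. The word list below is hypothetical and deliberately short; it catches neither bare numerals nor modern-only categories, which need semantic judgment, so it complements rather than replaces the deferred pass.

```python
import re

# Hypothetical, incomplete list of modern unit words; not defined by this standard.
MODERN_UNIT_WORDS = [
    "metre", "meter", "kilometre", "kilometer", "kilogram",
    "litre", "liter", "celsius", "fahrenheit", "hour",
]

# Whole-word match, optional plural "s", case-insensitive.
_UNIT_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, MODERN_UNIT_WORDS)) + r")s?\b",
    re.IGNORECASE,
)


def flag_modern_units(text: str) -> list[str]:
    """Return unit words found in text, lower-cased, in order of appearance."""
    return [m.group(1).lower() for m in _UNIT_PATTERN.finditer(text)]
```

Flagged items would go to review rather than be rejected outright, since a word list cannot judge context.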
8. What this does not decide
- The language model architecture or training pipeline
- How corpus items are exported to training format
- Whether vocabulary items are used as literal tokens or as semantic seeds for generation
- The multi-clan expansion beyond Maglemosian
- The integration between this corpus and the TESSERA spatial data layer
- Constraint enforcement implementation (deferred to model analysis pass)
Mesolithic Corpus Standard v1.0, 2026-04-13. Status: Normative. Next review: after Sprint 2 data entry is complete.