# Mesolithic Corpus Standard

### Version: 1.0
### Status: Normative
### Date: 2026-04-13
### Author: Claude Sonnet 4.6, approved by project owner

---

## 1. Mission and scope

Build a defensible Mesolithic Thesaurus, Vocabulary, and Dictionary in Saltcorn to support controlled corpus generation for a language model grounded in prehistoric lifeways.

### 1.1 Core outputs

| Output | Purpose |
|---|---|
| Thesaurus | Meaning relationships: domains, concepts, scales, frames |
| Vocabulary | Approved lexical forms per concept |
| Dictionary | Human-readable entries combining concept + vocabulary |
| Ground truth corpus | Stable causal relations for model training |
| Simulation triage corpus | Decision and priority patterns for model training |

### 1.2 Constraints

- No modern units or modern-only categories in any generated language
- Meaning-first design: surface forms are secondary to semantic structure
- Culture-aware context: concepts tagged to applicable culture horizons
- UI-first workflow: table → view → page → data, without exception
- Constraint enforcement is editorial, not schema-enforced. A future model analysis pass will check the corpus for violations. This schema contains no constraint tables.

### 1.3 Initial focus

Maglemosian / Nerava northern wetland context. All four culture horizons are represented in the schema, but Maglemosian is populated first.

### 1.4 Out of scope

- Game systems and full simulation engines
- Speculative conlang reconstruction
- Broad ontology sprawl
- Academic citation management
- Constraint enforcement tables (deferred to model analysis)

---

## 2. Schema

Eleven tables. No table is added without a proven workflow need.

### 2.1 `domain`

The semantic domain hierarchy. Domains are self-referential: a domain can have a parent domain.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| label | text | Human-readable name (e.g. "Weather", "Wetness") |
| parent_id | integer | References `domain.id`; null for top-level domains |

**Seed domains (in priority order):** Weather, Wetness, Fire, Shelter, Water travel, Hunting, Fishing, Injury, Storage, Terrain, Time cycles, Social roles.

---

### 2.2 `culture`

The four target Mesolithic culture horizons. Lookup table: values are fixed and do not grow without an explicit decision.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| label | text | Culture name |
| ecology_note | text | Brief ecological context |
| date_range_note | text | Approximate date range |

**Fixed values:**

| Label | Ecology | Date range |
|---|---|---|
| Maglemosian | Northern lake/peatland, open woodland | ~9500–6000 BCE |
| Ertebølle | Coastal, lagoonal, shell midden | ~5400–3900 BCE |
| Sauveterrian | Western Mediterranean upland/lowland | ~9000–6000 BCE |
| Azilian | Franco-Cantabrian cave/rock-shelter | ~12000–9000 BCE |

---

### 2.3 `concept`

The core meaning nodes of the thesaurus. Each concept belongs to a domain and carries an evidence grade.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| domain_id | integer | References `domain.id` |
| label | text | Concept identifier (e.g. "wet", "ember", "crossing") |
| definition | text | Plain language definition, measurement-free |
| evidence_grade | enum | `direct` / `analogue` / `inferred` |
| notes | text | Optional authoring notes |

**Evidence grade values:**

- `direct`: the concept is directly supported by the archaeological record
- `analogue`: the concept is supported by ethnographic analogue
- `inferred`: the concept follows from physical or ecological inference

Culture applicability is stored in `concept_culture`, not here.

---

### 2.4 `concept_culture`

Join table linking concepts to applicable culture horizons. A concept with no rows here applies to all cultures.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| culture_id | integer | References `culture.id` |
| context_note | text | Optional note on culture-specific usage |

---

### 2.5 `scale`

A gradient dimension associated with a concept. A concept may have multiple scales (e.g. "wetness" has a dryness scale and a weight scale).

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| label | text | Scale name (e.g. "dryness", "ice safety") |

---

### 2.6 `scale_step`

Ordered steps within a scale. Steps are ordered by rank and may reference an antonym step.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| scale_id | integer | References `scale.id` |
| rank | integer | Ordering; lower = one end of spectrum |
| label | text | Step label (e.g. "dry", "damp", "soaked") |
| antonym_step_id | integer | References another `scale_step.id` (optional) |
| is_danger_threshold | boolean | Marks steps that represent hazard onset |
| notes | text | Optional authoring notes |

**Example (wetness scale):**

| Rank | Label | Danger threshold |
|---|---|---|
| 1 | dry | No |
| 2 | damp | No |
| 3 | wet | No |
| 4 | soaked | Yes |

---

### 2.7 `frame`

An action frame associated with a concept. Stores the typical roles (actor, patient, tool, place) for actions involving this concept. One frame per concept is the norm; complex concepts may have more.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| label | text | Frame name (e.g. "drying hides", "crossing river") |
| actor | text | Who performs the action |
| patient | text | What is acted upon |
| tool | text | What instrument is used |
| place | text | Where the action occurs |
| notes | text | Optional authoring notes |

---

### 2.8 `vocabulary_item`

Approved lexical forms for a concept. A concept may have multiple vocabulary items: one preferred, others allowed alternates.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| term | text | The surface form (e.g. "wet", "soaked", "waterlogged") |
| preferred | boolean | True for the primary term |
| register | text | Usage register (e.g. "narrative", "triage", "both") |
| status | enum | `approved` / `deprecated` / `restricted` |
| notes | text | Optional governance notes |

**Status values:**

- `approved`: use freely
- `deprecated`: do not use in new corpus items; kept for historical record
- `restricted`: use only in specified contexts (noted in `notes`)

---

### 2.9 `corpus_item`

A single ground truth or triage corpus item. Ground truth items teach stable causal relations. Triage items teach decisions and priorities.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_type | enum | `ground_truth` / `triage` |
| culture_id | integer | References `culture.id`; null means all cultures |
| text | text | The corpus statement (ground truth) or scenario (triage) |
| confidence | enum | `high` / `medium` / `low` |
| approved | boolean | True when reviewed and approved for training use |
| notes | text | Optional authoring notes |

**Ground truth example:**

```
corpus_type: ground_truth
text: "Fire dries wet hides."
confidence: high
approved: true
```

**Triage example:**

```
corpus_type: triage
text: "Hunter returns with deep leg wound and cannot walk unassisted."
confidence: high
approved: true
```

Triage options are stored in `triage_option`.

---

### 2.10 `corpus_concept`

Join table linking corpus items to the concepts they involve. Enables completeness checks and concept-driven corpus browsing.
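
One plausible reading of the completeness check and of concept-driven browsing is sketched below. The standard does not define either operation precisely, so the helper names `concepts_missing_corpus` and `corpus_items_for_concept`, the in-memory rows, and the sample ids are all assumptions for illustration. Only the field names `corpus_item_id` and `concept_id` come from this section.

```python
def concepts_missing_corpus(concept_ids, corpus_concept_rows):
    """Completeness check: concepts not linked to any corpus item."""
    covered = {row["concept_id"] for row in corpus_concept_rows}
    return sorted(cid for cid in set(concept_ids) if cid not in covered)


def corpus_items_for_concept(concept_id, corpus_concept_rows):
    """Concept-driven browsing: ids of corpus items that involve a concept."""
    return sorted(
        {
            row["corpus_item_id"]
            for row in corpus_concept_rows
            if row["concept_id"] == concept_id
        }
    )


# Hypothetical links: corpus item 1 involves concepts 10 and 11,
# corpus item 2 involves concept 10 only; concept 12 has no links.
links = [
    {"corpus_item_id": 1, "concept_id": 10},
    {"corpus_item_id": 1, "concept_id": 11},
    {"corpus_item_id": 2, "concept_id": 10},
]
missing = concepts_missing_corpus([10, 11, 12], links)
items_for_10 = corpus_items_for_concept(10, links)
```

With this sample data, `missing` contains only concept 12, and `items_for_10` lists corpus items 1 and 2.
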

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_item_id | integer | References `corpus_item.id` |
| concept_id | integer | References `concept.id` |
| role_note | text | Optional note on how concept appears in this item |

---

### 2.11 `triage_option`

Structured options for triage corpus items. Each triage item has 2-4 options, exactly one marked as preferred.

| Field | Type | Notes |
|---|---|---|
| id | integer | Primary key |
| corpus_item_id | integer | References `corpus_item.id` |
| option_text | text | Description of this option |
| is_preferred | boolean | True for the recommended action |
| reason | text | Why this option is preferred or not |
| rank | integer | Display order |

**Example (triage options for wounded hunter scenario):**

| Option | Preferred | Reason |
|---|---|---|
| Carry hunter back immediately | Yes | Wound is deep, cannot walk, delay increases risk |
| Continue hunt, send one person back | No | Splits group, leaves hunter without full support |
| Make camp here and rest | No | Wound needs shelter and fire, not open ground |

---

## 3. Workflow rule

Every table follows this delivery sequence without exception:

```
1. Table: created in Saltcorn
2. View: at minimum a list view and a detail view
3. Page: at minimum one usable entry/edit page
4. Data: production records entered only via pages, never raw grids
```

**Rules:**

- No production records entered in raw table grids
- Every new table ships with at least one usable page before data entry begins
- Build vertically, not horizontally: one complete table/view/page/data cycle before starting the next table

---

## 4. Sprint plan

Sprints are ordered by dependency. Do not start a sprint until the previous sprint's data entry phase is complete and verified.
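
The gating rule can be expressed as a small check. This is a minimal sketch, assuming sprint status is tracked as a set of verified sprint numbers. The names `SPRINT_TABLES`, `can_start`, and `next_sprint` are hypothetical. The sprint numbers and table names are those of the plan in this section.

```python
# Sprint number -> tables delivered in that sprint (from the sprint plan).
SPRINT_TABLES = {
    1: ["domain", "culture"],
    2: ["concept", "concept_culture"],
    3: ["scale", "scale_step"],
    4: ["frame"],
    5: ["vocabulary_item"],
    6: ["corpus_item", "corpus_concept", "triage_option"],
}


def can_start(number, verified):
    """A sprint may start only when every earlier sprint is verified."""
    return all(n in verified for n in SPRINT_TABLES if n < number)


def next_sprint(verified):
    """Return the lowest-numbered unverified sprint, or None when all are done."""
    for number in sorted(SPRINT_TABLES):
        if number not in verified:
            return number
    return None


# Hypothetical status: data entry for sprints 1 and 2 is complete and verified.
verified = {1, 2}
sprint_3_allowed = can_start(3, verified)
sprint_5_allowed = can_start(5, verified)
upcoming = next_sprint(verified)
```

With sprints 1 and 2 verified, sprint 3 may start, sprint 5 may not, and `next_sprint` returns 3.
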

### Sprint 1: Foundation

- Tables: `domain`, `culture`
- Data: 12 seed domains, 4 culture records
- Deliverable: domain browser page, culture lookup page

### Sprint 2: Core concepts

- Tables: `concept`, `concept_culture`
- Data: 25 seed concepts from DOC-006, tagged to Maglemosian
- Deliverable: concept editor page with domain and culture assignment

### Sprint 3: Scales

- Tables: `scale`, `scale_step`
- Data: scales for wetness, fire state, ice safety, injury severity
- Deliverable: scale builder page with ordered steps

### Sprint 4: Frames

- Table: `frame`
- Data: frames for key action concepts (drying, crossing, fishing, triage)
- Deliverable: frame editor page

### Sprint 5: Vocabulary

- Table: `vocabulary_item`
- Data: preferred terms for all 25 seed concepts
- Deliverable: vocabulary editor with preferred/alternate/deprecated status

### Sprint 6: Corpus

- Tables: `corpus_item`, `corpus_concept`, `triage_option`
- Data: first 20 ground truth items, first 10 triage items
- Deliverable: corpus entry page, triage option builder, concept linkage

---

## 5. Seed concepts: Sprint 2 data

From DOC-006. All tagged Maglemosian initially.

| Concept | Domain | Evidence grade |
|---|---|---|
| wet | Wetness | direct |
| dry | Wetness | direct |
| damp | Wetness | direct |
| soaked | Wetness | inferred |
| fire | Fire | direct |
| ember | Fire | direct |
| smoke | Fire | direct |
| shelter | Shelter | direct |
| hide | Shelter | direct |
| bark | Shelter | direct |
| marsh | Terrain | direct |
| reed | Terrain | direct |
| path | Terrain | inferred |
| river | Water travel | direct |
| crossing | Water travel | inferred |
| fish | Fishing | direct |
| trap | Fishing | direct |
| spear | Hunting | direct |
| wound | Injury | direct |
| limp | Injury | inferred |
| carry | Injury | inferred |
| dawn | Time cycles | inferred |
| dusk | Time cycles | inferred |
| elder | Social roles | analogue |
| child | Social roles | analogue |

---

## 6. Corpus specification

### 6.1 Ground truth corpus

Teaches stable causal relations. Statements must be:

- Present tense, declarative
- Measurement-free
- Culturally plausible for the tagged culture
- Linked to at least one concept via `corpus_concept`

**Field summary:**

- `text`: the causal statement
- `culture_id`: null for universal statements
- `confidence`: high/medium/low
- `approved`: reviewed and ready for training

**Examples:**

- Fire dries wet hides.
- Rain softens paths.
- Smoke drives insects away.
- Wet wood makes reluctant fire.
- Soaked bark floor cannot be slept on dry.
- Rising water warns of flood.

### 6.2 Simulation triage corpus

Teaches decisions and priorities under constraint. Each item must have 2-4 structured options via `triage_option`, exactly one marked preferred.

**Field summary:**

- `text`: the scenario description
- `culture_id`: null for universal scenarios
- `confidence`: high/medium/low
- `approved`: reviewed and ready for training

**Triage option fields:**

- `option_text`: what this choice involves
- `is_preferred`: the recommended action
- `reason`: why preferred or not preferred
- `rank`: display order

**Examples:**

- Wounded hunter cannot walk. (carry first vs continue hunt vs make camp)
- Fire goes out in heavy rain. (seek dry tinder vs use ember from shelter vs wait)
- Path floods at crossing. (find higher crossing vs wait vs wade)

---

## 7. Lexical governance

### 7.1 Purpose

Prevent semantic drift. Ensure vocabulary items remain measurement-free and culturally coherent across authors and sessions.

### 7.2 Controls per vocabulary item

| Control | Field | Notes |
|---|---|---|
| Preferred term | `preferred = true` | One per concept |
| Allowed alternates | `status = approved, preferred = false` | Multiple allowed |
| Deprecated terms | `status = deprecated` | Kept for record, not used in new corpus |
| Restricted terms | `status = restricted` | Context specified in `notes` |

### 7.3 Approval history

Saltcorn's built-in record history tracks who changed what and when. No separate approval log table is needed at this stage.

### 7.4 Constraint enforcement

Modern units and modern-only categories are excluded by editorial discipline at authoring time. A future model analysis pass will scan the corpus for violations and flag them for review. No constraint tables are maintained in this schema version.

---

## 8. What this does not decide

- The language model architecture or training pipeline
- How corpus items are exported to training format
- Whether vocabulary items are used as literal tokens or as semantic seeds for generation
- The multi-clan expansion beyond Maglemosian
- The integration between this corpus and the TESSERA spatial data layer
- Constraint enforcement implementation (deferred to model analysis pass)

---

*Mesolithic Corpus Standard v1.0, 2026-04-13*
*Status: Normative*
*Next review: after Sprint 2 data entry is complete*