
TheRON 34316d2429 init: CIVICVS repository — CBPs, corpus standard, directory structure

- README.md: project identity, TESSERA relationship, directory layout
- CIVICVS-CBPs.md: CBP-001 through CBP-006 adapted for CIVICVS
- docs/corpus/mesolithic-corpus-standard-v1.md: 10-table schema, 6-sprint plan, 25 seed concepts
Per CBP-001: committed same session as produced.

2026-04-18 05:29:09 +00:00

14 KiB


Mesolithic Corpus Standard

Version: 1.0

Status: Normative

Date: 2026-04-13

Author: Claude Sonnet 4.6, approved by project owner

1. Mission and scope

Build a defensible Mesolithic Thesaurus, Vocabulary, and Dictionary in Saltcorn to support controlled corpus generation for a language model grounded in prehistoric lifeways.

1.1 Core outputs

| Output | Purpose |
| --- | --- |
| Thesaurus | Meaning relationships: domains, concepts, scales, frames |
| Vocabulary | Approved lexical forms per concept |
| Dictionary | Human-readable entries combining concept + vocabulary |
| Ground truth corpus | Stable causal relations for model training |
| Simulation triage corpus | Decision and priority patterns for model training |

1.2 Constraints

- No modern units or modern-only categories in any generated language
- Meaning-first design: surface forms are secondary to semantic structure
- Culture-aware context: concepts are tagged to applicable culture horizons
- UI-first workflow: table → view → page → data, without exception
- Constraint enforcement is editorial, not schema-enforced. A future model analysis pass will check the corpus for violations. This schema contains no constraint tables.

1.3 Initial focus

Maglemosian / Nerava northern wetland context. All four culture horizons are represented in the schema, but Maglemosian is populated first.

1.4 Out of scope

- Game systems and full simulation engines
- Speculative conlang reconstruction
- Broad ontology sprawl
- Academic citation management
- Constraint enforcement tables (deferred to model analysis)

2. Schema

Eleven tables. No table is added without a proven workflow need.

2.1 `domain`

The semantic domain hierarchy. Domains are self-referential — a domain can have a parent domain.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| label | text | Human-readable name (e.g. "Weather", "Wetness") |
| parent_id | integer | References `domain.id`; null for top-level domains |

Seed domains (in priority order): Weather, Wetness, Fire, Shelter, Water travel, Hunting, Fishing, Injury, Storage, Terrain, Time cycles, Social roles.
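The self-referential `parent_id` can be illustrated with a small sketch. It assumes an in-memory SQLite database as a stand-in for whatever store sits behind Saltcorn, and it adds a hypothetical child domain "Ice" under "Weather" purely for illustration; the standard itself does not say which seed domains nest. A recursive query walks `parent_id` up to the top-level domain.

```python
import sqlite3

# In-memory SQLite stands in for the real store (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE domain (
        id        INTEGER PRIMARY KEY,
        label     TEXT NOT NULL,
        parent_id INTEGER REFERENCES domain(id)  -- NULL for top-level domains
    )
    """
)

# Two seed domains plus one hypothetical child domain ("Ice" under "Weather").
conn.executemany(
    "INSERT INTO domain (id, label, parent_id) VALUES (?, ?, ?)",
    [(1, "Weather", None), (2, "Wetness", None), (3, "Ice", 1)],
)


def domain_path(conn, domain_id):
    """Return labels from the top-level ancestor down to the given domain."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id, label, parent_id, depth) AS (
            SELECT id, label, parent_id, 0 FROM domain WHERE id = ?
            UNION ALL
            SELECT d.id, d.label, d.parent_id, a.depth + 1
            FROM domain d JOIN ancestors a ON d.id = a.parent_id
        )
        SELECT label FROM ancestors ORDER BY depth DESC
        """,
        (domain_id,),
    ).fetchall()
    return [label for (label,) in rows]


print(domain_path(conn, 3))  # ['Weather', 'Ice']
print(domain_path(conn, 2))  # ['Wetness']
```

The recursive walk assumes the hierarchy has no cycles; a domain recorded as its own ancestor would make the query run without end, so a check when assigning parents may be worth having.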

2.2 `culture`

The four target Mesolithic culture horizons. Lookup table: values are fixed and do not grow without an explicit decision.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| label | text | Culture name |
| ecology_note | text | Brief ecological context |
| date_range_note | text | Approximate date range |

Fixed values:

| Label | Ecology | Date range |
| --- | --- | --- |
| Maglemosian | Northern lake/peatland, open woodland | ~9500–6000 BCE |
| Ertebølle | Coastal, lagoonal, shell midden | ~5400–3900 BCE |
| Sauveterrian | Western Mediterranean upland/lowland | ~9000–6000 BCE |
| Azilian | Franco-Cantabrian cave/rock-shelter | ~12000–9000 BCE |

2.3 `concept`

The core meaning nodes of the thesaurus. Each concept belongs to a domain and carries an evidence grade.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| domain_id | integer | References `domain.id` |
| label | text | Concept identifier (e.g. "wet", "ember", "crossing") |
| definition | text | Plain language definition, measurement-free |
| evidence_grade | enum | `direct` / `analogue` / `inferred` |
| notes | text | Optional authoring notes |

Evidence grade values:

- `direct`: concept is directly supported by the archaeological record
- `analogue`: concept is supported by an ethnographic analogue
- `inferred`: concept follows from physical or ecological inference

Culture applicability is stored in concept_culture, not here.

2.4 `concept_culture`

Join table linking concepts to applicable culture horizons. A concept with no rows here applies to all cultures.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| culture_id | integer | References `culture.id` |
| context_note | text | Optional note on culture-specific usage |
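The rule that a concept with no `concept_culture` rows applies to every culture is easy to invert by accident when filtering. Here is a minimal Python sketch of the rule, with hypothetical ids and a plain list standing in for the table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptCulture:
    concept_id: int
    culture_id: int


def applies_to(concept_id, culture_id, links):
    """A concept with no concept_culture rows applies to every culture;
    otherwise it applies only to the cultures it is linked to."""
    tagged = {link.culture_id for link in links if link.concept_id == concept_id}
    return not tagged or culture_id in tagged


# Hypothetical ids: culture 1 = Maglemosian, culture 2 = Ertebølle.
links = [ConceptCulture(concept_id=10, culture_id=1)]
print(applies_to(10, 1, links))  # True: explicitly tagged
print(applies_to(10, 2, links))  # False: tagged, but not to this culture
print(applies_to(11, 2, links))  # True: untagged concept applies everywhere
```

One consequence of this design is that deleting a concept's last culture tag silently widens it to all cultures rather than narrowing it to none.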

2.5 `scale`

A gradient dimension associated with a concept. A concept may have multiple scales (e.g. "wetness" has a dryness scale and a weight scale).

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| label | text | Scale name (e.g. "dryness", "ice safety") |

2.6 `scale_step`

Ordered steps within a scale. Steps are ordered by rank and may reference an antonym step.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| scale_id | integer | References `scale.id` |
| rank | integer | Ordering within the scale; lower ranks sit toward one end of the spectrum, higher ranks toward the other |
| label | text | Step label (e.g. "dry", "damp", "soaked") |
| antonym_step_id | integer | References another `scale_step.id`; optional |
| is_danger_threshold | boolean | Marks steps that represent hazard onset |
| notes | text | Optional authoring notes |

Example — wetness scale:

| Rank | Label | Danger threshold |
| --- | --- | --- |
| 1 | dry | No |
| 2 | damp | No |
| 3 | wet | No |
| 4 | soaked | Yes |
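Because steps carry an explicit rank rather than relying on entry order, any view or export needs to sort by rank. The sketch below uses hypothetical helpers, not anything Saltcorn provides, to sort the wetness steps and pick out the hazard onset. It assumes hazard grows with rank, as in the example. A scale running the other way would need the highest flagged rank instead of the lowest.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScaleStep:
    rank: int
    label: str
    is_danger_threshold: bool = False


def ordered_labels(steps):
    """Labels sorted by rank, from one end of the scale to the other."""
    return [s.label for s in sorted(steps, key=lambda s: s.rank)]


def danger_onset(steps) -> Optional[str]:
    """Lowest-ranked step flagged as a danger threshold, or None.
    Assumes hazard increases with rank (true for the wetness example)."""
    flagged = sorted((s for s in steps if s.is_danger_threshold), key=lambda s: s.rank)
    return flagged[0].label if flagged else None


# The wetness example, deliberately entered out of rank order.
wetness = [
    ScaleStep(3, "wet"),
    ScaleStep(1, "dry"),
    ScaleStep(4, "soaked", is_danger_threshold=True),
    ScaleStep(2, "damp"),
]
print(ordered_labels(wetness))  # ['dry', 'damp', 'wet', 'soaked']
print(danger_onset(wetness))    # soaked
```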

2.7 `frame`

An action frame associated with a concept. Stores the typical roles (actor, patient, tool, place) for actions involving this concept. One frame per concept is the norm; complex concepts may have more.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| label | text | Frame name (e.g. "drying hides", "crossing river") |
| actor | text | Who performs the action |
| patient | text | What is acted upon |
| tool | text | What instrument is used |
| place | text | Where the action occurs |
| notes | text | Optional authoring notes |

2.8 `vocabulary_item`

Approved lexical forms for a concept. A concept may have multiple vocabulary items: one preferred, the others allowed alternates.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| concept_id | integer | References `concept.id` |
| term | text | The surface form (e.g. "wet", "soaked", "waterlogged") |
| preferred | boolean | True for the primary term |
| register | text | Usage register (e.g. "narrative", "triage", "both") |
| status | enum | `approved` / `deprecated` / `restricted` |
| notes | text | Optional governance notes |

Status values:

- `approved`: use freely
- `deprecated`: do not use in new corpus items; kept for the historical record
- `restricted`: use only in the contexts specified in `notes`

2.9 `corpus_item`

A single ground truth or triage corpus item. Ground truth items teach stable causal relations. Triage items teach decisions and priorities.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| corpus_type | enum | `ground_truth` / `triage` |
| culture_id | integer | References `culture.id`; null means all cultures |
| text | text | The corpus statement (ground truth) or scenario (triage) |
| confidence | enum | `high` / `medium` / `low` |
| approved | boolean | True when reviewed and approved for training use |
| notes | text | Optional authoring notes |

Ground truth example:

corpus_type: ground_truth
text: "Fire dries wet hides."
confidence: high
approved: true

Triage example:

corpus_type: triage
text: "Hunter returns with deep leg wound and cannot walk unassisted."
confidence: high
approved: true

Triage options are stored in triage_option.

2.10 `corpus_concept`

Join table linking corpus items to the concepts they involve. Enables completeness checks and concept-driven corpus browsing.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| corpus_item_id | integer | References `corpus_item.id` |
| concept_id | integer | References `concept.id` |
| role_note | text | Optional note on how the concept appears in this item |
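The standard does not spell out what the completeness checks are. One plausible reading is a pair of set differences: corpus items linked to no concept, and concepts that no corpus item mentions yet. The sketch below uses hypothetical ids, with `(corpus_item_id, concept_id)` pairs standing in for `corpus_concept` rows.

```python
def unlinked_corpus_items(corpus_item_ids, corpus_concept_rows):
    """Corpus items with no corpus_concept row at all."""
    linked = {item_id for item_id, _concept_id in corpus_concept_rows}
    return sorted(set(corpus_item_ids) - linked)


def uncovered_concepts(concept_ids, corpus_concept_rows):
    """Concepts that no corpus item is linked to yet."""
    covered = {concept_id for _item_id, concept_id in corpus_concept_rows}
    return sorted(set(concept_ids) - covered)


# Hypothetical ids; each row is a (corpus_item_id, concept_id) pair.
rows = [(100, 1), (100, 5), (101, 5)]
print(unlinked_corpus_items([100, 101, 102], rows))  # [102]
print(uncovered_concepts([1, 2, 5], rows))           # [2]
```

The first check corresponds directly to the ground truth rule that every statement links to at least one concept; the second is a coverage report rather than a rule.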

2.11 `triage_option`

Structured options for triage corpus items. Each triage item has two to four options, exactly one of which is marked as preferred.

| Field | Type | Notes |
| --- | --- | --- |
| id | integer | Primary key |
| corpus_item_id | integer | References `corpus_item.id` |
| option_text | text | Description of this option |
| is_preferred | boolean | True for the recommended action |
| reason | text | Why this option is preferred or not |
| rank | integer | Display order |

Example — triage options for wounded hunter scenario:

| Option | Preferred | Reason |
| --- | --- | --- |
| Carry hunter back immediately | Yes | Wound is deep, cannot walk, delay increases risk |
| Continue hunt, send one person back | No | Splits group, leaves hunter without full support |
| Make camp here and rest | No | Wound needs shelter and fire, not open ground |
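The option-count rule and the single-preferred rule can be checked mechanically before an item is approved. Here is a minimal sketch using the wounded hunter options above; the rank values 1 to 3 are assumed, since the example table lists no ranks.

```python
from dataclasses import dataclass


@dataclass
class TriageOption:
    option_text: str
    is_preferred: bool
    reason: str
    rank: int


def triage_problems(options):
    """Return rule violations for one triage item's options (empty if valid)."""
    problems = []
    if not 2 <= len(options) <= 4:
        problems.append(f"expected 2 to 4 options, found {len(options)}")
    preferred = sum(1 for o in options if o.is_preferred)
    if preferred != 1:
        problems.append(f"expected exactly one preferred option, found {preferred}")
    return problems


wounded_hunter = [
    TriageOption("Carry hunter back immediately", True,
                 "Wound is deep, cannot walk, delay increases risk", 1),
    TriageOption("Continue hunt, send one person back", False,
                 "Splits group, leaves hunter without full support", 2),
    TriageOption("Make camp here and rest", False,
                 "Wound needs shelter and fire, not open ground", 3),
]
print(triage_problems(wounded_hunter))  # []
```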

3. Workflow rule

Every table follows this delivery sequence without exception:

1. Table: created in Saltcorn
2. View: at minimum a list view and a detail view
3. Page: at minimum one usable entry/edit page
4. Data: production records entered only via pages, never raw grids

Rules:

- No production records entered in raw table grids
- Every new table ships with at least one usable page before data entry begins
- Build vertically, not horizontally: one complete table/view/page/data cycle before starting the next table
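The vertical-build rule works like a small per-table state machine. A hypothetical progress tracker (not a Saltcorn feature) could enforce it as follows, where `progress` records the last completed stage for each table.

```python
from enum import IntEnum


class Stage(IntEnum):
    TABLE = 1  # created in Saltcorn
    VIEW = 2   # list and detail views exist
    PAGE = 3   # usable entry/edit page exists
    DATA = 4   # production records entered via pages


def can_advance(progress, table, next_stage):
    """progress maps table name -> last completed Stage (hypothetical tracker)."""
    done = progress.get(table, 0)  # 0 means the table does not exist yet
    if next_stage != done + 1:
        return False  # stages must be completed strictly in order
    if next_stage == Stage.TABLE:
        # Vertical build: every other table must have finished its DATA stage.
        return all(s == Stage.DATA for name, s in progress.items() if name != table)
    return True


progress = {"domain": Stage.DATA, "culture": Stage.PAGE}
print(can_advance(progress, "culture", Stage.DATA))   # True
print(can_advance(progress, "concept", Stage.TABLE))  # False: culture is mid-cycle
print(can_advance(progress, "domain", Stage.DATA))    # False: already done
```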

4. Sprint plan

Sprints are ordered by dependency. Do not start a sprint until the previous sprint's data entry phase is complete and verified.

Sprint 1 — Foundation

- Tables: domain, culture
- Data: 12 seed domains, 4 culture records
- Deliverable: domain browser page, culture lookup page

Sprint 2 — Core concepts

- Tables: concept, concept_culture
- Data: 25 seed concepts from DOC-006, tagged to Maglemosian
- Deliverable: concept editor page with domain and culture assignment

Sprint 3 — Scales

- Tables: scale, scale_step
- Data: scales for wetness, fire state, ice safety, injury severity
- Deliverable: scale builder page with ordered steps

Sprint 4 — Frames

- Table: frame
- Data: frames for key action concepts (drying, crossing, fishing, triage)
- Deliverable: frame editor page

Sprint 5 — Vocabulary

- Table: vocabulary_item
- Data: preferred terms for all 25 seed concepts
- Deliverable: vocabulary editor with preferred/alternate/deprecated status

Sprint 6 — Corpus

- Tables: corpus_item, corpus_concept, triage_option
- Data: first 20 ground truth items, first 10 triage items
- Deliverable: corpus entry page, triage option builder, concept linkage
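Ordering sprints by dependency implies that no table should reference a table introduced in a later sprint. The sketch below copies the foreign-key references from section 2 and the sprint assignments above into plain dictionaries and checks that property. The dictionary layout is an illustrative assumption.

```python
# Foreign-key references from the schema (table -> tables it references).
REFERENCES = {
    "domain": {"domain"},
    "culture": set(),
    "concept": {"domain"},
    "concept_culture": {"concept", "culture"},
    "scale": {"concept"},
    "scale_step": {"scale", "scale_step"},
    "frame": {"concept"},
    "vocabulary_item": {"concept"},
    "corpus_item": {"culture"},
    "corpus_concept": {"corpus_item", "concept"},
    "triage_option": {"corpus_item"},
}

# Sprint in which each table is introduced.
SPRINT = {
    "domain": 1, "culture": 1,
    "concept": 2, "concept_culture": 2,
    "scale": 3, "scale_step": 3,
    "frame": 4,
    "vocabulary_item": 5,
    "corpus_item": 6, "corpus_concept": 6, "triage_option": 6,
}


def out_of_order(references, sprint):
    """(table, referenced_table) pairs where the referenced table comes later."""
    return sorted(
        (table, ref)
        for table, refs in references.items()
        for ref in refs
        if sprint[ref] > sprint[table]
    )


print(out_of_order(REFERENCES, SPRINT))  # []
```

An empty result means every reference points to the same or an earlier sprint. References within one sprint (for example `corpus_concept` to `corpus_item`) still depend on build order inside that sprint, which the vertical-build rule covers if tables are built in the order listed.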

5. Seed concepts — Sprint 2 data

From DOC-006. All tagged Maglemosian initially.

| Concept | Domain | Evidence grade |
| --- | --- | --- |
| wet | Wetness | direct |
| dry | Wetness | direct |
| damp | Wetness | direct |
| soaked | Wetness | inferred |
| fire | Fire | direct |
| ember | Fire | direct |
| smoke | Fire | direct |
| shelter | Shelter | direct |
| hide | Shelter | direct |
| bark | Shelter | direct |
| marsh | Terrain | direct |
| reed | Terrain | direct |
| path | Terrain | inferred |
| river | Water travel | direct |
| crossing | Water travel | inferred |
| fish | Fishing | direct |
| trap | Fishing | direct |
| spear | Hunting | direct |
| wound | Injury | direct |
| limp | Injury | inferred |
| carry | Injury | inferred |
| dawn | Time cycles | inferred |
| dusk | Time cycles | inferred |
| elder | Social roles | analogue |
| child | Social roles | analogue |

6. Corpus specification

6.1 Ground truth corpus

Teaches stable causal relations. Statements must be:

- Present tense, declarative
- Measurement-free
- Culturally plausible for the tagged culture
- Linked to at least one concept via corpus_concept

Field summary:

- text: the causal statement
- culture_id: null for universal statements
- confidence: high/medium/low
- approved: reviewed and ready for training

Examples:

- Fire dries wet hides.
- Rain softens paths.
- Smoke drives insects away.
- Wet wood makes reluctant fire.
- Soaked bark floor cannot be slept on dry.
- Rising water warns of flood.

6.2 Simulation triage corpus

Teaches decisions and priorities under constraint. Each item must have two to four structured options in triage_option, exactly one of which is marked preferred.

Field summary:

- text: the scenario description
- culture_id: null for universal scenarios
- confidence: high/medium/low
- approved: reviewed and ready for training

Triage option fields:

- option_text: what this choice involves
- is_preferred: the recommended action
- reason: why preferred or not preferred
- rank: display order

Examples:

- Wounded hunter cannot walk. (carry first vs continue hunt vs make camp)
- Fire goes out in heavy rain. (seek dry tinder vs use ember from shelter vs wait)
- Path floods at crossing. (find higher crossing vs wait vs wade)

7. Lexical governance

7.1 Purpose

Prevent semantic drift. Ensure vocabulary items remain measurement-free and culturally coherent across authors and sessions.

7.2 Controls per vocabulary item

| Control | Field | Notes |
| --- | --- | --- |
| Preferred term | `preferred = true` | One per concept |
| Allowed alternates | `status = approved, preferred = false` | Multiple allowed |
| Deprecated terms | `status = deprecated` | Kept for record, not used in new corpus |
| Restricted terms | `status = restricted` | Context specified in `notes` |
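The first two controls can be checked with a short script over exported vocabulary rows. This is a sketch under assumptions. The `VocabularyItem` dataclass and the example terms are hypothetical. Restricted terms are left out of the freely usable list because their permitted contexts live in free-text `notes`, which need a human reader.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class VocabularyItem:
    concept_id: int
    term: str
    preferred: bool
    status: str  # "approved" | "deprecated" | "restricted"


def preferred_count_problems(items, concept_ids):
    """Map concept_id -> number of preferred terms, for concepts without exactly one."""
    counts = Counter(i.concept_id for i in items if i.preferred)
    return {cid: counts[cid] for cid in concept_ids if counts[cid] != 1}


def terms_for_new_corpus(items, concept_id):
    """Terms usable freely in new corpus items: approved status only."""
    return [i.term for i in items if i.concept_id == concept_id and i.status == "approved"]


# Hypothetical rows for two concepts.
items = [
    VocabularyItem(1, "wet", True, "approved"),
    VocabularyItem(1, "waterlogged", False, "restricted"),
    VocabularyItem(2, "ember", False, "approved"),  # no preferred term yet
]
print(preferred_count_problems(items, [1, 2]))  # {2: 0}
print(terms_for_new_corpus(items, 1))           # ['wet']
```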

7.3 Approval history

Saltcorn's built-in version history, once enabled on a table, tracks who changed what and when. No separate approval log table is needed at this stage.

7.4 Constraint enforcement

Modern units and modern-only categories are excluded by editorial discipline at authoring time. A future model analysis pass will scan the corpus for violations and flag them for review. No constraint tables are maintained in this schema version.
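As an illustration of where such a pass might start, here is a crude keyword pre-screen. The watch-list is hypothetical and deliberately short. It catches only spelled-out unit words, not numerals or modern-only categories, and it will produce false positives such as the adjective "minute". It sketches a flagging step for human review, not a substitute for the planned model analysis, and it keeps its list in code rather than in a schema table.

```python
import re

# Hypothetical, deliberately small watch-list of modern unit words.
MODERN_TERMS = ["meter", "metre", "kilometer", "kilogram", "celsius",
                "fahrenheit", "hour", "minute", "mile", "liter", "litre"]

# Whole words only, with an optional plural "s".
PATTERN = re.compile(r"\b(" + "|".join(MODERN_TERMS) + r")s?\b", re.IGNORECASE)


def flag_modern_terms(text):
    """Return watch-list words found in text, lower-cased, in order of appearance."""
    return [m.group(1).lower() for m in PATTERN.finditer(text)]


print(flag_modern_terms("Fire dries wet hides."))                  # []
print(flag_modern_terms("The river rose two meters in an hour."))  # ['meter', 'hour']
```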

8. What this does not decide

- The language model architecture or training pipeline
- How corpus items are exported to training format
- Whether vocabulary items are used as literal tokens or as semantic seeds for generation
- The multi-clan expansion beyond Maglemosian
- The integration between this corpus and the TESSERA spatial data layer
- Constraint enforcement implementation (deferred to model analysis pass)

Mesolithic Corpus Standard v1.0, 2026-04-13
Status: Normative
Next review: after Sprint 2 data entry is complete
