# CIVICUS-ROMAN-MODEL-VISION-0001

## Rational Vision For A Bounded Roman Simulator Model

### Status: Draft Vision

### Layer: Training Infrastructure

### Purpose: Define the practical rationale, scope, and training plan for the CIVICUS-ROMAN model

### Repository Path: docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md

---

## 0. Purpose

This document defines the rational vision for the CIVICUS-ROMAN model.

The model is not intended to be a general chatbot.

The model is not intended to know all of history.

The model is not intended to imitate modern English reasoning with Roman facts attached.

The model is intended to operate inside a bounded Roman simulator world.

Its task is to reason, ask, answer, and speak from within that world.

---

## 1. Core Claim

A narrow Roman simulator model may be viable because the intended world is deliberately reduced.

The model does not need the full ontology of modern life.

It needs a bounded set of:

```text
objects
actions
pressures
actors
places
procedures
records
obligations
materials
routes
risks
social meanings
```

The target is not general intelligence.

The target is Roman-bounded simulator intelligence.

---

## 2. The Problem With Existing Models

Existing general models are trained on modern reality.

Even when given Roman context, they tend to leak modern assumptions:

```text
universal market price
modern legal enforcement
modern contract logic
state-backed regulatory assumptions
instant information
abstract finance vocabulary
modern supply-chain concepts
consumer-market behavior
modern moral and institutional framing
```

Retrieval alone does not solve this.

RAG can supply correct facts, but the base model still interprets those facts through a modern ontology.

The goal of CIVICUS-ROMAN is to reduce or remove that ontology problem.

---

## 3. What The Model Must Learn

The model must learn to reason from Roman-visible primitives.

Examples:

```text
Who saw it?
Who heard it?
Who wrote it?
How old is the message?
Is the seal broken?
Who witnessed the bargain?
Where are the carts?
Can the goods move?
Who benefits if the rumor is believed?
What can safely be entered in the account?
Is the obligation settled, pledged, delayed, or disputed?
```

It must not default to:

```text
What is the market price?
Is the contract enforceable?
What is the regulatory risk?
What is the optimal modern transaction?
```

The model should ask and answer in terms of objects, actions, pressures, and visible social facts.

---

## 4. Reduced World Grammar

The CIVICUS-ROMAN model should be trained around a controlled world grammar.

### Objects

```text
coin
purse
chest
tablet
seal
witness
cart
wheel
mule
road
warehouse
wall
roof
jar
amphora
crate
rope
weight
measure
gate
market
portico
yard
dust
rain
lamp
grain
oil
bronze
timber
glass
stone
```

### Actions

```text
buy
sell
carry
store
seal
open
count
weigh
measure
pledge
write
witness
hire
repair
delay
ask
refuse
accuse
confirm
return
split
hold
move
settle
hide
leak
wait
rot
spoil
break
arrive
depart
```

### Pressures

```text
hunger
rain
delay
spoilage
debt
rivalry
shame
praise
shortage
crowd
rumor
cart scarcity
storage scarcity
buyer urgency
creditor pressure
official attention
bad road
old news
broken seal
empty purse
full warehouse
```

The model should learn to combine these before reaching for abstract explanation.

---

## 5. Speech Principle

The model should prefer Roman-visible commercial speech.

Preferred:

```text
The wheels are gone.
The tablet arrived old.
He owns jars, not coin.
The road has eaten the profit.
The crate is heavier than its name.
The purse is fat and the street has eyes.
```

Avoided:

```text
Transport capacity is constrained.
The information is stale.
His assets are illiquid.
Transportation cost eliminated the margin.
The cargo is misclassified.
Liquidity creates security risk.
```

The purpose is not ornament.

The purpose is ontology.

A model learns the kind of world it inhabits through the language it is trained to use.

---

## 6. Corpus Architecture

The corpus is layered.

Each layer teaches a different kind of reasoning.

```text
Layer 0 — Primitive Facts
  basic world rules

Layer 1 — Worked Examples
  arithmetic, cost, movement, profit, loss, settlement

Layer 2 — Uncertainty
  reports, rumors, old messages, hidden truth, confidence, confirmation

Layer 3 — Actor Perspective
  same event read differently by different Roman-world actors

Layer 4 — Dialogues
  in-world scenes that teach through speech, action, and consequence
```

This layering is essential.

The model should not merely memorize dialogue.

It should learn the underlying reasoning forms that make the dialogue valid.

---

## 7. Vocabulary Generation Pipeline

A major part of the model vocabulary can be built through a generate-review-promote workflow.

The generator combines:

```text
Object + Action + Pressure
```

Example:

```text
cart + hired elsewhere + buyer waiting
= The wheels are gone, and the buyer will not wait for our excuses.
```

Most generated phrases will be weak.

That is acceptable.

Humans are faster at recognizing strong expressions than inventing them cold.

The workflow is:

```text
generate many candidates
human flags useful expressions
accepted expressions enter vocabulary
strong expressions influence dialogue
canonical expressions become simulator templates
```

Only reviewed material enters training.

Raw churn is not training data.
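
The generate-review-promote workflow could be sketched roughly as follows. This is a minimal illustration, not the project's generator: the `Candidate` record, the status labels, and the function names are all assumptions made for this example, and the word lists are small slices of the grammar lists.

```python
from dataclasses import dataclass
from itertools import product

# Small illustrative slices of the object, action, and pressure lists.
OBJECTS = ["cart", "tablet", "seal"]
ACTIONS = ["hire", "delay", "break"]
PRESSURES = ["buyer urgency", "old news"]

# Hypothetical review labels, loosely based on the workflow steps.
REVIEW_STATES = ("candidate", "rejected", "accepted", "strong", "canonical")
TRAINABLE_STATES = {"accepted", "strong", "canonical"}


@dataclass
class Candidate:
    obj: str
    action: str
    pressure: str
    text: str = ""             # phrase proposed by an agent or a human
    status: str = "candidate"  # changed only by a human reviewer


def generate_candidates(objects, actions, pressures):
    """Enumerate every Object + Action + Pressure triple as an unreviewed candidate."""
    return [Candidate(o, a, p) for o, a, p in product(objects, actions, pressures)]


def review(candidate, status):
    """Record a human decision; unknown labels fail loudly."""
    if status not in REVIEW_STATES:
        raise ValueError(f"unknown review status: {status}")
    candidate.status = status
    return candidate


def training_set(candidates):
    """Keep only reviewed and approved material; raw churn is excluded."""
    return [c for c in candidates if c.status in TRAINABLE_STATES]


candidates = generate_candidates(OBJECTS, ACTIONS, PRESSURES)
candidates[0].text = "The wheels are gone, and the buyer will not wait for our excuses."
review(candidates[0], "strong")
review(candidates[1], "rejected")
print(len(candidates), len(training_set(candidates)))
```

The point of the sketch is the gate in `training_set`: generation is cheap and exhaustive, while promotion depends on an explicit human decision recorded on each candidate.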

---

## 8. Human And Agent Roles

Agents will perform much of the production work.

Agents can generate:

```text
candidate expressions
dialogue variants
actor readings
primitive examples
uncertainty cases
law scenarios
architecture scenarios
technology scenarios
negative examples
contamination tests
```

Agents can also assist with:

```text
format validation
tag audit
style checks
duplicate detection
forbidden vocabulary detection
chunk extraction
statistics
regression tests
```

Humans remain responsible for:

```text
canon
ontology
final approval
style judgment
failure judgment
domain boundaries
promotion to training data
```

The human role shifts from authoring every line to governing the corpus.

---

## 9. Training Strategy

The first serious training target should not be a general-purpose language model.

The first target should be a compact bounded simulator model.

A rational training progression:

```text
Stage 1:
  Roman-visible vocabulary expressions

Stage 2:
  primitive facts and terse Q/A

Stage 3:
  worked examples with arithmetic and consequence

Stage 4:
  uncertainty examples and knowledge-boundary tests

Stage 5:
  actor-perspective readings

Stage 6:
  in-world dialogues

Stage 7:
  simulator-state-to-response pairs
```

The model should learn from simple controlled forms before complex dialogue.
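
One way to make the staged progression concrete is to tag each training record with its stage and order the data accordingly. The sketch below assumes a hypothetical record layout (a dict with `stage` and `text` keys) and placeholder sample texts; it says nothing about how the actual training run would schedule data.

```python
# Stage names mirror the progression above; the record layout is an assumption.
STAGES = {
    1: "Roman-visible vocabulary expressions",
    2: "primitive facts and terse Q/A",
    3: "worked examples with arithmetic and consequence",
    4: "uncertainty examples and knowledge-boundary tests",
    5: "actor-perspective readings",
    6: "in-world dialogues",
    7: "simulator-state-to-response pairs",
}


def curriculum(records):
    """Order records by stage; sorted() is stable, so order within a stage is kept."""
    for record in records:
        if record["stage"] not in STAGES:
            raise ValueError(f"unknown stage: {record['stage']}")
    return sorted(records, key=lambda record: record["stage"])


# Placeholder records for illustration only.
records = [
    {"stage": 6, "text": "dialogue sample"},
    {"stage": 1, "text": "The wheels are gone."},
    {"stage": 2, "text": "Q: Was the seal broken? A: Yes."},
]
ordered = curriculum(records)
print([r["stage"] for r in ordered])
```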

---

## 10. Scratch Training Reconsidered

Training a general model from nothing is expensive because the model must learn broad language, broad world knowledge, and general reasoning.

CIVICUS-ROMAN is different.

It does not need to answer every question.

It does not need modern breadth.

It does not need open-ended knowledge.

It needs competence inside a small Roman simulator world.

Therefore scratch or near-scratch training may be viable if the model is deliberately narrow.

The fair comparison is not:

```text
small project vs general LLM
```

The fair comparison is:

```text
bounded simulator grammar + controlled corpus + agent-assisted data generation
```

against:

```text
modern-prior leakage from general models
```

---

## 11. Simulator Ownership Of Reality

The model should not own the simulator state.

The simulator owns:

```text
actors
locations
time
inventory
money
routes
documents
seals
witnesses
obligations
weather
prices
rumors
official attention
```

The model interprets, asks, answers, and speaks within that state.

The model should not invent facts that the simulator has not provided.

The model should prefer questions when state is insufficient.

Example:

```text
What can be known?
Who saw it?
Who wrote it?
Can the cart still move?
Was the seal broken?
Is there a witness?
```
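
The "prefer questions when state is insufficient" rule can be illustrated with a small sketch. The field names, the question mapping, and the sample state below are hypothetical; the real simulator state format is not specified in this document.

```python
# Hypothetical mapping from a missing state field to an in-world question.
QUESTIONS = {
    "author": "Who wrote it?",
    "seal_intact": "Was the seal broken?",
    "cart_can_move": "Can the cart still move?",
    "witness": "Is there a witness?",
}


def questions_for(state, required):
    """Return a question for every required field the simulator has not provided.

    A field counts as unknown when it is absent or None. A value of False is
    a known fact (for example, a seal that is known to be broken).
    """
    return [QUESTIONS[field] for field in required if state.get(field) is None]


# Made-up state: the author is known, the seal is unknown, no witness entry exists.
state = {"author": "the steward", "seal_intact": None, "cart_can_move": True}
asked = questions_for(state, ["author", "seal_intact", "witness"])
print(asked)
```

Here the model side never fills the gap itself: unknown fields turn into questions, and only the simulator can supply the answers.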

---

## 12. Evaluation

The model must be tested against modern contamination.

Example failure prompt:

```text
What is the fair market price?
```

A Roman-bounded response should reject the idea of a universal price and ask about place, buyer, time, transport, and information.

Example failure prompt:

```text
Can the contract be enforced?
```

A Roman-bounded response should ask about tablet, witness, seal, pledge, patron, magistrate, standing, and leverage.

Example failure prompt:

```text
Was the information reliable?
```

A Roman-bounded response should ask who carried the word, how old it is, who benefits, whether anyone saw the goods, and what can be confirmed.

Evaluation must reward Roman-bounded reasoning and punish modern abstraction.
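
A first, crude contamination check could scan responses for forbidden modern vocabulary. The sketch below is an assumption-laden starting point: the phrase list is drawn from the "avoided" examples in this document, and the function names are invented for illustration. Phrase matching only catches surface vocabulary; it would not detect modern reasoning expressed in Roman-sounding words, so it cannot replace human failure judgment.

```python
import re

# Phrases drawn from the "avoided" lists in this document; a real list would be longer.
FORBIDDEN = [
    "supply chain",
    "market efficiency",
    "legal compliance",
    "liquidity",
    "regulatory",
    "contractual enforcement",
    "market price",
]


def contamination_hits(text):
    """Return the forbidden phrases that appear in the text as whole words."""
    lowered = text.lower()
    return [
        phrase
        for phrase in FORBIDDEN
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]


def contamination_rate(responses):
    """Fraction of responses containing at least one forbidden phrase."""
    if not responses:
        return 0.0
    flagged = sum(1 for response in responses if contamination_hits(response))
    return flagged / len(responses)


responses = [
    "The road has eaten the profit.",
    "Liquidity creates security risk.",
]
print(contamination_hits(responses[1]), contamination_rate(responses))
```

A rate like this could serve as one regression metric when comparing candidate models, alongside human review of the responses themselves.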

---

## 13. Domains To Add

The first domain is commerce.

Next domains should be added with the same layered discipline.

### Roman Law

```text
standing
complaint
witness
tablet
seal
pledge
remedy
magistrate
patronage
procedure
public shame
private settlement
```

### Architecture

```text
stone
timber
brick
lime
labor
measurement
site
water
weight
collapse
repair
patron
public work
```

### Technology

```text
tool
craft
material
workshop
repair
failure
skill
apprentice
measurement
heat
water
wheel
gear
lever
```

Each domain should develop:

```text
Layer 0 primitives
Layer 1 examples
Layer 2 uncertainty
Layer 3 actor readings
Layer 4 dialogues
controlled vocabulary
contamination tests
```

---

## 14. Practical Near-Term Plan

Recommended next steps:

```text
1. Freeze first commerce dialogue batch.
2. Continue vocabulary generation standards.
3. Build the expression candidate generator.
4. Build a review interface for accept/reject/strong/canonical.
5. Expand commerce vocabulary library.
6. Add Roman Law Layer 0 primitives.
7. Add Roman Law worked examples.
8. Add Roman Law dialogues only after primitives exist.
9. Build contamination tests.
10. Compare:
    A. scratch small model
    B. near-scratch model
    C. small existing base model fine-tuned to OTIVM
```

The comparison matters.

The project should not assume scratch training wins.

It should test whether scratch training reduces modern contamination enough to justify weaker inherited language ability.

---

## 15. Success Definition

CIVICUS-ROMAN succeeds if it can operate inside the simulator without modern leakage.

It should naturally produce questions and answers like:

```text
Who carried the word?
How old is the tablet?
Was the seal broken?
Can the cart still move?
Who witnessed the promise?
Does the account remain open?
What does the buyer need before sundown?
```

It should naturally speak like:

```text
The wheels are gone.
The tablet arrived old.
He owns jars, not coin.
The road has eaten the profit.
The account remains open.
The crate is heavier than its name.
```

It should avoid:

```text
supply chain disruption
market efficiency
legal compliance
liquidity constraint
regulatory exposure
contractual enforcement
```

The model is not meant to know less.

It is meant to know differently.

---

## 16. Final Vision

CIVICUS-ROMAN is a bounded-world model.

Its intelligence comes from discipline, not breadth.

Its strength is that it does not treat modern reality as default.

It learns a smaller world deeply:

```text
what can be seen
what can be carried
what can be written
what can be witnessed
what can be pledged
what can be delayed
what can be hidden
what can be settled
```

This is the rational path:

```text
controlled ontology
layered corpus
Roman-visible vocabulary
agent-assisted generation
human canon approval
strict validation
small model experiments
simulator-owned state
contamination testing
```

The purpose is to build a model that does not merely describe Ancient Rome.

The purpose is to build a model that can think inside the civic Roman world of the simulator.