initial upload
702
docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md
Normal file
@@ -0,0 +1,702 @@
# CIVICUS-ROMAN-MODEL-VISION-0001
## Rational Vision For A Bounded Roman Simulator Model
### Status: Draft Vision
### Layer: Training Infrastructure
### Purpose: Define the practical rationale, scope, and training plan for the CIVICUS-ROMAN model
### Repository Path: docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md

---

## 0. Purpose

This document defines the rational vision for the CIVICUS-ROMAN model.

The model is not intended to be a general chatbot.

The model is not intended to know all of history.

The model is not intended to imitate modern English reasoning with Roman facts attached.

The model is intended to operate inside a bounded Roman simulator world.

Its task is to reason, ask, answer, and speak from within that world.

---

## 1. Core Claim

A narrow Roman simulator model may be viable because the intended world is deliberately reduced.

The model does not need the full ontology of modern life.

It needs a bounded set of:

```text
objects
actions
pressures
actors
places
procedures
records
obligations
materials
routes
risks
social meanings
```

The target is not general intelligence.

The target is Roman-bounded simulator intelligence.

---

## 2. The Problem With Existing Models

Existing general models are trained on modern reality.

Even when given Roman context, they tend to leak modern assumptions:

```text
universal market price
modern legal enforcement
modern contract logic
state-backed regulatory assumptions
instant information
abstract finance vocabulary
modern supply-chain concepts
consumer-market behavior
modern moral and institutional framing
```

Retrieval alone does not solve this.

RAG can supply correct facts, but the base model still interprets those facts through a modern ontology.

The goal of CIVICUS-ROMAN is to reduce or remove that ontology problem.

---

## 3. What The Model Must Learn

The model must learn to reason from Roman-visible primitives.

Examples:

```text
Who saw it?
Who heard it?
Who wrote it?
How old is the message?
Is the seal broken?
Who witnessed the bargain?
Where are the carts?
Can the goods move?
Who benefits if the rumor is believed?
What can safely be entered in the account?
Is the obligation settled, pledged, delayed, or disputed?
```

It must not default to:

```text
What is the market price?
Is the contract enforceable?
What is the regulatory risk?
What is the optimal modern transaction?
```

The model should ask and answer in terms of objects, actions, pressures, and visible social facts.

---

## 4. Reduced World Grammar

The CIVICUS-ROMAN model should be trained around a controlled world grammar.

### Objects

```text
coin
purse
chest
tablet
seal
witness
cart
wheel
mule
road
warehouse
wall
roof
jar
amphora
crate
rope
weight
measure
gate
market
portico
yard
dust
rain
lamp
grain
oil
bronze
timber
glass
stone
```

### Actions

```text
buy
sell
carry
store
seal
open
count
weigh
measure
pledge
write
witness
hire
repair
delay
ask
refuse
accuse
confirm
return
split
hold
move
settle
hide
leak
wait
rot
spoil
break
arrive
depart
```

### Pressures

```text
hunger
rain
delay
spoilage
debt
rivalry
shame
praise
shortage
crowd
rumor
cart scarcity
storage scarcity
buyer urgency
creditor pressure
official attention
bad road
old news
broken seal
empty purse
full warehouse
```

The model should learn to combine these before reaching for abstract explanation.

---

## 5. Speech Principle

The model should prefer Roman-visible commercial speech.

Preferred:

```text
The wheels are gone.
The tablet arrived old.
He owns jars, not coin.
The road has eaten the profit.
The crate is heavier than its name.
The purse is fat and the street has eyes.
```

Avoided:

```text
Transport capacity is constrained.
The information is stale.
His assets are illiquid.
Transportation cost eliminated the margin.
The cargo is misclassified.
Liquidity creates security risk.
```

The purpose is not ornament.

The purpose is ontology.

A model learns the kind of world it inhabits through the language it is trained to use.

---

## 6. Corpus Architecture

The corpus is layered.

Each layer teaches a different kind of reasoning.

```text
Layer 0: Primitive Facts
basic world rules

Layer 1: Worked Examples
arithmetic, cost, movement, profit, loss, settlement

Layer 2: Uncertainty
reports, rumors, old messages, hidden truth, confidence, confirmation

Layer 3: Actor Perspective
same event read differently by different Roman-world actors

Layer 4: Dialogues
in-world scenes that teach through speech, action, and consequence
```
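
One way to keep the layers distinct in practice is to tag every corpus record with its layer, so that training stages and audits can filter on it. The sketch below is an assumption rather than a project schema: `Layer`, `CorpusRecord`, `by_layer`, and the field names are hypothetical, and the sample texts are placeholders borrowed from the speech examples in this document.

```python
from dataclasses import dataclass
from enum import IntEnum


class Layer(IntEnum):
    """The five corpus layers listed above, in teaching order."""
    PRIMITIVE_FACTS = 0
    WORKED_EXAMPLES = 1
    UNCERTAINTY = 2
    ACTOR_PERSPECTIVE = 3
    DIALOGUES = 4


@dataclass(frozen=True)
class CorpusRecord:
    """Hypothetical record shape: one reviewed training item."""
    layer: Layer
    domain: str  # e.g. "commerce"
    text: str


def by_layer(records, layer):
    """Return only the records that belong to the given layer."""
    return [r for r in records if r.layer == layer]


# Placeholder records for illustration only.
records = [
    CorpusRecord(Layer.UNCERTAINTY, "commerce", "The tablet arrived old."),
    CorpusRecord(Layer.DIALOGUES, "commerce", "The wheels are gone."),
]
print([r.text for r in by_layer(records, Layer.DIALOGUES)])
```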

This layering is essential.

The model should not merely memorize dialogue.

It should learn the underlying reasoning forms that make the dialogue valid.

---

## 7. Vocabulary Generation Pipeline

A major part of the model vocabulary can be built through a generate-review-promote workflow.

The generator combines:

```text
Object + Action + Pressure
```

Example:

```text
cart + hired elsewhere + buyer waiting
= The wheels are gone, and the buyer will not wait for our excuses.
```

Most generated phrases will be weak.

That is acceptable.

Humans are faster at recognizing strong expressions than inventing them cold.

The workflow is:

```text
generate many candidates
human flags useful expressions
accepted expressions enter vocabulary
strong expressions influence dialogue
canonical expressions become simulator templates
```
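
The generate-review-promote workflow above can be sketched as a small data flow. This is a minimal illustration under stated assumptions: `Candidate`, `Status`, `generate`, and `promote` are hypothetical names, the status values follow the accept/reject/strong/canonical wording used in the near-term plan, and the sketch only enumerates Object + Action + Pressure triples. Turning a triple into a spoken phrase is left to an agent or a human. The key design point is that `promote` passes only human-approved items, so pending and rejected churn never reaches training.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Status(Enum):
    """Review outcomes a human can assign to a candidate."""
    PENDING = "pending"
    REJECTED = "rejected"
    ACCEPTED = "accepted"
    STRONG = "strong"
    CANONICAL = "canonical"


@dataclass
class Candidate:
    """Hypothetical candidate: one Object + Action + Pressure triple."""
    obj: str
    action: str
    pressure: str
    status: Status = Status.PENDING


def generate(objects, actions, pressures):
    """Enumerate every triple as a pending candidate."""
    return [Candidate(o, a, p) for o, a, p in product(objects, actions, pressures)]


def promote(candidates):
    """Keep only material that a human reviewer has approved."""
    approved = {Status.ACCEPTED, Status.STRONG, Status.CANONICAL}
    return [c for c in candidates if c.status in approved]


pool = generate(["cart", "tablet"], ["hire", "delay"], ["buyer urgency"])
pool[0].status = Status.STRONG    # a human flags one candidate as strong
pool[1].status = Status.REJECTED  # and rejects another
print(len(pool), len(promote(pool)))
```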

Only reviewed material enters training.

Raw churn is not training data.

---

## 8. Human And Agent Roles

Agents will perform much of the production work.

Agents can generate:

```text
candidate expressions
dialogue variants
actor readings
primitive examples
uncertainty cases
law scenarios
architecture scenarios
technology scenarios
negative examples
contamination tests
```

Agents can also assist with:

```text
format validation
tag audit
style checks
duplicate detection
forbidden vocabulary detection
chunk extraction
statistics
regression tests
```
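
Two of the assistance tasks above, duplicate detection and statistics, are simple enough to sketch. The functions `normalize`, `find_duplicates`, and `word_counts` are hypothetical helpers, and the normalization rule (lowercase, punctuation dropped) is an assumption; a real pipeline would probably also need near-duplicate matching.

```python
import re
from collections import Counter


def normalize(expression):
    """Lowercase, drop punctuation, and collapse whitespace for comparison."""
    words = re.findall(r"[a-z0-9']+", expression.lower())
    return " ".join(words)


def find_duplicates(expressions):
    """Return expressions whose normalized form was already seen earlier."""
    seen = set()
    duplicates = []
    for expression in expressions:
        key = normalize(expression)
        if key in seen:
            duplicates.append(expression)
        else:
            seen.add(key)
    return duplicates


def word_counts(expressions):
    """Count normalized words across the batch, a crude corpus statistic."""
    return Counter(w for e in expressions for w in normalize(e).split())


batch = ["The wheels are gone.", "the wheels are gone", "The tablet arrived old."]
print(find_duplicates(batch))
print(word_counts(batch).most_common(1))
```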

Humans remain responsible for:

```text
canon
ontology
final approval
style judgment
failure judgment
domain boundaries
promotion to training data
```

The human role shifts from authoring every line to governing the corpus.

---

## 9. Training Strategy

The first serious training target should not be a general-purpose language model.

The first target should be a compact bounded simulator model.

A rational training progression:

```text
Stage 1:
Roman-visible vocabulary expressions

Stage 2:
primitive facts and terse Q/A

Stage 3:
worked examples with arithmetic and consequence

Stage 4:
uncertainty examples and knowledge-boundary tests

Stage 5:
actor-perspective readings

Stage 6:
in-world dialogues

Stage 7:
simulator-state-to-response pairs
```
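
The progression above amounts to a curriculum ordering. A minimal sketch, assuming each training example carries a stage label: the identifiers in `STAGES` and the `curriculum` function are hypothetical names for the seven stages, not an existing training configuration. The sort is stable, so examples keep their original order within a stage, and an unknown stage label raises `KeyError` rather than being placed silently.

```python
# Hypothetical stage identifiers, in the order listed above.
STAGES = [
    "vocabulary_expressions",    # Stage 1
    "primitive_qa",              # Stage 2
    "worked_examples",           # Stage 3
    "uncertainty",               # Stage 4
    "actor_perspective",         # Stage 5
    "dialogues",                 # Stage 6
    "state_to_response",         # Stage 7
]


def curriculum(examples):
    """Order examples so simple controlled forms come before complex dialogue."""
    rank = {name: index for index, name in enumerate(STAGES)}
    return sorted(examples, key=lambda example: rank[example["stage"]])


examples = [
    {"stage": "dialogues", "text": "d1"},
    {"stage": "primitive_qa", "text": "q1"},
    {"stage": "vocabulary_expressions", "text": "v1"},
    {"stage": "primitive_qa", "text": "q2"},
]
print([example["text"] for example in curriculum(examples)])
```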

The model should learn from simple controlled forms before complex dialogue.

---

## 10. Scratch Training Reconsidered

Training a general model from nothing is expensive because the model must learn broad language, broad world knowledge, and general reasoning.

CIVICUS-ROMAN is different.

It does not need to answer every question.

It does not need modern breadth.

It does not need open-ended knowledge.

It needs competence inside a small Roman simulator world.

Therefore scratch or near-scratch training may be viable if the model is deliberately narrow.

The fair comparison is not:

```text
small project vs general LLM
```

The fair comparison is:

```text
bounded simulator grammar + controlled corpus + agent-assisted data generation
```

against:

```text
modern-prior leakage from general models
```

---

## 11. Simulator Ownership Of Reality

The model should not own the simulator state.

The simulator owns:

```text
actors
locations
time
inventory
money
routes
documents
seals
witnesses
obligations
weather
prices
rumors
official attention
```

The model interprets, asks, answers, and speaks within that state.

The model should not invent facts that the simulator has not provided.

The model should prefer questions when state is insufficient.

Example:

```text
What can be known?
Who saw it?
Who wrote it?
Can the cart still move?
Was the seal broken?
Is there a witness?
```
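
The rule that the model asks rather than invents can be illustrated with a small sketch. Everything here is assumed for illustration: `MessageState` is a hypothetical slice of simulator state, `QUESTIONS` maps each field to one of the in-world questions used in this document, and `None` stands for "the simulator has not provided this fact". A value of `False` counts as known, so a seal reported as broken does not trigger a question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageState:
    """Hypothetical slice of simulator state about one received message."""
    carrier: Optional[str] = None        # who carried the word
    age_days: Optional[int] = None       # how old the message is
    seal_intact: Optional[bool] = None   # None means the simulator has not said
    witness: Optional[str] = None


# One in-world question per field the simulator may leave empty.
QUESTIONS = {
    "carrier": "Who carried the word?",
    "age_days": "How old is the tablet?",
    "seal_intact": "Was the seal broken?",
    "witness": "Is there a witness?",
}


def questions_for(state):
    """Return a question for each fact the simulator has not provided."""
    return [q for name, q in QUESTIONS.items() if getattr(state, name) is None]


# Illustrative state: carrier and seal are known, age and witness are not.
state = MessageState(carrier="the muleteer", seal_intact=False)
print(questions_for(state))
```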

---

## 12. Evaluation

The model must be tested against modern contamination.

Example failure prompt:

```text
What is the fair market price?
```

A Roman-bounded response should reject universal price and ask about place, buyer, time, transport, and information.

Example failure prompt:

```text
Can the contract be enforced?
```

A Roman-bounded response should ask about tablet, witness, seal, pledge, patron, magistrate, standing, and leverage.

Example failure prompt:

```text
Was the information reliable?
```

A Roman-bounded response should ask who carried the word, how old it is, who benefits, whether anyone saw the goods, and what can be confirmed.

Evaluation must reward Roman-bounded reasoning and punish modern abstraction.

---

## 13. Domains To Add

The first domain is commerce.

Next domains should be added with the same layered discipline.

### Roman Law

```text
standing
complaint
witness
tablet
seal
pledge
remedy
magistrate
patronage
procedure
public shame
private settlement
```

### Architecture

```text
stone
timber
brick
lime
labor
measurement
site
water
weight
collapse
repair
patron
public work
```

### Technology

```text
tool
craft
material
workshop
repair
failure
skill
apprentice
measurement
heat
water
wheel
gear
lever
```

Each domain should develop:

```text
Layer 0 primitives
Layer 1 examples
Layer 2 uncertainty
Layer 3 actor readings
Layer 4 dialogues
controlled vocabulary
contamination tests
```

---

## 14. Practical Near-Term Plan

Recommended next steps:

```text
1. Freeze first commerce dialogue batch.
2. Continue vocabulary generation standards.
3. Build the expression candidate generator.
4. Build a review interface for accept/reject/strong/canonical.
5. Expand commerce vocabulary library.
6. Add Roman Law Layer 0 primitives.
7. Add Roman Law worked examples.
8. Add Roman Law dialogues only after primitives exist.
9. Build contamination tests.
10. Compare:
    A. scratch small model
    B. near-scratch model
    C. small existing base model fine-tuned to OTIVM
```
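
Step 10 is easier to judge if the decision rule is written down before any model is trained. The sketch below is one hedged way to do that: `VariantResult`, `scratch_justified`, and the `max_language_loss` tolerance are hypothetical, both metrics are assumed to be on a 0 to 1 scale, and the numbers are placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantResult:
    """Hypothetical summary of one training variant on shared test sets."""
    name: str
    contamination_rate: float  # share of responses with modern leakage, 0..1
    language_score: float      # any language-quality measure on a 0..1 scale


def scratch_justified(scratch, baseline, max_language_loss=0.10):
    """True if scratch leaks less and loses no more language quality than allowed."""
    leaks_less = scratch.contamination_rate < baseline.contamination_rate
    loss = baseline.language_score - scratch.language_score
    return leaks_less and loss <= max_language_loss


# Placeholder numbers for illustration only, not measured results.
a = VariantResult("A. scratch small model", 0.05, 0.70)
c = VariantResult("C. fine-tuned base model", 0.20, 0.78)
print(scratch_justified(a, c))
```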

The comparison matters.

The project should not assume scratch training wins.

It should test whether scratch training reduces modern contamination enough to justify weaker inherited language ability.

---

## 15. Success Definition

CIVICUS-ROMAN succeeds if it can operate inside the simulator without modern leakage.

It should naturally produce questions and answers like:

```text
Who carried the word?
How old is the tablet?
Was the seal broken?
Can the cart still move?
Who witnessed the promise?
Does the account remain open?
What does the buyer need before sundown?
```

It should naturally speak like:

```text
The wheels are gone.
The tablet arrived old.
He owns jars, not coin.
The road has eaten the profit.
The account remains open.
The crate is heavier than its name.
```

It should avoid:

```text
supply chain disruption
market efficiency
legal compliance
liquidity constraint
regulatory exposure
contractual enforcement
```
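
The avoid list above, together with the requirement in Section 12 to reward Roman-bounded reasoning and punish modern abstraction, suggests a crude automatic check. The sketch below is an assumption, not a defined metric: `score`, the `ROMAN_VISIBLE` word sample, and the +1 and -2 weights are hypothetical, and plain substring matching will produce false positives (for example "road" inside "broad"). It could serve as a first regression filter ahead of human failure judgment.

```python
# Phrases taken from the avoid list above.
AVOIDED = [
    "supply chain disruption",
    "market efficiency",
    "legal compliance",
    "liquidity constraint",
    "regulatory exposure",
    "contractual enforcement",
]

# Hypothetical sample of Roman-visible words drawn from the Objects list.
ROMAN_VISIBLE = ["seal", "tablet", "witness", "cart", "purse", "jar", "coin", "road"]


def score(response):
    """Add 1 per Roman-visible word present, subtract 2 per avoided phrase present."""
    text = response.lower()
    reward = sum(1 for word in ROMAN_VISIBLE if word in text)
    penalty = sum(2 for phrase in AVOIDED if phrase in text)
    return reward - penalty


print(score("He owns jars, not coin."))
print(score("A liquidity constraint creates regulatory exposure."))
```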

The model is not meant to know less.

It is meant to know differently.

---

## 16. Final Vision

CIVICUS-ROMAN is a bounded-world model.

Its intelligence comes from discipline, not breadth.

Its strength is that it does not treat modern reality as default.

It learns a smaller world deeply:

```text
what can be seen
what can be carried
what can be written
what can be witnessed
what can be pledged
what can be delayed
what can be hidden
what can be settled
```

This is the rational path:

```text
controlled ontology
layered corpus
Roman-visible vocabulary
agent-assisted generation
human canon approval
strict validation
small model experiments
simulator-owned state
contamination testing
```

The purpose is to build a model that does not merely describe Ancient Rome.

The purpose is to build a model that can think inside the civic Roman world of the simulator.