From 9889ecb57495b3b185337e0bc6d84cf3495ab7ba Mon Sep 17 00:00:00 2001
From: TheRON
Date: Thu, 30 Apr 2026 15:37:16 -0400
Subject: [PATCH] initial upload

---
 .../CIVICUS-ROMAN-MODEL-VISION-0001.md | 702 ++++++++++++++++++
 1 file changed, 702 insertions(+)
 create mode 100644 docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md

diff --git a/docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md b/docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md
new file mode 100644
index 0000000..360f1e8
--- /dev/null
+++ b/docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md
@@ -0,0 +1,702 @@
+# CIVICUS-ROMAN-MODEL-VISION-0001
+## Rational Vision For A Bounded Roman Simulator Model
+### Status: Draft Vision
+### Layer: Training Infrastructure
+### Purpose: Define the practical rationale, scope, and training plan for the CIVICUS-ROMAN model
+### Repository Path: docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md
+
+---
+
+## 0. Purpose
+
+This document defines the rational vision for the CIVICUS-ROMAN model.
+
+The model is not intended to be a general chatbot.
+
+The model is not intended to know all of history.
+
+The model is not intended to imitate modern English reasoning with Roman facts attached.
+
+The model is intended to operate inside a bounded Roman simulator world.
+
+Its task is to reason, ask, answer, and speak from within that world.
+
+---
+
+## 1. Core Claim
+
+A narrow Roman simulator model may be viable because the intended world is deliberately reduced.
+
+The model does not need the full ontology of modern life.
+
+It needs a bounded set of:
+
+```text
+objects
+actions
+pressures
+actors
+places
+procedures
+records
+obligations
+materials
+routes
+risks
+social meanings
+```
+
+The target is not general intelligence.
+
+The target is Roman-bounded simulator intelligence.
+
+---
+
+## 2. The Problem With Existing Models
+
+Existing general models are trained on modern reality.
+
+Even when given Roman context, they tend to leak modern assumptions:
+
+```text
+universal market price
+modern legal enforcement
+modern contract logic
+state-backed regulatory assumptions
+instant information
+abstract finance vocabulary
+modern supply-chain concepts
+consumer-market behavior
+modern moral and institutional framing
+```
+
+Retrieval alone does not solve this.
+
+RAG can supply correct facts, but the base model still interprets those facts through a modern ontology.
+
+The goal of CIVICUS-ROMAN is to reduce or remove that ontology problem.
+
+---
+
+## 3. What The Model Must Learn
+
+The model must learn to reason from Roman-visible primitives.
+
+Examples:
+
+```text
+Who saw it?
+Who heard it?
+Who wrote it?
+How old is the message?
+Is the seal broken?
+Who witnessed the bargain?
+Where are the carts?
+Can the goods move?
+Who benefits if the rumor is believed?
+What can safely be entered in the account?
+Is the obligation settled, pledged, delayed, or disputed?
+```
+
+It must not default to:
+
+```text
+What is the market price?
+Is the contract enforceable?
+What is the regulatory risk?
+What is the optimal modern transaction?
+```
+
+The model should ask and answer in terms of objects, actions, pressures, and visible social facts.
+
+---
+
+## 4. Reduced World Grammar
+
+The CIVICUS-ROMAN model should be trained around a controlled world grammar.
+
+### Objects
+
+```text
+coin
+purse
+chest
+tablet
+seal
+witness
+cart
+wheel
+mule
+road
+warehouse
+wall
+roof
+jar
+amphora
+crate
+rope
+weight
+measure
+gate
+market
+portico
+yard
+dust
+rain
+lamp
+grain
+oil
+bronze
+timber
+glass
+stone
+```
+
+### Actions
+
+```text
+buy
+sell
+carry
+store
+seal
+open
+count
+weigh
+measure
+pledge
+write
+witness
+hire
+repair
+delay
+ask
+refuse
+accuse
+confirm
+return
+split
+hold
+move
+settle
+hide
+leak
+wait
+rot
+spoil
+break
+arrive
+depart
+```
+
+### Pressures
+
+```text
+hunger
+rain
+delay
+spoilage
+debt
+rivalry
+shame
+praise
+shortage
+crowd
+rumor
+cart scarcity
+storage scarcity
+buyer urgency
+creditor pressure
+official attention
+bad road
+old news
+broken seal
+empty purse
+full warehouse
+```
+
+The model should learn to combine these before reaching for abstract explanation.
+
+---
+
+## 5. Speech Principle
+
+The model should prefer Roman-visible commercial speech.
+
+Preferred:
+
+```text
+The wheels are gone.
+The tablet arrived old.
+He owns jars, not coin.
+The road has eaten the profit.
+The crate is heavier than its name.
+The purse is fat and the street has eyes.
+```
+
+Avoided:
+
+```text
+Transport capacity is constrained.
+The information is stale.
+His assets are illiquid.
+Transportation cost eliminated the margin.
+The cargo is misclassified.
+Liquidity creates security risk.
+```
+
+The purpose is not ornament.
+
+The purpose is ontology.
+
+A model learns the kind of world it inhabits through the language it is trained to use.
+
+---
+
+## 6. Corpus Architecture
+
+The corpus is layered.
+
+Each layer teaches a different kind of reasoning.
+
+```text
+Layer 0 — Primitive Facts
+  basic world rules
+
+Layer 1 — Worked Examples
+  arithmetic, cost, movement, profit, loss, settlement
+
+Layer 2 — Uncertainty
+  reports, rumors, old messages, hidden truth, confidence, confirmation
+
+Layer 3 — Actor Perspective
+  same event read differently by different Roman-world actors
+
+Layer 4 — Dialogues
+  in-world scenes that teach through speech, action, and consequence
+```
+
+This layering is essential.
+
+The model should not merely memorize dialogue.
+
+It should learn the underlying reasoning forms that make the dialogue valid.
+
+---
+
+## 7. Vocabulary Generation Pipeline
+
+A major part of the model vocabulary can be built through a generate-review-promote workflow.
+
+The generator combines:
+
+```text
+Object + Action + Pressure
+```
+
+Example:
+
+```text
+cart + hired elsewhere + buyer waiting
+= The wheels are gone, and the buyer will not wait for our excuses.
+```
+
+Most generated phrases will be weak.
+
+That is acceptable.
+
+Humans are faster at recognizing strong expressions than inventing them cold.
+
+The workflow is:
+
+```text
+generate many candidates
+human flags useful expressions
+accepted expressions enter vocabulary
+strong expressions influence dialogue
+canonical expressions become simulator templates
+```
+
+Only reviewed material enters training.
+
+Raw churn is not training data.
+
+---
+
+## 8. Human And Agent Roles
+
+Agents will perform much of the production work.
+
+Agents can generate:
+
+```text
+candidate expressions
+dialogue variants
+actor readings
+primitive examples
+uncertainty cases
+law scenarios
+architecture scenarios
+technology scenarios
+negative examples
+contamination tests
+```
+
+Agents can also assist with:
+
+```text
+format validation
+tag audit
+style checks
+duplicate detection
+forbidden vocabulary detection
+chunk extraction
+statistics
+regression tests
+```
+
+Humans remain responsible for:
+
+```text
+canon
+ontology
+final approval
+style judgment
+failure judgment
+domain boundaries
+promotion to training data
+```
+
+The human role shifts from authoring every line to governing the corpus.
+
+---
+
+## 9. Training Strategy
+
+The first serious training target should not be a general-purpose language model.
+
+The first target should be a compact bounded simulator model.
+
+A rational training progression:
+
+```text
+Stage 1:
+  Roman-visible vocabulary expressions
+
+Stage 2:
+  primitive facts and terse Q/A
+
+Stage 3:
+  worked examples with arithmetic and consequence
+
+Stage 4:
+  uncertainty examples and knowledge-boundary tests
+
+Stage 5:
+  actor-perspective readings
+
+Stage 6:
+  in-world dialogues
+
+Stage 7:
+  simulator-state-to-response pairs
+```
+
+The model should learn from simple controlled forms before complex dialogue.
+
+---
+
+## 10. Scratch Training Reconsidered
+
+Training a general model from nothing is expensive because the model must learn broad language, broad world knowledge, and general reasoning.
+
+CIVICUS-ROMAN is different.
+
+It does not need to answer every question.
+
+It does not need modern breadth.
+
+It does not need open-ended knowledge.
+
+It needs competence inside a small Roman simulator world.
+
+Therefore scratch or near-scratch training may be viable if the model is deliberately narrow.
+
+The fair comparison is not:
+
+```text
+small project vs general LLM
+```
+
+The fair comparison is:
+
+```text
+bounded simulator grammar + controlled corpus + agent-assisted data generation
+```
+
+against:
+
+```text
+modern-prior leakage from general models
+```
+
+---
+
+## 11. Simulator Ownership Of Reality
+
+The model should not own the simulator state.
+
+The simulator owns:
+
+```text
+actors
+locations
+time
+inventory
+money
+routes
+documents
+seals
+witnesses
+obligations
+weather
+prices
+rumors
+official attention
+```
+
+The model interprets, asks, answers, and speaks within that state.
+
+The model should not invent facts that the simulator has not provided.
+
+The model should prefer questions when state is insufficient.
+
+Example:
+
+```text
+What can be known?
+Who saw it?
+Who wrote it?
+Can the cart still move?
+Was the seal broken?
+Is there a witness?
+```
+
+---
+
+## 12. Evaluation
+
+The model must be tested against modern contamination.
+
+Example failure prompt:
+
+```text
+What is the fair market price?
+```
+
+Roman-bounded response should reject universal price and ask about place, buyer, time, transport, and information.
+
+Example failure prompt:
+
+```text
+Can the contract be enforced?
+```
+
+Roman-bounded response should ask about tablet, witness, seal, pledge, patron, magistrate, standing, and leverage.
+
+Example failure prompt:
+
+```text
+Was the information reliable?
+```
+
+Roman-bounded response should ask who carried the word, how old it is, who benefits, whether anyone saw the goods, and what can be confirmed.
+
+Evaluation must reward Roman-bounded reasoning and punish modern abstraction.
+
+---
+
+## 13. Domains To Add
+
+The first domain is commerce.
+
+Next domains should be added with the same layered discipline.
+
+### Roman Law
+
+```text
+standing
+complaint
+witness
+tablet
+seal
+pledge
+remedy
+magistrate
+patronage
+procedure
+public shame
+private settlement
+```
+
+### Architecture
+
+```text
+stone
+timber
+brick
+lime
+labor
+measurement
+site
+water
+weight
+collapse
+repair
+patron
+public work
+```
+
+### Technology
+
+```text
+tool
+craft
+material
+workshop
+repair
+failure
+skill
+apprentice
+measurement
+heat
+water
+wheel
+gear
+lever
+```
+
+Each domain should develop:
+
+```text
+Layer 0 primitives
+Layer 1 examples
+Layer 2 uncertainty
+Layer 3 actor readings
+Layer 4 dialogues
+controlled vocabulary
+contamination tests
+```
+
+---
+
+## 14. Practical Near-Term Plan
+
+Recommended next steps:
+
+```text
+1. Freeze first commerce dialogue batch.
+2. Continue vocabulary generation standards.
+3. Build the expression candidate generator.
+4. Build a review interface for accept/reject/strong/canonical.
+5. Expand commerce vocabulary library.
+6. Add Roman Law Layer 0 primitives.
+7. Add Roman Law worked examples.
+8. Add Roman Law dialogues only after primitives exist.
+9. Build contamination tests.
+10. Compare:
+    A. scratch small model
+    B. near-scratch model
+    C. small existing base model fine-tuned to OTIVM
+```
+
+The comparison matters.
+
+The project should not assume scratch training wins.
+
+It should test whether scratch training reduces modern contamination enough to justify weaker inherited language ability.
+
+---
+
+## 15. Success Definition
+
+CIVICUS-ROMAN succeeds if it can operate inside the simulator without modern leakage.
+
+It should naturally produce questions and answers like:
+
+```text
+Who carried the word?
+How old is the tablet?
+Was the seal broken?
+Can the cart still move?
+Who witnessed the promise?
+Does the account remain open?
+What does the buyer need before sundown?
+```
+
+It should naturally speak like:
+
+```text
+The wheels are gone.
+The tablet arrived old.
+He owns jars, not coin.
+The road has eaten the profit.
+The account remains open.
+The crate is heavier than its name.
+```
+
+It should avoid:
+
+```text
+supply chain disruption
+market efficiency
+legal compliance
+liquidity constraint
+regulatory exposure
+contractual enforcement
+```
+
+The model is not meant to know less.
+
+It is meant to know differently.
+
+---
+
+## 16. Final Vision
+
+CIVICUS-ROMAN is a bounded-world model.
+
+Its intelligence comes from discipline, not breadth.
+
+Its strength is that it does not treat modern reality as default.
+
+It learns a smaller world deeply:
+
+```text
+what can be seen
+what can be carried
+what can be written
+what can be witnessed
+what can be pledged
+what can be delayed
+what can be hidden
+what can be settled
+```
+
+This is the rational path:
+
+```text
+controlled ontology
+layered corpus
+Roman-visible vocabulary
+agent-assisted generation
+human canon approval
+strict validation
+small model experiments
+simulator-owned state
+contamination testing
+```
+
+The purpose is to build a model that does not merely describe Ancient Rome.
+
+The purpose is to build a model that can think inside the civic Roman world of the simulator.
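
The contamination testing named in the vision could start with a plain forbidden-vocabulary check. The sketch below is one possible shape, not the project's tooling. The function name `find_modern_leakage` and the normalization rule are assumptions made for illustration. The phrase list is copied from the "It should avoid" list in section 15. A real check would also need to judge reasoning, not only wording.

```python
import re

# Phrases copied from the "It should avoid" list in section 15.
FORBIDDEN_PHRASES = [
    "supply chain disruption",
    "market efficiency",
    "legal compliance",
    "liquidity constraint",
    "regulatory exposure",
    "contractual enforcement",
]


def find_modern_leakage(text, phrases=FORBIDDEN_PHRASES):
    """Return the forbidden phrases found in text, in list order.

    Hypothetical helper: the name and matching rule are assumptions.
    """
    # Lowercase, then collapse whitespace and hyphens to single spaces so
    # that "Supply-chain  disruption" still matches the listed phrase.
    normalized = re.sub(r"[\s\-]+", " ", text.lower())
    hits = []
    for phrase in phrases:
        # Word boundaries keep a phrase from matching inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, normalized):
            hits.append(phrase)
    return hits


if __name__ == "__main__":
    candidates = [
        "The road has eaten the profit.",
        "A supply-chain disruption created regulatory exposure.",
    ]
    for line in candidates:
        print(line, "->", find_modern_leakage(line))
```

A candidate with an empty result passes only this surface check. Promotion to training data still rests on human review.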