# CIVICUS-ROMAN-MODEL-VISION-0001

## Rational Vision For A Bounded Roman Simulator Model

### Status: Draft Vision
### Layer: Training Infrastructure
### Purpose: Define the practical rationale, scope, and training plan for the CIVICUS-ROMAN model
### Repository Path: docs/training/chunking/CIVICUS-ROMAN-MODEL-VISION-0001.md

---

## 0. Purpose

This document defines the rational vision for the CIVICUS-ROMAN model.

The model is not intended to be a general chatbot. The model is not intended to know all of history. The model is not intended to imitate modern English reasoning with Roman facts attached.

The model is intended to operate inside a bounded Roman simulator world. Its task is to reason, ask, answer, and speak from within that world.

---

## 1. Core Claim

A narrow Roman simulator model may be viable because the intended world is deliberately reduced. The model does not need the full ontology of modern life. It needs a bounded set of:

```text
objects
actions
pressures
actors
places
procedures
records
obligations
materials
routes
risks
social meanings
```

The target is not general intelligence. The target is Roman-bounded simulator intelligence.

---

## 2. The Problem With Existing Models

Existing general models are trained on modern reality. Even when given Roman context, they tend to leak modern assumptions:

```text
universal market price
modern legal enforcement
modern contract logic
state-backed regulatory assumptions
instant information
abstract finance vocabulary
modern supply-chain concepts
consumer-market behavior
modern moral and institutional framing
```

Retrieval alone does not solve this. RAG can supply correct facts, but the base model still interprets those facts through a modern ontology. The goal of CIVICUS-ROMAN is to reduce or remove that ontology problem.

---

## 3. What The Model Must Learn

The model must learn to reason from Roman-visible primitives. Examples:

```text
Who saw it?
Who heard it?
Who wrote it?
How old is the message?
Is the seal broken?
Who witnessed the bargain?
Where are the carts?
Can the goods move?
Who benefits if the rumor is believed?
What can safely be entered in the account?
Is the obligation settled, pledged, delayed, or disputed?
```

It must not default to:

```text
What is the market price?
Is the contract enforceable?
What is the regulatory risk?
What is the optimal modern transaction?
```

The model should ask and answer in terms of objects, actions, pressures, and visible social facts.

---

## 4. Reduced World Grammar

The CIVICUS-ROMAN model should be trained around a controlled world grammar.

### Objects

```text
coin
purse
chest
tablet
seal
witness
cart
wheel
mule
road
warehouse
wall
roof
jar
amphora
crate
rope
weight
measure
gate
market
portico
yard
dust
rain
lamp
grain
oil
bronze
timber
glass
stone
```

### Actions

```text
buy
sell
carry
store
seal
open
count
weigh
measure
pledge
write
witness
hire
repair
delay
ask
refuse
accuse
confirm
return
split
hold
move
settle
hide
leak
wait
rot
spoil
break
arrive
depart
```

### Pressures

```text
hunger
rain
delay
spoilage
debt
rivalry
shame
praise
shortage
crowd
rumor
cart scarcity
storage scarcity
buyer urgency
creditor pressure
official attention
bad road
old news
broken seal
empty purse
full warehouse
```

The model should learn to combine these before reaching for abstract explanation.

---

## 5. Speech Principle

The model should prefer Roman-visible commercial speech.

Preferred:

```text
The wheels are gone.
The tablet arrived old.
He owns jars, not coin.
The road has eaten the profit.
The crate is heavier than its name.
The purse is fat and the street has eyes.
```

Avoided:

```text
Transport capacity is constrained.
The information is stale.
His assets are illiquid.
Transportation cost eliminated the margin.
The cargo is misclassified.
Liquidity creates security risk.
```

The purpose is not ornament. The purpose is ontology. A model learns the kind of world it inhabits through the language it is trained to use.

---

## 6. Corpus Architecture

The corpus is layered. Each layer teaches a different kind of reasoning.

```text
Layer 0 — Primitive Facts
    basic world rules

Layer 1 — Worked Examples
    arithmetic, cost, movement, profit, loss, settlement

Layer 2 — Uncertainty
    reports, rumors, old messages, hidden truth, confidence, confirmation

Layer 3 — Actor Perspective
    the same event read differently by different Roman-world actors

Layer 4 — Dialogues
    in-world scenes that teach through speech, action, and consequence
```

This layering is essential. The model should not merely memorize dialogue. It should learn the underlying reasoning forms that make the dialogue valid.

---

## 7. Vocabulary Generation Pipeline

A major part of the model vocabulary can be built through a generate-review-promote workflow.

The generator combines:

```text
Object + Action + Pressure
```

Example:

```text
cart + hired elsewhere + buyer waiting
= The wheels are gone, and the buyer will not wait for our excuses.
```

Most generated phrases will be weak. That is acceptable. Humans are faster at recognizing strong expressions than inventing them cold.

The workflow is:

```text
generate many candidates
human flags useful expressions
accepted expressions enter vocabulary
strong expressions influence dialogue
canonical expressions become simulator templates
```

Only reviewed material enters training. Raw churn is not training data.

---

## 8. Human And Agent Roles

Agents will perform much of the production work.
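The generate-review-promote workflow that agents run can be sketched as a small combinatorial generator. This is a minimal sketch, not the project's actual pipeline: the `OBJECTS`, `ACTIONS`, and `PRESSURES` slices, the `status` labels, and the `promote` helper are all illustrative assumptions; only the Object + Action + Pressure combination rule and the review verdicts come from Section 7.

```python
import itertools
import random

# Hypothetical slices of the controlled world grammar (Section 4).
OBJECTS = ["cart", "tablet", "seal", "purse", "warehouse"]
ACTIONS = ["hired elsewhere", "arrived old", "broken", "empty", "full"]
PRESSURES = ["buyer waiting", "creditor pressure", "rain", "rumor"]

def generate_candidates(n, seed=0):
    """Combine Object + Action + Pressure into raw candidate prompts.

    Most combinations will be weak; that is expected. Everything here
    is only a candidate until a human reviewer flags it (Section 7).
    """
    rng = random.Random(seed)
    combos = list(itertools.product(OBJECTS, ACTIONS, PRESSURES))
    rng.shuffle(combos)
    return [
        {"object": o, "action": a, "pressure": p,
         "prompt": f"{o} + {a} + {p}",
         "status": "candidate"}  # becomes accepted / strong / canonical after review
        for o, a, p in combos[:n]
    ]

def promote(candidate, verdict):
    """Record a human review verdict; only reviewed material enters training."""
    assert verdict in {"reject", "accepted", "strong", "canonical"}
    return {**candidate, "status": verdict}
```

An agent would then phrase each raw combination as Roman-visible speech; a human reviewer promotes the few strong lines toward the vocabulary library and, eventually, simulator templates.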
Agents can generate:

```text
candidate expressions
dialogue variants
actor readings
primitive examples
uncertainty cases
law scenarios
architecture scenarios
technology scenarios
negative examples
contamination tests
```

Agents can also assist with:

```text
format validation
tag audit
style checks
duplicate detection
forbidden vocabulary detection
chunk extraction statistics
regression tests
```

Humans remain responsible for:

```text
canon ontology
final approval
style judgment
failure judgment
domain boundaries
promotion to training data
```

The human role shifts from authoring every line to governing the corpus.

---

## 9. Training Strategy

The first serious training target should not be a general-purpose language model. The first target should be a compact bounded simulator model.

A rational training progression:

```text
Stage 1: Roman-visible vocabulary expressions
Stage 2: primitive facts and terse Q/A
Stage 3: worked examples with arithmetic and consequence
Stage 4: uncertainty examples and knowledge-boundary tests
Stage 5: actor-perspective readings
Stage 6: in-world dialogues
Stage 7: simulator-state-to-response pairs
```

The model should learn from simple controlled forms before complex dialogue.

---

## 10. Scratch Training Reconsidered

Training a general model from nothing is expensive because the model must learn broad language, broad world knowledge, and general reasoning.

CIVICUS-ROMAN is different. It does not need to answer every question. It does not need modern breadth. It does not need open-ended knowledge. It needs competence inside a small Roman simulator world. Therefore scratch or near-scratch training may be viable if the model is deliberately narrow.

The fair comparison is not:

```text
small project vs general LLM
```

The fair comparison is:

```text
bounded simulator grammar
+ controlled corpus
+ agent-assisted data generation
```

against:

```text
modern-prior leakage from general models
```

---

## 11. Simulator Ownership Of Reality

The model should not own the simulator state.

The simulator owns:

```text
actors
locations
time
inventory
money
routes
documents
seals
witnesses
obligations
weather
prices
rumors
official attention
```

The model interprets, asks, answers, and speaks within that state. The model should not invent facts that the simulator has not provided. The model should prefer questions when state is insufficient.

Example:

```text
What can be known?
Who saw it?
Who wrote it?
Can the cart still move?
Was the seal broken?
Is there a witness?
```

---

## 12. Evaluation

The model must be tested against modern contamination.

Example failure prompt:

```text
What is the fair market price?
```

A Roman-bounded response should reject universal price and ask about place, buyer, time, transport, and information.

Example failure prompt:

```text
Can the contract be enforced?
```

A Roman-bounded response should ask about tablet, witness, seal, pledge, patron, magistrate, standing, and leverage.

Example failure prompt:

```text
Was the information reliable?
```

A Roman-bounded response should ask who carried the word, how old it is, who benefits, whether anyone saw the goods, and what can be confirmed.

Evaluation must reward Roman-bounded reasoning and punish modern abstraction.

---

## 13. Domains To Add

The first domain is commerce. Subsequent domains should be added with the same layered discipline.

### Roman Law

```text
standing
complaint
witness
tablet
seal
pledge
remedy
magistrate
patronage
procedure
public shame
private settlement
```

### Architecture

```text
stone
timber
brick
lime
labor
measurement
site
water
weight
collapse
repair
patron
public work
```

### Technology

```text
tool
craft
material
workshop
repair
failure
skill
apprentice
measurement
heat
water wheel
gear
lever
```

Each domain should develop:

```text
Layer 0 primitives
Layer 1 examples
Layer 2 uncertainty
Layer 3 actor readings
Layer 4 dialogues
controlled vocabulary
contamination tests
```

---

## 14. Practical Near-Term Plan

Recommended next steps:

```text
1. Freeze the first commerce dialogue batch.
2. Continue vocabulary generation standards.
3. Build the expression candidate generator.
4. Build a review interface for accept/reject/strong/canonical.
5. Expand the commerce vocabulary library.
6. Add Roman Law Layer 0 primitives.
7. Add Roman Law worked examples.
8. Add Roman Law dialogues only after primitives exist.
9. Build contamination tests.
10. Compare:
    A. scratch small model
    B. near-scratch model
    C. small existing base model fine-tuned to OTIVM
```

The comparison matters. The project should not assume scratch training wins. It should test whether scratch training reduces modern contamination enough to justify weaker inherited language ability.

---

## 15. Success Definition

CIVICUS-ROMAN succeeds if it can operate inside the simulator without modern leakage.

It should naturally produce questions and answers like:

```text
Who carried the word?
How old is the tablet?
Was the seal broken?
Can the cart still move?
Who witnessed the promise?
Does the account remain open?
What does the buyer need before sundown?
```

It should naturally speak like:

```text
The wheels are gone.
The tablet arrived old.
He owns jars, not coin.
The road has eaten the profit.
The account remains open.
The crate is heavier than its name.
```

It should avoid:

```text
supply chain disruption
market efficiency
legal compliance
liquidity constraint
regulatory exposure
contractual enforcement
```

The model is not meant to know less. It is meant to know differently.

---

## 16. Final Vision

CIVICUS-ROMAN is a bounded-world model. Its intelligence comes from discipline, not breadth. Its strength is that it does not treat modern reality as default.
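The contamination testing called for in Sections 12 and 15 can be made mechanical. Below is a minimal sketch under stated assumptions: the `FORBIDDEN` phrase list is a hypothetical sample drawn from the avoided vocabulary above, and the question check is deliberately crude; real evaluation would also score whether the model asks about place, buyer, seal, witness, and time.

```python
import re

# Hypothetical modern-abstraction phrases, sampled from the avoided lists
# in Sections 5 and 15; a real test suite would keep these under review.
FORBIDDEN = [
    "market price", "supply chain", "liquidity", "regulatory",
    "contract enforcement", "legal compliance", "market efficiency",
]

# Roman-visible question openers the model is rewarded for using (Section 11).
PREFERRED_OPENERS = ("who", "what", "where", "how old", "can", "was", "is")

def contamination_hits(response: str) -> list[str]:
    """Return the forbidden modern phrases found in a model response."""
    text = response.lower()
    return [p for p in FORBIDDEN if p in text]

def asks_roman_questions(response: str) -> bool:
    """Crude check: does the response ask at least one Roman-visible question?"""
    questions = re.findall(r"[^.?!]*\?", response.lower())
    return any(q.strip().startswith(PREFERRED_OPENERS) for q in questions)
```

For the failure prompt "What is the fair market price?", an answer of "Who is buying, and can the cart still move?" passes both checks, while "Liquidity creates security risk." fails both.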
It learns a smaller world deeply:

```text
what can be seen
what can be carried
what can be written
what can be witnessed
what can be pledged
what can be delayed
what can be hidden
what can be settled
```

This is the rational path:

```text
controlled ontology
layered corpus
Roman-visible vocabulary
agent-assisted generation
human canon approval
strict validation
small model experiments
simulator-owned state
contamination testing
```

The purpose is to build a model that does not merely describe Ancient Rome. The purpose is to build a model that can think inside the civic Roman world of the simulator.