# DIALOGUE-STANDARD-0001
## OTIVM Layer 4 Dialogue Style Standard
### Status: Draft Standard
### Layer: Training Infrastructure
### Purpose: Define how OTIVM dialogue files should be written, marked, and validated
### Repository Path: docs/training/chunking/DIALOGUE-STANDARD-0001.md

---

## 0. Purpose

This standard defines how Layer 4 dialogue files should be authored for the OTIVM training corpus.

Layer 4 dialogue is not metadata.

Layer 4 dialogue is in-world scene material. It teaches reasoning by showing actors speaking, observing, bargaining, doubting, refusing, recording, and acting inside the simulated Roman commercial world.

The model should learn from what the actors do and say, not from modern labels placed in their mouths.

---

## 1. Primary Rule

Dialogue body text must be Roman-world prose and speech only.

Chunk markers may contain modern metadata. Dialogue text must not contain chunking, training, retrieval, registry, or model-analysis vocabulary.

The source file may contain:

```text
HTML comment chunk markers
YAML metadata inside those markers
Roman-world dialogue and scene prose
```

The retrievable chunk text should read as a plausible scene, not as a lesson plan.

---

## 2. Separation Of Layers

Each dialogue file has three separate layers:

```text
1. Document header
   Human-readable file identity and purpose.

2. Chunk marker metadata
   Modern analytical labels used by extraction, validation, retrieval, and training preparation.

3. Dialogue body
   In-world Roman prose and speech only.
```

Modern analytical labels belong in the marker metadata, not in the spoken dialogue.

Example allowed in metadata:

```yaml
concept_tags:
  - stale_report
  - source_chain
  - confirmation_cost
knowledge_state:
  - reported
  - actor_visible
  - inferred
```

Example not allowed in dialogue speech:

```text
"Then we have a visible signal, not a settled price."
```

Better in-world dialogue:

```text
"A cart at the warehouse tells us something. It does not tell us what the oil will fetch."
```

---

## 3. Forbidden Dialogue Vocabulary

The following terms should not appear in character speech or scene narration unless they are normal Roman-world words in context.

Forbidden as training language:

```text
metadata
chunk
chunking
retrieval
training
model
parameter
registry
token
concept tag
knowledge state
visible signal
reported state
known state
hidden true state
settled result
actor perspective
decision threshold
uncertainty structure
correct model behavior
incorrect model behavior
confidence problem
designer analysis
```

These terms may appear inside HTML comment metadata only.

---

## 4. In-World Substitutions

Use Roman-visible language instead of modern analytical phrasing.

| Modern analytical idea | In-world expression |
|---|---|
| visible signal | cart, seal, smoke, crowd, empty stall, late messenger, wet cloak, broken jar |
| reported state | word, rumor, letter, tablet, witness, clerk's note, market talk |
| hidden true state | what is really inside the crate, what the buyer already knows, what the rival has done |
| confirmation cost | rider's fee, lost time, cart hire, missed buyer, waiting until market closes |
| source motive | why the clerk speaks, why the carter lies, why the rival spreads word |
| partial commitment | sell ten jars, hold the rest; send one cart, keep two; pledge now, settle later |
| settlement | receipt, tablet, witness, seal, pledge, repair, offset, delivery |
| opportunity cost | cart used elsewhere, wall occupied, buyer lost, labor tied up |
| actor perspective | each actor's habits, fears, duties, ambitions, and practical concerns |

Characters should reason with things they can see, hear, count, carry, pledge, inspect, or write.

---

## 5. Preferred Dialogue Shape

Each dialogue file should normally contain six marked scene beats.

Preferred pattern:

```text
1. Scene opening and visible trouble
2. First interpretation or opportunity
3. Challenge, caution, or competing reading
4. Practical cost, arithmetic, obligation, or risk
5. Decision point with buyer, rival, official, worker, or witness
6. Closing result or changed account
```

This is a preference, not a hard rule.

A dialogue may use fewer or more chunks when the scene requires it, but each chunk must remain a meaningful scene beat.

---

## 6. Dialogue Chunk Quality

A dialogue chunk is useful when it contains:

```text
Roman-visible situation
+ actor speech/action
+ pressure or uncertainty
+ commercial consequence
```

A dialogue chunk is weak when it contains only:

```text
banter
style
exposition
modern explanation
metadata terms
isolated moral lesson
```

Do not split a question from the answer that gives it meaning.

Do not split a false claim from the correction that makes it useful.

Do not split a joke or quip from the economic point it reveals.

---

## 7. Character Voice Rules

The six commerce NPC lenses may appear in dialogue, but they must not speak as metadata labels.

Use their practical habits:

```text
Varro: discipline, order, risk, proof, defensive caution, logistics by analogy to marching or guarding
Felix: opportunity, bargaining, speed, pressure, profit, social agility, controlled risk
Lentulus: status, access, patronage, public standing, elite expectations, shame, favor
Crispus: procedure, remedy, enforceability, authority, complaint, written standing
Secundus: carts, roads, capacity, labor, timing, breakage, substitution, practical feasibility
Chresimus: tablets, receipts, witnesses, seals, account entries, obligations, what can be written safely
```

The actor's reasoning should emerge from voice and action, not from explanatory labels.

---

## 8. Metadata Requirements

Each dialogue chunk marker should include:

```yaml
id:
source_file:
repository_path:
domain: commerce
layer: Layer_4--Dialogues
document_id:
document_title: ""
section_heading: "<nearest section heading>"
chunk_role: dialogue_beat
concept_tags:
  - <tag>
knowledge_state:
  - <state>
speakers:
  - <actor>
scene_location: <place>
scene_signal: <visible event, rumor, cargo, document, price, or social change>
demonstrated_concepts:
  - <concept>
```

Metadata is for the pipeline. It is not part of the Roman scene.

---

## 9. Knowledge Boundary Rule

Dialogue must preserve what actors know.

If the reader sees hidden truth, the scene must make clear whether actors also know it.

Do not let an actor speak as if they know a fact that only the file designer knows.

Use distinctions visible in Roman terms:

```text
"I saw it."
"I heard it."
"The tablet says it."
"The carter claims it."
"The seal is unbroken."
"The buyer has not yet agreed."
"The witness can say this much."
"The rest is guesswork."
```

---

## 10. Arithmetic And Practical Cost

When dialogue includes arithmetic or cost, characters should express it through practical accounting.

Allowed:

```text
"Two jars lost. Hire paid. Half a day gone."
"If we pay double for carts, the venture thins."
"Ten jars now, the rest tomorrow."
"Repair stands against part of the debt."
```

Avoid modern teaching phrasing:

```text
"This demonstrates opportunity cost."
"The correct calculation is..."
"The model should infer..."
```

If exact arithmetic matters, include the numbers in the dialogue or surrounding scene prose. Do not leave calculation only in metadata.

---

## 11. Review Checklist

Before accepting a dialogue file:

1. Does every spoken line sound like a person in the world, not a trainer?
2. Are modern analytical terms confined to chunk metadata?
3. Does each chunk contain a complete scene beat?
4. Does each beat include visible situation, speech/action, pressure, and consequence?
5. Are knowledge boundaries preserved?
6. Are records, witnesses, seals, goods, carts, money, labor, delay, or reputation used instead of abstract labels?
7. Does the file teach through action rather than explanation?
8. Does the extractor validate all chunks without errors?

---

## 12. Success Condition

This standard is functioning correctly if Layer 4 dialogue can be retrieved as natural Roman-world scene material while still carrying precise modern metadata for training preparation.

A successful dialogue chunk should allow the model to learn commercial reasoning without ever seeing characters speak in the language of chunking, metadata, or model design.
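
Part of this condition can be checked mechanically: the forbidden vocabulary of section 3 should be absent from everything outside the chunk markers. The following is a minimal sketch of such a check, not the project's extractor. It assumes chunk markers are ordinary HTML comments of the form `<!-- ... -->`; the names `FORBIDDEN_TERMS`, `dialogue_body`, and `find_forbidden` are hypothetical, and the term list is only a subset of section 3.

```python
import re

# Assumption: chunk markers are plain HTML comments (<!-- ... -->).
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical subset of the section 3 list; extend as needed.
FORBIDDEN_TERMS = (
    "metadata", "chunk", "chunking", "retrieval", "training", "model",
    "parameter", "registry", "token", "concept tag", "knowledge state",
    "visible signal", "reported state", "hidden true state",
    "actor perspective", "decision threshold",
)


def dialogue_body(source: str) -> str:
    """Return the source with HTML comment markers blanked out."""
    # Each comment is replaced by its own newlines so that line numbers
    # in the result still match the original file.
    return COMMENT_RE.sub(lambda m: "\n" * m.group(0).count("\n"), source)


def find_forbidden(source: str) -> list[tuple[int, str]]:
    """List (line_number, term) pairs found outside comment markers."""
    hits = []
    for lineno, line in enumerate(dialogue_body(source).splitlines(), start=1):
        lowered = line.lower()
        for term in FORBIDDEN_TERMS:
            # Whole-word match, so "chunk" does not fire on "chunking".
            if re.search(rf"\b{re.escape(term)}\b", lowered):
                hits.append((lineno, term))
    return hits


if __name__ == "__main__":
    sample = (
        "<!-- chunk\n"
        "concept_tags:\n"
        "  - stale_report\n"
        "-->\n"
        '"A cart at the warehouse tells us something."\n'
        '"Then we have a visible signal, not a settled price."\n'
    )
    for lineno, term in find_forbidden(sample):
        print(f"line {lineno}: forbidden term '{term}'")
```

Hits from a scan like this are candidates for human review rather than automatic failures, because section 3 allows a listed word when it is a normal Roman-world word in context.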