# DIALOGUE-STANDARD-0001

## OTIVM Layer 4 Dialogue Style Standard

### Status: Draft Standard

### Layer: Training Infrastructure

### Purpose: Define how OTIVM dialogue files should be written, marked, and validated

### Repository Path: docs/training/chunking/DIALOGUE-STANDARD-0001.md

---

## 0. Purpose

This standard defines how Layer 4 dialogue files should be authored for the OTIVM training corpus.

Layer 4 dialogue is not metadata.

Layer 4 dialogue is in-world scene material. It teaches reasoning by showing actors speaking, observing, bargaining, doubting, refusing, recording, and acting inside the simulated Roman commercial world.

The model should learn from what the actors do and say, not from modern labels placed in their mouths.

---

## 1. Primary Rule

Dialogue body text must be Roman-world prose and speech only.

Chunk markers may contain modern metadata.

Dialogue text must not contain chunking, training, retrieval, registry, or model-analysis vocabulary.

The source file may contain:

```text
HTML comment chunk markers
YAML metadata inside those markers
Roman-world dialogue and scene prose
```
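
The split between marker metadata and retrievable scene text can be pictured with a minimal sketch. It assumes that chunk markers are plain HTML comments (`<!-- ... -->`) that may span several lines; the exact marker layout is not specified in this standard. The names `split_layers` and `MARKER_RE` and the sample `id` value are hypothetical and are not taken from the OTIVM extractor.

```python
import re

# Assumption: a chunk marker is an ordinary HTML comment, possibly multi-line.
MARKER_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def split_layers(source: str) -> tuple[list[str], str]:
    """Return (marker payloads, dialogue body) for one source file."""
    # Drop the "<!--" and "-->" delimiters and keep the inner text.
    markers = [m.group(0)[4:-3].strip() for m in MARKER_RE.finditer(source)]
    body = MARKER_RE.sub("", source)
    # Collapse the blank runs left behind where markers were removed.
    body = re.sub(r"\n{3,}", "\n\n", body).strip()
    return markers, body


# Hypothetical sample file content, for illustration only.
sample = '''<!--
id: DIALOGUE-0001::01::dialogue_beat
concept_tags:
  - stale_report
-->
"A cart at the warehouse tells us something. It does not tell us what the oil will fetch."
'''

markers, body = split_layers(sample)
print(markers[0])
print(body)
```

In this sketch the marker payload keeps the modern labels, while `body` holds only what a reader of the scene would see.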

The retrievable chunk text should read as a plausible scene, not as a lesson plan.

---

## 2. Separation Of Layers

Each dialogue file has three separate layers:

```text
1. Document header
   Human-readable file identity and purpose.

2. Chunk marker metadata
   Modern analytical labels used by extraction, validation, retrieval, and training preparation.

3. Dialogue body
   In-world Roman prose and speech only.
```

Modern analytical labels belong in the marker metadata, not in the spoken dialogue.

Example allowed in metadata:

```yaml
concept_tags:
  - stale_report
  - source_chain
  - confirmation_cost
knowledge_state:
  - reported
  - actor_visible
  - inferred
```

Example not allowed in dialogue speech:

```text
"Then we have a visible signal, not a settled price."
```

Better in-world dialogue:

```text
"A cart at the warehouse tells us something. It does not tell us what the oil will fetch."
```

---

## 3. Forbidden Dialogue Vocabulary

The following terms should not appear in character speech or scene narration unless they are normal Roman-world words in context.

Forbidden as training language:

```text
metadata
chunk
chunking
retrieval
training
model
parameter
registry
token
concept tag
knowledge state
visible signal
reported state
known state
hidden true state
settled result
actor perspective
decision threshold
uncertainty structure
correct model behavior
incorrect model behavior
confidence problem
designer analysis
```
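
A check for these terms might be sketched as follows. This is an illustration under stated assumptions, not the project's validator: it assumes markers are HTML comments, matches whole words case-insensitively, and uses the hypothetical names `find_forbidden_terms`, `FORBIDDEN_TERMS`, and `COMMENT_RE`.

```python
import re

# Terms copied from the list above.
FORBIDDEN_TERMS = [
    "metadata", "chunk", "chunking", "retrieval", "training", "model",
    "parameter", "registry", "token", "concept tag", "knowledge state",
    "visible signal", "reported state", "known state", "hidden true state",
    "settled result", "actor perspective", "decision threshold",
    "uncertainty structure", "correct model behavior",
    "incorrect model behavior", "confidence problem", "designer analysis",
]

# Assumption: chunk markers are HTML comments, possibly multi-line.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def find_forbidden_terms(source: str, terms=FORBIDDEN_TERMS) -> list[tuple[int, str]]:
    """Return (line number, term) pairs found outside HTML comments."""
    # Blank out comments but keep their newlines so line numbers stay accurate.
    body = COMMENT_RE.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), source)
    hits = []
    for lineno, line in enumerate(body.splitlines(), start=1):
        for term in terms:
            if re.search(rf"\b{re.escape(term)}\b", line, re.IGNORECASE):
                hits.append((lineno, term))
    return hits


# Hypothetical sample: the term is allowed in the marker, flagged in speech.
sample = (
    "<!--\nconcept_tags:\n  - visible signal\n-->\n"
    '"Then we have a visible signal, not a settled price."\n'
    '"A cart at the warehouse tells us something."\n'
)
hits = find_forbidden_terms(sample)
print(hits)
```

Because some listed words can be ordinary Roman-world words in context, hits from a scan like this are best treated as candidates for human review rather than automatic rejections.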

The terms listed in this section may appear inside HTML comment metadata only.

---

## 4. In-World Substitutions

Use Roman-visible language instead of modern analytical phrasing.

| Modern analytical idea | In-world expression |
|---|---|
| visible signal | cart, seal, smoke, crowd, empty stall, late messenger, wet cloak, broken jar |
| reported state | word, rumor, letter, tablet, witness, clerk's note, market talk |
| hidden true state | what is really inside the crate, what the buyer already knows, what the rival has done |
| confirmation cost | rider's fee, lost time, cart hire, missed buyer, waiting until market closes |
| source motive | why the clerk speaks, why the carter lies, why the rival spreads word |
| partial commitment | sell ten jars, hold the rest; send one cart, keep two; pledge now, settle later |
| settlement | receipt, tablet, witness, seal, pledge, repair, offset, delivery |
| opportunity cost | cart used elsewhere, wall occupied, buyer lost, labor tied up |
| actor perspective | each actor's habits, fears, duties, ambitions, and practical concerns |

Characters should reason with things they can see, hear, count, carry, pledge, inspect, or write.

---

## 5. Preferred Dialogue Shape

Each dialogue file should normally contain six marked scene beats.

Preferred pattern:

```text
1. Scene opening and visible trouble
2. First interpretation or opportunity
3. Challenge, caution, or competing reading
4. Practical cost, arithmetic, obligation, or risk
5. Decision point with buyer, rival, official, worker, or witness
6. Closing result or changed account
```

This is a preference, not a hard rule.

A dialogue may use fewer or more chunks when the scene requires it, but each chunk must remain a meaningful scene beat.

---

## 6. Dialogue Chunk Quality

A dialogue chunk is useful when it contains:

```text
Roman-visible situation
+ actor speech/action
+ pressure or uncertainty
+ commercial consequence
```

A dialogue chunk is weak when it contains only:

```text
banter
style
exposition
modern explanation
metadata terms
isolated moral lesson
```

Do not split a question from the answer that gives it meaning.

Do not split a false claim from the correction that makes it useful.

Do not split a joke or quip from the economic point it reveals.

---

## 7. Character Voice Rules

The six commerce NPC lenses may appear in dialogue, but they must not speak as metadata labels.

Use their practical habits:

```text
Varro:
  discipline, order, risk, proof, defensive caution, logistics by analogy to marching or guarding

Felix:
  opportunity, bargaining, speed, pressure, profit, social agility, controlled risk

Lentulus:
  status, access, patronage, public standing, elite expectations, shame, favor

Crispus:
  procedure, remedy, enforceability, authority, complaint, written standing

Secundus:
  carts, roads, capacity, labor, timing, breakage, substitution, practical feasibility

Chresimus:
  tablets, receipts, witnesses, seals, account entries, obligations, what can be written safely
```

The actor's reasoning should emerge from voice and action, not from explanatory labels.

---

## 8. Metadata Requirements

Each dialogue chunk marker should include:

```yaml
id: <DIALOGUE-XXXX::NN::role>
source_file: <filename>
repository_path: <repo path>
domain: commerce
layer: Layer_4--Dialogues
document_id: <DIALOGUE-XXXX>
document_title: "<title>"
section_heading: "<nearest section heading>"
chunk_role: dialogue_beat
concept_tags:
  - <tag>
knowledge_state:
  - <state>
speakers:
  - <actor>
scene_location: <place>
scene_signal: <visible event, rumor, cargo, document, price, or social change>
demonstrated_concepts:
  - <concept>
```
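
A field-presence check over an already parsed marker could look like the sketch below. It is illustrative only: `validate_marker`, `REQUIRED_FIELDS`, `LIST_FIELDS`, and `ID_RE` are hypothetical names, the sample values are invented, and the assumption that `XXXX` means four digits and `NN` means two digits is not stated in the template. YAML parsing is left out so that the example needs only the standard library.

```python
import re

# Field names copied from the marker template above.
REQUIRED_FIELDS = (
    "id", "source_file", "repository_path", "domain", "layer",
    "document_id", "document_title", "section_heading", "chunk_role",
    "concept_tags", "knowledge_state", "speakers", "scene_location",
    "scene_signal", "demonstrated_concepts",
)
# Fields the template writes as YAML lists.
LIST_FIELDS = ("concept_tags", "knowledge_state", "speakers", "demonstrated_concepts")

# Assumption: XXXX is four digits, NN is two digits, role is a word.
ID_RE = re.compile(r"^DIALOGUE-\d{4}::\d{2}::\w+$")


def validate_marker(meta: dict) -> list[str]:
    """Return a list of problems found in one parsed chunk marker."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in meta or meta[field] in ("", None, []):
            errors.append(f"missing or empty field: {field}")
    for field in LIST_FIELDS:
        if meta.get(field) and not isinstance(meta[field], list):
            errors.append(f"field must be a list: {field}")
    if isinstance(meta.get("id"), str) and not ID_RE.match(meta["id"]):
        errors.append(f"id does not match expected shape: {meta['id']}")
    return errors


# Hypothetical marker values, for illustration only.
marker = {
    "id": "DIALOGUE-0001::01::dialogue_beat",
    "source_file": "DIALOGUE-0001.md",
    "repository_path": "docs/training/dialogues/DIALOGUE-0001.md",
    "domain": "commerce",
    "layer": "Layer_4--Dialogues",
    "document_id": "DIALOGUE-0001",
    "document_title": "The Late Cart",
    "section_heading": "Scene opening",
    "chunk_role": "dialogue_beat",
    "concept_tags": ["stale_report"],
    "knowledge_state": ["reported"],
    "speakers": ["Felix", "Chresimus"],
    "scene_location": "warehouse",
    "scene_signal": "a cart arrives late",
    "demonstrated_concepts": ["confirmation_cost"],
}
print(validate_marker(marker))
```

Returning a list of messages rather than raising on the first problem lets a reviewer see every gap in a marker at once.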

Metadata is for the pipeline. It is not part of the Roman scene.

---

## 9. Knowledge Boundary Rule

Dialogue must respect the limits of what each actor knows.

If the reader sees a hidden truth, the scene must make clear whether the actors also know it.

Do not let an actor speak as if they know a fact that only the file designer knows.

Use distinctions visible in Roman terms:

```text
"I saw it."
"I heard it."
"The tablet says it."
"The carter claims it."
"The seal is unbroken."
"The buyer has not yet agreed."
"The witness can say this much."
"The rest is guesswork."
```

---

## 10. Arithmetic And Practical Cost

When dialogue includes arithmetic or cost, characters should express it through practical accounting.

Allowed:

```text
"Two jars lost. Hire paid. Half a day gone."
"If we pay double for carts, the venture thins."
"Ten jars now, the rest tomorrow."
"Repair stands against part of the debt."
```

Avoid modern teaching phrasing:

```text
"This demonstrates opportunity cost."
"The correct calculation is..."
"The model should infer..."
```

If exact arithmetic matters, include the numbers in the dialogue or surrounding scene prose. Do not leave calculation only in metadata.

---

## 11. Review Checklist

Before accepting a dialogue file:

1. Does every spoken line sound like a person in the world, not a trainer?
2. Are modern analytical terms confined to chunk metadata?
3. Does each chunk contain a complete scene beat?
4. Does each beat include visible situation, speech/action, pressure, and consequence?
5. Are knowledge boundaries preserved?
6. Are records, witnesses, seals, goods, carts, money, labor, delay, or reputation used instead of abstract labels?
7. Does the file teach through action rather than explanation?
8. Does the extractor validate all chunks without errors?

---

## 12. Success Condition

This standard is working if Layer 4 dialogue can be retrieved as natural Roman-world scene material while still carrying precise modern metadata for training preparation.

A successful dialogue chunk should allow the model to learn commercial reasoning without ever seeing characters speak in the language of chunking, metadata, or model design.