# DIALOGUE-STANDARD-0001
## OTIVM Layer 4 Dialogue Style Standard
### Status: Draft Standard
### Layer: Training Infrastructure
### Purpose: Define how OTIVM dialogue files should be written, marked, and validated
### Repository Path: docs/training/chunking/DIALOGUE-STANDARD-0001.md

---

## 0. Purpose

This standard defines how Layer 4 dialogue files should be authored for the OTIVM training corpus.

Layer 4 dialogue is not metadata.

Layer 4 dialogue is in-world scene material. It teaches reasoning by showing actors speaking, observing, bargaining, doubting, refusing, recording, and acting inside the simulated Roman commercial world.

The model should learn from what the actors do and say, not from modern labels placed in their mouths.

---

## 1. Primary Rule

Dialogue body text must be Roman-world prose and speech only.

Chunk markers may contain modern metadata.

Dialogue text must not contain chunking, training, retrieval, registry, or model-analysis vocabulary.

The source file may contain:

```text
HTML comment chunk markers
YAML metadata inside those markers
Roman-world dialogue and scene prose
```

The retrievable chunk text should read as a plausible scene, not as a lesson plan.
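
The split between marker metadata and retrievable scene text could be sketched roughly as follows. This is a minimal illustration, not the project's extractor: it assumes chunk markers are ordinary HTML comments (`<!-- ... -->`), since the exact marker syntax is not defined here. The function name `split_markers_and_body` and the sample `id` value are hypothetical.

```python
import re

# Assumption: chunk markers are plain HTML comments. The real marker
# syntax used by the extractor is not specified in this standard.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def split_markers_and_body(source: str) -> tuple[list[str], str]:
    """Return (raw marker payloads, scene text with markers removed)."""
    markers = [payload.strip() for payload in COMMENT_RE.findall(source)]
    body = COMMENT_RE.sub("", source)
    # Collapse the blank runs left behind where comments were removed.
    body = re.sub(r"\n{3,}", "\n\n", body).strip()
    return markers, body


# Hypothetical sample file content: one marker followed by one spoken line.
sample = """<!--
id: DIALOGUE-0001::01::opening
chunk_role: dialogue_beat
-->
"A cart at the warehouse tells us something. It does not tell us what the oil will fetch."
"""

markers, body = split_markers_and_body(sample)
print(markers)  # raw YAML text of each marker, still unparsed
print(body)     # only the Roman-world scene text
```

The marker payloads are returned as raw text on purpose; parsing them as YAML would be a separate step in whatever pipeline consumes them.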

---

## 2. Separation Of Layers

Each dialogue file has three separate layers:

```text
1. Document header
Human-readable file identity and purpose.

2. Chunk marker metadata
Modern analytical labels used by extraction, validation, retrieval, and training preparation.

3. Dialogue body
In-world Roman prose and speech only.
```

Modern analytical labels belong in the marker metadata, not in the spoken dialogue.

Example allowed in metadata:

```yaml
concept_tags:
- stale_report
- source_chain
- confirmation_cost
knowledge_state:
- reported
- actor_visible
- inferred
```

Example not allowed in dialogue speech:

```text
"Then we have a visible signal, not a settled price."
```

Better in-world dialogue:

```text
"A cart at the warehouse tells us something. It does not tell us what the oil will fetch."
```

---

## 3. Forbidden Dialogue Vocabulary

The following terms should not appear in character speech or scene narration unless they are normal Roman-world words in context.

Forbidden as training language:

```text
metadata
chunk
chunking
retrieval
training
model
parameter
registry
token
concept tag
knowledge state
visible signal
reported state
known state
hidden true state
settled result
actor perspective
decision threshold
uncertainty structure
correct model behavior
incorrect model behavior
confidence problem
designer analysis
```

These terms may appear inside HTML comment metadata only.
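
A vocabulary check along these lines could be sketched as below. It is an illustration under stated assumptions, not the project's validator: markers are assumed to be plain HTML comments, and the function name `find_forbidden_terms` is hypothetical. Because words such as "token" or "model" can be ordinary Roman-world words in context, the sketch only reports hits for a human reviewer and does not reject anything on its own.

```python
import re

# The forbidden terms listed in this section.
FORBIDDEN_TERMS = [
    "metadata", "chunk", "chunking", "retrieval", "training", "model",
    "parameter", "registry", "token", "concept tag", "knowledge state",
    "visible signal", "reported state", "known state", "hidden true state",
    "settled result", "actor perspective", "decision threshold",
    "uncertainty structure", "correct model behavior",
    "incorrect model behavior", "confidence problem", "designer analysis",
]

# Assumption: chunk markers are plain HTML comments.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def find_forbidden_terms(source: str) -> list[tuple[int, str]]:
    """Return (line number, term) pairs found outside HTML comments."""
    # Blank out comments but keep their newlines so line numbers still
    # refer to the original file.
    body = COMMENT_RE.sub(lambda m: "\n" * m.group(0).count("\n"), source)
    hits = []
    for lineno, line in enumerate(body.splitlines(), start=1):
        for term in FORBIDDEN_TERMS:
            # Whole-word, case-insensitive match.
            if re.search(rf"\b{re.escape(term)}\b", line, re.IGNORECASE):
                hits.append((lineno, term))
    return hits


sample = """<!--
concept_tags:
- visible signal
-->
"Then we have a visible signal, not a settled price."
"A cart at the warehouse tells us something."
"""

for lineno, term in find_forbidden_terms(sample):
    print(f"line {lineno}: review use of '{term}'")
```

The term inside the comment is ignored, while the same term in the spoken line is flagged for review.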

---

## 4. In-World Substitutions

Use Roman-visible language instead of modern analytical phrasing.

| Modern analytical idea | In-world expression |
|---|---|
| visible signal | cart, seal, smoke, crowd, empty stall, late messenger, wet cloak, broken jar |
| reported state | word, rumor, letter, tablet, witness, clerk's note, market talk |
| hidden true state | what is really inside the crate, what the buyer already knows, what the rival has done |
| confirmation cost | rider's fee, lost time, cart hire, missed buyer, waiting until market closes |
| source motive | why the clerk speaks, why the carter lies, why the rival spreads word |
| partial commitment | sell ten jars, hold the rest; send one cart, keep two; pledge now, settle later |
| settlement | receipt, tablet, witness, seal, pledge, repair, offset, delivery |
| opportunity cost | cart used elsewhere, wall occupied, buyer lost, labor tied up |
| actor perspective | each actor's habits, fears, duties, ambitions, and practical concerns |

Characters should reason with things they can see, hear, count, carry, pledge, inspect, or write.

---

## 5. Preferred Dialogue Shape

Each dialogue file should normally contain six marked scene beats.

Preferred pattern:

```text
1. Scene opening and visible trouble
2. First interpretation or opportunity
3. Challenge, caution, or competing reading
4. Practical cost, arithmetic, obligation, or risk
5. Decision point with buyer, rival, official, worker, or witness
6. Closing result or changed account
```

This is a preference, not a hard rule.

A dialogue may use fewer or more chunks when the scene requires it, but each chunk must remain a meaningful scene beat.
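
Since the six-beat shape is a preference, a review aid might report deviations as notes and not as failures. The sketch below assumes the chunk bodies have already been extracted into a list of strings; the function name `review_beat_count` is hypothetical, and treating an empty chunk as an error is this sketch's own choice (an empty chunk cannot be a meaningful beat). Whether a non-empty chunk is a meaningful beat still needs a human reader.

```python
PREFERRED_BEATS = 6  # the preferred number of marked scene beats


def review_beat_count(chunk_bodies: list[str]) -> list[str]:
    """Return review notes about the number and emptiness of chunks."""
    notes = []
    for index, body in enumerate(chunk_bodies, start=1):
        if not body.strip():
            notes.append(f"error: chunk {index} is empty")
    if len(chunk_bodies) != PREFERRED_BEATS:
        # Advisory only: fewer or more chunks are allowed when the
        # scene requires it.
        notes.append(
            f"note: {len(chunk_bodies)} chunks; "
            f"the preferred shape has {PREFERRED_BEATS}"
        )
    return notes


print(review_beat_count(["opening", "reading", "challenge", "cost"]))
```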

---

## 6. Dialogue Chunk Quality

A dialogue chunk is useful when it contains:

```text
Roman-visible situation
+ actor speech/action
+ pressure or uncertainty
+ commercial consequence
```

A dialogue chunk is weak when it contains only:

```text
banter
style
exposition
modern explanation
metadata terms
isolated moral lesson
```

Do not split a question from the answer that gives it meaning.

Do not split a false claim from the correction that makes it useful.

Do not split a joke or quip from the economic point it reveals.

---

## 7. Character Voice Rules

The six commerce NPC lenses may appear in dialogue, but they must not speak as metadata labels.

Use their practical habits:

```text
Varro:
discipline, order, risk, proof, defensive caution, logistics by analogy to marching or guarding

Felix:
opportunity, bargaining, speed, pressure, profit, social agility, controlled risk

Lentulus:
status, access, patronage, public standing, elite expectations, shame, favor

Crispus:
procedure, remedy, enforceability, authority, complaint, written standing

Secundus:
carts, roads, capacity, labor, timing, breakage, substitution, practical feasibility

Chresimus:
tablets, receipts, witnesses, seals, account entries, obligations, what can be written safely
```

The actor's reasoning should emerge from voice and action, not from explanatory labels.

---

## 8. Metadata Requirements

Each dialogue chunk marker should include:

```yaml
id: <DIALOGUE-XXXX::NN::role>
source_file: <filename>
repository_path: <repo path>
domain: commerce
layer: Layer_4--Dialogues
document_id: <DIALOGUE-XXXX>
document_title: "<title>"
section_heading: "<nearest section heading>"
chunk_role: dialogue_beat
concept_tags:
- <tag>
knowledge_state:
- <state>
speakers:
- <actor>
scene_location: <place>
scene_signal: <visible event, rumor, cargo, document, price, or social change>
demonstrated_concepts:
- <concept>
```

Metadata is for the pipeline. It is not part of the Roman scene.
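
A marker check against the field list above might look like the following sketch. It works on an already-parsed mapping (how the YAML is parsed is left open), and several points are assumptions of this sketch only: that `XXXX` is four digits and `NN` two digits, that `domain`, `layer`, and `chunk_role` are fixed to the literal values shown in the template, and that the `id` prefix should equal `document_id`. The function name `validate_marker` and all sample values are hypothetical.

```python
import re

REQUIRED_KEYS = (
    "id", "source_file", "repository_path", "domain", "layer",
    "document_id", "document_title", "section_heading", "chunk_role",
    "concept_tags", "knowledge_state", "speakers", "scene_location",
    "scene_signal", "demonstrated_concepts",
)
LIST_KEYS = ("concept_tags", "knowledge_state", "speakers",
             "demonstrated_concepts")
# Assumption: these three fields always carry the template's literal values.
FIXED_VALUES = {"domain": "commerce", "layer": "Layer_4--Dialogues",
                "chunk_role": "dialogue_beat"}
# Assumption: XXXX is four digits and NN is two digits.
ID_RE = re.compile(r"^(DIALOGUE-\d{4})::\d{2}::\w+$")


def validate_marker(meta: dict) -> list[str]:
    """Return a list of problems found in one parsed chunk marker."""
    errors = []
    for key in REQUIRED_KEYS:
        if meta.get(key) in (None, "", []):
            errors.append(f"missing or empty: {key}")
    for key in LIST_KEYS:
        if key in meta and not isinstance(meta[key], list):
            errors.append(f"{key} must be a list")
    for key, expected in FIXED_VALUES.items():
        if meta.get(key) not in (None, "", expected):
            errors.append(f"{key} must be {expected!r}")
    match = ID_RE.match(str(meta.get("id", "")))
    if meta.get("id") and not match:
        errors.append("id does not match DIALOGUE-XXXX::NN::role")
    elif match and meta.get("document_id") not in (None, "", match.group(1)):
        errors.append("id prefix differs from document_id")
    return errors


# Hypothetical marker; file names, paths, and titles are invented.
example = {
    "id": "DIALOGUE-0001::01::opening",
    "source_file": "DIALOGUE-0001.md",
    "repository_path": "docs/example/DIALOGUE-0001.md",
    "domain": "commerce",
    "layer": "Layer_4--Dialogues",
    "document_id": "DIALOGUE-0001",
    "document_title": "Example dialogue",
    "section_heading": "Scene opening",
    "chunk_role": "dialogue_beat",
    "concept_tags": ["stale_report"],
    "knowledge_state": ["reported"],
    "speakers": ["Varro", "Felix"],
    "scene_location": "warehouse",
    "scene_signal": "cart at the warehouse",
    "demonstrated_concepts": ["confirmation_cost"],
}

print(validate_marker(example))
print(validate_marker(dict(example, speakers=[], layer="Layer_4")))
```

Returning a list of messages, instead of raising on the first problem, lets a reviewer see every issue in a marker at once.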

---

## 9. Knowledge Boundary Rule

Dialogue must preserve the limits of what each actor knows.

If the reader sees a hidden truth, the scene must make clear whether the actors also know it.

Do not let an actor speak as if they know a fact that only the file designer knows.

Use distinctions visible in Roman terms:

```text
"I saw it."
"I heard it."
"The tablet says it."
"The carter claims it."
"The seal is unbroken."
"The buyer has not yet agreed."
"The witness can say this much."
"The rest is guesswork."
```

---

## 10. Arithmetic And Practical Cost

When dialogue includes arithmetic or cost, characters should express it through practical accounting.

Allowed:

```text
"Two jars lost. Hire paid. Half a day gone."
"If we pay double for carts, the venture thins."
"Ten jars now, the rest tomorrow."
"Repair stands against part of the debt."
```

Avoid modern teaching phrasing:

```text
"This demonstrates opportunity cost."
"The correct calculation is..."
"The model should infer..."
```

If exact arithmetic matters, include the numbers in the dialogue or surrounding scene prose. Do not leave calculation only in metadata.

---

## 11. Review Checklist

Before accepting a dialogue file:

1. Does every spoken line sound like a person in the world, not a trainer?
2. Are modern analytical terms confined to chunk metadata?
3. Does each chunk contain a complete scene beat?
4. Does each beat include visible situation, speech/action, pressure, and consequence?
5. Are knowledge boundaries preserved?
6. Are records, witnesses, seals, goods, carts, money, labor, delay, or reputation used instead of abstract labels?
7. Does the file teach through action rather than explanation?
8. Does the extractor validate all chunks without errors?

---

## 12. Success Condition

This standard is functioning correctly if Layer 4 dialogue can be retrieved as natural Roman-world scene material while still carrying precise modern metadata for training preparation.

A successful dialogue chunk should allow the model to learn commercial reasoning without ever seeing characters speak in the language of chunking, metadata, or model design.