# GENERATOR-MODEL-SELECTION-0001

## Local Model Selection And Deployment For The OTIVM Vocabulary Generator

### Status: Draft Standard

### Layer: Training Infrastructure

### Purpose: Select and deploy a small local model for generating Roman-visible vocabulary candidates

### Repository Path: docs/training/chunking/GENERATOR-MODEL-SELECTION-0001.md

---

## 0. Purpose

This document defines a practical model-selection and deployment plan for the OTIVM Roman-visible expression generator.

The generator is not the CIVICUS-ROMAN model.

The generator is a tool used to produce candidate phrases.

Most generated phrases may be weak.

Only reviewed and accepted expressions become training material.

The generator is quarry equipment.

The reviewed vocabulary is the stone.

---

## 1. Hardware Constraint

Current local hardware target:

```text
NVIDIA GPU with 6GB VRAM
```

This is enough for small quantized local models.

It is not the right target for full model training.

It is sufficient for:

```text
candidate expression generation
small-batch phrase variation
actor-voice experiments
object/action/pressure recombination
quick local iteration
offline review workflows
```

It should not be used yet for:

```text
full CIVICUS-ROMAN training
large-context corpus analysis
unsupervised corpus promotion
automatic canonical selection
```

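The 6GB budget can be sanity-checked with rough arithmetic. A sketch, assuming roughly 4.5 effective bits per weight for a Q4 quantization and a flat allowance for KV cache, activations, and runtime buffers (both numbers are approximations, not measured values):

```python
def approx_vram_gb(params_billion: float, bits_per_weight: float,
                   overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weight size plus a flat
    allowance for KV cache, activations, and runtime buffers."""
    weight_gb = params_billion * 1e9 * (bits_per_weight / 8) / (1024 ** 3)
    return weight_gb + overhead_gb

# A 3B model at ~4.5 effective bits/weight stays well under 6 GB.
print(f"{approx_vram_gb(3.0, 4.5):.1f} GB")  # → 3.1 GB
```

This is why a 3B model is the comfortable ceiling here: a 7B model at the same quantization roughly doubles the weight term and leaves little headroom for context.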
---

## 2. Primary Recommendation

Start with:

```text
Model: Qwen2.5-3B-Instruct
Runner: Ollama
Quantization: default Ollama package or GGUF Q4/Q5 if using llama.cpp
```

Reason:

```text
small enough for 6GB VRAM
good instruction following
good short-form generation
available through Ollama
available in GGUF form
suitable for high-volume candidate generation
```

The generator task is not deep reasoning.

It is constrained phrase production.

A 3B instruct model is enough to begin.

---

## 3. Backup Models

### Phi-3.5-mini-instruct

Use if Qwen2.5-3B gives too much decorative prose or weak instruction following.

Strengths:

```text
terse output
structured generation
reasoning-dense behavior
good for compact candidate lists
```

Risk:

```text
may produce more modern analytical phrasing unless prompts are strict
```

### Gemma small instruct models

Use for comparison, especially if phrase tone from Qwen or Phi is poor.

Strengths:

```text
small model family
local deployment support
useful for style comparison
```

Risk:

```text
may require more prompt tuning for OTIVM-specific compression
```

### Qwen2.5-Coder-3B

Use only for generator tooling scripts, not phrase generation.

Strengths:

```text
code generation
JSONL tools
review UI helpers
validator scripts
```

Risk:

```text
not the right primary voice generator
```

---

## 4. Deployment Path

### Phase 1: Ollama

Use Ollama first because it minimizes deployment friction.

Install and run:

```bash
ollama pull qwen2.5:3b
ollama run qwen2.5:3b
```

Test with direct prompt batches.

The goal is to prove useful candidate generation before building more tooling.

### Phase 2: Scripted Batch Generation

Use Python to send object/action/pressure combinations to the local Ollama endpoint.

Input:

```json
{
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "count": 20
}
```

Output:

```json
{
  "expression_id": "expr_000001",
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "candidate": "The wheels are gone, and the buyer will not wait for our excuses.",
  "status": "candidate"
}
```

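A minimal driver for this phase can be sketched against Ollama's local HTTP endpoint. The field names mirror the input record above; the prompt wording inside `build_prompt` is a sketch, not the final prompt standard:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_prompt(spec: dict) -> str:
    """Render one object/action/pressure input record into a strict
    generator prompt (wording here is illustrative)."""
    return (
        "You generate Roman-visible commercial expressions for OTIVM.\n"
        "Rules: do not explain; avoid modern business language; "
        "use concrete objects, actions, and pressures; prefer terse lines.\n"
        f"Object: {spec['object']}\n"
        f"Action: {spec['action']}\n"
        f"Pressure: {spec['pressure']}\n"
        f"Actor voice: {spec['actor_voice']}\n"
        f"Generate {spec['count']} candidates."
    )

def generate(spec: dict, model: str = "qwen2.5:3b") -> str:
    """POST one prompt to Ollama and return the raw completion text."""
    payload = json.dumps({"model": model, "prompt": build_prompt(spec),
                          "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    spec = {"object": "cart", "action": "hired_elsewhere",
            "pressure": "buyer_waiting", "actor_voice": "Secundus",
            "count": 20}
    print(generate(spec))
```

Splitting the returned text into lines and wrapping each one in the candidate schema is then a small follow-on step.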
### Phase 3: Review Interface

Build a fast human review tool.

Required markings:

```text
accept
reject
revise
strong
canonical
```

Preferred one-key controls:

```text
a = accept
r = reject
v = revise
s = strong
c = canonical
```

The review tool matters more than the generator model.

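A terminal version of this tool fits in a few lines. A sketch, where `apply_mark` is a hypothetical helper and a real tool would add undo and duplicate detection:

```python
import json

# One-key controls mapped to review statuses.
MARKS = {"a": "accepted", "r": "rejected", "v": "revise",
         "s": "strong", "c": "canonical"}

def apply_mark(record: dict, key: str) -> dict:
    """Return a copy of the candidate record with its review status set."""
    if key not in MARKS:
        raise ValueError(f"unknown review key: {key!r}")
    marked = dict(record)
    marked["status"] = MARKS[key]
    return marked

def review(in_path: str, out_path: str) -> None:
    """Show each candidate line, read one key, write the marked record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            print(record["candidate"])
            key = input("[a/r/v/s/c] > ").strip().lower()
            dst.write(json.dumps(apply_mark(record, key)) + "\n")
```

At roughly one keypress per line, a thousand-candidate batch is a single review session.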
---

## 5. Generator Prompt Pattern

Use a strict prompt.

Example:

```text
You generate Roman-visible commercial expressions for OTIVM.

Rules:
- Do not explain.
- Do not use modern business language.
- Do not use words like logistics, liquidity, market efficiency, regulatory, contract compliance, metadata, model, training, or optimization.
- Use concrete objects, actions, and pressures.
- Prefer terse lines.
- Produce candidate lines only.

Object: cart
Action: hired elsewhere
Pressure: buyer waiting
Actor voice: Secundus

Generate 20 candidates.
```

Expected useful outputs:

```text
The wheels are gone.
The buyer will not wait for empty ruts.
Ten jars can still go by mule.
Naso bought the road before the oil moved.
```

Bad outputs:

```text
Transport capacity is constrained.
The supply chain is disrupted.
We need to optimize the delivery channel.
This represents a logistical bottleneck.
```

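The forbidden-term rule can also be enforced mechanically after generation, so contaminated lines never reach the reviewer. A crude substring check as a sketch; whole-word matching and the authoritative term list belong to the real validator:

```python
# Terms from the prompt rules plus a few seen in bad outputs.
# This list is illustrative, not the canonical forbidden-term list.
FORBIDDEN = {"logistics", "liquidity", "market efficiency", "regulatory",
             "contract compliance", "metadata", "optimization", "optimize",
             "supply chain", "bottleneck"}

def is_contaminated(line: str) -> bool:
    """True if the candidate line contains any forbidden modern term.
    Substring matching is deliberately crude; it errs toward rejection."""
    lowered = line.lower()
    return any(term in lowered for term in FORBIDDEN)
```

Flagged lines can be auto-marked `rejected` before review, shrinking the batch the human has to read.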
---

## 6. Output Rule

The generator output must never enter training directly.

All generated output begins as:

```text
status: candidate
```

Only reviewed material can become:

```text
accepted
strong
canonical
```

Training may use:

```text
accepted expressions
strong expressions
canonical expressions
human-revised expressions
dialogue lines based on reviewed expressions
```

Training must not use:

```text
raw generated candidates
rejected candidates
unreviewed batches
candidate churn
```

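This gate can be made explicit in code. A sketch, where `revised` stands in for human-revised expressions (the real status vocabulary is whatever the review tool emits):

```python
import json

# Only reviewed statuses may enter training; "candidate" never passes.
TRAINABLE = {"accepted", "strong", "canonical", "revised"}

def is_trainable(record: dict) -> bool:
    """True only for reviewed statuses that may enter training."""
    return record.get("status") in TRAINABLE

def training_records(jsonl_path: str):
    """Stream training-eligible records from a reviewed JSONL file."""
    with open(jsonl_path) as f:
        for line in f:
            record = json.loads(line)
            if is_trainable(record):
                yield record
```

Building the training set exclusively through this function makes the "no raw candidates" rule structural rather than procedural.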
---

## 7. Why Modern-Contaminated Generator Models Are Acceptable

The generator model may contain modern assumptions.

That is acceptable because it is not the final model.

The generator is not trusted.

The human review gate is trusted.

This distinction is central:

```text
generator output = candidate quarry stone
reviewed output = vocabulary material
canonical output = simulator-ready phrase
```

The generator may suggest bad phrases.

The review process prevents them from becoming corpus material.

---

## 8. Local Model Evaluation

Evaluate local generator models by candidate yield, not by benchmark scores.

Useful metric:

```text
accepted candidates per 100 generated lines
```

Example:

```text
Qwen2.5-3B:
  1000 generated
  130 accepted
  22 strong
  5 canonical

Phi-3.5-mini:
  1000 generated
  90 accepted
  18 strong
  7 canonical

Gemma small:
  1000 generated
  110 accepted
  15 strong
  4 canonical
```

The best generator is the one that gives the most reviewable Roman-visible candidates per hour, not the one with the highest general benchmark score.

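The metric is simple division. Applied to the example counts above:

```python
def acceptance_yield(accepted: int, generated: int) -> float:
    """Accepted candidates per 100 generated lines."""
    return 100 * accepted / generated

print(acceptance_yield(130, 1000))  # Qwen2.5-3B  → 13.0
print(acceptance_yield(90, 1000))   # Phi-3.5-mini → 9.0
print(acceptance_yield(110, 1000))  # Gemma small  → 11.0
```

Dividing each yield by the wall-clock review hours for that batch gives the candidates-per-hour figure that actually decides the bakeoff.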
---

## 9. Batch Generation Strategy

Generate many small batches instead of one huge batch.

Recommended:

```text
20 candidates per prompt
50 prompts per run
1000 candidates per review session
```

Vary one dimension at a time.

Example batch family:

```text
object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Secundus

object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Felix

object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Chresimus
```

This reveals actor voice differences without changing the underlying simulator condition.

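The vary-one-dimension rule can be generated mechanically rather than written by hand. A sketch, where `batch_family` is a hypothetical helper name:

```python
def batch_family(base: dict, vary_key: str, values: list) -> list:
    """Hold every field of `base` fixed and vary exactly one dimension."""
    return [{**base, vary_key: v} for v in values]

base = {"object": "cart", "action": "hired_elsewhere",
        "pressure": "buyer_waiting", "actor_voice": "Secundus"}

# Reproduces the example family above: three specs differing only in voice.
family = batch_family(base, "actor_voice", ["Secundus", "Felix", "Chresimus"])
```

Each spec in `family` feeds directly into the Phase 2 batch driver as one prompt's input record.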
---

## 10. Temperature And Sampling

Start conservative.

Suggested settings:

```text
temperature: 0.8
top_p: 0.9
repeat_penalty: 1.1
num_predict: modest
context: modest
```

If output is too dull:

```text
raise temperature slightly
increase candidate count
add actor-specific examples
```

If output is too theatrical:

```text
lower temperature
add terse rule
add rejection examples
```

If output is too modern:

```text
strengthen forbidden terms
add Roman-visible examples
reduce abstract wording in prompt
```

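These settings map directly onto the `options` object of an Ollama `/api/generate` request. The `num_predict` and `num_ctx` values below are illustrative guesses at "modest", not tested recommendations:

```python
# Conservative starting sampler options for the Ollama request body.
GENERATION_OPTIONS = {
    "temperature": 0.8,     # raise slightly if output is dull
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "num_predict": 256,     # modest output length (guess)
    "num_ctx": 2048,        # modest context window (guess)
}

# Merged into the request payload alongside "model" and "prompt":
payload = {"model": "qwen2.5:3b", "prompt": "...",
           "stream": False, "options": GENERATION_OPTIONS}
```

Keeping the options in one dict makes each tuning change a one-line diff, which matters when comparing yield across runs.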
---

## 11. Data Files

Recommended folder layout:

```text
data/vocabulary/
  generator_inputs/
    objects.yaml
    actions.yaml
    pressures.yaml
    actor_voices.yaml

  candidates/
    candidates_YYYYMMDD.jsonl

  reviewed/
    roman_visible_expressions.jsonl
    canonical_templates.jsonl

  reports/
    generator_yield_report.txt
    review_summary.txt
```

---

## 12. Minimum Candidate Schema

```json
{
  "expression_id": "expr_000001",
  "created_at": "YYYY-MM-DD",
  "generator_model": "qwen2.5:3b",
  "domain": "commerce",
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "candidate": "The wheels are gone, and the buyer will not wait for our excuses.",
  "modern_meaning": "Cart capacity has been lost while the buyer is waiting.",
  "concept_tags": [
    "transport_capacity",
    "delay_cost",
    "buyer_need"
  ],
  "status": "candidate",
  "strength": null,
  "review_note": null
}
```

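A validator for this schema keeps malformed generator output out of the candidate files. A minimal sketch checking field presence and the status rule:

```python
# Field names taken from the minimum candidate schema above.
REQUIRED_FIELDS = {
    "expression_id", "created_at", "generator_model", "domain",
    "object", "action", "pressure", "actor_voice",
    "candidate", "modern_meaning", "concept_tags", "status",
    "strength", "review_note",
}

def validate_candidate(record: dict) -> list:
    """Return a list of schema problems; an empty list means valid."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - record.keys())]
    if "status" in record and record["status"] != "candidate":
        problems.append("generator output must start as status 'candidate'")
    return problems
```

Running this over each batch file before review catches truncated JSON lines and generator drift early.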
---

## 13. Promotion Schema

When promoted:

```json
{
  "expression_id": "expr_000001",
  "status": "strong",
  "reviewed_by": "human",
  "review_note": "Good Secundus line; concrete and reusable.",
  "promoted_to": [
    "roman_visible_expressions"
  ]
}
```

Canonical lines should be rare:

```json
{
  "expression_id": "expr_000019",
  "status": "canonical",
  "candidate": "The wheels are gone.",
  "canonical_condition": "transport_capacity_lost"
}
```

---

## 14. When To Move Beyond Ollama

Move from Ollama to llama.cpp or vLLM only if needed.

Reasons to move:

```text
need exact GGUF quant choice
need better batching control
need lower latency
need reproducible runtime parameters
need integration with a custom review server
```

Until then, Ollama is sufficient.

The priority is vocabulary yield, not infrastructure elegance.

---

## 15. Near-Term Test Plan

Run a small bakeoff.

Models:

```text
qwen2.5:3b
phi3.5-mini-instruct quantized
gemma small instruct model
```

Prompts:

```text
10 object/action/pressure combinations
6 actor voices
20 candidates each
```

Total:

```text
10 * 6 * 20 = 1200 candidates per model
```

Human review outcome:

```text
accepted count
strong count
canonical count
modern contamination count
too theatrical count
duplicate count
```

Pick the generator model by accepted/strong yield per review hour.

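The outcome counts above reduce to a tally over the review marks. A small sketch:

```python
from collections import Counter

def review_tally(outcomes: list) -> Counter:
    """Count review outcomes for one model's bakeoff batch."""
    return Counter(outcomes)

# Illustrative marks from a (tiny) review session, not real data.
counts = review_tally(["accepted", "rejected", "accepted",
                       "strong", "duplicate"])
print(counts["accepted"])  # → 2
```

One such `Counter` per model, divided by that model's review hours, gives the comparison number the bakeoff calls for.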
---

## 16. Recommendation

Begin with:

```text
Ollama + qwen2.5:3b
```

Use it to generate candidate vocabulary only.

Do not use it as authority.

Do not train on its raw output.

Do not let it decide canonical vocabulary.

The first success condition is simple:

```text
Can the local generator produce enough reviewable Roman-visible candidates to make human review faster than hand-authoring?
```

If yes, the deployment is successful.

If no, test Phi-3.5-mini and Gemma small models with the same input batches.

---

## 17. Success Condition

This model-selection process is working if it produces:

```text
high candidate volume
low deployment friction
fast human review
rising accepted-expression count
a small canonical phrase library
better dialogue voice
less modern vocabulary
```

The correct measure is not model intelligence.

The correct measure is vocabulary throughput.

The generator does not need to be Roman.

The reviewed output does.