diff --git a/docs/training/chunking/GENERATOR-MODEL-SELECTION-0001.md b/docs/training/chunking/GENERATOR-MODEL-SELECTION-0001.md new file mode 100644 index 0000000..5a763d3 --- /dev/null +++ b/docs/training/chunking/GENERATOR-MODEL-SELECTION-0001.md @@ -0,0 +1,640 @@ +# GENERATOR-MODEL-SELECTION-0001 +## Local Model Selection And Deployment For The OTIVM Vocabulary Generator +### Status: Draft Standard +### Layer: Training Infrastructure +### Purpose: Select and deploy a small local model for generating Roman-visible vocabulary candidates +### Repository Path: docs/training/chunking/GENERATOR-MODEL-SELECTION-0001.md + +--- + +## 0. Purpose + +This document defines a practical model-selection and deployment plan for the OTIVM Roman-visible expression generator. + +The generator is not the CIVICUS-ROMAN model. + +The generator is a tool used to produce candidate phrases. + +Most generated phrases may be weak. + +Only reviewed and accepted expressions become training material. + +The generator is quarry equipment. + +The reviewed vocabulary is the stone. + +--- + +## 1. Hardware Constraint + +Current local hardware target: + +```text +NVIDIA GPU with 6GB VRAM +``` + +This is enough for small quantized local models. + +It is not the right target for full model training. + +It is sufficient for: + +```text +candidate expression generation +small-batch phrase variation +actor-voice experiments +object/action/pressure recombination +quick local iteration +offline review workflows +``` + +It should not be used yet for: + +```text +full CIVICUS-ROMAN training +large-context corpus analysis +unsupervised corpus promotion +automatic canonical selection +``` + +--- + +## 2. 
Primary Recommendation + +Start with: + +```text +Model: Qwen2.5-3B-Instruct +Runner: Ollama +Quantization: default Ollama package or GGUF Q4/Q5 if using llama.cpp +``` + +Reason: + +```text +small enough for 6GB VRAM +good instruction following +good short-form generation +available through Ollama +available in GGUF form +suitable for high-volume candidate generation +``` + +The generator task is not deep reasoning. + +It is constrained phrase production. + +A 3B instruct model is enough to begin. + +--- + +## 3. Backup Models + +### Phi-3.5-mini-instruct + +Use if Qwen2.5-3B gives too much decorative prose or weak instruction following. + +Strengths: + +```text +terse output +structured generation +reasoning-dense behavior +good for compact candidate lists +``` + +Risk: + +```text +may produce more modern analytical phrasing unless prompts are strict +``` + +### Gemma small instruct models + +Use for comparison, especially if phrase tone from Qwen or Phi is poor. + +Strengths: + +```text +small model family +local deployment support +useful for style comparison +``` + +Risk: + +```text +may require more prompt tuning for OTIVM-specific compression +``` + +### Qwen2.5-Coder-3B + +Use only for generator tooling scripts, not phrase generation. + +Strengths: + +```text +code generation +JSONL tools +review UI helpers +validator scripts +``` + +Risk: + +```text +not the right primary voice generator +``` + +--- + +## 4. Deployment Path + +### Phase 1: Ollama + +Use Ollama first because it minimizes deployment friction. + +Install and run: + +```bash +ollama pull qwen2.5:3b +ollama run qwen2.5:3b +``` + +Test with direct prompt batches. + +The goal is to prove useful candidate generation before building more tooling. + +### Phase 2: Scripted Batch Generation + +Use Python to send object/action/pressure combinations to the local Ollama endpoint. 
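The Phase 2 batch step above can be sketched as a small script. This is a minimal illustration, not the project's tooling: the function names and the line-per-candidate parsing are assumptions, and only the prompt fields and model tag come from this document. It targets Ollama's default local HTTP endpoint.

```python
# Sketch of Phase 2 batch generation against a local Ollama endpoint.
# build_prompt/generate_candidates are illustrative names, not project code.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_prompt(combo: dict) -> str:
    """Render one object/action/pressure combination as a generator prompt."""
    return (
        f"Object: {combo['object']}\n"
        f"Action: {combo['action']}\n"
        f"Pressure: {combo['pressure']}\n"
        f"Actor voice: {combo['actor_voice']}\n"
        f"Generate {combo['count']} candidates."
    )


def generate_candidates(combo: dict) -> list[str]:
    """POST the prompt to Ollama; treat each non-empty response line as a candidate."""
    payload = {"model": "qwen2.5:3b", "prompt": build_prompt(combo), "stream": False}
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [line.strip() for line in body["response"].splitlines() if line.strip()]
```

The strict rule block from Section 5 would be prepended to the prompt in practice; it is omitted here for brevity.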
+ +Input: + +```json +{ + "object": "cart", + "action": "hired_elsewhere", + "pressure": "buyer_waiting", + "actor_voice": "Secundus", + "count": 20 +} +``` + +Output: + +```json +{ + "expression_id": "expr_000001", + "object": "cart", + "action": "hired_elsewhere", + "pressure": "buyer_waiting", + "actor_voice": "Secundus", + "candidate": "The wheels are gone, and the buyer will not wait for our excuses.", + "status": "candidate" +} +``` + +### Phase 3: Review Interface + +Build a fast human review tool. + +Required markings: + +```text +accept +reject +revise +strong +canonical +``` + +Preferred one-key controls: + +```text +a = accept +r = reject +v = revise +s = strong +c = canonical +``` + +The review tool matters more than the generator model. + +--- + +## 5. Generator Prompt Pattern + +Use a strict prompt. + +Example: + +```text +You generate Roman-visible commercial expressions for OTIVM. + +Rules: +- Do not explain. +- Do not use modern business language. +- Do not use words like logistics, liquidity, market efficiency, regulatory, contract compliance, metadata, model, training, or optimization. +- Use concrete objects, actions, and pressures. +- Prefer terse lines. +- Produce candidate lines only. + +Object: cart +Action: hired elsewhere +Pressure: buyer waiting +Actor voice: Secundus + +Generate 20 candidates. +``` + +Expected useful outputs: + +```text +The wheels are gone. +The buyer will not wait for empty ruts. +Ten jars can still go by mule. +Naso bought the road before the oil moved. +``` + +Bad outputs: + +```text +Transport capacity is constrained. +The supply chain is disrupted. +We need to optimize the delivery channel. +This represents a logistical bottleneck. +``` + +--- + +## 6. Output Rule + +The generator output must never enter training directly. 
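The training gate this rule implies can be sketched as a filter over the review JSONL. This is an assumption-level sketch: `load_trainable` and the status set are illustrative, with field names taken from the candidate schema later in this document. Human-revised lines are assumed to be re-marked as accepted or better during review.

```python
# Sketch of the training gate: only reviewed statuses pass.
# Raw candidates, rejected lines, and unreviewed batches never do.
import json

TRAINABLE = {"accepted", "strong", "canonical"}


def load_trainable(jsonl_lines) -> list[dict]:
    """Parse JSONL review records and keep only trainable statuses."""
    kept = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("status") in TRAINABLE:
            kept.append(record)
    return kept
```

Because it takes any iterable of lines, the same function works on an open file handle over `reviewed/roman_visible_expressions.jsonl`.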
+ +All generated output begins as: + +```text +status: candidate +``` + +Only reviewed material can become: + +```text +accepted +strong +canonical +``` + +Training may use: + +```text +accepted expressions +strong expressions +canonical expressions +human-revised expressions +dialogue lines based on reviewed expressions +``` + +Training must not use: + +```text +raw generated candidates +rejected candidates +unreviewed batches +candidate churn +``` + +--- + +## 7. Why Modern-Contaminated Generator Models Are Acceptable + +The generator model may contain modern assumptions. + +That is acceptable because it is not the final model. + +The generator is not trusted. + +The human review gate is trusted. + +This distinction is central: + +```text +generator output = candidate quarry stone +reviewed output = vocabulary material +canonical output = simulator-ready phrase +``` + +The generator may suggest bad phrases. + +The review process prevents them from becoming corpus material. + +--- + +## 8. Local Model Evaluation + +Evaluate local generator models by candidate yield, not by benchmark scores. + +Useful metric: + +```text +accepted candidates per 100 generated lines +``` + +Example: + +```text +Qwen2.5-3B: + 1000 generated + 130 accepted + 22 strong + 5 canonical + +Phi-3.5-mini: + 1000 generated + 90 accepted + 18 strong + 7 canonical + +Gemma small: + 1000 generated + 110 accepted + 15 strong + 4 canonical +``` + +The best generator is the one that gives the most reviewable Roman-visible candidates per hour. + +Not the one with the highest general model score. + +--- + +## 9. Batch Generation Strategy + +Generate many small batches instead of one huge batch. + +Recommended: + +```text +20 candidates per prompt +50 prompts per run +1000 candidates per review session +``` + +Vary one dimension at a time. 
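The one-dimension rule can be sketched as a small helper. The helper name is illustrative; the field names and example values are taken from the batch family shown in this section.

```python
# Sketch of batch-family construction: hold the base combination fixed
# and vary exactly one dimension per family.
def batch_family(base: dict, dimension: str, values: list) -> list[dict]:
    """Return one prompt spec per value, differing only in `dimension`."""
    return [{**base, dimension: v} for v in values]


base = {
    "object": "cart",
    "action": "hired_elsewhere",
    "pressure": "buyer_waiting",
    "actor_voice": "Secundus",
    "count": 20,
}
family = batch_family(base, "actor_voice", ["Secundus", "Felix", "Chresimus"])
```

Each spec in `family` can then be fed to the Phase 2 generation step unchanged.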
+ +Example batch family: + +```text +object: cart +action: hired_elsewhere +pressure: buyer_waiting +actor_voice: Secundus + +object: cart +action: hired_elsewhere +pressure: buyer_waiting +actor_voice: Felix + +object: cart +action: hired_elsewhere +pressure: buyer_waiting +actor_voice: Chresimus +``` + +This reveals actor voice differences without changing the underlying simulator condition. + +--- + +## 10. Temperature And Sampling + +Start conservative. + +Suggested settings: + +```text +temperature: 0.8 +top_p: 0.9 +repeat_penalty: 1.1 +num_predict: modest +context: modest +``` + +If output is too dull: + +```text +raise temperature slightly +increase candidate count +add actor-specific examples +``` + +If output is too theatrical: + +```text +lower temperature +add terse rule +add rejection examples +``` + +If output is too modern: + +```text +strengthen forbidden terms +add Roman-visible examples +reduce abstract wording in prompt +``` + +--- + +## 11. Data Files + +Recommended folder layout: + +```text +data/vocabulary/ + generator_inputs/ + objects.yaml + actions.yaml + pressures.yaml + actor_voices.yaml + + candidates/ + candidates_YYYYMMDD.jsonl + + reviewed/ + roman_visible_expressions.jsonl + canonical_templates.jsonl + + reports/ + generator_yield_report.txt + review_summary.txt +``` + +--- + +## 12. Minimum Candidate Schema + +```json +{ + "expression_id": "expr_000001", + "created_at": "YYYY-MM-DD", + "generator_model": "qwen2.5:3b", + "domain": "commerce", + "object": "cart", + "action": "hired_elsewhere", + "pressure": "buyer_waiting", + "actor_voice": "Secundus", + "candidate": "The wheels are gone, and the buyer will not wait for our excuses.", + "modern_meaning": "Cart capacity has been lost while the buyer is waiting.", + "concept_tags": [ + "transport_capacity", + "delay_cost", + "buyer_need" + ], + "status": "candidate", + "strength": null, + "review_note": null +} +``` + +--- + +## 13. 
Promotion Schema + +When promoted: + +```json +{ + "expression_id": "expr_000001", + "status": "strong", + "reviewed_by": "human", + "review_note": "Good Secundus line; concrete and reusable.", + "promoted_to": [ + "roman_visible_expressions" + ] +} +``` + +Canonical lines should be rare: + +```json +{ + "expression_id": "expr_000019", + "status": "canonical", + "candidate": "The wheels are gone.", + "canonical_condition": "transport_capacity_lost" +} +``` + +--- + +## 14. When To Move Beyond Ollama + +Move from Ollama to llama.cpp or vLLM only if needed. + +Reasons to move: + +```text +need exact GGUF quant choice +need better batching control +need lower latency +need reproducible runtime parameters +need integration with a custom review server +``` + +Until then, Ollama is sufficient. + +The priority is vocabulary yield, not infrastructure elegance. + +--- + +## 15. Near-Term Test Plan + +Run a small bakeoff. + +Models: + +```text +qwen2.5:3b +phi3.5-mini-instruct quantized +gemma small instruct model +``` + +Prompts: + +```text +10 object/action/pressure combinations +6 actor voices +20 candidates each +``` + +Total: + +```text +10 * 6 * 20 = 1200 candidates per model +``` + +Human review outcome: + +```text +accepted count +strong count +canonical count +modern contamination count +too theatrical count +duplicate count +``` + +Pick the generator model by accepted/strong yield per review hour. + +--- + +## 16. Recommendation + +Begin with: + +```text +Ollama + qwen2.5:3b +``` + +Use it to generate candidate vocabulary only. + +Do not use it as authority. + +Do not train on its raw output. + +Do not let it decide canonical vocabulary. + +The first success condition is simple: + +```text +Can the local generator produce enough reviewable Roman-visible candidates to make human review faster than hand-authoring? +``` + +If yes, the deployment is successful. + +If no, test Phi-3.5-mini and Gemma small models with the same input batches. + +--- + +## 17. 
Success Condition + +This model-selection process is working if it produces: + +```text +high candidate volume +low deployment friction +fast human review +rising accepted-expression count +a small canonical phrase library +better dialogue voice +less modern vocabulary +``` + +The correct measure is not model intelligence. + +The correct measure is vocabulary throughput. + +The generator does not need to be Roman. + +The reviewed output does.