# GENERATOR-MODEL-SELECTION-0001

## Local Model Selection And Deployment For The OTIVM Vocabulary Generator

### Status: Draft Standard

### Layer: Training Infrastructure

### Purpose: Select and deploy a small local model for generating Roman-visible vocabulary candidates

### Repository Path: docs/training/chunking/GENERATOR-MODEL-SELECTION-0001.md

---
## 0. Purpose

This document defines a practical model-selection and deployment plan for the OTIVM Roman-visible expression generator.

The generator is not the CIVICUS-ROMAN model.

The generator is a tool used to produce candidate phrases.

Most generated phrases may be weak.

Only reviewed and accepted expressions become training material.

The generator is quarry equipment.

The reviewed vocabulary is the stone.

---
## 1. Hardware Constraint

Current local hardware target:

```text
NVIDIA GPU with 6GB VRAM
```

This is enough for small quantized local models.

It is not the right target for full model training.

It is sufficient for:

```text
candidate expression generation
small-batch phrase variation
actor-voice experiments
object/action/pressure recombination
quick local iteration
offline review workflows
```

It should not be used yet for:

```text
full CIVICUS-ROMAN training
large-context corpus analysis
unsupervised corpus promotion
automatic canonical selection
```

---
## 2. Primary Recommendation

Start with:

```text
Model: Qwen2.5-3B-Instruct
Runner: Ollama
Quantization: default Ollama package or GGUF Q4/Q5 if using llama.cpp
```

Reason:

```text
small enough for 6GB VRAM
good instruction following
good short-form generation
available through Ollama
available in GGUF form
suitable for high-volume candidate generation
```

The generator task is not deep reasoning.

It is constrained phrase production.

A 3B instruct model is enough to begin.

---
## 3. Backup Models

### Phi-3.5-mini-instruct

Use if Qwen2.5-3B gives too much decorative prose or weak instruction following.

Strengths:

```text
terse output
structured generation
reasoning-dense behavior
good for compact candidate lists
```

Risk:

```text
may produce more modern analytical phrasing unless prompts are strict
```

### Gemma small instruct models

Use for comparison, especially if phrase tone from Qwen or Phi is poor.

Strengths:

```text
small model family
local deployment support
useful for style comparison
```

Risk:

```text
may require more prompt tuning for OTIVM-specific compression
```

### Qwen2.5-Coder-3B

Use only for generator tooling scripts, not phrase generation.

Strengths:

```text
code generation
JSONL tools
review UI helpers
validator scripts
```

Risk:

```text
not the right primary voice generator
```

---
## 4. Deployment Path

### Phase 1: Ollama

Use Ollama first because it minimizes deployment friction.

Install and run:

```bash
ollama pull qwen2.5:3b
ollama run qwen2.5:3b
```

Test with direct prompt batches.

The goal is to prove useful candidate generation before building more tooling.
### Phase 2: Scripted Batch Generation

Use Python to send object/action/pressure combinations to the local Ollama endpoint.

Input:

```json
{
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "count": 20
}
```

Output:

```json
{
  "expression_id": "expr_000001",
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "candidate": "The wheels are gone, and the buyer will not wait for our excuses.",
  "status": "candidate"
}
```
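A minimal sketch of this batch step, assuming Ollama's default local HTTP endpoint (`http://localhost:11434/api/generate`); the helper names are illustrative, not part of any standard:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint


def build_payload(combo: dict, model: str = "qwen2.5:3b") -> dict:
    """Turn one object/action/pressure input record into an Ollama generate request."""
    prompt = (
        "You generate Roman-visible commercial expressions for OTIVM.\n"
        f"Object: {combo['object']}\n"
        f"Action: {combo['action']}\n"
        f"Pressure: {combo['pressure']}\n"
        f"Actor voice: {combo['actor_voice']}\n"
        f"Generate {combo['count']} candidates."
    )
    return {"model": model, "prompt": prompt, "stream": False}


def generate_batch(combo: dict) -> str:
    """POST one batch request to the local Ollama server and return the raw text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(combo)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Each returned line would then be wrapped in a candidate record with `status: candidate` before it touches disk.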
### Phase 3: Review Interface

Build a fast human review tool.

Required markings:

```text
accept
reject
revise
strong
canonical
```

Preferred one-key controls:

```text
a = accept
r = reject
v = revise
s = strong
c = canonical
```

The review tool matters more than the generator model.

---
## 5. Generator Prompt Pattern

Use a strict prompt.

Example:

```text
You generate Roman-visible commercial expressions for OTIVM.

Rules:
- Do not explain.
- Do not use modern business language.
- Do not use words like logistics, liquidity, market efficiency, regulatory, contract compliance, metadata, model, training, or optimization.
- Use concrete objects, actions, and pressures.
- Prefer terse lines.
- Produce candidate lines only.

Object: cart
Action: hired elsewhere
Pressure: buyer waiting
Actor voice: Secundus

Generate 20 candidates.
```

Expected useful outputs:

```text
The wheels are gone.
The buyer will not wait for empty ruts.
Ten jars can still go by mule.
Naso bought the road before the oil moved.
```

Bad outputs:

```text
Transport capacity is constrained.
The supply chain is disrupted.
We need to optimize the delivery channel.
This represents a logistical bottleneck.
```
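The strict prompt can be assembled programmatically so the forbidden-terms list lives in one place; a sketch in Python (constant and function names are illustrative):

```python
FORBIDDEN_TERMS = [
    "logistics", "liquidity", "market efficiency", "regulatory",
    "contract compliance", "metadata", "model", "training", "optimization",
]


def build_strict_prompt(obj: str, action: str, pressure: str,
                        actor_voice: str, count: int = 20) -> str:
    """Assemble the strict generator prompt from one simulator condition."""
    rules = [
        "- Do not explain.",
        "- Do not use modern business language.",
        "- Do not use words like " + ", ".join(FORBIDDEN_TERMS) + ".",
        "- Use concrete objects, actions, and pressures.",
        "- Prefer terse lines.",
        "- Produce candidate lines only.",
    ]
    return "\n".join([
        "You generate Roman-visible commercial expressions for OTIVM.",
        "",
        "Rules:",
        *rules,
        "",
        f"Object: {obj}",
        f"Action: {action}",
        f"Pressure: {pressure}",
        f"Actor voice: {actor_voice}",
        "",
        f"Generate {count} candidates.",
    ])
```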
---

## 6. Output Rule

The generator output must never enter training directly.

All generated output begins as:

```text
status: candidate
```

Only reviewed material can become:

```text
accepted
strong
canonical
```

Training may use:

```text
accepted expressions
strong expressions
canonical expressions
human-revised expressions
dialogue lines based on reviewed expressions
```

Training must not use:

```text
raw generated candidates
rejected candidates
unreviewed batches
candidate churn
```
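This gate reduces to one status check when assembling a training file; a minimal sketch, assuming human-revised lines are re-saved with a `revised` status (the function name is illustrative):

```python
# Statuses this document allows into training; raw candidates never pass.
TRAINING_ELIGIBLE = {"accepted", "strong", "canonical", "revised"}


def training_rows(records: list[dict]) -> list[dict]:
    """Keep only reviewed records; candidates and rejects are filtered out."""
    return [r for r in records if r.get("status") in TRAINING_ELIGIBLE]
```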
---

## 7. Why Modern-Contaminated Generator Models Are Acceptable

The generator model may contain modern assumptions.

That is acceptable because it is not the final model.

The generator is not trusted.

The human review gate is trusted.

This distinction is central:

```text
generator output = candidate quarry stone
reviewed output = vocabulary material
canonical output = simulator-ready phrase
```

The generator may suggest bad phrases.

The review process prevents them from becoming corpus material.

---
## 8. Local Model Evaluation

Evaluate local generator models by candidate yield, not by benchmark scores.

Useful metric:

```text
accepted candidates per 100 generated lines
```

Example:

```text
Qwen2.5-3B:
  1000 generated
  130 accepted
  22 strong
  5 canonical

Phi-3.5-mini:
  1000 generated
  90 accepted
  18 strong
  7 canonical

Gemma small:
  1000 generated
  110 accepted
  15 strong
  4 canonical
```

The best generator is the one that gives the most reviewable Roman-visible candidates per hour.

Not the one with the highest general model score.
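The yield metric is a one-line computation; a sketch using the example counts from this section:

```python
def yield_per_100(accepted: int, generated: int) -> float:
    """Accepted candidates per 100 generated lines."""
    return 100.0 * accepted / generated


# (accepted, generated) pairs from the example above
results = {
    "Qwen2.5-3B": (130, 1000),
    "Phi-3.5-mini": (90, 1000),
    "Gemma small": (110, 1000),
}
yields = {m: yield_per_100(a, g) for m, (a, g) in results.items()}
# With these example counts, Qwen2.5-3B leads at 13.0 accepted per 100.
```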
---

## 9. Batch Generation Strategy

Generate many small batches instead of one huge batch.

Recommended:

```text
20 candidates per prompt
50 prompts per run
1000 candidates per review session
```

Vary one dimension at a time.

Example batch family:

```text
object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Secundus

object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Felix

object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Chresimus
```

This reveals actor-voice differences without changing the underlying simulator condition.
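Holding the condition fixed while varying one field can be expressed directly; a sketch (names illustrative):

```python
def batch_family(base: dict, vary_key: str, values: list[str]) -> list[dict]:
    """Produce one batch input per value, changing only vary_key."""
    return [{**base, vary_key: v} for v in values]


base = {"object": "cart", "action": "hired_elsewhere", "pressure": "buyer_waiting"}
family = batch_family(base, "actor_voice", ["Secundus", "Felix", "Chresimus"])
```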
---

## 10. Temperature And Sampling

Start conservative.

Suggested settings:

```text
temperature: 0.8
top_p: 0.9
repeat_penalty: 1.1
num_predict: modest
context: modest
```

If output is too dull:

```text
raise temperature slightly
increase candidate count
add actor-specific examples
```

If output is too theatrical:

```text
lower temperature
add terse rule
add rejection examples
```

If output is too modern:

```text
strengthen forbidden terms
add Roman-visible examples
reduce abstract wording in prompt
```
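These settings map onto the `options` object of an Ollama generate request; a sketch, where the `num_predict` and `num_ctx` values are illustrative "modest" choices rather than figures from this document:

```python
SAMPLING = {
    "temperature": 0.8,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "num_predict": 256,  # assumption: a "modest" output budget
    "num_ctx": 2048,     # assumption: a "modest" context window
}


def nudge_temperature(options: dict, verdict: str) -> dict:
    """Apply the tuning rules above: too dull -> warmer, too theatrical -> cooler."""
    step = {"too_dull": 0.1, "too_theatrical": -0.1}.get(verdict, 0.0)
    out = dict(options)
    out["temperature"] = round(out["temperature"] + step, 2)
    return out
```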
---

## 11. Data Files

Recommended folder layout:

```text
data/vocabulary/
  generator_inputs/
    objects.yaml
    actions.yaml
    pressures.yaml
    actor_voices.yaml

  candidates/
    candidates_YYYYMMDD.jsonl

  reviewed/
    roman_visible_expressions.jsonl
    canonical_templates.jsonl

  reports/
    generator_yield_report.txt
    review_summary.txt
```

---
## 12. Minimum Candidate Schema

```json
{
  "expression_id": "expr_000001",
  "created_at": "YYYY-MM-DD",
  "generator_model": "qwen2.5:3b",
  "domain": "commerce",
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "candidate": "The wheels are gone, and the buyer will not wait for our excuses.",
  "modern_meaning": "Cart capacity has been lost while the buyer is waiting.",
  "concept_tags": [
    "transport_capacity",
    "delay_cost",
    "buyer_need"
  ],
  "status": "candidate",
  "strength": null,
  "review_note": null
}
```
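A validator against this minimum schema keeps malformed candidates out of the JSONL files; a sketch (the function name is illustrative):

```python
REQUIRED_FIELDS = {
    "expression_id", "created_at", "generator_model", "domain",
    "object", "action", "pressure", "actor_voice",
    "candidate", "modern_meaning", "concept_tags",
    "status", "strength", "review_note",
}


def validate_candidate(record: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the record is valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - record.keys())]
    if record.get("status") != "candidate":
        problems.append("new records must start with status 'candidate'")
    if not isinstance(record.get("concept_tags"), list):
        problems.append("concept_tags must be a list")
    return problems
```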
---

## 13. Promotion Schema

When promoted:

```json
{
  "expression_id": "expr_000001",
  "status": "strong",
  "reviewed_by": "human",
  "review_note": "Good Secundus line; concrete and reusable.",
  "promoted_to": [
    "roman_visible_expressions"
  ]
}
```

Canonical lines should be rare:

```json
{
  "expression_id": "expr_000019",
  "status": "canonical",
  "candidate": "The wheels are gone.",
  "canonical_condition": "transport_capacity_lost"
}
```
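Promotion is a pure record update that only the human review step may perform; a sketch (the function name and target-list value are illustrative):

```python
PROMOTABLE = {"accepted", "strong", "canonical"}


def promote(record: dict, new_status: str, note: str) -> dict:
    """Return a promoted copy; refuses statuses outside the review vocabulary."""
    if new_status not in PROMOTABLE:
        raise ValueError(f"not a promotable status: {new_status}")
    promoted = dict(record)
    promoted.update(
        status=new_status,
        reviewed_by="human",
        review_note=note,
        promoted_to=["roman_visible_expressions"],
    )
    return promoted
```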
---

## 14. When To Move Beyond Ollama

Move from Ollama to llama.cpp or vLLM only if needed.

Reasons to move:

```text
need exact GGUF quant choice
need better batching control
need lower latency
need reproducible runtime parameters
need integration with a custom review server
```

Until then, Ollama is sufficient.

The priority is vocabulary yield, not infrastructure elegance.

---
## 15. Near-Term Test Plan

Run a small bake-off.

Models:

```text
qwen2.5:3b
phi3.5-mini-instruct (quantized)
gemma small instruct model
```

Prompts:

```text
10 object/action/pressure combinations
6 actor voices
20 candidates each
```

Total:

```text
10 * 6 * 20 = 1200 candidates per model
```

Human review outcome:

```text
accepted count
strong count
canonical count
modern contamination count
too theatrical count
duplicate count
```

Pick the generator model by accepted/strong yield per review hour.
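The bake-off arithmetic and the selection rule fit in a few lines; the review-hour figure is whatever the reviewer logs (a sketch, names illustrative):

```python
COMBOS, VOICES, PER_PROMPT = 10, 6, 20
total_per_model = COMBOS * VOICES * PER_PROMPT  # 1200 candidates per model


def review_hour_yield(accepted: int, strong: int, review_hours: float) -> float:
    """Accepted-plus-strong lines produced per hour of human review."""
    return (accepted + strong) / review_hours
```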
---

## 16. Recommendation

Begin with:

```text
Ollama + qwen2.5:3b
```

Use it to generate candidate vocabulary only.

Do not use it as authority.

Do not train on its raw output.

Do not let it decide canonical vocabulary.

The first success condition is simple:

```text
Can the local generator produce enough reviewable Roman-visible candidates to make human review faster than hand-authoring?
```

If yes, the deployment is successful.

If no, test Phi-3.5-mini and Gemma small models with the same input batches.

---
## 17. Success Condition

This model-selection process is working if it produces:

```text
high candidate volume
low deployment friction
fast human review
rising accepted-expression count
a small canonical phrase library
better dialogue voice
less modern vocabulary
```

The correct measure is not model intelligence.

The correct measure is vocabulary throughput.

The generator does not need to be Roman.

The reviewed output does.