# GENERATOR-MODEL-SELECTION-0001

## Local Model Selection And Deployment For The OTIVM Vocabulary Generator

### Status: Draft Standard

### Layer: Training Infrastructure

### Purpose: Select and deploy a small local model for generating Roman-visible vocabulary candidates

### Repository Path: docs/training/chunking/GENERATOR-MODEL-SELECTION-0001.md

---

## 0. Purpose

This document defines a practical model-selection and deployment plan for the OTIVM Roman-visible expression generator.

The generator is not the CIVICUS-ROMAN model.

The generator is a tool used to produce candidate phrases.

Most generated phrases may be weak.

Only reviewed and accepted expressions become training material.

The generator is quarry equipment.

The reviewed vocabulary is the stone.

---

## 1. Hardware Constraint

Current local hardware target:

```text
NVIDIA GPU with 6GB VRAM
```

This is enough for small quantized local models.

It is not the right target for full model training.

It is sufficient for:

```text
candidate expression generation
small-batch phrase variation
actor-voice experiments
object/action/pressure recombination
quick local iteration
offline review workflows
```

It should not be used yet for:

```text
full CIVICUS-ROMAN training
large-context corpus analysis
unsupervised corpus promotion
automatic canonical selection
```

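The 6GB budget can be sanity-checked with rough arithmetic. A sketch, assuming roughly 4.5 effective bits per weight for a Q4 quantization and a flat allowance for KV cache, activations, and runtime buffers (both numbers are approximations, not measured values):

```python
def approx_vram_gb(params_billion: float, bits_per_weight: float,
                   overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weight size plus a flat
    allowance for KV cache, activations, and runtime buffers."""
    weight_gb = params_billion * 1e9 * (bits_per_weight / 8) / (1024 ** 3)
    return weight_gb + overhead_gb

# A 3B model at ~4.5 effective bits/weight stays well under 6 GB.
print(f"{approx_vram_gb(3.0, 4.5):.1f} GB")  # → 3.1 GB
```

This is why a 3B model is the comfortable ceiling here: a 7B model at the same quantization roughly doubles the weight term and leaves little headroom for context.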
---

## 2. Primary Recommendation

Start with:

```text
Model: Qwen2.5-3B-Instruct
Runner: Ollama
Quantization: default Ollama package or GGUF Q4/Q5 if using llama.cpp
```

Reason:

```text
small enough for 6GB VRAM
good instruction following
good short-form generation
available through Ollama
available in GGUF form
suitable for high-volume candidate generation
```

The generator task is not deep reasoning.

It is constrained phrase production.

A 3B instruct model is enough to begin.

---

## 3. Backup Models

### Phi-3.5-mini-instruct

Use if Qwen2.5-3B gives too much decorative prose or weak instruction following.

Strengths:

```text
terse output
structured generation
reasoning-dense behavior
good for compact candidate lists
```

Risk:

```text
may produce more modern analytical phrasing unless prompts are strict
```

### Gemma small instruct models

Use for comparison, especially if phrase tone from Qwen or Phi is poor.

Strengths:

```text
small model family
local deployment support
useful for style comparison
```

Risk:

```text
may require more prompt tuning for OTIVM-specific compression
```

### Qwen2.5-Coder-3B

Use only for generator tooling scripts, not phrase generation.

Strengths:

```text
code generation
JSONL tools
review UI helpers
validator scripts
```

Risk:

```text
not the right primary voice generator
```

---

## 4. Deployment Path

### Phase 1: Ollama

Use Ollama first because it minimizes deployment friction.

Install and run:

```bash
ollama pull qwen2.5:3b
ollama run qwen2.5:3b
```

Test with direct prompt batches.

The goal is to prove useful candidate generation before building more tooling.

### Phase 2: Scripted Batch Generation

Use Python to send object/action/pressure combinations to the local Ollama endpoint.

Input:

```json
{
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "count": 20
}
```

Output:

```json
{
  "expression_id": "expr_000001",
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "candidate": "The wheels are gone, and the buyer will not wait for our excuses.",
  "status": "candidate"
}
```

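A minimal driver for this phase can be sketched against Ollama's local HTTP endpoint. The field names mirror the input record above; the prompt wording inside `build_prompt` is a sketch, not the final prompt standard:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_prompt(spec: dict) -> str:
    """Render one object/action/pressure input record into a strict
    generator prompt (wording here is illustrative)."""
    return (
        "You generate Roman-visible commercial expressions for OTIVM.\n"
        "Rules: do not explain; avoid modern business language; "
        "use concrete objects, actions, and pressures; prefer terse lines.\n"
        f"Object: {spec['object']}\n"
        f"Action: {spec['action']}\n"
        f"Pressure: {spec['pressure']}\n"
        f"Actor voice: {spec['actor_voice']}\n"
        f"Generate {spec['count']} candidates."
    )

def generate(spec: dict, model: str = "qwen2.5:3b") -> str:
    """POST one prompt to Ollama and return the raw completion text."""
    payload = json.dumps({"model": model, "prompt": build_prompt(spec),
                          "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    spec = {"object": "cart", "action": "hired_elsewhere",
            "pressure": "buyer_waiting", "actor_voice": "Secundus",
            "count": 20}
    print(generate(spec))
```

Splitting the returned text into lines and wrapping each one in the candidate schema is then a small follow-on step.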
### Phase 3: Review Interface

Build a fast human review tool.

Required markings:

```text
accept
reject
revise
strong
canonical
```

Preferred one-key controls:

```text
a = accept
r = reject
v = revise
s = strong
c = canonical
```

The review tool matters more than the generator model.

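A terminal version of this tool fits in a few lines. A sketch, where `apply_mark` is a hypothetical helper and a real tool would add undo and duplicate detection:

```python
import json

# One-key controls mapped to review statuses.
MARKS = {"a": "accepted", "r": "rejected", "v": "revise",
         "s": "strong", "c": "canonical"}

def apply_mark(record: dict, key: str) -> dict:
    """Return a copy of the candidate record with its review status set."""
    if key not in MARKS:
        raise ValueError(f"unknown review key: {key!r}")
    marked = dict(record)
    marked["status"] = MARKS[key]
    return marked

def review(in_path: str, out_path: str) -> None:
    """Show each candidate line, read one key, write the marked record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            print(record["candidate"])
            key = input("[a/r/v/s/c] > ").strip().lower()
            dst.write(json.dumps(apply_mark(record, key)) + "\n")
```

At roughly one keypress per line, a thousand-candidate batch is a single review session.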
---

## 5. Generator Prompt Pattern

Use a strict prompt.

Example:

```text
You generate Roman-visible commercial expressions for OTIVM.

Rules:
- Do not explain.
- Do not use modern business language.
- Do not use words like logistics, liquidity, market efficiency, regulatory, contract compliance, metadata, model, training, or optimization.
- Use concrete objects, actions, and pressures.
- Prefer terse lines.
- Produce candidate lines only.

Object: cart
Action: hired elsewhere
Pressure: buyer waiting
Actor voice: Secundus

Generate 20 candidates.
```

Expected useful outputs:

```text
The wheels are gone.
The buyer will not wait for empty ruts.
Ten jars can still go by mule.
Naso bought the road before the oil moved.
```

Bad outputs:

```text
Transport capacity is constrained.
The supply chain is disrupted.
We need to optimize the delivery channel.
This represents a logistical bottleneck.
```

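The forbidden-term rule can also be enforced mechanically after generation, so contaminated lines never reach the reviewer. A crude substring check as a sketch; whole-word matching and the authoritative term list belong to the real validator:

```python
# Terms from the prompt rules plus a few seen in bad outputs.
# This list is illustrative, not the canonical forbidden-term list.
FORBIDDEN = {"logistics", "liquidity", "market efficiency", "regulatory",
             "contract compliance", "metadata", "optimization", "optimize",
             "supply chain", "bottleneck"}

def is_contaminated(line: str) -> bool:
    """True if the candidate line contains any forbidden modern term.
    Substring matching is deliberately crude; it errs toward rejection."""
    lowered = line.lower()
    return any(term in lowered for term in FORBIDDEN)
```

Flagged lines can be auto-marked `rejected` before review, shrinking the batch the human has to read.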
---

## 6. Output Rule

The generator output must never enter training directly.

All generated output begins as:

```text
status: candidate
```

Only reviewed material can become:

```text
accepted
strong
canonical
```

Training may use:

```text
accepted expressions
strong expressions
canonical expressions
human-revised expressions
dialogue lines based on reviewed expressions
```

Training must not use:

```text
raw generated candidates
rejected candidates
unreviewed batches
candidate churn
```

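This gate can be made explicit in code. A sketch, where `revised` stands in for human-revised expressions (the real status vocabulary is whatever the review tool emits):

```python
import json

# Only reviewed statuses may enter training; "candidate" never passes.
TRAINABLE = {"accepted", "strong", "canonical", "revised"}

def is_trainable(record: dict) -> bool:
    """True only for reviewed statuses that may enter training."""
    return record.get("status") in TRAINABLE

def training_records(jsonl_path: str):
    """Stream training-eligible records from a reviewed JSONL file."""
    with open(jsonl_path) as f:
        for line in f:
            record = json.loads(line)
            if is_trainable(record):
                yield record
```

Building the training set exclusively through this function makes the "no raw candidates" rule structural rather than procedural.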
---

## 7. Why Modern-Contaminated Generator Models Are Acceptable

The generator model may contain modern assumptions.

That is acceptable because it is not the final model.

The generator is not trusted.

The human review gate is trusted.

This distinction is central:

```text
generator output = candidate quarry stone
reviewed output = vocabulary material
canonical output = simulator-ready phrase
```

The generator may suggest bad phrases.

The review process prevents them from becoming corpus material.

---

## 8. Local Model Evaluation

Evaluate local generator models by candidate yield, not by benchmark scores.

Useful metric:

```text
accepted candidates per 100 generated lines
```

Example:

```text
Qwen2.5-3B:
  1000 generated
  130 accepted
  22 strong
  5 canonical

Phi-3.5-mini:
  1000 generated
  90 accepted
  18 strong
  7 canonical

Gemma small:
  1000 generated
  110 accepted
  15 strong
  4 canonical
```

The best generator is the one that gives the most reviewable Roman-visible candidates per hour, not the one with the highest general benchmark score.

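The metric is simple division. Applied to the example counts above:

```python
def acceptance_yield(accepted: int, generated: int) -> float:
    """Accepted candidates per 100 generated lines."""
    return 100 * accepted / generated

print(acceptance_yield(130, 1000))  # Qwen2.5-3B  → 13.0
print(acceptance_yield(90, 1000))   # Phi-3.5-mini → 9.0
print(acceptance_yield(110, 1000))  # Gemma small  → 11.0
```

Dividing each yield by the wall-clock review hours for that batch gives the candidates-per-hour figure that actually decides the bakeoff.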
---

## 9. Batch Generation Strategy

Generate many small batches instead of one huge batch.

Recommended:

```text
20 candidates per prompt
50 prompts per run
1000 candidates per review session
```

Vary one dimension at a time.

Example batch family:

```text
object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Secundus

object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Felix

object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Chresimus
```

This reveals actor voice differences without changing the underlying simulator condition.

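The vary-one-dimension rule can be generated mechanically rather than written by hand. A sketch, where `batch_family` is a hypothetical helper name:

```python
def batch_family(base: dict, vary_key: str, values: list) -> list:
    """Hold every field of `base` fixed and vary exactly one dimension."""
    return [{**base, vary_key: v} for v in values]

base = {"object": "cart", "action": "hired_elsewhere",
        "pressure": "buyer_waiting", "actor_voice": "Secundus"}

# Reproduces the example family above: three specs differing only in voice.
family = batch_family(base, "actor_voice", ["Secundus", "Felix", "Chresimus"])
```

Each spec in `family` feeds directly into the Phase 2 batch driver as one prompt's input record.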
---

## 10. Temperature And Sampling

Start conservative.

Suggested settings:

```text
temperature: 0.8
top_p: 0.9
repeat_penalty: 1.1
num_predict: modest
context: modest
```

If output is too dull:

```text
raise temperature slightly
increase candidate count
add actor-specific examples
```

If output is too theatrical:

```text
lower temperature
add terse rule
add rejection examples
```

If output is too modern:

```text
strengthen forbidden terms
add Roman-visible examples
reduce abstract wording in prompt
```

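These settings map directly onto the `options` object of an Ollama `/api/generate` request. The `num_predict` and `num_ctx` values below are illustrative guesses at "modest", not tested recommendations:

```python
# Conservative starting sampler options for the Ollama request body.
GENERATION_OPTIONS = {
    "temperature": 0.8,     # raise slightly if output is dull
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "num_predict": 256,     # modest output length (guess)
    "num_ctx": 2048,        # modest context window (guess)
}

# Merged into the request payload alongside "model" and "prompt":
payload = {"model": "qwen2.5:3b", "prompt": "...",
           "stream": False, "options": GENERATION_OPTIONS}
```

Keeping the options in one dict makes each tuning change a one-line diff, which matters when comparing yield across runs.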
---

## 11. Data Files

Recommended folder layout:

```text
data/vocabulary/
  generator_inputs/
    objects.yaml
    actions.yaml
    pressures.yaml
    actor_voices.yaml

  candidates/
    candidates_YYYYMMDD.jsonl

  reviewed/
    roman_visible_expressions.jsonl
    canonical_templates.jsonl

  reports/
    generator_yield_report.txt
    review_summary.txt
```

---

## 12. Minimum Candidate Schema

```json
{
  "expression_id": "expr_000001",
  "created_at": "YYYY-MM-DD",
  "generator_model": "qwen2.5:3b",
  "domain": "commerce",
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "candidate": "The wheels are gone, and the buyer will not wait for our excuses.",
  "modern_meaning": "Cart capacity has been lost while the buyer is waiting.",
  "concept_tags": [
    "transport_capacity",
    "delay_cost",
    "buyer_need"
  ],
  "status": "candidate",
  "strength": null,
  "review_note": null
}
```

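A validator for this schema keeps malformed generator output out of the candidate files. A minimal sketch checking field presence and the status rule:

```python
# Field names taken from the minimum candidate schema above.
REQUIRED_FIELDS = {
    "expression_id", "created_at", "generator_model", "domain",
    "object", "action", "pressure", "actor_voice",
    "candidate", "modern_meaning", "concept_tags", "status",
    "strength", "review_note",
}

def validate_candidate(record: dict) -> list:
    """Return a list of schema problems; an empty list means valid."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - record.keys())]
    if "status" in record and record["status"] != "candidate":
        problems.append("generator output must start as status 'candidate'")
    return problems
```

Running this over each batch file before review catches truncated JSON lines and generator drift early.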
---

## 13. Promotion Schema

When promoted:

```json
{
  "expression_id": "expr_000001",
  "status": "strong",
  "reviewed_by": "human",
  "review_note": "Good Secundus line; concrete and reusable.",
  "promoted_to": [
    "roman_visible_expressions"
  ]
}
```

Canonical lines should be rare:

```json
{
  "expression_id": "expr_000019",
  "status": "canonical",
  "candidate": "The wheels are gone.",
  "canonical_condition": "transport_capacity_lost"
}
```

---

## 14. When To Move Beyond Ollama

Move from Ollama to llama.cpp or vLLM only if needed.

Reasons to move:

```text
need exact GGUF quant choice
need better batching control
need lower latency
need reproducible runtime parameters
need integration with a custom review server
```

Until then, Ollama is sufficient.

The priority is vocabulary yield, not infrastructure elegance.

---

## 15. Near-Term Test Plan

Run a small bakeoff.

Models:

```text
qwen2.5:3b
phi3.5-mini-instruct quantized
gemma small instruct model
```

Prompts:

```text
10 object/action/pressure combinations
6 actor voices
20 candidates each
```

Total:

```text
10 * 6 * 20 = 1200 candidates per model
```

Human review outcome:

```text
accepted count
strong count
canonical count
modern contamination count
too theatrical count
duplicate count
```

Pick the generator model by accepted/strong yield per review hour.

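The outcome counts above reduce to a tally over the review marks. A small sketch:

```python
from collections import Counter

def review_tally(outcomes: list) -> Counter:
    """Count review outcomes for one model's bakeoff batch."""
    return Counter(outcomes)

# Illustrative marks from a (tiny) review session, not real data.
counts = review_tally(["accepted", "rejected", "accepted",
                       "strong", "duplicate"])
print(counts["accepted"])  # → 2
```

One such `Counter` per model, divided by that model's review hours, gives the comparison number the bakeoff calls for.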
---

## 16. Recommendation

Begin with:

```text
Ollama + qwen2.5:3b
```

Use it to generate candidate vocabulary only.

Do not use it as authority.

Do not train on its raw output.

Do not let it decide canonical vocabulary.

The first success condition is simple:

```text
Can the local generator produce enough reviewable Roman-visible candidates to make human review faster than hand-authoring?
```

If yes, the deployment is successful.

If no, test Phi-3.5-mini and Gemma small models with the same input batches.

---

## 17. Success Condition

This model-selection process is working if it produces:

```text
high candidate volume
low deployment friction
fast human review
rising accepted-expression count
a small canonical phrase library
better dialogue voice
less modern vocabulary
```

The correct measure is not model intelligence.

The correct measure is vocabulary throughput.

The generator does not need to be Roman.

The reviewed output does.