# GENERATOR-MODEL-SELECTION-0001

## Local Model Selection And Deployment For The OTIVM Vocabulary Generator

### Status: Draft Standard

### Layer: Training Infrastructure

### Purpose: Select and deploy a small local model for generating Roman-visible vocabulary candidates

### Repository Path: docs/training/chunking/GENERATOR-MODEL-SELECTION-0001.md

---
## 0. Purpose

This document defines a practical model-selection and deployment plan for the OTIVM Roman-visible expression generator.

The generator is not the CIVICUS-ROMAN model.

The generator is a tool used to produce candidate phrases.

Most generated phrases may be weak.

Only reviewed and accepted expressions become training material.

The generator is quarry equipment.

The reviewed vocabulary is the stone.

---
## 1. Hardware Constraint

Current local hardware target:

```text
NVIDIA GPU with 6GB VRAM
```

This is enough for small quantized local models.

It is not the right target for full model training.

It is sufficient for:

```text
candidate expression generation
small-batch phrase variation
actor-voice experiments
object/action/pressure recombination
quick local iteration
offline review workflows
```

It should not be used yet for:

```text
full CIVICUS-ROMAN training
large-context corpus analysis
unsupervised corpus promotion
automatic canonical selection
```

---
## 2. Primary Recommendation

Start with:

```text
Model: Qwen2.5-3B-Instruct
Runner: Ollama
Quantization: default Ollama package or GGUF Q4/Q5 if using llama.cpp
```

Reason:

```text
small enough for 6GB VRAM
good instruction following
good short-form generation
available through Ollama
available in GGUF form
suitable for high-volume candidate generation
```

The generator task is not deep reasoning.

It is constrained phrase production.

A 3B instruct model is enough to begin.

---
## 3. Backup Models

### Phi-3.5-mini-instruct

Use if Qwen2.5-3B gives too much decorative prose or weak instruction following.

Strengths:

```text
terse output
structured generation
reasoning-dense behavior
good for compact candidate lists
```

Risk:

```text
may produce more modern analytical phrasing unless prompts are strict
```

### Gemma small instruct models

Use for comparison, especially if phrase tone from Qwen or Phi is poor.

Strengths:

```text
small model family
local deployment support
useful for style comparison
```

Risk:

```text
may require more prompt tuning for OTIVM-specific compression
```

### Qwen2.5-Coder-3B

Use only for generator tooling scripts, not phrase generation.

Strengths:

```text
code generation
JSONL tools
review UI helpers
validator scripts
```

Risk:

```text
not the right primary voice generator
```

---
## 4. Deployment Path

### Phase 1: Ollama

Use Ollama first because it minimizes deployment friction.

Install and run:

```bash
ollama pull qwen2.5:3b
ollama run qwen2.5:3b
```

Test with direct prompt batches.

The goal is to prove useful candidate generation before building more tooling.
### Phase 2: Scripted Batch Generation

Use Python to send object/action/pressure combinations to the local Ollama endpoint.

Input:

```json
{
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "count": 20
}
```

Output:

```json
{
  "expression_id": "expr_000001",
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "candidate": "The wheels are gone, and the buyer will not wait for our excuses.",
  "status": "candidate"
}
```
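A minimal sketch of this batch step, assuming Ollama's default local HTTP endpoint (`http://localhost:11434/api/generate`); the helper names are illustrative, not part of any standard:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint


def build_payload(combo: dict, model: str = "qwen2.5:3b") -> dict:
    """Turn one object/action/pressure input record into an Ollama generate request."""
    prompt = (
        "You generate Roman-visible commercial expressions for OTIVM.\n"
        f"Object: {combo['object']}\n"
        f"Action: {combo['action']}\n"
        f"Pressure: {combo['pressure']}\n"
        f"Actor voice: {combo['actor_voice']}\n"
        f"Generate {combo['count']} candidates."
    )
    return {"model": model, "prompt": prompt, "stream": False}


def generate_batch(combo: dict) -> str:
    """POST one batch request to the local Ollama server and return the raw text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(combo)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Each returned line would then be wrapped in a candidate record with `status: candidate` before it touches disk.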
### Phase 3: Review Interface

Build a fast human review tool.

Required markings:

```text
accept
reject
revise
strong
canonical
```

Preferred one-key controls:

```text
a = accept
r = reject
v = revise
s = strong
c = canonical
```

The review tool matters more than the generator model.

---
## 5. Generator Prompt Pattern

Use a strict prompt.

Example:

```text
You generate Roman-visible commercial expressions for OTIVM.

Rules:
- Do not explain.
- Do not use modern business language.
- Do not use words like logistics, liquidity, market efficiency, regulatory, contract compliance, metadata, model, training, or optimization.
- Use concrete objects, actions, and pressures.
- Prefer terse lines.
- Produce candidate lines only.

Object: cart
Action: hired elsewhere
Pressure: buyer waiting
Actor voice: Secundus

Generate 20 candidates.
```

Expected useful outputs:

```text
The wheels are gone.
The buyer will not wait for empty ruts.
Ten jars can still go by mule.
Naso bought the road before the oil moved.
```

Bad outputs:

```text
Transport capacity is constrained.
The supply chain is disrupted.
We need to optimize the delivery channel.
This represents a logistical bottleneck.
```
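The strict prompt can be assembled programmatically so the forbidden-terms list lives in one place; a sketch in Python (constant and function names are illustrative):

```python
FORBIDDEN_TERMS = [
    "logistics", "liquidity", "market efficiency", "regulatory",
    "contract compliance", "metadata", "model", "training", "optimization",
]


def build_strict_prompt(obj: str, action: str, pressure: str,
                        actor_voice: str, count: int = 20) -> str:
    """Assemble the strict generator prompt from one simulator condition."""
    rules = [
        "- Do not explain.",
        "- Do not use modern business language.",
        "- Do not use words like " + ", ".join(FORBIDDEN_TERMS) + ".",
        "- Use concrete objects, actions, and pressures.",
        "- Prefer terse lines.",
        "- Produce candidate lines only.",
    ]
    return "\n".join([
        "You generate Roman-visible commercial expressions for OTIVM.",
        "",
        "Rules:",
        *rules,
        "",
        f"Object: {obj}",
        f"Action: {action}",
        f"Pressure: {pressure}",
        f"Actor voice: {actor_voice}",
        "",
        f"Generate {count} candidates.",
    ])
```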
---

## 6. Output Rule

The generator output must never enter training directly.

All generated output begins as:

```text
status: candidate
```

Only reviewed material can become:

```text
accepted
strong
canonical
```

Training may use:

```text
accepted expressions
strong expressions
canonical expressions
human-revised expressions
dialogue lines based on reviewed expressions
```

Training must not use:

```text
raw generated candidates
rejected candidates
unreviewed batches
candidate churn
```
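This gate reduces to one status check when assembling a training file; a minimal sketch, assuming human-revised lines are re-saved with a `revised` status (the function name is illustrative):

```python
# Statuses this document allows into training; raw candidates never pass.
TRAINING_ELIGIBLE = {"accepted", "strong", "canonical", "revised"}


def training_rows(records: list[dict]) -> list[dict]:
    """Keep only reviewed records; candidates and rejects are filtered out."""
    return [r for r in records if r.get("status") in TRAINING_ELIGIBLE]
```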
---

## 7. Why Modern-Contaminated Generator Models Are Acceptable

The generator model may contain modern assumptions.

That is acceptable because it is not the final model.

The generator is not trusted.

The human review gate is trusted.

This distinction is central:

```text
generator output = candidate quarry stone
reviewed output = vocabulary material
canonical output = simulator-ready phrase
```

The generator may suggest bad phrases.

The review process prevents them from becoming corpus material.

---
## 8. Local Model Evaluation

Evaluate local generator models by candidate yield, not by benchmark scores.

Useful metric:

```text
accepted candidates per 100 generated lines
```

Example:

```text
Qwen2.5-3B:
  1000 generated
  130 accepted
  22 strong
  5 canonical

Phi-3.5-mini:
  1000 generated
  90 accepted
  18 strong
  7 canonical

Gemma small:
  1000 generated
  110 accepted
  15 strong
  4 canonical
```

The best generator is the one that gives the most reviewable Roman-visible candidates per hour.

Not the one with the highest general model score.
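The yield metric is a one-line computation; a sketch using the example counts from this section:

```python
def yield_per_100(accepted: int, generated: int) -> float:
    """Accepted candidates per 100 generated lines."""
    return 100.0 * accepted / generated


# (accepted, generated) pairs from the example above
results = {
    "Qwen2.5-3B": (130, 1000),
    "Phi-3.5-mini": (90, 1000),
    "Gemma small": (110, 1000),
}
yields = {m: yield_per_100(a, g) for m, (a, g) in results.items()}
# With these example counts, Qwen2.5-3B leads at 13.0 accepted per 100.
```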
---

## 9. Batch Generation Strategy

Generate many small batches instead of one huge batch.

Recommended:

```text
20 candidates per prompt
50 prompts per run
1000 candidates per review session
```

Vary one dimension at a time.

Example batch family:

```text
object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Secundus

object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Felix

object: cart
action: hired_elsewhere
pressure: buyer_waiting
actor_voice: Chresimus
```

This reveals actor-voice differences without changing the underlying simulator condition.
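Holding the condition fixed while varying one field can be expressed directly; a sketch (names illustrative):

```python
def batch_family(base: dict, vary_key: str, values: list[str]) -> list[dict]:
    """Produce one batch input per value, changing only vary_key."""
    return [{**base, vary_key: v} for v in values]


base = {"object": "cart", "action": "hired_elsewhere", "pressure": "buyer_waiting"}
family = batch_family(base, "actor_voice", ["Secundus", "Felix", "Chresimus"])
```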
---

## 10. Temperature And Sampling

Start conservative.

Suggested settings:

```text
temperature: 0.8
top_p: 0.9
repeat_penalty: 1.1
num_predict: modest
context: modest
```

If output is too dull:

```text
raise temperature slightly
increase candidate count
add actor-specific examples
```

If output is too theatrical:

```text
lower temperature
add terse rule
add rejection examples
```

If output is too modern:

```text
strengthen forbidden terms
add Roman-visible examples
reduce abstract wording in prompt
```
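These settings map onto the `options` object of an Ollama generate request; a sketch, where the `num_predict` and `num_ctx` values are illustrative "modest" choices rather than figures from this document:

```python
SAMPLING = {
    "temperature": 0.8,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "num_predict": 256,  # assumption: a "modest" output budget
    "num_ctx": 2048,     # assumption: a "modest" context window
}


def nudge_temperature(options: dict, verdict: str) -> dict:
    """Apply the tuning rules above: too dull -> warmer, too theatrical -> cooler."""
    step = {"too_dull": 0.1, "too_theatrical": -0.1}.get(verdict, 0.0)
    out = dict(options)
    out["temperature"] = round(out["temperature"] + step, 2)
    return out
```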
---

## 11. Data Files

Recommended folder layout:

```text
data/vocabulary/
  generator_inputs/
    objects.yaml
    actions.yaml
    pressures.yaml
    actor_voices.yaml

  candidates/
    candidates_YYYYMMDD.jsonl

  reviewed/
    roman_visible_expressions.jsonl
    canonical_templates.jsonl

  reports/
    generator_yield_report.txt
    review_summary.txt
```

---
## 12. Minimum Candidate Schema

```json
{
  "expression_id": "expr_000001",
  "created_at": "YYYY-MM-DD",
  "generator_model": "qwen2.5:3b",
  "domain": "commerce",
  "object": "cart",
  "action": "hired_elsewhere",
  "pressure": "buyer_waiting",
  "actor_voice": "Secundus",
  "candidate": "The wheels are gone, and the buyer will not wait for our excuses.",
  "modern_meaning": "Cart capacity has been lost while the buyer is waiting.",
  "concept_tags": [
    "transport_capacity",
    "delay_cost",
    "buyer_need"
  ],
  "status": "candidate",
  "strength": null,
  "review_note": null
}
```
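A validator against this minimum schema keeps malformed candidates out of the JSONL files; a sketch (the function name is illustrative):

```python
REQUIRED_FIELDS = {
    "expression_id", "created_at", "generator_model", "domain",
    "object", "action", "pressure", "actor_voice",
    "candidate", "modern_meaning", "concept_tags",
    "status", "strength", "review_note",
}


def validate_candidate(record: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the record is valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - record.keys())]
    if record.get("status") != "candidate":
        problems.append("new records must start with status 'candidate'")
    if not isinstance(record.get("concept_tags"), list):
        problems.append("concept_tags must be a list")
    return problems
```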
---

## 13. Promotion Schema

When promoted:

```json
{
  "expression_id": "expr_000001",
  "status": "strong",
  "reviewed_by": "human",
  "review_note": "Good Secundus line; concrete and reusable.",
  "promoted_to": [
    "roman_visible_expressions"
  ]
}
```

Canonical lines should be rare:

```json
{
  "expression_id": "expr_000019",
  "status": "canonical",
  "candidate": "The wheels are gone.",
  "canonical_condition": "transport_capacity_lost"
}
```
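Promotion is a pure record update that only the human review step may perform; a sketch (the function name and target-list value are illustrative):

```python
PROMOTABLE = {"accepted", "strong", "canonical"}


def promote(record: dict, new_status: str, note: str) -> dict:
    """Return a promoted copy; refuses statuses outside the review vocabulary."""
    if new_status not in PROMOTABLE:
        raise ValueError(f"not a promotable status: {new_status}")
    promoted = dict(record)
    promoted.update(
        status=new_status,
        reviewed_by="human",
        review_note=note,
        promoted_to=["roman_visible_expressions"],
    )
    return promoted
```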
---

## 14. When To Move Beyond Ollama

Move from Ollama to llama.cpp or vLLM only if needed.

Reasons to move:

```text
need exact GGUF quant choice
need better batching control
need lower latency
need reproducible runtime parameters
need integration with a custom review server
```

Until then, Ollama is sufficient.

The priority is vocabulary yield, not infrastructure elegance.

---
## 15. Near-Term Test Plan

Run a small bake-off.

Models:

```text
qwen2.5:3b
phi3.5-mini-instruct (quantized)
gemma small instruct model
```

Prompts:

```text
10 object/action/pressure combinations
6 actor voices
20 candidates each
```

Total:

```text
10 * 6 * 20 = 1200 candidates per model
```

Human review outcome:

```text
accepted count
strong count
canonical count
modern contamination count
too theatrical count
duplicate count
```

Pick the generator model by accepted/strong yield per review hour.
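The bake-off arithmetic and the selection rule fit in a few lines; the review-hour figure is whatever the reviewer logs (a sketch, names illustrative):

```python
COMBOS, VOICES, PER_PROMPT = 10, 6, 20
total_per_model = COMBOS * VOICES * PER_PROMPT  # 1200 candidates per model


def review_hour_yield(accepted: int, strong: int, review_hours: float) -> float:
    """Accepted-plus-strong lines produced per hour of human review."""
    return (accepted + strong) / review_hours
```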
---

## 16. Recommendation

Begin with:

```text
Ollama + qwen2.5:3b
```

Use it to generate candidate vocabulary only.

Do not use it as authority.

Do not train on its raw output.

Do not let it decide canonical vocabulary.

The first success condition is simple:

```text
Can the local generator produce enough reviewable Roman-visible candidates to make human review faster than hand-authoring?
```

If yes, the deployment is successful.

If no, test Phi-3.5-mini and Gemma small models with the same input batches.

---
## 17. Success Condition

This model-selection process is working if it produces:

```text
high candidate volume
low deployment friction
fast human review
rising accepted-expression count
a small canonical phrase library
better dialogue voice
less modern vocabulary
```

The correct measure is not model intelligence.

The correct measure is vocabulary throughput.

The generator does not need to be Roman.

The reviewed output does.