ai-trainer
AI model training and validation for Kodachi OS command intelligence
Version: 9.0.1 | Size: 2.8MB | Author: Warith Al Maawali <warith@digi77.com>
License: LicenseRef-Kodachi-SAN-1.0 | Website: https://www.digi77.com
File Information
| Property | Value |
|---|---|
| Binary Name | ai-trainer |
| Version | 9.0.1 |
| Build Date | 2026-02-26T08:01:51.744791725Z |
| Rust Version | 1.82.0 |
| File Size | 2.8MB |
SHA256 Checksum
Features
- TF-IDF based command embeddings
- Incremental model updates
- Model validation and accuracy testing
Security Features
| Feature | Description |
|---|---|
| Input validation | All inputs are validated and sanitized |
| Rate limiting | Built-in rate limiting for network operations |
| Authentication | Secure authentication with certificate pinning |
| Encryption | TLS 1.3 for all network communications |
System Requirements
| Requirement | Value |
|---|---|
| OS | Linux (Debian-based) |
| Privileges | root/sudo for system operations |
| Dependencies | OpenSSL, libcurl |
Global Options
| Flag | Description |
|---|---|
| -h, --help | Print help information |
| -v, --version | Print version information |
| -n, --info | Display detailed information |
| -e, --examples | Show usage examples |
| --json | Output in JSON format |
| --json-pretty | Pretty-print JSON output with indentation |
| --json-human | Enhanced JSON output with improved formatting (like jq) |
| --verbose | Enable verbose output |
| --quiet | Suppress non-essential output |
| --no-color | Disable colored output |
| --config <FILE> | Use custom configuration file |
| --timeout <SECS> | Set timeout (default: 30) |
| --retry <COUNT> | Retry attempts (default: 3) |
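As an illustration of how the global flags compose (a sketch only; the actual output of these invocations is not reproduced in this document):

```shell
# Check model status with pretty-printed JSON and a longer timeout
ai-trainer status --json-pretty --timeout 60

# Quiet, uncolored output suitable for cron jobs or log capture
ai-trainer status --quiet --no-color
```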
Commands
Model Management
export
Export model embeddings and metadata to JSON file
Usage:
Examples:
snapshot
Save current model as versioned snapshot
Usage:
Examples:
list-snapshots
List all saved model snapshots
Usage:
Examples:
status
Display current model status and statistics
Usage:
Examples:
download-model
Download ONNX model, tokenizer, or GGUF model for AI engine tiers
Usage:
ai-trainer download-model [--llm [default|small|large]] [--show-models] [--all] [--output-dir <DIR>] [--force]
Examples:
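Based on the usage line above, a few representative invocations (the resulting output shapes are assumptions, not captured from the binary):

```shell
# List which models are installed and which are available for download
ai-trainer download-model --show-models

# Download the default GGUF model tier
ai-trainer download-model --llm default

# Force a re-download into a custom directory
ai-trainer download-model --output-dir /opt/kodachi/models --force
```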
Model Training
train
Train AI model from command metadata (full retraining)
Usage:
Examples:
incremental
Update model incrementally with new command data
Usage:
Examples:
Validation & Testing
validate
Validate model accuracy against test dataset
Usage:
Examples:
Operational Scenarios
Scenario-oriented workflows generated from the binary's built-in -e --json examples.
Scenario 1: Model Training
Full model training operations
Step 1: Train model with command data
Expected Output: Training statistics and embeddings metrics
Note: Creates new model from scratch
Step 2: Train with custom database
Expected Output: Training results with custom DB location
Note: Allows custom database path specification
Step 3: Train and output results as JSON
Expected Output: JSON-formatted training metrics
Note: Structured output for automation
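The training steps above might look like the following sketch. The raw example commands were not captured here, so the `--db` flag name is an assumption:

```shell
# Step 1: full retraining from command metadata
ai-trainer train

# Step 2: train against a custom database path (--db is an assumed flag name)
ai-trainer train --db /var/lib/kodachi/commands.db

# Step 3: structured training metrics for automation
ai-trainer train --json
```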
Scenario 2: Incremental Training
Update existing models with new data
Step 1: Incrementally train with new data
Expected Output: New embeddings added to existing model
Note: Requires existing trained model
Step 2: Incremental training with custom DB and JSON output
Expected Output: JSON-formatted incremental training results
Note: Combines custom DB path with structured output
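A sketch of the incremental steps above (the `--db` flag name is an assumption, as the raw examples are not shown):

```shell
# Add new command data to an existing trained model
ai-trainer incremental

# Custom database path combined with JSON output
ai-trainer incremental --db /var/lib/kodachi/commands.db --json
```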
Scenario 3: Validation
Model accuracy testing and validation
Step 1: Validate model with test data
Expected Output: Validation results with accuracy metrics
Note: Tests model against known test cases
Step 2: Validate with custom accuracy threshold
Expected Output: Pass/fail validation with 90% threshold
Note: Default threshold is 0.85
Step 3: Validate with custom DB and JSON output
Expected Output: JSON-formatted validation metrics
Note: Structured validation results
Step 4: Validate with all parameters combined
Expected Output: JSON validation with custom test data, 90% threshold, and custom DB
Note: Full parameter example for CI/CD pipelines
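The validation steps might be invoked as below; `--threshold`, `--test-data`, and `--db` are assumed flag names, since the raw examples are not reproduced here:

```shell
# Step 1: validate with the default 0.85 accuracy threshold
ai-trainer validate

# Step 2: raise the pass threshold to 90%
ai-trainer validate --threshold 0.90

# Step 4: full CI/CD-style invocation; a non-zero exit fails the pipeline
ai-trainer validate --test-data tests.json --threshold 0.90 \
  --db /var/lib/kodachi/commands.db --json || exit 1
```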
Scenario 4: Model Export
Export trained models and statistics
Step 1: Export trained model
Expected Output: Complete model export with embeddings
Note: Default format includes all embeddings
Step 2: Export in compact format
Expected Output: Compact model export without full embeddings
Note: Reduces export file size
Step 3: Export statistics as JSON
Expected Output: Model statistics without embeddings
Note: Lightweight statistics export
Step 4: Full export with JSON envelope output
Expected Output: Complete model export with JSON status envelope
Note: Combines full embeddings export with structured output
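A sketch of the export variants above; `--output`, `--compact`, and `--stats-only` are assumed flag names, not documented syntax:

```shell
# Full export including all embeddings
ai-trainer export --output model.json

# Compact export without full embeddings
ai-trainer export --compact --output model-compact.json

# Lightweight statistics only, as JSON
ai-trainer export --stats-only --json
```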
Scenario 5: Snapshots
Model versioning and snapshot management
Step 1: Create model snapshot with version
Expected Output: Versioned snapshot created successfully
Note: Preserves model state at specific version
Step 2: List all model snapshots
Expected Output: List of saved model versions
Note: Shows snapshot metadata and versions
Step 3: List snapshots as JSON
Expected Output: JSON-formatted snapshot listing
Note: Structured snapshot information
Step 4: Create snapshot with JSON output
Expected Output: JSON with snapshot name, version, and embedding count
Note: Structured output for automation
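The snapshot workflow might look like this; the `--version` flag name is an assumption:

```shell
# Create a versioned snapshot of the current model
ai-trainer snapshot --version 9.0.1

# List snapshots, human-readable and as JSON
ai-trainer list-snapshots
ai-trainer list-snapshots --json
```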
Scenario 6: Model Download
Download ONNX and GGUF model files for AI engine tiers
Step 1: Download ONNX embeddings model to default models/ directory
Expected Output: Model files downloaded successfully
Note: Downloads all-MiniLM-L6-v2 ONNX model and tokenizer
Step 2: Download default GGUF model (Qwen2.5-3B-Instruct Q4_K_M, ~1.8GB)
Expected Output: GGUF model downloaded to models/ directory
Note: Best balance of quality, speed, and size for CPU inference
Step 3: Download small GGUF model (Qwen2.5-1.5B, ~0.9GB)
Expected Output: Small GGUF model downloaded
Note: For systems with <4GB available RAM
Step 4: Download large GGUF model (Phi-3.5-mini, ~2.3GB)
Expected Output: Large GGUF model downloaded
Note: Better reasoning, 128K trained context
Step 5: Download both ONNX embeddings and default GGUF model
Expected Output: All model files downloaded
Note: Complete setup for all AI tiers
Step 6: List downloaded and available models
Expected Output: Model inventory with sizes and status
Note: Shows what's installed and what can be downloaded
Step 7: Model inventory as JSON
Expected Output: JSON with downloaded and available model details
Step 8: Force re-download of ONNX model
Expected Output: Model files re-downloaded
Note: Overwrites existing files
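Using the flags from the download-model usage shown earlier, the steps above map onto invocations such as:

```shell
# Step 1: ONNX embeddings model into the default models/ directory
ai-trainer download-model

# Step 3: small GGUF tier for low-RAM systems
ai-trainer download-model --llm small

# Step 5: both ONNX embeddings and the default GGUF model
ai-trainer download-model --all

# Step 8: force re-download, overwriting existing files
ai-trainer download-model --force
```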
Scenario 7: Status
Model status and health checks
Step 1: Show training status
Expected Output: Current model status and statistics
Note: Displays model readiness and metrics
Step 2: Show training status as JSON
Expected Output: JSON-formatted status information
Note: Structured status output for automation
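A monitoring script might consume the JSON status; the `.ready` field name is hypothetical, since the status envelope schema is not documented here:

```shell
#!/bin/sh
# Fail (exit 1) when the model is not ready (.ready is an assumed field name)
if ai-trainer status --json | jq -e '.ready == true' > /dev/null; then
  echo "model ready"
else
  echo "model not ready" >&2
  exit 1
fi
```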
Scenario 8: AI Tier Integration
Training operations related to the 6-tier AI engine (TF-IDF, ONNX, Mistral.rs, GenAI/Ollama, Legacy LLM, Claude)
Step 1: Validate model against all tier responses
Expected Output: Validation results covering all active AI tiers
Note: Tests model accuracy across available tiers
Step 2: Train model with feedback from all tiers
Expected Output: Training metrics including multi-tier feedback data
Note: Includes feedback from mistral.rs and GenAI tier executions
Scenario 9: ONNX Intent Classifier
Evaluate the ONNX intent classifier used for fast-path routing (12 categories, <5ms inference)
Step 1: Evaluate ONNX intent classifier accuracy
Expected Output: JSON with per-intent precision, recall, and F1-score
Note: Target: 95%+ accuracy on held-out test set
Step 2: Check if intent classifier model is downloaded
Expected Output: JSON showing classifier model status
Note: Model: kodachi-intent-classifier.onnx (~65MB)
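The classifier checks above plausibly route through existing subcommands; the mappings below are guesses, not documented syntax:

```shell
# Check whether kodachi-intent-classifier.onnx is present, via the model inventory
ai-trainer download-model --show-models --json

# Evaluate classifier accuracy (assumed to run through the validate subcommand)
ai-trainer validate --json
```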
Environment Variables
| Variable | Description | Default | Values |
|---|---|---|---|
| RUST_LOG | Set logging level | info | error, warn, info, debug, trace |
| NO_COLOR | Disable all colored output when set | unset | 1 |
Exit Codes
| Code | Description |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | Permission denied |
| 4 | Network error |
| 5 | File not found |
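The exit codes above can drive retry or alerting logic in a wrapper script, for example:

```shell
#!/bin/sh
# Map ai-trainer exit codes (see table above) to operator actions
ai-trainer validate --json
case $? in
  0) echo "validation passed" ;;
  2) echo "invalid arguments" >&2; exit 2 ;;
  3) echo "permission denied: re-run with sudo" >&2; exit 3 ;;
  4) echo "network error: retry later" >&2; exit 4 ;;
  *) echo "validation failed" >&2; exit 1 ;;
esac
```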