TemplateExecutor
Model execution engine
The TemplateExecutor is the engine that runs individual models. It reads model_metadata.json and orchestrates preprocessing, inference, and postprocessing.
Execution Flow
Raw input → preprocessing steps → inference via the execution template → postprocessing steps → final output.
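The control flow can be sketched in Rust as below. This is an illustration only; the type and function names (ModelMetadata, execute, and the commented-out helpers) are placeholders, not the executor's real API.

```rust
// Illustrative sketch of the TemplateExecutor control flow (hypothetical names).
struct ModelMetadata {
    preprocessing: Vec<String>,   // e.g. ["AudioDecode", "MelSpectrogram"]
    execution_template: String,   // e.g. "CandleModel"
    postprocessing: Vec<String>,  // e.g. ["WhisperDecode"]
}

fn execute(metadata: &ModelMetadata, input: Vec<u8>) -> Vec<u8> {
    // 1. Preprocessing: raw input -> model-ready data
    let mut data = input;
    for step in &metadata.preprocessing {
        println!("preprocess: {step}");
        // data = apply_preprocess_step(step, data);
    }
    // 2. Inference: dispatch to the runtime named by the execution template
    println!("run model via {}", metadata.execution_template);
    // data = run_template(&metadata.execution_template, data);
    // 3. Postprocessing: model output -> user-facing result
    for step in &metadata.postprocessing {
        println!("postprocess: {step}");
        // data = apply_postprocess_step(step, data);
    }
    data
}

fn main() {
    let metadata = ModelMetadata {
        preprocessing: vec!["AudioDecode".into(), "MelSpectrogram".into()],
        execution_template: "CandleModel".into(),
        postprocessing: vec!["WhisperDecode".into()],
    };
    let _output = execute(&metadata, vec![]);
}
```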
model_metadata.json
Every model bundle contains a model_metadata.json that configures execution:
```json
{
  "model_id": "whisper-tiny",
  "version": "1.0",
  "description": "Speech recognition model",
  "execution_template": {
    "type": "CandleModel",
    "model_type": "WhisperTiny"
  },
  "preprocessing": [
    { "type": "AudioDecode", "sample_rate": 16000, "channels": 1 }
  ],
  "postprocessing": [],
  "files": ["model.safetensors", "tokenizer.json"],
  "metadata": {
    "task": "speech-recognition",
    "language": "en"
  }
}
```
Execution Template Types
| Type | Runtime | Use Case |
|---|---|---|
| SimpleMode | ONNX Runtime | Most ONNX models |
| CandleModel | Candle (Rust) | Whisper, future LLMs |
| Pipeline | Multi-stage | Encoder-decoder models |
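As a rough sketch of how such a tagged object could be modeled in Rust with serde (assuming the serde and serde_json crates with the derive feature; the Pipeline variant's stages field is a guess, not the real schema):

```rust
// Sketch of deserializing the tagged "execution_template" object with serde.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
enum ExecutionTemplate {
    SimpleMode { model_file: String },
    CandleModel { model_type: String },
    Pipeline { stages: Vec<serde_json::Value> }, // hypothetical field name
}

fn main() -> Result<(), serde_json::Error> {
    let raw = r#"{ "type": "CandleModel", "model_type": "WhisperTiny" }"#;
    let template: ExecutionTemplate = serde_json::from_str(raw)?;
    println!("{template:?}");
    Ok(())
}
```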
Preprocessing Steps
Preprocessing transforms raw input into a model-ready format.
AudioDecode
Decodes WAV audio and resamples:
```json
{
  "type": "AudioDecode",
  "sample_rate": 16000,
  "channels": 1
}
```
Input: WAV bytes → Output: Float32 samples
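Conceptually, the resampling half of this step maps samples from the source rate to the requested sample_rate. A minimal linear-interpolation sketch, not the engine's actual implementation:

```rust
// Minimal linear-interpolation resampler (conceptual only).
fn resample(samples: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if from_hz == to_hz || samples.is_empty() {
        return samples.to_vec();
    }
    let ratio = from_hz as f64 / to_hz as f64;
    let out_len = (samples.len() as f64 / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = samples[idx];
            let b = samples[(idx + 1).min(samples.len() - 1)];
            a + (b - a) * frac // interpolate between neighboring samples
        })
        .collect()
}

fn main() {
    let input: Vec<f32> = (0..44_100).map(|i| (i as f32 * 0.01).sin()).collect();
    let output = resample(&input, 44_100, 16_000);
    println!("{} samples -> {} samples", input.len(), output.len());
}
```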
MelSpectrogram
Converts audio to mel spectrogram (for Whisper):
```json
{
  "type": "MelSpectrogram",
  "preset": "whisper"
}
```
Input: Float32 samples → Output: Mel features [1, 80, 3000]
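The [1, 80, 3000] shape follows from the standard Whisper front-end: 16 kHz audio padded or truncated to 30 seconds, a hop length of 160 samples, and 80 mel bins:

```rust
// Where the [1, 80, 3000] shape comes from (standard Whisper front-end values).
fn main() {
    let sample_rate = 16_000; // Hz
    let clip_seconds = 30;    // Whisper pads/truncates audio to 30 s
    let hop_length = 160;     // samples between successive frames
    let n_mels = 80;          // mel filterbank size

    let frames = sample_rate * clip_seconds / hop_length;
    println!("mel features shape: [1, {n_mels}, {frames}]"); // [1, 80, 3000]
}
```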
Phonemize
Converts text to phoneme tokens (for TTS):
```json
{
  "type": "Phonemize",
  "tokens_file": "tokens.txt",
  "dict_file": "cmudict.dict",
  "backend": "CmuDictionary"
}
```
Input: Text string → Output: Token IDs (i64)
Backends:
- CmuDictionary - Pure Rust, built-in CMU dictionary
- EspeakNG - External espeak-ng (better quality)
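At its core, dictionary-based phonemization is a lookup from words to phonemes and from phonemes to token IDs. A toy sketch (the real step loads cmudict.dict and tokens.txt from the bundle and handles many more cases):

```rust
// Simplified dictionary-based phonemization (illustrative only).
use std::collections::HashMap;

fn main() {
    // Tiny stand-in for the CMU dictionary: word -> ARPABET phonemes.
    let dict: HashMap<&str, Vec<&str>> = HashMap::from([
        ("hello", vec!["HH", "AH", "L", "OW"]),
        ("world", vec!["W", "ER", "L", "D"]),
    ]);
    // Tiny stand-in for tokens.txt: phoneme -> token ID.
    let tokens: HashMap<&str, i64> = HashMap::from([
        ("HH", 10), ("AH", 11), ("L", 12), ("OW", 13),
        ("W", 14), ("ER", 15), ("D", 16),
    ]);

    let text = "hello world";
    let ids: Vec<i64> = text
        .split_whitespace()
        .flat_map(|word| dict.get(word).cloned().unwrap_or_default())
        .filter_map(|ph| tokens.get(ph).copied())
        .collect();

    println!("{ids:?}"); // [10, 11, 12, 13, 14, 15, 12, 16]
}
```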
Tokenize
Tokenizes text for NLP models:
```json
{
  "type": "Tokenize",
  "vocab_file": "vocab.json"
}
```
Input: Text string → Output: Token IDs
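A minimal illustration of vocabulary-based tokenization (whitespace splitting plus a vocab lookup with an unknown-token fallback); real models usually rely on subword tokenizers defined by vocab.json:

```rust
// Minimal whitespace + vocabulary lookup (illustrative only).
use std::collections::HashMap;

fn tokenize(text: &str, vocab: &HashMap<String, i64>, unk_id: i64) -> Vec<i64> {
    text.split_whitespace()
        .map(|w| vocab.get(&w.to_lowercase()).copied().unwrap_or(unk_id))
        .collect()
}

fn main() {
    let vocab: HashMap<String, i64> = [("the", 1), ("quick", 2), ("fox", 3)]
        .into_iter()
        .map(|(w, id)| (w.to_string(), id))
        .collect();

    let ids = tokenize("The quick brown fox", &vocab, 0);
    println!("{ids:?}"); // [1, 2, 0, 3]; "brown" falls back to the unknown ID
}
```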
Postprocessing Steps
Postprocessing transforms model output into a usable format.
CTCDecode
Decodes CTC logits to text (for ASR models like Wav2Vec2):
```json
{
  "type": "CTCDecode",
  "vocab_file": "vocab.json",
  "blank_index": 0
}
```
Input: Logits tensor → Output: Text string
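Greedy CTC decoding takes the argmax of each frame, collapses consecutive repeats, and drops the blank symbol. A self-contained sketch with a toy vocabulary (the real step loads its vocabulary from vocab.json):

```rust
// Greedy CTC decoding: argmax each frame, collapse repeats, drop blanks.
fn ctc_greedy_decode(logits: &[Vec<f32>], vocab: &[&str], blank_index: usize) -> String {
    let mut out = String::new();
    let mut prev = usize::MAX;
    for frame in logits {
        // Index of the highest-scoring class in this frame.
        let best = frame
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        // Emit only when the label changes and is not the blank symbol.
        if best != prev && best != blank_index {
            out.push_str(vocab[best]);
        }
        prev = best;
    }
    out
}

fn main() {
    let vocab = ["<blank>", "h", "i"]; // index 0 is the blank
    let logits = vec![
        vec![0.1, 0.8, 0.1],   // h
        vec![0.1, 0.7, 0.2],   // h (repeat, collapsed)
        vec![0.9, 0.05, 0.05], // blank
        vec![0.1, 0.1, 0.8],   // i
    ];
    println!("{}", ctc_greedy_decode(&logits, &vocab, 0)); // "hi"
}
```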
WhisperDecode
Decodes Whisper token IDs to text:
```json
{
  "type": "WhisperDecode",
  "tokenizer_file": "tokenizer.json"
}
```
Input: Token IDs → Output: Text string
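In essence this maps token IDs back through the tokenizer vocabulary and strips special tokens. The sketch below uses a toy vocabulary; the real step uses the byte-level BPE tokenizer defined in tokenizer.json:

```rust
// Toy version of token-ID-to-text decoding (illustrative only).
use std::collections::HashMap;

fn main() {
    // Toy vocabulary: a few made-up IDs plus Whisper-style special tokens.
    let vocab: HashMap<u32, &str> = HashMap::from([
        (50258, "<|startoftranscript|>"),
        (50259, "<|en|>"),
        (1000, "Hello"),
        (1001, " world"),
        (50257, "<|endoftext|>"),
    ]);

    let token_ids = [50258u32, 50259, 1000, 1001, 50257];
    let text: String = token_ids
        .iter()
        .filter_map(|id| vocab.get(id))
        .filter(|tok| !tok.starts_with("<|")) // drop special tokens
        .copied()
        .collect();

    println!("{text}"); // "Hello world"
}
```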
TTSAudioEncode
Encodes waveform to WAV bytes:
```json
{
  "type": "TTSAudioEncode",
  "sample_rate": 24000,
  "apply_postprocessing": true
}
```
Input: Float32 waveform → Output: WAV bytes
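Encoding amounts to converting the Float32 waveform to PCM samples and prefixing a WAV header. A minimal 16-bit mono sketch (the actual step's bit depth and optional postprocessing may differ):

```rust
// Minimal f32 waveform -> 16-bit PCM mono WAV bytes (illustrative only).
fn encode_wav(samples: &[f32], sample_rate: u32) -> Vec<u8> {
    let data_len = samples.len() as u32 * 2; // 2 bytes per 16-bit sample
    let byte_rate = sample_rate * 2;
    let mut wav = Vec::with_capacity(44 + data_len as usize);

    // RIFF header
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes());
    wav.extend_from_slice(b"WAVE");
    // fmt chunk: PCM, mono, 16-bit
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes());
    wav.extend_from_slice(&1u16.to_le_bytes());  // PCM
    wav.extend_from_slice(&1u16.to_le_bytes());  // channels
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&byte_rate.to_le_bytes());
    wav.extend_from_slice(&2u16.to_le_bytes());  // block align
    wav.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    // data chunk
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        let pcm = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        wav.extend_from_slice(&pcm.to_le_bytes());
    }
    wav
}

fn main() {
    let tone: Vec<f32> = (0..24_000)
        .map(|i| (i as f32 * 440.0 * 2.0 * std::f32::consts::PI / 24_000.0).sin())
        .collect();
    println!("{} bytes of WAV", encode_wav(&tone, 24_000).len());
}
```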
Runtime Adapters
ONNX Runtime
Primary runtime for ONNX models. Supports execution providers:
- CPU - Universal fallback
- CoreML - Apple devices (NPU acceleration)
- CUDA - NVIDIA GPUs
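Provider choice typically follows a platform-based preference order with CPU as the universal fallback. A plain-Rust sketch of that ordering (not the actual ONNX Runtime API):

```rust
// Illustrative provider-preference logic: prefer a hardware-accelerated
// provider for the platform, fall back to CPU.
#[derive(Debug)]
enum ExecutionProvider {
    CoreMl,
    Cuda,
    Cpu,
}

fn pick_provider(cuda_available: bool) -> ExecutionProvider {
    if cfg!(any(target_os = "macos", target_os = "ios")) {
        ExecutionProvider::CoreMl
    } else if cuda_available {
        ExecutionProvider::Cuda
    } else {
        ExecutionProvider::Cpu // universal fallback
    }
}

fn main() {
    println!("selected: {:?}", pick_provider(false));
}
```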
Selected via SimpleMode:
```json
{
  "execution_template": {
    "type": "SimpleMode",
    "model_file": "model.onnx"
  }
}
```
Candle
Pure Rust inference for specific models:
```json
{
  "execution_template": {
    "type": "CandleModel",
    "model_type": "WhisperTiny"
  }
}
```
Currently supports:
- WhisperTiny - Speech recognition
Device selection:
- CPU - Default
- Metal - macOS/iOS
- CUDA - NVIDIA GPUs
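A sketch of that fallback order, assuming candle-core's Device API (new_metal, cuda_if_available); treat the exact calls as an assumption:

```rust
// Device selection sketch, assuming candle-core's Device API:
// Metal on Apple platforms, CUDA when available, otherwise CPU.
use candle_core::Device;

fn select_device() -> Device {
    if cfg!(any(target_os = "macos", target_os = "ios")) {
        Device::new_metal(0).unwrap_or(Device::Cpu)
    } else {
        Device::cuda_if_available(0).unwrap_or(Device::Cpu)
    }
}

fn main() {
    let device = select_device();
    println!("running on {device:?}");
}
```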
Related
- Orchestrator - Calls TemplateExecutor for each stage
- Bundles - Package models with metadata