TemplateExecutor

Model execution engine

The TemplateExecutor is the engine that runs individual models. It reads model_metadata.json and orchestrates preprocessing, inference, and postprocessing.

Execution Flow
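At a high level, the executor reads the metadata, runs each configured preprocessing step in order, dispatches to the runtime named by the execution template, then runs each postprocessing step. A minimal sketch of that loop (all names here are illustrative, not the actual Xybrid API):

```python
def execute(metadata: dict, raw_input, runtime, steps: dict):
    """Run preprocessing -> inference -> postprocessing as configured."""
    data = raw_input
    # Preprocessing: each configured step transforms the data in order.
    for step in metadata.get("preprocessing", []):
        data = steps[step["type"]](data, step)
    # Inference: dispatch on the execution template type.
    data = runtime(metadata["execution_template"], data)
    # Postprocessing: turn raw model output into a usable result.
    for step in metadata.get("postprocessing", []):
        data = steps[step["type"]](data, step)
    return data
```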

model_metadata.json

Every model bundle contains a model_metadata.json that configures execution:

{
  "model_id": "whisper-tiny",
  "version": "1.0",
  "description": "Speech recognition model",

  "execution_template": {
    "type": "CandleModel",
    "model_type": "WhisperTiny"
  },

  "preprocessing": [
    { "type": "AudioDecode", "sample_rate": 16000, "channels": 1 }
  ],

  "postprocessing": [],

  "files": ["model.safetensors", "tokenizer.json"],

  "metadata": {
    "task": "speech-recognition",
    "language": "en"
  }
}
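A sketch of how such a bundle might be loaded and sanity-checked before execution (the field names come from the example above; the validation logic itself is an assumption, not the executor's actual behavior):

```python
import json

# Fields the executor cannot run without (illustrative choice).
REQUIRED_KEYS = {"model_id", "execution_template", "files"}

def load_metadata(text: str) -> dict:
    """Parse model_metadata.json and check the fields the executor needs."""
    meta = json.loads(text)
    missing = REQUIRED_KEYS - meta.keys()
    if missing:
        raise ValueError(f"model_metadata.json missing keys: {sorted(missing)}")
    # Preprocessing/postprocessing default to empty pipelines.
    meta.setdefault("preprocessing", [])
    meta.setdefault("postprocessing", [])
    return meta
```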

Execution Template Types

Type          Runtime          Use Case
SimpleMode    ONNX Runtime     Most ONNX models
CandleModel   Candle (Rust)    Whisper, future LLMs
Pipeline      Multi-stage      Encoder-decoder models

Preprocessing Steps

Preprocessing transforms raw input into a model-ready format.

AudioDecode

Decodes WAV audio and resamples it to the configured sample rate:

{
  "type": "AudioDecode",
  "sample_rate": 16000,
  "channels": 1
}

Input: WAV bytes → Output: Float32 samples
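The core of this step, sketched with Python's standard library (resampling to the configured `sample_rate` is omitted, and the function name is hypothetical): decode 16-bit PCM frames, mix interleaved channels down to mono, and scale to the float range models expect.

```python
import io
import struct
import wave

def audio_decode(wav_bytes: bytes) -> list[float]:
    """Decode 16-bit PCM WAV bytes to float samples in [-1, 1], mixed to mono."""
    with wave.open(io.BytesIO(wav_bytes)) as w:
        n_ch = w.getnchannels()
        frames = w.readframes(w.getnframes())
    ints = struct.unpack(f"<{len(frames) // 2}h", frames)
    # Average interleaved channels down to one, then scale to [-1, 1].
    mono = [sum(ints[i:i + n_ch]) / n_ch for i in range(0, len(ints), n_ch)]
    return [s / 32768.0 for s in mono]
```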

MelSpectrogram

Converts audio to mel spectrogram (for Whisper):

{
  "type": "MelSpectrogram",
  "preset": "whisper"
}

Input: Float32 samples → Output: Mel features [1, 80, 3000]
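The "whisper" preset corresponds to the standard Whisper front end: 16 kHz audio, 80 mel bins, and a hop length of 160 samples, so 30 seconds of audio yields 3000 frames. A sketch of the mel scale and the resulting shape (the STFT and filterbank themselves are omitted; function names are illustrative):

```python
import math

def hz_to_mel(hz: float) -> float:
    """HTK-style mel scale conversion used by typical mel filterbanks."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def whisper_mel_shape(n_samples: int, hop_length: int = 160, n_mels: int = 80):
    """Output shape [1, n_mels, n_frames] for a Whisper-style front end."""
    n_frames = n_samples // hop_length
    return (1, n_mels, n_frames)
```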

Phonemize

Converts text to phoneme tokens (for TTS):

{
  "type": "Phonemize",
  "tokens_file": "tokens.txt",
  "dict_file": "cmudict.dict",
  "backend": "CmuDictionary"
}

Input: Text string → Output: Token IDs (i64)

Backends:

  • CmuDictionary - Pure Rust, built-in CMU dictionary
  • EspeakNG - External espeak-ng (better quality)
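The dictionary backend reduces to two lookups per word: word → phonemes in the CMU dictionary, then phoneme → ID in the tokens file. A simplified sketch (function name and unknown-word handling are assumptions):

```python
def phonemize(text: str, cmu_dict: dict, token_ids: dict) -> list[int]:
    """Dictionary-based phonemization: word -> phonemes -> token IDs."""
    ids = []
    for word in text.upper().split():
        word = word.strip(".,!?")
        # Words missing from the dictionary are silently skipped in this sketch.
        for phoneme in cmu_dict.get(word, []):
            ids.append(token_ids[phoneme])
    return ids
```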

Tokenize

Tokenizes text for NLP models:

{
  "type": "Tokenize",
  "vocab_file": "vocab.json"
}

Input: Text string → Output: Token IDs
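In its simplest form this is a vocabulary lookup; a whitespace-level sketch (the real tokenization scheme depends on the model, and the unknown-token name here is an assumption):

```python
def tokenize(text: str, vocab: dict, unk: str = "[UNK]") -> list[int]:
    """Map whitespace tokens to IDs via a vocab.json-style dictionary."""
    return [vocab.get(tok, vocab[unk]) for tok in text.lower().split()]
```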

Postprocessing Steps

Postprocessing transforms model output into a usable format.

CTCDecode

Decodes CTC logits to text (for ASR models like Wav2Vec2):

{
  "type": "CTCDecode",
  "vocab_file": "vocab.json",
  "blank_index": 0
}

Input: Logits tensor → Output: Text string
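Greedy CTC decoding is: take the argmax label per frame, collapse consecutive repeats, and drop the blank label. A self-contained sketch (function name illustrative):

```python
def ctc_decode(logits: list[list[float]], vocab: list[str],
               blank_index: int = 0) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    out, prev = [], None
    for idx in best:
        # Emit only on label change, and never emit the blank.
        if idx != prev and idx != blank_index:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)
```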

WhisperDecode

Decodes Whisper token IDs to text:

{
  "type": "WhisperDecode",
  "tokenizer_file": "tokenizer.json"
}

Input: Token IDs → Output: Text string
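Conceptually this is the reverse vocabulary lookup, with special tokens (e.g. end-of-text markers) filtered out. The real step uses the byte-level BPE tokenizer defined in tokenizer.json; this sketch only shows the ID-to-text direction with hypothetical names:

```python
def whisper_decode(token_ids: list[int], id_to_token: dict,
                   specials: set) -> str:
    """Map token IDs back to text, skipping special tokens."""
    return "".join(id_to_token[i] for i in token_ids if i not in specials)
```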

TTSAudioEncode

Encodes waveform to WAV bytes:

{
  "type": "TTSAudioEncode",
  "sample_rate": 24000,
  "apply_postprocessing": true
}

Input: Float32 waveform → Output: WAV bytes
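The essential conversion, sketched with Python's standard library (the optional postprocessing controlled by `apply_postprocessing` is omitted, and the function name is illustrative): clip the float samples to [-1, 1], quantize to 16-bit PCM, and wrap in a WAV container.

```python
import io
import struct
import wave

def tts_audio_encode(samples: list[float], sample_rate: int = 24000) -> bytes:
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 16-bit PCM
        w.setframerate(sample_rate)
        clipped = [max(-1.0, min(1.0, s)) for s in samples]
        w.writeframes(struct.pack(f"<{len(clipped)}h",
                                  *(int(s * 32767) for s in clipped)))
    return buf.getvalue()
```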

Runtime Adapters

ONNX Runtime

The primary runtime for ONNX models. Supported execution providers:

  • CPU - Universal fallback
  • CoreML - Apple devices (NPU acceleration)
  • CUDA - NVIDIA GPUs

Selected via SimpleMode:

{
  "execution_template": {
    "type": "SimpleMode",
    "model_file": "model.onnx"
  }
}
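Provider selection typically follows a preference order with CPU as the universal fallback. A sketch of that pattern (the exact priority order and function name are assumptions, not the adapter's documented behavior):

```python
def select_provider(available: list[str]) -> str:
    """Pick the best available execution provider, falling back to CPU."""
    for preferred in ("CUDA", "CoreML"):  # accelerators first, if present
        if preferred in available:
            return preferred
    return "CPU"  # universal fallback
```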

Candle

Pure Rust inference for specific models:

{
  "execution_template": {
    "type": "CandleModel",
    "model_type": "WhisperTiny"
  }
}

Currently supports:

  • WhisperTiny - Speech recognition

Device selection:

  • CPU - Default
  • Metal - macOS/iOS
  • CUDA - NVIDIA GPUs