# Internals

How Xybrid works under the hood
This section explains Xybrid's internal architecture for researchers, contributors, and developers who want to understand how the system works.
## Overview
Xybrid is a hybrid cloud-edge ML inference orchestrator. It routes ML workloads between on-device execution and cloud APIs based on model availability, device capabilities, and user policies.
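The routing decision described above can be sketched as a small function. All of the names below (`Target`, `Policy`, `route`) are illustrative assumptions, not Xybrid's actual API; they only model the idea of choosing between device, cloud, and fallback execution.

```rust
// Hypothetical sketch of hybrid routing: pick an execution target from
// model availability, device capability, and a user policy.
#[derive(Debug, PartialEq)]
enum Target {
    Device,
    Cloud,
    Fallback,
}

struct Policy {
    prefer_local: bool, // run on-device when possible
    allow_cloud: bool,  // permit cloud APIs at all
}

fn route(model_on_device: bool, device_capable: bool, policy: &Policy) -> Target {
    if model_on_device && device_capable && policy.prefer_local {
        Target::Device
    } else if policy.allow_cloud {
        Target::Cloud
    } else {
        Target::Fallback
    }
}

fn main() {
    let policy = Policy { prefer_local: true, allow_cloud: true };
    // Model present and device capable → run locally.
    assert_eq!(route(true, true, &policy), Target::Device);
    // Model missing → fall through to the cloud.
    assert_eq!(route(false, true, &policy), Target::Cloud);
}
```

The real orchestrator would weigh more signals (latency budgets, bundle versions, network state), but the shape of the decision is the same.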
## Core Architecture
The system is built around these key components:
| Component | Purpose |
|---|---|
| Envelope | Universal data container for audio, text, embeddings |
| Orchestrator | Routes requests to appropriate execution targets |
| Pipeline | Chains multiple stages (ASR → LLM → TTS) |
| Executor | Runs models via ONNX or Candle runtime |
| StreamSession | Real-time streaming inference |
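The Envelope row above describes a universal container for audio, text, and embeddings. A minimal sketch of what such a type might look like — the field and variant names here are assumptions for illustration, not the real definition:

```rust
// Hypothetical sketch of a universal data container like Envelope.
#[derive(Debug, Clone)]
enum Payload {
    Audio(Vec<u8>),      // raw audio bytes
    Text(String),        // transcript or prompt text
    Embedding(Vec<f32>), // vector representation
}

#[derive(Debug, Clone)]
struct Envelope {
    payload: Payload,
    // Metadata such as sample rate, language, or model id would live here.
}

fn main() {
    let env = Envelope { payload: Payload::Text("hello".to_string()) };
    println!("{:?}", env);
}
```

Wrapping every stage's input and output in one container is what lets the Orchestrator and Pipeline chain heterogeneous stages without caring what each one produces.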
## Data Flow
- Input arrives as an `Envelope` (audio bytes, text, or embeddings)
- Orchestrator determines the execution target (device, cloud, or fallback)
- Executor runs the model and produces output
- Output is wrapped in a new `Envelope` for the next stage
See Data Flow & Execution for details.
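The flow above — each stage consuming an `Envelope` and emitting a new one — can be sketched as a fold over a list of stage functions. The stage names and the simplified two-variant `Envelope` below are illustrative assumptions, not Xybrid's real types:

```rust
// Sketch of chaining stages (ASR → LLM → TTS): each stage maps
// one Envelope to the next. All names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Envelope {
    Audio(Vec<u8>),
    Text(String),
}

type Stage = fn(Envelope) -> Envelope;

// Speech recognition: audio in, text out.
fn asr(input: Envelope) -> Envelope {
    match input {
        Envelope::Audio(_) => Envelope::Text("transcript".to_string()),
        other => other,
    }
}

// Language model: text in, text out.
fn llm(input: Envelope) -> Envelope {
    match input {
        Envelope::Text(t) => Envelope::Text(format!("reply to {t}")),
        other => other,
    }
}

// Speech synthesis: text in, audio out.
fn tts(input: Envelope) -> Envelope {
    match input {
        Envelope::Text(_) => Envelope::Audio(vec![0u8; 16]),
        other => other,
    }
}

// Thread the envelope through every stage in order.
fn run_pipeline(stages: &[Stage], input: Envelope) -> Envelope {
    stages.iter().fold(input, |env, stage| stage(env))
}

fn main() {
    let out = run_pipeline(&[asr, llm, tts], Envelope::Audio(vec![1, 2, 3]));
    assert!(matches!(out, Envelope::Audio(_)));
}
```

In the real system each stage would dispatch to a device runtime (ONNX, Candle) or a cloud API per the Orchestrator's routing decision, but the envelope-in, envelope-out contract is the same.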
## Infrastructure
| Component | Purpose |
|---|---|
| Registry | Bundle distribution server |
| Bundles | Packaged models (.xyb format) |
| Pipeline DSL | YAML pipeline definitions |
| Streaming | Real-time inference |
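Since the table says pipelines are defined in YAML, here is a sketch of what such a definition could look like. The schema — `pipeline`, `stages`, `model`, `target` — and the model names are assumptions for illustration, not Xybrid's actual DSL:

```yaml
# Hypothetical pipeline definition; field names and models are
# illustrative, not the real Xybrid DSL schema.
pipeline:
  name: voice-assistant
  stages:
    - id: asr
      model: whisper-small
      target: device     # prefer on-device execution
    - id: llm
      model: gpt-4o-mini
      target: cloud      # route to a cloud API
    - id: tts
      model: piper-en
      target: device
```

Each stage would resolve its model through the Registry, pulling a packaged `.xyb` bundle for device targets.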