# What is Xybrid

Hybrid AI orchestration that runs inference anywhere.
Xybrid is a hybrid AI orchestration runtime that lets developers run inference anywhere—device, edge, or cloud—without rewriting code.
## The Problem
AI inference is fragmenting across compute surfaces:
- Mobile NPUs now rival cloud GPUs for many inference workloads
- Privacy regulations demand localized processing
- Cloud costs escalate with scale
- Latency requirements push computation to the edge
Yet tooling remains siloed. Developers must manually handle device detection, model routing, format conversion, and fallback logic.
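To make that concrete, here is a minimal sketch of the glue code developers hand-roll today. Every name in it (`has_npu`, `run_local`, `run_cloud`, `transcribe`) is illustrative, not part of any real SDK:

```python
# Illustrative sketch of hand-rolled routing and fallback logic.
# All function names here are hypothetical, not a real SDK.

def has_npu() -> bool:
    # Real apps probe platform APIs (Core ML, NNAPI, etc.).
    return False  # pretend this device has no accelerator

def run_local(model: str, audio: bytes) -> str:
    raise RuntimeError(f"no local runtime for {model}")

def run_cloud(model: str, audio: bytes) -> str:
    return f"[cloud:{model}] transcript"

def transcribe(audio: bytes) -> str:
    # Device detection, routing, and fallback, duplicated per app.
    if has_npu():
        try:
            return run_local("whisper-tiny", audio)
        except RuntimeError:
            pass  # local runtime missing: fall through to cloud
    return run_cloud("whisper-tiny", audio)

print(transcribe(b"..."))  # → [cloud:whisper-tiny] transcript
```

Multiply this by every model, format, and provider in a pipeline, and the maintenance burden is clear.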
## The Solution
Xybrid provides a unified control plane:
```yaml
name: voice-assistant
stages:
  - whisper-tiny@1.0      # ASR: runs on device (privacy)
  - target: integration
    provider: openai
    model: gpt-4o-mini    # LLM: routes to cloud (capability)
  - kokoro-82m@0.1        # TTS: runs on device (latency)
```

Three models, two execution locations, one pipeline. Xybrid handles the routing.
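The routing decision implied by this pipeline can be sketched as a toy placement function. This is illustrative pseudologic under assumed conventions (string entries name local model bundles; mapping entries with `target: integration` route to an external provider), not Xybrid's actual implementation:

```python
# Toy model of stage placement — illustrative only,
# not Xybrid's real routing logic.

stages = [
    "whisper-tiny@1.0",                        # local bundle
    {"target": "integration",
     "provider": "openai",
     "model": "gpt-4o-mini"},                  # external provider
    "kokoro-82m@0.1",                          # local bundle
]

def placement(stage) -> str:
    # Mapping entries with target: integration go to the cloud;
    # plain string entries stay on device.
    if isinstance(stage, dict) and stage.get("target") == "integration":
        return "cloud"
    return "device"

print([placement(s) for s in stages])  # → ['device', 'cloud', 'device']
```

In practice the runtime would also weigh latency, power, and policy constraints rather than the stage shape alone.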
## Core Principles
| Principle | Description |
|---|---|
| Privacy-first | Policies govern what data may leave the device |
| Adaptive compute | Routing adapts to latency, RTT, and power budgets |
| Real-time ready | Streaming pipelines emit incremental results |
| Extensible | `.xyb` bundles enable adding models without rewrites |
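The privacy-first principle suggests a declarative policy surface. A hypothetical sketch of what such a policy might look like — the field names (`policy`, `egress`, `deny`, `allow`) are assumptions, not Xybrid's documented schema:

```yaml
# Hypothetical egress policy — field names are illustrative,
# not Xybrid's actual configuration schema.
policy:
  egress:
    deny: [raw_audio, transcripts]          # must never leave the device
    allow: [token_counts, latency_metrics]  # safe to send upstream
```

Under such a policy, the runtime would refuse to route a stage to the cloud if doing so would transmit denied data classes.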