# What is Xybrid

Hybrid AI orchestration that runs inference anywhere.
Xybrid is a hybrid AI orchestration runtime that lets developers run inference anywhere—device, edge, or cloud—without rewriting code.
## The Problem
AI inference is fragmenting across compute surfaces:
- Mobile NPUs now rival cloud GPUs for many inference workloads
- Privacy regulations demand localized processing
- Cloud costs escalate with scale
- Latency requirements push computation to the edge
Yet tooling remains siloed. Developers must manually handle device detection, model routing, format conversion, and fallback logic.
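To make that concrete, here is a minimal sketch of the glue code developers hand-roll today. Every name in it (`has_npu`, `run_local`, `run_cloud`, `transcribe`) is illustrative, not part of any real SDK:

```python
# Illustrative sketch of hand-rolled routing and fallback logic.
# All function names here are hypothetical, not a real SDK.

def has_npu() -> bool:
    # Real apps probe platform APIs (Core ML, NNAPI, etc.).
    return False  # pretend this device has no accelerator

def run_local(model: str, audio: bytes) -> str:
    raise RuntimeError(f"no local runtime for {model}")

def run_cloud(model: str, audio: bytes) -> str:
    return f"[cloud:{model}] transcript"

def transcribe(audio: bytes) -> str:
    # Device detection, routing, and fallback, duplicated per app.
    if has_npu():
        try:
            return run_local("whisper-tiny", audio)
        except RuntimeError:
            pass  # local runtime missing: fall through to cloud
    return run_cloud("whisper-tiny", audio)

print(transcribe(b"..."))  # → [cloud:whisper-tiny] transcript
```

Multiply this by every model, format, and provider in a pipeline, and the maintenance burden is clear.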
## The Solution
Xybrid provides a unified control plane:
```yaml
name: voice-assistant
stages:
  - whisper-tiny@1.0      # ASR: runs on device (privacy)
  - target: integration
    provider: openai
    model: gpt-4o-mini    # LLM: routes to cloud (capability)
  - kokoro-82m@0.1        # TTS: runs on device (latency)
```

Three models, two execution locations, one pipeline. Xybrid handles the routing.
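The routing decision implied by this pipeline can be sketched as a toy placement function. This is illustrative pseudologic under assumed conventions (string entries name local model bundles; mapping entries with `target: integration` route to an external provider), not Xybrid's actual implementation:

```python
# Toy model of stage placement — illustrative only,
# not Xybrid's real routing logic.

stages = [
    "whisper-tiny@1.0",                        # local bundle
    {"target": "integration",
     "provider": "openai",
     "model": "gpt-4o-mini"},                  # external provider
    "kokoro-82m@0.1",                          # local bundle
]

def placement(stage) -> str:
    # Mapping entries with target: integration go to the cloud;
    # plain string entries stay on device.
    if isinstance(stage, dict) and stage.get("target") == "integration":
        return "cloud"
    return "device"

print([placement(s) for s in stages])  # → ['device', 'cloud', 'device']
```

In practice the runtime would also weigh latency, power, and policy constraints rather than the stage shape alone.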
## Core Principles
| Principle | Description |
|---|---|
| Privacy-first | Policies govern what data may leave the device |
| Adaptive compute | Routing adapts to latency, RTT, and power budgets |
| Real-time ready | Streaming pipelines emit incremental results |
| Extensible | `.xyb` bundles enable adding models without rewrites |
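The privacy-first principle suggests a declarative policy surface. A hypothetical sketch of what such a policy might look like — the field names (`policy`, `egress`, `deny`, `allow`) are assumptions, not Xybrid's documented schema:

```yaml
# Hypothetical egress policy — field names are illustrative,
# not Xybrid's actual configuration schema.
policy:
  egress:
    deny: [raw_audio, transcripts]          # must never leave the device
    allow: [token_counts, latency_metrics]  # safe to send upstream
```

Under such a policy, the runtime would refuse to route a stage to the cloud if doing so would transmit denied data classes.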