Observability
Telemetry, tracing, and device intelligence
Monitor and debug Xybrid pipelines with telemetry, device metrics, and tracing.
Overview
Xybrid collects telemetry for two purposes:
- Runtime Optimization: device capabilities inform routing decisions
- Usage Analytics: metrics for debugging and visualization
Device Intelligence
Hardware Capabilities
The system detects hardware capabilities to optimize routing:
pub struct HardwareCapabilities {
    // Accelerator availability
    pub has_gpu: bool,
    pub has_metal: bool, // iOS/macOS
    pub has_nnapi: bool, // Android
    pub has_coreml: bool, // iOS/macOS Neural Engine
    // Resource metrics
    pub memory_available_mb: u64,
    pub memory_total_mb: u64,
    pub battery_level: u8, // 0-100
    pub thermal_state: ThermalState,
    // Platform info
    pub platform: Platform,
    pub gpu_type: Option<GpuType>,
    pub npu_type: Option<NpuType>,
}
Thermal States
| State | Temperature | Action |
|---|---|---|
| Normal | < 60°C | Full performance |
| Warm | 60-70°C | May throttle |
| Hot | 70-80°C | Reduce workload |
| Critical | > 80°C | Pause heavy operations |
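As a rough illustration, the thresholds in this table could map onto `ThermalState` as shown below, and the throttling rule documented for `should_throttle()` (battery below 20%, or a Hot or Critical thermal state) could build on that mapping. This is a minimal sketch, not the library's implementation. The free functions `classify_thermal` and `should_throttle` are hypothetical names. The handling of the overlapping boundaries (70°C and 80°C) is an assumption, because the table does not say which state they belong to.

```rust
/// Thermal states, mirroring the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThermalState {
    Normal,
    Warm,
    Hot,
    Critical,
}

/// Hypothetical helper (not part of the documented API): classify a
/// temperature reading in degrees Celsius using the table's thresholds.
/// Assumption: 70 °C counts as Hot, and exactly 80 °C still counts as Hot.
fn classify_thermal(temp_c: f32) -> ThermalState {
    if temp_c < 60.0 {
        ThermalState::Normal
    } else if temp_c < 70.0 {
        ThermalState::Warm
    } else if temp_c <= 80.0 {
        ThermalState::Hot
    } else {
        ThermalState::Critical
    }
}

/// Sketch of the documented throttling rule: battery below 20% OR a
/// Hot/Critical thermal state.
fn should_throttle(battery_level: u8, thermal: ThermalState) -> bool {
    battery_level < 20 || matches!(thermal, ThermalState::Hot | ThermalState::Critical)
}
```

With these definitions, `should_throttle(85, classify_thermal(75.0))` returns `true` because 75°C falls in the Hot band, even though the battery is well above 20%.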
Decision Methods
// Should workload be reduced?
capabilities.should_throttle()
// → true if battery < 20% OR thermal Hot/Critical
// Which accelerator to prefer?
capabilities.should_prefer_metal() // macOS/iOS
capabilities.should_prefer_nnapi() // Android
capabilities.should_prefer_gpu() // General GPU compute
Flutter Usage
final caps = await xybrid.getDeviceCapabilities();
print('GPU: ${caps.hasGpu}, Metal: ${caps.hasMetal}');
print('Battery: ${caps.batteryLevel}%, Thermal: ${caps.thermalState}');
if (caps.shouldThrottle) {
  // Use cloud inference or lower-quality models
}
Telemetry Events
Event Types
Events flow from the orchestrator through the event bus:
| Event | Description |
|---|---|
| PipelineStart | Pipeline execution begins |
| PipelineComplete | Pipeline finished |
| StageStart | Stage begins processing |
| StageComplete | Stage finished |
| StageError | Stage failed |
| RoutingDecided | Target selected |
| ExecutionStarted | Inference begins |
| ExecutionCompleted | Inference finished |
| PolicyEvaluated | Policy check result |
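To picture how these events might travel from a producer to a consumer, the sketch below models the event types as an enum and uses a standard-library channel as a stand-in for the event bus. The `EventKind` enum, both helper functions, and the order of events for a single-stage run are assumptions made for illustration. The actual orchestrator and bus may work differently.

```rust
use std::sync::mpsc;

/// Event types from the table above. The enum itself is hypothetical.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    PipelineStart,
    PipelineComplete,
    StageStart,
    StageComplete,
    StageError,
    RoutingDecided,
    ExecutionStarted,
    ExecutionCompleted,
    PolicyEvaluated,
}

impl EventKind {
    /// Name as it appears in the table.
    fn as_str(self) -> &'static str {
        match self {
            EventKind::PipelineStart => "PipelineStart",
            EventKind::PipelineComplete => "PipelineComplete",
            EventKind::StageStart => "StageStart",
            EventKind::StageComplete => "StageComplete",
            EventKind::StageError => "StageError",
            EventKind::RoutingDecided => "RoutingDecided",
            EventKind::ExecutionStarted => "ExecutionStarted",
            EventKind::ExecutionCompleted => "ExecutionCompleted",
            EventKind::PolicyEvaluated => "PolicyEvaluated",
        }
    }
}

/// Emit the events for one successful single-stage run.
/// Assumption: this ordering is plausible but not documented.
fn emit_single_stage_run(tx: &mpsc::Sender<EventKind>) {
    let sequence = [
        EventKind::PipelineStart,
        EventKind::StageStart,
        EventKind::PolicyEvaluated,
        EventKind::RoutingDecided,
        EventKind::ExecutionStarted,
        EventKind::ExecutionCompleted,
        EventKind::StageComplete,
        EventKind::PipelineComplete,
    ];
    for &kind in sequence.iter() {
        tx.send(kind).expect("receiver dropped");
    }
}

/// Run the producer, then drain the channel on the consumer side.
fn collect_event_names() -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();
    emit_single_stage_run(&tx);
    drop(tx); // close the channel so the receiving iterator terminates
    rx.iter().map(EventKind::as_str).collect()
}
```

A subscriber in this model is simply whatever holds the receiving end of the channel; a real bus would typically fan events out to several subscribers.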
Event Structure
pub struct TelemetryEvent {
    pub event_type: String,
    pub stage_name: Option<String>,
    pub target: Option<String>, // "local", "cloud", "fallback"
    pub latency_ms: Option<u32>,
    pub error: Option<String>,
    pub data: Option<String>, // Additional JSON
    pub timestamp_ms: u64,
}
Subscribing to Events
final stream = subscribeTelemetryEvents();
stream.listen((event) {
  print('[${event.eventType}] ${event.stageName ?? ""} '
      '${event.target ?? ""} ${event.latencyMs ?? ""}ms');
});
Session Metrics
Metrics are aggregated per session for analytics:
pub struct SessionMetrics {
    pub session_id: String,
    pub device_id: String,
    pub started_at: u64,
    pub ended_at: Option<u64>,
    // Aggregates
    pub total_inferences: u64,
    pub total_latency_ms: u64,
    pub models_used: Vec<String>,
    pub error_count: u64,
    // Device snapshot
    pub hardware_capabilities: HardwareCapabilities,
}
Per-Model Metrics
pub struct ApiCallMetric {
    pub model_id: String,
    pub version: String,
    pub call_count: u64,
    pub total_latency_ms: u64,
    pub avg_latency_ms: u64,
    pub error_count: u64,
    pub last_called: u64,
}
CLI Tracing
View Traces
# Latest session
xybrid trace --latest
# Specific session
xybrid trace --session abc123
# Export to file
xybrid trace --latest --export trace.json
Trace Storage
Traces are stored at ~/.xybrid/traces/:
~/.xybrid/traces/
├── abc123.log
├── def456.log
└── ...
Export Format
Session data exports as JSON:
{
  "version": "2.0",
  "session": {
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "device_id": "device-abc123",
    "platform": "macos",
    "started_at": "2024-12-12T12:00:00Z"
  },
  "hardware": {
    "has_gpu": true,
    "gpu_type": "metal",
    "has_npu": true,
    "memory_total_mb": 16384,
    "battery_level": 85,
    "thermal_state": "normal"
  },
  "metrics": {
    "total_inferences": 42,
    "total_latency_ms": 12500,
    "avg_latency_ms": 297,
    "error_count": 0,
    "by_model": [
      {
        "model_id": "wav2vec2-base-960h",
        "call_count": 42,
        "avg_latency_ms": 297
      }
    ]
  }
}
Flutter SDK APIs
Device Capabilities
// Get current device capabilities
HardwareCapabilities getDeviceCapabilities();
// Check if a model can run locally
bool canRunModelLocally({required String modelId, String? version});
Session Metrics
// Get current session metrics
SessionMetrics getSessionMetrics();
// Export telemetry as JSON
String exportTelemetryJson();
// Reset session
void resetSession();
Telemetry Stream
// Subscribe to real-time events
Stream<TelemetryEvent> subscribeTelemetryEvents();
Configuration
CLI Configuration
In ~/.config/xybrid/config.yml:
telemetry:
  enabled: true
  endpoint: "http://localhost:4318"
  sampling_rate: 0.05
  privacy_mode: "summary_only"
Rust Configuration
let mut telemetry = Telemetry::new();
telemetry.set_enabled(false); // Disable
// Or create disabled
let telemetry = Telemetry::with_enabled(false);
Best Practices
Initialize Early
void main() async {
  WidgetsFlutterBinding.ensureInitialized();
  await RustLib.init();
  initTelemetryStream();
  runApp(MyApp());
}
Subscribe Once
late StreamSubscription<TelemetryEvent> _subscription;
@override
void initState() {
  super.initState();
  _subscription = subscribeTelemetryEvents().listen(_onEvent);
}
@override
void dispose() {
  _subscription.cancel();
  super.dispose();
}
Export Before Session End
final json = xybrid.exportTelemetryJson();
await uploadToBackend(json);
Check Capabilities Before Inference
final caps = xybrid.getDeviceCapabilities();
if (caps.shouldThrottle) {
  return useCloudFallback();
}
Error Categories
Errors are categorized for debugging:
| Category | Description |
|---|---|
| ModelLoading | Bundle/model file issues |
| Preprocessing | Input format/conversion |
| Inference | Runtime execution |
| Postprocessing | Output format |
| Network | Registry/cloud connectivity |
| Hardware | GPU/NPU initialization |
| Memory | OOM conditions |
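One way to work with these categories when debugging is to model them as an enum and count how often each occurs. The sketch below does this. The `ErrorCategory` enum, its `description` method, and the `tally` function are hypothetical and only mirror the table; they are not confirmed to match the library's own types.

```rust
use std::collections::BTreeMap;

/// Error categories from the table above (hypothetical enum).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ErrorCategory {
    ModelLoading,
    Preprocessing,
    Inference,
    Postprocessing,
    Network,
    Hardware,
    Memory,
}

impl ErrorCategory {
    /// Description text taken from the table.
    fn description(self) -> &'static str {
        match self {
            ErrorCategory::ModelLoading => "Bundle/model file issues",
            ErrorCategory::Preprocessing => "Input format/conversion",
            ErrorCategory::Inference => "Runtime execution",
            ErrorCategory::Postprocessing => "Output format",
            ErrorCategory::Network => "Registry/cloud connectivity",
            ErrorCategory::Hardware => "GPU/NPU initialization",
            ErrorCategory::Memory => "OOM conditions",
        }
    }
}

/// Count occurrences of each category in a list of observed errors.
/// Categories that never occurred are absent from the result.
fn tally(errors: &[ErrorCategory]) -> BTreeMap<ErrorCategory, usize> {
    let mut counts = BTreeMap::new();
    for &category in errors {
        *counts.entry(category).or_insert(0) += 1;
    }
    counts
}
```

A tally like this makes it easy to see at a glance whether a session's failures cluster in one area, for example connectivity rather than inference.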
Platform Support
| Platform | GPU | NPU | Battery | Thermal |
|---|---|---|---|---|
| macOS | Metal | CoreML | pmset | - |
| iOS | Metal | CoreML | UIDevice | ProcessInfo |
| Android | Vulkan | NNAPI | BatteryManager | JNI |
| Linux | Vulkan | - | /sys/class | /sys/class |
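The GPU and NPU columns of this table can be read as a simple lookup keyed by operating system. The sketch below encodes that lookup, using the standard library's `std::env::consts::OS` string for the current platform. The function names are hypothetical, and returning `None` for platforms not listed in the table is an assumption.

```rust
/// Hypothetical lookup mirroring the GPU and NPU columns of the table.
/// Returns (gpu_backend, npu_backend); `None` means not available or not listed.
fn accelerators_for(os: &str) -> (Option<&'static str>, Option<&'static str>) {
    match os {
        "macos" | "ios" => (Some("Metal"), Some("CoreML")),
        "android" => (Some("Vulkan"), Some("NNAPI")),
        "linux" => (Some("Vulkan"), None),
        // Assumption: platforms outside the table have no supported accelerator.
        _ => (None, None),
    }
}

/// Look up the accelerators for the platform this code was compiled for.
fn current_accelerators() -> (Option<&'static str>, Option<&'static str>) {
    accelerators_for(std::env::consts::OS)
}
```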