Monitor the health of every infrastructure component. Track response times, manage incidents, and view uptime history for your DRD deployment.
PostgreSQL connection health, query latency, and connection pool status.
REST and tRPC endpoint response times, error rates, and throughput.
Job queue depth, processing rate, and worker health.
Cache hit rates, eviction rates, and memory utilization.
File storage availability, upload/download latency, and capacity.
Third-party service connectivity and webhook delivery health.
| Status | Description | Action |
|---|---|---|
| operational | Component is functioning normally within SLA parameters | None |
| degraded | Component experiencing higher latency or partial errors but still functional | Investigate |
| partial_outage | Component partially unavailable; some requests failing | Alert team |
| major_outage | Component fully unavailable; all requests failing | Incident created |
| maintenance | Component undergoing scheduled maintenance | Check ETA |
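
The status table can be expressed as a small lookup, which is handy when rolling several component statuses up into one overall status. This is only a sketch: the status values and actions come from the table, while the severity ordering, `RECOMMENDED_ACTION`, and `worstStatus` are assumptions made for illustration and are not part of the DRD SDK.

```typescript
// Status values and actions mirror the table above.
type ComponentStatus =
  | "operational"
  | "degraded"
  | "partial_outage"
  | "major_outage"
  | "maintenance";

const RECOMMENDED_ACTION: Record<ComponentStatus, string> = {
  operational: "None",
  degraded: "Investigate",
  partial_outage: "Alert team",
  major_outage: "Incident created",
  maintenance: "Check ETA",
};

// Assumed ordering from least to most severe; the docs do not define one.
const SEVERITY_ORDER: ComponentStatus[] = [
  "operational",
  "maintenance",
  "degraded",
  "partial_outage",
  "major_outage",
];

// Returns the most severe status in the list ("operational" for an empty list).
function worstStatus(statuses: ComponentStatus[]): ComponentStatus {
  let worst: ComponentStatus = "operational";
  for (const status of statuses) {
    if (SEVERITY_ORDER.indexOf(status) > SEVERITY_ORDER.indexOf(worst)) {
      worst = status;
    }
  }
  return worst;
}

const overall = worstStatus(["operational", "degraded", "maintenance"]);
console.log(overall, "->", RECOMMENDED_ACTION[overall]);
```

Ranking `maintenance` below `degraded` is a judgment call; adjust the order if planned maintenance should take precedence in your dashboards.
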
DRD automatically creates an incident when a component's status changes. Each incident moves through a lifecycle from detection to resolution and keeps a full timeline of every stage.
Monitoring detected an anomaly. Incident created and on-call team notified within 60 seconds.
Engineering team is actively investigating. Status page updated. Affected customers notified.
Root cause identified. Fix in progress. Estimated time to resolution published.
Fix deployed and being monitored. Component back to operational but under close watch.
Incident fully resolved. Post-mortem scheduled. Customer notification sent with summary.
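
The lifecycle above can be modeled as a forward-only sequence of stages. The sketch below is illustrative: `detected`, `investigating`, and `resolved` appear as timeline statuses in the API responses on this page, while `identified` and `monitoring` are assumed names for the third and fourth stages. The helpers `canTransition` and `isValidTimeline` are hypothetical and not part of the SDK.

```typescript
// Stage names: "identified" and "monitoring" are assumed labels.
const STAGES = [
  "detected",
  "investigating",
  "identified",
  "monitoring",
  "resolved",
] as const;
type Stage = (typeof STAGES)[number];

// An incident may only move forward; skipping stages is allowed.
function canTransition(from: Stage, to: Stage): boolean {
  return STAGES.indexOf(to) > STAGES.indexOf(from);
}

// A timeline is valid if it starts at "detected" and only moves forward.
function isValidTimeline(stages: Stage[]): boolean {
  let previous: Stage | undefined;
  for (const stage of stages) {
    const ok =
      previous === undefined
        ? stage === "detected"
        : canTransition(previous, stage);
    if (!ok) return false;
    previous = stage;
  }
  return previous !== undefined;
}

console.log(isValidTimeline(["detected", "investigating", "resolved"]));
```

Skipping is permitted here because a short incident can go straight from investigation to resolution without separate identified or monitoring entries.
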
DRD uses anomaly detection across 50+ metrics to automatically detect incidents. Mean time to detect (MTTD) is under 30 seconds for major outages and under 5 minutes for degraded performance.
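
To make the MTTD figure concrete: mean time to detect is the average gap between when a problem begins and when monitoring flags it. The sketch below computes it from a list of samples and compares it with the 30-second target quoted above. The `DetectionSample` shape, the `startedAt` field, and the sample timestamps are all made up for illustration; DRD's own detection pipeline is not described here.

```typescript
// Hypothetical record: when an issue began vs. when monitoring flagged it.
interface DetectionSample {
  startedAt: string; // ISO 8601 timestamp (assumed field)
  detectedAt: string; // ISO 8601 timestamp
}

// Average detection delay in seconds across all samples.
function meanTimeToDetectSeconds(samples: DetectionSample[]): number {
  if (samples.length === 0) {
    throw new Error("at least one sample is required");
  }
  const totalMs = samples.reduce(
    (sum, s) => sum + (Date.parse(s.detectedAt) - Date.parse(s.startedAt)),
    0,
  );
  return totalMs / samples.length / 1000;
}

const MAJOR_OUTAGE_TARGET_SECONDS = 30; // target stated in the text above

// Illustrative data only: delays of 20 s and 28 s.
const samples: DetectionSample[] = [
  { startedAt: "2026-02-10T14:21:40Z", detectedAt: "2026-02-10T14:22:00Z" },
  { startedAt: "2026-02-11T09:00:00Z", detectedAt: "2026-02-11T09:00:28Z" },
];

const mttd = meanTimeToDetectSeconds(samples);
console.log(`MTTD: ${mttd}s, within target: ${mttd < MAJOR_OUTAGE_TARGET_SECONDS}`);
```
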
import { DRD } from '@drd/sdk';

const drd = new DRD({ apiKey: process.env.DRD_API_KEY });

// Check current health
const health = await drd.systemHealth.getCurrent();
health.forEach(check => {
  console.log(check.component, check.status, check.responseTimeMs);
});

// Create an incident
await drd.systemHealth.createIncident({
  title: 'Elevated API latency',
  severity: 'medium',
  description: 'p99 latency > 500ms on /api/guard endpoint',
  affectedComponents: ['api', 'cache'],
});

curl https://api.drd.io/v1/health
# Response (no auth required)
{
  "ok": true,
  "data": {
    "status": "operational",
    "version": "2026.02.14",
    "components": {
      "api_gateway": { "status": "operational", "latencyMs": 12 },
      "trust_engine": { "status": "operational", "latencyMs": 8 },
      "policy_engine": { "status": "operational", "latencyMs": 5 },
      "event_store": { "status": "operational", "latencyMs": 3 },
      "content_protection": { "status": "operational", "latencyMs": 15 },
      "webhook_delivery": { "status": "operational", "latencyMs": 22 }
    },
    "uptime": {
      "last24h": 100.0,
      "last7d": 99.99,
      "last30d": 99.98
    },
    "checkedAt": "2026-02-14T12:00:00Z"
  }
}

curl "https://api.drd.io/v1/health/metrics?window=24h&resolution=hourly" \
  -H "Authorization: Bearer drd_ws_sk_live_Abc123..."
# Response
{
  "ok": true,
  "data": {
    "window": "24h",
    "resolution": "hourly",
    "metrics": {
      "requestsTotal": 8542100,
      "errorRate": 0.002,
      "p50LatencyMs": 14,
      "p95LatencyMs": 45,
      "p99LatencyMs": 120,
      "points": [
        { "time": "2026-02-13T12:00:00Z", "requests": 356000, "p50": 13, "p99": 110 },
        { "time": "2026-02-13T13:00:00Z", "requests": 362000, "p50": 14, "p99": 115 }
      ]
    }
  }
}

curl "https://api.drd.io/v1/health/incidents?status=resolved&limit=5" \
  -H "Authorization: Bearer drd_ws_sk_live_Abc123..."
# Response
{
  "ok": true,
  "data": [
    {
      "id": "inc_01JM7XBN4RTYP",
      "title": "Elevated API Gateway Latency",
      "status": "resolved",
      "severity": "minor",
      "affectedComponents": ["api_gateway"],
      "detectedAt": "2026-02-10T14:22:00Z",
      "resolvedAt": "2026-02-10T14:45:00Z",
      "durationMinutes": 23,
      "timeline": [
        { "status": "detected", "at": "2026-02-10T14:22:00Z", "message": "p99 latency exceeded 500ms" },
        { "status": "investigating", "at": "2026-02-10T14:25:00Z", "message": "Team investigating" },
        { "status": "resolved", "at": "2026-02-10T14:45:00Z", "message": "Cache layer restored" }
      ]
    }
  ]
}

import { DRDClient } from "@drd-io/sdk";

const drd = new DRDClient({
  apiKey: process.env.DRD_API_KEY!,
  workspace: process.env.DRD_WORKSPACE!,
});

// Quick health check
const health = await drd.health.check();
console.log(`Status: ${health.status}`);
console.log(`Uptime (30d): ${health.uptime.last30d}%`);

// Check individual components
for (const [name, component] of Object.entries(health.components)) {
  console.log(`${name}: ${component.status} (${component.latencyMs}ms)`);
}

// Get performance metrics
const metrics = await drd.health.metrics({ window: "24h", resolution: "hourly" });
console.log(`Total requests (24h): ${metrics.requestsTotal}`);
console.log(`Error rate: ${(metrics.errorRate * 100).toFixed(2)}%`);
console.log(`p99 latency: ${metrics.p99LatencyMs}ms`);

// Subscribe to incident updates (webhook)
await drd.health.subscribe({
  events: ["incident.created", "incident.updated", "incident.resolved"],
  url: "https://your-app.com/webhooks/drd-health",
});
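
On the receiving side, a webhook handler usually needs to check the event type before acting on it. The sketch below shows one way to do that for the three event names used in the subscription. The payload shape (`IncidentEvent` with `incidentId` and `title`) and the helper names are assumptions for illustration; the actual webhook payload is not documented on this page.

```typescript
// Event names are taken from the subscribe() call above.
const INCIDENT_EVENTS = [
  "incident.created",
  "incident.updated",
  "incident.resolved",
] as const;
type IncidentEventType = (typeof INCIDENT_EVENTS)[number];

// Hypothetical payload shape; check the real webhook body before relying on it.
interface IncidentEvent {
  type: IncidentEventType;
  incidentId: string;
  title: string;
}

// Type guard: narrows an arbitrary string to a known incident event name.
function isIncidentEventType(value: string): value is IncidentEventType {
  return (INCIDENT_EVENTS as readonly string[]).indexOf(value) !== -1;
}

// Builds a short human-readable message for each event type.
function describeEvent(event: IncidentEvent): string {
  switch (event.type) {
    case "incident.created":
      return `New incident ${event.incidentId}: ${event.title}`;
    case "incident.updated":
      return `Update on ${event.incidentId}: ${event.title}`;
    case "incident.resolved":
      return `Resolved ${event.incidentId}: ${event.title}`;
  }
}

console.log(
  describeEvent({
    type: "incident.resolved",
    incidentId: "inc_01JM7XBN4RTYP",
    title: "Elevated API Gateway Latency",
  }),
);
```

Rejecting unknown event names up front keeps the handler from silently acting on event types it was never subscribed to.
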