Create monitoring rules, configure multi-channel alerts, and activate kill switches to maintain full oversight of your AI agents in production.
Rules define the conditions that trigger alerts or automated actions. Each rule consists of a condition, severity level, and one or more response actions.
Trigger when a metric exceeds or drops below a defined value.
Example: DRD Score drops below 50
Detect unusual patterns using baseline deviation analysis.
Example: API call volume 3x above normal
Match specific sequences of events or behaviors.
Example: Repeated failed auth attempts
Monitor adherence to regulatory or policy requirements.
Example: GDPR data retention violation
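The threshold type comes down to comparing one metric against a fixed value. The rule definition that follows uses a `threshold` condition with `metric`, `operator`, and `value` fields. The sketch below is a minimal local evaluator for that shape. Only `less_than` appears in the DRD example, so the other operator names and the `evaluateThreshold` helper are hypothetical and not part of the DRD SDK.

```typescript
// Hypothetical operator set: only "less_than" appears in the DRD example.
type Operator =
  | "less_than"
  | "less_than_or_equal"
  | "greater_than"
  | "greater_than_or_equal"
  | "equals";

interface ThresholdCondition {
  metric: string; // e.g. "drd_score"
  operator: Operator;
  value: number;
}

// Returns true when the observed metric value satisfies the condition,
// i.e. when the rule should fire.
function evaluateThreshold(condition: ThresholdCondition, observed: number): boolean {
  switch (condition.operator) {
    case "less_than":
      return observed < condition.value;
    case "less_than_or_equal":
      return observed <= condition.value;
    case "greater_than":
      return observed > condition.value;
    case "greater_than_or_equal":
      return observed >= condition.value;
    case "equals":
      return observed === condition.value;
    default: {
      // Compile-time exhaustiveness check if new operators are added.
      const unknown: never = condition.operator;
      throw new Error(`Unsupported operator: ${String(unknown)}`);
    }
  }
}

const lowScore: ThresholdCondition = { metric: "drd_score", operator: "less_than", value: 50 };
evaluateThreshold(lowScore, 42); // → true: the rule fires
evaluateThreshold(lowScore, 50); // → false: 50 is not strictly below 50
```

The strict comparison matters at the boundary: a score of exactly 50 does not trigger a `less_than` rule.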
```json
{
  "name": "Low Score Alert",
  "type": "threshold",
  "condition": {
    "metric": "drd_score",
    "operator": "less_than",
    "value": 50
  },
  "severity": "critical",
  "actions": [
    { "type": "alert", "channels": ["webhook", "slack"] },
    { "type": "kill_switch", "delay": 300 }
  ],
  "agentIds": ["agent_abc123", "agent_def456"],
  "enabled": true
}
```

Alerts are sent through one or more channels when a monitoring rule triggers. Configure severity-based routing to ensure the right people are notified.
| Channel | Description |
|---|---|
| Webhook | HTTP POST to your endpoint with a signed payload |
| Email | Notifications to team members or distribution lists |
| Slack | Direct messages or channel notifications |
| PagerDuty | Incident creation for critical alerts |
| Dashboard | In-app notifications with full context |
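Severity-based routing amounts to a lookup from severity to a list of channels, plus an optional escalation step. The sketch below is a local model of that idea, not the SDK's implementation. It assumes that escalation means "also notify this channel once the alert has gone unacknowledged for the given time", and that durations use a short `<number><s|m|h>` format; the `Route` type, `parseDuration`, and `channelsFor` are hypothetical helpers.

```typescript
type Severity = "critical" | "warning" | "info";
type Channel = "webhook" | "email" | "slack" | "pagerduty" | "dashboard";

interface Route {
  severity: Severity;
  channels: Channel[];
  escalation?: { after: string; to: Channel }; // assumed semantics: unacknowledged for `after`
}

// Parses short durations such as "30s", "5m", or "1h" into milliseconds.
function parseDuration(input: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(input);
  if (!match) throw new Error(`Unsupported duration: ${input}`);
  const unitMs = { s: 1_000, m: 60_000, h: 3_600_000 }[match[2] as "s" | "m" | "h"];
  return Number(match[1]) * unitMs;
}

// Channels to notify for an alert of this severity, adding the escalation
// channel once the alert has been unacknowledged long enough.
function channelsFor(routes: Route[], severity: Severity, unacknowledgedMs: number): Channel[] {
  const route = routes.find((r) => r.severity === severity);
  if (!route) return [];
  const channels = [...route.channels];
  if (route.escalation && unacknowledgedMs >= parseDuration(route.escalation.after)) {
    channels.push(route.escalation.to);
  }
  return channels;
}

const routes: Route[] = [
  { severity: "critical", channels: ["pagerduty", "slack", "webhook"], escalation: { after: "5m", to: "email" } },
  { severity: "warning", channels: ["slack", "dashboard"] },
  { severity: "info", channels: ["dashboard"] },
];

channelsFor(routes, "critical", 0);          // → ["pagerduty", "slack", "webhook"]
channelsFor(routes, "critical", 6 * 60_000); // → ["pagerduty", "slack", "webhook", "email"]
```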
```typescript
import { DRD } from '@drd/sdk';

const drd = new DRD({ apiKey: process.env.DRD_API_KEY });

// Configure alert routing
await drd.monitoring.configureAlerts({
  routes: [
    {
      severity: 'critical',
      channels: ['pagerduty', 'slack', 'webhook'],
      escalation: { after: '5m', to: 'email' },
    },
    {
      severity: 'warning',
      channels: ['slack', 'dashboard'],
    },
    {
      severity: 'info',
      channels: ['dashboard'],
    },
  ],
});
```

The kill switch immediately suspends an agent's ability to perform actions. It can be activated manually from the dashboard, via API, or automatically through monitoring rules.
Immediate Suspension
All pending actions are cancelled and new actions are blocked.
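On the agent side, honoring a kill switch means cancelling in-flight work and refusing new work until the suspension ends. The sketch below models that behavior locally, including the optional auto-expiry. The `KillSwitchGate` class and `ActionToken` type are hypothetical illustrations and are not part of `@drd/sdk`; time is passed in explicitly as epoch milliseconds to keep the logic testable.

```typescript
// Hypothetical cancellation token for one in-flight agent action.
interface ActionToken {
  cancelled: boolean;
}

// Hypothetical local gate; not part of @drd/sdk.
class KillSwitchGate {
  private suspendedUntil: number | null = null; // epoch ms; Infinity = until resumed
  private readonly pending = new Set<ActionToken>();

  // Suspends the agent and cancels every pending action.
  // Omit durationMs to stay suspended until resume() is called.
  activate(now: number, durationMs?: number): void {
    this.suspendedUntil = durationMs === undefined ? Infinity : now + durationMs;
    this.pending.forEach((token) => {
      token.cancelled = true;
    });
    this.pending.clear();
  }

  resume(): void {
    this.suspendedUntil = null;
  }

  isSuspended(now: number): boolean {
    if (this.suspendedUntil === null) return false;
    if (now >= this.suspendedUntil) {
      this.suspendedUntil = null; // duration elapsed: auto-reactivate
      return false;
    }
    return true;
  }

  // Starts a new action; throws while the kill switch is active.
  begin(now: number): ActionToken {
    if (this.isSuspended(now)) throw new Error("Agent suspended by kill switch");
    const token: ActionToken = { cancelled: false };
    this.pending.add(token);
    return token;
  }

  // Marks an action as completed so it is no longer tracked.
  finish(token: ActionToken): void {
    this.pending.delete(token);
  }
}

const gate = new KillSwitchGate();
const t0 = Date.parse("2026-02-12T10:30:00Z");
const inFlight = gate.begin(t0);
gate.activate(t0, 60 * 60 * 1000); // suspend for 1 hour
// inFlight.cancelled is now true, and gate.begin(t0 + 1_000) would throw.
gate.isSuspended(t0 + 60 * 60 * 1000); // → false: the hour has elapsed
```

Long-running actions would check `token.cancelled` between steps so that an activation takes effect without waiting for the action to finish.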
```typescript
// Activate kill switch immediately
await drd.monitoring.killSwitch('agent_abc123', {
  reason: 'Anomalous behavior detected',
  duration: '1h', // Auto-reactivate after 1 hour (optional)
  notifyOwner: true,
});

// Deactivate kill switch
await drd.monitoring.reactivate('agent_abc123', {
  reason: 'Investigation complete, behavior normal',
  approvedBy: 'operator_xyz',
});
```

```jsonc
// Request
{
  "action": "activate",
  "reason": "Policy violation threshold exceeded",
  "duration": 3600, // seconds
  "notifyOwner": true
}
```

```jsonc
// Response
{
  "agentId": "agent_abc123",
  "status": "suspended",
  "activatedAt": "2026-02-12T10:30:00Z",
  "expiresAt": "2026-02-12T11:30:00Z",
  "activatedBy": "rule:low_score_alert"
}
```

When an alert fires, DRD sends an HMAC-signed webhook to your configured endpoints.
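A receiver should verify the signature before trusting the payload. This section does not specify the signing scheme, so the sketch below assumes HMAC-SHA256 over the raw request body, hex-encoded, with a shared secret; the algorithm, encoding, and the header that carries the signature are assumptions to confirm against your webhook settings. It uses Node's built-in `crypto` module and a constant-time comparison.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: hex(HMAC-SHA256(secret, rawBody)).
function signPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Verifies a received signature against the raw (unparsed) request body.
function verifySignature(secret: string, rawBody: string, receivedHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "hex");
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws if the lengths differ, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Usage with a hypothetical secret and a trimmed-down payload:
const secret = "test-webhook-secret";
const rawBody = JSON.stringify({ event: "monitoring.alert.fired", data: { severity: "critical" } });
const signature = signPayload(secret, rawBody);
verifySignature(secret, rawBody, signature); // → true
verifySignature(secret, rawBody.replace("critical", "info"), signature); // → false
```

Verify against the exact bytes received, before any JSON parsing: re-serializing a parsed object can change whitespace or key order and break the match.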
```json
{
  "event": "monitoring.alert.fired",
  "timestamp": "2026-02-12T10:30:00Z",
  "data": {
    "ruleId": "rule_abc123",
    "ruleName": "Low Score Alert",
    "severity": "critical",
    "agentId": "agent_abc123",
    "condition": {
      "metric": "drd_score",
      "operator": "less_than",
      "threshold": 50,
      "actualValue": 42
    },
    "actionsExecuted": [
      { "type": "alert", "channels": ["webhook", "slack"] },
      { "type": "kill_switch", "scheduledIn": 300 }
    ]
  }
}
```

The monitoring dashboard provides a real-time view of all your agents, active rules, and alert history.
View all registered agents with real-time score updates, activity indicators, and kill switch status.
Create, edit, enable, and disable monitoring rules. View rule trigger history and adjust thresholds.
Browse past alerts with full context, including the triggering event, actions taken, and resolution status.
DRD exports traces, metrics, and logs using the OpenTelemetry Protocol (OTLP). Connect to any OTLP-compatible backend, such as Datadog, Grafana, Honeycomb, New Relic, or Jaeger.
```json
{
  "type": "opentelemetry",
  "config": {
    "endpoint": "https://otel-collector.acme.com:4317",
    "protocol": "grpc",
    "headers": {
      "x-api-key": "your-otel-api-key"
    },
    "exporters": {
      "traces": true,
      "metrics": true,
      "logs": true
    },
    "samplingRate": 0.1
  }
}
```

Every registered agent sends periodic heartbeats to DRD. Missed heartbeats trigger alerts and affect the agent's trust score. The heartbeat interval is configurable per agent.
```typescript
import { DRD } from '@drd/sdk';

const drd = new DRD({ apiKey: process.env.DRD_API_KEY });

// The SDK sends a heartbeat automatically every 60 seconds by default.
// You can also send heartbeats manually:
await drd.heartbeat({
  agentId: '01956abc-...',
  status: 'healthy',
  metadata: {
    cpuUsage: 0.42,
    memoryMb: 512,
    activeConnections: 23,
    lastActionAt: new Date().toISOString(),
  },
});
```

| Missed Heartbeats | Action | Trust Impact |
|---|---|---|
| 1 - 2 | Warning logged | -1 per miss |
| 3 - 5 | Alert sent to owner | -3 per miss |
| 6 - 10 | Agent flagged as unreliable | -5 per miss |
| 10+ | Agent auto-suspended | Score frozen at current |
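The table reads as a per-miss penalty whose size depends on how many heartbeats have been missed so far. The sketch below computes the action and the cumulative trust impact for a run of misses. It assumes the count is of consecutive misses, and because the table's last two bands overlap at 10, it treats the 10th miss as part of the 6 - 10 band and starts auto-suspension at the 11th. Both are interpretations, not documented behavior, and the function names are hypothetical.

```typescript
// Hypothetical model of the missed-heartbeat table; not part of @drd/sdk.
type HeartbeatAction =
  | "none"
  | "warning_logged"
  | "owner_alerted"
  | "flagged_unreliable"
  | "auto_suspended";

// Action for the current run of consecutive missed heartbeats.
function actionFor(misses: number): HeartbeatAction {
  if (misses <= 0) return "none";
  if (misses <= 2) return "warning_logged";
  if (misses <= 5) return "owner_alerted";
  if (misses <= 10) return "flagged_unreliable";
  return "auto_suspended"; // assumption: suspension starts at the 11th miss
}

// Trust penalty for the n-th consecutive miss (1-based).
function penaltyForMiss(n: number): number {
  if (n <= 2) return 1;
  if (n <= 5) return 3;
  if (n <= 10) return 5;
  return 0; // score frozen once the agent is suspended
}

// Total trust-score reduction after a run of consecutive misses.
function cumulativePenalty(misses: number): number {
  let total = 0;
  for (let n = 1; n <= misses; n++) total += penaltyForMiss(n);
  return total;
}

actionFor(4);          // → "owner_alerted"
cumulativePenalty(4);  // → 8  (1 + 1 + 3 + 3)
cumulativePenalty(12); // → 36 (2*1 + 3*3 + 5*5; misses 11 and 12 add nothing)
```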
DRD continuously monitors agent behavior patterns and flags anomalies. The anomaly detection engine uses statistical baselines and machine learning to identify unusual activity.
| Anomaly | Severity | Description |
|---|---|---|
| Velocity Anomaly | High | Agent makes 10x more API calls than its 7-day baseline. |
| Scope Escalation | Critical | Agent requests access to scopes it has never used before. |
| Geographic Anomaly | Medium | Agent connects from a new region not seen during its training period. |
| Temporal Anomaly | Low | Agent is active outside its normal operating hours. |
| Content Pattern Shift | Medium | Agent's content scanning targets deviate significantly from its historical pattern. |
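The velocity check compares current activity with a rolling baseline. The sketch below shows only the statistical part: it assumes the baseline is the mean daily API call count over the previous 7 days, compares today's count against it, and flags a ratio of 10x or more, matching the example in the table. The machine learning side of DRD's engine is not represented, and `checkVelocity` is a hypothetical helper.

```typescript
interface VelocityResult {
  baseline: number; // mean daily calls over the previous 7 days
  ratio: number;    // today's calls divided by the baseline
  anomalous: boolean;
}

// dailyCalls: API call counts for the previous 7 days (oldest first).
function checkVelocity(dailyCalls: number[], today: number, factor = 10): VelocityResult {
  if (dailyCalls.length !== 7) throw new Error("Expected exactly 7 days of history");
  const baseline = dailyCalls.reduce((sum, n) => sum + n, 0) / dailyCalls.length;
  // With a zero baseline, any activity counts as an unbounded increase.
  const ratio = baseline === 0 ? (today > 0 ? Infinity : 0) : today / baseline;
  return { baseline, ratio, anomalous: ratio >= factor };
}

const history = [1200, 1100, 1300, 1250, 1150, 1000, 1400]; // mean 1200
checkVelocity(history, 15_000); // → { baseline: 1200, ratio: 12.5, anomalous: true }
checkVelocity(history, 3_000);  // → { baseline: 1200, ratio: 2.5, anomalous: false }
```

A plain mean is easy to reason about but is pulled upward by a single spiky day; a median or a per-hour baseline would be reasonable alternatives under the same assumptions.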
The activity feed streams all platform events in real time using Server-Sent Events (SSE). Filter by event type, agent, or severity to focus on what matters.
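At the protocol level, an SSE stream is plain text: each event is a block of `field: value` lines terminated by a blank line. The sketch below is a minimal parser for that wire format, following the HTML Server-Sent Events rules for the `event`, `data`, and `id` fields (multiple `data` lines are joined with newlines, lines starting with `:` are comments, and events without an `event` field default to `message`). It can help when debugging the stream or when a client cannot use `EventSource`; the `parseSse` helper and the sample text are illustrative, not part of DRD.

```typescript
interface SseEvent {
  type: string;
  data: string;
  id?: string;
}

// Parses a text/event-stream body into events. An event is dispatched only
// after its terminating blank line; the `retry` field is ignored.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let type = "";
  let dataLines: string[] = [];
  let id: string | undefined; // last event ID persists across events

  const lines = text.split(/\r\n|\r|\n/);
  lines.pop(); // last element is an unterminated line (or "" if text ends with a newline)

  for (const line of lines) {
    if (line === "") {
      // Blank line: dispatch the buffered event if it carried any data.
      if (dataLines.length > 0) {
        events.push({ type: type || "message", data: dataLines.join("\n"), id });
      }
      type = "";
      dataLines = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment / keep-alive line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip a single leading space
    if (field === "event") type = value;
    else if (field === "data") dataLines.push(value);
    else if (field === "id") id = value;
  }
  return events;
}

const sample =
  "event: policy.violated\n" +
  'data: {"severity":"high"}\n\n' +
  ": keep-alive\n" +
  "event: enforcement.issued\n" +
  "data: {}\n\n";
const parsed = parseSse(sample);
// parsed[0].type === "policy.violated", parsed[1].type === "enforcement.issued"
```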
```typescript
// Connect to the real-time activity feed.
// Note: the browser's native EventSource cannot send custom headers such as
// Authorization; this form requires an EventSource implementation that accepts
// a `headers` option (for example, a Node.js polyfill).
const eventSource = new EventSource(
  'https://api.drd.io/api/v1/events/stream?types=policy.*,enforcement.*',
  {
    headers: { 'Authorization': 'Bearer drd_live_sk_...' },
  }
);

eventSource.addEventListener('policy.violated', (event) => {
  const data = JSON.parse(event.data);
  console.log('Policy violation:', data);
  // {
  //   id: "019event-...",
  //   type: "policy.violated",
  //   agentId: "01956abc-...",
  //   policyId: "019policy-...",
  //   severity: "high",
  //   timestamp: "2026-02-13T12:01:00Z"
  // }
});

eventSource.addEventListener('enforcement.issued', (event) => {
  const data = JSON.parse(event.data);
  console.log('Enforcement:', data);
});
```

Key platform metrics are exposed via the metrics API and exported to your OpenTelemetry backend. All metrics support time-range queries and aggregation.
| Metric | Type | Description |
|---|---|---|
| drd.api.latency | Histogram | API response latency by endpoint |
| drd.events.throughput | Counter | Events ingested per minute |
| drd.trust.score.distribution | Histogram | Trust score distribution across agents |
| drd.policy.evaluations | Counter | Policy evaluations per minute |
| drd.content.scans | Counter | Content scans processed |
| drd.enforcement.active | Gauge | Currently active enforcement actions |
| drd.agent.heartbeat.miss | Counter | Missed heartbeats per agent |
| drd.webhook.deliveries | Counter | Webhook delivery attempts and failures |
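Counters such as `drd.webhook.deliveries` are often exported as cumulative totals, in which case a per-minute figure comes from the difference between samples. The sketch below assumes cumulative values (check your exporter's temporality setting) and treats a drop in the value as a counter reset, a common convention for rate calculations. `perMinuteRate` and `CounterSample` are hypothetical names.

```typescript
interface CounterSample {
  timestampMs: number;
  value: number; // cumulative total since the counter started
}

// Average increase per minute across the samples.
// A drop in value is treated as a counter reset (e.g. an exporter restart).
function perMinuteRate(samples: CounterSample[]): number {
  if (samples.length < 2) throw new Error("Need at least two samples");
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].value - samples[i - 1].value;
    // After a reset the counter restarted from zero, so its new value is the increase.
    increase += delta >= 0 ? delta : samples[i].value;
  }
  const elapsedMinutes = (samples[samples.length - 1].timestampMs - samples[0].timestampMs) / 60_000;
  if (elapsedMinutes <= 0) throw new Error("Samples must span a positive time range");
  return increase / elapsedMinutes;
}

const deliveries: CounterSample[] = [
  { timestampMs: 0, value: 100 },
  { timestampMs: 60_000, value: 400 },
  { timestampMs: 120_000, value: 50 }, // counter reset between samples
];
perMinuteRate(deliveries); // → 175: (300 + 50) deliveries over 2 minutes
```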
DRD implements circuit breaker patterns for all downstream dependencies. When a service degrades, the circuit opens to prevent cascading failures.

| State | Behavior |
|---|---|
| Closed | All requests pass through normally; a failure counter tracks errors. |
| Open | Requests are rejected immediately; the service is considered unavailable. |
| Half-Open | A limited number of probe requests test service health; the circuit closes if they succeed. |
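The three states form a small state machine. The sketch below uses the same parameter names as the configuration (`failureThreshold`, `resetTimeoutMs`, `halfOpenRequests`) but illustrates the pattern rather than DRD's internal implementation. It assumes the failure counter counts consecutive failures, that all `halfOpenRequests` probes must succeed before the circuit closes, and that any probe failure reopens it. Time is passed in as epoch milliseconds.

```typescript
type BreakerState = "closed" | "open" | "half_open";

interface BreakerOptions {
  failureThreshold: number; // consecutive failures that open the circuit
  resetTimeoutMs: number;   // how long to stay open before probing
  halfOpenRequests: number; // probes allowed (and required to succeed) while half-open
}

class CircuitBreaker {
  private readonly opts: BreakerOptions;
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private probesStarted = 0;
  private probeSuccesses = 0;

  constructor(opts: BreakerOptions) {
    this.opts = opts;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Returns false when the request should be rejected immediately.
  allowRequest(now: number): boolean {
    if (this.state === "open") {
      if (now - this.openedAt < this.opts.resetTimeoutMs) return false;
      this.state = "half_open"; // timeout elapsed: begin probing
      this.probesStarted = 0;
      this.probeSuccesses = 0;
    }
    if (this.state === "half_open") {
      if (this.probesStarted >= this.opts.halfOpenRequests) return false;
      this.probesStarted++;
    }
    return true;
  }

  recordSuccess(): void {
    if (this.state === "half_open") {
      this.probeSuccesses++;
      if (this.probeSuccesses >= this.opts.halfOpenRequests) {
        this.state = "closed"; // all probes succeeded
        this.failures = 0;
      }
    } else {
      this.failures = 0; // a success resets the consecutive-failure count
    }
  }

  recordFailure(now: number): void {
    this.failures++;
    if (this.state === "half_open" || this.failures >= this.opts.failureThreshold) {
      this.state = "open"; // trip (or re-trip after a failed probe)
      this.openedAt = now;
      this.failures = 0;
    }
  }
}

// Parameters from the contentPipeline configuration.
const breaker = new CircuitBreaker({ failureThreshold: 5, resetTimeoutMs: 30_000, halfOpenRequests: 3 });
for (let i = 0; i < 5; i++) breaker.recordFailure(0);
breaker.getState();           // → "open"
breaker.allowRequest(1_000);  // → false: still inside the reset timeout
breaker.allowRequest(31_000); // → true: first half-open probe
```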
```json
{
  "services": {
    "contentPipeline": {
      "failureThreshold": 5,
      "resetTimeoutMs": 30000,
      "halfOpenRequests": 3,
      "monitoredExceptions": ["TimeoutError", "ServiceUnavailable"]
    },
    "trustNetwork": {
      "failureThreshold": 3,
      "resetTimeoutMs": 15000,
      "halfOpenRequests": 2
    }
  }
}
```

DRD embeds Runtime Application Self-Protection (RASP) into the platform. RASP monitors application behavior from inside the runtime and blocks attacks in real time, without relying on external firewalls.