Monitoring

Create monitoring rules, configure multi-channel alerts, and activate kill switches to maintain full oversight of your AI agents in production.

Monitoring Rules

Rules define the conditions that trigger alerts or automated actions. Each rule consists of a condition, severity level, and one or more response actions.

Threshold

Trigger when a metric exceeds or drops below a defined value.

Example: DRD Score drops below 50

Anomaly

Detect unusual patterns using baseline deviation analysis.

Example: API call volume 3x above normal

Pattern

Match specific sequences of events or behaviors.

Example: Repeated failed auth attempts

Compliance

Monitor adherence to regulatory or policy requirements.

Example: GDPR data retention violation

Creating a Rule via API

POST /api/monitoring/rules

{
  "name": "Low Score Alert",
  "type": "threshold",
  "condition": {
    "metric": "drd_score",
    "operator": "less_than",
    "value": 50
  },
  "severity": "critical",
  "actions": [
    { "type": "alert", "channels": ["webhook", "slack"] },
    { "type": "kill_switch", "delay": 300 }
  ],
  "agentIds": ["agent_abc123", "agent_def456"],
  "enabled": true
}
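
As a rough illustration of how a threshold condition like the one above might be evaluated, here is a minimal TypeScript sketch. The `Condition` type and `conditionMet` function are hypothetical names, and `greater_than` is an assumed operator; only `less_than` appears in the documented example. It is not DRD's implementation.

```typescript
// Hypothetical types mirroring the "condition" object in the rule above.
// Only "less_than" appears in the documented example; "greater_than" is assumed.
type Operator = 'less_than' | 'greater_than';

interface Condition {
  metric: string;
  operator: Operator;
  value: number;
}

// Returns true when the observed metric value satisfies the condition.
function conditionMet(condition: Condition, observed: number): boolean {
  return condition.operator === 'less_than'
    ? observed < condition.value
    : observed > condition.value;
}

const lowScore: Condition = { metric: 'drd_score', operator: 'less_than', value: 50 };

console.log(conditionMet(lowScore, 42)); // true: 42 is below the threshold
console.log(conditionMet(lowScore, 50)); // false: 50 is not strictly below 50
```

Whether the comparison is strict at the boundary is an assumption here; the example treats a score of exactly 50 as not triggering.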

Alert Configuration

Alerts are sent through one or more channels when a monitoring rule triggers. Configure severity-based routing to ensure the right people are notified.

Available Channels

Webhook

HTTP POST to your endpoint with signed payload

Email

Notifications to team members or distribution lists

Slack

Direct messages or channel notifications

PagerDuty

Incident creation for critical alerts

Dashboard

In-app notifications with full context

alert-config.ts

import { DRD } from '@drd/sdk';

const drd = new DRD({ apiKey: process.env.DRD_API_KEY });

// Configure alert routing
await drd.monitoring.configureAlerts({
  routes: [
    {
      severity: 'critical',
      channels: ['pagerduty', 'slack', 'webhook'],
      escalation: { after: '5m', to: 'email' },
    },
    {
      severity: 'warning',
      channels: ['slack', 'dashboard'],
    },
    {
      severity: 'info',
      channels: ['dashboard'],
    },
  ],
});
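
To make the routing table above concrete, the following sketch shows one way severity-based lookup could work. The `Route` type and `channelsFor` function are hypothetical and only mirror the shape passed to `configureAlerts`; how DRD resolves routes internally is not documented here.

```typescript
// Hypothetical shape of a route, mirroring the configureAlerts() call above.
type Severity = 'critical' | 'warning' | 'info';

interface Route {
  severity: Severity;
  channels: string[];
  escalation?: { after: string; to: string };
}

const routes: Route[] = [
  {
    severity: 'critical',
    channels: ['pagerduty', 'slack', 'webhook'],
    escalation: { after: '5m', to: 'email' },
  },
  { severity: 'warning', channels: ['slack', 'dashboard'] },
  { severity: 'info', channels: ['dashboard'] },
];

// Look up the channels for a severity; an unrouted severity yields no channels.
function channelsFor(severity: Severity, table: Route[]): string[] {
  const route = table.find((r) => r.severity === severity);
  return route ? route.channels : [];
}

console.log(channelsFor('warning', routes)); // [ 'slack', 'dashboard' ]
```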

Kill Switch

The kill switch immediately suspends an agent's ability to perform actions. It can be activated manually from the dashboard, via API, or automatically through monitoring rules.

Immediate Suspension

All pending actions are cancelled and new actions are blocked.

kill-switch.ts

import { DRD } from '@drd/sdk';

const drd = new DRD({ apiKey: process.env.DRD_API_KEY });

// Activate kill switch immediately
await drd.monitoring.killSwitch('agent_abc123', {
  reason: 'Anomalous behavior detected',
  duration: '1h',          // Auto-reactivate after 1 hour (optional)
  notifyOwner: true,
});

// Deactivate kill switch
await drd.monitoring.reactivate('agent_abc123', {
  reason: 'Investigation complete, behavior normal',
  approvedBy: 'operator_xyz',
});

Kill Switch API

POST /api/agents/{id}/kill-switch

// Request
{
  "action": "activate",
  "reason": "Policy violation threshold exceeded",
  "duration": 3600,
  "notifyOwner": true
}

// Response
{
  "agentId": "agent_abc123",
  "status": "suspended",
  "activatedAt": "2026-02-12T10:30:00Z",
  "expiresAt": "2026-02-12T11:30:00Z",
  "activatedBy": "rule:low_score_alert"
}
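
The SDK example expresses the duration as `'1h'` while the REST request uses `3600`, which matches the one-hour gap between `activatedAt` and `expiresAt` if the number is read as seconds. The sketch below shows how the two forms could be related. Both helper functions are hypothetical, and the accepted units (`s`, `m`, `h`) are an assumption.

```typescript
// Convert a duration string such as '1h' or '5m' to seconds.
// The accepted units (s, m, h) are an assumption for illustration.
function durationToSeconds(duration: string): number {
  const match = /^(\d+)([smh])$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  const unitSeconds: Record<string, number> = { s: 1, m: 60, h: 3600 };
  return Number(match[1]) * unitSeconds[match[2]];
}

// Compute the expiry timestamp for a suspension of the given length.
function expiresAt(activatedAt: string, durationSeconds: number): string {
  const expiry = new Date(Date.parse(activatedAt) + durationSeconds * 1000);
  // Drop the milliseconds to match the timestamp format in the response above.
  return expiry.toISOString().replace('.000Z', 'Z');
}

console.log(durationToSeconds('1h')); // 3600
console.log(expiresAt('2026-02-12T10:30:00Z', 3600)); // 2026-02-12T11:30:00Z
```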

Webhook Alert Payloads

When an alert fires, DRD sends an HMAC-signed webhook to your configured endpoints.

alert-webhook-payload.json

{
  "event": "monitoring.alert.fired",
  "timestamp": "2026-02-12T10:30:00Z",
  "data": {
    "ruleId": "rule_abc123",
    "ruleName": "Low Score Alert",
    "severity": "critical",
    "agentId": "agent_abc123",
    "condition": {
      "metric": "drd_score",
      "operator": "less_than",
      "threshold": 50,
      "actualValue": 42
    },
    "actionsExecuted": [
      { "type": "alert", "channels": ["webhook", "slack"] },
      { "type": "kill_switch", "scheduledIn": 300 }
    ]
  }
}
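
Because the payload is HMAC-signed, a receiver would normally verify the signature before trusting it. The sketch below uses Node's built-in `crypto` module. The algorithm (HMAC-SHA256), the hex encoding, and signing over the raw request body are all assumptions, as is the placeholder secret; this page does not specify the scheme or the header that carries the signature.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumptions: HMAC-SHA256 over the raw request body, hex-encoded signature.
// The actual algorithm, encoding, and header name are defined by DRD, not here.
function signPayload(rawBody: string, secret: string): string {
  return createHmac('sha256', secret).update(rawBody).digest('hex');
}

function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(rawBody, secret), 'hex');
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const webhookSecret = 'example-secret'; // placeholder, not a real secret format
const rawBody = JSON.stringify({ event: 'monitoring.alert.fired' });
const signature = signPayload(rawBody, webhookSecret);

console.log(verifySignature(rawBody, signature, webhookSecret));       // true
console.log(verifySignature(rawBody + ' ', signature, webhookSecret)); // false
```

Verifying against the raw body, rather than a re-serialized object, avoids mismatches caused by key ordering or whitespace.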

Dashboard Usage

The monitoring dashboard provides a real-time view of all your agents, active rules, and alert history.

Live Agent Status

View all registered agents with real-time score updates, activity indicators, and kill switch status.

Rule Management

Create, edit, enable, and disable monitoring rules. View rule trigger history and adjust thresholds.

Alert History

Browse past alerts with full context, including the triggering event, actions taken, and resolution status.

OpenTelemetry Integration

DRD exports traces, metrics, and logs using the OpenTelemetry Protocol (OTLP). Connect to any OTLP-compatible backend -- Datadog, Grafana, Honeycomb, New Relic, or Jaeger.

POST /api/integrations

{
  "type": "opentelemetry",
  "config": {
    "endpoint": "https://otel-collector.acme.com:4317",
    "protocol": "grpc",
    "headers": {
      "x-api-key": "your-otel-api-key"
    },
    "exporters": {
      "traces": true,
      "metrics": true,
      "logs": true
    },
    "samplingRate": 0.1
  }
}
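
A `samplingRate` of 0.1 suggests that roughly one trace in ten is exported. As an illustration of what ratio-based sampling can look like, the sketch below makes a deterministic keep-or-drop decision from the trace ID. The `shouldSample` function is hypothetical, and nothing here describes how DRD actually applies the setting.

```typescript
// Decide whether to keep a trace, deterministically from its trace ID.
// Assumption: trace IDs are 32-character hex strings, as in W3C Trace Context.
function shouldSample(traceId: string, samplingRate: number): boolean {
  // Interpret the last 8 hex digits as a number in [0, 2^32).
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < samplingRate * 0x100000000;
}

const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';
console.log(shouldSample(traceId, 1.0)); // true: every trace is kept
console.log(shouldSample(traceId, 0.0)); // false: no trace is kept
```

Deriving the decision from the trace ID means every span of a given trace gets the same answer, so traces are exported whole or not at all.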

Exported Trace Spans

drd.guard.evaluate

drd.policy.match

drd.content.scan

drd.content.fingerprint

drd.trust.score.calculate

drd.event.ingest

drd.webhook.deliver

drd.enforcement.issue

Agent Heartbeat Monitoring

Every registered agent sends periodic heartbeats to DRD. Missed heartbeats trigger alerts and affect the agent's trust score. The heartbeat interval is configurable per agent.

heartbeat.ts

import { DRD } from '@drd/sdk';

const drd = new DRD({ apiKey: 'drd_live_sk_...' });

// Heartbeat is sent automatically every 60 seconds
// You can also send manual heartbeats:
await drd.heartbeat({
  agentId: '01956abc-...',
  status: 'healthy',
  metadata: {
    cpuUsage: 0.42,
    memoryMb: 512,
    activeConnections: 23,
    lastActionAt: new Date().toISOString(),
  },
});

Missed Heartbeats	Action	Trust Impact
1 - 2	Warning logged	-1 per miss
3 - 5	Alert sent to owner	-3 per miss
6 - 10	Agent flagged as unreliable	-5 per miss
More than 10	Agent auto-suspended	Score frozen at current value
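
The escalation table can be read as a simple lookup from the number of missed heartbeats to an action and a trust penalty. The sketch below encodes it that way. `HeartbeatPolicy` and `policyFor` are hypothetical names, and the sketch reads the last row as applying to more than 10 misses.

```typescript
interface HeartbeatPolicy {
  action: string;
  trustDeltaPerMiss: number | null; // null: score is frozen, not decremented
}

// Map a count of missed heartbeats to the matching row of the table above.
// Assumption: the final row applies to more than 10 misses.
function policyFor(missed: number): HeartbeatPolicy | null {
  if (missed <= 0) return null; // nothing missed, nothing to do
  if (missed <= 2) return { action: 'Warning logged', trustDeltaPerMiss: -1 };
  if (missed <= 5) return { action: 'Alert sent to owner', trustDeltaPerMiss: -3 };
  if (missed <= 10) return { action: 'Agent flagged as unreliable', trustDeltaPerMiss: -5 };
  return { action: 'Agent auto-suspended', trustDeltaPerMiss: null };
}

console.log(policyFor(4));  // { action: 'Alert sent to owner', trustDeltaPerMiss: -3 }
console.log(policyFor(11)); // { action: 'Agent auto-suspended', trustDeltaPerMiss: null }
```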

Anomaly Detection

DRD continuously monitors agent behavior patterns and flags anomalies. The anomaly detection engine uses statistical baselines and machine learning to identify unusual activity.

Velocity Anomaly (severity: high)

Agent makes 10x more API calls than its 7-day baseline.

Scope Escalation (severity: critical)

Agent requests access to scopes it has never used before.

Geographic Anomaly (severity: medium)

Agent connects from a new region not seen during the training period.

Temporal Anomaly (severity: low)

Agent is active outside its normal operating hours.

Content Pattern Shift (severity: medium)

Agent's content scanning targets deviate significantly from their historical pattern.
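
To show what a baseline comparison such as the velocity check might look like, here is a minimal sketch. It assumes the baseline is the mean of the last seven daily call counts and that reaching 10 times that mean counts as anomalous; the page does not say how DRD computes the baseline, and `isVelocityAnomaly` is a hypothetical name.

```typescript
// Assumption: the baseline is the mean of the last seven daily call counts.
function isVelocityAnomaly(dailyCounts: number[], today: number, factor = 10): boolean {
  if (dailyCounts.length === 0) return false; // no baseline yet
  const baseline = dailyCounts.reduce((sum, n) => sum + n, 0) / dailyCounts.length;
  return today >= baseline * factor;
}

const lastWeek = [100, 120, 90, 110, 100, 95, 85]; // mean = 100
console.log(isVelocityAnomaly(lastWeek, 1200)); // true: 12x the baseline
console.log(isVelocityAnomaly(lastWeek, 300));  // false: 3x the baseline
```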

Real-Time Activity Feed

The activity feed streams all platform events in real time using Server-Sent Events (SSE). Filter by event type, agent, or severity to focus on what matters.

activity-feed.ts

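// Connect to the real-time activity feed.
// Note: the browser-native EventSource does not accept custom headers. This
// example assumes a Node.js client such as the `eventsource` package (v2),
// whose constructor accepts a `headers` option.
import EventSource from 'eventsource';

const eventSource = new EventSource(
  'https://api.drd.io/api/v1/events/stream?types=policy.*,enforcement.*',
  {
    headers: { 'Authorization': 'Bearer drd_live_sk_...' }
  }
);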

eventSource.addEventListener('policy.violated', (event) => {
  const data = JSON.parse(event.data);
  console.log('Policy violation:', data);
  // {
  //   id: "019event-...",
  //   type: "policy.violated",
  //   agentId: "01956abc-...",
  //   policyId: "019policy-...",
  //   severity: "high",
  //   timestamp: "2026-02-13T12:01:00Z"
  // }
});

eventSource.addEventListener('enforcement.issued', (event) => {
  const data = JSON.parse(event.data);
  console.log('Enforcement:', data);
});
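
The `types` query parameter in the example uses wildcard filters such as `policy.*`. The sketch below shows one plausible reading of that syntax, useful if you want to apply the same filter client-side. The matching rules and the function names are assumptions, not documented DRD behavior.

```typescript
// Assumption: a trailing ".*" matches any event type under that prefix,
// and a filter without a wildcard must match exactly.
function matchesFilter(eventType: string, filter: string): boolean {
  if (filter.endsWith('.*')) {
    return eventType.startsWith(filter.slice(0, -1)); // keep the trailing dot
  }
  return eventType === filter;
}

// Check an event type against a comma-separated filter list.
function matchesAny(eventType: string, typesParam: string): boolean {
  return typesParam.split(',').some((f) => matchesFilter(eventType, f.trim()));
}

const typesParam = 'policy.*,enforcement.*';
console.log(matchesAny('policy.violated', typesParam));        // true
console.log(matchesAny('monitoring.alert.fired', typesParam)); // false
```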

System Metrics

Key platform metrics are exposed via the metrics API and exported to your OpenTelemetry backend. All metrics support time-range queries and aggregation.

Metric	Type	Description
drd.api.latency	Histogram	API response latency by endpoint
drd.events.throughput	Counter	Events ingested per minute
drd.trust.score.distribution	Histogram	Trust score distribution across agents
drd.policy.evaluations	Counter	Policy evaluations per minute
drd.content.scans	Counter	Content scans processed
drd.enforcement.active	Gauge	Currently active enforcement actions
drd.agent.heartbeat.miss	Counter	Missed heartbeats per agent
drd.webhook.deliveries	Counter	Webhook delivery attempts and failures

Circuit Breaker Patterns

DRD implements circuit breaker patterns for all downstream dependencies. When a service degrades, the circuit opens to prevent cascade failures.

Closed

All requests pass through normally. A failure counter tracks errors.

Open

Requests are rejected immediately. The service is considered unavailable.

Half-Open

A limited number of probe requests test service health before the circuit closes.

circuit-breaker-config.json

{
  "services": {
    "contentPipeline": {
      "failureThreshold": 5,
      "resetTimeoutMs": 30000,
      "halfOpenRequests": 3,
      "monitoredExceptions": ["TimeoutError", "ServiceUnavailable"]
    },
    "trustNetwork": {
      "failureThreshold": 3,
      "resetTimeoutMs": 15000,
      "halfOpenRequests": 2
    }
  }
}
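
The three states and the configuration fields above map onto a small state machine. The following is a minimal sketch of the general circuit breaker pattern using the same field names. The `CircuitBreaker` class is hypothetical, and details such as reopening on any half-open failure are assumptions rather than documented DRD behavior.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

interface BreakerConfig {
  failureThreshold: number;
  resetTimeoutMs: number;
  halfOpenRequests: number;
}

class CircuitBreaker {
  private config: BreakerConfig;
  private now: () => number;
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private probesLeft = 0;
  private probeSuccesses = 0;

  // The clock is injectable so the timeout can be exercised without waiting.
  constructor(config: BreakerConfig, now: () => number = Date.now) {
    this.config = config;
    this.now = now;
  }

  // Decide whether a request may be sent right now.
  allowRequest(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.config.resetTimeoutMs) {
      // Reset timeout elapsed: allow a limited number of probe requests.
      this.state = 'half-open';
      this.probesLeft = this.config.halfOpenRequests;
      this.probeSuccesses = 0;
    }
    if (this.state === 'closed') return true;
    if (this.state === 'half-open' && this.probesLeft > 0) {
      this.probesLeft -= 1;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    if (this.state === 'half-open') {
      this.probeSuccesses += 1;
      if (this.probeSuccesses >= this.config.halfOpenRequests) {
        this.state = 'closed';
        this.failures = 0;
      }
    } else {
      this.failures = 0;
    }
  }

  recordFailure(): void {
    this.failures += 1;
    // Assumption: any failure while half-open reopens the circuit.
    if (this.state === 'half-open' || this.failures >= this.config.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

// Usage with the "trustNetwork" values from the config above.
let clock = 0;
const breaker = new CircuitBreaker(
  { failureThreshold: 3, resetTimeoutMs: 15000, halfOpenRequests: 2 },
  () => clock,
);
for (let i = 0; i < 3; i++) breaker.recordFailure();
console.log(breaker.getState()); // open
```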

Runtime Application Self-Protection (RASP)

DRD embeds Runtime Application Self-Protection into the platform. RASP monitors application behavior from inside the runtime and blocks attacks in real time, without relying on external firewalls.

SQL injection detection and blocking at the ORM layer
Request deserialization protection (prototype pollution, mass assignment)
Path traversal prevention with canonicalization
SSRF protection -- deny-list for internal network ranges
Rate limiting anomaly detection at the application layer
Cryptographic operation monitoring (key usage, entropy checks)
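
As an example of one item on this list, the sketch below shows the general idea behind path traversal prevention with canonicalization, using Node's built-in `path` module. `resolveInside` is a hypothetical helper written for illustration. It does not resolve symbolic links, and it is not DRD's RASP implementation.

```typescript
import * as path from 'node:path';

// Resolve a user-supplied path against a base directory and reject anything
// that escapes the base once "." and ".." segments have been normalized.
// Note: path.resolve does not follow symlinks; a stricter check would also
// compare real paths on disk.
function resolveInside(baseDir: string, userPath: string): string | null {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, userPath);
  const relative = path.relative(base, target);
  const escapes =
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);
  return escapes ? null : target;
}

console.log(resolveInside('/srv/uploads', 'a/../b.txt') !== null); // true: stays inside
console.log(resolveInside('/srv/uploads', '../../etc/passwd'));    // null: escapes the base
```

Comparing the normalized target against the base, rather than scanning the input for `..`, also handles absolute paths in the input.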

Next Steps

DRD Score

Understand score-based monitoring

Webhooks

Configure alert delivery endpoints

AI Oversight

Automated AI-powered monitoring

Trust Alerts

Alert routing and escalation policies
