
Diagnostics & Metrics

The Relay provides a built-in HTTP server (configured via SERVER_PORT) for infrastructure orchestration and monitoring.


Purpose: Confirms the Relay process is running.

Returns 200 OK if the Relay process is running and the internal engine loops are active. This is the primary endpoint for orchestrator liveness probes.


Purpose: Confirms the Relay is ready to process events.

Returns 200 OK if the Relay can successfully communicate with both the Storage and the Publisher. If either dependency is unreachable, it returns 503 Service Unavailable.

Because a failed readiness check does not cause a restart, the Relay can stay alive and keep retrying its connections during an outage. This is the primary endpoint for orchestrator readiness probes.
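
The readiness rule above can be sketched as a small function. This is a minimal illustration of the described behavior, not the Relay's actual implementation; `readiness_status` and its two boolean inputs are hypothetical names.

```python
from http import HTTPStatus


def readiness_status(storage_ok: bool, publisher_ok: bool) -> HTTPStatus:
    """Return the status a readiness probe would receive.

    Hypothetical helper: the Relay counts as ready only when both the
    Storage and the Publisher are reachable.
    """
    if storage_ok and publisher_ok:
        return HTTPStatus.OK  # 200
    # Either dependency down: not ready, but the process keeps running.
    return HTTPStatus.SERVICE_UNAVAILABLE  # 503
```

An orchestrator that receives 503 here would typically stop routing work to the instance while leaving the process running, which matches the retry behavior described above.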


Purpose: Manual inspection and JSON-based health monitoring.

Returns a JSON snapshot of the current outbox state, including:

  • Lag: The age of the oldest pending event.
  • Volume: The total count of pending events.
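
A monitoring script could evaluate such a snapshot against thresholds, roughly as sketched below. The JSON field names `lag_seconds` and `pending_count` are assumptions made for illustration; check the actual response body for the real keys.

```python
import json


def is_outbox_healthy(snapshot_json: str, max_lag_seconds: float, max_pending: int) -> bool:
    """Check a JSON snapshot against lag and volume thresholds.

    The keys "lag_seconds" and "pending_count" are hypothetical; they stand
    in for the Lag and Volume values described above.
    """
    snapshot = json.loads(snapshot_json)
    lag_ok = snapshot["lag_seconds"] <= max_lag_seconds
    volume_ok = snapshot["pending_count"] <= max_pending
    return lag_ok and volume_ok
```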

The Relay is natively instrumented with OpenTelemetry (OTel) for vendor-neutral metrics and tracing.

To maintain a zero-dependency footprint, the Relay does not serve a local metrics endpoint. Instead, it pushes telemetry using OTLP (the OpenTelemetry Protocol):

  • Configuration: Set the environment variable OTEL_METRICS_EXPORTER=otlp.
  • Connectivity: Configure OTEL_EXPORTER_OTLP_ENDPOINT to point to your observability backend or an OpenTelemetry Collector.
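
As a pre-flight check, a deployment script could verify that both variables are present before starting the Relay. The sketch below is an assumption-laden illustration: `missing_otlp_settings` is a hypothetical helper, and `http://localhost:4317` is only an example value (the conventional OTLP/gRPC port of a local Collector).

```python
import os
from typing import List, Mapping


def missing_otlp_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with the OTLP-related environment variables."""
    problems = []
    if env.get("OTEL_METRICS_EXPORTER") != "otlp":
        problems.append("OTEL_METRICS_EXPORTER must be set to 'otlp'")
    if not env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is not set")
    return problems


# Example environment (values are illustrative, not defaults of the Relay).
example_env = {
    "OTEL_METRICS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
}
```

Calling `missing_otlp_settings(example_env)` returns an empty list, while an empty environment reports both variables.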

The Relay emits the following OpenTelemetry metrics so you can monitor the health, throughput, and lag of your outbox pipeline.

Histogram {event}

Tracks the number of events fetched from the database in a single claim operation. This helps show whether the Relay is keeping up with the ingestion rate.


Counter {event}

Tracks the total number of events processed.

  • Attributes: status (success/failed), type (event type).

Histogram seconds

Measures the total time elapsed from the moment an event was created in the database until it was successfully acknowledged by the message broker.

  • Attributes: type (event type).

Histogram seconds

Measures the duration of individual database operations.

  • Attributes: op (e.g., claim, mark_delivered, mark_failed).

Histogram seconds

Measures how long it takes to publish a message to the broker.

  • Attributes: status (success/failed), type (event type).

Gauge {event}

Represents the current number of events sitting in the outbox table with a PENDING status.


Gauge seconds

Tracks the age of the oldest pending event in the queue, providing a direct measurement of “lag.”
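
To make the two gauges concrete, the sketch below derives both values from the creation timestamps of pending events. It illustrates the definitions only and is not how the Relay computes them internally; `outbox_gauges` is a hypothetical name, and reporting a lag of 0.0 for an empty outbox is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple


def outbox_gauges(pending_created_at: Iterable[datetime], now: datetime) -> Tuple[int, float]:
    """Return (pending count, age of the oldest pending event in seconds)."""
    created = list(pending_created_at)
    if not created:
        # Assumption: an empty outbox reports zero lag.
        return 0, 0.0
    oldest = min(created)
    return len(created), (now - oldest).total_seconds()


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
pending = [now - timedelta(seconds=30), now - timedelta(seconds=5)]
count, lag = outbox_gauges(pending, now)  # count == 2, lag == 30.0
```

A steadily growing lag value with a stable pending count usually points at a single stuck event, whereas both values growing together suggests the pipeline is falling behind overall.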