Configuration Reference
All configuration is handled via environment variables. You can set these in your docker-compose.yml, Kubernetes Deployment, or local shell.
Core Infrastructure Configuration
Basic connectivity and engine settings required to run the Relay.
STORAGE_TYPE
Default: postgres
The database engine type used for the outbox table. Currently, PostgreSQL is the primary supported engine.
PUBLISHER_TYPE
Default: stdout
The target message broker where events will be delivered. Supported options: nats, kafka, redis, stdout, null.
STORAGE_URL
Default: —
The full connection string for your source database.
Example: postgres://user:pass@host:5432/db
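As a quick sanity check before deploying, a connection string of this shape can be parsed with Go's standard net/url package. This is only a sketch: the helper name splitStorageURL is hypothetical, and the Relay's own validation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// splitStorageURL is a hypothetical helper (not part of the Relay) that breaks
// a STORAGE_URL value into its scheme, host:port, and database name.
func splitStorageURL(raw string) (scheme, host, db string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	if u.Scheme == "" || u.Host == "" {
		// The raw value is not echoed back because it may contain a password.
		return "", "", "", errors.New("STORAGE_URL is missing a scheme or host")
	}
	return u.Scheme, u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	scheme, host, db, err := splitStorageURL("postgres://user:pass@host:5432/db")
	if err != nil {
		fmt.Println("invalid STORAGE_URL:", err)
		return
	}
	fmt.Println(scheme, host, db)
}
```

A check like this catches a missing scheme or host early, before the Relay attempts its first database connection.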
PUBLISHER_URL
Default: —
The connection address for your message broker.
Example: localhost:9092 (Kafka) or nats://localhost:4222 (NATS).
STORAGE_TABLE_NAME
Default: openoutbox_events
The name of the database table where outbox events are stored. This allows the Relay to operate within shared databases while avoiding naming conflicts.
Pattern: ^[a-z0-9_]+$ (Lowercase letters, numbers, and underscores only).
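The pattern can be checked ahead of time with Go's regexp package. The sketch below assumes nothing beyond the documented pattern; the function name validTableName and the sample names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// tableNamePattern mirrors the documented pattern for STORAGE_TABLE_NAME.
var tableNamePattern = regexp.MustCompile(`^[a-z0-9_]+$`)

// validTableName reports whether name satisfies the documented pattern.
func validTableName(name string) bool {
	return tableNamePattern.MatchString(name)
}

func main() {
	for _, name := range []string{"openoutbox_events", "billing_outbox_v2", "Outbox-Events", ""} {
		fmt.Printf("%q valid=%v\n", name, validTableName(name))
	}
}
```

Uppercase letters, hyphens, and the empty string are all rejected by this pattern.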
RELAY_ID
Default: os.Hostname() + "-" + uuid[:4]
A unique identifier for this specific Relay instance. This ID is used for database lease management, metric partitioning, and distributed locking.
Behavior: If left empty, the engine automatically generates an ID using the system hostname and a short unique suffix to prevent collisions in high-density environments.
Orchestration: When running in Kubernetes or Docker Compose, leave this empty so that the engine generates a unique identity per container. Alternatively, map it to the Pod or Service name, for example via the Kubernetes Downward API.
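The default described above can be approximated as follows. This is a sketch, not the Relay's actual code: the function name resolveRelayID is hypothetical, and four random hex characters stand in for the uuid[:4] suffix because the Go standard library has no UUID type.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
)

// resolveRelayID returns configured unchanged when it is non-empty. Otherwise
// it builds "<hostname>-<4 hex chars>". The random hex suffix is an assumed
// stand-in for the documented uuid[:4] suffix.
func resolveRelayID(configured string) (string, error) {
	if configured != "" {
		return configured, nil
	}
	host, err := os.Hostname()
	if err != nil {
		return "", err
	}
	suffix := make([]byte, 2) // 2 random bytes encode to 4 hex characters
	if _, err := rand.Read(suffix); err != nil {
		return "", err
	}
	return host + "-" + hex.EncodeToString(suffix), nil
}

func main() {
	id, err := resolveRelayID(os.Getenv("RELAY_ID"))
	if err != nil {
		fmt.Println("could not derive RELAY_ID:", err)
		return
	}
	fmt.Println("relay id:", id)
}
```

Because the suffix is random, two containers that share a hostname are still unlikely to end up with the same identifier.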
ENVIRONMENT
Default: production
The execution mode for the Relay. Use development for more verbose local logging and debugging.
Engine Tuning
Controls how the Relay polls the database and manages event ownership.
POLL_INTERVAL
Default: 500ms
The idle-state delay. The Relay operates in Draining Mode (processing batches back-to-back without sleeping) until one of the following occurs:
- Empty Batch: No more PENDING events are found in the database.
- Process Error: An error occurs during the polling or delivery cycle.
In these cases, the Relay sleeps for the POLL_INTERVAL duration before retrying.
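The draining behavior can be pictured with a small loop. The sketch below is a simplified model under stated assumptions: processBatch is a hypothetical stand-in for one claim-and-deliver cycle, and the backlog is simulated in memory rather than read from a database.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBrokerDown is a simulated failure used only in this example.
var errBrokerDown = errors.New("simulated broker outage")

// drain calls processBatch back-to-back until it reports an empty batch or an
// error, then returns how many non-empty batches were handled.
func drain(processBatch func() (int, error)) (int, error) {
	batches := 0
	for {
		n, err := processBatch()
		if err != nil {
			return batches, err // process error: the caller sleeps before retrying
		}
		if n == 0 {
			return batches, nil // empty batch: the caller sleeps before polling again
		}
		batches++
	}
}

func main() {
	pollInterval := 500 * time.Millisecond // documented default

	backlog := []int{100, 100, 37} // simulated batch sizes waiting in the outbox
	fromBacklog := func() (int, error) {
		if len(backlog) == 0 {
			return 0, nil
		}
		n := backlog[0]
		backlog = backlog[1:]
		return n, nil
	}
	failing := func() (int, error) { return 0, errBrokerDown }

	for _, step := range []func() (int, error){fromBacklog, failing} {
		batches, err := drain(step)
		fmt.Printf("drained %d batches (err=%v); sleeping %s\n", batches, err, pollInterval)
		time.Sleep(pollInterval)
	}
}
```

The sleep happens only between drain runs, so a large backlog is worked off without any idle time between batches.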
BATCH_SIZE
Default: 100
The maximum number of events the Relay will claim and attempt to process in a single polling iteration.
LEASE_TIMEOUT
Default: 3m
The time allowed for an event to stay in the DELIVERING state before it is considered “stuck” (due to a worker crash) and becomes eligible for a retry.
REAP_BATCH_SIZE
Default: 100
The number of expired leases the cleanup engine will reset back to PENDING in a single maintenance cycle.
HEALTH_CHECK_INTERVAL
Default: 5s
The frequency at which the engine performs background health probes on its dependencies (Storage and Publisher). This interval determines the “heartbeat” of the relay and controls how quickly the instance detects an outage and updates its operational state.
PUBLISHER_CONNECT_RETRY_INTERVAL
Default: 5s
The amount of time the engine waits between failed attempts to establish the initial connection to the message broker (NATS, Kafka, or Redis) during the startup phase.
ENABLE_STATS
Default: true
Determines whether the engine performs background database scans to calculate backlog metrics, such as the count of pending events and the age of the oldest record.
SERVER_PORT
Default: :8080
The address and port where the Relay’s HTTP server listens. Currently, this server only exposes statistics at /stats.
Supports standard Go address formats like :8080 or 127.0.0.1:8080.
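To see how these address forms are interpreted, the standard net.SplitHostPort function can be used. The helper name splitListenAddr is illustrative, and the third sample value is included only to show a form that Go rejects.

```go
package main

import (
	"fmt"
	"net"
)

// splitListenAddr splits a SERVER_PORT value into host and port using
// net.SplitHostPort. An empty host means "listen on all interfaces".
func splitListenAddr(addr string) (host, port string, err error) {
	return net.SplitHostPort(addr)
}

func main() {
	for _, addr := range []string{":8080", "127.0.0.1:8080", "8080"} {
		host, port, err := splitListenAddr(addr)
		if err != nil {
			fmt.Printf("%q rejected: %v\n", addr, err)
			continue
		}
		fmt.Printf("%q host=%q port=%q\n", addr, host, port)
	}
}
```

Note that a bare port number without the leading colon is not a valid Go listen address.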
Reliability & Backoff
Settings for handling failures and retries.
RETRY_MAX_ATTEMPTS
Default: 25
The maximum number of delivery attempts before an event is considered permanently failed. Once this limit is reached, the event's status is set to DEAD, and it is not retried again without manual intervention, such as a status reset.
RETRY_BASE_DELAY
Default: 1s
The starting wait time for the exponential backoff algorithm. This is the duration the Relay waits after the very first failure.
RETRY_MAX_DELAY
Default: 24h
The ceiling for exponential backoff. If the calculated delay for the next retry exceeds this value, the Relay caps the wait time at this limit. With the default of 24h, a failing event is therefore retried at least once per day.
RETRY_JITTER
Default: 0.15
A randomness factor (0.15 = 15%) applied to each backoff calculation. This prevents “thundering herd” issues by ensuring that multiple failed events don’t all retry at the exact same millisecond.
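The interaction of RETRY_BASE_DELAY, RETRY_MAX_DELAY, and RETRY_JITTER can be sketched as below. Two points are assumptions rather than documented behavior: that the delay doubles on each attempt, and that jitter is applied symmetrically around the capped delay. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// cappedDelay returns base * 2^(attempt-1), capped at ceiling. The doubling
// factor is an assumption; the text only says the backoff is exponential.
func cappedDelay(attempt int, base, ceiling time.Duration) time.Duration {
	d := base
	for i := 1; i < attempt; i++ {
		if d > ceiling/2 {
			return ceiling // doubling again would exceed the ceiling
		}
		d *= 2
	}
	if d > ceiling {
		return ceiling
	}
	return d
}

// withJitter spreads d over [d*(1-jitter), d*(1+jitter)] using u, a random
// value in [0, 1). Symmetric jitter is an assumption.
func withJitter(d time.Duration, jitter, u float64) time.Duration {
	offset := (u*2 - 1) * jitter // in [-jitter, +jitter)
	return time.Duration(float64(d) * (1 + offset))
}

func main() {
	base, ceiling, jitter := time.Second, 24*time.Hour, 0.15 // documented defaults
	for _, attempt := range []int{1, 2, 5, 10, 18, 25} {
		d := cappedDelay(attempt, base, ceiling)
		fmt.Printf("attempt %2d: nominal %v, jittered %v\n",
			attempt, d, withJitter(d, jitter, rand.Float64()))
	}
}
```

Under the doubling assumption and the default settings, the nominal delay reaches the 24h ceiling at the 18th attempt, so the remaining attempts up to RETRY_MAX_ATTEMPTS are spaced roughly a day apart.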
NATS Specific Configurations
These settings only apply when PUBLISHER_TYPE="nats".
NATS_PUBLISH_TIMEOUT
Default: 5s
The maximum amount of time the Relay will wait for an acknowledgment from the NATS server (or JetStream) before timing out the request.
NATS_CONNECTION_TIMEOUT
Default: 10s
The deadline for the initial TCP handshake and protocol negotiation with the NATS server.
Redis Specific Configurations
These settings only apply when PUBLISHER_TYPE="redis".
REDIS_CONNECTION_TIMEOUT
Default: 5s
The maximum amount of time allowed for the initial connectivity check (Ping) when the Relay service starts. If the Relay cannot reach Redis within this window, the service will fail to start.
REDIS_WRITE_TIMEOUT
Default: 1s
The maximum time allowed for each write operation (XADD) to the Redis stream. This prevents a slow or lagging Redis instance from hanging the entire engine processing loop.
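A per-write deadline of this kind is commonly expressed in Go with context.WithTimeout. The sketch below uses only the standard library: the write callback is a hypothetical stand-in for a single stream append, and slowWrite simulates a lagging server rather than talking to Redis.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// writeWithTimeout runs write under a deadline of timeout. write is a
// hypothetical stand-in for one stream append and must honor ctx.
func writeWithTimeout(timeout time.Duration, write func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return write(ctx)
}

// slowWrite simulates a server that needs latency to respond.
func slowWrite(latency time.Duration) func(ctx context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(latency):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// timedOut reports whether err was caused by the deadline expiring.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	err := writeWithTimeout(50*time.Millisecond, slowWrite(200*time.Millisecond))
	fmt.Println("slow write timed out:", timedOut(err))

	err = writeWithTimeout(50*time.Millisecond, slowWrite(time.Millisecond))
	fmt.Println("fast write error:", err)
}
```

With a deadline in place, a slow write fails fast and the processing loop can move on instead of blocking indefinitely.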
Kafka Specific Configurations
These settings only apply when PUBLISHER_TYPE="kafka".
KAFKA_BATCH_SIZE
Default: 1
The maximum number of messages in a batch. With the default of 1, each message is sent as its own batch.
KAFKA_BATCH_BYTES
Default: 10485760 (10 MiB)
The maximum total size of a batch in bytes. The producer sends a request as soon as either KAFKA_BATCH_SIZE or KAFKA_BATCH_BYTES is reached.
KAFKA_BATCH_TIMEOUT
Default: 10ms
The maximum time to wait before sending a partial batch if the size requirements haven’t been met.
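The three batching settings combine into a single flush decision, which can be modeled as below. This is a simplified model, not the Kafka client's implementation: the batchLimits type, its field names, and the tuned value of 500 are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// batchLimits groups the three documented batching settings.
type batchLimits struct {
	maxMessages int           // KAFKA_BATCH_SIZE
	maxBytes    int64         // KAFKA_BATCH_BYTES
	timeout     time.Duration // KAFKA_BATCH_TIMEOUT
}

// shouldFlush reports whether a pending batch should be sent: it is full by
// message count, full by size, or has waited long enough as a partial batch.
func (l batchLimits) shouldFlush(messages int, bytes int64, waited time.Duration) bool {
	if messages == 0 {
		return false // nothing to send
	}
	return messages >= l.maxMessages || bytes >= l.maxBytes || waited >= l.timeout
}

func main() {
	defaults := batchLimits{maxMessages: 1, maxBytes: 10485760, timeout: 10 * time.Millisecond}
	tuned := batchLimits{maxMessages: 500, maxBytes: 10485760, timeout: 10 * time.Millisecond}

	// With the default batch size of 1, a single message already fills the batch.
	fmt.Println(defaults.shouldFlush(1, 512, 0))
	// With a larger batch size, a partial batch waits until the timeout elapses.
	fmt.Println(tuned.shouldFlush(20, 10240, 2*time.Millisecond))
	fmt.Println(tuned.shouldFlush(20, 10240, 10*time.Millisecond))
}
```

Raising the batch size trades a small amount of latency, bounded by the batch timeout, for fewer and larger requests.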
KAFKA_ASYNC
Default: false
When set to false, the Relay waits for an acknowledgment from Kafka before proceeding. Keep it set to false for guaranteed “At-Least-Once” delivery. Setting it to true increases throughput but risks message loss if the Relay crashes.
KAFKA_COMPRESSION
Default: snappy
The compression codec used for messages sent to Kafka. Supported options: none, gzip, snappy, lz4, zstd. snappy provides a good balance between CPU usage and compression ratio.
KAFKA_REQUIRED_ACKS
Default: all
The acknowledgments the producer requires from Kafka before considering a request complete. Options: all (all in-sync replicas, maximum durability), one (leader only, faster), or none (fire and forget).
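These option names correspond to the numeric acks levels of the Kafka produce protocol: -1 for all in-sync replicas, 1 for the leader only, and 0 for no acknowledgment. The sketch below shows one way to translate them. The function name requiredAcks is hypothetical, and exact-match parsing is an assumption about how the Relay reads the value.

```go
package main

import (
	"fmt"
)

// requiredAcks maps a KAFKA_REQUIRED_ACKS value to the numeric acks level of
// the Kafka produce protocol: -1 (all in-sync replicas), 1 (leader only),
// 0 (no acknowledgment).
func requiredAcks(value string) (int, error) {
	switch value {
	case "all":
		return -1, nil
	case "one":
		return 1, nil
	case "none":
		return 0, nil
	default:
		return 0, fmt.Errorf("unsupported KAFKA_REQUIRED_ACKS value %q", value)
	}
}

func main() {
	for _, v := range []string{"all", "one", "none", "two"} {
		level, err := requiredAcks(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> acks=%d\n", v, level)
	}
}
```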
KAFKA_MAX_ATTEMPTS
Default: 5
The number of times the Kafka producer will attempt to resend a message if the initial write fails.
KAFKA_WRITE_TIMEOUT
Default: 10s
The maximum time to wait for a successful write acknowledgment from the Kafka broker.
KAFKA_READ_TIMEOUT
Default: 10s
The maximum time to wait for a response when reading from the Kafka broker (e.g., during metadata updates).
KAFKA_CONNECTION_TIMEOUT
Default: 10s
The maximum time allowed to establish the initial TCP connection and perform the metadata exchange with the Kafka brokers before the engine reports a connection failure.
Observability (OpenTelemetry)
These settings control how the Relay exports traces and metrics to your observability stack.
OTEL_TRACES_EXPORTER
Default: none
Defines the destination for trace data. Supported options: otlp, console, or none.
OTEL_METRIC_EXPORT_INTERVAL
Default: 60000 (60s)
The interval (in milliseconds) between metric flushes.
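Unlike the Relay's own timing settings, which are written with unit suffixes such as 500ms or 3m, the OTEL_* intervals are plain integers in milliseconds. The sketch below shows the conversion in Go; the helper name millisToDuration is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// millisToDuration converts an OTEL_* interval given as a plain integer number
// of milliseconds (for example "60000") into a time.Duration.
func millisToDuration(value string) (time.Duration, error) {
	ms, err := strconv.ParseInt(value, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("expected an integer number of milliseconds, got %q", value)
	}
	if ms < 0 {
		return 0, fmt.Errorf("interval must not be negative, got %d", ms)
	}
	return time.Duration(ms) * time.Millisecond, nil
}

func main() {
	for _, v := range []string{"60000", "30000", "5000", "60s"} {
		d, err := millisToDuration(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s ms = %v\n", v, d)
	}
}
```

A value with a unit suffix, such as 60s, is rejected here because these variables expect a bare number.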
OTEL_METRICS_EXPORTER
Default: none
Defines the destination for metric data. Supported options: otlp, prometheus, console, or none.
OTEL_EXPORTER_OTLP_ENDPOINT
Default: http://localhost:4317
The address of your OpenTelemetry Collector or backend. Use port 4317 for gRPC and port 4318 for HTTP.
OTEL_EXPORTER_OTLP_PROTOCOL
Default: grpc
The transport protocol used for the OTLP exporter. Supported options: grpc or http/protobuf.
OTEL_BSP_EXPORT_TIMEOUT
Default: 30000 (30s)
The maximum time the OpenTelemetry SDK will wait for a batch of spans to be exported before timing out the request.
OTEL_BSP_SCHEDULE_DELAY
Default: 5000 (5s)
The interval between two consecutive batch exports. Reducing this value decreases memory usage by clearing spans more frequently, but increases CPU and network overhead.