Performance Benchmark

This document details the performance evaluation of the Open Outbox Relay, comparing an unoptimized baseline against a tuned production configuration.


To ensure reproducibility, the benchmark was conducted on a dedicated local environment with the following hardware and workload parameters.

| Category | Parameter | Specification |
|----------|-----------|---------------|
| Hardware | Model | MacBook Pro M1 (32GB RAM) |
| | Storage | Integrated Apple Silicon NVMe SSD |
| Workload | Event Payload | 200 Bytes |
| | Backlog Volume | 100M+ Events |
| | Engine | PostgreSQL 16.4 |

To establish a clear performance delta, we conducted the initial Draining phase on a default, unoptimized PostgreSQL setup. For the final Bench phase, we applied high-throughput tuning to measure the peak performance potential of the relay.

The following parameters were applied to postgresql.conf to achieve the 20K+ EPS milestone:

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| shared_buffers | 4GB | Allocated 12.5% of total RAM for data caching. |
| work_mem | 64MB | Increased memory for internal sort operations. |
| maintenance_work_mem | 512MB | Accelerated index maintenance and vacuuming. |
| synchronous_commit | off | Decoupled transaction success from disk flush (Async Logging). |
| checkpoint_timeout | 15min | Reduced the frequency of heavy disk flushes. |
| max_wal_size | 4GB | Expanded the WAL ceiling to prevent frequent checkpoints. |
| random_page_cost | 1.1 | Optimized for NVMe seek speeds. |
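
As a convenience, the settings in the table can be rendered as `postgresql.conf` lines. The following is a minimal sketch: the values are copied from the table, while the helper name `render_postgresql_conf` is only an illustration and not part of the relay.

```python
# Settings copied verbatim from the tuning table above.
TUNED_SETTINGS = {
    "shared_buffers": "4GB",
    "work_mem": "64MB",
    "maintenance_work_mem": "512MB",
    "synchronous_commit": "off",
    "checkpoint_timeout": "15min",
    "max_wal_size": "4GB",
    "random_page_cost": "1.1",
}


def render_postgresql_conf(settings: dict[str, str]) -> str:
    """Render settings as `name = value` lines, one per parameter."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    # Paste the printed lines into postgresql.conf.
    print(render_postgresql_conf(TUNED_SETTINGS))
```

Of these parameters, `shared_buffers` only takes effect after a server restart; the others can be picked up with a configuration reload. Keep in mind that `synchronous_commit = off` can lose the most recently committed transactions if the server crashes, although it does not corrupt the database.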

To establish a “worst-case” performance floor, the relay was tasked with clearing a 100M+ event backlog using out-of-the-box PostgreSQL settings.

  1. Backlog Setup: The outbox table was pre-populated, creating a high-pressure scenario for the index and storage engine.
  2. Worker Initialization: 12 workers were started to poll the outbox concurrently and deliver events to Kafka.
  3. I/O Saturation: The system quickly hit a throughput ceiling. Adding more workers resulted in increased disk wait rather than increased EPS.

| Metric | Result |
|--------|--------|
| Stable Throughput | 14,500 EPS |
| Primary Bottleneck | Disk I/O Wait (fsync) |
| CPU Utilization | ~20% (Significant Headroom) |
| Disk Latency | High (Synchronous WAL writes) |

[Image: Baseline Draining Grafana]
Figure 1: Grafana metrics showing the 14,500 EPS plateau during the unoptimized draining phase.


With the postgresql.conf tuning parameters listed above applied, the performance ceiling moved from the I/O layer to the index arbitration layer.

| Metric | Value | Description |
|--------|-------|-------------|
| Peak Throughput | 20,100+ EPS | A sustained milestone achieved with 12 workers. |
| Performance Gain | +42.5% Improvement | Measured against the 14.5K unoptimized baseline. |

Efficiency remained high until the system reached the single-table throughput limit of the PostgreSQL index.

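| Worker Count | Throughput (EPS) | Scaling Efficiency |
|--------------|------------------|--------------------|
| 1 Worker | 5K | 100% (Baseline) |
| 4 Workers | 14K | 89% |
| 8 Workers | 19K | 84% |
| 12 Workers | 22.1K | 67% (Saturation) |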
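
One way to read the saturation in the table above is to look at how much throughput each additional worker contributes between measurement points. The sketch below uses only the worker counts and EPS figures from the table; the "marginal EPS per added worker" metric is an illustration introduced here and is not the same as the Scaling Efficiency column, whose exact definition the benchmark does not state.

```python
# (workers, throughput in EPS) pairs taken from the scaling table above.
SCALING = [(1, 5_000), (4, 14_000), (8, 19_000), (12, 22_100)]


def marginal_eps_per_worker(points):
    """Return (from_workers, to_workers, extra EPS per added worker) per step."""
    steps = []
    for (w0, eps0), (w1, eps1) in zip(points, points[1:]):
        steps.append((w0, w1, (eps1 - eps0) / (w1 - w0)))
    return steps


if __name__ == "__main__":
    for w0, w1, gain in marginal_eps_per_worker(SCALING):
        print(f"{w0:>2} -> {w1:>2} workers: {gain:,.0f} EPS per added worker")
```

With the table's figures, the contribution of each added worker falls from 3,000 EPS (1 to 4 workers) to 1,250 EPS (4 to 8) and then 775 EPS (8 to 12), which matches the diminishing returns described here.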

In this phase, I/O Wait dropped to near 0%. However, throughput plateaued at 22.1K EPS even though CPU utilization stayed at only about 20%.

This indicates that the hardware was no longer the limiting factor. The bottleneck shifted to index locking: even with SKIP LOCKED, there is a limit to how fast 12 workers can coordinate on a single index. To go faster, you would need table partitioning, so that workers are spread across several per-partition indexes instead of contending on one.
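
For readers unfamiliar with the pattern, the sketch below shows a generic way workers can claim rows with `FOR UPDATE SKIP LOCKED`. It is not the relay's actual query, which this document does not show. The table name `outbox`, the columns `id`, `payload`, and `status`, the `'PENDING'` value, and the function name `claim_batch` are all hypothetical.

```python
# Hypothetical schema: table `outbox` with columns `id`, `payload`, `status`.
# Rows already locked by another worker are skipped instead of waited on.
CLAIM_BATCH_SQL = """
SELECT id, payload
FROM outbox
WHERE status = 'PENDING'
ORDER BY id
LIMIT %s
FOR UPDATE SKIP LOCKED
"""


def claim_batch(cursor, batch_size: int):
    """Lock and return up to `batch_size` pending rows no other worker holds.

    `cursor` is any DB-API 2.0 cursor using the `%s` parameter style
    (for example one from psycopg), inside an open transaction. The row
    locks are released when that transaction commits or rolls back.
    """
    cursor.execute(CLAIM_BATCH_SQL, (batch_size,))
    return cursor.fetchall()
```

In a query of this shape, every worker still scans the same index to find the next unlocked rows, which is consistent with the single-index coordination limit described above.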

[Image: Optimized Benchmark Grafana]
Figure 2: Sustained 20K EPS throughput showing consistent performance with optimized PostgreSQL settings.


| Metric | Unoptimized (Draining) | Optimized (Bench) | Delta |
|--------|------------------------|-------------------|-------|
| Throughput (EPS) | 14,100 | 22,100 | +42.5% |

  1. Draining Phase: Validated that the system can drain massive backlogs reliably at a rate of roughly 1.2 billion events per day, even without tuning.
  2. Bench Phase: Demonstrated that with specific tuning, the Relay exceeds 1.7 billion events per day on a single M1 machine.
  3. Production Readiness: The Relay managed internal memory and worker state efficiently, which suggests that the software logic is not the bottleneck.
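
The events-per-day figures follow from multiplying a throughput in EPS by the 86,400 seconds in a day. The sketch below shows that conversion for 14,100 and 20,100 EPS, two figures that appear in the tables above. It assumes the rate is sustained for a full 24 hours, which is an extrapolation rather than something the benchmark states it measured.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(eps: float) -> float:
    """Extrapolate a steady events-per-second rate to a 24-hour total."""
    return eps * SECONDS_PER_DAY


if __name__ == "__main__":
    for label, eps in [("Draining", 14_100), ("Bench", 20_100)]:
        total = events_per_day(eps)
        print(f"{label}: {total / 1e9:.2f} billion events/day")
```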