Terminology & Concepts
Application
An application (or producer) is the component responsible for performing a state-changing operation (business logic) and persisting a corresponding event into the outbox store.
To maintain the guarantees of the outbox pattern, the application MUST:
- Generate a unique event_id.
- Ensure the event is written to the outbox store within the same atomic boundary (transaction) as the business state change.
The application is considered the “source” of the event.
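As a minimal sketch, the two requirements above can be illustrated with a single database transaction. The orders table, column names, and payload shape are illustrative assumptions, not part of this specification:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT, state TEXT)"
)

def place_order(order_id: str) -> str:
    """Apply the business change and persist the event atomically."""
    event_id = str(uuid.uuid4())  # unique event_id, generated by the application
    with conn:  # one transaction: both rows commit, or neither does
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "placed"))
        conn.execute(
            "INSERT INTO outbox VALUES (?, ?, ?)",
            (event_id, '{"order_id": "%s"}' % order_id, "PENDING"),
        )
    return event_id

place_order("o-1")
```

Because both rows share one transaction, a crash between the business write and the event write leaves neither behind.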
Event
An event is a record representing a message intended to be published to an external system.
Events are stored in a durable outbox store.
An event consists of:
- identity
- payload
- metadata
- processing state
An event MUST be persisted before any publish attempt.
Outbox Store
The outbox store is the durable storage system that holds events prior to publication.
Examples include:
- relational databases
- document databases
- append-only stores
The outbox store is the source of truth for event state.
Publisher
A publisher is a component responsible for delivering events to an external system.
Examples include:
- Kafka producers
- NATS (core or JetStream) publishers
- RabbitMQ publishers
- SQS clients
- Redis publishers
- HTTP dispatchers
A publisher reports success or failure for each publish attempt.
Relay
A relay is a runtime component that processes events from the outbox store.
A relay MUST:
- read eligible events
- claim events for processing
- publish events using a publisher
- update event state based on the outcome
Multiple relays MAY operate concurrently.
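The four relay responsibilities above can be sketched as one loop iteration, assuming an in-memory store keyed by event_id and a stub publisher; real implementations would use the outbox store and a broker client:

```python
# Hypothetical in-memory outbox store: event_id -> state.
store = {"e-1": "PENDING", "e-2": "PENDING", "e-3": "PUBLISHED"}

def publish(event_id: str) -> bool:
    """Stub publisher; reports success or failure for each attempt."""
    return True

def relay_iteration() -> None:
    # 1. read eligible events
    eligible = [eid for eid, state in store.items() if state == "PENDING"]
    for eid in eligible:
        # 2. claim the event for processing
        store[eid] = "CLAIMED"
        # 3. publish the event using the publisher
        ok = publish(eid)
        # 4. update event state based on the outcome
        store[eid] = "PUBLISHED" if ok else "PENDING"

relay_iteration()
```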
Claim
A claim is a temporary ownership marker that allows a relay to process an event.
Lease
A lease (or visibility timeout) is the duration for which a claim is valid.
- If the lease expires before the relay updates the event state, the event MUST become eligible for claiming again.
- The event MAY then be claimed by the same relay or by another relay.
- This mechanism prevents events from remaining stuck if a relay crashes, stalls, or fails to complete processing.
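Lease-based claiming can be sketched with a claimed_until column, an illustrative schema choice rather than a requirement of this specification:

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE outbox (event_id TEXT PRIMARY KEY, state TEXT, claimed_until REAL)"
)
# One event whose previous claim has already expired.
db.execute("INSERT INTO outbox VALUES ('e-1', 'CLAIMED', ?)", (time.time() - 60,))

LEASE_SECONDS = 30  # illustrative lease duration

def try_claim(now: float) -> list:
    """Claim PENDING events, plus CLAIMED events whose lease has expired."""
    with db:
        db.execute(
            "UPDATE outbox SET state = 'CLAIMED', claimed_until = ? "
            "WHERE state = 'PENDING' OR (state = 'CLAIMED' AND claimed_until < ?)",
            (now + LEASE_SECONDS, now),
        )
    return [r[0] for r in db.execute(
        "SELECT event_id FROM outbox WHERE claimed_until > ?", (now,))]

claimed = try_claim(time.time())  # the expired claim becomes claimable again
```

Real implementations additionally guard against two relays claiming the same row at the same instant, for example with row locking or compare-and-swap updates.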
Concurrency Assumptions
- A claim does NOT guarantee exactly-once processing.
- A claim MAY expire.
- A claim does NOT guarantee that only one relay will ever process the event.
Implementations MUST assume that the same event can be processed more than once.
Acknowledgement
An acknowledgement is the point at which a publish attempt is considered successful.
Unless explicitly defined otherwise:
- acknowledgement occurs when the publisher reports success
Acknowledgement does NOT imply downstream consumption.
Delivery
Delivery refers to the act of successfully publishing an event to an external system.
This specification assumes:
- delivery MAY occur more than once
- delivery is considered successful based on publisher acknowledgement
Idempotency
Idempotency is the property where an operation can be applied multiple times without changing the result beyond the initial application.
Since this specification defaults to at-least-once delivery:
- Downstream consumers SHOULD be idempotent.
- Consumers SHOULD use the event_id to detect and discard duplicate deliveries.
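A minimal sketch of consumer-side deduplication; the in-memory set stands in for the durable storage a real consumer would use:

```python
processed = set()   # set of handled event_ids; persisted in a real consumer
side_effects = []   # stands in for the consumer's actual work

def handle(event_id: str, payload: str) -> None:
    """Process an event at most once, even if it is delivered again."""
    if event_id in processed:
        return  # duplicate delivery: discard without reprocessing
    side_effects.append(payload)
    processed.add(event_id)

handle("e-1", "hello")
handle("e-1", "hello")  # duplicate from at-least-once delivery; has no effect
```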
Batch
A batch is a collection of events retrieved from the outbox store and processed together in a single iteration of the relay’s loop.
Batching is used to improve throughput by reducing the number of round-trips to the outbox store and the publisher.
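Batch retrieval can be sketched as taking a fixed-size slice per loop iteration; the batch size of 3 and the in-memory queue are arbitrary illustrations:

```python
queue = [f"e-{i}" for i in range(1, 8)]  # seven pending event ids
BATCH_SIZE = 3  # illustrative; tuned per deployment in practice

def next_batch() -> list:
    """Take up to BATCH_SIZE events for one iteration of the relay loop."""
    batch, remaining = queue[:BATCH_SIZE], queue[BATCH_SIZE:]
    queue[:] = remaining
    return batch

first = next_batch()  # the first three events; four remain queued
```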
Delivery Guarantee
A delivery guarantee defines how reliably events are delivered.
Unless otherwise specified, implementations SHOULD provide:
- at-least-once delivery
This means:
- an event will be delivered one or more times
- duplicate delivery is possible
Note: Implementations MAY provide stronger delivery guarantees, such as effectively-once or exactly-once delivery, when supported by the underlying storage and broker.
Such guarantees MUST clearly document:
- required conditions (e.g., idempotency, transactional support)
- limitations and failure modes
Retry
A retry is a subsequent publish attempt for an event that has not yet been successfully delivered or whose lease has expired.
Retries occur automatically within the normal processing lifecycle and typically apply to events in a non-terminal state.
Backoff
Backoff is the strategy of delaying retries to avoid overwhelming the system or the external broker during failures. This is typically implemented using the available_at field.
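An exponential-backoff sketch that computes the next available_at from the attempt count; the base delay and cap are illustrative values, not mandated by this specification:

```python
def next_available_at(now: float, attempts: int,
                      base: float = 1.0, cap: float = 300.0) -> float:
    """Delay grows exponentially with the attempt count, up to a cap.
    Production implementations usually add random jitter as well, to
    avoid many events becoming retryable at the same instant."""
    delay = min(cap, base * (2 ** attempts))
    return now + delay
```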
Replay
A replay is an explicit action that re-schedules an event for processing after it has reached a terminal or completed state.
Replay is used to reprocess events that were previously considered finished (e.g., published or dead).
Replay is distinct from retry:
- retry occurs within the same processing lifecycle
- replay starts a new processing lifecycle
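The distinction can be sketched as a replay operation that resets a terminal event for a fresh lifecycle; resetting the attempt counter is one possible interpretation of "starts a new processing lifecycle":

```python
event = {"event_id": "e-1", "state": "DEAD", "attempts": 5}

def replay(ev: dict) -> None:
    """Re-schedule a terminal event for a fresh processing lifecycle."""
    if ev["state"] not in ("DEAD", "PUBLISHED"):
        raise ValueError("replay applies only to terminal states")
    ev["state"] = "PENDING"
    ev["attempts"] = 0  # new lifecycle: the attempt count starts over

replay(event)
```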
Dead Event
A dead event is an event that is no longer automatically retried.
Dead events:
- MAY require manual intervention
- MAY be replayed
Attempt
An attempt is a single publish execution for an event.
An event MAY have multiple attempts as part of retries or replay.
Implementations MAY track the number of attempts for each event.
Correlation and Tracing
Correlation is the ability to link an event back to the original request or transaction that produced it.
Tracing refers to the end-to-end tracking of an event across system boundaries.
Implementations SHOULD support the propagation of trace contexts (e.g., W3C Traceparent) via headers or metadata to enable distributed observability.
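As a sketch, a W3C traceparent value (version-traceid-parentid-flags) can be generated and attached to event metadata; the metadata shape here is an illustrative assumption:

```python
import secrets

def make_traceparent() -> str:
    """Build a W3C traceparent header value: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)   # 32 hex chars identifying the trace
    parent_id = secrets.token_hex(8)   # 16 hex chars identifying this span
    return f"00-{trace_id}-{parent_id}-01"

# Hypothetical event carrying the trace context in its metadata.
evt = {"event_id": "e-1", "metadata": {"traceparent": make_traceparent()}}
```

A relay would copy this value into the broker message headers so downstream consumers can continue the same trace.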
Processing State
Processing state represents the lifecycle stage of an event within the outbox system.
The state MUST be one of:
- PENDING
- CLAIMED
- PUBLISHED
- DEAD
The exact state model is defined in the Event Model section.
Partition Key
A partition key is a value used to select routing or partitioning behavior in the target transport or broker.
Events with the same partition key MAY be routed to the same broker partition or equivalent destination.
Partition keys do not, by themselves, define ordering guarantees unless explicitly stated by an implementation.
Ordering Key
An ordering key is a value used to define the scope within which event ordering is preserved.
When ordering is supported, events with the same ordering key MUST be processed in order according to the applicable ordering rules.
Ordering
Ordering defines the relative sequence in which events are processed or delivered.
This specification does NOT guarantee global ordering.
Ordering MAY be guaranteed within the scope of an ordering key.
When ordering is supported, events sharing the same ordering key MUST be processed in order according to the defined ordering rules.
Note: The partition key does NOT define ordering semantics and is used only for routing or transport-level partitioning.
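Per-key ordering can be sketched by grouping events under their ordering key; the seq field here is an assumed monotonic sequence number, not something defined by this specification:

```python
from collections import defaultdict

events = [
    {"event_id": "e-1", "ordering_key": "order-42", "seq": 1},
    {"event_id": "e-3", "ordering_key": "order-42", "seq": 2},
    {"event_id": "e-2", "ordering_key": "order-7", "seq": 1},
]

# Group by ordering key; order is preserved only within each group,
# never globally across keys.
by_key = defaultdict(list)
for ev in sorted(events, key=lambda e: e["seq"]):
    by_key[ev["ordering_key"]].append(ev["event_id"])
```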
Operator
An operator is a human or automated system responsible for managing the outbox system.
Operators MAY:
- inspect event state
- trigger replay
- intervene in failure scenarios
Operational Inspection
Operational inspection refers to querying and analyzing event state for monitoring and debugging.
Examples include:
- listing dead events
- filtering events by time range
- inspecting retry counts
- identifying stuck or unprocessed events
Termination Condition
A termination condition defines when an event is no longer eligible for automatic retry.
When a termination condition is met:
- the event MUST transition to DEAD
- the event MUST NOT be retried automatically
Termination conditions are implementation-defined and MAY include:
- maximum number of attempts
- time-based limits
- explicit operator intervention
- implementation-specific policies
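A maximum-attempts termination condition (the first example above) can be sketched as follows; the limit of 10 is an arbitrary policy choice:

```python
MAX_ATTEMPTS = 10  # illustrative, implementation-defined policy

def record_failure(ev: dict) -> None:
    """Count a failed attempt and transition to DEAD once the limit is hit."""
    ev["attempts"] += 1
    if ev["attempts"] >= MAX_ATTEMPTS:
        ev["state"] = "DEAD"  # no further automatic retries; replay still possible

ev = {"event_id": "e-1", "state": "PENDING", "attempts": 9}
record_failure(ev)  # the tenth failure terminates automatic retrying
```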