Webhook delivery is inherently unreliable: network failures, partner system outages, and transient errors can cause requests to fail or to be received more than once. Bunjang's webhook system retries failed deliveries automatically, and your endpoint must handle duplicate deliveries safely.

This page describes Bunjang's retry behavior, its circuit breaker mechanism, and how to implement idempotent processing on your side.
Response timeout
Your endpoint must respond with a 2xx status code within 10 seconds of receiving a webhook request. Responses that exceed this timeout are treated as delivery failures and trigger retries.

To meet this requirement reliably, do not perform heavy business logic synchronously in your webhook handler. Instead:

2. Check the eventId for duplicates.
3. Enqueue the event for asynchronous processing or write it to durable storage.
Heavy operations — fetching the latest state from a REST API, updating downstream systems, sending notifications — should run in a separate worker after the response is sent.
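A minimal sketch of this pattern in Python, using only the standard library. It assumes a JSON payload with an eventId field; the function names, the in-memory queue, and the in-memory duplicate set are hypothetical stand-ins. The handler only checks for duplicates and enqueues before returning, while a worker thread does the heavy work.

```python
import json
import queue
import threading

# Hypothetical in-process stand-ins. In production, prefer a durable queue
# and a shared idempotency store so events survive restarts.
work_queue: "queue.Queue[dict]" = queue.Queue()
seen_event_ids: set[str] = set()
seen_lock = threading.Lock()
processed: list[str] = []  # records which events the worker handled


def handle_webhook(body: bytes) -> int:
    """Return an HTTP status quickly; heavy work happens in the worker thread."""
    try:
        event = json.loads(body)
        event_id = event["eventId"]  # assumes a JSON object with an eventId field
    except (ValueError, KeyError, TypeError):
        # Caution: 4xx responses can count against your subscription (see Circuit breaker).
        return 400

    with seen_lock:  # check-and-record must be atomic
        if event_id in seen_event_ids:
            return 200  # duplicate: acknowledge without re-processing
        seen_event_ids.add(event_id)

    work_queue.put(event)  # hand off; do not do heavy work here
    return 200


def process_event(event: dict) -> None:
    """Placeholder for heavy work: fetch latest state, update downstream systems, notify."""
    processed.append(event["eventId"])


def worker() -> None:
    while True:
        event = work_queue.get()
        try:
            process_event(event)
        finally:
            work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
```

Because seen_event_ids lives in process memory, this deduplicates only within one process and is lost on restart; the Idempotency section describes durable, atomic options.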
Retry policy
Bunjang automatically retries webhook deliveries on transient failures.

What is retried

| Condition | Retried? |
|---|---|
| 2xx response | No (delivery considered successful) |
| 5xx response | Yes |
| 429 Too Many Requests | Yes |
| Network error (connection refused, DNS failure, etc.) | Yes |
| Read timeout (no response within 10 seconds) | Yes |
| Other 4xx responses (400, 401, 403, 404, etc.) | No (delivery considered permanently failed) |
Retry schedule
When a transient failure occurs, Bunjang performs up to 3 retries using exponential backoff with jitter. The total delivery attempt window is bounded by an overall timeout of approximately 60 seconds.

Attempt 1 (initial) → fail
↓ ~1 second backoff (±50% jitter)
Attempt 2 → fail
↓ ~2 seconds backoff (±50% jitter)
Attempt 3 → fail
↓ ~4 seconds backoff (±50% jitter)
Attempt 4 (final) → fail → record failure, trigger circuit breaker logic
If all attempts fail, the event is considered failed and no further automatic retries are performed for that event. The failure is then evaluated by the circuit breaker.

Note: The retry schedule is provided for reference and may change without notice as Bunjang tunes the system based on operational data. It is not a service-level agreement. Design your system to tolerate eventual delivery rather than depending on exact retry timing.
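For intuition, the reference schedule can be modeled in a few lines of Python. This is a hypothetical model, not Bunjang's implementation, and the constants are the reference values from this page, which may change. It shows why the window stays under about 60 seconds: even if all four attempts run into the 10-second read timeout and every backoff lands at the top of its jitter range, the attempts finish in roughly 50.5 seconds (connection setup time aside).

```python
import random

BASE_DELAY_S = 1.0   # ~1 s before the first retry, doubling each time
JITTER = 0.5         # ±50% jitter
MAX_RETRIES = 3      # 1 initial attempt + up to 3 retries = 4 attempts
READ_TIMEOUT_S = 10  # each attempt waits up to 10 s for a response
OVERALL_CAP_S = 60   # approximate overall delivery window


def backoff_delays(rng: random.Random) -> list[float]:
    """Jittered delay before each retry: ~1 s, ~2 s, ~4 s, each ±50%."""
    delays = []
    for retry in range(MAX_RETRIES):
        nominal = BASE_DELAY_S * (2 ** retry)
        delays.append(nominal * rng.uniform(1 - JITTER, 1 + JITTER))
    return delays


def worst_case_window_s() -> float:
    """Upper bound: every attempt times out and every backoff hits +50% jitter."""
    attempts = MAX_RETRIES + 1
    max_backoff = sum(BASE_DELAY_S * (2 ** r) * (1 + JITTER) for r in range(MAX_RETRIES))
    return min(attempts * READ_TIMEOUT_S + max_backoff, OVERALL_CAP_S)
```

Here 4 × 10 s of timeouts plus 1.5 + 3 + 6 s of maximum backoff gives 50.5 s.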
Implications for your system
The total retry window is short (approximately 60 seconds). This protects against brief network blips and short partner-side issues such as GC pauses or rolling restarts, but it does not protect against extended downtime.

If your endpoint is unavailable for more than about a minute, events sent during that window will be lost. To recover from extended outages, see Recovering missed events.
Circuit breaker
To protect both Bunjang's webhook infrastructure and your partner endpoint from cascading failures, Bunjang implements a circuit breaker per subscription.

How it works

The circuit breaker tracks failures per subscription and disables delivery when failures accumulate beyond a threshold. While delivery is disabled, no new webhook requests are sent. After a cooldown period, Bunjang attempts a single recovery delivery; if it succeeds, the subscription is re-enabled.

Trigger conditions
A subscription is immediately disabled when:

- A response with 401, 403, 404, or another non-retryable 4xx status code is received. These indicate misconfiguration that retries cannot resolve.
- A 429 Too Many Requests response is received after all retries have been exhausted. The subscription enters a short cooldown (1 minute) to respect rate limiting.

A subscription is disabled after repeated failures when:

- 10 or more transient failures occur within a 1-hour window. Transient failures include 400 responses, 5xx responses, network errors, and timeouts.
Cooldown and recovery
| Trigger | Initial cooldown | Behavior on recovery attempt |
|---|---|---|
| 429 Too Many Requests | 1 minute | If successful, the subscription is re-enabled. If it fails, the subscription is disabled again with a fresh cooldown. |
| All other failures | 30 minutes | Same as above. |
A single successful delivery during a recovery attempt is enough to fully re-enable the subscription and reset the failure counter.

What this means in practice
- Brief partner outages (under ~1 minute): Bunjang's retries usually handle these without involving the circuit breaker.
- Sustained partner outages (repeated failures reaching 10 or more within a 1-hour window): The subscription is disabled. Events generated while it is disabled are not queued for later delivery; they are simply not sent.
- Misconfiguration (wrong signing secret, expired credentials, missing endpoint): The subscription is disabled immediately on the first 4xx response.
If your subscription is disabled, you must coordinate with your Bunjang integration contact to investigate the root cause before re-enabling.
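To make these rules concrete, here is a simplified, hypothetical Python model of the per-subscription circuit breaker. It encodes only what this page states: immediate disablement on non-retryable 4xx responses, a 1-minute cooldown for 429 after retries are exhausted, 10 transient failures in a sliding 1-hour window, 30-minute cooldowns otherwise, and re-enablement after one successful recovery delivery. Bunjang's actual implementation may differ, and the manual coordination described above is not modeled. Timestamps are passed in explicitly (in seconds) so the behavior is easy to follow.

```python
from collections import deque
from enum import Enum
from typing import Deque, Optional

WINDOW_S = 3600               # 1-hour window for counting transient failures
TRANSIENT_THRESHOLD = 10      # failures within the window that disable delivery
DEFAULT_COOLDOWN_S = 30 * 60  # initial cooldown for every trigger except 429
RATE_LIMIT_COOLDOWN_S = 60    # initial cooldown after 429 once retries are exhausted


class Failure(Enum):
    NON_RETRYABLE_4XX = "non_retryable_4xx"  # e.g. 401, 403, 404: disable immediately
    RATE_LIMITED = "rate_limited"            # 429 after all retries: short cooldown
    TRANSIENT = "transient"                  # counted toward the 1-hour threshold


class SubscriptionBreaker:
    """Hypothetical model of the documented rules; not Bunjang's implementation."""

    def __init__(self) -> None:
        self.disabled_until: Optional[float] = None
        self.cooldown_s: float = DEFAULT_COOLDOWN_S
        self.failures: Deque[float] = deque()

    def _disable(self, now: float, cooldown_s: float) -> None:
        self.cooldown_s = cooldown_s
        self.disabled_until = now + cooldown_s

    def can_deliver(self, now: float) -> bool:
        """True while enabled, or once the cooldown has elapsed (recovery attempt)."""
        return self.disabled_until is None or now >= self.disabled_until

    def record_success(self) -> None:
        # One successful delivery re-enables the subscription and resets the counter.
        self.disabled_until = None
        self.failures.clear()

    def record_failure(self, kind: Failure, now: float) -> None:
        if self.disabled_until is not None:
            # Failed recovery attempt: disable again with a fresh cooldown.
            # (The page gives only initial cooldowns; this reuses the same length.)
            self._disable(now, self.cooldown_s)
        elif kind is Failure.NON_RETRYABLE_4XX:
            self._disable(now, DEFAULT_COOLDOWN_S)
        elif kind is Failure.RATE_LIMITED:
            self._disable(now, RATE_LIMIT_COOLDOWN_S)
        else:
            self.failures.append(now)
            while self.failures and self.failures[0] <= now - WINDOW_S:
                self.failures.popleft()  # forget failures older than one hour
            if len(self.failures) >= TRANSIENT_THRESHOLD:
                self._disable(now, DEFAULT_COOLDOWN_S)
```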
Idempotency
Bunjang's webhook system delivers events at least once, not exactly once (within the retry and circuit breaker limits described above). The same event may be delivered multiple times due to:

- Retries after a timeout, where your endpoint actually processed the request but the response was lost.
- Network-layer duplication.
- Internal recovery mechanisms.

Your endpoint must therefore process each event idempotently: receiving the same event twice must produce the same result as receiving it once.

How to implement idempotency
Every webhook payload contains an eventId field that uniquely identifies the event. Use it as your idempotency key.

1. When a webhook arrives, extract eventId from the payload.
2. Atomically check whether you have already processed this eventId:
   - If yes, return 200 OK immediately without re-processing.
   - If no, record the eventId as "processing" (with a TTL to handle crashes), process the event, then mark it as "processed".
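The steps above can be sketched as a small per-eventId state machine. The following single-process Python version uses a lock in place of an atomic store operation; the class and method names and the 60-second processing TTL are hypothetical choices for illustration. It also releases the claim when processing raises, so Bunjang's retry can process the event again instead of having it skipped. It processes inline for brevity; in a real handler, keep heavy work asynchronous as described under Response timeout.

```python
import threading
import time
from typing import Callable, Dict, Tuple

PROCESSING_TTL_S = 60        # hypothetical: how long an in-flight claim stays valid
PROCESSED_TTL_S = 24 * 3600  # keep processed eventIds for at least 24 hours


class IdempotencyStore:
    """Single-process stand-in for an atomic idempotency store (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        # eventId -> (state, expires_at), where state is "processing" or "processed"
        self._entries: Dict[str, Tuple[str, float]] = {}

    def try_claim(self, event_id: str) -> bool:
        """Atomically mark eventId as "processing"; False if claimed or already processed."""
        now = self._clock()
        with self._lock:
            entry = self._entries.get(event_id)
            if entry is not None and entry[1] > now:
                return False
            # New event, or an expired entry (e.g. a claim left by a crashed handler).
            self._entries[event_id] = ("processing", now + PROCESSING_TTL_S)
            return True

    def release(self, event_id: str) -> None:
        """Drop an in-flight claim so a retry can process the event again."""
        with self._lock:
            entry = self._entries.get(event_id)
            if entry is not None and entry[0] == "processing":
                del self._entries[event_id]

    def mark_processed(self, event_id: str) -> None:
        with self._lock:
            self._entries[event_id] = ("processed", self._clock() + PROCESSED_TTL_S)


def handle(store: IdempotencyStore, event: dict, process: Callable[[dict], None]) -> int:
    event_id = event["eventId"]
    if not store.try_claim(event_id):
        return 200  # duplicate or in flight: acknowledge without re-processing
    try:
        process(event)
    except Exception:
        store.release(event_id)
        raise  # typically turned into a 5xx by your web framework, so Bunjang retries
    store.mark_processed(event_id)
    return 200
```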
Common ways to store processed eventIds:

- Database with unique constraint: Insert (event_id, processed_at) into a dedicated table and rely on the unique constraint to reject duplicates.
- Redis with SET NX: Use SET event:{eventId} processed EX 86400 NX (24-hour TTL). The command succeeds only if the key does not already exist.
- Application-level cache: Suitable only for short-lived deduplication; not recommended as the sole mechanism.
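A minimal sketch of the database option, using Python's built-in sqlite3 module as a stand-in for your production database; the table and function names are hypothetical. The INSERT itself is the duplicate check, so two concurrent deliveries cannot both pass it. If your processing writes to the same database, run those writes in the same transaction as the INSERT so that a failure rolls both back.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS processed_events (
    event_id     TEXT PRIMARY KEY,  -- the unique constraint does the deduplication
    processed_at REAL NOT NULL
)
"""


def record_event_once(conn: sqlite3.Connection, event_id: str) -> bool:
    """Insert eventId; return True the first time, False if it is a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_events (event_id, processed_at) VALUES (?, ?)",
                (event_id, time.time()),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the unique constraint rejected the duplicate


def purge_older_than(conn: sqlite3.Connection, retention_s: float = 24 * 3600) -> int:
    """Delete eventIds older than the retention window; return the number removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM processed_events WHERE processed_at < ?",
            (time.time() - retention_s,),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```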
TTL guidance: Retain processed eventIds for at least 24 hours. This comfortably covers Bunjang's retry window (~60 seconds) and circuit breaker cooldowns (up to 30 minutes), with a large safety margin for partner-side incident recovery. Longer retention adds storage overhead without practical benefit.

Common idempotency mistakes
- Checking and inserting in separate steps without a transaction. Two concurrent webhook deliveries can both pass the "not yet processed" check and both proceed to process the event. Use atomic operations (a unique constraint, SET NX, or a transaction with appropriate isolation).
- Using business identifiers instead of eventId. The same business entity (for example, an orderId) can appear in many events. Deduplicate on eventId, not on entities mentioned in the payload.
- Returning an error when a duplicate is detected. Return 200 OK for duplicates. Returning a 4xx will cause Bunjang to disable your subscription.
- Forgetting to handle the "processing crashed mid-flight" case. If your handler dies after marking an event as "processing" but before completing, the retry will see the "processing" status and either skip or block. Design your TTL and state machine accordingly.
Recovering missed events
Because Bunjang's webhook retry window is bounded (approximately 60 seconds of retries, plus circuit breaker cooldowns), extended outages or subscription disablement may result in missed events.

Bunjang does not provide manual webhook replay. Missed events are not queued for later delivery.

To recover, use the corresponding REST API as the authoritative source of state. Each event type maps to a domain that has its own REST API for querying current state. When you suspect missed events (after an incident, after subscription re-enablement, or as periodic reconciliation), query the relevant API for changes since your last successful sync.

General recovery pattern
1. Track the timestamp of the last successfully processed webhook (or last successful sync) per event domain in your system.
2. After any incident or extended downtime, call the corresponding REST API filtered by an "updated since" parameter set to that timestamp.
3. Reconcile each returned record against your local state.
4. Update your last-sync timestamp.
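The four steps can be sketched in Python as follows. fetch_changes is a hypothetical stand-in for a domain's REST API call (for orders, the Order API shown below); its signature and the page-based iteration are assumptions, so adapt them to the actual API reference. The upper bound is fixed before paging, and the returned timestamp should be persisted only after every page has been reconciled, so a crash mid-recovery simply repeats the same window next time. This relies on reconcile being idempotent.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

Record = Dict[str, str]
# Hypothetical: fetch_changes(since, until, page, size) -> records changed in that range
FetchChanges = Callable[[datetime, datetime, int, int], List[Record]]


def reconcile_since(
    last_sync: datetime,
    fetch_changes: FetchChanges,
    reconcile: Callable[[Record], None],
    page_size: int = 100,
) -> datetime:
    """Replay changes from last_sync up to now; return the new last-sync timestamp."""
    until = datetime.now(timezone.utc)  # fix the upper bound before paging
    page = 0
    while True:
        records = fetch_changes(last_sync, until, page, page_size)
        for record in records:
            reconcile(record)  # compare with local state and apply if it differs
        if len(records) < page_size:
            break  # a short (or empty) page means there is nothing more
        page += 1
    return until  # persist this only after every page has been reconciled
```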
Example: recovering missed order.status.changed events
For order status events, use the Order API with the statusUpdateStartDate and statusUpdateEndDate filters:

GET /api/v1/orders?statusUpdateStartDate=2026-05-15T19%3A12%3A00Z&statusUpdateEndDate=2026-05-15T20%3A12%3A00Z&page=0&size=100
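If you build this request in code, Python's urllib.parse.urlencode produces the same percent-encoded query string as the example above. The helper name is hypothetical; check the Order API reference for limits on the date range and page size.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def order_recovery_path(start: datetime, end: datetime, page: int = 0, size: int = 100) -> str:
    """Build the Order API recovery request for status updates between start and end."""

    def iso_utc(dt: datetime) -> str:
        # Normalize to UTC and format as e.g. 2026-05-15T19:12:00Z
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    query = urlencode({
        "statusUpdateStartDate": iso_utc(start),
        "statusUpdateEndDate": iso_utc(end),
        "page": page,
        "size": size,
    })
    return f"/api/v1/orders?{query}"
```

Pass timezone-aware datetimes: astimezone treats a naive value as local time.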
The Order API request above returns all orders whose status was updated within the requested time range, regardless of whether a webhook was delivered for the change. As new event types are added, refer to each event's reference page for the corresponding recovery API and filter parameters.

Design principle
Treat webhook delivery as a notification mechanism that reduces polling frequency, not as a guaranteed event stream. For business-critical state, the corresponding REST API is the source of truth; webhooks are an optimization that lets you avoid constant polling under normal operating conditions.

For event types not yet covered by webhooks, periodic polling of the corresponding REST API remains the appropriate pattern.
Summary checklist
When implementing your webhook receiver, ensure you:

Modified at 2026-05-14 03:34:37