
Webhook retry logic

When a webhook delivery fails (network timeout, 5xx response, DNS error, connection refused), Hook0 retries with increasing delays. Each failed attempt creates a new request attempt scheduled for later, until the retry limit is reached.

Why retries matter

Most webhook delivery failures are transient. The receiving server was restarting, a load balancer was draining connections, or a brief network partition occurred. A retry a few seconds later usually succeeds.

Without retries, every transient failure becomes a lost event. With naive retries (fixed interval, no limit), you risk overwhelming a recovering server. Hook0 uses a predefined retry schedule that spaces out attempts over increasing intervals.

Retry schedule

Hook0 uses a fixed retry schedule rather than exponential backoff: each retry attempt has its own predefined delay.

Each delay is measured from the moment the previous attempt failed, not from the time of the original event, so the delays accumulate. For example, if the first delivery fails at T=0, the second attempt is scheduled at T+3s. If that also fails, the third attempt is scheduled 10s after that failure, at roughly T+3s+10s, and so on.
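
The timing rule can be sketched in a few lines of Python. The function name `attempt_offsets` is hypothetical, and only the 3s and 10s delays come from the example above. The sketch shows the arithmetic, not Hook0's actual implementation, and it ignores the time each attempt spends in flight.

```python
from itertools import accumulate


def attempt_offsets(delays_s):
    """Return each retry's offset in seconds from the first failure.

    Each delay is counted from the previous failure, so the offsets
    are the running sums of the per-retry delays.
    """
    return list(accumulate(delays_s))


# Only the first two delays are taken from the text; the rest of the
# schedule is intentionally not reproduced here.
print(attempt_offsets([3, 10]))  # [3, 13]
```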

Retry limits

Retries are bounded by two configurable limits (whichever is reached first):

| Parameter | Default | Description |
| --- | --- | --- |
| MAX_RETRIES | 25 | Maximum number of delivery attempts |
| MAX_RETRY_WINDOW | 8 days | Maximum total time window for retries |

At startup, the output worker evaluates the effective retry policy by computing how many retries fit within the configured window. For example, with the default settings, all 25 retries fit comfortably within 8 days.
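
A minimal sketch of that startup computation, under stated assumptions: the function name `effective_retries` is hypothetical, the schedule values are invented for illustration (they are not Hook0's real delays), and treating a retry that lands exactly on the window boundary as allowed is a guess.

```python
from datetime import timedelta


def effective_retries(delays, max_retries, max_window):
    """Count how many scheduled retries fit within both limits.

    delays: per-retry delays, each counted from the previous failure.
    """
    elapsed = timedelta(0)
    count = 0
    for delay in delays[:max_retries]:
        elapsed += delay
        if elapsed > max_window:
            break  # this retry would fall outside the retry window
        count += 1
    return count


# HYPOTHETICAL schedule, for illustration only.
schedule = [
    timedelta(seconds=3),
    timedelta(seconds=10),
    timedelta(hours=1),
    timedelta(days=9),
]
print(effective_retries(schedule, max_retries=25, max_window=timedelta(days=8)))  # 3
```

Whichever limit is hit first wins: lowering `max_retries` to 2 in this example would yield 2, even though a third retry still fits in the window.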

What happens on failure

When a delivery attempt fails, Hook0 follows this decision process:

Non-retryable errors

Some errors are never retried because retrying would produce the same result:

  • Invalid header: the webhook signature could not be constructed (e.g., event type contains characters that are invalid in HTTP headers).

Subscription and application checks

Before scheduling a retry, Hook0 checks that the subscription is still enabled, has not been soft-deleted, and that the parent application still exists. If any of these fail, the retry is skipped.
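
The checks described in this section can be summarized as a single predicate. This is a sketch, not Hook0's code: the `RetryContext` type, its field names, and the order of the checks are assumptions, and the retry window limit is left out for brevity. Only E_INVALID_HEADER is treated as non-retryable because it is the only error this page documents as such.

```python
from dataclasses import dataclass

# Only E_INVALID_HEADER is documented as non-retryable on this page.
NON_RETRYABLE = {"E_INVALID_HEADER"}


@dataclass
class RetryContext:  # hypothetical container, not a Hook0 type
    error_code: str
    retry_count: int  # retries already performed for this delivery
    max_retries: int
    subscription_enabled: bool
    subscription_deleted: bool
    application_exists: bool


def should_retry(ctx: RetryContext) -> bool:
    """Return True if another attempt should be scheduled."""
    if ctx.error_code in NON_RETRYABLE:
        return False  # retrying would produce the same result
    if not ctx.subscription_enabled or ctx.subscription_deleted:
        return False  # subscription disabled or soft-deleted
    if not ctx.application_exists:
        return False  # parent application is gone
    return ctx.retry_count < ctx.max_retries
```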

Delivery status flow

Each webhook delivery attempt moves through a series of states. Hook0 tracks five statuses:

| Status | Meaning |
| --- | --- |
| Waiting | Scheduled for future delivery (delay_until has not elapsed yet) |
| Pending | Ready to be picked up by a worker |
| In Progress | Currently being delivered (picked by a worker) |
| Successful | Delivery succeeded (2xx HTTP response) |
| Failed | Delivery failed |
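
One way to read these statuses is as a function of an attempt's timestamps. The sketch below is an assumption made for illustration: Hook0 may store or derive statuses differently, and the function name `attempt_status` is hypothetical. The parameter names mirror the request_attempt timestamp columns.

```python
from datetime import datetime, timedelta, timezone


def attempt_status(now, delay_until=None, picked_at=None,
                   succeeded_at=None, failed_at=None):
    """Derive a status label from an attempt's timestamps (assumed mapping)."""
    if succeeded_at is not None:
        return "Successful"
    if failed_at is not None:
        return "Failed"
    if picked_at is not None:
        return "In Progress"
    if delay_until is not None and delay_until > now:
        return "Waiting"  # scheduled, but the delay has not elapsed yet
    return "Pending"  # ready to be picked up by a worker


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(attempt_status(now, delay_until=now + timedelta(seconds=3)))  # Waiting
print(attempt_status(now, delay_until=now - timedelta(seconds=1)))  # Pending
```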

The request_attempt table stores every attempt with timestamps (created_at, picked_at, succeeded_at, failed_at, delay_until), so you can calculate:

  • Time to first delivery: picked_at - created_at
  • Delivery latency: succeeded_at - picked_at
  • Total time to success: succeeded_at - created_at (including retries)
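
As a small illustration, the three durations can be computed from one attempt's timestamps like this. The function name `delivery_metrics` and the sample timestamps are hypothetical; in practice the values would come from a row of the request_attempt table.

```python
from datetime import datetime, timedelta, timezone


def delivery_metrics(created_at, picked_at, succeeded_at):
    """Compute the three durations listed above for one successful attempt."""
    return {
        "time_to_first_delivery": picked_at - created_at,
        "delivery_latency": succeeded_at - picked_at,
        "total_time_to_success": succeeded_at - created_at,
    }


# Made-up timestamps for illustration.
created = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
metrics = delivery_metrics(
    created_at=created,
    picked_at=created + timedelta(seconds=2),
    succeeded_at=created + timedelta(seconds=2, milliseconds=350),
)
print(metrics["delivery_latency"])  # 0:00:00.350000
```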

Each retry creates a new row in the request_attempt table with an incremented retry_count and a delay_until set to the scheduled retry time.

When all retries are exhausted

When the maximum number of retries is reached (or the retry window expires), Hook0 does not create another attempt. The last attempt stays in failed status.

Failed deliveries are not lost. You can:

  1. Inspect all delivery attempts and their responses via the API or dashboard
  2. Replay the event via the API to re-trigger delivery to all matching subscriptions

Replaying an event resets its dispatched_at field. The dispatch trigger then creates new request attempts for all active subscriptions that match the event's type and labels.

Idempotency

Every event in Hook0 has a unique event_id. Consumers should use this as an idempotency key to handle duplicate deliveries.

Duplicates happen when:

  • The consumer processed the event but did not return a 2xx response (e.g., it crashed after processing but before responding)
  • Network issues caused the response to be lost
  • The event was replayed manually

Example implementation

-- PostgreSQL example
CREATE TABLE processed_webhooks (
    event_id     UUID PRIMARY KEY,
    processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Before processing:
INSERT INTO processed_webhooks (event_id)
VALUES ($1)
ON CONFLICT (event_id) DO NOTHING
RETURNING event_id;

-- If no row returned, event was already processed -- skip it.
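
On the application side, the same check might be wrapped in a handler like the one below. This is a sketch under assumptions: it uses an in-memory SQLite database as a stand-in for the consumer's real PostgreSQL database (so `INSERT OR IGNORE` replaces `ON CONFLICT ... DO NOTHING`), and the names `handle_once` and `process` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the consumer's real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_webhooks (event_id TEXT PRIMARY KEY)")


def handle_once(event_id, process):
    """Run process() only the first time event_id is seen.

    Returns True if process() ran, False for a duplicate delivery.
    """
    with conn:  # commits on success, rolls back if process() raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_webhooks (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # already recorded: skip the duplicate
        process()
        return True
```

Recording the event_id and doing the work in one transaction means a failure inside `process()` rolls the marker back, so a later retry of the same event is processed rather than skipped.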

Configuration

The output worker's retry and delivery behavior is configured via environment variables:

| Parameter | Default | Description |
| --- | --- | --- |
| MAX_RETRIES | 25 | Maximum delivery attempts before giving up |
| MAX_RETRY_WINDOW | 8 days | Maximum time window for retries |
| CONNECT_TIMEOUT | 5 seconds | Timeout for establishing a TCP connection |
| TIMEOUT | 15 seconds | Total HTTP request timeout (including connect) |
| CONCURRENT | 1 | Number of request attempts handled concurrently |

Error types

When a delivery fails, Hook0 records one of these error codes:

| Error code | Meaning |
| --- | --- |
| E_TIMEOUT | The HTTP request timed out |
| E_CONNECTION | Could not establish a connection to the target |
| E_HTTP | The server responded with a non-2xx status code |
| E_INVALID_TARGET | The target URL is invalid or resolves to a forbidden IP |
| E_INVALID_HEADER | A required header value could not be constructed (non-retryable) |
| E_UNKNOWN | An unexpected error occurred |

SSRF protection

Hook0 blocks webhook deliveries to private/internal IP addresses by default (loopback, RFC 1918, link-local, etc.). This prevents Server-Side Request Forgery attacks. This check can be disabled with the DISABLE_TARGET_IP_CHECK flag for development environments.
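
To illustrate the kind of check involved, here is a minimal sketch using Python's standard `ipaddress` module. It is not Hook0's implementation: the function name `is_forbidden_target` is hypothetical, the exact set of blocked ranges in Hook0 may differ, and the sketch takes an already-resolved IP address rather than resolving a hostname.

```python
import ipaddress


def is_forbidden_target(ip):
    """Return True for loopback, private (e.g. RFC 1918), or link-local addresses."""
    addr = ipaddress.ip_address(ip)
    return addr.is_loopback or addr.is_private or addr.is_link_local


print(is_forbidden_target("10.0.0.5"))  # True
print(is_forbidden_target("8.8.8.8"))   # False
```

A real check has to run against the addresses the target hostname actually resolves to at delivery time, since a public-looking hostname can point at an internal IP.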
