
Webhook retry logic

When a webhook delivery fails (network timeout, 5xx response, DNS error, connection refused), Hook0 retries with increasing delays. Each failed attempt creates a new request attempt scheduled for later, until the retry limit is reached.

Why retries matter

Most webhook delivery failures are transient. The receiving server was restarting, a load balancer was draining connections, or a brief network partition occurred. A retry a few seconds later usually succeeds.

Without retries, every transient failure becomes a lost event. With naive retries (fixed interval, no limit), you risk overwhelming a recovering server. Hook0 uses a two-phase retry schedule that recovers quickly from brief outages and backs off during longer ones.

Two-phase retry schedule

Hook0 uses a configurable two-phase approach instead of a single fixed schedule:

  1. Fast retries -- frequent attempts with increasing delays, to recover from brief outages quickly
  2. Slow retries -- spaced-out attempts at fixed intervals, to handle longer outages without overwhelming the endpoint

Default retry configuration

Every application gets these defaults. Most users never need to change them:

| Parameter | Default | Range | Description |
|---|---|---|---|
| max_fast_retries | 30 | 0-100 | Number of fast-phase retry attempts |
| max_slow_retries | 30 | 0-100 | Number of slow-phase retry attempts |
| fast_retry_delay_seconds | 5s | 1-3600s | Initial delay between fast retries |
| max_fast_retry_delay_seconds | 300s (5min) | 1-86400s | Maximum delay between fast retries |
| slow_retry_delay_seconds | 3600s (1h) | 60-604800s | Fixed delay between slow retries |
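The defaults above can be turned into a concrete schedule. The following is a minimal Python illustration, not Hook0's code. The exact growth curve of the fast phase is not specified here, so the sketch assumes the delay doubles on each retry, starting from fast_retry_delay_seconds and capped at max_fast_retry_delay_seconds. The names RetryConfig, retry_delay and schedule are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RetryConfig:
    # Defaults mirror the table above.
    max_fast_retries: int = 30
    max_slow_retries: int = 30
    fast_retry_delay_seconds: int = 5
    max_fast_retry_delay_seconds: int = 300
    slow_retry_delay_seconds: int = 3600


def retry_delay(retry_number: int, cfg: RetryConfig) -> Optional[int]:
    """Seconds to wait before retry `retry_number` (0-based), or None when exhausted.

    ASSUMPTION: fast-phase delays double from fast_retry_delay_seconds and are
    capped at max_fast_retry_delay_seconds; Hook0's real curve may differ.
    """
    if retry_number < cfg.max_fast_retries:
        # Fast phase: growing delay, never above the configured cap.
        return min(cfg.fast_retry_delay_seconds * 2 ** retry_number,
                   cfg.max_fast_retry_delay_seconds)
    if retry_number < cfg.max_fast_retries + cfg.max_slow_retries:
        # Slow phase: fixed spacing between attempts.
        return cfg.slow_retry_delay_seconds
    return None  # no further attempt is scheduled


def schedule(cfg: RetryConfig) -> List[int]:
    """All retry delays in order, until the schedule is exhausted."""
    delays = []
    n = 0
    while (d := retry_delay(n, cfg)) is not None:
        delays.append(d)
        n += 1
    return delays
```

Under that doubling assumption, the defaults give fast-phase delays of 5, 10, 20, 40, 80 and 160 seconds, then 300 seconds for the remaining fast retries, followed by 30 hourly slow retries. That is 60 retries spanning roughly 32 hours in total.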

Per-subscription overrides

Each subscription can override any of these parameters. When a subscription does not specify a retry configuration, it inherits the application-level defaults. This means you can:

  • Set sane defaults for the whole application
  • Customize retry behavior for specific subscriptions that need it (e.g., a critical integration that needs more aggressive retries, or a low-priority endpoint that can tolerate longer delays)
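One way to picture the inheritance rule is as a per-parameter merge. This Python sketch assumes that a subscription may override individual parameters, with the rest falling back to the application values. It also validates each override against the Range column of the defaults table. APP_DEFAULTS, RANGES and effective_retry_config are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Application-level defaults and allowed ranges, copied from the defaults table.
APP_DEFAULTS: Dict[str, int] = {
    "max_fast_retries": 30,
    "max_slow_retries": 30,
    "fast_retry_delay_seconds": 5,
    "max_fast_retry_delay_seconds": 300,
    "slow_retry_delay_seconds": 3600,
}
RANGES: Dict[str, Tuple[int, int]] = {
    "max_fast_retries": (0, 100),
    "max_slow_retries": (0, 100),
    "fast_retry_delay_seconds": (1, 3600),
    "max_fast_retry_delay_seconds": (1, 86400),
    "slow_retry_delay_seconds": (60, 604800),
}


def effective_retry_config(app_config: Dict[str, int],
                           overrides: Optional[Dict[str, int]]) -> Dict[str, int]:
    """Merge a subscription's overrides onto the application's settings.

    ASSUMPTION: overrides apply per parameter; anything the subscription
    does not set falls back to the application value.
    """
    merged = dict(app_config)  # copy so the application config is untouched
    for name, value in (overrides or {}).items():
        if name not in RANGES:
            raise KeyError(f"unknown retry parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        merged[name] = value
    return merged
```

For example, effective_retry_config(APP_DEFAULTS, {"max_fast_retries": 60}) keeps every default except the fast-retry count. An override such as slow_retry_delay_seconds=10 is rejected because it falls below the 60-second minimum.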

What happens on failure

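When a delivery attempt fails, Hook0 decides whether to schedule another attempt. The decision depends on the error type and on the state of the subscription and application: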

Non-retryable errors

Some errors are never retried because retrying would produce the same result:

  • Invalid header: the webhook signature could not be constructed (e.g., event type contains characters that are invalid in HTTP headers).

Subscription and application checks

Before scheduling a retry, Hook0 checks that the subscription is still enabled, has not been soft-deleted, and that the parent application still exists. If any of these fail, the retry is skipped.
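The checks in this section can be summarized in one predicate, together with the retry limit and retry window described under "When all retries are exhausted". This is a hedged Python sketch. The Subscription shape and the should_retry signature are illustrative, and so is the assumption that retry_count is 0 on the first attempt. The max_attempts and max_window parameters stand in for the MAX_RETRIES and MAX_RETRY_WINDOW settings listed under Configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Per the list above, header-construction failures are never retried.
NON_RETRYABLE_ERRORS = {"E_INVALID_HEADER"}


@dataclass
class Subscription:
    # Hypothetical fields standing in for Hook0's subscription state.
    is_enabled: bool
    deleted_at: Optional[datetime] = None


def should_retry(error_code: str,
                 subscription: Subscription,
                 application_exists: bool,
                 retry_count: int,
                 max_attempts: int,
                 first_attempt_at: datetime,
                 now: datetime,
                 max_window: timedelta) -> bool:
    """Decide whether a failed attempt gets a follow-up attempt.

    ASSUMPTION: retry_count is 0 for the first attempt, so retry_count + 1
    attempts have been made when this runs.
    """
    if error_code in NON_RETRYABLE_ERRORS:
        return False  # retrying would produce the same result
    if not subscription.is_enabled or subscription.deleted_at is not None:
        return False  # subscription disabled or soft-deleted
    if not application_exists:
        return False  # parent application no longer exists
    if retry_count + 1 >= max_attempts:
        return False  # retry limit reached
    if now - first_attempt_at >= max_window:
        return False  # retry window expired
    return True
```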

Delivery status flow

Each webhook delivery attempt moves through five statuses:

| Status | Meaning |
|---|---|
| Waiting | Scheduled for future delivery (delay_until has not elapsed yet) |
| Pending | Ready to be picked up by a worker |
| In Progress | Currently being delivered (picked up by a worker) |
| Successful | Delivery succeeded (2xx HTTP response) |
| Failed | Delivery failed |
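The table can be read as a function of an attempt's timestamp columns (picked_at, succeeded_at, failed_at, delay_until). The Python sketch below is a hypothetical mapping for illustration only. It assumes a terminal timestamp takes precedence over picked_at, and that delay_until alone separates Waiting from Pending. Hook0 may store or compute statuses differently.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    WAITING = "Waiting"
    PENDING = "Pending"
    IN_PROGRESS = "In Progress"
    SUCCESSFUL = "Successful"
    FAILED = "Failed"


def attempt_status(picked_at: Optional[datetime],
                   succeeded_at: Optional[datetime],
                   failed_at: Optional[datetime],
                   delay_until: Optional[datetime],
                   now: datetime) -> Status:
    """Derive an attempt's status from its timestamps (hypothetical mapping)."""
    if succeeded_at is not None:
        return Status.SUCCESSFUL
    if failed_at is not None:
        return Status.FAILED
    if picked_at is not None:
        return Status.IN_PROGRESS  # a worker has it, no outcome recorded yet
    if delay_until is not None and delay_until > now:
        return Status.WAITING  # scheduled retry not due yet
    return Status.PENDING  # due and waiting for a worker
```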

The request_attempt table stores every attempt with timestamps (created_at, picked_at, succeeded_at, failed_at, delay_until), so you can calculate:

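  • Time to first delivery attempt: picked_at - created_at of the first attempt
  • Delivery latency: succeeded_at - picked_at of the successful attempt
  • Total time to success: succeeded_at of the successful attempt - created_at of the first attempt (including time spent on retries)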

Each retry creates a new row in the request_attempt table with an incremented retry_count and a delay_until set to the scheduled retry time.
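Because each retry is a separate row, any timing that spans retries has to combine rows. This Python sketch assumes the attempts for one event and subscription are available as dicts with retry_count, created_at, picked_at and succeeded_at keys. That shape is hypothetical; it could come, for example, from a query on request_attempt. Total time to success is measured from the first attempt's created_at.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def delivery_metrics(attempts: List[dict]) -> Dict[str, Optional[timedelta]]:
    """Compute delivery timings for one event/subscription pair.

    `attempts` are request_attempt rows as dicts (hypothetical shape) with
    retry_count, created_at, picked_at and succeeded_at keys.
    """
    ordered = sorted(attempts, key=lambda a: a["retry_count"])
    first = ordered[0]
    # The first attempt that succeeded, if any.
    success = next((a for a in ordered if a["succeeded_at"] is not None), None)

    metrics: Dict[str, Optional[timedelta]] = {
        "time_to_first_delivery": None,
        "delivery_latency": None,
        "total_time_to_success": None,
    }
    if first["picked_at"] is not None:
        metrics["time_to_first_delivery"] = first["picked_at"] - first["created_at"]
    if success is not None:
        metrics["delivery_latency"] = success["succeeded_at"] - success["picked_at"]
        # Measured from the FIRST attempt so retry delays are included.
        metrics["total_time_to_success"] = success["succeeded_at"] - first["created_at"]
    return metrics
```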

When all retries are exhausted

When the maximum number of retries is reached (or the retry window expires), Hook0 does not create another attempt. The last attempt stays in failed status.

Failed deliveries are not lost. You can:

  1. Inspect all delivery attempts and their responses via the API or dashboard
  2. Replay the event via the API to re-trigger delivery to all matching subscriptions

Replaying an event resets its dispatched_at field. The dispatch trigger then creates new request attempts for all active subscriptions that match the event's type and labels.

Idempotency

Every event in Hook0 has a unique event_id. Consumers should use this as an idempotency key to handle duplicate deliveries.

Duplicates happen when:

  • The consumer processed the event but the delivery was still recorded as failed (e.g., it returned a non-2xx response, or crashed after processing but before responding)
  • Network issues caused the response to be lost
  • Manual replay of an event

Example implementation

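```sql
-- PostgreSQL example
CREATE TABLE processed_webhooks (
    event_id UUID PRIMARY KEY,
    processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- For each delivery, record the event_id in the same transaction
-- as your processing, so a crash rolls both back:
BEGIN;

INSERT INTO processed_webhooks (event_id)
VALUES ($1)
ON CONFLICT (event_id) DO NOTHING
RETURNING event_id;

-- If no row is returned, the event was already processed: skip it.
-- Otherwise, do the processing here, then commit.
COMMIT;
```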
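The same pattern can be exercised end to end without a PostgreSQL server. This Python sketch swaps in the standard library's sqlite3 module and uses SQLite's INSERT OR IGNORE in place of PostgreSQL's ON CONFLICT (event_id) DO NOTHING with RETURNING. The handle_webhook function and its process callback are hypothetical. The sketch shows that the idempotency marker and the processing share one transaction: an exception during processing rolls back the marker, so a later retry is handled normally.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # In-memory SQLite stand-in for the PostgreSQL table above.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processed_webhooks ("
        " event_id TEXT PRIMARY KEY,"
        " processed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def handle_webhook(conn: sqlite3.Connection, event_id: str, payload: dict,
                   process) -> bool:
    """Run `process(payload)` at most once per event_id.

    Returns False for a duplicate delivery. `process` is a hypothetical
    callback standing in for your business logic.
    """
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_webhooks (event_id) VALUES (?)",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # event_id already recorded: skip
        process(payload)  # if this raises, the marker row is rolled back
        return True
```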

Configuration

The output worker's retry and delivery behavior is configured via environment variables:

| Environment variable | Default | Description |
|---|---|---|
| MAX_RETRIES | 25 | Maximum delivery attempts before giving up |
| MAX_RETRY_WINDOW | 8 days | Maximum time window for retries |
| CONNECT_TIMEOUT | 5 seconds | Timeout for establishing a TCP connection |
| TIMEOUT | 15 seconds | Total HTTP request timeout (including connect) |
| CONCURRENT | 1 | Number of request attempts handled concurrently |

Error types

When a delivery fails, Hook0 records one of these error codes:

| Error code | Meaning |
|---|---|
| E_TIMEOUT | The HTTP request timed out |
| E_CONNECTION | Could not establish a connection to the target |
| E_HTTP | The server responded with a non-2xx status code |
| E_INVALID_TARGET | The target URL is invalid or resolves to a forbidden IP |
| E_INVALID_HEADER | A required header value could not be constructed (non-retryable) |
| E_UNKNOWN | An unexpected error occurred |

SSRF protection

Hook0 blocks webhook deliveries to private/internal IP addresses by default (loopback, RFC 1918, link-local, etc.). This protects against Server-Side Request Forgery (SSRF), where an attacker registers a webhook URL that points at internal services. The check can be disabled with the DISABLE_TARGET_IP_CHECK flag, which is intended for development environments.
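A minimal version of this kind of check can be written with Python's standard ipaddress module. This is an illustration, not Hook0's implementation. Beyond loopback, RFC 1918 and link-local, the exact ranges Hook0 blocks are not listed here, and the disable_check parameter only mirrors the idea of DISABLE_TARGET_IP_CHECK. In practice such a check should run against every address the target hostname resolves to, at connection time. Otherwise a DNS change after validation could swap in an internal address.

```python
import ipaddress


def is_forbidden_target(ip_str: str, disable_check: bool = False) -> bool:
    """Return True if a resolved target IP should be refused.

    `disable_check` mirrors the idea of DISABLE_TARGET_IP_CHECK; the set of
    blocked ranges below is an illustrative assumption.
    """
    if disable_check:
        return False
    ip = ipaddress.ip_address(ip_str)
    # Judge IPv4-mapped IPv6 (e.g. ::ffff:10.0.0.1) by its IPv4 address.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_loopback        # 127.0.0.0/8, ::1
        or ip.is_private      # RFC 1918 and other private-use ranges
        or ip.is_link_local   # 169.254.0.0/16, fe80::/10
        or ip.is_multicast
        or ip.is_unspecified  # 0.0.0.0, ::
        or ip.is_reserved
        or not ip.is_global   # anything else not publicly routable
    )
```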
