How do I retry a command until it succeeds in bash?

Wrap it in an until loop: `until mycommand; do ...; done` runs the body each time the command fails and stops the moment it succeeds. The danger is an unbounded loop against a permanently-broken dependency, so always add a maximum attempt count and return failure when you hit it. The reusable retry() function on this page takes the max attempts as its first argument and the command as the rest, so you write `retry 5 curl ...` and never reimplement the loop.

What is exponential backoff and why add jitter?

Exponential backoff doubles the wait after each failure — 1s, 2s, 4s, 8s — instead of retrying instantly. Instant retries hammer a service that's already struggling and keep it down; backoff gives it room to recover. Jitter adds a small random offset to each wait. Without it, many clients that failed at the same instant retry at the same instant, producing a synchronized thundering herd that re-overwhelms the service on every round. A few hundred milliseconds to a couple of seconds of randomness spreads the retries out.

How do I wait for a database or port to be ready before continuing?

Retry a connectivity check until it passes: `retry 10 nc -z -w 2 db.internal 5432` polls the Postgres port, backing off between tries, and returns success the moment the port accepts a connection. This is the wait-for-it pattern, and it's the cure for the deploy script that runs migrations a half-second before the database container finishes booting. Use nc -z (zero-I/O port scan) or bash's own /dev/tcp redirection; both tell you only whether the port is open, which is what you want here.

Should I retry every kind of failure?

No, and this is the mistake that makes retries dangerous. Retry transient failures — timeouts, connection-refused, HTTP 429 and 5xx, DNS hiccups — the ones that might succeed if you try again. Do not retry deterministic failures: a 404, a 401, a syntax error, a missing file. They fail identically every time, so retrying them only delays the inevitable while hiding the real error behind a wall of retry noise. When you can, inspect the exit code or HTTP status and only loop on the codes that are worth looping on.

curl already has --retry — when do I need my own loop?

Use curl --retry for plain curl calls; it handles transient HTTP and connection errors with its own backoff and is simpler than rolling your own. You need the generic retry() function when the thing you're retrying isn't curl — a database connection check, an ssh command, a flaky CLI tool, a custom health probe — or when you want one consistent backoff policy across all of them, or a condition curl can't express. The function wraps any command; curl --retry only wraps curl.

Retry a Command with Exponential Backoff in Bash

I once spent forty minutes at eleven at night debugging a deploy that wasn't broken. The release script ran the database migration, the migration threw connection refused, the script exited non-zero, the deploy rolled itself back, and I got paged. I read the migration. I read the logs. I checked the database — it was up, it was healthy, it accepted my connection instantly. I re-ran the deploy and it worked. So I did what you do at eleven at night and chalked it up to gremlins, until it happened again two days later and I finally watched the timing: the script brought up a fresh database container and started the migration about six seconds before Postgres finished initializing and began accepting connections. The migration was racing the database's boot, and most of the time it won, and the times it lost I lost forty minutes.

The script wasn't wrong about anything except its assumption that a dependency is ready the instant you ask for it. In production, dependencies are eventually ready. Networks blip. A service you call returns a 503 for the two seconds it takes to finish a rolling restart. An API rate-limits you with a 429 it fully expects you to retry. Treating the first failure as fatal turns every one of these normal, transient conditions into a paged engineer.

The fix is to retry — but retrying naively is its own trap. Retry instantly and you hammer a recovering service into staying down. Retry forever and a genuinely dead dependency hangs your script indefinitely. Retry a 404 and you wait a minute to confirm what you already knew. Good retries are bounded, backed off, and selective.

A retry function you can reuse

bash

#!/bin/bash
# Script: retry.sh
# Purpose: Survive transient failures (a flaky network, a service still booting)
#          instead of dying on the first error
# Usage: retry.sh   (or source the retry() function into your own scripts)
set -euo pipefail

CHECK="✓"
CROSS="✗"

# Retry a command with exponential backoff and jitter.
# Usage: retry <max_attempts> <command> [args...]
retry() {
    local max_attempts="$1"; shift
    local attempt=1
    local delay=1            # base delay in seconds — doubles each round
    local max_delay=30       # cap so the backoff never runs away

    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "$CROSS '$*' failed after $attempt attempts" >&2
            return 1
        fi
        # 0–2s of jitter so parallel callers don't all retry on the same beat
        local pause=$(( delay + RANDOM % 3 ))
        echo "$CROSS attempt $attempt failed — retrying in ${pause}s" >&2
        sleep "$pause"
        attempt=$(( attempt + 1 ))
        delay=$(( delay * 2 ))
        if (( delay > max_delay )); then delay=$max_delay; fi
    done

    echo "$CHECK '$*' succeeded on attempt $attempt"
}

# The fix for my 11pm deploy: wait for Postgres to accept connections first.
retry 6 nc -z -w 2 db.internal 5432
echo "$CHECK database reachable — running migration"

# A release artifact that occasionally 503s while the CDN warms up.
retry 5 curl -fsS --max-time 10 -o /tmp/pkg.tar.gz https://releases.example.com/pkg.tar.gz

The whole engine is until "$@"; do ... done. until runs the command and executes the loop body only when it fails, exiting the instant it succeeds. Passing the command as "$@" (after shift-ing past the attempt count) means the function retries anything — a curl, an ssh, a port check, your own script — without the function caring what it is.

The backoff is the three lines at the bottom of the loop: sleep for the current delay plus a touch of jitter, then double the delay, capped at max_delay. That gives you 1s, 2s, 4s, 8s, 16s, 30s, 30s… The jitter — RANDOM % 3 — looks trivial, but it's what stops a fleet of machines that all failed at the same second from retrying at the same second and knocking the recovering service straight back over.

The part that's easy to get wrong

bash

# Good: a transient failure that retrying can fix
retry 6 nc -z -w 2 db.internal 5432

# Bad: retrying a deterministic failure just delays the error 30 seconds
retry 6 curl -fsS https://api.example.com/v1/thing-that-returns-404

A retry loop is only as smart as what you point it at. The database-readiness check belongs in a loop because the answer changes — it's "no" until the database boots, then "yes." A request that returns 404 returns 404 on attempt one and attempt six; the loop just postpones the failure and buries the real status under retry logging. When you can, branch on the exit code or HTTP status and retry only the transient ones.

For plain curl, its built-in --retry 5 --retry-delay 2 does most of this and is simpler — reach for the function when the thing you're retrying isn't curl, or when you want one backoff policy across a database probe, an ssh call, and a download all at once.

That eleven-o'clock deploy never paged me again once the migration waited for the port instead of assuming it. The interesting part is what didn't change: the database still took its six seconds to boot, the network still blipped occasionally. Retrying didn't make the dependencies faster — it stopped a normal, transient slowness from being treated as a fatal error.

Retries are the third leg of an unattended job that survives. Locking with flock stops a slow run from stacking; timeout stops a hung run from jamming the lock; retries stop a transient blip from killing the run outright. The Hardened Cron Wrapper Generator wires the retry loop into a full wrapper alongside the other two, and Bash Scripts That Survive Cron is the end-to-end version.

Run this script on a real Linux server

Get $200 free credit — DigitalOcean

Get $200 Free →

Affiliate link · we earn a commission

Want to test the wait-for-port pattern for real? Bring up a droplet, start a service on a delay, and watch retry poll the port until it opens. The rest of the library is at bashsnippets.xyz — the website up-check and restart a service if it stopped pair naturally with this one.

Retry a Command with Exponential Backoff in Bash

A retry function you can reuse

The part that's easy to get wrong

The Production Bash Toolkit

Related Snippets

Frequently Asked Questions