I once spent forty minutes at eleven at night debugging a deploy that wasn't broken. The release script ran the database migration, the migration threw connection refused, the script exited non-zero, the deploy rolled itself back, and I got paged. I read the migration. I read the logs. I checked the database — it was up, it was healthy, it accepted my connection instantly. I re-ran the deploy and it worked. So I did what you do at eleven at night and chalked it up to gremlins, until it happened again two days later and I finally watched the timing: the script brought up a fresh database container and started the migration about six seconds before Postgres finished initializing and began accepting connections. The migration was racing the database's boot, and most of the time it won, and the times it lost I lost forty minutes.
The script wasn't wrong about anything except its assumption that a dependency is ready the instant you ask for it. In production, dependencies are eventually ready. Networks blip. A service you call returns a 503 for the two seconds it takes to finish a rolling restart. An API rate-limits you with a 429 it fully expects you to retry. Treating the first failure as fatal turns every one of these normal, transient conditions into a paged engineer.
The fix is to retry — but retrying naively is its own trap. Retry instantly and you hammer a recovering service into staying down. Retry forever and a genuinely dead dependency hangs your script indefinitely. Retry a 404 and you wait a minute to confirm what you already knew. Good retries are bounded, backed off, and selective.
A retry function you can reuse
The whole engine is until "$@"; do ... done. until runs the command and executes the loop body only when it fails, exiting the instant it succeeds. Passing the command as "$@" (after shift-ing past the attempt count) means the function retries anything — a curl, an ssh, a port check, your own script — without the function caring what it is.
The backoff is the three lines at the bottom of the loop: sleep for the current delay plus a touch of jitter, then double the delay, capped at max_delay. That gives you 1s, 2s, 4s, 8s, 16s, 30s, 30s… The jitter — RANDOM % 3 — looks trivial, but it's what stops a fleet of machines that all failed at the same second from retrying at the same second and knocking the recovering service straight back over.
The part that's easy to get wrong
A retry loop is only as smart as what you point it at. The database-readiness check belongs in a loop because the answer changes — it's "no" until the database boots, then "yes." A request that returns 404 returns 404 on attempt one and attempt six; the loop just postpones the failure and buries the real status under retry logging. When you can, branch on the exit code or HTTP status and retry only the transient ones.
For plain curl, its built-in --retry 5 --retry-delay 2 does most of this and is simpler — reach for the function when the thing you're retrying isn't curl, or when you want one backoff policy across a database probe, an ssh call, and a download all at once.
That eleven-o'clock deploy never paged me again once the migration waited for the port instead of assuming it. The interesting part is what didn't change: the database still took its six seconds to boot, the network still blipped occasionally. Retrying didn't make the dependencies faster — it stopped a normal, transient slowness from being treated as a fatal error.
Retries are the third leg of an unattended job that survives. Locking with flock stops a slow run from stacking; timeout stops a hung run from jamming the lock; retries stop a transient blip from killing the run outright. The Hardened Cron Wrapper Generator wires the retry loop into a full wrapper alongside the other two, and Bash Scripts That Survive Cron is the end-to-end version.
Run this script on a real Linux server
Get $200 free credit — DigitalOcean
Get $200 Free →Affiliate link · we earn a commission
Want to test the wait-for-port pattern for real? Bring up a droplet, start a service on a delay, and watch retry poll the port until it opens. The rest of the library is at bashsnippets.xyz — the website up-check and restart a service if it stopped pair naturally with this one.