A deployment pipeline ran for three weeks reporting every step green, and every one of those runs shipped a build that had failed to compile. The build step ended in npm run build | tee build.log so the output could be archived. Bash returns the exit status of the last command in a pipeline, which was tee — and tee always succeeds at copying text. The compiler's non-zero exit was discarded the instant the pipe handed off to tee. The error message was right there in build.log. GitHub Actions saw exit code 0, painted the step green, and deployed the broken artifact. Nobody looked at the log because the checkmark said there was nothing to look at.
That is the defining property of bash in CI: a silent failure can present as success. On a server you watch the command fail in your terminal. In a pipeline, a swallowed exit code produces a green checkmark over broken code, and the gap between "the logs show an error" and "the pipeline reports failure" is exactly where outages are born. This guide is about closing that gap. It is written for the developer who inherits bash they did not write — inside a GitHub Actions run: block, a Docker ENTRYPOINT, a Kubernetes init container, a deploy script — and has to run it, debug it, and change it without breaking production.
Why bash in CI/CD fails differently than bash on a server
There are four failure modes that are specific to CI and rarely bite you at an interactive prompt.
1. Exit codes swallowed by a pipe. The story above is the canonical case. Any command | tee, command | grep, or command | sort returns the exit code of the right-hand command, masking a failure on the left. On a server you would notice the bad output; in CI the pipeline only checks the final exit code, and it is wrong. set -o pipefail is the fix, and it belongs in every CI step that contains a pipe whose left side can fail.
2. Shell provisioning differences. GitHub's ubuntu-latest runners give you bash 5.x. The macos-latest runners ship bash 3.2 — a 2007 release frozen for licensing reasons — where associative arrays, ${var,,} case conversion, and mapfile do not exist. A script that passes on the Ubuntu runner can fail with a syntax error on the macOS runner in the same workflow. If a step must run cross-platform, target the lowest common denominator or pin the shell.
3. Environment variable injection gaps. CI sets variables you do not control and does not set ones you assume exist. $HOME, $USER, and $PATH differ from your laptop; $CI, $GITHUB_SHA, and dozens of others appear that you never defined. A script that reads $DEPLOY_TARGET when it was never set will, without set -u, expand it to the empty string and do something silently wrong rather than failing.
4. Interactive-shell assumptions. CI runs a non-interactive, non-login shell. It does not source ~/.bashrc or ~/.bash_profile, so aliases are absent, functions you defined for your prompt are gone, and PATH may not include directories your login shell adds. A command that works when you type it can produce command not found in CI purely because the environment that defined it was never loaded.
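Modes 2 and 4 can be surfaced early with an explicit preflight check. The sketch below is one possible shape, not a standard tool: the helper names require_bash and require_cmd are made up for illustration, and it relies only on BASH_VERSINFO and command -v, both of which also exist in bash 3.2.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (illustrative names, not from any library).

# Fail if the running bash is older than the given major version.
require_bash() {
  local min_major="$1"
  if (( BASH_VERSINFO[0] < min_major )); then
    echo "bash ${min_major}+ required, found ${BASH_VERSION}" >&2
    return 1
  fi
}

# Fail if a name does not resolve in this shell. In CI nothing from
# ~/.bashrc is loaded, so aliases and prompt functions will not resolve.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "required command not found: $1" >&2
    return 1
  fi
}

require_bash 3 && require_cmd ls && echo "preflight ok"
```

Calling require_bash 4 at the top of a script that uses associative arrays turns a confusing syntax error on an old shell into a one-line explanation.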
Here is the shape of the fix at the workflow level — set -euo pipefail at the top of a multi-line run: block:
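The original listing is not reproduced here, so the following is only a sketch of what such a block's body could look like. The two echo lines are stand-ins for real commands such as npm ci and npm run build, and the build stand-in fails on purpose. The body is written to a temporary file and run as its own bash process, so the example can report the exit code the runner would see.

```shell
#!/usr/bin/env bash
# Body of a hypothetical multi-line `run:` block, saved to a temp file.
step_script="$(mktemp)"
cat > "$step_script" <<'EOF'
set -euo pipefail
echo "installing dependencies"                          # stand-in for: npm ci
{ echo "error: compile failed"; false; } | tee "$1"     # stand-in for: npm run build | tee build.log
echo "deploying"                                        # not reached when the build fails
EOF

log_file="$(mktemp)"
status=0
bash "$step_script" "$log_file" || status=$?
echo "step exit status: $status"
```

With the first line in place the step exits non-zero and the deploy line never runs. Removing that line makes the same script print "deploying" and exit 0, which is the green checkmark over a failed build.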
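It belongs there even though GitHub Actions reports step failures on its own, because Actions only sees the exit code of the whole block. When a step does not name a shell, Actions on Linux and macOS runs the block with bash -e, which stops on a failing command but leaves pipefail off; only an explicit shell: bash adds -o pipefail. Without pipefail, a tee after the build hides a build failure; without -e, a failed install does not stop the build from running against a half-installed tree. Setting the flags in the script keeps the behavior the same however the block is invoked. The runner's own failure detection is downstream of bash's: if bash reports success, Actions believes it.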
The minimal safe bash header for every CI step
Every CI bash step should start with the same few lines. They are not boilerplate; each one closes one of the failure modes above.
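A minimal sketch of that header follows. The loop underneath it is only an illustration with made-up data, showing what the IFS line changes.

```shell
#!/usr/bin/env bash
set -euo pipefail   # exit on errors, on unset variables, and on pipeline failures
IFS=$'\n\t'         # split unquoted expansions on newlines and tabs only

# Illustration with made-up data: two entries, one containing a space.
paths=$'release notes.txt\nbuild.log'
count=0
for path in $paths; do      # intentionally unquoted to show the splitting
  count=$((count + 1))
done
echo "entries: $count"
```

With the default IFS the same loop would see three words, because "release notes.txt" would be split at the space.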
set -e exits as soon as a command returns non-zero outside a conditional context (if tests, commands negated with !, and all but the last command of an && or || list are exempt), which in CI means the step exits non-zero and the workflow registers a failure and fires its notifications. set -u treats any reference to an unset variable as an error and exits, so a typo'd $DPLOY_TARGET or a missing secret fails immediately instead of expanding to an empty string and corrupting a path. set -o pipefail makes a pipeline return the exit code of the rightmost command that failed, or zero if every command succeeded, rather than only the exit code of the last command, which is what catches the | tee class of bug.
The IFS=$'\n\t' line changes word splitting so bash splits unquoted expansions only on newlines and tabs, not spaces. In CI you frequently process lists where the elements are paths or values that may contain spaces; restricting IFS makes accidental space-splitting far less likely. You do not always need it, but it is cheap insurance and harmless when your data has no spaces. For a deeper treatment of why these flags matter and the failure each one prevents, see Bash Error Handling.
A small logging helper makes CI logs readable when several steps interleave output. Prefix every line with the step name so a failed run is greppable:
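One possible shape for such a helper is sketched below. STEP_NAME is a hypothetical variable that each step would set for itself, and the fail helper is an optional addition, not something the text above requires.

```shell
#!/usr/bin/env bash
# STEP_NAME is a hypothetical variable; it defaults to "step" when unset.
STEP_NAME="${STEP_NAME:-step}"

# Print a message to stderr, prefixed with the step name.
log() {
  printf '[%s] %s\n' "$STEP_NAME" "$*" >&2
}

# Log an error and return non-zero so a caller running under set -e stops.
fail() {
  log "ERROR: $*"
  return 1
}

STEP_NAME="build"
log "compiling sources"
```

Searching a combined log for "[build]" then returns only the lines that this step wrote.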
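A short header plus a logging helper, and together they close the swallowed-exit-code and unset-variable failure modes at the top of every step. Shell version differences and missing profile files are not fixed by flags; they still need the explicit handling described above.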
Handling environment variables and secrets safely
In GitHub Actions, plain env: values and secrets mapped in through env: arrive in the shell identically: both become ordinary environment variables. The shell cannot tell a secret from a public config value; the only difference is that Actions masks the secret's literal string in the log output. That is exactly why it is dangerous: the moment you transform a secret (base64-decode it, slice it, re-encode it), the transformed value no longer matches the mask, and it prints in clear text.
The first habit is failing fast when a required variable is missing. set -u catches an unset variable, but a variable set to an empty string passes -u. Use the ${VAR:?message} expansion to require a non-empty value:
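A minimal sketch of the expansion in use. In a real script the line starting with a colon sits at the top level; here it is wrapped in a subshell function (check_deploy_key, a name made up for this example) only so the abort stays contained.

```shell
#!/usr/bin/env bash
# The parentheses make the function body a subshell, so the abort triggered
# by :? ends only the subshell and not the surrounding demonstration.
check_deploy_key() (
  : "${DEPLOY_KEY:?DEPLOY_KEY is required but is unset or empty}"
  echo "DEPLOY_KEY is present"
)

DEPLOY_KEY=""    # set but empty: set -u alone would accept this
check_deploy_key || echo "aborted before doing any work"
```

The leading colon is the shell's no-op command; it exists here only so the expansion is evaluated without running anything.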
If DEPLOY_KEY is missing or blank, the script aborts on this line with your message, instead of failing on line 47 with a cryptic permission denied that takes twenty minutes to trace back to a missing secret.
For multiple required variables, a validate_env function run at the top of the script turns a class of mid-run mysteries into a single clear message at startup:
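The function could look roughly like this. It is a sketch, not a canonical implementation, and the variable names in the example call are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Exit with one message listing every required variable that is unset or empty.
validate_env() {
  local var
  local missing=()
  for var in "$@"; do
    # ${!var:-} reads the variable named by $var; :- keeps set -u quiet if it is unset.
    if [[ -z "${!var:-}" ]]; then
      missing+=("$var")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    printf 'Missing required environment variables:\n' >&2
    printf '  - %s\n' "${missing[@]}" >&2
    exit 1
  fi
}

# Example call with hypothetical names, run in a subshell so the exit is contained.
( DEPLOY_TARGET="staging"; unset -v API_TOKEN DEPLOY_KEY
  validate_env DEPLOY_TARGET API_TOKEN DEPLOY_KEY ) || echo "validation failed"
```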
The ${!var} indirect expansion reads the value of the variable named by $var, so the function checks each requested variable by name and collects every missing one before exiting. A developer who forgot two secrets sees both in one message rather than discovering them one failed run at a time.
On the leak pattern: never echo a secret to debug it, and be wary of set -x (covered below) in any step that touches secrets. Trace mode prints every expansion, and a masked secret that gets sliced, decoded, or otherwise transformed will appear in the trace in a form the masker does not recognize. If you must trace a step that handles secrets, scope the trace tightly around the non-secret logic.
Exit codes in pipelines — the silent killer
This is the failure mode worth understanding in full, because it is the one that ships broken code under a green checkmark.
Consider grep "ERROR" deploy.log | wc -l > /dev/null. The intent is "fail if there are errors in the log." It never fails. Bash reports the exit code of the last command in the pipeline — wc, which succeeds at counting whether the count is zero or a thousand. The grep exit code, the one that actually carries the signal, is thrown away.
Two tools fix this. PIPESTATUS is an array holding the exit code of every command in the most recent pipeline:
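A small self-contained illustration, using a made-up log file that contains no ERROR lines:

```shell
#!/usr/bin/env bash
# Sample log with no ERROR lines (made-up content).
log_file="$(mktemp)"
printf 'INFO start\nINFO done\n' > "$log_file"

grep "ERROR" "$log_file" | wc -l > /dev/null
statuses=("${PIPESTATUS[@]}")   # copy at once; the next command resets PIPESTATUS

echo "grep exited ${statuses[0]}, wc exited ${statuses[1]}"
if (( statuses[0] == 0 )); then
  echo "errors found in log"
fi
```

Here grep exits 1 because nothing matched, while wc exits 0, so the pipeline as a whole reports success either way.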
PIPESTATUS must be read immediately after the pipeline — the very next command overwrites it — and indexes left to right, so [0] is the leftmost command.
set -o pipefail is the broader fix: it makes the whole pipeline return the exit code of the rightmost member that failed (zero only if every member succeeded), so npm run build | tee build.log fails when npm fails regardless of tee. Turn it on in your header and most pipeline-exit bugs disappear.
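The difference is easy to observe directly. In this sketch false stands in for any failing command, and the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Default behavior: the pipeline reports tee's status, so the failure is lost.
false | tee /dev/null && without=0 || without=$?

set -o pipefail
# With pipefail: the failing left-hand command decides the status.
false | tee /dev/null && with=0 || with=$?

# When several commands fail, the rightmost failure is the one reported.
(exit 2) | (exit 3) | cat && multi=0 || multi=$?

echo "without pipefail: $without, with pipefail: $with, two failures: $multi"
```

This prints 0, 1, and 3 respectively.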
pipefail has one well-known false positive: grep returns exit code 1 when it finds no matches, which is often a perfectly fine outcome. Under pipefail plus set -e, a no-match grep in a pipeline aborts the script. Handle it explicitly:
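One way to write that, sketched with a made-up log file and a helper name (count_errors) chosen for this example:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count ERROR lines; a count of zero is a normal result, not a failure.
count_errors() {
  local file="$1"
  { grep "ERROR" "$file" || true; } | wc -l
}

log_file="$(mktemp)"
printf 'INFO start\nINFO done\n' > "$log_file"   # made-up log with no errors

error_count="$(count_errors "$log_file")"
if (( error_count > 0 )); then
  echo "found $error_count error lines" >&2
else
  echo "log is clean"
fi
```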
The || true swallows grep's "no match" exit so set -e does not fire, while still letting you act on the count. It also swallows exit code 2, which grep uses for real errors such as an unreadable file, so reserve it for inputs you know exist. Use it deliberately and only where a non-match is genuinely acceptable; blanketing every command in || true reintroduces exactly the silent-success problem you are trying to eliminate.
Docker and bash — entrypoints, init containers, and exec
Every Docker ENTRYPOINT shell script must end with exec "$@", and forgetting it breaks signal handling in a way that is invisible until a deploy hangs.
Without exec, your entrypoint script stays running as PID 1 and launches your real process as a child. When the orchestrator sends SIGTERM to stop the container (which it does on every docker stop, every rolling deploy, every pod eviction), that signal goes to the script, not to your application. The script, a plain bash process, does not forward it. Your application never learns it should shut down. After the grace period (10 seconds by default for docker stop, 30 seconds by default for a Kubernetes pod), the orchestrator gives up and sends SIGKILL, which terminates your process abruptly with no chance to flush buffers, close connections, or finish in-flight requests. Connections drop, data in flight is lost, and shutdowns that should take a second take the full grace period.
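exec "$@" replaces the shell process with your application, so your application becomes PID 1 and receives signals directly. A trap set in the script covers only the setup phase before exec, because the shell and its traps are gone once the application replaces it. Together the two give you graceful shutdown during both startup and normal running: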
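A sketch of such an entrypoint follows; it is one possible arrangement, not the original listing. DB_HOST and DB_PORT are hypothetical settings for a single dependency, probe_tcp is a helper name invented here, and the probe assumes the image provides timeout(1) and a bash built with /dev/tcp support.

```shell
#!/usr/bin/env bash
set -euo pipefail

log() { printf '[entrypoint] %s\n' "$*" >&2; }

# Try to open a TCP connection through bash's /dev/tcp, giving up after 2 seconds.
probe_tcp() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# wait_for HOST PORT [MAX_ATTEMPTS]: retry with a delay that doubles up to 8 seconds.
wait_for() {
  local host="$1" port="$2" max_attempts="${3:-10}"
  local attempt=1 delay=1
  until probe_tcp "$host" "$port"; do
    if (( attempt >= max_attempts )); then
      log "giving up on $host:$port after $attempt attempts"
      return 1
    fi
    log "waiting for $host:$port (attempt $attempt of $max_attempts)"
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
    if (( delay > 8 )); then delay=8; fi
  done
  log "$host:$port is reachable"
}

main() {
  # Until exec runs, this shell is PID 1, so it has to handle a stop request itself.
  trap 'log "stop requested during startup"; exit 143' TERM INT

  if [[ -n "${DB_HOST:-}" ]]; then
    wait_for "$DB_HOST" "${DB_PORT:-5432}"
  fi

  # Replace the shell with the container's command so it receives signals directly.
  exec "$@"
}

# With no command there is nothing to exec; this also lets the file be sourced.
if (( $# > 0 )); then
  main "$@"
fi
```

Bash runs a trap only after the current foreground command returns, so a stop request that arrives during a sleep is handled when that sleep ends; capping the delay keeps that wait short.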
The wait_for function is the wait-for-it pattern: it probes a dependency's TCP port with a short timeout, retries with a backoff that climbs and then caps, and gives up with a clear error after a bounded number of attempts rather than blocking forever. The trap pattern here is worth understanding in isolation; the Bash trap & Signal Handler Builder generates the exact signal-handling block for whatever combination of EXIT, ERR, and TERM your entrypoint needs.
Deploying with bash — the patterns that survive production
A deploy script that copies files over the running release is a deploy script that serves half-written files to live traffic during the copy. The patterns below avoid that.
Atomic symlink swap. Deploy into a fresh timestamped directory, then flip a current symlink in one atomic operation. The webserver always points at current; there is no window where it serves a partial release.
Rollback by preserving releases. Because each deploy lands in its own timestamped directory and you keep the last several, rollback is re-pointing the symlink at the previous directory — the same script can do both.
Post-deploy health check. Curl a health endpoint in a loop after the swap; if it does not return 200 within a bounded number of attempts, fail the deploy so the pipeline goes red.
Git SHA tagging. Write the deployed commit's SHA into the release so you can always answer "what exactly is running right now."
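The four patterns can be combined in one script. The sketch below is an illustration under stated assumptions, not the original listing: the directory layout, the HEALTH_URL default, and the REVISION file name are all hypothetical, and mv -T is a GNU coreutils option, so the swap as written assumes a Linux host.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout and settings; every default below is an assumption.
APP_ROOT="${APP_ROOT:-/srv/app}"
BUILD_DIR="${BUILD_DIR:-./build}"
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"

log() { printf '[deploy] %s\n' "$*" >&2; }

# Point current at a release: create the link under a temporary name,
# then rename it into place so readers never see a missing link.
switch_current() {
  local target="$1" tmp="$APP_ROOT/current.tmp.$$"
  ln -sfn "$target" "$tmp"
  mv -T "$tmp" "$APP_ROOT/current"
}

# Poll the health endpoint; succeed on the first HTTP 200, fail after 5 tries.
health_check() {
  local attempt code
  for attempt in 1 2 3 4 5; do
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$HEALTH_URL" || true)"
    if [[ "$code" == "200" ]]; then return 0; fi
    log "health check attempt $attempt returned ${code:-no response}"
    sleep 2
  done
  return 1
}

deploy() {
  local sha="$1" release previous=""
  release="$APP_ROOT/releases/$(date -u +%Y%m%d%H%M%S)-${sha:0:7}"
  if [[ -L "$APP_ROOT/current" ]]; then
    previous="$(readlink "$APP_ROOT/current")"
  fi

  mkdir -p "$release"
  cp -R "$BUILD_DIR/." "$release/"
  printf '%s\n' "$sha" > "$release/REVISION"    # record exactly what is running

  switch_current "$release"
  if health_check; then
    log "release $release is live"
    return 0
  fi

  log "health check failed for $release"
  if [[ -n "$previous" ]]; then
    switch_current "$previous"
    log "rolled back to $previous"
  fi
  return 1
}

# Usage: deploy.sh <git-sha>. Old releases are never pruned in this sketch.
if (( $# > 0 )); then
  deploy "$1"
fi
```

In a workflow the call might be ./deploy.sh "$GITHUB_SHA"; a failed health check makes the script exit non-zero, which turns the step red.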
ln -sfn is the load-bearing line: -f forces replacement and -n treats the existing symlink as a file rather than following into the directory it points at. Whether the replacement is atomic depends on the ln implementation: BSD and BusyBox ln remove the old link before creating the new one, which leaves a brief window with no current at all. The portable guarantee is to create the new link under a temporary name and rename it over current (mv -T on GNU systems), because a rename is atomic and no request ever sees a half-swapped state. The health check converts "the deploy finished" into "the deploy works," and a failure rolls back automatically instead of leaving a broken release live. For deploys that authenticate to remote hosts, the SSH key provisioning is its own concern; see SSH Key Setup Script for non-interactive key setup that does not prompt mid-pipeline. And for deploys triggered on a schedule rather than on push, the Cron Job Builder generates the crontab entry with correct environment and logging.
Debugging a failing CI pipeline bash script
When a step fails and the logs do not say why, trace execution. bash -x runs a script with every command printed before it executes:
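A self-contained illustration: the script being traced is a two-line stand-in written to a temporary file, and RELEASE_ID is a hypothetical variable left empty on purpose so the gap shows up in the trace.

```shell
#!/usr/bin/env bash
# A tiny stand-in for the script under investigation.
script="$(mktemp)"
cat > "$script" <<'EOF'
release_dir="/srv/app/releases/${RELEASE_ID:-}"
echo "syncing to $release_dir"
EOF

trace_file="$(mktemp)"
RELEASE_ID="" bash -x "$script" 2> "$trace_file"   # the trace is written to stderr
cat "$trace_file"
```

The trace shows the assignment with nothing after the final slash, which is the visible sign of the empty variable.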
For a long script, tracing the whole thing is noise. Toggle tracing around just the suspect section with set -x and set +x:
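A sketch of scoped tracing. The function build_release_path and its arguments are invented for this example; only the line that builds the path is traced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: only the path construction is traced.
build_release_path() {
  local root="$1" release_id="$2"
  echo "preparing release path" >&2     # runs before tracing starts
  set -x
  local path="$root/releases/$release_id"
  set +x
  echo "$path"                          # runs after tracing stops
}

build_release_path /srv/app ""
```

The trace contains the local assignment and the set +x that ends it, and nothing from the lines outside the bracketed section.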
Reading trace output: each traced line is prefixed with +, and commands that run inside a command substitution or another nested evaluation get additional + characters so you can see the nesting depth. The critical tell is variable expansion: a line like + rsync -a /build/ /srv/app/releases// with a doubled slash or an obviously empty segment shows you a variable expanded to nothing, which is usually the bug. An unset variable under set -u aborts before it ever reaches the trace; an empty variable expands to nothing and shows as a gap in the traced command. That distinction, abort versus silent gap, tells you whether the variable was never set or was set to the empty string.
Finally, capture output without losing exit codes. The naive ./deploy.sh | tee deploy.log reintroduces the pipe-exit-code bug from earlier. Either set pipefail so the pipeline reports deploy.sh's failure, or read ${PIPESTATUS[0]} immediately after the pipe to recover the real exit code. In a debugging context where you are capturing logs of a step you suspect is failing, getting the exit code right is the whole point — a log of a failure that the pipeline scored as success is the original problem all over again.
Checklist — is your CI bash script production-ready?
- Starts with set -euo pipefail
- All required env vars validated with ${VAR:?} (or a validate_env call) at startup
- No pipelines where exit codes matter without set -o pipefail
- Cleanup on failure via a trap EXIT handler
- Docker entrypoint ends with exec "$@"
- Deploy script preserves the previous release for rollback
- Health check runs after deploy; the script fails if the check fails
- Tested on the same runner OS (Ubuntu 22.04 / 24.04) the pipeline actually uses