$ cat essential-scripts.sh

25 Bash Scripts Every Linux Sysadmin Needs

Published: June 6, 2026 · 25 scripts · No external dependencies · MIT License

Quick Answer

The 25 scripts on this page cover five failure categories that account for the majority of production Linux incidents: disk growth, backup gaps, service crashes, security drift, and SSL expiry. None require third-party installs. Every script runs with bash, coreutils, and the utilities that ship on Ubuntu, Debian, CentOS, Rocky, and Alma Linux. Add them to /opt/scripts/, make them executable with chmod +x, and schedule the monitoring group on cron. The disk warning fires at 80% so you have time to investigate before 100% shuts down writes. The SSL check fires at 30 days so you have time to renew before browsers throw certificate errors. The service watchdog runs every 5 minutes so the longest your users wait for a crashed service is 5 minutes, not until the next business morning.

A server doesn't tell you it's about to fail. The disk fills incrementally — 1% a day, every day — until the database starts throwing write errors and the application logs stop rotating and the behavior presents as five different problems instead of one root cause. The SSL certificate expires silently at 2am on a Sunday. The service crashes at 3am, doesn't restart, and the first person who finds out is a user who emails support six hours later. By the time you notice any of these, the window for a graceful fix has long since closed.

The 25 scripts on this list are the difference between managing infrastructure and reacting to it. They are the early-warning layer that runs while you sleep — checking disk thresholds, verifying SSL expiry dates, confirming services are alive, auditing open ports for unexpected listeners, and shipping timestamped backups to a remote host before anything has a chance to go wrong.

None of these require third-party monitoring agents, SaaS dashboards, or paid platforms. No agents to install, no API keys to rotate, no subscription to lapse. Everything runs with bash, coreutils, and the standard utilities that ship on every major Linux distribution. Add them to /opt/scripts/, make them executable, schedule them on cron, and stop finding out about problems from your users.

Each script entry below includes the core command or one-liner, the specific failure it prevents, and a link to the full annotated script with configurable thresholds, cron examples, and production-tested error handling.

Disk Monitoring & Management

Your disk never fills all at once. It fills incrementally — 1% at a time, across weeks or months — until the database starts throwing write errors, the web server returns 500s with no application-level explanation, and log rotation silently starts failing. By the time any of those symptoms appear, the disk is usually at 98% or 100% and you are doing emergency surgery instead of routine maintenance.

The discipline is to run a threshold check on cron and treat its output as a chore list rather than an emergency alert. At 80% full you have time to investigate what grew and make a measured decision. At 95% you are deleting files under pressure. At 100% writes stop and every service on the machine begins misbehaving in ways that obscure the actual cause. These five scripts represent the five questions in that investigation: how full is it, what is taking the space, can I delete the old logs, are there duplicate copies taking up space, and is Docker holding hidden storage I forgot about.

#01

Disk Space Warning

Problem: Production outage caused by a filesystem hitting 100% capacity — writes fail, databases corrupt, logs stop, services crash with misleading errors.

The most reliable outage prevention on a Linux server is a simple threshold check running on cron. This one-liner reads every mounted filesystem and prints any that have crossed the 80% mark. At 80% you still have hours or days to respond. Pipe the output to mail and you have an email alert. Pair it with find-large-files-linux and you have a complete disk triage workflow.

The full script adds a configurable threshold (default 80%), hostname identification for servers where you run the same script on multiple machines, and a clean crontab entry that fires daily at 7am.

df -h | awk 'NR>1 && $5+0 >= 80 {print $0}'
Full script with configurable threshold and cron setup
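
As a rough illustration of the configurable-threshold idea described above (a sketch, not the linked script), the check can be wrapped in a function that reads `df -P` output on stdin. The function name `over_threshold` and the `THRESHOLD` variable are hypothetical, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Sketch only: over_threshold and THRESHOLD are illustrative names.
THRESHOLD="${THRESHOLD:-80}"   # percent; override through the environment

# Reads `df -P` output on stdin and prints one line per filesystem at or
# above the limit. Returns 0 if anything was reported, 1 otherwise.
over_threshold() {
  local limit="${1:-$THRESHOLD}"
  awk -v limit="$limit" -v host="$(uname -n)" '
    NR > 1 {
      pct = $5 + 0                       # "83%" becomes 83
      if (pct >= limit) {
        printf "%s: %s at %d%% (mounted on %s)\n", host, $1, pct, $6
        found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

Typical use would be `df -P | over_threshold 80`, piped to mail or appended to a log. Because the exit status reports whether anything crossed the limit, a wrapper can send an alert only when there is something to report.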
#02

Find Large Files

Problem: The disk is full but df only tells you which filesystem — not what is consuming the space.

When disk-space-warning fires, the next question is always: what grew? This command answers it in seconds. du -ah walks every file and directory under the target path, calculates sizes in human-readable format, and sort -rh puts the largest items at the top. The head -20 limits output to the top 20 candidates worth investigating.

On a typical server, the culprits are application logs, database dump files, old Docker layers, and downloaded archive files that were never cleaned up. The full script adds exclusion patterns for /proc and /sys, which contain virtual filesystems that can produce misleading sizes.

du -ah /var | sort -rh | head -20
Full script with exclusion patterns
#03

Delete Old Log Files

Problem: Log files grow without bound on servers where rotation is not configured correctly — /var/log filling up is the most predictable disk failure on any production server.

Log growth is almost always the first thing to prune when a disk starts filling. Application logs, nginx access logs, systemd journal dumps — on a server that has been running for a year without rotation configured correctly, /var/log can hold tens of gigabytes of files you have already read and will never need again.

This command finds every .log file under /var/log that was last modified more than 30 days ago and deletes it. The -mtime +30 flag means "more than 30 days ago" — the + prefix is easy to misread and important to get right. The full script includes a --dry-run flag that prints what would be deleted without touching anything.

find /var/log -name "*.log" -mtime +30 -delete
Full script with retention configuration and dry-run flag
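
One way the dry-run behaviour mentioned above might look, assuming GNU find. The function name `purge_old_logs` and its argument order are hypothetical and not taken from the linked script.

```shell
#!/usr/bin/env bash
# Sketch only: purge_old_logs DIR DAYS [--dry-run]
# Prints every matching file; deletes it only when --dry-run is absent.
purge_old_logs() {
  local dir="$1" days="$2" mode="${3:-}"

  # Refuse anything that is not a plain non-negative integer.
  case "$days" in
    ''|*[!0-9]*) echo "DAYS must be a whole number" >&2; return 2 ;;
  esac

  if [ "$mode" = "--dry-run" ]; then
    find "$dir" -type f -name '*.log' -mtime +"$days" -print
  else
    find "$dir" -type f -name '*.log' -mtime +"$days" -print -delete
  fi
}
```

Running `purge_old_logs /var/log 30 --dry-run` first and reading the list before dropping the flag is the cheap way to catch a path or retention mistake.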
#04

Find Duplicate Files

Problem: Copied database dumps, cloned git repositories, archived files moved to multiple locations — duplicate files are the second most common source of unexpected disk growth after logs.

This script uses md5sum to generate a checksum for every file under a target path, then groups files with identical checksums. The uniq -w32 -D flag matches on the first 32 characters of each line — the MD5 hash — and prints all duplicate lines. On a shared server or a machine used for development work, the space reclaimed from forgotten duplicates is often several gigabytes.

The full script adds directory-level comparison, a human-readable output format that groups each duplicate set, and a total bytes-wasted summary so you know whether the cleanup is worth the time before you commit to it.

find /path -type f -print0 | xargs -0 md5sum | sort | uniq -w32 -D
Full script with directory comparison and size summary
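
A small sketch of the grouped output described above, assuming GNU findutils and coreutils. The function name `find_duplicates` is hypothetical. It passes file names null-delimited so paths with spaces survive, and separates each duplicate set with a blank line.

```shell
#!/usr/bin/env bash
# Sketch only: find_duplicates DIR
# Prints "checksum  path" lines for files whose content is identical,
# one blank-line-separated group per duplicate set.
find_duplicates() {
  local dir="$1"
  find "$dir" -type f -print0 \
    | xargs -0 -r md5sum \
    | sort \
    | uniq -w32 --all-repeated=separate
}
```

MD5 is used here only to group identical content, not for anything security-sensitive. On large trees, comparing file sizes first would avoid hashing files that cannot possibly match.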
#05

Docker Prune Cleanup

Problem: Docker accumulates gigabytes of garbage — stopped containers, dangling images, unused build cache — without ever asking permission.

Every Docker-based deployment leaves behind a layer of accumulated debris. Stopped containers that were never removed. Intermediate build layers from images that have since been replaced. Volumes from services that no longer exist. On an active CI server or deployment host, this can reach 20–40GB within a month of operation without any action on your part.

This command first shows you the current state with docker system df so you know what you are about to reclaim, then removes every image that no container references and that was created more than 30 days ago. The --filter "until=720h" flag is the key safety valve: it filters on creation time, not last use, so recently built images survive even if nothing is running them yet, while -a widens the prune from dangling images to all unused ones. Images backing an existing container are never removed. The full script adds volume pruning with a separate confirmation step and a before/after disk usage report.

docker system df && docker image prune -af --filter "until=720h"
Full script with volume pruning and before/after disk report
Backup & Recovery

The backup you don't have is the one you will need. The backup you have that was never tested is indistinguishable from no backup until the moment you try to restore it. A MySQL dump that has been silently failing for three weeks because the credentials changed is not a backup — it is a false sense of security with a timestamp on it.

These three backup scripts address the three most common gaps: local file archives with retention management, database dumps with verified output, and incremental remote backups over SSH. Run all three on cron, send output to a log file, and check that log file weekly. A backup job that runs without emitting output is either working perfectly or silently failing — the log is the only way to tell the difference.

#06

Automated File Backup

Problem: Manual backups that get skipped when someone is busy, and timestamped archives that don't clean themselves up until the backup disk fills.

This command creates a compressed .tar.gz archive of the target directory with today's date embedded in the filename. The $(date +%Y%m%d) substitution runs at execution time, so every cron invocation produces a distinct archive: 20260606_backup.tar.gz, 20260607_backup.tar.gz, and so on. Without a retention policy, these archives accumulate until the backup volume fills.

The full script adds automatic rotation — archives older than a configurable number of days are deleted at the end of each run — and a verification step using tar -tzf to confirm the archive is not corrupt before the old one is removed.

tar -czf /backups/$(date +%Y%m%d)_backup.tar.gz /var/www
Full script with rotation and archive verification
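
The verify-then-rotate order described above can be sketched as follows. This is a minimal illustration assuming GNU tar and find, not the linked script. The function name `backup_dir` and its three arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: backup_dir SRC DEST_DIR KEEP_DAYS
# Creates a dated archive, verifies it, and only then prunes old archives.
backup_dir() {
  local src="$1" dest="$2" keep_days="$3" archive

  mkdir -p "$dest" || return 1
  archive="$dest/$(date +%Y%m%d)_backup.tar.gz"

  # -C keeps absolute paths out of the archive.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1

  # Listing the archive reads it end to end, which catches truncation.
  if ! tar -tzf "$archive" > /dev/null; then
    echo "verification failed: $archive" >&2
    return 1
  fi

  # Rotation is reached only after the new archive verified cleanly.
  find "$dest" -maxdepth 1 -type f -name '*_backup.tar.gz' \
    -mtime +"$keep_days" -delete

  echo "$archive"
}
```

For example, `backup_dir /var/www /backups 14` would keep roughly two weeks of archives. Pruning after verification means a failed run never deletes the last good copy.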
#07

MySQL Database Backup

Problem: A deploy overwrites data before anyone thought to run a manual dump, or a backup job that has been failing silently for weeks because credentials changed.

mysqldump --all-databases produces a SQL dump of every database in the MySQL instance. Redirect it to a timestamped file and you have a recoverable point-in-time backup. This is the minimum viable database backup — one command, one output file, scheduled on cron at 2am before the morning deployment window.

The full script handles credential management via .my.cnf so passwords never appear in the process list, compresses output with gzip to reduce storage, rotates old dumps automatically, and verifies the dump file is non-empty before completing — so a silent failure writes a zero-byte file that is immediately detectable rather than a missing file you only notice at restore time.

mysqldump --all-databases > /backups/$(date +%Y%m%d)_mysql.sql
Full script with credentials handling and retention
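
A hedged sketch of the dump verification step, assuming mysqldump's default behaviour of ending a successful dump with a "-- Dump completed" comment line (this trailer disappears if comments are disabled, so the check only applies to default output). The function name `verify_dump` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: verify_dump FILE
# Returns 0 when FILE is non-empty, decompresses cleanly if gzipped, and
# ends with mysqldump's default completion comment.
verify_dump() {
  local file="$1" last

  [ -s "$file" ] || { echo "empty or missing: $file" >&2; return 1; }

  case "$file" in
    *.gz)
      gzip -t "$file" 2>/dev/null \
        || { echo "corrupt gzip: $file" >&2; return 1; }
      last="$(gzip -dc "$file" | tail -n 1)"
      ;;
    *)
      last="$(tail -n 1 "$file")"
      ;;
  esac

  case "$last" in
    "-- Dump completed"*) return 0 ;;
    *) echo "no completion marker: $file" >&2; return 1 ;;
  esac
}
```

Checking for the trailer catches a dump that was cut off partway through, which a plain non-empty test would accept.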
#08

rsync Remote Backup

Problem: A local-only backup that disappears with the machine — disk failure, datacenter incident, or ransomware all destroy local backups along with the data they were protecting.

rsync over SSH is the most practical incremental backup mechanism available without installing additional software. The -avz flags enable archive mode (preserves permissions, timestamps, symlinks), verbose output for the log, and compression for the transfer. The --delete flag mirrors deletions to the remote, keeping the backup in sync rather than accumulating stale files. Only changed files transfer on each run, making subsequent backups fast even for large directories.

The full script adds a --dry-run flag for testing, exclude patterns for cache directories and temporary files, and a cron-ready invocation that logs transfer statistics to a dedicated file for weekly review.

rsync -avz --delete -e ssh /local/path/ user@remote:/backup/path/
Full script with dry-run, exclude patterns, and cron setup
Service & Process Management

A crashed service that nobody knows about is a service that has been down since 3am. The process table shows nothing. Logs show the last successful request was hours ago. The first notification you receive is a user email the next morning asking why the site is down. These scripts close that gap — the service watchdog checks every 5 minutes and restarts the process before any user ever notices the downtime window.

Process management scripts also solve the development-side problem: port conflicts that prevent services from starting, runaway processes consuming all available CPU, and the recurring need to understand the current resource consumption before a deploy. The scripts in this section address both the production monitoring and the operational debugging use cases.

#09

Restart Service If Stopped

Problem: A service crash nobody finds out about until a user reports it hours later — nginx, postgres, redis, or any service that doesn't auto-recover from a segfault.

This is the minimal service watchdog. systemctl is-active --quiet returns exit code 0 if the service is running and non-zero if it is not. The || operator runs the restart only when the check fails. Put this on a 5-minute cron and your maximum undetected downtime drops from "until someone notices" to 5 minutes.

The full script handles multiple services in a loop, sends an email alert on each restart event so you know the service crashed even if it recovered, and logs every restart with a timestamp for post-incident review. The log is how you notice that nginx is restarting three times a day because of a memory leak — which a naked watchdog loop would mask by restarting it before anyone sees a symptom.

systemctl is-active --quiet nginx || systemctl restart nginx
Full script with email alerting and multiple services
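
The multi-service loop with a restart log could look roughly like this. It is a sketch under the assumption that systemctl is on the PATH. The function name `watchdog` and the log line format are hypothetical, and email alerting is left out.

```shell
#!/usr/bin/env bash
# Sketch only: watchdog LOGFILE SERVICE...
# Restarts each inactive service and appends one timestamped line per event.
watchdog() {
  local logfile="$1" svc stamp
  shift

  for svc in "$@"; do
    if systemctl is-active --quiet "$svc"; then
      continue                              # healthy, nothing to record
    fi
    stamp="$(date '+%Y-%m-%d %H:%M:%S')"
    if systemctl restart "$svc"; then
      printf '%s RESTARTED %s\n' "$stamp" "$svc" >> "$logfile"
    else
      printf '%s RESTART-FAILED %s\n' "$stamp" "$svc" >> "$logfile"
    fi
  done
  return 0
}
```

A cron entry might call `watchdog /var/log/watchdog.log nginx redis`. Counting RESTARTED lines per day in that log is what exposes a service that keeps crashing and recovering.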
#10

Kill a Process

Problem: A runaway process consuming all CPU or memory that normal termination methods can't stop — OOM killer hasn't fired yet but the machine is becoming unresponsive.

pkill -f sends SIGTERM to every process whose full command line matches the pattern, giving each one a chance to clean up. pkill exits 0 when it signalled at least one process, so the && chain waits five seconds and then sends SIGKILL to anything that still matches; if the process has exited in the meantime, the second pkill finds nothing and does no harm. A plain || between the two stages would not work, because it would run the hard kill only when no process matched in the first place. This two-stage approach avoids data corruption from hard kills when a graceful kill would work, while still handling processes that ignore SIGTERM.

The full script adds a safety confirmation prompt before killing, a pgrep -a preview showing what would be killed before execution, and a post-kill verification check to confirm the process is no longer running.

pkill -f "process-name" && sleep 5 && pkill -9 -f "process-name"
Full script with signal escalation and safety confirmation
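
For a single known PID, the graceful-then-forceful escalation can be sketched as a function that polls until the process is gone. The name `stop_pid` and the default 10-second timeout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: stop_pid PID [TIMEOUT_SECONDS]
# Sends SIGTERM, waits up to TIMEOUT seconds, then falls back to SIGKILL.
stop_pid() {
  local pid="$1" timeout="${2:-10}" waited=0

  # If the signal cannot be delivered, treat the process as already gone.
  kill -TERM "$pid" 2>/dev/null || return 0

  # kill -0 delivers no signal; it only tests whether the PID still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Polling with kill -0 means a process that exits promptly costs about a second rather than the full timeout.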

#11

Kill Process on Port

Problem: EADDRINUSE — 'address already in use' — your dev server or production service won't start because something is squatting on the port it needs.

lsof -ti returns only the PID of the process holding the specified port — no headers, no formatting, just the number. xargs -r kill passes that PID to kill, with -r preventing an error if the output is empty (the port was already free). One command, one port freed, no need to look up the PID manually first.

The full script adds SIGTERM followed by a brief wait and then SIGKILL if the process hasn't exited, supports UDP ports in addition to TCP, and works on systems where lsof is not available by falling back to ss and /proc.

lsof -ti :3000 | xargs -r kill
Full script with safe kill escalation and UDP support
#12

Monitor CPU and RAM Usage

Problem: Resource exhaustion that causes services to become progressively slower before crashing — caught in a log as a trend rather than a sudden unexplained event.

This awk expression reads /proc/meminfo directly (no external tools, works on any kernel from 3.14 onward) and calculates used memory as a percentage of total. It measures against MemAvailable rather than MemFree, because MemFree does not count reclaimable page cache and makes a healthy server look nearly full. The same approach works for CPU utilization by reading /proc/stat across two samples with a sleep interval. Both metrics are available without top, htop, or any interactive tool that would block a cron job.

The full script checks both CPU and memory against configurable thresholds, logs each reading with a timestamp to a file you can tail -f during an incident, and sends an email alert when either metric crosses the threshold. A week of this log gives you a baseline, and anything that deviates from it is worth investigating before it becomes a crash.

awk '/^MemTotal:/{total=$2} /^MemAvailable:/{avail=$2} END{printf "%.1f%% used\n", (total-avail)/total*100}' /proc/meminfo
Full script with CPU, thresholds, and timestamped logging
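
The two-sample CPU calculation mentioned above can be sketched like this. It assumes the standard layout of the aggregate cpu line in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal) and counts idle plus iowait as not busy. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: cpu_busy_percent "<earlier cpu line>" "<later cpu line>"
# Prints the busy share of CPU time between the two samples.
cpu_busy_percent() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, s1, " "); split(b, s2, " ")
    # Index 1 is the literal "cpu"; 2..9 are user through steal.
    for (i = 2; i <= 9; i++) { t1 += s1[i]; t2 += s2[i] }
    idle1 = s1[5] + s1[6]               # idle + iowait
    idle2 = s2[5] + s2[6]
    dt = t2 - t1
    if (dt <= 0) { print "0.0"; exit }
    printf "%.1f\n", (dt - (idle2 - idle1)) / dt * 100
  }'
}

# sample_cpu [INTERVAL_SECONDS]: takes two live samples from /proc/stat.
sample_cpu() {
  local first second
  first="$(grep '^cpu ' /proc/stat)"
  sleep "${1:-1}"
  second="$(grep '^cpu ' /proc/stat)"
  cpu_busy_percent "$first" "$second"
}
```

Keeping the arithmetic separate from the sampling makes it possible to check the calculation against fixed inputs. The counters in /proc/stat are cumulative since boot, which is why a single reading says nothing about current load.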
#13

Quick System Info Report

Problem: Spending five minutes running individual commands to understand the state of an unfamiliar or inherited server when you first SSH in.

This compound command chains the five most useful system overview commands with &&: kernel version and hostname from uname -a, uptime and load averages, memory consumption from free -h, disk usage across all filesystems from df -h, and the top 5 memory consumers from ps aux. All of this fits in a single terminal screen.

The full script formats each section with a labeled header so the output is readable at a glance, adds IP address information from ip addr, and can be added to /etc/profile.d/ so it runs automatically on every SSH login, giving you an instant snapshot of server health without any manual commands.

uname -a && uptime && free -h && df -h && ps aux --sort=-%mem | head -6
Full formatted script with section headers and IP info
Network & Connectivity

Network problems are uniquely frustrating because they often appear as application problems. A site that is down looks like a web server issue until you confirm the web server is running — then it looks like a database issue — then you discover the issue is actually that port 443 stopped accepting connections because certbot failed and nginx reloaded with an invalid config. External checks that bypass the application layer are the fastest way to narrow the scope of a network incident.

#14

Check If Website Is Up

Problem: Your site went down and you found out from a user, not from an automated check.

curl -s -o /dev/null -w "%{http_code}" makes an HTTP request and prints only the response status code — discarding the response body, suppressing progress output, and returning in under a second on a live connection. A 200 means the site is responding. Anything else — 5xx, connection refused, or no response — means it is not.

The full script wraps this in a loop with configurable retry logic (3 attempts with a 10-second interval before declaring the site down), sends an email alert with the HTTP status code on failure, and supports HTTPS with certificate validation. Run it on cron every 5 minutes and your average time-to-detection for an outage drops from "whenever a user reports it" to 5 minutes maximum.

curl -s -o /dev/null -w "%{http_code}" https://yoursite.com
Full script with retry logic and email notification
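
A rough sketch of the retry wrapper described above, without the email step. The function names `http_status` and `check_site`, the 10-second curl timeout, and the UP/DOWN output format are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. http_status prints the HTTP status code for a URL;
# curl prints 000 when no response was received at all.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# check_site URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 on the first 200 response, 1 after all attempts fail.
check_site() {
  local url="$1" attempts="${2:-3}" delay="${3:-10}" code="000" i=1

  while [ "$i" -le "$attempts" ]; do
    code="$(http_status "$url")" || true    # keep the code even on failure
    if [ "$code" = "200" ]; then
      echo "UP $url ($code)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done

  echo "DOWN $url (last status: $code)"
  return 1
}
```

Requiring several consecutive failures before declaring the site down filters out single dropped requests, at the cost of delaying a real alert by attempts times delay seconds.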
#15

List Open Ports

Problem: An unknown service listening on a public port that you didn't open and didn't know was there — the first symptom of a misconfiguration or a compromise.

ss -tlnp lists every TCP socket in the LISTEN state with the process name and PID that owns each socket. The flags: -t for TCP only, -l for listening sockets only, -n for numeric output (faster, no DNS resolution), -p for process information. This is the first command to run on any server you inherit or haven't reviewed recently.

Unexpected entries in this output are how you find: a debug server left running from a deployment, a database port accidentally exposed to 0.0.0.0 instead of 127.0.0.1, a crypto miner that opened a listener, or a misconfigured application binding to a port that conflicts with production. The full script adds UDP sockets, cross-references output against a known-good baseline, and flags any new listeners that appeared since the last audit.

ss -tlnp
Full script with UDP, lsof integration, and baseline comparison
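
The baseline comparison can be sketched with comm. This is an illustration rather than the linked script: the function names and the "protocol address:port" line format are assumptions, and `current_listeners` relies on the column layout that ss prints when both -t and -u are given (protocol first, local address fifth).

```shell
#!/usr/bin/env bash
# Sketch only. current_listeners prints one "proto addr:port" line per
# listening socket, with the header row suppressed by -H.
current_listeners() {
  ss -H -tuln | awk '{ print $1, $5 }' | sort -u
}

# new_listeners BASELINE_FILE
# Reads the current listener list on stdin and prints only the lines
# that are absent from the baseline.
new_listeners() {
  local sorted_baseline
  sorted_baseline="$(mktemp)" || return 1
  sort -u "$1" > "$sorted_baseline"
  # comm needs both inputs sorted; -13 keeps lines unique to stdin.
  sort -u | comm -13 "$sorted_baseline" -
  rm -f "$sorted_baseline"
}
```

A first run of `current_listeners > /opt/scripts/ports.baseline` (path hypothetical) records the known-good state, and later runs of `current_listeners | new_listeners /opt/scripts/ports.baseline` print only what has appeared since.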
Security & Access

Security posture on a Linux server degrades slowly, through small decisions that individually seem harmless: a config file given world-read permissions for debugging that was never tightened back, an SSH directory that got wrong permissions during a restore, a certificate renewed manually once and then forgotten in cron. Each of these is a silent failure mode. None of them announce themselves. You find them during audits or after incidents.

The four scripts in this section are the audit. Run them on a schedule and treat the output as a compliance checklist. Any deviation from expected output is worth investigating — it is either a configuration drift you need to correct or a change you made deliberately and need to document.

#16

SSH Key Setup

Problem: SSH brute-force attacks that succeed against password-authenticated servers — the single most common vector for unauthorized server access.

This two-command sequence generates an ed25519 key pair — the current recommended algorithm, smaller and faster than RSA-4096 with equivalent security — and copies the public key to the authorized_keys file on the remote server using ssh-copy-id, which handles the correct permissions and file format automatically. After this runs, you can disable password authentication in /etc/ssh/sshd_config and eliminate the brute-force attack surface entirely.

The full script verifies the key was installed correctly by attempting an authentication test, checks that ~/.ssh has the correct permissions (700), that authorized_keys has 600, and outputs a checklist of recommended sshd_config hardening settings to apply after key authentication is confirmed working.

ssh-keygen -t ed25519 -C "user@host" && ssh-copy-id user@remote
Full script with authorized_keys verification and permission checks
#17

File Permissions Security

Problem: World-readable config files containing database passwords, web root directories writable by the wrong user, or SSH directories with incorrect permissions that silently break key authentication.

find /var/www -type f -perm /o=w locates every file under the web root where the "other" permission class has write access. -exec chmod o-w {} \; strips that write bit from each match. This handles the most dangerous permission state in a web root: a file that any local process or exploited web application can overwrite. Run this weekly, and any deployment that accidentally sets wrong permissions gets corrected before it becomes an exposure window.

The full script also audits SSH directory permissions (700 for ~/.ssh, 600 for authorized_keys and private keys), finds configuration files readable by non-root users, and produces a permission audit report with each finding labeled by severity.

find /var/www -type f -perm /o=w -exec chmod o-w {} \;
Full script with web root and SSH permission audit
#18

Check SSL Certificate Expiry

Problem: An SSL certificate that expired at 2am with no warning — certbot renewed for months, until a silent failure meant it didn't.

This command connects to the live server, reads the certificate it presents — not the one stored on disk, but the one actually served to browsers — and prints its expiry date. The key word is "live": this catches the scenario where the certificate on disk was renewed but nginx was never reloaded to serve the new one, a failure mode that certbot hooks don't always prevent.

The full script calculates days remaining until expiry, fires an alert at 30 days and again at 7 days, supports multiple domains in a single run, and produces output formatted for cron log review: [OK] example.com expires in 84 days or [WARN] example.com expires in 12 days. Set it on a weekly cron and you will never again discover certificate expiry from a browser error.

echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com \
  2>/dev/null | openssl x509 -noout -enddate
Full script with multi-domain support, days-remaining, and cron setup
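
The days-remaining calculation can be sketched as follows, assuming GNU date (for -d) and the notAfter=... line that openssl x509 -enddate prints. The function names are hypothetical. The optional second argument to `days_until_expiry` exists only so the arithmetic can be checked against a fixed clock.

```shell
#!/usr/bin/env bash
# Sketch only: days_until_expiry "notAfter=<date>" [NOW_EPOCH]
# Prints whole days between now and the certificate end date.
days_until_expiry() {
  local enddate="${1#notAfter=}" now="${2:-$(date +%s)}" expiry
  expiry="$(date -u -d "$enddate" +%s)" || return 2
  echo $(( (expiry - now) / 86400 ))
}

# report_expiry DOMAIN DAYS [WARN_AT]
# Prints a log line in the [OK] / [WARN] style used for cron review.
report_expiry() {
  local domain="$1" days="$2" warn_at="${3:-30}"
  if [ "$days" -le "$warn_at" ]; then
    echo "[WARN] $domain expires in $days days"
  else
    echo "[OK] $domain expires in $days days"
  fi
}
```

In practice the first argument would be the output of the openssl pipeline above, for example `days_until_expiry "$(echo | openssl s_client ... | openssl x509 -noout -enddate)"` with the connection options filled in.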
#19

Send Email Alert from Bash

Problem: Monitoring scripts that log silently to a file nobody reads — alerts that exist in the log but never surface to the person who needs to act on them.

Every monitoring script on this list becomes significantly more useful when it can notify you rather than waiting for you to check a log file. This one-liner sends a plain text email using the system mail infrastructure, which on most servers is either sendmail or postfix with a local relay. The $(hostname) substitution in the subject line identifies which server sent the alert when you are managing multiple machines.

The full script handles both sendmail and msmtp (for servers without a local MTA), provides a template for the alert body that includes timestamp, hostname, and the specific metric that triggered the alert, and can be sourced as a function by other scripts — so every monitoring script in your toolkit can call send_alert as a single line.

echo "Alert: disk at 95%" | mail -s "DISK WARNING: $(hostname)" admin@yourdomain.com
Full script with SMTP configuration and msmtp fallback
Bash Scripting Fundamentals

The monitoring scripts above are only as reliable as the shell code that implements them. A backup script that exits silently in the middle of a tar operation, a watchdog that catches its own errors and reports success anyway, a deployment script that continues executing after a failed rsync — these are failure modes that correct error handling prevents entirely. The patterns in this section are the foundation that makes all the scripts above safe to run unattended on a production server.

#20

Bash Error Handling

Problem: A script that fails silently in the middle of a backup or deployment, completing from the shell's perspective while having done half the work.

set -euo pipefail is the single most important line in any production bash script. -e exits immediately if any command returns a non-zero exit code. -u treats unset variables as errors rather than expanding them to empty strings. -o pipefail makes a pipeline fail if any stage fails, not just the last one. Without these three flags, a script that encounters a disk-full error partway through a backup continues running and reports success.

The trap line registers an error handler that fires on any unexpected exit, printing the line number where the failure occurred and exiting with a non-zero code that cron can detect. The full script adds a cleanup trap for temporary files, a logging function that timestamps every operation, and a pattern for graceful error messages that distinguish between expected failures and unexpected ones.

set -euo pipefail
trap 'echo "Error on line $LINENO"; exit 1' ERR
Full script with trap cleanup and structured logging
#21

Bash If/Else Examples

Problem: Conditional logic that doesn't handle edge cases — empty strings treated as false, uninitialized variables, exit codes misread as booleans — producing scripts that misbehave silently.

The [[ ]] construct is bash's extended test command — more reliable than [ ] for string testing, supports regex matching with =~, and handles empty string edge cases correctly. The :- parameter expansion provides a default value for potentially unset variables, preventing the -u flag from triggering an unbound variable error. This pattern — test a variable defensively, provide a default — appears in nearly every production bash script.

The full reference covers string testing, numeric comparison with -eq/-lt/-gt vs. arithmetic context, file existence and type checks, exit code testing, and the five most common mistakes beginners make with bash conditionals and the specific output those mistakes produce.

[[ -n "${VAR:-}" ]] && echo "set" || echo "empty"
Full examples with pitfalls and comparison patterns
#22

Create Dated Folder

Problem: Backup directories, report exports, and log archives that overwrite each other because no timestamp was embedded in the directory name.

$(date +%Y-%m-%d) expands to the current date in ISO 8601 format at the moment of execution (2026-06-06, for example). Combined with mkdir -p (which creates intermediate directories and doesn't error if the target already exists), this produces a uniquely named directory for every day the script runs. The pattern appears in every backup and archiving script in this collection.

The full script covers alternative date formats for different use cases (%Y%m%d for sortable filenames, %Y-%m-%dT%H%M%S for second-precision timestamps when multiple runs per day are possible), and includes a function that creates both the dated directory and a latest symlink pointing to the most recent run.

mkdir -p "$(date +%Y-%m-%d)_backup"
Full script with configurable date format and latest symlink
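
The dated directory plus latest symlink could be sketched like this. The function name `make_dated_dir` is hypothetical, and the _backup suffix simply follows the one-liner above.

```shell
#!/usr/bin/env bash
# Sketch only: make_dated_dir BASE_DIR [DATE_FORMAT]
# Creates BASE_DIR/<date>_backup and points BASE_DIR/latest at it.
make_dated_dir() {
  local base="$1" fmt="${2:-%Y-%m-%d}" name
  name="$(date +"$fmt")_backup"

  mkdir -p "$base/$name" || return 1

  # -n replaces an existing "latest" link instead of descending into it;
  # the link is relative, so it survives moving BASE_DIR.
  ln -sfn "$name" "$base/latest"

  echo "$base/$name"
}
```

Passing `%Y-%m-%dT%H%M%S` as the second argument gives second-precision names when the script may run more than once a day.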
#23

Search Files for Text (grep)

Problem: Spending 15 minutes manually opening files to find where a config value, error message, or connection string appears across a directory tree.

grep -rn searches recursively through every file under the target path and prints matching lines with filename and line number. The --include="*.conf" filter limits results to files with the specified extension; removing that flag searches all file types, which is useful when you don't know which file type contains the value you're hunting for. Add -i for case-insensitive matching and -l to list only filenames rather than matching lines.

The full reference demonstrates the four most common grep variations for sysadmin work: finding all references to an IP address across config files, searching for a specific error pattern in log files within a date range, inverting a match to find lines that don't contain a required string, and displaying N lines of context around each match with -B and -A.

grep -rn "search_term" /path/to/search --include="*.conf"
Full reference with case-insensitive, invert, and context options
Coming Soon

The following scripts are in development and will be published in the coming weeks. All completed scripts are linked above. View the full library at /snippets.

#24

Parse JSON with jq

Problem: Every API and modern config system returns JSON. Parsing it with grep and sed is fragile — a whitespace change in the response breaks the parser.

jq is the standard tool for processing JSON in bash scripts. It handles nested objects, arrays, and type conversions that would require dozens of lines of fragile sed/awk to replicate. The naive version shown here extracts a top-level field — the full script covers filtering arrays, transforming output format, handling null values safely, and piping API responses directly from curl into jq for single-command data extraction.

curl -s https://api.example.com/status | jq '.status'
Coming soon — view all scripts
#25

Bash Retry on Failure

Problem: Network calls in deployment scripts that fail once and bring down the entire deploy when a simple retry with backoff would have succeeded.

The naive retry loop — until command; do sleep 5; done — has no maximum attempt count, which means a permanently failing command runs forever. The full script adds configurable attempt limits, exponential backoff (doubling the sleep interval on each retry to avoid thundering herd against a recovering service), a maximum wait cap, and a final exit with a non-zero code when all attempts are exhausted. This pattern belongs in every deployment script that makes network calls.

until command; do sleep 5; done  # naive — the full script adds attempt limits + backoff
Coming soon — view all scripts
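
Until the full script is published, the bounded retry described above might be sketched as follows. This is an illustration, not the forthcoming script. The function name `retry` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: retry MAX_ATTEMPTS INITIAL_DELAY MAX_DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, doubling the delay after each failure
# up to MAX_DELAY. Returns 1 once MAX_ATTEMPTS have failed.
retry() {
  local max="$1" delay="$2" cap="$3" attempt=1
  shift 3

  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay="$cap"
    fi
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 5 2 30 curl -fsS https://api.example.com/status` would wait 2, 4, 8, and 16 seconds between its five attempts. The cap only comes into play with more attempts or a larger starting delay.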

Running All of These as a Monitoring Suite

The real value isn't any one script — it's running the monitoring group together on a predictable cron schedule and routing all output to a shared log file. A log file that captures daily output from disk checks, SSL expiry, and website uptime gives you something more valuable than any individual alert: a baseline.

A week of this log shows you what normal looks like on your server. You see that disk usage grows by about 0.5% per day, that the SSL certificate has 84 days remaining, that nginx restarts zero times per day. When something deviates from that baseline — disk growing 5% in a day, nginx restarting twice in an hour — you see it as a deviation from the expected pattern rather than an isolated event that you are trying to diagnose without context.

Place your scripts in /opt/scripts/, make them executable with chmod +x /opt/scripts/*.sh, and add entries to /etc/cron.d/ rather than individual user crontabs. The /etc/cron.d/ format runs as a specified user (root in the example below), survives user account changes, and is visible to anyone with root access reviewing the cron configuration. User crontabs are harder to audit and easier to lose during account migrations.

# /etc/cron.d/server-monitoring
# Daily at 7am — disk, SSL, and service health
0 7 * * * root /opt/scripts/check-ssl-expiry.sh >> /var/log/monitor.log 2>&1
5 7 * * * root /opt/scripts/disk-space-warning.sh >> /var/log/monitor.log 2>&1
10 7 * * * root /opt/scripts/check-website-up.sh >> /var/log/monitor.log 2>&1

# Every 5 minutes — service watchdog
*/5 * * * * root /opt/scripts/restart-if-stopped.sh nginx >> /var/log/monitor.log 2>&1

# Nightly at 2am — backups
0 2 * * * root /opt/scripts/mysql-backup.sh >> /var/log/backup.log 2>&1
15 2 * * * root /opt/scripts/rsync-backup.sh >> /var/log/backup.log 2>&1

# Weekly on Monday at 6am — security audit
0 6 * * 1 root /opt/scripts/list-open-ports.sh >> /var/log/security-audit.log 2>&1
5 6 * * 1 root /opt/scripts/file-permissions-audit.sh >> /var/log/security-audit.log 2>&1

The 2>&1 redirect sends both stdout and stderr to the log file. Cron emails whatever output a job leaves unredirected, and a job that produces no output sends no email through the system MTA. If you want email alerts only on failure, drop the 2>&1 so that stdout goes to the log and stderr falls through to cron's default behavior (email on any output); this assumes the script writes errors to stderr and writes nothing there on success. The staggered minute offsets (0, 5, 10) keep the scripts from starting at the same moment and competing for I/O.
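When several scripts append to one log, timestamps and a script name on each line make the baseline easier to read. The wrapper below is a hypothetical sketch, not one of the 25 scripts: run_logged is an invented name. It sends a command's stdout to the log with a prefix, leaves stderr untouched so cron can mail it, and preserves the command's exit status.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; the name run_logged is an assumption for illustration.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    local logfile=$1
    shift
    local name line
    name=$(basename -- "$1")

    # Prefix each stdout line with a timestamp and the command name.
    # stderr is not part of the pipe, so it passes through to the caller.
    "$@" | while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$name" "$line"
    done >> "$logfile"

    # Return the wrapped command's exit status, not the loop's.
    return "${PIPESTATUS[0]}"
}
```

Saved as its own script, a cron entry could call it with the log path and the monitoring script as arguments, with no redirects on the cron line itself.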

Running tail -f /var/log/monitor.log during the first week of operation lets you verify that every script is producing the expected output. After that, a weekly grep -i warn /var/log/monitor.log takes 30 seconds and surfaces anything worth investigating. That is the complete maintenance overhead for a monitoring setup that would otherwise require a paid platform subscription.

Frequently Asked Questions

What bash scripts should every sysadmin have?

The essential set covers five areas: disk monitoring (disk-space-warning, find-large-files-linux), automated backups (rsync-remote-backup, mysql-database-backup), service health (restart-service-if-stopped, check-if-website-is-up), security auditing (list-open-ports-linux, file-permissions-security), and SSL monitoring (check-ssl-certificate-expiry). These 9 scripts prevent the most common Linux server failures.

Do these bash scripts work on Ubuntu, CentOS, and Debian?

Yes. All scripts use bash and standard GNU coreutils available on every major Linux distribution: Ubuntu 20.04+, Debian 11+, CentOS 7+, Rocky Linux, AlmaLinux, and Amazon Linux 2. Scripts that depend on systemctl are labeled accordingly; they work on any systemd-based distribution. The list-open-ports-linux script falls back from ss to netstat on older distributions where ss is not available.

Can I run these bash scripts on a cron schedule?

Yes. Every script on this list is cron-safe by design: no interactive prompts in default mode, clean single-line or structured multi-line output suitable for log files, and non-zero exit codes on failure. Cron itself sends email based on output, not exit status: any output the job does not redirect is mailed through the system MTA, so a failure alert depends on the script printing an error message. The cron examples in each full script entry use /etc/cron.d/ format, which names the user each job runs as. The most common cron failure mode is a script that works when run manually but fails on cron because PATH or other environment variables differ. Cron does not load your login shell's environment, so set PATH explicitly in the cron file or at the top of the script.
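A small preamble can guard against the PATH difference. The following is a sketch under assumptions: the PATH value is a common default rather than something taken from the full scripts, and require_cmds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Set PATH explicitly so the script behaves the same under cron and in a shell.
# This value is an assumed common default; adjust it for your system.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

# require_cmds (hypothetical helper): report every missing command on stderr
# and return 1 if any are absent, so the script fails early with a clear message.
require_cmds() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A script would call something like require_cmds curl openssl || exit 1 before doing any work. Because the message goes to stderr, cron will mail it if stderr is not redirected.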

Are these bash scripts free to use?

Yes. All scripts on BashSnippets.xyz are published under the MIT License. Copy, modify, and deploy them in any environment — including production — without restriction. No attribution required, no registration, no paywall. The MIT License text is included in each full script file. If you find a bug or improvement, the contribution guide is at /about.
