This post outlines a practical approach to verify that your backups are mounted, recent, and complete. It uses generic examples only—no real servers, paths, or credentials.
Objectives
- Verify required backup mounts are present (e.g., CIFS/NFS/local volumes).
- Check local disk usage against thresholds.
- Confirm recent presence of database and filesystem backups.
- Optionally query a remote backup host over SSH for validations.
- Return a clear pass/fail summary and a non-zero exit code on failure (for monitoring/cron).
What the Script Checks
- Mounts: Ensure all expected mount points exist and are mounted.
- Disk Space: Parse
df -h
and alert if thresholds are exceeded (e.g., 85% warn, 95% critical). - Local Backups: Scan designated directories for recent files (e.g.,
.sql.gz
or.tar.gz
within the last 24–30 hours). - Remote Checks (optional): Run limited SSH commands against a backup server to confirm recent artifacts.
- Daily Uploads (optional): Confirm at least one file exists for the current date (e.g.,
YYYY/MM/DD/…
).
Example Configuration (Placeholders)
# Example-only configuration (no real paths or hosts)
REQUIRED_MOUNTS = [
"/mnt/backup", # example mount point
]
LOCAL_BACKUPS = [
{"path": "/backups/db", "exts": [".sql.gz", ".dump.gz"], "max_age_h": 26, "label": "Local DB"},
{"path": "/backups/files", "exts": [".tar.gz", ".zip"], "max_age_h": 26, "label": "Local FS"}
]
REMOTE = {
"host": "backup.example.com",
"user": "backupuser",
"ssh_key": "/home/user/.ssh/id_rsa", # example
"paths": [
{"path": "/remote/db", "exts": [".sql.gz"], "max_age_h": 30, "label": "Remote DB"},
{"path": "/remote/files", "exts": [".tar.gz"], "max_age_h": 30, "label": "Remote FS"}
]
}
DISK_WARN = 85
DISK_CRIT = 95
Example Script Skeleton (Python)
#!/usr/bin/env python
# Example only — replace with your own logic and error handling.
import os, sys, time, subprocess
def check_mounts(mounts):
failures = []
with open("/proc/mounts") as f:
mounted_text = f.read()
for mp in mounts:
if (" " + mp + " ") not in mounted_text:
failures.append("Missing mount: %s" % mp)
return failures
def disk_report(warn=85, crit=95):
p = subprocess.Popen(["df", "-hPT"], stdout=subprocess.PIPE)
out = p.communicate()[0].decode("utf-8", "ignore").splitlines()
alerts = []
for line in out[1:]:
parts = line.split()
if len(parts) < 7:
continue
usep = parts[5].rstrip("%")
mount = parts[-1]
try:
pct = int(usep)
except:
continue
if pct >= crit:
alerts.append("CRIT: %s at %s%%" % (mount, pct))
elif pct >= warn:
alerts.append("WARN: %s at %s%%" % (mount, pct))
return alerts
def newest_file_age_hours(root, exts):
newest = None
newest_mtime = 0.0
if not os.path.isdir(root):
return None
for r, d, files in os.walk(root):
for fn in files:
if exts and not any(fn.lower().endswith(e) for e in exts):
continue
fp = os.path.join(r, fn)
try:
st = os.stat(fp)
except OSError:
continue
if st.st_mtime > newest_mtime:
newest_mtime = st.st_mtime
newest = fp
if newest is None:
return None
return (time.time() - newest_mtime) / 3600.0
def check_local(backups):
failures = []
for b in backups:
age = newest_file_age_hours(b["path"], b["exts"])
if age is None:
failures.append("%s: no files found in %s" % (b["label"], b["path"]))
elif age > b["max_age_h"]:
failures.append("%s: newest %.1fh > %dh" % (b["label"], age, b["max_age_h"]))
return failures
def main():
errors = []
warnings = []
# 1) mounts
errors += check_mounts(REQUIRED_MOUNTS)
# 2) disk space
warnings += disk_report(DISK_WARN, DISK_CRIT)
# 3) local backups
errors += check_local(LOCAL_BACKUPS)
# 4) (optional) add remote SSH checks here if needed
# Summary and exit code
print("Backup Health Check Summary")
if warnings:
print("Warnings:")
for w in warnings: print(" - " + w)
if errors:
print("Errors:")
for e in errors: print(" - " + e)
sys.exit(2)
print("All checks passed.")
sys.exit(0)
if __name__ == "__main__":
main()
Note: The example uses a full directory walk for simplicity. In production, prefer shallow scans, date-scoped paths (e.g., YYYY/MM/DD
), or server-side queries to keep checks fast on network mounts.
Cron Integration (Example)
# Run at minute 12 every hour; log to a file
12 * * * * /usr/bin/python /opt/tools/backup_health_check.py >> /var/log/backup_check.log 2>&1
Sample Output (Example)
Backup Health Check Summary
Warnings:
- /data at 86%
Errors:
- Local DB: newest 28.4h > 26h
- Missing mount: /mnt/backup
Hardening Tips
- Use a credentials file for network mounts; never hardcode secrets in scripts.
- For network filesystems, consider shallow scans with time boundaries (e.g., check only the current month/day).
- Emit machine-readable output (JSON) in addition to human logs for dashboards.
- Return non-zero exit codes on failure; integrate with your monitoring stack.
- For remote checks, limit SSH commands to small, deterministic queries.
Comments
Post a Comment