Why df Says “Full” but du Disagrees
We recovered ~30 GB by restarting Apache and identified another ~67.6 GB locked in deleted-open files from long-running cron jobs (Python/shell). This post documents the problem, proof, and the exact commands to fix and prevent it.
Environment snapshot: root volume 216 GB at 86% used; separate 1.8 TB disk mounted at /media/extradrive with 23% used.
1) Symptoms
# Root nearly full
df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-uuid/3344...f6c 216G 175G 30G 86% /
# Separate data disk looks fine
df -h /media/extradrive
/dev/sdc 1.8T 384G 1.4T 23% /media/extradrive
# But summarising top-level dirs on / doesn’t add up to 175G
du -xhd1 / | sort -h
...
70G /var
34G /home
2.9G /usr
108G / <-- way below df’s 175G used
This is a classic df vs du mismatch. The missing tens of GB were not visible to du, yet they counted in df.
2) Why df ≠ du: the three usual suspects
- Deleted-open files: a process still holds a file descriptor to a file that was rotated/removed. Disk space remains allocated until the process closes or exits. (Most common; that’s our case.)
- Data hidden under a mountpoint: files were written to a directory before a filesystem was mounted on top; they’re invisible while mounted but still consume root space.
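- Filesystem reserved blocks: ext4 typically reserves ~5% for root (on 216 GB ≈ 10.8 GB). This shrinks Avail rather than inflating Used, which is why Used + Avail (175G + 30G) falls about 11G short of Size (216G).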
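The first suspect is easy to reproduce in isolation. The sketch below is a minimal illustration, not part of the original incident: it creates a scratch file, deletes it while the descriptor is still open, and reports what the kernel says through that descriptor. The function name `deleted_open_demo` is made up for this example.

```python
import os
import tempfile


def deleted_open_demo(size_bytes: int = 1024 * 1024) -> dict:
    """Create a file, delete it while it is still open, and inspect it via the fd."""
    fd, path = tempfile.mkstemp(prefix="deleted-open-demo-")
    try:
        os.write(fd, b"x" * size_bytes)
        os.unlink(path)          # the name is gone, like a rotated or removed log
        info = os.fstat(fd)      # the inode is still reachable through the descriptor
        return {
            "path_exists": os.path.exists(path),  # False: du can no longer find it
            "link_count": info.st_nlink,          # 0: the condition `lsof +L1` selects
            "size_bytes": info.st_size,           # data is still there until close()
        }
    finally:
        os.close(fd)             # only now can the filesystem free the blocks


if __name__ == "__main__":
    print(deleted_open_demo())
```

While the descriptor is open, df counts the blocks and du cannot see them, which is exactly the gap described above.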
3) Prove mounts are truly separate (and not masking data)
# Show the exact device, fstype, and size for the mount
findmnt -o SOURCE,FSTYPE,SIZE,USED,AVAIL,TARGET /media/extradrive
# Example output:
# /dev/sdc ext4 1.8T 383.4G 1.3T /media/extradrive
# Cross-check block device details
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT,UUID,MODEL | sed -n '1p;/sdc/p'
# Confirm root vs extradrive are different devices
df -h / /media/extradrive
If you suspect “hidden under mountpoint” data, temporarily unmount (when safe) and check the directory itself:
umount /media/extradrive
du -sh /media/extradrive # now shows any hidden files on /
mount /media/extradrive
4) Find where space went on / only
# The -x flag restricts traversal to the same filesystem as /
sudo du -xhd1 / | sort -h
sudo du -xhd1 /var | sort -h
sudo du -xhd1 /home | sort -h
sudo du -xhd1 /opt | sort -h
If the totals still don’t match df, it’s time to look for deleted-open files.
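For readers who want to see what the -x flag does, here is a rough Python equivalent: a sketch that assumes a POSIX system and compares each directory's device ID against the starting directory's. `du_one_filesystem` is an illustrative name, and the result will not match du byte for byte (for example, symlinks to directories are not counted).

```python
import os


def du_one_filesystem(root: str) -> int:
    """Return allocated bytes under `root` without crossing mount points."""
    root_dev = os.lstat(root).st_dev
    seen = set()   # (device, inode) pairs already counted, so hard links count once
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that live on another device, like `du -x`.
        kept = []
        for name in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, name)).st_dev == root_dev:
                    kept.append(name)
            except OSError:
                pass
        dirnames[:] = kept
        for entry in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                st = os.lstat(entry)
            except OSError:
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512   # st_blocks is counted in 512-byte units
    return total


if __name__ == "__main__":
    print(f"{du_one_filesystem('/var') / 1024**3:.2f} GiB")
```

Run it as root for system directories; unreadable directories are skipped silently.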
5) Detect & reclaim deleted-open files
We confirmed ~67.6 GB held by deleted-open files (after freeing ~30 GB by restarting Apache):
# Sum all deleted-open file sizes system-wide
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
# Remaining deleted-open: 67.61 GB
5.1 Identify the main culprits by PID
# Largest PIDs by total deleted-open bytes
sudo lsof -nP +L1 \
| awk '/deleted/ {s[$2]+=$7} END{for (p in s) printf "%-8s %10.2f GB\n", p, s[p]/1024/1024/1024}' \
| sort -k2,2nr | head -20
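One caveat with summing lsof output: the same deleted file can be listed once per descriptor or per process that holds it, so the total may overstate the space actually pinned. The sketch below is an alternative under stated assumptions (Linux with /proc mounted, run as root to see every process). It reads /proc/<pid>/fd directly and counts each inode once in the grand total. `deleted_open_usage` is a hypothetical helper, not a standard tool.

```python
import os
import stat
from collections import defaultdict


def deleted_open_usage(proc_root: str = "/proc"):
    """Return (total_bytes, per_pid_bytes) for open regular files with link count 0.

    The total counts each (device, inode) once; per-PID figures may overlap
    when several processes share the same deleted file.
    """
    seen = {}                       # (st_dev, st_ino) -> allocated bytes
    per_pid = defaultdict(int)
    try:
        pids = [p for p in os.listdir(proc_root) if p.isdigit()]
    except OSError:
        return 0, {}
    for pid in pids:
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:             # process exited or permission denied
            continue
        counted_here = set()
        for fd in fds:
            try:
                st = os.stat(os.path.join(fd_dir, fd))  # follows the link to the open file
            except OSError:
                continue
            if st.st_nlink != 0 or not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            size = st.st_blocks * 512                   # allocated, not apparent, size
            seen[key] = size
            if key not in counted_here:
                counted_here.add(key)
                per_pid[int(pid)] += size
    return sum(seen.values()), dict(per_pid)


if __name__ == "__main__":
    total, by_pid = deleted_open_usage()
    print(f"Deleted-open (deduplicated): {total / 1024**3:.2f} GiB")
    for pid, size in sorted(by_pid.items(), key=lambda kv: kv[1], reverse=True)[:20]:
        print(f"{pid:<8} {size / 1024**3:10.2f} GiB")
```

Note that this also picks up deleted files on tmpfs and anonymous memory-backed files, which consume RAM rather than root-disk space, so treat the figure as an upper bound for /.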
5.2 Inspect what each PID is and which files it holds
PID=21088 # example, replace with your PID
ps -o pid,ppid,user,etime,cmd -p "$PID"
sudo lsof -nP -p "$PID" | awk '/deleted/ {print " ", $4, $7/1024/1024 " MB", $9}'
In our case, long-running cron jobs (shell/Python; e.g., ad importers) held rotated log files open for hours, consuming tens of GB.
5.3 Reclaim space (preferred): gracefully stop, then kill if needed
# Replace with the PIDs you identified
for pid in 21088 21160 21158 21057; do
echo "Stopping $pid …"; kill -TERM "$pid" 2>/dev/null || true
done
sleep 5
for pid in 21088 21160 21158 21057; do
kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" || true
done
# Verify reclaimed space
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
df -h /
6) Reclaim without restart (truncate log FDs)
Only for log files. If a restart is disruptive, you can truncate the open file descriptors that point to deleted log files. Do not truncate data files or libraries this way.
# Truncate deleted-open file descriptors that look like logs.
# With +L1, lsof adds an NLINK column, so NAME is field 10 (field 9 is the inode).
while read -r pid fd; do
sudo bash -c "truncate -s 0 /proc/$pid/fd/$fd" || true
done < <(
sudo lsof -nP +L1 \
| awk '/deleted/ && tolower($10) ~ /log/ { fd=$4; gsub(/[^0-9]/,"",fd); if (fd!="") printf "%s %s\n",$2,fd }' \
| sort -u
)
# Check again
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
df -h /
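Truncating behind a writer's back behaves differently depending on how the writer opened the file. The sketch below (function name invented for illustration) writes 1000 bytes, truncates the file to zero from outside, lets the writer add 10 more bytes, and returns the apparent size afterwards.

```python
import os
import tempfile


def size_after_truncate(append: bool) -> int:
    """Write 1000 bytes, truncate externally, write 10 more, return apparent size."""
    fd, path = tempfile.mkstemp(prefix="truncate-demo-")
    os.close(fd)
    flags = os.O_WRONLY | (os.O_APPEND if append else 0)
    writer = os.open(path, flags)
    try:
        os.write(writer, b"a" * 1000)
        os.truncate(path, 0)           # what `truncate -s 0` or copytruncate does
        os.write(writer, b"b" * 10)    # the writer carries on, unaware
        return os.fstat(writer).st_size
    finally:
        os.close(writer)
        os.unlink(path)


if __name__ == "__main__":
    print(size_after_truncate(append=True))    # 10
    print(size_after_truncate(append=False))   # 1010
```

A writer using O_APPEND continues from the new end of file. A writer without it keeps its old offset, so the file's apparent size jumps back up as a sparse file. The truncated blocks are still released, but ls-style size listings become misleading, which is one more reason to prefer a clean restart or reopen where possible.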
7) Prevent recurrence
7.1 logrotate: either reopen logs or use copytruncate
/var/log/myjob/*.log {
daily
rotate 7
compress
missingok
notifempty
# If your process does NOT reopen on HUP, use copytruncate:
copytruncate
# Otherwise prefer postrotate + HUP/reopen:
# postrotate
# systemctl kill -s HUP myjob.service
# endscript
}
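If you choose the postrotate + HUP route, the job itself has to reopen its log when signalled. A minimal sketch of that pattern for a Python job follows. The class and function names are illustrative, and the handler only sets a flag so the actual reopen happens at a safe point in the next write.

```python
import os
import signal


class ReopenableLog:
    """Append-only log writer that reopens its file on request (e.g. after SIGHUP)."""

    def __init__(self, path: str):
        self.path = path
        self._reopen_requested = False
        self._fh = open(path, "a")

    def request_reopen(self, *_signal_args) -> None:
        # Safe to call from a signal handler: it only flips a flag.
        self._reopen_requested = True

    def write(self, line: str) -> None:
        if self._reopen_requested:
            self._fh.close()                 # drop the reference to the rotated file
            self._fh = open(self.path, "a")  # creates or opens the fresh log
            self._reopen_requested = False
        self._fh.write(line + "\n")
        self._fh.flush()

    def close(self) -> None:
        self._fh.close()


def install_hup_handler(log: ReopenableLog) -> None:
    """Register SIGHUP to trigger a reopen. Must be called from the main thread."""
    signal.signal(signal.SIGHUP, log.request_reopen)
```

Without the reopen, the process keeps writing to the renamed file, and once logrotate deletes or compresses it the space becomes deleted-open.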
7.2 Python logging: Watch log rotation automatically
import logging
from logging.handlers import WatchedFileHandler
logger = logging.getLogger("myjob")
logger.setLevel(logging.INFO)
handler = WatchedFileHandler("/var/log/myjob/myjob.log") # detects external rotation
fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
handler.setFormatter(fmt)
logger.addHandler(handler)
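To convince yourself the handler really follows a rotation, the sketch below simulates one with a rename in a scratch directory. The logger name "watched-demo" and the function name are placeholders, and it assumes a POSIX filesystem where the rename changes what the path points to.

```python
import logging
import os
import tempfile
from logging.handlers import WatchedFileHandler


def rotate_under_handler(log_dir: str) -> tuple:
    """Log, rename the file as logrotate would, log again; return (old, new) contents."""
    path = os.path.join(log_dir, "myjob.log")
    logger = logging.getLogger("watched-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = WatchedFileHandler(path)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    try:
        logger.info("before rotation")
        os.rename(path, path + ".1")     # rotation without copytruncate
        logger.info("after rotation")    # handler sees the path changed and reopens
    finally:
        logger.removeHandler(handler)
        handler.close()
    with open(path + ".1") as old, open(path) as new:
        return old.read(), new.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(rotate_under_handler(tmp))
```

The second message lands in a freshly created myjob.log rather than in the renamed file, so nothing stays pinned after logrotate removes old generations.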
7.3 Replace long-running cron loops with systemd services/timers
Systemd gives you clean restarts, environment, and log control, reducing the chance of orphaned FDs.
8) Other quick space savers
8.1 Journald & APT caches
# Journald
sudo journalctl --disk-usage
sudo journalctl --vacuum-time=14d # or: --vacuum-size=2G
# APT caches
sudo apt-get clean
8.2 Lower ext4 reserved blocks (advanced; frees a few GB)
DEV=$(df -P / | awk 'NR==2{print $1}')
sudo tune2fs -l "$DEV" | grep -Ei 'Reserved block percentage|Reserved block count'
# Reduce from 5% to 1% ONLY if you understand the trade-offs:
sudo tune2fs -m 1 "$DEV"
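As a rough way to estimate what the change is worth before touching tune2fs, the sketch below reads the current reservation from statvfs (the gap between blocks free for root and blocks free for everyone else) and does the percentage arithmetic for the 216 GB volume from this post. Both function names are illustrative.

```python
import os


def reserved_bytes(path: str = "/") -> int:
    """Bytes currently usable by root only, as reported by statvfs."""
    st = os.statvfs(path)
    return max(0, st.f_bfree - st.f_bavail) * st.f_frsize


def reserved_for_percentage(fs_size_gb: float, percent: float) -> float:
    """Size of the reservation, in GB, for a given reserved-block percentage."""
    return fs_size_gb * percent / 100.0


if __name__ == "__main__":
    print(f"Currently reserved on /: {reserved_bytes('/') / 1024**3:.2f} GiB")
    at_5 = reserved_for_percentage(216, 5)   # 10.8 GB
    at_1 = reserved_for_percentage(216, 1)   # 2.16 GB
    print(f"5% -> 1% on 216 GB frees about {at_5 - at_1:.2f} GB")
```

On the 216 GB root volume that works out to roughly 8.6 GB becoming available to non-root users.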
9) Incident checklist & one-liners
- Confirm mounts are separate: findmnt, lsblk, df -h / /mount.
- Hunt space on / only: du -xhd1 /{var,home,opt}.
- Quantify deleted-open: sudo lsof -nP +L1 and sum sizes.
- Reclaim: restart offending processes (preferred) or truncate log FDs.
- Prevent: logrotate strategy, Python WatchedFileHandler, systemd timers.
- Housekeeping: journalctl --vacuum-*, apt-get clean, consider tune2fs -m 1.
Appendix: Full command set
A. Mount verification
findmnt -o SOURCE,FSTYPE,SIZE,USED,AVAIL,TARGET /media/extradrive
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT,UUID,MODEL | sed -n '1p;/sdc/p'
df -h / /media/extradrive
B. Root-only space scan
sudo du -xhd1 / | sort -h
sudo du -xhd1 /var | sort -h
sudo du -xhd1 /home | sort -h
sudo du -xhd1 /opt | sort -h
C. Deleted-open analysis
# Total deleted-open bytes
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
# Top PIDs
sudo lsof -nP +L1 \
| awk '/deleted/ {s[$2]+=$7} END{for (p in s) printf "%-8s %10.2f GB\n", p, s[p]/1024/1024/1024}' \
| sort -k2,2nr | head -20
# Inspect one PID
PID=12345
ps -o pid,ppid,user,etime,cmd -p "$PID"
sudo lsof -nP -p "$PID" | awk '/deleted/ {print " ", $4, $7/1024/1024 " MB", $9}'
D. Reclaim strategies
# Restart/terminate offenders (safest)
for pid in <PIDS>; do kill -TERM "$pid" 2>/dev/null || true; done
sleep 5
for pid in <PIDS>; do kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" || true; done
# Truncate only logs (no process restart)
# With +L1, lsof's NAME column is field 10 (field 9 is the inode)
while read -r pid fd; do sudo bash -c "truncate -s 0 /proc/$pid/fd/$fd" || true; done < <(
sudo lsof -nP +L1 | awk '/deleted/ && tolower($10) ~ /log/ { fd=$4; gsub(/[^0-9]/,"",fd); if (fd!="") printf "%s %s\n",$2,fd }' | sort -u
)
E. Housekeeping
sudo journalctl --disk-usage
sudo journalctl --vacuum-time=14d
sudo apt-get clean
DEV=$(df -P / | awk 'NR==2{print $1}')
sudo tune2fs -l "$DEV" | grep -Ei 'Reserved block percentage|Reserved block count'
# Optional:
# sudo tune2fs -m 1 "$DEV"
Note: Replace example PIDs and paths with those observed on your system. Test truncation on log files only; restart services for non-log deleted-open files such as libraries or data files.
From 86% to 53%: Eliminating rsyslog “deleted-open” bloat and hardening log rotation
Outcome: Root filesystem usage dropped from 86% to 53% after resolving a single culprit: rsyslogd holding huge, already-rotated log files open (/var/log/syslog.1 ~52 GB, /var/log/mail.log.1 ~16 GB). This post documents the fix and the safeguards to prevent recurrence.
TL;DR
- Traced a df vs du mismatch to “deleted-open” files: rsyslogd (PID 890) held rotated logs open.
- Freed ~67 GB instantly by stopping the offending process; root usage fell to 53%.
- Hardened logrotate: added dateext, size guards, proper create/su, and a reliable postrotate that signals rsyslog to reopen files.
- Reduced future log growth: compressed oversized .1 files, lowered UFW logging, and added optional rotation stanzas for kern.log/ufw.log.
- Added quick verification commands for “deleted-open” detection and logrotate status.
Context
Before: / at 86% used; du totals didn’t match df → classic sign of “deleted-open” files.
Culprit: rsyslogd (PID 890) with file descriptors to very large, already-rotated logs (syslog.1, mail.log.1).
After: Stopped/reloaded rsyslog, freed space, and fixed rotation & logging strategy to avoid recurrence.
Root cause: “deleted-open” files held by rsyslog
# Confirm total “deleted-open” bytes
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
# Identify top PIDs holding deleted files
sudo lsof -nP +L1 \
| awk '/deleted/ {s[$2]+=$7} END{for (p in s) printf "%-8s %10.2f GB\n", p, s[p]/1024/1024/1024}' \
| sort -k2,2nr | head -20
# Inspect the offender (example PID)
ps -fp 890
sudo lsof -nP -p 890 | awk '/deleted/ {printf "%-6s %10.2f MB %s\n",$4,$7/1024/1024,$9}'
Large values on syslog.1/mail.log.1 confirmed rsyslog was writing to old, rotated files. Stopping/restarting rsyslog released the space immediately.
Fix: make rsyslog always reopen logs after rotation
Replace your /etc/logrotate.d/rsyslog with the following. It adds dateext (no reuse of .1), correct create/su, size thresholds, and a robust postrotate that works with systemd and legacy init.
/etc/logrotate.d/rsyslog
# SYSLOG (larger threshold)
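# Rotates daily, or sooner once the file exceeds 200M (maxsize), and signals rsyslog to reopen files.
# A plain "size" directive placed after "daily" would replace the daily schedule instead of adding to it.
/var/log/syslog
{
daily
rotate 14
maxsize 200M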
missingok
notifempty
compress
delaycompress
dateext
create 0640 syslog adm
su root adm
postrotate
if command -v systemctl >/dev/null 2>&1; then
systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
else
invoke-rc.d rsyslog rotate >/dev/null 2>&1 || service rsyslog rotate >/dev/null 2>&1 || true
fi
endscript
}
# MAIL log (moderate threshold)
/var/log/mail.log
{
daily
rotate 14
maxsize 100M
missingok
notifempty
compress
delaycompress
dateext
create 0640 syslog adm
su root adm
postrotate
if command -v systemctl >/dev/null 2>&1; then
systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
else
invoke-rc.d rsyslog rotate >/dev/null 2>&1 || service rsyslog rotate >/dev/null 2>&1 || true
fi
endscript
}
# Other rsyslog-managed logs (grouped; one postrotate run)
/var/log/mail.info
/var/log/mail.warn
/var/log/mail.err
/var/log/daemon.log
/var/log/kern.log
/var/log/auth.log
/var/log/user.log
/var/log/lpr.log
/var/log/cron.log
/var/log/debug
/var/log/messages
{
daily
rotate 14
maxsize 100M
missingok
notifempty
compress
delaycompress
dateext
create 0640 syslog adm
su root adm
sharedscripts
postrotate
if command -v systemctl >/dev/null 2>&1; then
systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
else
invoke-rc.d rsyslog rotate >/dev/null 2>&1 || service rsyslog rotate >/dev/null 2>&1 || true
fi
endscript
}
Why this works: dateext avoids confusing reuse of .1; create/su ensure the new file is writable by syslog; postrotate guarantees rsyslog reopens files and drops references to the old ones.
Optional: make logrotate activity visible
On this host, logrotate runs via cron.daily (no logrotate.timer). If you want a dedicated log of each run, use a wrapper (do not schedule this alongside cron.daily on the same host).
/usr/local/sbin/run-logrotate-verbose.sh
#!/usr/bin/env bash
set -euo pipefail
# No -f here: forcing would rotate every log on every run, regardless of the daily/size criteria
logrotate -v /etc/logrotate.conf >> /var/log/logrotate.log 2>&1
chmod +x /usr/local/sbin/run-logrotate-verbose.sh
# root cron (choose either this OR cron.daily; not both)
17 3 * * * /usr/local/sbin/run-logrotate-verbose.sh
Verification:
# Dry run; write to a scratch file so the real run log is not overwritten
logrotate -dv /etc/logrotate.conf > /tmp/logrotate-debug.log 2>&1
sed -n '1,200p' /tmp/logrotate-debug.log
Tame big log producers (kern/ufw)
We found very large kern.log.1 (~467 MB) and ufw.log.1 (~415 MB). These were already rotated, so they were safe to compress immediately:
nice gzip -9 /var/log/kern.log.1 /var/log/ufw.log.1
Reduce future noise from UFW:
ufw status verbose
ufw logging low # or: off | medium | high | full
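Optional size guard for the UFW log. logrotate rejects a file that appears in more than one stanza, and /var/log/kern.log is already covered by the grouped stanza in /etc/logrotate.d/rsyslog, so it needs no entry of its own. Before adding the stanza below, check whether /etc/logrotate.d/ufw already exists on your system and edit that file instead if it does. rsyslog writes ufw.log, so the stanza needs the same HUP postrotate; without it you recreate the deleted-open problem.
/var/log/ufw.log
{
daily
rotate 14
maxsize 100M
compress
delaycompress
missingok
notifempty
postrotate
systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
endscript
}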
Find “top talkers” in syslog (root cause)
Identify which programs are filling syslog so you can tune their verbosity or add rate limits:
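# Counts lines per program tag. Works with mawk and gawk, and with both
# "Jan  5 10:22:33 host prog[pid]:" and ISO 8601 "2024-01-05T10:22:33+00:00 host prog[pid]:" lines.
zcat -f /var/log/syslog* \
| awk '{ i = ($1 ~ /^[0-9][0-9][0-9][0-9]-/) ? 3 : 5; tag = $i; sub(/(\[[0-9]+\])?:$/, "", tag); c[tag]++ } END {for (k in c) printf "%9d %s\n", c[k], k}' \
| sort -nr | head -30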
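Where a Python pass is more convenient than a shell pipeline, the same tally can be sketched as below. It assumes syslog lines start with either a traditional "Jan  5 10:22:33" timestamp or an ISO 8601 one, followed by the hostname and then the program tag. The regular expression and the function name `top_talkers` are illustrative and may need adjusting for custom rsyslog templates.

```python
import re
from collections import Counter

# Timestamp (traditional or ISO 8601), hostname, then "program[pid]:" or "program:".
_LINE = re.compile(
    r"^(?:[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}|\d{4}-\d{2}-\d{2}T\S+)\s+"
    r"\S+\s+"              # hostname
    r"([^\s\[:]+)"         # program name, stopping at '[', ':' or whitespace
    r"(?:\[\d+\])?:"       # optional [pid], then the colon
)


def top_talkers(lines, limit: int = 30):
    """Return [(program, line_count), ...] sorted by count, largest first."""
    counts = Counter()
    for line in lines:
        match = _LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    with open("/var/log/syslog", errors="replace") as fh:
        for program, count in top_talkers(fh):
            print(f"{count:9d} {program}")
```

Grouping by program name without the PID keeps short-lived processes (cron jobs, sshd sessions) from being split into thousands of separate rows.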
Quick sanity checks
# Expect ~0 GB after fixes
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
# Logrotate status and recent actions
sed -n '1,120p' /var/lib/logrotate/status
logrotate -dv /etc/logrotate.conf | sed -n '1,120p'
# Monitor current log sizes
sudo find /var/log -maxdepth 2 -type f -name "*.log" -printf "%s %p\n" \
| sort -nr | head -20 | numfmt --to=iec
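The size check can also be run from a monitoring script. The sketch below mirrors the find command above (files ending in .log, at most two levels deep) and returns the largest entries. `largest_files` and its parameters are illustrative names.

```python
import heapq
import os
import stat


def largest_files(root: str, limit: int = 20, max_depth: int = 2, suffix: str = ".log"):
    """Return up to `limit` (size_bytes, path) tuples, largest first."""
    root = os.path.abspath(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Depth of this directory below root: 0 for root itself.
        depth = 0 if dirpath == root else os.path.relpath(dirpath, root).count(os.sep) + 1
        if depth >= max_depth - 1:
            dirnames[:] = []          # files any deeper would exceed max_depth
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode):
                found.append((st.st_size, path))
    return heapq.nlargest(limit, found)


if __name__ == "__main__":
    for size, path in largest_files("/var/log"):
        print(f"{size / 1024**2:10.1f} MiB  {path}")
```

Like the find command, the .log suffix filter skips files such as syslog and rotated .1 generations; pass suffix="" to include everything.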
Recommended structural improvement: systemd services/timers
Long-running shell/Python jobs under cron often cause logging quirks. Prefer systemd services + timers:
- Clean lifecycle (start/stop/reload) and ExecReload=/bin/kill -HUP $MAINPID for log reopen.
- Journald integration (or forward to rsyslog) without “deleted-open” risks.
- Predictable restart/backoff and resource controls.
Conclusion
Root space recovered, cause removed. The revised logrotate configuration (with dateext, size thresholds, and reliable postrotate) plus UFW noise reduction and on-demand compression make log growth predictable and safe. Keep the verification one-liners handy, and consider migrating busy cron workers to systemd for robust logging semantics.