df vs du Mismatch & Deleted-Open Files

Why df Says “Full” but du Disagrees

We recovered ~30 GB by restarting Apache and identified another ~67.6 GB locked in deleted-open files from long-running cron jobs (Python/shell). This post documents the problem, proof, and the exact commands to fix and prevent it.

Environment snapshot: root volume 216 GB at 86% used; separate 1.8 TB disk mounted at /media/extradrive with 23% used.

1) Symptoms

# Root nearly full
df -h /
Filesystem                         Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/3344...f6c       216G  175G   30G  86%  /

# Separate data disk looks fine
df -h /media/extradrive
/dev/sdc                           1.8T  384G  1.4T  23%  /media/extradrive

# But summarising top-level dirs on / doesn’t add up to 175G
du -xhd1 / | sort -h
...
70G   /var
34G   /home
2.9G  /usr
108G  /             <— way below df’s 175G used

This is a classic df vs du mismatch. The missing tens of GB were not visible to du, yet they counted in df.
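To put a number on the gap, subtract what du can see from what df reports as used. A minimal sketch, assuming GNU coreutils; gap_gib is a hypothetical helper name, and the inputs in the example are simply the 175G and 108G figures above converted to bytes:

```shell
#!/usr/bin/env bash
# gap_gib USED_BYTES SEEN_BYTES -> difference in GiB (one decimal place)
gap_gib() {
  awk -v used="$1" -v seen="$2" \
    'BEGIN { printf "%.1f\n", (used - seen) / 1024 / 1024 / 1024 }'
}

# Figures from the df/du output above, converted to bytes.
gap_gib $((175 * 1024 ** 3)) $((108 * 1024 ** 3))

# On a live system the inputs could come from (du needs root to read everything):
#   used=$(df -B1 --output=used / | tail -n 1)
#   seen=$(sudo du -xs -B1 / | cut -f1)
#   gap_gib "$used" "$seen"
```

With the numbers from this host it prints 67.0, which is in line with the ~67.6 GB of deleted-open files measured in section 5.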

2) Why df ≠ du: the three usual suspects

  1. Deleted-open files: a process still holds a file descriptor to a file that was rotated/removed. Disk space remains allocated until the process closes or exits. (Most common; that’s our case.)
  2. Data hidden under a mountpoint: files were written to a directory before a filesystem was mounted on top; they’re invisible while mounted but still consume root space.
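  3. Filesystem reserved blocks: ext4 typically reserves ~5% for root (on 216 GB ≈ 10–11 GB). These blocks reduce Avail rather than inflate Used, which is why Used + Avail (175G + 30G) falls short of Size (216G) above; they do not explain a gap between du and df's Used figure.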
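The first case is easy to reproduce in a scratch directory. The sketch below assumes Linux with /proc and GNU coreutils; the file name and descriptor number 9 are arbitrary choices for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
f="$tmpdir/demo.log"
head -c 1048576 /dev/zero > "$f"          # write 1 MiB

exec 9<"$f"                                # keep the file open on descriptor 9
rm "$f"                                    # unlink it: du can no longer find it

pid=$BASHPID                               # PID of this shell
link=$(readlink "/proc/$pid/fd/9")         # the kernel marks the target "(deleted)"
size=$(stat -L -c %s "/proc/$pid/fd/9")    # but the data is still allocated
echo "fd 9 -> $link ($size bytes still held)"

exec 9<&-                                  # closing the descriptor frees the blocks
rmdir "$tmpdir"
```

The directory is empty after the rm (the final rmdir succeeds), yet the 1 MiB stays allocated until descriptor 9 is closed. That is the same situation lsof +L1 reports for a real process.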

3) Prove mounts are truly separate (and not masking data)

# Show the exact device, fstype, and size for the mount
findmnt -o SOURCE,FSTYPE,SIZE,USED,AVAIL,TARGET /media/extradrive
# Example output:
# /dev/sdc  ext4  1.8T  383.4G 1.3T /media/extradrive

# Cross-check block device details
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT,UUID,MODEL | sed -n '1p;/sdc/p'

# Confirm root vs extradrive are different devices
df -h / /media/extradrive

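If you suspect “hidden under mountpoint” data, temporarily unmount (when nothing is using the disk) and check the directory itself:

# Requires root; the final mount relies on an /etc/fstab entry for this path
sudo umount /media/extradrive
sudo du -sh /media/extradrive   # now shows any hidden files on /
sudo mount /media/extradrive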
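When the data disk is busy and cannot be unmounted, a bind mount of / offers another way to look underneath it, because a plain (non-recursive) bind mount does not carry submounts along. This is a sketch rather than something we ran during the incident; peek_under_mount and the /mnt/rootview scratch path are hypothetical names, and DRY_RUN=1 only prints the commands:

```shell
#!/usr/bin/env bash
# peek_under_mount TARGET [VIEW]: show the size of files on the root
# filesystem that sit underneath the mountpoint TARGET.
peek_under_mount() {
  local target=$1 view=${2:-/mnt/rootview}   # view: hypothetical scratch directory
  local run=${DRY_RUN:+echo}                 # DRY_RUN set -> print instead of execute
  $run sudo mkdir -p "$view"
  $run sudo mount --bind / "$view"           # non-recursive: submounts are not included
  $run sudo du -sh "$view$target"            # files hidden beneath the mountpoint
  $run sudo umount "$view"
}

# Print the commands without touching the system.
DRY_RUN=1 peek_under_mount /media/extradrive
```

Drop DRY_RUN=1 to execute the commands for real. A non-trivial size from the du step means files were written into the directory before the disk was mounted on top of it.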

4) Find where space went on / only

# The -x flag restricts traversal to the same filesystem as /
sudo du -xhd1 / | sort -h
sudo du -xhd1 /var | sort -h
sudo du -xhd1 /home | sort -h
sudo du -xhd1 /opt  | sort -h

If the totals still don’t match df, it’s time to look for deleted-open files.

5) Detect & reclaim deleted-open files

We confirmed ~67.6 GB held by deleted-open files (after freeing ~30 GB by restarting Apache):

# Sum all deleted-open file sizes system-wide
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
# Remaining deleted-open: 67.61 GB
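One caveat with the simple sum: if several processes (or several descriptors) hold the same deleted file, its size is counted once per holder. A hedged variant that counts each file once, keyed on device and inode, might look like this. sum_deleted_unique is a hypothetical helper, and the sample rows are made up for illustration, not taken from our host:

```shell
#!/usr/bin/env bash
# Sum SIZE/OFF once per (DEVICE, NODE) pair from `lsof +L1` output on stdin.
# With +L1 the columns are:
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
sum_deleted_unique() {
  awk 'NR > 1 && /deleted/ && !seen[$6, $9]++ { s += $7 }
       END { printf "%.2f GB\n", s / 1024 / 1024 / 1024 }'
}

# Made-up sample: two Apache workers share one deleted log (same inode).
sum_deleted_unique <<'EOF'
COMMAND  PID USER   FD   TYPE DEVICE   SIZE/OFF NLINK   NODE NAME
apache2 1200 root    2w   REG    8,1 1073741824     0 131074 /var/log/apache2/error.log.1 (deleted)
apache2 1201 www     2w   REG    8,1 1073741824     0 131074 /var/log/apache2/error.log.1 (deleted)
python3 2100 ads     3w   REG    8,1 2147483648     0 131090 /var/log/myjob/import.log (deleted)
EOF
```

The sample prints 3.00 GB, where the naive sum would report 4.00 GB. On a live system: sudo lsof -nP +L1 | sum_deleted_unique.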

5.1 Identify the main culprits by PID

# Largest PIDs by total deleted-open bytes
sudo lsof -nP +L1 \
| awk '/deleted/ {s[$2]+=$7} END{for (p in s) printf "%-8s %10.2f GB\n", p, s[p]/1024/1024/1024}' \
| sort -k2,2nr | head -20

5.2 Inspect what each PID is and which files it holds

PID=21088 # example, replace with your PID
ps -o pid,ppid,user,etime,cmd -p "$PID"
sudo lsof -nP -p "$PID" | awk '/deleted/ {print "  ", $4, $7/1024/1024 " MB", $9}'

In our case, long-running cron jobs (shell/Python; e.g., ad importers) held rotated log files open for hours, consuming tens of GB.

5.3 Reclaim space (preferred): gracefully stop, then kill if needed

# Replace with the PIDs you identified
for pid in 21088 21160 21158 21057; do
  echo "Stopping $pid …"; kill -TERM "$pid" 2>/dev/null || true
done
sleep 5
for pid in 21088 21160 21158 21057; do
  kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" || true
done

# Verify reclaimed space
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
df -h /
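The fixed sleep 5 above can be replaced by polling, so that KILL is only sent to processes that really ignore TERM. A sketch, with stop_pid as a hypothetical helper; run it as root (or as the owning user) when the targets belong to someone else, since a failed TERM is treated as "already gone":

```shell
#!/usr/bin/env bash
# stop_pid PID [GRACE_SECONDS]: TERM first, KILL only if still alive after the grace period.
stop_pid() {
  local pid=$1 grace=${2:-5} i
  kill -TERM "$pid" 2>/dev/null || return 0      # already gone (or not ours to signal)
  for ((i = 0; i < grace * 10; i++)); do
    kill -0 "$pid" 2>/dev/null || return 0       # exited cleanly
    sleep 0.1
  done
  kill -KILL "$pid" 2>/dev/null || true
}

# Self-contained demo against a throwaway background process.
sleep 300 &
demo_pid=$!
stop_pid "$demo_pid" 2
wait "$demo_pid" 2>/dev/null || true             # reap the child
kill -0 "$demo_pid" 2>/dev/null && echo "still running" || echo "stopped"
```

For the real PIDs: for pid in 21088 21160 21158 21057; do stop_pid "$pid" 5; done.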

6) Reclaim without restart (truncate log FDs)

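Only for log files. If a restart is disruptive, you can truncate the open file descriptors that point to deleted log files. Do not truncate data files or libraries this way. Truncation frees the blocks immediately, but a writer that did not open the file in append mode keeps writing at its old offset, so the file turns sparse and its apparent size jumps back up while the actual disk usage stays small.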

# Truncate deleted-open file descriptors that look like logs.
# With +L1, lsof adds an NLINK column, so NAME is field 10 (not 9).
while read -r pid fd; do
  sudo truncate -s 0 "/proc/$pid/fd/$fd" || true
done < <(
  sudo lsof -nP +L1 \
  | awk '/deleted/ && tolower($10) ~ /log/ { fd=$4; gsub(/[^0-9]/,"",fd); if (fd!="") printf "%s %s\n",$2,fd }' \
  | sort -u
)

# Check again
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'
df -h /
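To see what truncating through /proc does to a writer that is still running, here is a small self-contained experiment. It assumes Linux with /proc and GNU coreutils; descriptor 8 and the temporary file are arbitrary stand-ins for a real log:

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp=$(mktemp)
exec 8>>"$tmp"                       # ">>" opens with O_APPEND, like most loggers
head -c 65536 /dev/zero >&8          # 64 KiB of "log" data
rm "$tmp"                            # simulate rotation that removed the file

fd_path="/proc/$BASHPID/fd/8"
before=$(stat -L -c %s "$fd_path")
truncate -s 0 "$fd_path"             # same operation as in the loop above
after=$(stat -L -c %s "$fd_path")

echo "next line" >&8                 # the writer carries on, starting from offset 0
final=$(stat -L -c %s "$fd_path")
echo "before=$before after=$after final=$final"
exec 8>&-
```

It prints before=65536 after=0 final=10. The append-mode descriptor simply continues at the new end of file, which is why this trick is reasonably safe for logs and a poor idea for anything with internal offsets.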

7) Prevent recurrence

7.1 logrotate: either reopen logs or use copytruncate

/var/log/myjob/*.log {
  daily
  rotate 7
  compress
  missingok
  notifempty
  # If your process does NOT reopen on HUP, use copytruncate:
  copytruncate

  # Otherwise prefer postrotate + HUP/reopen:
  # postrotate
  #   systemctl kill -s HUP myjob.service
  # endscript
}

7.2 Python logging: follow external log rotation automatically

import logging
from logging.handlers import WatchedFileHandler

logger = logging.getLogger("myjob")
logger.setLevel(logging.INFO)
handler = WatchedFileHandler("/var/log/myjob/myjob.log")  # detects external rotation
fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
handler.setFormatter(fmt)
logger.addHandler(handler)
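The cron jobs in this incident were a mix of Python and shell. The shell side has no WatchedFileHandler, but a similar effect comes from reopening the log path on every write instead of holding one descriptor for the life of the job. A sketch, with log as a hypothetical helper and a temporary directory standing in for /var/log/myjob:

```shell
#!/usr/bin/env bash
set -euo pipefail

logdir=$(mktemp -d)                       # stand-in for /var/log/myjob
LOG="$logdir/myjob.log"

# Each call opens, appends, and closes, so no descriptor outlives a rotation.
log() { printf '%s %s\n' "$(date '+%F %T')" "$*" >> "$LOG"; }

log "before rotation"
mv "$LOG" "$LOG.1"                        # what logrotate does without copytruncate
log "after rotation"                      # lands in a fresh myjob.log, not in .1

grep -c . "$LOG" "$LOG.1"                 # one line in each file
```

By contrast, a script that starts with exec >>"$LOG" pins a single descriptor, and every later write follows the renamed (or deleted) file until the job exits.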

7.3 Replace long-running cron loops with systemd services/timers

Systemd gives you clean restarts, environment, and log control, reducing the chance of orphaned FDs.
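As a starting point, a cron entry can be turned into a oneshot service plus a timer. Everything named here is hypothetical (myjob, /usr/local/bin/myjob, the hourly schedule); the script writes the units to a scratch directory unless UNIT_DIR says otherwise, so it is safe to run as-is:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory by default; set UNIT_DIR=/etc/systemd/system to install (as root).
UNIT_DIR=${UNIT_DIR:-$(mktemp -d)}

cat > "$UNIT_DIR/myjob.service" <<'EOF'
[Unit]
Description=myjob importer (hypothetical example)

[Service]
Type=oneshot
ExecStart=/usr/local/bin/myjob
# stdout/stderr go to the journal, so there is no log file to hold open
StandardOutput=journal
StandardError=journal
EOF

cat > "$UNIT_DIR/myjob.timer" <<'EOF'
[Unit]
Description=Run myjob hourly (hypothetical example)

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF

echo "wrote units to $UNIT_DIR"
# After installing into /etc/systemd/system:
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now myjob.timer
```

Because the job's output goes to the journal, there is no job-owned log file for logrotate to rename, and journald enforces its own size limits.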

8) Other quick space savers

8.1 Journald & APT caches

# Journald
sudo journalctl --disk-usage
sudo journalctl --vacuum-time=14d   # or: --vacuum-size=2G

# APT caches
sudo apt-get clean

8.2 Lower ext4 reserved blocks (advanced; frees a few GB)

DEV=$(df -P / | awk 'NR==2{print $1}')
sudo tune2fs -l "$DEV" | grep -Ei 'Reserved block percentage|Reserved block count'
# Reduce from 5% to 1% ONLY if you understand the trade-offs:
sudo tune2fs -m 1 "$DEV"
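For a sense of scale, the arithmetic for this 216 GB root volume is worked out below. reserved_gb is a hypothetical helper, and the result is a rough estimate, since the real reserve is a block count rather than an exact percentage of the df size:

```shell
#!/usr/bin/env bash
# reserved_gb SIZE_GB PERCENT -> reserved space in GB (two decimals)
reserved_gb() {
  awk -v size="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", size * pct / 100 }'
}

at_5=$(reserved_gb 216 5)     # default 5% reserve
at_1=$(reserved_gb 216 1)     # after tune2fs -m 1
awk -v a="$at_5" -v b="$at_1" 'BEGIN { printf "freed: %.2f GB\n", a - b }'
```

This prints freed: 8.64 GB (10.80 GB reserved at 5% versus 2.16 GB at 1%). The space shows up as extra Avail for non-root users; Used does not change.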

9) Incident checklist & one-liners

  • Confirm mounts are separate: findmnt, lsblk, df -h / /mount.
  • Hunt space on / only: du -xhd1 /{var,home,opt}.
  • Quantify deleted-open: sudo lsof -nP +L1 and sum sizes.
  • Reclaim: restart offending processes (preferred) or truncate log FDs.
  • Prevent: logrotate strategy, Python WatchedFileHandler, systemd timers.
  • Housekeeping: journalctl --vacuum-*, apt-get clean, consider tune2fs -m 1.

Appendix: Full command set

A. Mount verification

findmnt -o SOURCE,FSTYPE,SIZE,USED,AVAIL,TARGET /media/extradrive
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT,UUID,MODEL | sed -n '1p;/sdc/p'
df -h / /media/extradrive

B. Root-only space scan

sudo du -xhd1 / | sort -h
sudo du -xhd1 /var | sort -h
sudo du -xhd1 /home | sort -h
sudo du -xhd1 /opt  | sort -h

C. Deleted-open analysis

# Total deleted-open bytes
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'

# Top PIDs
sudo lsof -nP +L1 \
| awk '/deleted/ {s[$2]+=$7} END{for (p in s) printf "%-8s %10.2f GB\n", p, s[p]/1024/1024/1024}' \
| sort -k2,2nr | head -20

# Inspect one PID
PID=12345
ps -o pid,ppid,user,etime,cmd -p "$PID"
sudo lsof -nP -p "$PID" | awk '/deleted/ {print "  ", $4, $7/1024/1024 " MB", $9}'

D. Reclaim strategies

# Restart/terminate offenders (safest)
for pid in <PIDS>; do kill -TERM "$pid" 2>/dev/null || true; done
sleep 5
for pid in <PIDS>; do kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" || true; done

# Truncate only logs (no process restart)
# With +L1, lsof adds an NLINK column, so NAME is field 10 (not 9).
while read -r pid fd; do sudo truncate -s 0 "/proc/$pid/fd/$fd" || true; done < <(
  sudo lsof -nP +L1 | awk '/deleted/ && tolower($10) ~ /log/ { fd=$4; gsub(/[^0-9]/,"",fd); if (fd!="") printf "%s %s\n",$2,fd }' | sort -u
)

E. Housekeeping

sudo journalctl --disk-usage
sudo journalctl --vacuum-time=14d
sudo apt-get clean

DEV=$(df -P / | awk 'NR==2{print $1}')
sudo tune2fs -l "$DEV" | grep -Ei 'Reserved block percentage|Reserved block count'
# Optional:
# sudo tune2fs -m 1 "$DEV"

Note: Replace example PIDs and paths with those observed on your system. Test truncation on log files only; restart services for non-log deleted-open files such as libraries or data files.

 

From 86% to 53%: Eliminating rsyslog “deleted-open” bloat and hardening log rotation

Outcome: Root filesystem usage dropped from 86% to 53% after resolving a single culprit: rsyslogd holding huge, already-rotated log files open (/var/log/syslog.1 ~52 GB, /var/log/mail.log.1 ~16 GB). This post documents the fix and the safeguards to prevent recurrence.

TL;DR

  • Traced a df vs du mismatch to “deleted-open” files: rsyslogd (PID 890) held rotated logs open.
  • Freed ~67 GB instantly by stopping the offending process; root usage fell to 53%.
  • Hardened logrotate: added dateext, size guards, proper create/su, and a reliable postrotate that signals rsyslog to reopen files.
  • Reduced future log growth: compressed oversized .1 files, lowered UFW logging, and added optional rotation stanzas for kern.log/ufw.log.
  • Added quick verification commands for “deleted-open” detection and logrotate status.

Context

Before: / at 86% used; du totals didn’t match df → classic sign of “deleted-open” files.
Culprit: rsyslogd (PID 890) with file descriptors to very large, already-rotated logs (syslog.1, mail.log.1).
After: Stopped/reloaded rsyslog, freed space, and fixed rotation & logging strategy to avoid recurrence.

Root cause: “deleted-open” files held by rsyslog

# Confirm total “deleted-open” bytes
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'

# Identify top PIDs holding deleted files
sudo lsof -nP +L1 \
| awk '/deleted/ {s[$2]+=$7} END{for (p in s) printf "%-8s %10.2f GB\n", p, s[p]/1024/1024/1024}' \
| sort -k2,2nr | head -20

# Inspect the offender (example PID)
ps -fp 890
sudo lsof -nP -p 890 | awk '/deleted/ {printf "%-6s %10.2f MB  %s\n",$4,$7/1024/1024,$9}'

Large values on syslog.1/mail.log.1 confirmed rsyslog was writing to old, rotated files. Stopping/restarting rsyslog released the space immediately.

Fix: make rsyslog always reopen logs after rotation

Replace your /etc/logrotate.d/rsyslog with the following. It adds dateext (no reuse of .1), correct create/su, size thresholds, and a robust postrotate that works with systemd and legacy init.

/etc/logrotate.d/rsyslog

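# SYSLOG (larger threshold)
# Rotates daily, or sooner if the file exceeds 200M when logrotate runs,
# and signals rsyslog to reopen files. (A plain "size" directive placed
# after "daily" would override the daily schedule, hence maxsize.)
/var/log/syslog
{
    daily
    rotate 14
    maxsize 200M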
    missingok
    notifempty
    compress
    delaycompress
    dateext
    create 0640 syslog adm
    su root adm
    postrotate
        if command -v systemctl >/dev/null 2>&1; then
            systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
        else
            invoke-rc.d rsyslog rotate >/dev/null 2>&1 || service rsyslog rotate >/dev/null 2>&1 || true
        fi
    endscript
}

# MAIL log (moderate threshold)
/var/log/mail.log
{
    daily
    rotate 14
    maxsize 100M
    missingok
    notifempty
    compress
    delaycompress
    dateext
    create 0640 syslog adm
    su root adm
    postrotate
        if command -v systemctl >/dev/null 2>&1; then
            systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
        else
            invoke-rc.d rsyslog rotate >/dev/null 2>&1 || service rsyslog rotate >/dev/null 2>&1 || true
        fi
    endscript
}

# Other rsyslog-managed logs (grouped; one postrotate run)
/var/log/mail.info
/var/log/mail.warn
/var/log/mail.err
/var/log/daemon.log
/var/log/kern.log
/var/log/auth.log
/var/log/user.log
/var/log/lpr.log
/var/log/cron.log
/var/log/debug
/var/log/messages
{
    daily
    rotate 14
    maxsize 100M
    missingok
    notifempty
    compress
    delaycompress
    dateext
    create 0640 syslog adm
    su root adm
    sharedscripts
    postrotate
        if command -v systemctl >/dev/null 2>&1; then
            systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
        else
            invoke-rc.d rsyslog rotate >/dev/null 2>&1 || service rsyslog rotate >/dev/null 2>&1 || true
        fi
    endscript
}

Why this works: dateext avoids confusing reuse of .1; create/su ensure the new file is writable by syslog; postrotate guarantees rsyslog reopens files and drops references to the old ones.

Optional: make logrotate activity visible

On this host, logrotate runs via cron.daily (no logrotate.timer). If you want a dedicated log of each run, use a wrapper (do not schedule this alongside cron.daily on the same host).

/usr/local/sbin/run-logrotate-verbose.sh

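#!/usr/bin/env bash
set -euo pipefail
# -v only: adding -f would force every log to rotate on each run, ignoring the daily/size rules
logrotate -v /etc/logrotate.conf >> /var/log/logrotate.log 2>&1

Make it executable:

chmod +x /usr/local/sbin/run-logrotate-verbose.sh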

# root cron (choose either this OR cron.daily; not both)
17 3 * * * /usr/local/sbin/run-logrotate-verbose.sh

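Verification (dry run; -d changes nothing, and the output goes to a separate file so the wrapper's log is not overwritten):

sudo logrotate -dv /etc/logrotate.conf > /tmp/logrotate-debug.log 2>&1
sed -n '1,200p' /tmp/logrotate-debug.log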

Tame big log producers (kern/ufw)

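We found very large kern.log.1 (~467 MB) and ufw.log.1 (~415 MB). These were already rotated, so they are safe to compress immediately, provided no process still has them open (gzip replaces the file, so compressing one that is still held open would create yet another deleted-open file):

sudo lsof /var/log/kern.log.1 /var/log/ufw.log.1   # expect no output
sudo nice gzip -9 /var/log/kern.log.1 /var/log/ufw.log.1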

Reduce future noise from UFW:

ufw status verbose
ufw logging low    # or: off | medium | high | full

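Optional size guard for the UFW log. kern.log is already covered by the grouped stanza above, and listing a file twice makes logrotate report a duplicate log entry. If /etc/logrotate.d/ufw exists on your system, adjust that file instead of adding a second entry. Keep the postrotate signal so rsyslog does not go on writing to the rotated file:

/var/log/ufw.log
{
    daily
    rotate 14
    maxsize 100M
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
    endscript
}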

Find “top talkers” in syslog (root cause)

Identify which programs are filling syslog so you can tune their verbosity or add rate limits:

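# Assumes the traditional syslog line format ("Mmm dd hh:mm:ss host tag[pid]: ..."),
# where the program tag is field 5; with ISO 8601 timestamps it is field 3.
zcat -f /var/log/syslog* \
| awk '{p=$5; sub(/(\[[0-9]+\])?:$/,"",p); c[p]++} END{for(k in c) printf "%9d %s\n", c[k], k}' \
| sort -nr | head -30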

Quick sanity checks

# Expect ~0 GB after fixes
sudo lsof -nP +L1 | awk '/deleted/ {s+=$7} END{printf "Remaining deleted-open: %.2f GB\n", s/1024/1024/1024}'

# Logrotate status and recent actions
sed -n '1,120p' /var/lib/logrotate/status
logrotate -dv /etc/logrotate.conf | sed -n '1,120p'

# Monitor current log sizes
sudo find /var/log -maxdepth 2 -type f -name "*.log" -printf "%s %p\n" \
| sort -nr | head -20 | numfmt --to=iec

Recommended structural improvement: systemd services/timers

Long-running shell/Python jobs under cron often cause logging quirks. Prefer systemd services + timers:

  • Clean lifecycle (start/stop/reload) and ExecReload=/bin/kill -HUP $MAINPID for log reopen.
  • Journald integration (or forward to rsyslog) without “deleted-open” risks.
  • Predictable restart/backoff and resource controls.

Conclusion

Root space recovered, cause removed. The revised logrotate configuration (with dateext, size thresholds, and reliable postrotate) plus UFW noise reduction and on-demand compression make log growth predictable and safe. Keep the verification one-liners handy, and consider migrating busy cron workers to systemd for robust logging semantics.
