Tech/HowTo/Full Filesystem


What to do when your filesystem runs out of space

Keep calm and carry on.  :P

Background

On many systems there are services, scripts, and other tooling that have no method of cleaning up after themselves. The system may also be logging a sudden increase in activity, and the daily cleanup job cannot keep up.

Common Causes
  • Log files
  • User data
  • Automated Backups
  • Core Dumps

Options

Once you understand which users and services are impacted, and have communicated with them properly, the work to remedy the situation can begin.

Access

Connecting to a Linux system (or many other systems) can be difficult when there is no storage space, because the act of connecting often requires writing to a log file. Plan ahead by configuring an out-of-band management tool, and test it by making the filesystem read-only.

Common Options
  • Serial Port
  • Telnet
  • SSH
With SSH, bypass the profile and other startup scripts that can block the login:
ssh username@hostname /usr/bin/bash --noprofile --norc
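Before rehearsing a read-only scenario it helps to know how to confirm the current mount state. A minimal sketch, assuming the standard /proc/mounts format (mountpoint in field 2, options in field 4); the remount command is shown only as a comment because it should be run on a disposable test machine:

```shell
# Print whether / is currently mounted read-write (rw) or read-only (ro).
# The last /proc/mounts entry for "/" is the active one.
awk '$2 == "/" { last = $4 } END { split(last, o, ","); print o[1] }' /proc/mounts

# On a disposable test VM, simulate a wedged system with:
#   mount -o remount,ro /
```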

What can be deleted?

It is best to have a playbook of common things that can be deleted to free up space on systems.

Common Options
  • Package Cache /var/cache/apt/archives
  • Log files over X years/months old
  • User Cache /home/username/.cache/pip and/or /home/username/.cache/go
  • Old Core dumps
  • Old Kernels uname -a and ls -lha /boot/
  • Extra package documentation in /usr/share/doc/, which can easily be reinstalled later.
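The "log files over X years/months old" entry above can be sketched with find. The path, file pattern, and 180-day threshold here are examples, not recommendations; the sketch uses a scratch directory so it can be run safely anywhere:

```shell
# Sketch: remove rotated, compressed logs older than 180 days.
# On a real system LOGDIR would be /var/log; a scratch dir is used here.
LOGDIR=$(mktemp -d)
touch -d '2020-01-01' "$LOGDIR/syslog.1.gz"      # an old rotated log
touch "$LOGDIR/syslog.0.gz"                      # a fresh one
find "$LOGDIR" -name '*.gz' -mtime +180 -print   # review first...
find "$LOGDIR" -name '*.gz' -mtime +180 -delete  # ...then delete
ls "$LOGDIR"                                     # only syslog.0.gz remains
rm -r "$LOGDIR"
```

Always run with -print and review the list before swapping in -delete.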

Managing Large Log Files

zcat, zless, zmore, zcmp, and zgrep are your friends.
  • Compress smaller files first to free up space, then compress larger files.
  • Understand that some special files present as sizes that are not used on disk. (sparse)
  • Ensure that playbooks explain sparse files. https://en.wikipedia.org/wiki/Sparse_file
  • Identify easy wins in playbooks like "popularity-contest"
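A quick way to see the sparse-file distinction from the list above: compare apparent size (ls) against blocks actually allocated on disk (du). truncate and GNU du's --apparent-size option are both in coreutils:

```shell
# Create a 100 MB sparse file: large apparent size, near-zero disk usage.
SPARSE=$(mktemp)
truncate -s 100M "$SPARSE"
ls -lh "$SPARSE"                  # apparent size: 100M
du -h "$SPARSE"                   # allocated blocks: approximately 0
du -h --apparent-size "$SPARSE"   # 100M again
rm "$SPARSE"
```

This is why du and ls can disagree wildly, and why a playbook needs to explain which number matters when freeing space.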

Helpful Tools

Show usage of the current directory in human readable units.
du -hs .
List files showing the ten newest.
ls -lhart | tail -n 10
List files showing the ten largest.
ls -lharS | tail -n 10
Show the size on disk of the systemd journal
journalctl --disk-usage
Remove systemd journal logs older than 30 days.
journalctl --vacuum-time=30d
Shrink the systemd journal until it uses less than a specific size.
journalctl --vacuum-size=2G
Remove systemd journal logs older than 1 day for a specific unit/service.
journalctl --vacuum-time=1d --unit=your.service
Show the size of mailboxes on the system
du -hs /var/spool/mail

Advanced methods

Check recent logins
last -n 10
Show process tree to look for hints
ps faxu
Show the human readable size of user home directories
du -hs /home/*
Look for forgotten source code and files.
ls /usr/src/
Investigate backups
du -hs /var/backups/
Investigate mistaken mounts

In some cases, if a mount has not completed correctly, writes intended for the mounted filesystem land in the mount-point directory on the underlying root filesystem instead.

ls /mnt/
du -hs /mnt/*
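A script that writes to a mount (backups, for example) can guard against this by verifying the target really is an active mountpoint first. mountpoint is part of util-linux; the /mnt/backup path in the comment is a placeholder:

```shell
# Refuse to write unless the target directory is an active mountpoint.
# "/" is used here so the example succeeds anywhere; a real backup
# script would check its actual target, e.g. /mnt/backup.
TARGET=/
if mountpoint -q "$TARGET"; then
    echo "$TARGET is mounted, safe to write"
else
    echo "$TARGET is not a mountpoint, refusing to write" >&2
fi
```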


Planning

The most advanced method is to plan for this to happen before it does.

  1. Apply user quotas and log rotation that can handle massive growth.
  2. Create a dummy file in /root/ named DELETEME.BIG of an acceptable size.
  3. Monitor the system and alert on a threshold.
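Step 2 above can be sketched with dd, which fully allocates the blocks (unlike a sparse file), so deleting the file later frees real space. The 64 MB size is an example; on a real system the file would live in /root/:

```shell
# Create an emergency-headroom file. A scratch directory is used here
# so the sketch runs anywhere; on a real system use /root/DELETEME.BIG.
cd "$(mktemp -d)"
dd if=/dev/zero of=DELETEME.BIG bs=1M count=64 status=none
ls -lh DELETEME.BIG
# In an emergency: rm /root/DELETEME.BIG frees this space instantly.
```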

Auditing

Does a system or service have a defined storage requirement? If not, start there. Don't just build excess space into everything. Speak to your teams, present the various methods of log file and general data cleanup, and require everything to have data retention documentation.