Tech/HowTo/Full Filesystem


What to do when your filesystem runs out of space

Keep calm and carry on.  :P

Background

On many systems there are services, scripts, and other tooling that have no method of cleaning up after themselves. The system may also be logging a sudden increase in activity, and the daily cleanup job cannot keep up.

Common Causes
  • Log files
  • User data
  • Automated Backups
  • Core Dumps

Options

Once you understand which users and services are impacted, and have communicated with them properly, the work to remedy the situation can begin.

Access

Connecting to a Linux system (or many other systems) can be difficult when there is no storage space, because the act of connecting often requires writing to a log file. Plan ahead by configuring an out-of-band management tool, and test it by making the filesystem read-only.

Common Options
  • Serial Port
  • Telnet
  • SSH
With SSH, bypass the profile and other startup scripts that can block the login:
ssh username@hostname /usr/bin/bash --noprofile --norc
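Before rehearsing a read-only scenario it helps to know how to confirm the current mount state. A minimal sketch, assuming the standard /proc/mounts format (mountpoint in field 2, options in field 4); the remount command is shown only as a comment because it should be run on a disposable test machine:

```shell
# Print whether / is currently mounted read-write (rw) or read-only (ro).
# The last /proc/mounts entry for "/" is the active one.
awk '$2 == "/" { last = $4 } END { split(last, o, ","); print o[1] }' /proc/mounts

# On a disposable test VM, simulate a wedged system with:
#   mount -o remount,ro /
```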

What can be deleted?

It is best to have a playbook of common things that can be deleted to free up space on systems.

Common Options
  • Package Cache /var/cache/apt/archives
  • Log files over X years/months old
  • User Cache /home/username/.cache/pip and/or /home/username/.cache/go
  • Old Core dumps
  • Old Kernels uname -a and ls -lha /boot/
  • Extra package documentation in /usr/share/doc/, which can easily be reinstalled later.
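The "log files over X years/months old" entry above can be sketched with find. The path, file pattern, and 180-day threshold here are examples, not recommendations; the sketch uses a scratch directory so it can be run safely anywhere:

```shell
# Sketch: remove rotated, compressed logs older than 180 days.
# On a real system LOGDIR would be /var/log; a scratch dir is used here.
LOGDIR=$(mktemp -d)
touch -d '2020-01-01' "$LOGDIR/syslog.1.gz"      # an old rotated log
touch "$LOGDIR/syslog.0.gz"                      # a fresh one
find "$LOGDIR" -name '*.gz' -mtime +180 -print   # review first...
find "$LOGDIR" -name '*.gz' -mtime +180 -delete  # ...then delete
ls "$LOGDIR"                                     # only syslog.0.gz remains
rm -r "$LOGDIR"
```

Always run with -print and review the list before swapping in -delete.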

Managing Large Log Files

zcat, zless, zmore, zcmp, and zgrep are your friends.
  • Compress smaller files first to free up space, then compress larger files.
  • Understand that some special files present as sizes that are not used on disk. (sparse)
  • Ensure that playbooks explain sparse files. https://en.wikipedia.org/wiki/Sparse_file
  • Identify easy wins in playbooks like "popularity-contest"
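A quick way to see the sparse-file distinction from the list above: compare apparent size (ls) against blocks actually allocated on disk (du). truncate and GNU du's --apparent-size option are both in coreutils:

```shell
# Create a 100 MB sparse file: large apparent size, near-zero disk usage.
SPARSE=$(mktemp)
truncate -s 100M "$SPARSE"
ls -lh "$SPARSE"                  # apparent size: 100M
du -h "$SPARSE"                   # allocated blocks: approximately 0
du -h --apparent-size "$SPARSE"   # 100M again
rm "$SPARSE"
```

This is why du and ls can disagree wildly, and why a playbook needs to explain which number matters when freeing space.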

Helpful Tools

Show usage of the current directory in human readable units.
du -hs .
List files showing the ten newest.
ls -lhart | tail -n 10
List files showing the ten largest.
ls -lharS | tail -n 10
Show the size on disk of the systemd journal
journalctl --disk-usage
Remove systemd journal logs older than 30 days.
journalctl --vacuum-time=30d
Shrink the systemd journal until it uses less than a specific size.
journalctl --vacuum-size=2G
Remove systemd journal logs older than 1 day for a specific unit/service.
journalctl --vacuum-time=1d --unit=your.service
Show the size of mailboxes on the system
du -hs /var/spool/mail

Advanced methods

Check recent logins
last -n 10
Show process tree to look for hints
ps faxu
Show the human readable size of user home directories
du -hs /home/*
Look for forgotten source code and files.
ls /usr/src/
Investigate backups
du -hs /var/backups/
Investigate mistaken mounts

In some cases, if a mount has not completed correctly, writes intended for the mounted filesystem land in the mount-point directory on the underlying root filesystem instead.

ls /mnt/
du -hs /mnt/*
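A script that writes to a mount (backups, for example) can guard against this by verifying the target really is an active mountpoint first. mountpoint is part of util-linux; the /mnt/backup path in the comment is a placeholder:

```shell
# Refuse to write unless the target directory is an active mountpoint.
# "/" is used here so the example succeeds anywhere; a real backup
# script would check its actual target, e.g. /mnt/backup.
TARGET=/
if mountpoint -q "$TARGET"; then
    echo "$TARGET is mounted, safe to write"
else
    echo "$TARGET is not a mountpoint, refusing to write" >&2
fi
```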


Planning

The most advanced method is to plan for this to happen before it does.

  1. Apply user quotas and log rotation that can handle massive growth.
  2. Create a dummy file in /root/ named DELETEME.BIG of an acceptable size.
  3. Monitor the system and alert on a threshold.
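Step 2 above can be sketched with dd, which fully allocates the blocks (unlike a sparse file), so deleting the file later frees real space. The 64 MB size is an example; on a real system the file would live in /root/:

```shell
# Create an emergency-headroom file. A scratch directory is used here
# so the sketch runs anywhere; on a real system use /root/DELETEME.BIG.
cd "$(mktemp -d)"
dd if=/dev/zero of=DELETEME.BIG bs=1M count=64 status=none
ls -lh DELETEME.BIG
# In an emergency: rm /root/DELETEME.BIG frees this space instantly.
```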

Auditing

Does a system or service have a defined storage requirement? If not, start there. Don't just build excess space into everything. Speak to your teams, present the various methods of log file and general data cleanup, and require everything to have data retention documentation.