Preserving dhcpd leases across reinstalls

2019-01-14 - Progress - Tony Finch

(This is an addendum to December's upgrade notes.)

I have upgraded the IP Register DHCP servers twice this year. In February they were upgraded from Ubuntu 12.04 LTS to 14.04 LTS, to cope with 12.04's end of life, and to merge their setup into the main ipreg git repository (which is why the target version was so old). So their setup was fairly tidy before the Debian 9 upgrade.

Statefulness

Unlike most of the IP Register systems, the DHCP servers are stateful. Their dhcpd.leases files must be preserved across reinstalls. The leases file is a database (in the form of a flat text file in ISC DHCP config file format) which closely matches the state of the network.

If it is lost, the server no longer knows about IP addresses in use by existing clients, so it can issue duplicate addresses to new clients, and hilarity will ensue!

So, just before rebuilding a server, I have to stop the dhcpd and take a copy of the leases file. And before the dhcpd is restarted, I have to copy the leases file back into place.
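
The copy-out and copy-back steps might be sketched as follows. This is a minimal Python sketch, not the procedure actually used on the IP Register servers: the function names and the backup location are hypothetical, and it assumes the dhcpd has already been stopped so the leases file is not changing underneath the copy.

```python
import shutil
from pathlib import Path


def save_leases(leases: Path, backup: Path) -> Path:
    """Copy the leases file somewhere that survives the reinstall.

    Assumes dhcpd is already stopped, so the file is no longer changing.
    """
    if not leases.is_file():
        raise FileNotFoundError(f"no leases file at {leases}")
    backup.parent.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the modification time along with the contents;
    # it does not preserve ownership, which may need fixing separately.
    shutil.copy2(leases, backup)
    return backup


def restore_leases(backup: Path, leases: Path) -> Path:
    """Put the saved leases file back before dhcpd is started again."""
    if leases.exists():
        # Refuse to overwrite a leases file that may already be in use.
        raise FileExistsError(f"{leases} already exists")
    leases.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup, leases)
    return leases
```

Refusing to overwrite an existing file in restore_leases is a deliberate choice in this sketch: if a leases file is already present, something has gone wrong with the ordering, and it is safer to stop than to guess.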

This isn't something that happens very often, so I have not automated it yet.

Bad solutions

In February, I hacked around with the Ansible playbook to ensure the dhcpd was not started before I copied the leases file into place. This is an appallingly error-prone approach.

Yesterday, I turned that basic idea into an Ansible variable that controls whether the dhcpd is enabled. This avoids mistakes when fiddling with the playbook, but it is easy to forget to set the variable.

Better solution

This morning I realised a much neater way is to disable the entire dhcpd role if the leases file doesn't exist. This prevents the role from starting the dhcpd on a newly reinstalled server before the old leases file is in place. Once the file is in place the check always passes, so it has no effect on a server that is up and running.
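
The guard amounts to a single existence check made before anything else in the role runs. A minimal Python sketch of the decision logic follows; the real check lives in the Ansible setup, which is not shown here, and the function name and the Debian-style default path are assumptions for illustration.

```python
from pathlib import Path

# Assumed location: where Debian's isc-dhcp-server package keeps leases.
DEFAULT_LEASES = Path("/var/lib/dhcp/dhcpd.leases")


def dhcpd_role_should_run(leases: Path = DEFAULT_LEASES) -> bool:
    """Return True only if an existing leases file is in place.

    On a freshly reinstalled server the file is absent, so the role is
    skipped and dhcpd is never started with an empty lease database.
    Once the old file has been copied back, this returns True.
    """
    return leases.is_file()
```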

This is a lot less error-prone. The only thing the admin needs to know is that dhcpd.leases must be preserved...

Further improvements

The other pitfall in my setup is that monit will restart dhcpd if it is not running, so it isn't easy to stop it properly.

My dhcpd_enabled Ansible variable takes care of this, but I think it would be better to make a special shutdown playbook, which can also take a copy of the leases file.
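
Such a shutdown procedure might look like the following Python sketch. It is not an existing playbook: the service names (dhcpd under monit, isc-dhcp-server under systemd), the paths passed in, and the run hook are all assumptions for illustration. The point is the ordering: tell monit to stop watching first, then stop the dhcpd, then copy the leases file.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_command(argv: Sequence[str]) -> None:
    """Run one command, raising CalledProcessError if it fails."""
    subprocess.run(list(argv), check=True)


def shutdown_dhcpd(
    leases: Path,
    backup: Path,
    run: Callable[[Sequence[str]], None] = run_command,
) -> Path:
    """Stop dhcpd for a rebuild and take a copy of its leases file."""
    # 1. Stop monit from watching the service, or it will restart dhcpd.
    run(["monit", "unmonitor", "dhcpd"])  # monit service name is assumed
    # 2. Stop the daemon so the leases file is no longer being written.
    run(["systemctl", "stop", "isc-dhcp-server"])  # unit name is assumed
    # 3. Only now is it safe to take the copy.
    backup.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(leases, backup)
    return backup
```

Passing run as a parameter keeps the sketch testable without touching a real server: a stand-in function can record the commands instead of executing them.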