KSK rollover project status

2019-02-07 - Progress- Future - Tony Finch

I have spent the last week working on DNSSEC key rollover automation in BIND. Or rather, I have been doing some cleanup and prep work. With reference to the work I listed in the previous article...

Done

  • Stop BIND from generating SHA-1 DS and CDS records by default, per RFC 8624

  • Teach dnssec-checkds about CDS and CDNSKEY

Started

  • Teach superglue to use CDS/CDNSKEY records, with similar logic to dnssec-checkds

The "similar logic" is implemented in dnssec-dsfromkey, so I don't actually have to write the code more than once. I hope this will also be useful for other people writing similar tools!

Some of my small cleanup patches have been merged into BIND. We are currently near the end of the 9.13 development cycle, so this work is going to remain out of tree for a while until after the 9.14 stable branch is created and the 9.15 development cycle starts.

Next

So now I need to get to grips with dnssec-coverage and dnssec-keymgr.

Simple safety interlocks

The purpose of the dnssec-checkds improvements is so that it can be used as a safety check.

During a KSK rollover, there are one or two points when the DS records in the parent need to be updated. The rollover must not continue until this update has been confirmed, or the delegation can be broken.

I am using CDS and CDNSKEY records as the signal from the key management and zone signing machinery for when DS records need to change. (There's a shell-style API in dnssec-dsfromkey -p, but that is implemented by just reading these sync records, not by looking into the guts of the key management data.) I am going to call them "sync records" so I don't have to keep writing "CDS/CDNSKEY"; "sync" is also the keyword used by dnssec-settime for controlling these records.

Key timing in BIND

The dnssec-keygen and dnssec-settime commands (which are used by dnssec-keymgr) schedule when changes to a key will happen.

There are parameters related to adding a key: when it is published in the zone, when it becomes actively used for signing, etc. And there are parameters related to removing a key: when it becomes inactive for signing, when it is deleted from the zone.

There are also timing parameters for publishing and deleting sync records. These sync times are the only timing parameters that say when we must update the delegation.

What can break?

The point of the safety interlock is to prevent any breaking key changes from being scheduled until after a delegation change has been confirmed. So what key timing events need to be forbidden from being scheduled after a sync timing event?

Events related to removing a key are particularly dangerous. There are some cases where it is OK to remove a key prematurely, if the DS record change is also about removing that key, and there is another working key and DS record throughout. But it seems simpler and safer to forbid all removal-related events from being scheduled after a sync event.

However, events related to adding a key can also lead to nonsense. If we blindly schedule creation of new keys in advance, without verifying that they are also being properly removed, then the zone can accumulate a ridiculous number of DNSKEY records. This has been observed in the wild surprisingly frequently.

A simple rule

There must be no KSK changes of any kind scheduled after the next sync event.

This rule applies regardless of the flavour of rollover (double DS, double KSK, algorithm rollover, etc.)

Applying this rule to BIND

Whereas for ZSKs, dnssec-coverage ensures rollovers are planned for some fixed period into the future, for KSKs, it must check correctness up to the next sync event, then ensure nothing will occur after that point.

In dnssec-keymgr, the logic should be:

  • If the current time is before the next sync event, ensure there is key coverage until that time and no further.

  • If the current time is after all KSK events, use dnssec-checkds to verify the delegation is in sync.

  • If dnssec-checkds reports an inconsistency and we are within some sync interval dictated by the rollover policy, do nothing while we wait for the delegation update automation to work.

  • If dnssec-checkds reports an inconsistency and the sync interval has passed, report an error because operator intervention is required to fix the failed automation.

  • If dnssec-checkds reports everything is in sync, schedule keys up to the next sync event. The timing needs to be relative to this point in time, since any delegation update delays can make it unsafe to schedule relative to the last sync event.

Caveat

At the moment I am still not familiar with the internals of dnssec-coverage and dnssec-keymgr so there's a risk that I might have to re-think these plans. But I expect this simple safety rule will be a solid anchor that can be applied to most DNSSEC key management scenarios. (However I have not thought hard enough about recovery from breakage or compromise.)