CVE-2024-1737 - RRset limits in zones
The ISC have made a breaking change to BIND in version 9.20.0.
We've updated the stealth secondary sample.named.conf with the
required change; otherwise, if you receive "too many records" errors
you can add

    max-records-per-type 0;

to misbehaving zones.

Disabling the limit this way should not undermine the security fix:
a stealth secondary should not have too many clients, and there are
only a few names with a large number of associated RRs.
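For illustration, a secondary zone clause with the limit disabled
might look like the following. This is a sketch, not a copy of the
sample configuration; the zone name and primary server address are
placeholders.

    zone "example.cam.ac.uk" {
        type secondary;
        primaries { 192.0.2.1; };
        file "example.cam.ac.uk";
        max-records-per-type 0;
    };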
How about using 1Password with Ansible?
2021-11-23 - Progress - Tony Finch
I have been looking at how to use the 1Password op command-line tool
with Ansible. It works fairly nicely.
You need to install the 1Password command-line tool. You need a
recent enough Ansible with the community.general collection
installed, so that it includes the onepassword lookup plugin.
To try out an example, create an op.yml file containing:

    ---
    - hosts: localhost
      tasks:
        - name: test 1password
          debug:
            msg: <{{ lookup("onepassword", "Mythic Beasts", field="username") }}>
You might need to choose an item other than Mythic Beasts if you don't have a login with them.
Initialize op
and start a login session by typing:
eval $(op signin)
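The eval is needed because op signin prints a shell export command
for the session key. With the version of the CLI current when this
was written, the output looks something like this (the token here is
a placeholder):

    export OP_SESSION_example="<session token>"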
Then see if Ansible works:
ansible-playbook op.yml
Amongst the Ansible verbiage, I get the output:
ok: [localhost] => { "msg": "<hostmaster@cam.ac.uk>" }
Some more detailed notes follow...
aims
I want it to be easy to keep secrets encrypted when they are not in use. Things like ssh private keys, static API credentials, etc. "Not in use" means when not installed on the system that needs them.
In particular, secrets should normally be encrypted on any systems on which we run Ansible, and decrypted only when they need to be deployed.
And it should be easy enough that everyone on the team is able to use it.
what about regpg?
I wrote regpg to tackle this problem in a way I consider to be safe. It is modestly successful: people other than me use it, in more places than just Cambridge University Information Services.
But it was not the right tool for the Network Systems team in which I
work. It isn't possible for a simple wrapper like regpg to fix gpg's
usability issues: in particular, it's horrible if you don't have a
unix desktop, and it's horrible over ssh.
1password
Since I wrote regpg
we have got 1Password set up for the team. I
have used 1Password for my personal webby login things for years, and
I'm happy to use it at work too.
There are a couple of ways to use 1Password for ops automation...
secrets automation and 1password connect
First I looked at the relatively new support for "secrets automation" with 1Password. It is based around a 1Password Connect server, which we would install on site. This can provide an application with short-term access to credentials on demand via a REST API. (Sounds similar to other cloudy credential servers such as Hashicorp Vault or AWS IAM.)
However, the 1Password Connect server needs credentials to get access to our vaults, and our applications that use 1Password Connect need API access tokens. And we need some way to deploy these secrets safely. So we're back to square 1.
1password command line tool
The op
command has
basically the same functionality as 1Password's GUIs. It has a similar
login model, in that you type in your passphrase to unlock the vault,
and it automatically re-locks after an inactivity timeout. (This is
also similar to the way regpg
relies on the gpg
agent to cache
credentials so that an Ansible run can deploy lots of secrets with
only one password prompt.)
So op
is clearly the way to go, though there are a few niggles:
- The op configuration file contains details of the vaults it has
  been told about, including your 1Password account secret key in
  cleartext. So the configuration file is sensitive and should be
  kept safe. (It would be better if op stored the account secret key
  encrypted using the user's password.)

- op signin uses an environment variable to store the session key,
  which is not ideal because it is easy to accidentally leak the
  contents of environment variables. It isn't obvious that a
  collection of complicated Ansible playbooks can be trusted to
  handle environment variables carefully.

- It sometimes requires passing secrets on the command line, which
  exposes them to all users on the system. For instance, the
  documented way to find out whether a session has timed out is with
  a command line like:

        $ op signin --session $OP_SESSION_example example
I have reported these issues to the 1Password developers.
Ansible and op
Ansible's community.general
collection includes
some handy wrappers around the op
command, in particular the
onepassword lookup plugin. (I am not so keen on the others
because the documentation suggests to me that they do potentially
unsafe things with Ansible variables.)
One of the problems I had with regpg
was bad behaviour that occurred
when an Ansible playbook was started when the gpg agent wasn't ready;
the fix was to add a task to the start of the Ansible playbook which
polls the gpg agent in a more controlled manner.
I think a similar preflight task might be helpful for op:

- check if there is an existing op session; if not, prompt for a
  passphrase to start a session

- set up a wrapper command for op that gets the session key from a
  more sensible place than the environment
To refresh a session safely, and work around the safety issue with op
signin mentioned above, we can test the session using a benign
command such as op list vaults or op get account, and run op signin
if that fails.
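As a rough sketch (assuming the v1 CLI subcommands mentioned above),
the preflight check could be as simple as:

    # start a new session only if the current one has expired
    if ! op get account >/dev/null 2>&1
    then
        eval $(op signin)
    fi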
The wrapper script can be as simple as:
    #!/bin/sh
    OP_SESSION_example=SQUEAMISHOSSIFRAGE /usr/local/bin/op "$@"
Assuming there is somewhere sensible and writable on $PATH...
Managed Zone Service CNAME relaxation
2020-06-03 - News - Tony Finch
The MZS is our service for registering non-cam.ac.uk
domains.
Underscores are now allowed in the names and targets of CNAME records so that they can be used for non-hostname purposes.
Stealth secondaries and Cisco Jabber
2020-04-30 - News - Tony Finch
The news part of this item is that I've updated the stealth secondary documentation with a warning about configuring servers (or not configuring them) with secondary zones that aren't mentioned in the sample configuration files.
One exception to that is the special Cisco Jabber zones supported by the phone service. There is now a link from our stealth secondary DNS documentation to the Cisco Jabber documentation, but there are tricky requirements and caveats, so you need to take care.
The rest of this item is the story of how we discovered the need for these warnings.
Release announcement: nsdiff-1.79
2020-04-27 - News - Tony Finch
I have released a new version of nsdiff.
This release removes TYPE65534 records from the list of
DNSSEC-related types that nsdiff
ignores.
TYPE65534 is the private type that BIND uses to keep track of incremental signing. These records usually end up hanging around after signing is complete, cluttering up the zone. It would be neater if they were removed automatically.
In fact, it's safe to try to remove them using DNS UPDATE: if the
records can be removed (because signing is complete), they will be;
if they can't be removed then they are quietly left in place, and the
rest of the update is applied.
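By hand, the equivalent DNS UPDATE might look like this with BIND's
nsupdate (an illustrative sketch; the server and zone names are
placeholders):

    nsupdate <<'EOF'
    server ns0.example.org
    zone example.org
    update delete example.org TYPE65534
    send
    EOF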
After this change you can clean away TYPE65534 records using nsdiff
or nsvi. In our deployment, nspatch runs hourly and will now
automatically clean TYPE65534 records when they are not needed.
Firefox and DNS-over-HTTPS
2020-02-27 - News - Tony Finch
The latest release of Firefox enables DoH (encrypted DNS-over-HTTPS) by default for users in the USA, with DNS provided by Cloudflare. This has triggered some discussion and questions, so here's a reminder of what we have done with DoH.
DNS server resilience and network outages
2020-02-18 - News - Tony Finch
Our recursive DNS servers are set up to be fairly resilient. Each server is physical hardware, so that they only need power and networking in order for the DNS to work, avoiding hidden circular dependencies.
We use keepalived to determine which of the physical servers is in live service. It does two things for us:
We can move live service from one server to another with minimal disruption, so we can patch and upgrade servers without downtime.
The live DNS service can recover automatically from things like server hardware failure or power failure.
This note is about coping with network outages, which are more difficult.
SHA-1 and DNSSEC validation
2020-02-14 - News - Tony Finch
This is a follow-up to my article last month about SHA-1 chosen-prefix collisions and DNSSEC.
Summary
DNSSEC validators should continue to treat SHA-1 signatures as secure until DNSSEC signers have had enough time to perform algorithm rollovers and eliminate SHA-1 from the vast majority of signed zones.
Review of 2019
2020-01-29 - Progress - Tony Finch
Some notes looking back on what happened last year...
Managed Zone Service improvements
2020-01-24 - News - Tony Finch
The MZS is our service for registering non-cam.ac.uk
domains.
The web user interface for the MZS has moved to https://mzs.dns.cam.ac.uk/; the old names redirect to the new place.
You can now manage TXT records in your zones in the MZS.
The expiry date of each zone (when its 5 year billing period is up) is now tracked in the MZS database and is visible in the web user interface.
DNSSEC algorithm rollover HOWTO
2020-01-15 - Progress - Tony Finch
Here are some notes on how to upgrade a zone's DNSSEC algorithm using BIND. These are mainly written for colleagues in the Faculty of Maths and the Computer Lab, but they may be of interest to others.
I'll use botolph.cam.ac.uk
as the example zone. I'll assume the
rollover is from algorithm 5 (RSASHA1) to algorithm 13
(ECDSA-P256-SHA-256).
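For a flavour of what is involved (a sketch, not a step-by-step
extract from the HOWTO), generating the replacement algorithm 13 key
pair with BIND's tools looks something like:

    dnssec-keygen -a ECDSAP256SHA256 -f KSK botolph.cam.ac.uk
    dnssec-keygen -a ECDSAP256SHA256 botolph.cam.ac.uk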
SHA-1 chosen prefix collisions and DNSSEC
2020-01-09 - News - Tony Finch
Thanks to Viktor Dukhovni for helpful discussions about some of the details that went in to this post.
On the 7th January, a new more flexible and efficient collision attack against SHA-1 was announced: SHA-1 is a shambles. SHA-1 is deprecated but still used in DNSSEC, and this collision attack means that some attacks against DNSSEC are now merely logistically challenging rather than being cryptographically infeasible.
As a consequence, anyone who is using a SHA-1 DNSKEY algorithm (algorithm numbers 7 or less) should upgrade. The recommended algorithms are 13 (ECDSAP256SHA256) or 8 (RSASHA256, with 2048 bit keys).
Update: I have written a follow-up note about SHA-1 and DNSSEC validation
SHA-1 is a shambles
2020-01-07 - News - Tony Finch
Happy new (calendar) year!
Our previous news item on DNS delegation updates explained that we are changing the DNSSEC signature algorithm on all UIS zones from RSA-SHA-1 to ECDSA-P256-SHA-256. Among the reasons I gave was that SHA-1 is rather broken.
SHAmbles
Today I learned that SHA-1 is a shambles: a second SHA-1 collision has been constructed, so it is now more accurate to say that SHA-1 is extremely broken.
The new "SHAmbles" collision is vastly more affordable than the 2017 "SHAttered" collision and makes it easier to construct practical attacks.
DNSSEC implications
As well as the UIS zones (which are now mostly off RSA-SHA-1), Maths and the Computer Lab have a number of zones signed with RSA-SHA-1. These should also be upgraded to a safer algorithm. I will be contacting the relevant people directly to co-ordinate this change.
I have written some more detailed notes on the wider implications of SHA-1 chosen prefix collisions and DNSSEC.
DNS delegation updates
2019-12-18 - News - Tony Finch
Season's greetings! I bring tidings of great joy! A number of long term DNS projects have reached a point where some big items can be struck off the to-do list.
This note starts with two action items for those for whom we provide
secondary DNS. Then, a warning for those who secondary our zones,
including stealth secondaries.
A WebDriver tutorial
2019-12-12 - Progress - Tony Finch
As part of my work on superglue I have resumed work on the WebDriver scripts I started in January. And, predictably because they were a barely working mess, it took me a while to remember how to get them working again.
So I thought it might be worth writing a little tutorial describing how I am using WebDriver. These notes have nothing to do with my scripts or the DNS; it's just about the logistics of scripting a web site.
Jackdaw Apache upgrade
2019-11-21 - News - Tony Finch
This afternoon our colleagues who run Jackdaw will upgrade the web server software. (The IP Register database is an application hosted on Jackdaw, alongside the user-admin database and a few others.) This will entail a brief outage for the IP Register web user interface. The DNS will not be affected.
Make before break
2019-11-18 - Progress - Tony Finch
This afternoon I did a tricky series of reconfigurations. The immediate need was to do some prep work for improving our DNS blocks; I also wanted to make some progress towards completing the renaming/renumbering project that has been on the back burner for most of this year; and I wanted to fix a bad quick-and-dirty hack I made in the past.
Along the way I think I became convinced there's an opportunity for a significant improvement.
YAML and Markdown
2019-11-13 - Progress - Tony Finch
This web site is built with a static site generator. Each page on the site has a source file written in Markdown. Various bits of metadata (sidebar links, title variations, blog tags) are set in a bit of YAML front-matter in each file.
Both YAML and Markdown are terrible in several ways.
YAML is ridiculously over-complicated and its minimal syntax can hide minor syntax errors turning them into semantic errors. (A classic example is a list of two-letter country codes, in which Norway (NO) is transmogrified into False.)
Markdown is poorly defined, and has a number of awkward edge cases where its vagueness causes gotchas. It has spawned several dialects to fill in some of its inadequacies, which causes compatibility problems.
However, they are both extremely popular and relatively pleasant to write and read.
For this web site, I have found that a couple of simple sanity checks are really helpful for avoiding cockups.
YAML documents
One of YAML's peculiarities is its idea of storing multiple documents in a stream.
A YAML document consists of a ---
followed by a YAML value. You can
have multiple documents in a file, like these two:
    ---
    document: one
    ---
    document: two
YAML values don't have to be key/value maps: they can also be simple strings. So you can also have a two-document file like:
    ---
    one
    ---
    two
YAML has a complicated variety of multiline string syntaxes. For the
simple case of a preformatted string, you can use the | sigil. This
document is like the previous one, except that the strings have
newlines:
    --- |
    one
    --- |
    two
YAML frontmatter
The source files for this web site each start with something like this (using this page as an example, and cutting off after the title):
    ---
    tags: [ progress ]
    authors: [ fanf2 ]
    --- |
    YAML and Markdown
    =================
This is a YAML stream consisting of two documents, the front matter (a key/value map) and the Markdown page body (a preformatted string).
There's a fun gotcha. I like to use underline for headings because it helps to make them stand out in my editor. If I ever have a three-letter heading, that splits the source file into a third YAML document. Oops!
So my static site generator's first sanity check is to verify there are exactly two YAML documents in the file.
Aside: There is also a YAML document end marker ("..."), but I have
not had problems with accidentally truncated pages because of it!
Tabs and indentation
Practically everything (terminals, editors, pagers, browsers...) by default has tab stops every 8 columns. It's a colossal pain in the arse to have to reconfigure everything for different tab stops, and even more of a pain in the arse if you have to work on projects that expect different tab stop settings. (PostgreSQL is the main offender of the projects I have worked with, bah.)
I don't mind different coding styles, or different amounts of indentation, so long as the code I am working on has a consistent style. I tend to default to KNF (the Linux / BSD kernel normal form) if I'm working on my own stuff, which uses one tab = one indent.
The only firm opinion I have is that if you are not using 8 column tab stops and tabs for indents, then you should use spaces for indents.
Indents in Markdown
Markdown uses indentation for structure, either a 4-space indent or a tab indent. This is a terrible footgun if tabs are displayed in the default way and you accidentally have a mixture of spaces and tabs: an 8 column indent might be one indent level or two, depending on whether it is a tab or spaces, and the difference is mostly invisible.
So my static site generator's second sanity check is to ensure there are no tabs in the Markdown.
This is a backup check, in case my editor configuration is wrong and unintentionally leaks tabs.
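For illustration, the two checks amount to something like this shell
fragment (a sketch of the idea, not the generator's actual code):

    #!/bin/sh
    src="$1"
    # check 1: exactly two YAML document markers (--- alone or --- |)
    docs=$(grep -c -E '^---( |$)' "$src")
    [ "$docs" -eq 2 ] ||
        echo "$src: expected 2 YAML documents, found $docs" >&2
    # check 2: no tabs anywhere in the Markdown
    ! grep -q "$(printf '\t')" "$src" ||
        echo "$src: tab characters found" >&2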
BIND security release
2019-10-17 - News - Tony Finch
Last night ISC.org published security releases of BIND.
For full details, please see the announcement messages: https://lists.isc.org/pipermail/bind-announce/2019-October/thread.html
The vulnerabilities affect two features that are new in BIND 9.14, mirror zones and QNAME minimization, and we are not affected because we are not using either feature.
Upcoming change to off-site secondary DNS
2019-10-01 - News - Tony Finch
This notice is mainly for the attention of those who run DNS zones for which the UIS provides secondary servers.
Since 2015 we have used the Internet Systems Consortium secondary name service (ISC SNS) to provide off-site DNS service for University domains.
ISC announced yesterday that the SNS is closing down in January 2020, so we need alternative arrangements.
We have not yet started to make any specific plans, so this is just to let you know that there will be some changes in the next few months. We will let you know when we have more details.
Metadata for login credentials
2019-09-28 - Progress - Tony Finch
This month I have been ambushed by domain registration faff of multiple kinds, so I have picked up a few tasks that have been sitting on the back burner for several months. This includes finishing the server renaming that I started last year, solidifying support for updating DS records to support automated DNSSEC key rollovers, and generally making sure our domain registration contact information is correct and consistent.
I have a collection of domain registration management scripts called superglue, which have always been an appalling barely-working mess that I fettle enough to get some task done then put aside in a slightly different barely-working mess.
I have reduced the mess a lot by coming up with a very simple convention for storing login credentials. It is much more consistent and safe than what I had before.
The login problem
One of the things superglue
always lacked is a coherent way to
handle login credentials for registr* APIs. It predates regpg by a
few years, but regpg
only deals with how to store the secret parts
of the credentials. The part that was awkward was how to store the
non-secret parts: the username, the login URL, commentary about what
the credentials are for, and so on. The IP Register system also has
this problem, for things like secondary DNS configuration APIs and
database access credentials.
There were actually two aspects to this problem.
Ad-hoc data formats
My typical thoughtless design process for the superglue
code that
loaded credentials was like, we need a username and a password, so
we'll bung them in a file separated by a colon. Oh, this service needs
more than that, so we'll have a multi-line file with fieldname colon
value on each line. Just terrible.
I decided that the best way to correct the sins of the past would be to use an off-the-shelf format, so I can delete half a dozen ad-hoc parsers from my codebase. I chose YAML not because it is good (it's not) but because it is well-known, and I'm already using it for Ansible playbooks and page metadata for this web server's static site generator.
Secret hygiene
When designing regpg I formulated some guidelines for looking after secrets safely.
From our high-level perspective, secrets are basically blobs of random data: we can't usefully look at them or edit them by hand. So there is very little reason to expose them, provided we have tools (such as regpg) that make it easy to avoid doing so.
Although regpg isn't very dogmatic, it works best when we put each secret in its own file. This allows us to use the filename as the name of the secret, which is available without decrypting anything, and often all the metadata we need.
That weasel word "often" tries to hide the issue that when I wrote it two years ago I did not have an answer to the question, what if the filename is not all the metadata we need?
I have found that my ad-hoc credential storage formats are very bad
for secret hygiene. They encourage me to use the sinful regpg edit
command, and decrypt secrets just to look at the non-secret parts, and
generally expose secrets more than I should.
If the metadata is kept in a separate cleartext YAML file, then the
comments in the YAML can explain what is going on. If we strictly
follow the rule that there's exactly one secret in an encrypted file
and nothing else, then there's no reason to decrypt secrets
unnecessarily: everything we need to know is in the cleartext YAML
file.
Implementation
I have released regpg-1.10 which includes ReGPG::Login, a Perl
library for loading credentials stored in my new layout convention.
It's about 20 simple lines of code.
Each YAML file example-login.yml
typically looks like:
    # commentary explaining the purpose of this login
    ---
    url: https://example.com/login
    username: alice
    gpg_d:
      password: example-login.asc
The secret is in the file example-login.asc
alongside. The library
loads the YAML and inserts into the top-level object the decrypted
contents of the secrets listed in the gpg_d
sub-object.
For cases where the credentials need to be available without someone
present to decrypt them, the library looks for a decrypted secret file
example-login
(without the .asc
extension) and loads that instead.
The code loading the file can also list the fields that it needs, to provide some protection against cockups. The result looks something like,
    my $login = read_login $login_file, qw(username password url);
    my $auth = $login->{username}.':'.$login->{password};
    my $authorization = 'Basic ' . encode_base64 $auth, '';
    my $r = LWP::UserAgent->new->post($login->{url},
        Authorization => $authorization,
        Content_Type => 'form-data',
        Content => [ hello => 'world' ] );
Deployment
Secret storage in the IP Register system is now a lot more coherent, consistent, better documented, safer, ... so much nicer than it was. And I got to delete some bad code.
I only wish I had thought of this sooner!
Firefox and DNS-over-HTTPS
2019-09-19 - News - Tony Finch
We are configuring our DNS to tell Firefox to continue to use the University's DNS servers, and not to switch to using Cloudflare's DNS servers instead.
Most of this article is background information explaining the rationale for this change. The last section below gives an outline of the implementation details.
Migrating a website with Let's Encrypt
2019-09-03 - Progress - Tony Finch
A few months ago I wrote about Let's Encrypt on clustered Apache web servers. This note describes how to use a similar trick for migrating a web site to a new server.
The situation
You have an existing web site, say www.botolph.cam.ac.uk
, which is
set up with good TLS security.
It has permanent redirects from http://… to https://… and from bare
botolph.cam.ac.uk to www.botolph.cam.ac.uk. Permanent redirects are
cached very aggressively by browsers, which take "permanent"
literally!
The web site has strict-transport-security with a long lifetime.
You want to migrate it to a new server.
The problem
If you want to avoid an outage, the new server must have similarly good TLS security, with a working certificate, before the DNS is changed from the old server to the new server.
But you can't easily get a Let's Encrypt certificate for a server until after the DNS is pointing at it.
A solution
As in my previous note, we can use the fact that Let's Encrypt will follow redirects, so we can provision a certificate on the new server before changing the DNS.
on the old server
In the http virtual hosts for all the sites that are being migrated
(both botolph.cam.ac.uk
and www.botolph.cam.ac.uk
in our example),
we need to add redirects like
    Redirect /.well-known/acme-challenge/ \
        http://{{newserver}}/.well-known/acme-challenge/
where {{newserver}} is the new server's host name (or IP address).
This redirect needs to match more specifically than the existing
http -> https redirect, so that Let's Encrypt is sent to the new
server, while other requests are bounced to TLS.
on the new server
Run the ACME client to get a certificate for the web sites that are
migrating. The new server needs to serve ACME challenges for the web
site names botolph.cam.ac.uk
and www.botolph.cam.ac.uk
from the
{{newserver}}
default virtual host. This is straightforward with
the ACME client I use, dehydrated.
migrate
It should now be safe to update the DNS to move the web sites from the old server to the new one. To make sure, there are various tricks you can use to test the new server before updating the DNS [1] [2].
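One such trick (an illustration, not taken from the linked pages) is
curl's --resolve option, which points a request at the new server
without touching the DNS; 192.0.2.80 below stands in for the new
server's address:

    curl --resolve www.botolph.cam.ac.uk:443:192.0.2.80 \
        https://www.botolph.cam.ac.uk/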
Work planning
2019-08-29 - Future - Tony Finch
I'm back from a summer holiday and it is "Back to School" season, so now seems like a good time to take stock and write down some plans.
This is roughly in order of priority.
Brexit and .eu domain names (update)
2019-07-22 - News - Tony Finch
There has been a change to the eligibility criteria for .eu domain names related to Brexit.
Previously, eligibility was restricted to (basically) being located in the EU. The change extends eligibility for domains owned by individuals to EU citizens everywhere.
Organizations in the UK that have .eu domain names will still need to give them up at Brexit.
Thanks to Alban Milroy for bringing this to our attention.
More complicated ops
2019-07-18 - Progress - Tony Finch
This week I am back to porting ops pages from v2 to v3.
I'm super keen to hear any complaints you have about the existing user interface. Please let ip-register@uis.cam.ac.uk know of anything you find confusing or awkward! Not everything will be addressed in this round of changes but we'll keep them in mind for future work.
API cookies
Jackdaw has separate pages to download an API cookie and manage API cookies. The latter is modal and switches between an overview list and a per-cookie page.
In v3 they have been combined into a single page (screenshot below)
with less modality, and I have moved the verbiage to a separate API
cookie documentation page.
While I was making this work I got terribly confused that my v3 cookie page did not see the same list of cookies as Jackdaw's manage-cookies page, until I realised that I should have been looking at the dev database on Ruff. The silliest bugs take the longest to fix...
Single ops
Today I have started mocking up a v3 "single ops" page. This is a bit of a challenge, because the existing page is rather cluttered and confusing, and it's hard to improve within the constraint that I'm not changing its functionality.
I have re-ordered the page to be a closer match to the v3 box ops
page. The main difference is that the
address
field is near the top because it is frequently used as a
primary search key.
There is a downside to this placement, because it separates the address from the other address-related fields which are now at the bottom: the address's mzone, lan, subnet, and the mac and dhcp group that are properties of the address rather than properties of the box.
On the other hand, I wanted to put the address-related fields near the
register
and search
buttons to hint that they are kind of related:
you can use the address-related fields to get the database to
automatically pick an address for registration following those
constraints, or you can search for boxes within the constraints.
Did you know that (like table ops but unlike most other pages) you can use SQL-style wildcards to search on the single ops page?
Finally, a number of people said that the mzone / lan boxes are super
awkward, and they explicitly asked for a drop-down list. This breaks
the no-new-functionality rule, but I think it will be simple enough
that I can get away with it. (Privileged users still get the boxes
rather than a drop-down with thousands of entries!)
Refactored error handling
2019-07-12 - Progress - Tony Finch
This week I have been refactoring the error handling of the "v3" IP Register web interface that I posted screenshots of last week. There have not been any significant visible changes, but I have reduced the code size by nearly 200 lines compared to last week, and fixed a number of bugs in the process.
On top of the previous refactorings, the new code is quite a lot smaller than the existing web interface on Jackdaw.
| page  | v2 lines | v3 lines | change |
|-------|----------|----------|--------|
| box   | 283      | 151      | 53%    |
| vbox  | 327      | 182      | 55%    |
| aname | 244      | 115      | 47%    |
| cname | 173      | 56       | 32%    |
| mx    | 253      | 115      | 45%    |
| srv   | 272      | 116      | 42%    |
| motd  | 51       | 22       | 43%    |
| totp  | 66       | 20       | 30%    |
Eight ops pages ported
2019-07-04 - Progress - Tony Finch
This week I passed a halfway mark in porting web pages from the old IP Register web interface on Jackdaw to the "v3" web interface. The previous note on this topic was in May when the first ops page was ported to v3, back before the Beer Festival and the server patching work.
DNS update confirmations
2019-06-27 - News - Tony Finch
You might have noticed that the IP Register ops pages on Jackdaw now have a note in the title stating when the last DNS update completed. (Updates start at 53 minutes past each hour and usually take a couple of minutes.)
Occasionally the update process breaks. It is written fairly
conservatively, so that if anything unexpected happens it stops and
waits for someone to take a look. Some parts of the build process are
slightly unreliable, typically parts that push data to other systems.
Many of these push actions are not absolutely required to work, and
it is OK to retry when the build job runs again in an hour.
Over time we have made the DNS build process less likely to
fail-stop, as we have refined the distinction between actions that
must work and actions that can be retried in an hour. But the build
process changes, and sometimes the new parts fail-stop when they
don't need to. That happened earlier this week, which prompted us to
add the last update time stamp, so you have a little more visibility
into how the system is working (or not).
DNS-over-HTTPS and encrypted SNI
2019-06-24 - News - Tony Finch
Recent versions of Firefox make it easier to set up encrypted
DNS-over-HTTPS. If you use Firefox on a fixed desktop, go to
Preferences -> General -> scroll to Network Settings at the bottom ->
Enable DNS over HTTPS, Custom: https://rec.dns.cam.ac.uk/. (Our DNS
servers are only available on the CUDN so this setting isn't suitable
for mobile devices.)
Very recent versions of Firefox also support encrypted server name indication. When connecting to a web server the browser needs to tell the web server which site it is looking for. HTTPS does this using Server Name Indication, which is normally not encrypted unlike the rest of the connection. ESNI fixes this privacy leak.
To enable ESNI, go to about:config and verify that
network.security.esni.enabled is true.
MZS web site upgrade failed
2019-06-20 - News - Tony Finch
[ This problem has been resolved, with help from Peter Heiner. Thanks! ]
Unfortunately the Managed Zone Service operating system upgrade this evening failed when we attempted to swap the old and new servers. As a result the MZS admin web site is unavailable until tomorrow.
DNS service for MZS domains is unaffected.
We apologise for any inconvenience this may cause.
BIND 9.14.3 and CVE-2019-6471
2019-06-20 - News - Tony Finch
Last night, isc.org announced patch releases of BIND.
This vulnerability affects all supported versions of BIND.
Hot on the heels of our upgrade to 9.14.2 earlier this week, I will be patching our central DNS servers to 9.14.3 today. There should be no visible interruption to service.
SACK panic
2019-06-18 - News - Tony Finch
I have applied a workaround for the Linux SACK panic bug (CVE-2019-11477) to the DNS and other servers, as a temporary measure until they can be patched properly later today.
Clustering Let's Encrypt with Apache
2019-06-17 - Progress - Tony Finch
A few months ago I wrote about bootstrapping Let's Encrypt on Debian. I am now using Let's Encrypt certificates on the live DNS web servers.
Clustering
I have a smallish number of web servers (currently 3) and a smallish number of web sites (also about 3). I would like any web server to be able to serve any site, and dynamically change which site is on which server for failover, deployment canaries, etc.
If server 1 asks Let's Encrypt for a certificate for site A, but site A is currently hosted on server 0, the validation request will not go to server 1 so it won't get the correct response. It will fail unless server 0 helps server 1 to validate certificate requests from Let's Encrypt.
Validation servers
I considered various ways that my servers could co-operate to get certificates, but they all required extra machinery for authentication and access control that I don't currently have, and which would be tricky and important to get right.
However, there is a simpler option based on HTTP redirects. Thanks to Malcolm Scott for reminding me that ACME http-01 validation requests follow redirects! The Let's Encrypt integration guide mentions this under "picking a challenge type" and "central validation servers".
Decentralized validation
Instead of redirecting to a central validation server, a small web server cluster can co-operate to validate certificates. It goes like this:
server 1 requests a cert for site A
Let's Encrypt asks site A for the validation response, but this request goes to server 0
server 0 discovers it has no response, so it speculatively replies with a 302 redirect to one of the other servers
Let's Encrypt asks the other server for the validation response; after one or two redirects it will hit server 1 which does have the response
This is kind of gross, because it turns 404 "not found" errors into 302 redirect loops. But that should not happen in practice.
Apache mod_rewrite
My configuration to do this is a few lines of mod_rewrite. Yes, this doesn't help with the "kind of gross" aspect of this setup, sorry!
The rewrite runes live in a catch-all port 80 <VirtualHost> which
redirects everything (except for Let's Encrypt) to https. I am not
using the dehydrated-apache2 package any more; instead I have copied
its <Directory> section that tells Apache it is OK to serve
dehydrated's challenge responses.
I use Ansible's Jinja2 template module to install the
configuration and fill in a couple of variables: as usual,
{{inventory_hostname}}
is the server the file is installed on, and
in each server's host_vars
file I set {{next_acme_host}}
to the
next server in the loop. The last server redirects to the first one,
like web0 -> web1 -> web2 -> web0. These are all server host names,
not virtual hosts or web site names.
Code
    <VirtualHost *:80>
        ServerName {{inventory_hostname}}
        RewriteEngine on

        # https everything except acme-challenges
        RewriteCond %{REQUEST_URI} !^/.well-known/acme-challenge/
        RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [L,R=301]

        # serve files that exist
        RewriteCond /var/lib/dehydrated/acme-challenges/$1 -f
        RewriteRule ^/.well-known/acme-challenge/(.*) \
            /var/lib/dehydrated/acme-challenges/$1 [L]

        # otherwise, try alternate server
        RewriteRule ^ http://{{next_acme_host}}%{REQUEST_URI} [R=302]
    </VirtualHost>

    <Directory /var/lib/dehydrated/acme-challenges/>
        Options FollowSymlinks
        Options -Indexes
        AllowOverride None
        Require all granted
    </Directory>
MZS web site upgrade, Thurs 20 June
2019-06-13 - News - Tony Finch
On Thursday 20th June between 17:00 and 19:00, the Managed Zone Service web site will be unavailable for a few minutes while its operating system is upgraded. (This will not affect the DNS for MZS domains.)
DNS server upgrades, Tues 18 June
2019-06-12 - News - Tony Finch
On Tuesday 18th June our central DNS resolvers will be upgraded from BIND 9.12.4-P1 to BIND 9.14.2.
First ops page ported
2019-05-15 - Progress - Tony Finch
Yesterday I reached a milestone: I have ported the first "ops" page from the old IP Register web user interface on Jackdaw to the new one that will live on the DNS web servers. It's a trivial admin page for setting the message of the day, but it demonstrates that the infrastructure is (mostly) done.
Security checks
I have spent the last week or so trying to get from a proof of concept to something workable. Much of this work has been on the security checks. The old UI has:
Cookie validation (for Oracle sessions)
Raven authentication
TOTP authentication for superusers
Second cookie validation for TOTP
CSRF checks
There was an awkward split between the Jackdaw framework and the ipreg-specific parts which meant I needed to add a second cookie when I added TOTP authentication.
In the new setup I have upgraded the cookie to modern security levels, and it handles both Oracle and TOTP session state.
    my @cookie_attr = (
        -name     => '__Host-Session',
        -path     => '/',
        -secure   => 1,
        -httponly => 1,
        -samesite => 'strict',
    );
The various "middleware" authentication components have been split out of the main HTTP request handler so that the overall flow is much easier to see.
State objects
There is some fairly tricky juggling in the old code between:
CGI request object
WebIPDB HTTP request handler object
IPDB database handle wrapper
Raw DBI handle
The CGI object is gone. The mod_perl
Apache2 APIs are sufficient
replacements, and the HTML generation functions are being
replaced by mustache templates. (Though there is some programmatic
form generation in table_ops
that might be awkward!)
I have used Moo roles to mixin the authentication middleware bits to the main request handler object, which works nicely. I might do the same for the IPDB object, though that will require some refactoring of some very old skool OO perl code.
Next
The plan is to port the rest of the ops pages as directly as possible. There is going to be a lot of refactoring, but it will all be quite superficial. The overall workflow is going to remain the same, just more purple.
Oracle connection timeouts
2019-05-07 - Progress - Tony Finch
Last week while I was beating mod_perl
code into shape, I happily
deleted a lot of database connection management code that I had
inherited from Jackdaw's web server. Today I had to put it all back
again.
Apache::DBI
There is a neat module called Apache::DBI which hooks mod_perl
and DBI together to provide a transparent connection cache: just throw
in a use
statement, throw out dozens of lines of old code, and you
are pretty much done.
Connection hangs
Today the clone of Jackdaw that I am testing against was not available (test run for some maintenance work tomorrow, I think) and I found that my dev web server was no longer responding. It started OK but would not answer any requests. I soon worked out that it was trying to establish a database connection and waiting at least 5 minutes (!) before giving up.
DBI(3pm) timeouts
There is a long discussion about timeouts in the DBI documentation which specifically mentions DBD::Oracle as a problem case, with some lengthy example code for implementing a timeout wrapper around DBI::connect.
This is a terrible documentation anti-pattern. Whenever I find myself giving lengthy examples of how to solve a problem I take it as a whacking great clue that the code should be fixed so the examples can be made a lot easier.
In this case, DBI should have connection timeouts as standard.
Sys::SigAction
If you read past the examples in DBI(3pm) there's a reference to a more convenient module which provides a timeout wrapper that can be used like this:
    if (timeout_call($connect_timeout, sub {
            $dbh = DBI->connect(@connect_args);
            moan $DBI::errstr unless $dbh;
        }))
    {
        moan "database connection timed out";
    }
Undelete
The problem is that there isn't a convenient place to put this timeout code where it should be, so that Apache::DBI can use it transparently.
So I resurrected Jackdaw's database connection cache. But not exactly
- I looked through it again and I could not see any extra timeout
handling code. My guess is that hung connections can't happen if the
database is on the same machine as the web server.
Reskinning IP Register
2019-05-01 - Progress - Tony Finch
At the end of the item about Jackdaw and Raven I mentioned that when the web user interface moves off Jackdaw it will get a reskin.
The existing code uses Perl CGI functions for rendering the HTML, with no styling at all. I'm replacing this with mustache templates using the www.dns.cam.ac.uk Project Light framework. So far I have got the overall navigation structure working OK, and it's time to start putting forms into the pages.
I fear this reskin is going to be disappointing, because although it's
superficially quite a lot prettier, the workflow is going to be the
same - for example, the various box_ops
etc. links in the existing
user interface become Project Light local navigation tabs in the new
skin. And there are still going to be horrible Oracle errors.
BIND 9.12.4-P1 and Ed25519
2019-04-25 - News - Tony Finch
Last night, isc.org announced patch releases of BIND.
I have upgraded our DNS servers to 9.12.4-P1 to address the TCP socket exhaustion vulnerability.
At the same time I have also relinked BIND with a more recent version of OpenSSL, so it is now able to validate the small number of domains that use the new Ed25519 DNSSEC algorithm.
Jackdaw and Raven
2019-04-16 - Progress - Tony Finch
I've previously written about authentication and access control in the IP Register database. The last couple of weeks I have been reimplementing some of it in a dev version of this DNS web server.
Bootstrapping Let's Encrypt on Debian
2019-03-15 - Progress - Tony Finch
I've done some initial work to get the Ansible playbooks for our DNS systems working with the development VM cluster on my workstation. At this point it is just for web-related experimentation, not actual DNS servers.
Of course, even a dev server needs a TLS certificate, especially because these experiments will be about authentication. Until now I have obtained certs from the UIS / Jisc / QuoVadis, but my dev server is using Let's Encrypt instead.
Chicken / egg
In order to get a certificate from Let's Encrypt using the http-01
challenge, I need a working web server. In order to start the web
server with its normal config, I need a certificate. This poses a bit
of a problem!
Snakeoil
My solution is to install Debian's ssl-cert package, which creates a
self-signed certificate. When the web server does not yet have a
certificate (if the QuoVadis cert isn't installed, or dehydrated has
not been initialized), Ansible temporarily symlinks the self-signed
cert for use by Apache, like this:
    - name: check TLS certificate exists
      stat:
        path: /etc/apache2/keys/tls-web.crt
      register: tls_cert

    - when: not tls_cert.stat.exists
      name: fake TLS certificates
      file:
        state: link
        src: /etc/ssl/{{ item.src }}
        dest: /etc/apache2/keys/{{ item.dest }}
      with_items:
        - src: certs/ssl-cert-snakeoil.pem
          dest: tls-web.crt
        - src: certs/ssl-cert-snakeoil.pem
          dest: tls-chain.crt
        - src: private/ssl-cert-snakeoil.key
          dest: tls.pem
ACME dehydrated boulders
The dehydrated
and dehydrated-apache2
packages need a little
configuration. I needed to add a cron job to renew the certificate, a
hook script to reload apache when the cert is renewed, and tell it
which domains should be in the cert. (See below for details of these
bits.)
After installing the config, Ansible initializes dehydrated if
necessary - the creates check stops Ansible from running dehydrated
again after it has created a cert.
    - name: initialize dehydrated
      command: dehydrated -c
      args:
        creates: /var/lib/dehydrated/certs/{{inventory_hostname}}/cert.pem
Having obtained a cert, the temporary symlinks get overwritten with links to the Let's Encrypt cert. This is very similar to the snakeoil links, but without the existence check.
    - name: certificate links
      file:
        state: link
        src: /var/lib/dehydrated/certs/{{inventory_hostname}}/{{item.src}}
        dest: /etc/apache2/keys/{{item.dest}}
      with_items:
        - src: cert.pem
          dest: tls-web.crt
        - src: chain.pem
          dest: tls-chain.crt
        - src: privkey.pem
          dest: tls.pem
      notify:
        - restart apache
After that, Apache is working with a proper certificate!
Boring config details
The cron script chatters into syslog, but if something goes wrong it should trigger an email (tho not a very informative one).
    #!/bin/bash
    set -eu -o pipefail

    ( dehydrated --cron
      dehydrated --cleanup
    ) | logger --tag dehydrated --priority cron.info
The hook script only needs to handle one of the cases:
    #!/bin/bash
    set -eu -o pipefail

    case "$1" in
    (deploy_cert)
        apache2ctl configtest &&
        apache2ctl graceful
        ;;
    esac
The configuration needs a couple of options added:
    - copy:
        dest: /etc/dehydrated/conf.d/dns.sh
        content: |
          EMAIL="hostmaster@cam.ac.uk"
          HOOK="/etc/dehydrated/hook.sh"
The final part is to tell dehydrated
the certificate's domain name:
    - copy:
        content: "{{inventory_hostname}}\n"
        dest: /etc/dehydrated/domains.txt
For production, domains.txt
needs to be a bit more complicated. I
have a template like the one below. I have not yet deployed it; that
will probably wait until the cert needs updating.
{{hostname}} {% if i_am_www %} www.dns.cam.ac.uk dns.cam.ac.uk {% endif %}
BIND 9.12.3-P4 and other patch releases
2019-02-27 - News - Tony Finch
Last week, isc.org announced patch releases of BIND.
I have upgraded our DNS servers to 9.12.3-P4 to address the memory leak vulnerability.
KSK rollover project status
2019-02-07 - Progress - Future - Tony Finch
I have spent the last week working on DNSSEC key rollover automation in BIND. Or rather, I have been doing some cleanup and prep work. With reference to the work I listed in the previous article...
Done
- Stop BIND from generating SHA-1 DS and CDS records by default, per
  RFC 8624

- Teach dnssec-checkds about CDS and CDNSKEY
Started
- Teach superglue to use CDS/CDNSKEY records, with similar logic to
  dnssec-checkds
The "similar logic" is implemented in dnssec-dsfromkey
, so I don't
actually have to write the code more than once. I hope this will also
be useful for other people writing similar tools!
Some of my small cleanup patches have been merged into BIND. We are currently near the end of the 9.13 development cycle, so this work is going to remain out of tree for a while until after the 9.14 stable branch is created and the 9.15 development cycle starts.
Next
So now I need to get to grips with dnssec-coverage and
dnssec-keymgr.
Simple safety interlocks
The purpose of the dnssec-checkds
improvements is so that it can be
used as a safety check.
During a KSK rollover, there are one or two points when the DS records in the parent need to be updated. The rollover must not continue until this update has been confirmed, or the delegation can be broken.
I am using CDS and CDNSKEY records as the signal from the key
management and zone signing machinery for when DS records need to
change. (There's a shell-style API in dnssec-dsfromkey -p, but that
is implemented by just reading these sync records, not by looking
into the guts of the key management data.) I am going to call them
"sync records" so I don't have to keep writing "CDS/CDNSKEY"; "sync"
is also the keyword used by dnssec-settime for controlling these
records.
Key timing in BIND
The dnssec-keygen and dnssec-settime commands (which are used by
dnssec-keymgr) schedule when changes to a key will happen.
There are parameters related to adding a key: when it is published in the zone, when it becomes actively used for signing, etc. And there are parameters related to removing a key: when it becomes inactive for signing, when it is deleted from the zone.
There are also timing parameters for publishing and deleting sync records. These sync times are the only timing parameters that say when we must update the delegation.
What can break?
The point of the safety interlock is to prevent any breaking key changes from being scheduled until after a delegation change has been confirmed. So what key timing events need to be forbidden from being scheduled after a sync timing event?
Events related to removing a key are particularly dangerous. There are some cases where it is OK to remove a key prematurely, if the DS record change is also about removing that key, and there is another working key and DS record throughout. But it seems simpler and safer to forbid all removal-related events from being scheduled after a sync event.
However, events related to adding a key can also lead to nonsense. If we blindly schedule creation of new keys in advance, without verifying that they are also being properly removed, then the zone can accumulate a ridiculous number of DNSKEY records. This has been observed in the wild surprisingly frequently.
A simple rule
There must be no KSK changes of any kind scheduled after the next sync event.
This rule applies regardless of the flavour of rollover (double DS, double KSK, algorithm rollover, etc.)
Applying this rule to BIND
Whereas for ZSKs, dnssec-coverage
ensures rollovers are planned for
some fixed period into the future, for KSKs, it must check correctness
up to the next sync event, then ensure nothing will occur after that point.
In dnssec-keymgr, the logic should be:

- If the current time is before the next sync event, ensure there is
  key coverage until that time and no further.

- If the current time is after all KSK events, use dnssec-checkds to
  verify the delegation is in sync.

- If dnssec-checkds reports an inconsistency and we are within some
  sync interval dictated by the rollover policy, do nothing while we
  wait for the delegation update automation to work.

- If dnssec-checkds reports an inconsistency and the sync interval
  has passed, report an error because operator intervention is
  required to fix the failed automation.

- If dnssec-checkds reports everything is in sync, schedule keys up
  to the next sync event. The timing needs to be relative to this
  point in time, since any delegation update delays can make it
  unsafe to schedule relative to the last sync event.
Caveat
At the moment I am still not familiar with the internals of
dnssec-coverage
and dnssec-keymgr
so there's a risk that I might
have to re-think these plans. But I expect this simple safety rule
will be a solid anchor that can be applied to most DNSSEC key
management scenarios. (However I have not thought hard enough about
recovery from breakage or compromise.)
DNSSEC key rollover automation with BIND
2019-01-30 - Future - Tony Finch
I'm currently working on filling in the missing functionality in BIND that is needed for automatic KSK rollovers. (ZSK rollovers are already automated.) All these parts exist; but they have gaps and don't yet work together.
The basic setup that will be necessary on the child is:
- Write a policy configuration for dnssec-keymgr.

- Write a cron job to run dnssec-keymgr at a suitable interval. If
  the parent does not run dnssec-cds then this cron job should also
  run superglue or some other program to push updates to the parent.
The KSK rollover process will be driven by dnssec-keymgr, but it will
not talk directly to superglue or dnssec-cds, which make the
necessary changes. In fact it can't talk to dnssec-cds because that
is outside the child's control.
So, as specified in RFC 7344, the child will advertise the desired
state of its delegation using CDS and CDNSKEY records. These are read
by dnssec-cds or superglue to update the parent. superglue will be
loosely coupled, and able to work with any DNSSEC key management
software that publishes CDS records.
The state of the keys in the child is controlled by the timing
parameters in the key files, which are updated by dnssec-keymgr as
determined by the policy configuration. At the moment it generates
keys to cover some period into the future. For KSKs, I think it will
make more sense to generate keys up to the next DS change, then stop
until dnssec-checkds confirms the parent has implemented the change,
before continuing. This is a bit different from the ZSK coverage
model, but future coverage for KSKs can't be guaranteed because
coverage depends on future interactions with an external system which
cannot be assumed to work as planned.
Required work
- Teach dnssec-checkds about CDS and CDNSKEY

- Teach dnssec-keymgr to set "sync" timers in key files, and to
  invoke dnssec-checkds to avoid breaking delegations.

- Teach dnssec-coverage to agree with dnssec-keymgr about sensible
  key configuration.

- Teach superglue to use CDS/CDNSKEY records, with similar logic to
  dnssec-checkds

- Stop BIND from generating SHA-1 DS and CDS records by default, per
  draft-ietf-dnsop-algorithm-update
Release announcement: nsdiff-1.77
2019-01-29 - News - Tony Finch
I have released a new version of nsdiff.
This release adds CDS
and CDNSKEY
records to the list of
DNSSEC-related types that are ignored, since by default nsdiff
expects them to be managed by the name server, not as part of the zone
file. There is now a -C
option to revert to the previous behaviour.
Old recdns.csx names have been abolished
2019-01-28 - News - Tony Finch
As previously announced, the old recursive DNS server names have been removed from the DNS, so the new names are now canonical.
    131.111.8.42   recdns0.csx.cam.ac.uk -> rec0.dns.cam.ac.uk
    131.111.12.20  recdns1.csx.cam.ac.uk -> rec1.dns.cam.ac.uk
A digression for the historically curious: the authdns
and recdns
names date from May 2006, when they were introduced to prepare for
separating authoritative and recursive DNS service.
Until 2006, 131.111.8.42 was known as chimaera.csx.cam.ac.uk. It had
been our primary DNS server since September/October 1995. Before
then, our DNS was hosted on CUS, the Central Unix Service.
And 131.111.12.20 had been known as c01.csi.cam.ac.uk (or comms01)
since before my earliest records in October 1991.
Superglue with WebDriver
2019-01-25 - Progress - Tony Finch
Earlier this month I wrote notes on some initial experiments in browser automation with WebDriver. The aim is to fix my superglue DNS delegation update scripts to work with currently-supported tools.
In the end I decided to rewrite the superglue-janet script in Perl,
since most of superglue is already Perl and I would like to avoid
rewriting all of it. This is still work in progress; superglue is
currently an unusable mess, so I don't recommend looking at it right
now :-)
My WebDriver library
Rather than using an off-the-shelf library, I have a very thin layer (300 lines of code, 200 lines of docs) that wraps WebDriver HTTP+JSON calls in Perl subroutines. It's designed for script-style usage, so I can write things like this (quoted verbatim):
    # Find the domain's details page.
    click '#commonActionsMenuLogin_ListDomains';
    fill '#MainContent_tbDomainNames' => $domain,
         '#MainContent_ShowReverseDelegatedDomains' => 'selected';
    click '#MainContent_btnFilter';
This has considerably less clutter than the old PhantomJS / CasperJS code!
Asynchrony
I don't really understand the concurrency model between the WebDriver server and the activity in the browser. It appears to be remarkably similar to the way CasperJS behaved, so I guess it is related to the way JavaScript's event loop works (and I don't really understand that either).
The upshot is that in most cases I can click
on a link, and the
WebDriver response comes back after the new page has loaded. I can
immediately interact with the new page, as in the code above.
However there are some exceptions.
On the JISC domain registry web site there are a few cases where selecting from a drop-down list triggers some JavaScript that causes a page reload. The WebDriver request returns immediately, so I have to manually poll for the page load to complete. (This also happened with CasperJS.) I don't know if there's a better way to deal with this than polling...
The WebDriver spec
I am not a fan of the WebDriver protocol specification. It is written as a description of how the code in the WebDriver server / browser behaves, written in spaghetti pseudocode.
It does not have any abstract syntax for JSON requests and responses - no JSON schema or anything like that. Instead, the details of parsing requests and constructing responses are interleaved with details of implementing the semantics of the request. It is a very unsafe style.
And why does the WebDriver spec include details of how to HTTP?
Next steps
This work is part of two ongoing projects:
I need to update all our domain delegations to complete the server renaming.
I need automated delegation updates to support automated DNSSEC key rollovers.
So I'm aiming to get superglue
into a usable state, and hook it up
to BIND's dnssec-keymgr.
Happenings in DNS
2019-01-18 - News - Tony Finch
A couple of items worth noting:
DNS flag day
The major DNS resolver providers have declared February 1st to be DNS Flag Day. (See also the ISC blog item on the DNS flag day.)
DNS resolvers will stop working around broken authoritative DNS servers that do not implement EDNS correctly. The effect will be that DNS resolution may fail in some cases where it used to be slow.
The flag day will take effect immediately on some large public resolvers. In Cambridge, it will take effect on our central resolvers after they are upgraded to BIND 9.14, which is the next stable branch due to be released Q1 this year.
I'm running the development branch 9.13 on my workstation, which already includes the Flag Day changes, and I haven't noticed any additional breakage - but then my personal usage is not particularly heavy nor particularly diverse.
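If you want a rough idea of whether an authoritative server will be affected, one of the simpler checks in the spirit of the flag day compliance tests is to send it a query with an unknown EDNS version (the server and zone below are placeholders, not a recommendation):
# a compliant server responds with status BADVERS and EDNS version 0;
# a timeout suggests the sort of breakage the flag day is aimed at
dig +norec +edns=100 @authdns0.csx.cam.ac.uk cam.ac.uk soa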
Old DNSSEC root key revoked
Last week the old DNSSEC root key was revoked, so DNSSEC validators that implement RFC 5011 trust anchor updates should have deleted the old key (tag 19036) from their list of trusted keys.
For example, on one of my resolvers the output of rndc managed-keys
now includes the following. (The tag of the old key changed from 19036
to 19164 when the revoke flag was added.)
name: .
keyid: 20326
    algorithm: RSASHA256
    flags: SEP
    next refresh: Fri, 18 Jan 2019 14:28:17 GMT
    trusted since: Tue, 11 Jul 2017 15:03:52 GMT
keyid: 19164
    algorithm: RSASHA256
    flags: REVOKE SEP
    next refresh: Fri, 18 Jan 2019 14:28:17 GMT
    remove at: Sun, 10 Feb 2019 14:20:18 GMT
    trust revoked
This is the penultimate step of the root key rollover; the final step is to delete the revoked key from the root zone.
Old recdns.csx names to be abolished
2019-01-17 - News - Tony Finch
On Monday 28th January after the 13:53 DNS update, the old recursive DNS server names will be removed from the DNS. They have been renamed like this:
131.111.8.42  recdns0.csx.cam.ac.uk -> rec0.dns.cam.ac.uk
131.111.12.20 recdns1.csx.cam.ac.uk -> rec1.dns.cam.ac.uk
Although there should not be much that depends on the old names, we are giving you a warning in case things like monitoring systems need reconfiguration.
This is part of the ongoing DNS server reshuffle project.
Preserving dhcpd leases across reinstalls
2019-01-14 - Progress - Tony Finch
(This is an addendum to December's upgrade notes.)
I have upgraded
the IP Register DHCP servers
twice this year. In February they were upgraded from Ubuntu 12.04 LTS
to 14.04 LTS, to cope with 12.04's end of life, and to merge their
setup into the main ipreg
git repository (which is why the target
version was so old). So their setup was fairly tidy before the Debian
9 upgrade.
Statefulness
Unlike most of the IP Register systems, the dhcp servers are stateful.
Their dhcpd.leases
files must be preserved across reinstalls.
The leases file is a database (in the form of a flat text file in ISC
dhcp config file format) which closely matches the state of the network.
If it is lost, the server no longer knows about IP addresses in use by existing clients, so it can issue duplicate addresses to new clients, and hilarity will ensue!
So, just before rebuilding a server, I have to stop the dhcpd and take a copy of the leases file. And before the dhcpd is restarted, I have to copy the leases file back into place.
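Sketched as shell, the manual procedure looks roughly like this (the leases path and service name are assumptions; adjust for the real setup):
# before the rebuild: stop dhcpd so the leases file is quiescent, then save it
ssh $SERVER systemctl stop isc-dhcp-server
scp $SERVER:/var/lib/dhcp/dhcpd.leases dhcpd.leases.$SERVER
# ... reinstall the server ...
# after the rebuild: restore the leases file before dhcpd is started
scp dhcpd.leases.$SERVER $SERVER:/var/lib/dhcp/dhcpd.leases
ssh $SERVER systemctl start isc-dhcp-server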
This isn't something that happens very often, so I have not automated it yet.
Bad solutions
In February, I hacked around with the Ansible playbook to ensure the dhcpd was not started before I copied the leases file into place. This is an appallingly error-prone approach.
Yesterday, I turned that basic idea into an Ansible variable that controls whether the dhcpd is enabled. This avoids mistakes when fiddling with the playbook, but it is easily forgettable.
Better solution
This morning I realised a much neater way is to disable the entire dhcpd role if the leases file doesn't exist. This prevents the role from starting the dhcpd on a newly reinstalled server before the old leases file is in place. After the server is up, the check is a no-op.
This is a lot less error-prone. The only requirement for the admin is
knowledge about the importance of preserving dhcpd.leases
...
Further improvements
The other pitfall in my setup is that monit
will restart dhcpd
if
it is missing, so it isn't easy to properly stop it.
My dhcpd_enabled
Ansible variable takes care of this, but I think it
would be better to make a special shutdown playbook, which can also
take a copy of the leases file.
Review of 2018
2019-01-11 - Progress - Tony Finch
Some notes looking back on what happened last year...
Stats
1457 commits
4035 IP Register / MZS support messages
5734 cronspam messages
Projects
New DNS web site (Feb, Mar, Jun, Sep, Oct, Nov)
This was a rather long struggle with a lot of false starts, e.g. February / March finding that Perl Template Toolkit was not very satisfactory; realising after June that the server naming and vhost setup was unhelpful.
End result is quite pleasing
IP Register API extensions (Aug)
API access to xlist_ops
MWS3 API generalized for other UIS services
Now in active use by MWS, Drupal Falcon, and to a lesser extent by the HPC OpenStack cluster and the new web Traffic Managers. When old Falcon is wound down we will be able to eliminate Gossamer!
Server upgrade / rename (Dec)
Lots of Ansible review / cleanup. Satisfying.
Future of IP Register
Prototype setup for PostgreSQL replication using repmgr (Jan)
Prototype infrastructure for JSON-RPC API in TypeScript (April, May)
Maintenance
DHCP servers upgraded to match rest of IP Register servers (Feb)
DNS servers upgraded to BIND 9.12, with some serve-stale related problems (March)
Local patches all now incorporated upstream :-)
git.uis continues, hopefully not for much longer
IETF
Took over as the main author of draft-ietf-dnsop-aname. This work is ongoing.
Received thanks in RFC 8198 (DNSSEC negative answer synthesis), RFC 8324 (DNS privacy), RFC 8482 (minimal ANY responses), RFC 8484 (DNS-over-HTTPS).
Open Source
Ongoing maintenance of regpg. This has stabilized and reached a comfortable feature plateau.
Created doh101, a DNS-over-TLS and DNS-over-HTTPS proxy.
Initial prototype in March at the IETF hackathon.
Revamped in August to match the final IETF draft.
Deployed in production in September.
Fifteen patches committed to BIND9.
CVE-2018-5737; extensive debugging work on the serve-stale feature.
Thanked by ISC.org in their annual review.
Significant clean-up and enhancement of my qp trie data structure, used by Knot DNS. This enabled much smaller memory usage during incremental zone updates.
https://gitlab.labs.nic.cz/knot/knot-dns/issues/591
What's next?
Update the superglue delegation maintenance script to match the current state of the world. Hook it in to dnssec-keymgr and get automatic rollovers working.
Rewrite draft-ietf-dnsop-aname again, in time for IETF104 in March.
Server renumbering, and xfer/auth server split, and anycast. When?
Port existing ipreg web interface off Jackdaw.
Port database from Oracle on Jackdaw to PostgreSQL on my servers.
Develop new API / UI.
Re-do provisioning system for streaming replication from database to DNS.
Move MZS into IP Register database.
Brexit and .eu domain names
2019-01-09 - News - Tony Finch
This message is for the attention of anyone who has used a third-party DNS provider to register a .eu domain name, or a domain name in another European two-letter country-code top-level domain.
Last year, EURID (the registry for .eu domain names) sent out a notice about the effect of Brexit on .eu domain names registered in the UK. The summary is that .eu domains may only be registered by organizations or individuals in the EU, and unless any special arrangements are made (which has not happened) this will not include the UK after Brexit, so UK .eu domain registrations will be cancelled.
https://eurid.eu/en/register-a-eu-domain/brexit-notice/
Other European country-code TLDs may have similar restrictions (for instance, Italy's .it).
Sadly we cannot expect our government to behave sensibly, so you have to make your own arrangements for continuity of your .eu domain.
The best option is for you to find one of your collaborators in another EU country who is able to take over ownership of the domain.
We have contacted the owners of .eu domains registered through our Managed Zone Service. Those who registered a .eu domain elsewhere should contact their DNS provider for detailed support.
Edited to add: Thanks to Elliot Page for pointing out that this problem may apply to other TLDs as well as .eu
Edited to add (2019-07-22): There has been an update to the .eu eligibility criteria.
Notes on web browser automation
2019-01-08 - Progress - Tony Finch
I spent a few hours on Friday looking in to web browser automation. Here are some notes on what I learned.
Context
I have some old code called superglue-janet which drives the JISC / JANET / UKERNA domain registry web site. The web site has some dynamic JavaScript behaviour, and it looks to me like the browser front-end is relatively tightly coupled to the server back-end in a way that I expected would make reverse engineering unwise. So I decided to drive the web site using browser automation tools. My code is written in JavaScript, using PhantomJS (a headless browser based on QtWebKit) and CasperJS (convenience utilities for PhantomJS).
Rewrite needed
PhantomJS is now deprecated, so the code needs a re-work. I also want to use TypeScript instead, where I would previously have used JavaScript.
Current landscape
The modern way to do things is to use a full-fat browser in headless mode and control it using the standard WebDriver protocol.
For Firefox this means using the geckodriver proxy which is a Rust program that converts the WebDriver JSON-over-HTTP protocol to Firefox's native Marionette protocol.
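To make the moving parts concrete, this is roughly what a session bootstrap looks like from the shell (a sketch: port 4444 is geckodriver's default, the capabilities JSON follows the W3C spec, and error handling is omitted):
# start the proxy, then ask it to launch a headless Firefox
geckodriver --port 4444 &
curl -s http://localhost:4444/session \
    -H 'content-type: application/json' \
    -d '{"capabilities":{"alwaysMatch":{"moz:firefoxOptions":{"args":["-headless"]}}}}'
# the response contains a session id used in subsequent /session/<id>/... requests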
[Aside: Marionette is a full-duplex protocol that exchanges JSON messages prefixed by a message length. It fits into a similar design space to Microsoft's Language Server Protocol, but LSP uses somewhat more elaborate HTTP-style framing and JSON-RPC message format. It's kind of a pity that Marionette doesn't use JSON-RPC.]
The WebDriver protocol came out of the Selenium browser automation project where earlier (incompatible) versions were known as the JSON Wire Protocol.
What I tried out
I thought it would make sense to write the WebDriver client in TypeScript. The options seemed to be:
selenium-webdriver, which has Selenium's bindings for node.js. This involves a second proxy written in Java which goes between node and geckodriver. I did not like the idea of a huge wobbly pile of proxies.
webdriver.io aka wdio, a native node.js WebDriver client. I chose to try this, and got it going fairly rapidly.
What didn't work
I had enormous difficulty getting anything to work with wdio and TypeScript. It turns out that the wdio typing was only committed a couple of days before my attempt, so I had accidentally found myself on the bleeding edge. I can't tell whether my failure was due to lack of documentation or brokenness in the type declarations...
What next
I need to find a better WebDriver client library. The wdio framework is very geared towards testing rather than general automation (see the wdio "getting started" guide for example) so if I use it I'll be talking to its guts rather than the usual public interface. And it won't be very stable.
I could write it in Perl but that wouldn't really help to reduce the amount of untyped code I'm writing :-)
The missing checklist
2019-01-07 - Progress - Tony Finch
Before I rename/upgrade any more servers, this is the checklist I should have written last month...
For rename
Ensure both new and old names are in the DNS
Rename the host in ipreg/ansible/bin/make-inventory and run the script
Run ipreg/ansible/bin/ssh-knowhosts to update ~/.ssh/known_hosts
Rename host_vars/$SERVER and adjust the contents to match a previously renamed server (mutatis mutandis)
For recursive servers, rename the host in ipreg/ansible/roles/keepalived/files/vrrp-script and ipreg/ansible/inventory/dynamic
For both
- Ask infra-sas@uis to do the root privilege parts of the netboot configuration - rename and/or new OS version as required
For upgrade
For DHCP servers, save a copy of the leases file by running:
ansible-playbook dhcpd-shutdown-save-leases.yml \
    --limit $SERVER
Run the preseed.yml playbook to update the unprivileged parts of the netboot config
Reboot the server, tell it to netboot and do a preseed install
Wait for that to complete
For DHCP servers, copy the saved leases file to the server.
Then run:
ANSIBLE_SSH_ARGS=-4 ANSIBLE_HOST_KEY_CHECKING=False \
    ansible-playbook -e all=1 --limit $SERVER main.yml
For rename
Update the rest of the cluster's view of the name
git push
ansible-playbook --limit new main.yml
Notes on recent DNS server upgrades
2019-01-02 - Progress - Tony Finch
I'm now most of the way through the server upgrade part of the rename / renumbering project. This includes moving the servers from Ubuntu 14.04 "Trusty" to Debian 9 "Stretch", and renaming them according to the new plan.
Done:
Live and test web servers, which were always Stretch, so they served as a first pass at getting the shared parts of the Ansible playbooks working
Live and test primary DNS servers
Live x 2 and test x 2 authoritative DNS servers
One recursive server
To do:
Three other recursive servers
Live x 2 and test x 1 DHCP servers
Here are a few notes on how the project has gone so far.
More DNS server rebuilds
2018-12-19 - News - Tony Finch
As announced last week the remaining authoritative DNS servers
will be upgraded this afternoon. There will be an outage of
authdns1.csx.cam.ac.uk
for several minutes, during which our other
authoritative servers will be available to provide DNS service.
The primary server ipreg.csi.cam.ac.uk
will also be rebuilt, which
will involve reconstructing all our DNS zone files from scratch. (This
is less scary than it sounds, because the software we use for the
hourly DNS updates makes it easy to verify that DNS zones are
the same.)
These upgrades will cause secondary servers to perform full zone transfers of our zones, since the incremental transfer journals will be lost.
Authoritative DNS server rebuilds
2018-12-12 - News - Tony Finch
Tomorrow (13 December) we will reinstall the authoritative DNS server
authdns0.csx.cam.ac.uk
and upgrade its operating system from Ubuntu
14.04 "Trusty" to Debian 9 "Stretch".
During the upgrade our other authoritative servers will be available
to provide DNS service. After the upgrade, secondary servers are
likely to perform full zone transfers from authdns0
since it will
have lost its incremental zone transfer journal.
Next week, we will do the same for authdns1.csx.cam.ac.uk
and for
ipreg.csi.cam.ac.uk
(the primary server).
During these upgrades the servers will have their hostnames changed to
auth0.dns.cam.ac.uk
, auth1.dns.cam.ac.uk
, and pri0.dns.cam.ac.uk
,
at least from the sysadmin point of view. There are lots of references
to the old names which will continue to work until all the NS and SOA
DNS records have been updated. This is an early step in the DNS
server renaming / renumbering project.
IPv6 prefixes and LAN names
2018-12-06 - Future - Tony Finch
I have added a note to the ipreg schema wishlist that it should be possible for COs to change LAN names associated with IPv6 prefixes.
Postcronspam
2018-11-30 - Progress - Tony Finch
This is a postmortem of an incident that caused a large amount of cronspam, but not an outage. However, the incident exposed a lot of latent problems that need addressing.
Description of the incident
I arrived at work late on Tuesday morning to find that the DHCP
servers were sending cronspam every minute from monit
. monit
thought dhcpd
was not working, although it was.
A few minutes before I arrived, a colleague had run our Ansible playbook to update the DHCP server configuration. This was the trigger for the cronspam.
Cause of the cronspam
We are using monit
as a basic daemon supervisor for our critical
services. The monit
configuration doesn't have an "include" facility
(or at least it didn't when we originally set it up) so we are using
Ansible's "assemble" feature to concatenate configuration file
fragments into a complete monit
config.
The problem was that our Ansible setup didn't have any explicit
dependencies between installing monit
config fragments and
reassembling the complete config and restarting monit
.
Running the complete playbook caused the monit
config to be
reassembled, so an incorrect but previously inactive config fragment
was activated, causing the cronspam.
Origin of the problem
How was there an inactive monit
config fragment on the DHCP servers?
The DHCP servers had an OS upgrade and reinstall in February. This was
when the spammy broken monit
config fragment was written.
What were the mistakes at that time?
The config fragment was not properly tested. A good monit config is normally silent, but in this case we didn't check that it sent cronspam when things are broken, which would have revealed that the config fragment was not actually installed properly.
The Ansible playbook was not verified to be properly idempotent. It should be possible to wipe a machine and reinstall it with one run of Ansible, and a second run should be all green. We didn't check the second run properly. Check mode isn't enough to verify idempotency of "assemble".
During routine config changes in the nine months since the servers were reinstalled, the usual practice was to run the DHCP-specific subset of the Ansible playbook (because that is much faster) so the bug was not revealed.
Deeper issues
There was a lot more anxiety than there should have been when debugging this problem, because at the time the Ansible playbooks were going through a lot of churn for upgrading and reinstalling other servers, and it wasn't clear whether or not this had caused some unexpected change.
This gets close to the heart of the matter:
- It should always be safe to check out and run the Ansible playbook against the production systems, and expect that nothing will change.
There are other issues related to being a (nearly) solo developer, which makes it easier to get into bad habits. The DHCP server config has the most contributions from colleagues at the moment, so it is not really surprising that this is where we find out the consequences of the bad habits of soloists.
Resolutions
It turns out that monit
and dhcpd do not really get along. The
monit
UDP health checker doesn't work with DHCP (which was the cause
of the cronspam) and monit
's process checker gets upset by dhcpd
being restarted when it needs to be reconfigured.
The monit
DHCP UDP checker has been disabled; the process checker
needs review to see if it can be useful without sending cronspam on
every reconfig.
There should be routine testing to ensure the Ansible playbooks committed to the git server run green, at least in check mode. Unfortunately it's risky to automate this because it requires root access to all the servers; at the moment root access is restricted to admins in person.
We should be in the habit of running the complete playbook on all the servers (e.g. before pushing to the git server), to detect any differences between check mode and normal (active) mode. This is necessary for Ansible tasks that are skipped in check mode.
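Concretely, the habit should be something like this before every push (a sketch, using the same main.yml invoked elsewhere in these notes):
ansible-playbook --check main.yml   # should come back all green
ansible-playbook main.yml           # and a real run should then change nothing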
Future work
This incident also highlights longstanding problems with our low bus protection factor and lack of automated testing. The resolutions listed above will make some small steps to improve these weaknesses.
New servers in Maths, and other sample.named.conf changes
2018-11-26 - News - Tony Finch
The Faculty of Mathematics have a revamped DNS setup, with new
authoritative DNS servers, authdns0.maths
(131.111.20.101) and
authdns1.maths
(131.111.20.202).
I have updated sample.named.conf
and
catz.arpa.cam.ac.uk
to refer to these new servers for the 11 Maths
zones. Also, I have belatedly added the Computer Lab's new reverse DNS
range for 2a05:b400:110::/48.
The stealth secondary server documentation now includes
separate, simpler configuration files for forwarding BIND resolvers,
and for stealth secondaries using catz.arpa.cam.ac.uk
. (As far as I
can tell I am still the only one using catalog zones at the moment!
They are pretty neat, though.)
New DNS web site
2018-11-20 - News - Tony Finch
There is a new web site for DNS in Cambridge at https://www.dns.cam.ac.uk/
The new site is mostly the old (sometimes very old) documentation that was hosted under https://jackdaw.cam.ac.uk/ipreg/. It has been reorganized and reformatted to make it easier to navigate; for example some pages have been rescued from the obscurity of the news archives. There are a few new pages that fill in some of the gaps.
The old pages (apart from the IP Register database interface) will shortly be replaced by redirects to their new homes on the new site.
News feeds
Our DNS news mailing list has been renamed to uis-dns-announce;
those who were subscribed to the old cs-nameservers-announce
list
have been added to the new list. This mailing list is for items of
interest to those running DNS servers on the CUDN, but which aren't of
broad enough relevance to bother the whole of ucam-itsupport
.
There are now Atom feeds for DNS news available from https://www.dns.cam.ac.uk/news/.
This news item is also posted at https://www.dns.cam.ac.uk/news/2018-11-20-web-site.html
Infrastructure
The new site is part of the project to move the IP Register database off Jackdaw. The plan is:
New web server; evict documentation. (done)
Replicate IP Register web user interface on new server. (This work will mostly be about porting Jackdaw's bespoke "WebDBI" mod_perl / Oracle application framework.)
Move the IP Register database off Jackdaw onto a new PostgreSQL database, without altering the external appearance. (This will involve porting the schema and stored procedures, and writing a test suite.)
After that point we should have more tractable infrastructure, making it easier to provide better user interface and APIs.
The new site is written in Markdown. The Project Light templates
use Mustache, because it is programming-language-agnostic, so it
will work with the existing mod_perl
scripts, and with TypeScript in
the future.
DNS-OARC and RIPE
2018-10-23 - Progress - Tony Finch
Last week I visited Amsterdam for a bunch of conferences. The 13th and 14th were the joint DNS-OARC and CENTR workshop, and the 15th - 19th was the RIPE77 meeting.
I have a number of long-term projects which can have much greater success within the University and impact outside the University by collaborating with people from other organizations in person. Last week was a great example of that, with significant progress on CDS (which I did not anticipate!), ANAME, and DNS privacy, which I will unpack below.
DNSSEC root key rollover this Thursday
2018-10-11 - News - Tony Finch
The rollover has occurred, and everything is OK at least as well as we can tell after one hour.
Before: http://dnsviz.net/d/root/W79zYQ/dnssec/
After: http://dnsviz.net/d/root/W790GQ/dnssec/
There's a lot of measurement happening, e.g. graphs of the view from the RIPE Atlas distributed Internet measurement system at: https://nlnetlabs.nl/
I have gently prodded our resolvers with rndc flushname .
so they start
using the 2017 key immediately, rather than waiting for the TTL to expire,
since I am travelling tomorrow. I expect there will be a fair amount of
discussion about the rollover at the DNS-OARC meeting this weekend...
DNS-over-TLS snapshot
2018-10-10 - Progress - Tony Finch
Some quick stats on how much the new DNS-over-TLS service is being used:
At the moment (Wednesday mid-afternoon) we have about
29,000 - 31,000 devices on the wireless network
3900 qps total on both recursive servers
about 15 concurrent DoT clients (s.d. 4)
about 7qps DoT (s.d. 5qps)
5s TCP idle timeout
6.3s mean DoT connection time (s.d. 4s - most connections are just over 5s, they occasionally last as long as 30s; mean and s.d. are not a great model for this distribution)
DoT connections very unbalanced, 10x fewer on 131.111.8.42 than on 131.111.12.20
The rule of thumb that number of users is about 10x qps suggests that we have about 70 Android Pie users, i.e. about 0.2% of our userbase.
DNSSEC root key rollover this Thursday
2018-10-08 - News - Tony Finch
This Thursday at 16:00 UTC (17:00 local time), the 2010 DNSSEC root key (tag 19036) will stop being used for signing, leaving only the 2017 root key (tag 20326). The root key TTL is 2 days so the change might not be visible until the weekend.
If you run a DNSSEC validating resolver, you should double check that it trusts the 2017 root key. ICANN have some instructions at the link below; if in doubt you can ask ip-register at uis.cam.ac.uk for advice.
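One client-side check is to ask your validator for the root key set and confirm that the answer validates and includes the 2017 key; dig's +multiline output annotates each DNSKEY with its tag:
# replace 127.0.0.1 with your validating resolver; look for the "ad" flag
# in the header and "key id = 20326" in the DNSKEY comments
dig @127.0.0.1 . dnskey +dnssec +multiline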
ICANN's DNSSEC trust anchor telemetry data does not indicate any problems for us; however the awkward cases are likely to be older validators that predate RFC 8145.
I am away for the DNS-OARC and RIPE meetings starting on Friday, but I will be keeping an eye on email. This ought to be a non-event but there hasn't been a DNSSEC root key rollover before so there's a chance that lurking horrors will be uncovered.
BIND 9.12.2-P2 and other patch releases
2018-09-20 - News - Tony Finch
Yesterday, isc.org announced patch releases of BIND
I have upgraded our DNS servers to 9.12.2-P2 mainly to address the referral interoperability problem, though we have not received any reports that this was causing noticeable difficulties.
There is a security issue related to Kerberos authenticated DNS updates; I'll be interested to hear if anyone in the University is using this feature!
Those interested in DNSSEC may have spotted the inline-signing bug that is
fixed by these patches. We do not use inline-signing but instead use
nsdiff
to apply changes to signed zones, and I believe this is also true
for the signed zones run by Maths and the Computer Lab.
DNS-over-TLS and DNS-over-HTTPS
2018-09-05 - News - Tony Finch
The University's central recursive DNS servers now support encrypted queries. This is part of widespread efforts to improve DNS privacy. You can make DNS queries using:
Traditional unencrypted DNS using UDP or TCP on port 53 ("Do53")
DNS-over-TLS on port 853 - RFC 7858
DNS-over-HTTPS on port 443 - RFC 8484
Amongst other software, Android 9 "Pie" uses DoT when possible and you can configure Firefox to use DoH.
There is more detailed information about Cambridge's DNS-over-TLS and DNS-over-HTTPS setup on a separate page.
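If you want to poke the encrypted transports by hand, something like this should work (kdig is in Debian's knot-dnsutils package):
# DNS-over-TLS query against one of the central resolvers
kdig +tls @131.111.8.42 cam.ac.uk A
# or just confirm that port 853 is answering TLS at all
openssl s_client -connect 131.111.8.42:853 </dev/null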
Renaming and renumbering the DNS servers
2018-09-04 - News - Tony Finch
There is now a draft / outline plan for renaming and renumbering the central DNS servers. This is mainly driven by the need to abolish the old Computing Service domains and by the new IPv6 prefix, among other things. See the reshuffle notes for more details.
Non-interactive access to the xlist_ops page
2018-08-13 - News - Tony Finch
Automated access to the IP Register
database has so far been limited to the
list_ops
page. In order to allow automated registration of
systems with IPv6 addresses, it is now possible to use long-term
downloaded cookies for the xlist_ops
page as well.
BIND 9.12.2 and serve-stale
2018-08-03 - News - Tony Finch
Earlier this year, we had an abortive
attempt to turn on BIND 9.12's new serve-stale
feature. This
helps to make the DNS more resilient when there are local network
problems or when DNS servers out on the Internet are temporarily
unreachable. After many trials and tribulations we have at last
successfully enabled serve-stale
.
Popular websites tend to have very short DNS TTLs, which means the DNS
stops working quite soon when there are network problems. As a result,
network problems look more like DNS problems, so they get reported to
the wrong people. We hope that serve-stale
will reduce this
kind of misattribution.
New IPv6 prefix and reverse zone
2018-06-21 - News - Tony Finch
Our new IPv6 prefix is 2a05:b400::/32
As part of our planning for more eagerly rolling out IPv6, we concluded that our existing allocation from JISC (2001:630:210::/44) would not be large enough. There are a number of issues:
A typical allocation to a department might be a /56, allowing for 256 subnets within the department - the next smaller allocation of /60 is too small to allow for future growth. We only had space for 2048 /56 allocations, or many fewer if we needed to make any /52 allocations for large institutions.
There is nowhere near enough room for ISP-style end-user allocations, such as a /64 per college bedroom or a /64 per device on eduroam.
As a result, we have asked RIPE NCC (the European regional IP address registry) to become an LIR (local internet registry) in our own right. This entitles us to get our own provider-independent ISP-scale IPv6 allocations, amongst other things.
We have now been allocated 2a05:b400::/32 and we will start planning to roll out this new address range and deprecate the old one.
We do not currently have any detailed plans for this process; we will make further announcements when we have more news to share. Any institutions that are planning to request IPv6 allocations might want to wait until the new prefix is available, or talk to networks@uis.cam.ac.uk if you have questions.
The first bit of technical setup for the new address space is to
create the reverse DNS zone, 0.0.4.b.5.0.a.2.ip6.arpa
. This is now
present and working on our DNS servers, though it does not yet contain
anything interesting! We have updated the sample stealth secondary
nameserver configuration to include this new
zone. If you are using the catalog zone
configuration your nameserver will already have
the new zone.
Edited to add: Those interested in DNSSEC might like to know that this new reverse DNS zone is signed with ECDSA P256 SHA256, whereas our other zones are signed with RSA SHA1. As part of our background project to improve DNSSEC key management, we are going to migrate our other zones to ECDSA as well, which will reduce the size of our zones and provide some improvement in cryptographic security.
DNSSEC validation and the root key rollover
2018-06-14 - News - Tony Finch
Those running DNSSEC validating resolvers should be aware that ICANN is preparing to replace the root key later this year, after last year's planned rollover was delayed.
Some of you need to take action to ensure your validating resolvers are properly configured.
There is more information at https://www.icann.org/resources/pages/ksk-rollover
ICANN have started publishing IP addresses of resolvers which are providing RFC 8145 trust anchor telemetry information that indicates they do not yet trust the new KSK. The announcement is at https://mm.icann.org/pipermail/ksk-rollover/2018-June/000418.html
IP addresses belonging to our central DNS resolvers appear on this list: 2001:630:212:8::d:2 and 2001:630:212:12::d:3
ICANN's data says that they are getting inconsistent trust anchor telemetry from our servers. Our resolvers trust both the old and new keys, so their TAT signals are OK; however our resolvers are also relaying TAT signals from other validating resolvers on the CUDN that only trust the old key.
I am going to run some packet captures on our resolvers to see if I can track down where the problem trust anchor telemetry signals are coming from, so that I can help you to fix your resolvers before the rollover.
External references to IP addresses
2018-06-13 - News - Tony Finch
After some experience with the relaxed rules for references to
off-site servers we have changed our
process slightly. Instead of putting the IP addresses in the
ucam.biz
zone, we are going to enter them into the IP Register
database, so that these non-CUDN IP addresses appear directly in the
cam.ac.uk
zone.
There are a few reasons for this:
Both the ucam.biz and IP Register setups are a bit fiddly, but the database is more easily scripted;
It reduces the need for us to set up separate HTTP redirections on the web traffic managers;
It reduces problems with ACME TLS certificate authorization at off-site web hosting providers;
It is closer to what we have in mind for the future.
The new setup registers off-site IP addresses in an OFF-SITE
mzone,
attached to an off-site
vbox. The addresses are associated with web
site hostnames using aname
objects. This slightly round-about
arrangement allows for IP addresses that are used by multiple web
sites.
Long-form domain aliases
2018-05-31 - News - Tony Finch
Our documentation on domain names now includes a page on
long-form aliases for top-level domains
under cam.ac.uk
.
Web servers on bare domains
2018-05-30 - News - Tony Finch
The DNS does not allow CNAME records to exist at the same name as other records. This restriction causes some friction for bare domains which you want to use for both mail and web. The IP Register database does not make it particularly easy to work around this restriction in the DNS, but we now have some tips for setting up web sites on bare domain names.
BIND security release
2018-05-21 - News - Tony Finch
On Friday, ISC.org released a security patch version of BIND 9.12.
The serve-stale vulnerability (CVE-2018-5737) is the one that we encountered on our live servers on the 27th March.
There are still some minor problems with serve-stale
which will be
addressed by the 9.12.2 release, so I plan to enable it after the next
release.
A note on prepared transactions
2018-04-24 - Future - Tony Finch
Some further refinements of the API behind shopping-cart style prepared transactions:
On the server side, the prepared transaction is a JSON-RPC request blob which can be updated with HTTP PUT or PATCH. Ideally the server should be able to verify that the result of the PATCH is a valid JSON-RPC blob so that it doesn't later try to perform an invalid request. I am planning to do API validity checks using JSON schema.
This design allows the prepared transaction storage to be just a simple JSON blob store, ignorant of what the blob is for except that it has to match a given schema. (I'm not super keen on nanoservices so I'll just use a table in the ipreg database to store it, but in principle there can be some nice decoupling here.)
It also suggests a more principled API design: An immediate
transaction (typically requested by an API client) might look like the
following (based on JSON-RPC version 1.0 system.multicall
syntax):
{ jsonrpc: "2.0", id: 0, method: "rpc.transaction",
  params: [ { jsonrpc: "2.0", id: 1, method: ... },
            { jsonrpc: "2.0", id: 2, method: ... },
            ... ] }
When a prepared transaction is requested (typically by the browser UI) it will look like:
{ jsonrpc: "2.0", id: 0, method: "rpc.transaction",
  params: { prepared: "#" } }
The "#" is a relative URI referring to the blob stored on the JSON-RPC endpoint (managed by the HTTP methods other than POST) - but it could in principle be any URI. (Tho this needs some thinking about SSRF security!) And I haven't yet decided if I should allow an arbitrary JSON pointer in the fragment identifier :-)
If we bring back rpc.multicall
(JSON-RPC changed the reserved prefix
from system.
to rpc.
) we gain support for prepared
non-transactional batches. The native batch request format becomes a
special case abbreviation of an in-line rpc.multicall
request.
DNS server QA traffic
2018-03-28 - Future - Tony Finch
Yesterday I enabled
serve-stale
on our recursive DNS servers, and after a few hours one of them
crashed messily. The automatic failover setup handled the crash
reasonably well, and I disabled serve-stale
to avoid any more
crashes.
How did this crash slip through our QA processes?
Test server
My test server is the recursive resolver for my workstations, and the primary master for my personal zones. It runs a recent development snapshot of BIND. I use it to try out new features, often months before they are included in a release, and I help to shake out the bugs.
In this case I was relatively late enabling serve-stale
so I was
only running it for five weeks before enabling it in production.
It's hard to tell whether a longer test at this stage would have exposed the bug, because there are relatively few junk queries on my test server.
Pre-heat
Usually when I roll out a new version of BIND, I will pre-heat the cache of an upgraded standby server before bringing it into production. This involves making about a million queries against the server based on a cache dump from a live server. This also serves as a basic smoke test that the upgrade is OK.
I didn't do a pre-heat before enabling serve-stale
because it was
just a config change that can be done without affecting service.
But it isn't clear that a pre-heat would have exposed this bug because the crash required a particular pattern of failing queries, and the cache dump did not contain the exact problem query (though it does contain some closely related ones).
Possible improvements?
An alternative might be to use live traffic as test data, instead of a
static dump. A bit of code could read a dnstap
feed on a live
server, and replay the queries against another server. There are two
useful modes:
test traffic: replay incoming (recursive client-facing) queries; this reproduces the current live full query load on another server for testing, in a way that is likely to have reproduced yesterday's crash.
continuous warming: replay outgoing (iterative Internet-facing) queries; these are queries used to refill the cache, so they are relatively low volume, and suitable for keeping a standby server's cache populated.
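A rough sketch of the capture-and-replay pipeline, to make the idea concrete (hedged: fstrm_capture and dnstap-read are the usual dnstap tools, but the socket path, the dnstap-read field layout, and the choice of dnsperf for replay are untested assumptions):
# capture a dnstap feed from the live server into a file
fstrm_capture -t protobuf:dnstap.Dnstap \
    -u /var/run/named/dnstap.sock -w live.tap
# decode it and extract query names and types (field positions assumed)
dnstap-read live.tap | awk '/ CQ /{ gsub("/IN/"," ",$NF); print $NF }' >queries.txt
# replay the queries against a test server
dnsperf -s testserver.example -d queries.txt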
There are a few cases where researchers have expressed interest in DNS
query data, of either of the above types. In order to satisfy them we
would need to be able to split a full dnstap
feed so that recipients
only get the data they want.
This live DNS replay idea needs a similar dnstap
splitter.
More upgrades
2018-03-27 - News - Tony Finch
Edited to add:
A few hours after the item below, we disabled the new serve-stale
feature following problems on one of our recursive DNS servers. We are
working with ISC.org to get
serve-stale
working better.
Original item follows:
The DNS servers are now running BIND 9.12.1. This version fixes an interoperability regression that affected resolution of bad domains with a forbidden CNAME at the zone apex.
We have also enabled the new serve-stale
feature, so that
when a remote DNS server is not available, our resolvers will return
old answers instead of a failure. The max-stale-ttl
is set to
one hour, which should be long enough to cover short network problems,
but not too long to make malicious domains hang around long after they
are taken down.
In other news, the DNS rebuild scripts (that run at 53 minutes past each hour) have been amended to handle power outages and server maintenance more gracefully. This should avoid most of the cases where the DNS build has stopped running due to excessive caution.
IPv6 DAD-die issues
2018-03-26 - Progress - Tony Finch
Here's a somewhat obscure network debugging tale...
Transactions and JSON-RPC
2018-03-02 - Future - Tony Finch
The /update
API endpoint
that I outlined turns out to be basically
JSON-RPC 2.0, so it seems to be worth making
the new IP Register API follow that spec exactly.
However, there are a couple of difficulties wrt transactions.
The current
not-an-API list_ops
page
runs each requested action in a separate transaction. It should be
possible to make similar multi-transaction batch requests with the new
API, but my previous API outline did not support this.
A JSON-RPC batch request is a JSON array of request objects, i.e. the
same syntax as I previously described for /update
transactions,
except that JSON-RPC batches are not transactional. This is good for
preserving list_ops
functionality but it loses one of the key points
of the new API.
There is a simple way to fix this problem, based on a fairly
well-known idea. XML-RPC doesn't have batch requests like JSON-RPC,
but they were retro-fitted by defining
a system.multicall
method
which takes an array of requests and returns an array of responses.
We can define transactional JSON-RPC requests in the same style, like this:
{ "jsonrpc": "2.0", "id": 0, "method": "transaction", "params": [ { "jsonrpc": "2.0", "id": 1, "method": "foo", "params": { ... } }, { "jsonrpc": "2.0", "id": 2, "method": "bar", "params": { ... } } ] }
If the transaction succeeds, the outer response contains a "result" array of successful response objects, exactly one for each member of the request params array, in any order.
If the transaction fails, the outer response contains an "error" object, which has "code" and "message" members indicating a transaction failure, and an "error" member which is an array of response objects. This will contain at least one failure response; it may contain success responses (for actions which were rolled back); some responses may be missing.
Edited to add: I've described some more refinements to this idea
Upgraded to BIND 9.12.0
2018-02-20 - News - Tony Finch
The DNS servers are now running BIND 9.12.0. This version includes official versions of all the patches we needed for production, so we can now run servers built from unpatched upstream source.
First, a really nice DNSSEC-related performance enhancement is RFC 8198 negative answer synthesis: BIND can use NSEC records to generate negative responses, rather than re-querying authoritative servers. Our current configuration includes a lot of verbiage to suppress junk queries, all of which can be removed because of this new feature.
Second, a nice robustness improvement: when upstream authoritative DNS servers become unreachable, BIND will serve stale records from its cache after their time-to-live has expired. This should improve your ability to reach off-site servers when there are partial connectivity problems, such as DDoS attacks against their DNS servers.
Third, an operational simplifier: by default BIND will limit journal files to twice the zone file size, rather than letting them grow without bound. This is a patch I submitted to ISC.org about three years ago, so it has taken a very long time to get included in a release! This feature means I no longer need to run a patched BIND on our servers.
Fourth, a DNSSEC automation tool, dnssec-cds
. (I mentioned this in a
message I sent to this list back in October.) This is I think my largest
single contribution to BIND, and (in contrast to the previous patch) it
was one of the fastest to get committed! There's still some more work
needed before we can put it into production, but we're a lot closer.
There are numerous other improvements, but those are the ones I am particularly pleased by. Now, what needs doing next ...
Deprocrastinating
2018-02-16 - Progress - Tony Finch
I'm currently getting several important/urgent jobs out of the way so that I can concentrate on the IP Register database project.
User interface sketch
2018-02-12 - Future - Tony Finch
The current IP Register user interface closely follows the database schema: you choose an object type (i.e. a table) and then you can perform whatever search/create/update/delete operations you want. This is annoying when I am looking for an object and I don't know its type, so I often end up grepping the DNS or the textual database dumps instead.
I want the new user interface to be search-oriented. The best existing example within the UIS is Lookup. The home page is mostly a search box, which takes you to a search results page, which in turn has links to per-object pages, which in turn are thoroughly hyperlinked.
ANAME vs aname
2018-02-01 - Future - Tony Finch
The IETF dnsop working group are currently discussing a draft specification for an ANAME RR type. The basic idea is that an ANAME is like a CNAME, except it only works for A and AAAA IP address queries, and it can coexist with other records such as SOA (at a zone apex) or MX.
I'm following the ANAME work with great interest because it will make certain configuration problems much simpler for us. I have made some extensive ANAME review comments.
An ANAME is rather different from what the IP Register database calls
an aname
object. An aname
is a name for a set of existing IP
addresses, which can be an arbitrary subset of the combined addresses
of multiple box
es or vbox
es, whereas an ANAME copies all the
addresses from exactly one target name.
There is more about
the general problem of aliases in the IP Register database
in one of the items I posted in December. I am still unsure how the
new aliasing model might work; perhaps it will become more clear when
I have a better understanding of the existing aname implementation and
its limitations.
Support for Ed25519 SSHFP records
2018-01-23 - News - Tony Finch
The IP Register database now allows SSHFP records with algorithm 4 (Ed25519). See our previous announcement for details about SSHFP records.
An interesting bug in BIND
2018-01-12 - Progress - Tony Finch
(This item isn't really related to progress towards a bright shiny future, but since I'm blogging here I might as well include other work-related articles.)
This week I have been helping
Mark Andrews and Evan Hunt
to track down a bug in BIND9. The problem manifested as named
occasionally failing to re-sign a DNSSEC zone; the underlying cause
was access to uninitialized memory.
It was difficult to pin down, partly because there is naturally a lot of nondeterminism in uninitialized memory bugs, but there is also a lot of nondeterminism in the DNSSEC signing process, and it is time-dependent so it is hard to re-run a failure case, and normally the DNSSEC signing process is very slow - three weeks to process a zone, by default.
Timeline
Oct 9 - latent bug exposed
Nov 12 - first signing failure
I rebuild and restart my test DNS server quite frequently, and the bug is quite rare, which explains why it took so long to appear.
Nov 18 - Dec 6 - Mark fixes several signing-related bugs
Dec 28 - another signing failure
Jan 2 - I try adding some debugging diagnostics, without success
Jan 9 - more signing failures
Jan 10 - I make the bug easier to reproduce
Mark and Evan identify a likely cause
Jan 11 - I confirm the cause and fix
The debugging process
The incremental re-signing code in named
is tied into BIND's core
rbtdb
data structure (the red-black tree database). This is tricky
code that I don't understand, so I mostly took a black-box approach to
try to reproduce it.
I started off by trying to exercise the signing code harder. I set up a test zone with the following options:
# signatures valid for 1 day (default is 30 days)
# re-sign 23 hours before expiry
# (whole zone is re-signed every hour)
sig-validity-interval 1 23;
# restrict the size of a batch of signing to examine
# at most 10 names and generate at most 2 signatures
sig-signing-nodes 10;
sig-signing-signatures 2;
I also populated the zone with about 500 records (not counting DNSSEC records) so that several records would get re-signed each minute.
This helped a bit, but I often had to wait a long time before it went
wrong. I wrote a script to monitor the zone using rndc zonestatus
, so
I could see if the "next resign time" matches the zone's earliest
expiring signature.
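The monitoring loop was essentially this shape (a reconstruction, not the original script; the exact zonestatus field names are from memory):
while sleep 60
do rndc zonestatus test.example | grep -i 'resign'
done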
There was quite a lot of flailing around trying to exercise the code harder, by making the zone bigger and changing the configuration options, but I was not successful at making the bug appear on demand.
To make it churn faster, I used dnssec-signzone
to construct a version
of the zone in which all the signatures expire in the next few minutes:
rndc freeze test.example
dig axfr test.example | grep -v RRSIG |
    dnssec-signzone -e now+$((86400 - 3600 - 200)) \
        -i 3600 -j 200 \
        -f signed -o test.example /dev/stdin
rm -f test.example test.example.jnl
mv signed test.example
# re-load the zone
rndc thaw test.example
# re-start signing
rndc sign test.example
I also modified BIND's re-signing co-ordination code; normally each batch will re-sign any records that are due in the next 5 seconds; I reduced that to 1 second to keep batch sizes small, on the assumption that more churn would help - which it did, a little bit.
But the bug still took a random amount of time to appear, sometimes within a few minutes, sometimes it would take ages.
Finding the bug
Mark (who knows the code very well) took a bottom-up approach; he ran
named
under valgrind
which identified an access to uninitialized
memory. (I don't know what led Mark to try valgrind
- whether he does
it routinely or whether he tried it just for this bug.)
Evan had not been able to reproduce the bug, but once the cause was identified it became clear where it came from.
The commit on the 9th October that exposed the bug was a change to BIND's memory management code, to stop it from deliberately filling newly-allocated memory with garbage.
Before this commit, the missing initialization was hidden by the memory fill, and the byte used to fill new allocations (0xbe) happened to have the right value (zero in the bottom bit) so the signer worked correctly.
Evan builds BIND in developer mode, which enables memory filling, which stopped him from being able to reproduce it.
Verifying the fix
I changed BIND to fill memory with 0xff which (if we were right) should provoke signing failures much sooner. And it did!
Then applying the one-character fix to remove the access to uninitialized memory made the signer work properly again.
Lessons learned
BIND has a lot of infrastructure that tries to make C safer to use, for instance:
Run-time assertions to ensure that internal APIs are used correctly;
Canary elements at the start of most objects to detect memory overruns;
buffer and region types to prevent memory overruns;
A memory management system that keeps statistics on memory usage, and helps to debug memory leaks and other mistakes.
The bug was caused by failing to use buffers well, and hidden by the memory management system.
The bug occurred when initializing an rdataslab
data structure, which
is an in-memory serialization of a set of DNS records. The records are
copied into the rdataslab
in traditional C style, without using a
buffer
. (This is most blatant when the code
manually serializes a 16 bit number
instead of using isc_buffer_putuint16
.) This code is particularly
ancient which might explain the poor style; I think it needs
refactoring for safety.
It's ironic that the bug was hidden by the memory management code - it's
supposed to help expose these kinds of bug, not hide them! Nowadays, the
right approach would be to link to jemalloc
or some other advanced
allocator, rather than writing a complicated wrapper around standard
malloc
. However that wasn't an option when BIND9 development started.
Conclusion
Memory bugs are painful.
High-level API design
2018-01-09 - Future - Tony Finch
This is just to record my thoughts about the overall shape of the IP Register API; the details are still to be determined, but see my previous notes on the data model and look at the old user interface for an idea of the actions that need to be available.
The first Oracle to PostgreSQL trial
2017-12-24 - Progress - Tony Finch
I have used ora2pg
to do a quick export
of the IP Register database from Oracle to PostgreSQL. This export
included an automatic conversion of the table structure, and the
contents of the tables. It did not include the more interesting parts
of the schema such as the views, triggers, and stored procedures.
Oracle Instant Client
Before installing ora2pg
, I had to install the Oracle client
libraries. These are not available in Debian, but Debian's ora2pg
package is set up to work with the following installation process.
Get the Oracle Instant Client RPMs
from Oracle's web site. This is a free download, but you will need to create an Oracle account.
I got the basiclite RPM - it's about half the size of the basic RPM and I didn't need full i18n. I also got the sqlplus RPM so I can talk to Jackdaw directly from my dev VMs.
The libdbd-oracle-perl package in Debian 9 (Stretch) requires Oracle Instant Client 12.1. I matched the version installed on Jackdaw, which is 12.1.0.2.0.
Convert the RPMs to debs (I did this on my workstation):
$ fakeroot alien oracle-instantclient12.1-basiclite-12.1.0.2.0-1.x86_64.rpm
$ fakeroot alien oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm
Those packages can be installed on the dev VM, with libaio1 (which is required by Oracle Instant Client but does not appear in the package dependencies), and libdbd-oracle-perl and ora2pg.
sqlplus needs a wrapper script that sets environment variables so that it can find its libraries and configuration files. After some debugging I found that although the documentation claims that glogin.sql is loaded from $ORACLE_HOME/sqlplus/admin/, in fact it is loaded from $SQLPATH.
To configure connections to Jackdaw, I copied tnsnames.ora and sqlnet.ora from ent.
By default, ora2pg
exports the table definitions of the schema we
are interested in (i.e. ipreg
). For the real conversion I intend to
port the schema manually, but ora2pg
's automatic conversion is handy
for a quick trial, and it will probably be a useful guide to
translating the data type names.
The commands I ran were:
$ ora2pg --debug
$ mv output.sql tables.sql
$ ora2pg --debug --type copy
$ mv output.sql rows.sql
$ table-fixup.pl <tables.sql >fixed.sql
$ psql -1 -f functions.sql
$ psql -1 -f fixed.sql
$ psql -1 -f rows.sql
The fixup script and SQL functions were necessary to fill in some gaps
in ora2pg
's conversion, detailed below.
Compatibility problems
Oracle treats the empty string as equivalent to NULL but PostgreSQL does not. This affects constraints on the lan and mzone tables. (See the psql one-liner after this list.)
The Oracle substr function supports negative offsets which index from the right end of the string, but PostgreSQL does not. This affects subdomain constraints on the unique_name, maildom, and service tables. These constraints should be replaced by function calls rather than copies.
The ipreg schema uses raw columns for IP addresses and prefixes; ora2pg converted these to bytea.
The v6_prefix table has a constraint that relies on implicit conversion from raw to a hex string. PostgreSQL is stricter about types, so this expression needs to work on bytea directly.
There are a number of cases where ora2pg represented named unique constraints as unnamed constraints with named indexes. This unnecessarily exposes an implementation detail.
There were a number of Oracle functions which PostgreSQL doesn't support (even with orafce), so I implemented them in the functions.sql file:
- regexp_instr()
- regexp_like()
- vsize()
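The first incompatibility fits in one line of psql (PostgreSQL's behaviour shown; the equivalent test in Oracle treats the empty string as NULL and returns true):
$ psql -c "select ('' is null) as empty_is_null"
 empty_is_null
---------------
 f
(1 row)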
Other gotchas
The mzone_co, areader, and registrar tables reference the pers table in the jdawadm schema. These foreign key constraints need to be removed.
There is a weird bug in ora2pg which mangles the regex [[:cntrl:]] into [[cntrl:]]. This is used several times in the ipreg schema to ensure that various fields are plain text. The regex is correct in the schema source and in the ALL_CONSTRAINTS table on Jackdaw, which is why I think it is an ora2pg bug.
There's another weird bug where a regexp_like(string,regex,flags) expression is converted to string ~ regex, flags which is nonsense.
There are other calls to regexp_like() in the schema which do not get mangled in this way, but they have non-trivial string expressions whereas the broken one just has a column name.
Performance
The export of the data from Oracle and the import to PostgreSQL took an uncomfortably long time. The SQL dump file is only 2GB so it should be possible to speed up the import considerably.
IP Register schema wishlist
2017-12-19 - Future - Tony Finch
Here are some criticisms of the IP Register database schema and some thoughts on how we might change it.
There is a lot of infrastructure work to do before I am in a position to make changes - principally, porting from Oracle to PostgreSQL, and developing a test suite so I can make changes with confidence.
Still, it's worth writing down my thoughts so far, so colleagues can see what I have in mind, and so we have some concrete ideas to discuss.
I expect to add to this list as thoughts arise.
How to get a preseed file into a Debian install ISO
2017-12-12 - Progress - Tony Finch
Goal: install a Debian VM from scratch, without interaction, and with a minimum of external dependencies (no PXE etc.) by putting a preseed file on the install media.
Sadly the documentation for how to do this is utterly appalling, so here's a rant.
Starting point
The Debian installer documentation, appendix B.
https://www.debian.org/releases/stable/amd64/apbs02.html.en
Some relevant quotes:
Putting it in the correct location is fairly straightforward for network preseeding or if you want to read the file off a floppy or usb-stick. If you want to include the file on a CD or DVD, you will have to remaster the ISO image. How to get the preconfiguration file included in the initrd is outside the scope of this document; please consult the developers' documentation for debian-installer.
Note there is no link to the developers' documentation.
If you are using initrd preseeding, you only have to make sure a file named preseed.cfg is included in the root directory of the initrd. The installer will automatically check if this file is present and load it.
For the other preseeding methods you need to tell the installer what file to use when you boot it. This is normally done by passing the kernel a boot parameter, either manually at boot time or by editing the bootloader configuration file (e.g.
syslinux.cfg
) and adding the parameter to the end of the append line(s) for the kernel.
Note that we'll need to change the installer boot process in any case, in order to skip the interactive boot menu. But these quotes suggest that we'll have to remaster the ISO, to edit the boot parameters and maybe alter the initrd.
So we need to guess where else to find out how to do this.
Wiki spelunking
https://wiki.debian.org/DebianInstaller
This suggests we should follow https://wiki.debian.org/DebianCustomCD or use simple-cdd.
simple-cdd
I tried simple-cdd
but it failed messily.
It needs parameters to select the correct version (it defaults to Jessie) and a local mirror (MUCH faster).
    $ time simple-cdd --dist stretch \
        --debian-mirror http://ftp.uk.debian.org/debian
    [...]
    ERROR: missing required packages from profile default: less
    ERROR: missing required packages from profile default: simple-cdd-profiles
    WARNING: missing optional packages from profile default: grub-pc grub-efi
      popularity-contest console-tools console-setup usbutils acpi acpid
      eject lvm2 mdadm cryptsetup reiserfsprogs jfsutils xfsprogs
      debootstrap busybox syslinux-common syslinux isolinux

    real    1m1.528s
    user    0m34.748s
    sys     0m1.900s
Sigh, looks like we'll have to do it the hard way.
Modifying the ISO image
Eventually I realise the hard version of making a CD image without
simple-cdd
is mostly about custom package selections, which is not
something I need.
This article is a bit more helpful...
https://wiki.debian.org/DebianInstaller/Preseed
It contains a link to...
https://wiki.debian.org/DebianInstaller/Preseed/EditIso
That requires root privilege and is a fair amount of faff.
That page in turn links to...
https://wiki.debian.org/DebianInstaller/Modify
And then...
https://wiki.debian.org/DebianInstaller/Modify/CD
This has a much easier way of unpacking the ISO using bsdtar
, and
instructions on rebuilding a hybrid USB/CD ISO using xorriso
. Nice.
Most of the rest of the page is about changing package selections which we already determined we don't need.
Boot configuration
OK, so we have used bsdtar
to unpack the ISO, and we can see various
boot-related files. We need to find the right ones to eliminate the
boot menu and add the preseed arguments.
There is no syslinux.cfg
in the ISO so the D-I documentation's
example is distressingly unhelpful.
I first tried editing boot/grub/grub.cfg
but that had no effect.
There are two boot mechanisms on the ISO, one for USB and one for
CD/DVD. The latter is in isolinux/isolinux.cfg
.
Both must be edited (in similar but not identical ways) to get the effect I want regardless of the way the VM boots off the ISO.
Unpacking and rebuilding the ISO takes less than 3 seconds on my workstation, which is acceptably fast.
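For the record, the whole unpack/edit/repack cycle looks something like this (a sketch: the ISO name is illustrative, the preseed boot parameters are the standard ones from appendix B, and the xorriso options are those suggested on the Modify/CD wiki page):

    $ mkdir iso
    $ bsdtar -C iso -xf debian-9.2.1-amd64-netinst.iso
    $ chmod -R u+w iso
    $ cp preseed.cfg iso/
    # edit iso/isolinux/isolinux.cfg and iso/boot/grub/grub.cfg to skip
    # the menu and append to the kernel command line:
    #     auto=true priority=critical preseed/file=/cdrom/preseed.cfg
    $ xorriso -as mkisofs -o preseed.iso \
        -isohybrid-mbr /usr/lib/ISOLINUX/isohdpfx.bin \
        -c isolinux/boot.cat -b isolinux/isolinux.bin \
        -no-emul-boot -boot-load-size 4 -boot-info-table \
        iso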
Authentication and access control
2017-12-06 - Future - Tony Finch
The IP Register database is an application hosted on Jackdaw, which is a platform based on Oracle and Apache mod_perl.
IP Register access control
Jackdaw and Raven handle authentication, so the IP Register database
only needs to concern itself with access control. It does this using
views defined with check option
, as is briefly described in
the database overview and visible in the
SQL view DDL.
There are three levels of access to the database:
- the `registrar` table contains privileged users (i.e. the UIS network systems team) who have read/write access to everything via the views with the `all_` prefix.
- the `areader` table contains semi-privileged users (i.e. certain other UIS staff) who have read-only access to everything via the views with the `ra_` prefix.
- the `mzone_co` table contains normal users (i.e. computer officers in other institutions) who have read-write access to their mzone(s) via the views with the `my_` prefix.
Apart from a few special cases, all the underlying tables in the database are available in all three sets of views.
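The `my_` views follow a pattern roughly like this minimal sketch (the table and column names are hypothetical, and `ipreg_current_user()` stands in for the user lookup described in the next section):

    -- normal users see only rows in mzones they administer, and
    -- "with check option" stops them inserting or updating rows
    -- that would fall outside the view
    create view my_object as
        select o.*
          from object o
          join mzone_co co on co.mzone_id = o.mzone_id
         where co.crsid = ipreg_current_user()
        with check option;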
IP Register user identification
The first part of the view definitions
is where the IP Register database schema is tied to the authenticated
user. There are two kinds of connection: either a web connection
authenticated via Raven, or a direct sqlplus
connection
authenticated with an Oracle password.
SQL users are identified by Oracle's user
function; Raven users are
obtained from the sys_context()
function, which we will now examine
more closely.
Porting to PostgreSQL
We are fortunate that support for create view with check option
was
added to PostgreSQL by our colleague Dean Rasheed.
The sys_context()
function is a bit more interesting.
The Jackdaw API
Jackdaw's mod_perl
-based API is called WebDBI, documented at
https://jackdaw.cam.ac.uk/webdbi/
There's some discussion of authentication and database connections at https://jackdaw.cam.ac.uk/webdbi/webdbi.html#authentication and https://jackdaw.cam.ac.uk/webdbi/webdbi.html#sessions but it is incomplete or out of date; in particular it doesn't mention Raven (and I think basic auth support has been removed).
The interesting part is the description of sessions. Each web server process makes one persistent connection to Oracle which is re-used for many HTTP requests. How is one database connection securely shared between different authenticated users, without giving the web server enormously privileged access to the database?
Jackdaw authentication - perl
Instead of mod_ucam_webauth
, WebDBI has its own implementation of
the Raven protocol - see jackdaw:/usr/local/src/httpd/Database.pm
.
This mod_perl
code does not do all of the work; instead it calls
stored procedures to complete the authentication. On initial login it
calls raven_auth.create_raven_session()
and for a returning user
with a cookie it calls raven_auth.use_raven_session()
.
Jackdaw authentication - SQL
These raven_auth
stored procedures set the authenticated user that
is retrieved by the sys_context()
call in the IP Register views -
see jackdaw:/usr/local/src/httpd/raven_auth/
.
Most of the logic is written in PL/SQL, but there is also an external
procedure written in C which does the core cryptography - see
jackdaw:/usr/local/oracle/extproc/RavenExtproc.c
.
Porting to PostgreSQL - reprise
On the whole I like Jackdaw's approach to preventing the web server from having too much privilege, so I would like to keep it, though in a simplified form.
As far as I know, PostgreSQL doesn't have anything quite like
sys_context()
with its security properties, though you can get
similar functionality using PL/Perl.
However, in the future I want more heavy-weight sessions that have more server-side context, in particular the "shopping cart" pending transaction.
So I think a better way might be to have a privileged session table,
keyed by the user's cookie and containing their username and jsonb
session data, etc. This table is accessed via security definer
functions, with something like Jackdaw's create_raven_session()
,
plus functions for getting the logged-in user (to replace
sys_context()
) and for manipulating the jsonb
session data.
We can provide ambient access to the cookie using the set session
command at the start of each web request, so the auth functions can
retrieve it using the current_setting()
function.
Relaxed rules for external references from the cam.ac.uk domain
2017-10-13 - News - Tony Finch
We have relaxed the rules for external references from the cam.ac.uk
domain
so that CNAMEs are no longer required; external references can refer
to IP addresses when a hostname isn't available.
One of the reasons for the old policy was that the IP Register database only knows about IP addresses on the CUDN. However, an old caveat says, "CUDN policy is not defined by this database, rather the reverse." The old policy proved to be inconvenient both for the Hostmaster team and for our colleagues around the University who requested external references. We didn't see any benefit to compensate for this inconvenience, so we have relaxed the policy.
At the moment we aren't easily able to change the structure of the IP
Register database. In order to work around the technical limitations,
when we need to make an external reference to an IP address, the
Hostmaster team will create the address records in the domain
ucam.biz
and set up a CNAME in the database from cam.ac.uk
to
ucam.biz
. This is slightly more fiddly for the Hostmaster team but
we expect that it will make the overall process easier.
Ongoing DNSSEC work
2017-10-05 - Progress - Tony Finch
We reached a nice milestone today which I'm pretty chuffed about, so I wanted to share the good news. This is mostly of practical interest to the Computer Lab and Mathematics, since they have delegated DNSSEC signed zones, but I hope it is of interest to others as well.
I have a long-term background project to improve the way we manage our DNSSEC keys. We need to improve secure storage and backups of private keys, and updating public key digests in parent zones. As things currently stand it requires tricky and tedious manual work to replace keys, but it ought to be zero-touch automation.
We now have most of the pieces we need to support automatic key management.
regpg
For secure key storage and backup, we have a wrapper around GPG called
regpg
which makes it easier to repeatably encrypt files to a managed set
of "recipients" (in GPG terminology). In this case the recipients are the
sysadmins and they are able to decrypt the DNS keys (and other secrets)
for deployment on new servers. With regpg
the key management system will
be able to encrypt newly generated keys but not able to decrypt any other
secrets.
At the moment regpg
is in use and sort-of available (at the link below)
but this is a temporary home until I have released it properly.
Edited to link to the regpg
home page
dnssec-cds
There are a couple of aspects to DNSKEY management: scheduling the rollovers, and keeping delegations in sync.
BIND 9.11 has a tool called dnssec-keymgr
which makes rollovers a lot
easier to manage. It needs a little bit of work to give it proper support
for delegation updates, but it's definitely the way of the future. (I
don't wholeheartedly recommend it in its current state.)
For synchronizing delegations, RFC 7344 describes special CDS and CDNSKEY records which a child zone can publish to instruct its parent to update the delegation. There's some support for the child side of this protocol in BIND 9.11, but it will be much more complete in BIND 9.12.
I've written dnssec-cds
, an implementation of the parent side, which was
committed to BIND this morning. (Yay!) My plan is to use this tool for
managing our delegations to the CL and Maths. BIND isn't an easy codebase
to work with; the reason for implementing dnssec-cds
this way is (I
hope) to encourage more organizations to deploy RFC 7344 support than I
could achieve with a standalone tool.
https://gitlab.isc.org/isc-projects/bind9/commit/ba37674d038cd34d0204bba105c98059f141e31e
Until our parent zones become enlightened to the ways of RFC 7344 (e.g. RIPE, JANET, etc.) I have a half-baked framework that wraps various registry/registrar APIs so that we can manage delegations for all our domains in a consistent manner. It needs some work to bring it up to scratch, probably including a rewrite in Python to make it more appealing.
Conclusion
All these pieces need to be glued together, and I'm not sure how long that will take. Some of this glue work needs to be done anyway for non-DNSSEC reasons, so I'm feeling moderately optimistic.
DNSSEC lookaside validation decommissioned
2017-10-02 - News - Tony Finch
In the bumper July news item there is a note about DNSSEC lookaside validation (DLV) being deprecated.
During the DNS OARC27
meeting
at the end of last week, DLV was decommissioned by emptying the
dlv.isc.org
zone. The item on the agenda was titled "Deprecating
RFC5074" - there are no slides because the configuration change was
made live in front of the meeting.
If you have not done so already, you should remove any
dnssec-lookaside
(BIND) or dlv-anchor
(Unbound) from your server
configuration.
The effect is that the reverse DNS for our IPv6 range 2001:630:210::/44 and our JANET-specific IPv4 ranges 193.60.80.0/20 and 193.63.252.0/32 can no longer be validated.
Other Cambridge zones which cannot be validated are our RFC 1918
reverse DNS address space (because of the difficulty of distributing
trust anchors); private.cam.ac.uk
; and most of our Managed
Zone Service zones. This may change because we would like to improve
our DNSSEC coverage.
DNSSEC root key rollover postponed
2017-09-29 - News - Tony Finch
In the bumper July news item there is a note about the DNSSEC root key rollover, which has been in careful preparation this year.
ICANN announced last night that the DNSSEC root key rollover has been postponed, and will no longer take place on the 11th October. The delay is because telemetry data reveals that too many validators do not trust the new root key.
Split views for private.cam.ac.uk
2017-09-27 - News - Tony Finch
Since private.cam.ac.uk
was set up in 2002, our DNS servers have
returned a REFUSED error to queries for private zones from outside the
CUDN. Hiding private zones from the public Internet is necessary to
avoid a number of security problems.
In March the CA/Browser Forum decided that after the 8th September 2017, certificate authorities must check CAA DNS records before issuing certificates. CAA records specify restrictions on which certificate authorities are permitted to issue certificates for a particular domain.
However, because names under private.cam.ac.uk
cannot be resolved on
the public Internet outside the CUDN, certificate authorities became
unable to successfully complete CAA checks for private.cam.ac.uk
. The
CAA specification RFC 6844
implies that a CA should refuse to issue certificates in this
situation.
In order to fix this we have introduced a split view for
private.cam.ac.uk
.
There are now two different versions of the private.cam.ac.uk
zone: a fully-populated internal version, same as before; and a
completely empty external version.
With the split view, our authoritative servers will give different
answers to different clients: devices on the CUDN will get full
answers from the internal version of private.cam.ac.uk
, and
devices on the public Internet will get negative empty answers
(instead of an error) from the external version.
There is no change to the "stealth secondary" arrangements for
replicating the private.cam.ac.uk
zone to other DNS servers
on the CUDN.
The authoritative server list for private.cam.ac.uk
has been
pruned to include just the UIS authdns
servers which have the
split view configuration. Our thanks to the Computer Lab and the
Engineering Department for providing authoritative service until this
change.
A Cambridge Catalog Zone
2017-09-06 - News - Tony Finch
Catalog Zones are a new feature in BIND 9.11 which allow a secondary server to automatically configure itself using a specially-formatted zone. The isc.org knowledge base has an introduction to catalog zones and Jan-Piet Mens has some notes on his catalog zone tests.
We can use this new feature to make "stealth secondary" configurations
much shorter and lower-maintenance. Accordingly, there is now a
catz.arpa.cam.ac.uk
catalog zone corresponding to our recommended
stealth secondary configuration, and our sample BIND
configuration has been updated with notes on
how to use it.
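In outline, using the catalog zone from BIND 9.11 looks something like this (a sketch based on ISC's documentation; the master addresses here are our authdns servers, but check the sample configuration for the authoritative list):

    options {
        catalog-zones {
            zone "catz.arpa.cam.ac.uk"
                default-masters { 131.111.8.37; 131.111.12.37; };
        };
    };

    zone "catz.arpa.cam.ac.uk" {
        type slave;
        masters { 131.111.8.37; 131.111.12.37; };
    };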
Background
This started off with some testing of the in-progress BIND 9.12 implementation of RFC 8198, which allows a validating DNSSEC resolver to use NSEC records to synthesize negative responses. (This spec is known as the Cheese Shop after an early draft which refers to a Monty Python sketch, https://tools.ietf.org/html/draft-wkumari-dnsop-cheese-shop / https://tools.ietf.org/html/rfc8198)
RFC 8198 is very effective at suppressing unnecessary queries especially to the root DNS servers and the upper levels of the reverse DNS. A large chunk of my DNS server configuration previously tried to help with that by adding a lot of locally-served empty zones (as specified by RFC 6761 etc.) With the cheese shop all that becomes redundant.
The other big chunk of my configuration is the stealth slave list. I have previously not investigated catalog-zones in detail, since they aren't quite expressive enough for use by our central DNS servers, and in any case their configuration is already automated. But it's just right for the stealth slave configuration on my test server (and ppsw, etc.)
Setting up a Cambridge catalog zone was not too difficult. Altogether it allowed me to delete over 100 zone configurations from my test server.
Deleting "localhost" entries from the cam.ac.uk DNS zone
2017-09-01 - News - Tony Finch
Between the 4th and 8th September we will delete all the localhost
entries from the cam.ac.uk
DNS zone. This change should have no
effect, except to avoid certain obscure web security risks.
RFC 1537, "Common DNS Data File
Configuration Errors", says "all domains that contain hosts should
have a localhost
A record in them." and the cam.ac.uk
zone has
followed this advice since the early 1990s (albeit not entirely
consistently).
It has belatedly come to our attention that this advice is no longer
considered safe, because localhost
can be used to subvert web
browser security policies
in some obscure situations.
Deleting our localhost DNS records should have no effect other than
fixing this security bug and cleaning up the inconsistency. End-user
systems handle queries for localhost
using their hosts
file,
without making DNS queries, and without using their domain search list
to construct queries for names like localhost.cam.ac.uk
. We verified
this by analysing query traffic on one of the central DNS resolvers,
and the number of unwanted queries was negligible, less than one every
15 minutes, out of about 1000 queries per second.
CST delegated, plus DNSSEC-related news
2017-07-18 - News - Tony Finch
From October, the Computer Laboratory will be known as the Department of Computer Science and Technology.
Our colleagues in the CL have set up the zone cst.cam.ac.uk
to go
with the new name, and it has been added to our sample nameserver
configuration file.
The first root DNSSEC key rollover is happening
The new key (tag 20326) was published on 11th July, and validating resolvers that follow RFC 5011 rollover timing will automatically start trusting it on the 10th August. There's a lot more information about the root DNSSEC key rollover on the ISC.org blog. I have added some notes on how to find out about your server's rollover state on our DNSSEC validation page.
DNSSEC lookaside validation is deprecated
The DLV turndown was announced in
2015 and the dlv.isc.org
zone is
due to be emptied in 2017. You should delete any dnssec-lookaside
option you have in your configuration to avoid complaints in named
's
logs.
Annoyingly, we were relying on DLV as a stop-gap while waiting for JISC to sign their reverse DNS zones. Some of our IPv4 address ranges and our main IPv6 allocation are assigned to us from JISC. Without DLV these zones can no longer be validated.
BIND 9.11
2017-07-11 - News - Tony Finch
The central DNS servers have been upgraded from BIND 9.10 to BIND 9.11, which has a number of new features a few of which are particularly relevant to us.
On the authoritative servers, the minimal-any
anti-DDOS
feature was developed by us and contributed to isc.org.
Happily we no longer have to maintain this as a patch.
On the recursive servers, there are a couple of notable features.
Firstly, BIND 9.11 uses EDNS cookies to identify legitimate clients so they can bypass DDoS rate limiting. Unfortunately EDNS options can encounter bugs in old badly-maintained third-party DNS servers. We are keeping an eye out for problems and if necessary we can add buggy servers to a badlist of those who can't have cookies.
Secondly, we now have support for "negative trust anchors" which provide a workaround for third party DNSSEC failures. Fortunately we have not so far had significant problems due to the lack of this feature.
BIND CVE-2017-3142 and CVE-2017-3143
2017-06-30 - News - Tony Finch
In case you have not already seen it, last night ISC.org announced a serious vulnerability in BIND: if you have a server which allows dynamic DNS UPDATE then a remote attacker may be able to alter your zones without proper authentication. For more details see:
Note that update-policy local;
uses a well-known TSIG key name, and does
not include any IP address ACL restrictions, so it is extremely vulnerable
to attack. To mitigate this you can replace update-policy local;
with
allow-update { !{ !localhost; any; }; key local-ddns; };
This denies updates that come from everywhere except localhost, and then
allows updates with the built-in local-ddns key. For a longer explanation, see
https://kb.isc.org/article/AA-00723/0/Using-Access-Control-Lists-ACLs-with-both-addresses-and-keys.html
You can still use nsupdate -l
with this configuration.
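For example, logged in on the server itself (the zone and record names here are illustrative):

    $ nsupdate -l
    > update add test.dyn.example.org. 300 TXT "hello"
    > send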
Our master DNS server has very strict packet filters which should be effective at mitigating this vulnerability until I can update the servers.
April BIND security release
2017-04-13 - News - Tony Finch
Yesterday evening there was a BIND security release fixing three vulnerabilities.
The most serious one is CVE-2017-3137 which can crash recursive servers. (It is related to the previous DNAME/CNAME RRset ordering bugs which led to security releases in January and November.)
The other vulnerabilities are in DNS64 support (which I don't think any of us use) and in the rndc control channel (which is mainly a worry if you have opened up read-only access in BIND 9.11).
More details on the bind-announce list, https://lists.isc.org/pipermail/bind-announce/2017-April/thread.html
I have patched the central DNS servers and the ppsw-specific resolvers.
An update on Cloudflare
2017-03-20 - News - Tony Finch
The UIS no longer plans to deploy Cloudflare on a large scale; we will
use Cloudflare only for www.cam.ac.uk
.
As such the automated Cloudflare provisioning system described previously has been decommissioned.
Security upgrade
2017-03-06 - News - Tony Finch
A number of security vulnerabilities in the IP Register web user interface have been fixed.
SPF records
2017-01-30 - News - Tony Finch
The Sender Policy Framework is a way to publish in the DNS which mail servers may send email "from" a particular mail domain. It uses specially formatted TXT records alongside the mail domain's MX records.
Over the last several months, we have added SPF records for mail
domains under cam.ac.uk
which have mail hosted offsite. The
most common offsite host is Microsoft Office 365 Exchange Online, but
we have a few others using survey or mailshot services.
These SPF records are managed by the IP Register / Hostmaster team, in co-operation with the Mail Support / Postmaster team. Please email us if you would like any changes.
Name servers need patching: four BIND CVEs
2017-01-12 - News - Tony Finch
ISC.org have just announced several denial-of-service vulnerabilities in BIND's handling of DNS responses. Recursive DNS servers are particularly vulnerable.
I am in the process of patching our central DNS servers; you should patch yours too.
These bugs appear to be a similar class of error to the previous BIND CVE a couple of months ago.
Streaming replication from PostgreSQL to the DNS
2016-12-23 - Future - Tony Finch
This entry is backdated - I'm writing this one year after I made this experimental prototype.
Our current DNS update mechanism runs as an hourly batch job. It would be nice to make DNS changes happen as soon as possible.
user interface matters
Instant DNS updates have tricky implications for the user interface.
At the moment it's possible to make changes to the database in between batch runs, knowing that broken intermediate states don't matter, and with plenty of time to check the changes and make sure the result will be OK.
If the DNS is updated immediately, we need a way for users to be able to prepare a set of inter-related changes, and submit them to the database as a single transaction.
(Aside: I vaguely imagine something like a shopping-cart UI that's available for collecting more complicated changes, though it should be possible to submit simple updates without a ceremonial transaction.)
This kind of UI change is necessary even if we simply run the current batch process more frequently. So we can't reasonably deploy this without a lot of front-end work.
back-end experiments
Ideally I would like to keep the process of exporting the database to the DNS and DHCP servers as a purely back-end matter; the front-end user interface should only be a database client.
So, assuming we have a better user interface, we would like to be able to get instant DNS updates by improvements to the back end without any help from the front end.
PostgreSQL has a very tempting replication feature called "logical decoding", which takes a replication stream and turns it into a series of database transactions. You can write a logical decoding plugin which emits these transactions in whatever format you want.
With logical decoding, we can (with a bit of programming) treat the
DNS as a PostgreSQL replication target, with a script that looks
something like pg_recvlogical | nsupdate
.
I wrote a prototype along these lines, which is published at https://git.uis.cam.ac.uk/x/uis/ipreg/pg-decode-dns-update.git
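In outline, deployment would look something like this (the slot and plugin names are illustrative, and this ignores the transaction-massaging wrapper described below):

    # once: create a logical replication slot using the decoding plugin
    $ pg_recvlogical -d ipreg --slot=dns --plugin=dns_update --create-slot

    # then: stream decoded transactions into the DNS
    $ pg_recvlogical -d ipreg --slot=dns --start -f - | nsupdate -l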
status of this prototype
The plugin itself works in a fairly satisfactory manner.
However it needs a wrapper script to massage transactions before they
are fed into nsupdate
, mainly to split up very large transactions
that cannot fit in a single UPDATE request.
The remaining difficult work is related to starting, stopping, and pausing replication without losing transactions. In particular, during initial deployment we need to be able to pause replication and verify that the replicated updates are faithfully reproducing what the batch update would have done. We can use the same pause/batch/resume mechanism to update the parts of the DNS that are not maintained in the database.
At the moment we are not doing any more work in this area until the other prerequisites are in place.
Cloudflare
2016-11-29 - News - Tony Finch
Cloudflare is a web content delivery network with an emphasis on denial-of-service protection.
The UIS are aiming to deploy Cloudflare in front of the University's most prominent / sensitive web sites; this service might be extended more widely to other web sites, though it is not currently clear if this will be feasible.
There is a separate document with more details of how the IP Register database and Cambridge DNS setup support Cloudflare.
Recursive servers need patching: BIND CVE 2016-8864
2016-11-01 - News - Tony Finch
ISC.org have just announced a denial-of-service vulnerability in BIND's handling of DNAME records in DNS responses. Recursive DNS servers are particularly vulnerable.
I am in the process of patching our central DNS servers; you should patch yours too.
(This bug was encountered by Marco Davids of SIDN Labs, and I identified it as a security vulnerability and reported it to ISC.org. You can find us in the acknowledgments section of the security advisory.)
Urgent patching required: BIND CVE 2016-2776
2016-10-05 - News - Tony Finch
Yesterday evening, ISC.org announced a denial-of-service vulnerability in BIND's buffer handling. The crash can be triggered even if the apparent source address is excluded by BIND's ACLs (allow-query).
All servers are vulnerable if they can receive request packets from any source.
If you have not yet patched, you should be aware that this bug is now being actively exploited.
Urgent patching required: BIND CVE 2016-2776
2016-09-28 - News - Tony Finch
Yesterday evening, ISC.org announced a denial-of-service vulnerability in BIND's buffer handling. The crash can be triggered even if the apparent source address is excluded by BIND's ACLs (allow-query).
All servers are vulnerable if they can receive request packets from any source.
Most machines on the CUDN are protected to a limited extent from outside attack by the port 53 packet filter. DNS servers that have an exemption are much more at risk.
http://www.ucs.cam.ac.uk/network/infoinstitutions/techref/portblock
I am in the process of patching our central DNS servers; you should patch yours too.
(This is another bug found by ISC.org's fuzz testing campaign; they have slowed down a lot since the initial rush that started about a year ago; the last one was in March.)
recovering from the DNS update service outage
2016-04-06 - News - Tony Finch
Sorry about the extended lack of DNS updates today.
Unfortunately the VM host system lost the RAID set that held the filesystem for our DNS master server (amongst others). We determined that it would be faster to rebuild some servers from scratch rather than waiting for more intensive RAID recovery efforts.
The DNS master server is set up so it can be rebuilt from scratch without too much difficulty - all the data on its filesystem comes from our configuration management systems, and from the IP register and MZS databases.
The main effect of this is that the zone transfers following the rebuild will be full transfers from scratch - incremental transfers are not possible. There is likely to be some additional load which slows down zone transfers while everything catches up.
BIND CVE-2016-1286 etc.
2016-03-10 - News - Tony Finch
Last night the ISC announced another security release of BIND to fix three vulnerabilities. For details see https://lists.isc.org/pipermail/bind-announce/2016-March/thread.html
Probably the most risky is CVE-2016-1286 which is a remote denial-of-service vulnerability in all versions of BIND without a workaround. CVE-2016-1285 can be mitigated, and probably is already mitigated on servers with a suitably paranoid configuration. CVE-2016-2088 is unlikely to be a problem.
I have updated the central DNS servers to BIND 9.10.3-P4.
I have also made a change to the DNS servers' name compression behaviour.
Traditionally, BIND used to compress domain names in responses so they
match the case of the query name. Since BIND 9.10 it has tried to preserve
the case of responses from servers, which can lead to case mismatches
between queries and answers. This exposed a case-sensitivity bug in
Nagios, so after the upgrade it falsely claimed that our resolvers were
not working properly! I have added a no-case-compress
clause to the
configuration so our resolvers now behave in the traditional manner.
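(For reference, the clause looks like this; applying it to the built-in any ACL, so that all clients get traditional compression, is my reading of the fix rather than a quote of our config:)

    options {
        no-case-compress { any; };
    };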
recursive DNS server packet filters
2016-03-02 - News - Tony Finch
Yesterday I changed the iptables packet filters on the central recursive DNS servers, 131.111.8.42 and 131.111.12.20, to harden them against denial of service attacks from outside the CUDN.
Previously we were rejecting queries from outside the CUDN using DNS-level REFUSED responses; now, TCP connections from outside the CUDN are rejected at the network layer using ICMP connection refused.
This change should not have any visible effect; I am letting you know because others who run DNS servers on the CUDN might want to make a similar change, and because there is some interesting background.
For most purposes, incoming DNS queries are blocked by the JANET border packet filters. http://www.ucs.cam.ac.uk/network/infoinstitutions/techref/portblock You only really need an exemption to this block for authoritative DNS servers. If you are running recursive-only DNS servers that are exempted from the port 53 block, you should consider changing your packet filters.
The particular reason for this change is that BIND's TCP connection listener is trivially easy to flood. The inspiration for this change is a cleverly evil exploit announced by Cloudflare earlier this week which relies on TCP connection flooding. Although their particular attack doesn't work with BIND, it would still be unpleasant if anyone tried it on us.
I have published a blog article with more background and context at http://fanf.livejournal.com/141807.html
DNS DoS mitigation by patching BIND to support draft-ietf-dnsop-refuse-any
2016-02-05 - Progress - Tony Finch
Last weekend one of our authoritative name servers
(authdns1.csx.cam.ac.uk
) suffered a series of DoS attacks which made
it rather unhappy. Over the last week I have developed a patch for
BIND to make it handle these attacks better.
The attack traffic
On authdns1
we provide off-site secondary name service to a number of
other universities and academic institutions; the attack targeted
imperial.ac.uk
.
For years we have had a number of defence mechanisms on our DNS servers. The main one is response rate limiting, which is designed to reduce the damage done by DNS reflection / amplification attacks.
However, our recent attacks were different. Like most reflection / amplification attacks, we were getting a lot of QTYPE=ANY queries, but unlike reflection / amplification attacks these were not spoofed, but rather were coming to us from a lot of recursive DNS servers. (A large part of the volume came from Google Public DNS; I suspect that is just because of their size and popularity.)
My guess is that it was a reflection / amplification attack, but we were not being used as the amplifier; instead, a lot of open resolvers were being used to amplify, and they in turn were making queries upstream to us. (Consumer routers are often open resolvers, but usually forward to their ISP's resolvers or to public resolvers such as Google's, and those query us in turn.)
What made it worse
Because from our point of view the queries were coming from real resolvers, RRL was completely ineffective. But some other configuration settings made the attacks cause more damage than they might otherwise have done.
I have configured our authoritative servers to avoid sending large UDP packets which get fragmented at the IP layer. IP fragments often get dropped and this can cause problems with DNS resolution. So I have set
    max-udp-size 1420;
    minimal-responses yes;
The first setting limits the size of outgoing UDP responses to an MTU which is very likely to work. (The ethernet MTU minus some slop for tunnels.) The second setting reduces the amount of information that the server tries to put in the packet, so that it is less likely to be truncated because of the small UDP size limit, so that clients do not have to retry over TCP.
This works OK for normal queries; for instance a cam.ac.uk IN MX
query
gets a svelte 216 byte response from our authoritative servers but a
chubby 2047 byte response from our recursive servers which do not have
these settings.
But ANY queries blow straight past the UDP size limit: the attack
queries for imperial.ac.uk IN ANY
got obese 3930 byte responses.
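(These sizes are easy to measure: dig prints a MSG SIZE line at the end of its output, so something like the following shows what an authoritative server hands back.)

    $ dig +norecurse @authdns0.csx.cam.ac.uk cam.ac.uk mx | grep 'MSG SIZE'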
The effect was that the recursive clients retried their queries over TCP, and consumed the server's entire TCP connection quota. (Sadly BIND's TCP handling is not up to the standard of good web servers, so it's quite easy to nadger it in this way.)
draft-ietf-dnsop-refuse-any
We might have coped a lot better if we could have served all the attack traffic over UDP. Fortunately there was some pertinent discussion in the IETF DNSOP working group in March last year which resulted in draft-ietf-dnsop-refuse-any, "providing minimal-sized responses to DNS queries with QTYPE=ANY".
This document was instigated by Cloudflare, who have a DNS server architecture which makes it unusually difficult to produce traditional comprehensive responses to ANY queries. Their approach is instead to send just one synthetic record in response, like
cloudflare.net. HINFO ( "Please stop asking for ANY" "See draft-jabley-dnsop-refuse-any" )
In the discussion, Evan Hunt (one of the BIND developers) suggested an alternative approach suitable for traditional name servers. They can reply to an ANY query by picking one arbitrary RRset to put in the answer, instead of all of the RRsets they have to hand.
The draft says you can use either of these approaches. They both allow an authoritative server to make the recursive server go away happy that it got an answer, and without breaking odd applications like qmail that foolishly rely on ANY queries.
I did a few small experiments at the time to demonstrate that it really would work OK in the real world (unlike some of the earlier proposals) and they are both pretty neat solutions (unlike some of the earlier proposals).
Attack mitigation
So draft-ietf-dnsop-refuse-any is an excellent way to reduce the damage caused by the attacks, since it allows us to return small UDP responses which reduce the downstream amplification and avoid pushing the intermediate recursive servers on to TCP. But BIND did not have this feature.
I did a very quick hack on Tuesday to strip down ANY responses, and I deployed it to our authoritative DNS servers on Wednesday morning for swift mitigation. But it was immediately clear that I had put my patch in completely the wrong part of BIND, so it would need substantial re-working before it could be more widely useful.
I managed to get back to the patch on Thursday. The right place to put
the logic was in the fearsome query_find()
which is
the top-level query handling function and nearly 2400 lines long! I
finished the first draft of the revised patch that afternoon (using
none of the code I wrote on Tuesday), and I spent Friday afternoon
debugging and improving it.
The result is this patch, which adds a minimal-qtype-any option. I'm currently running it on my toy nameserver, and I plan to deploy it to our production servers next week to replace the rough hack.
I have submitted the patch to ISC.org; hopefully something like it will be included in a future version of BIND. And it prompted a couple of questions about draft-ietf-dnsop-refuse-any that I posted to the DNSOP working group mailing list.
January BIND security release
2016-01-20 - News - Tony Finch
Last night the ISC published yet another security release of BIND.
For details, please see the announcement messages: https://lists.isc.org/pipermail/bind-announce/2016-January/thread.html
The central DNS servers have been upgraded to BIND 9.10.3-P3.
BIND security release
2015-12-17 - News - Tony Finch
On Tuesday night the ISC published security releases of BIND which fix a couple of remote denial of service vulnerabilities. If you are running a recursive DNS server then you should update as soon as possible.
If you build your own BIND packages linked to OpenSSL 1.0.1 or 1.0.2 then you should also be aware of the OpenSSL security release that occurred earlier this month. The new versions of BIND will refuse to build with vulnerable versions of OpenSSL.
For more information see the bind-announce list, https://lists.isc.org/pipermail/bind-announce/2015-December/thread.html
The central nameservers and the resolvers on the central mail relays were updated to BIND 9.10.3-P2 earlier today.
Isaac Newton Institute delegated
2015-10-22 - News - Tony Finch
There are two new zones in the sample nameserver configuration,
newton.cam.ac.uk
145.111.131.in-addr.arpa
These have been delegated like the other domains managed by the Faculty of Mathematics.
Cutting a zone with DNSSEC
2015-10-21 - Progress - Tony Finch
This week we will be delegating newton.cam.ac.uk
(the Isaac Newton
Institute's domain) to the Faculty of Mathematics, who have been
running their own DNS since the very earliest days of Internet
connectivity in Cambridge.
Unlike most new delegations, the newton.cam.ac.uk
domain already
exists and has a lot of records, so we have to keep them working
during the process. And for added fun, cam.ac.uk
is signed with
DNSSEC, so we can't play fast and loose.
In the absence of DNSSEC, it is mostly OK to set up the new zone, get all the relevant name servers secondarying it, and then introduce the zone cut. During the rollout, some servers will be serving the domain from the old records in the parent zone, and other servers will serve the domain from the new child zone, which occludes the old records in its parent.
But this won't work with DNSSEC because validators are aware of zone cuts, and they check that delegations across cuts are consistent with the answers they have received. So with DNSSEC, the process you have to follow is fairly tightly constrained to be basically the opposite of the above.
The first step is to set up the new zone on name servers that are
completely disjoint from those of the parent zone. This ensures that a
resolver cannot prematurely get any answers from the new zone - they
have to follow a delegation from the parent to find the name servers
for the new zone. In the case of newton.cam.ac.uk
, we are lucky that
the Maths name servers satisfy this requirement.
The second step is to introduce the delegation into the parent zone. Ideally this should propagate to all the authoritative servers promptly, using NOTIFY and IXFR.
(I am a bit concerned about DNSSEC software which does validation as a separate process after normal iterative resolution, which is most of it. While the delegation is propagating it is possible to find the delegation when resolving, but get a missing delegation when validating. If the validator is persistent at re-querying for the delegation chain it should be able to recover from this; but quick propagation minimizes the problem.)
After the delegation is present on all the authoritative servers, and
old data has timed out of caches, the new child zone can (if
necessary) be added to the parent zone's name servers. In our case the
central cam.ac.uk
name servers and off-site secondaries also serve the
Maths zones, so this step normalizes the setup for newton.cam.ac.uk
.
(lack of) DNS root hints
2015-09-28 - News - Tony Finch
Another change I made to sample.named.conf
on Friday was to remove the
explicit configuration of the root name server hints. I was asked why,
so I thought I should explain to everyone.
BIND comes with a built-in copy of the hints, so there is no need to explicitly configure them. It is important to keep BIND up-to-date for security reasons, so the root hints should not be stale. And even if they are stale, the only negative effect is a warning in the logs.
So I regard explicitly configuring root hints as needless extra work.
It is worth noting that the H-root name server IP addresses are going to change on the 1st December 2015. We will not be making any special effort in response since normal BIND updates will include this change in due course.
There is a history of root name server IP address changes at http://root-servers.org/news.html
New CUDN-wide private addresses 10.128.0.0/9
2015-09-25 - News - Tony Finch
You should be aware of our previous announcements about changing the status of 10.128.0.0/9 to CUDN-wide private address space.
- http://news.uis.cam.ac.uk/articles/2015/07/24/reassignment-of-10-128-0-0-9-to-cudn-wide-private-address-space
- http://news.uis.cam.ac.uk/articles/2015/09/17/10-128-0-0-9-reassignment-and-university-wireless-address-changes
The central name servers now have DNS zones for 10.128.0.0/9. There are not yet any registrations in this address space, so the zones are currently almost empty. We have updated the name server configuration advice to cover these new zones.
https://jackdaw.cam.ac.uk/ipreg/nsconfig/
On the CUDN the RFC 1918 address block 10.0.0.0/8 is divided in two. The bottom half, 10.0.0.0/9, is for institution-private usage and is not routed on the CUDN. The top half, 10.128.0.0/9, was previously reserved; it has now been re-assigned as CUDN-wide private address space.
To provide DNS for 10.0.0.0/8 we have a mechanism for conveniently
sharing the zone 10.in-addr.arpa
between institution-private and
CUDN-wide private uses. The arrangement we are using is similar to the
way 128.232.0.0/16
is divided between the Computer Lab and the rest
of the University.
We have two new zones for this address space,
10.in-addr.arpa
in-addr.arpa.private.cam.ac.uk
The sample nameserver configuration has been updated to include them.
Institutions that are using the bottom half, 10.0.0.0/9, should
provide their own version of 10.in-addr.arpa
with DNAME redirections to in-addr.arpa.private.cam.ac.uk
for the
CUDN-wide addresses.
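For example, an institution's private version of 10.in-addr.arpa might contain one DNAME per /16 in the top half, along these lines (a sketch; the exact naming scheme for the shared zone's subdomains is documented in the nsconfig pages, so treat these targets as assumptions):

    $ORIGIN 10.in-addr.arpa.
    ; 10.0.0.0/9: institution-private records go here as usual
    ; 10.128.0.0/9: redirect into the CUDN-wide shared zone
    128   IN   DNAME   128.10.in-addr.arpa.private.cam.ac.uk.
    129   IN   DNAME   129.10.in-addr.arpa.private.cam.ac.uk.
    ; ... and so on up to 255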
BIND security update
2015-09-03 - News - Tony Finch
Last night the ISC released new versions of BIND, 9.9.7-P3 and 9.10.2-P4, which address a couple of remote denial-of-service vulnerabilities, CVE-2015-5722 (DNSKEY parsing bug) and CVE-2015-5986 (OPENPGPKEY parsing bug). There is some background information on the recent spate of security releases at https://www.isc.org/blogs/summer_security_vulnerabilities/
If you are running BIND as a recursive DNS server you should update it urgently. We will be patching the central DNS servers this morning.
CVE-2015-5477: critical remote crash bug in BIND
2015-07-29 - News - Tony Finch
If you have a DNS server running BIND, you should apply the latest security patch as soon as possible.
The bind-announce mailing list has the formal vulnerability notification and release announcements:
The authors of BIND have also published a blog post emphasizing that there are no workarounds for this vulnerability: it affects both recursive and authoritative servers and I understand that query ACLs are not sufficient protection.
Our central DNS servers authdns* and recdns* have been patched.
More frequent DNS updates
2015-02-18 - News - Tony Finch
DNS updates now occur every hour at 53 minutes past the hour. (There is a mnemonic for the new timing of DNS updates: 53 is the UDP and TCP port number used by the DNS.) Previously, the interval between DNS update runs was four hours.
The update job takes a minute or two to run, after which changes are immediately visible on our public authoritative DNS servers, and on our central recursive servers 131.111.8.42 and 131.111.12.20.
We have also reduced the TTL of our DNS records from 24 hours to 1 hour. (The time-to-live is the maximum time old data will remain in DNS caches.) This shorter TTL means that users of other recursive DNS servers around the University and elsewhere will observe DNS changes within 2 hours of changes to the IP Register database.
There are two other DNS timing parameters which were reduced at the time of the new DNS server rollout.
The TTL for negative answers (in response to queries for data that is not present in the DNS) has been reduced from 4 hours to 1 hour. This can make new entries in the DNS available faster.
Finally, we have reduced the zone refresh timer from 4 hours to 30 minutes. This means that unofficial "stealth secondary" nameservers will fetch DNS updates within 90 minutes of a change being made to the IP Register database. Previously the delay could be up to 8 hours.
DNS server rollout report
2015-02-16 - Progress - Tony Finch
Last week I rolled out my new DNS servers. It was reasonably successful - a few snags but no showstoppers.
New DNS servers
2015-02-15 - News - Tony Finch
The DNS servers have been replaced with an entirely new setup.
The immediate improvements are:
Automatic failover for recursive DNS servers. There are servers in four different locations, two live, two backup.
DNSSEC signing moved off authdns0 onto a hidden master server, with support for signing Managed Zone Service domains.
There are extensive improvements to the DNS server management and administration infrastructure:
Configuration management and upgrade orchestration moved from ad-hoc to Ansible.
Revision control moved from SCCS to git, including a history of over 20,000 changes dating back to 1990.
Operating system moved from Solaris to Linux, to make better use of our local knowledge and supporting infrastructure.
DNS server upgrade
2015-02-09 - News - Tony Finch
This week we will replace the DNS servers with an entirely new setup. This change should be almost invisible: the new servers will behave the same as the old ones, and each switchover from an old to a new server will only take a few seconds so should not cause disruption.
The rollout will switch over the four service addresses on three occasions this week. We are avoiding changes during the working day, and rolling out in stages so we are able to monitor each change separately.
Tuesday 10 February, 18:00 -
- Recursive server recdns1, 131.111.12.20 (expected outage 15s)
Wednesday 11 February, 08:00 -
- Recursive server recdns0, 131.111.8.42 (expected outage 15s)
- Authoritative server authdns1, 131.111.12.37 (expected outage 40s)
Thursday 12 February, 18:00 -
- Authoritative server authdns0, 131.111.8.37 (expected outage 40s)
There will be a couple of immediate improvements to the DNS service, with more to follow:
Automatic failover for recursive DNS servers. There are servers in three different locations, two live, one backup, and when the West Cambridge Data Centre comes online there will be a second backup location.
DNSSEC signing moved off authdns0 onto a hidden master server, with support for signing Managed Zone Service domains.
There are extensive improvements to the DNS server management and administration infrastructure:
Configuration management and upgrade orchestration moved from ad-hoc to Ansible. The expected switchover timings above are based on test runs of the Ansible rollout / backout playbooks.
Revision control moved from SCCS to git, including a history of over 20,000 changes dating back to 1990.
Operating system moved from Solaris to Linux, to make better use of our local knowledge and supporting infrastructure.
Recursive DNS rollout plan - and backout plan!
2015-01-30 - Progress - Tony Finch
The last couple of weeks have been a bit slow, being busy with email and DNS support, an unwell child, and surprise 0day. But on Wednesday I managed to clear the decks so that on Thursday I could get down to some serious rollout planning.
My aim is to do a forklift upgrade of our DNS servers - a tier 1 service - with negligible downtime, and with a backout plan in case of fuckups.
BIND patches as a byproduct of setting up new DNS servers
2015-01-17 - Progress - Tony Finch
On Friday evening I reached a BIG milestone in my project to replace Cambridge University's DNS servers. I finished porting and rewriting the dynamic name server configuration and zone data update scripts, and I was - at last! - able to get the new servers up to pretty much full functionality, pulling lists of zones and their contents from the IP Register database and the managed zone service, and with DNSSEC signing on the new hidden master.
There is still some final cleanup and robustifying to do, and checks to make sure I haven't missed anything. And I have to work out the exact process I will follow to put the new system into live service with minimum risk and disruption. But the end is tantalizingly within reach!
In the last couple of weeks I have also got several small patches into BIND.
Recursive DNS server failover with keepalived --vrrp
2015-01-09 - Progress - Tony Finch
I have got keepalived working on my recursive DNS servers, handling
failover for testdns0.csi.cam.ac.uk
and testdns1.csi.cam.ac.uk
. I
am quite pleased with the way it works.
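The configuration is pleasingly small: the core is one VRRP instance per service address, something like this sketch (the interface, router id, priorities, and address are all illustrative):

    vrrp_instance testdns0 {
        state BACKUP
        interface eth0            # the interface carrying the service
        virtual_router_id 53      # must match on both servers
        priority 150              # lower on the standby server
        virtual_ipaddress {
            192.0.2.53/24         # placeholder for the service address
        }
    }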
Network setup for Cambridge's new DNS servers
2015-01-07 - Progress - Tony Finch
The SCCS-to-git project that I wrote about previously was the prelude to setting up new DNS servers with an entirely overhauled infrastructure.
Uplift from SCCS to git
2014-11-27 - Progress - Tony Finch
My current project is to replace Cambridge University's DNS servers. The first stage of this project is to transfer the code from SCCS to Git so that it is easier to work with.
Ironically, to do this I have ended up spending lots of time working with SCCS and RCS, rather than Git. This was mainly developing analysis and conversion tools to get things into a fit state for Git.
If you find yourself in a similar situation, you might find these tools helpful.
The early days of the Internet in Cambridge
2014-10-30 - Progress - Tony Finch
I'm currently in the process of uplifting our DNS development / operations repository from SCCS (really!) to git. This is not entirely trivial because I want to ensure that all the archival material is retained in a sensible way.
I found an interesting document from one of the oldest parts of the archive, which provides a good snapshot of academic computer networking in the UK in 1991. It was written by Tony Stoneley, aka <ajms@cam.ac.uk>. AJMS is mentioned in RFC 1117 as the contact for Cambridge's IP address allocation. He was my manager when I started work at Cambridge in 2002, though he retired later that year.
The document is an email discussing IP connectivity for Cambridge's Institute of Astronomy. There are a number of abbreviations which might not be familiar...
- Coloured Book: the JANET protocol suite
- CS: the University Computing Service
- CUDN: the Cambridge University Data Network
- GBN: the Granta Backbone Network, Cambridge's duct and fibre infrastructure
- grey: short for Grey Book, the JANET email protocol
- IoA: the Institute of Astronomy
- JANET: the UK national academic network
- JIPS: the JANET IP service, which started as a pilot service early in 1991; IP traffic rapidly overtook JANET's native X.25 traffic, and JIPS became an official service in November 1991, about when this message was written
- PSH: a member of IoA staff
- RA: the Mullard Radio Astronomy Observatory, an outpost at Lords Bridge near Barton, where some of the dishes sit on the old Cambridge-Oxford railway line. (I originally misunderstood the reference as the Rutherford Appleton Laboratory, a national research institute in Oxfordshire.)
- Starlink: a UK national DECnet network linking astronomical research institutions
Edited to correct the expansion of RA and to add Starlink
Connection of IoA/RGO to IP world --------------------------------- This note is a statement of where I believe we have got to and an initial review of the options now open. What we have achieved so far ---------------------------- All the Suns are properly connected at the lower levels to the Cambridge IP network, to the national IP network (JIPS) and to the international IP network (the Internet). This includes all the basic infrastructure such as routing and name service, and allows the Suns to use all the usual native Unix communications facilities (telnet, ftp, rlogin etc) except mail, which is discussed below. Possibly the most valuable end-user function thus delivered is the ability to fetch files directly from the USA. This also provides the basic infrastructure for other machines such as the VMS hosts when they need it. VMS nodes --------- Nothing has yet been done about the VMS nodes. CAMV0 needs its address changing, and both IOA0 and CAMV0 need routing set for extra-site communication. The immediate intention is to route through cast0. This will be transparent to all parties and impose negligible load on cast0, but requires the "doit" bit to be set in cast0's kernel. We understand that PSH is going to do all this [check], but we remain available to assist as required. Further action on the VMS front is stalled pending the arrival of the new release (6.6) of the CMU TCP/IP package. This is so imminent that it seems foolish not to await it, and we believe IoA/RGO agree [check]. Access from Suns to Coloured Book world --------------------------------------- There are basically two options for connecting the Suns to the JANET Coloured Book world. We can either set up one or more of the Suns as full-blown independent JANET hosts or we can set them up to use CS gateway facilities. The former provides the full range of facilities expected of any JANET host, but is cumbersome, takes significant local resources, is complicated and long-winded to arrange, incurs a small licence fee, is platform-specific, and adds significant complexity to the system managers' maintenance and planning load. The latter in contrast is light-weight, free, easy to install, and can be provided for any reasonable Unix host, but limits functionality to outbound pad and file transfer either way initiated from the local (IoA/RGO) end. The two options are not exclusive. We suspect that the latter option ("spad/cpf") will provide adequate functionality and is preferable, but would welcome IoA/RGO opinion. Direct login to the Suns from a (possibly) remote JANET/CUDN terminal would currently require the full Coloured Book package, but the CS will shortly be providing X.29-telnet gateway facilities as part of the general infrastructure, and can in any case provide this functionality indirectly through login accounts on Central Unix facilities. For that matter, AST-STAR or WEST.AST could be used in this fashion. Mail ---- Mail is a complicated and difficult subject, and I believe that a small group of experts from IoA/RGO and the CS should meet to discuss the requirements and options. The rest of this section is merely a fleeting summary of some of the issues. Firstly, a political point must be clarified. At the time of writing it is absolutely forbidden to emit smtp (ie Unix/Internet style) mail into JIPS. This prohibition is national, and none of Cambridge's doing. We expect that the embargo will shortly be lifted somewhat, but there are certain to remain very strict rules about how smtp is to be used. 
Within Cambridge we are making best guesses as to the likely future rules and adopting those as current working practice. It must be understood however that the situation is highly volatile and that today's decisions may turn out to be wrong. The current rulings are (inter alia):

- Mail to/from outside Cambridge may only be grey (ie JANET style).
- Mail within Cambridge may be grey or smtp, BUT the reply address MUST be valid in BOTH the Internet AND JANET (modulo reversal). Thus a workstation emitting smtp mail must ensure that the reply address contained is that of a current JANET mail host.
- Except that consenting machines in a closed workgroup in Cambridge are permitted to use smtp between themselves, though there is no support from the CS and the practice is discouraged. They must remember not to contravene the previous two rulings, on pain of disconnection.

The good news is that a central mail hub/distributer will become available as a network service for the whole University within a few months, and will provide sufficient gateway function that ordinary smtp Unix workstations, with some careful configuration, can have full mail connectivity. In essence the workstation and the distributer will form one of those "closed workgroups", the workstation will send all its outbound mail to the distributer and receive all its inbound mail from the distributer, and the distributer will handle the forwarding to and from the rest of Cambridge, UK and the world.

There is no prospect of DECnet mail being supported generally either nationally or within Cambridge, but I imagine Starlink/IoA/RGO will continue to use it for the time being, and whatever gateway function there is now will need preserving. This will have to be largely IoA/RGO's own responsibility, but the planning exercise may have to take account of any further constraints thus imposed. Input from IoA/RGO as to the requirements is needed.

In the longer term there will probably be a general UK and worldwide shift to X.400 mail, but that horizon is probably too hazy to rate more than a nod at present. The central mail switch should in any case hide the initial impact from most users.

The times are therefore a'changing rather rapidly, and some pragmatism is needed in deciding what to do. If mail to/from the IP machines is not an urgent requirement, and since they will be able to log in to the VMS nodes it may not be, then the best thing may well be to await the mail distributer service. If more direct mail is needed more urgently then we probably need to set up a private mail distributer service within IoA/RGO. This would entail setting up (probably) a Sun as a full JANET host and using it as the one and only (mail) route in or out of IoA/RGO. Something rather similar has been done in Molecular Biology and is thus known to work, but setting it up is no mean task. A further fall-back option might be to arrange to use Central Unix facilities as a mail gateway in similar vein. The less effort spent on interim facilities the better, however.

Broken mail
-----------

We discovered late in the day that smtp mail was in fact being used between IoA and RA, and the name changing broke this. We regret having thus trodden on existing facilities, and are willing to help try to recover any required functionality, but we believe that IoA/RGO/RA in fact have this in hand. We consider the activity to fall under the third rule above. If help is needed, please let us know.
We should also report a sideline problem we encountered which will probably be a continuing cause of grief. CAVAD, and indeed any similar VMS system, emits mail with reply addresses of the form "CAVAD::user"@.... This is quite legal, but the quotes are syntactically significant, and must be returned in any reply. Unfortunately the great majority of Unix systems strip such quotes during emission of mail, so the reply address fails. Such stripping can occur at several levels, notably in the sendmail (ie system) processing and in one of the most popular user-level mailers. The CS is fixing its own systems, but the problem is replicated in something like half a million independent Internet hosts, and little can be done about it.

Other requirements
------------------

There may well be other requirements that have not been noticed or, perish the thought, that we have inadvertently broken. Please let us know of these.

Bandwidth improvements
----------------------

At present all IP communications between IoA/RGO and the rest of the world go down a rather slow (64Kb/sec) link. This should improve substantially when it is replaced with a GBN link, and to most of Cambridge the bandwidth will probably become 1-2Mb/sec. For comparison, the basic ethernet bandwidth is 10Mb/sec. The timescale is unclear, but sometime in 1992 is expected.

The bandwidth of the national backbone facilities is of the order of 1Mb/sec, but of course this is shared with many institutions in a manner hard to predict or assess.

For Computing Service, Tony Stoneley, ajms@cam.cus
29/11/91
Some small changes to sample.named.conf
2014-07-23 - News - Tony Finch
The UIS have taken over the management of the DNS entries for the MRC Biostatistics Unit subnets 193.60.[86-87].x. As a result, the zones
- 86.60.193.in-addr.arpa
- 87.60.193.in-addr.arpa
can now be slaved from the authdns*.csx servers by hosts within the CUDN, and they have been added to the sample BIND configuration at
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
Those who feel they must slave enough reverse zones to cover the whole CUDN may want to include them. These zones are not yet signed, but we expect them to be within a week or two.
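For those editing their configuration by hand rather than taking the updated sample file, the extra stanzas look something like this minimal sketch; it assumes the same master addresses (131.111.8.37 and 131.111.12.37) and file naming conventions used for the other slaved zones in sample.named.conf, so check against the sample itself before use.

    zone "86.60.193.in-addr.arpa" {
        type slave;
        file "86.60.193.in-addr.arpa";
        masters { 131.111.8.37; 131.111.12.37; };
    };
    zone "87.60.193.in-addr.arpa" {
        type slave;
        file "87.60.193.in-addr.arpa";
        masters { 131.111.8.37; 131.111.12.37; };
    };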
A number of cosmetic changes to the comments in the sample configuration have also been made, mostly bringing up to date matters like the versions of BIND still being actively supported by ISC.
Those who use an explicit root hints file may want to note that a new version
was issued in early June, adding an IPv6 address to B.ROOT-SERVERS.NET.
The copy at https://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache
was updated.
SRV records
2014-07-14 - News - Chris Thompson
There is now a service_ops web page available which allows authorised users to create service (SRV) records in the DNS for names in the domains to which they have access. See the service_ops help page for more details.
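By way of illustration, a service record created through the page ends up in the DNS in the standard SRV form; the names and values below are made up for the example, not taken from the help page.

    ; _service._protocol.name  TTL  class SRV priority weight port target
    _sip._udp.dept.cam.ac.uk.  3600 IN    SRV 10       0      5060 sipserver.dept.cam.ac.uk.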
SSHFP records
2014-07-08 - News - Chris Thompson
SSH fingerprint records can now be added to the DNS via the IP registration database. Such records are described in RFC 4255, as updated by RFC 6594.
More information can be found on the IP Register SSHFP documentation page.
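As a hypothetical illustration of the record format (RFC 4255 defines algorithm 1 = RSA with fingerprint type 1 = SHA-1; RFC 6594 adds ECDSA and SHA-256), an SSHFP record looks like this; the owner name and fingerprint here are placeholders.

    ; owner               class type  alg fptype fingerprint (hex)
    host.dept.cam.ac.uk.  IN    SSHFP 1   1      0123456789abcdef0123456789abcdef01234567

OpenSSH's ssh-keygen -r hostname will print suitable records for the local host keys, and clients can be told to check them via the VerifyHostKeyDNS option.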
DHCP-related data in the IP registration database
2014-07-07 - News - Chris Thompson
Many users of the IP registration database web interface will have noticed the appearance some time ago of mac and dhcp_group fields on the single_ops page, as well as related changes visible via the table_ops page.
These were intended in the first instance for maintaining a DHCP service for internal use in the UCS. It was perhaps unwise of us to make them visible outside and raise users' expectations prematurely. It remains a work in progress, and we have had to make changes of detail that affected some of those who had set these fields. The notes here describe the current state.
Although the single_ops page doesn't make this obvious, the mac and dhcp_group fields are properties of the IP address rather than the box. If a box or vbox has multiple IP addresses, each one can have its own values for them. The fields are cleared automatically when the IP address is rescinded.
MAC addresses can be entered in any of the usual formats but are displayed as colon-separated. Because the intent is to support DHCP servers, MAC addresses (if set) are required to be unique within any particular mzone/lan combination. A non-null dhcp_group value is intended to indicate non-default DHCP options. To support automated processing, it must correspond to a registered dhcp_group object for the given mzone/lan, which can be created, modified or deleted via table_ops. The values should contain only alphanumeric, hyphen and underline characters.
The degree to which any of this is of use to users outside the UIS is currently very limited. We do intend to add more usability features, though.
Representing network access controls in the database
2014-05-20 - News - Chris Thompson
The scheme described in news item 2008-12-15 has been reworked to represent a larger number of references to specific IP addresses from the various parts of the CUDN infrastructure. The intention remains the same: to prevent such IP addresses being rescinded or reused without appropriate changes being made to the CUDN configuration.
There are now four "anames" used instead of three:
- janet-filter.net.private.cam.ac.uk: for exceptions at the CUDN border routers, often permitting some network traffic that would otherwise be blocked. This is essentially the same as the old janet-acl.net.private.cam.ac.uk, which is temporarily an alias.
- cudn-filter.net.private.cam.ac.uk: for exceptions at internal CUDN routers. This includes the old high-numbered port blocking, where it is still in use, but also many other sorts of exception which were previously not represented. The old name cudn-acl.net.private.cam.ac.uk is temporarily an alias.
- cudn-blocklist.net.private.cam.ac.uk: for addresses for which all IP traffic is completely blocked, usually as the result of a security incident. This is essentially the same as the old block-list.net.private.cam.ac.uk, which is temporarily an alias.
- cudn-config.net.private.cam.ac.uk: for addresses that are referred to in the CUDN routing infrastructure. This is completely new.
Both IPv4 and IPv6 addresses may appear in these lists (although at the moment only cudn-config has any IPv6 addresses).
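Since the anames are ordinary DNS names (and assuming, as the names suggest, that they are published in the private.cam.ac.uk zone, hence resolvable only within the CUDN), the current contents of a list can be inspected with a simple query, for example:

    $ dig +short cudn-blocklist.net.private.cam.ac.uk a
    $ dig +short cudn-config.net.private.cam.ac.uk aaaa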
Requests for the creation or removal of network access control
exceptions, or explanations of existing ones, should in most cases be
sent to network-support@uis.cam.ac.uk in the first instance, who
will redirect them if necessary. However, the CERT team at
cert@cam.ac.uk are solely responsible for the cudn-blocklist
contents in particular.
Upgrade to BIND 9.9.5 may possibly cause problems
2014-02-18 - News - Chris Thompson
The most recently released BIND versions (9.9.5, 9.8.7, 9.6-ESV-R11) have implemented a more pedantic interpretation of the RFCs in the area of compressing responses. It is just possible that this will cause problems for some resolving software. ISC have written a Knowledge Base article about it, which can be found at https://kb.isc.org/article/AA-01113
In particular, the name in the answer section of a response may now have a different case from that in the question section (which will always be identical to that in the original query). Previously they would (after decompression) have been identical. Resolvers are meant to use case-insensitive comparisons themselves, but this change could expose non-conformance in this area.
However, experiments we have performed so far, and information from the DNS community at large, suggest that such non-conformance is quite rare. We are therefore planning to upgrade the CUDN central nameservers (both authoritative and recursive) to BIND 9.9.5 over the next few days. Please keep an eye out for any problems that might be caused by the change, and let us (hostmaster at ucs.cam.ac.uk) know as soon as possible, while we still have the option of backing off.
The "consolidated reverse zone" in-addr.arpa.cam.ac.uk is now signed
2014-01-26 - News - Chris Thompson
In November, I wrote to this list (cs-nameservers-announce):
As regards DNSSEC validation, cl.cam.ac.uk now has a chain of trust from the root zone. We expect that 232.128.in-addr.arpa will also have one before long.
This last happened earlier in January. That made it sensible to sign the "consolidated reverse zone" in-addr.arpa.cam.ac.uk which provides reverse lookup results for IPv4 addresses in the range 128.232.[128-255].x. This has now been done, and the results of such reverse lookup can be fully validated using chains of trust from the root zone.
There is more information at
- https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-signed.html (moved to https://www.dns.cam.ac.uk/domains/signed.html)
- https://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html (moved to https://www.dns.cam.ac.uk/domains/reverse/)
both of which have been brought up to date.
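To check the chain of trust end to end, query a validating resolver for a reverse name in the consolidated range and look for the AD (authenticated data) flag in the reply; any address in 128.232.[128-255].x would do for this sketch.

    $ dig +dnssec -x 128.232.128.1
    ; a validated answer carries "ad" in the flags line of the header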
Computer Laboratory zones are now signed
2013-11-19 - News - Chris Thompson
First, a note that the IPv6 reverse zone delegated to the Computer Laboratory, 2.0.2.1.2.0.0.3.6.0.1.0.0.2.ip6.arpa, has been added to the list of zones in
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
that can be slaved stealthily within the CUDN. Some of the commentary in that file has also been brought up to date.
The main news is that the zones
- cl.cam.ac.uk
- 232.128.in-addr.arpa
- 2.0.2.1.2.0.0.3.6.0.1.0.0.2.ip6.arpa
are now all signed. They are therefore much larger than before, and have larger and more frequent incremental updates. Those who are slaving them may need to be aware of that.
As regards DNSSEC validation, cl.cam.ac.uk now has a chain of trust from the root zone. We expect that 232.128.in-addr.arpa will also have one before long. The IPv6 reverse zone has DS (delegation signer) records in 1.2.0.0.3.6.0.1.0.0.2.ip6.arpa, but that itself can be validated only via the dlv.isc.org lookaside zone, as JANET have not yet signed its parent zone 0.3.6.0.1.0.0.2.ip6.arpa (despite an 18-month-old promise on their part).
New xlist_ops web page
2013-07-01 - News - Chris Thompson
There is a new xlist_ops page that provides more general list operations on the IP registration database than does the list_ops page. In particular it allows downloads of lists of boxes, vboxes, cnames or anames, and uploads to perform bulk operations on multihomed boxes, vboxes, cnames or (for registrars only) anames. See the xlist_ops help page for details.
The opportunity has been taken to make a number of other small modifications. The order of links in the standard page header has been altered, and multihome_ops has been relegated to a link from the preferred box_ops page.
Removing some registrations in dlv.isc.org
2012-07-08 - News - Chris Thompson
The following will be relevant primarily to those who are performing DNSSEC validation.
It will soon be the second anniversary of the date on which the root zone was signed, 15 July 2010. By now, everyone seriously into the business of DNSSEC validation should be using a trust anchor for the root zone, whether or not they also use lookaside validation via a trust anchor for dlv.isc.org. The latter facility was always meant to be an aid to early deployment of DNSSEC, not a permanent solution. While it remains useful to cover the many unsigned gaps in the tree of DNS zones, it no longer seems appropriate to have dlv.isc.org entries for DNS zones that can be validated via a chain of trust from the root zone.
Therefore, on or about 15 July 2012, we shall be dropping the entries for the two zones cam.ac.uk and 111.131.in-addr.arpa from the dlv.isc.org zone, as these have now had chains of trust from the root zone for well over a year. We will be retaining the entries for a number of our signed reverse zones whose parent zones are not yet signed; for details see
https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-signed.html
Phasing out of the old IPv6 address block
2012-02-07 - News - Chris Thompson
Nearly all IPv6 use within the CUDN has now been transferred from the old block 2001:630:200::/48 to the new block 2001:630:210::/44. Therefore the following changes have been made to the sample configuration for stealth servers at https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
- The old block has been dropped from the definition of the "camnets" ACL.
- The reverse zone 0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa has been dropped from the list of those which may be slaved.
We advise you to make the corresponding changes in your nameserver configurations if they are relevant.
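For those maintaining their own configuration rather than copying the sample, the change is of this shape; the other ACL elements shown are illustrative only, so take the authoritative list from sample.named.conf.

    acl "camnets" {
        131.111.0.0/16;      // illustrative IPv4 elements
        128.232.0.0/16;
        2001:630:210::/44;   // the new IPv6 block
        // 2001:630:200::/48 has been removed
    };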
New BIND vulnerability CVE-2011-4313
2011-11-16 - News - Chris Thompson
ISC have issued the BIND advisory
http://www.isc.org/software/bind/advisories/cve-2011-4313
It concerns a bug, thought to be remotely exploitable, that crashes recursive nameservers, and they have provided new BIND versions (9.4-ESV-R5-P1, 9.6-ESV-R5-P1, 9.7.4-P1, 9.8.1-P1) which are proof against crashing from this cause, although the precise sequence of events that leads to it remains obscure.
Although we are not aware of any local nameservers that have been affected by this problem, several other sites have been badly affected in the last 24 hours.
The CUDN central recursive nameservers at 131.111.8.42 & 131.111.12.20 are now running BIND 9.8.1-P1.
IPv6 addresses of the CUDN central nameservers
2011-08-23 - News - Chris Thompson
The IPv6 routing prefixes for the vlans on which the CUDN central nameservers are located are being altered. As a result, their IPv6 addresses are changing as follows:
                Old IPv6 address           New IPv6 address
authdns0.csx    2001:630:200:8080::d:a0    2001:630:212:8::d:a0
authdns1.csx    2001:630:200:8120::d:a1    2001:630:212:12::d:a1
recdns0.csx     2001:630:200:8080::d:0     2001:630:212:8::d:0
recdns1.csx     2001:630:200:8120::d:1     2001:630:212:12::d:1
The new addresses are working now, and the old addresses will continue to work as well until Monday 5 September, when they will be removed. If you are using them (e.g. in nameserver or stub resolver configuration files) you should switch to the new addresses (or the IPv4 ones) before then.
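In a BIND configuration the change amounts to replacing the old addresses wherever they appear in masters or forwarders clauses, for example for a zone slaved over IPv6 from the authoritative pair:

    masters {
        2001:630:212:8::d:a0;    // authdns0.csx (was 2001:630:200:8080::d:a0)
        2001:630:212:12::d:a1;   // authdns1.csx (was 2001:630:200:8120::d:a1)
    };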
The comments in the sample configuration file
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
about using IPv6 addresses to address the nameservers have been modified appropriately.
Two ISC advisories about BIND
2011-07-05 - News - Chris Thompson
ISC have issued two BIND advisories today:
http://www.isc.org/software/bind/advisories/cve-2011-2464
This affects most still-supported versions of BIND. A suitably crafted UPDATE packet can trigger an assertion failure. Apparently not yet seen in the wild...
http://www.isc.org/software/bind/advisories/cve-2011-2465
This affects only users of Response Policy Zones in 9.8.x.
Fixed versions are 9.6-ESV-R4-P3, 9.7.3-P3 and 9.8.0-P4.
New IPv6 address block for Cambridge
2011-06-21 - News - Chris Thompson
You may be aware that we have been negotiating with JANET for a larger IPv6 address block. These negotiations have (eventually) been successful. We are being allocated 2001:630:210::/44, and the existing use of 2001:630:200::/48 will be phased out over (we hope) the next few months. Details of how the new space will be divided up will be available from Networks in due course.
As immediate consequences, the following changes have been made to http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf:
- The "camnets" ACL has 2001:630:210::/44 added to it.
- The reverse zone "1.2.0.0.3.6.0.1.0.0.2.ip6.arpa" is listed as available for (stealth) slaving.
Of course, the reverse zone has nothing significant in it yet! But if you are slaving the existing IPv6 reverse zone, you should probably start slaving the new one as well.
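A sketch of the corresponding stanza, on the same assumptions (master addresses and file naming) as the other slaved zones in the sample file:

    zone "1.2.0.0.3.6.0.1.0.0.2.ip6.arpa" {
        type slave;
        file "1.2.0.0.3.6.0.1.0.0.2.ip6.arpa";
        masters { 131.111.8.37; 131.111.12.37; };
    };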
There will of course be other changes during the transition that may affect local nameserver administrators. In particular the IPv6 addresses
of the CUDN central authoritative and recursive nameservers will change at some point: this list will be informed before that happens.
A few minor issues while I have your attention:
The zone amtp.cam.ac.uk (old name for damtp.cam.ac.uk) is no longer delegated, and is about to vanish entirely. If you are still slaving it even after the message here on 9 March, now is the time to stop.
There has been another small change to the official root hints file ftp://ftp.internic.net/domain/named.cache, and the copy at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache
has been updated accordingly. The change is the addition of an IPv6 address for d.root-servers.net, and rather appropriately it was made on "IPv6 day".
My description of the BIND vulnerability CVE-2011-1910 was defective in two directions:
It isn't necessary to have DNSSEC validation turned on to be vulnerable to it.
On the other hand, only moderately recent versions of BIND are vulnerable: old enough ones are not.
The information at
http://www.isc.org/software/bind/advisories/cve-2011-1910
about which versions are affected is accurate (bearing in mind that some OS vendors make their own changes without altering the version number). If you are compiling from source, I can advise you on the code fragment to look for.
BIND high severity advisory CVE-2011-1910
2011-05-27 - News - Chris Thompson
ISC have issued a high severity advisory
http://www.isc.org/software/bind/advisories/cve-2011-1910
and several fixed BIND versions are now available (9.4-ESV-R4-P1, 9.6-ESV-R4-P1, 9.7.3-P1, 9.8.0-P2).
This bug can only be exercised if DNSSEC validation is turned on, but that is increasingly becoming the default setup these days.
New box_ops web page
2011-05-22 - News - Chris Thompson
There is a new box_ops page which can be used as an alternative to the multihome_ops page to manipulate the registrations of hosts ("boxes" in the terminology of the IP registration database) with more than one IP address.
Its functions and display are simpler than those of multihome_ops and more in line with those of the other web pages. Unlike multihome_ops it supports the addition or removal of IPv6 addresses (if any are assigned to the user's management zones) as well as IPv4 ones. However, it lacks some of the facilities available with multihome_ops, such as using wildcards with display, selecting by address, and displaying detailed properties of the associated IP address objects.
We hope to add at least some of these facilities to box_ops (and to other pages, such as vbox_ops) in due course, and to eliminate the necessity to keep multihome_ops in its current form. The main reason for releasing box_ops now in this somewhat undeveloped state is its support for IPv6 addresses.
Changes to sample.named.conf for delegated Maths domains
2011-03-09 - News - Chris Thompson
There is a new version of the sample nameserver configuration at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
The following zones, which were not previously delegated, have been added:
- maths.cam.ac.uk
- statslab.cam.ac.uk
- 20.111.131.in-addr.arpa
The following zone, which is being phased out, has been removed:
- amtp.cam.ac.uk
There are no other changes.
Consolidated reverse zone in-addr.arpa.cam.ac.uk
2011-03-03 - News - Chris Thompson
We have progressed past step (2), as in:
- If all goes well, during the first week in March we will get the delegations of the 32 zones replaced by DNAMEs in the parent zone 232.128.in-addr.arpa.
with thanks to the Computer Lab hostmaster for his co-operation. We have no reports of any problems at this stage.
The sample nameserver configuration
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
has been updated to remove the 32 zones [224-255].232.128.in-addr.arpa from the list that may be slaved. (Apart from some modifications to the comments before "in-addr.arpa.cam.ac.uk", that is the only change.)
If you are slaving any or all of these 32 reverse zones, you should stop doing so now. Sometime next week we will start logging such slaving activity, and alert the administrators of any hosts involved.
The target date for step (3), the complete removal of these 32 reverse zones, remains Monday 14 March.
Consolidated reverse zone in-addr.arpa.cam.ac.uk
2011-02-25 - News - Chris Thompson
We performed step (1) -
- On Monday 21 February we will replace the 32 zones [224-255].232.128.in-addr.arpa by versions using DNAMEs that indirect into in-addr.arpa.cam.ac.uk.
on Monday as planned, but had to back off for 12 out of the 32 zones (those covering PWF subnets) because of a problem with a local script used in the PWF re-imaging process. This has now been fixed, and all 32 zones are using indirecting DNAMEs again.
At present we do not think that this delay will significantly affect the schedule for steps (2) and (3). If you are experiencing any problems which you think might be related to these changes, please contact hostmaster at ucs.cam.ac.uk as soon as possible.
Consolidated reverse zone in-addr.arpa.cam.ac.uk
2011-02-16 - News - Chris Thompson
We are planning to extend the IP address range covered by the consolidated reverse zone in-addr.arpa.cam.ac.uk, described here last November, to include 128.232.[224-255].x.
The web page
http://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html
has been updated with the planned schedule and some new advice for users of Windows DNS Server.
To summarise:
1. On Monday 21 February we will replace the 32 zones [224-255].232.128.in-addr.arpa by versions using DNAMEs that indirect into in-addr.arpa.cam.ac.uk.
2. If all goes well, during the first week in March we will get the delegations of the 32 zones replaced by DNAMEs in the parent zone 232.128.in-addr.arpa.
3. If all still goes well, we plan to remove the 32 zones [224-255].232.128.in-addr.arpa completely on Monday 14 March.
The schedule is rather tight because we want to complete this work during full term if possible. If there have to be substantial delays, some of the later steps will be postponed until after Easter.
BIND users who want to slave zones providing reverse lookup for substantially the whole CUDN should slave "in-addr.arpa.cam.ac.uk" and "232.128.in-addr.arpa" (the latter from the CL nameservers) if they are not already doing so, and they should cease slaving the 32 zones [224-255].232.128.in-addr.arpa after step (2) but before step (3). [There will be a further announcement here when step (2) has been completed.]
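In BIND terms the end state looks something like the following sketch; the master addresses for 232.128.in-addr.arpa are deliberately not given here, as they should be taken from sample.named.conf.

    zone "in-addr.arpa.cam.ac.uk" {
        type slave;
        file "in-addr.arpa.cam.ac.uk";
        masters { 131.111.8.37; 131.111.12.37; };
    };
    // slave 232.128.in-addr.arpa in the same way, but with a masters
    // list naming the CL nameservers given in sample.named.conf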
Windows DNS Server users should note that we no longer recommend that they stealth slave any zones; see
http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/dnssec.html
If you do feel you must continue such stealth slaving, the earlier link contains advice about which versions support zones containing DNAMEs and which do not. In particular, those using Windows 2003 or 2003R2 should cease slaving any of the zones [224-255].232.128.in-addr.arpa as soon as possible, before step (1).
Consolidated reverse zone in-addr.arpa.cam.ac.uk
2010-11-26 - News - Chris Thompson
The sample nameserver configuration at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
has been updated to include the zone in-addr.arpa.cam.ac.uk. Until recently this was contained within the cam.ac.uk zone, but it is now a separate (unsigned) delegated zone. It currently provides the reverse lookup records for IP addresses in the range 128.232.[128-223].x but we hope to extend that to cover the whole of 128.232.[128-255].x eventually.
A description of the zone and our plans for it can be found at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html
Please be reassured that there will be further announcements here (and probably elsewhere) before the extension to cover 128.232.[224-255].x is implemented.
Signed root zone
2010-07-19 - News - Chris Thompson
As expected, the DNS root zone became properly signed on 15 July. See http://www.root-dnssec.org/ for details.
A trust anchor for the root zone has now been added to the configuration of the CUDN central recursive nameservers (at 131.111.8.42 & 131.111.12.20), in addition to the existing one for dlv.isc.org used for "lookaside validation". There is no immediate prospect of being able to drop the latter, as there are still huge gaps in the signed delegation tree (the "ac.uk" zone, for example).
For those running their own validating recursive nameservers, the pages
https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html
https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html
have been updated with some relevant information.
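For those running their own BIND, a minimal sketch of the corresponding options follows, assuming BIND 9.7 or later (which added the dnssec-lookaside auto shorthand); the root key data is deliberately omitted and must be obtained and verified out of band, as described on the pages above.

    options {
        dnssec-enable yes;
        dnssec-validation yes;
        dnssec-lookaside auto;   // lookaside validation via dlv.isc.org
    };
    managed-keys {
        "." initial-key 257 3 8 "<root zone KSK, verified out of band>";
    };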
New root hints file, and validating DNSSEC-signed zones
2010-06-21 - News - Chris Thompson
A new version of the root zone hints file has been published, and
http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache
has been updated with a copy. The substantive change is the addition of an IPv6 address for i.root-servers.net. As usual with such changes, there is little urgency to update your copies.
The rest of this posting is about validating DNSSEC-signed zones.
ICANN have held their first "key signing ceremony" and appear to be on target to sign the root zone on Thursday 15 July. See http://www.root-dnssec.org/ for details. We expect to be including a trust anchor for the signed root zone on the CUDN central recursive nameservers (131.111.8.42 and 131.111.12.20) shortly after it is available.
If you are operating a validating nameserver, there are issues about the supported signing algorithms. There are currently three important ones:
Mnemonic       Code   Supported by which    Can be used with
                      BIND versions [1]     negative responses
RSASHA1        5      9.4                   Only zones using NSEC
NSEC3RSASHA1   7      9.6                   Zones using NSEC or NSEC3 [2]
RSASHA256      8      9.6.2 or 9.7          Zones using NSEC or NSEC3
[1] or later.
[2] but as NSEC3RSASHA1 is otherwise identical to RSASHA1, it is almost invariably used with zones using NSEC3 records.
Zones signed only with algorithms unsupported by particular software will be treated by them as unsigned.
Only RSASHA1 is officially mandatory to support according to current IETF standards, but as the intention is to sign the root zone with RSASHA256, it will become effectively mandatory as well. (Other organisations are already assuming this. For example, Nominet have signed the "uk" top-level domain using RSASHA256, although they do not intend to publish a trust anchor for it other than by having a signed delegation in the root zone.)
Therefore, if you want to be able to use a trust anchor for the
root zone you will need software that supports the RSASHA256
algorithm, e.g. BIND versions 9.6.2 / 9.7 or later. As an aid
for checking this, the test zone dnssec-test.csi.cam.ac.uk is
now signed using RSASHA256. For details on how to test, see
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html
There are no immediate plans to change the algorithm used to sign our production DNS zones from RSASHA1.
cs-nameservers-announce copies on ucam.comp.tcp-ip newsgroup
2010-06-09 - News - Chris Thompson
On Sep 30 2008, I wrote:
It has been the practice to post copies of messages posted to the cs-nameservers-announce mailing list to the local newsgroup ucam.comp.tcp-ip.
The local newsgroups ucam.* are expected to be phased out before long, so I propose that we discontinue this practice. If anyone feels differently, please let us know.
At that time, we received pleas to continue the copies to the ucam.comp.tcp-ip newsgroup for as long as it remained in existence (which has in fact been much longer than was then anticipated). However, its demise now really is imminent, see e.g.
http://ucsnews.csx.cam.ac.uk/articles/2010/03/30/newsgroups-and-bulletin-boards
Therefore I have removed the references to ucam.comp.tcp-ip from the mailing list description and from the sample.named.conf file, and this message will be the last one copied to the newsgroup.
Updates to sample.named.conf
2010-04-26 - News - Chris Thompson
There is a new sample configuration for nameservers on the CUDN at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
The following changes have been made:
Some references relating to DNSSEC validation have been added. For more details, though, consult as before
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html
A recommended setting for "max-journal-size" is included. Without this, the journal files for incrementally updated zones will grow indefinitely, and for signed zones in particular they can become extremely large.
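The setting goes in the options block (or in an individual zone statement); the value below is purely illustrative, so take the recommended figure from sample.named.conf itself.

    options {
        max-journal-size 2m;   // cap the growth of .jnl files for
                               // incrementally updated (IXFR) zones
    };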
The most significant change concerns the zone private.cam.ac.uk. Previously, there was no delegation for this zone in cam.ac.uk. However, we have found that with the most recent versions of BIND, defining private.cam.ac.uk as either "type stub" or "type forward", in combination with using DNSSEC validation, led to validation failures due to BIND's inability to prove private.cam.ac.uk unsigned while cam.ac.uk is signed.
On consideration, we have decided to create a delegation for private.cam.ac.uk after all. (The only effect for users outside the CUDN should be that they will consistently get a REFUSED response for names in that zone, instead of sometimes getting NXDOMAIN.) This will also allow us to increase the number of official nameservers for private.cam.ac.uk (within the CUDN, obviously), and perhaps to sign it without having to advertise a trust anchor for it by special means.
Nameservers on the CUDN should therefore either slave private.cam.ac.uk, or not define it at all in their configuration. (Using "type stub" or "type forward" will continue to work for non-validating servers, but should be phased out.)
However, our corresponding reverse zones 16.172.in-addr.arpa through 30.172.in-addr.arpa cannot be delegated from the parent zone "172.in-addr.arpa". Luckily there are delegations there to the IANA "black hole" (AS112) servers, and this suffices to make the zones provably unsigned. Any of "type slave", "type stub" or "type forward" can be used for these zones (with or without validation), and one of them must be used or reverse lookups will fail.
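Putting the pieces together, a CUDN stealth server following this advice would contain something like the sketch below; master addresses are assumed to match those used elsewhere in the sample file, and "type forward" is shown for the reverse zones as one of the three permitted choices.

    zone "private.cam.ac.uk" {
        type slave;
        file "private.cam.ac.uk";
        masters { 131.111.8.37; 131.111.12.37; };
    };
    // one of slave/stub/forward must be defined for each of
    // 16.172.in-addr.arpa through 30.172.in-addr.arpa
    zone "16.172.in-addr.arpa" {
        type forward;
        forward only;
        forwarders { 131.111.8.42; 131.111.12.20; };
    };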
Rationale for the recent changes to recommended AD configurations
2010-02-03 - News - Chris Thompson
You will probably have seen Andy Judd's message to ucam-itsupport
last Friday announcing new recommendations for Active Directory
and Windows DNS Server configurations within the CUDN, described
more fully at
http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/ad_dns_config_info.html
These were the result of discussions between our PC Support group and our Hostmaster group. This message gives part of the background to our thinking, and some points may be relevant to institutions not using Windows DNS Server at all.
It will be no surprise that the advice not to ("stealth") slave zones
from the CUDN central (authoritative) nameservers was motivated by
the deficiencies of the various versions of Windows DNS Server when
slaving signed zones (not to mention other defects in its treatment
of unknown DNS record types and SOA serial number handling). Not
slaving zones such as cam.ac.uk
does have the disadvantage that
resolving of names and addresses of hosts local to the institution
may fail if it is cut off from the rest of the CUDN, but we think
this should be tolerated because of the other advantages.
The advice to forward requests not resolved locally to the CUDN
central (recursive) nameservers may seem contrary to advice given
in https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
and
in previous messages to this list. In the case of Windows DNS Server
configurations the primary intent was to make sure that queries for
names in private.cam.ac.uk
and corresponding reverse lookups worked
correctly. (Configured laboriously via GUI, many zones were omitted
from Windows DNS Server setups in the past.) However, there is the
more general point that the central servers provide DNSSEC validation
for the increasing proportion of names for which it is available,
and forwarding requests to them takes advantage of that if validation
is not being performed locally. We should admit, though, that the
communication path between the institution and the CUDN central
nameservers is not yet secured cryptographically. (If there is
a fully functional validating recursive nameserver local to the
institution, that could of course be used instead of the CUDN
central nameservers.)
Another issue is the likelihood that we will be changing the set
of reverse zones available for slaving during the next year. In
particular we are likely to want to extend the scheme described
at http://people.pwf.cam.ac.uk/cet1/prune-reverse-zones
which we
are already using for reverse lookup of the 128.232.[128-223].x
range to cover 128.232.[224-255].x as well, eliminating the 32
individual zones used for the latter range at present.
Unicode in the IP registration database
2009-12-09 - News - Chris Thompson
(These changes first went into service on 2009-12-07, but had to be withdrawn due to problems with uploads in particular. They are now back in service.)
When the Jackdaw Oracle server was moved to new hardware and upgraded to Oracle 10 earlier this year, the opportunity was taken to change the encoding it uses for character data from ISO Latin 1 to UTF-8. However, little change will have been apparent at the user interfaces, because translation from and to ISO Latin 1 were made for input and output.
This has now been changed so that all interfaces use UTF-8. In particular, the IP registration web pages now use UTF-8 encoding, and so do files downloaded from the list_ops page. Files uploaded should also be in UTF-8: invalid encodings (such as might be caused by using the ISO Latin 1 encoding instead) will be replaced by the Unicode replacement character '�' (U+FFFD).
Only those fields that are allowed to contain arbitrary text (such as equipment, location, owner, sysadmin, end_user, remarks) are affected by this change. Values (the great majority) that are in 7-bit ASCII will not be affected, because that is the common subset of ISO Latin 1 and UTF-8.
We have identified a few values in the IP registration data which have suffered the unfortunate fate of being converted from ISO Latin 1 to UTF-8 twice. We will be contacting the relevant institutional COs about them.
Problem with SOA serial numbers and Windows DNS Server
2009-09-29 - News - Chris Thompson
In conjunction with PC Support we suggest the following guidelines for dealing with Windows DNS servers in the wake of the SOA serial number wrap-around:
All zones which are copied from any of the UCS servers (cam.ac.uk, private.cam.ac.uk, and the reverse zones) need to be refreshed so that they have a serial number which starts 125... rather than 346... The serial number can be found in the Start of Authority tab of the zone's properties.
To refresh the zones, try the following methods in order:
1. In a DNS MMC select the DNS server, right-click and select "Clear Cache". For any zone you copy, right-click and select "Transfer from Master". Check the serial number for the zone once it has loaded. If the serial number hasn't been updated you may have tried too soon; wait a couple more minutes and try again.
2. If after ten minutes it still hasn't updated, delete the zone, clear the cache and re-create the zone. Check the serial number once it has fully loaded.
3. As a final resort: delete the zone, clear the cache, delete the files from C:\Windows\System32\DNS, then re-create the zone.
In most cases methods 1 or 2 will work.
If you have been using older copies of notes from the Active Directory course as a reference, please stop doing so, and check your configuration information at the following locations instead.
http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/configureserver.html
http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/dnssec.html
Incidentally, Windows 2008 DNS Server is not immune to the problem (but method 1 above should normally work for it).
Problem with SOA serial numbers and Windows DNS Server
2009-09-28 - News - Chris Thompson
Last Saturday (26 September) we started to change SOA serial numbers
for the zones managed by the Computing Service from "seconds since 1900"
to "seconds since 1970" (the latter being familiar as the POSIX time_t
value). We had made sure that this was an increase in RFC 1982
(published August 1996) terms. No version of BIND has any problem with this.
Unfortunately, we did not foresee that many versions of Windows DNS Server (apparently even those as late as Windows 2003 R2) cannot cope with this change, repeatedly attempting to transfer the zone at short intervals and discarding the result. We are seeing a great deal of churning on our authoritative nameservers as a result. (This affects servers that are fetching from 131.111.12.73 [fakedns.csx.cam.ac.uk] as well.)
It is too late for us to undo this change. If you are running Windows DNS Server and are failing to fetch cam.ac.uk and similar DNS zones, you should discard your existing copy of the zone(s). Andy Judd advises us that you "need to delete the zone in a DNS MMC and then delete the zone files from C:\Windows\System32\dns and C:\Windows\System32\dns\backup, then re-create the zone". Please ask Hostmaster and/or PC Support for assistance if necessary.
We shall be contacting the administrators of the hosts that are causing the most continuous zone-fetching activity on our servers.
Two reverse zones to be signed on Tuesday 29 September
2009-09-24 - News - Chris Thompson
We judge that the signing of the DNS zone cam.ac.uk since 3 August has been a success. We intend to start signing two reverse zones
111.131.in-addr.arpa
0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa
next Tuesday morning, 29 September 2009.
For those who stealth slave either or both of these zones, but cannot cope with signed zones, unsigned versions will remain available from fakedns.csx.cam.ac.uk [131.111.12.73]. Other relevant information may be found via the DNSSEC-related links on
https://jackdaw.cam.ac.uk/ipreg/nsconfig/
In future, we may not always announce when particular zones are expected to become signed.
Any problems should be referred to hostmaster@ucs.cam.ac.uk
cam.ac.uk to be signed on Monday 3 August
2009-07-31 - News - Chris Thompson
We intend to make cam.ac.uk a signed DNS zone next Monday morning, 3 August 2009. We believe that those most likely to be adversely affected are the Windows DNS Server clients within the CUDN that are slaving it. The following is taken from
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-windows.html
which we will update in the light of experience.
Only Windows 2008 R2 is practically trouble-free in this context. Earlier versions will generate very large numbers of messages in the system log about unknown record types, and may not result in a usable copy of the zone.
However, with Windows 2003 R2 or Windows 2008 you can use the registry option described at
(using the 0x2 setting) and this should allow you to slave a signed zone, although not actually to use the signatures.
For other versions, or in any case if problems arise, you can slave the zone from 131.111.12.73 [fakedns.csx.cam.ac.uk] instead of from 131.111.8.37 and/or 131.111.12.37. This server provides unsigned versions of all the zones described as available for slaving from the latter addresses in
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
for transfer to clients within the CUDN. It should not be used for any other purpose.
Any problems should be referred to hostmaster@ucs.cam.ac.uk.
BIND security alert
2009-07-29 - News - Chris Thompson
If you are using BIND and are not already aware of it, please see the
security advisory at https://www.isc.org/node/474
This is a high-severity denial-of-service bug which is being exploited in the wild. Nameservers are vulnerable if:
- They have any zone of "type master" whose name is known to the attacker. Note that this includes zones such as "localhost" (but apparently not BIND's generated "automatic empty zones").
- The attacker can get a DNS update request through to the server. For example, those with a port 53 block at the CUDN border router can be attacked (directly) only from within the CUDN. Access controls within BIND cannot protect against the vulnerability.
Those who use versions of BIND supplied with their operating system should look for advisories from their respective suppliers.
SOA serial numbers in UCS-maintained zones
2009-07-21 - News - Chris Thompson
This should only be of concern to those who look at SOA serial numbers for diagnostic information. Up to now we have used the
<4-digit-year><2-digit-month><2-digit-day><2-more-digits>
format for the zones the Computing Service maintains. We are about to switch to using "seconds since 1900-01-01" (not 1970-01-01, because we need the change to be an increase, in RFC 1982 terms). This is part of the preparations for using DNSSEC-signed zones, where some SOA serial increases are imposed by BIND as part of the re-signing operations.
All of our zones now contain an HINFO record at the apex which contains version information in the old format; e.g.
$ dig +short hinfo cam.ac.uk
"SERIAL" "2009072120"
We expect these to remain a human-readable version indication, although not necessarily in exactly this format.
More about DNSSEC validation, and signing the cam.ac.uk zone
2009-07-01 - News - Chris Thompson
The web page describing how to set up DNSSEC validation on your own recursive nameservers, using the lookaside validation zone dlv.isc.org, has been updated and is now at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html
We continue to make progress towards signing cam.ac.uk. The previous signed near-clone "cam.test" will be removed at the end of this week. Instead we have a new such zone "dnssec-test.csi.cam.ac.uk" which is properly delegated and registered at dlv.isc.org. Instructions on how to slave it or validate against it are at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html
We have had almost no feedback so far. We would like to hear from anyone who has successfully slaved it, but even more from those who tried and failed. We believe that much old nameserver software will be unable to cope, and expect to have to provide "dumbed-down" unsigned versions of the signed zones for such clients. We need to estimate how large the demand will be for such a service.
Recursive nameservers using DNSSEC validation
2009-06-16 - News - Chris Thompson
We have been using DNSSEC validation on the main recursive nameservers at
131.111.8.42 or 2001:630:200:8080::d:0
131.111.12.20 or 2001:630:200:8120::d:1
since the morning of Tuesday 9 June, and no significant problems have arisen. We now expect this state to persist indefinitely.
Therefore, will all those who kindly assisted us by pointing their resolvers at the testing validating nameservers please switch back to using the regular ones. We shall be monitoring the use of the testing addresses and in due course contacting those who are still using them. Eventually they will be reused for other testing purposes.
More about DNSSEC validation, and signing the cam.ac.uk zone
2009-05-28 - News - Chris Thompson
Further to the request posted on 6 May to try using the testdns*.csi validating nameservers (and with thanks to the few who did so!) there have been some queries as to how you can configure DNSSEC validation in your own recursive nameservers. There are some notes on that here:
http://people.pwf.cam.ac.uk/cet1/dnssec-validation.html
As a separate but related exercise, we plan to sign our own zones, starting with cam.ac.uk, as soon as we can. To investigate the problems involved, we have set up a signed almost-clone of cam.ac.uk, called cam.test, and made it available in various ways within the CUDN.
Some of the things you could try doing with it are described here:
http://people.pwf.cam.ac.uk/cet1/signed-cam.html
[The fact that these web pages are in a personal space rather than in, say, http://jackdaw.cam.ac.uk/ipreg/ emphasizes their temporary and provisional nature. Please don't let that stop you reading them!]
Recursive nameservers using DNSSEC validation available for testing
2009-05-06 - News - Chris Thompson
We are hoping to turn on DNSSEC ("Secure DNS") validation in our main central recursive nameservers within the next few weeks. We have set up testing nameservers with essentially the expected configuration, and Computing Service staff have already been exercising them. You are now invited to do the same: details can be found at
http://people.pwf.cam.ac.uk/cet1/dnssec-testing.html
Use of DNAMEs for reverse lookup of CUDN addresses
2009-03-18 - News - Chris Thompson
We have started to use a scheme involving DNAMEs (domain aliases) for the reverse lookup of some IP addresses within the CUDN. The primary motivation is to reduce the number of individual reverse zones. A description of the mechanism, written for an audience not restricted to the university, can be found in
http://people.pwf.cam.ac.uk/cet1/prune-reverse-zones
- Moved to https://www.dns.cam.ac.uk/domains/reverse/
At the moment we are using this method for these address ranges:
- 192.153.213.* and 192.84.5.*: these subnets are or will be used for CUDN infrastructure (although within the CUDN, the corresponding reverse zones are not listed in the sample.named.conf configuration)
- 128.232.[128-223].*: some of this address space will be used for Eduroam
Some nameserver software (especially Windows DNS Server) may be unable to cope with zones containing DNAMEs: they will have to avoid stealth slaving (for example) 232.128.in-addr.arpa. We don't believe that any stub resolvers fail to cope with the "synthesised CNAMEs" generated from DNAMEs, although at least some versions of the glibc resolver log warning messages about the DNAME (but give the right answer anyway). If anyone experiences problems as a result of what we are doing, please let us know.
In the light of experience, we may later extend this scheme to other address ranges, e.g. 128.232.[224-255].* which is currently covered by 32 separate reverse zones. However, we will give plenty of warning before making such a change.
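For the curious, the mechanism can be illustrated with a hypothetical fragment; the real owner and target names are set out in the paper linked above.

    ; in the zone 232.128.in-addr.arpa: alias one /24's worth of reverse space
    150.232.128.in-addr.arpa.  IN DNAME  150.in-addr.arpa.cam.ac.uk.
    ; a query for 1.150.232.128.in-addr.arpa. then receives a synthesised
    ; CNAME pointing at 1.150.in-addr.arpa.cam.ac.uk.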
Balancing the use of CUDN central recursive nameservers
2009-03-18 - News - Chris Thompson
The Computing Service provides two general-purpose recursive nameservers for use within the CUDN, at IPv4 addresses 131.111.8.42 and 131.111.12.20 (or IPv6 addresses 2001:630:200:8080::d:0 and 2001:630:200:8120::d:1).
Historically, there were compelling reasons to favour 131.111.8.42 over 131.111.12.20, and therefore to list them in that order in resolver configurations. The machine servicing 131.111.12.20 was severely overloaded and often had much poorer response times.
For the last two years, this has not been the case. The two services run on machines with equal power and for nearly all locations within the CUDN there is no reason to prefer one over the other. Since last September, one of them has been in our main machine room on the New Museums Site, and one at Redstone, providing improved physical redundancy.
However, we observe that the load on 131.111.8.42 is still several times that on 131.111.12.20, presumably as a result of the historical situation. For a while now we have been randomising the order in which the two addresses appear in the "nameservers:" line generated when the "register" or "infofor*" functions are used on the ipreg/single_ops web page, but we suspect that COs rarely act on that when actually setting up resolver configurations.
We would like to encourage you to do a bit of randomising yourselves, or even to deliberately prefer 131.111.12.20 to redress the current imbalance. If you have resolvers which support it, and you are configuring only these two addresses as nameservers, then you could sensibly use "options rotate" to randomise the order they are tried within a single host. (Unfortunately, this doesn't work well if you have a preferred local resolver and want to use the two CS nameservers only as backups.)
Frequency of DNS updates
2009-03-17 - News - Chris Thompson
This message, like the earlier ones referred to, was sent to the ucam-itsupport list because it is of concern to all IPreg database updaters, not just to stealth slave administrators. However, it has been plausibly suggested that they ought to have been sent to cs-nameservers-announce as well, if only so that they appear in its archives. Therefore, this one is being so sent!
Subsequent to the changes of schedule to every 12 hours (September) and every 6 hours (November), we have now made a further increase in the number of (potential) updates to our DNS zones. Currently the regular update job runs at approximately
01:00, 09:00, 13:00, 17:00 and 21:00
each day (the exact times are subject to variation and should not be relied upon). We are reserving the 05:00 slot, at which actual changes would be very rare, for other maintenance activity.
The "refresh" parameter for these zones has also been reduced from 6 hours to 4 hours: this is the amount by which stealth slaves may be out of date (in the absence of network problems). The TTL values for individual records remain 24 hours: this is how long they can remain in caches across the Internet.
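For reference, the refresh interval is the second numeric field of a zone's SOA record; in the sketch below the names and all values other than the refresh are illustrative.

    @ IN SOA ns0.example. hostmaster.example. (
        1237280400 ; serial (now "seconds since 1970" style)
        14400      ; refresh: 4 hours
        3600       ; retry (illustrative)
        1209600    ; expire (illustrative)
        86400 )    ; negative-caching TTL (illustrative)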
Representing network access controls in the database
2008-12-15 - News - Chris Thompson
(Updated and partly obsoleted on 2014-05-20)
(Updated 2009-01-13)
Various exceptions to the general network access controls are applied at CUDN routers for some individual IP addresses. Some of these are at the border routers between the CUDN and JANET, and others at the individual CUDN routers interfacing to institutional networks.
We have implemented a scheme which we hope will enable us to keep better control over these exceptions. When an exception is created for a registered IP address, that address is added to one of the following anames:
- janet-acl.net.private.cam.ac.uk: for exceptions at the border routers, usually permitting some network traffic that would otherwise be blocked.
- cudn-acl.net.private.cam.ac.uk: for exceptions at the local CUDN routers, usually allowing some use of high-numbered ports for those vlans for which such a restriction is imposed.
- block-list.net.private.cam.ac.uk: for addresses for which all IP traffic is completely blocked, usually as the result of a security incident.
As long as the attachment to the aname remains, it prevents the main registration from being rescinded. The intent is that this will result in the institutional COs requesting removal of the exception at that point.
If the IP address is not registered, then it is first registered as reserved.net.cam.ac.uk or reserved.net.private.cam.ac.uk as appropriate, and then processed as above. This prevents it being reused while the exception still exists. (Some of these cases are due to the fact that we did not have the scheme in the past, and there are several now-unregistered IP addresses whose exceptions were never removed.)
Note that this apparatus only deals with exceptions for individual IP addresses, not those for whole subnets.
Requests for the creation or removal of network access control exceptions should be sent to cert@cam.ac.uk.
cs-nameservers-announce copies on ucam.comp.tcp-ip newsgroup
2008-09-30 - News - Chris Thompson
It has been the practice to post copies of messages posted to the cs-nameservers-announce mailing list to the local newsgroup ucam.comp.tcp-ip. This is promised both in the descriptive text for the mailing list, and in the initial comments in the sample.named.conf file.
The local newsgroups ucam.* are expected to be phased out before long, so I propose that we discontinue this practice. If anyone feels differently, please let us know.
The archives of the mailing list are accessible to non-members, at
and there is no intention to change that.
pmms.cam.ac.uk zone should no longer be slaved
2008-08-03 - News - Chris Thompson
The zone pmms.cam.ac.uk has been removed from the list of zones that may be slaved given in
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
Historically, this zone was a clone of "dpmms.cam.ac.uk", but it is now essentially empty and will soon be removed entirely. If your nameserver currently slaves pmms.cam.ac.uk, you should remove it from its configuration file as soon as is convenient.
Independently, some comments have been added to the sample configuration file about IPv6 addresses that can be used as alternative to the IPv4 ones for fetching zones or forwarding requests, for those whose nameservers themselves have IPv6 connectivity.
Multiple DNS implementations vulnerable to cache poisoning
2008-07-09 - News - Chris Thompson
There has been a lot of recent publicity, some of it rather garbled, on this subject. Please refer to
for an authoritative account. The remainder of this note refers specifically to what to do if you are running a recursive nameserver using BIND. (Authoritative-only servers have [almost] no cache and are not affected.)
For full details, see http://www.isc.org/ , especially the links under "Hot Topics" - "Upgrade Now!". In summary, ISC have released the following new versions:
if you are using           upgrade to         or, if you are prepared to
                                              use a "beta" version
BIND 9.5.x                 9.5.0-P1           9.5.1b1
BIND 9.4.x                 9.4.2-P1           9.4.3b2
BIND 9.3.x                 9.3.5-P1
BIND 9.2.x (or earlier)    no fix available - time to move!
Note that the earlier round of changes in July last year (versions 9.2.8-P1, 9.3.4-P1, 9.4.1-P1, 9.5.0a6), that improve defences against cache poisoning by randomising query ids, are no longer considered adequate. The new fixes rework the randomisation of query ids and also randomise the UDP port numbers used to make queries. Note that if you specify a specific port in the "query-source" setting, e.g. to work your way through a recalcitrant firewall, you will lose much of the advantage of the new fixes.
If you are not in a position to upgrade, you can forward all requests to other recursive nameservers that you trust. The recursive nameservers provided by the Computing Service, at IP addresses 131.111.8.42 and 131.111.12.20, are now running BIND 9.4.2-P1 and can be used in this way by hosts on the CUDN.
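A minimal sketch of such a forwarding configuration, assuming you want to send all recursive queries to the Computing Service servers, might be:

options {
        // send all recursion to the CS recursive nameservers
        forward only;
        forwarders { 131.111.8.42; 131.111.12.20; };
};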
If you need advice about this, please contact hostmaster@ucs.cam.ac.uk.
General relaxation of the rules on use of sub-domains
2008-05-13 - News - Chris Thompson
The restructuring of the database to allow free use of sub-domains, mooted in a previous article, has now been implemented.
As before, all names in the database have an associated domain whose value must be in a predefined table and is used to control user access. However this can now be any suffix part of the name following a dot (or it can be the whole name). If a CO has access to the domain dept.cam.ac.uk, then they can register names such as foobar.dept.cam.ac.uk (as previously) or foo.bar.dept.cam.ac.uk, or even dept.cam.ac.uk alone (although this last may be inadvisable).
Such names can be used for "boxes" as registered and rescinded via the single_ops page, and also (to the rather limited extent that COs have delegated control over them) for vboxes and anames.
There are cases when one already registered domain name is a suffix of another, e.g. sub.dept.cam.ac.uk and dept.cam.ac.uk. Often these are in the same management zone and the longer name is present only to satisfy the previously enforced constraints. In these cases we shall phase out the now unnecessary domain.
However, in a few cases they are in different management zones, with different sets of COs having access to them. It is possible for a CO with access only to dept.cam.ac.uk to register a name such as foobar.sub.dept.cam.ac.uk, but its domain part will be taken as dept.cam.ac.uk and not sub.dept.cam.ac.uk. This is likely to cause confusion, and we will be relying on the good sense of COs to avoid such situations.
For CNAMEs, the mechanism using strip_components described in the previous article still exists at the moment, but it will soon be replaced by a cname_ops web page in which the domain part is deduced automatically, as for the other database object types mentioned above, rather than having to be specified explicitly. (Now implemented, 2008-06-05.)
We advise COs not to use sub-domains too profligately, and to plan their naming schemes carefully. Any questions about the new facilities should be emailed to us.
IPv6 addresses for the root nameservers
2008-02-05 - News - Chris Thompson
Six of the thirteen root nameservers are now advertising IPv6 addresses; background information about this change is available online.
There is also a new root hints file with these addresses added, and the copy at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache
has been updated.
Of course, the IPv6 addresses are not useful if your nameserver does not (yet) have IPv6 connectivity, but they should do no harm, and on general principles it's inadvisable to let one's root hints file get too out of date.
More changes to the sample nameserver configuration
2008-01-30 - News - Chris Thompson
A number of changes have been made to the sample configuration for "stealth" nameservers at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/
None of these require urgent action.
First, the set of locally defined empty reverse zones, intended to stop queries for the corresponding IP addresses being sent to the Internet's root nameservers, has been brought into line with those created automatically by BIND 9.4 and later. Some of the IP address ranges covered are larger than before, while some are smaller. If you are actually running BIND 9.4 or later, then you can omit most of these zone definitions, but note that "0.in-addr.arpa" should not yet be omitted (as of BIND 9.4.2), and nor should those for the RFC1918 institution-wide private addresses.
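For illustration, each of these locally defined empty reverse zones follows the same general pattern; here is a sketch using the RFC1918 10.in-addr.arpa zone as the example (the file name follows the db.null convention described below):

zone "10.in-addr.arpa" {
        type master;
        file "db.null";     // an empty zone: answers NXDOMAIN for everything
        notify no;
};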
There are new versions of the zone files db.null, db.localhost, and db.localhost-rev. The first has been made identical to that which BIND 9.4 generates internally, except that the SOA.mname value is "localhost" rather than a copy of the zone name (this avoids a warning message from BIND when it is loaded). The other two, intended to provide forward and reverse lookup for the name "localhost", have been modified in a similar way. These files no longer have "sample" in their name, because they no longer require any local modification before being used by BIND.
Some changes to sample.named.conf have been made in support of IPv6. The CUDN IPv6 range 2001:630:200::/48 has been added to the "camnets" ACL definition: this becomes relevant if you are running a nameserver providing service over IPv6. The corresponding reverse zone "0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa" has been added to the list that can be slaved from 131.111.8.37 and 131.111.12.37: it may be desirable to do that if your nameserver is providing a lookup service to clients on IPv6-enabled networks, whether it uses IPv6 itself or not.
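As a sketch of the shape of that ACL change (only the IPv6 prefix is quoted from the sample file; the IPv4 prefix shown is just one of several CUDN ranges, for illustration):

acl camnets {
        131.111.0.0/16;       // one of the CUDN IPv4 ranges (illustrative)
        2001:630:200::/48;    // the newly added CUDN IPv6 range
};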
In addition, a number of comments have been corrected or clarified. Note in particular that BIND does not require a "controls" statement in the configuration file to make run-time control via the "rndc" command work. See the comments for more details. It should only rarely be necessary to actually restart a BIND daemon due to a change in its configuration.
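For example, after editing the configuration file it is usually enough to ask the running daemon to re-read it, rather than restarting it:

# re-read named.conf and pick up new or removed zones
rndc reconfig
# or reload the configuration and all zone files
rndc reload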
Change of address of a root nameserver
2007-11-03 - News - Chris Thompson
The IP address of one of the root nameservers, l.root-servers.net, has changed from 198.32.64.12 to 199.7.83.42. (Such changes are rare: the last one was in January 2004.)
If you are running a nameserver with a root hints zone file, that should be updated. There are a number of ways of generating a new version, but the official with-comments one is at
ftp://ftp.internic.net/domain/named.root
and there is a copy of that locally at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache
Modern versions of BIND 9 have a compiled-in version of the root hints zone to use if none is defined in the configuration file. As a result of this change, the compiled-in version will be out of date for existing BIND versions: a corrected version has been promised for the next versions of BIND 9.3.x, 9.4.x and 9.5.x.
Using a slightly out-of-date root hints zone is unlikely to cause serious problems, but it is something that should not be allowed to persist indefinitely.
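For reference, a nameserver that defines its root hints explicitly typically does so with a zone statement along these lines (the file name is whatever your configuration uses, e.g. the db.cache copy above):

zone "." {
        type hint;
        file "db.cache";
};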
Relaxation of the name-to-domain rules for CNAMEs
2007-08-23 - News - Chris Thompson
All names in the database have an associated domain whose value must be in a predefined table and is used to control user access. Until now, the domain has always been formed by stripping exactly one leading component from the name. This leads, for example, to the often unwelcome advice to use www-foo.dept.cam.ac.uk rather than www.foo.dept.cam.ac.uk.
We have tentative plans to restructure the database to liberalise this constraint everywhere, but this is a major undertaking and will not happen soon. However, we have been able to provide partial relief in the special case of CNAMEs.
In the table_ops page under object type cname there is now a field strip_components. This can be set to a number which controls how many leading components are stripped from the name value to convert it to a domain. (Note that it has no effect on the treatment of target_name.) For example, setting it to 2 for www.foo.dept.cam.ac.uk associates it with the domain dept.cam.ac.uk rather than the (probably non-existent) domain foo.dept.cam.ac.uk. Leaving the field null is equivalent to setting it to 1. (0 is an allowed value, but note that creating a CNAME dept.cam.ac.uk is disallowed if there is a mail domain with that name.)
Changes to the sample nameserver configuration
2007-08-20 - News - Chris Thompson
Three changes have been made to the sample configuration for "stealth" slave nameservers on the CUDN.
First, the configuration files have been moved from
ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge
to
http://jackdaw.cam.ac.uk/ipreg/nsconfig
and internal references have been adjusted to match. The old location will contain copies of the updated files only for a very limited overlap period.
Second, the sample.named.conf file now recommends use of
notify no;
in the "options" statement. BIND is by default profligate with its use of notify messages, and a purely stealth nameserver can and should dispense with them. See the comments in the file for what to do if you also master or officially slave other DNS zones.
Third, comments in the file previously suggested that one could use a "type forward" zone for private.cam.ac.uk. Although this does work for the corresponding private reverse zones, it does not for the forward zone if cam.ac.uk itself is being slaved. In that case, if you don't want to slave the whole of private.cam.ac.uk, then you should use a "type stub" zone instead. See the new comments for details.
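A sketch of such a stub zone, assuming the usual masters (the file name is illustrative, not prescribed):

zone "private.cam.ac.uk" {
        type stub;
        masters { 131.111.8.37; 131.111.12.37; };
        file "stub.private.cam.ac.uk";   // illustrative file name
};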
Recent problems with CUDN central nameservers
2007-07-16 - News - Chris Thompson
In the normal state, one machine hosts 131.111.8.37 [authdns0.csx] and 131.111.8.42 [recdns0.csx] while another hosts 131.111.12.37 [authdns1.csx] and 131.111.12.20 [recdns1.csx]. (On each machine the different nameservers run in separate Solaris 10 zones.) On the evening of Friday 13 July, work being done on the second machine (in preparation for keeping the machines running during the electrical testing work on Sunday) caused it to lose power unexpectedly, and recovery took us some time, so that the authdns1 and recdns1 services were unavailable from about 17:24 to 19:20.
Unfortunately, our recovery procedure was flawed, and introduced creeping corruption into the filing system. The relevant machine became unusable at about 14:45 today (Monday 16 July). In order to get the most important services functional again,
the recursive nameserver at 131.111.12.20 [recdns1.csx] was moved to a new Solaris 10 zone on the machine already hosting authdns0 & recdns0: this was functional from about 15:45 (although there were some short interruptions later);
the non-recursive authoritative nameserver at 131.111.12.37 [authdns1.csx] had its address added to those being serviced by the authdns0 nameserver at about 20:10 this evening.
Of course, we hope to get the failed machine operational again as soon as possible, and authdns1 & recdns1 will then be moved back to it.
Please send any queries about these incidents or their consequences to hostmaster@ucs.cam.ac.uk.
Splitting of authoritative from recursive nameservers
2007-04-23 - News - Chris Thompson
A few weeks ago we told you:
We currently plan to lock down the recursive nameservers at 131.111.8.42 and 131.111.12.20, so that they do not respond to queries from outside the CUDN and also do not allow zone transfers, during the first week of the Easter term (23-27 April). We will update you on this closer to the time.
We now intend to make these changes, at least insofar as zone transfers are concerned, early on Thursday 26 April.
We would like to thank all those who made changes to nameservers in their jurisdiction to fetch DNS zones from 131.111.8.37 / 131.111.12.37 instead. Logging has shown that the number of hosts still fetching from 131.111.8.42 / 131.111.12.20 is now quite small. Some final reminders will be sent to those who still have not made the change.
Splitting of authoritative from recursive nameservers
2007-03-21 - News - Chris Thompson
Some minor changes have been made to the sample configuration for "stealth" slave nameservers on the CUDN at
ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge/sample.named.conf
Firstly, one of the MRC-CBU subnets was incorrectly omitted from the "camnets" ACL, and has been added.
Secondly, questions were asked about the setting of "forwarders" in the "options" statement, and so I have added some comments about that. We used to recommend its use, but have not done so for some time now, except in situations where the nameserver doing the forwarding does not have full access to the Internet. However, if query forwarding is used, it should always be to recursive nameservers, hence to 131.111.8.42 and 131.111.12.20 rather than to the authoritative but non-recursive nameservers at 131.111.8.37 and 131.111.12.37.
We are now logging all outgoing zone transfers from 131.111.8.42 and 131.111.12.20, and will be contacting users who have not made the change to fetch from 131.111.8.37 and 131.111.12.37 instead, as time and effort permit. Help us by making the change before we get around to you!
We currently plan to lock down the recursive nameservers at 131.111.8.42 and 131.111.12.20, so that they do not respond to queries from outside the CUDN and also do not allow zone transfers, during the first week of the Easter term (23-27 April). We will update you on this closer to the time.
Splitting of authoritative from recursive nameservers
2007-03-08 - News - Chris Thompson
There is a new version of the sample configuration for "unofficial" slave nameservers on the CUDN at
ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge/sample.named.conf
This is a major revision, which includes new reverse zones, advice on access control settings, and several other changes. However the most important, and one which anyone managing such a slave nameserver should act on as soon as possible, is that the zones which were previously being fetched from
masters { 131.111.8.42; 131.111.12.20; };
should now be fetched from
masters { 131.111.8.37; 131.111.12.37; };
instead. The background to this is described below.
We are in the process of separating the authoritative nameservers for the Cambridge University DNS zones from those providing a recursive DNS lookup service for clients on the CUDN. To minimise the pain, it is the latter which have to retain the existing IP addresses. When the transformation is complete we will have
authdns0.csx.cam.ac.uk [131.111.8.37] authdns1.csx.cam.ac.uk [131.111.12.37]
providing non-recursive authoritative access to our zones (and zone transfer for appropriate zones to clients on the CUDN) while
recdns0.csx.cam.ac.uk [131.111.8.42] recdns1.csx.cam.ac.uk [131.111.12.20]
will provide a recursive lookup service to CUDN clients (but not zone transfers), and no service at all outside the CUDN.
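Putting the pieces together, a typical stealth slave zone stanza after this change might look like the following sketch (the file name is illustrative):

zone "cam.ac.uk" {
        type slave;
        masters { 131.111.8.37; 131.111.12.37; };
        file "db.cam.ac.uk";    // illustrative file name
};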
Announcement list converted to Mailman
2006-10-04 - News - Chris Thompson
The mailing list cs-nameservers-announce@lists.cam.ac.uk has been converted from an old-style list to a Mailman list. (See https://www.lists.cam.ac.uk for background information.)
The list options attempt to match the previous state of affairs. The list is moderated, and subscription requires administrator action (but you can now request it via the web pages as well as by message). On the other hand, unsubscription by end-user is enabled.
Digests are not available. Archives will be kept and can be read even by non-members.
Non-interactive use of the IP registration database
2006-05-08 - News - Chris Thompson
There are situations in which there is a requirement for non-interactive access to the IP registration database. A new method of using the web interface has been provided, in which cookies with a long life can be downloaded and used to authenticate subsequent non-interactive https access, for example by using programs such as curl.
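As a sketch of what such non-interactive use might look like, assuming the cookie has been saved to a local file (the file name and URL path here are illustrative, not definitive):

# fetch a page using a previously downloaded long-lived cookie
curl --silent --cookie ipreg-cookie.txt \
     'https://jackdaw.cam.ac.uk/ipreg/list_ops'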
See the download-cookie page on Jackdaw for a more complete description of the scheme. At the moment only the list_ops page can be used with downloaded cookies for the ipreg realm, and it requires a certain amount of reverse engineering to be used with a non-interactive tool. Pages more suitable for this sort of use may be provided later in the light of experience. The current state is quite experimental and we would ask anyone planning to use it in production to let us know.
Some departments and colleges are using firewall software written by Ben McKeegan at Netservers Ltd., which interacts with the IP registration database using the old method of authentication via an Oracle account password. A version of this software that uses downloaded cookies as described above is under development and we hope it will be available soon.
For several reasons we want to restrict the number of people who have SQL-level access to the underlying Oracle database, and there has been a recent purge of unused Oracle accounts. If you have good reason to need such access to the IP registration part of the database, please let us know.
More delegated control of CNAMEs
2005-12-19 - News - Chris Thompson
Up until now, ordinary users of the IP registration database have only been allowed to modify certain fields (target_name, purpose, remarks) in existing CNAMEs, and in particular not to create or delete them. These restrictions have now been removed. CNAMEs for which both the name and the target_name are in domains to which the user has access can be freely created, updated and deleted, subject to the existing consistency constraints: for example, that the target_name actually refers to an existing database record.
Such operations can be done using the table_ops page after selecting object type cname, in ways that will be familiar to those who have performed modifications to existing CNAMEs in the past. We recognise that this interface is somewhat clunky, and a tailored cname_ops web page may be made available in the future.
There is a maximum number of CNAMEs associated with each management zone in the database, which can be altered only by us. These limits have been set high enough that we do not expect sensible use of CNAMEs to reach them very often. Users will be expected to review their existing CNAMEs before asking for an increase.