The Dice Project

Operational Meeting Infrastructure Unit Report -- 14th October 2015

  1. OpenVPN (was: Certificates)

    Please pick up the new OpenVPN configurations, if you haven't already. Last week there were 68 users of the new configurations, and 31 users of the old configurations (still including a couple of C(S)Os). We even have a kx509-on-Windows10-and-cygwin user (RT#74051) now.

    A reminder was sent to sys-announce on Monday 28th September, with the deadline for turning off the old configuration being 2nd November.

  2. Server rooms
  3. makeDNS and serial numbers

    As trailed, we have now migrated our "inf" zone serial numbers from YYYYdddmmm format to using the epoch number as-is. ("inf" in this context includes dcs, dai, cogsci and aiai -- basically, those zones which are generated from hostfile-format files, as opposed to zonefile-format.)

    We're continuing to fix up a very few odd machines which have somehow still managed to miss one of the steps of this process, as flagged up by the nightly reports. To fix one, wake the machine, then as root:

       om dns stop ; rm -f /var/named/ZoneFiles/zone.* ; om dns start

    The next stage is to move on to the non-hostfile-format zones, where the same process will be adopted.
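
    As background to why a cached zone file can matter after a serial format change, the sketch below compares two serials using RFC 1982 serial number arithmetic, which is how a secondary decides whether the primary's copy is newer. It illustrates the general mechanism only and is not taken from makeDNS: the old-format value is a made-up example (no meaning is assumed for its trailing digits) and serial_is_newer() is a hypothetical helper name.

```python
from datetime import datetime, timezone

SERIAL_MODULUS = 2 ** 32   # SOA serials are unsigned 32-bit values
HALF_RANGE = 2 ** 31       # RFC 1982: differences are compared within half the range


def serial_is_newer(candidate, current):
    """Return True if `candidate` counts as newer than `current` (RFC 1982)."""
    diff = (candidate - current) % SERIAL_MODULUS
    # diff == 0 means equal; diff == HALF_RANGE is undefined, treated as "not newer"
    return 0 < diff < HALF_RANGE


# Hypothetical old-format serial for the report date (assumed value, illustration only)
old_style = 2015287001
# Epoch seconds for 14th October 2015, 00:00 UTC, used as-is
epoch_style = int(datetime(2015, 10, 14, tzinfo=timezone.utc).timestamp())

print(serial_is_newer(epoch_style, old_style))
print(serial_is_newer(old_style, epoch_style))
```

    Under these assumptions the epoch-based serial compares as older than the old-format one, so a secondary still holding an old-format copy would not be expected to pick up the new zone by itself; discarding the cached file avoids the comparison altogether.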

  4. ipaddrs_* and wire_* headers

    The ipaddrs_* headers are generally in good condition. Some tidying of the wire_* headers had previously been done. The following are still pending:

    (Aside: we cannot do arp-protection on any of the DICE subnets because the machines on them cache their IP addresses and so don't do DHCP. There might be a case for reviewing this for client wires, particularly for lab machines. There are too many servers with multiple addresses on their subnets for this to be possible for those wires.)

  5. Little-used subnets

    Following on (sort of) from the above, we have two /24 subnets with very few hosts on them. "aliens" (152) has only one non-infrastructure machine on it, marbas, which is about four years out of warranty and which has some odd cups configuration.

    The other is "y" (153) which was once in Forrest Hill and was retained "temporarily" for Doug Armstrong's HRB outpost. It has had four addresses used in the last few weeks: one appears to be an HP printer, according to nmap, one is apparently a GX270, and the other two have mysql ports open on them.

    Ideally we would tidy these up. Unfortunately it's likely to take a bit of liaison. However, given that IPv4 space is now at a premium we really can't justify dedicating two /24s to so few machines.

  6. lcfg-dns

    The suggestion has been made that we might be able to "fix" the SL7 DNS sometimes-problems by changing the way we do things so that the named daemon gets started by "something else" (most likely systemd), while the component concerns itself with configuration things. That looks to be possible, though the changes required would have been rather too intrusive to do just before going on holiday.

    It is a requirement, however, that any changes do not compromise the current ability to add lcfg-dns to an existing system without having to do a complete reinstall. It is equally a requirement that they do not break maintenance procedures (as above).

    Here's what would appear to be required:

    1. The Install() method would need to create some directories, rather than assuming that the Configure() method will do so. However, the Configure() method must still ensure that directories are created if it finds they are not already there.

      (Issue: rpm-proofing, particularly against pre- and post-install scripts.)

    2. Move the rndc key file to a non-default location, generate an rndc.conf file in a non-default location if required, and amend the component to pass in that new configuration file's location to rndc itself. named.conf will of course require similar amendment, possibly extracting the key material from the rndc.conf file. This should avoid any issues caused by the bind rpms creating one and potentially splatting ours, as has been seen quite often in the past. The only question is whether anything (systemd??) expects to be able to use rndc with the default locations.

      (Issue: The component needs to be able to command the daemon when its configuration changes. This is authenticated using a shared key. If something, such as an rpm script, changes one end under our feet then this breaks.)

    3. The rndc setup would certainly need to be done in the Install() method as well as in its current location.
    4. Preloading of zones would need to be done in the Install() method as well as the Configure() method.

      (Issue: if the daemon is configured to be a secondary for a zone and the content of that zone is not yet loaded when a query comes in then the daemon will return NXDOMAIN or even SERVFAIL to the caller. This is deemed to be a complete answer by the querier, which does not fail over to any alternative configured server. This failure window is not just of theoretical concern -- it's what prompted the addition of the zone-preload logic to the component. Having the daemon be started by something other than the component itself just moves this window around a bit.)

    5. We would need to build a complete named.conf in the Install() method so that it is ready for first (systemd) use, as well as in the Configure() method for configuration changes.

      (Issue: we can't just reconfigure later and rely on the standard caching behaviour before that happens. At the very least the daemon's rndc key has to be set here.)

    6. At present the component waits for named to start before itself finishing, and in the SL7 world thereby indicating that a significant target has been reached. Some other way may need to be found to mark this event.
    7. Some IsStarted()-based logic may need to be revised for the case where something else starts the daemon, particularly around the detection and implementation of daemon restarts.
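
    The failure window described under item 4 can be illustrated with a toy model of a querier that only moves on to the next configured server when it gets no reply at all. This is a deliberately simplified sketch of the behaviour described above, not the logic of any particular resolver library; the server names, the Reply enum and resolve() are all hypothetical.

```python
from enum import Enum


class Reply(Enum):
    NOERROR = "NOERROR"
    NXDOMAIN = "NXDOMAIN"
    SERVFAIL = "SERVFAIL"
    TIMEOUT = "TIMEOUT"    # stands for "no reply received at all"


def resolve(servers, ask):
    """Try each server in order and stop at the first one that replies.

    Any reply, including NXDOMAIN or SERVFAIL, is treated as final;
    only a timeout causes the next server to be tried.
    """
    for server in servers:
        reply = ask(server)
        if reply is not Reply.TIMEOUT:
            return server, reply
    return None, Reply.TIMEOUT


# Hypothetical state: the local secondary is running but has not loaded the zone yet
states = {"local-secondary": Reply.NXDOMAIN, "remote-primary": Reply.NOERROR}
server, reply = resolve(["local-secondary", "remote-primary"], states.get)
print(server, reply.value)
```

    In this model the second server is never consulted while the first one is answering without its zone loaded, which is why preloading has to happen before the daemon starts answering, whatever starts it.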

    (Aside: we need to make changes to the gai.conf resources and perhaps also the associated component code for IPv6 (see below).)

  7. IPv6

    DNS forward and reverse zones are now being generated from dns/inf6. Forward entries are being merged with the IPv4 RR sets, and so are now globally visible. Reverse zones haven't been delegated from above yet. We may have to make changes to our gai.conf settings to make proper use of these; these were designed for an IPv4-only world.
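
    For anyone less used to IPv6 reverse zones, the sketch below shows how a reverse (PTR) name is derived from an address: one label per hex nibble, in reverse order, under ip6.arpa. It uses Python's standard ipaddress module and the 2001:db8::/32 documentation prefix rather than any real address from dns/inf6; the /64 zone cut is likewise only an assumption for illustration.

```python
import ipaddress

# Documentation-prefix address, standing in for a real host entry
addr = ipaddress.ip_address("2001:db8::1")

# Full PTR owner name: 32 nibble labels followed by ip6.arpa
ptr_name = addr.reverse_pointer
print(ptr_name)

# Name of the enclosing /64 reverse zone (assumed zone cut):
# drop the 16 labels that correspond to the 64 host bits
zone = ".".join(ptr_name.split(".")[16:])
print(zone)
```

    Until the reverse zones are delegated from above, names like these would only resolve for clients that query our own servers directly.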

    The configuration tools have been told how to set up RA and RA guard on IPv6-capable switches, including some awkward-corner workarounds. You may see some additional configuration being pushed to them the first time one is reconfigured. An interesting buffer overrun bug in the Tnm package was diagnosed and fixed along the way.

    The issues page is being updated as things progress, and the final report is being written as stages are completed.

Unless explicitly stated otherwise, all material is copyright The University of Edinburgh