The Dice Project

Operational Meeting Infrastructure Unit Report -- 24th October 2012

  1. Forum server room air-conditioning
  2. DHCP snooping and ARP protection

    It turns out that the switches which implement this also expose the necessary SNMP OIDs, so rather than handling this in an ad hoc way we are adding code to the configuration tools to manage it. That work is only part-way done, so the nightly reports may contain the odd additional message in the meantime.

  3. Firmware upgrades

    We have just started rebooting switches to bring in the latest firmware. We're doing a few first thing each morning.

  4. conserver support for KVM guest consoles

    The necessary changes to allow conserver to support KVM guest consoles are in this week's stable release. For general purpose use, all that's necessary is to allocate the name of the guest to the next available KVM 'slot' in the file live/console_server.h. For example:

    conserver.consolename_srkvm00s01 myguest

    where myguest is the name of a guest being hosted on an MPU-maintained KVM server based in the Forum.
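
    Before adding an allocation it can be useful to see which slots are already taken and whether a guest has been allocated twice. The following is a minimal sketch, not part of the DICE tools: it assumes that allocation lines look exactly like the single example above (conserver.consolename_<slot> <guest>), and the helper names, the second slot srkvm00s02 and the guest otherguest are hypothetical.

```python
import re

# Assumed line shape, inferred only from the one example in this report:
#   conserver.consolename_<slot> <guest>
SLOT_LINE = re.compile(r"^\s*conserver\.consolename_(\S+)\s+(\S+)\s*$")


def parse_slots(text):
    """Return a dict mapping slot name -> guest name for matching lines."""
    slots = {}
    for line in text.splitlines():
        match = SLOT_LINE.match(line)
        if match:
            slot, guest = match.groups()
            slots[slot] = guest
    return slots


def find_duplicates(slots):
    """Return the sorted guest names that appear in more than one slot."""
    seen, dupes = set(), set()
    for guest in slots.values():
        if guest in seen:
            dupes.add(guest)
        seen.add(guest)
    return sorted(dupes)


# Hypothetical sample content; the second line is invented for illustration.
sample = """
conserver.consolename_srkvm00s01 myguest
conserver.consolename_srkvm00s02 otherguest
"""
slots = parse_slots(sample)
```

    In practice the text would be read from live/console_server.h rather than from an inline string; lines that do not match the assumed shape are simply ignored.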

  5. Nagios

    In our report to the meeting of 26th September 2012, we alluded to possible problems affecting the version of Nagios which we are now running (the latest version available, namely 3.4.1).

    Among other things, it turns out that there is a bug in this version of Nagios which means that declarations of downtime can be lost over a restart of the Nagios server: see http://tracker.nagios.org/view.php?id=338.

    We are reluctant to start patching and building a local version of Nagios to work around things like this, though we will do so (or, alternatively, downgrade) if necessary.

    Meanwhile, the suggested workaround - or rather, the suggested working practice - is that machines should not be put in 'downtime' states for days/weeks/months/... on end. If a machine is truly parked, please entirely remove it from the Nagios monitoring system instead. The easy way to do so is to put

    #define DICE_NO_NAGIOS

    at the top of its profile.

    Thanks to Stephen for pointing out this problem.

