The Dice Project

Operational Meeting Infrastructure Unit Report -- 14th November 2016

  1. AT basement power

    We were too quick in assuming that the AT basement power issues were fixed. Paul Hutton writes: "Apologies for the mass mailing. During the recent work on the backup electrical power system at the Appleton Tower datacentre, a further problem was identified with the electrical switching components. ..."

    There'll be a total shutdown of the AT server room to get this fixed. It now looks as though this will be some time in the New Year.

  2. IPMI

    Ian and Chris have been looking at the Dell iDRAC 8. The results are here.

  3. IPv6

    REMINDER: if you install an SL7 machine on any of the "server" wires (S32 and S33 in the Forum, AT1 in Appleton Tower, S at JCMB) then you will also get an IPv6 address and IPv6 global routes. If you add that address to the DNS you will then get firewall holes for it. However, in contrast to the client wires, forward SLAAC-style DNS entries are not being automatically generated for machines on these wires. This has implications in both directions:

    Perhaps the most important thing to beware of is assuming that any IPv4 access controls you have in place also apply to IPv6. They might. Then again, they might be totally independent and default to "open".
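
    Since forward entries are not generated automatically on the server wires, it can help to predict the address a machine will get before adding it to the DNS. The following is a minimal sketch, assuming the machine forms its SLAAC address using the modified EUI-64 scheme (hosts configured for privacy or stable-privacy addresses will not match). The function name, the documentation prefix 2001:db8:0:1::/64 and the MAC address are all illustrative, not taken from our configuration.

```python
import ipaddress


def slaac_eui64_address(prefix, mac):
    """Derive the modified EUI-64 SLAAC address for a MAC on a /64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("EUI-64 SLAAC expects a /64 prefix")

    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")

    # Flip the universal/local bit of the first octet, then insert ff:fe
    # between the two halves of the MAC to build the 64-bit interface ID.
    octets[0] ^= 0x02
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

    # Indexing the network adds the interface ID to the network address.
    return net[int.from_bytes(interface_id, "big")]


# Illustrative values only (documentation prefix, made-up MAC).
example = slaac_eui64_address("2001:db8:0:1::/64", "00:16:3e:12:34:56")
print(example)  # 2001:db8:0:1:216:3eff:fe12:3456
```

    Once that address is in the DNS it gets firewall holes of its own, so check any IPv6 access controls at the same time.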

  4. Core switch firmware

    The new version of the problematic 54xx firmware which appeared on HPE's site apparently should fix our issues, though the release note entry is cryptic at best. It was installed on core1 on Wednesday 7th, and all seems to have been running properly since. TCP PSH preservation has also been turned off on all of the other core switches, as well as on the Forum server room edge switches. (It will default to off in the next new firmware versions we install on them.)

    Even so, we'll hold off doing any other core switch reboots until after the holidays, though the firmware has been uploaded to them all ready to go just in case.

    From the release-note entry: "0000217339 - TCP - The HPE Provision switches prioritize received TCP packets with the PSH flag set by moving the packets to the head of the inbound port's processing queue. But due to increased levels of such packets in today's networks, the prioritized processing could potentially lead to head-of-line (HoL) blocking and subsequent dropping of inbound data packets. ..." That appears to include BPDUs, and once the spanning tree gets disrupted packets will start to be flooded, and the whole thing will just collapse from there. What they did in the K.16.02 firmware to break it isn't clear. Or perhaps we were teetering on the edge and something just tipped us over.
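
    To illustrate why moving PSH packets to the head of the queue can starve BPDUs, here is a toy model. It is only a sketch of the queueing effect described in the release note, not of the switch's actual implementation: the function name, the one-packet-per-tick service rate and the traffic pattern are all assumptions made for illustration. It models starvation only; a real, bounded queue would go on to drop packets as well.

```python
from collections import deque


def bpdu_service_tick(arrivals, served_per_tick, prioritise_psh):
    """Return the tick at which the first BPDU is processed, or None if never.

    arrivals is a list with one entry per tick; each entry is the list of
    packet labels ("PSH", "DATA" or "BPDU") arriving during that tick.
    """
    queue = deque()
    for tick, packets in enumerate(arrivals):
        for pkt in packets:
            if prioritise_psh and pkt == "PSH":
                queue.appendleft(pkt)  # PSH jumps to the head of the queue
            else:
                queue.append(pkt)      # everything else joins the tail
        for _ in range(served_per_tick):
            if not queue:
                break
            if queue.popleft() == "BPDU":
                return tick
    return None


# One BPDU arrives at tick 0, alongside a steady stream of PSH packets.
arrivals = [["BPDU", "PSH"]] + [["PSH"]] * 9

fifo = bpdu_service_tick(arrivals, served_per_tick=1, prioritise_psh=False)
prio = bpdu_service_tick(arrivals, served_per_tick=1, prioritise_psh=True)
print(fifo, prio)  # 0 None
```

    With plain FIFO handling the BPDU is processed immediately. With PSH prioritisation and PSH traffic arriving as fast as the queue is served, the BPDU is never reached within the run.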

  5. KB power

    There was a power cut at KB on Wednesday 16th November. CHP restart apparently happened on 7th December.

  6. JCMB CSR PoP switches

    The new CSR PoP switches have been brought into service. The necessary components for our own connection are on order, and once they arrive we'll arrange to move our own links across: 1x 10GbaseSR bridged and 2x (or 3x) 1000baseT routed.

  7. OpenVPN

    Version 2.4 has moved from beta to rc1. There are sufficiently many useful new things in this version that we're testing it on the DEV endpoint (and perhaps the DR endpoint soon). In particular, IPv6 support is now pretty much up to the same level as IPv4, and has worked as advertised in testing.

    Unfortunately, setting up the daemon to listen on more than one address doesn't work the way we had hoped. Essentially it can be made to listen on exactly one IPv4 address, on exactly one IPv6 address, or on a wildcard address. The latter would allow us to listen on both IPv4 and IPv6, but requires kernel support which first appeared in 3.15, so we'll just have to wait and see whether RH have backported this when we finally come to upgrading the endpoints to SL7. (We're currently waiting for systemd support to stabilise.)
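
    For reference, the general idea of a single wildcard listener serving both address families can be sketched as below. This is a generic dual-stack UDP socket, not OpenVPN's own code, and it says nothing about the kernel feature noted above; the function name is hypothetical, and the choice of UDP is an assumption.

```python
import socket


def open_dual_stack_udp(port):
    """Bind a UDP socket to the IPv6 wildcard, also accepting IPv4.

    Returns the bound socket, or None if the host cannot provide one.
    """
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    except OSError:
        return None  # no IPv6 support on this host
    try:
        # Clearing IPV6_V6ONLY lets IPv4 peers reach the socket as
        # IPv4-mapped IPv6 addresses (::ffff:a.b.c.d).
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        sock.bind(("::", port))
    except OSError:
        sock.close()
        return None
    return sock


# Port 0 asks the kernel for any free port; a real endpoint would use its
# configured port instead.
listener = open_dual_stack_udp(0)
if listener is not None:
    print(listener.getsockname()[:2])
    listener.close()
```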

  8. loghost

    Logging to the old loghost tycho was dropped from last week's <stable> release, prompting an immediate drop in traffic. We'll leave it to run for a few weeks to catch any residual stray machines, and then turn it off.

Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh