The Dice Project


Operational Meeting Infrastructure Unit Report -- 25th April 2018

  1. Stop press: another proposed cooling shutdown

    E&B write: "I wanted to make you aware that there will be one further shutdown of the mains cooling from 7am on Thursday morning. As before, this is required to allow the new cooling pipework to be connected into the existing network.

    "Arrangements are in place to run the standby chiller for the Appleton Tower datacentre to cover this period and therefore this email is for information only.

    "The mains cooling should be reinstated later on in the day on Thursday."

    Dave H is querying the details, particularly as the Forum would be affected. More as we know it...

  2. Forum power-down

    There was a Forum power-down on Saturday April 14th. We shut down some systems (particularly those with master data) in advance, but left most of the network infrastructure to look after itself. This mostly happened according to plan, though we did find afterwards that linnaeus's nut settings were wrong (now fixed).

    Two switches failed to come back after the power-down: sr08 in the main server room, and sr22 in the SMSR. The hot-spare was swapped in for the latter, on the basis that the former has a bonding partner (sr09). There then followed a lengthy discussion with HPE over whether they would actually replace two failed switches under lifetime warranty, which they eventually did "this time". It's now being followed up with our supplier.

    Based on the UPS emails, and the nut and snmptrap logs, here's a rough timeline (all in, or converted to, BST):

    We speculated last time as to whether "the other" 3kVA UPS would recalibrate as a result of the power-down to match the one we tested. It didn't. It's presumably not due to battery age, as it actually has a more recently replaced battery. The age of the units is very similar (mid-2005), so it's probably not that either. Recharge times were all rather faster than the test too. A mystery...

  3. Power-bar things

  4. AT black start tests 19 April 2018 (i.e. last Thursday)

    Test started on schedule at 08:05. "The UPS and generator behaved as expected..."

    (11:12) "The 'Black Start' power testing at the Appleton Tower datacentre is complete. No problems were noted during the testing. Thanks to everyone for their help and cooperation."

  5. Phones and Class of Service

    It appears we didn't ever set class of service (CoS) for the phones VLAN in Appleton Tower. We have CoS=6 in the Forum, which prioritises voice traffic over (most) data traffic. We propose setting CoS=6 in AT too.

    Note that this actually affects the Forum phones too, in principle, as traffic for them is bridged from EdLAN-AT via our AT core. Whether EdLAN has any CoS setting is another question; but at least if we have it set, then within our own network there shouldn't be any priority-related issues caused by our own traffic.

    (CoS is carried in the 3-bit Priority Code Point field of the 802.1Q VLAN tag in the Ethernet frame. Untagged frames carry no such field, so hosts on untagged ports can't attempt to raise their own traffic's priority. There might, however, be a case for setting CoS for private VLANs, as some of those are passed tagged to the end ports.)
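    To make the point about where the priority lives concrete, here is a minimal Python sketch of the 4-byte 802.1Q tag: the TPID 0x8100 followed by a 16-bit TCI holding the 3-bit priority (the CoS value), a 1-bit DEI flag and the 12-bit VLAN ID. It only illustrates the frame layout; on our network the switches set or honour this field according to their CoS configuration. The VLAN ID 123 in the usage example is a made-up value, not the phones VLAN.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def build_vlan_tag(vid: int, pcp: int, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag (TPID + TCI) for a VLAN ID and priority."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP (CoS) is a 3-bit field: 0-7")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID is a 12-bit field: 0-4095")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit: 0 or 1")
    # TCI layout, most significant bits first: PCP (3) | DEI (1) | VID (12)
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


def parse_vlan_tag(tag: bytes) -> tuple:
    """Return (pcp, dei, vid) decoded from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError(f"not an 802.1Q tag (TPID 0x{tpid:04x})")
    return tci >> 13, (tci >> 12) & 1, tci & 0xFFF


if __name__ == "__main__":
    # Hypothetical VLAN 123 carrying voice traffic at CoS 6
    tag = build_vlan_tag(123, 6)
    print(tag.hex())            # 8100c07b
    print(parse_vlan_tag(tag))  # (6, 0, 123)
```

    An untagged frame has no TCI at all, which is why priority can only be signalled (or overridden) where frames are tagged.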

  6. New servers

  7. iDRAC/BMC/IPMI problems

    The installation of charmoz (a Dell R330) revealed what seems to be a problem with the current iDRAC firmware - or, at least, a problem which manifests itself in the interface between that firmware and the ipmitool utility.

    Namely: an attempt to use /usr/sbin/conserver-ipmisetpass to set the root password of the iDRAC to our canonical password fails - and, what's worse, it fails silently.

    Watching the exchange on the wire, what's happening is that the handshaking/version negotiation between ipmitool and the iDRAC results in a 16-character maximum password length (i.e. an IPMI v1.5 password) being agreed, so ipmitool sends only the first 16 characters of our 20-character password.

    Note that this problem occurs whether ipmitool is communicating with the iDRAC over the network or locally (using the IPMI open channel).

    The workaround - for now - is to explicitly set the iDRAC's root user's password using the BIOS - and then simply not to use our /usr/sbin/conserver-ipmisetpass utility.
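    The silent truncation above comes down to IPMI's fixed-length password field: 16 bytes for IPMI v1.5 and 20 bytes for v2.0, NUL-padded when the password is shorter. The Python sketch below is not our conserver-ipmisetpass code; it shows an encoder that refuses to truncate, plus a hypothetical verify_password helper that asks the BMC to check the full 20-byte password with ipmitool's "user test" subcommand after the password has been set via the BIOS. It assumes that subcommand is present in the installed ipmitool and exits non-zero on a mismatch; both are worth confirming locally before relying on it.

```python
import subprocess

# Maximum password field sizes, in bytes, per IPMI version.
IPMI_PASSWORD_FIELD = {"1.5": 16, "2.0": 20}


def password_field(password: str, ipmi_version: str) -> bytes:
    """Encode a password into its fixed-length, NUL-padded IPMI field.

    Unlike the silent truncation seen with the iDRAC, this raises an error
    if the password does not fit the negotiated field size.
    """
    size = IPMI_PASSWORD_FIELD[ipmi_version]
    raw = password.encode("ascii")
    if len(raw) > size:
        # Deliberately avoid echoing any part of the password.
        raise ValueError(
            f"{len(raw)}-character password does not fit an IPMI "
            f"v{ipmi_version} {size}-byte field and would be truncated"
        )
    return raw.ljust(size, b"\0")


def ipmitool_test_argv(user_id: int, password: str, size: int = 20) -> list:
    """Build an 'ipmitool user test' command line (hypothetical helper).

    Uses ipmitool's default local interface. Note that the password appears
    on the command line and so is visible in the process list while it runs.
    """
    return ["ipmitool", "user", "test", str(user_id), str(size), password]


def verify_password(user_id: int, password: str) -> bool:
    """Return True if the BMC accepts the full 20-byte password.

    Assumption: 'ipmitool user test' exits non-zero on a mismatch. If only
    the first 16 characters were stored, a 20-byte test should fail here.
    """
    result = subprocess.run(
        ipmitool_test_argv(user_id, password),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

    Checking the stored password explicitly, rather than trusting the setter's exit status, is the point: the failure we saw produced no error at all.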



Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh