The Dice Project

Operational Meeting Infrastructure Unit Report -- 8th April 2015

  1. AT comms UPS #1

    The AT comms UPS #1 turned itself off at self-test time on Tuesday 17th March. This is very reminiscent of the behaviour of one of the Forum comms UPSes in December 2012, which resulted in us having to replace the UPS. However, it didn't turn itself off at its subsequent self-test, on March 31st. The next one is due on Tuesday 14th ...

  2. (Following on from the above) Problems with power supply reporting via ipmi-sensors

    As mentioned in our report for the previous meeting, our machine gatti (a Dell PE R320) started correctly reporting a failed PSU via nagios after its UPS supply had temporarily switched off - but it then incorrectly kept reporting the same fault after the UPS power supply had returned.

    We've now seen the same behaviour on another machine, namely the User Support machine hyde, a Dell PE R420.

    Our conclusion is that the current power supply check done by the hwmon script via ipmi-sensors is, unfortunately, not reliable. We suspect that the problem might be fixed by firmware upgrades to the BMC, but we haven't tried that yet. It is also possible that using Dell-specific utilities (rather than standard IPMI tools) to interrogate the BMC would give better results. Or maybe something else is wrong and/or misconfigured somewhere!

    Summary: this isn't just an Inf Unit problem; it potentially affects all of the School's servers.
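
    For illustration only, here is a minimal sketch of the kind of nagios-style check involved. It is not the hwmon script itself. The sample listing, the sensor names and the HEALTHY_EVENTS set are all assumptions, modelled on the pipe-delimited layout that FreeIPMI's ipmi-sensors prints; the real output on our Dell machines may differ.

```python
import re

# Hypothetical sample listing (not captured from gatti or hyde), modelled on
# the pipe-delimited layout printed by FreeIPMI's ipmi-sensors.
SAMPLE = """\
ID | Name       | Type         | Reading | Units | Event
71 | PS1 Status | Power Supply | N/A     | N/A   | 'Presence detected'
72 | PS2 Status | Power Supply | N/A     | N/A   | 'Presence detected' 'Power Supply input lost (AC/DC)'
"""

# Assumption: any event other than these counts as a fault.
HEALTHY_EVENTS = {"Presence detected"}


def parse_events(field):
    """Split an Event field such as "'A' 'B'" into ["A", "B"]."""
    return re.findall(r"'([^']*)'", field)


def check_power_supplies(text):
    """Return (nagios_exit_code, message) for a sensor listing."""
    faults = []
    seen = 0
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Skip the header row and any non power supply sensors.
        if len(cols) != 6 or cols[2] != "Power Supply":
            continue
        seen += 1
        bad = [e for e in parse_events(cols[5]) if e not in HEALTHY_EVENTS]
        if bad:
            faults.append(f"{cols[1]}: {', '.join(bad)}")
    if seen == 0:
        return 3, "UNKNOWN: no power supply sensors found"
    if faults:
        return 2, "CRITICAL: " + "; ".join(faults)
    return 0, f"OK: {seen} power supplies healthy"


if __name__ == "__main__":
    code, message = check_power_supplies(SAMPLE)
    print(code, message)
```

    Note that a check like this can only be as good as what the BMC reports: if the BMC keeps presenting a stale "input lost" event after power has returned, the check will keep returning CRITICAL, which is the behaviour described above.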

  3. OpenVPN

    The "old-style" OpenVPN endpoints were turned off on Tuesday 31st March at around 07:30 (RT#71252). Beforehand there had been two blog articles, two sys-announce messages and one round of personal reminder emails. Even so, 19 users (including a couple of COs!) were still using the old endpoints between the preceding weekend and the turn-off time. So far there don't appear to have been any RT tickets as a result.

    The /23 subnet released has been recycled for ATLABS use.

  4. Racks in the Forum server rooms

  5. Switches

    The two AT-basement server rack switches were replaced on Thursday and Friday 26th and 27th March. All seemed to go smoothly, with no reports of vital things going off the air due to bonding issues. Thanks to all involved!

  6. Prometheus lifecycle

    Automatic processing of the prometheus lifecycle (that is, sending email to accounts entering grace and disabling accounts which have exceeded grace) will go live once some issues with database accounts flip-flopping have been addressed.
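
    As a rough illustration of the two lifecycle actions, the sketch below steps a single account through one processing run. It is not Prometheus code: the Account fields, the step function and the 30-day GRACE value are all hypothetical, chosen only to show the shape of the logic.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

GRACE = timedelta(days=30)  # hypothetical grace period, not the real value


@dataclass
class Account:
    name: str
    entitled: bool
    grace_started: Optional[date] = None  # when entitlement was first seen lost
    notified: bool = False
    disabled: bool = False


def step(acct, today):
    """Advance one account by one lifecycle run; return the action taken."""
    if acct.entitled:
        # Entitlement is present: the account leaves grace (if it was in it).
        acct.grace_started = None
        acct.notified = False
        return "none"
    if acct.grace_started is None:
        acct.grace_started = today
    if not acct.notified:
        acct.notified = True
        return "notify"  # email the account as it enters grace
    if not acct.disabled and today - acct.grace_started > GRACE:
        acct.disabled = True
        return "disable"  # grace period exceeded
    return "none"


if __name__ == "__main__":
    acct = Account("example", entitled=False)
    start = date(2015, 4, 1)
    for offset in (0, 10, 31):
        print(offset, step(acct, start + timedelta(days=offset)))
```

    In this sketch, an account whose entitlement flips off, on and off again between runs re-enters grace each time and so is emailed repeatedly, which shows why flip-flopping matters for automatic processing.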

    Note that the Prometheus documentation has been given a revamp/update - see https://wiki.inf.ed.ac.uk/DICE/PrometheusOverview.

  7. ESISS scanning source addresses

    In February 2014, we were advised that ESISS scans would come from addresses in the two ranges & - but that advice also mentioned that the range of source addresses 'might expand in the future.'

    Last week, we noticed what appeared to be an ESISS scan coming from the address

    We are trying to clarify this, but so far have no definite information.
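
    Checking whether a scan's source address falls inside the advised ranges is straightforward with Python's standard ipaddress module. The sketch below is illustrative only: the two networks are RFC 5737 documentation ranges standing in as placeholders, not the real ESISS ranges, and the name is_advised_source is our own.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT the ESISS ranges.
ADVISED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_advised_source(addr):
    """Return True if addr lies inside one of the advised ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ADVISED_RANGES)


if __name__ == "__main__":
    for candidate in ("192.0.2.17", "203.0.113.5"):
        print(candidate, is_advised_source(candidate))
```

    An address outside the list is not necessarily hostile, since the advice warned that the ranges might expand; it just needs confirming with the service.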

Unless explicitly stated otherwise, all material is copyright The University of Edinburgh