The Dice Project

Operational Meeting: Infrastructure Unit Report
17th June 2020

  1. JCMB CSR generator test

    The generator test that was scheduled for yesterday was postponed, due to Boards of Examiners being held by one of the other Schools. The new date is expected to be 2020-06-23 (Tuesday next week) at 09:00.

    Reminder: when the power is switched over and the generator starts, then, as usual:

    1. the odd-numbered power bars are connected to the rack UPSes, which should smooth over the inevitable glitches; but
    2. the even-numbered bars are powered directly from the incoming supply, and so will turn off when the power glitches.

    Note: the even-numbered bars have long on-delay settings, as discussed at several previous meetings, so won't turn their outlets back on for quite a while after the power is restored (see here for details; times are in seconds).

    Note also: the GPU-rack bars have NO UPS protection. They will all turn off when the power glitches. They also have long turn-on delays, which might or might not cover the entire test.

    Suitable precautions will need to be taken.
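
    As an aid to working out which equipment needs attention before the test, the rules in the reminder above can be written down as a minimal sketch. The PowerBar type, its field names and the 600-second delay are hypothetical, chosen only for illustration; the real on-delay values are the ones in the bar configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerBar:
    number: int              # bar number within the rack
    gpu_rack: bool = False   # GPU-rack bars have no UPS protection at all
    on_delay_s: int = 0      # hypothetical on-delay setting, in seconds


def stays_up_during_glitch(bar: PowerBar) -> bool:
    """Return True if the bar is expected to ride through a power glitch.

    Odd-numbered bars in ordinary racks are fed from the rack UPSes;
    even-numbered bars and all GPU-rack bars are on the raw incoming supply.
    """
    if bar.gpu_rack:
        return False
    return bar.number % 2 == 1


def bars_needing_precautions(bars):
    """List the bars whose outlets will turn off when the power glitches."""
    return [bar for bar in bars if not stays_up_during_glitch(bar)]


# Illustrative data only: these bars and delay values are made up.
bars = [
    PowerBar(1),
    PowerBar(2, on_delay_s=600),
    PowerBar(1, gpu_rack=True, on_delay_s=600),
]
at_risk = bars_needing_precautions(bars)
```

    Anything plugged only into a bar in at_risk should be expected to lose power, and to stay off for at least that bar's on-delay after the supply returns.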

  2. Power glitches

    Most, though not all, of our edge switches rebooted as a result of last Friday's power faults. We've actually been seeing odd grumbles from the KB UPSes for a while, which might be part of the same problem. Some models of switch seemed to be more resilient to power wobbles than others, but there doesn't appear to be any obvious reason in most cases why the same model of switch sometimes rebooted and sometimes stayed up.

    harrison, the GPS clock, lost its time synchronisation and had to be power-cycled to recover it. We'll look at local UPS protection for it when we get a chance.

  3. remote.inf and lab.inf

    To remove hosts from remote.inf you need to do three things:

    1. Run rfe dns/remoteHostList and remove the hosts from the map.
    2. As root on oramo, edit /var/lcfg/conf/DNS/zone.remote.inf and remove all of the entries pointing to hosts that you want to remove. (The script tries to preserve the mappings between users and machines to make session resumption easier. If that's not a concern, the file can just be removed, or moved aside, as it will be completely regenerated in the next step anyway. If it is removed, though, it's important that the next step is not then missed out.)
    3. As root on oramo, run /usr/lib/lcfg/dns/generateRemoteInf to generate a new zone. Users not already allocated to a machine will be spread across the remaining ones in the remoteHostList map.
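
    The effect of steps 2 and 3 on the user-to-machine mappings can be illustrated with a small sketch. This is not the generateRemoteInf script itself: the function name, the least-loaded placement rule and the host and user names are all assumptions made for the example. It only shows the idea that mappings to surviving hosts are preserved while everyone else is spread across the remaining machines.

```python
def reallocate(existing, users, hosts):
    """Sketch of regenerating user-to-host mappings after removing hosts.

    existing: dict of user -> host taken from the old zone
    users:    all users who need a mapping
    hosts:    hosts still listed in the (edited) host list
    The least-loaded placement below is an assumption, not the real algorithm.
    """
    if not hosts:
        raise ValueError("no hosts left to allocate users to")

    remaining = set(hosts)
    # Keep mappings that still point at a surviving host (session resumption).
    allocation = {u: h for u, h in existing.items() if h in remaining}

    # Count how many users each remaining host already carries.
    load = {h: 0 for h in hosts}
    for host in allocation.values():
        load[host] += 1

    # Spread everyone else across the remaining hosts, least-loaded first.
    for user in sorted(users):
        if user in allocation:
            continue
        target = min(hosts, key=lambda h: load[h])
        allocation[user] = target
        load[target] += 1
    return allocation


# Hypothetical names: hostB is the machine being removed.
old = {"alice": "hostA", "bob": "hostB", "carol": "hostC"}
new = reallocate(old, ["alice", "bob", "carol", "dave"], ["hostA", "hostC"])
```

    Here alice and carol keep their machines, while bob (whose host was removed) and dave (who had no mapping) are placed on the remaining hosts. Passing an empty dict for existing corresponds to removing the zone file before regenerating it.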

    lab.inf comes from a netgroup, so for that one you need to:

    1. Arrange to update the netgroup
    2. As root on oramo, edit /var/lcfg/conf/DNS/zone.lab.inf
    3. As root on oramo, run /usr/lib/lcfg/dns/generateLabInf to generate a new zone.

    This is also covered in our DNS care and feeding document.

inf-unit-report.html,v 1.10 2020/06/15 14:56:54 gdmr Exp


Unless explicitly stated otherwise, all material is copyright The University of Edinburgh