The Dice Project

Operational Meeting

Minutes of the meeting held on Wednesday 10th February 2016 in IF-4.31

Ross Armstrong, Toby Blake, Lindsey Brown, Neil Brown, Roger Burroughes, Chris Cooke, Carol Dow, Alison Downie (convener and minutes), Ian Durkacz, Graham Dutton, Jennifer Oxley, Stephen Quinney, Iain Rae, Gordon Reid, George Ross, Alastair Scobie, Craig Strachan
  1. Minutes of the last meeting

    These were accepted as a reasonable account of the last meeting.

  2. Actions and blog articles

    Actions discussed:

    Actions added or revived:

    Actions deferred:

    Actions completed:

  3. Report from Computing Executive Group

    1. Security Week

      Reply to COs with comments/changes to the notes produced by Craig/Chris, along with suggestions about how to progress, by the next Operational Meeting. (If there are only minor corrections, just contact Craig/Chris directly.)

    2. B.03

      Just in case you weren't aware, kit in B.03 that you want to be kept must be labelled by end of Feb otherwise it will be recycled.

  4. Reports from Units

  5. Topics for discussion

    1. Power failure and SMSR

      Here's a copy of the notes from the wiki page:

      • We would normally expect around an hour of on-battery time at the current load level
        • but the server-room UPS isn't as reliable as we would like: at the last outage one of the units didn't provide any power at all, which doesn't really give us long enough to do anything other than start shutting things down straight away.
      • As things stand the SMSR is likely to get the last of any UPS battery power, which isn't really the priority we would like
      • There's a similar situation regarding cooling should the aircon go down, and the SMSR machines run seriously hot
      • The power bars in the SMSR are unswitched, because they're cheaper, so we can't turn things off there remotely
      • Removing the SMSR DB from the UPS would likely involve a power-down for the main server room too
      • For some reason (†) best known to the contractors who built the place, the row of outlets along the back wall, currently powering the Dexion shelves and the "physics" rack, are connected to the SMSR DB
        • and the locks-controller in the SMSR is actually powered from the main server room DB
        • (†) Presumably dates from the time in the design when there was only 'one big server room.' I.e. the existence of the SMSR was an afterthought (by us), I think. - idurkacz
      • Could we insist/suggest that some remote shutdown mechanism is provided on the machines that we have access to? Possible mechanisms: ssh to "shutdown" with a password we know, or provide them with a daemon that listens on some port, which again we can connect to, authenticate with and initiate a shutdown. The selling point would be that if they don't support this, then we just kill the power. - neilb
        • Doesn't sound great to me. As an SMSR user I wouldn't want privileged actions to be available to an unspecified user group via SSH. And as a CO I wouldn't want to have to know the magic sequence, which in practice would probably be slightly different on every single machine, let alone OS - gdutton
        • Why not just give them a NUT feed and warn them that the power is going? An ssh account has too many security/support issues. - iainr
      • If a remote shutdown is not feasible, could we suggest they make sure that the (presumably soft) power button on the chassis initiates a clean halt/power off, and explain that in the event of power loss we will just start pushing all the power buttons? - neilb
        • We do currently suggest it (see near the end of the computing.help SMSR page), but we don't know how many machines have it implemented. Perhaps we should do random tests?
            If we do request this, we should specify it such that it doesn't matter if it works or not: the consequence of not supporting it is a hard poweroff at PDU - gdutton
        • In any case, is it a fast enough procedure?

      It was agreed that remote ssh access for COs is not the way to go.

      Inf will look into running a NUT service for the SMSR.

      We will put a sign up on the door with our shutdown procedure.

      We will ensure the procedure is documented on computing.help, on an 'Emergency' page.
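      As an illustration of what a NUT feed could give an SMSR machine owner, here is a minimal sketch that polls a remote NUT upsd over its plain-text network protocol and decides whether to halt. It assumes a standard upsd listening on the default port 3493 and reachable from the SMSR machines; the UPS name "smsr" and the function names are hypothetical, not details agreed at the meeting.

```python
import shlex
import socket

NUT_PORT = 3493  # default TCP port for the NUT upsd daemon (assumed, not agreed)


def parse_status_reply(reply: str, ups: str) -> set[str]:
    """Parse a upsd reply such as: VAR smsr ups.status "OB LB"."""
    parts = shlex.split(reply.strip())
    if len(parts) != 4 or parts[:3] != ["VAR", ups, "ups.status"]:
        # upsd reports problems as e.g. "ERR UNKNOWN-UPS"; treat anything odd as an error
        raise ValueError(f"unexpected upsd reply: {reply!r}")
    # Status is a space-separated list of flags: OL = on line, OB = on battery, LB = low battery
    return set(parts[3].split())


def should_shut_down(flags: set[str]) -> bool:
    """Halt cleanly once the UPS is on battery AND reports a low battery."""
    return "OB" in flags and "LB" in flags


def query_status(host: str, ups: str, port: int = NUT_PORT, timeout: float = 5.0) -> set[str]:
    """Ask a remote upsd for one UPS's status flags (hypothetical host/UPS names)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="ascii") as replies:
            sock.sendall(f"GET VAR {ups} ups.status\n".encode("ascii"))
            reply = replies.readline()
            sock.sendall(b"LOGOUT\n")  # polite disconnect before the socket is closed
    return parse_status_reply(reply, ups)
```

      A machine owner might call query_status("nut-host.example", "smsr") from a cron job and run a clean shutdown when should_shut_down returns True. In practice NUT's own upsmon client, pointed at the Inf server, already does this polling and the shutdown itself, so a hand-rolled poller like this is mainly useful for seeing what the feed actually provides.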

    2. exFAT

      There is uncertainty over the legality of our installing an exFAT FUSE module on DICE. We should contact Angus for advice on licence issues.

  6. Items for the Computing Systems blog
  7. AOCB

  8. Next meeting

    The next meeting will be on 24th February 2016 at 10:00 in IF-4.31.

Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh