The minutes for 13th January were approved.
Report from Computing Executive Group
Reports from Units
Stephen asked what the expectation was for machines mentioned in the new reports. George replied that the reports were only concerned with DICE wires and that the expectation was that Units would fix their own machines.
Stephen requested that MPU should be notified in advance of any future changes to these scripts. Toby stated that he would be retroactively adding the appropriate entitlement to the accounts currently in grace, allowing the new scripts to be reinstated.
Toby offered to produce a patch which would fix the nagios certificate monitoring problem. Stephen gratefully accepted this offer.
Stephen asked how the provision of Semester 2 software was coming along. Graham replied that it was all done bar one package, Eclipse. It was pointed out that this was delaying the production of new Virtual Dice images. Graham said that the minimal and teaching (albeit without Eclipse) versions could be produced now and he would be happy to demonstrate to others how to produce such images. The mega version is still outstanding though. Alastair observed that there is no reason why we couldn't produce multiple VDICE images, eliminating the need to wait for outstanding software. There is also a lack of clarity on which Unit is responsible for producing these images. This will be discussed at CEG.
After a question from Neil, Alastair provided a verbal update on the state of the moves made just before Christmas.
Graham reminded the meeting of the existence of the on-site status in RT.
Neil informed the meeting that he had not yet fulfilled the request to delete all IV email addresses since he was not convinced that all the ramifications of such a move were fully understood. He also asked about the future of the IV web sites. Alastair suggested that he discuss this with Jim Ashe.
Topics for discussion
The recent issue with the OpenAFS file system and our response to it was discussed at some length. Stephen started by asking whether it would be a good idea, in any similar event in future, to nominate one person to maintain an overview of the situation and ensure that nothing is overlooked: an incident commander, if you will. Alastair replied that he was unsure what benefit we would have derived from having such a person. He noted that we had set up an incident response team by lunchtime, and that discussions had been on-going for some time as to how we should plan for these sorts of major incidents. One issue, of course, was to know when to declare a major incident.
Neil pointed out that though we had all formed small groups to investigate our own areas of responsibility soon after it became clear that something was amiss, no-one was checking that nothing had been overlooked. An incident commander might have been useful for this. Alastair asked if a wiki page might have been useful for sharing information between us. Neil thought it might have been helpful for recording who was looking at what but the general feeling was that a combination of chat and small Teams meetings had allowed for good communications.
Attention was then turned to how well we had communicated with our users. Stephen asked whether we should have more indicators in the status area of computing.help which would allow us to announce such things as logins being slow. It would still be hard, though, to come up with a range of indicators which would cover all possibilities. Neil suggested that we might have a MOTD on computing.help which could say things like "logins slow - file system issues". It was felt that a way of conveying a little more information about the current status of computing systems might be helpful.
Jennifer observed that sys-announce messages such as those sent out last week often occasion alarm in the admin staff who are unsure if such announcements apply to them. Should the User Support Unit make especial efforts to contact admin staff when such events occur? It might be possible to word sys-announce messages in such a way as to indicate whether they apply to admin staff.
Alastair pointed out that it was important that something should be sent out to sys-announce as quickly as possible to acknowledge that there are issues, preferably within 15 minutes of it becoming clear that something is up. The initial message should at least acknowledge that an issue exists, with follow up messages providing more information as it emerges. We didn't do so well here as the first sys-announce message didn't go out until after lunchtime.
Neil asked if we should have made more use of Twitter, and whether more of the computing staff should be able to access the Twitter account. It was felt that while this might be a good thing, in general all comms should come from the User Support Unit. Alastair stated that he wanted to discuss comms at CEG.
So to recap, the main take-aways from the discussion are, as I see it:
Neil suggested that during unit reports, questions and comments should be delayed until the end of the report. Toby agreed with this proposal. No-one else seemed to have strong feelings either way, so we'll give it a try.
The next meeting will be on Wednesday 27th January 2021 online at 10am.
Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh