Apologies for absence
Alison, Morna, Paul, Stephen, Tim and Toby had sent their apologies.
Minutes of the last meeting
These were accepted.
Report from Computing Executive Group
The Computing Strategy Group had finally managed to meet on December 18th. Unfortunately, extensive further discussion of the paper on Research Computing left no time to discuss many of the items on the agenda. Two issues that were raised were the proposed policy for out-of-hours support and the proposal for handling old accounts of students doing non-Informatics degrees. The latter was accepted with the amendment that deletion will take place no earlier than 12 months after the last relevant Board of Examiners meeting.
Reports from units.
Toby has pushed out the latest LDAP upgrade to development machines. It will then stay in the testing release for at least two weeks, and he will announce when it is going into the stable release. A consequence of this is that the replacement of basilisk by a new FC5 server will be delayed.
The FC5 routers have stayed up over the Christmas/New Year holiday and appear to be far more stable than prior to the introduction of a modified kernel that Alastair built (see the Managed Platform Unit report).
The previously reported instability of the JCMB and Forrest Hill switches turned out to be a general problem with all the switches. Something on EdLAN had been sending out Spanning Tree topology changes every two seconds, continuously for about a week. This caused all the switches to flush their caches every two seconds, keeping them under constant high load. As a workaround, the Informatics switches have now been isolated from the rest of EdLAN as far as Spanning Tree operations are concerned. An unfortunate consequence is that we no longer have automatic fail-over of links between Appleton Tower and each of Forrest Hill and Buccleuch Place. The redundant links still exist but have been disabled. In the event of a failure of one of these two primary links, the secondary link will need to be brought into use manually, and this will need to be done independently at each end, from a network local to the switch concerned. This should only be done by knowledgeable members of the infrastructure unit, since the scope for causing disruption throughout Informatics is huge.
George has upgraded the firmware on most of the wireless access points and has booted this new firmware on several of them.
Chris reported that there were now only two remaining servers managed by the unit that were still running FC3: pezanas, the rpm master server, and achilles, which has many roles at present. Pezanas will be upgraded later this week.
The support for 64-bit operation is now a lot closer to completion. It is at a level intermediate between lcfg and dice.
The LCFG website now has a permanent home on dresden. It has much of its content automatically maintained and kept up to date with the software releases. The LCFG workshop held on 20th December was a success. The Desktop Services Team has an LCFG wiki which is worth looking at.
No machine's LCFG profile should now require special resources settings in order to get it to PXE boot.
The stable DICE releases will now be produced at about 15:30 each Thursday instead of on Thursday mornings.
The bug affecting the connection of a machine via a USB port to a UPS is still unsolved (but the workaround is to use a dongle).
Alastair's kernel module fix for the H.323 router problem appears to be successful (see the Infrastructure report about FC5 router stability).
The diagnosis that attributed the lock-ups of Dell GX260s to a faulty kernel module used for the video card support appears to have been correct. Since this kernel module was excluded there have been no similar lock-ups of GX260s. Every machine freeze since that time could be explained by problems with the network. Everybody should, however, continue to report any similar problems that arise.
Alastair commented on the number of machines that last week were still in need of a reboot to pick up important changes. It is planned to modify the boot component so that, if so configured, it would automatically reboot the machine when a reboot is requested by a component. For some clients, such as those in labs, this would be immediate, whereas for other clients it would be after a few days of warnings. Servers would not be rebooted; instead the component would repeatedly send mail requesting a reboot.
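The planned reboot behaviour can be illustrated with a minimal sketch. This is not the boot component's actual code: the names `HostClass`, `Action`, `reboot_action` and the three-day `WARNING_DAYS` value are all assumptions made for illustration (the minutes only say "a few days of warnings").

```python
from enum import Enum


class HostClass(Enum):
    LAB = "lab"          # lab clients: reboot immediately
    DESKTOP = "desktop"  # other clients: warn first, reboot later
    SERVER = "server"    # servers: never reboot automatically


class Action(Enum):
    NONE = "none"
    WARN = "warn"
    REBOOT = "reboot"
    MAIL_ADMIN = "mail-admin"


# Hypothetical grace period; the real value is not stated in the minutes.
WARNING_DAYS = 3


def reboot_action(host_class, reboot_requested, days_since_request,
                  auto_reboot_enabled=True):
    """Return the action to take for one machine on one check."""
    # Nothing happens unless a component asked for a reboot and the
    # machine is configured for automatic reboots.
    if not reboot_requested or not auto_reboot_enabled:
        return Action.NONE
    # Servers are never rebooted; mail is sent on every check instead.
    if host_class is HostClass.SERVER:
        return Action.MAIL_ADMIN
    # Lab machines reboot straight away.
    if host_class is HostClass.LAB:
        return Action.REBOOT
    # Other clients are warned until the grace period has passed.
    if days_since_request >= WARNING_DAYS:
        return Action.REBOOT
    return Action.WARN
```

Under these assumptions, `reboot_action(HostClass.DESKTOP, True, 1)` yields a warning, while the same call for `HostClass.LAB` yields an immediate reboot.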
Research and Teaching.
Tim had sent in a report:
Mainly FC5 upgrades - all complete now apart from a few end-user machines which are waiting to be scheduled and the database server which had to be postponed because of time critical work which needed it in service.
Iain has been working hard on fixing problems with Condor under FC5 (we believe related to the glibc upgrade), which are hopefully now fixed.
Phoenix crashed at about 02:30 this morning. The reason for this is still under investigation. It was brought back up again at 08:27.
During the upgrade of a server at JCMB on Monday afternoon FC5 was installed on one of the disks attached to the server over fibre channel instead of installing it on the system disk. This overwrote a volume that held 5 partitions. These partitions were restored from backup by the end of that afternoon. In order to reduce the possibility of this happening in future hosts must be disconnected from any fibre channel attached disks prior to installation or upgrade. LUN (Logical Unit Number) masking will also be enabled on all fibre channel devices. This has already been done on the SATABoy at Appleton Tower and the SATABeast and ATABoy at JCMB. It will be done on the ATABeasts at JCMB and Buccleuch Place.
The mail server nutty will be upgraded next week. The services currently provided by three separate servers (webdav on baboon, jabber on boogaloo and bugzilla on kittyhawk) will be moved to a single FC5 server next week.
The remaining 5 print servers would not be upgraded to FC5 until a suitable solution to the problem with the broken LPRng on FC5 had been found. The unit will now be looking at a fix for LPRng on FC5 or a switch to CUPS on FC5.
In response to a question from Ken, Craig said that the scripts for reporting on the per-user status of their disk quota would be rolled out to the testing release by the start of next week.
The unit acquired responsibility for another four servers, bringing the total to 49. All but three have now been upgraded to FC5. Of those three, one has been unallocated, another will be unallocated today, and one, bu.inf, will remain as an FC3 login machine for as long as one is required and supported.
It was not possible to fix the broken profiles of RH9 laptops that are still in use by changing them to use the os/selfmanaged.h header in place of the redhat 9 header, for fear that this would delete all the software from a machine if it were ever reconnected to the Informatics network and updaterpms run. Alastair said that he would check what the situation really was.
The vast majority of machines that needed to be rebooted, either to pick up the most recent glibc or to make sure that the buggy kernel module for the video card on GX260s was not loaded, have now been rebooted. This morning there were still 9 to be done (8 of which were in JCMB).
Ken reported that since the last Operational Meeting the User Support Unit had handled 276 new RT tickets (equivalent to about 16 per working day) and resolved 65% of them. There had been a total of 285 tickets (including both new and existing tickets) resolved over the same period.
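The ticket figures above can be cross-checked with a short calculation. This is only a sketch: the function name `ticket_summary` is hypothetical, and because the 65% figure is rounded, the derived counts are approximate.

```python
def ticket_summary(new_tickets, per_working_day, resolved_fraction,
                   total_resolved):
    """Derive approximate figures from the reported ticket statistics."""
    # Length of the reporting period implied by the daily rate.
    working_days = new_tickets / per_working_day
    # New tickets that were resolved (approximate: the percentage is rounded).
    new_resolved = round(new_tickets * resolved_fraction)
    # Resolved tickets that already existed before the period began.
    existing_resolved = total_resolved - new_resolved
    return working_days, new_resolved, existing_resolved


working_days, new_resolved, existing_resolved = ticket_summary(
    new_tickets=276, per_working_day=16,
    resolved_fraction=0.65, total_resolved=285,
)
```

With the reported numbers this gives a period of roughly 17 working days, about 179 of the new tickets resolved, and so about 106 older tickets resolved in the same period.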
Ken has been doing some of the final work on the new devproj server setup. The rack-mount server dolly.inf will be hosting the devproj site from early next week. It will have the same style as the Informatics web server and will use mod_fastcgi to gain significant speed improvements.
Roger reported that he had upgraded the CSTR servers as part of the general upgrading of servers that were the responsibility of the unit. He had discussed with Tim the issue of continuing support for Franzlisp and Alastair confirmed that maintenance had been renewed this time but that this would need to be looked at again next year in good time. Roger had been doing more work on the MacOS documentation (FAQ, hints and tips etc) and this was now available. He will announce it later this week. He has also been doing some work on making the AFS client for Windows available on MDP machines.
Alastair raised a concern about the Computing Support Officer cover at JCMB and Ken agreed to look into this.
Alastair announced that we would be welcoming a new computing officer towards the end of March; Ian Durkacz would be joining us from Sheffield University.
We were all saddened and shocked to hear of the death of Shehzad Ali, a computer science graduate of this university who had worked briefly as a computing support officer in this School in the autumn of 2005.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh