Apologies for absence
Carol, Chris, Morna and Rosemary had sent their apologies.
Minutes of the last meeting.
These were accepted.
Report from Computing Executive Group.
Alastair reported on the schedule for the Appleton Tower refurbishment.
There will be one Appleton Tower support office, located somewhere on level 3, 4 or 5. Space on levels 3 and 4 will be in high demand, so it is very likely that all computing staff will be based in the Forum.
The JCMB labs will close in mid-June 2007, but we will probably still have use of the JCMB machine room until sometime in 2009.
Reports from units.
Infrastructure.
Unit members have been upgrading servers to FC5, which has thrown up a few issues. Machine monitoring of UPSs over USB is unstable, so they are now using dongles that let them use the serial interface instead. They are about halfway through upgrading the routers.
Toby has been testing new OpenLDAP code in the development release and, as it seems to be OK, will be putting it in the testing release.
Under FC5, LDAP servers will by default only serve data to the local machine. This should be borne in mind when setting up LDAP servers for the Suns, which use an external LDAP server.
Some machines are showing problems with LDAP. Toby has requested that anyone who comes across a machine showing authentication errors inform him of the host name and the time that the failure occurred. He should also be informed of any machines that have needed to be rebooted because of this problem.
George has upgraded the firmware on all the KB switches. Since additional switch ports were needed in Appleton Tower, an HP 40-port Gigabit switch has been ordered for installation in one of the racks in the Appleton Tower server room.
The rationalisation of console ports in the Buccleuch Place server room has now been done, and a Cyclades serial console server box has been freed up for later use in Appleton Tower. Alastair mentioned that he had been looking at IPMI (Intelligent Platform Management Interface) as a possible alternative to serial consoles.
Managed Platform.
The number of broken profiles has now been reduced to 24 (the majority of these belong to old RedHat 9 machines). Chris has tidied up the contents of the inv.manager resource on all profiles; the value should be a comma-separated list of valid email addresses or uuns (usually just one entry). Email listing the names of the broken profiles will shortly be sent to the addresses corresponding to those profiles. There will also be a new web page on the LCFG server showing a dynamic list of the broken profiles.
Stephen warned us that if a machine is left disconnected from the network for more than a few weeks it might get into a state where it needs to be reinstalled. This is because certain changes to package lists and resources need to be carefully phased in order to work correctly.
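As an illustration of the inv.manager format described above, the following is a minimal Python sketch of a check on such a value. It is not the LCFG implementation: the function names, the simplified email pattern and the assumed shape of a uun (a short alphanumeric login name) are all hypothetical.

```python
import re

# Assumption: a deliberately simple email pattern, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")
# Assumption: a uun is a short alphanumeric login name starting with a letter.
UUN_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]{1,15}$")


def invalid_manager_entries(value):
    """Return the entries of a comma-separated manager value that look invalid."""
    entries = [entry.strip() for entry in value.split(",")]
    return [e for e in entries if not (EMAIL_RE.match(e) or UUN_RE.match(e))]


def is_valid_manager(value):
    """True if every entry looks like an email address or a uun."""
    return not invalid_manager_entries(value)
```

A value such as "alice@example.org, bob" would pass this check, while an empty value or one containing free text would be reported as invalid.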
Alastair reported that once the 64-bit project is complete the unit will start work on support for Fedora Core 6, which came out on 24th October this year. It is believed that Fedora Core 7 will not be released until midsummer 2007, which would be too late to adopt for the academic year 2007-2008. Alastair said that they will attempt to introduce FC6 as an upgrade rather than a re-install.
Research and Teaching.
Tim reported that the unit had started to upgrade their servers to FC5. They had finally managed to take the old RedHat 9 postgresql server out of service. They had upgraded two of the BioInformatics servers and Iain and Archie were planning to do the last one on Friday. Iain has upgraded the hermes beowulf cluster nodes, has consulted the beowulf users and has now scheduled the upgrades of the others. Tim will be upgrading the School database server and has been looking at the open source release of Ingres on FC5.
New condor LCFG header files are now available for stable release machines. The headers correspond to the new condor pools: condor-kbpool.h for the KB pool and condor-centrepool.h for the central area pool. Alastair commented that the headers should only be active on machines running a stable release of DICE; Tim said that he would check whether this was the case. There was some question as to whether there should be a separate CSTR condor pool.
Rosemary has been working on the Graduate School database project. She has made changes to the applicant and student custom forms. She is also continuing to work in support of the RAE (Research Assessment Exercise), specifically exporting our research report data to the University's RAE database.
Archie has done a test install of a DIY DICE machine but hit a few problems to do with PXE booting and running X. Alastair believed that the X problem was not a DICE issue but was caused by X being unable to auto-detect the monitor type when it was connected via a KVM switch (which was the case).
The School now has access to 35 TB of space on the SRIF (Science Research Investment Fund) Storage Area Network at the Bush. EUCS are now managing the SAN instead of EPCC. We will be able to manage how all the storage on a single cabinet is configured for our use. We will be locating one of our servers out at the Bush, and this will be directly attached to the SAN.
Services.
Most effort over the next month will be devoted to upgrading the remaining 33 servers managed by the unit to FC5. However, it is extremely unlikely that the print servers will be upgraded before the Christmas break because LPRng local printing does not currently work on FC5. So far the legacy web servers, the mail virtual relay, one of the AFS database servers and some ssh gateway machines have been upgraded to FC5.
A recent disk failure on salamander, which had been used as a test AFS server, eventually caused an interruption to service for AFS users last Tuesday. At the time it only held read-only copies of the data describing the underlying AFS volumes; there was no user data on it. The disk failure caused no problem for users until the AFS file server on phoenix was rebooted. The AFS cache managers on the clients then all switched to using salamander and locked up. Deleting references to salamander from the volume database and rebooting the AFS server on phoenix finally cleared the problem, after about an hour's disruption at most.
Craig and Bill have upgraded the firmware, reseated the controller and replaced the disk showing the most errors in the Buccleuch Place ATABeast.
Phoenix and the samba server stuma were hung on Monday because of a problem with the SATABoy. There had been corruption of data following an atypical slow disk failure. Craig eventually managed to fsck all the affected file systems. This is the first time that we have seen any data corruption following a disk failure.
User Support.
Ken reported on the numbers of desktop machines that still had to be upgraded to FC5:
Site | To be upgraded to FC5 |
---|---|
AT | 4 |
BP | 0 |
FH | 2 |
KB | 23 |
All | 29 |
Of the 45 servers that were the responsibility of the unit, 27 had so far been upgraded to FC5. It was most likely that 17 of the remaining servers would be upgraded this semester (leaving the FC3 login machine as a self-managed machine).
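The figures reported above can be cross-checked with a few lines of Python. This is only an arithmetic sketch using the numbers in these minutes; the variable and function names are illustrative.

```python
# Desktop machines still to be upgraded to FC5, per site, as reported.
desktops_to_upgrade = {"AT": 4, "BP": 0, "FH": 2, "KB": 23}
reported_desktop_total = 29

# Server figures for the unit, as reported.
servers_total = 45
servers_upgraded = 27
servers_planned_this_semester = 17
servers_left_self_managed = 1  # the FC3 login machine


def remaining_servers(total, upgraded):
    """Servers not yet upgraded."""
    return total - upgraded


desktop_sum = sum(desktops_to_upgrade.values())
servers_remaining = remaining_servers(servers_total, servers_upgraded)
```

The per-site desktop counts sum to the reported total of 29, and the 18 servers not yet upgraded split into the 17 planned for this semester plus the one self-managed login machine.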
Ken reported that since the last Operational Meeting the User Support Unit had handled 420 RT tickets (equivalent to about 28 per working day) and resolved 72% of them.
Ewan Grant, the second of the two new Computing Support Officer appointments, had joined us on 13th November.
Roger has mainly been working on the server upgrades, and has also been testing the archiving scripts and working on AFS and MacOS documentation.
AOCB
There was none.
Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh.