Apologies for absence
Alastair and Rosemary had sent their apologies.
Minutes of the last meeting.
These were accepted.
Report from Computing Executive Group
This agenda item was not taken.
Reports from units.
Infrastructure.
Shortly after the last meeting, OpenLDAP was upgraded to the latest version on all hosts (although there has been another release since our upgrade). Various changes have been made to the LDAP configuration to improve robustness: idle connections to the server are now dropped after a timeout period, the cache size has been reduced from 320MB to a more reasonable 80MB, checkpointing has been added, and there are now indexed caches on the database.
Rooms 1206b, 1206c, 3316 and 1501 in JCMB (all of which are labs) have been released for use by others in the College. All of our kit in those rooms was removed prior to the handover and the switches are being reused in the Appleton Tower.
George has rearranged some of the Appleton Tower switch locations so that similar model switches are being used on the same floor. The newer switches with the greater functionality are being used on levels 3, 4 and 5 since the ports on these floors must be capable of being locked down to a specific MAC address.
Appleton Tower Level 6 is now occupied by the previous occupants of Level 3; thanks to the technicians and the Front Line Support Team for their efforts that resulted in an essentially smooth change.
The server room in Appleton Tower basement has been made significantly tidier after the removal of cardboard boxes and packing materials by Alison and Jennifer; many thanks to them both.
The technicians are shipping a new rack down to the Appleton Tower server room this afternoon. A new 10GBit link has been installed between two of the Appleton Tower switches; the installation went extremely smoothly and it worked straight away.
The power load within the Appleton Tower server room was very poorly distributed, which overloaded one of the power blocks and caused a failure of supply (see Craig's Services Unit report for more details). The load distribution has since been significantly improved but is still not ideal. A new power distribution block with a network connection for remote configuration and control will be tried out shortly.
Managed Platform.
Stephen reported that the unit believes it has fixed a bug in the fstab component. During the installation of certain machines, the component had occasionally been observed to hang. On investigation this appeared to be a timing issue between the component and udev. The component would check whether a device existed and, if it did, would start to partition the disk and build new file systems; but sometimes udev would delete the device and then recreate it while the component was trying to do its work. The component now removes all the device files first, so that any device files present afterwards are certain to have been created by udev as a result of the partitioning done by the component. This new behaviour should be more robust. The new fstab component will be in this week's stable release and in the PXE release by next Monday.
Graham had discovered a problem with the FC6 PXE root. When this was created on the Sun server by untarring a tar file containing the generated PXE root, some of the path names were truncated. This seemed to be a bug in both Solaris tar and the version of GNU tar on the Suns. The problem was avoided by reverting to the older cpio format for the archive of the PXE root.
Chris has now implemented the tool for automatically rebooting machines. If it is installed (via the autoreboot.h header file), the autoreboot program runs once a day at some time between 01:00 and 02:00 and by default (under DICE) only pays attention to reboot requests from the updaterpms component and reboots the machine after a configurable delay (default 1 week). Warning messages are sent to the console periodically by the shutdown command in the intervening period. This is available for FC5 machines currently and FC6 shortly.
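The behaviour described above can be sketched roughly as follows. This is only an illustration of the stated policy (a daily run between 01:00 and 02:00, requests honoured only from updaterpms, a default delay of one week); the function names and structure are hypothetical and are not taken from the actual autoreboot program.

```python
import random
from datetime import datetime, timedelta

# Default delay between a reboot request and the reboot, as stated in the minutes.
DEFAULT_DELAY = timedelta(weeks=1)


def daily_run_time(day: datetime, rng: random.Random) -> datetime:
    """Pick a run time between 01:00:00 and 01:59:59 on the given day."""
    offset_seconds = rng.randrange(0, 3600)
    start = day.replace(hour=1, minute=0, second=0, microsecond=0)
    return start + timedelta(seconds=offset_seconds)


def reboot_due(requested_at: datetime,
               now: datetime,
               delay: timedelta = DEFAULT_DELAY,
               source: str = "updaterpms",
               honoured: tuple = ("updaterpms",)) -> bool:
    """Return True if a reboot request should be acted on now.

    Requests from sources outside `honoured` are ignored; the default
    tuple mirrors the DICE default described above (hypothetical names).
    """
    if source not in honoured:
        return False
    return now - requested_at >= delay
```

Choosing a random time within the hour is one plausible way to avoid every machine acting at the same moment; the minutes do not say how the time is actually chosen.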
Official Fedora support for FC5 finishes in three days. In the next few months the unit will be investigating the use of Scientific Linux 5 (based on Red Hat Enterprise Linux 5) which has support for a couple of years beyond the point where it is replaced by a later version; this would be of interest for use on our servers.
Research and Teaching.
Iain reported a problem with printing from Firefox 1.5.10, which is shipped with FC6. Some web pages consistently do not print correctly. Iain will be looking at this more closely next week. We will probably need either to jump forward to a Firefox 2 release or to revert to the Firefox 1.5.9 release.
Tim mentioned a problem with the behaviour of the piklab software that Archie was making available under DICE. The plug-in file browser insists on trying to list /afs (which it would eventually do after several days!). It was suggested that this may simply be a misconfiguration issue.
Graham has rebuilt the bash completion package so that it does not store its functions in the environment unless explicitly sourced. He has also added a few extra shell scripts for doing completion for a couple of extra commands (such as the fs command for AFS).
The unit has ported all the research packages to FC6; the vast majority have also been built for 64-bit FC6. The snack package is available on both FC5 and FC6 but at different package releases (although identical on installation), because upgrading FC5 to the same release as FC6 would require a delete/install cycle: doing an upgrade triggers an obscure bug in rpmlib. This may also cause a problem for any future releases of this package for FC5 or FC6.
Tim mentioned a bug in the new RT service. When a comment, for example, is added and then committed, the window hangs (even though the operation has completed successfully). It appears that RT is redirecting the browser to https://rt3.inf.ed.ac.uk:80/...., which would not normally work. Graham has mailed Alison about this.
It seems that neither the GPFS kernel module nor the Xilinx USB kernel module works under FC5 or FC6. The former works on Scientific Linux 4 and the latter on RHEL4. It is likely that they will be ported to Scientific Linux 5/RHEL5 later this year. In the meantime the unit will aim to provide Xilinx on a single machine running RHEL4 that students can log into remotely, in time for the start of the next semester.
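The redirect is suspect because it pairs the https scheme with port 80, the conventional port for plain HTTP. A minimal sketch of a check for that kind of mismatch, using only the Python standard library, is shown below; the helper name is hypothetical, and the path of the reported URL is left out because the minutes elide it.

```python
from urllib.parse import urlsplit

# Conventional default ports for the two schemes of interest.
DEFAULT_PORTS = {"http": 80, "https": 443}


def port_mismatch(url: str) -> bool:
    """Return True if the URL names the default port of the *other* scheme."""
    parts = urlsplit(url)
    expected = DEFAULT_PORTS.get(parts.scheme)
    if parts.port is None or expected is None:
        # No explicit port, or a scheme this sketch does not know about.
        return False
    other_defaults = set(DEFAULT_PORTS.values()) - {expected}
    return parts.port in other_defaults


# Host and port as reported in the minutes; the path is omitted.
print(port_mismatch("https://rt3.inf.ed.ac.uk:80/"))
```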
Services.
Craig reported that there had been a power failure in the Appleton Tower server room at about 01:30 on 18th June which caused the phoenix and roc file servers and the stumer and eejit samba servers to be out of service until they were powered up again at about 09:00 that morning.
The machine hosting the www.dai web server finally failed at about 14:00 last Sunday after warning of a problem with its root disk for many months. The data was held on a separate server and it was only necessary to include the appropriate header file in the profile of a replacement machine (hootsmon) to get the service up and running again. The change to the DNS (giving hootsmon as a CNAME entry for www.dai) had propagated to all machines by 13:00 on Monday.
There was a mail problem that started at about 22:00 on Friday and was resolved by 17:00 on Saturday. A member of staff had misconfigured a mailing list so that its reply-to field was the mailing list itself. Because the list contained more than one invalid address, bounced messages were resent to the list, generating a greater number of bounced messages, and so on; the number of bounced messages going to the list grew exponentially with time. The mail server eventually refused any more connections. The member of staff has been spoken to.
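The growth can be illustrated with a toy model. Assume, purely for illustration, that every message delivered to the list produces one bounce per invalid address and that every bounce is itself posted back to the list. The number of invalid addresses below is hypothetical; the minutes give no figures.

```python
def bounce_generations(bad_addresses: int, generations: int) -> list[int]:
    """Messages reaching the list in each round of the loop.

    Round 0 is the original posting; each later round contains one
    bounce per invalid address for every message in the previous round.
    """
    counts = [1]
    for _ in range(generations):
        counts.append(counts[-1] * bad_addresses)
    return counts


# With a single invalid address the loop never dies out but does not grow.
print(bounce_generations(bad_addresses=1, generations=5))
# With two or more, each round multiplies the traffic.
print(bounce_generations(bad_addresses=2, generations=5))
```

Under this model, a single invalid address gives a constant trickle, whereas two or more give geometric growth, which matches the observation that more than one wrong address was needed for the problem to escalate.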
The AFS file servers have been upgraded to run OpenAFS 1.4.4. Client machines are still running OpenAFS 1.4.1.
User Support.
Since the last Operational Meeting the User Support Unit had handled 373 new RT tickets (equivalent to about 14 per working day) and resolved 74% of them. There had been a total of 379 tickets (including both new and existing tickets) resolved over the same period.
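As a rough cross-check of these figures, the arithmetic below derives the implied length of the reporting period and the split of resolved tickets. The derived values are approximations, since the percentage and the daily rate quoted above are themselves rounded.

```python
new_tickets = 373          # new RT tickets since the last meeting
per_working_day = 14       # quoted approximate rate
resolved_fraction = 0.74   # share of the new tickets resolved
total_resolved = 379       # new and existing tickets resolved

# Implied length of the reporting period in working days.
working_days = new_tickets / per_working_day

# Approximate number of the new tickets that were resolved.
new_resolved = round(new_tickets * resolved_fraction)

# The remainder of the resolved total must be older, existing tickets.
older_resolved = total_resolved - new_resolved

print(f"about {working_days:.1f} working days")
print(f"about {new_resolved} new tickets resolved")
print(f"about {older_resolved} older tickets resolved")
```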
Ken reported some figures showing resolution rates against age of recent RT tickets:
Ticket Age D | Percentage Resolved (or rejected) |
---|---|
1 week < D < 2 weeks | 82% |
2 weeks < D < 1 month | 86% |
1 month < D < 3 months | 91% |
Lindsey has deleted several hundred old computer accounts, belonging to non-Informatics students who have not been enrolled on an Informatics course for over a year.
As mentioned earlier, the move of staff in Appleton Tower from level 3 to level 6 happened on Tuesday 12th June. The Front Line Support Team were involved from about 14:30 on the day before until late Tuesday afternoon (although the majority of machines had been moved and set up by lunchtime). Thanks to Dave Hamilton for configuring the switch ports (since we had expected that task to fall to us).
The population of the Forrest Hill lab with newish FC6 machines, scheduled for Friday 15th June, finally happened on Thursday 21st June. The work started well on the scheduled Friday, with all the old machines being shut down, dismantled and stored away. However, a mix-up with the use of the van meant that the move of the 46 newer machines from JCMB to Forrest Hill had to be postponed. Most of the machines in the level 5 west lab in the Appleton Tower have also been upgraded to FC6. All remaining lab machines will be upgraded during the week commencing 28th August (after the MSc students have handed in their dissertations). Alison will shortly be distributing to CSOs lists, by site, of which staff/PhD machines are to be upgraded and which replaced.
The move of staff home directories from NFS to AFS is progressing; approximately 30 non-computing staff have now been moved. The total number of accounts (not including temporary ones) which have an AFS home directory is now about 130.
AOCB
There was none.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh