Apologies for absence
Alison, Chris and Iain had sent their apologies.
Minutes of the last meeting
Roger pointed out a grammatical mistake in the sentence about the location of the Appleton Tower Support Office.
Report from Computing Executive Group
This item was not taken.
Reports from units.
Infrastructure.
The unit members have been investigating stability problems with the JCMB and Forrest Hill switches and the FC5 routers. The problem with the FC5 routers appears to be caused by a kernel bug (mentioned later in the Managed Platform Unit report).
Toby is now shipping the new OpenLDAP code in the testing release and, if no problems are reported, will include it in the stable release of 14th December. The possible problem with LDAP that had been mentioned at the last meeting turned out not to be an LDAP problem intrinsically but a consequence of people power-cycling machines rather than rebooting them.
One of the BP switches has a fault with one of its fans and George will arrange for the technicians to replace it.
Managed Platform.
There are still 20 broken profiles (19 of these belong to old RedHat 9 machines). Ken said that the User Support Unit would change these profiles so that they used the self-managed header.
As part of the 64-bit project, all the package lists and headers for the LCFG layer and inf have been changed so that noarch rpms (mainly LCFG components) carry an explicit noarch suffix. Before this project, any package with no architecture suffix was assumed to be for the default architecture, i386, and a noarch version was used only if no such rpm existed in the repository. However, to handle three possible architectures (i386, x86_64 and noarch) when sharing package lists and headers, in future any package description with no architecture suffix will be interpreted as referring to the default architecture (either i386 or x86_64, depending on the individual host), with no fallback to noarch.
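The change in lookup rule can be illustrated with a minimal Python sketch. This is not the LCFG implementation: the function names, the representation of a package as a (name, architecture) pair and the example package names are all assumptions made for illustration.

```python
from typing import Optional, Set, Tuple

# Hypothetical representation: a package is a (name, architecture) pair and
# the repository is simply the set of pairs that exist in it.
Package = Tuple[str, str]


def resolve_old(name: str, arch: Optional[str], repo: Set[Package]) -> Optional[Package]:
    """Old rule: no suffix means i386, falling back to noarch if absent."""
    if arch is not None:
        return (name, arch) if (name, arch) in repo else None
    for candidate in ("i386", "noarch"):
        if (name, candidate) in repo:
            return (name, candidate)
    return None


def resolve_new(name: str, arch: Optional[str], repo: Set[Package],
                host_default: str) -> Optional[Package]:
    """New rule: no suffix means the host's default architecture, no fallback."""
    wanted = arch if arch is not None else host_default
    return (name, wanted) if (name, wanted) in repo else None


# Illustrative repository contents (package names are made up).
repo = {("lcfg-example", "noarch"), ("bash", "i386"), ("bash", "x86_64")}
```

Under the old rule a bare `lcfg-example` entry would still find the noarch rpm; under the new rule it finds nothing unless the entry carries the explicit noarch suffix, which is why the package lists and headers had to be changed.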
There are some known bugs in the 2.6.17 kernel that we are using. However, we can't use the later 2.6.18 kernel because it doesn't correctly support the SATA interface, and the 2.6.19 kernel is not yet an option for FC5.
One of the bugs is triggered by use of a USB connection to a UPS. However a workaround that avoids triggering the bug is to connect the USB port on the server to a serial port on the UPS using a USB cable and a USB-serial dongle as mentioned in the last minutes.
There is also a bug in the iptables connection tracking code which has been fixed in the 2.6.18 kernel (a kernel that we cannot currently use, as mentioned above). The workaround that we can use is to build a kernel without this particular kernel module; the bug only appears to affect FC5 routers.
Research and Teaching.
Tim reported that the unit had continued to be busy upgrading their servers to FC5. The self-service server for postgresql accounts had been shut down and the service moved to the postgresql server, since Apache 1.3 is now supported on DICE FC5 machines. The final BioInformatics server was upgraded last week (previously its upgrade had been waiting on FC5 support for fibre channel). Iain has upgraded the lutzow and townhill beowulf clusters (both the heads and the other nodes). There is just the lion beowulf cluster still to upgrade. Iain has started on installing the replacement beowulf LCFG server.
Iain has hit an incompatibility between the Cyclades cards and the FC5 kernel; he will, no doubt, discuss this with Stephen when he returns.
Tim will be upgrading the School database server to the open source release of Ingres running on FC5. The results of regression tests so far look very promising. The upgrade will probably be done next week. Tim hit problems building gurgle on FC5 but will use the FC3 binary instead initially since this does work on FC5.
They have started on upgrading the matlab licence servers. The quorum of servers for the research licences will be replaced by a single licence server (the teaching licence server setup is already like this). By changing the setting of the MLM_LICENSE_FILE environment variable so that it specifies the licence server address and the port to connect to on that server (instead of specifying the path of the licence file), it will no longer be necessary to ship the licence file to all client machines. Once chubby is no longer a licence server, the Research and Teaching Unit will hand it over to the User Support Unit for upgrading to FC5.
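As an illustration of the two kinds of setting, the sketch below builds a server-style value and distinguishes it from a file path. It assumes the usual FlexLM-style `port@host` form; the host name, port number and file path shown are hypothetical and are not the values used in the School.

```python
def licence_server_setting(host: str, port: int) -> str:
    """Return a 'port@host' value naming a licence server (assumed format)."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    return f"{port}@{host}"


def is_server_setting(value: str) -> bool:
    """True if value looks like 'port@host' rather than a licence file path."""
    port, sep, host = value.partition("@")
    return bool(sep) and port.isdigit() and bool(host)


# Hypothetical example values, for illustration only.
new_value = licence_server_setting("licence.example.org", 27000)
old_value = "/opt/matlab/etc/license.dat"
```

With a value of the first kind in MLM_LICENSE_FILE, clients contact the server directly, so no copy of the licence file is needed on the client.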
Services.
There have been two recent problems affecting AFS (but neither directly to do with AFS!). The server symplegades couldn't talk to the network because of a problem with the switch to which it was attached. The actual cause is as yet unknown but the problem went away when symplegades' switch connection was moved to a different port on the switch. The second problem was a failure of the mirroring of AFS volumes (which only affected Craig and Roger). This was caused by a problem with har which was only resolved by moving har's connection to a different port and rebooting the switch.
Printing at Appleton Tower, which is currently served from the print server intertype at JCMB, was disrupted when intertype stopped being able to talk to the shared printer VLAN. To restore printing at Appleton Tower the print server at Forrest Hill was reconfigured so that it would also handle the Appleton Tower printers. When intertype was moved to a different port on the switch the network problem that it had been suffering disappeared. Of possible interest is that both symplegades and intertype were experiencing problems with VLANs that spanned all the physical sites in Informatics.
One factor that might have caused the unusual interaction between switch and host was the use by har of the same MAC address for its two interfaces (the default behaviour under Solaris). Craig will speak to Chris about changing this configuration on har and the other Solaris servers with two interfaces so that each interface has a distinct MAC address.
Linotype (the original Appleton Tower print server) had been reporting a failing disk. When this was reported to Dell, they said that the messages were caused by running old firmware. After the firmware was upgraded the errors stopped. However, yesterday the disk really did fail and we have received a replacement.
User Support.
Another 10 of the 45 servers that were the responsibility of the unit had been upgraded to FC5 since the last meeting. Roger would be upgrading one of the remaining 8 this afternoon and one more next week. It still seemed most likely that 7 of the remaining servers would be upgraded this semester.
Ken reported that since the last Operational Meeting the User Support Unit had handled 243 new RT tickets (equivalent to about 24 per working day) and resolved 63% of them. There had been a total of 228 tickets (including both new and existing tickets) resolved over the last two weeks.
Roger had been working on the server upgrades and doing further work on the account creation scripts needed for creating AFS accounts. He also asked about some of the problems that had been reported with the use of Crossover Office. The problems were hard to replicate and, with it being a commercial product, there was little that we could do other than offer alternative software (e.g. native Windows). Without being able to properly describe or replicate the problem we weren't even in a position to report it to the software maintainers. There were only 9 non-support machines running Crossover Office.
Morna reported that the Unit had taken over some of the tasks from the Services Unit. Next week computing staff who had not yet received MDP training would be going on an MDP course. Morna had been spending time trying to investigate any potential instances of FC5 instability (see the agenda item). She had also been making progress on the questions to be asked in the survey project that she was running. EUCS have admitted that a problem experienced by MDP users with their roaming profiles was not because of anything that the users were doing but because of a full partition on the profile server. They have now made a new profile server available and the Unit have started to move individual MDP users across to this new server.
FC5 stability investigations
Stephen reported on some of the findings so far. It appeared that there were several distinct problems, with different symptoms.
Stephen said that a kernel bug that had been implicated in some of the cases examined was present on Dell GX260 workstations with the ATI Radeon graphics card. Certain graphics manipulations would precipitate a crash. The Managed Platform Unit will be changing the include/lcfg/hw/dell_optiplex_gx260.h header file to remove any reference to the kernel module (that is now only poorly supported) that was needed for the Radeon card.
When details of frozen machines are reported, it is essential to state whether or not the machine was pingable.
Stephen also mentioned some problems that had been observed with KDE. On some machines with a very high load there were a very large number of kded processes in an uninterruptible state. It appears that as each person logs in with KDE a new kded is started, but when they log out the kded process becomes uninterruptible.
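A rough way to quantify this symptom on an affected machine is to count the kded processes that ps shows in state D (uninterruptible sleep). The Python sketch below assumes its input is the text printed by `ps -eo pid,stat,comm`; the function name and the sample output are illustrative and not taken from any affected machine.

```python
def count_uninterruptible(ps_output: str, command: str = "kded") -> int:
    """Count processes named `command` whose ps state starts with 'D'."""
    count = 0
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # blank or malformed line
        pid, state, comm = fields
        if not pid.isdigit():
            continue  # header line
        if state.startswith("D") and comm.strip() == command:
            count += 1
    return count


# Made-up sample in the layout of `ps -eo pid,stat,comm`.
SAMPLE = """\
  PID STAT COMMAND
 1201 D    kded
 1302 Ds   kded
 1410 S    kded
 1500 D    nfsd
"""

stuck = count_uninterruptible(SAMPLE)
```

Comparing this count with the number of users currently logged in would show whether kded processes from ended sessions are accumulating.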
Undergraduate student quotas
Craig reported on some of the discussions at CEG on this issue (see item 1.7, Disk quotas for commodity use, in the most recent CEG minutes). He said that by the start of January no home directory partition should be over-subscribed, i.e. the sum of the quotas of the user home directories on a partition would be no greater than the size of the partition. Initial examination of the statistics seemed to suggest that the quotas were not in general too low (the average use of 3rd and 4th year students was about 63% of their individual quota).
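The over-subscription condition is simple enough to state as a check. The Python sketch below is only an illustration of the rule as described; the function names and the figures in the example are assumptions, not real partition data.

```python
from typing import Iterable


def is_oversubscribed(quotas_mb: Iterable[int], partition_size_mb: int) -> bool:
    """True if the sum of the user quotas exceeds the partition size."""
    return sum(quotas_mb) > partition_size_mb


def headroom_mb(quotas_mb: Iterable[int], partition_size_mb: int) -> int:
    """Space left to allocate as quota (negative if over-subscribed)."""
    return partition_size_mb - sum(quotas_mb)


# Made-up example: three users on a 2000 MB partition.
example_quotas = [500, 500, 1000]
example_ok = not is_oversubscribed(example_quotas, 2000)
```

A partition whose quotas sum exactly to its size counts as acceptable here, matching the wording "no greater than the size of the partition".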
The problem was exacerbated by some badly-behaved software (such as Eclipse) that wrote zero length files without reporting an error.
A script would be written that would be run each time a shell was fired up which would report on the quota usage. Another script would mail the user if they were approaching their quota limit.
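The logic of the two proposed scripts might look something like the following Python sketch. It is not the script that was to be written: the function names, the wording of the message and the 90% warning threshold are all assumptions, and obtaining the usage figures and sending the mail are left out.

```python
def quota_report(used_kb: int, quota_kb: int) -> str:
    """One-line usage summary suitable for printing when a shell starts."""
    if quota_kb <= 0:
        raise ValueError("quota must be positive")
    percent = 100 * used_kb / quota_kb
    return f"Disk quota: {used_kb} of {quota_kb} KB used ({percent:.0f}%)"


def should_warn(used_kb: int, quota_kb: int, threshold: float = 0.9) -> bool:
    """True if usage has reached the (assumed) warning threshold."""
    return used_kb >= threshold * quota_kb


# Illustrative figures only.
message = quota_report(630, 1000)
```

The first function corresponds to the report shown at shell start-up, and the second to the test that the mailing script would apply before contacting a user.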
AOCB
There was none.
Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh.