Apologies for absence
Alison had sent her apologies.
Minutes of the last meeting.
These were accepted.
Report from Computing Executive Group
This agenda item was not taken.
Reports from units.
The switchover to the new LDAP master server, franklin, will take place on Thursday May 17th. Toby has been investigating some strange LDAP replication problems: when the problem occurs, the LDAP database gets corrupted and slapd reports an error. Once this has happened it is impossible to log into the machine other than as root. There are two ways to fix the problem. The first is to log on as root and run om openldap restart -- -f. The second is to edit the machine profile to point the machine at an alternative LDAP server, wait for the new profile to be loaded onto the machine, log on and fix the problem with the local LDAP, and then return the profile to its previous state.
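The first recovery route could be wrapped in a small script along the following lines. This is only a sketch: the om openldap restart -- -f command is the one quoted above, but the probe account and the idea of detecting the fault by testing a name-service lookup with pwd.getpwnam are assumptions made for illustration, and the script would have to be run as root on the affected machine.

```python
import pwd
import subprocess

# Command quoted in the minutes; everything else here is illustrative.
RESTART_CMD = ["om", "openldap", "restart", "--", "-f"]


def lookup_works(username):
    """Return True if the name service can resolve `username`."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


def recover_if_needed(probe_user, run=subprocess.run):
    """Restart the local LDAP service if `probe_user` cannot be resolved.

    `probe_user` is a hypothetical account that exists only in LDAP.
    Returns the command that was run, or None if no action was needed.
    """
    if lookup_works(probe_user):
        return None
    run(RESTART_CMD, check=True)
    return RESTART_CMD
```

Passing the runner in as a parameter allows the logic to be exercised without actually restarting anything.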
Based on the lack of water in the bucket placed to catch any leak of condensate it would appear that the problem with the air-conditioning unit in the Buccleuch Place server room has been fixed. The maintainers of the equipment had replaced a pump which, from its condition, didn't appear to have been working for a long time. It is however still the case that this air-conditioning unit is working at nearly full capacity and no additional servers should be installed in that server room. An order for a new rack for the Appleton Tower server room has been placed in the order queue.
We are committed to releasing rooms 1206b, 1206c, 3316 and 1501 in JCMB (all of which are labs) for use by others in the College by May 25th. There is an RT ticket in the technicians queue that pertains to this.
George reminded us of the decanting of offices from floor 3 to floor 6 in Appleton Tower. The new network kit for Appleton Tower has arrived. There will be a meeting to discuss Appleton Tower refurbishment tomorrow afternoon.
Roger requested aliases for the console servers, and George agreed that this would be a good idea (shortly after the meeting George created site-specific console server aliases: atconsoles, bpconsoles, fhconsoles and kbconsoles).
Chris reported that the Managed Platform Unit would shortly be installing an FC6 test server for users to test their software on; this will be announced in the near future.
The System Configuration and LCFG: Tutorial Workshop will take place on 13th June in the Appleton Tower and is almost fully booked by people from elsewhere in the University and from other universities. A rerun of the workshop for Informatics staff will take place at some time after this.
A fix to the network component has shipped and is now in the stable release. The new release can also configure channel bonding.
The Unit are now actively looking at automating the reboot of desktops when the reboot flag is set (this had been talked about in the first January Operational meeting).
There is a problem with diysubmit, the utility for submitting LCFG configuration files to a DIY LCFG server, which potentially affects any DIY user wishing to use the command. The command relies on rsync to copy files to a directory on the DIY LCFG server. The directory is managed by the file component and should be owned by the user who manages the specific DIY DICE machine. However, the file component starts before the LDAP component and before contexts become active, so the directory ownerships revert to root whenever the DIY LCFG server reboots. Happily, reboots are very rare, and at present the workaround is to reset the ownerships manually to their correct values after the server has booted.
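The manual workaround could be scripted roughly as follows. This is a sketch only: the minutes do not say where the submission directories live or how owners are recorded, so the mapping from directory path to owning uid is a hypothetical input, and changing ownership requires root on the DIY LCFG server.

```python
import os
import shutil


def misowned(expected_owner_uid):
    """Return the directories whose current owner differs from the expected uid.

    `expected_owner_uid` maps a directory path to the numeric uid that
    should own it (a hypothetical table, not something LCFG provides).
    """
    return [path for path, uid in expected_owner_uid.items()
            if os.stat(path).st_uid != uid]


def reset_ownership(expected_owner_uid):
    """Give each mis-owned directory back to its expected owner (needs root)."""
    fixed = misowned(expected_owner_uid)
    for path in fixed:
        shutil.chown(path, user=expected_owner_uid[path])
    return fixed
```

Running misowned on its own first gives a dry-run report of what reset_ownership would change.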
Research and Teaching.
The most recent School database server crashes have now been fixed. Ed Dee sent a list of Ingres variables that could be configured, and after trying modifications to all of them it was found that setting opf_hash_join to off prevented the crashes (OPF is the query optimisation facility).
The Unit have ported 30 rpms to FC6 out of about 120 that are needed for teaching and research. Most of the time is spent identifying the most recent version of a package and searching for an existing rpm. For this reason it was judged not worthwhile to try to rebuild the packages automatically (as had been considered a month ago). Graham has been looking at the port of the Java environment with its large number of associated packages. Most of the problems encountered in rebuilding packages have been to do with the new gcc compiler in FC6. Where it will not cost too much effort, the packages will also be built for a 64-bit environment.
Iain has upgraded the memory in the townhill beowulf cluster (each node now has 8GB) and the hermes beowulf cluster (each node now has 4GB). Since the upgrade he has not observed a recurrence of the problem with the kernel killing off slapd to save memory.
Graham has finished work on the postgresql component and has created suitable LCFG-level and DICE-level header files so that any machine can easily be set up as a PostgreSQL server (with Kerberos support in the case of a DICE PostgreSQL server).
Craig reported that they had heard back from NCE (who maintain the ATABeasts) that the recent hang of the Buccleuch Place ATABeast was due to a bug in the firmware. NCE suggested that we consider downgrading to the previous release of the firmware, but Craig said that unless there were further problems they would leave things as they were for the meantime.
Craig said that it was their belief that all the recent samba problems had been caused by the corrupted samba printer databases (which then eventually crashed the samba server). Ken again asked that notes on diagnosing the problem and then fixing it should be published if Craig thought that any future recurrence could be fixed by Front Line Support.
Last Tuesday afternoon we started having problems with the mail server, nutty. Between then and the following morning, when the problem was spotted and nutty was rebooted, mail delivery to 65 users was affected. Both smrsh (the restricted shell for sendmail) and procmail were intermittently unable to change the effective uid to that of the recipient of the mail. They will continue to monitor the situation but so far have no explanation for this behaviour.
Craig reported on their understanding of what had caused all the recent print server problems. He had yesterday mailed out a link to a document on their wiki space describing their diagnosis. Chris commented that Alastair was not yet convinced that the explanation was correct in all aspects.
Since the last Operational Meeting the User Support Unit had handled 220 new RT tickets (equivalent to about 22 per working day) and resolved 71% of them. There had been a total of 227 tickets (including both new and existing tickets) resolved over the same period.
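The figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the minutes; because the 71% is itself rounded, the derived counts are approximate, and the split of the 227 resolved tickets into new and existing ones is an inference rather than a reported figure.

```python
new_tickets = 220        # new RT tickets since the last meeting
per_working_day = 22     # quoted rate of new tickets per working day
resolved_fraction = 0.71 # share of the new tickets that were resolved
total_resolved = 227     # all tickets resolved, new and existing

# Length of the reporting period implied by the quoted rate.
working_days = new_tickets / per_working_day

# Approximate number of the new tickets that were resolved.
new_resolved = round(new_tickets * resolved_fraction)

# Inferred number of pre-existing tickets resolved in the same period.
existing_resolved = total_resolved - new_resolved

print(working_days, new_resolved, existing_resolved)
```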
During the meeting Lindsey used, for the first time, the script for warning users of the intention to delete their old computer account.
Ken has been mainly busy with preparing for the talk that he gave last Wednesday and the appeals panel that he attended on Friday, as well as the usual round of chairing and minuting meetings.
It is likely that there will be another round of IDMS training run by the IS: Applications Division in late May. Local administrative staff and new CSOs would be attending.
The current LaTeX support post holder has replied that he is willing to work on LaTeX issues until at least the end of May. Alison will ask him about porting the local LaTeX style files to FC6.
There seems to be considerable ongoing confusion about the Windows AFS client and whether it is in a state to allow easy use by administrative staff. Simon was surprised that we had decided not to push ahead with its use and Roger is arranging a meeting between Kenny McDonald (MDP project), Simon and himself to sort out what the problems are. Either Lindsey or Alison should also attend.
Only 2% of FC5 desktops still needed to be rebooted to pick up the new kernel. Most of the Unit's seven remaining servers that needed to pick up the new kernel would be rebooted in the next few days.
Alison and Jennifer had tried using the notes that Ken had written describing how to reinstate the devproj server in the event of a complete failure of the server. They had hit a problem caused by a randomly generated MySQL root password containing characters that have special meaning to the shell. Neil commented that this looked like a bug in the mysql component. Ken had suggested alternative instructions that would work around this problem (by avoiding the use of a random password in the first place).
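For illustration, there are two other general ways to keep a random password from being mangled by the shell. Neither is what Ken suggested, and neither is how the mysql component behaves: the function names below are hypothetical. One is to generate the password from letters and digits only; the other is to quote the password before it is placed on a shell command line.

```python
import secrets
import shlex
import string

# Letters and digits carry no special meaning in POSIX shells.
SAFE_ALPHABET = string.ascii_letters + string.digits


def shell_safe_password(length=24):
    """Generate a random password that needs no shell quoting."""
    return "".join(secrets.choice(SAFE_ALPHABET) for _ in range(length))


def quoted_for_shell(password):
    """Quote an arbitrary password so a POSIX shell passes it through verbatim."""
    return shlex.quote(password)
```

Restricting the alphabet reduces the entropy per character, which can be offset by using a longer password.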
There was none.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh