Apologies for absence
George and Toby had sent their apologies.
Minutes of the last meeting.
These were accepted.
Reports from units.
Infrastructure.
No members of the Infrastructure Unit were present at the meeting to present a report (Simon arrived at the end of the meeting).
Managed Platform.
Stephen reported that all FC5 machines are now receiving the current stable release by default. He was pleased to see how many computing staff machines were using the testing release; in fact, all computing staff machines seem to use either the testing or the develop release. There were some teething problems, which exposed a bug in the LCFG server code. It took about a week to debug, but now that a fix has been applied to the code everything appears to be working well. There should now be a faster turnaround time for making and testing changes to components, since such changes trigger far fewer profile recompilations.
Paul has made changes to the LCFG status page to incorporate optional folding of the categories of machines, based on JavaScript supplied by Kenny MacDonald. Chris will be looking at rationalising a hierarchy of categories that we can use to classify machines.
Chris is looking at DIY DICE under FC5.
Stephen has now got kingsbarns running DICE as a 64-bit enabled system. In the process he came across some rpms that had been classified incorrectly as either noarch or i386; he has corrected these. The updaterpms component is still to be ported so machines must have packages manually installed at present.
Stephen mentioned an intermittent problem with the openldap component under FC5 which is affecting certain machines.
Research and Teaching.
Tim reported that they have now finished porting all the teaching and java packages to FC5 apart from certain licensed pieces of software. They are now looking at the licensed software and so far matlab has been made available under FC5.
Graham is making progress with the postgresql server for the new teaching database server but there was a hold up because of a problem with partitioning under FC5.
Iain has set up a condor master under FC5 and a small test cluster of FC5 condor client machines. He has also made some fixes to GridEngine.
Rosemary has been investigating a database problem. Certain queries hang when run as a normal user but run fine when run as daidb from cron. Rosemary has narrowed the problem down to one table which may, in any case, be superfluous. Further experimentation will be made on a test database machine (e.g. peregrine).
Some long-standing RT tickets have been processed with a view to clearing most of them.
Services.
Craig reported a catalogue of recent disk failures:
A disk failed on the ATABeast. Nexsan advised that it would be safe to hot-swap the disk rather than shutting down the ATABeast and attached servers. On pulling out the disk tray to carry out the swap of the failed disk one of the fibre channel cables became disconnected. This caused a large number of errors in the ATABeast fibre channel controller which triggered a feature in the firmware to disable all the channels. So this attempt to avoid disruption to the users unfortunately caused even more disruption than would the alternative strategy of doing a cold swap.
A disk also failed in the ATAboy. There was no danger of a cascade of failures like the one described above because the disks are all accessible from the front. However, when the failed disk was replaced with a new disk, the new disk failed as well, as did the next replacement disk that was tried. Nexsan were consulted and they confirmed that the most likely diagnosis was a fault on the chassis. A replacement chassis is being sent and, if it arrives in time, it will be installed this Sunday.
A system disk failed in phoenix yesterday and caused knock-on effects for samba users. The system partition was mirrored via RAID but there was a home directory partition on the failed disk that was not mirrored synchronously in this way. The home directories have now been restored from the (asynchronous) mirror on one of the mirror servers. They will shortly be moved onto a RAID'ed disk on the SATABoy. The Sun engineer will be coming this afternoon to replace the failed disk on phoenix.
User Support.
About 130 machines have now been upgraded to FC5, about 80 of these in the student labs. The rate at which machines are upgraded to FC5 should now increase since several staff who had been on annual leave have now returned.
Three weeks ago Morna, Lindsey, Charlie and Ken, along with staff from Physics, attended an MDP training session for computing staff run by Kenny MacDonald, leader of the EUCS Managed Desktop Team. Yesterday Morna, Charlie and Ken attended a meeting with Kenny MacDonald, Keith Nicol and Angus Rae to finalise the handover of support for our MDP installations from the Managed Desktop Team to the Science and Engineering Support Team.
Ken has had a meeting with Paul to clarify the requirements for the Development Meeting management software. He has also started investigating Ruby on Rails as a possible framework in which to develop the software. He has packaged up the various parts of Ruby on Rails as FC3 rpms and made them all available via a header file core/include/dice/options/ruby_on_rails.h.
Premature closing of connections to the LDAP master server while running the /usr/sbin/syncdbldap script.
Ken outlined the symptoms and reported Simon's initial diagnosis that the root cause was almost certainly LDAP. As no member of the Infrastructure Unit was present, Ken said he would raise the issue again offline. When Simon arrived shortly afterwards he said that the only circumstance under which OpenLDAP should drop a connection is a timeout. He asked that the problem be reported via bugzilla.
LaTeX under FC5 (specifically latex.local.informatics)
Iain reported that the student providing LaTeX support was working on porting this rpm to FC5 and the new version of tetex. It was not clear to anybody which parts of the local LaTeX support were most urgently needed. Ken commented that Ross was going to try to establish what local LaTeX support was missing from FC5 that was needed for current work (as opposed to, say, the poster support that shouldn't be needed until next spring). Roger mentioned a LaTeX-related request from a member of staff. Alastair reminded us of the general principle that priority needs to be given to things that affect most people.
It was agreed that the User Support Unit would take responsibility for managing the work being done by the student providing LaTeX support.
Morna commented that there were still TeX-related packages available on RedHat 9 machines that had not even been ported to FC3.
AOCB
Replacement for Mozilla composer for MDP machines
Morna raised a point about User Support having to help admin people put things like PDFs and images on the web. Ken mentioned the web page that describes how to publish web pages using mozilla. Simon said that the cvs aspect of publishing on the Informatics web server is handled by a script on the server once the page has been published via composer. He said that it was possible to configure composer to upload all relevant links, but Morna said that she was aware of this and it wasn't the whole solution to the problem. Simon believed that Nvu, which has been developed as a standalone web page editor based on the composer element of mozilla, should be usable for publishing from MDP machines. Morna and Craig said they would discuss the issue.
Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh