Apologies for absence
Alastair, Alison, Chris, Craig and Toby had sent their apologies.
Minutes of the last meeting.
These were accepted.
Report from Computing Executive Group.
This agenda item was not taken.
Reports from units.
Infrastructure.
Simon has made changes to the LDAP schema to hold data on a second home directory (afsHomeDirectory) to support the use of AFS. Current users of AFS have had this value set manually. This change to the LDAP schema precipitated a huge number of changes to the LDAP data (one change per People object), which in turn caused time-out problems for ldapreplicate on FC3 machines; this has now been fixed. Tweaks were also made to Kerberos to support AFS. Toby has started work on building a KDC (Key Distribution Centre) under FC5.
There was a hardware fault in the fan of one of the Appleton Tower switches. We were sent a replacement switch chassis, but Gilbert was able simply to swap out the fan itself, saving the time that configuring a replacement switch would have taken. As a consequence the switch was down for only about 7 minutes. The replacement switch chassis, with the faulty fan installed, has been returned.
We have only about one spare console port in Appleton Tower. Julieta will explore possible rationalisations of console ports in Buccleuch Place in order to free up one of the Cyclades boxes so that it can be moved to Appleton Tower.
Managed Platform.
Stephen reported that there were still some packages for which rpms were held in the fc5/autolcfg repository but not in the normal fc5 repositories. Next week he will move any remaining rpms from the autolcfg repository to the dice repository and delete the autolcfg repository.
Paul will shortly be devoting some development time to LCFG and has asked people to vote, on our Bugzilla server, for the LCFG bug reports that they would most like to have fixed.
Chris and Stephen will be making some changes to LCFG header files to accommodate Solaris and the FC5 64-bit architectures; this will trigger the recompilation of a very large number of profiles.
There was some discussion of whether a large number of unnecessary profile compilations were taking place because the profiles of unused machines had not been updated appropriately. Stephen will check whether an unallocated machine's profile is recompiled.
Some new LCFG compiler macros have been defined (LCFG_RELEASE_STABLE, LCFG_RELEASE_TESTING, LCFG_RELEASE_DEVELOP and LCFG_RELEASE_VERSION) which allow one to test which software release an individual profile is associated with. This will allow the maintainers of LCFG header files to make part of a header file's contents dependent on the release and/or version. The release and version will also be available on the machine itself via the resources profile.release and inv.release_version (see the DICE Releases FAQ for further details). Neil suggested that the minv command, shipped in the lcfg-inventory-client rpm, could be updated to report these values.
Work has begun on the testing of FC5 DIY DICE.
The Fusion SCSI card (as used in Dell PowerEdge 1600, 1750 and 1850 and Workstation 450 machines) requires an extra LCFG resource; as a temporary measure this has been added by hand to the ws450 machine profiles, but it will be added to the appropriate headers shortly.
Research and Teaching.
The ITO have hit a problem with changing the names of tutor groups as recorded in the School database. The problem has been diagnosed but a fix has not yet been developed.
Tim and Rosemary have been investigating instances of processes on the School database getting into an uninterruptible stuck state (the ipm (Ingres Process Monitor) tool reports the state as <any>). It appears that something in the low-level definition of the person table may be at the root of the problem. It may be possible to fix this when the database server is upgraded to FC5.
It will be necessary to make some changes to one of the tables to allow the upload of RAE data to the University's own RAE database (some data that needs to be in two separate fields is currently held in a single field of the School database).
John has been working on the exam laptops. He has backed up last year's data and has upgraded one of them to FC5.
John has been discussing the requirements for a videoconferencing facility for AIAI with staff from MALTS with a view to them providing the service, rather than it being supported in-house.
Iain has started looking at running GridEngine under FC5. He will need to upgrade about 100 nodes that are currently running FC3. He is also putting together a project proposal for improvements to GridEngine.
Tim and Archie have been looking at DIY DICE for FC5. For the last few weeks Archie has also been working with the hardware description language Verilog and supporting lab sessions.
Services.
The backup server, ouroboros, had been crashing; it turned out that the cause was not a faulty SCSI controller but a software fault (the AFS software had been only partially installed). This has now been fixed.
The JCMB print server has been showing a similar fault to the Appleton Tower print server mentioned previously.
The disk that had failed on the ATABeast at JCMB was replaced after the last meeting. A disk has now failed in the BP ATABeast. It would be possible to replace a disk once it started to generate errors but before it failed. However, this would be a risky strategy if two disks were showing errors, since one would not know for sure which would fail first: if one of the error-generating disks were replaced and the other then failed before the RAID array had finished rebuilding the data on the swapped-in disk, data would be lost. Nexsan have suggested that we re-seat the disk controller card on the ATABeast that is showing the disk failures.
The 8-slot LTO2 super-loader tape loader has had its chassis replaced, but it is too soon to say whether this has fixed the problem of tapes jamming that we have been seeing.
On Saturday morning the ATABoy at KB locked up for the third time. Nexsan now say that they have identified the cause and have applied a fix to the previous version of the firmware. Neil applied this patched firmware on Sunday, when he came in to attend to the ATABoy fault. Some of the web servers had problems with NFS mounts following the lock-up. One of the partitions (ptn104) on sphinx failed fsck after the lock-up; it remained unmounted until Monday morning, when the data was restored from the Saturday morning backup. The Services Unit are publishing a "breaks in service" web page, and consideration is being given to whether this should be made public.
The Head of School would like us to set up two podcasts as soon as possible. In the short term Neil will use scunner.inf (which can NFS-mount the multimedia data files from their existing locations) and make media.inf an alias for this host. EUCS has a streaming service which we should investigate using, but the temporary measure can be implemented in a matter of minutes. Mike has also requested that we set up an alternative, Chinese-language version of the Postgraduate Prospectus web page (the translation has already been done by one of the staff). This Chinese version will be presented by default to any browser running on a machine in China.
User Support.
The Forrest Hill support office is being moved from A5 to C13 during the course of today. The move is on a trial basis, but if all goes as we expect the office will remain in C13 for the rest of the time that we occupy Forrest Hill.
Ken passed on Alison's report on FC5 upgrades: all the staff/PhD machines in Appleton Tower and Forrest Hill that could be upgraded have been upgraded. There are still 2 to upgrade and 1 to replace in Appleton Tower, and 5 to upgrade in Forrest Hill. There have been 137 staff/PhD machine upgrades in Buccleuch Place, leaving approximately 10 more to do, and 122 staff/PhD machine upgrades in JCMB. Alison will firm up the number of machines outstanding at the end of this month.
Ken has been doing further work on the Development Meeting software. Changes to the proposed dates of deadlines and milestones are now logged. It is also now possible to commit major edits to the project table entries in such a way that the previous values of the changed fields are logged. The logged information is reported on the project's history page. In connection with this project, Ken mentioned the problems he was having getting KX509 authentication to work with the Ruby on Rails framework under Apache 1.3 on FC5; this has been reported to Neil. Authentication to the web pages seems to work for a short while after it has worked correctly for a more conventional CGI script in cgi-bin. Stephen suggested that this sounded as if authentication was working for one child process, but failing again once that process died and another child process was spawned.
Roger reported that the Computer Systems Biology group have now moved from level 6 of Appleton Tower to the Darwin Building and appear to be reasonably happy. He has now finished his introductory talks to students on Unix and the DICE computing environment and has updated his notes for these talks.
Morna has been updating the Support web pages and has started work on the proposal for the Publishing and Discussion Media Survey Paper project.
She reminded the Services Unit that the outstanding provision of the print$ share (on which printer drivers would be loaded for use by MDP machines) would shortly be blocking the new style of MDP machine installation that the User Support Unit wished to use (the alternative, setting up printing manually on each MDP machine installed, would take about an extra half hour's work per machine). Neil said that the work would be done in the next week or so, and Ken agreed that this was soon enough not to block installations.
Neil mentioned that some ITO staff were using a form of ssh for normal shells on remote machines which appeared to have very poor line-editing capabilities. Tim pointed out that this was a special shell that should be used only to start the School database interface, and that for all other purposes they should be using winssh.
AOCB
There was none.
Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh