Apologies for absence
Alastair, Archie, Craig, George and Iain had sent their apologies.
Minutes of the last meeting.
Neil requested a change to the minute of the Services Unit report; in the comment about thassos it should have said that it was labouring rather than failing.
Reports from units.
Infrastructure.
Toby reported that the changes to the openldap component and the related nss_ldap changes, both mentioned at the last meeting, have now been rolled out. The ldap schema changes for handling AFS home directories and the monitoring system under development will follow shortly.
The infrastructure secondary router at Forrest Hill, haitink, has a possible disk problem. Because of this it has been configured not to act as a dns server. It is still a dhcp server, but this role could be moved to another machine in the event of a disk failure.
The backup air-conditioning unit in the Buccleuch Place server room is beginning to ice up again; the works department have been informed. The heat sensor in the machine room at Buccleuch Place has been moved to one of the hotter locations within the room.
CereProc Ltd, a spin-off company under the auspices of ettc, has been given space in the basement floor of 6 Buccleuch Place. They have been given a network connection to JANET. Later today Julieta will configure it so that the network is outside of inf.ed.ac.uk. The staff of CereProc will be responsible for their own internal networking.
Managed Platform.
Alastair has added the hwaddr_<tag> resource to the network component so that one can associate a MAC address with a network interface on multi-homed machines under FC5 (see Alastair's note on this).
In contrast, under FC5 the fibre channel storage target has a more predictable device name; this is still to be documented locally.
All packages that had been held back because of the openldap/updaterpms interaction mentioned some weeks ago have now been distributed to machines following the fix to the openldap problem.
Stephen has been making changes to the logrotate related resources of various components in order to compress old log files.
Stephen has shipped a new SMP kernel that will be used on all Dell GX620 machines.
Stephen reminded everyone that LCFG support for managing RedHat 9 machines would disappear in about a week. Morna offered to circulate by email the reasons users had given her for continuing to use elodie (the rh9 multiuser host).
Research and Teaching.
Graham has implemented a kx509 authenticated web page which students can use to request a database account and has added a toggle to allow the authentication to be either via kerberos or via a separate password file. He is now documenting management procedures for the postgresql server.
John has been working on a new condor component and it will be rolled out very shortly. We may need to create a new quiet-student-lab header file to prevent condor from running on student lab machines in the designated quiet labs. The concern is that noise from Dell GX620 fans could disturb students when the labs are only partially occupied and condor is running on many machines.
John has also been setting up new video conferencing equipment for AIAI.
Iain has been working on a test cluster for Grid Engine to investigate different queueing/priority models.
Rosemary has completed a web page to allow staff to update the RAE 2008 data held in the Informatics database that refers to themselves. This is now undergoing acceptance testing by Alan Bundy.
Tim reported that three pieces of teaching software (Isabelle, polyml and smlnj) had been reported as broken under FC5. The latter two have now been fixed but there is still an outstanding problem with Isabelle despite an attempt to fix it. Until the licensing situation with the new version of Maple has been sorted out the old version of Maple (version 9.03) has temporarily been ported to FC5.
Archie has been working on the hardware for the System Design Project course and new web cams and the tftp server for the Intelligent Autonomous Robotics course.
Services.
At about 1am on Monday morning the ATABoy at KB locked up again (it had previously done this when Craig and Neil were swapping the chassis on August 5th). Nexsan have been sent details and their engineer is looking at the logs. This failure had significant repercussions elsewhere including the admin samba server, mirror machines, the rpm repository and the rpm caches.
Lindsey has told the Services Unit what disk space will be required for the influx of new first year undergraduate and postgraduate students. This should not be a problem to provide but they are being careful to not mix NFS and AFS file space on a single storage device.
Neil has been helping Diana to convert the pages on the School primary web site to use the new style being promoted by the University for colleges and schools. Neil has used cascading style sheets to implement this new look for the site.
Neil has recently been working with Scott Larnach, the postmaster at EUCS, to improve spam filtering. We have supplied Scott with lists of the valid addresses from our legacy mail domains and hourly updates to a list of valid addresses for our current domains. Any invalid addresses are now being bounced at the ed mail server rather than being forwarded on to us to bounce. Several other schools are also doing this and the overall effect is to reduce traffic on the university's internal networks and reduce load on the school mail servers. Neil will double check with Scott about bouncing of mail based on invalid internal originator addresses.
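The address-list check described above can be sketched roughly as follows. This is only an illustration and not the configuration used on the ed mail server: the function names, the one-address-per-line list format with `#` comments, and the case-insensitive comparison are all assumptions made for the example.

```python
def load_valid_addresses(lines):
    """Build a set of valid addresses from a list file (assumed: one per line)."""
    valid = set()
    for line in lines:
        entry = line.strip()
        # Skip blank lines and comment lines (an assumed convention).
        if entry and not entry.startswith("#"):
            valid.add(entry.lower())
    return valid


def split_recipients(recipients, valid):
    """Return (accepted, bounced) lists, comparing addresses case-insensitively."""
    accepted, bounced = [], []
    for recipient in recipients:
        if recipient.strip().lower() in valid:
            accepted.append(recipient)
        else:
            bounced.append(recipient)
    return accepted, bounced
```

With hourly updates, the set would simply be rebuilt from the latest list before each batch of mail is checked.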
There are now just 4 computing staff still using NFS home directories, Charlie, Sheila, Archie and George.
Header files to support the two principal versions of apache on FC5 will be prepared in the next 10 days.
Two new machines are to be installed as FC5 servers to replace staff.ssh and student.ssh.
Morna raised the issue of firefox lock files on AFS. It is apparently a known problem. Simon described it as a generic but complex problem to do with locks, dead processes, the kernel and AFS (there is, however, no consensus amongst the people responsible for the various areas as to whose problem it is). Iain is going to put a wrapper script around firefox to delete the lock file when firefox exits. The issue of multiple instances of firefox (and hence multiple .parentlocks) has to be addressed within this solution.
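A minimal sketch of the wrapper idea follows, for illustration only; it is not Iain's script. The function name, the `profile_dir` argument and the `other_instances_running` callback are hypothetical, and how to detect other running instances reliably is deliberately left open, since that is the unresolved part noted above.

```python
import os
import subprocess

# Lock file name as mentioned in the minutes; its location in the
# profile directory is an assumption of this sketch.
LOCK_NAME = ".parentlock"


def run_then_clear_lock(command, profile_dir, other_instances_running):
    """Run the browser command, then remove a leftover lock file.

    other_instances_running is a caller-supplied function returning True
    when another instance still uses the profile (hypothetical helper).
    """
    status = subprocess.call(command)
    lock_path = os.path.join(profile_dir, LOCK_NAME)
    # Only delete the lock when no other instance could still own it.
    if not other_instances_running() and os.path.lexists(lock_path):
        os.remove(lock_path)
    return status
```

Passing the instance check in as a function keeps the clean-up logic separate from whatever method is eventually chosen to detect other firefox processes.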
Neil asked the User Support Unit to request the creation of new mail accounts (passwd entries on the mail server) using RT tickets for the Services Unit queue rather than mailing the services-unit to avoid the possibility of requests being overlooked. Morna and Alison argued that it would make more sense to subcontract the task of creating these passwd file entries to Front Line Support staff. Ken will nominate a representative CO from User Support to agree with Neil a set of instructions covering this task so that it can be subcontracted in this way. Hopefully this will all go away when the mail server is upgraded to FC5 if it becomes possible to use ldap again instead of a local passwd file.
User Support.
Alison reported that there were now 491 FC5 machines under the management of the User Support Unit. There are still about 240 FC3 machines. It should be possible to meet the target of upgrading all the desktop machines of those staff/PhD students that are agreeable to the upgrade by the end of October. One of the student labs at KB holding 13 machines will only be upgraded in mid October because we learnt that it was still being used by some external MSc students who don't finish till early in October (Anna has agreed that this is OK).
Morna has installed two new Dell 1425 servers as site-specific multi-user machines. She asked what should be done about the other two site-specific multi-user machines, since there were no new machines to replace them with. Ken said that he will ask the other unit leaders what spare server machines they now have, so that we can identify two machines that could be reused as replacements for the older site-specific multi-user machines.
Alison has set up an email notification to the User Support Unit of tickets in the support queue that are unresolved and owned by nobody. She will do the equivalent for such tickets in the queues belonging to the other units.
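The selection behind such a notification might look something like the sketch below. It is not Alison's implementation: tickets are represented here as plain dictionaries with assumed `id`, `subject`, `owner` and `status` keys, and the set of states treated as closed is an assumption; RT's convention of the owner `Nobody` for unowned tickets is relied on.

```python
# States treated as "no longer needing attention" (assumed for this sketch).
CLOSED_STATES = {"resolved", "rejected", "deleted"}


def unowned_unresolved(tickets):
    """Return the tickets owned by Nobody that are not in a closed state."""
    return [
        t for t in tickets
        if t["owner"] == "Nobody" and t["status"] not in CLOSED_STATES
    ]


def format_notification(queue, tickets):
    """Build the plain-text body of a notification for one queue."""
    lines = [f"{len(tickets)} unowned, unresolved ticket(s) in queue '{queue}':"]
    lines += [f"  #{t['id']}: {t['subject']}" for t in tickets]
    return "\n".join(lines)
```

Running the same two functions once per queue would cover the other units' queues as well.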
Appleton Tower power shutdown
Various servers in the Appleton Tower basement will be shut down on Friday 15th September about 17:00 in anticipation of the disconnection of all power to the building the following morning at 08:00. The servers that will be shut down include the main kerberos KDC and the file servers roc and phoenix. Ken said that he had installed a cron job on all student lab and office machines in Appleton Tower to shut them down during the period 06:30 to 07:30 on the 16th.
By the end of this week, all units should mail the User Support Unit details of how the power outage will affect the facilities that they are responsible for. The User Support Unit will send a message to sys-announce later today warning people of the severe disruption that weekend, with further details to follow by the end of the week.
Profiles vs Headers under Release Management
Chris drew the meeting's attention to a guide on good practice in using Release Management on servers (see the DICE Releases FAQ). More specifically he noted that:
Toby asked whether there were any reference machines for the different releases. There are none but it would be worth introducing them.
AOCB
There was none.
Please contact us with any
comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh