Apologies for absence
Alastair, Archie, Paul, Stephen and Neil had sent their apologies.
Minutes of the last meeting.
These were accepted.
Report from Computing Executive Group.
This agenda item was not taken.
Reports from units.
Infrastructure.
George has made changes to the lcfg-iptables and dice-iptables components and their associated scripts for FC3 and FC5, so that generating rules and inserting them into the kernel is now considerably faster (there are now 243 hosts generating 2838 holes in the firewall).
Generation has been speeded up by optimising the way the subsidiary scripts load the component resources, knocking a minute or more off the time taken in the worst cases. This improvement is available on both FC3 and FC5 machines.
Loading has been speeded up by generating a save file from the script and dumping that into the kernel in one operation, rather than inserting individual rules one at a time (all 3000+ of them). This is turned on by default for FC5 but not for FC3, and knocks about 30 seconds off the loading time.
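The minutes don't describe the generated rules themselves, so the following is only a minimal Python sketch of the batch-loading idea, assuming a simple model in which each hole accepts TCP traffic from one source network to one local port. It builds a single file in the `iptables-save` format (a `*filter` section ending in `COMMIT`) and hands the whole file to `iptables-restore` in one call, instead of running `iptables -A` once per rule. `Hole`, `build_restore_file`, `load_rules` and the example addresses are hypothetical and are not part of lcfg-iptables or dice-iptables.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    """Hypothetical firewall hole: accept TCP from `source` to local `port`."""
    source: str  # CIDR network, e.g. "192.0.2.10/32"
    port: int


def build_restore_file(holes, policy="DROP"):
    """Render all holes as one ruleset in iptables-save/iptables-restore format.

    A real ruleset would also need loopback and established-connection rules
    before a DROP policy is safe; they are omitted to keep the sketch short.
    """
    lines = ["*filter", f":INPUT {policy} [0:0]"]
    for hole in holes:
        lines.append(f"-A INPUT -s {hole.source} -p tcp --dport {hole.port} -j ACCEPT")
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"


def load_rules(ruleset):
    """Load the whole ruleset with a single iptables-restore process (needs root)."""
    subprocess.run(["iptables-restore"], input=ruleset, text=True, check=True)
```

One process reading one file replaces thousands of separate `iptables` invocations, which is where the time saving comes from. Note that without `--noflush`, `iptables-restore` replaces the table's existing rules rather than appending to them.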
He has also published the policy for adding firewall holes (see Notes for Users Regarding Firewall Holes and Notes for COs Regarding Firewall Holes).
LDAP changes to support AFS home directories have gradually been making their way into the stable release. The latest version of slapd (Stand-alone LDAP Daemon) is now available on development release machines.
There is now an FC5 server running as a Kerberos KDC (Key Distribution Center).
Craig was recently called out at 1am by security because the temperature in the Appleton Tower server room had risen 1°C above the alarm threshold.
No progress has so far been made with the lack of console ports in the Appleton Tower server room. The rationalisation of console ports in the Buccleuch Place server room is still to be done.
The UPSs in the Buccleuch Place server room will be replaced by larger ones this afternoon and the displaced UPSs will be put into store in JCMB until they can be used in the Appleton Tower.
Managed Platform.
A new version of the LCFG server code has been produced by Paul and will probably be installed on our main LCFG servers by Chris later this afternoon. One of the new features is that machines with broken profiles will remain in spanning maps for about half an hour before dropping out. (It later emerged that Chris was mistaken in saying that they would drop out of the spanning maps: they won't until the LCFG server is restarted.)
Stephen has been working on PXE (Pre-Boot Execution Environment) booting support for certain newer hardware (Dell's PowerEdge PE1850 and 1950, and Optiplex GX745). He has re-implemented the lcfg-pxelinux component, which should make it easier to add support for new hardware in future. The new component will be an FC5 option and will appear for development release machines later this week. Support for the currently unsupported machine models should appear shortly thereafter.
The majority of broken profiles now belong to old RedHat 9 machines. If such a machine is still being used, its profile should include <dice/os/selfmanaged.h>; otherwise it should include either <dice/os/unallocated.h> or <dice/os/junk.h>.
Research and Teaching.
Tim and Archie have upgraded the second BioInformatics DIY DICE machine from FC3 to FC5. During the installation they hit a problem because recent changes had broken DIY DICE installations; however, this was quickly fixed. One server remains to be upgraded, but this is currently blocked pending fibre channel support in FC5. Chris thought that this support was now available.
Tim reported that a lot of problems had been encountered when upgrading the exam laptops to FC5:
Iain has encountered a lot of problems with GridEngine. Many of the Beowulf nodes had become unusable because jobs with excessive memory requirements were being submitted too quickly. He has used ulimit to limit the resources each job can use, so that a single job can no longer swamp a node. Users can now also state how much memory a job needs, and GridEngine takes this into account when assigning the job to a node.
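GridEngine's own scheduler is far more elaborate than this, and the minutes don't say which resource name the local configuration uses for memory requests (on the submission side this is typically a `qsub -l` option such as `h_vmem`, but that is an assumption here). As a minimal illustration of why a per-job memory request stops nodes being oversubscribed, here is a hypothetical first-fit placement sketch in Python. `Node`, `Job`, `place_jobs` and the memory figures are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical compute node with a known amount of free memory (MB)."""
    name: str
    free_mem_mb: int
    jobs: list = field(default_factory=list)


@dataclass
class Job:
    """Hypothetical job carrying the user's memory request (MB)."""
    name: str
    mem_request_mb: int


def place_jobs(jobs, nodes):
    """First-fit placement honouring memory requests.

    Each job goes to the first node whose remaining free memory covers its
    request, and that memory is then reserved. Jobs that fit nowhere stay
    queued (mapped to None) instead of being put on an already-full node.
    """
    placement = {}
    for job in jobs:
        target = next((n for n in nodes if n.free_mem_mb >= job.mem_request_mb), None)
        if target is None:
            placement[job.name] = None
            continue
        target.free_mem_mb -= job.mem_request_mb
        target.jobs.append(job.name)
        placement[job.name] = target.name
    return placement


nodes = [Node("node01", 2048), Node("node02", 4096)]
jobs = [Job("a", 1500), Job("b", 1500), Job("c", 3000), Job("d", 2000)]
print(place_jobs(jobs, nodes))
# {'a': 'node01', 'b': 'node02', 'c': None, 'd': 'node02'}
```

Job `c` waits rather than being sent to a node that cannot hold it. The ulimit setting described above is the complementary safeguard: it caps what a job can actually use if it exceeds its request.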
Iain will be submitting a couple of project proposals to improve cluster scheduling under GridEngine.
New Condor LCFG header files will shortly be available for stable release machines. The headers correspond to the two new Condor pools: the KB pool and the central area pool.
A new mailcap file will shortly be in the stable release.
Services.
In the Buccleuch Place ATABeast, Craig and Bill have upgraded the firmware, reseated the controller and replaced the disk showing the most errors.
The podcasts and Chinese web page for postgraduate applicants have been set up as requested.
As mentioned earlier, local printing on FC5 is currently broken, and this will probably also affect the print servers when they are upgraded to FC5.
Craig mentioned that some of us may have noticed an increase in the amount of spam arriving in our mailboxes. This is because SpamAssassin, as run on the EUCS mail server, is failing to detect this mail as spam and is giving it a score of 0. EUCS are aware of the problem and are working on a solution.
User Support.
Ken reported on the number of machines that had been upgraded to or installed with FC5. Alison had provided a more detailed report, which is included below:
| Site | Upgraded to FC5 (Lab) | Upgraded to FC5 (Other) | Total at FC5 | At FC3 |
|---|---|---|---|---|
| AT | 117 | 76 | 193 | 5 |
| BP | - | 172 | 172 | 2 |
| FH | 31 | 67 | 98 | 5 |
| KB | 130 | 136 | 266 | 35 |
| All | 278 | 451 | 729 | 47 |
The machines still at FC3 include some CO/CSO spare machines and machines whose users wanted to stall, so, as far as Alison is aware, we met our objective of upgrading all machines whose users wanted them upgraded. Three machines at FH (WS450s) are not included in the totals above because they have slightly unusual configurations and are not standard desktops; Alison has scheduled 2 of them for next week.
Ken reported that since the last Operational Meeting the User Support Unit had handled almost 400 RT tickets (about 26 per working day) and had resolved 65% of them.
Two appointments had been made to the posts vacated by Sarah and Charlie. Jennifer Oxley joined us this Monday and Ewan Grant would be starting in about a fortnight.
Roger has been working on the archiving scripts and on finishing the upgrade of 3 Dell PowerEdge PE1850 servers for the Text Mining research group. He has also spent some time writing scripts to generate AFS home directories for new users.
Morna reported that, following the Services Unit's work on providing the print$ share on the samba server, she had had some success in setting up printer drivers so that users of MDP machines will be able to point and print to those printers, rather than having printing set up manually by support staff. A considerable number of printers still need to be set up on the print$ share.
Morna, Alison and Roger had been involved in setting up three non-networked PCs for the ITO at very short notice. Ken will try to find out more details about why this was necessary and why more notice had not been given.
Procedure for handling internal access requests to files of former staff/students.
Morna reported that there had recently been requests from some staff for access to files belonging to former staff/students. They seemed to be unaware that permission must be obtained from the owner or, failing that, from the Head of School before files can be retrieved in these circumstances. Ken will raise this at the next Service Managers' meeting.
Timing of scheduled down-time for services.
There was some discussion of this, and Craig mentioned that there was as yet no consistent policy on when planned outages should take place. Historically, things had been handled very differently at, say, Buccleuch Place and JCMB.
Ken mentioned that Alastair was taking a proposed policy on this issue to the first Computing Strategy Group meeting.
It was accepted that there was now no time during the week or at the weekend when services could be withdrawn without disrupting somebody, so the best we could aim for was to minimise disruption for the majority of users. This would point towards avoiding 09:00 to 17:00 on weekdays. Working at weekends and in the evenings had disadvantages: for example, in the case of unforeseen hardware problems it would be more difficult to contact hardware manufacturers. There seemed to be some consensus for scheduling most service interruptions in the hour or so before 09:00 on a weekday.
AOCB
There was none
Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh