We need to reboot both of the JCMB switches, to reset their counters (they wrap after about a year and four months of uptime) and to upgrade the firmware. Two questions:
mh0 interfaces to be the bonding primary (though you should now use LCFG_OPTIONS_ETHERBOND_PRIMARY rather than setting resources explicitly). Has everyone done this?
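The "year and four months" wrap time is what you would expect from a 32-bit counter of hundredths of a second, which is how SNMP's TimeTicks type (used for sysUpTime) is defined. We haven't confirmed which counter the switches actually wrap, so the following is only a back-of-envelope check under that assumption; `wrap_days` is a hypothetical helper name.

```python
# Assumption: the wrapping counter is an unsigned 32-bit count of
# hundredths of a second (SNMP TimeTicks, as used by sysUpTime).
TICKS_PER_SECOND = 100
SECONDS_PER_DAY = 24 * 60 * 60


def wrap_days(bits=32, hz=TICKS_PER_SECOND):
    """Days of uptime before a `bits`-wide counter ticking at `hz` wraps to zero."""
    return 2 ** bits / hz / SECONDS_PER_DAY


days = wrap_days()
print(f"wraps after {days:.1f} days, about {days / 365.25 * 12:.1f} months")
# wraps after 497.1 days, about 16.3 months
```

Roughly 16 months is a year and four months, which is why a reboot before then resets the counter cleanly.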
We escaped quite lightly from yesterday's EdLAN problems at EBVC and Roslin. The messages logged to our snmptrap logs were consistent with spanning-tree not working on our bridged-to-EdLAN VLANs, so that packets were looped around repeatedly. We had quite a few over-bandwidth reports and some OSPF oddities, but that seems to have been about it.
A reminder of one of the foibles of the AT server room alarm... It's not possible to set the alarm for the outer room ("server room 2") while one of the other zones is unset. There is a semaphore notice which people working in the other server rooms are supposed to hang on the controls, but they don't always do so. So, if you go in, the alarm doesn't beep and the lights are on, you should assume that someone is in there, and don't be surprised if the alarm won't set as you go out. (And in this case, please don't turn all the lights out!)
OTOH, if the alarm doesn't beep on your way in but the lights are out, then it's likely that there has been some procedural error along the way. If the alarm does set on your way out, things have probably sorted themselves out, but please let us know (email inf-unit) so that we can pass the word on. If the alarm doesn't set on your way out, then definitely let us know so that we can have the situation checked out. Thanks.
We queried some temperature fluctuations in the AT server room last week. Alan replies: "I asked the heating engineers to have a look last week because there was a big difference between the two temperature sensors in the room and I felt it was a bit warm (although the overall room temperature was fine). The air conditioning units were also showing low humidity alarms. They made a few manual adjustments on Tuesday or Wednesday, which reduced the temperature (maybe a bit too much, but it certainly felt better) and also resolved the low humidity alarms. They asked the controls guys to check the system, which they did yesterday, and that's why the temperature has gone back up. We are still monitoring it and I think they may re-site the temperature sensors in the room to provide a more accurate reading."
There was a temperature spike in the JCMB server room a couple of weeks or so ago. Alan says: "The problem was basically a human error. Someone reported room 1206 was too cold and the heating engineer inadvertently adjusted the wrong controls! Jim Cumming said they noticed their error quite quickly and it was rectified, although judging by the comments in this call I'm not sure about that, but I'm not going to argue the point."
All fans in the left-most of the three aircon units in the Forum server room have stopped spinning. E&B are investigating - see RT#61462 and the related EBIS ticket.
There are a couple of other problems with the Forum server room aircon (a failed and long-bypassed fan in ACU01; a failed and long-bypassed water inlet controller valve in ACU03) which we will try to chase with E&B.
Reminder: there are a couple of network ports in B.03, behind the door, hidden behind the dexion. These should be active (let us know if you find otherwise). The intention is that they should be useful for booting old machines to wipe their discs. They're currently on the DHCP subnet, but feel free to reconfigure them.
The draft "Data Protection and Interception Statement for Informatics Managed Systems" has been moved to the computing.help site. Comments are invited.
(RT #61425. See also the 2011-03-09 and 2011-03-23 inf-unit reports, and the NTP known issues page.)
In summary, the default clock source can be unreliable, giving a different calibration on each boot and sometimes being so far out that ntpd can't correct for it. The fix which was added last time was the ntp.clockPreferences resource, which takes a list of preferred clock sources. The first one on the list which matches what the kernel can provide is set as the current clock source; if nothing matches, the existing clock source is left in place.
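The selection rule for ntp.clockPreferences can be sketched as below. This is an illustrative sketch, not the LCFG ntp component's own code: `choose_clocksource` and `read_kernel_state` are hypothetical names, and the "available" lists in the usage lines are made-up examples. The sysfs paths are the standard Linux locations for the kernel's available and current clock sources.

```python
"""Sketch of the clock-source preference rule (hypothetical helper names)."""
from pathlib import Path

# Standard Linux sysfs directory for clock source information.
SYSFS = Path("/sys/devices/system/clocksource/clocksource0")


def choose_clocksource(preferences, available, current):
    """Return the first preferred source the kernel offers, else `current`."""
    for source in preferences:
        if source in available:
            return source
    # Nothing on the preference list matched: leave the existing source alone.
    return current


def read_kernel_state(sysfs=SYSFS):
    """Read (available, current) clock sources from sysfs (Linux only)."""
    available = (sysfs / "available_clocksource").read_text().split()
    current = (sysfs / "current_clocksource").read_text().strip()
    return available, current


if __name__ == "__main__":
    prefs = "hpet acpi_pm".split()
    # Illustrative availability lists, not taken from real machines.
    print(choose_clocksource(prefs, ["tsc", "hpet", "acpi_pm"], "tsc"))  # hpet
    print(choose_clocksource(prefs, ["kvm-clock", "tsc"], "kvm-clock"))  # kvm-clock
```

The second usage line shows the fallback: where the kernel offers none of the preferred sources (as may happen on a virtual machine), nothing is changed.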
The list "hpet acpi_pm" has been used on the inf-unit machines since.
This wasn't made the default for a couple of reasons:
There may be a case, though, to make "hpet acpi_pm" the default for real hardware servers, or perhaps just generally now (the question of time on virtual servers being still somewhat unclear). Comments are invited.
A couple of useful NTP-debugging commands:
ntpq -c peers [optional-hostname]
om [optional-hostname.]ntp status clocks
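When checking several machines, it can help to pull the selected peer and its offset out of the `ntpq -c peers` listing. In that listing a `*` in the first column marks the system peer, and the columns are remote, refid, st, t, when, poll, reach, delay, offset (in milliseconds) and jitter. The sketch below assumes that layout; `selected_peer` is a hypothetical helper and the sample output is invented for illustration.

```python
"""Extract the system peer and its offset from `ntpq -c peers` output (a sketch)."""


def selected_peer(output):
    """Return (remote, offset_ms) for the '*' line, or None if no peer is selected."""
    for line in output.splitlines():
        if line.startswith("*"):
            # Columns: remote refid st t when poll reach delay offset jitter
            fields = line[1:].split()
            return fields[0], float(fields[8])
    return None


# Invented sample output, using reserved example hostnames and addresses.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ntp0.example.or 192.0.2.10       2 u   40   64  377    0.410    0.250   0.031
*ntp1.example.or 192.0.2.11       2 u   12   64  377    0.389   -0.123   0.027
"""

print(selected_peer(SAMPLE))  # ('ntp1.example.or', -0.123)
```

A result of None means ntpd has not yet selected a system peer, which is worth following up with the commands above.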
Ever since Informatics was first formed, we have named our Linux routers after eminent conductors (starting with Wagnerians...). For those of you wondering who Karel Ancerl was, there's a fascinating article in last Sunday's Observer New Review. There are Wikipedia articles on all our choices too, of course.
As previously announced, JANET will begin charging for SSL certificates from 1st May 2013. We will be renewing all such certificates during April. The list can be found at the bottom of this page. Please let us know of any changes to this list.
Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh