The Dice Project


Overview

The Infrastructure Unit operates the following general services across three sites: Informatics Forum, Appleton Tower, JCMB. Each site is set up so that it can operate as autonomously as possible, while at the same time providing redundant services to the other sites.

For the decant out of Appleton Tower we are operating the Forrest Hill and Wilkie buildings as virtual floors of AT, to minimise the amount of setup and reinstallation work required. These buildings have edge switches only, with core infrastructure being provided from Appleton Tower. There are separate pages showing the network arrangements for Forrest Hill and Wilkie.

Network infrastructure and services

Each site's ether switch configuration has been tailored to the particular circumstances. At the time of writing (September 2015) we have 170 network switches in the Forum, 7 switches in Appleton Tower, 20 switches in Forrest Hill, 14 switches in Wilkie Building, and 3 switches in JCMB.

JCMB and the Forum also each have a pair of FibreChannel fabrics. The Appleton Tower fabric has been discontinued, with the switches now being used as spares.

NOTE that 5600 FC switches are now unobtainable, and as a result of various failures we now have only one remaining FC "spare" switch. This "spare" 5600 also has fewer ports available than does the remaining KB 5800, which isn't currently a problem but does need to be kept in mind.

The Forum and Appleton Tower each have four network infrastructure machines, as follows:

At JCMB we have only three machines: one combining the first three roles above, a second (VM) acting as external nameserver, and a third acting primarily as console server but set up so as to be able to take on the network roles if required.

Consoles

Each site has one "console server" machine, acting as a central point for all of the site's IPMI, KVM and Lantronix-serial consoles. In addition, these machines act as console servers for each other, and in a few cases we have an off-site console server set up for critical machines. There's also a console server on a VM for the Forum self-managed server room consoles, primarily to make access control easier.

Monitoring

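A nagios monitoring service is integrated with our machine configuration system. We have two nagios machines, listed in the kit lists below as klaxon (Nagios master, Forum server room) and capon (Nagios secondary, AT).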

The primary is a real machine, to minimise dependencies, and the secondary a VM.

Authentication services

We use kerberos for authentication. There is one master KDC, in the Forum, and one slave KDC at each site. The iFriend master KDC is in the Forum, with a slave in Appleton Tower, and an additional slave at KB (this one is non-operational and exists purely to provide a backup away from the central area).

Although we don't make a lot of use of kx509 at the moment, we still run a kx509 service. There are currently two KCAs, both on VMs.

Most web authentication now uses cosign. We currently have cosign servers (physical machines) in the Forum and Appleton Tower. These also co-host the iFriend KDCs. These services are not suitable for co-locating with the main KDCs for security reasons.

Hosts and services requiring a locally-signed X.509 certificate obtain this using the sixkts service. As this is not a high-availability requirement, we currently have one sixkts server on a VM.

Directory services

The OpenLDAP master is currently in the Forum, and there are site slaves in the Forum, Appleton Tower and JCMB. There are also four VM lightweight slave servers. These seven slaves provide all client/server traditional LDAP provision via sssd on SL7. In addition, all SL6 DICE machines also currently run a full slave configuration.
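The count of seven slaves can be cross-checked against the kit lists later in this document. The following is a minimal Python sketch, not part of the unit's tooling: the host names are copied from the kit lists, while the `LDAP_SLAVES` structure, the `slaves_per_site` helper and the reading of "IF" as the Forum and "KB" as JCMB are assumptions made for illustration.

```python
from collections import Counter

# (host, site, kind) for each LDAP slave named in the kit lists.
# Assumption: "IF" in the VM list means the Forum and "KB" means JCMB.
LDAP_SLAVES = [
    ("mckinley", "Forum", "site"),
    ("blackwell", "AT", "site"),
    ("hall", "JCMB", "site"),
    ("damflask", "Forum", "lightweight"),
    ("redmires", "Forum", "lightweight"),
    ("schneider", "AT", "lightweight"),
    ("hutter", "JCMB", "lightweight"),
]


def slaves_per_site(slaves):
    """Count how many LDAP slaves are listed for each site."""
    return Counter(site for _host, site, _kind in slaves)


if __name__ == "__main__":
    print(len(LDAP_SLAVES), dict(slaves_per_site(LDAP_SLAVES)))
```

On this reading every site has at least one site slave and one lightweight slave, which fits the stated aim of letting each site operate as autonomously as possible.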

Account management (prometheus)

The prometheus system runs on a VM in the Forum. This would not be an easy service to replicate, but immediate availability is not a requirement and we could move the service to another machine should the main one be unavailable. This could be either the development server or the JCMB OpenLDAP server (which is kept up to date nightly with prometheus data) or indeed another VM, as decided at the time.

Infrastructure Unit Kit Lists

Linux servers

Real machines, sorted by date and then hostname:
| Name | Type | Location | Role | S/N | P/O & date | Warranty | Replace | UPS (if non-building) | Comments |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (GPS receiver) | ACUTIME 2000 | Forum roof | Timestamps for NTP S1 | 82175548 | a628299 2005-01-31 | ?? | as and when | | (Included here for completeness) |
| crystal | GX745 | Forum 5A closet | NTP S1, DHCP (B) | 428903J | ikb0153 2007-06-22 | 3Y | as and when | | "Real" serial port required |
| blackwell | R610 | AT server room | LDAP site-slave | 9GD2D4J | inf0488 2009-06-30 | 3Y | 2014-15 | | Pending replacement |
| darwin | R200 | AT server room | extDNS, extNTP | H660D4J | inf0488 2009-06-30 | 3Y | ?? | | Formerly "ancerl" |
| linnaeus | R200 | Forum server room | extDNS, extNTP | C560D4J | inf0488 2009-06-30 | 3Y | ?? | | Formerly "hickox" |
| mckinley | R610 | Forum server room | LDAP site-slave | BGD2D4J | inf0488 2009-06-30 | 3Y | 2014-15 | | Pending replacement |
| reeves | DL180 | Forum server room | LDAP master | CZ30301P1K | inf0953 2010-07-09 | 4Y | 2014-15 | | Pending replacement |
| abbado | DL180 | Forum server room | Forum netInf, DHCP | CZ3115CJ62 | inf1175 2011-03-31 | 5Y | 2015-16 | 3kVA JS0511022795; 3kVA XL QS0348111013 | Pending replacement by dutoit |
| hall | DL180 | JCMB server room | LDAP site-slave/prometheus DR | CZ3121H23F | uoe26808 2011-05-18 | 5Y | 2015-16 | | Pending replacement |
| tycho | DL180 | Forum server room | loghost | CZ3121H23H | uoe26808 2011-05-18 | 5Y | 2015-16 | 750VA AS0444223639 | Pending replacement by copernicus |
| WARRANTY EXPIRED ABOVE HERE | | | | | | | | | |
| slatkin | R310 | JCMB server room | KB netServ, consoles, ifriend KDC slave (unused) | 4WL3C5J | inf1642 2012-05-10 | 5Y | 2017-18 | 3kVA JS0510018437 | |
| elder | R320 | JCMB server room | KB extRt, netInf, DHCP | BB3YC5J | inf1748 2012-06-18 | 5Y | 2017-18? | 3kVA JS0511022966 | |
| norrington | R320 | AT server room | AT netServ, site DNS, OpenVPN, DHCP (static) | CB3YC5J | inf1748 2012-06-18 | 5Y | 2017-18? | 3kVA JS0510018447; 3kVA JS0714011688 | |
| bevan | R320 | Forum server room | KDC master | B15CD5J | inf1771 2012-06-22 | 5Y | 2017-18? | | |
| Machines below are outwith the current three-year kit planning period (2015-18) | | | | | | | | | |
| blatiere | R320 | Forum server room | Forum consoles master, DHCP | J9MM9X1 | inf2541 2013-03-28 | 5Y | 2018-19 | | |
| hanlon | R210 | Forum server room | cosign, iFriend KDC master | 6LQM9X1 | inf2539 2013-03-28 | 5Y | 2018-19 | | |
| mcintyre | R210 | AT server room | cosign, iFriend KDC slave | 31JFWX1 | inf2539 2013-03-28 | 5Y | 2018-19 | | |
| rattle | R320 | Forum server room | Forum netServ, site DNS, OpenVPN, DHCP (static) | G1KL9X1 | inf2540 2013-03-28 | 5Y | 2018-19 | 3kVA XL QS0348111013; 3kVA JS0511022795 | |
| gatti | R320 | AT server room | AT netInf, DHCP | BKZKT02 | inf3726 2014-03-18 | 5Y | 2019-20 | 3kVA JS0510018447; 3kVA JS0714011688 | |
| grepon | R320 | AT server room | AT consoles | 4KZKT02 | inf3726 2014-03-18 | 5Y | 2019-20 | | |
| knussen | R320 | Forum server room | Forum extRt | 9CKKT02 | inf3719 2014-03-18 | 5Y | 2019-20 | 3kVA JS0511022795; 3kVA JS0510014805 | |
| curtis | R220 | JCMB server room | JCMB KDC | C4B3952 | inf5359 2015-04-29 | 5Y | 2020-21 | | Replaces hati |
| heaton | R220 | Forum server room | Forum KDC | 35B3952 | inf5359 2015-04-29 | 5Y | 2020-21 | | Replaces fenrir |
| moffat | R220 | AT server room | AT KDC | H2B3952 | inf5359 2015-04-29 | 5Y | 2020-21 | | Replaces skoll |
| runnicles | R320 | AT server room | AT extRt | 9HJ8952 | inf5382 2015-05-07 | 5Y | 2020-21 | 3kVA JS0510018447; 3kVA JS0714011688 | |
| campbell | R320 | AT server room | LDAP site-slave to be | D4J8952 | INF5380 2015-05-07 | 5Y | 2020-21 | | Will replace blackwell |
| nelson | R320 | Forum server room | LDAP site-slave to be | J2J8952 | INF5380 2015-05-07 | 5Y | 2020-21 | | Will replace mckinley |
| polly | R320 | Forum server room | LDAP master to be | 34J8952 | INF5380 2015-05-07 | 5Y | 2020-21 | | Will replace reeves |
| copernicus | R330 | AT server room | loghost to be | 4RTZGD2 | inf6533 2016-06-13 | 5Y | 2020-21? | 3kVA JS0510018447; 3kVA JS0714011688 | Replaces tycho |
| dutoit | R330 | Forum server room | Forum netInf to be | 4SQYGD2 | inf6532 2016-06-13 | 5Y | 2020-21? | 3kVA JS0510018447; 3kVA JS0714011688 | Replaces abbado |
| klaxon | R330 | Forum server room | Nagios master | DS6RGD2 | inf6531 2016-06-13 | 5Y | 2020-21? | | Replaces cockerel |
| XXX | R330 | JCMB server room | LDAP site-slave/prometheus DR to be | DS5SGD2 | inf6531 2016-06-13 | 5Y | 2020-21? | | Replaces hall |
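
The position of the "WARRANTY EXPIRED ABOVE HERE" marker can be cross-checked from the P/O date and warranty columns. The sketch below is illustrative only: it assumes the warranty runs from the P/O date, uses the page revision date (2016-07-07) as the reference date, and includes just a few rows from the table. The names `MACHINES`, `warranty_end` and `expired` are hypothetical.

```python
from datetime import date

# (name, P/O date, warranty in years), copied from a few rows of the table.
MACHINES = [
    ("tycho", date(2011, 5, 18), 5),
    ("slatkin", date(2012, 5, 10), 5),
    ("bevan", date(2012, 6, 22), 5),
    ("klaxon", date(2016, 6, 13), 5),
]


def warranty_end(purchased: date, years: int) -> date:
    """Date on which the warranty ends, assuming it starts on the P/O date."""
    try:
        return purchased.replace(year=purchased.year + years)
    except ValueError:
        # 29 February purchase with a non-leap target year: use 28 February.
        return purchased.replace(year=purchased.year + years, day=28)


def expired(as_of: date) -> list[str]:
    """Names of the sample machines whose warranty ended before `as_of`."""
    return [
        name
        for name, purchased, years in MACHINES
        if warranty_end(purchased, years) < as_of
    ]


if __name__ == "__main__":
    print(expired(date(2016, 7, 7)))
```

Under these assumptions tycho (the last row above the marker) is out of warranty on the reference date, while slatkin (the first row below it) is still covered.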

Virtual machines, sorted loosely by name and functionality:
| Name | Location | Role | Comments |
| --- | --- | --- | --- |
| buchanan | IF | kca, misc | |
| capon | AT | Nagios secondary | |
| dammers | KB | sixkts, kca | |
| descartes | KB | Test RADIUS server | |
| euclid | AT | Test RADIUS server | |
| huxley | IF | Test nameserver/timeserver | |
| peigne | IF | Forum self-managed consoles | |
| pythagoras | IF | Test RADIUS server | |
| vandellas | IF | Prometheus master | |
| wallace | KB | extDNS | |
| damflask | IF | LDAP lightweight slave | |
| hutter | KB | LDAP lightweight slave | |
| redmires | IF | LDAP lightweight slave | |
| schneider | AT | LDAP lightweight slave | |
| howden | IF | LDAP temporary autofs master | |
| ladybower | AT | LDAP temporary autofs slave | |
| langsett | KB | LDAP temporary autofs slave | |

Machines in the Forum and AT server rooms are covered by the inbuilt UPSes, and are shown with a blank in the UPS column unless they have some additional provision. Machines in the JCMB server room may be powered by one of the "rack" UPSes, and in this case are shown with a blank in the UPS column unless they have some additional provision.

FibreChannel Switches

NOTE: we are phasing out FC in favour of discs either in or directly attached to servers. The table below may be a little out of date.

(Sorted by date and then hostname. All warranties have now expired.)
| Name | Type | Location | Role | S/N | P/O & date | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| fc00 | SANbox 5600 | IF-B.02 | | 0835C00819 | ikb0626 2008-09-05 | |
| fc01 | SANbox 5600 | IF-B.02 | | 0834C00021 | ikb0626 2008-09-05 | |
| fc0a | SANbox 5600 | AT server room | | 0834C00108 | ikb0626 2008-09-05 | Taken out of service as a possible spare |
| fc10 | SANbox 5600 | IF-B.02 | | 0834C00018 | ikb0626 2008-09-05 | |
| fc11 | SANbox 5600 | IF-B.02 | | 0834C00113 | ikb0626 2008-09-05 | |
| fc1a | SANbox 5600 | AT server room | | 0834C00026 | ikb0626 2008-09-05 | Taken out of service and in use as a spare at KB |
| fc0 | SANbox 5800 | JCMB | | 1005F00659 | inf0778 2010-03-09 | PSU fault |
| fc1 | SANbox 5800 | JCMB | | 1005F00525 | inf0778 2010-03-09 | |

UPSes

(Sorted by location and role, more or less...)
| Name | Type | Role | Location | S/N | P/O & date | Rating | Battery | Comments |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| atb0.ups.at.net | Smart-UPS 3000 RM | AT core network | comms cabinet | JS0510018447 | a631778 2005-07-11 | 3kVA | 2013-02 | Formerly AT5; battery moved from AT8 (RT#63576) |
| atb1.ups.at.net | Smart-UPS 3000 RM | AT core network | comms cabinet | JS0714011688 | ikb0141 2007-06-19 | 3kVA | 2013-09 | Formerly AT8 |
| core.ups.f.net | SMART-UPS 3000 RM XL | Forum core0 | comms racks | QS0348111013 | a614789 2004-02-04 | 3kVA | 2013-01 | |
| netInf.ups.f.net | Smart-UPS 3000 RM | Forum core1 | comms racks | JS0511022795 | a631778 2005-07-11 | 3kVA | 2012-05 | |
| netServ.ups.f.net | Smart-UPS 3000 RM | Forum core2 | comms racks | JS0510014805 | a631753 2005-06-15 | 3kVA | 2016-03 | |
| (USB) | Smart-UPS 750 RM | Forum loghost | comms racks | AS0444223639 | a627391 2004-12-02 | 750VA | | |
| (USB) | Smart-UPS 1500 | Test & development | IF-2.36 | YS0315121217 | a609907 2003-08-14 | 1500VA | 2013-03 | |
| rack0.ups.kb.net | Smart-UPS 3000 RM | JCMB server room | Rack 0 | JS0617023553 | ?? | 3kVA | | |
| rack1.ups.kb.net | Smart-UPS 3000 RM | JCMB server room | Rack 1 | JS0511022966 | a631778 2005-07-11 | 3kVA | | |
| rack2.ups.kb.net | Smart-UPS 3000 RM | JCMB server room | Rack 2 | JS0510018437 | a631778 2005-07-11 | 3kVA | | |
| rack3.ups.kb.net | Smart-UPS 3000 RM | JCMB server room | Rack 3 | JS0617023554 | a637388 2006-06-27 | 3kVA | 2014-10 | Formerly AT4 |
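
The UPS column of the Linux servers table identifies each UPS by serial number, while this table also gives the UPS names. The following minimal Python sketch joins the two to show which machines depend on which UPS. It uses only a subset of rows from both tables, and the names `UPS_BY_SERIAL`, `MACHINE_UPS` and `machines_by_ups` are hypothetical, not part of any existing tooling.

```python
from collections import defaultdict

# UPS serial number -> UPS name, from the UPSes table (subset).
UPS_BY_SERIAL = {
    "JS0510018447": "atb0.ups.at.net",
    "JS0714011688": "atb1.ups.at.net",
    "QS0348111013": "core.ups.f.net",
    "JS0511022795": "netInf.ups.f.net",
    "JS0510014805": "netServ.ups.f.net",
}

# Machine -> UPS serial numbers, from the Linux servers table (subset).
MACHINE_UPS = {
    "norrington": ["JS0510018447", "JS0714011688"],
    "gatti": ["JS0510018447", "JS0714011688"],
    "runnicles": ["JS0510018447", "JS0714011688"],
    "rattle": ["QS0348111013", "JS0511022795"],
    "knussen": ["JS0511022795", "JS0510014805"],
}


def machines_by_ups(machine_ups, ups_by_serial):
    """Group machine names (alphabetically) under the name of each UPS feeding them."""
    grouped = defaultdict(list)
    for machine in sorted(machine_ups):
        for serial in machine_ups[machine]:
            grouped[ups_by_serial[serial]].append(machine)
    return dict(grouped)


if __name__ == "__main__":
    for ups, machines in sorted(machines_by_ups(MACHINE_UPS, UPS_BY_SERIAL).items()):
        print(ups, machines)
```

A grouping like this makes it easier to see which services would be affected by taking a given UPS out of service, for example for a battery change.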

IUkit.html,v 1.358 2016/07/07 10:36:10 toby Exp


Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh