The Dice Project


Overview

The Infrastructure Unit operates the following general services across three sites: Informatics Forum, Appleton Tower, JCMB. Each site is set up so that it can operate as autonomously as possible, while at the same time providing redundant services to the other sites.

For the decant out of Appleton Tower we will be operating the Forrest Hill and Wilkie buildings as virtual floors of AT, to minimise the amount of setup and reinstallation work required. There will be edge switches only, with core infrastructure being provided from Appleton Tower. There are separate pages showing the proposed network arrangements for Forrest Hill and Wilkie.

Network infrastructure and services

Each site's Ethernet switch configuration has been tailored to its particular circumstances. At the time of writing (September 2014) we have 164 network switches in the Forum, 43 switches in Appleton Tower, and 2 switches in JCMB. JCMB and the Forum also each have a pair of FibreChannel fabrics. The Appleton Tower fabric has been discontinued, with the switches now being used as spares.

NOTE that 5600 FC switches are now unobtainable, and as a result of various failures we now have only one remaining FC "spare" switch. This "spare" 5600 also has fewer ports available than does the remaining KB 5800, which isn't currently a problem but does need to be kept in mind.

The Forum and Appleton Tower each have four network infrastructure machines, as follows:

At JCMB we have only three machines: one combining the first three roles above, a second acting as external nameserver, and a third acting primarily as console server but set up so as to be able to take on the network roles if required.

Consoles

Each site has one "console server" machine, acting as a central point for all of the site's IPMI, KVM and Lantronix-serial consoles. In addition, the console servers act as console servers for each other, and in a few cases we have an off-site console server set up for critical machines. There's also a console server on a VM for the Forum self-managed server room consoles, primarily to make access control easier.

Monitoring

A nagios monitoring service is integrated with our machine configuration system. We have two nagios machines:

The primary is a real machine, to minimise dependencies, and the secondary a VM.

Chat

We operate a chat service, as this is also used by the nagios system to send alerts. The service runs on a VM.

This service will be transferred to the Services Unit in due course.

Authentication services

We use kerberos for authentication. There is one master KDC, in the Forum, and one slave KDC at each site. The iFriend master KDC is in the Forum, with a slave in Appleton Tower, and an additional slave at KB (this one is non-operational and purely to provide a backup away from the central area).
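
The main KDC layout described above can be written down as a small data structure and checked against the stated rule (exactly one master, and a slave at every site). This is only an illustrative Python sketch. The hostnames and locations come from the kit list further down this page, but the assignment of fenrir as the Forum slave is inferred from that list, and the `KDCS` structure and `check_kdc_layout` helper are hypothetical names introduced here, not part of any DICE tooling.

```python
# Hypothetical sketch: main-realm KDC layout transcribed from the kit list.
# The tuple layout and site labels are assumptions made for illustration.
KDCS = [
    ("bevan", "Forum", "master"),
    ("fenrir", "Forum", "slave"),  # inferred: listed only as "KDC, AFSDB"
    ("skoll", "AT", "slave"),
    ("hati", "JCMB", "slave"),
]
SITES = ("Forum", "AT", "JCMB")


def check_kdc_layout(kdcs, sites):
    """Return a list of problems; an empty list means the layout matches the text."""
    problems = []
    masters = [host for host, _site, role in kdcs if role == "master"]
    if len(masters) != 1:
        problems.append(f"expected exactly one master, found {len(masters)}")
    for site in sites:
        slaves = [h for h, s, r in kdcs if s == site and r == "slave"]
        if not slaves:
            problems.append(f"no slave KDC at {site}")
    return problems


print(check_kdc_layout(KDCS, SITES) or "layout OK")
```

Dropping the JCMB entry from `KDCS` makes the check report the missing site slave, which is the kind of gap the per-site arrangement is meant to avoid.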

Although we don't make a lot of use of kx509 at the moment, we still run a kx509 service. There are currently two KCAs, both on VMs.

Most web authentication now uses cosign. We currently have cosign servers (physical machines) in the Forum and Appleton Tower. These also co-host the iFriend KDCs. These services are not suitable for co-locating with the main KDCs for security reasons.

Hosts and services requiring a locally-signed X.509 certificate obtain this using the sixkts service. As this is not a high-availability requirement, we currently have one sixkts server on a VM.

Directory services

The OpenLDAP master is currently in the Forum, and there are site slaves in the Forum, Appleton Tower and JCMB. In addition, all DICE machines currently run a full slave configuration. Note that this is currently under review, and arrangements may change as a result.

Account management (prometheus)

The prometheus system runs on a VM in the Forum. This would not be an easy service to replicate, but immediate availability is not a requirement and we could move the service to another machine should the main one be unavailable. This could be either the development server or the JCMB OpenLDAP server, or indeed another VM, as decided at the time.

Infrastructure Unit Kit Lists

Linux servers

(Sorted by date and then hostname.)
Name Type Location Role S/N P/O & date Warranty Replace UPS (if non-building) Comments
(GPS receiver) ACUTIME 2000 Forum roof Timestamps for NTP S1 82175548 a628299 2005-01-31 ?? as and when   (Included here for completeness)
crystal GX745 Forum 5A closet NTP S1, DHCP (B) 428903J ikb0153 2007-06-22 3Y as and when   "Real" serial port required
blackwell R610 AT server room LDAP site-slave 9GD2D4J inf0488 2009-06-30 3Y 2014-15 [1]    
darwin R200 AT server room extDNS, extNTP H660D4J inf0488 2009-06-30 3Y ??   Formerly "ancerl"
fenrir R200 Forum server room KDC, AFSDB J660D4J inf0488 2009-06-30 3Y 2014-15    
linnaeus R200 Forum server room extDNS, extNTP C560D4J inf0488 2009-06-30 3Y ??   Formerly "hickox"
mckinley R610 Forum server room LDAP site-slave BGD2D4J inf0488 2009-06-30 3Y 2014-15 [1]    
hati R210 JCMB server room KDC, AFSDB GJ6TJ4J inf0622 2009-11-02 3Y 2014-15    
skoll R210 AT server room KDC, AFSDB FJ6TJ4J inf0622 2009-11-02 3Y 2014-15    
otaka DL180 AT server room AT netInf, DHCP CZ30291JCP inf0953 2010-07-09 4Y 2014-15 1kVA AS0614310419 Replace norrington with 10Gbps DA and roll down?
reeves DL180 Forum server room LDAP master CZ30301P1K inf0953 2010-07-09 4Y 2014-15? [1]    
WARRANTY EXPIRED ABOVE HERE
abbado DL180 Forum server room Forum netInf, DHCP CZ3115CJ62 inf1175 2011-03-31 5Y 2016-17 3kVA JS0511022795; 3kVA XL QS0348111013
cockerel DL180 Forum server room Nagios master CZ3121H23L uoe26808 2011-05-18 5Y 2016-17    
hall DL180 JCMB server room LDAP site-slave/prometheus DR CZ3121H23F uoe26808 2011-05-18 5Y 2016-17 [1]    
tycho DL180 Forum server room loghost CZ3121H23H uoe26808 2011-05-18 5Y 2016-17 750VA AS0444223639  
slatkin R310 JCMB server room KB netServ, consoles, ifriend KDC slave (unused) 4WL3C5J inf1642 2012-05-10 5Y 2017-18 3kVA JS0510018437  
elder R320 JCMB server room KB extRt, netInf, DHCP BB3YC5J inf1748 2012-06-18 5Y 2017-18 3kVA JS0511022966  
norrington R320 AT server room AT extRt CB3YC5J inf1748 2012-06-18 5Y 2017-18 1400VA XL QS0322110541; 1400VA XL QS0322210008 Roll down to netInf and add new 10Gbps DA machine as extRt?
bevan R320 Forum server room KDC master B15CD5J inf1771 2012-06-22 5Y 2017-18 [2]    
blatiere R320 Forum server room Forum consoles master, DHCP J9MM9X1 inf2541 2013-03-28 5Y 2018-19    
hanlon R210 Forum server room cosign, iFriend KDC master 6LQM9X1 inf2539 2013-03-28 5Y 2018-19    
mcintyre R210 AT server room cosign, iFriend KDC slave 31JFWX1 inf2539 2013-03-28 5Y 2018-19    
rattle R320 Forum server room Forum netServ, site DNS, OpenVPN, DHCP (static) G1KL9X1 inf2540 2013-03-28 5Y 2018-19 3kVA XL QS0348111013; 3kVA JS0511022795
gatti R320 AT server room AT netServ, site DNS, OpenVPN, DHCP (static) BKZKT02 inf3726 2014-03-18 5Y 2019-20 1400VA XL QS0322110541; 1400VA XL QS0322210008
grepon R320 AT server room AT consoles 4KZKT02 inf3726 2014-03-18 5Y 2019-20    
knussen R320 Forum server room Forum extRt 9CKKT02 inf3719 2014-03-18 5Y 2019-20 ??  
babbler VM IF jabber            
buchanan VM IF kca, misc            
capon VM AT Nagios secondary            
dammers VM KB sixkts, kca            
huxley VM IF Test nameserver/timeserver            
peigne VM IF Forum self-managed consoles            
vandellas VM IF Prometheus master            
wallace VM KB extDNS            

Machines in the Forum and AT server rooms are covered by the inbuilt UPSes, and are shown with a blank in the UPS column unless they have some additional provision. Machines in the JCMB server room may be powered by one of the "rack" UPSes, and in this case are shown with a blank in the UPS column unless they have some additional provision.

Notes:

  1. OpenLDAP arrangements are currently under review. While all these servers are due for replacement in some form, it's not yet clear what they should be replaced with.
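
The "WARRANTY EXPIRED ABOVE HERE" divider in the Linux servers table follows from the P/O date and Warranty columns. As a rough Python sketch only, the calculation could be expressed as below. The `warranty_expiry` helper is a hypothetical name, the reference date is this page's revision date (2014-10-13), and it is an assumption that each warranty runs for whole years from the P/O date.

```python
from datetime import date


def warranty_expiry(po_date: date, years: int) -> date:
    """Assumed rule: warranty ends `years` whole years after the P/O date."""
    try:
        return po_date.replace(year=po_date.year + years)
    except ValueError:  # 29 February with a non-leap target year
        return po_date.replace(year=po_date.year + years, day=28)


# A few rows transcribed from the table: (host, P/O date, warranty in years).
ROWS = [
    ("otaka", date(2010, 7, 9), 4),
    ("reeves", date(2010, 7, 9), 4),
    ("abbado", date(2011, 3, 31), 5),
    ("knussen", date(2014, 3, 18), 5),
]

AS_OF = date(2014, 10, 13)  # revision date of this page

for host, po, years in ROWS:
    expiry = warranty_expiry(po, years)
    status = "expired" if expiry < AS_OF else "in warranty"
    print(f"{host:10s} {expiry.isoformat()} {status}")
```

Under that assumption otaka and reeves fall on the expired side of the divider and abbado on the other, which matches where the divider sits in the table.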

FibreChannel Switches

(Sorted by date and then hostname. All warranties have now expired.)
Name Type Location Role S/N P/O & date Comments
fc00 SANbox 5600 IF-B.02 0835C00819 ikb0626 2008-09-05  
fc01 SANbox 5600 IF-B.02 0834C00021 ikb0626 2008-09-05  
fc0a SANbox 5600 AT server room 0834C00108 ikb0626 2008-09-05 Taken out of service as a possible spare
fc10 SANbox 5600 IF-B.02 0834C00018 ikb0626 2008-09-05  
fc11 SANbox 5600 IF-B.02 0834C00113 ikb0626 2008-09-05  
fc1a SANbox 5600 AT server room 0834C00026 ikb0626 2008-09-05 Taken out of service and in use as a spare at KB
fc0 SANbox 5800 JCMB   1005F00659 inf0778 2010-03-09 PSU fault
fc1 SANbox 5800 JCMB   1005F00525 inf0778 2010-03-09  

NOTE that 5600 switches are now unobtainable, and as a result of various failures we now have only one remaining FC "spare" switch.

UPSes

(Sorted by location and role, more or less...)
Name Type Role Location S/N P/O & date Rating Battery Comments
  SMART-UPS 1400 RM XL AT server room comms cabinet QS0322110541 a602939 2003-06-06 1400VA 2011-06  
  SMART-UPS 1400 RM XL AT server room comms cabinet QS0322210008 a602939 2003-06-06 1400VA 2011-06  
  Smart-UPS 1000 RM AT server room rack 2 AS0614310419 a637388 2006-06-27? 1kVA    
  Smart-UPS 3000 RM AT3 comms cabinet JS0510018446 a631778 2005-07-11 3kVA 2011-07  
  Smart-UPS 3000 RM AT4 comms cabinet JS0617023554 a637388 2006-06-27 3kVA 2014-10  
  Smart-UPS 3000 RM AT5 comms cabinet JS0510018447 a631778 2005-07-11 3kVA 2013-02 Battery moved from AT8 (RT#63576)
  Smart-UPS 3000 RM AT6 comms cabinet JS0511022967 a631778 2005-07-11 3kVA 2014-10  
  Smart-UPS 3000 RM AT7 comms cabinet JS0714011678 ikb0141 2007-06-19 3kVA    
  Smart-UPS 3000 RM AT8 comms cabinet JS0714011688 ikb0141 2007-06-19 3kVA 2013-09  
  SMART-UPS 3000 RM XL Forum core0 ("core") comms racks QS0348111013 a614789 2004-02-04 3kVA 2013-01  
  Smart-UPS 3000 RM Forum core1 ("netInf") comms racks JS0511022795 a631778 2005-07-11 3kVA 2012-05  
  Smart-UPS 3000 RM Forum core2 ("netServ") comms racks JS0617023553 a637388 2006-06-27 3kVA 2011-07?  
  Smart-UPS 750 RM Forum loghost comms racks AS0444223639 a627391 2004-12-02 750VA    
  Smart-UPS 1500 Test & development IF-1.09 YS0315121217 a609907 2003-08-14 1500VA 2013-03  
  Smart-UPS 3000 RM JCMB server room Rack 0 JS0511022966 a631778 2005-07-11 3kVA    
  Smart-UPS 3000 RM JCMB server room Rack 1 JS0510018437 a631778 2005-07-11 3kVA    
  SMART-UPS 5000 RM DL4 JCMB server room Rack 2 CS0543110262 ?? 5kVA    
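
The Battery column appears to record the month in which each UPS battery was last fitted. A minimal Python sketch of turning those entries into an age in months is given below. The `battery_age_months` helper is hypothetical, the sample entries are copied from the table, and the reference month (October 2014) is taken from this page's revision date. No replacement interval is assumed, since the page does not state one.

```python
def battery_age_months(fitted: str, as_of: str = "2014-10") -> int:
    """Whole months between two 'YYYY-MM' strings."""
    fy, fm = (int(part) for part in fitted.split("-"))
    ay, am = (int(part) for part in as_of.split("-"))
    return (ay - fy) * 12 + (am - fm)


# (serial number, battery date) for a few UPSes, as listed in the table.
BATTERIES = [
    ("QS0322110541", "2011-06"),
    ("JS0510018447", "2013-02"),
    ("JS0617023554", "2014-10"),
]

# Oldest battery first, so the likeliest replacement candidates lead the list.
for serial, fitted in sorted(BATTERIES, key=lambda entry: entry[1]):
    age = battery_age_months(fitted)
    print(f"{serial}: battery fitted {fitted}, {age} months old")
```

Entries with a blank or uncertain Battery value (for example "2011-07?") would need handling separately and are left out of this sketch.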

IUkit.html,v 1.308 2014/10/13 10:26:43 gdmr Exp


Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh