The new infrastructure for the DICE LCFG service is quite different to the single server model that used to be in place. The simplest way to communicate the new model is probably through a diagram; hence here is a high level representation of the new architecture:
There are various aspects of the new architecture that are worth elaborating upon, not least the fact that there is now more than one LCFG server. Each of the important aspects of the new architecture is discussed in the following sections:
The current RFE Master Server is rfehost.inf.ed.ac.uk. It serves various roles within the new architecture:
The primary role of the RFE Master Server, at least in terms of the new DICE LCFG infrastructure, is to host the LCFG source material. This data is stored on the RFE Master Server under the directory hierarchy:
The differences between the old and the new architectures only begin to become apparent once the RFE edits are complete. While in the old single server model the LCFG server would very quickly pick up the changes, in the new model this takes slightly longer: the transfer of the updated LCFG source data from the RFE Master Server to the LCFG Slave Servers introduces an additional latency of a few seconds.
TODO: More here?
Perhaps the most subtle difference between the old and new models is the way the LCFG component defaults files are managed. These are the *.def files contained within the component defaults RPMs created as part of the LCFG component build process.
With the old single server model the defaults files were installed by putting the component defaults RPMs on the LCFG Server itself. This was done by editing the relevant package list, followed by an om updaterpms run on the LCFG Server to pull in the new RPMs.
Following the old model in the new architecture would result in component authors having to run om updaterpms run on multiple LCFG Servers to update the defaults files. To avoid this, the LCFG defaults files are mastered on the RFE Master Server in the new architecture. These defaults files are then copied over to the LCFG Slave Servers along with the LCFG source data prior to compilation. The end result is that the process required to deploy a new LCFG component defaults RPM is essentially unchanged, apart from the host on which the om updaterpms run should be executed to install the new LCFG component defaults RPM.
Within the new DICE LCFG Infrastructure, om updaterpms run should be executed on rfehost.inf.ed.ac.uk when deploying a new LCFG component defaults RPM.
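In practice the deployment step looks something like the following sketch. The host name comes from the text above; everything else is illustrative, and the run() wrapper prints each command instead of executing it so the sketch is safe to try anywhere.

```shell
# Print-only sketch of deploying a new LCFG component defaults RPM
# under the new architecture.
run() { printf '+ %s\n' "$*"; }

# Run updaterpms on the RFE Master Server, not on the individual
# LCFG Slave Servers:
run ssh rfehost.inf.ed.ac.uk om updaterpms run
```

The slaves then pick up the new defaults files automatically via the rsync transfer described below.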
To allow for the transfer of LCFG source data from the RFE Master Server various rsync modules have been exported. Some of these exported modules are required for the DICE LCFG Infrastructure, others are to allow for third parties to obtain useful LCFG source material. At the time of writing, the following LCFG related rsync modules are available:
|lcfgdefaults|LCFG component defaults files (*.def).|
|lcfginf|Root of all Informatics LCFG data.|
|lcfgdefs|Common LCFG header files. This module should be renamed!|
|lcfgpacks|Common LCFG package list files (*.rpms).|
|edpacks|Common Edinburgh Environment package list files (*.rpms).|
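As an illustration, a third party could pull one of these modules as follows. The module name comes from the table above; the destination directory is hypothetical, and the run() wrapper prints the command rather than executing it.

```shell
# Print-only sketch of fetching an exported rsync module.
run() { printf '+ %s\n' "$*"; }

# Mirror the LCFG component defaults files to a local directory:
run rsync -av rsync://rfehost.inf.ed.ac.uk/lcfgdefaults/ /var/tmp/lcfgdefaults/
```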
TODO: More here?
The current DICE LCFG Slave Servers are:
Following the links above will direct your web browser to the LCFG Status Web Pages on the respective LCFG Slave Server.
More information is available on the following aspects of LCFG Slave Servers:
A DICE LCFG Slave Server is simply a specialised LCFG Server that does not act as a master location for any of the LCFG source material it compiles. The LCFG source material is pulled over to the LCFG Slave Server via the rsync facility built into the LCFG Server software. It is then compiled to create LCFG XML Profiles, which are served to LCFG Client hosts by an HTTP server running on the host. The motivation for not mastering any LCFG source material on the new DICE LCFG Servers is mainly a desire to improve the reliability and availability of the DICE LCFG service.
LCFG Servers, by their very nature as busy compilation hosts, are subject to relatively long periods of high resource utilisation. This is particularly true of CPU load and, to a lesser extent, disk subsystem activity. Such utilisation patterns tend to increase the risk of hardware failure. That alone is a good reason not to master configuration data on such hosts; combined with the centralised dependence of the DICE infrastructure as a whole on the LCFG service, it makes the single server model look somewhat precarious. To address this, the architecture of the new DICE LCFG Infrastructure allows for multiple LCFG Servers.
TODO: Write about how things were rearranged to allow for multiple LCFG servers.
The configuration of a DICE LCFG Slave Server is deliberately very simple. The primary design goal was to make them essentially expendable and trivial to replicate. The end result is that creating a DICE LCFG Slave Server is little more than including the inf/lcfg_slave.h LCFG header file in the host's LCFG configuration and installing DICE on the machine.
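In LCFG source terms that step amounts to something like the following fragment in the host's profile. This is a sketch: the header name is taken from the text above, and the include form assumes the usual C-preprocessor style of LCFG source files.

```
#include <inf/lcfg_slave.h>
```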
Once up and running a DICE LCFG Slave Server is similar to any other LCFG Server except that the LCFG source material is mastered off-host. There are various filesystem locations on the DICE LCFG Slave Servers that are of particular interest to COs and CSOs:
||LCFG Server log file.|
||Main runtime configuration directory for the LCFG Server.|
||Destination directory for the LCFG source material copied over from the RFE Master Server.|
||Destination directory for the LCFG components defaults files copied over from the RFE Master Server.|
||Output directory tree for the generated LCFG XML Profiles.|
TODO: More here?
For the most part, having multiple LCFG Slave Servers within the DICE LCFG Infrastructure has little effect on day to day operation. There are, however, a few areas where it changes how things are done.
As there is an LCFG Server process compiling LCFG XML Profiles on each of the LCFG Slave Servers, multiple compilation log files are produced: one on each LCFG Slave Server. The upshot is that there are multiple log files to watch to fully monitor the LCFG compilation process on DICE. In practice, as the set of LCFG source files compiled on each LCFG Slave Server is identical, watching the log on any one of the LCFG Slave Servers is usually enough to gain adequate feedback on the state of compilation.
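For example, watching the compilation log on a single slave might look like the sketch below. The slave name is taken from the client.url example later in this document; the log path is an assumption based on standard LCFG component logging, so check the actual location on the server. The run() wrapper prints the command rather than executing it.

```shell
# Print-only sketch of following one slave's compilation log.
run() { printf '+ %s\n' "$*"; }

run ssh lcfg1.inf.ed.ac.uk tail -f /var/lcfg/log/server
```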
While the set of LCFG source files compiled on each of the LCFG Slave Servers is identical, the state of compilation may differ between the various LCFG Slave Servers. Resource utilisation on the LCFG Slave Servers, network latencies and the timing of the rsync updates from the RFE Master Server can all affect the relative state of each LCFG Slave Server. As a consequence, it is quite normal to see disparity between the respective LCFG Slave Server logs as a result of one LCFG Slave Server being slightly ahead of another in the compilation process. LCFG Slave Server downtime, whether scheduled or not, will also introduce differences between the states of the LCFG Slave Servers.
These differences are most apparent in the LCFG Server process log files, but can also be seen in the LCFG Status Web Pages on the LCFG Slave Servers.
Having multiple LCFG Servers compiling the LCFG XML Profiles for a given host also means that the host will receive multiple LCFG Update Notifications, one from each LCFG Server process. Conversely, each LCFG Client acknowledges receipt of the update notification to all the LCFG Slave Servers. The LCFG Client will, however, only ever use the most recent LCFG XML Profile to configure a host.
To evaluate which LCFG XML Profile on the various LCFG Slave Servers is most recent, the LCFG Client looks at the time of the edit to the LCFG source files, not the time of creation of the LCFG XML Profile. This information is embedded within the LCFG XML Profile at compilation time and has only one source: the LCFG source files on the RFE Master Server. Hence there should be no ambiguity in determining the most up to date LCFG XML Profile.
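The selection rule above can be illustrated with a toy comparison. Both timestamps below are invented; the point is only that the client keeps whichever profile embeds the newer source-edit time, regardless of which server built it or when.

```shell
# Toy illustration of profile selection by embedded source-edit time.
source_edit_a=1200000000   # edit time embedded in the profile from server A
source_edit_b=1200000600   # edit time embedded in the profile from server B

if [ "$source_edit_b" -gt "$source_edit_a" ]; then
  echo "use profile B (newer source edit)"
else
  echo "use profile A"
fi
```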
To be able to take advantage of the multiple LCFG Servers present in the new DICE LCFG Infrastructure, one small modification is required. The client.url resource of the LCFG Client component is adjusted on all DICE Clients to list all available LCFG Slave Servers.
!client.url mSET(http://lcfg1.inf.ed.ac.uk/profiles http://lcfg3.inf.ed.ac.uk/profiles)
This is set globally for all DICE clients in
Multiple entries in the client.url LCFG resource instruct the LCFG Client software to check multiple LCFG Servers for new LCFG XML Profiles. As noted in the Implications of Having Multiple LCFG Slave Servers section, the LCFG Client software compares the available LCFG XML Profiles to determine which one is derived from the most recent configuration change.
Finding and correcting an LCFG error affecting many machines can be time-consuming, because each iteration of the edit/compile/debug cycle can potentially take an hour or more.
The LCFG test server lcfg.test.inf.ed.ac.uk speeds up the feedback loop considerably. Instead of generating profiles for every host, the test server generates profiles for only a sample of hosts.
The rules controlling the test host sample can be edited with "rfe lcfgtesthosts". Computing staff can add their own test machines to the sample, but remember that the sample should be representative of the variety of machines in use - desktop, laptop, server; staff, student; multiple models; DICE, self-managed; multiple sites; and so on.
The test server runs a web server and makes the usual status web pages available.
By default DICE machines do not get their profiles from the test server. To change this for a test machine, add the test server URL to the machine's client.url resource. There are two ways of doing this.
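For example, mirroring the mSET form shown earlier and assuming the test server publishes profiles under the same /profiles path, a test machine might list all three servers:

```
!client.url mSET(http://lcfg1.inf.ed.ac.uk/profiles http://lcfg3.inf.ed.ac.uk/profiles http://lcfg.test.inf.ed.ac.uk/profiles)
```

This is a sketch only; see the caveats below before including both test and production servers in the URL.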
Profiles from the test server should be treated with care. If a host imports a spanning map (only certain servers do), the map will contain data only from machines which have been compiled on the same server.

So in theory there is no problem with ordinary clients (ones which do not import maps) taking their profiles from the test server.

The problem comes when you are trying to test a service which imports spanning maps (e.g. DHCP). If the importing server takes its profile from the LCFG test server, the maps will be incomplete; this might (or might not) be adequate for testing, but you would not want it in production. Note that in this case you may get profiles from the test and production servers with the same timestamp but different data, and your results would be indeterminate, so you do not want to include both in the URL.
Please contact us with any comments or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh