

DICE LCFG Infrastructure

The new infrastructure for the DICE LCFG service is quite different to the single server model that used to be in place. The simplest way to communicate the new model is probably through a diagram, so here is a high-level representation of the new architecture:

[Diagram of the DICE LCFG service infrastructure]

There are various aspects of the new architecture that are worth elaborating upon, not least the fact that there is now more than one LCFG server. Each of these is discussed in the following sections:

DICE RFE Master Server

The current RFE Master Server is rfehost.inf.ed.ac.uk. It serves various roles within the new architecture:

Hosting LCFG Source Material

The primary role of the RFE Master Server, at least in terms of the new DICE LCFG infrastructure, is to host the LCFG source material. This data is stored on the RFE Master Server under the directory hierarchy:

/etc/rfedata/lcfg

The differences between the old and the new architectures only begin to become apparent once the RFE edits are complete. While in the old single server model the LCFG server would pick up the changes very quickly, in the new model this takes slightly longer, because the transfer of the updated LCFG source data from the RFE Master Server to the LCFG Slave Servers introduces an additional latency of a few seconds.
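For illustration, a typical change still starts with an rfe edit of the relevant source file; the lcfg/<hostname> map name below follows common rfe usage and is an assumption rather than something specified on this page:

    # Edit a host's LCFG source file (held under /etc/rfedata/lcfg on the
    # RFE Master Server); a few seconds after saving, the change is rsync'd
    # to the LCFG Slave Servers and recompiled into a new LCFG XML Profile.
    rfe lcfg/<hostname>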

TODO: More here?

Hosting LCFG Component Defaults

Perhaps the most subtle difference between the old and new models is the way the LCFG component defaults files are managed. These are the *.def files contained within the (lcfg|dice)-<component-name>-defaults-<schema-version>-*-* RPMs (e.g. lcfg-postgresql-defaults-s1-0.1.2-1.noarch.rpm) created as part of the LCFG component build process.

With the old single server model, the defaults files were deployed by installing the component defaults RPMs on the LCFG Server itself. This was done by editing the relevant */*_defaults.rpms package lists, then running om updaterpms run on the LCFG Server to pull in the new defaults RPMs.

Following the old model in the new architecture would result in component authors having to run om updaterpms run on multiple LCFG Servers to update the defaults files. To avoid this, the LCFG defaults files are mastered on the RFE Master Server in the new architecture. These defaults files are then copied over to the LCFG Slave Servers along with the LCFG source data prior to compilation. The end result is that the process required to deploy a new LCFG component defaults RPM is essentially unchanged, apart from the host on which om updaterpms run should be executed.

Within the new DICE LCFG Infrastructure, om updaterpms run should be executed on rfehost.inf.ed.ac.uk when deploying a new LCFG component defaults RPM.
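As a minimal sketch of that deployment step (assuming the defaults RPM has already been built and submitted, and that the relevant *_defaults.rpms list has been updated):

    # On rfehost.inf.ed.ac.uk, pull in the newly listed defaults RPM:
    om updaterpms run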

LCFG Rsync Exports

To allow for the transfer of LCFG source data from the RFE Master Server, various rsync modules have been exported. Some of these exported modules are required by the DICE LCFG Infrastructure; others allow third parties to obtain useful LCFG source material. At the time of writing, the following LCFG related rsync modules are available:

Module        Description
lcfgdefaults  LCFG component defaults files (*.def).
lcfginf       Root of all Informatics LCFG data.
lcfgdefs      Common LCFG header files. This module should be renamed!
lcfgpacks     Common LCFG package list files (*.rpms).
edpacks       Common Edinburgh Environment package list files (*.rpms).
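For example, a plain rsync client can be pointed at these modules; the commands below assume the modules permit anonymous read access, which is implied but not stated above:

    # List the exported component defaults files:
    rsync rsync://rfehost.inf.ed.ac.uk/lcfgdefaults/

    # Mirror the common package lists into a local directory:
    rsync -av rsync://rfehost.inf.ed.ac.uk/lcfgpacks/ ./lcfgpacks/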

TODO: More here?

DICE LCFG Slave Servers

The current DICE LCFG Slave Servers are:

Following the links above will direct your web browser to the LCFG Status Web Pages on the respective LCFG Slave Server.

More information is available on the following aspects of LCFG Slave Servers:

An Introduction to LCFG Slave Servers

A DICE LCFG Slave Server is simply a specialised LCFG Server that does not act as a master location for any of the LCFG source material that it compiles. The LCFG source material is pulled over to the LCFG Slave Server via the rsync facility built into the LCFG Server software. It is then compiled to create LCFG XML Profiles, which are served to LCFG Client hosts via an HTTP server running on the host. The motivation for not mastering any LCFG source material on the new DICE LCFG Servers is drawn mainly from a desire to enhance the reliability and availability characteristics of the DICE LCFG service.
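As a quick illustration of the final step, the generated profiles can be fetched from a slave server over plain HTTP; the server name and /profiles path below are borrowed from the client.url example later on this page and are only illustrative (the exact response will depend on the web server configuration):

    # Check that a slave server is answering HTTP requests for profiles:
    curl -I http://lcfg1.inf.ed.ac.uk/profiles/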

LCFG Servers, by their very nature as busy compilation hosts, are subject to relatively long periods of high resource utilisation. This is particularly true for CPU load and, to a lesser extent, disk subsystem activity. Such resource utilisation patterns tend to lead to an increased risk of hardware failure. These characteristics alone are a good reason not to master configuration data on such hosts. When combined with the rather centralised dependence of the DICE infrastructure as a whole on the LCFG service, this makes the single server model look somewhat precarious. To address this situation, the architecture of the new DICE LCFG Infrastructure allows for multiple LCFG Servers.

TODO: Write about how things were rearranged to allow for multiple LCFG servers.

The Anatomy of a DICE LCFG Slave Server

The configuration of a DICE LCFG Slave Server is deliberately very simple. The primary design goal was to make them essentially expendable and trivial to replicate. The end result is that creating a DICE LCFG Slave Server is little more than #include-ing the inf/lcfg_slave.h LCFG header file into the host's LCFG configuration and installing DICE on the machine.
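In other words, the host's source file needs little more than the header itself; a minimal, illustrative fragment might look like this (the exact include style should follow whatever the surrounding profile already uses):

    /* Minimal LCFG slave server profile fragment (illustrative only) */
    #include <inf/lcfg_slave.h>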

Once up and running, a DICE LCFG Slave Server is similar to any other LCFG Server except that the LCFG source material is mastered off-host. There are various filesystem locations on the DICE LCFG Slave Servers that are of particular interest to COs and CSOs:

Filesystem Location             Description
/var/lcfg/log/server            LCFG Server log file.
/var/lcfg/conf/server           Main runtime configuration directory for the LCFG Server.
/var/lcfg/conf/server/data      Destination directory for the LCFG source material copied over from the RFE Master Server.
/var/lcfg/conf/server/defaults  Destination directory for the LCFG component defaults files copied over from the RFE Master Server.
/var/lcfg/conf/server/web       Output directory tree for the generated LCFG XML Profiles.
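For example, a quick look at these locations gives a good sense of whether the rsync transfers and the compilation are working; this is just a sketch using the paths from the table above:

    ls /var/lcfg/conf/server/data       # source material pulled from the RFE Master Server
    ls /var/lcfg/conf/server/defaults   # component defaults (*.def) pulled from the RFE Master Server
    ls /var/lcfg/conf/server/web        # generated LCFG XML Profiles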

TODO: More here?

Implications of Having Multiple LCFG Slave Servers

For the most part, having multiple LCFG Slave Servers within the DICE LCFG Infrastructure has little effect on day to day operation. There are, however, a few areas where it has implications for how things are done.

Monitoring LCFG Server Status

As there is an LCFG Server process compiling LCFG XML Profiles on each of the LCFG Slave Servers, multiple compilation log files are produced, one on each LCFG Slave Server. The upshot is that there are multiple log files to watch in order to fully monitor the LCFG compilation process on DICE. In practice, as the set of LCFG source files compiled on each LCFG Slave Server is identical, watching the log file on any one of the LCFG Slave Servers is usually enough to gain adequate feedback on the state of compilation.
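In practice that usually means following the server log on whichever slave is convenient, using the path listed in the table above, for example:

    # Watch the compilation log on one of the LCFG Slave Servers:
    tail -f /var/lcfg/log/server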

While the set of LCFG source files compiled on each of the LCFG Slave Servers is identical, the state of compilation may differ between the various LCFG Slave Servers. Resource utilisation on the LCFG Slave Servers, network latencies and the timing of the rsync updates from the RFE Master Server can all affect the relative state of each LCFG Slave Server. As a consequence, it is quite normal to see disparity between the respective LCFG Slave Server logs as a result of one LCFG Slave Server being slightly ahead of another in the compilation process. LCFG Slave Server downtime, scheduled or not, will also introduce differences between the states of the LCFG Slave Servers.

These differences are most apparent in the LCFG Server process log files, but can also be seen in the LCFG Status Web Pages on the LCFG Slave Servers.

LCFG Client Update Notifications and Acknowledgements

Having multiple LCFG Servers compiling the LCFG XML Profiles for a given host also means that the host will receive multiple LCFG Update Notifications from the LCFG Server processes. In turn, each LCFG Client will acknowledge receipt of the update notification to all of the LCFG Slave Servers. The LCFG Client will, however, only ever use the most recent LCFG XML Profile to configure the host.

To evaluate which LCFG XML Profile on the various LCFG Slave Servers is the most recent, the LCFG Client looks at the time of the edit to the LCFG source files, not the creation time of the LCFG XML Profile. This information is embedded within the LCFG XML Profile at compilation time and has only one source - the LCFG source files on the RFE Master Server. Hence, there should be no ambiguity in determining the most up-to-date LCFG XML Profile.
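On the client side this activity can be followed in the LCFG Client component's log; the path below is an assumption made by analogy with the server log path listed earlier, so check the local layout before relying on it:

    # Follow notifications, acknowledgements and profile selection on a client:
    tail -f /var/lcfg/log/client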

Configuration of DICE LCFG Clients

To be able to take advantage of the multiple LCFG Servers present in the new DICE LCFG Infrastructure, one small modification is required. The client.url resource of the LCFG Client component is adjusted on all DICE Clients to list all available LCFG Slave Servers. For example:

!client.url mSET(http://lcfg1.inf.ed.ac.uk/profiles http://lcfg3.inf.ed.ac.uk/profiles)

This is set globally for all DICE clients in inf/sitedefs.h.

Multiple entries in the client.url LCFG resource instruct the LCFG Client software to check multiple LCFG Servers for new LCFG XML Profiles. As noted in the Implications of Having Multiple LCFG Slave Servers section, the LCFG Client software will compare the available LCFG XML Profiles to determine which one is derived from the most recent configuration change.
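If in doubt about which servers a particular machine is polling, the resource can be inspected on the machine itself; the qxprof invocation below is standard LCFG practice but is offered here as an assumption rather than something documented on this page:

    # Show the list of profile servers this client is configured to use:
    qxprof client.url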

Purpose of the LCFG Test Server

Finding and correcting an LCFG error affecting many machines can be time-consuming, because each iteration of the edit/compile/debug cycle can potentially take an hour or more.

The LCFG test server lcfg.test.inf.ed.ac.uk speeds up the feedback loop considerably. Instead of generating profiles for every host, the test server generates profiles for only a sample of hosts.

The rules controlling the test host sample can be edited with "rfe lcfgtesthosts". Computing staff can add their own test machines to the sample, but remember that the sample should be representative of the variety of machines in use - desktop, laptop, server; staff, student; multiple models; DICE, self-managed; multiple sites; and so on.
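For example, adding a machine to the sample is just another rfe edit; the rule format inside the file is not described here, so follow the existing entries:

    # Edit the rules controlling which hosts the test server compiles:
    rfe lcfgtesthosts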

The test server runs a web server and makes the usual status web pages available.

By default DICE machines do not get their profiles from the test server. To change this for a test machine, add the test server URL to the machine's client.url resource. There are two ways of doing this.

  1. To ensure that the profile used will always and unambiguously be the one generated by the test server, make the test server the only one mentioned in client.url, like this:
    !client.url  mSET(http://lcfg.test.inf.ed.ac.uk/profiles)
    
  2. For greater reliability, use the default profile servers as well as the test server, like this:
    !client.url  mEXTRA(http://lcfg.test.inf.ed.ac.uk/profiles)
    

Note:

Profiles from the test server should be treated with care: if a host imports a spanning map (only certain servers do), the map will only contain data from machines whose profiles have been compiled on the same server.

So, in theory, there is no problem with ordinary clients taking their profiles from the test server (as long as they do not import spanning maps).

The problem arises if you are trying to test a service which imports spanning maps (e.g. DHCP). If the importing server takes its profile from the LCFG test server, the maps will be incomplete - this might (or might not) be adequate for testing, but you would not want it in production. Note that in this case you may get profiles from the test and production servers with the same time stamp but different data, and your results would be indeterminate - so you do not want to include both servers in the URL.

