The Dice Project


Discussion paper: DICE client LDAP configuration

Project 267 (Review of OpenLDAP DICE client configuration) was intended to investigate options for configuring LDAP on DICE 'client' machines, where 'client' here means any machine other than a designated LDAP master server or LDAP slave server.

The specific final proposal which has arisen from the project is that, starting from the next operating system upgrade (i.e. the upgrade to Scientific Linux 7), all DICE client machines should be configured as 'conventional' LDAP clients. This discussion document attempts to summarize the pros and cons of that proposal.



1. Some background

1.1 What is LDAP, and how does it work?

From: http://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol:

"The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network."

From: http://www.redbooks.ibm.com/redbooks/pdfs/sg244986.pdf:

"[LDAP] Directories are usually accessed using the client/server model of communication. An application that wants to read or write information in a directory does not access the directory directly. Instead, it calls a function or application programming interface (API) that causes a message to be sent to another process. This second process accesses the information in the directory on behalf of the requesting application via TCP/IP. ...

"The request is performed by the directory client, and the process that maintains and looks up information in the directory is called the directory server. In general, servers provide a specific service to clients. Sometimes, a server might become the client of other servers in order to gather the information necessary to process a request.

"The client and server processes may or may not be on the same machine. A server is capable of serving many clients. ..."

1.2 For what purpose is LDAP generally used?

In a typical Linux deployment, LDAP is used to provide to the client machine the same user data as Yellow Pages / NIS once did: namely passwd, group and netgroup entries. Access to those is set up via suitable configuration in the /etc/nsswitch.conf file. Note that this means that data describing 'normal' (i.e. non-system) users to the client system is not available should LDAP lookups fail in any way.
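
The effect of this nsswitch.conf arrangement can be illustrated with a toy model. It is a sketch only: it assumes a hypothetical nsswitch.conf listing `files` before `ldap`, a user `root` present in /etc/passwd and a hypothetical user `alice` held only in LDAP. It does not reproduce glibc's implementation, although skipping an unreachable source does mirror glibc's default `UNAVAIL=continue` behaviour.

```python
# Toy model of NSS source selection driven by /etc/nsswitch.conf.
# File contents and user names are hypothetical, for illustration only.

class SourceUnavailable(Exception):
    """Raised by a source that cannot be reached (e.g. LDAP is down)."""


def parse_nsswitch(text):
    """Map each database to its ordered list of sources."""
    sources = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if ":" not in line:
            continue
        database, rest = line.split(":", 1)
        # Action items such as [NOTFOUND=return] are ignored in this toy.
        sources[database.strip()] = [w for w in rest.split() if not w.startswith("[")]
    return sources


def lookup(database, name, nsswitch, backends):
    """Try each configured source in order; return the first match or None."""
    for source in nsswitch.get(database, []):
        try:
            entry = backends[source](database, name)
        except SourceUnavailable:
            continue  # like glibc's default UNAVAIL=continue
        if entry is not None:
            return entry
    return None


NSSWITCH = """
passwd:   files ldap
group:    files ldap
netgroup: ldap
"""

LOCAL_FILES = {"passwd": {"root": "root:x:0:0:root:/root:/bin/bash"}}


def files_backend(database, name):
    return LOCAL_FILES.get(database, {}).get(name)


def ldap_down(database, name):
    raise SourceUnavailable("LDAP server unreachable")


conf = parse_nsswitch(NSSWITCH)
backends = {"files": files_backend, "ldap": ldap_down}
print(lookup("passwd", "root", conf, backends))   # still found in /etc/passwd
print(lookup("passwd", "alice", conf, backends))  # None while LDAP is down
```

With LDAP unreachable, only users defined in local files (in practice, system accounts such as root) can still be resolved.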

1.3 For what purpose is LDAP currently used on DICE?

DICE machines use LDAP to look up passwd, group and netgroup entries describing 'normal' (i.e. non-system) users via suitable configuration in /etc/nsswitch.conf. Since authorization to access any DICE machine is moderated by these entries, this means that, in the event of an LDAP failure, a DICE machine cannot be logged into as any user other than root.

In addition, the entire DICE authorization system - roles, capabilities/entitlements - is arranged via LDAP.

All of this means that LDAP plays a crucial role in the DICE security model; indeed, the role of LDAP in overall security was a deliberate design choice at the time that DICE was initially developed. The corollary is that, whatever might be chosen to replace the current overall LDAP setup, there is a clear requirement that security must be maintained.

1.4 How is LDAP generally deployed?

As described above, LDAP is designed as a distributed service: the architecture is client/server. In addition, multiple LDAP servers can themselves operate in a federated way: the content of a single LDAP directory information tree can be split between a group of cooperating servers, each of which then returns referrals to clients as and when necessary.
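
The referral mechanism can be sketched as a toy model. Everything here is hypothetical (server URLs, suffixes, entries), DN matching is simplified to a case-insensitive suffix test, and real LDAP referrals carry full LDAP URLs; the point is only that a server holding part of the tree answers for its own subtree and refers the client elsewhere for delegated subtrees.

```python
# Toy model of one directory tree split between two cooperating servers.
# Server URLs, suffixes and entries are hypothetical.

class Referral(Exception):
    """Signals that another server holds the requested part of the tree."""
    def __init__(self, url):
        super().__init__(url)
        self.url = url


def in_subtree(dn, base):
    """True if dn equals base or lies beneath it (simplified DN matching)."""
    dn, base = dn.lower(), base.lower()
    return dn == base or dn.endswith("," + base)


class ToyServer:
    def __init__(self, naming_context, entries, referrals=None):
        self.naming_context = naming_context   # subtree held locally
        self.entries = {dn.lower(): attrs for dn, attrs in entries.items()}
        self.referrals = referrals or {}       # subtree -> server URL

    def read(self, dn):
        # Subtrees delegated elsewhere produce a referral, not data.
        for base, url in self.referrals.items():
            if in_subtree(dn, base):
                raise Referral(url)
        if in_subtree(dn, self.naming_context):
            return self.entries.get(dn.lower())
        raise LookupError(f"{dn} is outside this server's naming context")


def client_read(dn, start_url, servers, max_hops=5):
    """Read an entry, chasing referrals from server to server."""
    url = start_url
    for _ in range(max_hops):
        try:
            return servers[url].read(dn)
        except Referral as ref:
            url = ref.url
    raise RuntimeError("too many referrals")


servers = {
    "ldap://a.example": ToyServer(
        "dc=example,dc=org",
        {"cn=admins,dc=example,dc=org": {"cn": "admins"}},
        referrals={"ou=people,dc=example,dc=org": "ldap://b.example"},
    ),
    "ldap://b.example": ToyServer(
        "ou=people,dc=example,dc=org",
        {"uid=alice,ou=people,dc=example,dc=org": {"uid": "alice"}},
    ),
}
print(client_read("uid=alice,ou=people,dc=example,dc=org", "ldap://a.example", servers))
# {'uid': 'alice'}
```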

In practice, so far as the authors are aware, most sites deploying LDAP adopt a networked client/server model: client machines make lookups over the network against distinct LDAP server machines.

1.5 How is LDAP currently deployed in DICE, and why?

For its LDAP implementation, DICE uses the OpenLDAP software, along with related open source modules. Currently, every DICE machine (desktop, application server, file server, anything else) runs its own fully-populated LDAP server, and all LDAP lookups on the machine are made against that local server. (More detail on this is provided in Section 3.1 Current client configuration.)

Why was this design chosen over the more 'conventional' one described in Section 1.4 (How is LDAP generally deployed?) above?

  1. As implied in Section 1.3 For what purpose is LDAP currently used on DICE?, LDAP is part of the DICE security model. At the time DICE was initially developed, running an LDAP server on every DICE machine was judged the best method of implementing the LDAP service in a robust and secure way.
  2. When DICE was initially developed, the design intention was to support the completely autonomous operation of DICE desktops - i.e. there was a desire to support laptops etc. when such machines were completely disconnected from the network. This requirement has now - arguably - gone away.
  3. The chosen configuration is intrinsically robust against network failures.
  4. OpenLDAP has always been the LDAP software in use on DICE, and remains the only reasonable choice. However, when OpenLDAP was first deployed on DICE, it was unreliable on the master LDAP server: the slapd daemon was prone to frequent and apparently random crashes. To limit the effects of problems on a central server affecting all DICE clients, it made sense for each DICE client to run its own local LDAP server.

    Comment: OpenLDAP is now much more reliable: we no longer experience random crashes of the slapd daemon on either the LDAP master or slave servers.

  5. When DICE was initially developed, the performance of LDAP when used in a distributed client/server mode was demonstrated to be poor. (For example, running ls -l in a directory of files owned by many different owners was unacceptably slow.)

    Comment: Whatever the exact reasons for that behaviour, recent testing indicates that this is no longer the case.
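
One way to see why the number of distinct owners matters: each distinct owner costs at least one lookup, and without a local server or cache each lookup is a network round trip. The sketch below counts round trips, assuming (as GNU ls does) that a single command remembers names it has already resolved, and compares a pass-through daemon with a caching one. The classes are hypothetical stand-ins, not models of nslcd or sssd internals.

```python
# Counting round trips to a directory server when listing files with many
# distinct owners. All names and numbers here are hypothetical.

class RemoteDirectory:
    """Stand-in for a remote LDAP server; counts round trips."""
    def __init__(self):
        self.round_trips = 0

    def resolve(self, uid):
        self.round_trips += 1
        return f"user{uid}"


class CachingDaemon:
    """Stand-in for a caching daemon: answers persist between commands.
    Real caches (e.g. sssd's) also expire entries; that is ignored here."""
    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def resolve(self, uid):
        if uid not in self.cache:
            self.cache[uid] = self.backend.resolve(uid)
        return self.cache[uid]


def ls_l_owners(file_owner_uids, resolve):
    """Resolve owner names once per distinct UID within one command."""
    names = {}
    for uid in file_owner_uids:
        if uid not in names:
            names[uid] = resolve(uid)
    return [names[uid] for uid in file_owner_uids]


uids = [1000 + i % 200 for i in range(1000)]   # 1000 files, 200 owners

direct = RemoteDirectory()                      # pass-through, like nslcd
for _ in range(2):
    ls_l_owners(uids, direct.resolve)

cached_backend = RemoteDirectory()
daemon = CachingDaemon(cached_backend)          # caching, like sssd
for _ in range(2):
    ls_l_owners(uids, daemon.resolve)

print(direct.round_trips, cached_backend.round_trips)   # 400 200
```

The cost of each `ls -l` therefore scales with the number of distinct owners multiplied by the per-lookup latency, and a persistent cache removes most of that cost on repeated runs.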


2. Requirements for any DICE LDAP implementation

  1. Any DICE LDAP implementation must be extremely reliable: if LDAP lookups are not available, then a machine cannot be accessed in a normal way.
  2. The client must be guaranteed that all data returned from any DICE LDAP lookup is correct. All such data must therefore be transmitted by a mechanism which guarantees that it originates from a bona fide Informatics LDAP server, and that it cannot be altered en route.
  3. Any final implementation must be guaranteed to be free of boot-time deadlocks over the entire collection of DICE machines.
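
Requirement 3 can be checked mechanically once boot-time dependencies are written down as a graph: a set of machines can always finish booting exactly when that graph has no cycle. A minimal sketch using a generic depth-first cycle check follows; the service names and dependencies are hypothetical, and this is not an existing DICE tool.

```python
# Hypothetical boot-time dependencies: each machine or service must wait
# for those it lists before it can finish starting. Names are illustrative.

def find_cycle(deps):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    colour = {node: WHITE for node in deps}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in deps.get(node, []):
            state = colour.get(dep, WHITE)
            if state == GREY:                     # back edge: a cycle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(deps):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


boot_deps = {
    "ldap-server": ["kdc", "dns"],
    "kdc": ["dns"],
    "dns": [],
    "file-server": ["ldap-server"],
}
print(find_cycle(boot_deps))     # None: a safe boot order exists

# Hypothetical change: the DNS host now needs the file server to start.
boot_deps["dns"] = ["file-server"]
print(find_cycle(boot_deps))
```

The second call reports the loop ldap-server, kdc, dns, file-server, ldap-server: the kind of site-wide deadlock that a full shutdown and restart would expose.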

3. Current and proposed configurations

3.1 Current client configuration

[Taken from Review of OpenLDAP DICE client configuration, with minor corrections/clarifications]

  1. All DICE clients run their own LDAP server, which contains a complete copy of the DICE directory information tree. The content of this server is synchronised hourly against an LDAP master server by the in-house slaprepl script. slaprepl runs via SASL->GSS-API->Kerberos, so the exchange is both authenticated and encrypted. Machines on the 'stable' release currently synchronise against the single master LDAP server; machines on the 'develop' release synchronise against the slaves, via the DNS round-robin entry dir.inf.ed.ac.uk.
  2. A DICE client in current standard configuration makes all LDAP lookups against its own server, but no DICE client ever _writes_ to its own LDAP server. The only LDAP server ever written to is the single master.
  3. When a user process on a DICE client does an nss LDAP lookup (i.e. a lookup originating from a glibc call, e.g. getpwnam), that lookup always proceeds via the so-called 'name service ldap connection daemon', nslcd. The LDAP server therefore has no knowledge of the UID of the actual process which made the lookup request. No Kerberos/SASL authentication or encryption is involved: the LDAP request is done via anonymous bind, and transmitted in plain-text.

    nslcd is an integral part of the redesigned nss-pam-ldapd module (see Arthur de Jong's design notes) and, on DICE, first appeared in the move to SL6. It is not a caching daemon, and there are possible alternatives - in particular, the replacement of nslcd by the caching daemon sssd.

  4. On a DICE client, an explicit LDAP lookup via ldapsearch normally proceeds via SASL and Kerberos, so the exchange is both authenticated and encrypted.
  5. In the normal course of events, both of the above-mentioned lookups on a DICE client take place entirely within the local machine. However, if a DICE client machine is configured to use an alternative LDAP server (e.g. !openldap.server mSET(dir.inf.ed.ac.uk)) then the same authentication/encryption setups apply.

    Specifically that means that, on our setup, no authentication or encryption is used for remote (i.e. off-machine) LDAP lookups which are done via the nslcd daemon. All such lookups are done via anonymous binds and proceed over unencrypted links.

  6. Some DICE applications on DICE clients use the hard-coded address of 'localhost' as the location of the LDAP server against which they expect to do a lookup. Currently, the only known source of this hard-coding is the dice-authorize package. We need to establish whether or not this is the only such case.
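
For dice-authorize (and any similar software found later), one possible fix is to read the server location from the same client configuration that other OpenLDAP tools use. This is a sketch assuming that the `URI` keyword of ldap.conf(5) is where DICE would record it; the function name, the sample file contents and the fallback to `ldap://localhost` are illustrative only, and real code would more likely let the OpenLDAP client library read the file itself.

```python
# Sketch: take the LDAP server location from OpenLDAP's client configuration
# (the URI keyword of ldap.conf) rather than hard-coding 'localhost'.
# The fallback value and the sample file contents are assumptions.

DEFAULT_URIS = ["ldap://localhost"]   # today's hard-coded behaviour


def ldap_uris(conf_text, default=DEFAULT_URIS):
    """Return the server URIs from ldap.conf-style text, else the default."""
    for line in conf_text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0].upper() == "URI" and len(parts) == 2:
            return parts[1].split()   # several URIs may be listed
    return list(default)


SAMPLE = """
# hypothetical /etc/openldap/ldap.conf
BASE dc=example,dc=org
URI  ldap://dir.inf.ed.ac.uk
"""
print(ldap_uris(SAMPLE))   # ['ldap://dir.inf.ed.ac.uk']
```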

3.2 Proposed client configuration

  1. No DICE client will run its own LDAP server.
  2. All DICE clients will make all LDAP lookups against a set of designated LDAP servers.
  3. Lookups against the collection of designated LDAP servers will be organized in a way which provides robustness against the failure of any individual server.

    Comment: the exact details of how that robustness is arranged depend on which client caching/connection daemon is in use on the clients. See Appendix A: caching/connection daemons.

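  4. All DICE client LDAP lookups via nss will be done via anonymous binds over a TLS connection. Provided that each client verifies the server's certificate, the use of TLS guarantees that the data returned by each LDAP query comes from a genuine Informatics LDAP server and has not been altered in transit.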
  5. All DICE client LDAP lookups via nss will be moderated either by the connection daemon nslcd or the caching daemon sssd.

    Comment: which one of these two daemons will be used is yet to be decided - see Appendix A: caching/connection daemons

  6. Special case: DICE 'client machines' deemed to be of crucial importance in our overall infrastructure will be eligible to be configured as LDAP slave servers (replicating LDAP content via the official OpenLDAP syncrepl mechanism) in order that they remain capable of truly autonomous operation. Such LDAP slaves will provide service to themselves only.

    We expect such cases to be rare, but the facility exists.
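
The guarantee in item 4 depends on the client actually verifying the server's certificate. Python's `ssl` module is used below only to make the relevant settings concrete; nslcd and sssd have their own equivalents (for nslcd, options such as `tls_reqcert` and `tls_cacertfile` in nslcd.conf). The TLS 1.2 minimum is an assumption, not something this paper specifies, and the location of the Informatics CA bundle is left open.

```python
import ssl


def ldap_tls_context(ca_file=None):
    """Client TLS settings under which an anonymous LDAP bind can still
    trust what it receives: the server's certificate must chain to a
    trusted CA and must match the host name being contacted."""
    # ca_file would point at the Informatics CA bundle (path not given here);
    # None falls back to the system's default trust store.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED      # already the default; made explicit
    ctx.check_hostname = True                # reject certificates for other hosts
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # assumed policy
    return ctx


ctx = ldap_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)   # True True
```

Such a context would apply equally to an ldaps:// connection or to one upgraded with StartTLS; anonymous binding only affects who the client claims to be, not whether the server is authenticated.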

3.3 Discussion

3.3.1 Current client configuration

3.3.1.1 Benefits
  1. Within Informatics, this is a known and trusted model which has no immediate dependencies on external LDAP servers.
  2. The LDAP service on the client has no dependency on the network.
  3. DICE client machines are autonomous: neither network interruptions nor any failure affecting the master/slave LDAP servers themselves has any effect on the LDAP service on the client.
3.3.1.2 Risks
  1. The in-house utility slaprepl (and any associated code) needs to be maintained, and to be ported at each OS upgrade.
  2. Problems or crashes with the LDAP server on any client machine leave that machine in an unusable state.
  3. Hourly replication means that client machines can lag up to an hour behind LDAP updates.
  4. Failure to replicate leaves a machine with an out-of-date LDAP repository.
  5. The OpenLDAP server slapd has (at least in the past) proven to be prone to high memory use; this is possibly inappropriate and unnecessary on 'client' machines.

3.3.2 Proposed client configuration

3.3.2.1 Benefits
  1. Worldwide, this is the standard usage model for LDAP.
  2. It is a simpler model than the current one in use on DICE.
  3. Testing (for about the past year) on CO desktops shows that the model is stable in normal circumstances.
  4. The need to maintain and port the in-house utility slaprepl disappears.
  5. All DICE clients always have an up-to-date view of the Informatics LDAP repository.
  6. Updates to the version of OpenLDAP in use, and/or the storage backend used by OpenLDAP, need to be applied to the master and slave servers only, and not to any DICE clients. The same comment applies to changes to the LDAP server configuration, the LDAP schema, etc.
3.3.2.2 Risks
  1. DICE client machines are no longer autonomous: network interruptions, or any failures affecting the LDAP servers, will render such clients unusable.
  2. LDAP servers become key points of failure.
  3. Unexpected dependencies might be introduced which could affect the overall startup sequencing of all Informatics servers in the event of complete site shutdown/restart.

    Note that, at present, we are unaware of any such dependencies.

  4. There might be chicken-and-egg dependencies which we have not yet realised.

    In practice, we need to be quite sure that, in the event of serious system failures, we are able to carry out the necessary operations on the servers which control important parts of our overall infrastructure, so that we can correct problems which might be affecting the rest of our systems.

  5. The dice-authorize software needs to be changed so that it no longer has a hard-coded dependency on localhost.
  6. There might be other local software which has similar dependencies, and which has so far been overlooked.
  7. The proposed LDAP service will run over TLS and will therefore rely on agreement between the Informatics CA root certificates as seen by both server and client. These certificates are periodically updated; we need to be sure that these mass updates can be arranged with no breaks in service, or other more serious problems.
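
One common way to avoid breaks during such an update is an overlap period in which clients trust both the old and the new CA. The sketch below checks a rollout plan under that model: for each phase it lists the client trust bundles and server-certificate issuers that may coexist (a mass update never reaches every machine at the same instant), and requires every client to trust every server. The phases and CA names are hypothetical; this paper does not specify the Informatics procedure.

```python
from itertools import product

OLD, NEW = "old-ca", "new-ca"   # hypothetical CA identities


def phase_ok(client_bundles, server_issuers):
    """Every client bundle present must trust every server issuer present."""
    return all(issuer in bundle
               for bundle, issuer in product(client_bundles, server_issuers))


def plan_is_safe(plan):
    return all(phase_ok(clients, servers) for clients, servers in plan)


# Each phase: (client trust bundles that may coexist, server issuers that may coexist)
overlap_plan = [
    ([{OLD}, {OLD, NEW}], [OLD]),        # 1. add new CA to client bundles
    ([{OLD, NEW}], [OLD, NEW]),          # 2. reissue server certificates
    ([{OLD, NEW}, {NEW}], [NEW]),        # 3. remove old CA from clients
]
big_bang_plan = [
    ([{OLD}, {NEW}], [OLD, NEW]),        # swap everything at once
]

print(plan_is_safe(overlap_plan), plan_is_safe(big_bang_plan))   # True False
```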

Appendix A: caching/connection daemons

Red Hat Linux (and therefore Scientific Linux) provides two daemons which can be interposed between the LDAP client and server:

  1. nslcd - the Name Service LDAP Connection Daemon - is an integral part of the redesigned nss-pam-ldapd module (see Arthur de Jong's design notes), and first appeared in the move to SL6. It is specifically designed as an interface between LDAP clients and servers; nothing else. It is not a caching daemon.
  2. sssd - the System Security Services Daemon - is a more recent development whose 'primary function is to provide access to identity and authentication remote resource through a common framework' - see https://fedorahosted.org/sssd/. It can be used as an interface between LDAP clients and servers, amongst other things. It does provide caching.

In the SL6 production version of DICE, we use nslcd. We assume that both the nslcd and sssd daemons will be available in SL7, but we get the impression that the use of sssd might generally come to be favoured. In the context of this document, the principal advantage of sssd over nslcd is its caching functionality.

Testing of the proposed client/server LDAP configuration on CO desktops over the past year has only involved nslcd. We have now done initial testing with sssd, and we are fairly confident that it can be successfully integrated into a client/server DICE LDAP configuration.

Using either daemon will allow LDAP server failover and redundancy, but the implementation details will differ and, in either case, further work and tuning remains to be done.

nslcd

In the test client/server LDAP configuration currently running on CO's desktops, the configuration is as follows:

A similar configuration might be suitable for a final deployment, but we need to check whether or not the DNS round-robin works successfully across all Informatics subnets. If it turns out that subnet-specific tailoring is necessary, then multiple uri: entries in each client's nslcd.conf might be a better arrangement. If necessary, load-balancing could be enforced by the randomization (per client) of such entries.
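
Per-client randomization of the `uri` entries could be generated rather than written by hand. nslcd.conf allows `uri` to be given more than once, with the first server normally used and the rest as fall-backs, so varying the order between clients spreads load while each client keeps a full failover list. The sketch below assumes hypothetical server names and seeds the shuffle from the client's host name, so that regenerating a client's configuration does not reorder its servers; both choices are illustrative.

```python
import hashlib
import random


def nslcd_uri_lines(hostname, servers):
    """Return 'uri' lines for nslcd.conf, shuffled differently per client
    but reproducibly for any one client (seeded from its host name)."""
    seed = int.from_bytes(hashlib.sha256(hostname.encode()).digest()[:8], "big")
    ordered = list(servers)
    random.Random(seed).shuffle(ordered)
    return [f"uri {server}" for server in ordered]


def first_reachable(uris, is_up):
    """Model the fall-back behaviour: use the first listed server that answers."""
    for uri in uris:
        if is_up(uri):
            return uri
    return None


SERVERS = ["ldap://ldap1.example", "ldap://ldap2.example", "ldap://ldap3.example"]
for host in ["desk-a", "desk-b"]:
    print(host, nslcd_uri_lines(host, SERVERS))
```

A deterministic per-host seed was chosen over a fresh random order so that a client's preferred server stays the same between configuration runs, which keeps connection behaviour predictable when debugging.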

sssd

One proposed configuration is as follows:

As an alternative, we note that sssd claims to support both load-balancing and failover through DNS service discovery (i.e. SRV records). This needs to be tested.
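
For reference, RFC 2782 defines how a client should order SRV records: lowest priority first, with a weighted random choice among records of equal priority. This gives both failover (priorities) and load-balancing (weights). A sketch of that ordering follows; the records are hypothetical, and sssd's own handling (enabled with `ldap_uri = _srv_`) may differ in detail, which is part of what testing would need to establish.

```python
import random
from dataclasses import dataclass


@dataclass
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def srv_order(records, rng=random):
    """Order SRV records per RFC 2782: ascending priority, and a
    weighted random order among records sharing a priority."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records go first, as the RFC's selection step requires.
        group = sorted((r for r in records if r.priority == prio),
                       key=lambda r: r.weight != 0)
        while group:
            pick = rng.randint(0, sum(r.weight for r in group))
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


records = [
    SrvRecord(10, 60, 389, "ldap1.example"),
    SrvRecord(10, 40, 389, "ldap2.example"),
    SrvRecord(20, 0, 389, "ldap-backup.example"),
]
print([r.target for r in srv_order(records, random.Random(42))])
```

Here ldap1 and ldap2 share the load in roughly a 60:40 ratio, and ldap-backup is only tried after both.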

In whichever way we might want to deploy sssd, it will require further rigorous testing.

Appendix B: Open questions

Provision of LDAP servers

If we retain our existing LDAP configuration, then all of our existing LDAP servers (which are physical hardware) are due to be replaced in the current financial year. If we choose to move to the proposed configuration, we would need to clarify the following:

  1. How many LDAP servers will we need?
  2. Where should these LDAP servers be located on our sites/network?
  3. On what types of machine (VMs?; physical hardware?; in either case, of what specification?) should these LDAP servers be located?

None of these are easy questions to answer in advance of any real deployment, and tuning after the fact will almost certainly be necessary. Note, though, that the use of either nslcd or sssd means that LDAP servers can be added to (or removed from) the running service 'on the fly'.

Our current preference is to have at least one (and ideally more, for multiple-failure-proofing) physical LDAP servers under our control, in order that dependencies are as clear as possible. Such a pool of physical servers could be topped up with additional VMs, as and when load demands.

Timescales and effort

There are 'effort' implications to either decision on this matter, but no more detailed estimates have yet been made.

Ideally, any work arising needs to be completed in time for the SL7 deployment. In case of delays, we have the option of falling back on the current setup (provided that that is suitably ported and tested), and transitioning to a new setup at some later date.

External dependencies

All of the work implied by this document depends on both the provision of working SL7 test machines and the ability to build packages for them.
Discussion_paper.html,v 1.11 2014/09/10 08:34:45 idurkacz Exp
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh