Users and Groups Management in HPC Systems

Modern HPC systems can manage a large number of jobs from thousands of users. To execute scheduled jobs successfully and to account properly for the consumed computational resources, job schedulers such as Slurm or PBS expect all compute nodes involved in a job to share the same user IDs, group IDs, passwords, home directory paths, and so on. In other words, existing users and groups need to be synchronized from a central place across all management and compute nodes of an HPC cluster.

In one of our previous articles, we offered a short guide on deploying a simple HPC cluster backed by Slurm and running in Docker. Here, we expand it with the integration of a users and groups management system. But before modifying the source code and config files, we would like to explain the basic architecture of users and groups management in HPC clusters running on the Linux operating system. With that knowledge in place, we will proceed to describe the deployment in the next article.

How to Provide Access to Multiple Nodes for Multiple Users

Users and groups management on a standalone Linux node looks simple at first glance. The following files are responsible for it (sample entries are shown after the list):

  • `/etc/passwd` – general user information: usernames, UIDs, GIDs, home directories, default shells, and GECOS (a general description of the user)
  • `/etc/shadow` – hashed passwords of users
  • `/etc/group` – general group information: group names, GIDs, and group member lists
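
For reference, the standard field layouts of `/etc/passwd` and `/etc/group` are shown below, illustrated with a made-up user “alice” and group “hpcusers” (all names, IDs, and paths are hypothetical):

# /etc/passwd: name:password:UID:GID:GECOS:home:shell
alice:x:1001:1001:Alice Example:/home/alice:/bin/bash

# /etc/group: name:password:GID:member-list
hpcusers:x:1001:alice,bob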

Assume we have a simple HPC cluster managed by Slurm and we need to execute a job on it. If we run jobs as the “root” user, everything will execute successfully, because a default Linux installation already has a root user with UID and GID equal to “0” on every node. If we want different users to run different jobs, we must create these users on each node, and all the parameters of each user must be identical on every node on which that user runs. The users and groups distribution on our cluster will look like the diagram below:
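
As a rough sketch, such identical users can be created by hand by pinning the numeric IDs on every node (the name and IDs below are arbitrary examples):

# run the same commands on every node of the cluster
groupadd --gid 1001 hpcusers
useradd --uid 1001 --gid 1001 --create-home --shell /bin/bash alice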

If we want to update the information of a particular user, we have to update the corresponding file on each node. This is neither convenient nor reasonable. So, instead of local files, we need some kind of centralized users and groups storage and a tool that will distribute users and groups to all the relevant nodes in our network:

On the updated diagram, the component named “auth_server_01” contains the centralized users and groups storage, while the “auth_server_01 connector daemon” represents a software tool that delivers users and groups from that storage to a particular node.

This looks better, but it is not yet enough. We should take into account the case where compute nodes become unusable because the users storage is down. Besides, a particular node may have local users that should not be shared with other nodes, for example, an account used by a job that runs software specific to that node.

The best solution is to maintain both local files and centralized storage with a defined priority: try to get the user information from the first source and, if that fails, fall back to the second source. Let’s mark this algorithm on the diagram as the “SWITCH” component:

Finally, let’s account for the possibility of multiple independent centralized users and groups storage services by adding another “auth_server” component to the diagram:

Done! The users and groups management model for the HPC cluster is now ready. And this is exactly how it works in real HPC clusters. In other words, all of the components we displayed on the diagram have real implementations for Linux.

As you can see on the diagram above, to establish the synchronization of users and groups across all nodes of an HPC cluster, we have to set up three kinds of components:

  • Directory Service (or Name Service): the service responsible for users and groups management
  • Name Service Daemon: the daemon that runs on each node of the cluster and delivers users and groups data from the directory service to that node
  • Name Service Switch: the operating system facility that defines the order and priority of the attached directory services

Let’s consider these components in detail.

Directory Services: Centralized Users and Groups Management

The Linux operating system is a network-oriented system by design, and it can dynamically load system resources that are shared within the network. The system resources could be users, users’ passwords, groups, mail aliases, domain-to-IP maps (DNS), and others. The network service that serves and distributes system resources across network nodes is called “Directory Service” (or Name Service).

There are many directory service systems for users and groups management. The early ones were created from scratch, with no reliance on any standards or conventions. Later, a series of computer network standards called X.500 was created to define directory service operations and to provide a common interface for them. The X.500 standards define the following aspects of directory services:

  • How a client interacts with the directory service
  • How two directory servers interact with each other
  • How directory servers replicate information
  • How directories manage agreements between each other, such as those relating to replication
  • Authentication and authorization

The X.500 access protocol was later simplified into the Lightweight Directory Access Protocol (LDAP), which defines the operations and procedures of directory services more concretely. For example:

  • Documented procedures for add, change, delete, and search operations using the LDAP Data Interchange Format (LDIF); see the sample entry after this list
  • The extensible mechanism for data structure customizations, called “schema”
  • Data encryption and security
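
As an illustration, a POSIX user could be described in LDIF roughly as follows (a hypothetical sketch: the suffix “dc=example,dc=org” and all attribute values are made up):

dn: uid=alice,ou=People,dc=example,dc=org
objectClass: inetOrgPerson
objectClass: posixAccount
uid: alice
cn: Alice Example
sn: Example
uidNumber: 1001
gidNumber: 1001
homeDirectory: /home/alice
loginShell: /bin/bash

Such an entry could then be loaded into an LDAP server with a standard client tool, for example `ldapadd -x -D "cn=admin,dc=example,dc=org" -W -f alice.ldif`.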

Thus, we can now split the implementation of directory services into two groups:

  • LDAP-based systems
    • OpenLDAP 
    • FreeIPA (Red Hat)
    • Active Directory (Microsoft)
  • Systems made before LDAP
    • Hesiod
    • Network Information Service (NIS), formerly Yellow Pages (YP) made by Sun Microsystems
    • Banyan VINES

Nowadays, the most popular directory service systems are LDAP-based, but some non-LDAP systems are still in use thanks to their simplicity and reliability.

Name Service Daemon: Directory Service Connector

To provide a connection between network nodes and the directory service, each node must run a specific daemon. Each directory service distribution can provide its own implementation for the connector daemon. In the case of LDAP-based services, you can use a daemon that supports LDAP. 

For example, the following packages could be used as name service daemons:

  • nslcd for LDAP
  • System Security Services Daemon (SSSD) for LDAP
  • ypbind for NIS version 2
  • rpc.nisd for NIS version 3 (NIS Plus)

Usually, to set up the name service daemon, you need to take the following steps (a minimal nslcd sketch follows the list):

  • Set up the network address of the directory service to connect to
  • Set up the authentication for a current node on the directory service
  • Run the daemon
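
For example, with nslcd these steps boil down to a small configuration file plus starting the service. Below is a minimal sketch; the server URI and base DN are placeholders, not real addresses:

# /etc/nslcd.conf
# network address of the directory service to connect to
uri ldap://auth_server_01.example.org/
# search base of the directory tree
base dc=example,dc=org
# optional credentials used by this node to authenticate to the directory
#binddn cn=node01,ou=Hosts,dc=example,dc=org
#bindpw secret

After that, the daemon can be started with, for example, `systemctl enable --now nslcd`.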

Name Service Switch: Routing Between Directory Services

As mentioned above, the Name Service Switch (NSSwitch) defines the order in which the sources of system resources are queried. In Linux systems, you set this order by editing the configuration file “/etc/nsswitch.conf”. By default, it may look as follows (comment lines are omitted):

passwd:         files
group:          files
shadow:         files
gshadow:        files

hosts:          files dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis

The configuration file consists of a list of “key: value” pairs, where:

  • “key” is the name of a resource
  • “value” is a space-separated list of sources that are polled one after another until the first successful response

As you can see, by default Linux takes this information from local plain-text files (`files`) or from local binary databases (`db`). But we are free to update this file as necessary.

For example, if we have the configuration presented below in our NSSwitch config, the system will first look for users and groups in local files, next in the LDAP database, and finally in the NIS database. If nothing is found, the system will return an error response:

passwd:     files ldap nis
group:      files ldap nis
shadow:     files ldap nis

Depending on system configuration, we can also provide a fallback scenario as shown in the example below. In other words, if a user or a group wasn’t found in either LDAP or NIS databases, the system should fall back to the local users and groups list:

passwd:     ldap nis files
group:      ldap nis files
shadow:     ldap nis files

The “[NOTFOUND=return]” action stops the lookup when the preceding source is reachable but does not contain the requested entry, so the sources listed after it are skipped in that case. In the example below, if “ldap” is available and reports that the entry does not exist, the lookup stops and “nis” is never queried; “nis” is consulted only if the “ldap” source itself is unavailable:

passwd:     files ldap [NOTFOUND=return] nis
group:      files ldap [NOTFOUND=return] nis
shadow:     files ldap [NOTFOUND=return] nis

Source names might be as follows:

  • `files` for the local files “passwd”, “shadow” and “group”
  • `db` for the local files compiled into the binary database (“.db” files)
  • `nis` for NIS version 2, also called Yellow Pages (YP)
  • `nisplus` for NIS version 3 or later
  • `dns` for DNS (Domain Name Service)
  • `compat` for NIS in compat mode
  • `hesiod` for Hesiod
  • `ldap` for nslcd (LDAP)
  • `sss` for sssd (System Security Services Daemon)

More details can be found in the related Linux manual page `man 5 nsswitch.conf`.
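
Once the switch is configured, the resolution order can be verified from the command line with `getent`, which queries exactly the sources listed in nsswitch.conf (the username below is hypothetical):

# resolve a user through the regular NSS chain
getent passwd alice

# query one specific source only (the -s/--service option of glibc's getent)
getent -s ldap passwd alice
getent -s files passwd alice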

Conclusion

In this article, we have explored how to synchronize users and groups across all nodes of an HPC cluster, and we now have a basic understanding of directory services and the Linux tools that communicate with them. In a future article, we will explain how to install and set up a directory service as an extension of the Slurm-in-Docker cluster that we discussed in the previous article.

<About the author>
Yury Krapivko is a software engineer with more than 10 years of extensive working experience. He has been engaged in various projects in full-stack web development, cloud engineering, and high-performance computing, and has been working in the HPC field for more than 5 years.