How to Build an HPC (High-Performance Cluster) with Raspberry Pi Computers

How would you like to have a supercomputer in your home? Multiple cores, network-attached storage, job scheduling, and process control all sitting on your desktop! The Raspberry Pi can make that a reality. Well, sort of. An HPC is a group of individual computers, all running in parallel and managed by a central computer, and we can mimic all of that with Raspberry Pi computers. The Pi isn’t a huge processing powerhouse, but it’s got enough oomph to model an HPC and let us run all the HPC commands we’re familiar with. So it’s a great way to learn your way around an HPC environment.

Another feature of the Raspberry Pi HPC is that it’s completely implemented in software. You only need to supply a couple of Raspberries that are networked and running a current, 64-bit version of the OS.

Set Up the Hardware, OS, and Networking

Set up the Raspberries in the usual way: burn the OS to thumb drives, boot them up, and run raspi-config to set the timezone, passwords, and localization. Give each of the Raspberries a sequential name. I’m using node01, node02, node03, and node04. You can use more, but you need a minimum of two to make an actual cluster. Make sure each of the Raspberries can resolve the others’ names: either make sure your DNS server is up to snuff or put all of the names in /etc/hosts on each of the computers. For example:

root@node02:~# cat /etc/hosts
127.0.0.1       localhost
::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters
127.0.1.1       node02

192.168.1.41    node01
192.168.1.42    node02
192.168.1.43    node03
192.168.1.44    node04

My four nodes get their IP addresses over DHCP from Pi-hole, which also answers DNS queries for their names, so the names and IP addresses stay consistent and I don’t need to update the /etc/hosts files.
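If you don’t have a DNS server handling your node names, a small helper can generate the /etc/hosts entries for you and check that every name resolves. This is only a sketch: gen_hosts and check_hosts are hypothetical helper names, and gen_hosts assumes your nodes sit on consecutive addresses and follow the nodeNN naming used here.

```shell
# gen_hosts PREFIX FIRST COUNT  (hypothetical helper)
# Print /etc/hosts lines for COUNT nodes with consecutive addresses, e.g.
# PREFIX=192.168.1 FIRST=41 gives "192.168.1.41  node01", "192.168.1.42  node02", ...
gen_hosts() {
  local prefix=$1 first=$2 count=$3 i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s.%d\tnode%02d\n' "$prefix" $((first + i)) $((i + 1))
    i=$((i + 1))
  done
}

# check_hosts NAME...  (hypothetical helper)
# Report any node name this machine cannot resolve; returns non-zero if any fail.
check_hosts() {
  local name rc=0
  for name in "$@"; do
    if ! getent hosts "$name" >/dev/null; then
      echo "cannot resolve $name" >&2
      rc=1
    fi
  done
  return $rc
}
```

On each node (as root), something like `gen_hosts 192.168.1 41 4 >> /etc/hosts` followed by `check_hosts node01 node02 node03 node04` would add the entries and confirm they resolve.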

Network Attached Storage

You’re also going to need some common storage between the nodes, since they’ll be working on the same datasets and need access to a common directory. I created an NFS file share on my OMV (OpenMediaVault) server, exported as /data/clusterfs. Each node in the HPC mounts this share on the mount point /work. Just create an empty directory called /work on each node and then add the following line to its /etc/fstab:

omv:/data/clusterfs    /work   nfs     defaults        0       0
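Adding the fstab entry can be made safe to rerun, which helps when you’re setting up several nodes. A minimal sketch: add_fstab_entry is a hypothetical helper, and the usage lines assume the NFS client tools are installed (nfs-common on Debian-based systems).

```shell
# add_fstab_entry FILE LINE  (hypothetical helper)
# Append LINE to FILE only if an identical line isn't already there,
# so running the setup twice doesn't create a duplicate mount entry.
add_fstab_entry() {
  local file=$1 line=$2
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example usage on each node (as root):
#   mkdir -p /work
#   add_fstab_entry /etc/fstab "omv:/data/clusterfs    /work   nfs     defaults        0       0"
#   mount /work
```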

Installing the HPC Software Packages

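We’ll need a few packages on each of the nodes. Let’s set up the head node first, create the config files, and then share them with the clients (the worker nodes). I am running 2022-01-28-raspios-bullseye-arm64 and was pleased to find all of the software available in the repos.

On the head node or “controller”, install the following. (In the Debian repos, the package called plain slurm is an unrelated network monitor; the workload manager is split into slurmctld, slurmd, and slurm-client. We include slurmd because we’ll enable it on the head node, too.)

apt install slurmctld slurmd slurm-client munge -y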

slurm config files

There are just a couple of config files we’ll need to get this thing off the ground. We’ll create them on the head node and then copy them to the clients. (Note: I’m running Bullseye, where the directory is /etc/slurm. In Buster, the directory is /etc/slurm-llnl.) A sample of the main config file, slurm.conf, is already on the system. Let’s copy it into place:

cd /usr/share/doc/slurm-client/examples/
gunzip slurm.conf.simple.gz
cp slurm.conf.simple /etc/slurm/slurm.conf

We only need a couple of edits. In each section below, the line(s) above the (comment) come from the sample config, and the line(s) below it are what I changed mine to:

SlurmctldHost=workstation
(replace workstation with this node)
SlurmctldHost=node01(192.168.1.41)

SelectType=select/cons_tres
(change tres to res)
SelectType=select/cons_res

NodeName=server CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=server Default=YES MaxTime=INFINITE State=UP
(this is the actual cluster config; make it yours)
NodeName=node01 NodeAddr=192.168.1.41 CPUs=4 State=UNKNOWN
NodeName=node02 NodeAddr=192.168.1.42 CPUs=4 State=UNKNOWN
NodeName=node03 NodeAddr=192.168.1.43 CPUs=4 State=UNKNOWN
NodeName=node04 NodeAddr=192.168.1.44 CPUs=4 State=UNKNOWN
PartitionName=picluster Nodes=node[02-04] Default=YES MaxTime=INFINITE State=UP
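The two one-line edits (SlurmctldHost and SelectType) could also be applied with sed instead of an editor. This sketch assumes GNU sed, which is the default on Raspberry Pi OS, for in-place editing; edit_slurm_conf is a hypothetical name, and the NodeName/PartitionName lines still need to be written by hand.

```shell
# edit_slurm_conf FILE HOST ADDR  (hypothetical helper)
# Point SlurmctldHost at this node and switch select/cons_tres to select/cons_res.
edit_slurm_conf() {
  local conf=$1 host=$2 addr=$3
  sed -i \
    -e "s|^SlurmctldHost=.*|SlurmctldHost=${host}(${addr})|" \
    -e 's|^SelectType=select/cons_tres$|SelectType=select/cons_res|' \
    "$conf"
}

# Example (as root on the head node):
#   edit_slurm_conf /etc/slurm/slurm.conf node01 192.168.1.41
```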

You can create the NodeName lines by running slurmd -C on each of the nodes to get accurate hardware info for each one:

root@node05:~# slurmd -C
    NodeName=node05 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7812
    UpTime=0-02:35:14
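Since slurmd -C already prints most of a NodeName line, a little awk can reshape it into the format used above. This is a sketch: to_node_line is a hypothetical helper, it assumes you pass in each node’s address yourself, and it keeps every field slurmd -C reports (RealMemory included) rather than trimming down to just CPUs like my lines above.

```shell
# to_node_line ADDR  (hypothetical helper)
# Read `slurmd -C` output on stdin and print a NodeName line with
# NodeAddr=ADDR inserted after the name and State=UNKNOWN appended.
# The UpTime line is ignored.
to_node_line() {
  awk -v addr="$1" '
    $1 ~ /^NodeName=/ {
      $1 = $1 " NodeAddr=" addr   # reassigning a field also drops the leading indent
      print $0, "State=UNKNOWN"
    }'
}

# Example, run from the head node for each worker:
#   ssh node02 slurmd -C | to_node_line 192.168.1.42
```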

Slurm now uses cgroups. I’m going to configure Slurm to use them, but make them as UNrestrictive as possible. I just want to make sure everything works, and then I’ll turn up the security later.

Here’s my /etc/slurm/cgroup.conf:

CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
ConstrainCores=no
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=no
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30

Finally, here’s /etc/slurm/cgroup_allowed_devices_file.conf:

/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/work*

Copy the contents of /etc/slurm to /work (the shared directory), since we’ll need these files on the other nodes. While you’re at it, copy /etc/munge/munge.key to /work, too! Every node in the cluster must use the same munge key.
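In command form, staging the files might look like the sketch below. The key file to share is /etc/munge/munge.key, which is what the worker nodes copy later. stage_configs is a hypothetical helper; it copies only the *.conf files from the Slurm directory, and it assumes root can write to the NFS share (many NFS servers map root to an unprivileged user by default, known as root_squash).

```shell
# stage_configs SLURM_DIR MUNGE_KEY DEST  (hypothetical helper)
# Copy the Slurm config files and the munge key into the shared directory
# so the worker nodes can pick them up.
stage_configs() {
  local slurm_dir=$1 munge_key=$2 dest=$3
  cp "$slurm_dir"/*.conf "$dest"/ || return 1
  cp "$munge_key" "$dest"/munge.key || return 1
}

# Example (as root on the head node):
#   stage_configs /etc/slurm /etc/munge/munge.key /work
```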

Let’s enable all of these services:

systemctl enable munge
systemctl enable slurmd
systemctl enable slurmctld

and then reboot.

Set up a slurm client or a few!

The client install is very similar. We’ll follow all these steps on each of the worker nodes. First, let’s get the software installed:

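apt install slurmd slurm-client munge -y

Configuration is easy. Remember all those config files on /work? Let’s copy them into place:

cd /work
cp munge.key /etc/munge/munge.key
cp slurm.conf /etc/slurm/slurm.conf
cp cgroup* /etc/slurm

Once every worker node has its copies, remove the staged files from the share (munge.key in particular is a secret that shouldn’t be left there):

rm /work/munge.key /work/slurm.conf /work/cgroup*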

Nice! Now enable the services:

systemctl enable slurmd
systemctl enable munge

… and reboot!

Testing Our New Raspberry HPC

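First, let’s check that munge is working. Munge is the authentication service the computers in the cluster use to prove who they are to each other. Here’s a simple command that has another node encode a credential and then decodes it locally. It only succeeds if both nodes share the same munge key: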

ssh node01 munge -n | unmunge

Make sure each of your nodes can do this:

root@node03:/etc/slurm# ssh node01 munge -n | unmunge
root@node01's password:
STATUS:          Success (0)
ENCODE_HOST:     node03 (127.0.1.1)
ENCODE_TIME:     2022-03-16 20:52:19 -0400 (1647478339)
DECODE_TIME:     2022-03-16 20:52:19 -0400 (1647478339)
TTL:             300
CIPHER:          aes128 (4)
MAC:             sha256 (5)
ZIP:             none (0)
UID:             root (0)
GID:             root (0)
LENGTH:          0
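Reading the full unmunge report for every pair of nodes gets tedious. A sketch like this loops over the nodes and prints one line each. munge_ok and check_munge are hypothetical helpers, and the loop assumes you can ssh from the current node to each of the others.

```shell
# munge_ok  (hypothetical helper)
# Read `unmunge` output on stdin; succeed only if the STATUS line says Success.
munge_ok() {
  awk '/^STATUS:/ { found = ($2 == "Success") } END { exit !found }'
}

# check_munge NODE...  (hypothetical helper)
# Have each node encode a credential, decode it here, and report the result.
check_munge() {
  local node
  for node in "$@"; do
    if ssh "$node" munge -n </dev/null | unmunge 2>/dev/null | munge_ok; then
      echo "$node: ok"
    else
      echo "$node: FAILED"
    fi
  done
}

# Example:
#   check_munge node01 node02 node03 node04
```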

If that’s working, you can try out some real HPC commands. “sinfo” is a good place to start:

root@node01:~# sinfo -a
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
picluster* up infinite 2 idle node[02-04]

Let’s have a look at the whole cluster:

root@node01:/work# scontrol show partition
PartitionName=picluster
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=node[02-04]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=12 TotalNodes=3 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

Yep. 3 nodes and 12 CPUs! Can you run a job? Let’s get each of the nodes to give the time:

root@node01:~# srun --nodes=3 date
Wed 16 Mar 2022 08:57:52 PM EDT
Wed 16 Mar 2022 08:57:52 PM EDT
Wed 16 Mar 2022 08:57:52 PM EDT
root@node01:~#

Looks like an HPC to me! The next post will cover some more interesting commands! (I promise.)
