Dell Lab Server User and Admin Guide

The Dell T640 (hostname: heisenberg) is an in-lab machine with 40 cores/80 threads. Two Dell T440 machines (hostnames: hartree and fermi) are also available on the local network.

The servers can be accessed from within the lab via SSH:
ssh username@heisenberg.local

ssh username@hartree.local
ssh username@fermi.local

Using Slurm
sbatch to submit a job
squeue to check the job status
scancel to delete the job
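
The submit-then-check routine above can be wrapped in a small helper. This is only a sketch: the function name submit_and_show is hypothetical, and it assumes sbatch and squeue are on the PATH of the login shell.

```shell
# Submit a job script, then show its queue entry.
# Usage: submit_and_show job.sh
submit_and_show() {
    local jobid
    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups).
    jobid="$(sbatch --parsable "$1")" || return 1
    jobid="${jobid%%;*}"          # drop an optional ";cluster" suffix
    echo "Submitted job $jobid"
    squeue -j "$jobid"
}
```

The printed job ID is the value to pass to scancel if the job needs to be deleted.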

Sample job script
#!/bin/bash
#SBATCH -n 30
#SBATCH -p regular
#SBATCH -t 00:30:00
module load mpi/openmpi-x86_64
mpirun -np 30 ./nrlmol_exe

Note: adding the -mca btl ^openib option to the mpirun command hides an InfiniBand-related warning.
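
For reference, a complete job script with that option applied might look like the sketch below. The file name nrlmol_job.sh is hypothetical, and using $SLURM_NTASKS (set by Slurm from the -n value) instead of repeating 30 is an assumption about preference, not something this guide prescribes.

```shell
# Write a complete job script to nrlmol_job.sh (hypothetical file name).
cat > nrlmol_job.sh <<'EOF'
#!/bin/bash
#SBATCH -n 30
#SBATCH -p regular
#SBATCH -t 00:30:00

module load mpi/openmpi-x86_64

# SLURM_NTASKS holds the task count requested with -n (assumed usage).
# "-mca btl ^openib" hides the InfiniBand-related warning.
mpirun -mca btl ^openib -np "$SLURM_NTASKS" ./nrlmol_exe
EOF

# Submit it afterwards with: sbatch nrlmol_job.sh
```

Keeping the task count in one place (the #SBATCH -n line) avoids a mismatch between the allocation and the mpirun process count.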

Environment Modules
MPI is available via an environment module:
module load mpi/openmpi-x86_64

For System Admins
After a reboot, start or restart slurmd, slurmctld, and slurmdbd:
systemctl restart slurmd.service
systemctl restart slurmctld.service
systemctl restart slurmdbd.service
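
The restarts can be combined with a quick status check, as in the sketch below. The function name restart_slurm_daemons is hypothetical, and the order used here (slurmdbd, then slurmctld, then slurmd) is an assumption based on the controller normally needing the accounting daemon to be up first; it is not taken from this guide. Run it as root or via sudo.

```shell
# Restart the three Slurm daemons and report whether each came back up.
restart_slurm_daemons() {
    local svc
    # Assumed order: accounting daemon, controller, then compute daemon.
    for svc in slurmdbd slurmctld slurmd; do
        systemctl restart "${svc}.service" || return 1
        if systemctl is-active --quiet "${svc}.service"; then
            echo "${svc}: active"
        else
            echo "${svc}: NOT active"
        fi
    done
}
```

If a daemon is reported as not active, systemctl status and journalctl -u for that unit are the usual next places to look.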

Frequently used job management commands
To stop a running job and put it back into the queue:
scontrol requeue jobid
scontrol suspend jobid
scontrol requeue jobid
scontrol release jobid

To suspend a running job and then resume it:
scontrol suspend jobid
scontrol resume jobid

To prevent a queued job from starting, and to release it later:
scontrol hold jobid
scontrol release jobid
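
Since hold only makes sense for a job that has not started yet, a guard like the following may help. It is a sketch only: hold_if_pending is a hypothetical name, and it assumes a plain (non-array) job so that squeue returns a single state.

```shell
# Hold a job only if it is still waiting in the queue.
# Usage: hold_if_pending <jobid>
hold_if_pending() {
    local jobid="$1" state
    # -h drops the header line; %T prints the job state (PENDING, RUNNING, ...).
    state="$(squeue -h -j "$jobid" -o '%T')"
    if [ "$state" = "PENDING" ]; then
        scontrol hold "$jobid"
    else
        echo "job $jobid is '${state:-not in queue}'; not holding" >&2
        return 1
    fi
}
```

A job held this way stays in the queue until scontrol release is run for it.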

To see the node states, do:
sinfo

To see why the node went into the DOWN state:
scontrol show node heisenberg
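
A more compact view of a node's state and the recorded reason can be pulled from sinfo, as sketched here. The function name node_status is hypothetical; the format fields used are %N (node name), %T (state) and %E (reason).

```shell
# Print "<node> <state> <reason>" for one node.
# Usage: node_status heisenberg
node_status() {
    # -h: no header, -N: one line per node, -n: restrict to the given node.
    # sort -u removes duplicates when the node belongs to several partitions.
    sinfo -h -N -n "$1" -o '%N %T %E' | sort -u
}
```

The reason column typically reads "none" for a healthy node and carries the admin- or Slurm-supplied message for a DOWN or drained one.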

The node may enter the DRAIN state due to user mishaps.
To undrain the node, do the following:
scontrol update NodeName=heisenberg State=DOWN Reason="undraining"
scontrol update NodeName=heisenberg State=RESUME
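
The two steps can be wrapped so they only run when the node is actually drained or draining. This is a sketch under assumptions: undrain_node is a hypothetical name, and the check relies on sinfo reporting a state that contains the word "drain".

```shell
# Undrain a node using the two scontrol steps above, but only if needed.
# Usage: undrain_node heisenberg
undrain_node() {
    local node="$1" state
    # %T prints the node state, e.g. idle, drained, draining.
    state="$(sinfo -h -N -n "$node" -o '%T' | sort -u)"
    case "$state" in
        *drain*)
            scontrol update NodeName="$node" State=DOWN Reason="undraining" &&
            scontrol update NodeName="$node" State=RESUME
            ;;
        *)
            echo "$node is '$state', not drained; nothing to do"
            ;;
    esac
}
```

Checking the state first avoids briefly marking a healthy node DOWN, which would disturb jobs running on it.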

Disk quotas are set to protect the system from running out of disk space. Admins, follow these steps right after creating a new user.
Kernel option for xfs:
XFS quota guide:

To check the status of disk quotas:
xfs_quota -x -c 'report -h' /

To set a disk quota for a user:
xfs_quota -x -c 'limit bsoft=100G bhard=100G username' /
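
Because this step is repeated for every new account, a small wrapper can guard against typos in the user name. The sketch below is illustrative only: set_user_quota is a hypothetical name, and the 100G default simply mirrors the command above.

```shell
# Set equal soft and hard block limits for a user on the root file system.
# Usage: set_user_quota <username> [limit]   (limit defaults to 100G)
set_user_quota() {
    local user="$1" limit="${2:-100G}"
    # Refuse to continue if the account does not exist.
    if ! id -u "$user" >/dev/null 2>&1; then
        echo "no such user: $user" >&2
        return 1
    fi
    xfs_quota -x -c "limit bsoft=$limit bhard=$limit $user" /
}
```

Running the report command afterwards confirms that the new limit shows up for that user.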

To check the file limit (which should be set to zero) as well as the space limit, do the following:
repquota / -s

Disk quota on ext4 file systems

Set a user quota with edquota username, and check quotas with repquota -augP or repquota -au.