Dell Lab Server User and Admin Guide

The Dell T640 (hostname: heisenberg) is an in-lab machine with 40 cores and 80 threads. Two Dell T440 machines (hostnames: hartree and fermi) are also available on the local network.

Access
The servers can be accessed from within the lab via SSH.
ssh username@heisenberg.local

or
ssh username@hartree.local
ssh username@fermi.local

Using Slurm
sbatch to submit a job
squeue to check the job status
scancel to delete the job
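
The three commands above are usually used together. The following is only a sketch of that cycle: submit_and_show is a made-up helper name, and it assumes a job script such as job.sl exists in the current directory.

```shell
#!/bin/bash
# Sketch: submit a job script, remember its job ID, and show its queue entry.
# submit_and_show is a hypothetical helper name, not part of Slurm.
submit_and_show() {
    local script="${1:-job.sl}"
    local jobid
    # --parsable makes sbatch print only the job ID (optionally "id;cluster").
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}
    echo "Submitted job $jobid"
    # Show only this job in the queue.
    squeue -j "$jobid"
}

# Usage (on a machine where Slurm is installed):
#   submit_and_show job.sl
#   scancel <jobid>     # delete the job if it is no longer needed
```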

Sample job script
job.sl
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 30
#SBATCH -p regular
#SBATCH -t 00:30:00
module load mpi/openmpi-x86_64
mpirun -np 30 ./nrlmol_exe

Note: the sample above does not include it, but adding the option -mca btl ^openib to the mpirun command hides an InfiniBand-related warning.
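
As an illustration of the sample script and the note above, the sketch below writes a job.sl that includes the -mca btl ^openib option and takes the rank count from SLURM_NTASKS, so that -n and -np cannot drift apart. write_job_script is a hypothetical helper name; the partition, time limit, and executable are copied from the sample.

```shell
#!/bin/bash
# Sketch: generate a job script whose mpirun rank count follows the -n request.
# write_job_script is a hypothetical helper; nrlmol_exe is the executable
# used in the sample job script.
write_job_script() {
    local ntasks="$1" outfile="${2:-job.sl}"
    # Unquoted heredoc: ${ntasks} is filled in now, \$SLURM_NTASKS stays literal.
    cat > "$outfile" <<EOF
#!/bin/bash
#SBATCH -N 1
#SBATCH -n ${ntasks}
#SBATCH -p regular
#SBATCH -t 00:30:00
module load mpi/openmpi-x86_64
# SLURM_NTASKS is set by Slurm from the -n request above.
mpirun -mca btl ^openib -np \$SLURM_NTASKS ./nrlmol_exe
EOF
}

# Usage:
#   write_job_script 30 job.sl
#   sbatch job.sl
```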
Reference

Environment Modules
MPI is available through environment modules.
module load mpi/openmpi-x86_64

For System Admins
Upon reboot, you should start/restart slurmd, slurmctld, and slurmdbd.
systemctl restart slurmd.service
systemctl restart slurmctld.service
systemctl restart slurmdbd.service
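
The restarts above can be wrapped in a small loop that also reports whether each daemon came back up. This is a sketch only: restart_slurm_daemons is a made-up name, and the order used here (slurmdbd, then slurmctld, then slurmd) is an assumption based on the controller needing the accounting daemon, not something this guide prescribes.

```shell
#!/bin/bash
# Sketch: restart the Slurm daemons and report whether each one is active.
# restart_slurm_daemons is a hypothetical helper name. Run as root.
restart_slurm_daemons() {
    local svc failed=0
    # Assumed order: accounting daemon, then controller, then compute daemon.
    for svc in slurmdbd slurmctld slurmd; do
        systemctl restart "${svc}.service"
        if systemctl is-active --quiet "${svc}.service"; then
            echo "${svc}: active"
        else
            echo "${svc}: NOT active" >&2
            failed=1
        fi
    done
    # Non-zero return value if any daemon failed to come up.
    return "$failed"
}

# Usage:
#   restart_slurm_daemons || journalctl -u slurmctld -n 50
```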

Commonly used job management commands
To stop a running job and put it back in the queue:
scontrol requeue jobid
or
scontrol suspend jobid
scontrol requeue jobid
scontrol release jobid

To suspend a running job and then resume it:
scontrol suspend jobid
scontrol resume jobid

To prevent a queued job from starting, and to release it later:
scontrol hold jobid
scontrol release jobid
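
Which of the commands above applies depends on the job's current state, so it can help to look the state up first. The sketch below does that for the hold case; job_state and hold_if_pending are hypothetical helper names, and the check against PENDING is an assumption about how an admin would want to guard the hold.

```shell
#!/bin/bash
# Sketch: look up a job's state before acting on it.
# job_state and hold_if_pending are hypothetical helper names.
job_state() {
    # -h drops the header line; %T prints the state (PENDING, RUNNING, ...).
    squeue -h -j "$1" -o %T
}

hold_if_pending() {
    local jobid="$1" state
    state=$(job_state "$jobid")
    if [ "$state" = "PENDING" ]; then
        scontrol hold "$jobid"
        echo "job $jobid held"
    else
        echo "job $jobid is ${state:-not in the queue}; not holding"
    fi
}

# Usage:
#   hold_if_pending 1234
#   scontrol release 1234
```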

To see the node states, do:
sinfo

To see why a node went into the DOWN state:
scontrol show node heisenberg

A node may be put into the DRAIN state as a result of user mishaps.
To undrain the node, do the following:
scontrol update NodeName=heisenberg State=DOWN Reason="undraining"
scontrol update NodeName=heisenberg State=RESUME
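
The two steps above can be combined with a look at the recorded reason, so that the cause is seen before it is cleared. This is a sketch under that assumption; undrain_node is a made-up helper name and the update commands are the same ones listed above.

```shell
#!/bin/bash
# Sketch: show why a node is drained or down, then return it to service.
# undrain_node is a hypothetical helper name. Run as a Slurm administrator.
undrain_node() {
    local node="${1:-heisenberg}"
    # Print the State= and Reason= lines first so the cause is visible.
    scontrol show node "$node" | grep -E 'State=|Reason='
    # Same two steps as in this guide.
    scontrol update NodeName="$node" State=DOWN Reason="undraining"
    scontrol update NodeName="$node" State=RESUME
}

# Usage:
#   undrain_node heisenberg
#   sinfo
```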

1/19/2019
A disk quota is set to protect the system from running out of disk space. Admins, follow these steps right after creating a new user.
Kernel option for xfs: https://help.directadmin.com/item.php?id=557
XFS quota guide: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6...

To check the status of the disk quota:
xfs_quota -x -c 'report -h' /

To set a disk quota for a user:
xfs_quota -x -c 'limit bsoft=100G bhard=100G username' /
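
Since this step is repeated for every new user, it can be wrapped in a small function. The sketch below is one possible form: set_xfs_quota is a hypothetical name, the 100G default mirrors the command above, and the check that the account exists is an added safeguard against typos rather than part of the original procedure.

```shell
#!/bin/bash
# Sketch: apply the same soft and hard block limit to one user on an XFS mount.
# set_xfs_quota is a hypothetical helper name. Run as root.
set_xfs_quota() {
    local user="$1" limit="${2:-100G}" mount="${3:-/}"
    # Refuse to continue if the account does not exist.
    if ! id -u "$user" >/dev/null 2>&1; then
        echo "no such user: $user" >&2
        return 1
    fi
    xfs_quota -x -c "limit bsoft=${limit} bhard=${limit} ${user}" "$mount"
    # Print the report so the new limit can be confirmed.
    xfs_quota -x -c 'report -h' "$mount"
}

# Usage:
#   set_xfs_quota username          # 100G on /
#   set_xfs_quota username 50G /
```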

To check the file limit (which should be set to zero) as well as the space limit:
repquota / -s

10/24/2022
Disk quota on ext4 file systems
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8...

Set a user quota with edquota username

and check the quotas with repquota -augP or repquota -au
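
edquota opens an editor, which is awkward in scripts. A non-interactive alternative from the same quota tools is setquota, which is not mentioned in this guide; the sketch below assumes it is installed and that block limits are given in 1 KiB blocks, as its man page describes. set_ext4_quota is a hypothetical helper name, and the 100 GiB default is an assumption chosen to match the XFS example.

```shell
#!/bin/bash
# Sketch: set a block quota for one user without opening an editor.
# set_ext4_quota is a hypothetical helper name. Run as root.
set_ext4_quota() {
    local user="$1" gib="${2:-100}" fs="${3:-/}"
    # Convert GiB to 1 KiB blocks, the unit assumed here for setquota.
    local kib=$(( gib * 1024 * 1024 ))
    # Arguments: block soft, block hard, inode soft, inode hard (0 = no limit).
    setquota -u "$user" "$kib" "$kib" 0 0 "$fs"
}

# Usage:
#   set_ext4_quota username 100 /
#   repquota -au
```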