The Dell T640 (hostname: heisenberg) is an in-lab machine with 40 cores/80 threads. Two Dell T440 machines (hostnames: hartree and fermi) are also available on the local network.
Access
The servers can be accessed from within the lab via SSH.
ssh username@heisenberg.local
or
ssh username@hartree.local
ssh username@fermi.local
Using Slurm
sbatch to submit a job
squeue to check the job status
scancel to delete the job
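The three commands above can be combined into a small helper. This is only a sketch: the function name submit_and_watch is hypothetical, and job.sl is simply the sample script name used on this page.

```shell
# submit_and_watch: submit a job script, then show its entry in the queue.
submit_and_watch() {
    local script="$1"
    local jobid
    # --parsable makes sbatch print only the job ID (optionally "ID;cluster").
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}          # drop the ";cluster" suffix if present
    echo "Submitted job $jobid"
    squeue -j "$jobid"
}

# Usage:
#   submit_and_watch job.sl
#   scancel <jobid>             # delete the job if it is no longer needed
```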
Sample job script
job.sl
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 30
#SBATCH -p regular
#SBATCH -t 00:30:00
module load mpi/openmpi-x86_64
mpirun -np 30 ./nrlmol_exe
Note: Adding the -mca btl ^openib option to the mpirun command hides an InfiniBand-related warning.
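One possible refinement of the sample script is to take the process count from Slurm instead of repeating the number 30 in two places. The sketch below assumes that SLURM_NTASKS is set inside the job (Slurm exports it when -n is given); the wrapper name mpi_launch is hypothetical.

```shell
# mpi_launch: run an MPI program with as many ranks as Slurm allocated.
mpi_launch() {
    # Fall back to a single rank when run outside a Slurm job.
    local np="${SLURM_NTASKS:-1}"
    # '^openib' is quoted so the shell passes it to mpirun unchanged.
    mpirun -np "$np" -mca btl '^openib' "$@"
}

# In job.sl, after "module load mpi/openmpi-x86_64":
#   mpi_launch ./nrlmol_exe
```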
Reference
Environment Modules
MPI is available via an environment module.
module load mpi/openmpi-x86_64
For System Admins
Upon reboot, you should start/restart slurmd, slurmctld, and slurmdbd.
systemctl restart slurmd.service
systemctl restart slurmctld.service
systemctl restart slurmdbd.service
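The restarts can also be scripted and verified in one pass. This is a minimal sketch, to be run as root; the function name restart_slurm_daemons is hypothetical. It restarts the accounting daemon first and the compute daemon last, which is an assumption of this sketch and differs from the order listed above.

```shell
# restart_slurm_daemons: restart each Slurm service and report its state.
restart_slurm_daemons() {
    local svc
    # Assumed order: database daemon, then controller, then node daemon.
    for svc in slurmdbd slurmctld slurmd; do
        systemctl restart "$svc.service" || {
            echo "failed to restart $svc" >&2
            return 1
        }
        # is-active prints the unit state, e.g. "active" or "failed".
        echo "$svc: $(systemctl is-active "$svc.service")"
    done
}

# Usage:
#   restart_slurm_daemons
```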
Commonly used job management commands
To stop a running job and put it back in the queue:
scontrol requeue jobid
or
scontrol suspend jobid
scontrol requeue jobid
scontrol release jobid
To suspend a running job and then resume it:
scontrol suspend jobid
scontrol resume jobid
To prevent a queued job from starting, and to release it later:
scontrol hold jobid
scontrol release jobid
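Before holding a job it can help to confirm that it is still waiting in the queue. The sketch below does that check first; the function names job_state and hold_if_pending are hypothetical.

```shell
# job_state: print only the state of a job (e.g. PENDING, RUNNING, SUSPENDED).
job_state() {
    # -h drops the header line; -o "%T" selects the job state column.
    squeue -h -j "$1" -o "%T"
}

# hold_if_pending: hold a job only if it has not started yet.
hold_if_pending() {
    local jobid="$1"
    local state
    state=$(job_state "$jobid")
    if [ "$state" = "PENDING" ]; then
        scontrol hold "$jobid"
    else
        echo "job $jobid is ${state:-not in the queue}; not holding" >&2
        return 1
    fi
}

# Usage:
#   hold_if_pending <jobid>
#   scontrol release <jobid>
```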
To see the node states, do:
sinfo
To see why a node went into the DOWN state:
scontrol show node heisenberg
A node may enter the DRAIN state due to user mishaps.
To undrain the node, do the following:
scontrol update NodeName=heisenberg State=DOWN Reason="undraining"
scontrol update NodeName=heisenberg State=RESUME
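A check-then-resume variant can avoid touching a node that is already healthy. This is a sketch under assumptions: the function name undrain_node is hypothetical, and it issues only State=RESUME, skipping the intermediate State=DOWN step shown above.

```shell
# undrain_node: return a drained or down node to service, if needed.
undrain_node() {
    local node="$1"
    local state
    # Pull the first "State=..." field out of the scontrol report.
    state=$(scontrol show node "$node" | grep -o 'State=[^ ]*' | head -n 1)
    case "$state" in
        *DRAIN*|*DOWN*)
            scontrol update NodeName="$node" State=RESUME
            ;;
        *)
            echo "$node is ${state:-unknown}; nothing to do"
            ;;
    esac
}

# Usage:
#   undrain_node heisenberg
```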
1/19/2019
Disk quotas are set to protect the system from running out of disk space. Admins, follow these steps right after creating a new user.
Kernel option for xfs: https://help.directadmin.com/item.php?id=557
XFS quota guide: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6...
To check the status of the disk quotas,
xfs_quota -x -c 'report -h' /
To set a disk quota for a user,
xfs_quota -x -c 'limit bsoft=100G bhard=100G username' /
To check the file limit (which should be set to zero) as well as the space limit,
repquota / -s
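Since the quota has to be set for every new user, the limit and report commands can be wrapped in one helper. This is a minimal sketch to be run as root; the function name set_xfs_quota is hypothetical, and the mount point defaults to / as in the commands above.

```shell
# set_xfs_quota: apply the same soft and hard block limit to a user,
# then print the quota report so the result can be checked.
set_xfs_quota() {
    local user="$1"
    local limit="$2"            # e.g. 100G
    local mount="${3:-/}"
    xfs_quota -x -c "limit bsoft=$limit bhard=$limit $user" "$mount"
    xfs_quota -x -c 'report -h' "$mount"
}

# Usage:
#   set_xfs_quota username 100G
```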
10/24/2022
Disk quota on ext4 file systems
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8...
Set a user quota with edquota username
and check quotas with repquota -augP
or repquota -au
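edquota opens an editor, so a non-interactive alternative may be handy in scripts. The sketch below uses setquota from the same quota tools instead; this is a swapped-in command, not the method described above, and the function name set_ext4_quota and the /home mount point in the usage line are hypothetical.

```shell
# set_ext4_quota: set equal soft and hard block limits for a user.
set_ext4_quota() {
    local user="$1"
    local gib="$2"              # limit in GiB, e.g. 100
    local fs="$3"               # file system or mount point
    # setquota takes block limits in 1 KiB units.
    local kib=$((gib * 1024 * 1024))
    # Arguments: block-soft block-hard inode-soft inode-hard.
    # An inode limit of 0 means no file-count limit.
    setquota -u "$user" "$kib" "$kib" 0 0 "$fs"
}

# Usage:
#   set_ext4_quota username 100 /home
#   repquota -au
```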