Quick start
This section gives you the essential information to get started on MeluXina if you are already familiar with supercomputing environments.
For additional details on using our supercomputing facilities, see the following sections, starting with the Connecting page.
UPPERCASE words in the commands below highlight information that you should replace with the actual value, e.g. THEQOS -> short.
Access MeluXina by SSH with the user account you've received: `ssh YOURUSER@login.lxp.lu -p 8822`

- Access is performed using SSH keys: to log into MeluXina you'll need the private key counterpart to the public key you've provided us (see the connection example after this list).
- MeluXina has 4 login nodes. Connecting to `login.lxp.lu` should be preferred, but connecting to `login[01-04].lxp.lu` directly is possible (depending on availability).
- Login nodes are meant only for job submission and monitoring, not for computation or long-running tasks (including data-intensive processes).
- Login nodes do not have access to the user software environment (it is only present on compute nodes); compilation and execution tasks must be performed on compute nodes.
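As a concrete illustration of the points above, the sketch below shows how to select your private key explicitly and how to target an individual login node. The key path is an assumption; replace it with your own.

```bash
# Explicitly select the private key matching the public key you registered
# (the key path below is an assumption -- adjust it to your own setup)
ssh -i ~/.ssh/id_ed25519_mlux -p 8822 YOURUSER@login.lxp.lu

# Optional: target a specific login node if login.lxp.lu is busy
ssh -i ~/.ssh/id_ed25519_mlux -p 8822 YOURUSER@login02.lxp.lu
```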
SLURM is the MeluXina job scheduler and resource management system (the most common commands are combined in the example session after this list):

- See the available partitions (queues), their node count and state with: `sinfo`
- See the current job queue with: `squeue -l`
- See the list of SLURM accounts you have access to with: `sacctmgr show user $USER withassoc`
- See the list of QOS that prioritize and set job constraints with: `sacctmgr show qos`
- For performance assurance and security, job scheduling is done with full nodes. Requesting one task on one compute node gives you access to the complete node: all of its cores, memory and (if applicable) GPU or FPGA accelerators.
- Compute nodes have HyperThreading enabled; to ensure the use of only physical cores, use the `--hint=nomultithread` option at job submission.
- See the hardware configuration of a compute node with: `scontrol show node THENODE`
- See the list of SLURM features defined on compute nodes with: `sinfo -o %N,%f`
- See the node reservations you have access to with: `sinfo -T` or `scontrol show res`
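Put together, a first orientation session on a login node could look like the following sketch; `mel0001` is just one node from the cpu partition range, pick any node listed by `sinfo`.

```bash
sinfo                                  # partitions, node counts and states
squeue -l -u $USER                     # your jobs in the queue, long format
sacctmgr show user $USER withassoc     # SLURM accounts (projects) you can charge
sacctmgr show qos                      # QOS definitions and limits
scontrol show node mel0001             # hardware details of one compute node
sinfo -o "%N,%f"                       # features defined on the nodes
sinfo -T                               # reservations you have access to
```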
SLURM partitions defined on MeluXina (an example of inspecting a single partition follows the table):

| Partition | Nodes | Default Time | Max. Time | Description |
|---|---|---|---|---|
| cpu | mel[0001-0573] | No default, users must specify a time limit | Set by QOS | Default partition, MeluXina Cluster Module |
| gpu | mel[2001-2200] | No default, users must specify a time limit | Set by QOS | MeluXina Accelerator Module - GPU Nodes |
| fpga | mel[3001-3020] | No default, users must specify a time limit | Set by QOS | MeluXina Accelerator Module - FPGA Nodes |
| largemem | mel[4001-4020] | No default, users must specify a time limit | Set by QOS | MeluXina Large Memory Module |
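To inspect a single partition from the table above, `sinfo` accepts a partition filter; a couple of illustrative invocations:

```bash
sinfo -p gpu               # state of the GPU partition only
sinfo -p largemem -N -l    # node-oriented, long listing of the large memory nodes
```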
QOS for jobs on MeluXina, enabling various usage modes of the computational resources:

| QOS | Max. Time (hh:mm) | Max. nodes per job | Max. jobs per user | Priority | Used for |
|---|---|---|---|---|---|
| dev | 06:00 | 1 | 1 | Regular | Interactive executions for code/workflow development, with a maximum of 1 job per user; QOS linked to special reservations |
| test | 00:30 | 5% | 1 | High | Testing and debugging, with a maximum of 1 job per user |
| short | 06:00 | 5% | No limit | Regular | Small jobs for backfilling |
| short-preempt | 06:00 | 5% | No limit | Regular | Small jobs for backfilling; can be preempted by the urgent QOS |
| default | 48:00 | 25% | No limit | Regular | Standard QOS for production jobs |
| long | 144:00 | 5% | 1 | Low | Non-scalable executions, with a maximum of 1 job per user |
| large | 24:00 | 70% | 1 | Regular | Very large scale executions by special arrangement, max 1 job per user, run once every two weeks (Sun) |
| urgent | 06:00 | 5% | No limit | Very high | Urgent computing needs, by special arrangement; jobs in this QOS can preempt the short-preempt QOS |
Development/interactive jobs using the `dev` QOS must be run within the following SLURM reservations:

| Reservation name | Corresponding node partition | Nodes maintained available |
|---|---|---|
| cpudev | cpu | 5 |
| gpudev | gpu | 5 |
| fpgadev | fpga | 1 |
| largememdev | largemem | 1 |
The above reservations are self-extending, trying to maintain a pool of compute nodes readily available for interactive development.
Running jobs:

- Interactive jobs are run with `salloc` + options
- Job steps are run with `srun` + options
- Passive/batch jobs are run with `sbatch` + options (a sketch of such a batch script follows the examples below)
- You always need to specify the project account your job will charge: `salloc/sbatch -A YOURPROJECT`
- You always need to specify the QOS your job will use: `salloc/sbatch -q THEQOS`
A few examples:

- Run an interactive development job on a CPU node for 2 hours: `salloc -A YOURPROJECT --reservation=cpudev -q dev -N 1 -t 2:0:0`
- Run a job script on 140 GPU nodes (560 GPUs) for 12 hours: `sbatch -A YOURPROJECT -p gpu -q large -N 140 -t 12:0:0 YOURSCRIPT`
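For batch jobs, the same options can be placed inside the job script itself. A minimal sketch of what `YOURSCRIPT` could contain for a CPU-only run; the module name, task count and executable are placeholders, not actual MeluXina values:

```bash
#!/bin/bash -l
#SBATCH --account=YOURPROJECT       # project account to charge (mandatory)
#SBATCH --qos=default               # QOS to use (mandatory)
#SBATCH --partition=cpu             # cpu, gpu, fpga or largemem
#SBATCH --nodes=2                   # full nodes are allocated in any case
#SBATCH --ntasks-per-node=4         # illustrative value; set it to your application's needs
#SBATCH --time=04:00:00             # wall-clock time limit
#SBATCH --hint=nomultithread        # use physical cores only

# Modules are only available on compute nodes, so load them inside the job
module load THESOFTWARE             # placeholder module name

# Launch the application as a job step across the allocated nodes
srun ./your_application             # placeholder executable
```

With the directives embedded, the script is submitted simply with `sbatch YOURSCRIPT` after replacing the placeholders.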
Data locations:

- Users have a private Home directory `/home/users/YOURUSER`, and access to the directories of the Projects they're part of.
- Projects have:
    - a Home directory `/project/home/THEPROJECT`, and/or
    - a Scratch directory `/project/scratch/THEPROJECT`
- Compute & data resource allocation quotas and their utilization in the current scheduling period can be viewed with the `myquota` tool on the login nodes (see the example below).
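For example, to check your current allocations and prepare a working area in project scratch (the per-user subdirectory is only a suggested convention, not a MeluXina requirement):

```bash
# On a login node: show compute & data quotas and their current utilization
myquota

# Keep large or temporary data in the project scratch area
mkdir -p /project/scratch/THEPROJECT/$USER
cd /project/scratch/THEPROJECT/$USER
```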
Software environment:

- We're using EasyBuild and Lmod for the user software environment.
- The software environment is only available on compute nodes.
- Discover the modules providing access to the different software packages with: `module avail`
- Search for a specific application in the current production software environment with: `module av THESOFTWARE`
- Load the profile of a specific application together with its dependencies with: `module load THESOFTWARE` (a typical workflow is sketched below)
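On a compute node, a typical session with the software environment then looks like the following sketch; THESOFTWARE stands for any module shown by `module avail`:

```bash
module avail                 # list all modules in the current software environment
module av THESOFTWARE        # search for a specific application
module load THESOFTWARE      # load it together with its dependencies
module list                  # confirm which modules are currently loaded
```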
We welcome your requests for information and support either at servicedesk.lxp.lu or by email to servicedesk [at] lxp.lu.