Quick start

This section gives you the essential information to get started on MeluXina if you are already familiar with supercomputing environments.
For additional details on using our supercomputer facilities, see the following sections, starting with the Connecting page. UPPERCASE words in the commands below are placeholders that you should replace with the actual value, e.g. THEQOS -> short.

Access MeluXina by SSH with the user account you've received: ssh YOURUSER@login.lxp.lu -p 8822

  • Access is performed using SSH keys; to log in to MeluXina you will need the private key counterpart to the public key you have provided us. A sample SSH client configuration is sketched after this list.
  • MeluXina has 4 login nodes; accessing login.lxp.lu is preferred, but connecting directly to login[01-04].lxp.lu is possible (depending on availability).
  • Login nodes are meant only for job submission and monitoring, not for computation or long-running tasks (including data-intensive processes).
  • Login nodes do not have access to the user software environment (it is only present on compute nodes); compilation and execution tasks must be performed on compute nodes.
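
For convenience, the connection details can be kept in your SSH client configuration. A minimal sketch, assuming the alias meluxina and a private key stored at ~/.ssh/id_ed25519_meluxina (both are assumptions, adjust them to your own setup):

```
# ~/.ssh/config -- example entry for MeluXina
Host meluxina
    HostName login.lxp.lu
    Port 8822
    User YOURUSER
    # The path below is an assumption; point it at your own private key.
    IdentityFile ~/.ssh/id_ed25519_meluxina
```

With this entry in place, ssh meluxina is equivalent to the full command above.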

SLURM is the MeluXina job scheduler and resource management system:

  • See the available partitions (queues), their node count and state with: sinfo
  • See the current job queue with: squeue -l
  • See the list of SLURM accounts you have access to with: sacctmgr show user $USER withassoc
  • See the list of QOS that prioritize and set job constraints with: sacctmgr show qos
  • For performance assurance and security, job scheduling is done with full nodes. Requesting one task on one compute node will give you access to the complete node, all of its cores, memory and (if applicable) GPU or FPGA accelerators.
  • Compute nodes have HyperThreading enabled; to ensure the use of only physical cores, use the --hint=nomultithread option at job submission (see the sketch after this list).
  • See the hardware configuration for compute nodes with: scontrol show node THENODE
  • See the list of SLURM features defined on compute nodes with: sinfo -o %N,%f
  • See the node reservations you have access to with: sinfo -T or scontrol show res
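
To illustrate the HyperThreading note above, a minimal sketch of an interactive allocation restricted to physical cores (the project account, QOS and walltime are example values to adapt):

```bash
# Allocate one full CPU node with tasks bound to physical cores only,
# then run a simple job step inside the allocation.
salloc -A YOURPROJECT -p cpu -q short -N 1 -t 0:30:0 --hint=nomultithread
srun hostname
```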

SLURM partitions defined on MeluXina:

| Partition | Nodes          | Default Time                   | Max. Time             | Description                               |
|-----------|----------------|--------------------------------|-----------------------|-------------------------------------------|
| cpu*      | mel[0001-0573] | no default, users must specify | time limit set by QOS | Default partition, MeluXina Cluster Module |
| gpu       | mel[2001-2200] | no default, users must specify | time limit set by QOS | MeluXina Accelerator Module - GPU Nodes   |
| fpga      | mel[3001-3020] | no default, users must specify | time limit set by QOS | MeluXina Accelerator Module - FPGA Nodes  |
| largemem  | mel[4001-4020] | no default, users must specify | time limit set by QOS | MeluXina Large Memory Module              |
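
Before submitting, you can restrict sinfo to a single partition from the table above to check its current state; for example:

```bash
# Summary view of the GPU partition: availability, time limits, node states.
sinfo -p gpu
# Node-oriented long listing for the same partition.
sinfo -p gpu -N -l
```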

QOS for jobs on MeluXina, enabling various usage modes of the computational resources:

| QOS           | Max. Time (hh:mm) | Max. nodes per job | Max. jobs per user | Priority  | Used for                                                                              |
|---------------|-------------------|--------------------|--------------------|-----------|---------------------------------------------------------------------------------------|
| dev           | 06:00             | 1                  | 1                  | Regular   | Interactive executions for code/workflow development; QOS linked to special reservations |
| test          | 00:30             | 5%                 | 1                  | High      | Testing and debugging                                                                 |
| short         | 06:00             | 5%                 | No limit           | Regular   | Small jobs for backfilling                                                            |
| short-preempt | 06:00             | 5%                 | No limit           | Regular   | Small jobs for backfilling (may be preempted by the urgent QOS)                       |
| default       | 48:00             | 25%                | No limit           | Regular   | Standard QOS for production jobs                                                      |
| long          | 144:00            | 5%                 | 1                  | Low       | Non-scalable executions                                                               |
| large         | 24:00             | 70%                | 1                  | Regular   | Very large scale executions by special arrangement, run once every two weeks (Sun)    |
| urgent        | 06:00             | 5%                 | No limit           | Very high | Urgent computing needs by special arrangement; can preempt jobs in the short-preempt QOS |
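
The same limits can also be queried from SLURM directly; a sketch using sacctmgr output formatting (the exact field names accepted may vary slightly with the SLURM version):

```bash
# Compact view of QOS priorities and limits as configured in SLURM.
sacctmgr show qos format=Name,Priority,MaxWall,MaxTRESPU,MaxJobsPU
```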

Development/interactive jobs using the dev QOS must be run within the following SLURM reservations:

| Reservation name | Corresponding node partition | Nodes maintained available |
|------------------|------------------------------|----------------------------|
| cpudev           | cpu                          | 5                          |
| gpudev           | gpu                          | 5                          |
| fpgadev          | fpga                         | 1                          |
| largememdev      | largemem                     | 1                          |

The above reservations are self-extending; they aim to maintain a pool of compute nodes readily available for interactive development.
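
For instance, an interactive development session on a GPU node combines the dev QOS with the matching reservation from the table above (a sketch; adjust the project account, node count and walltime to your needs):

```bash
# One GPU node from the self-extending gpudev reservation, for 1 hour.
salloc -A YOURPROJECT -p gpu --reservation=gpudev -q dev -N 1 -t 1:0:0
```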

Running jobs:

  • Interactive jobs are run with: salloc + options
  • Job steps are run with: srun + options
  • Passive/batch jobs are run with: sbatch + options (a batch script sketch follows the examples below)
  • You always need to specify the project account your job will be charged to: salloc/sbatch -A YOURPROJECT
  • You always need to specify the QOS your job will use: salloc/sbatch -q THEQOS

A few examples:

  1. Run an interactive development job on a CPU node for 2 hours: salloc -A YOURPROJECT --res cpudev -q dev -N 1 -t 2:0:0
  2. Run a job script on 140 GPU nodes (560 GPUs) for 12 hours: sbatch -A YOURPROJECT -p gpu -q large -N 140 -t 12:0:0 YOURSCRIPT
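
As a complement to example 2, a minimal batch script sketch (the task layout, module and application names are assumptions to adapt to your own case):

```bash
#!/bin/bash -l
#SBATCH --account=YOURPROJECT        # project account to charge
#SBATCH --partition=gpu              # MeluXina Accelerator Module - GPU Nodes
#SBATCH --qos=default                # standard QOS for production jobs
#SBATCH --nodes=2                    # full nodes are always allocated
#SBATCH --ntasks-per-node=4          # assumption: one task per GPU on a GPU node
#SBATCH --time=02:00:00
#SBATCH --hint=nomultithread         # use physical cores only

# The software environment is only available on compute nodes.
module load THESOFTWARE
srun ./YOURAPPLICATION               # launch the job step across the allocation
```

Submit it with sbatch YOURSCRIPT; the same directives can equally be passed on the sbatch command line, as in the examples above.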

Data locations:

  • Users have a private Home directory /home/users/YOURUSER, and access to the directories of the Projects they are part of.
  • Projects have:
    • a Home directory /project/home/THEPROJECT, and/or
    • a Scratch directory /project/scratch/THEPROJECT
  • Compute & data resource allocation quotas and utilization in the current scheduling period can be viewed with the myquota tool on the login nodes

Software environment:

  • We're using EasyBuild and LMod for the user software environment.
  • The software environment is only available on compute nodes.
  • Discover the modules providing access to the different software packages with: module avail
  • Search for a specific application in the current production software environment with: module av THESOFTWARE
  • Load the profile of a specific application together with its dependencies with: module load THESOFTWARE (see the sketch after this list)
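
Since the software environment is only available on compute nodes, module commands are typically run from inside an allocation; a sketch (project account, QOS and walltime are example values):

```bash
# Get an interactive allocation first; modules are not available on login nodes.
salloc -A YOURPROJECT -p cpu -q short -N 1 -t 0:30:0

# Then, from the allocated compute node:
module avail                  # list the production software environment
module av THESOFTWARE         # search for a specific application
module load THESOFTWARE       # load it together with its dependencies
module list                   # show currently loaded modules
```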

We welcome your requests for information and support either at servicedesk.lxp.lu or by email to servicedesk [at] lxp.lu.