
FAQ

Frequently asked questions are collected here; check back regularly!

When connecting or transferring data

  • Connection timed out message when connecting to MeluXina

    • Ensure you are using the correct port (8822), e.g. ssh yourlogin@login.lxp.lu -p 8822
    • Ensure that your organization is not blocking access to port 8822
    • Ensure that you are connecting to the master login address (login.lxp.lu) and not a specific login node (login[01-04].lxp.lu) as it may be under maintenance
    • Check the MeluXina Weather Report for any ongoing platform events announced by our teams
  • Permission denied when using ssh yourlogin@login.lxp.lu -p 8822

    • Ensure you are using the correct SSH key
    • Ensure you have added your SSH key to the SSH agent, e.g. ssh-add ~/.ssh/id_ed25519_mlux
  • Too many authentication failures when using ssh yourlogin@login.lxp.lu -p 8822

    • You may have too many SSH keys (more than 6). Ensure you use only the correct one, e.g. with ssh yourlogin@login.lxp.lu -p 8822 -i ~/.ssh/id_ed25519_mlux -o "IdentitiesOnly yes", or by setting both the IdentityFile ~/.ssh/id_ed25519_mlux and IdentitiesOnly yes directives in your .ssh/config file
  • Failed setting locale from environment variables when using ssh yourlogin@login.lxp.lu -p 8822

    • You may be using a special locale; try connecting with LC_ALL="en_US.UTF-8" ssh yourlogin@login.lxp.lu -p 8822
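
The SSH options above can be kept in a client configuration stanza so they do not have to be retyped on every connection. The following is a minimal sketch, assuming an OpenSSH client: the alias meluxina is a hypothetical name chosen for illustration, and yourlogin and the key path are the placeholders used above. The stanza is written to a scratch file first so it can be reviewed before being added to ~/.ssh/config:

```shell
# Write a sample stanza to a scratch file.
# "meluxina" is a hypothetical alias; replace yourlogin and the key path with your own.
snippet="$(mktemp)"
cat > "$snippet" <<'EOF'
Host meluxina
    HostName login.lxp.lu
    Port 8822
    User yourlogin
    IdentityFile ~/.ssh/id_ed25519_mlux
    IdentitiesOnly yes
EOF

# Show the stanza for review.
cat "$snippet"

# After reviewing, append it to your SSH client configuration:
#   cat "$snippet" >> ~/.ssh/config
# and connect with:
#   ssh meluxina
# To see which keys are offered during a failing login, add -v:
#   ssh -v meluxina
```

With IdentitiesOnly yes in place, the client should offer only the listed key, which is intended to avoid the "Too many authentication failures" case described above.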

When starting jobs

  • Job submit/allocate failed: Invalid account or account/partition combination specified when starting a job with sbatch or salloc

    • Ensure that you are specifying the SLURM account (project) you will debit for the job, with -A ACCOUNT on the command line or #SBATCH -A ACCOUNT directive in the launcher script
  • Job submit/allocate failed: Time limit specification required, but not provided when starting a job with sbatch or salloc

    • Ensure that you are providing a time limit for your job, with -t timelimit or #SBATCH -t timelimit (timelimit in HH:MM:SS format)
  • My job is not starting

    • Jobs wait in the queue with a PD (Pending) status until the SLURM job scheduler finds resources matching your request and can launch your job; this is normal. In the squeue output, the NODELIST(REASON) column shows why the job has not yet started.
    • Common job reason codes:

      • Priority: one or more higher-priority jobs are ahead of yours in the queue. Your job will eventually run; you can check the estimated StartTime using scontrol show job $JOBID.
      • AssocGrpGRES: you are submitting a job to a partition you don't have access to.
      • AssocGrpGRESMinutes: you have insufficient node-hours on your monthly compute allocation for the partition you are requesting.
    • If the job does not start for a long time, check the MeluXina Weather Report for any ongoing platform events announced by our teams. If no events are announced, raise a support ticket in our ServiceDesk.
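
The submission requirements and queue checks above can be sketched as a minimal launcher plus a small status helper. This is an illustration under stated assumptions, not a recommended template: ACCOUNT is the placeholder used above, the 10-minute limit and single node are arbitrary, srun hostname stands in for a real workload, and job_status is a hypothetical helper name. Your project may need further options that are not covered here.

```shell
# Write a minimal launcher to a scratch file.
launcher="$(mktemp)"
cat > "$launcher" <<'EOF'
#!/bin/bash -l
# Project account to debit (placeholder, replace with your own).
#SBATCH -A ACCOUNT
# Time limit in HH:MM:SS.
#SBATCH -t 00:10:00
# Number of nodes (illustrative).
#SBATCH -N 1

srun hostname
EOF

# Check the launcher for shell syntax errors without running it.
bash -n "$launcher" && echo "launcher OK: $launcher"

# Hypothetical helper: print the state and NODELIST(REASON) value of a job,
# followed by its estimated start time.
# Usage (on MeluXina): job_status JOBID
job_status() {
    squeue -j "$1" -o "%i %t %R"
    scontrol show job "$1" | grep -o 'StartTime=[^ ]*'
}

# Submit with:
#   sbatch "$launcher"
# then pass the job ID printed by sbatch to job_status.
```

The same account and time limit can instead be given on the command line, e.g. sbatch -A ACCOUNT -t 00:10:00, in which case the corresponding #SBATCH lines are not needed.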

When running applications

  • -bash: module: command not found when trying to browse the software stack or load a module

    • Ensure you are not trying to run module commands on login nodes (all computing must be done on compute nodes, as login nodes do not have access to the EasyBuild modules system)
    • Ensure that your launcher script starts with #!/bin/bash -l (lowercase L), which enables the modules system
  • Open MPI's OFI driver detected multiple equidistant NICs from the current process message when using MPI code

    • The warning can be ignored; it will be resolved in a future PMIx update
  • mm_posix.c:194 UCX ERROR open(file_name=/proc/9791/fd/41 flags=0x0) failed: No such file or directory when running MPI programs compiled with OpenMPI.

    • The problem is resolved by exporting the following environment variable: export OMPI_MCA_pml=ucx
  • My job cannot access a project home or scratch directory, and it used to work

    • Ensure that the project folder's permissions (ls -l /path/to/directory) have not been changed and still allow you access
    • Check the MeluXina Weather Report for any ongoing platform events announced by our teams (especially for the Data storage category)
  • My job is crashing, and it used to work

    • Ensure your environment (software you are using, input files, way of launching the jobs, etc.) has not changed
    • Ensure you have kept your software environment up-to-date with our production software stack releases
    • Check the MeluXina Weather Report for any ongoing platform events announced by our teams
    • Raise a support ticket in our ServiceDesk and we will check together with you
  • My multi-node GPU job shows low bandwidth

    • If you experience low bandwidth when using MPI with GPUs, the following variable might help increase bandwidth: UCX_MAX_RNDV_RAILS=1. See here for more details.
    • If you experience low bandwidth when using NCCL with GPUs, the following variable might help increase bandwidth: NCCL_CROSS_NIC=1. See here for more details.
  • MPI/IO abnormally slow with OpenMPI.

    • OMPIO is included in OpenMPI and has been used by default for MPI/IO API functions since the 2.x versions. However, OMPIO has sometimes been found to cause severe bugs, data corruption and performance issues. Set the OMPI_MCA_io=romio321 variable to switch to the ROMIO component of the io framework in OpenMPI
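
The environment variables mentioned in this section can be set in the launcher script before the application is started. The sketch below simply collects them in one place; each one addresses a specific symptom described above, so set only those that apply to your job. The romio321 component name is taken from the text above and is assumed to match the OpenMPI build in use; the io components available in a given build can be listed with ompi_info.

```shell
# Use the UCX point-to-point layer (for the mm_posix.c UCX ERROR above).
export OMPI_MCA_pml=ucx

# May increase bandwidth for multi-node GPU jobs using MPI.
export UCX_MAX_RNDV_RAILS=1

# May increase bandwidth for multi-node GPU jobs using NCCL.
export NCCL_CROSS_NIC=1

# Switch MPI/IO from OMPIO to the ROMIO component.
export OMPI_MCA_io=romio321

# Print the resulting settings so they appear in the job output.
env | grep -E '^(OMPI_MCA_|UCX_|NCCL_)' | sort
```

Printing the settings at the start of a job makes it easier to confirm later which tuning options a given run actually used.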

When citing us

  • Acknowledgements
    • Add us to your publications' acknowledgement sections using the following template:


The simulations were performed on the Luxembourg national supercomputer MeluXina.
The authors gratefully acknowledge the LuxProvide teams for their expert support.

  • Citing LuxProvide
    • Use the LuxProvide logo and the LuxProvide color palette:

[Images: LuxProvide logo and LuxProvide color palette]

  • Citing MeluXina
    • Use the MeluXina logo and the MeluXina color palette:

[Images: MeluXina logo and MeluXina color palette]