What's new on MeluXina

System

2023-10-30: Upgrade to MOFED 5.8-3.0.7.0

The NVIDIA MOFED driver has been upgraded to the long-term support (LTS) version 5.8-3.0.7.0. This update includes important bug fixes and adds support for the latest operating systems. For a detailed overview of the changes, please check the release notes.

2023-10-30: NVIDIA Driver Update

To support the latest CUDA 12.x releases, we are updating the NVIDIA driver to version 535.104.12. This will enable you to take full advantage of the latest GPU-accelerated applications and frameworks.

For a detailed list of the changes in NVIDIA driver version 535.104.12, please see the official release notes: NVIDIA Driver Version 535.104.12 Release Notes.

2023-10-30: Upgrade to Rocky 8.8

The login and compute nodes have been migrated to RHEL/Rocky 8.8. This updated version will bring improved performance, security, and compatibility with the latest software packages.

2023-08-10: Changes to the default Lustre striping policy

A default OST (Object Storage Target) striping policy has been set on both the Tier1 (scratch) and Tier2 (project/home) Lustre filesystems. This affects only new files; existing files are not impacted.

Previously no policy was set, meaning files were not striped by default over multiple OSTs.

See the Lustre OST striping section for more details.

Info

The new policy is meant to improve resiliency and potentially data access performance.
If this is not the case for your workload, you can revert a directory and its subdirectories to the prior behaviour with the following command: lfs setstripe -E -1 -c 1 -S 1M <directory>
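
As a minimal sketch of the revert described above, the following inspects the default layout of a directory, applies the single-stripe layout, and checks the result. The `TARGET_DIR` variable and its default path are placeholders chosen for illustration; only the `lfs setstripe` invocation comes from the text above. In that command, `-E -1` defines a single component extending to the end of the file, `-c 1` sets a stripe count of one OST, and `-S 1M` sets a 1 MiB stripe size.

```shell
# Hypothetical directory; replace with your own project or scratch path.
TARGET_DIR="${TARGET_DIR:-$PWD/my_data}"

if command -v lfs >/dev/null 2>&1 && [ -d "$TARGET_DIR" ]; then
    # Show the default layout currently set on the directory itself.
    lfs getstripe -d "$TARGET_DIR"
    # Revert to the prior behaviour: one component, one OST, 1 MiB stripe size.
    lfs setstripe -E -1 -c 1 -S 1M "$TARGET_DIR"
    # Show the layout again to confirm the change.
    lfs getstripe -d "$TARGET_DIR"
else
    # Guard so the sketch does nothing harmful outside a Lustre filesystem.
    echo "lfs not available or $TARGET_DIR missing" >&2
fi
```

As with the default policy, a layout set on a directory should only apply to files created afterwards; files that already exist keep their current striping.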

2023-06-15: Upgrade to Rocky 8.7

The compute node OS has been upgraded from Rocky Linux version 8.6 to 8.7, with the latest bug fixes and security patches.

Info

No impact on user applications is expected and there are no changes to the MeluXina software stack.

2023-06-15: Upgrade to MOFED 5.8-2.0.3.0

The NVIDIA MOFED driver has been upgraded to LTS version 5.8-2.0.3.0, with critical bug fixes and support for the new OS. See the release notes for the detailed changes.

2023-06-15: Upgrade to NVIDIA driver 515.105.01

The upgrade of the NVIDIA driver on the GPU nodes fixes the following issues:

  • Resolved an issue that sometimes caused abnormal BAR1 memory usage.
  • Fixed a race condition that can arise when calling cudaFreeAsync() and cudaDeviceSynchronize() from different threads.
  • Fixed an issue where the NVIDIA Linux GPU kernel driver was calling the Linux kernel scheduler while holding a lock with preemption disabled during event notification.

Info

The CUDA Toolkit 11 version remains 11.7.

2023-06-15: Upgrade to Lustre client driver 2.14.0

The Lustre client driver has been upgraded to the 2.14 release, with expected performance improvements.

SLURM & ParaStation

2023-12-15: Slurm update to 23.02.7

SLURM has been upgraded from version 23.02.6 to 23.02.7, which addresses a range of recently discovered security issues identified as CVE-2023-49933 through CVE-2023-49938. More details can be found here.

2023-10-30: Slurm update to 23.02.6 and new energy monitoring

SLURM has been upgraded from version 22.05.9 to 23.02.6 with additional functionality and numerous fixes.

Info

No impact on user jobs or launcher scripts is expected with the new Slurm version.

New energy monitoring is now in place for Slurm jobs using the srun launcher, see the complete details in energy monitoring.

For a full list of changes since SLURM 22.05.9 please see the SchedMD SLURM release notes. Note that not all updates may apply to MeluXina, as the job launcher daemon on the compute nodes is not the upstream slurmd.

2023-10-30: Upgrade to psmgmt 5.1.57-2

psmgmt has been upgraded from version 5.1.55-5 to 5.1.57-2 with bug fixes. For a full list of changes since version 5.1.55-5 please see the psmgmt Change log.

Info

The Process Management Interface for Exascale (PMIx) component has been updated to 4.2.6, improving inter-process communication and resource management for your applications. The upcoming software stack (2023.1) will therefore feature PMIx 4.2.6.

2023-06-15: Upgrade to SLURM 22.05.9

SLURM has been upgraded from version 22.05.5 to 22.05.9 with additional functionality and numerous fixes.

Info

No impact on user jobs or launcher scripts is expected.

For a full list of changes since SLURM 22.05.5 please see the SchedMD SLURM release notes. Note that not all updates may apply to MeluXina, as the job launcher daemon on the compute nodes is not the upstream slurmd.

2023-06-15: Upgrade to psmgmt 5.1.55-2

psmgmt has been upgraded from version 5.1.52-5 to 5.1.55-2 with bug fixes. For a full list of changes since version 5.1.52-5 please see the psmgmt Change log.

2022-11-24: Upgrade to psmgmt 5.1.52-5

psmgmt has been upgraded to version 5.1.52-5, which includes the following bug fixes among others:

Info

  • Bugfix: prevent segfault due to late REQUEST_LAUNCH_TASKS message. This fix prevents large-scale jobs from hanging.
  • Bugfix: prevent psslurm from segfaulting by tracking init of basic configuration.

For a full list of changes since psmgmt 5.1.52-2, please see the psmgmt Change log.

2022-11-14: Upgrade to SLURM 22.05.5

SLURM has been upgraded from 21.08.8-2 to 22.05.5, a major new version with additional functionality, security fixes and also breaking changes.

Warn

Highlighted change in this release that may impact your jobs: srun will no longer read in SLURM_CPUS_PER_TASK. This means you have to explicitly specify --cpus-per-task (or -c) in your srun calls, or set the new SRUN_CPUS_PER_TASK environment variable, to get the correct result.
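
The srun change highlighted above can be handled in a job script in either of the two ways mentioned. The sketch below is illustrative only: the resource values and the `./my_app` binary are hypothetical, and site-specific options such as account, partition and QOS are left out.

```shell
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --time=00:10:00

# Option 1: export the new variable once so every srun call below inherits it.
# The :-1 fallback only matters when the script runs outside of a Slurm job.
export SRUN_CPUS_PER_TASK="${SLURM_CPUS_PER_TASK:-1}"

# Threaded applications usually want the same value for their thread count.
export OMP_NUM_THREADS="$SRUN_CPUS_PER_TASK"

# Option 2: pass the value explicitly on each srun call.
# ./my_app is a hypothetical binary; the guard keeps the sketch harmless
# on machines where srun is not installed.
if command -v srun >/dev/null 2>&1; then
    srun --cpus-per-task="$SRUN_CPUS_PER_TASK" ./my_app
fi
```

Either option alone is sufficient. Exporting the variable is convenient when a script contains several srun calls, while the explicit flag makes the setting visible at each call site.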

For a full list of changes since SLURM 21.08.8-2 please see the SchedMD SLURM release notes. Note that not all updates may apply to MeluXina, as the job launcher daemon on the compute nodes is not the upstream slurmd.

Software stacks

2023-12-15: New 2023.1 software stack release

We are thrilled to announce that our latest (2023.1) MeluXina User Software Environment is now available for use!

Info

Until 14 February 2024 try the new release by using module load env/release/2023.1 or module load env/staging/2023.1 in your job scripts.

Warn

The previous stable release 2022.1 is kept as default on MeluXina during a transition period until 14 February 2024. On 15 February, the 2023.1 release will become default in your jobs on MeluXina. To keep using the deprecated software stack, use module load env/release/2022.1. Please use the 2023.1 release in the transition period and let us know your impressions and any problems by contacting our service desk. The 2021.3 and 2021.5 software stacks will no longer be accessible from 15 February 2024.
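
One way to avoid being affected by the change of default is to load a release explicitly in each job script. The following is a minimal sketch; the `STACK_VERSION` variable is a name introduced here for illustration, while the module names are those given above.

```shell
# Release to use: 2023.1 is the new stack, 2022.1 the previous one.
STACK_VERSION="${STACK_VERSION:-2023.1}"

if type module >/dev/null 2>&1; then
    # Load the chosen release explicitly instead of relying on the default.
    module load "env/release/${STACK_VERSION}"
    # List the loaded modules so the job log records the environment used.
    module list
else
    # Guard so the sketch is harmless where the module system is absent.
    echo "module command not found" >&2
fi
```

Setting `STACK_VERSION=2022.1` before this snippet selects the previous release instead.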

The new 2023.1 software stack release brings new versions of most tools, built on the latest compiler toolchains. Highlights in this release:

  • Compilers: AOCC 4.0.0, GCC 12.3.0, Intel 2023.1, NVIDIA HPC SDK / NVHPC 23.7
  • Languages: Python 3.11.3, R 4.3.2, Julia 1.9.3, Go 1.20.4
  • Performance Engineering: Linaro-Forge 23.0.3 (replacing Arm-Forge), Scalasca 2.6.1, AMD-uPROF 4.1.424, Extrae 4.0.6, gperf 3.1, omnitrace 1.10.4
  • MPI, parallelization & acceleration: OpenMPI 4.1.5 (with UCC 1.2.0 and UCX 1.14.1), IntelMPI 2021.9.0, psmpi 5.9.2-1, CUDA 12.2.0 and 11.7.0, TBB 2021.10, PETSc 3.19.4, Kokkos 4.1.0, cuDNN 8.9.2.26, NCCL 2.18.3
  • Data science: JupyterHub 4.0.2, JupyterLab 4.0.5, jupyter-server 2.7.2, NVIDIA rapidsai suite 23.10
  • AI, Machine Learning, Deep Learning: PyTorch 1.13.1 (containerized version), TensorFlow 2.13.0, Keras 2.9, Horovod 0.26.0, Spark 3.3.0
  • Physics & chemistry: GROMACS 2023.3, NWChem 7.2.2, NAMD 3.0b5, QuantumESPRESSO 7.2, ORCA 5.0.4, DualSPHysics 5.0.233, QUDA 1.1.0
  • CFD: OpenFOAM 10/v2306
  • Visualisation & computer vision: ParaView 5.11.2, VMD 1.9.4a57, OpenCV 4.8.1
  • API interaction and data transfer: aws-cli 2.7.1, s3cmd 2.2.0, aria2 1.36.0
  • Quantum simulators: QSimCirq 0.17.0, cirq-core 1.2.0, NVIDIA cuQuantum suite
  • Container systems: Apptainer 1.2.4 (replacing Singularity)

NVIDIA NGC Containers

We are thrilled to announce that our latest software stack update features integration with containers from the NVIDIA NGC Catalog, providing users with enhanced capabilities and unparalleled performance.

Key Highlights:

  • NGC Container Integration: Leverage the power of NVIDIA optimized, pre-built containers for AI and HPC applications.

  • Easy Module Deployment: The stack is structured as a module, making it incredibly straightforward to deploy in your existing environment.

  • Boosted Performance: Experience significant performance improvements in your computational tasks, thanks to NVIDIA's state-of-the-art technology.

  • Wide Application Range: Ideal for AI, machine learning, deep learning, and high-performance computing tasks.

Info

The containers are available through the modules system, currently in the 2023.1 staging stack. Additional documentation on using containerized applications will follow in our dedicated section on using containers.

module load env/staging/2023.1 
module use /apps/USE/containers/staging/2023.1/modules/all/
module avail


------------------------------------------- /apps/USE/containers/staging/2023.1/modules/all --------------------------------------------
   CP2K/2023.1-CUDA-12.2.0-NGC-23.10               QMCPACK/v3.16.0-CUDA-12.2.0-NGC-23.10
   GROMACS/2023.2-CUDA-12.2.0-NGC-23.10            QuantumESPRESSO/7.1-CUDA-12.2.0-NGC-23.10
   LAMMPS/patch_15Jun2023-CUDA-12.2.0-NGC-23.10    TensorFlow/2.13.0-CUDA-12.2.0-NGC-23.10
   PyTorch/2.1.0-CUDA-12.2.0-NGC-23.10

Support & Feedback: Your feedback is crucial for our continuous improvement. If you encounter any issues or have suggestions, please contact our support team.

2023-01-15: 2022.1 software stack set as default

The 2022.1 software stack announced and installed on MeluXina in mid-November 2022 is now the default stack on the compute nodes. For details on the new tools available compared with the previous release, please see the notice below.

You can still use the previous production software stack by using module load env/release/2021.3 in your job scripts.

Info

The myquota tool (v0.3.1) for monitoring the usage of your resource allocations is now available on compute nodes through the module system, and loaded by default.

2022-11-14: New 2022.1 software stack release

A new MeluXina User Software Environment release is now available: version 2022.1.

Info

Until 14 January 2023 try the new release by using module load env/release/2022.1 in your job scripts.

Warn

The previous stable release 2021.3 is kept as default on MeluXina during a transition period until 14 January 2023. On 15 January, the 2022.1 release will become default in your jobs on MeluXina. To keep using the deprecated software stack, use module load env/release/2021.3. Please use the 2022.1 release in the transition period and let us know your impressions and any problems by contacting our service desk. All staging software stacks will be cleaned up and will no longer be accessible from 15 January 2023.

The new 2022.1 software stack release brings new versions of most tools, built on the latest compiler toolchains. Highlights in this release:

  • Compilers: AOCC 3.2, GCC 11.3, Intel 2022.1, NVIDIA HPC SDK / NVHPC 22.7
  • Languages: Python 3.10.4, R 4.2.1, Julia 1.8.2, Go 1.19.1
  • Performance Engineering: ARM Forge 22.0.4, Scalasca 2.6, AMD-uPROF 3.6.449
  • MPI, parallelization & acceleration: OpenMPI 4.1.4 (with hcoll 4.7.3202, xpmem 2.6.5-36 and knem 1.1.4), CUDA 11.7, TBB 2021.5, PETSc 3.18, Kokkos 3.6.01, cuDNN 8.4.1.50, NCCL 2.12.12
  • Data science: JupyterHub 2.3.1, JupyterLab 3.2.8
  • AI, Machine Learning, Deep Learning: PyTorch 1.12, TensorFlow 2.9.1, Keras 2.9, Horovod 0.26.0, Spark 3.3.0
  • Physics & chemistry: GROMACS 2022.3, NWChem 7.0.2, NAMD 2.14, LAMMPS 23Jun2022, QuantumESPRESSO 7.1, ORCA 5.0.3, DualSPHysics 5.0.175, QUDA 1.1.0
  • CFD: OpenFOAM 9/v2206
  • Visualisation & computer vision: ParaView 5.10.1, VMD 1.9.4a57, OpenCV 4.6
  • API interaction and data transfer: aws-cli 2.7.1, s3cmd, aria2 1.36.0
  • Quantum simulators: QSimCirq 0.14.0
  • Container systems: Singularity-CE 3.10.2