What's new on MeluXina
2023-10-30: Upgrade to MOFED 5.8-188.8.131.52
The NVIDIA MOFED driver has been upgraded to the long-term support (LTS) version 5.8-184.108.40.206. This update includes important bug fixes and adds support for the latest operating systems. For a detailed overview of the changes, please see the release notes.
2023-10-30: NVIDIA Driver Update
To support the latest CUDA 12.x releases, we are updating the NVIDIA driver to version 535.104.12. This will enable you to take full advantage of the latest GPU-accelerated applications and frameworks.
For a detailed list of all the changes and updates brought by NVIDIA driver version 535.104.12, please visit the official release notes at the following link: NVIDIA Driver Version 535.104.12 Release Notes.
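To confirm which driver a GPU node is actually running, a check along the following lines can be used. This is a minimal sketch, not an official MeluXina tool: the helper name version_ge is made up for illustration, and the comparison assumes GNU sort (for its -V option) is available.

```shell
# Driver version announced above.
required="535.104.12"

# Hypothetical helper: succeeds when version $1 is at or above version $2.
# It sorts the two versions in ascending version order and checks that
# the smaller one is $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for its version; keep the first GPU's answer only.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$installed" "$required"; then
        echo "driver $installed is at or above $required"
    else
        echo "driver $installed is older than $required"
    fi
else
    echo "nvidia-smi not found; run this on a GPU node"
fi
```

On a login node or any machine without an NVIDIA GPU, the script only prints the fallback message.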
2023-10-30: Upgrade to Rocky 8.8
The login and compute nodes have been migrated to RHEL/Rocky 8.8. This updated version will bring improved performance, security, and compatibility with the latest software packages.
2023-08-10: Changes to the default Lustre striping policy
A default OST (Object Storage Target) striping policy has been set on both the Tier1 (scratch) and Tier2 (project/home) Lustre filesystems. The policy affects only new files; existing files are not impacted.
Previously no policy was set, meaning files were not striped by default over multiple OSTs.
See the Lustre OST striping section for more details.
The new policy is meant to improve resiliency and potentially data access performance.
If this is not the case for your workload, you can revert a directory and its subdirectories to the prior behaviour with the following command:
lfs setstripe -E -1 -c 1 -S 1M <directory>
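The revert command above can be wrapped so that it is previewed before being applied. The sketch below is illustrative only: the function name revert_striping and its dry-run/apply argument are assumptions, not part of the Lustre tools. When applying, it also calls lfs getstripe -d to display the default layout now set on the directory.

```shell
# Hypothetical wrapper around the revert command shown above.
# Usage: revert_striping <directory> [apply]
# Without "apply", the command is only printed (dry run).
revert_striping() {
    dir="$1"
    mode="${2:-dry-run}"
    if [ "$mode" = "apply" ]; then
        # Set a single-component layout: 1 stripe, 1 MiB stripe size,
        # then show the directory's resulting default layout.
        lfs setstripe -E -1 -c 1 -S 1M "$dir" && lfs getstripe -d "$dir"
    else
        echo "lfs setstripe -E -1 -c 1 -S 1M $dir"
    fi
}
```

Calling revert_striping with only a directory prints the command that would run; adding apply as the second argument executes it. As with the policy itself, only files created after the change pick up the new layout.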
2023-06-15: Upgrade to Rocky 8.7
The compute nodes' OS has been upgraded from Rocky Linux version 8.6 to 8.7 with the latest bug fixes and security patches.
No impact on user applications is expected and there are no changes to the MeluXina software stack.
2023-06-15: Upgrade to MOFED 5.8-220.127.116.11
The NVIDIA MOFED driver has been upgraded to LTS version 5.8-18.104.22.168 with critical bug fixes and support for the new OS. See the release notes for the detailed changes.
2023-06-15: Upgrade to NVIDIA driver 515.105.01
The upgrade of the NVIDIA driver on the GPU nodes fixes the following issues:
- Resolved an issue that sometimes caused abnormal BAR1 memory usage.
- Fixed a race condition that can arise when calling cudaFreeAsync() and cudaDeviceSynchronize() from different threads.
- Fixed an issue where the NVIDIA Linux GPU kernel driver was calling the Linux kernel scheduler while holding a lock with preemption disabled during event notification.
The CUDA Toolkit 11 version remains 11.7.
2023-06-15: Upgrade to Lustre client driver 2.14.0
The Lustre client driver has been upgraded to the 2.14 release, with expected performance improvements.
SLURM & ParaStation
2023-10-30: Slurm update to 23.02.6 and new energy monitoring
SLURM has been upgraded from version 22.05.9 to 23.02.6 with additional functionality and numerous fixes.
No impact on user jobs or launcher scripts is expected with the new Slurm version.
New energy monitoring is now in place for Slurm jobs using the
srun launcher, see the complete details in energy monitoring.
For a full list of changes since SLURM 22.05.9 please see the SchedMD SLURM release notes. Note that not all updates may apply to MeluXina, as the job launcher daemon on the compute nodes is not the upstream
2023-10-30: Upgrade to psmgmt 5.1.57-2
psmgmt has been upgraded from version 5.1.55-5 to 5.1.57-2 with bug fixes. For a full list of changes since version 5.1.55-5 please see the psmgmt Change log.
The PMIx (Process Management Interface for Exascale) component has been updated to 4.2.6, improving inter-process communication and resource management for your applications. The upcoming software stack (2023.1) will therefore feature PMIx 4.2.6.
2023-06-15: Upgrade to SLURM 22.05.9
SLURM has been upgraded from version 22.05.5 to 22.05.9 with additional functionality and numerous fixes.
No impact on user jobs or launcher scripts is expected.
For a full list of changes since SLURM 22.05.5 please see the SchedMD SLURM release notes. Note that not all updates may apply to MeluXina, as the job launcher daemon on the compute nodes is not the upstream
2023-06-15: Upgrade to psmgmt 5.1.55-2
psmgmt has been upgraded from version 5.1.52-5 to 5.1.55-2 with bug fixes. For a full list of changes since version 5.1.52-5 please see the psmgmt Change log.
2022-11-24: Upgrade to psmgmt 5.1.52-5
psmgmt has been upgraded to version 5.1.52-5 with some of the following bug fixes:
- Bugfix: prevent segfault due to late REQUEST_LAUNCH_TASKS message. This fix prevents large-scale jobs from hanging.
- Bugfix: prevent psslurm from segfaulting by tracking init of basic configuration.
For a full list of changes since psmgmt 5.1.52-2 please see the psmgmt Change log.
2022-11-14: Upgrade to SLURM 22.05.5
SLURM has been upgraded from 21.08.8-2 to 22.05.5, a major new version with additional functionality, security fixes and also breaking changes.
Highlighted change in this release that may impact your jobs: srun will no longer read in
SLURM_CPUS_PER_TASK. This means you will explicitly have to specify
--cpus-per-task (or -c) in your srun calls, or set the new
SRUN_CPUS_PER_TASK environment variable to get the correct result.
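The two options described above could look as follows in a job script. This is a sketch under assumptions: the resource values are arbitrary, ./my_app is a hypothetical executable, and the account, partition and time directives that a real MeluXina job needs are left out. The fallback to 1 covers the case where --cpus-per-task was not requested and sbatch therefore did not set SLURM_CPUS_PER_TASK.

```shell
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8

# sbatch exports SLURM_CPUS_PER_TASK only when --cpus-per-task is requested.
cpus="${SLURM_CPUS_PER_TASK:-1}"

# Option 1: set the environment variable that srun reads since 22.05.
export SRUN_CPUS_PER_TASK="$cpus"

# Option 2: pass the value explicitly on each srun call.
# ./my_app is a placeholder for your own executable.
if command -v srun >/dev/null 2>&1; then
    srun --cpus-per-task="$cpus" ./my_app
else
    # Outside a Slurm environment, only show the command that would run.
    echo "srun --cpus-per-task=$cpus ./my_app"
fi
```

Either option on its own is sufficient; both are shown together here only for comparison.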
For a full list of changes since SLURM 21.08.8-2 please see the SchedMD SLURM release notes. Note that not all updates may apply to MeluXina, as the job launcher daemon on the compute nodes is not the upstream
2023-01-15: 2022.1 software stack set as default
The 2022.1 software stack announced and installed on MeluXina in mid-November 2022 is now the default stack on the compute nodes. For details on the new tools available compared with the previous release, please see the notice below.
You can still use the previous production software stack by using
module load env/release/2021.3 in your job scripts.
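In a job script, selecting the previous stack could be sketched as below. The variable name release and the fallback message are illustrative; the only commands taken from the module system are module load and module list.

```shell
# Software stack release to use for this job (the previous production stack).
release="env/release/2021.3"

if command -v module >/dev/null 2>&1; then
    # Switch to the chosen release, then list what is loaded as a check.
    module load "$release"
    module list
else
    # The module command is normally only defined on the cluster nodes.
    echo "module command not found; would run: module load $release"
fi
```

Loading the release before any other module keeps the rest of the job script unchanged.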
The myquota tool (v0.3.1) for monitoring the usage of your resource allocations is now available on compute nodes through the module system, and is loaded by default.
2022-11-14: New 2022.1 software stack release
A new MeluXina User Software Environment release is now available: version 2022.1.
Until 14 January 2023 try the new release by using
module load env/release/2022.1 in your job scripts.
The previous stable release 2021.3 is kept as default on MeluXina during a transition period until 14 January 2023. On 15 January, the 2022.1 release will become default in your jobs on MeluXina. To keep using the deprecated software stack, use
module load env/release/2021.3. Please use the 2022.1 release during the transition period and let us know your impressions and any problems by contacting our service desk. All staging software stacks will be cleaned up and will no longer be accessible from 15 January 2023.
The new 2022.1 software stack release brings new versions of most tools, built on the latest compiler toolchains. Highlights in this release:
- Compilers: AOCC 3.2, GCC 11.3, Intel 2022.1, NVIDIA HPC SDK / NVHPC 22.7
- Languages: Python 3.10.4, R 4.2.1, Julia 1.8.2, Go 1.19.1
- Performance Engineering: ARM Forge 22.0.4, Scalasca 2.6, AMD-uPROF 3.6.449
- MPI, parallelization & acceleration: OpenMPI 4.1.4 (with hcoll 4.7.3202, xpmem 2.6.5-36 and knem 1.1.4), CUDA 11.7, TBB 2021.5, PETSc 3.18, Kokkos 3.6.01, cuDNN 22.214.171.124, NCCL 2.12.12
- Data science: JupyterHub 2.3.1, JupyterLab 3.2.8
- AI, Machine Learning, Deep Learning: PyTorch 1.12, TensorFlow 2.9.1, Keras 2.9, Horovod 0.26.0, Spark 3.3.0
- Physics & chemistry: GROMACS 2022.3, NWChem 7.0.2, NAMD 2.14, LAMMPS 23Jun2022, QuantumESPRESSO 7.1, ORCA 5.0.3, DualSPHysics 5.0.175, QUDA 1.1.0
- CFD: OpenFOAM 9/v2206
- Visualisation & computer vision: ParaView 5.10.1, VMD 1.9.4a57, OpenCV 4.6
- API interaction and data transfer: aws-cli 2.7.1, s3cmd, aria2 1.36.0
- Quantum simulators: QSimCirq 0.14.0
- Container systems: Singularity-CE 3.10.2