Managing data
The MeluXina system has several storage spaces (data tiers). To obtain the best results, you should ensure that you are using the right space for the right task. In this section we will discuss the intended use of each storage space, along with some of its characteristics.
Typical data workflow
Within this section we are focusing on the data storage tiers that can be used for data processing (and not backup), i.e., HOME-PROJECT and SCRATCH data tiers, both hosted in independent Lustre-based filesystems.
Storage space | Tier | Easy Access | Real location | Environment Variable |
---|---|---|---|---|
HOME | Tier2 | /home/users/$USERID | /mnt/tier2/users/$USERID | $HOME |
PROJECT | Tier2 | /project/home/$PROJECTID | /mnt/tier2/project/$PROJECTID | $PROJECT |
SCRATCH | Tier1 | /project/scratch/$PROJECTID | /mnt/tier1/project/$PROJECTID | $SCRATCH |
LOCALSCRATCH | Tier0 | N.A | /mnt/tier0/project/$PROJECTID | $LOCALSCRATCH |
Most user data will be stored in the HOME-PROJECT Lustre filesystem, also referred to as tier2.
HOME will contain your user specific data, while PROJECT is intended to store the data shared with the different members of a project.
The SCRATCH tier provides accelerated data access. It can thus be useful to keep data separated: use SCRATCH for temporary result files or datasets under intensive processing, while storing the final results in the larger-capacity PROJECT tier.
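For example, once a processing run on SCRATCH has finished, you could keep only the final output on PROJECT and free up the scratch space (the my_run and final_results names below are placeholders):
cp -r $SCRATCH/my_run/final_results $PROJECT/ # keep the final results on the larger PROJECT tier
rm -r $SCRATCH/my_run # free up SCRATCH space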
The BACKUP tier holds backup data from the HOME and PROJECT storage spaces, and is not directly accessible to users.
Finally, for data that needs to be retained for a longer period of time and accessed infrequently, the MeluXina ARCHIVE tier provides a large, but slower, storage space. Based on tape drives, it is similar to a physical archive, where files need to be retrieved from before they can be used.
Remember!
Data-intensive applications or long running tasks must not be run directly on the MeluXina login nodes. Use the computing nodes for all executions, either interactively or in batch mode.
HOME
Your HOME directory is intended for keeping source code, input data files, job submission scripts, and has a relatively small quota as it is not meant for storing large datasets or temporary result files.
Your HOME directory is personal, not meant to be shared with other project members, and is available from both login and compute nodes.
cd ~ # move to your home directory
pwd # print the path
Shared PROJECT
The PROJECT space has a significantly larger quota, is meant for storing and processing large datasets, and for sharing data among members of the project.
The PROJECT directory is available from both compute nodes and the login nodes.
cd /project/home/$PROJECTID
On compute nodes, the PROJECT directory can also be accessed using the environment variable $PROJECT.
cd $PROJECT
Multiple projects
If you are a member of multiple projects, each project will have its own project directory to which you will have access. You are responsible for placing your project data in the correct project directory, to avoid accidentally sharing it with users who should not have access to it. You can view the projects you're a member of with the myquota command.
Shared SCRATCH
For intensive processing on large datasets and especially large files (e.g. hundreds of MB per file), the SCRATCH space is your best choice. This data tier is not under any automatic snapshot or backup policy, thus you must take care to keep copies of any important files. Scratch should be used primarily for temporary data: application checkpoint files, output from jobs and other data that can easily be recreated.
cd /project/scratch/$PROJECTID
On compute nodes, the SCRATCH directory can also be accessed using the environment variable $SCRATCH.
cd $SCRATCH
Access to SCRATCH
Access to the SCRATCH space is not automatically granted to new projects; it must be part of the project request or contract.
Purging of SCRATCH data
There is no automatic purging of data on the SCRATCH tier; project members are responsible for keeping an eye on their usage. You can see your usage with the myquota command.
Best practices
- If your program must search within a file, it is usually faster to read the file completely into memory first and then search it.
- If you no longer use certain files but they must be retained, archive and compress them.
- Concurrent access to a single file stored on a shared filesystem such as home, scratch or project is likely to create problems unless you are using a specialized library such as MPI-IO or Parallel-HDF5.
Local storage
Except for the CPU nodes, all MeluXina compute nodes have either local SSD or NVMe storage which can be used to stage data and keep results while a job is running. The advantage is fast access to data; however, this storage is local to each compute node (not shared between the nodes within a job) and is automatically cleaned upon job termination. Data of interest must be copied back to one of the parallel filesystems (HOME, PROJECT or SCRATCH data spaces) before the job ends.
cd $LOCALSCRATCH
cp <my_data> $PROJECT
The local storage directory can also be accessed using the environment variable $LOCALSCRATCH.
cd $LOCALSCRATCH
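As a sketch of a typical staging workflow (the application and file names below are placeholders, and you should add your usual partition/account options), a batch job could stage data onto the node-local storage, compute there, and copy the results back before the job ends:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --time=01:00:00
cp -r $PROJECT/my_input $LOCALSCRATCH/ # stage input onto the fast node-local storage
cd $LOCALSCRATCH
srun $PROJECT/my_app my_input # my_app and my_input are placeholders for your application and data
cp -r $LOCALSCRATCH/my_results $PROJECT/ # copy results back; $LOCALSCRATCH is wiped when the job ends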
Local storage type
MeluXina GPU and FPGA nodes have a 1.8TB SSD, while MeluXina large-memory nodes have a 1.8TB NVMe.
Backup and Archive
Daily backups of HOME and PROJECT directories are made on the dedicated BACKUP storage tier. Data restore, long-term-archive, or additional backup requirements have to be requested through the Service Desk.
The default backup policy has a one-month retention for deleted and modified files: if a file is modified or deleted, its previous state can be restored within that one-month window.
Backup file types
Only File, Directory and Symlink file types are backed up, and can therefore be restored. Other file types cannot be backed up (e.g. Device, FIFO or Socket), and will simply be skipped by the backup process.
File system quotas
The MeluXina shared filesystems are large, but shared between all projects and their members. Data quotas are in place to ensure every user and project stays within their storage budget.
The table below lists the quota restrictions applied to the different storage categories.
Storage space | Total capacity | Default capacity per user/group | Default max files per user/group | Backup | Archive |
---|---|---|---|---|---|
HOME / PROJECT | 12 PiB | 100 GiB / user, On-demand / group | 100k files / user, 1M files / group | Yes | No |
SCRATCH | 0.6 PiB | On-demand | 1M files | No | No |
Backup | 7 PiB | On-demand | - | - | - |
Archive | 5 PiB | On-demand | - | - | - |
Local SSD | 1.8 TiB | 1.8TiB per GPU/FPGA/LargeMem node | - | No | No |
Decreasing or increasing quotas
Every project is unique and has its own storage requirements. Please contact us to discuss project quota modifications via the Service Desk.
Checking storage quotas
In addition to the quota for your HOME directory, you will also want to check the quota for the PROJECT and SCRATCH directories of each project you're a member of. To simplify this we provide the myquota utility, which queries your quota status for all shared filesystems and all projects you're a member of.
check storage quota
myquota
Compression of data
Data which is not accessed frequently, such as finished results, may be compressed in order to reduce storage space utilization.
We recommend xz and tar to compress single files or whole folder structures. To compress a single file please use the following command:
xz my_file_to_compress
To decompress:
xz --decompress my_file_to_compress
To create an archive containing multiple files or folders:
tar cfJv my_archive.tar.xz my_files_or_folder_to_compress
It is recommended to use the file suffix .tar.xz to make it clear that the archive was compressed with xz.
To extract an archive (use -C folder to extract the files in folder):
tar xvf my_archive.tar.xz
NOTE: Also have a look at the dbz2 and dtar commands provided by mpiFileUtils.
Ensuring integrity of data (Checksums)
Data integrity is a crucial aspect of data security. One way to validate data integrity is through checksums. A checksum is a small value generated by running an algorithm, called a cryptographic hash function, on a set of data files. As long as the contents of the data do not change, the checksum calculation will always produce the same value. If you recalculate the checksum and it differs from the previously generated value, you know the file has been altered or corrupted in some way.
Typically, the following situations may need a checksum control:
- Downloaded or received files.
- Data saved from a local computer to MeluXina or from MeluXina to a local computer.
- Important computation results that must not be corrupted.
There are many utilities for calculating checksums; the most commonly used algorithms are MD5, SHA1 and SHA256. MD5 and SHA1 are deprecated, and their use is discouraged.
First, we need to prepare some test data on which a checksum will be computed.
echo "My important secret file!" > test_checksum.txt
The SHA256 checksum is computed using the following command:
sha256sum test_checksum.txt > test_checksum.sha256
You can verify the file against the checksum using the following command:
sha256sum -c test_checksum.sha256
Output
test_checksum.txt: OK
Let's change the file and see what happens to the checksum:
echo "Corrupted!" >> test_checksum.txt
Then, let's verify again against the previously computed checksum:
sha256sum -c test_checksum.sha256
Output
test_checksum.txt: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
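To checksum a whole directory tree, for instance before and after transferring a dataset, you can combine find with sha256sum (my_dataset is a placeholder name):
find my_dataset -type f -exec sha256sum {} + > my_dataset.sha256 # checksum every file in the tree
sha256sum -c my_dataset.sha256 # verify later, from the same working directory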
Filesystem permissions
MeluXina runs Linux and therefore uses a UNIX (POSIX) style permission system on its filesystems. There are various guides available on UNIX permissions, which will not be duplicated here; a good example is: Unix / Linux - File Permission / Access Modes.
Home directory
The default permission on MeluXina HOME directories is 0700 (drwx------), which translates to:
- User has full read-write-execute permission
- Default Group (hpcusers) has no permission
- Other users have no permission
Newly created files in your home directory will be created with an umask of 0027, as such:
- Default directory permissions will be 0750 (drwxr-x---), this means that:
    - User has full read-write-execute permission
    - Default Group (hpcusers) has only read-execute permission
    - Other users have no permission
- Default file permissions will be 0640 (-rw-r-----), this means that:
    - User has full read-write permission
    - Default Group (hpcusers) has only read permission
    - Other users have no permission
Primary user group - hpcusers
The hpcusers group has by default permissions on newly created files and directories. However, since members of the hpcusers group cannot by default access other users' home directories, they cannot access each other's files. As such, you should never change the permissions on your home directory!
Project and Scratch shared directories
The default permission on MeluXina PROJECT and SCRATCH shared directories is 2770 (drwxrws---), which translates to:
- User has full read-write-execute permission
- Project Group has full read-write-execute permission with SGID
- Other users have no permission
Newly created files in your PROJECT and SCRATCH shared directories will be created with an umask of 0027, as such:
- Default directory permissions will be 2750 (drwxr-s---), this means that:
    - User has full read-write-execute permission
    - Project Group has only read-execute permission, with SGID
    - Other users have no permission
- Default file permissions will be 0640 (-rw-r-----), this means that:
    - User has full read-write permission
    - Project Group has only read permission
    - Other users have no permission
SGID bit
Because of the SGID bit on the shared PROJECT and SCRATCH directories, newly created files in these directories inherit the project group instead of the user's default group (hpcusers), i.e. their group ownership is set to the project group.
The SGID bit is also why newly created directories in the shared PROJECT and SCRATCH directories have mode 2750 (drwxr-s---) instead of the 0750 (drwxr-x---) used in your home directory.
Change umask
To simplify sharing of newly created files and directories in the shared PROJECT and SCRATCH directories, you may want to change your default umask by setting umask 0007 in your ~/.bashrc file. From your next login, any new files you create will have mode 0660 (-rw-rw----), with full read-write for user and group and no permissions for others. Likewise, any new directories will have either mode 0770 (drwxrwx---) or mode 2770 (drwxrws---) (depending on the SGID bit), with full read-write-execute permissions for user and group and no permissions for others.
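As a sketch, the umask can be set persistently by appending it to your ~/.bashrc, or changed immediately for the current shell:
echo "umask 0007" >> ~/.bashrc # takes effect from your next login
umask 0007 # applies to the current shell only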
Change ownership and permissions
If files are copied from your HOME to your PROJECT or SCRATCH shared directories using for instance cp -a (which does a recursive copy while preserving ownership and timestamps), other members of the project will likely not have the necessary permissions to edit the files. As such, you may have to change the ownership (chown) and permissions (chmod) to give other project members the appropriate access to the data. Alternatively, a recursive copy using cp -r will set the group ownership according to the destination, but is still unlikely to give project members the ability to write to the files.
e.g., a newly created (or uploaded) file or directory in your HOME directory could have the following ownership and permissions:
$ ls -l
total 4
drwxr-x---. 2 u1234567 hpcusers 4096 Nov 17 14:12 example_dir
-rw-r-----. 1 u1234567 hpcusers 0 Nov 17 14:13 example_file
If you copy or move these to your project directory, the group ownership may be retained, depending on the copy method. In addition, other members of the same project will likely only have read-only access to them. To give them proper access, we encourage you to change the group ownership and possibly also the write permissions.
$ cp -a example_* /project/home/p2345678/
$ cd /project/home/p2345678/
$ ls -l
total 4
drwxr-s---. 2 u1234567 hpcusers 4096 Nov 17 14:12 example_dir
-rw-r-----. 1 u1234567 hpcusers 0 Nov 17 14:13 example_file
$ chown -R :p2345678 example_*
$ ls -l
total 4
drwxr-s---. 2 u1234567 p2345678 4096 Nov 17 14:12 example_dir
-rw-r-----. 1 u1234567 p2345678 0 Nov 17 14:13 example_file
If you also want to give project members write access, you can do so as follows:
$ chmod -R g+w example_*
$ ls -l
total 4
drwxrws---. 2 u1234567 p2345678 4096 Nov 17 14:12 example_dir
-rw-rw----. 1 u1234567 p2345678 0 Nov 17 14:13 example_file
recursive
In the example chown and chmod commands above, the -R option is used. This is particularly useful for large directory structures, as it applies the changes recursively to every matching file and directory.
chown and chmod restrictions
As a non-privileged user, certain restrictions apply to the chown and chmod commands:
- You cannot use chown to change the user that owns a file or directory.
- Only the user that owns a file or directory can change its group (to a group they are a member of) with chown.
- Only the user that owns a file or directory can change its permissions with chmod.
Deleting files from shared directories
Files in shared directories can be deleted under the following conditions:
- If you have write access to the file
- If you are the user that owns the file
- If you are the user that owns the parent directory, e.g. if the file is in the root of your project directory, the owner would be the project owner.
Unable to delete files, change ownership or permissions
If you are having difficulty deleting files or directories that you own, try fixing the permission bits first and then try again, e.g.:
chmod -R u+rw <dir_or_file_to_delete>
If you still cannot delete files or change ownership or permissions, your team's project owner should contact our Service Desk with a clear list and description of the changes needed.
Deleting a large number of files
When you have a directory tree with a large number of files that you wish to delete, it is recommended not to use the regular rm command, as it can overload the Lustre metadata servers (potentially impacting many MeluXina users) and take a lot of time.
This is because the rm command generates a stat() call to the metadata server for each file that is removed.
Instead, it is recommended to use the Lustre munlink application, as it deletes a file or symlink without performing the extra stat() call.
However, munlink does not support removal of directories, nor recursive operations, which makes it harder to use.
You can find an excellent write-up on using munlink to delete large directory trees in the Pawsey SuperComputer User Support Documentation.
On MeluXina we provide a wrapper script named rmtree, aimed at simplifying and accelerating the deletion of large directory trees while hiding the complexity of using munlink.
Here is an example of its usage, when deleting a directory tree:
$ rmtree ./trash
About to delete "./trash", are you sure?y
INFO: Unlinking files and symlinks in: ./trash
INFO: Deleting directories in: ./trash
The rmtree tool is provided 'AS IS'.
NOTE: Also have a look at the drm command provided by mpiFileUtils.
Finding files
A typical way to find a file or directory based on name or attributes such as modification time (mtime) is to use the find command.
Lustre provides its own version of find via lfs find, which requires fewer metadata operations and can therefore be faster than the default GNU find.
The Lustre lfs find is not fully interchangeable with the regular GNU find in terms of options and output, but for simple cases it can often be as easy as replacing find with lfs find.
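For example, to list regular files under your scratch project directory that have not been modified for more than 30 days (the 30-day threshold is purely illustrative):
lfs find $SCRATCH -type f -mtime +30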
Documentation for lfs find is available via the man lfs-find command.
NOTE: Also have a look at the dfind command provided by mpiFileUtils.
Lustre MDT striping
Lustre metadata (MDT) striping is a privileged operation and can therefore not be set or modified by users. On MeluXina, MDT striping is set such that individual home directories and project directories are placed on different MDTs in a round-robin fashion.
MDT striping can be queried using lfs getdirstripe <directory>. User documentation is available via man lfs-getdirstripe.
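For example, you can check the placement of your own directories:
lfs getdirstripe $HOME # MDT layout of your home directory
lfs getdirstripe $PROJECT # MDT layout of your project directory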
Lustre OST striping
Lustre file data (OST) striping can be set by the user on an individual file, or set as a PFL (Progressive File Layout) rule on the parent directory from where new files will inherit the setting.
The Lustre filesystems have a default PFL rule set, which you can find below. In most cases, modifying OST striping is not needed, and can even have adverse effects. Please read the documentation before modifying OST stripe settings.
The UTK Lustre Striping Guide and the UTK Lustre User Guide provide some excellent examples and best practices.
NOTES:
- When a PFL rule changes, existing files will not automatically be re-striped, unless they are copied.
- OST striping for files can be queried with the lfs getstripe <filename> command (see man lfs-getstripe).
- The PFL rule for directories can be queried with lfs getstripe -d <directory>.
- A user can manually re-stripe a file with the lfs_migrate command (see man lfs_migrate).
- A user can set their own PFL rule on a directory under their control using the lfs setstripe command (see man lfs-setstripe).
- To disable OST striping for a directory, use the command lfs setstripe -E -1 -c 1 -S 1M <directory>.
- A user-set PFL rule can be cleared with lfs setstripe -d <directory>, after which it will revert to the filesystem default.
- OST striping can also be accomplished using the dstripe command provided by mpiFileUtils.
Default PFL for Tier1 (scratch)
The following Lustre PFL rule was set on Tier1 (scratch):
lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M /mnt/tier1
This PFL rule has the following effect:
- Files up to 1 GiB will go to a single OST
- Files up to 4 GiB will be striped over four OSTs
- Larger files will be striped over eight OSTs
Default PFL for Tier2 (project/home)
The following Lustre PFL rule was set on Tier2 (project/home):
lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c -1 -S 1M /mnt/tier2
This PFL rule has the following effect:
- Files up to 1 GiB will go to a single OST
- Files up to 4 GiB will be striped over four OSTs
- Larger files will be striped over all OSTs
mpiFileUtils
mpiFileUtils is available on the MeluXina compute nodes as a loadable module, e.g.:
ml mpifileutils
See the MeluXina Software environment for more information on modules.
The mpiFileUtils module provides a range of utilities and the libmfu library, which use MPI to perform filesystem operations in parallel, potentially delivering a significant performance boost.
An overview of the available utilities:
- dbcast - Broadcast a file to each compute node.
- dbz2 - Compress and decompress a file with bz2.
- dchmod - Change owner, group, and permissions on files.
- dcmp - Compare contents between directories or files.
- dcp - Copy files.
- ddup - Find duplicate files.
- dfind - Filter files.
- dreln - Update symlinks to point to a new path.
- drm - Remove files.
- dstripe - Re-stripe files (Lustre).
- dsync - Synchronize source and destination directories or files.
- dtar - Create and extract tape archive files.
- dwalk - List, sort, and profile files.
See the mpiFileUtils documentation for more information.
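As a sketch (the task count and paths are illustrative), a parallel copy of a large directory within a job allocation could look like:
ml mpifileutils
srun -n 8 dcp $SCRATCH/my_dataset $PROJECT/my_dataset # or launch via mpirun, depending on your MPI setup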
NOTE: mpiFileUtils is not available on the MeluXina login nodes, as its tools are meant to be run via mpirun on the compute nodes.