Managing data

The MeluXina system has several storage spaces (data tiers). To obtain the best results, you should ensure that you are using the right space for the right task. In this section we will discuss the intended use of each storage space, along with some of its characteristics.

Typical data workflow

This section focuses on the data storage tiers used for data processing (as opposed to backup), i.e., the HOME-PROJECT and SCRATCH tiers, each hosted on its own Lustre-based filesystem.

Storage space	Tier	Easy Access	Real location	Environment Variable
HOME	Tier2	/home/users/$USERID	/mnt/tier2/users/$USERID	$HOME
PROJECT	Tier2	/project/home/$PROJECTID	/mnt/tier2/project/$PROJECTID	$PROJECT
SCRATCH	Tier1	/project/scratch/$PROJECTID	/mnt/tier1/project/$PROJECTID	$SCRATCH
LOCALSCRATCH	Tier0	N/A	/mnt/tier0/project/$PROJECTID	$LOCALSCRATCH

Most user data will be stored in the HOME-PROJECT Lustre filesystem, also referred to as tier2.

HOME will contain your user specific data, while PROJECT is intended to store the data shared with the different members of a project.

The SCRATCH tier provides faster data access. It can therefore be useful to keep data separated: use SCRATCH for temporary result files or datasets under intensive processing, and store the final results in the larger PROJECT tier.

The BACKUP tier holds backup data from the HOME and PROJECT storage spaces and is not directly accessible to users.

Finally, for data that must be retained for a long time and is accessed infrequently, the MeluXina ARCHIVE tier provides a large but slower storage space. Being based on tape drives, it works like a physical archive: files must first be retrieved before they can be used.

Remember!

Data-intensive applications or long running tasks must not be run directly on the MeluXina login nodes. Use the computing nodes for all executions, either interactively or in batch mode.

HOME

Your HOME directory is intended for source code, input data files and job submission scripts. It has a relatively small quota, as it is not meant for storing large datasets or temporary result files.

Your HOME directory is personal, not meant to be shared with other project members, and is available from both login and compute nodes.

cd ~      # move to your home directory
pwd       # print the path

Shared PROJECT

The PROJECT space has a significantly larger quota, is meant for storing and processing large datasets, and for sharing data among members of the project.

The PROJECT directory is available from both compute nodes and the login nodes.

cd /project/home/$PROJECTID

On compute nodes, the PROJECT directory can also be accessed using the environment variable $PROJECT.

cd $PROJECT

Multiple projects

If you are a member of multiple projects, each project has its own project directory to which you have access. You are responsible for placing your project data in the correct project directory, to avoid accidentally sharing it with users who should not have access to it. You can view the projects you are a member of with the myquota command.

Shared SCRATCH

For intensive processing on large datasets and especially large files (e.g. hundreds of MB per file), the SCRATCH space is your best choice. This data tier is not under any automatic snapshot or backup policy, thus you must take care to keep copies of any important files. Scratch should be used primarily for temporary data: application checkpoint files, output from jobs and other data that can easily be recreated.

cd /project/scratch/$PROJECTID

On compute nodes, the SCRATCH directory can also be accessed using the environment variable $SCRATCH.

cd $SCRATCH

Access to SCRATCH

Access to the SCRATCH space is not automatically granted to new projects; it must be part of the project request or contract.

Purging of SCRATCH data

There is no automatic purging of data on the SCRATCH tier. Project members are responsible for monitoring their usage, which you can check with the myquota command.

Best practices

If your program must search within a file, it is faster to read the whole file first and then search it in memory.
If you no longer use certain files but they must be retained, archive and compress them.
Concurrent access to a single file stored on a shared filesystem such as HOME, SCRATCH or PROJECT is likely to create problems unless you use a specialized library such as MPI-IO or Parallel-HDF5.

Local storage

Except for the CPU nodes, all MeluXina compute nodes have local SSD or NVMe storage, which can be used to stage data and keep results while a job is running. Its advantage is fast data access; however, this storage is local to each compute node (not shared between the nodes of a job) and is automatically cleaned when the job terminates. Data of interest must therefore be copied back to one of the parallel filesystems (HOME, PROJECT or SCRATCH) before the job ends.

cd $LOCALSCRATCH
cp <my_data> $PROJECT

The local storage directory can also be accessed using the environment variable $LOCALSCRATCH.

cd $LOCALSCRATCH
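The stage-in, compute, stage-out pattern described above can be sketched as a small bash script. This is a minimal sketch, not an official MeluXina job template: the file and directory names (example_input.txt, example_results, example_work) are hypothetical, sort stands in for the real computation, and outside a job the script assumes $LOCALSCRATCH and $PROJECT may be unset and falls back to temporary directories.

```shell
#!/bin/bash
# Sketch: stage input on node-local storage, process it there, and copy the
# results back to PROJECT before the job ends. Outside a MeluXina job the
# variables are unset, so temporary directories are used instead (assumption).
LOCALSCRATCH=${LOCALSCRATCH:-$(mktemp -d)}
PROJECT=${PROJECT:-$(mktemp -d)}

input="$PROJECT/example_input.txt"      # hypothetical input file
results="$PROJECT/example_results"      # hypothetical results directory
workdir="$LOCALSCRATCH/example_work"    # per-job work area on local storage

# Create a small demo input only if none exists yet.
[ -f "$input" ] || printf 'c\na\nb\n' > "$input"

stage_out() {
  mkdir -p "$results"
  cp -r "$workdir/." "$results/"
}

(
  # The EXIT trap copies results back even if a step below fails.
  trap stage_out EXIT
  set -e
  mkdir -p "$workdir"
  cp "$input" "$workdir/"                                      # stage in
  sort "$workdir/example_input.txt" > "$workdir/output.txt"    # stand-in for the real computation
)
```

Running the steps in a subshell with an EXIT trap means the copy back to PROJECT happens whether the computation succeeds or fails, as long as the job is not killed before the subshell exits.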

Local storage type

MeluXina GPU and FPGA nodes have a 1.8TB SSD, while MeluXina large-memory nodes have a 1.8TB NVMe.

Backup and Archive

Daily backups of HOME and PROJECT directories are made on the dedicated BACKUP storage tier. Data restores, long-term archiving, or additional backup requirements must be requested through the Service Desk.

The default backup policy retains modified and deleted files for one month: if a file is modified or deleted, its previous state can be restored within that one-month window.

Backup file types

Only File, Directory and Symlink file types are backed up, and can therefore be restored. Other file types cannot be backed up (e.g. Device, FIFO or Socket), and will simply be skipped by the backup process.
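To see in advance which entries would be skipped, you can list everything that is not a regular file, a directory or a symlink. This is a minimal sketch using GNU find; list_skipped_by_backup is a hypothetical helper name, and the demo uses a throwaway directory with a named pipe (FIFO).

```shell
#!/bin/bash
# Sketch: list entries that the backup would skip, i.e. anything that is not
# a regular file, a directory or a symlink (FIFOs, sockets, devices, ...).
# list_skipped_by_backup is a hypothetical helper.
list_skipped_by_backup() {
  find "$1" ! -type f ! -type d ! -type l
}

# Demo on a throwaway directory containing one regular file and one FIFO.
demo=$(mktemp -d)
touch "$demo/regular_file"
mkfifo "$demo/my_fifo"
list_skipped_by_backup "$demo"
```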

File system quotas

The MeluXina shared filesystems are large but shared between all projects and their members. Data quotas are in place, allowing users to stay within their storage budget.

The table below lists the quota restrictions applied to the different storage categories.

Storage space	Total capacity	Default capacity per user/group	Default max files per user/group	Backup	Archive
HOME	12 PiB (shared with PROJECT)	100 GiB / user	100k files / user	Yes	No
PROJECT	12 PiB (shared with HOME)	On-demand	1M files / group	Yes	No
SCRATCH	0.6 PiB	On-demand	1M files	No	No
Backup	7 PiB	On-demand	-	-	-
Archive	5 PiB	On-demand	-	-	-
Local SSD/NVMe	1.8 TiB per node	1.8 TiB per GPU/FPGA/LargeMem node	-	No	No

Decreasing or increasing quotas

Every project is unique and has its own storage requirements. To discuss modifying your project quotas, please contact us via the Service Desk.

Checking storage quotas

In addition to the quota for your HOME directory, you will also want to check the quotas for the PROJECT and SCRATCH directories of each project you are a member of. To simplify this, we provide the myquota utility, which reports your quota status for all shared filesystems and all projects you are a member of.

check storage quota

myquota

Compression of data

Data which is not accessed frequently, such as finished results, may be compressed in order to reduce storage space utilization.

We recommend xz and tar to compress single files or whole folder structures. To compress a single file please use the following command:

xz my_file_to_compress

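To decompress (by default xz replaces the original file with my_file_to_compress.xz, so that is the name to pass):

xz --decompress my_file_to_compress.xz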

To create an archive containing multiple files or folders:

tar cfJv my_archive.tar.xz my_files_or_folder_to_compress

It is recommended to use the file suffix .tar.xz to make it clear that the archive was compressed with xz.

To extract an archive (add -C <folder> to extract the files into <folder>):

tar xvf my_archive.tar.xz
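The commands above can be combined into a small round trip: create an xz-compressed archive, test it, and extract it into a separate folder with -C. This is a minimal sketch in a throwaway directory; all file and folder names are illustrative, and it assumes GNU tar and xz are installed.

```shell
#!/bin/bash
# Sketch: archive a folder as .tar.xz, test the archive, then extract it
# into another folder with -C. All file and folder names are illustrative.
set -e
work=$(mktemp -d)
cd "$work"

mkdir -p my_results/run1
echo "final value: 42" > my_results/run1/summary.txt

tar -cJf my_results.tar.xz my_results   # c: create, J: compress with xz, f: archive name
xz --test my_results.tar.xz             # check that the compressed data is intact
mkdir -p restored
tar -xf my_results.tar.xz -C restored   # x: extract; xz compression is auto-detected
cat restored/my_results/run1/summary.txt
```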

NOTE: Also have a look at the dbz2 and dtar commands provided by mpiFileUtils.

Ensuring integrity of data (Checksums)

Data integrity is a crucial aspect of data security, and one way to validate it is with checksums. A checksum is a small value computed from the contents of a file by an algorithm, typically a cryptographic hash function. As long as the data does not change, computing the checksum again always gives the same value. If a recalculated checksum differs from the one generated earlier, the file has been altered or corrupted in some way.

Typically, the following situations may need a checksum control:

Downloaded or received files.
Data saved from a local computer to MeluXina or from MeluXina to a local computer.
Important computation results that must not be corrupted.

Many algorithms can be used to calculate checksums; the most commonly used are MD5, SHA1 and SHA256 (computed with the md5sum, sha1sum and sha256sum utilities). MD5 and SHA1 are deprecated, and their use is discouraged.

First, we need to prepare some test data on which a checksum will be computed.

echo "My important secret file!" > test_checksum.txt

SHA256

The SHA256 checksum is computed using the following command:

sha256sum test_checksum.txt > test_checksum.sha256

You can verify the file against the checksum using the following command:

sha256sum -c test_checksum.sha256

Output

test_checksum.txt: OK

Let's modify the data file and see what happens when it is verified again:

echo "Corrupted!" >> test_checksum.txt

Then, let's check it again against the previously computed checksum:

sha256sum -c test_checksum.sha256

Output

test_checksum.txt: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
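The same approach scales to whole directory trees by writing one manifest that lists every file. This is a minimal sketch, assuming GNU find, sort and coreutils sha256sum; the directory, file and manifest names are illustrative. Storing relative paths keeps the manifest usable after the tree has been copied to or from MeluXina.

```shell
#!/bin/bash
# Sketch: write one SHA256 manifest for every file in a directory tree, then
# verify the whole tree in one go (e.g. after a transfer to or from MeluXina).
# Directory, file and manifest names are illustrative.
set -e
data=$(mktemp -d)
mkdir -p "$data/run1" "$data/run2"
echo "alpha" > "$data/run1/a.txt"
echo "beta"  > "$data/run2/b.txt"

cd "$data"
# Relative paths keep the manifest usable after the tree is copied elsewhere.
find . -type f ! -name manifest.sha256 -print0 | sort -z | xargs -0 sha256sum > manifest.sha256

# --quiet only reports files that fail verification.
sha256sum -c --quiet manifest.sha256 && echo "all files OK"
```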

Filesystem permissions

MeluXina runs Linux and therefore uses UNIX (POSIX) style permissions on its filesystems. Many guides on UNIX permissions are available and are not duplicated here; a good example is Unix / Linux - File Permission / Access Modes.

Home directory

The default permissions on MeluXina HOME directories are 0700 (drwx------), which translates to:

User has full read-write-execute permission
Default Group (hpcusers) has no permission
Other users have no permission

New files and directories in your home directory are created with a umask of 0027, so:

Default directory permissions will be 0750 (drwxr-x---), which means that:
- User has full read-write-execute permission
- Default Group (hpcusers) has only read-execute permission
- Other users have no permission
Default file permissions will be 0640 (-rw-r-----), which means that:
- User has full read-write permission
- Default Group (hpcusers) has only read permission
- Other users have no permission
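The modes above follow from simple bit arithmetic: the umask clears bits from the mode a program requests (0777 for mkdir, 0666 for most file creation). This is a minimal sketch of that calculation in bash; effective_mode is a hypothetical helper, not a system command.

```shell
#!/bin/bash
# Sketch: how a umask turns the mode a program requests into the final mode.
# mkdir requests 0777 and most programs create files with 0666; every bit
# that is set in the umask is cleared. effective_mode is a hypothetical helper.
effective_mode() {
  # $1 = requested mode, $2 = umask; the leading 0 makes bash read them as octal
  printf '%04o\n' $(( $1 & ~$2 ))
}

effective_mode 0777 0027   # new directory → 0750 (drwxr-x---)
effective_mode 0666 0027   # new file      → 0640 (-rw-r-----)
```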

Primary user group - hpcusers

By default, the hpcusers group gets read access (read-execute for directories) to newly created files and directories. However, since members of hpcusers cannot access other users' home directories by default, they cannot access each other's files. For this reason you should never change the permissions on your home directory!

Project and Scratch shared directories

The default permissions on MeluXina PROJECT and SCRATCH shared directories are 2770 (drwxrws---), which translates to:

User has full read-write-execute permission
Project Group has full read-write-execute permission with SGID
Other users have no permission

New files and directories in your PROJECT and SCRATCH shared directories are created with a umask of 0027, so:

Default directory permissions will be 2750 (drwxr-s---), which means that:
- User has full read-write-execute permission
- Project Group has only read-execute permission with SGID
- Other users have no permission
Default file permissions will be 0640 (-rw-r-----), which means that:
- User has full read-write permission
- Project Group has only read permission
- Other users have no permission

SGID bit

Because of the SGID bit on the shared PROJECT and SCRATCH directories, any newly created files in these directories inherit the project group instead of the user's default group (hpcusers), so their group ownership is set to the project group. The SGID bit is also why newly created directories in the shared PROJECT and SCRATCH directories have mode 2750 (drwxr-s---) rather than the 0750 (drwxr-x---) they have in your home directory.
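The inheritance can be observed in any directory with the SGID bit set. This is a minimal sketch assuming GNU stat; a throwaway directory owned by your own primary group stands in for a real project directory, so the group name printed will be yours rather than a project group.

```shell
#!/bin/bash
# Sketch: new entries in a directory with the SGID bit inherit its group, and
# new subdirectories inherit the SGID bit itself. A throwaway directory owned
# by your own primary group stands in for a real project directory.
shared=$(mktemp -d)
chmod 2770 "$shared"            # drwxrws---, like PROJECT and SCRATCH

(
  umask 0027                    # the default umask described above
  mkdir "$shared/new_dir"
  touch "$shared/new_file"
)

# %A: symbolic mode, %a: octal mode, %G: group name, %n: file name
stat -c '%A %a %G %n' "$shared" "$shared/new_dir" "$shared/new_file"
```

Note that only directories inherit the SGID bit; new files inherit the group but keep a plain 0640 mode.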

Change umask

To simplify sharing of newly created files and directories in the shared PROJECT and SCRATCH directories, you may want to change your default umask by adding umask 0007 to your ~/.bashrc file. From your next login, any new files you create will have mode 0660 (-rw-rw----), with read-write permissions for user and group and no permissions for others. Likewise, any new directories will have mode 0770 (drwxrwx---) or 2770 (drwxrws---), depending on the SGID bit of the parent directory, with full read-write-execute permissions for user and group and no permissions for others.
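Before editing ~/.bashrc, you can try the setting out. This is a minimal sketch that applies umask 0007 only inside a subshell, so your current shell's umask is left unchanged, and then prints the resulting modes with GNU stat; the file and directory names are illustrative.

```shell
#!/bin/bash
# Sketch: try out "umask 0007" in a subshell (so your current umask is left
# unchanged) and check the modes of a new file and a new directory.
demo=$(mktemp -d)
(
  umask 0007
  touch "$demo/shared_file"
  mkdir "$demo/shared_dir"
)
stat -c '%a %A %n' "$demo/shared_file" "$demo/shared_dir"
```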

Change ownership and permissions

If files are copied from your HOME to your PROJECT or SCRATCH shared directories using, for instance, cp -a (a recursive copy that preserves permissions, ownership and timestamps), other members of the project will likely not have the permissions needed to edit them. You may therefore have to change the group ownership (chown) and permissions (chmod) to give other project members appropriate access to the data. Alternatively, a recursive copy with cp -r sets the group ownership to that of the destination directory, but is still unlikely to give project members write access.

For example, a newly created (or uploaded) file or directory in your HOME directory could have the following ownership and permissions:

$ ls -l
total 4
drwxr-x---. 2 u1234567 hpcusers 4096 Nov 17 14:12 example_dir
-rw-r-----. 1 u1234567 hpcusers    0 Nov 17 14:13 example_file

If you copy or move these to your project directory, the original group ownership (hpcusers) may be retained, depending on the copy method. In addition, other members of the same project will likely only have read access to them. To give them proper access, we encourage you to change the group ownership and possibly also add write permissions.

$ cp -a example_* /project/home/p2345678/
$ cd /project/home/p2345678/
$ ls -l
total 4
drwxr-s---. 2 u1234567 hpcusers 4096 Nov 17 14:12 example_dir
-rw-r-----. 1 u1234567 hpcusers    0 Nov 17 14:13 example_file
$ chown -R :p2345678 example_*
$ ls -l
total 4
drwxr-s---. 2 u1234567 p2345678 4096 Nov 17 14:12 example_dir
-rw-r-----. 1 u1234567 p2345678    0 Nov 17 14:13 example_file

If you also want to give project members write access, you can do so as follows:

$ chmod -R g+w example_*
$ ls -l
total 4
drwxrws---. 2 u1234567 p2345678 4096 Nov 17 14:12 example_dir
-rw-rw----. 1 u1234567 p2345678    0 Nov 17 14:13 example_file

Recursive

In the example chown and chmod commands above the -R option is used. This is particularly useful for large directory structures, as it will apply the changes recursively to every matching file and directory.
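The group change and permission change can be wrapped into one step. This is a minimal sketch of a hypothetical helper, share_with_group, not a MeluXina-provided tool: it uses chgrp -R (equivalent to chown -R :group), chmod -R g+rwX, where X adds execute on directories only, and also sets SGID on every directory so that files created there later get the group too. The demo passes your own primary group because a project group such as p2345678 only exists on MeluXina.

```shell
#!/bin/bash
# Sketch of a hypothetical helper, share_with_group: set the group on a copied
# tree, give the group read-write access (X adds execute on directories, and
# on files that are already executable), and set SGID on every directory so
# that files created there later also get the group.
share_with_group() {
  local group=$1
  shift
  chgrp -R "$group" "$@"
  chmod -R g+rwX "$@"
  find "$@" -type d -exec chmod g+s {} +
}

# Demo with your own primary group; on MeluXina you would pass the project
# group instead (e.g. p2345678 from the example above).
demo=$(mktemp -d)
mkdir -p "$demo/example_dir/sub"
touch "$demo/example_dir/sub/example_file"
share_with_group "$(id -gn)" "$demo/example_dir"
stat -c '%A %n' "$demo/example_dir" "$demo/example_dir/sub" "$demo/example_dir/sub/example_file"
```

As with the commands above, this only works on files and directories you own.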

chown and chmod restrictions

As a non-privileged user certain restrictions apply to the chown and chmod commands.

You cannot use chown to change the user that owns the file or directory.
Only the user that owns the file or directory can change the group (to a group they are a member of) with chown.
Only the user that owns the file or directory can change permissions with chmod.

Deleting files from shared directories

Files in shared directories can be deleted under the following conditions:

If you have write access to the file
If you are the user that owns the file
If you are the user that owns the parent directory, e.g., for a file in the root of your project directory, the owner would be the project owner.

Unable to delete files, change ownership or permissions

If, as the owner, you have difficulty deleting files or directories, try fixing the permission bits and then try again, e.g.:

chmod -R u+rw <dir_or_file_to_delete>

If you are unable to delete files or change their ownership or permissions, your team's project owner should contact our Service Desk with a clear list and description of the changes needed.

Deleting a large number of files

When you have a directory tree with a large number of files in it that you wish to delete, it is recommended that you do not use the regular rm command as it can overload the Lustre metadata servers (potentially impacting many MeluXina users), and take a lot of time.

This is because the rm command will generate a stat() call to the metadata server for each file that is removed.

Instead, it is recommended that you use the Lustre munlink application, as it will delete a file or symlink without performing the extra stat() call. munlink however does not support removal of directories, nor recursive operations, which makes it harder to use.

You can find an excellent write-up of using munlink to delete large directory trees in the Pawsey SuperComputer User Support Documentation.

On MeluXina we provide a wrapper script named rmtree, aimed at simplifying and accelerating the deletion of large directory trees while hiding the complexity of using munlink.

Here is an example of its usage, when deleting a directory tree:

$ rmtree ./trash
About to delete "./trash", are you sure?y
INFO: Unlinking files and symlinks in: ./trash
INFO: Deleting directories in: ./trash

The rmtree tool is provided 'AS IS'.
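The two phases visible in the rmtree output (unlinking files and symlinks, then deleting directories) can be sketched in bash. This is a minimal sketch of the general approach, not the rmtree implementation: delete_tree is a hypothetical helper, it assumes lfs find accepts -type and -print0 the way GNU find does (check man lfs-find), and it falls back to GNU find and rm -f where lfs and munlink are missing so it can be tried on any Linux system. Unlike rmtree, it does not ask for confirmation.

```shell
#!/bin/bash
# Sketch of a two-pass tree deletion (not the actual rmtree script):
# pass 1 unlinks files and symlinks, pass 2 removes the then-empty directories,
# deepest first. WARNING: it deletes without asking for confirmation.
# Assumption: on Lustre, "lfs find" accepts -type and -print0 like GNU find.
delete_tree() {
  local dir=$1
  local -a finder=(find) unlinker=(rm -f)
  if command -v lfs >/dev/null 2>&1 && command -v munlink >/dev/null 2>&1; then
    finder=(lfs find)
    unlinker=(munlink)
  fi
  # Pass 1: files, then symlinks; NUL separators cope with unusual file names.
  "${finder[@]}" "$dir" -type f -print0 | xargs -0 -r "${unlinker[@]}"
  "${finder[@]}" "$dir" -type l -print0 | xargs -0 -r "${unlinker[@]}"
  # Pass 2: directories, deepest first (-delete implies -depth).
  find "$dir" -type d -delete
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/trash/a/b"
touch "$demo/trash/a/f1" "$demo/trash/a/b/f2"
ln -s f1 "$demo/trash/a/link"
delete_tree "$demo/trash"
```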

NOTE: Also have a look at the drm command provided by mpiFileUtils.

Finding files

A typical way to find a file or directory based on name or attributes such as modification time (mtime), is to use the find command.

Lustre provides its own version of find via lfs find, which requires fewer metadata operations and can therefore be faster than the default GNU find. lfs find does not support every option of GNU find, and its output can differ, but in simple cases it is often as easy as replacing find with lfs find.

Documentation for lfs find is available via the man lfs-find command.
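A common use is finding files that have not been modified for a while, for example candidates for compression or cleanup on SCRATCH. This is a minimal sketch; find_stale is a hypothetical helper, and it assumes that lfs find accepts the same -type f and -mtime +N options as GNU find for this simple case. It uses lfs find when the lfs command is installed (so call it on a Lustre path) and GNU find otherwise.

```shell
#!/bin/bash
# Sketch: list regular files not modified for more than N days, e.g. to spot
# candidates for compression, archiving or cleanup. find_stale is a
# hypothetical helper; it uses "lfs find" when lfs is installed (assumed to be
# called on a Lustre path) and GNU find otherwise.
find_stale() {
  local dir=$1 days=$2
  if command -v lfs >/dev/null 2>&1; then
    lfs find "$dir" -type f -mtime +"$days"
  else
    find "$dir" -type f -mtime +"$days"
  fi
}

# Demo: one file backdated by 40 days, one fresh file.
demo=$(mktemp -d)
touch -d '40 days ago' "$demo/old_result.dat"
touch "$demo/new_result.dat"
find_stale "$demo" 30
```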

NOTE: Also have a look at the dfind command provided by mpiFileUtils.

Lustre MDT striping

Lustre metadata (MDT) striping is a privileged operation and therefore cannot be set or modified by users. On MeluXina, MDT striping is set such that individual home directories and project directories are placed on different MDTs in a round-robin fashion.

MDT striping can be queried using lfs getdirstripe <directory>. User documentation is available via man lfs-getdirstripe.

Lustre OST striping

Lustre file data (OST) striping can be set by the user on an individual file, or set as a PFL (Progressive File Layout) rule on the parent directory from where new files will inherit the setting.

The Lustre filesystems have a default PFL rule set, which you can find below. In most cases, modifying OST striping is not needed, and can even have adverse effects. Please read the documentation before modifying OST stripe settings.

The UTK Lustre Striping Guide and the UTK Lustre User Guide provide some excellent examples and best practices.

NOTES:

When a PFL rule changes, existing files will not automatically be re-striped, unless they are copied.
OST striping for files can be queried with the lfs getstripe <filename> command (see man lfs-getstripe)
The PFL rule for directories can be queried with lfs getstripe -d <directory>.
A user can manually re-stripe a file with the lfs_migrate command (see man lfs_migrate).
A user can set their own PFL rule on a directory under their control using the lfs setstripe command (see man lfs-setstripe).
To disable OST striping for a directory, use the command lfs setstripe -E -1 -c 1 -S 1M <directory>
A user set PFL rule can be cleared with lfs setstripe -d <directory>, after which it will revert to the filesystem default
OST striping can also be accomplished using the dstripe command provided by mpiFileUtils.

Default PFL for Tier1 (scratch)

The following Lustre PFL rule was set on Tier1 (scratch):

lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c 8 -S 1M /mnt/tier1

This PFL rule has the following effect:

Files up to 1 GiB go to a single OST (the first 1 GiB of every file is stored this way)
The part of a file between 1 GiB and 4 GiB is striped over four OSTs
The part of a file beyond 4 GiB is striped over eight OSTs
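Because each PFL component covers a byte range of the file, a single large file is split across several layouts. This is a minimal sketch that applies the Tier1 rule above to a given file size using plain bash arithmetic; pfl_breakdown is a hypothetical helper that only models the rule and does not query Lustre. Ranges are printed in MiB with an exclusive upper bound.

```shell
#!/bin/bash
# Sketch: split a file size over the components of the Tier1 default PFL rule
# (-E 1G -c 1 -E 4G -c 4 -E -1 -c 8) and print which byte range (in MiB, end
# exclusive) is striped over how many OSTs. pfl_breakdown is hypothetical.
pfl_breakdown() {
  local size=$1 start=0 end i
  local MiB=$((1024 * 1024)) GiB=$((1024 * 1024 * 1024))
  local -a ends=($((1 * GiB)) $((4 * GiB)) -1)   # component ends; -1 = end of file
  local -a counts=(1 4 8)                        # stripe count per component
  for i in 0 1 2; do
    end=${ends[i]}
    if (( end == -1 || end > size )); then end=$size; fi
    if (( end <= start )); then break; fi
    echo "$(( start / MiB ))-$(( end / MiB )) MiB: striped over ${counts[i]} OST(s)"
    start=$end
  done
}

pfl_breakdown $(( 3 * 1024 * 1024 * 1024 ))   # a 3 GiB file
```

For a 3 GiB file this prints two components: the first 1024 MiB on one OST and the remaining 2048 MiB over four OSTs.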

Default PFL for Tier2 (project/home)

The following Lustre PFL rule was set on Tier2 (project/home):

lfs setstripe -E 1G -c 1 -S 1M -E 4G -c 4 -S 1M -E -1 -c -1 -S 1M /mnt/tier2

This PFL rule has the following effect:

Files up to 1 GiB go to a single OST (the first 1 GiB of every file is stored this way)
The part of a file between 1 GiB and 4 GiB is striped over four OSTs
The part of a file beyond 4 GiB is striped over all OSTs

mpiFileUtils

mpiFileUtils is available on the MeluXina compute nodes as a loadable module, e.g.:

ml mpifileutils

See the MeluXina Software environment for more information on modules.

The mpiFileUtils module provides a range of utilities and the libmfu library, which uses MPI to perform filesystem operations in parallel, potentially delivering a significant performance boost.

An overview of the available utilities:

dbcast - Broadcast a file to each compute node.
dbz2 - Compress and decompress a file with bz2.
dchmod - Change owner, group, and permissions on files.
dcmp - Compare contents between directories or files.
dcp - Copy files.
ddup - Find duplicate files.
dfind - Filter files.
dreln - Update symlinks to point to a new path.
drm - Remove files.
dstripe - Re-stripe files (Lustre).
dsync - Synchronize source and destination directories or files.
dtar - Create and extract tape archive files.
dwalk - List, sort, and profile files.

See the mpiFileUtils documentation for more information.

NOTE: mpiFileUtils is not available on the MeluXina login nodes, as its tools are meant to be launched via mpirun on compute nodes.