S3 - Data transfer

In development

The S3 interface is accessible for testing, and the service may be subject to disruption. To request access, or to provide feedback, please use our service desk. We are glad to hear from you and learn from your experience!


Overview

The MeluXina S3 gateway allows users to access their data hosted on MeluXina Tier1 and Tier2 storage platforms through the S3 protocol.

The S3 gateway is based on the DDN S3DS (S3 Data Services) software, which implements the Amazon Simple Storage Service S3 protocol.

Gaining access

To use the S3 interface, you must first obtain dedicated credentials for this service by contacting our ServiceDesk. The credentials consist of an access key and a secret key, which an S3 client (see the section below) uses to authenticate to the service.

S3 clients

To use the S3 protocol for transferring data to MeluXina, you will need to use an S3 client on your workstation or server. There are several clients you may choose from:

Feature                 s3cmd  aws s3   CyberDuck  Python Boto3
Non-AWS bucket support  Yes    Limited  Yes        Yes
Create buckets          Yes    Limited  No         Yes
List buckets            Yes    Yes      Yes        Yes
Write ACLs              Yes    Yes      No         No
Read ACLs               No     Yes      No         Yes
Parallel file transfer  No     Yes      Yes        Yes
s3cmd:

  • Recommended S3 client
  • When creating a bucket, only s3cmd (or a custom curl command) can be used to properly specify the bucket location in the filesystem

aws s3:

  • Better support for ACLs (read and write)
  • Parallel and faster file transfers

CyberDuck:

  • GUI for Windows and macOS
  • CLI for Linux (called duck)
  • Parallel and faster file transfers

Python Boto3:

  • Provides APIs to CRUD resources with Python scripts
  • Process the data inside Python applications
  • Build Python applications on top of S3

Client configuration

Depending on the S3 client you choose, the following settings may need to be configured:

Parameter                   Value               Description
Region                      lxp                 S3 region name
Endpoint                    s3.lxp.lu           S3 endpoint
Path mode                   dns, path, or auto  Bucket resolution mode (dns recommended)
DNS bucket name resolution  Yes                 Use DNS to resolve bucket names
HTTPS                       Yes                 Use HTTPS

s3cmd configuration file ~/.s3cfg:

[default]
access_key = *****
secret_key = **********
host_base = s3.lxp.lu
host_bucket = yes
bucket_location = lxp
guess_mime_type = True
use_https = True

aws s3 credentials file ~/.aws/credentials:

[default]
aws_access_key_id = *****
aws_secret_access_key = **********
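
The credentials file does not set the region. With the aws client, the lxp region can be stored in the file ~/.aws/config (a minimal sketch; the endpoint itself still has to be passed to each command via --endpoint-url, as in the examples below):

[default]
region = lxp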

CyberDuck custom profile LuxProvideS3.cyberduckprofile:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>Protocol</key>
        <string>s3</string>
        <key>Vendor</key>
        <string>lxp+s3</string>
        <key>Scheme</key>
        <string>https</string>
        <key>Description</key>
        <string>LuxProvide S3 gateway</string>
        <key>Default Port</key>
        <string>443</string>
        <key>Region</key>
        <string>lxp</string>
        <key>Hostname Configurable</key>
        <true/>
        <key>Port Configurable</key>
        <true/>
        <key>Username Configurable</key>
        <true/>
    </dict>
</plist>

With Python Boto3, users can interact with S3 in either Client or Resource mode:

  • Client: low-level service access
  • Resource: higher-level, object-oriented service access

Initializing a Client:

import boto3

access_key = '*****'
secret_key = '**********'
region = 'lxp'
endpoint_url = 'https://s3.lxp.lu'

# Low-level client bound to the MeluXina S3 gateway
s3_client = boto3.client(
    's3',
    endpoint_url=endpoint_url,
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name=region
)

Initializing a Resource:

import boto3

access_key = '*****'
secret_key = '**********'
region = 'lxp'
endpoint_url = 'https://s3.lxp.lu'

# A session holds the credentials and region
session = boto3.Session(
    region_name=region,
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key
)

# Higher-level, object-oriented interface to the same gateway
s3_resource = session.resource('s3', endpoint_url=endpoint_url)
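
Both modes expose the same operations. As a quick comparison, listing buckets in each mode (a minimal sketch using the s3_client and s3_resource objects initialized above):

# Client mode returns plain dictionaries
response = s3_client.list_buckets()
print([bucket['Name'] for bucket in response['Buckets']])

# Resource mode wraps the same call in Python objects
print([bucket.name for bucket in s3_resource.buckets.all()])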

Usage examples

Listing buckets

s3cmd ls
aws s3 --endpoint-url https://s3.lxp.lu ls

Not supported by CyberDuck.

Iterator:

for bucket in s3_resource.buckets.all():
    print(bucket.name)

Materializing the iterator into a list:

buckets = list(s3_resource.buckets.all())

Bucket visibility

  • Only the buckets created with the current credentials are listed
  • Buckets shared via ACLs are not listed

Listing bucket content

s3cmd ls s3://${bucket_name}
aws s3 --endpoint-url https://s3.lxp.lu ls s3://${bucket_name}
duck --username "${access_key}" --list "lxp+s3://${bucket_name}.s3.lxp.lu/"

Get a bucket handle:

bucket_name: str = 'bucket_name'

bucket = s3_resource.Bucket(bucket_name)

Print the bucket's objects keys:

for obj in bucket.objects.all():
    print(obj.key)
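
Listing can also be restricted to keys under a given prefix (a sketch; the 'data/' prefix is a placeholder):

for obj in bucket.objects.filter(Prefix='data/'):
    print(obj.key)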

Creating a bucket

s3cmd mb s3://${bucket_name} --add-header "x-ddn-specified-directory: ${bucket_path}"
s3cmd mb s3://${bucket_name} --add-header "x-ddn-specified-directory: /mnt/${tier}/users/${username}/${bucket_path}"
s3cmd mb s3://${bucket_name} --add-header "x-ddn-specified-directory: /mnt/${tier}/project/${project_name}/${bucket_path}"

Where:

  • bucket_name is an S3-compatible, user-defined bucket name
  • bucket_path is an existing directory
  • tier is one of the MeluXina storage tiers:
    • tier1
    • tier2
  • username is a username on MeluXina
  • project_name is a project name on MeluXina

Not supported by aws s3, as no extra HTTP header can be passed.

Not supported by CyberDuck, as no extra HTTP header can be passed.

bucket_name: str = 'bucket_name'

# Note: this creates the bucket without the x-ddn-specified-directory
# header (see the warning below)
s3_client.create_bucket(Bucket=bucket_name)

Extra headers

Buckets should be created with the extra header x-ddn-specified-directory. Creating a bucket without specifying a directory may succeed, but it results in a bucket with extremely low disk space and inode limitations.
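
Boto3 does not expose extra HTTP headers directly, but botocore's event system can inject them into a request. A minimal, untested sketch of how the x-ddn-specified-directory header might be attached to the CreateBucket call (the bucket name and directory below are placeholders):

def add_ddn_directory_header(params, **kwargs):
    # Inject the DDN-specific header into the raw HTTP request
    params['headers']['x-ddn-specified-directory'] = '/mnt/tier1/users/u001234/s3'

# Run the handler for CreateBucket calls only
s3_client.meta.events.register('before-call.s3.CreateBucket', add_ddn_directory_header)

s3_client.create_bucket(Bucket='u001234-data')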

Bucket names

A bucket name must be unique among all MeluXina users and projects.
As such, we advise including the project or username in the bucket name.

  • For a project bucket: p004321-data
  • For a user bucket: u001234-data

Bucket location

Buckets must be located in an existing subdirectory of a user's home directory or project directory.

Deleting a bucket

Bucket deletion

Removing a bucket will remove the underlying files and directories.

s3cmd rb s3://${bucket_name}
aws s3 --endpoint-url https://s3.lxp.lu rb s3://${bucket_name}

Not supported by CyberDuck.

bucket_name: str = 'bucket_name'

s3_client.delete_bucket(Bucket=bucket_name)
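
The same operation in Resource mode (a sketch, assuming the s3_resource object initialized earlier):

bucket_name: str = 'bucket_name'

# Delete the bucket through the higher-level Resource API
s3_resource.Bucket(bucket_name).delete()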

Uploading data to a bucket

s3cmd put ${local_filename} s3://${bucket_name}
s3cmd put ${local_filename} s3://${bucket_name}/${remote_filename}
aws s3 --endpoint-url https://s3.lxp.lu cp ${local_filename} s3://${bucket_name}
aws s3 --endpoint-url https://s3.lxp.lu cp ${local_filename} s3://${bucket_name}/${remote_filename}
duck --username "${access_key}" --upload "lxp+s3://${bucket_name}.s3.lxp.lu/" ${local_filename}

Get a bucket instance:

bucket_name: str = 'bucket_name'

bucket = s3_resource.Bucket(bucket_name)

Upload a local file to the bucket:

local_filename: str
remote_filename: str

bucket.upload_file(local_filename, remote_filename)

Notes:

  • local_filename: path of the source file on the local machine
  • remote_filename: the object key under which the file is stored in the bucket
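
upload_file automatically switches to multipart uploads for large files; the thresholds and the degree of parallelism can be tuned with a TransferConfig (a sketch with illustrative values, not gateway recommendations):

from boto3.s3.transfer import TransferConfig

# Split files larger than 64 MiB into 64 MiB parts, uploading up to 8 parts in parallel
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=8
)

bucket.upload_file(local_filename, remote_filename, Config=config)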

Removing data from a bucket

s3cmd del s3://${bucket_name}/${remote_filename}
aws s3 --endpoint-url https://s3.lxp.lu rm s3://${bucket_name}/${remote_filename}
duck --username "${access_key}" -D "lxp+s3://${bucket_name}.s3.lxp.lu/${remote_filename}"

Option 1: Delete an object in a bucket with a Boto3 Resource:

bucket_name: str
remote_filename: str

s3_resource.Object(bucket_name, remote_filename).delete()

Option 2: Delete an object in a bucket with a Boto3 Client:

bucket_name: str
remote_filename: str

s3_client.delete_object(Bucket=bucket_name, Key=remote_filename)

Downloading data from a bucket

s3cmd get s3://${bucket_name}/${remote_filename}
s3cmd get s3://${bucket_name}/${remote_filename} ${local_filename}
aws s3 --endpoint-url https://s3.lxp.lu cp s3://${bucket_name}/${remote_filename} ${local_filename}
duck --username "${access_key}" --download "lxp+s3://${bucket_name}.s3.lxp.lu/${remote_filename}"

Get a bucket instance:

bucket_name: str

bucket = s3_resource.Bucket(bucket_name)

Download the remote file to the local location:

remote_filename: str
local_filename: str

bucket.download_file(remote_filename, local_filename)

Notes:

  • remote_filename: the object key in the bucket
  • local_filename: destination path on the local machine
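
To download every object in a bucket, iterate over its keys (a minimal sketch; it assumes the keys contain no directory separators):

# Download each object to a local file of the same name
for obj in bucket.objects.all():
    bucket.download_file(obj.key, obj.key)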

Sharing buckets

Buckets can be shared with other users. This requires that the user be granted access via S3 ACLs; see Managing bucket ACLs below for more information.

Managing bucket ACLs

Grant ACLs:

s3cmd setacl --acl-grant=${permission}:${user_id} s3://${bucket_name}

Revoke ACLs:

s3cmd setacl --acl-revoke=${permission}:${user_id} s3://${bucket_name}

Where:

  • permission is one of:
    • full_control or all
    • read
    • write
  • user_id is an S3 user ID (as defined in S3 access keys generated by LuxProvide)
  • bucket_name is the bucket name

Set ACLs:

aws s3api put-bucket-acl --endpoint-url https://s3.lxp.lu --bucket ${bucket_name} --acl ${acl_name}
aws s3api put-bucket-acl --endpoint-url https://s3.lxp.lu --bucket ${bucket_name} ${permission} id=${user_id}

Where:

  • acl_name is a canned ACL name (e.g., private or public-read)
  • permission is one of:
    • --grant-full-control
    • --grant-read
    • --grant-write
  • user_id is an S3 user ID (as defined in S3 access keys generated by LuxProvide)
  • bucket_name is the bucket name

Not supported by CyberDuck.

Not supported by Python Boto3.

Filesystem permissions

  • Bucket ACLs do not reflect filesystem permissions
  • Filesystem permissions do not reflect bucket ACLs

Retrieving bucket ACLs

Among the clients above, only aws s3 and Python Boto3 can retrieve bucket ACLs.

aws s3api get-bucket-acl --endpoint-url https://s3.lxp.lu --bucket ${bucket_name}

Not supported by s3cmd or CyberDuck.

bucket_name: str

s3_client.get_bucket_acl(Bucket=bucket_name)
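
The response is a dictionary containing the bucket owner and the list of grants. A short sketch printing the grants (assuming the s3_client object initialized earlier):

response = s3_client.get_bucket_acl(Bucket=bucket_name)

# Each grant pairs a grantee with a permission (READ, WRITE, FULL_CONTROL, ...)
for grant in response['Grants']:
    print(grant['Grantee'], grant['Permission'])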