S3 - Data transfer
In development
The S3 interface is available for testing, and the service may be subject to disruption. To request access or to provide feedback, please use our service desk. We would be glad to hear from you and to learn from your experience!
Overview
The MeluXina S3 gateway allows users to access their data hosted on MeluXina Tier1 and Tier2 storage platforms through the S3 protocol.
The S3 gateway is based on the DDN S3DS (S3 Data Services) software, which implements the Amazon Simple Storage Service S3 protocol.
Gaining access
A prerequisite to using the S3 interface is obtaining dedicated credentials for this service by contacting our service desk. The credentials consist of an access key and a secret key, which an S3 client (see the section below) uses to authenticate to the service.
S3 clients
To use the S3 protocol for transferring data to MeluXina, you will need to use an S3 client on your workstation or server. There are several clients you may choose from:
Feature | s3cmd | aws s3 | CyberDuck | Python Boto3 |
---|---|---|---|---|
Non-AWS bucket support | Yes | Limited | Yes | Yes |
Create buckets | Yes | Limited | No | Yes |
List buckets | Yes | Yes | Yes | Yes |
Write ACLs | Yes | Yes | No | No |
Read ACLs | No | Yes | No | Yes |
Parallel file transfer | No | Yes | Yes | Yes |
s3cmd
- Recommended S3 client
- When creating a bucket, only s3cmd (or a custom curl command) can be used to properly specify the bucket location in the filesystem
aws s3
- Better support for ACLs (read and write)
- Parallel and faster file transfers
CyberDuck
- GUI for Windows and macOS
- CLI for Linux (called duck)
- Parallel and faster file transfers
Python Boto3
- Provides APIs to CRUD resources with Python scripts
- Process the data inside Python applications
- Build Python applications on top of S3
Client configuration
Depending on the S3 client you choose, the following settings may need to be configured:
Parameter | Value | Description |
---|---|---|
Region | lxp | S3 region name |
Endpoint | s3.lxp.lu | S3 endpoint |
Path mode | dns, path or auto | Bucket resolution mode (dns recommended) |
DNS bucket name resolution | Yes | Use DNS to resolve bucket names |
HTTPS | Yes | Use HTTPS |
File ~/.s3cfg (s3cmd):
[default]
access_key = *****
secret_key = **********
host_base = s3.lxp.lu
host_bucket = yes
bucket_location = lxp
guess_mime_type = True
use_https = True
File ~/.aws/credentials (aws s3):
[default]
aws_access_key_id = *****
aws_secret_access_key = **********
Custom profile LuxProvideS3.cyberduckprofile (CyberDuck):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Protocol</key>
<string>s3</string>
<key>Vendor</key>
<string>lxp+s3</string>
<key>Scheme</key>
<string>https</string>
<key>Description</key>
<string>LuxProvide S3 gateway</string>
<key>Default Port</key>
<string>443</string>
<key>Region</key>
<string>lxp</string>
<key>Hostname Configurable</key>
<true/>
<key>Port Configurable</key>
<true/>
<key>Username Configurable</key>
<true/>
</dict>
</plist>
Users can interact with S3 in either Client or Resource mode:
- Client: low-level service access
- Resource: higher-level, object-oriented service access
Initializing a Client:
import boto3

# Credentials issued by the LuxProvide service desk
access_key = '*****'
secret_key = '**********'
region = 'lxp'
endpoint_url = 'https://s3.lxp.lu'

# Low-level client bound to the MeluXina S3 gateway
s3_client = boto3.client(
    's3',
    endpoint_url=endpoint_url,
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name=region
)
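As a quick connectivity check, the client can list the buckets visible to your credentials (a minimal sketch using the s3_client defined above):
# Sanity check: list the buckets visible to these credentials
response = s3_client.list_buckets()
for bucket in response.get('Buckets', []):
    print(bucket['Name'])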
Initializing a Resource:
import boto3

# Credentials issued by the LuxProvide service desk
access_key = '*****'
secret_key = '**********'
region = 'lxp'
endpoint_url = 'https://s3.lxp.lu'

# A session holds the credentials and region configuration
session = boto3.Session(
    region_name=region,
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key
)

# Higher-level, object-oriented access to the gateway
s3_resource = session.resource('s3', endpoint_url=endpoint_url)
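Note that a Resource wraps a Client: if a low-level call is needed alongside Resource-style code, the underlying Client is reachable without opening a second connection (standard Boto3 behaviour):
# The low-level client that backs the resource
s3_client = s3_resource.meta.client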
Usage examples
Listing buckets
s3cmd:
s3cmd ls
aws s3:
aws s3 --endpoint-url https://s3.lxp.lu ls
CyberDuck: not supported.
Python Boto3, iterator:
for bucket in s3_resource.buckets.all():
    print(bucket.name)
Iterator unpacking into a list:
buckets = list(s3_resource.buckets.all())
Bucket visibility
- Only the buckets created with the current credentials are listed
- Buckets shared via ACLs are not listed
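A bucket shared with you via ACLs can still be opened directly by name even though it does not appear in the listing. A minimal Boto3 sketch, where p004321-data stands in for a hypothetical shared bucket:
# Shared buckets are reachable by name even if they are not listed
shared_bucket = s3_resource.Bucket('p004321-data')
for obj in shared_bucket.objects.all():
    print(obj.key)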
Listing bucket content
s3cmd:
s3cmd ls s3://${bucket_name}
aws s3:
aws s3 --endpoint-url https://s3.lxp.lu ls s3://${bucket_name}
CyberDuck:
duck --username "${access_key}" --list "lxp+s3://${bucket_name}.s3.lxp.lu/"
Python Boto3, get a bucket handle:
bucket_name: str = 'bucket_name'
bucket = s3_resource.Bucket(name=bucket_name)
Print the keys of the bucket's objects:
for obj in bucket.objects.all():
    print(obj.key)
Creating a bucket
s3cmd:
s3cmd mb s3://${bucket_name} --add-header "x-ddn-specified-directory: ${bucket_path}"
s3cmd mb s3://${bucket_name} --add-header "x-ddn-specified-directory: /mnt/${tier}/users/${username}/${bucket_path}"
s3cmd mb s3://${bucket_name} --add-header "x-ddn-specified-directory: /mnt/${tier}/project/${project_name}/${bucket_path}"
Where:
- bucket_name is an S3-compatible, user-defined bucket name
- bucket_path is an existing directory
- tier is one of the MeluXina storage tiers, i.e., tier1 or tier2
- username is a username on MeluXina
- project_name is a project name on MeluXina
aws s3: not supported, as no extra HTTP header can be passed.
CyberDuck: not supported, as no extra HTTP header can be passed.
Python Boto3:
bucket_name: str = 'bucket_name'
s3_client.create_bucket(Bucket=bucket_name)
Extra headers
Buckets should be created with the extra header x-ddn-specified-directory.
Creating a bucket without specifying a directory may succeed, but it results in a bucket with extremely low disk space and inode limitations.
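The plain Boto3 create_bucket call above cannot pass this header directly. One possible workaround, shown here as an untested sketch based on botocore's event system (the directory path and bucket name are placeholders):
# Inject the DDN-specific header into the CreateBucket request
def add_ddn_directory(request, **kwargs):
    # Placeholder path: must be an existing directory on a MeluXina storage tier
    request.headers['x-ddn-specified-directory'] = '/mnt/tier2/users/u001234/my-bucket-dir'

s3_client.meta.events.register('before-send.s3.CreateBucket', add_ddn_directory)
s3_client.create_bucket(Bucket='u001234-data')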
Bucket names
A bucket name must be unique among all MeluXina users and projects.
As such, we advise including the project or username in the bucket name:
- For a project bucket: p004321-data
- For a user bucket: u001234-data
Bucket location
Buckets must be located in an existing subdirectory of a user's home directory or project directory.
Deleting a bucket
Bucket deletion
Removing a bucket will remove the underlying files and directories.
s3cmd:
s3cmd rb s3://${bucket_name}
aws s3:
aws s3 --endpoint-url https://s3.lxp.lu rb s3://${bucket_name}
CyberDuck: not supported.
Python Boto3:
bucket_name: str = 'bucket_name'
s3_client.delete_bucket(Bucket=bucket_name)
Uploading data to a bucket
s3cmd:
s3cmd put ${local_filename} s3://${bucket_name}
s3cmd put ${local_filename} s3://${bucket_name}/${remote_filename}
aws s3:
aws s3 --endpoint-url https://s3.lxp.lu cp ${local_filename} s3://${bucket_name}
aws s3 --endpoint-url https://s3.lxp.lu cp ${local_filename} s3://${bucket_name}/${remote_filename}
CyberDuck:
duck --username "${access_key}" --upload "lxp+s3://${bucket_name}.s3.lxp.lu/" ${local_filename}
Python Boto3, get a bucket instance:
bucket_name: str = 'bucket_name'
bucket = s3_resource.Bucket(name=bucket_name)
Upload a local file to the bucket:
local_filename: str
remote_filename: str
bucket.upload_file(local_filename, remote_filename)
Where:
- local_filename is the name of the source file on the local machine
- remote_filename is the object key under which the file is stored in the bucket
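Boto3 switches to multipart uploads automatically for large files; a progress callback can be attached when feedback on long transfers is useful (a minimal sketch; the callback name is illustrative):
# Boto3 invokes the callback with the number of bytes sent per chunk
def report_progress(bytes_transferred):
    print(f'{bytes_transferred} bytes transferred')

bucket.upload_file(local_filename, remote_filename, Callback=report_progress)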
Removing data from a bucket
s3cmd:
s3cmd del s3://${bucket_name}/${remote_filename}
aws s3:
aws s3 --endpoint-url https://s3.lxp.lu rm s3://${bucket_name}/${remote_filename}
CyberDuck:
duck --username "${access_key}" -D "lxp+s3://${bucket_name}.s3.lxp.lu/${remote_filename}"
Python Boto3, option 1: delete an object with a Resource:
bucket_name: str
remote_filename: str
s3_resource.Object(bucket_name, remote_filename).delete()
Option 2: delete an object with a Client:
bucket_name: str
remote_filename: str
s3_client.delete_object(Bucket=bucket_name, Key=remote_filename)
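To remove many objects in one call, the Resource interface can batch-delete everything under a key prefix (a sketch; the prefix is a placeholder):
# Batch-delete every object whose key starts with the given prefix
bucket = s3_resource.Bucket(bucket_name)
bucket.objects.filter(Prefix='results/').delete()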
Downloading data from a bucket
s3cmd:
s3cmd get s3://${bucket_name}/${remote_filename}
s3cmd get s3://${bucket_name}/${remote_filename} ${local_filename}
aws s3:
aws s3 --endpoint-url https://s3.lxp.lu cp s3://${bucket_name}/${remote_filename} ${local_filename}
CyberDuck:
duck --username "${access_key}" --download "lxp+s3://${bucket_name}.s3.lxp.lu/${remote_filename}"
Python Boto3, get a bucket instance:
bucket_name: str
bucket = s3_resource.Bucket(name=bucket_name)
Download the remote file to the local location:
remote_filename: str
local_filename: str
bucket.download_file(remote_filename, local_filename)
Where:
- remote_filename is the object key of the file in the bucket
- local_filename is the name under which the file is saved locally
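If the object is to be processed in memory rather than written to disk, the low-level Client can stream the body directly (a minimal sketch):
# Read the object body into memory without creating a local file
response = s3_client.get_object(Bucket=bucket_name, Key=remote_filename)
data = response['Body'].read()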
Sharing buckets
Buckets can be shared with other users, provided those users are granted access via S3 ACLs. See Managing bucket ACLs below for more information.
Managing bucket ACLs
s3cmd, grant ACLs:
s3cmd setacl --acl-grant=${permission}:${user_id} s3://${bucket_name}
s3cmd, revoke ACLs:
s3cmd setacl --acl-revoke=${permission}:${user_id} s3://${bucket_name}
Where:
- permission is one of: full_control (or all), read, write
- user_id is an S3 user ID (as defined in the S3 access keys generated by LuxProvide)
- bucket_name is the bucket name
aws s3, set ACLs:
aws s3api put-bucket-acl --endpoint-url https://s3.lxp.lu --bucket ${bucket_name} --acl ${acl_name}
aws s3api put-bucket-acl --endpoint-url https://s3.lxp.lu --bucket ${bucket_name} ${permission} id=${uid}
Where:
- acl_name is a canned ACL (e.g., private or public-read)
- permission is one of: --grant-full-control, --grant-read, --grant-write
- uid is an S3 user ID (as defined in the S3 access keys generated by LuxProvide)
- bucket_name is the bucket name
CyberDuck: not supported.
Python Boto3: not supported.
Filesystem permissions
- Bucket ACLs do not reflect filesystem permissions
- Filesystem permissions do not reflect bucket ACLs
Retrieving bucket ACLs
Only available in the aws s3 and Python Boto3 clients!
aws s3:
aws s3api get-bucket-acl --endpoint-url https://s3.lxp.lu --bucket ${bucket_name}
s3cmd: not supported.
CyberDuck: not supported.
Python Boto3:
bucket_name: str
s3_client.get_bucket_acl(Bucket=bucket_name)
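The call returns a dictionary in the standard S3 GetBucketAcl shape; for example, the individual grants can be inspected like this:
# Print each grantee and the permission it holds on the bucket
acl = s3_client.get_bucket_acl(Bucket=bucket_name)
for grant in acl['Grants']:
    grantee = grant['Grantee'].get('ID') or grant['Grantee'].get('URI')
    print(grantee, grant['Permission'])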