- Common recommendations
- Special Profile settings
- Scenario-specific recommendations
- Pre-seeded folder specifics
- Archive and .sync folder specifics
- Permissions required to access buckets
- Limitations and peculiarities
- S3 Compatible storages
Data delivery to cloud storage, like all data delivery in Resilio Connect, is performed by Agents (not the Management Console); therefore, some agents will have to take the role of a "Cloud Agent" to enable working with cloud storage. This article explains some of the specifics of the Cloud Agent's behavior and contains recommendations on how to configure the Cloud Agent for better performance.
Common recommendations
Technically, an Agent can serve multiple roles and be set up to sync files on local storage in some jobs and on cloud storage in others. However, the latter requires some fine-tuning of the Agent's settings, which may negatively impact the former. It is therefore highly advisable to dedicate an Agent to a specific cloud storage and assign a special profile to that Agent.
Create a new group, add the Agent to it and assign the Cloud storage profile to the group or apply the "Cloud storage" preset to a separate profile for this group. Later, perform all settings adjustments in that isolated profile only.
Behavior differences
The Cloud storage path is not supported for a group. Don't add such groups to a job; add the specific Agents instead.
Some of the behavior patterns are hardcoded and cannot be changed manually. Others can be changed in the isolated profile, as advised above.
A cloud agent does not try to resolve filename conflicts and will upload a file to storage as-is, ending up with two objects. Non-cloud agents will act in accordance with their system specifics and Profile settings in this regard.
A cloud agent does not compare file pieces to reuse matching pieces from the local copy. An updated file will be fully re-synced.
Cloud storages do not provide the usual file system notifications about new or updated files, so in a continuous Synchronization job a cloud agent relies on periodic folder scans to discover file updates. In the pre-configured "Cloud storage" profile, the "Rescan interval" parameter is set to one hour. If files there change frequently, decrease the rescan interval further. Alternatively, use the force rescan option.
If the cloud agent acts as a destination in a Consolidation or Distribution job, set the rescan interval to 0 (= disable) so the Agent does not re-check local, pre-seeded objects.
RAM requirements are the same as for synchronization to non-cloud storage. See this article for details.
Special Profile settings
As mentioned above, an agent that will transfer files to/from cloud storage requires some special settings in its Profile (a consolidated example follows the list):
1) Increase the "Number of disk I/O threads" parameter (agent profile) to 20 for your cloud agent. It's already set in the pre-configured Cloud storage profile.
2) When working with Google Cloud storage and uploading large files there, increase "Min size of torrent block" to 8388608 (8 MB).
3) Disable the periodic folder scan if the cloud agent is a destination agent in a Distribution or Consolidation job: set "Rescan interval" to 0 and restart the agent.
4) For Wasabi cloud storage, it's advisable to disable Archive through the "Use archive" parameter in the Agent Profile. If large files are synced, it is required to add the custom parameter "fs_enable_meta" with the value "false" to the Agent Profile.
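For illustration only, here is a consolidated JSON-style sketch of the adjustments above for a cloud agent's profile. The keys simply reuse the setting names mentioned above; the actual parameter names and input format in the Management Console may differ, and items 2)-4) only apply to the storages and scenarios they mention:

```json
{
  "Number of disk I/O threads": 20,
  "Min size of torrent block": 8388608,
  "Rescan interval": 0,
  "Use archive": false,
  "fs_enable_meta": "false"
}
```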
Scenario-specific recommendations
| Usage scenario | Recommendation |
| --- | --- |
| No changes in cloud expected | Set the "Rescan interval" parameter to zero (= disable it) |
| Cloud agent is Linux | Run the following commands to allow the Agent to control buffers for better performance, add the following custom settings to the profile and change them as shown below |
| Cloud agent is Windows, performance over 1 Gbps | Add the following settings to the Profile and change them as shown below |
| Distribution or Consolidation job, cloud is the destination | Set the "Rescan interval" parameter (agent profile) to zero |
| Delivery to S3, performance up to 1 Gbps | EC2 Linux instance type t3.xlarge or below, general-purpose drive |
| Delivery to S3, performance above 1 Gbps | EC2 Linux instance type with guaranteed 10 Gbps performance (e.g. m5.8xlarge), provisioned drive |
| Delivery to Azure, performance up to 1 Gbps | Linux general-purpose virtual machine, D4s v3 or better |
| Delivery to Azure, performance above 1 Gbps | Linux memory-optimized virtual machine, D14 v2 |
| Delivery to Google cloud, performance above 2 Gbps | Linux machine type n1-standard-64 or better |
Pre-seeded folder specifics
Cloud solutions do not allow explicitly setting the modification time of files uploaded to the cloud. This forces Agents to follow more complex logic when deciding whether to download data from the cloud or deliver data to the cloud when the same filenames are already present on a destination node.
Sync job
Basically, agents rely on file size to determine whether any syncing needs to be performed.
If the file size matches on all agents, both on local and cloud storage, the file on the local file system is assumed to be newer when its timestamp is newer, and it will be uploaded to the cloud. If the timestamp on the local file system is older, nothing is synced. If all copies are on cloud storage, files are considered equal as long as their sizes match.
If the sizes differ, the copy with the latest timestamp is uploaded to the others. Once synced, the Agent keeps the true mtime of the files in the cloud in its own database.
If you synchronize data in an RW -> RO pattern, see the behavior described in the "Distribution / consolidation jobs" section below for the initial synchronization.
Distribution / consolidation jobs
| | Mtime on SRC is newer | Mtime on SRC is older | Mtime is equal on SRC and DST |
| --- | --- | --- | --- |
| Cloud is DESTINATION | Upload file to cloud | Do nothing * | Do nothing |
| Cloud is SOURCE | Do nothing | Download file from cloud ** (or set TJET=0 to do nothing) | Hash the file (set transfer_job_force_owner_to_hash_file=false to do nothing) |
TJET - transfer_job_exact_timestamps
* set "transfer_job_exact_timestamps" custom parameter = 0 (i.e. "Disable the mtime optimization feature") to override and DO UPLOAD all files to cloud on every job run
** set "transfer_job_exact_timestamps" = 0 (i.e. "Enable mtime optimization feature for regular disks, too") to override and DO NOT download file from cloud
Only use job profile
Set the custom parameters "transfer_job_exact_timestamps" and "transfer_job_force_owner_to_hash_file" in the job profile, not in the agent profile.
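As an illustration, the two custom parameters might be entered in the Job Profile like this (a JSON-style sketch; the values shown are the override values from the table and footnotes above, and the exact input format in the Management Console may differ):

```json
{
  "transfer_job_exact_timestamps": 0,
  "transfer_job_force_owner_to_hash_file": false
}
```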
What if the agent decides to upload/download a file to/from cloud
As opposed to a standard "disk-to-disk" transfer, the agent WILL NOT hash the file in the cloud to decide which pieces it needs to upload/download, as this is equivalent to downloading the full file from the cloud and costs money. Instead:
- when the agent wants to upload a new version of a file to the cloud, it simply overwrites the existing object in the cloud.
- when the agent wants to download a new version of a file from the cloud, it simply downloads the complete file and overwrites the local one.
Agents WILL hash newly added files as they are uploaded to the cloud. These hashes are used when merging.
Archive and .sync folder specifics
Agents in the cloud can create the service .sync directory in cloud storage, with the following specifics:
- Amazon S3 storage won't create it until it's necessary to store an object deleted or updated on remote agents in its .sync/Archive (for Synchronization), or files updated on each job run (for Distribution and Consolidation jobs).
- Azure Files will create .sync with Archive and an ID file there to store file versions and deleted files (for Synchronization), or files updated on each job run (for Distribution and Consolidation jobs).
- Azure Blob creates .sync, but Archive is created only when it's necessary to store an object deleted or updated on a remote agent (for Synchronization), or files updated on each job run (for Distribution and Consolidation jobs).
- Google Cloud storage automatically creates the .sync folder to store file versions and deleted files (for Synchronization), or files updated on each job run (for Distribution and Consolidation jobs).
- S3 compatible storages have their own peculiarities, see below for more details.
- Sharepoint Online automatically creates the .sync folder with Archive and an ID file to store file versions and deleted files (for Synchronization), or files updated on each job run (for Distribution and Consolidation jobs).
- Oracle Cloud: Archive is not created automatically, until there's a need to store a file version or a deleted file.
Permissions required to access buckets
["s3:AbortMultipartUpload", "s3:GetObject", "s3:ListBucket", "s3:ListAllMyBuckets", "s3:ListBucketMultipartUpload", "s3:ListMultipartUploadParts", "s3:PutObject", "s3:CreateBucket", "s3:DeleteBucket", "s3:DeleteObject"]
Here's the list of minimal permissions for the case when the bucket is already created in the storage:
["s3:AbortMultipartUpload", "s3:GetObject", "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:ListMultipartUploadParts", "s3:PutObject", "s3:DeleteObject"]
If you use KMS to manage encryption keys, make sure to allow the following actions in the policy included in the IAM role assigned to the EC2 instance:
["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"]
Azure Blob storage:
CopyBlobRequest
CreateContainerRequest
DeleteBlobRequest
DeleteContainerRequest
GetBlobPropertiesRequest
GetBlobRequest
GetBlockListRequest
GetContainerPropertiesRequest
ListBlobsRequest
ListContainersRequest
PutBlobRequest
PutBlockListRequest
PutBlockRequest
Azure Files:
CopyFileRequest
CreateFileRequest
CreateShareRequest
DeleteFileRequest
DeleteShareRequest
GetFilePropertiesRequest
GetFileRequest
ListFilesRequest
ListRangesRequest
ListSharesRequest
PutRangeRequest
SetFilePropertiesRequest
Google Cloud storage:
storage.objects.get
storage.objects.create
storage.objects.update
storage.objects.delete
storage.objects.list
storage.buckets.get
storage.buckets.create
storage.buckets.delete
storage.buckets.list
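If you prefer to grant these Google Cloud permissions through a custom role, a role definition along these lines could be used. This is a sketch of the JSON body accepted by the IAM roles API; the title and description are placeholders, not names from this article:

```json
{
  "title": "Resilio Cloud Agent",
  "description": "Permissions required by the Resilio Connect cloud agent",
  "stage": "GA",
  "includedPermissions": [
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.update",
    "storage.objects.delete",
    "storage.objects.list",
    "storage.buckets.get",
    "storage.buckets.create",
    "storage.buckets.delete",
    "storage.buckets.list"
  ]
}
```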
Limitations and peculiarities
- Renaming/moving files is not supported. Renamed files are simply re-uploaded to the cloud.
- Selective sync is not supported for cloud storage.
- Real-time notifications for cloud storage are not supported. New files and updates are only detected via rescan.
- Cloud storage-specific attributes are not synchronized.
- Configuring cloud storage for a group of agents is not supported, only for a single agent.
- File attribute synchronization is not supported (neither basic nor extended attributes).
- Agent cannot process objects with empty names.
- Seeding of partially downloaded files is not supported. A cloud agent will only seed a file once it has the full file content.
- Symbolic link synchronization is supported with peculiarities, and not supported at all for S3 storages.
- Checking file content before uploading to the cloud is not supported (files are always simply uploaded).
- You cannot use cloud storage as a source in Distribution or Consolidation jobs that are set to sync file permissions. Files won't be synced at all as cloud storage does not have file permissions.
- Bandwidth scheduler speed limits are not applied to a cloud storage connection by default. This can be changed with the custom parameter rate_limit_cloud_connections in the Agent's Cloud storage profile (see the sketch after this list). Note that if a job has only two Agents, one of which is a Cloud Agent, the configured bandwidth speed limits will be applied to and enforced by the other Agent in the job.
- Usage limit imposed by Sharepoint. There is a request limit in the MS Graph API, which can affect the Agent's performance in terms of file operations. Every request to Sharepoint (not necessarily to upload/download files) counts toward the quotas. Read more about these quotas here. If the limit is exceeded, the Agent will report the error in the job.
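A JSON-style sketch of the bandwidth-related custom parameter mentioned in the list above, added to the Agent's Cloud storage profile. The boolean value shown here is an assumption; consult the custom parameter reference for the exact accepted values:

```json
{
  "rate_limit_cloud_connections": true
}
```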
S3 Compatible storages
- Wasabi: it's advisable to disable Archive through the "Use archive" parameter in the Agent Profile. If large files are synced, it is required to add the custom parameter "fs_enable_meta" with the value "false" to the Agent Profile.
- MatrixStore: it's advisable to place the VM with the Agent as close to the data as possible to avoid connection timeouts.
- Even though OpenStack and SwiftStack storages support zero-sized objects and symbolic links, these are not synced with the storage by Resilio.
- SwiftStack: storage Archive is not supported.
- Backblaze: the Agent ignores etags by default. This is configurable with a custom parameter in the Job Profile.
- Azure Data Lake Storage Gen2 with hierarchical namespace: creating a new job from the MC is not supported. The Admin must create the folder on the storage in advance.
- Oracle Cloud: see here for more details on proper configuration.
- Storages known to be NOT supported: Amazon Glacier, Minio, Hitachi S3.