Files are often already present on several servers before a job is created, and Resilio Connect Agents are expected to synchronize them quickly and reliably.
Agents perform many complex logical operations to achieve the following:
- minimize data transfer across the network;
- make the data available on all Agents as soon as possible;
- allow end users to manage the data as soon as possible.
In the pre-seeded use case scenario, the Resilio Agent faces two quite opposite challenges:
1. Do not transfer the matching pieces of files; transfer only those that differ. To make this possible, Agents on all computers need to check their local files, learn the hash of each piece of each file, exchange this information with each other (merge the folder tree), and decide which pieces to transfer and from which Agent (the one that has the newest version). This takes a while, especially if the job has many Agents with many files. It also requires a lot of disk read operations, which are usually slow on HDDs and network drives;
2. Bring the system into balance ASAP and eliminate all those background activities.
Resilio Connect provides flexible, fine-grained configuration for these cases. This guide goes through each of the possible tweaks in detail so that a Resilio admin can set up the most suitable final configuration. Change them with caution, as each configuration has its pros and cons.
The two above-mentioned challenges boil down to two big blocks of settings - 1) whether a file needs to be synced or not, and 2) whether to hash files or not.
1. To sync or not to sync?
When making this decision, the Agent automatically looks at four attributes of a file: creation timestamp, modification timestamp, size, and file permissions. Checking these is a quick operation, so if all the attributes match on all computers, Agents assume that nothing needs to be synced. If at least one of them doesn't match, Agents will get ready for data transfer.
It's possible to remove the creation timestamp from the equation. To do so, add the custom parameter transfer_job_exact_ctime_timestamps with value 3 to the Agent profile.
Disabling "Synchronize NTFS permissions" or "Synchronize Posix permissions" in the Job profile can remove file permissions from the equation.
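The attribute check described above can be illustrated with a minimal sketch. It is not the Agent's actual implementation: the `FileAttrs` type, the `needs_sync` function, and the `compare_ctime` / `compare_permissions` flags are hypothetical names that stand in for the effect of `transfer_job_exact_ctime_timestamps` and the permission synchronization settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileAttrs:
    """Hypothetical snapshot of the attributes one Agent sees for a file."""
    ctime: int        # creation timestamp, seconds
    mtime: int        # modification timestamp, seconds
    size: int         # size in bytes
    permissions: int  # e.g. POSIX mode bits


def needs_sync(copies, compare_ctime=True, compare_permissions=True):
    """Return True if any compared attribute differs between the copies.

    The two flags are illustrative assumptions: they model removing the
    creation timestamp or file permissions "from the equation".
    """
    def key(attrs):
        parts = [attrs.mtime, attrs.size]
        if compare_ctime:
            parts.append(attrs.ctime)
        if compare_permissions:
            parts.append(attrs.permissions)
        return tuple(parts)

    # More than one distinct key means at least one attribute mismatches.
    return len({key(attrs) for attrs in copies}) > 1


# Two copies that differ only in creation timestamp.
on_agent_a = FileAttrs(ctime=100, mtime=200, size=4096, permissions=0o644)
on_agent_b = FileAttrs(ctime=150, mtime=200, size=4096, permissions=0o644)

print(needs_sync([on_agent_a, on_agent_b]))                       # True
print(needs_sync([on_agent_a, on_agent_b], compare_ctime=False))  # False
```

In this sketch, ignoring the creation timestamp is enough to make the two copies count as matching, so no transfer would be prepared for them.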
Once the Agents decide that a file needs to be synced, they follow the "Disable differential sync" parameter in the profile.
Yes (default value) - the whole file will be simply synced across the network.
No - Agents will check file pieces, calculate their hashes so as to discover the changed pieces and sync only those.
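To illustrate what differential sync means in the "No" case, here is a minimal sketch of piece-level comparison. The piece size, the choice of SHA-256, and the function names are assumptions made for the example only; the source does not state which piece size or hash function Agents use.

```python
import hashlib

PIECE_SIZE = 4  # deliberately tiny, for illustration only (assumed value)


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE):
    """Split data into fixed-size pieces and hash each piece."""
    return [
        hashlib.sha256(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def changed_pieces(local: bytes, remote: bytes, piece_size: int = PIECE_SIZE):
    """Return indexes of remote pieces that the local copy must download."""
    local_hashes = piece_hashes(local, piece_size)
    remote_hashes = piece_hashes(remote, piece_size)
    return [
        index
        for index, remote_hash in enumerate(remote_hashes)
        # A piece is needed if it is missing locally or its hash differs.
        if index >= len(local_hashes) or local_hashes[index] != remote_hash
    ]


old_version = b"aaaabbbbcccc"
new_version = b"aaaaXbbbcccc"
print(changed_pieces(old_version, new_version))  # [1]
```

Only the second piece differs here, so only that piece would be transferred instead of the whole file. The cost is that both sides have to read and hash the file first.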
2. To hash or not to hash?
By default, Agents do not hash a file until it is requested to be synced. While this might seem optimal, it has a significant drawback: when a lot of files need to be synced, Agents sync and hash at the same time, which shows up as slow data replication. This behavior is controlled by the Lazy indexing parameter in the Job Profile.
- If the value is set to true (the default), the Agent does not hash files when it discovers them. It hashes them only on upload requests or when another Agent explicitly asks for a file's hash.
- If the value is set to false, the Agent hashes a file the moment it discovers the file or a change to it.
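The difference between the two Lazy indexing values can be sketched as follows. This is a simplified model, not the Agent's real code: the `Indexer` class, its method names, and the use of SHA-256 over whole file contents are all hypothetical.

```python
import hashlib


class Indexer:
    """Hypothetical model of when an Agent computes file hashes."""

    def __init__(self, lazy_indexing: bool = True):
        self.lazy_indexing = lazy_indexing
        self.hashes = {}  # path -> hex digest of the file content

    def _hash(self, path: str, content: bytes) -> None:
        self.hashes[path] = hashlib.sha256(content).hexdigest()

    def on_file_discovered(self, path: str, content: bytes) -> None:
        # Eager mode hashes right away; lazy mode postpones the work.
        if not self.lazy_indexing:
            self._hash(path, content)

    def on_hash_requested(self, path: str, content: bytes) -> str:
        # In lazy mode this is the first time the file is actually read.
        if path not in self.hashes:
            self._hash(path, content)
        return self.hashes[path]


lazy = Indexer(lazy_indexing=True)
eager = Indexer(lazy_indexing=False)
for indexer in (lazy, eager):
    indexer.on_file_discovered("report.doc", b"contents")

print("report.doc" in lazy.hashes)   # False
print("report.doc" in eager.hashes)  # True
```

In the lazy case the hashing cost is paid later, at the moment a transfer is requested, which is why syncing and hashing can end up competing for the disk at the same time.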
Hashing speed depends heavily on disk speed. It is generally advisable to enable hashing of files, even though hashing may take a long time.
Another custom parameter in the Job Profile is Force file owner to hash files. It works similarly to Lazy indexing but applies only to the file's owner. The file owner is the Agent that has the latest version of the file or where the file originated. If the parameter is set to true, the file's owner hashes the files regardless of whether they need to be synced. However, this may greatly slow down the job's overall progress and performance. With Lazy indexing enabled, only the file's owner will know the hash of a file, so if any remote Agent needs to download the file, the file's owner will be its only source. Another side effect is that if a cloud Agent owns the file, that Agent must download the file in order to hash it.
Setting this parameter to false may improve the job's performance, but with Lazy indexing enabled, no Agent will have the file hashed, which results in the full file being re-downloaded if anything changes. This can be especially inefficient if cloud storage is involved.
If either kind of hashing is enabled, it is advisable to add the custom parameter prioritize_initial_indexing_mode with value 15 to the Agent Profile. This forces Agents to hash the files first and only then merge the folder tree, instead of doing both in parallel.
Below are a few sample configurations to illustrate the idea:
Sample 1: Disable differential sync = Yes
This is the default behavior. It is optimal if the attributes of most files match, in which case nothing is synced. However, if anything changes, only the file's owner has the hash of the file, which slows down data replication: all changed files are re-downloaded in full.
Sample 2: Disable differential sync = No
This configuration makes it possible to discover and synchronize only the differences in a file (instead of re-downloading the whole file) by forcing all Agents to hash all files in advance, even before they start merging folder trees. Side effects: the job won't perform any transfer activity until Agents have hashed the files, and a higher disk load is expected, which reduces overall job performance.
Sample 3: Disable differential sync = No
This is similar to sample #1. The difference is that if files change, only the changed parts are synced.
Sample 4: Disable differential sync = No
This is the fastest way to bring the job into balance. If files differ, only the changed parts are synced. No Agent hashes the matching files. However, this is also a great disadvantage if files are moved around in the job folder, because a move results in a full file re-download.
Other peculiarities
1) Archive. If files' timestamps don't match exactly, ensure there's plenty of space for the Archive folder. If there isn't, and there is no way to provide more, disable Archive entirely through the Agent profile (mind the other drawbacks of a disabled Archive, and enable it again once the job folders are in balance).
2) File permissions. If modification timestamps match, file permissions will be synced randomly between all read-write peers. To avoid this, select a Reference Agent in the job.
3) Time required to get job folders in balance. If you go with the "hash all in advance" scenario, it may take a long time.