When running Distribution or Consolidation jobs periodically (e.g., daily), it is common for the dataset to have minimal changes. Even so, consecutive runs can take as much time as the initial transfer, because the source agent has to calculate hash values for all of the files before they are transferred.
You can optimize the performance of your periodic Distribution and Consolidation using custom job profile parameters.
Previous Versions (2.x and 3.x)
In versions 2.x and 3.x, this issue was addressed by setting the parameter Force file owner to hash files to No (or the custom parameter transfer_job_force_owner_to_hash_file to false). However, this approach had a significant drawback: agents no longer verified file integrity by calculating hashes. Since Active Everywhere is a peer-to-peer product and files can be assembled from multiple sources, disabling hashing increases the risk of building corrupted files if pieces arrive from agents with different content.
New Solution in Active Everywhere v4.0
Starting from Active Everywhere v4.0, you can instruct source agents to retain hash values between job runs. To optimize job performance on subsequent runs, source agents will reuse previously calculated data for files whose size and modification timestamps remain unchanged.
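Conceptually, the retained dataset works like a lookup keyed by file path, with size and modification timestamp acting as the validity check. The Python sketch below is only a conceptual model of that behavior; the agent's actual storage format and hash algorithm are internal to Active Everywhere, and compute_hash here is an illustrative stand-in.

```python
import hashlib
from pathlib import Path

def compute_hash(path: Path) -> str:
    """Illustrative file-level hash; the agent's real algorithm may differ."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_for(path: Path, retained: dict) -> str:
    """Reuse the retained hash when size and modification time are unchanged."""
    key = str(path.resolve())
    st = path.stat()
    cached = retained.get(key)
    if cached and cached["size"] == st.st_size and cached["mtime"] == st.st_mtime:
        return cached["hash"]        # unchanged file: skip rehashing
    digest = compute_hash(path)      # new or changed file: hash as on a first run
    retained[key] = {"size": st.st_size, "mtime": st.st_mtime, "hash": digest}
    return digest
```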
To enable hash values retention, add the following custom parameters to the job profile:
- transfer_job_retain_file_hashes: set to true to retain file-level hashes.
- transfer_job_retain_file_hashes_metadata (optional): set to true to retain file block-level hashes. Enable this only if you use Differential sync.
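Purely as an illustration, the resulting name/value pairs in the job profile's custom parameters would look like this (include the second parameter only if the job uses Differential sync; the exact entry form depends on your Management Console version):

```
transfer_job_retain_file_hashes = true
transfer_job_retain_file_hashes_metadata = true
```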
These parameters will:
- Force source agents to keep the hash values after the job run completes.
- Force source agents to load the hash values dataset at the start of the next run.
If you are currently using the transfer_job_force_owner_to_hash_file parameter, we strongly recommend switching to transfer_job_retain_file_hashes.
Note: The first job run after applying these parameters will take the usual amount of time, as the hash values dataset from the previous run is unavailable. The following run will use the retained hash values, resulting in improved performance.
Important Considerations
- Disk Space Usage: The databases that store hashes occupy disk space. The size of the file-level database is proportional to the number of files delivered in the job, while the block-level database size is proportional to the total volume of the data.
- Data Path Dependency: The database is associated with the data's absolute path. If the source agent path changes, hashes will need to be recalculated during the next job run.
- File Moves: File moving is not supported. If files are moved or renamed within the job folder between runs, the next job run will treat them as new files and process them accordingly.
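In terms of the conceptual sketch earlier in this article, the last two points follow from the retained dataset being looked up by the file's absolute path: a moved or renamed file resolves to a key that is not in the dataset, so it is hashed again as if it were new.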