When running Distribution or Consolidation jobs periodically (e.g., daily), it is common for the dataset to have minimal changes. In such cases, consecutive runs can take as much time as the initial run because the destination requests the source to calculate file hashes for integrity control.
Previous Versions (2.x and 3.x)
In versions 2.x and 3.x, this issue was addressed by setting the parameter Force file owner to hash files to No (or the custom parameter transfer_job_force_owner_to_hash_file to false). However, this approach had a significant drawback: agents stopped controlling file integrity by calculating hashes. Since Active Everywhere is a peer-to-peer product and files can be assembled from multiple sources, disabling hashing increases the risk of building corrupted files if pieces arrive from agents with different content.
New Solution in Active Everywhere v4.0
Starting from Active Everywhere v4.0, administrators can force agents to retain file hash databases between job runs. This ensures that on consecutive runs, agents will use file hashes from the database if the file size and timestamps have not changed.
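Conceptually, the reuse check works like the following sketch (Python, purely illustrative; the agent's internal database format and hashing algorithm are not documented here): a file's hash is recomputed only when its size or modification time differs from the cached record.

```python
import hashlib
import os

# Illustrative only: a minimal hash cache keyed by path, reused when size and
# mtime are unchanged. The real agent's database format and hash algorithm may differ.
hash_db = {}  # path -> {"size": int, "mtime": float, "digest": str}

def file_hash(path: str) -> str:
    """Return the file's hash, reusing the cached value if size and mtime are unchanged."""
    st = os.stat(path)
    cached = hash_db.get(path)
    if cached and cached["size"] == st.st_size and cached["mtime"] == st.st_mtime:
        return cached["digest"]  # unchanged since the previous run: skip rehashing
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    hash_db[path] = {"size": st.st_size, "mtime": st.st_mtime, "digest": digest.hexdigest()}
    return hash_db[path]["digest"]
```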
Set the following custom parameters in the job profile (a configuration sketch follows below):
- transfer_job_retain_file_hashes: Set to true to retain file-level hashes.
- transfer_job_retain_file_hashes_metadata (optional): Set to true to retain file block-level hashes. Enable this only if you use Differential sync.
These parameters will:
- Force agents to keep the database(s) after the job run completes.
- Force agents to load the databases at the start of the next run.
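For reference, the two parameters could be expressed in a job profile roughly as follows (a Python sketch; how the profile is actually applied, via the Management Console UI or an API, is outside the scope of this article):

```python
# Illustrative only: the custom parameters described above as a simple mapping.
custom_parameters = {
    "transfer_job_retain_file_hashes": True,           # retain file-level hashes between runs
    "transfer_job_retain_file_hashes_metadata": True,  # optional: block-level hashes, Differential sync only
}
```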
If you are currently using the transfer_job_force_owner_to_hash_file parameter, we strongly recommend switching to transfer_job_retain_file_hashes.
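As an illustration, the migration amounts to replacing the old parameter with the new one in the job's custom parameters (values shown as booleans for clarity):

```python
# Illustrative before/after for the recommended switch.
legacy  = {"transfer_job_force_owner_to_hash_file": False}  # old workaround: disables hashing entirely
updated = {"transfer_job_retain_file_hashes": True}         # v4.0: hashing stays on, results are cached
```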
Note: The first job run after applying these parameters will still take the usual amount of time, as the databases are not yet saved. The second run will start using the retained databases, resulting in faster performance.
Important Considerations
- Disk Space Usage: The databases that store hashes will occupy disk space. The size of the file-level database is proportional to the number of files delivered in the job, while the block-level database size is proportional to the total amount of data delivered.
- Data Path Dependency: The database is associated with the data's absolute path. If the source agent path changes, hashes will need to be recalculated during the next job run.
- File Moves: Moves are not supported. If files are moved or renamed within the job folder between runs, the next job run will treat them as new files and process them accordingly.
By following this guide, you can optimize the performance of your periodic Distribution and Consolidation jobs while maintaining data integrity and efficiency.