Available for: Windows, Linux Agents. Full sync shares.
Not available for: NAS, macOS, mobile Agents. TSS shares. Script jobs.
Understanding HA groups
Failover
Create HA group
Using HA groups in the job
Peculiarities and limitations
Understanding HA groups
Agents in such a group work in an active/passive High Availability mode. A group leader is elected automatically and performs all synchronization and data transfer. The other Agents work in 'follower' mode: they keep the metadata in sync but do not actively participate in the job run. The corresponding status, following, is reported for them in a job run.
Additionally, the corresponding Role in HA group, leader or follower, is reported in a job run.
When the leader fails, that is, it goes offline or encounters a critical error, a new leader is elected among the available followers. After a successful failover, the job continues from where it was interrupted. For failover peculiarities, see the Failover section below.
Failover
Failover is the process that is triggered if the group leader fails to execute the job:
- it's disconnected from the followers in the job.
It means that restarting the Agent process, either directly on the server where it's installed or from the Management Console, automatically triggers failover in all jobs where this Agent is the HA group leader. Note that the key point here is the connection to the other Agents in the HA group, not to the MC. If the leader is not connected to the MC but is still connected to the other Agents, failover won't happen.
- the identifying .sync/ID file is deleted. This applies only to sync jobs, with local or cloud storage.
- it encounters a Database error.
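The trigger conditions above can be summarized as a small decision function. This is only an illustrative sketch: the `LeaderState` class, its field names, and `failover_triggered` are hypothetical and are not part of any Resilio API. It only restates the rules in this section, including the point that losing the Management Console connection alone does not cause failover.

```python
from dataclasses import dataclass


@dataclass
class LeaderState:
    """Hypothetical snapshot of an HA group leader (illustrative field names)."""
    connected_to_followers: bool  # can the leader reach the other Agents in the group?
    connected_to_mc: bool         # Management Console link; not a failover trigger
    sync_id_file_present: bool    # the identifying .sync/ID file exists in the storage
    database_error: bool          # the leader encountered a Database error
    is_sync_job: bool             # the .sync/ID rule applies to sync jobs only


def failover_triggered(state: LeaderState) -> bool:
    """Return True if any of the trigger conditions listed above holds."""
    if not state.connected_to_followers:
        return True
    if state.is_sync_job and not state.sync_id_file_present:
        return True
    if state.database_error:
        return True
    # connected_to_mc is deliberately not checked: an MC outage alone is not a trigger.
    return False


# The leader lost the MC but still reaches its followers: no failover.
mc_only_outage = LeaderState(
    connected_to_followers=True,
    connected_to_mc=False,
    sync_id_file_present=True,
    database_error=False,
    is_sync_job=True,
)
print(failover_triggered(mc_only_outage))  # False
```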
A number of activities are performed during this process; as a result, a new leader is elected and continues the job.
The Resilio Admin cannot choose the new leader, and the Management Console does not select it either. Resilio's internal algorithm is used for that.
During failover, Agents in the HA group report the status "failover" in the job run. A corresponding event is also reported in the job's logs.
The failover process takes some time, around 30 seconds. Additionally, the new leader has to scan the whole folder before it continues the job.
If a new leader is not elected for some reason, the failover process continues until one is elected or the process times out.
During the failover, Agents do not report any information about the job run's progress and details.
While the Agents work in High Availability mode, some operations are interrupted by the failover and started again after it, for example:
- initial indexing of the job (when the Agent scans the folder and builds the folder database);
- trigger execution in a Distribution/Consolidation job; the new leader will start these from scratch;
- file download from the leader. If the Lazy indexing parameter in the Profile is enabled, remote Agents outside the HA group that download the file from the HA leader will restart the file download from the new leader.
Create HA group
A group can be created in advance, before creating a job, or while creating the job. High availability groups are supported for Sync, Distribution, and Consolidation jobs, and also for Primary storages in File cache and Hybrid work jobs. They are supported only for Agents running Windows or Linux. All Agents in the group must run the same operating system.
High availability groups are not supported for Script jobs. For other limitations, see the Peculiarities and limitations section below.
An HA group must contain at least one Agent.
Automatic sorting of Agents is also supported. The Resilio Admin must ensure that the auto-sorting rules guarantee that there is at least one Agent in the group. An Agent won't be automatically added to an HA group in the following cases:
- it's a macOS Agent
- it runs a different OS than the other Agents in the HA group
- its version is older than 4.0.0
- it does not have the "Storage connectors" feature in its license package
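The four exclusion rules above can be read as a simple eligibility check. The sketch below is hypothetical: the `Agent` record, its attribute names, and `can_auto_join` are made up for illustration and do not reflect how the Management Console actually evaluates auto-sorting rules.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Agent:
    """Hypothetical Agent record (illustrative attribute names)."""
    name: str
    os: str                                   # e.g. "windows", "linux", "macos"
    version: Tuple[int, int, int]             # e.g. (4, 0, 1)
    license_features: Set[str] = field(default_factory=set)


def can_auto_join(agent: Agent, group_os: Optional[str]) -> bool:
    """Apply the exclusion rules listed above.

    group_os is the OS of the Agents already in the group,
    or None if the group has no Agents yet.
    """
    if agent.os == "macos":
        return False
    if group_os is not None and agent.os != group_os:
        return False
    if agent.version < (4, 0, 0):             # tuples compare element by element
        return False
    if "Storage connectors" not in agent.license_features:
        return False
    return True


linux_agent = Agent("srv-01", "linux", (4, 0, 1), {"Storage connectors"})
print(can_auto_join(linux_agent, group_os="linux"))    # True
print(can_auto_join(linux_agent, group_os="windows"))  # False
```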
Using HA groups in the job
All Agents in the HA group synchronise the same physical location. If the storage location is misconfigured, the followers report the error and cease synchronisation.
Only a direct path or a storage connector location is supported. Path macros are not supported for HA groups; if they are used, the behavior is undefined. Changing the job path for an already configured job is not supported: you need to remove the group from the job and add it again with the new path.
Job run details and progress (number of files, size of the job, etc.) are reported by the leader. Followers' information is not taken into consideration. To get a correct picture of how the job run is progressing, all leaders of the HA groups in this job must be elected (not in the process of failover) and online (connected to the MC). Otherwise the reported job progress will be reduced accordingly.
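As a rough illustration of the condition above, the following sketch lists the HA groups whose figures are currently missing from the job's progress. The `GroupStatus` record and `groups_missing_from_progress` are hypothetical names used only for this example; they do not correspond to a Management Console API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupStatus:
    """Hypothetical per-group status (illustrative field names)."""
    group_name: str
    leader: Optional[str]         # None while failover is still in progress
    leader_connected_to_mc: bool  # the leader is online from the MC's point of view


def groups_missing_from_progress(groups: List[GroupStatus]) -> List[str]:
    """Return the names of groups whose leader is not elected or not online."""
    return [
        g.group_name
        for g in groups
        if g.leader is None or not g.leader_connected_to_mc
    ]


groups = [
    GroupStatus("ha-east", leader="agent-1", leader_connected_to_mc=True),
    GroupStatus("ha-west", leader=None, leader_connected_to_mc=False),
]
missing = groups_missing_from_progress(groups)
print(missing)  # ['ha-west']
```

In this model, the job's progress reflects all groups only when the returned list is empty.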
Follower Agents do not report this information to the MC:
Job runs table: transfer progress %, Agents completed %, Agents in progress %, Transfer ETA, Size, Data/files transferred, Files completed %
Job run -> Overview tab: ETA, Total bytes/files in job, Total bytes/files to transfer
Job run -> Statistics tab: Bytes transferred, Files transferred
Job run -> Agents tab: all statistics are hidden from the UI. Only Group, Role in group, Errors, Peers, Status, and some static information like OS, OS user, version, permissions, host, etc. are available.
Job run -> Agent -> Overview tab: "Speed" and "Transfer" sections, Start/Finish time, Priority (for leaders as well), Bytes/Files to receive, ETA, Archive size (managed by agent as well)
Job run -> Agent -> Active files: the whole tab is not available for followers
Agents table: Total files/size
Agent -> Overview tab: Total files/size
Sync jobs/File cache/Hybrid work
HA groups can be used in these jobs, subject to the "Not supported" limitations listed in the Peculiarities and limitations section below.
Synchronization of file permissions is supported. A Reference Agent must be selected in such a job. A whole HA group can be selected as a Reference Agent. See here for more details about Reference Agent in general and working with HA groups in particular.
Cross-platform synchronization of file permissions
Using Highly Available groups on a system that cannot apply the replicated permissions (for example, on Linux with replicated NTFS permissions, or vice versa) should be avoided. It may lead to unexpected permission issues and access problems. Always ensure that HA groups are used on systems with compatible file permission structures.
For HA groups working in a job with Read-Only access, the option "Overwrite any changed files" in the Agent profile must be set to "Yes". Otherwise, if it's set to "No", Agents in such a group won't be able to properly detect changed files and the job will get stuck.
Distribution/Consolidation jobs
HA groups can be used as source or destination in these jobs.
All triggers in the job are executed by the leader in the HA group. If failover happens during script execution, the newly elected leader starts executing the script from scratch.
An active job run can be stopped on an Agent from an HA group; the job run is then stopped for all Agents in the group.
Adding new Agents to an active job run with an HA group in it is not supported.
Restarting a job run that has an HA group in it is not supported; Agents will report an error about a misconfigured storage path.
The job run will be aborted on all Agents in the HA group if an error that is on the "Abort on error" list appears on the group leader. The "Agent offline" error is ignored on all HA Agents.
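The abort rule above can be sketched as follows. This is one possible reading of the rule, not a description of the product's internals: the function name `abort_ha_group` is hypothetical, and "Disk full" is a made-up error name used only as sample data. Only "Agent offline" comes from this article.

```python
from typing import Iterable, Optional

# Per the rule above, this error never aborts the run for HA Agents.
IGNORED_FOR_HA = {"Agent offline"}


def abort_ha_group(leader_error: Optional[str], abort_on_error: Iterable[str]) -> bool:
    """Decide whether the job run is aborted for the whole HA group.

    Only the leader's error is considered. Assumption of this sketch:
    "Agent offline" is ignored even if it is on the "Abort on error" list.
    """
    if leader_error is None:
        return False
    if leader_error in IGNORED_FOR_HA:
        return False
    return leader_error in set(abort_on_error)


abort_list = ["Disk full", "Agent offline"]  # "Disk full" is a hypothetical error name
print(abort_ha_group("Disk full", abort_list))      # True
print(abort_ha_group("Agent offline", abort_list))  # False
```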
Tags AGENT_NAME and AGENT_ID are supported for HA groups. The name and ID of the HA group will be used.
Peculiarities and limitations
Not supported:
- all Agents of versions older than 4.0.0
- macOS and mobile Agents, Agents installed on NAS devices
- Script jobs
- Path macros for HA groups
- changing the job path for HA group
- Adding new agents to a job run or restarting the job run that has a HA group
- Network policy rules
- Job priorities
High availability groups are not compatible with Priority Agent functionality.
A temporary error "Share's identifying .sync/ID file is broken" may appear for HA groups after changing the job type or recreating the job using the same file storage.
Followers in an HA group are reported as "In progress" in Idle sync jobs. They are also counted in the percentage of "Agents in progress" in a job run.