There are two general modes of Server Redundancy, transparent and non-transparent.

In transparent Redundancy the Failover of Server responsibilities from one Server to another is transparent to the Client. The Client is unaware that a Failover has occurred and has no control over the Failover behaviour. Furthermore, the Client does not need to perform any actions to continue to send or receive data.

In non-transparent Redundancy the Failover from one Server to another, and the actions needed to continue sending or receiving data, are performed by the Client. The Client shall be aware of the Redundant Server Set and shall perform the required actions to benefit from the Server Redundancy.

The ServerRedundancy Object defined in OPC 10000-5 indicates the mode supported by the Server. The ServerRedundancyType ObjectType and its subtypes TransparentRedundancyType and NonTransparentRedundancyType, defined in OPC 10000-5, specify information for the supported Redundancy mode.

OPC UA Servers that are part of a Redundant Server Set have certain AddressSpace requirements. These requirements allow a Client to consistently access information from Servers in a Redundant Server Set and to make intelligent choices related to the health and availability of Servers in the Redundant Server Set.

Servers in the Redundant Server Set shall have an identical AddressSpace including:

The only Nodes that can differ between Servers in a Redundant Server Set are the Nodes in the local Server namespace, such as the Server diagnostic Nodes. A Client that fails over shall not be required to translate browse paths or otherwise resolve NodeIds. Servers are allowed to add and delete Nodes as long as all Servers in the Redundant Server Set are updated with the same Node changes.

All Servers in a Redundant Server Set shall be synchronized with respect to time. This may mean installing an NTP service or a PTP service.

There are other important considerations for a redundant system regarding synchronization:

To a Client, the transparent Redundant Server Set appears as if it is just a single Server, and the Client has no Failover actions to perform. All Servers in the Redundant Server Set have an identical ServerUri and an identical EndpointUrl.

Figure 26 shows a typical transparent Redundancy setup.

Figure 26 – Transparent Redundancy setup example

For transparent Redundancy, OPC UA provides data structures to allow Clients to identify which Servers are in the Redundant Server Set, the ServiceLevel of each Server, and which Server is currently responsible for the Client Session. This information is specified in the TransparentRedundancyType ObjectType defined in OPC 10000-5. Since the ServerUri is identical for all Servers in the Redundant Server Set, the Servers are identified by a ServerId contained in the information provided in the TransparentRedundancyType Object.

In transparent Redundancy, a Client is not able to control which physical Server it actually connects to. Failover is controlled by the Redundant Server Set, and a Client is also not able to actively fail over to another Server in the Redundant Server Set.

All OPC UA interactions within a given Session shall be supported by one Server, and the Client is able to identify which Server that is, allowing a complete audit trail for the data. It is the responsibility of the Servers to ensure that information is synchronized between the Servers. A functional Server will take over the Session and Subscriptions from the Failed Server. Failover may require a reconnection of the Client’s SecureChannel, but the EndpointUrl of the Server and the ServerUri shall not change. The Client shall be able to continue communication with the Sessions and Subscriptions created on the previously used Server.

Figure 26 provides an abstract view of a transparent Redundant Server Set. The two or more Servers in the Redundant Server Set share a virtual network address, and therefore all Servers have the identical EndpointUrl. How this virtual network address is created and managed is vendor specific. There may be special hardware that mediates the network address presented to the rest of the network. There may be custom hardware where all components are redundant and fail over automatically at the hardware level. There may even be software-based systems where the transparency is governed completely by software.

For non-transparent Redundancy, OPC UA provides the data structures that allow the Client to identify which Servers are available in the Redundant Server Set, as well as Server information that tells the Client which modes of Failover the Server supports. This information allows the Client to determine what actions it may need to take in order to accomplish Failover. This information is specified in the NonTransparentRedundancyType ObjectType defined in OPC 10000-5.

Figure 27 shows a typical non-transparent Redundancy setup.

Figure 27 – Non-Transparent Redundancy setup

For non-transparent Redundancy, the Servers will have unique IP addresses. The Server also has the additional Failover modes Cold, Warm, Hot and HotAndMirrored. The Client shall be aware of the Redundant Server Set and is required to perform some actions depending on the Failover mode. These actions are described in Table 111; additional examples and explanations are provided in 6.6.2.4.5.2 for Cold, 6.6.2.4.5.3 for Warm, 6.6.2.4.5.4 for Hot and 6.6.2.4.5.5 for HotAndMirrored.

A Client needs to be able to expect that the SourceTimestamp associated with a value is approximately the same from all Servers in the Redundant Server Set for the same value.
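The following Python sketch shows one way a Client might rely on this expectation when it receives the same data from more than one Server, for example when every Subscription is Reporting in Hot Failover mode: notifications for the same Node whose SourceTimestamps differ by no more than a small tolerance are treated as duplicates. The tuple layout `(node_id, source_timestamp, value)`, the function name `merge_streams` and the 50 ms default tolerance are illustrative assumptions, not values defined by this specification.

```python
from datetime import datetime, timedelta


def merge_streams(notifications, tolerance=timedelta(milliseconds=50)):
    """Merge (node_id, source_timestamp, value) tuples from several Servers.

    A notification is dropped as a duplicate when an already kept one for
    the same node_id has a SourceTimestamp no more than `tolerance` earlier.
    The 50 ms default is a hypothetical value chosen for illustration.
    """
    last_kept: dict[str, datetime] = {}
    merged = []
    # Sort by Node, then by SourceTimestamp, so duplicates end up adjacent.
    for node_id, ts, value in sorted(notifications, key=lambda n: (n[0], n[1])):
        prev = last_kept.get(node_id)
        if prev is not None and ts - prev <= tolerance:
            continue  # same value reported by another Server
        last_kept[node_id] = ts
        merged.append((node_id, ts, value))
    return merged
```

The result depends on the Servers being time-synchronized; without that, a tolerance small enough to keep genuine changes apart would fail to catch duplicates.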

The ServiceLevel provides information to a Client regarding the health of a Server and its ability to provide data. See OPC 10000-5 for a formal definition of ServiceLevel. The ServiceLevel is a byte with a range of 0 to 255, where the values fall into the sub-ranges defined in Table 109.

The algorithm used by a Server to determine its ServiceLevel within each sub-range is Server specific. However, all Servers in a Redundant Server Set shall use the same algorithm to determine the ServiceLevel. All Servers, regardless of Redundant Server Set membership, shall adhere to the sub-ranges defined in Table 109.

Table 109 – ServiceLevel ranges

Sub-range

Name

Description

0-0

Maintenance

The Failed Server is in the Maintenance sub-range. Therefore, new Clients shall not connect and currently connected Clients shall disconnect. The Server should expose a target time at which Clients are able to reconnect. See EstimatedReturnTime defined in OPC 10000-5 for additional information.

A Server that has been set to Maintenance is typically undergoing maintenance or updates. The main goal of the Maintenance ServiceLevel is to ensure that Clients do not generate load on the Server and to allow time for the Server to complete any required actions. This load includes even simple connection attempts or monitoring of the ServiceLevel. The EstimatedReturnTime indicates when the Client should check whether the Server is available. If updates or patches take longer than expected, the Client may discover that the EstimatedReturnTime has been extended further into the future. If the Server does not provide the EstimatedReturnTime, or if that time has lapsed, the Client should use a much longer interval between reconnects to a Server in the Maintenance sub-range than its normal reconnect interval.

1-1

NoData

The Failed Server is not operational. Therefore, a Client is not able to exchange any information with it. The Server most likely has no data available other than ServiceLevel, ServerStatus and diagnostic information.

A Failed Server in this sub-range has no data available. Clients may connect to it to obtain ServiceLevel, ServerStatus and other diagnostic information. If the underlying system has failed, the ServerStatus would typically indicate COMMUNICATION_FAULT_6. The Client may monitor this Server for a ServerStatus and ServiceLevel change, which would indicate that normal communication could be resumed.

2-199

Degraded

The Server is partially operational but is experiencing problems such that portions of the AddressSpace are out of service or unavailable. An example usage of this ServiceLevel sub-range would be if 3 of 10 devices connected to a Server are unavailable.

Servers that report a ServiceLevel in the Degraded sub-range are partially able to service Client requests. The degradation could be caused by loss of connection to underlying systems. Alternatively, the Server could be overloaded to the point that it is unable to reliably deliver data to Clients in a timely manner.

If Clients are experiencing difficulties obtaining required data, they shall switch to another Server if any Servers in the Healthy sub-range are available. If no Servers are available in the Healthy sub-range, Clients may switch to a Server with a higher ServiceLevel or to one that provides the required data. Some Clients may also be configured for higher-priority data and may check all Degraded Servers to see whether any of them is able to report the high-priority data with good quality, but this functionality would be Client specific. In some cases a Client may connect to multiple Degraded Servers to maximize the available information.

200-255

Healthy

The Server is fully operational. Therefore, a Client can obtain all information from this Server. The sub-range allows a Server to provide information that can be used by Clients to load balance. An example usage of this ServiceLevel sub-range would be to reflect the Server’s CPU load where data is delivered as expected.

Servers in the Healthy ServiceLevel sub-range are able to deliver information in a timely manner. This ServiceLevel may change for internal Server reasons, or it may be used for load balancing as described in 6.6.2.4.3.

A Client shall connect to the Server with the highest ServiceLevel. Once connected, the ServiceLevel may change, but a Client shall not fail over to a different Server as long as the ServiceLevel of the Server is accessible and in the Healthy sub-range.
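A minimal Python sketch of how a Client could apply Table 109. The `classify` and `should_failover` helpers are hypothetical; the rule used for the Maintenance and NoData cases (move to any Server that reports at least Degraded) is an assumption for illustration, and the Client-specific options described for the Degraded sub-range are reduced to a single `data_problems` flag.

```python
from enum import Enum


class SubRange(Enum):
    """ServiceLevel sub-ranges from Table 109."""
    MAINTENANCE = "Maintenance"  # 0
    NO_DATA = "NoData"           # 1
    DEGRADED = "Degraded"        # 2-199
    HEALTHY = "Healthy"          # 200-255


def classify(service_level: int) -> SubRange:
    """Map a ServiceLevel byte to its Table 109 sub-range."""
    if not 0 <= service_level <= 255:
        raise ValueError("ServiceLevel is a byte (0-255)")
    if service_level == 0:
        return SubRange.MAINTENANCE
    if service_level == 1:
        return SubRange.NO_DATA
    if service_level <= 199:
        return SubRange.DEGRADED
    return SubRange.HEALTHY


def should_failover(current: int, others: list[int], data_problems: bool) -> bool:
    """Hypothetical decision helper for a connected Client."""
    sub = classify(current)
    if sub is SubRange.HEALTHY:
        # Shall not fail over while the current Server stays Healthy.
        return False
    healthy_available = any(classify(o) is SubRange.HEALTHY for o in others)
    if sub is SubRange.DEGRADED:
        # Switch when required data is missing and a Healthy Server exists.
        return data_problems and healthy_available
    # Maintenance or NoData: no usable data here. Assumption of this sketch:
    # move to any Server that is at least Degraded.
    return any(o >= 2 for o in others)
```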

In systems where multiple Hot Servers (see 6.6.2.4.5.4) are available, the Servers in the Redundant Server Set can share the load generated by Clients by setting the ServiceLevel in the Healthy sub-range based on the current load. Clients are expected to connect to the Server with the highest ServiceLevel. Clients shall not fail over to a different Server in the Redundant Server Set as long as the Server is in the Healthy sub-range. This is the normal behaviour for all Clients when communicating with redundant Servers. Servers can adjust their ServiceLevel based on the number of connected Clients, CPU load, memory utilization, or any other Server-specific criteria.

For example, in a system with 3 Servers, all Servers are initially at ServiceLevel 255, but when a Client connects, the Server with the Client connection sets its level to 254. The next Client would connect to a different Server, since both of the other Servers are still at 255.
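The three-Server example can be simulated in a few lines of Python. The decrement-per-connected-Client policy and the floor at 200 are hypothetical vendor choices (the text leaves this logic to the Server vendor); the Client side only applies the rule of connecting to the Server with the highest ServiceLevel.

```python
HEALTHY_MIN = 200  # lower bound of the Healthy sub-range (Table 109)


def simulate_connections(server_names: list[str], n_clients: int):
    """Return which Server each new Client picks and the final ServiceLevels."""
    levels = {name: 255 for name in server_names}
    clients = {name: 0 for name in server_names}
    assignments = []
    for _ in range(n_clients):
        # Clients connect to the Server with the highest ServiceLevel;
        # max() returns the first Server in list order on a tie.
        chosen = max(levels, key=levels.get)
        clients[chosen] += 1
        # Hypothetical vendor policy: drop one step per connected Client,
        # but never leave the Healthy sub-range.
        levels[chosen] = max(HEALTHY_MIN, 255 - clients[chosen])
        assignments.append(chosen)
    return assignments, levels
```

With four Clients and Servers `A`, `B` and `C`, the Clients are assigned to `A`, `B`, `C` and then `A` again: the fourth Client finds all three Servers at 254 and takes the first one in list order, so no Server coordination is needed to spread the load.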

It is up to the Server vendor to define the logic for spreading the load, including how many Clients, how much CPU load, or which other criteria on each Server cause the ServiceLevel to be decremented. It is envisioned that some Servers would be able to accomplish this without any communication between the Servers.

The Failover mode of a Server is provided in the ServerRedundancy Object defined in OPC 10000-5. The different Failover modes for non-transparent Redundancy are described in Table 110.

Table 110 – Server Failover modes

Name

Description

Cold

Cold Failover mode is where only one Server can be active at a time. This may mean that redundant Servers are unavailable (not powered up) or are available but not running (the PC is running, but the application is not started).

Warm

Warm Failover mode is where the backup Server(s) can be active but cannot connect to actual data points (typically, a system where the underlying devices are limited to a single connection). Underlying devices, such as PLCs, may have limited resources that permit a single Server connection. Therefore, only a single Server will be able to consume data. The ServiceLevel Variable defined in OPC 10000-5 indicates the ability of the Server to provide its data to the Client.

Hot

Hot Failover mode is where all Servers are powered on and are up and running. In scenarios where Servers acquire data from a downstream device, such as a PLC, one or more Servers are actively connected to the downstream device(s) in parallel. These Servers have minimal knowledge of the other Servers in their group and function independently. When a Server fails or encounters a serious problem, its ServiceLevel drops. On recovery, the Server returns to the Redundant Server Set with an appropriate ServiceLevel to indicate that it is available.

HotAndMirrored

HotAndMirrored Failover mode applies to Servers that mirror their internal state to all Servers in the Redundant Server Set, where more than one Server can be active and fully operational. The mirrored state minimally includes Sessions, Subscriptions, registered Nodes, ContinuationPoints, sequence numbers, and sent Notifications. The Client should use the ServiceLevel Variable defined in OPC 10000-5 to find the Servers with the highest ServiceLevel to achieve load balancing.

Each Server maintains a list of ServerUris for all redundant Servers in the Redundant Server Set. The list is provided together with the Failover mode in the ServerRedundancy Object defined in OPC 10000-5. To enable Clients to connect to all Servers in the list, each Server in the list shall provide the ApplicationDescription for all Servers in the Redundant Server Set through the FindServers Service. The Client needs this information to translate a ServerUri into the information needed to connect to the other Servers in the Redundant Server Set. Therefore, a Client needs to connect to only one of the redundant Servers to find the other Servers based on the provided information. A Client should persist information about the other Servers in the Redundant Server Set.
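The translation from ServerUri to connection information can be pictured as a join between the ServerUri list read from the ServerRedundancy Object and the ApplicationDescriptions returned by FindServers, matched on the applicationUri. The sketch below assumes both results have already been obtained through some OPC UA client stack; the `ApplicationDescription` dataclass models only two fields, and the `resolve_redundant_set` name and any URLs used with it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationDescription:
    # Only the two fields of the OPC UA ApplicationDescription used here.
    application_uri: str
    discovery_urls: list[str] = field(default_factory=list)


def resolve_redundant_set(server_uri_array, find_servers_result):
    """Map every ServerUri of the Redundant Server Set to its discovery URLs.

    server_uri_array: ServerUris read from the ServerRedundancy Object.
    find_servers_result: ApplicationDescriptions returned by FindServers.
    A ServerUri with no matching description maps to an empty list.
    """
    by_uri = {d.application_uri: d.discovery_urls for d in find_servers_result}
    return {uri: list(by_uri.get(uri, [])) for uri in server_uri_array}
```

The returned mapping is a natural candidate for the persisted Server information mentioned above, for example serialized with `json.dumps`.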

Table 111 defines a list of Client actions for initial connections and Failovers.

Table 111 – Redundancy Failover actions

Failover mode and Client options: Cold, Warm, Hot (a), Hot (b), HotAndMirrored. Each action below is followed by the modes in which the Client performs it.

On initial connection, in addition to the actions on the Active Server:

Connect to more than one OPC UA Server: Warm, Hot (a), Hot (b); optional for status check in HotAndMirrored.

Create Subscriptions and add monitored items: Warm, Hot (a), Hot (b).

Activate sampling on the Subscriptions: Hot (a), Hot (b).

Activate publishing: Hot (b).

At Failover:

OpenSecureChannel to backup OPC UA Server: Cold, HotAndMirrored.

CreateSession on backup OPC UA Server: Cold.

ActivateSession on backup OPC UA Server: Cold, HotAndMirrored.

Create Subscriptions and add monitored items: Cold.

Activate sampling on the Subscriptions: Cold, Warm.

Activate publishing: Cold, Warm, Hot (a).

Clients communicating with a non-transparent Redundant Server Set require some additional logic to be able to handle Server failures and to fail over to another Server in the Redundant Server Set. Figure 28 provides an overview of the steps a Client typically performs when it first connects to a Redundant Server Set. The figure does not cover all possible error scenarios.

Figure 28 – Client Start-up steps

The initial Server may be obtained via standard discovery or from a persisted list of Servers in the Redundant Server Set. In either case, the Client needs to check which Server in the Redundant Server Set it should connect to. Individual actions depend on the Failover mode the Server provides and the Failover mode the Client will use.
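A minimal sketch of the Server selection step in Python, assuming a caller-supplied `read_service_level` function (hypothetical) that returns the current ServiceLevel of a Server, or `None` when the Server cannot be reached. Servers in the Maintenance (0) or NoData (1) sub-ranges are never selected.

```python
from typing import Callable, Optional


def choose_initial_server(
    redundant_set: list[str],
    read_service_level: Callable[[str], Optional[int]],
) -> Optional[str]:
    """Return the ServerUri with the highest ServiceLevel, or None.

    Unreachable Servers (None) and Servers in the Maintenance (0) or
    NoData (1) sub-ranges are skipped; ties keep the first Server in the list.
    """
    best_uri: Optional[str] = None
    best_level = 1  # anything selected must be above NoData
    for uri in redundant_set:
        level = read_service_level(uri)
        if level is not None and level > best_level:
            best_uri, best_level = uri, level
    return best_uri
```

Returning `None` leaves the retry policy to the caller, which matters when every Server is in Maintenance and the EstimatedReturnTime should govern the next attempt.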

Once connected to a redundant Server, Clients shall be aware of the modes of Failover supported by the Server, since this support affects the available options related to Client behaviour. A Client may always treat a Server as if it used a lesser Failover mode; for example, for a Server that provides Hot Redundancy, a Client might connect and choose to treat it as if the Server were running in Warm Redundancy or Cold Redundancy. This choice is up to the Client. In the case of Failover mode HotAndMirrored, the Client shall not use Failover mode Hot or Warm, as it would generate unnecessary load on the Servers.
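The rule above can be expressed as a small Python function. The `FailoverMode` ordering (Cold < Warm < Hot < HotAndMirrored) is an assumption made for this sketch and is not the numeric encoding of any OPC UA enumeration; `client_mode` is a hypothetical helper name.

```python
from enum import IntEnum


class FailoverMode(IntEnum):
    # Ordered from least to most capable; this ordering is an assumption of
    # the sketch, not the encoding of any OPC UA enumeration.
    COLD = 1
    WARM = 2
    HOT = 3
    HOT_AND_MIRRORED = 4


def client_mode(server_mode: FailoverMode, preferred: FailoverMode) -> FailoverMode:
    """Return the Failover mode the Client actually uses with this Server."""
    if server_mode is FailoverMode.HOT_AND_MIRRORED and preferred in (
        FailoverMode.HOT,
        FailoverMode.WARM,
    ):
        # Hot or Warm handling would only add load on mirrored Servers.
        return FailoverMode.HOT_AND_MIRRORED
    # A Client may treat a Server as using a lesser mode, never a greater one.
    return min(server_mode, preferred)
```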

A Cold Failover mode is where the Client can only connect to one Server at a time. When the Client loses connectivity with the Active Server, it attempts a connection to the redundant Server(s), which may or may not be available. In this situation the Client may need to wait for the redundant Server to become available and then create Subscriptions and MonitoredItems and activate publishing. The Client shall cache any information that is required related to the list of available Servers in the Redundant Server Set. Figure 29 illustrates the actions a Client would take if it is talking to a Server using Cold Failover mode.

Figure 29 – Cold Failover

NOTE There may be a loss of data from the time the connection to the Active Server is interrupted until the time the Client gets Publish Responses from the backup Server.
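The Cold Failover sequence can be sketched in Python as follows. `UaConnection` is a hypothetical wrapper interface, not the API of any real OPC UA stack, and reporting an unreachable Server as `ConnectionError` is an assumption; the method order follows the Cold column of Table 111 (OpenSecureChannel, CreateSession, ActivateSession, create Subscriptions and MonitoredItems, activate sampling and publishing).

```python
from typing import Callable, Optional, Protocol


class UaConnection(Protocol):
    """Hypothetical wrapper around an OPC UA client stack (not a real API)."""

    def open_secure_channel(self) -> None: ...
    def create_session(self) -> None: ...
    def activate_session(self) -> None: ...
    def create_subscriptions(self) -> None: ...  # including MonitoredItems
    def enable_sampling(self) -> None: ...
    def enable_publishing(self) -> None: ...


def cold_failover(
    cached_backups: list[str], connect: Callable[[str], UaConnection]
) -> Optional[str]:
    """Rebuild the full Client context on the first reachable backup Server."""
    for uri in cached_backups:
        conn = connect(uri)
        try:
            conn.open_secure_channel()
        except ConnectionError:
            continue  # backup may be powered down or its application not started
        conn.create_session()
        conn.activate_session()
        conn.create_subscriptions()
        conn.enable_sampling()
        conn.enable_publishing()
        return uri
    return None  # nothing reachable yet; the caller retries later
```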

A Warm Failover mode is where the Client should connect to one or more Servers in the Redundant Server Set, primarily to monitor the ServiceLevel. A Client can connect and create Subscriptions and MonitoredItems on more than one Server, but sampling and publishing can only be active on one Server. The Active Server returns actual data, whereas the other Servers in the Redundant Server Set return an appropriate error for the MonitoredItems in the Publish response, such as Bad_NoCommunication. The Active Server can be found by reading the ServiceLevel Variable from all Servers: the Server with the highest ServiceLevel is the Active Server. For Failover, the Client activates sampling and publishing on the Server with the highest ServiceLevel. Figure 30 illustrates the steps a Client would perform when communicating with a Server using Warm Failover mode.

Figure 30 – Warm Failover

NOTE There may be a temporary loss of data from the time the connection to the Active Server is interrupted until the time the Client gets Publish Responses from the backup Server.
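A hedged Python sketch of the Warm Failover step: the Subscriptions and MonitoredItems are assumed to exist already on every connected Server, so Failover only activates sampling and publishing on the backup with the highest ServiceLevel. The connection objects and their `enable_sampling` and `enable_publishing` methods are hypothetical placeholders for whatever a real stack provides.

```python
from typing import Optional


def warm_failover(
    connections: dict, service_levels: dict[str, int], failed_uri: str
) -> Optional[str]:
    """Activate sampling and publishing on the best remaining Server.

    connections: ServerUri -> connection object whose Subscriptions and
    MonitoredItems were already created (hypothetical wrapper).
    service_levels: latest ServiceLevel read from each Server.
    """
    candidates = {
        uri: level
        for uri, level in service_levels.items()
        if uri != failed_uri and uri in connections
    }
    if not candidates:
        return None
    target = max(candidates, key=candidates.get)
    if candidates[target] <= 1:
        return None  # only Maintenance/NoData Servers left: nothing to activate
    connections[target].enable_sampling()
    connections[target].enable_publishing()
    return target
```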

A Hot Failover mode is where the Client should connect to two or more Servers in the Redundant Server Set and subscribe to the ServiceLevel Variable defined in OPC 10000-5 to find the highest ServiceLevel and achieve load balancing; this means that Clients should issue Service requests such as Browse, Read and Write to the Server with the highest ServiceLevel. Subscription-related activities will need to be invoked for each connected Server. Clients have the following choices for implementing Subscription behaviour in Hot Failover mode:

  a) The Client connects to multiple Servers and establishes Subscription(s) in each, where only one is Reporting; the others are Sampling only. The Client should set up the queue size for the MonitoredItems such that it can buffer all changes during the Failover time. The Failover time is the time between the connection interruption and the time the Client gets Publish Responses from the backup Server. On a Failover the Client shall enable Reporting on the Server with the next highest availability.
  b) The Client connects to multiple Servers and establishes Subscription(s) in each, where all Subscriptions are Reporting. The Client is responsible for handling/processing multiple Subscription streams concurrently.
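For the Sampling-only option, the queue size can be estimated from the expected Failover time and the sampling interval. The Python helper below is a worst-case sketch under the assumption that every sample is a change; `hot_a_queue_size` and the optional `margin` factor are illustrative names, not parameters defined by this specification.

```python
import math


def hot_a_queue_size(
    failover_time_ms: float, sampling_interval_ms: float, margin: float = 1.0
) -> int:
    """Queue size that lets a Sampling-only MonitoredItem buffer every
    sample taken during the Failover time (worst case: each sample changes)."""
    if sampling_interval_ms <= 0:
        raise ValueError("sampling interval must be positive")
    return max(1, math.ceil(failover_time_ms * margin / sampling_interval_ms))
```

For example, an expected Failover time of 5 s with a 100 ms sampling interval gives a queue size of 50; a `margin` above 1.0 covers Failovers that take longer than expected.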

Figure 31 illustrates the functionality a Client would perform when communicating with a Server using Hot Failover mode (the figure includes both options (a) and (b)).

Figure 31 – Hot Failover

Clients are not expected to automatically switch over to a Server that has recovered from a failure, but the Client should establish a connection to it.

A HotAndMirrored Failover mode is where a Client only connects to one Server in the Redundant Server Set, because the Server shares this Session/state information with the other Servers. To validate the capability to connect to other redundant Servers, a Client is allowed to create Sessions with other Servers and maintain the open connections by periodically reading the ServiceLevel. A Client shall not create Subscriptions on the backup Servers for status monitoring (to prevent excessive load on the Servers). This mode allows Clients to fail over without creating a new context for communication. On a Failover the Client simply creates a new SecureChannel on an alternate Server and then calls ActivateSession; all Client activities (browsing, subscriptions, history reads, etc.) then resume. Figure 32 illustrates the behaviour of a Client communicating with a Server in HotAndMirrored Failover mode.

Figure 32 – HotAndMirrored Failover

This Failover mode is similar to transparent Redundancy. The advantage is that the Client has full control over selecting the Server. The disadvantage is that the Client needs to be able to handle Failovers.
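The HotAndMirrored Failover step can be sketched in Python as shown below. Alternate Servers are tried in descending ServiceLevel order; on each, only a new SecureChannel and ActivateSession are attempted, since the Session and its Subscriptions are mirrored. The `connect` callable and the connection methods are hypothetical, and treating failures as `ConnectionError` is an assumption.

```python
from typing import Any, Callable, Optional


def hot_and_mirrored_failover(
    alternates: list[str],
    service_levels: dict[str, int],
    connect: Callable[[str], Any],
) -> Optional[str]:
    """Resume the existing Session on an alternate Server.

    Only a new SecureChannel and ActivateSession are needed: Sessions,
    Subscriptions and other state are mirrored, so no CreateSession or
    CreateSubscription calls are made.
    """
    ordered = sorted(alternates, key=lambda u: service_levels.get(u, 0), reverse=True)
    for uri in ordered:
        if service_levels.get(uri, 0) <= 1:
            continue  # Maintenance or NoData
        conn = connect(uri)  # hypothetical wrapper, not a real client API
        try:
            conn.open_secure_channel()
            conn.activate_session()
        except ConnectionError:
            continue
        return uri
    return None
```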

A vendor can use the non-transparent Redundancy features to create a Server proxy running on the Client machine that provides transparent Redundancy to the Client. This reduces the amount of functionality that needs to be designed into the Client and enables simpler Clients to take advantage of non-transparent Redundancy. The Server proxy simply duplicates Subscriptions and modifications to Subscriptions by passing the calls on to both Servers, but only enables publishing and sampling on one Server. When the proxy detects a failure, it enables publishing and/or sampling on the backup Server, just as the Client would if it were a Redundancy-aware Client.
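A minimal Python sketch of the proxy idea, assuming two backend Servers reached through a hypothetical connection wrapper with `create_subscription(spec, enabled=...)`, `enable_sampling()` and `enable_publishing()` methods. Modifications to Subscriptions would be forwarded to both Servers in the same way as `create_subscription`; they are omitted to keep the sketch short.

```python
from typing import Optional


class RedundancyProxy:
    """Sketch of a Server proxy that hides non-transparent Redundancy.

    backends: ServerUri -> connection object (hypothetical wrapper);
    the first entry starts as the active Server.
    """

    def __init__(self, backends: dict):
        self.backends = backends
        self.active: Optional[str] = next(iter(backends))

    def create_subscription(self, spec) -> None:
        # Duplicate the call on every Server; only the active one samples/publishes.
        for uri, conn in self.backends.items():
            conn.create_subscription(spec, enabled=(uri == self.active))

    def on_failure(self, failed_uri: str) -> Optional[str]:
        """Switch to the other backend when the active Server fails."""
        if failed_uri != self.active:
            return self.active  # a backup failed; nothing to switch
        for uri, conn in self.backends.items():
            if uri != failed_uri:
                conn.enable_sampling()
                conn.enable_publishing()
                self.active = uri
                return uri
        self.active = None
        return None
```

The Client talks only to the proxy, so from its point of view the Failover is transparent, while the proxy performs the same steps a Redundancy-aware Client would.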

Figure 33 shows the Server proxy used to provide transparent Redundancy.

Figure 33 – Server proxy for Redundancy