Errata exists for this version of the document.

The OPC UA Services define a number of mechanisms to meet the security requirements outlined in OPC 10000-2. This clause describes a number of important security-related procedures that OPC UA Applications shall follow.

All OPC UA Applications require an Application Instance Certificate which shall contain the following information:

  • The network name or address of the computer where the application runs;
  • The name of the organisation that administers or owns the application;
  • The name of the application;
  • The URI of the application instance;
  • The name of the Certificate Authority that issued the Certificate;
  • The issue and expiry date for the Certificate;
  • The public key issued to the application by the Certificate Authority (CA);
  • A digital signature created by the Certificate Authority (CA).
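
The information listed above can be pictured as a simple record. The following is a minimal sketch only: the class name, the field names and the missing_fields helper are hypothetical and are not defined by this document, and a real Certificate encodes these values in X.509 structures rather than in a Python object.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class CertificateInfo:
    """Illustrative record of the listed fields (all names are hypothetical)."""
    host_names: tuple        # network names or addresses of the computer
    organisation: str        # organisation that administers or owns the application
    application_name: str
    application_uri: str     # URI of the application instance
    issuer_name: str         # Certificate Authority that issued the Certificate
    not_before: datetime     # issue date
    not_after: datetime      # expiry date
    public_key: bytes
    signature: bytes         # digital signature created by the issuer


def missing_fields(info):
    """Return the names of fields that are empty (None, '', b'' or an empty tuple)."""
    return [f.name for f in fields(info) if not getattr(info, f.name)]


info = CertificateInfo(
    host_names=("plc-gateway.example.com",),
    organisation="",  # deliberately left empty for the demonstration
    application_name="ExampleServer",
    application_uri="urn:plc-gateway.example.com:ExampleServer",
    issuer_name="Example CA",
    not_before=datetime(2024, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2029, 1, 1, tzinfo=timezone.utc),
    public_key=b"\x01",
    signature=b"\x02",
)
print(missing_fields(info))
```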

In addition, each Application Instance Certificate has a private key which should be stored in a location that can only be accessed by the application. If this private key is compromised, the administrator shall assign a new Application Instance Certificate and private key to the application.

This Certificate may be generated automatically when the application is installed. In this situation the private key assigned to the Certificate shall be used to create the Certificate signature. Certificates created in this way are called self-signed Certificates.

If the administrator responsible for the application decides that a self-signed Certificate does not meet the security requirements of the organisation, then the administrator should install a Certificate issued by a Certificate Authority. The steps involved in requesting an Application Instance Certificate from a Certificate Authority are shown in Figure 19.

image022.png

Figure 19 – Obtaining and Installing an Application Instance Certificate

The figure above illustrates the interactions between the application, the Administrator and the Certificate Authority. The Application is an OPC UA Application installed on a single machine. The Administrator is the person responsible for managing the machine and the OPC UA Application. The Certificate Authority is an entity that can issue digital Certificates that meet the requirements of the organisation deploying the OPC UA Application.

If the Administrator decides that a self-signed Certificate meets the security requirements for the organisation, then the Administrator may skip Steps 3 through 5. Application vendors shall ensure that a Certificate is available after the installation process. Every OPC UA Application shall allow the Administrators to replace Application Instance Certificates with Certificates that meet their requirements.

When the Administrator requests a new Certificate from a Certificate Authority, the Certificate Authority may require that the Administrator provide proof of authorization to request Certificates for the organisation that will own the Certificate. The exact mechanism used to provide this proof depends on the Certificate Authority.

Vendors may choose to automate the process of acquiring Certificates from an authority. If this is the case, the Administrator would still go through the steps illustrated in Figure 19; however, the installation program for the application would perform them automatically and only prompt the Administrator to provide information about the application instance being installed.

Applications shall never communicate with another application that they do not trust. An Application decides if another application is trusted by checking whether the Application Instance Certificate for the other application is trusted. Applications shall rely on lists of Certificates provided by the Administrator to determine trust. There are two separate lists: a list of trusted Applications and a list of trusted Certificate Authorities (CAs). If an application is not directly trusted (i.e. its Certificate is not in the list of trusted applications) then the application shall build a chain of Certificates back to a trusted CA.

When building a chain each Certificate in the chain shall be validated. If any validation error occurs then the trust check fails. Some validation errors are non-critical which means they can be suppressed by a user of an Application with the appropriate privileges. Suppressed validation errors are always reported via auditing (i.e. an appropriate Audit event is raised).

Building a trust chain requires access to all Certificates in the chain. These Certificates may be stored locally or they may be provided with the application Certificate. Processing fails with Bad_SecurityChecksFailed if a CA Certificate cannot be found.
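
The trust decision described above can be sketched as follows. This is a simplified illustration under stated assumptions: the Cert record, the check_trust function and the use of thumbprints and subject names as lookup keys are hypothetical, and the sketch omits the per-Certificate validation that shall be applied to every Certificate in the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical, heavily reduced view of a Certificate."""
    thumbprint: str
    subject: str
    issuer: str  # equal to subject for a self-signed Certificate


def check_trust(cert, trusted_apps, trusted_cas, known_certs):
    """Return a status string for the trust decision.

    trusted_apps / trusted_cas are sets of thumbprints supplied by the
    Administrator; known_certs are the Certificates available locally or
    provided with the application Certificate.
    """
    # Directly trusted application: no chain is needed.
    if cert.thumbprint in trusted_apps:
        return "Good"

    by_subject = {c.subject: c for c in known_certs}
    chain = [cert]
    current = cert
    # Walk issuer links until a self-signed Certificate is reached.
    while current.issuer != current.subject:
        issuer = by_subject.get(current.issuer)
        if issuer is None or issuer in chain:
            # A CA Certificate cannot be found (or the links form a loop).
            return "Bad_SecurityChecksFailed"
        chain.append(issuer)
        current = issuer

    # Trusted if any CA in the chain is in the list of trusted CAs.
    if any(c.thumbprint in trusted_cas for c in chain[1:]):
        return "Good"
    return "Bad_CertificateUntrusted"


root = Cert("r1", "CN=Example Root CA", "CN=Example Root CA")
app = Cert("a1", "CN=Example App", "CN=Example Root CA")
print(check_trust(app, set(), {"r1"}, [root]))
```

In this sketch a self-signed application Certificate that is not in the list of trusted applications is reported as untrusted, because its chain contains no CA.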

Table 106 specifies the steps used to validate a Certificate in the order that they shall be followed. These steps are repeated for each Certificate in the chain. Each validation step has a unique error status and audit event type that shall be reported if the check fails. The audit event is in addition to any audit event that was generated for the particular Service that was invoked. The Service audit event in its message text shall include the audit EventId of the AuditCertificateEventType (for more details, see 6.5). Processing halts if an error occurs, unless it is non-critical and it has been suppressed.

ApplicationInstanceCertificates shall not be used in a Client or Server until they have been evaluated and marked as trusted. This can happen automatically by a PKI trust chain or in an offline manner where the Certificate is marked as trusted by an administrator after evaluation.

Table 106 – Certificate Validation Steps

| Step | Error/AuditEvent | Description |
|---|---|---|
| Certificate Structure | Bad_CertificateInvalid, Bad_SecurityChecksFailed / AuditCertificateInvalidEventType | The Certificate structure is verified. This error may not be suppressed. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
| Build Certificate Chain | Bad_CertificateChainIncomplete, Bad_SecurityChecksFailed / AuditCertificateInvalidEventType | The trust chain for the Certificate is created. An error during the chain creation may not be suppressed. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
| Signature | Bad_CertificateInvalid, Bad_SecurityChecksFailed / AuditCertificateInvalidEventType | A Certificate with an invalid signature shall always be rejected. A Certificate signature is invalid if the Issuer Certificate is unknown. A self-signed Certificate is its own issuer. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
| Security Policy Check | Bad_CertificatePolicyCheckFailed, Bad_SecurityChecksFailed / AuditCertificateInvalidEventType | A Certificate signature shall comply with the CertificateSignatureAlgorithm, MinAsymmetricKeyLength and MaxAsymmetricKeyLength requirements for the used SecurityPolicy defined in OPC 10000-7. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. This error may be suppressed. |
| Trust List Check | Bad_CertificateUntrusted, Bad_SecurityChecksFailed / AuditCertificateUntrustedEventType | If the Application Instance Certificate is not trusted and none of the CA Certificates in the chain is trusted, the result of the Certificate validation shall be Bad_CertificateUntrusted. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
| Validity Period | Bad_CertificateTimeInvalid, Bad_CertificateIssuerTimeInvalid / AuditCertificateExpiredEventType | The current time shall be after the start of the validity period and before the end. This error may be suppressed. |
| Host Name | Bad_CertificateHostNameInvalid / AuditCertificateDataMismatchEventType | The HostName in the URL used to connect to the Server shall be the same as one of the HostNames specified in the Certificate. This check is skipped for CA Certificates. This check is skipped for Server side validation. This error may be suppressed. |
| URI | Bad_CertificateUriInvalid / AuditCertificateDataMismatchEventType | Application and Software Certificates contain an application or product URI that shall match the URI specified in the ApplicationDescription provided with the Certificate. This check is skipped for CA Certificates. This error may not be suppressed. The gatewayServerUri is used to validate an Application Certificate when connecting to a Gateway Server (see 7.1). |
| Certificate Usage | Bad_CertificateUseNotAllowed, Bad_CertificateIssuerUseNotAllowed / AuditCertificateMismatchEventType | Each Certificate has a set of uses for the Certificate (see OPC 10000-6). These uses shall match the use requested for the Certificate (i.e. Application, Software or CA). This error may be suppressed unless the Certificate indicates that the usage is mandatory. |
| Find Revocation List | Bad_CertificateRevocationUnknown, Bad_CertificateIssuerRevocationUnknown / AuditCertificateRevokedEventType | Each CA Certificate may have a revocation list. This check fails if this list is not available (i.e. a network interruption prevents the application from accessing the list). No error is reported if the Administrator disables revocation checks for a CA Certificate. This error may be suppressed. |
| Revocation Check | Bad_CertificateRevoked, Bad_CertificateIssuerRevoked / AuditCertificateRevokedEventType | The Certificate has been revoked and may not be used. This error may not be suppressed. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
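
The ordering and suppression rules of Table 106 can be sketched as a small pipeline. The sketch below makes several assumptions: the names VALIDATION_STEPS and run_validation are hypothetical, only the status code for the Certificate being validated is used (not the issuer variants), the Trust List Check is treated as not suppressible because the table does not state otherwise, and Certificate Usage is treated as suppressible, which holds only when the Certificate does not mark the usage as mandatory.

```python
# (step name, status code, may the error be suppressed?)
VALIDATION_STEPS = [
    ("Certificate Structure", "Bad_CertificateInvalid", False),
    ("Build Certificate Chain", "Bad_CertificateChainIncomplete", False),
    ("Signature", "Bad_CertificateInvalid", False),
    ("Security Policy Check", "Bad_CertificatePolicyCheckFailed", True),
    ("Trust List Check", "Bad_CertificateUntrusted", False),  # assumption
    ("Validity Period", "Bad_CertificateTimeInvalid", True),
    ("Host Name", "Bad_CertificateHostNameInvalid", True),
    ("URI", "Bad_CertificateUriInvalid", False),
    ("Certificate Usage", "Bad_CertificateUseNotAllowed", True),  # unless mandatory
    ("Find Revocation List", "Bad_CertificateRevocationUnknown", True),
    ("Revocation Check", "Bad_CertificateRevoked", False),
]


def run_validation(failed_steps, suppressed_steps):
    """Apply the steps in table order.

    failed_steps: names of steps whose check failed for this Certificate.
    suppressed_steps: names of steps a privileged user chose to suppress.
    Returns (result, audited), where audited lists the status codes of
    suppressed errors, which are still reported via auditing.
    """
    audited = []
    for name, status, suppressible in VALIDATION_STEPS:
        if name not in failed_steps:
            continue
        if suppressible and name in suppressed_steps:
            audited.append(status)  # suppressed, but an audit event is raised
            continue
        return status, audited      # processing halts at the first real error
    return "Good", audited


print(run_validation({"Validity Period", "Revocation Check"}, {"Validity Period"}))
```

Requesting suppression for a step that may not be suppressed, such as Signature, has no effect in this sketch and the error is returned.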

Certificates are usually placed in a central location called a CertificateStore. Figure 20 illustrates the interactions between the Application, the Administrator and the CertificateStore. The CertificateStore could be on the local machine or in some central server. The exact mechanisms used to access the CertificateStore depend on the application and PKI environment set up by the Administrator.

image023.png

Figure 20 – Determining if an Application Instance Certificate is Trusted

All OPC UA Applications shall establish a SecureChannel before creating a Session. This SecureChannel requires that both applications have access to Certificates that can be used to encrypt and sign the Messages exchanged. The Application Instance Certificates installed by following the process described in 6.1.2 may be used for this purpose.

The steps involved in establishing a SecureChannel are shown in Figure 21.

image024.png

Figure 21 – Establishing a SecureChannel

Figure 21 above assumes Client and Server have online access to a Certificate Authority (CA). If online access is not available and if the administrator has installed the CA public key on the local machine, then the Client and Server shall still validate the application Certificates using that key. The figure shows only one CA; however, there is no requirement that the Client and Server Certificates be issued by the same authority. A self-signed Application Instance Certificate does not need to be verified with a CA. Any Certificate shall be rejected if it is not in a trust list provided by the administrator.

Both the Client and Server shall have a list of Certificates that they have been configured to trust (sometimes called the Certificate Trust List or CTL). These trusted Certificates may be Certificates for Certificate Authorities or they may be OPC UA Application Instance Certificates. OPC UA Applications shall be configured to reject connections with applications that do not have a trusted Certificate.

Certificates can be compromised, which means they should no longer be trusted. Administrators can revoke a Certificate by removing it from the trust list for all applications or the CA can add the Certificate to the Certificate Revocation List (CRL) for the Issuer Certificate. Administrators may save a local copy of the CRL for each Issuer Certificate when online access is not available.

A Client does not need to call GetEndpoints each time it connects to the Server. This information should change rarely and the Client can cache it locally. If the Server rejects the OpenSecureChannel request the Client should call GetEndpoints and make sure the Server configuration has not changed.
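
The caching behaviour described above can be sketched as follows. The function connect and its parameters are hypothetical; get_endpoints and open_secure_channel stand in for whatever calls the Client's communication stack provides and are passed in as plain callables.

```python
def connect(server_url, endpoint_cache, get_endpoints, open_secure_channel):
    """Try the cached endpoint information first and refresh it on rejection.

    endpoint_cache: dict mapping a Server URL to cached endpoint information.
    get_endpoints(url): returns fresh endpoint information (stand-in).
    open_secure_channel(endpoints): returns True if the Server accepts (stand-in).
    Returns the endpoint information that worked, or None.
    """
    cached = endpoint_cache.get(server_url)
    if cached is not None and open_secure_channel(cached):
        return cached

    # No cache entry, or the Server rejected the request: the Server
    # configuration may have changed, so ask again and update the cache.
    fresh = get_endpoints(server_url)
    endpoint_cache[server_url] = fresh
    return fresh if open_secure_channel(fresh) else None
```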

There are two security risks which a Client shall be aware of when using the GetEndpoints Service. The first could come from a rogue Discovery Server that tries to direct the Client to a rogue Server. For this reason the Client shall verify that the ServerCertificate in the EndpointDescription is a trusted Certificate before it calls CreateSession.

The second security risk comes from a third party that alters the contents of the EndpointDescriptions as they are transferred over the network back to the Client. The Client protects itself against this by comparing the list of EndpointDescriptions returned from the GetEndpoints Service with the list returned in the CreateSession response.
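
A comparison of the two lists might look like the sketch below. Which fields of an EndpointDescription are compared is an assumption made here for illustration (COMPARED_FIELDS is hypothetical), as is the choice to ignore the order of the entries; the EndpointDescriptions are modelled as plain dictionaries.

```python
from collections import Counter

# Assumption: the fields considered relevant for the comparison.
COMPARED_FIELDS = ("endpointUrl", "securityMode", "securityPolicyUri")


def fingerprint(endpoint):
    """Reduce an EndpointDescription (a dict here) to the compared fields."""
    return tuple(endpoint.get(name) for name in COMPARED_FIELDS)


def endpoints_unaltered(from_get_endpoints, from_create_session):
    """True if both lists contain the same endpoints, ignoring order."""
    return (Counter(map(fingerprint, from_get_endpoints))
            == Counter(map(fingerprint, from_create_session)))
```

A Client using such a check would abandon the Session when it returns False, since the information used to select the Endpoint may have been altered in transit.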

The exact mechanisms for using the security token to sign and encrypt Messages exchanged over the SecureChannel are described in OPC 10000-6. The process for renewing tokens is also described in detail in OPC 10000-6.

In many cases, the Certificates used to establish the SecureChannel will be the Application Instance Certificates. However, some Communication Stacks might not support Certificates that are specific to a single application. Instead, they expect all communication to be secured with a Certificate specific to a user or the entire machine. For this reason, OPC UA Applications will need to exchange their Application Instance Certificates when creating a Session.

Once an OPC UA Client has established a SecureChannel with a Server it can create an OPC UA Session.

The steps involved in establishing a Session are shown in Figure 22.

image025.png

Figure 22 – Establishing a Session

Figure 22 above illustrates the interactions between a Client, a Server, a Certificate Authority (CA) and an identity provider. The CA is responsible for issuing the Application Instance Certificates. If the Client or Server does not have online access to the CA, then they shall validate the Application Instance Certificates using the CA public key that the administrator shall install on the local machine.

The identity provider may be a central database that can verify the user token provided by the Client. This identity provider may also tell the Server which access rights the user has. The identity provider depends on the user identity token. It could be a Certificate Authority, a Kerberos ticket granting service, a WS-Trust Server or a proprietary database of some sort.

The Client and Server shall prove possession of their Application Instance Certificates by signing the Certificates with a nonce appended. The exact mechanism used to create the proof of possession signatures is described in 5.6.2. Similarly, the Client shall prove possession by either providing a secret like a password in the user identity token or by creating a signature with the secret associated with a user identity token like X.509 v3.

Once an OPC UA Client has established a Session with a Server it can change the user identity associated with the Session by calling the ActivateSession service.

The steps involved in impersonating a user are shown in Figure 23.

image026.png

Figure 23 – Impersonating a User

Authorization Services provide Clients with Access Tokens on behalf of Users. The Clients pass these Access Tokens to a Server to be granted access to resources.

In a basic model (as shown in Figure 22) the Server is responsible for authorization (i.e. deciding what a user can do) while a separate identity provider (e.g. the operating system) is responsible for authentication (deciding who the user is).

In more complex models, the Server relies on external Authorization Services to provide some of its authorization requirements. These Authorization Services act in concert with an external identity provider which validates the user credentials before the external Authorization Service creates an Access Token that tells the Server what the user is allowed to do. The Client interactions with these services may be indirect as shown in 6.2.2 or direct as shown in 6.2.3.

Even when the Server requires the Client to use an external Authorization Service the Server is still responsible for managing and enforcing the Permissions assigned to Nodes in its Address Space. The clauses below discuss the use of an external Authorization Service in more detail.

Authorization Services (AS) provide access to identity providers which can validate the credentials provided by Clients. They then provide tokens which can be passed to a Server instead of the credentials. These tokens are passed as an IssuedIdentityToken defined in 7.36.6.

The protocol to request tokens depends on the Authorization Service (AS). Common protocols include Kerberos and OAuth2. OAuth2 supports claims based authorization as described in OPC 10000-2.

Servers publish the Authorization Services (AS) they support in the UserTokenPolicies list returned with GetEndpoints. The IssuedTokenType field specifies the protocol used to communicate with the AS. The IssuerEndpointUrl field contains the information needed by the Client to connect to the AS using the protocol required by the AS.

The basic handshake is shown in Figure 24.

image027.png

Figure 24 – Indirect Handshake with an Identity Provider

Authorization Services require that Servers be registered with them because the Access Tokens can only be used with a single Server. This can introduce a lot of complexity for administrators. One way to reduce this complexity is to leverage the Server information that is already managed by a Global Discovery Service (GDS) described in OPC 10000-12. In this model the user identities are still managed by a central Authorization Service. The interactions are shown in Figure 25.

image028.png

Figure 25 – Direct Handshake with an Identity Provider

The UserTokenPolicy returned from the Server provides the URL of the Authorization Service and the identity provider. If the Application Authorization Service is linked with the GDS, it knows of all Servers which have been issued Certificates. The ApplicationUri is used as the identifier for the Server passed to the AS. The identity provider is responsible for managing users known to the system. It validates the credentials provided by the Client and returns an Identity Access Token which identifies the user. The Identity Access Token is passed to the Application Authorization Service which validates the Client and Server applications and creates a new Access Token that can be used to access the Server.

The Session-less Service invocation is introduced for Services, such as Read, Write or Call, that do not require any caller specific state information. It is accessible through the SessionlessInvoke Service which provides the context information required to call Services without a Session.

Session-less invocation is limited to Services of the View Service Set (with exception of RegisterNodes and UnregisterNodes), Attribute Service Set, Method Service Set, NodeManagement Service Set and Query Service Set. All Services belonging to these Service Sets that are supported by a Server via a Session shall also be supported via the SessionlessInvoke Service.

Session-less Services can be invoked via a SecureChannel by using the Access Token returned from the Authorization Service as the authenticationToken in the requestHeader. The SecureChannel shall have encryption enabled to prevent eavesdroppers from seeing the Access Token. The Access Token provides the user authentication. If application authentication through the SecureChannel is sufficient, Servers may not require the Access Token and assume an anonymous user. In this case the authenticationToken shall be null.

The SessionlessInvoke Messages are just an envelope for the Service to invoke and do not have a RequestHeader and ResponseHeader like other Services. Those parameters are already part of the body which contains the Message for the Service to invoke.

Any Endpoint used for normal communication could be used for Session-less invocation provided the Endpoint supports encryption. The Server returns Bad_ServiceUnsupported if it does not support Session-less invocation for the request specified in the body. If it supports invocation but not with the combination of Endpoint and security settings used it returns Bad_SecurityModeInsufficient.

Servers may expose Endpoints which are only for use with Session-less invocation. These Endpoints shall support GetEndpoints and FindServers in addition to the SessionlessInvoke Service. The Server returns Bad_ServiceUnsupported for the other Services.
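
The status codes returned for Session-less invocation can be sketched as two small checks. The function names and parameters are hypothetical, and the sketch reduces "the combination of Endpoint and security settings" to a single flag indicating whether the Endpoint encrypts.

```python
def sessionless_invoke_status(service, sessionless_services, endpoint_encrypts):
    """Status for a SessionlessInvoke request whose body names `service`.

    sessionless_services: Services the Server offers via Session-less invocation.
    endpoint_encrypts: True if the Endpoint in use has encryption enabled.
    """
    if service not in sessionless_services:
        return "Bad_ServiceUnsupported"
    if not endpoint_encrypts:
        return "Bad_SecurityModeInsufficient"
    return "Good"


# Services available on an Endpoint that exists only for Session-less invocation.
DEDICATED_ENDPOINT_SERVICES = {"GetEndpoints", "FindServers", "SessionlessInvoke"}


def dedicated_endpoint_status(service):
    """Status for a Service called directly on a Session-less-only Endpoint."""
    return "Good" if service in DEDICATED_ENDPOINT_SERVICES else "Bad_ServiceUnsupported"


print(sessionless_invoke_status("Read", {"Read", "Write", "Call"}, True))
print(dedicated_endpoint_status("CreateSession"))
```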

A Session ensures that a namespace index or a server index does not change during the lifetime of a Session. This cannot be ensured between Session-less Services invocations. There are two options to ensure the namespace indices in the call match the expected namespace URIs in the Server. One option for the caller is to pass in the list of namespace URIs used to build the namespace indices. This works best for single Session-less Service invocations. The second option is to pass in the UrisVersion to ensure consistency of namespace arrays between Client and Server. The UrisVersion is first read from the Server together with the NamespaceArray and ServerArray. This reduces the overhead per call for a sequence of Session-less Service invocations.

Table 107 defines the parameters for the Service.

Table 107 – SessionlessInvoke Service Parameters

| Name | Type | Description |
|---|---|---|
| Request | | |
| urisVersion | VersionTime | The version of the NamespaceArray and the ServerArray used for the Service invocation. The version must match the value of the UrisVersion Property that defines the version for the URI lists in the NamespaceArray and the ServerArray Properties defined in OPC 10000-5. If the urisVersion parameter does not match the Server's UrisVersion Property, the Server shall return Bad_VersionTimeInvalid. In this case the Client shall read the UrisVersion, NamespaceArray and the ServerArray from the Server Object to repeat the Service invocation with the right version. The VersionTime DataType is defined in 7.38. If the value is 0, the parameter is ignored and the URIs are defined by the namespaceUris and serverUris parameters in request and response. If the value is non-zero, the namespaceUris and serverUris parameters in the request are ignored by the Server and set to null arrays in the response. |
| namespaceUris [] | String | A list of URIs referenced by NodeIds or QualifiedNames in the request. NamespaceIndex 0 shall not be in this list. The first entry in this list is NamespaceIndex 1. The parameter shall be ignored by the Server if the urisVersion is not 0. |
| serverUris [] | String | A list of URIs referenced by ExpandedNodeIds in the request. ServerIndex 0 shall not be in this list. The first entry in this list is ServerIndex 1. The parameter shall be ignored by the Server if the urisVersion is not 0. |
| localeIds [] | LocaleId | List of locale ids in priority order for localized strings. The first LocaleId in the list has the highest priority. If the Server returns a localized string to the Client, the Server shall return the translation with the highest priority that it can. If it does not have a translation for any of the locales identified in this list, then it shall return the string value that it has and include the locale id with the string. See OPC 10000-3 for more detail on locale ids. If localeIds is empty, the returned language variant is Server specific. |
| serviceId | UInt32 | The numeric identifier assigned to the Service request DataType describing the body. |
| body | * | The body of the request. The body is an embedded structure containing the corresponding Service request for the serviceId. |
| Response | | |
| namespaceUris [] | String | A list of URIs referenced by NodeIds or QualifiedNames in the response. NamespaceIndex 0 shall not be in this list. The first entry in this list is NamespaceIndex 1. An empty array shall be returned if the urisVersion is not 0. |
| serverUris [] | String | A list of URIs referenced by ExpandedNodeIds in the response. ServerIndex 0 shall not be in this list. The first entry in this list is ServerIndex 1. An empty array shall be returned if the urisVersion is not 0. |
| serviceId | UInt32 | The numeric identifier assigned to the Service response DataType describing the body. |
| body | * | The body of the response. The body is an embedded structure containing the corresponding Service response for the serviceId. |
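
The interplay of urisVersion and namespaceUris on the Server side can be sketched as follows. The function resolve_namespace_uri and its parameters are hypothetical. The sketch relies on index 0 of the Server's NamespaceArray being the OPC UA namespace, and it handles namespace indices only; server indices would follow the same pattern.

```python
OPC_UA_NAMESPACE = "http://opcfoundation.org/UA/"


def resolve_namespace_uri(index, uris_version, namespace_uris,
                          server_uris_version, server_namespace_array):
    """Map a NamespaceIndex used in a request to a namespace URI.

    Returns (status, uri). An index outside the available list raises
    IndexError in this sketch.
    """
    if uris_version != 0:
        # Non-zero: namespaceUris is ignored and the versions shall match.
        if uris_version != server_uris_version:
            return "Bad_VersionTimeInvalid", None
        return "Good", server_namespace_array[index]

    # Zero: the URIs come from the request. NamespaceIndex 0 is not in
    # the list, so the first list entry corresponds to NamespaceIndex 1.
    if index == 0:
        return "Good", server_namespace_array[0]
    return "Good", namespace_uris[index - 1]


server_namespaces = [OPC_UA_NAMESPACE, "urn:example:machine"]
print(resolve_namespace_uri(1, 0, ["urn:example:other"], 42, server_namespaces))
```

On Bad_VersionTimeInvalid the Client would read UrisVersion, NamespaceArray and ServerArray again and repeat the invocation with the current version.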

Table 108 defines the Service results specific to this Service. Common StatusCodes are defined in Table 177.

Table 108 – SessionlessInvoke Service Result Codes

| Symbolic Id | Description |
|---|---|
| Bad_VersionTimeInvalid | The provided version time is no longer valid. |

Note: Details on SoftwareCertificates need to be defined in a future version.

Auditing is a requirement in many systems. It provides a means of tracking activities that occur as part of normal operation of the system. It also provides a means of tracking abnormal behaviour. It is also a requirement from a security standpoint. For more information on the security aspects of auditing, see OPC 10000-2. This sub-clause describes what is expected of an OPC UA Server and Client with respect to auditing and it details the audit requirements for each service set. Auditing can be accomplished using one or both of the following methods:

  1. The OPC UA Application that generates the audit event can log the audit entry in a log file or other storage location;
  2. The OPC UA Server that generates the audit event can publish the audit event using the OPC UA event mechanism. This allows an external OPC UA Client to subscribe to and log the audit entries to a log file or other storage location.

Each OPC UA Service request contains a string parameter that is used to carry an audit record id. A Client or any Server operating as a Client, such as an aggregating Server, can create a local audit log entry for a request that it submits. This parameter allows this Client to pass the identifier for this entry with the request. If this Server also maintains an audit log, it should include this id in its audit log entry that it writes. When this log is examined and that entry is found, the examiner will be able to relate it directly to the audit log entry created by the Client. This capability allows for traceability across audit logs within a system.
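
The traceability across audit logs described above can be sketched as follows. The function names and the log layout are hypothetical, the request is modelled as a plain dictionary, and the key auditEntryId is used here as the name of the string parameter that carries the audit record id.

```python
import uuid


def submit_with_audit(client_log, service, send):
    """Create the Client's audit entry first, then pass its id with the request."""
    audit_entry_id = str(uuid.uuid4())
    client_log.append({"id": audit_entry_id, "service": service})
    return send({"service": service, "auditEntryId": audit_entry_id})


def handle_request(server_log, request):
    """Server side: record the Client's audit record id in its own entry."""
    server_log.append({
        "service": request["service"],
        "clientAuditEntryId": request["auditEntryId"],
    })
    return "Good"


client_log, server_log = [], []
submit_with_audit(client_log, "Write", lambda req: handle_request(server_log, req))
# An examiner can now relate the two entries through the shared id.
print(server_log[0]["clientAuditEntryId"] == client_log[0]["id"])
```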

A Server that maintains an audit log shall provide the audit log entries via Event Messages. The AuditEventType and its sub-types are defined in OPC 10000-3. An audit Event Message also includes the audit record Id. The details of the AuditEventType and its subtypes are defined in OPC 10000-5. A Server that is an aggregating Server that supports auditing shall also subscribe for audit events for all of the Servers that it is aggregating (assuming they provide auditing). The combined stream should be available from the aggregating Server.

This Service Set can be separated into two groups: Services that are called by OPC UA Clients and Services that are invoked by OPC UA Servers. The FindServers and GetEndpoints Services that are called by OPC UA Clients may generate audit entries for failed Service invocations. The RegisterServer Service that is invoked by OPC UA Servers shall generate audit entries for all new registrations and for failed Service invocations. These audit entries shall include the Server URI, Server names, Discovery URIs and isOnline status. Audit entries should not be generated for RegisterServer invocations that do not cause changes to the registered Servers.

All Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for failed service invocations and for successful invocation of the OpenSecureChannel and CloseSecureChannel Services. The Client-generated audit entries should be set up prior to the actual call, allowing the correct audit record Id to be provided. The OpenSecureChannel Service shall generate an audit Event of type AuditOpenSecureChannelEventType or a subtype of it for the requestType ISSUE_0. Audit Events for the requestType RENEW_1 are only created if the renewal fails. The CloseSecureChannel service shall generate an audit Event of type AuditChannelEventType or a subtype of it. Both of these Event types are subtypes of the AuditChannelEventType. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure cases the Message for Events of this type should include a description of why the service failed. This description should be more detailed than what was returned to the client. From a security point of view a Client only needs to know that it failed, but from an Auditing point of view the exact details of the failure need to be known. In the case of Certificate validation errors the description should include the audit EventId of the specific AuditCertificateEventType that was generated to report the Certificate error. The AuditCertificateEventType shall also contain the detailed Certificate validation error. The additional parameters should include the details of the request. It is understood that these events may be generated by the underlying Communication Stacks in many cases, but they shall be made available to the Server and the Server shall report them.
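
The rule for when OpenSecureChannel produces an audit Event can be expressed compactly. The function name is hypothetical and the requestType values are written as strings for readability.

```python
def open_secure_channel_audit_required(request_type, succeeded):
    """True if an audit Event is to be generated for this OpenSecureChannel call."""
    if request_type == "ISSUE_0":
        return True           # generated for successful and failed invocations
    if request_type == "RENEW_1":
        return not succeeded  # only created if the renew fails
    raise ValueError(f"unknown requestType: {request_type!r}")
```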

All Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for both successful and failed Service invocations. These Services shall generate an audit Event of type AuditSessionEventType or a subtype of it. In particular, they shall generate the base EventType or the appropriate subtype, depending on the service that was invoked. The CreateSession service shall generate AuditCreateSessionEventType events or subtypes of it. The ActivateSession service shall generate AuditActivateSessionEventType events or subtypes of it. When the ActivateSession Service is called to change the user identity then the Server shall generate AuditActivateSessionEventType events or subtypes of it. The CloseSession service shall generate AuditSessionEventType events or subtypes of it. It shall always be generated if a Session is terminated, for example by Session timeout expiration or Server shutdown. The SourceName for Events of this type shall be “Session/Timeout” for a Session timeout, “Session/CloseSession” for a CloseSession Service call and “Session/Terminated” for all other cases. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure case the Message for Events of this type should include a description of why the Service failed. The additional parameters should include the details of the request.
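
The SourceName assignment for the audit Event raised when a Session ends can be sketched as a lookup. The function name and the reason keys "timeout" and "close_session" are hypothetical labels chosen for this illustration.

```python
def session_end_source_name(reason):
    """SourceName for the audit Event generated when a Session is terminated."""
    names = {
        "timeout": "Session/Timeout",             # Session timeout expired
        "close_session": "Session/CloseSession",  # CloseSession Service call
    }
    # All other cases, such as Server shutdown.
    return names.get(reason, "Session/Terminated")
```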

This Service Set shall also generate additional audit events in the cases when Certificate validation errors occur. These audit Events are generated in addition to the AuditSessionEventType Events. See OPC 10000-3 for the definition of AuditCertificateEventType and its subtypes.

For Clients that support auditing, accessing the services in the Session Service Set shall generate audit entries for both successful and failed invocations of the Service. These audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.

All Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for both successful and failed Service invocations. These Services shall generate an audit Event of type AuditNodeManagementEventType or subtypes of it. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure case, the Message for Events of this type should include a description of why the service failed. The additional parameters should include the details of the request.

For Clients that support auditing, accessing the Services in the NodeManagement Service Set shall generate audit entries for both successful and failed invocations of the Service. All audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.

The Write or HistoryUpdate Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for both successful and failed Service invocations. These Services shall generate an audit Event of type AuditUpdateEventType or subtypes of it. In particular, the Write Service shall generate an audit event of type AuditWriteUpdateEventType or a subtype of it. The HistoryUpdate Service shall generate an audit Event of type AuditHistoryUpdateEventType or a subtype of it. Three subtypes of AuditHistoryUpdateEventType are defined as AuditHistoryEventUpdateEventType, AuditHistoryValueUpdateEventType and AuditHistoryDeleteEventType. The subtype depends on the type of operation being performed, historical event update, historical data value update or a historical delete. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure case the Message for Events of this type should include a description of why the Service failed. The additional parameters should include the details of the request.

The Read and HistoryRead Services may generate audit entries and audit Events for failed Service invocations. These Services should generate an audit Event of type AuditEventType or a subtype of it. See OPC 10000-5 for the detailed assignment of the SourceNode, SourceName and additional parameters. The Message for Events of this type should include a description of why the Service failed.

For Clients that support auditing, accessing the Write or HistoryUpdate Services in the Attribute Service Set shall generate audit entries for both successful and failed invocations of the Service. Invocations of the other Services in this Service Set may generate audit entries. All audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.

All Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for both successful and failed service invocations if the invocation modifies the AddressSpace, writes a value or modifies the state of the system (alarm acknowledge, batch sequencing or other system changes). These method calls shall generate an audit Event of type AuditUpdateMethodEventType or subtypes of it. Methods that do not modify the AddressSpace, write values or modify the state of the system may generate events. See OPC 10000-5 for the detailed assignment of the SourceNode, SourceName and additional parameters.

For Clients that support auditing, accessing the Method Service Set shall generate audit entries for both successful and failed invocations of the Service, if the invocation modifies the AddressSpace, writes a value or modifies the state of the system (alarm acknowledge, batch sequencing or other system changes). Invocations of the other Methods may generate audit entries. All audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.

All of the Services in these four Service Sets only provide the Client with information, with the exception of the TransferSubscriptions Service in the Subscription Service Set. In general, these services will not generate audit entries or audit Event Messages. The TransferSubscriptions Service shall generate an audit Event of type AuditSessionEventType or subtypes of it for both successful and failed Service invocations. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure case, the Message for Events of this type should include a description of why the service failed.

For Clients that support auditing, accessing the TransferSubscriptions Service in the Subscription Service Set shall generate audit entries for both successful and failed invocations of the Service. Invocations of the other Services in this Service Set do not require audit entries. All audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.

OPC UA enables Servers, Clients and networks to be redundant. OPC UA provides the data structures and Services by which Redundancy may be achieved in a standardized manner.

Server Redundancy allows Clients to have multiple sources from which to obtain the same data. Server Redundancy can be achieved in multiple manners, some of which require Client interaction, others that require no interaction from a Client. Redundant Servers could exist in systems without redundant networks or Clients. Redundant Servers could also coexist in systems with network and Client Redundancy. Server Redundancy is formally defined in 6.6.2.

Client Redundancy allows identically configured Clients to behave as if they were single Clients, but not all Clients are obtaining data at a given time. Ideally there should be no loss of information when a Client Failover occurs. Redundant Clients could exist in systems without redundant networks or Servers. Redundant Clients could also coexist in systems with network and Server Redundancy. Client Redundancy is formally defined in 6.6.3.

Network Redundancy allows a Client and Server to have multiple communication paths to obtain the same data. Redundant networks could exist in systems without redundant Servers or Clients. Redundant networks could also coexist in systems with Client and Server Redundancy. Network Redundancy is formally defined in 6.6.4.

There are two general modes of Server Redundancy, transparent and non-transparent.

In transparent Redundancy the Failover of Server responsibilities from one Server to another is transparent to the Client. The Client is unaware that a Failover has occurred and the Client has no control over the Failover behaviour. Furthermore, the Client does not need to perform any actions to continue to send or receive data.

In non-transparent Redundancy the Failover from one Server to another and actions to continue to send or receive data are performed by the Client. The Client must be aware of the Redundant Server Set and must perform the required actions to benefit from the Server Redundancy.

The ServerRedundancy Object defined in OPC 10000-5 indicates the mode supported by the Server. The ServerRedundancyType ObjectType and its subtypes TransparentRedundancyType and NonTransparentRedundancyType defined in OPC 10000-5 specify information for the supported Redundancy mode.

OPC UA Servers that are part of a Redundant Server Set have certain AddressSpace requirements. These requirements allow a Client to consistently access information from Servers in a Redundant Server Set and to make intelligent choices related to the health and availability of Servers in the Redundant Server Set.

Servers in the Redundant Server Set shall have an identical AddressSpace including:

The only Nodes that can differ between Servers in a Redundant Server Set are the Nodes that are in the local Server namespace like the Server diagnostic Nodes. A Client that fails over shall not be required to translate browse paths or otherwise resolve NodeIds. Servers are allowed to add and delete Nodes as long as all Servers in the Redundant Server Set will be updated with the same Node changes.

All Servers in a Redundant Server Set shall be synchronised with respect to time. This may mean installing a NTP service or a PTP service.

There are other important considerations for a redundant system regarding synchronization:

To a Client the transparent Redundant Server Set appears as if it is just a single Server and the Client has no Failover actions to perform. All Servers in the Redundant Server Set have an identical ServerUri and an identical EndpointUrl.

Figure 26 shows a typical transparent Redundancy setup.

image029.png

Figure 26 – Transparent Redundancy setup example

For transparent Redundancy, OPC UA provides data structures to allow Clients to identify which Servers are in the Redundant Server Set, the ServiceLevel of each Server, and which Server is currently responsible for the Client Session. This information is specified in TransparentRedundancyType ObjectType defined in OPC 10000-5. Since the ServerUri is identical for all Servers in the Redundant Server Set, the Servers are identified with a ServerId contained in the information provided in the TransparentRedundancyType Object.

In transparent Redundancy, a Client is not able to control which physical Server it actually connects to. Failover is controlled by the Redundant Server Set and a Client is also not able to actively Failover to another Server in the Redundant Server Set.

All OPC UA interactions within a given Session shall be supported by one Server and the Client is able to identify which Server that is, allowing a complete audit trail for the data. It is the responsibility of the Servers to ensure that information is synchronised between the Servers. A functional Server will take over the Session and Subscriptions from the Failed Server. Failover may require a reconnection of the Client’s SecureChannel but the EndpointUrl of the Server and the ServerUri shall not change. The Client shall be able to continue communication with the Sessions and Subscriptions created on the previously used Server.

Figure 26 provides an abstract view of a transparent Redundant Server Set. The two or more Servers in the Redundant Server Set share a virtual network address and therefore all Servers have the identical EndpointUrl. How this virtual network address is created and managed is vendor specific. There may be special hardware that mediates the network address displayed to the rest of the network. There may be custom hardware, where all components are redundant and Failover at a hardware level automatically. There may even be software based systems where all the transparency is governed completely by software.

For non-transparent Redundancy, OPC UA provides the data structures to allow the Client to identify what Servers are available in the Redundant Server Set and also Server information which tells the Client what modes of Failover the Server supports. This information allows the Client to determine what actions it may need to take in order to accomplish Failover. This information is specified in NonTransparentRedundancyType ObjectType defined in OPC 10000-5.

Figure 27 shows a typical non-transparent Redundancy setup.

image030.png

Figure 27 – Non-Transparent Redundancy setup

For non-transparent Redundancy, the Servers will have unique IP addresses. The Server also has additional Failover modes of Cold, Warm, Hot and HotAndMirrored. The Client must be aware of the Redundant Server Set and shall be required to perform some actions depending on the Failover mode. These actions are described in Table 111 and additional examples and explanations are provided in 6.6.2.4.5.2 for Cold, 6.6.2.4.5.3 for Warm, 6.6.2.4.5.4 for Hot and 6.6.2.4.5.5 for HotAndMirrored.

A Client needs to be able to expect that the SourceTimestamp associated with a value is approximately the same from all Servers in the Redundant Server Set for the same value.

The ServiceLevel provides information to a Client regarding the health of a Server and its ability to provide data. See OPC 10000-5 for a formal definition for ServiceLevel. The ServiceLevel is a byte with a range of 0 to 255, where the values fall into the sub-ranges defined in Table 109.

The algorithm used by a Server to determine its ServiceLevel within each sub-range is Server specific. However, all Servers in a Redundant Server Set shall use the same algorithm to determine the ServiceLevel. All Servers, regardless of Redundant Server Set membership, shall adhere to the sub-ranges defined in Table 109.

Table 109 – ServiceLevel Ranges

| Sub-range | Name | Description |
|---|---|---|
| 0-0 | Maintenance | The Failed Server is in the Maintenance sub-range. Therefore, new Clients shall not connect and currently connected Clients shall disconnect. The Server should expose a target time at which the Clients are able to reconnect. See EstimatedReturnTime defined in OPC 10000-5 for additional information. A Server that has been set to Maintenance is typically undergoing some maintenance or updates. The main goal for the Maintenance ServiceLevel is to ensure that Clients do not generate load on the Server and allow time for the Server to complete any actions that are required. This load includes even simple connection attempts or monitoring of the ServiceLevel. The EstimatedReturnTime indicates when the Client should check to see if the Server is available. If updates or patches are taking longer than expected the Client may discover that the EstimatedReturnTime has been extended further into the future. If the Server does not provide the EstimatedReturnTime, or if the time has lapsed, the Client should use a much longer interval between reconnects to a Server in the Maintenance sub-range than its normal reconnect interval. |
| 1-1 | NoData | The Failed Server is not operational. Therefore, a Client is not able to exchange any information with it. The Server most likely has no data other than ServiceLevel, ServerStatus and diagnostic information available. A Failed Server in this sub-range has no data available. Clients may connect to it to obtain ServiceLevel, ServerStatus and other diagnostic information. If the underlying system has failed, typically the ServerStatus would indicate COMMUNICATION_FAULT_6. The Client may monitor this Server for a ServerStatus and ServiceLevel change, which would indicate that normal communication could be resumed. |
| 2-199 | Degraded | The Server is partially operational, but is experiencing problems such that portions of the AddressSpace are out of service or unavailable. To understand Client options, see Degraded Servers discussion in this section. An example usage of this ServiceLevel sub-range would be if 3 of 10 devices connected to a Server are unavailable. Servers that report a ServiceLevel in the Degraded sub-range are partially able to service Client requests. The degradation could be caused by loss of connection to underlying systems. Alternatively, it could be that the Server is overloaded to the point that it is unable to reliably deliver data to Clients in a timely manner. If Clients are experiencing difficulties obtaining required data, they shall switch to another Server if any Servers in the Healthy range are available. If no Servers are available in the Healthy range, then Clients may switch to a Server with a higher ServiceLevel or one that provides the required data. Some Clients may also be configured for higher priority data and may check all Degraded Servers, to see if any of the Servers are able to report as good quality the high priority data, but this functionality would be Client specific. In some cases a Client may connect to multiple Degraded Servers to maximize the available information. |
| 200-255 | Healthy | The Server is fully operational. Therefore, a Client can obtain all information from this Server. The sub-range allows a Server to provide information that can be used by Clients to load balance. An example usage of this ServiceLevel sub-range would be to reflect the Server’s CPU load where data is delivered as expected. Servers in the Healthy ServiceLevel sub-range are able to deliver information in a timely manner. This ServiceLevel may change for internal Server reasons or it may be used for load balancing described in 6.6.2.4.3. Clients shall connect to the Server with the highest ServiceLevel. Once connected, the ServiceLevel may change, but a Client shall not Failover to a different Server as long as the ServiceLevel of the Server is accessible and in the Healthy sub-range. |
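The sub-ranges of Table 109 can be expressed as a short classifier. This is a minimal sketch; the function name is an assumption and is not defined by OPC UA.

```python
def service_level_sub_range(level: int) -> str:
    """Map a ServiceLevel byte to its sub-range name from Table 109."""
    if not 0 <= level <= 255:
        raise ValueError("ServiceLevel is a byte in the range 0 to 255")
    if level == 0:
        return "Maintenance"
    if level == 1:
        return "NoData"
    if level <= 199:
        return "Degraded"
    return "Healthy"
```

For example, a Server reporting 199 is still Degraded, while 200 is the lowest Healthy value.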

In systems where multiple Hot Servers (see 6.6.2.4.5.4) are available, the Servers in the Redundant Server Set can share the load generated by Clients by setting the ServiceLevel in the Healthy sub-range based on the current load. Clients are expected to connect to the Server with the highest ServiceLevel. Clients shall not Failover to a different Server in the Redundant Server Set of Servers as long as the Server is in the Healthy sub-range. This is the normal behaviour for all Clients, when communicating with redundant Servers. Servers can adjust their ServiceLevel based on the number of Clients that are connected, CPU loading, memory utilization, or any other Server specific criteria.

For example in a system with 3 Servers, all Servers are initially at ServiceLevel 255, but when a Client connects, the Server with the Client connection sets its level to 254. The next Client would connect to a different Server since both of the other Servers are still at 255.

It is up to the Server vendor to define the logic for spreading the load and the number of expected Clients, CPU load or other criteria on each Server before the ServiceLevel is decremented. It is envisioned that some Servers would be able to accomplish this without any communication between the Servers.
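The three-Server example can be traced in a few lines. The Server names and the "decrement by one per connected Client" policy are assumptions taken from the example only; real Servers may use any vendor-specific criteria.

```python
from typing import Dict


def pick_server(levels: Dict[str, int]) -> str:
    """Pick the Server with the highest ServiceLevel (ties: first listed)."""
    return max(levels, key=levels.get)


# All three Servers start at the top of the Healthy sub-range.
levels = {"A": 255, "B": 255, "C": 255}

first = pick_server(levels)   # first Client connects here
levels[first] -= 1            # that Server lowers its ServiceLevel to 254
second = pick_server(levels)  # next Client sees a different Server on top
```

After the first connection, the chosen Server reports 254, so the second Client selects one of the Servers still at 255.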

The Failover mode of a Server is provided in the ServerRedundancy Object defined in OPC 10000-5. The different Failover modes for non-transparent Redundancy are described in Table 110.

Table 110 – Server Failover Modes

| Name | Description |
|---|---|
| Cold | Cold Failover mode is where only one Server can be active at a time. This may mean that redundant Servers are unavailable (not powered up) or are available but not running (PC is running, but application is not started). |
| Warm | Warm Failover mode is where the backup Server(s) can be active, but cannot connect to actual data points (typically, a system where the underlying devices are limited to a single connection). Underlying devices, such as PLCs, may have limited resources that permit a single Server connection. Therefore, only a single Server will be able to consume data. The ServiceLevel Variable defined in OPC 10000-5 indicates the ability of the Server to provide its data to the Client. |
| Hot | Hot Failover mode is where all Servers are powered-on, and are up and running. In scenarios where Servers acquire data from a downstream device, such as a PLC, then one or more Servers are actively connected to the downstream device(s) in parallel. These Servers have minimal knowledge of the other Servers in their group and are independently functioning. When a Server fails or encounters a serious problem then its ServiceLevel drops. On recovery, the Server returns to the Redundant Server Set with an appropriate ServiceLevel to indicate that it is available. |
| HotAndMirrored | HotAndMirrored Failover mode is where Failovers are for Servers that are mirroring their internal states to all Servers in the Redundant Server Set and more than one Server can be active and fully operational. Mirroring state minimally includes Sessions, Subscriptions, registered Nodes, ContinuationPoints, sequence numbers, and sent Notifications. The ServiceLevel Variable defined in OPC 10000-5 should be used by the Client to find the Servers with the highest ServiceLevel to achieve load balancing. |

Each Server maintains a list of ServerUris for all redundant Servers in the Redundant Server Set. The list is provided together with the Failover mode in the ServerRedundancy Object defined in OPC 10000-5. To enable Clients to connect to all Servers in the list, each Server in the list shall provide the ApplicationDescription for all Servers in the Redundant Server Set through the FindServers Service. This information is needed by the Client to translate the ServerUri into information needed to connect to the other Servers in the Redundant Server Set. Therefore a Client needs to connect to only one of the redundant Servers to find the other Servers based on the provided information. A Client should persist information about other Servers in the Redundant Server Set.

Table 111 defines a list of Client actions for initial connections and Failovers.

Table 111 – Redundancy Failover actions

| Failover mode and Client options | Cold | Warm | Hot (a) | Hot (b) | HotAndMirrored |
|---|---|---|---|---|---|
| On initial connection in addition to actions on Active Server: | | | | | |
| Connect to more than one OPC UA Server. | | X | X | X | Optional for status check |
| Create Subscriptions and add monitored items. | | X | X | X | |
| Activate sampling on the Subscriptions. | | | X | X | |
| Activate publishing. | | | | X | |
| At Failover: | | | | | |
| OpenSecureChannel to backup OPC UA Server | X | | | | X |
| CreateSession on backup OPC UA Server | X | | | | |
| ActivateSession on backup OPC UA Server | X | | | | X |
| Create Subscriptions and add monitored items. | X | | | | |
| Activate sampling on the Subscriptions. | X | X | | | |
| Activate publishing. | X | X | X | | |
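The "At Failover" rows of Table 111 can be captured as a lookup keyed by Failover mode. The assignment of actions to modes below follows the per-mode descriptions in this clause; the shortened action labels and the function name are assumptions chosen for this sketch.

```python
# Client actions at Failover, per Failover mode, in the order they are performed.
FAILOVER_ACTIONS = {
    "Cold": [
        "OpenSecureChannel", "CreateSession", "ActivateSession",
        "CreateSubscriptions", "ActivateSampling", "ActivatePublishing",
    ],
    "Warm": ["ActivateSampling", "ActivatePublishing"],
    "Hot (a)": ["ActivatePublishing"],
    "Hot (b)": [],  # all Subscriptions are already Reporting
    "HotAndMirrored": ["OpenSecureChannel", "ActivateSession"],
}


def actions_at_failover(mode: str) -> list:
    """Return a copy of the action list for the given Failover mode."""
    return list(FAILOVER_ACTIONS[mode])
```

The lookup makes the trade-off visible: the more work a Client does on initial connection, the fewer actions remain at Failover.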

Clients communicating with a non-transparent Redundant Server Set of Servers require some additional logic to be able to handle Server failures and to Failover to another Server in the Redundant Server Set. Figure 28 provides an overview of the steps a Client typically performs when it is first connecting to a Redundant Server Set. The figure does not cover all possible error scenarios.

image031.png

Figure 28 – Client Start-up Steps

The initial Server may be obtained via standard discovery or from a persisted list of Servers in the Redundant Server Set. In either case, the Client needs to check which Server in the Redundant Server Set it should connect to. Individual actions depend on the Failover mode the Server provides and the Failover mode the Client chooses to use.

Once connected to a redundant Server, a Client has to be aware of the Failover modes supported by the Server, since this support affects the options available for Client behaviour. A Client may always treat a Server as using a lesser Failover mode; for example, for a Server that provides Hot Redundancy, a Client might connect and choose to treat it as if the Server were running in Warm Redundancy or Cold Redundancy. This choice is up to the Client. In the case of Failover mode HotAndMirrored, the Client shall not use Failover mode Hot or Warm as it would generate unnecessary load on the Servers.
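The downgrade rule can be sketched as a function that lists the modes a Client may use for a given Server mode. The ordering Cold, Warm, Hot, HotAndMirrored from "lesser" to "greater" is this sketch's reading of the text, and the function name is an assumption.

```python
MODES = ["Cold", "Warm", "Hot", "HotAndMirrored"]  # lesser to greater


def allowed_client_modes(server_mode: str) -> list:
    """Modes a Client may use against a Server offering `server_mode`."""
    idx = MODES.index(server_mode)
    allowed = MODES[: idx + 1]
    if server_mode == "HotAndMirrored":
        # Hot and Warm would generate unnecessary load on the Servers.
        allowed = [m for m in allowed if m not in ("Hot", "Warm")]
    return allowed
```

Under this reading, a Client talking to a HotAndMirrored Server either uses HotAndMirrored or falls back to Cold.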

A Cold Failover mode is where the Client can only connect to one Server at a time. When the Client loses connectivity with the Active Server it will attempt a connection to the redundant Server(s) which may or may not be available. In this situation the Client may need to wait for the redundant Server to become available and then create Subscriptions and MonitoredItems and activate publishing. The Client shall cache any information that is required related to the list of available Servers in the Redundant Server Set. Figure 29 illustrates the actions a Client would take if it is talking to a Server using Cold Failover mode.

image032.png

Figure 29 – Cold Failover

Note: There may be a loss of data from the time the connection to the Active Server is interrupted until the time the Client gets Publish Responses from the backup Server.

A Warm Failover mode is where the Client should connect to one or more Servers in the Redundant Server Set primarily to monitor the ServiceLevel. A Client can connect and create Subscriptions and MonitoredItems on more than one Server, but sampling and publishing can only be active on one Server. However, the active Server will return actual data, whereas the other Servers in the Redundant Server Set will return an appropriate error for the MonitoredItems in the Publish response such as Bad_NoCommunication. The one Active Server can be found by reading the ServiceLevel Variable from all Servers. The Server with the highest ServiceLevel is the Active Server. For Failover the Client activates sampling and publishing on the Server with the highest ServiceLevel. Figure 30 illustrates the steps a Client would perform when communicating with a Server using Warm Failover mode.

image033.png

Figure 30 – Warm Failover

Note: There may be a temporary loss of data from the time the connection to the Active Server is interrupted until the time the Client gets Publish Responses from the backup Server.

A Hot Failover mode is where the Client should connect to two or more Servers in the Redundant Server Set and subscribe to the ServiceLevel Variable defined in OPC 10000-5 to find the Server with the highest ServiceLevel to achieve load balancing; this means that Clients should issue Service requests such as Browse, Read, Write to the Server with the highest ServiceLevel. Subscription related activities will need to be invoked for each connected Server. Clients have the following choices for implementing Subscription behaviour in a Hot Failover mode:

  1. Client connects to multiple Servers and establishes Subscription(s) in each where only one is Reporting; the others are Sampling only. The Client should set up the queue size for the MonitoredItems such that it can buffer all changes during the Failover time. The Failover time is the time between the connection interruption and the time the Client gets Publish Responses from the backup Server. On a Failover the Client must enable Reporting on the Server with the next highest availability.
  2. Client connects to multiple Servers and establishes subscription(s) in each where all subscriptions are Reporting. The Client is responsible for handling/processing multiple subscription streams concurrently.
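For option 1, the queue size needed to buffer all changes during the Failover time can be estimated. The sketch below assumes the worst case of one value change per sampling interval; the function and parameter names are illustrative and not part of OPC UA.

```python
import math


def min_queue_size(failover_time_ms: int, sampling_interval_ms: int) -> int:
    """Queue entries needed if every sample during Failover is a change."""
    if failover_time_ms < 0 or sampling_interval_ms <= 0:
        raise ValueError("failover time must be >= 0 and interval > 0")
    # Round up so that a partially elapsed interval is still covered.
    return max(1, math.ceil(failover_time_ms / sampling_interval_ms))
```

With an expected Failover time of 10 s and a 500 ms sampling interval, this gives a queue size of 20; slower changing values need less.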

Figure 31 illustrates the functionality a Client would perform when communicating with a Server using Hot Failover mode (the figure includes both the (a) and (b) options).

image034.png

Figure 31 – Hot Failover

Clients are not expected to automatically switch over to a Server that has recovered from a failure, but the Client should establish a connection to it.

A HotAndMirrored Failover mode is where a Client only connects to one Server in the Redundant Server Set because the Server will share the Session and state information with the other Servers. In order to validate the capability to connect to other redundant Servers, a Client is allowed to create Sessions with other Servers and maintain the open connections by periodically reading the ServiceLevel. A Client shall not create Subscriptions on the backup Servers for status monitoring (to prevent excessive load on the Servers). This mode allows Clients to fail over without creating a new context for communication. On a Failover the Client will simply create a new SecureChannel on an alternate Server and then call ActivateSession; all Client activities (browsing, subscriptions, history reads, etc.) will then resume. Figure 32 illustrates the behaviour of a Client when communicating with a Server in HotAndMirrored Failover mode.

image035.png

Figure 32 – HotAndMirrored Failover

This Failover mode is similar to the transparent Redundancy. The advantage is that the Client has full control over selecting the Server. The disadvantage is that the Client needs to be able to handle Failovers.

A vendor can use the non-transparent Redundancy features to create a Server proxy running on the Client machine to provide transparent Redundancy to the Client. This reduces the amount of functionality that needs to be designed into the Client and enables simpler Clients to take advantage of non-transparent Redundancy. The Server proxy simply duplicates Subscriptions and modifications to Subscriptions, by passing the calls on to both Servers, but only enabling publishing and sampling on one Server. When the proxy detects a failure, it enables publishing and/or sampling on the backup Server, just as the Client would if it were a Redundancy aware Client.

Figure 33 shows the Server proxy used to provide transparent Redundancy.

image036.png

Figure 33 – Server proxy for Redundancy

Client Redundancy is supported in OPC UA by the TransferSubscriptions Service and by exposing Client information in the Server diagnostic information. Since Subscription lifetime is not tied to the Session in which it was created, backup Clients may use standard diagnostic information available to monitor the active Client’s Session with the Server. Upon detection of an active Client failure, a backup Client would then instruct the Server to transfer the Subscriptions to its own session. If the Subscription is crafted carefully, with sufficient resources to buffer data during the change-over, data loss from a Client Failover can be prevented.

OPC UA does not provide a standardized mechanism for conveying the SessionId and SubscriptionIds from the active Client to the backup Clients, but as long as the backup Clients know the Client name of the active Client, this information is readily available using the SessionDiagnostics and SubscriptionDiagnostics portions of the ServerDiagnostics data. This information is available for authorized users and for the user active on the Session. TransferSubscriptions requires the same user on all redundant Clients to succeed.
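A backup Client's lookup of the active Client's Session can be sketched against an in-memory stand-in for the diagnostics data. The record keys (`client_name`, `session_id`, `subscription_ids`) and the sample values are hypothetical and do not reflect the actual SessionDiagnostics or SubscriptionDiagnostics structures defined in OPC 10000-5.

```python
def find_active_client(session_diagnostics, client_name):
    """Return (session_id, subscription_ids) for the named Client, or None."""
    for entry in session_diagnostics:
        if entry["client_name"] == client_name:
            return entry["session_id"], entry["subscription_ids"]
    return None


# Hypothetical snapshot of what a backup Client might have read.
diagnostics = [
    {"client_name": "HMI-A", "session_id": 17, "subscription_ids": [101, 102]},
    {"client_name": "Historian", "session_id": 18, "subscription_ids": [201]},
]
```

The SubscriptionIds found this way are what the backup Client would pass to TransferSubscriptions once it detects that the active Client has failed.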

Redundant networks can be used with OPC UA in either transparent or non-transparent Redundancy.

Network Redundancy can be combined with Server and Client Redundancy.

In the transparent network use-case a single Server Endpoint can be reached through different network paths. This case is completely handled by the network infrastructure. The selected network path and Failover are transparent to the Client and the Server.

image037.png

Figure 34 – Transparent Network Redundancy

Examples:

  • A physical appliance/device such as a router or gateway which automatically changes the network routing to maintain communications.
  • A virtual adapter which automatically changes the network adapter to maintain communications.

In the non-transparent network use-case the Server provides different Endpoints for the different network paths. This requires both the Server and the Client to support multiple network connections. In this case the Client is responsible for selecting the Endpoint and for Failover. For Failover the normal reconnect scenario described in 6.7 can be used. Only the SecureChannel is created with another Endpoint. Sessions and Subscriptions can be reused.

image038.png

Figure 35 – Non-Transparent Network Redundancy

The information about the different network paths is specified in NonTransparentRedundancyType ObjectType defined in OPC 10000-5.

In redundant systems, it is common to require that a particular Server in the Redundant Server Set be taken out of the Redundant Server Set for a period of time. Some items that could cause this may include:

  • Certificate update
  • Security reconfiguration
  • Rebooting or restarting of the machine for:
      • software updates and patches
      • installation of new software
  • Reconfiguration of the AddressSpace

The removal from the Redundant Server Set can be done through a complete shutdown or by setting the ServiceLevel of the Server to Maintenance sub-range. This can be done through a Server specific configuration tool or through the Method RequestServerStateChange on the ServerRedundancyType. The Method is formally defined in OPC 10000-5.

This Method requires that the Client provide credentials with administrative rights on the Server.

After a Client establishes a connection to a Server and creates a Subscription, the Client monitors the connection status. Figure 36 shows the steps to connect a Client to a Server and the general logic for reconnect handling. Not all possible error scenarios are covered.

The preferred mechanism for a Client to monitor the connection status is through the keep-alive of the Subscription. A Client should subscribe for the State Variable in the ServerStatus to detect shutdown or other failure states. If no Subscription is created or the Server does not support Subscriptions, the connection can be monitored by periodically reading the State Variable.

image039.png

Figure 36 – Reconnect Sequence

When a Client loses the connection to the Server, the goal is to reconnect without losing information. To do this the Client shall re-establish the connection by creating a new SecureChannel and activating the Session with the Service ActivateSession. This assigns the new SecureChannel to the existing Session and allows the Client to reuse the Session and Subscriptions in the Server. To re-establish the SecureChannel and activate the Session, the Client shall use the same security policy, Application Instance Certificate and user credentials that were used to create the original SecureChannel. This will result in the Client receiving data and event Notifications without losing information, provided the queues in the MonitoredItems do not overflow.

The Client shall only create a new Session if ActivateSession fails. TransferSubscriptions is used to transfer the Subscription to the new Session. If TransferSubscriptions fails, the Client needs to create a new Subscription.
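The order of the recovery steps described above can be sketched as follows. The `FakeClient` class and its method names are hypothetical stand-ins for a real client stack, and `ServiceError` is an assumed exception type; the sketch only shows which Service is tried before which.

```python
class ServiceError(Exception):
    """Hypothetical exception raised when a Service call returns a Bad status."""


def reconnect(client) -> str:
    """Run the reconnect steps and return a label for the path that succeeded."""
    client.open_secure_channel()           # always start with a new SecureChannel
    try:
        client.activate_session()          # assign the channel to the old Session
        return "session-reused"
    except ServiceError:
        pass                               # only now is a new Session allowed
    client.create_session()
    client.activate_session()
    try:
        client.transfer_subscriptions()    # move Subscriptions to the new Session
        return "subscriptions-transferred"
    except ServiceError:
        client.create_subscriptions()      # last resort: start from scratch
        return "subscriptions-recreated"


class FakeClient:
    """In-memory stand-in used to exercise the sequence without a Server."""

    def __init__(self, session_alive: bool, transfer_ok: bool):
        self.session_alive = session_alive
        self.transfer_ok = transfer_ok
        self.calls = []

    def open_secure_channel(self):
        self.calls.append("OpenSecureChannel")

    def activate_session(self):
        self.calls.append("ActivateSession")
        if not self.session_alive:
            raise ServiceError("Bad_SessionIdInvalid")

    def create_session(self):
        self.calls.append("CreateSession")
        self.session_alive = True

    def transfer_subscriptions(self):
        self.calls.append("TransferSubscriptions")
        if not self.transfer_ok:
            raise ServiceError("Bad_SubscriptionIdInvalid")

    def create_subscriptions(self):
        self.calls.append("CreateSubscription")
```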

When the connection is lost, Publish responses may have been sent but not received by the Client.

After re-establishing the connection the Client shall call Republish in a loop, starting with the next expected sequence number and incrementing the sequence number until the Server returns the status Bad_MessageNotAvailable. After receiving this status, the Client shall start sending Publish requests with the normal Publish handling. This sequence ensures that the lost NotificationMessages queued in the Server are not overwritten by new Publish responses.
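The Republish loop can be illustrated with a short sketch. The `republish` callback and its `(status, message)` return shape are hypothetical, and sequence number rollover is deliberately ignored to keep the example small.

```python
from typing import Any, Callable, List, Tuple

BAD_MESSAGE_NOT_AVAILABLE = "Bad_MessageNotAvailable"


def recover_missed(republish: Callable[[int], Tuple[str, Any]],
                   next_expected_seq: int) -> Tuple[List[Any], int]:
    """Call Republish until the Server reports that no further message is queued.

    Returns the recovered NotificationMessages and the sequence number the
    Client expects next. Normal Publish handling starts only after this returns.
    """
    recovered = []
    seq = next_expected_seq
    while True:
        status, message = republish(seq)
        if status == BAD_MESSAGE_NOT_AVAILABLE:
            break                      # nothing more to recover
        if status != "Good":
            # For example Bad_SubscriptionIdInvalid: the caller has to react.
            raise RuntimeError(f"Republish failed: {status}")
        recovered.append(message)
        seq += 1                       # rollover handling omitted in this sketch
    return recovered, seq


# Hypothetical retransmission queue of a Server, keyed by sequence number.
_queue = {5: "msg-5", 6: "msg-6"}


def fake_republish(seq: int) -> Tuple[str, Any]:
    if seq in _queue:
        return "Good", _queue[seq]
    return BAD_MESSAGE_NOT_AVAILABLE, None
```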

If the Client detects missing sequence numbers in the Publish and is not able to get the lost NotificationMessages through Republish, the Client should use the Method ResendData or should read the values of all data MonitoredItems to make sure the Client has the latest values for all MonitoredItems.

The Server Object provides a Method ResendData that initiates resending of all data monitored items in a Subscription. This Method is defined in OPC 10000-5. If this Method is called, subsequent Publish responses shall contain the current values of all data MonitoredItems in the Subscription where the MonitoringMode is set to Reporting. If a value is queued for a data MonitoredItem, the next value in the queue is sent in the Publish response. If no value is queued for a data MonitoredItem, the last value sent is repeated in the Publish response. The Server shall verify that the Method is called within the Session context of the Session that owns the Subscription.
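The value selection rule for ResendData might be modelled as below. The `DataItem` structure and the function name are hypothetical; the sketch only mirrors the rule that a queued value takes precedence over repeating the last sent value, and that only MonitoredItems in Reporting mode take part.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict


@dataclass
class DataItem:
    """Simplified model of a data MonitoredItem inside a Server."""
    monitoring_mode: str                      # "Reporting", "Sampling" or "Disabled"
    last_sent: Any = None                     # value most recently sent to the Client
    queue: Deque[Any] = field(default_factory=deque)


def values_after_resend(items: Dict[int, DataItem]) -> Dict[int, Any]:
    """Return the value each Reporting item contributes after ResendData."""
    result = {}
    for item_id, item in items.items():
        if item.monitoring_mode != "Reporting":
            continue                          # other modes are not resent
        if item.queue:
            item.last_sent = item.queue.popleft()   # next queued value is sent
        result[item_id] = item.last_sent            # otherwise the last value repeats
    return result
```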

Independent of the detailed recovery strategy, the Client should make sure that it does not overwrite newer data in the Client with older values provided through Republish.
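One possible way for a Client to guard against this is to compare timestamps before updating its local copy. The cache layout and the function name are hypothetical, and the sketch assumes that every value carries a timestamp that can be compared.

```python
from typing import Any, Dict, Tuple


def apply_if_newer(cache: Dict[int, Tuple[float, Any]],
                   item_id: int,
                   timestamp: float,
                   value: Any) -> bool:
    """Store the value only if it is newer than what the Client already holds.

    Returns True when the cache was updated and False when the incoming value
    was older than, or as old as, the cached one.
    """
    current = cache.get(item_id)
    if current is not None and current[0] >= timestamp:
        return False
    cache[item_id] = (timestamp, value)
    return True
```

A value recovered through Republish would be passed through `apply_if_newer` in the same way as a value from a regular Publish response, so that the arrival order does not matter.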

If the Republish returns Bad_SubscriptionIdInvalid, then the Client needs to create a new Subscription.

Re-establishing the connection by creating a new SecureChannel may be rejected because of a new Server Application Instance Certificate or other security errors. In case of security failures, the Client shall use the GetEndpoints Service to fetch the most up-to-date security information from the Server.

OPC 10000-6 defines a reverse connect mechanism where the Server initiates the logical connection. All subsequent steps like creating a SecureChannel are initiated by the Client. In this scenario the Client is only able to initiate a reconnect if the Server initiates a new logical connection after a connection interruption. The Client side reconnect handling described in Figure 36 applies also to the reverse connect case. A Server is not able to actively check the connection status; therefore the Server shall initiate a new connection in a configurable interval, even if a connection to the Client is established. This ensures that an initiated connection is available for the reconnect handling in addition to other scenarios where the Client needs more than one connection.

MonitoredItems are used to monitor Variable Values for data changes and event notifier Objects for new Events. Subscriptions are used to combine data changes and events of the assigned MonitoredItems into an optimized stream of network messages. Reliable delivery is ensured as long as the lifetime of the Subscription and the queues in the MonitoredItems are long enough to cover a network interruption between OPC UA Client and Server. All queues that ensure reliable delivery are normally kept in memory and a Server restart would delete them.

There are use cases where OPC UA Clients have no permanent network connection to the OPC UA Server or where reliable delivery of data changes and events is necessary even if the OPC UA Server is restarted or the network connection is interrupted for a longer time.

To ensure this reliable delivery, the OPC UA Server must store collected data and events in non-volatile memory until the OPC UA Client has confirmed reception. Data can still be lost if the Server is not shut down gracefully or if power fails. To limit such loss, the OPC UA Server should store the queues frequently during operation, not only when it is shut down.

The Method SetSubscriptionDurable defined in OPC 10000-5 is used to set a Subscription into this durable mode and to allow much longer lifetimes and queue sizes than for normal Subscriptions. The Method shall be called before the MonitoredItems are created in the durable Subscription. The Server shall verify that the Method is called within the Session context of the Session that owns the Subscription.

A value of 0 for the parameter lifetimeInHours requests the highest lifetime supported by the Server.
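How a Server might revise the requested lifetime can be sketched as follows. The function name and the `server_max_hours` limit are hypothetical, and clamping a request that exceeds the limit is an assumption; only the handling of the value 0 is taken from the text above.

```python
def revise_lifetime_hours(requested: int, server_max_hours: int) -> int:
    """Return the lifetime a Server would grant for a durable Subscription."""
    if requested < 0:
        raise ValueError("lifetimeInHours cannot be negative")
    if requested == 0:
        return server_max_hours            # 0 requests the highest supported lifetime
    return min(requested, server_max_hours)  # assumed clamping of larger requests
```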

An OPC UA Server providing durable Subscriptions shall

An OPC UA Client using durable Subscriptions shall