The OPC UA Services define a number of mechanisms to meet the security requirements outlined in OPC 10000-2. This clause describes a number of important security-related procedures that OPC UA Applications shall follow.
All OPC UA Applications require an Application Instance Certificate which shall contain the following information:
- The network name or address of the computer where the application runs;
- The name of the organization that administers or owns the application;
- The name of the application;
- The URI of the application instance;
- The validFrom and validTo date for the Certificate.
Application Instance Certificates issued by a Certificate Authority (CA) shall contain the following additional information:
- The name of the Certificate Authority that issued the Certificate;
- The public key issued to the application by the Certificate Authority;
- A digital signature created by the Certificate Authority.
NOTE Self-signed Certificates also contain this information, but in this case the issuer information refers to the Certificate itself.
In addition, each Application Instance Certificate has a private key which should be stored in a location that can only be accessed by the application. If this private key is compromised, the administrator shall force the creation of a new Application Instance Certificate and private key by the application.
This Certificate may be generated automatically when the application is installed. In this situation the private key assigned to the Certificate shall be used to create the Certificate signature. Certificates created in this way are called self-signed Certificates.
Manual management and replacement before expiry of self-signed Certificates may be appropriate for a few Clients connected to one Server. In complex communication scenarios a central management of Certificates based on a Certificate Authority is recommended. This includes initial roll-out and automatic updates by a CertificateManager defined in OPC 10000-12.
If the administrator responsible for the application decides that a self-signed Certificate does not meet the security requirements of the organization, then the administrator should install a Certificate issued by a Certificate Authority. The steps involved in requesting an Application Instance Certificate from a Certificate Authority are shown in Figure 19.
Figure 19 – Obtaining and installing an Application Instance Certificate
Figure 19 above illustrates the interactions between the application, the Administrator and the Certificate Authority. The application is an OPC UA Application installed on a single machine. The Administrator is the person responsible for managing the machine and the OPC UA Application. The Certificate Authority is an entity that can issue digital Certificates that meet the requirements of the organization deploying the OPC UA Application.
OPC UA defines interfaces and workflows to register OPC UA Applications with a central discovery service and to execute the interaction necessary with a CertificateManager to issue the initial Certificate Authority signed Certificate. The CertificateManager interface includes features to get a TrustList and also Certificate updates from a central place. The Global Discovery Server (GDS) and CertificateManager functionality is defined in OPC 10000-12.
If the Administrator decides that a self-signed Certificate meets the security requirements for the organization, then the Administrator may skip Steps 3 through 5. Application vendors shall ensure that a Certificate is available after the installation process. Every OPC UA Application shall allow the Administrators to replace Application Instance Certificates with Certificates that meet their requirements.
When the Administrator requests a new Certificate from a Certificate Authority, the Certificate Authority may require that the Administrator provide proof of authorization to request Certificates for the organization that will own the Certificate. The exact mechanism used to provide this proof depends on the Certificate Authority.
Vendors should automate the process of acquiring Certificates from an authority using the CertificateManager defined in OPC 10000-12. In this case the Administrator still goes through the steps illustrated in Figure 19; however, the installation program for the application performs them automatically and only prompts the Administrator to provide information about the application instance being installed.
Applications shall never communicate with another application that they do not trust. An Application decides if another application is trusted by checking whether the Application Instance Certificate for the other application is trusted. A Certificate is only trusted if its chain can be validated.
Applications shall rely on lists of Certificates provided by the Administrator to determine trust. There are two separate lists: a list of trusted Certificates and a list of issuer Certificates (i.e. CAs). The list of trusted Certificates may contain a Certificate issued to another Application or it may be a Certificate belonging to a CA. The list of issuer Certificates contains CA Certificates needed for chain validation that are not in the list of trusted Certificates.
When building a chain each Certificate in the chain shall be validated back to a CA with a self-signed Certificate (a.k.a. a root CA). If any validation error occurs then the trust check fails. Some validation errors are non-critical which means they can be suppressed by a user of an Application with the appropriate privileges. Suppressed validation errors are always reported via auditing (i.e. an appropriate Audit event is raised).
Determining trust requires access to all Certificates in the chain. These Certificates may be stored locally or they may be provided with the application Certificate. Processing fails with Bad_SecurityChecksFailed if an element in the chain cannot be found. A Certificate is trusted if the Certificate or at least one of the Certificates in the chain is in the list of trusted Certificates for the Application and the chain is valid.
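The trust rule above can be sketched in Python as follows. This is a simplified illustration, not a conforming implementation. `Cert`, `build_chain`, `is_trusted` and `SecurityChecksFailed` are hypothetical names. Issuers are matched by subject name only, whereas real chain building also uses key identifiers and verifies signatures. The sketch only shows how the trusted list and the issuer list combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical stand-in for an X.509 Certificate: only the names used to build a chain."""
    subject: str
    issuer: str

    @property
    def self_signed(self) -> bool:
        return self.subject == self.issuer


class SecurityChecksFailed(Exception):
    """Signals Bad_SecurityChecksFailed: an element of the chain could not be found."""


def build_chain(cert, trusted, issuers, provided=()):
    """Walk issuer names from cert up to a self-signed root CA.

    Candidates come from Certificates sent with the application Certificate
    (provided), the issuer list and the trusted list.
    """
    pool = {c.subject: c for c in (*provided, *issuers, *trusted)}
    chain = [cert]
    while not chain[-1].self_signed:
        parent = pool.get(chain[-1].issuer)
        if parent is None or parent in chain:  # missing issuer or a loop
            raise SecurityChecksFailed(f"issuer {chain[-1].issuer!r} not found")
        chain.append(parent)
    return chain


def is_trusted(cert, trusted, issuers, provided=()):
    """Trusted if the Certificate or any Certificate in its chain is in the trusted list.

    The per-Certificate checks of Table 106 (signature, validity, revocation...) are omitted.
    """
    chain = build_chain(cert, trusted, issuers, provided)
    return any(c in trusted for c in chain)
```

In this sketch, trusting only the root CA is enough for every application Certificate it issues. This is possible because the intermediate CA only needs to be in the issuer list.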
Table 106 specifies the steps used to validate a Certificate in the order that they shall be followed. These steps are repeated for each Certificate in the chain. Each validation step has a unique error status and audit event type that shall be reported if the check fails. The audit event is in addition to any audit event that was generated for the particular Service that was invoked. The Service audit event in its message text shall include the audit EventId of the AuditCertificateEventType (for more details, see 6.5). Processing halts if an error occurs, unless it is non-critical and it has been suppressed.
ApplicationInstanceCertificates shall not be used in a Client or Server until they have been evaluated and marked as trusted. This can happen automatically by a PKI trust chain or in an offline manner where the Certificate is marked as trusted by an administrator after evaluation.
Table 106 – Certificate validation steps
| Step | Error/AuditEvent | Description |
|---|---|---|
| Certificate Structure | Bad_CertificateInvalid, Bad_SecurityChecksFailed, AuditCertificateInvalidEventType | The Certificate structure is verified. This error may not be suppressed. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
| Build Certificate Chain | Bad_CertificateChainIncomplete, Bad_SecurityChecksFailed, AuditCertificateInvalidEventType | The trust chain for the Certificate is created. An error during the chain creation may not be suppressed. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
| Signature | Bad_CertificateInvalid, Bad_SecurityChecksFailed, AuditCertificateInvalidEventType | A Certificate with an invalid signature shall always be rejected. A Certificate signature is invalid if the Issuer Certificate is unknown. A self-signed Certificate is its own issuer. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
| Security Policy Check | Bad_CertificatePolicyCheckFailed, Bad_SecurityChecksFailed, AuditCertificateInvalidEventType | A Certificate signature shall comply with the CertificateSignatureAlgorithm, MinAsymmetricKeyLength and MaxAsymmetricKeyLength requirements for the used SecurityPolicy defined in OPC 10000-7. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. This error may be suppressed. |
| Trust List Check | Bad_CertificateUntrusted, Bad_SecurityChecksFailed, AuditCertificateUntrustedEventType | If the Application Instance Certificate is not trusted and none of the CA Certificates in the chain is trusted, the result of the Certificate validation shall be Bad_CertificateUntrusted. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
| Validity Period | Bad_CertificateTimeInvalid, Bad_CertificateIssuerTimeInvalid, AuditCertificateExpiredEventType | The current time shall be after the start of the validity period and before the end. This error may be suppressed. |
| Host Name | Bad_CertificateHostNameInvalid, AuditCertificateDataMismatchEventType | The HostName in the URL used to connect to the Server shall be the same as one of the HostNames specified in the Certificate. This check is skipped for CA Certificates. This check is skipped for Server side validation. This error may be suppressed. |
| URI | Bad_CertificateUriInvalid, AuditCertificateDataMismatchEventType | Application and Software Certificates contain an application or product URI that shall match the URI specified in the ApplicationDescription provided with the Certificate. This check is skipped for CA Certificates. This error may not be suppressed. The gatewayServerUri is used to validate an Application Certificate when connecting to a Gateway Server (see 7.2). |
| Certificate Usage | Bad_CertificateUseNotAllowed, Bad_CertificateIssuerUseNotAllowed, AuditCertificateMismatchEventType | Each Certificate has a set of uses for the Certificate (see OPC 10000-6). These uses shall match the use requested for the Certificate (i.e. Application, Software or CA). This error may be suppressed unless the Certificate indicates that the usage is mandatory. |
| Find Revocation List | Bad_CertificateRevocationUnknown, Bad_CertificateIssuerRevocationUnknown, AuditCertificateRevokedEventType | Each CA Certificate may have a revocation list. This check fails if this list is not available (e.g. a network interruption prevents the application from accessing the list). No error is reported if the Administrator disables revocation checks for a CA Certificate. This error may be suppressed. Bad_SecurityChecksFailed should be reported back to the Client. |
| Revocation Check | Bad_CertificateRevoked, Bad_CertificateIssuerRevoked, AuditCertificateRevokedEventType | The Certificate has been revoked and may not be used. This error may not be suppressed. If this check fails on the Server side, the error Bad_SecurityChecksFailed shall be reported back to the Client. |
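The ordering and suppression rules of Table 106 can be illustrated with the Python sketch below. It covers one Certificate of the chain; in practice the loop is repeated for each Certificate. The step table is transcribed from Table 106. The `checks` callables, the `suppressed` set and the `audit` sink are hypothetical. Two steps need a note. Trust List Check is treated as non-suppressible because the table does not say it may be suppressed. The "unless the usage is mandatory" condition on Certificate Usage is not modelled.

```python
# Table 106 in order: (step, error StatusCode, may the error be suppressed?).
STEPS = [
    ("Certificate Structure", "Bad_CertificateInvalid", False),
    ("Build Certificate Chain", "Bad_CertificateChainIncomplete", False),
    ("Signature", "Bad_CertificateInvalid", False),
    ("Security Policy Check", "Bad_CertificatePolicyCheckFailed", True),
    ("Trust List Check", "Bad_CertificateUntrusted", False),
    ("Validity Period", "Bad_CertificateTimeInvalid", True),
    ("Host Name", "Bad_CertificateHostNameInvalid", True),
    ("URI", "Bad_CertificateUriInvalid", False),
    ("Certificate Usage", "Bad_CertificateUseNotAllowed", True),
    ("Find Revocation List", "Bad_CertificateRevocationUnknown", True),
    ("Revocation Check", "Bad_CertificateRevoked", False),
]


def validate(checks, suppressed=frozenset(), audit=print):
    """Run the checks for ONE Certificate in Table 106 order.

    checks maps a step name to a callable returning True when the check passes.
    Steps without an entry pass, e.g. Host Name, which is skipped on the Server side.
    """
    for name, status, suppressible in STEPS:
        check = checks.get(name)
        if check is None or check():
            continue
        if suppressible and name in suppressed:
            # Suppressed errors are still reported via auditing.
            audit(f"suppressed {status} at step {name!r}")
            continue
        audit(f"{status} at step {name!r}")
        return status  # processing halts at the first unsuppressed error
    return "Good"
```

Suppressing a revocation failure has no effect in this model, which matches the table. An expired Certificate can be accepted only when an appropriately privileged user has suppressed the Validity Period error, and this still leaves an audit record.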
Certificates are usually placed in a central location called a CertificateStore. Figure 20 illustrates the interactions between the Application, the Administrator and the CertificateStore. The CertificateStore could be on the local machine or in some central server. The exact mechanisms used to access the CertificateStore depend on the application and PKI environment set up by the Administrator.
Figure 20 – Determining if an Application Instance Certificate is trusted
All OPC UA Applications shall establish a SecureChannel before creating a Session. This SecureChannel requires that both applications have access to Certificates that can be used to encrypt and sign exchanged Messages. The Application Instance Certificates installed by following the process described in 6.1.2 may be used for this purpose.
The steps involved in establishing a SecureChannel are shown in Figure 21.
Figure 21 – Establishing a SecureChannel
Figure 21 assumes Client and Server have online access to a Certificate Authority (CA). If online access is not available and if the administrator has installed the CA public key on the local machine, then the Client and Server shall still validate the application Certificates using that key. The figure shows only one CA, however, there is no requirement that the Client and Server Certificates be issued by the same authority. A self-signed Application Instance Certificate does not need to be verified with a CA. Any Certificate shall be rejected if it is not in a TrustList provided by the administrator.
Both the Client and Server shall have a list of Certificates that they have been configured to trust (sometimes called the Certificate Trust List or CTL). These trusted Certificates may be Certificates for Certificate Authorities or they may be OPC UA Application Instance Certificates. OPC UA Applications shall be configured to reject connections with applications that do not have a trusted Certificate.
Certificates can be compromised, which means they should no longer be trusted. Administrators can revoke a Certificate by removing it from the TrustList for all applications or the CA can add the Certificate to the Certificate Revocation List (CRL) for the Issuer Certificate. Administrators may save a local copy of the CRL for each Issuer Certificate when online access is not available.
A Client does not need to call GetEndpoints each time it connects to the Server. This information should change rarely and the Client can cache it locally. If the Server rejects the OpenSecureChannel request the Client should call GetEndpoints and make sure the Server configuration has not changed.
There are two security risks which a Client shall be aware of when using the GetEndpoints Service. The first could come from a rogue Discovery Server that tries to direct the Client to a rogue Server. For this reason the Client shall verify that the ServerCertificate in the EndpointDescription is a trusted Certificate before it calls CreateSession.
The second security risk comes from a third party that alters the contents of the EndpointDescriptions as they are transferred over the network back to the Client. The Client protects itself against this by comparing the list of EndpointDescriptions returned from the GetEndpoints Service with the list returned in the CreateSession response.
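The two Client-side countermeasures can be sketched in Python. `Endpoint` is a hypothetical, reduced stand-in for EndpointDescription and holds only a few of its fields. `is_trusted_cert` is a placeholder for the Certificate validation described in 6.1. Treating the comparison as order-insensitive is a choice made for this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical, reduced EndpointDescription (only a few of its fields)."""
    endpoint_url: str
    security_policy_uri: str
    security_mode: str
    server_certificate: bytes


def safe_to_create_session(selected, is_trusted_cert):
    # Guards against a rogue Discovery Server: the chosen ServerCertificate must be trusted.
    return is_trusted_cert(selected.server_certificate)


def endpoints_match(from_get_endpoints, from_create_session):
    # Guards against tampering in transit: both lists must hold the same entries.
    return Counter(from_get_endpoints) == Counter(from_create_session)
```

If `endpoints_match` returns False, the Client should treat the Session as compromised and close it rather than continue.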
The exact mechanisms for using the SecurityToken to sign and encrypt Messages exchanged over the SecureChannel are described in OPC 10000-6. The process for renewing tokens is also described in detail in OPC 10000-6.
In many cases, the Certificates used to establish the SecureChannel will be the Application Instance Certificates. However, some Communication Stacks might not support Certificates that are specific to a single application. Instead, they expect all communication to be secured with a Certificate specific to a user or the entire machine. For this reason, OPC UA Applications will need to exchange their Application Instance Certificates when creating a Session.
Once an OPC UA Client has established a SecureChannel with a Server it can create an OPC UA Session.
The steps involved in establishing a Session are shown in Figure 22.
Figure 22 – Establishing a Session
Figure 22 illustrates the interactions between a Client, a Server, a Certificate Authority (CA) and an identity provider. The CA is responsible for issuing the Application Instance Certificates. If the Client or Server does not have online access to the CA, then they shall validate the Application Instance Certificates using the CA public key that the administrator shall install on the local machine.
The identity provider may be a central database that can verify the user token provided by the Client. This identity provider may also tell the Server which access rights the user has. The identity provider depends on the user identity token. It could be a Certificate Authority, an Authorization Service or a proprietary database of some sort.
The Client and Server shall prove possession of their Application Instance Certificates by signing the Certificates with a nonce appended. The exact mechanism used to create the proof of possession signatures is described in 5.7.2. Similarly, the Client shall prove possession by either providing a secret like a password in the user identity token or by creating a signature with the secret associated with a user identity token like X.509 v3.
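The proof of possession described in 5.7.2 signs the peer's Certificate with the peer's nonce appended. The Python sketch below shows only how the data to be signed is assembled and checked. `sign` and `verify` are placeholders for the asymmetric algorithm of the SecurityPolicy, and `verify` is assumed to use the public key from the prover's Certificate. No real cryptographic library call is implied.

```python
from typing import Callable

Signer = Callable[[bytes], bytes]
Verifier = Callable[[bytes, bytes], bool]


def data_to_sign(certificate: bytes, nonce: bytes) -> bytes:
    """The peer's Certificate with the peer's nonce appended."""
    return certificate + nonce


def prove_possession(sign: Signer, peer_certificate: bytes, peer_nonce: bytes) -> bytes:
    # Signed with the private key of the prover's own Application Instance Certificate.
    return sign(data_to_sign(peer_certificate, peer_nonce))


def check_possession(verify: Verifier, own_certificate: bytes, own_nonce: bytes,
                     signature: bytes) -> bool:
    # The verifier rebuilds the data from its OWN Certificate and the nonce it issued,
    # so a signature replayed from an earlier exchange (different nonce) is rejected.
    return verify(data_to_sign(own_certificate, own_nonce), signature)
```

A fresh nonce per exchange makes each signature usable only once.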
Once an OPC UA Client has established a Session with a Server it can change the user identity associated with the Session by calling the ActivateSession service.
The steps involved in impersonating a user are shown in Figure 23. The access of the Server to the identity provider is Server-internal and it may be just an access to an internal user database.
Figure 23 – Impersonating a User
ApplicationInstanceCertificates or UserIdentityTokens may expire, become invalid or be rejected on the Client or Server side.
ApplicationInstanceCertificate verification shall be executed every time the SecurityToken is renewed for a SecureChannel. OPC UA Applications may perform additional verifications between SecurityToken renewals, e.g. if the TrustList is updated from a GDS.
If the SecureChannel does not use ApplicationInstanceCertificates, the OPC UA Application should execute ApplicationInstanceCertificate checks for the Session at a rate used for SecureChannel renewals.
The recovery mechanisms for ApplicationInstanceCertificate replacement scenarios are described in 6.7.
OPC UA Applications should have internal notification mechanisms to be informed about the removal of user identities, or should frequently check whether the UserIdentityTokens are still valid or whether the authorization for a UserIdentityToken has changed.
Authorization Services provide Access Tokens to Clients on behalf of Users that they pass to a Server to be granted access to resources.
In a basic model (as shown in Figure 22) the Server is responsible for authorization (i.e. deciding what a user can do) while a separate identity provider (e.g. the operating system) is responsible for authentication (deciding who the user is).
In more complex models, the Server relies on external Authorization Services to provide some of its authorization requirements. These Authorization Services act in concert with an external identity provider which validates the user credentials before the external Authorization Service creates an Access Token that tells the Server what the user is allowed to do. The Client interactions with these services may be indirect as shown in 6.2.2 or direct as shown in 6.2.3.
Even when the Server requires the Client to use an external Authorization Service the Server is still responsible for managing and enforcing the Permissions assigned to Nodes in its Address Space. The clauses below discuss the use of an external Authorization Service in more detail.
Authorization Services (AS) provide access to identity providers which can validate the credentials provided by Clients. They then provide tokens which can be passed to a Server instead of the credentials. These tokens are passed as an IssuedIdentityToken defined in 7.41.6.
The protocol to request tokens depends on the Authorization Service (AS). Common protocols include OAuth2 and OPC UA. OAuth2 supports claims based authorization as described in OPC 10000-2.
Servers publish the Authorization Services (AS) they support in the UserTokenPolicies list returned by GetEndpoints. The IssuedTokenType field specifies the protocol used to communicate with the AS. The IssuerEndpointUrl field contains the information needed by the Client to connect to the AS using the protocol required by the AS.
The basic handshake is shown in Figure 24.
Figure 24 – Indirect handshake with an Identity Provider
Authorization Services require that Servers be registered with them because the Access Tokens can only be used with a single Server. This can introduce a lot of complexity for administrators. One way to reduce this complexity is to leverage the Server information that is already managed by a Global Discovery Service (GDS) described in OPC 10000-12. In this model the user identities are still managed by a central Authorization Service. The interactions are shown in Figure 25.
Figure 25 – Direct handshake with an Identity Provider
The UserTokenPolicy returned from the Server provides the URL of the Authorization Service and the identity provider. If the Application Authorization Service is linked with the GDS, it knows of all Servers which have been issued Certificates. The ApplicationUri is used as the identifier for the Server passed to the AS. The identity provider is responsible for managing users known to the system. It validates the credentials provided by the Client and returns an Identity Access Token which identifies the user. The Identity Access Token is passed to the Application Authorization Service which validates the Client and Server applications and creates a new Access Token that can be used to access the Server.
The Session-less Service invocation is introduced for Services, such as Read, Write or Call, that do not require any caller specific state information. It is accessible through the SessionlessInvoke Service which provides the context information required to call Services without a Session.
Session-less invocation is limited to Services of the View Service Set (with exception of RegisterNodes and UnregisterNodes), Attribute Service Set, Method Service Set, NodeManagement Service Set and Query Service Set. If Session-less Service invocation is supported by a Server, all Services belonging to these Service Sets that are supported by a Server via a Session shall also be supported via the SessionlessInvoke Service.
Session-less Services are invoked via a SecureChannel using the Access Token returned from the Authorization Service as the authenticationToken in the requestHeader. The SecureChannel shall have encryption enabled to prevent eavesdroppers from seeing the Access Token. The Access Token provides the user authentication. If application authentication through the SecureChannel is sufficient, Servers may choose not to require the Access Token and assume an anonymous user. In this case the authenticationToken shall be null.
The SessionlessInvoke Messages are just an envelope for the Service to invoke and do not have a RequestHeader and ResponseHeader like other Services. Those parameters are already part of the body which contains the Message for the Service to invoke.
Any Endpoint used for normal communication could be used for Session-less invocation provided the Endpoint supports encryption. The Server returns Bad_ServiceUnsupported if it does not support Session-less invocation for the request specified in the body. If it supports invocation but not with the combination of Endpoint and security settings used it returns Bad_SecurityModeInsufficient.
Servers may expose Endpoints which are only for use with Session-less invocation. These Endpoints shall support GetEndpoints and FindServers in addition to the SessionlessInvoke Service. The Server returns Bad_ServiceUnsupported for the other Services.
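The Service eligibility and security rules above can be captured as a Server-side check in Python. The set of eligible Services follows the Service Sets listed in this clause, without RegisterNodes and UnregisterNodes. `sessionless_status` and its parameters are hypothetical names. The security mode strings are the MessageSecurityMode values.

```python
# Services reachable through SessionlessInvoke, per the Service Sets listed above.
SESSIONLESS_ELIGIBLE = {
    "Browse", "BrowseNext", "TranslateBrowsePathsToNodeIds",         # View
    "Read", "HistoryRead", "Write", "HistoryUpdate",                 # Attribute
    "Call",                                                          # Method
    "AddNodes", "AddReferences", "DeleteNodes", "DeleteReferences",  # NodeManagement
    "QueryFirst", "QueryNext",                                       # Query
}


def sessionless_status(service, supported_via_session, security_mode):
    """StatusCode for the Service embedded in a SessionlessInvoke body.

    supported_via_session: Services this (hypothetical) Server offers through a
    Session; each eligible one must then also be offered Session-less.
    """
    if service not in SESSIONLESS_ELIGIBLE or service not in supported_via_session:
        return "Bad_ServiceUnsupported"
    if security_mode != "SignAndEncrypt":
        # Encryption keeps the Access Token in the requestHeader hidden from eavesdroppers.
        return "Bad_SecurityModeInsufficient"
    return "Good"
```

The eligibility test runs before the security test because the text distinguishes two cases. An unsupported request is rejected outright. A supported request that arrives with insufficient security settings gets the more specific code.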
A Session ensures that a namespace index or a server index does not change during the lifetime of a Session. This cannot be ensured between Session-less Service invocations. There are two options to ensure the namespace indices in the call match the expected namespace URIs in the Server. One option for the caller is to provide the list of namespace URIs used to build the namespace indices. This works best for single Session-less Service invocations. The second option is to provide the UrisVersion to ensure consistency of namespace arrays between Client and Server. The UrisVersion is first read from the Server together with the NamespaceArray and ServerArray. This reduces the overhead per call for a sequence of Session-less Service invocations.
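The first option, a request-local list of namespace URIs, can be sketched in Python. Index 0 always denotes the OPC UA namespace `http://opcfoundation.org/UA/` and is never listed, so the first list entry is index 1. The function names are hypothetical.

```python
OPC_UA_NS = "http://opcfoundation.org/UA/"


def build_namespace_uris(uris_used):
    """Client side: assign request-local namespace indices for the URIs a request uses."""
    table = []
    for uri in uris_used:
        if uri != OPC_UA_NS and uri not in table:  # index 0 is implied, never listed
            table.append(uri)
    index_of = {uri: i + 1 for i, uri in enumerate(table)}
    index_of[OPC_UA_NS] = 0
    return table, index_of


def resolve_index(ns_index, namespace_uris):
    """Server side: map a request-local index back to its URI (IndexError if out of range)."""
    return OPC_UA_NS if ns_index == 0 else namespace_uris[ns_index - 1]
```

With this option, the NodeIds in the request are built from `index_of`, and `table` is sent as namespaceUris.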
Table 107 defines the parameters for the Service.
Table 107 – SessionlessInvoke Service Parameters
| Name | Type | Description |
|---|---|---|
| Request | | |
| urisVersion | VersionTime | The version of the NamespaceArray and the ServerArray used for the Service invocation. The version shall match the value of the UrisVersion Property that defines the version for the URI lists in the NamespaceArray and the ServerArray Properties defined in OPC 10000-5. If the urisVersion parameter does not match the Server's UrisVersion Property, the Server shall return Bad_VersionTimeInvalid. In this case the Client shall read the UrisVersion, NamespaceArray and the ServerArray from the Server Object to repeat the Service invocation with the right version. The VersionTime DataType is defined in 7.44. If the value is 0, the parameter is ignored and the URIs are defined by the namespaceUris and serverUris parameters in request and response. If the value is non-zero, the namespaceUris and serverUris parameters in the request are ignored by the Server and set to null or empty arrays in the response. |
| namespaceUris [] | String | A list of URIs referenced by NodeIds or QualifiedNames in the request. NamespaceIndex 0 shall not be in this list. The first entry in this list is NamespaceIndex 1. The parameter shall be ignored by the Server if the urisVersion is not 0. |
| serverUris [] | String | A list of URIs referenced by ExpandedNodeIds in the request. ServerIndex 0 shall not be in this list. The first entry in this list is ServerIndex 1. The parameter shall be ignored by the Server if the urisVersion is not 0. |
| localeIds [] | LocaleId | List of locale ids to use. See locale negotiation in 5.4 which applies to this Service. |
| serviceId | UInt32 | The numeric identifier assigned to the Service request DataType NodeId describing the body. |
| body | * | The body of the request. The body is an embedded structure containing the corresponding Service request for the serviceId. |
| Response | | |
| namespaceUris [] | String | A list of URIs referenced by NodeIds or QualifiedNames in the response. NamespaceIndex 0 shall not be in this list. The first entry in this list is NamespaceIndex 1. An empty array shall be returned if the urisVersion is not 0. |
| serverUris [] | String | A list of URIs referenced by ExpandedNodeIds in the response. ServerIndex 0 shall not be in this list. The first entry in this list is ServerIndex 1. An empty array shall be returned if the urisVersion is not 0. |
| serviceId | UInt32 | The numeric identifier assigned to the Service response DataType NodeId describing the body. |
| body | * | The body of the response. The body is an embedded structure containing the corresponding Service response for the serviceId. |
Table 108 defines the Service results specific to this Service. Common StatusCodes are defined in Table 182.
Table 108 – SessionlessInvoke Service Result Codes
| Symbolic Id | Description |
|---|---|
| Bad_VersionTimeInvalid | The provided version time is no longer valid. |
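The urisVersion rules in Table 107 decide which URI tables a Server applies to a request. They can be sketched in Python as follows. `uri_tables` is a hypothetical function. `server` is a hypothetical dict holding the UrisVersion, NamespaceArray and ServerArray Property values. The sketch relies on the first ServerArray entry being the local Server, which is what ServerIndex 0 refers to.

```python
OPC_UA_NS = "http://opcfoundation.org/UA/"


def uri_tables(uris_version, namespace_uris, server_uris, server):
    """Choose the URI tables for one SessionlessInvoke request.

    Returns (status, namespaces, servers), where index i of each list is the URI for index i.
    """
    if uris_version == 0:
        # Request-local lists: index 0 (OPC UA namespace / local Server) is implied.
        return ("Good",
                [OPC_UA_NS, *namespace_uris],
                [server["ServerArray"][0], *server_uris])
    if uris_version != server["UrisVersion"]:
        # The Client must re-read UrisVersion, NamespaceArray and ServerArray, then retry.
        return ("Bad_VersionTimeInvalid", None, None)
    # Matching non-zero version: request lists are ignored, the Server's arrays apply.
    return ("Good", server["NamespaceArray"], server["ServerArray"])
```

A Client that sends a sequence of calls pays for reading the three Properties once. After that, it sends only the version with each call and repeats the read whenever it receives Bad_VersionTimeInvalid.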
NOTE Details on SoftwareCertificates will be defined in a future version of this document.
Auditing is a requirement in many systems. It provides a means of tracking activities that occur as part of normal operation of the system. It also provides a means of tracking abnormal behaviour. It is also a requirement from a security standpoint. For more information on the security aspects of auditing, see OPC 10000-2. Subclause 6.5 describes what is expected of an OPC UA Server and Client with respect to auditing and it details the audit requirements for each service set. Auditing can be accomplished using one or both of the following methods:
- The OPC UA Application that generates the audit event can log the audit entry in a log file or other storage location;
- The OPC UA Server that generates the audit event can publish the audit event using the OPC UA event mechanism. This allows an external OPC UA Client to subscribe to and log the audit entries to a log file or other storage location.
Each OPC UA Service request contains a string parameter that is used to carry an audit record id. A Client or any Server operating as a Client, such as an aggregating Server, can create a local audit log entry for a request that it submits. This parameter allows this Client to pass the identifier for this entry with the request. If this Server also maintains an audit log, it should include this id in its audit log entry that it writes. When this log is examined and that entry is found, the examiner will be able to relate it directly to the audit log entry created by the Client. This capability allows for traceability across audit logs within a system.
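The audit record id is carried in the RequestHeader as auditEntryId. The Python sketch below shows how it links a Client's log entry to the Server's entry. `AuditLog`, `make_server` and `client_call` are hypothetical, and requests are plain dicts rather than encoded Messages.

```python
import uuid


class AuditLog:
    """Hypothetical append-only audit log keyed by entry id."""

    def __init__(self):
        self.entries = {}

    def write(self, text, related_id=None):
        entry_id = str(uuid.uuid4())
        self.entries[entry_id] = {"text": text, "related": related_id}
        return entry_id


def make_server(server_log):
    def handle(request):
        client_entry = request["requestHeader"]["auditEntryId"]
        # The Server's entry stores the Client's id, linking the two logs.
        server_log.write(f"executed {request['service']}", related_id=client_entry)
        return "Good"
    return handle


def client_call(client_log, server, service):
    # Write the Client entry BEFORE the call so its id can travel with the request.
    entry_id = client_log.write(f"calling {service}")
    return server({"requestHeader": {"auditEntryId": entry_id}, "service": service})
```

An aggregating Server would play both roles here. It records the incoming id and also starts new entries for the requests it forwards.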
A Server that maintains an audit log shall provide the audit log entries via Event Messages. The AuditEventType and its sub-types are defined in OPC 10000-3. An audit Event Message also includes the audit record Id. The details of the AuditEventType and its subtypes are defined in OPC 10000-5. A Server that is an aggregating Server that supports auditing shall also subscribe for audit events for all of the Servers that it is aggregating (assuming they provide auditing). The combined stream should be available from the aggregating Server.
This Service Set can be separated into two groups: Services that are called by OPC UA Clients and Services that are invoked by OPC UA Servers. The FindServers and GetEndpoints Services that are called by OPC UA Clients may generate audit entries for failed Service invocations. The RegisterServer Service that is invoked by OPC UA Servers shall generate audit entries for all new registrations and for failed Service invocations. These audit entries shall include the Server URI, Server names, Discovery URIs and isOnline status. Audit entries should not be generated for RegisterServer invocations that do not cause changes to the registered Servers.
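The RegisterServer auditing rule can be sketched in Python. `register_server_audit` and the `registry` dict are hypothetical. Treating any change to an existing registration as auditable is an assumption of this sketch. The text only requires entries for new registrations and for failures, and advises against entries for invocations that change nothing.

```python
def register_server_audit(registry, server_uri, record, succeeded):
    """Return an audit entry (dict) for one RegisterServer call, or None.

    registry maps Server URI -> last registered record; record holds the
    serverNames, discoveryUrls and isOnline values sent by the Server.
    """
    if not succeeded:
        # Failed invocations are always audited and leave the registry unchanged.
        return {"result": "failed", "serverUri": server_uri, **record}
    changed = registry.get(server_uri) != record
    registry[server_uri] = record
    if not changed:
        return None  # a repeated, identical registration is not audited
    return {"result": "registered", "serverUri": server_uri, **record}
```

Periodic re-registration with unchanged data therefore does not flood the audit log.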
All Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for failed service invocations and for successful invocation of the OpenSecureChannel and CloseSecureChannel Services. The Client-generated audit entries should be set up prior to the actual call, allowing the correct audit record Id to be provided. The OpenSecureChannel Service shall generate an audit Event of type AuditOpenSecureChannelEventType or a subtype of it for the requestType ISSUE. Audit Events for the requestType RENEW are only created if the renew fails. The CloseSecureChannel Service shall generate an audit Event of type AuditChannelEventType or a subtype of it. AuditOpenSecureChannelEventType is itself a subtype of AuditChannelEventType. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure cases the Message for Events of this type should include a description of why the service failed. This description should be more detailed than what was returned to the Client. From a security point of view a Client only needs to know that it failed, but from an Auditing point of view the exact details of the failure need to be known.
In the case of Certificate validation errors the CertificateErrorEventId of the AuditOpenSecureChannelEventType should include the audit EventId of the specific AuditCertificateEventType that was generated to report the Certificate error. The AuditCertificateEventType shall also contain the detailed Certificate validation error. The additional parameters should include the details of the request. It is understood that these events may be generated by the underlying Communication Stacks in many cases, but they shall be made available to the Server and the Server shall report them.
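The SecureChannel audit rules above can be summarized in a small Python sketch. `secure_channel_audit` is a hypothetical function that returns the Event a Server would raise as a dict, or None. `cert_error_event_id` stands for the EventId of the AuditCertificateEventType raised for a Certificate error.

```python
def secure_channel_audit(service, request_type=None, succeeded=True, cert_error_event_id=None):
    """Audit Event for a SecureChannel Service call, or None when none is required."""
    if service == "OpenSecureChannel":
        if request_type == "RENEW" and succeeded:
            return None  # successful renewals are not audited
        event = {"EventType": "AuditOpenSecureChannelEventType"}
        if cert_error_event_id is not None:
            # Link to the AuditCertificateEventType that reported the Certificate error.
            event["CertificateErrorEventId"] = cert_error_event_id
        return event
    if service == "CloseSecureChannel":
        return {"EventType": "AuditChannelEventType"}
    raise ValueError(f"unexpected service {service!r}")
```

Failed ISSUE requests and failed RENEW requests both reach the event-building path, so every failure is audited.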
All Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for both successful and failed Service invocations. These Services shall generate an audit Event of type AuditSessionEventType or a subtype of it. In particular, they shall generate the base EventType or the appropriate subtype, depending on the service that was invoked. The CreateSession service shall generate AuditCreateSessionEventType events or sub-types of it. The ActivateSession service shall generate AuditActivateSessionEventType events or subtypes of it. When the ActivateSession Service is called to change the user identity then the Server shall generate AuditActivateSessionEventType events or subtypes of it. The CloseSession service shall generate AuditSessionEventType events or subtypes of it. This Event shall always be generated when a Session is terminated, for example by Session timeout expiration or Server shutdown. The SourceName for Events of this type shall be “Session/Timeout” for a Session timeout, “Session/CloseSession” for a CloseSession Service call and “Session/Terminated” for all other cases. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure case the Message for Events of this type should include a description of why the Service failed. The additional parameters should include the details of the request.
This Service Set shall also generate additional audit events in the cases when Certificate validation errors occur. These audit Events are generated in addition to the AuditSessionEventType Events. See OPC 10000-3 for the definition of AuditCertificateEventType and its subtypes.
For Clients that support auditing, accessing the Services in the Session Service Set shall generate audit entries for both successful and failed invocations of the Service. These audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.
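The ordering requirement (create the audit entry first, then invoke the Service) can be sketched as follows. This is a minimal sketch under stated assumptions: `ClientAuditLog`, `call_audited` and the id format are hypothetical, the Service is modelled as a plain callable, and the id is passed in a request-header dict under the key `auditEntryId`, modelled on the auditEntryId parameter of the OPC UA RequestHeader.

```python
import itertools


class ClientAuditLog:
    """Hypothetical Client-side audit trail."""

    def __init__(self):
        self.entries = {}
        self._next_id = itertools.count(1)

    def create_entry(self, service):
        # Created BEFORE the Service is invoked, so the request can carry its id.
        entry_id = f"client-audit-{next(self._next_id)}"
        self.entries[entry_id] = {"service": service, "result": None}
        return entry_id

    def complete(self, entry_id, result):
        self.entries[entry_id]["result"] = result


def call_audited(log, service, invoke):
    """Invoke a Service (modelled as a callable that receives a request
    header dict) with an audit entry that already exists."""
    entry_id = log.create_entry(service)
    request_header = {"auditEntryId": entry_id}
    try:
        response = invoke(request_header)
    except Exception as exc:
        # Failed invocations are recorded as well.
        log.complete(entry_id, f"failed: {exc}")
        raise
    log.complete(entry_id, "succeeded")
    return response
```

Because the entry exists before the call, both the success path and the failure path update the same record.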
All Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for both successful and failed Service invocations. These Services shall generate an audit Event of type AuditNodeManagementEventType or subtypes of it. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure case, the Message for Events of this type should include a description of why the Service failed. The additional parameters should include the details of the request.
For Clients that support auditing, accessing the Services in the NodeManagement Service Set shall generate audit entries for both successful and failed invocations of the Service. All audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.
The Write and HistoryUpdate Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for both successful and failed Service invocations. These Services shall generate an audit Event of type AuditUpdateEventType or subtypes of it. In particular, the Write Service shall generate an audit Event of type AuditWriteUpdateEventType or a subtype of it, and the HistoryUpdate Service shall generate an audit Event of type AuditHistoryUpdateEventType or a subtype of it. Three subtypes of AuditHistoryUpdateEventType are defined: AuditHistoryEventUpdateEventType, AuditHistoryValueUpdateEventType and AuditHistoryDeleteEventType. The subtype depends on the type of operation being performed: a historical event update, a historical data value update or a historical delete, respectively. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure case, the Message for Events of this type should include a description of why the Service failed. The additional parameters should include the details of the request.
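The selection of the audit EventType for Write and HistoryUpdate reduces to a lookup. A minimal sketch, assuming a hypothetical `HistoryOperation` enumeration that the Server derives from the HistoryUpdate request; the EventType names are those listed above.

```python
from enum import Enum


class HistoryOperation(Enum):
    """Hypothetical classification of a HistoryUpdate operation."""
    EVENT_UPDATE = "historical event update"
    VALUE_UPDATE = "historical data value update"
    DELETE = "historical delete"


HISTORY_AUDIT_SUBTYPES = {
    HistoryOperation.EVENT_UPDATE: "AuditHistoryEventUpdateEventType",
    HistoryOperation.VALUE_UPDATE: "AuditHistoryValueUpdateEventType",
    HistoryOperation.DELETE: "AuditHistoryDeleteEventType",
}


def update_audit_event_type(service, operation=None):
    """Audit EventType for a Write or HistoryUpdate invocation (success or failure)."""
    if service == "Write":
        return "AuditWriteUpdateEventType"
    if service == "HistoryUpdate":
        if operation is None:
            raise ValueError("HistoryUpdate needs the operation to pick a subtype")
        return HISTORY_AUDIT_SUBTYPES[operation]
    raise ValueError(f"{service} is not covered by this rule")
```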
The Read and HistoryRead Services may generate audit entries and audit Events for failed Service invocations. These Services should generate an audit Event of type AuditEventType or a subtype of it. See OPC 10000-5 for the detailed assignment of the SourceNode, SourceName and additional parameters. The Message for Events of this type should include a description of why the Service failed.
For Clients that support auditing, accessing the Write or HistoryUpdate Services in the Attribute Service Set shall generate audit entries for both successful and failed invocations of the Service. Invocations of the other Services in this Service Set may generate audit entries. All audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.
All Services in this Service Set for Servers that support auditing may generate audit entries and shall generate audit Events for both successful and failed Service invocations if the invocation modifies the AddressSpace, writes a value or modifies the state of the system (alarm acknowledge, batch sequencing or other system changes). These Method calls shall generate an audit Event of type AuditUpdateMethodEventType or subtypes of it. Methods that do not modify the AddressSpace, write values or modify the state of the system may generate audit Events. See OPC 10000-5 for the detailed assignment of the SourceNode, SourceName and additional parameters.
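Whether a Method call must be audited depends only on its effects. A minimal sketch, assuming a hypothetical `MethodEffects` record that a Server would fill in from its own knowledge of each Method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodEffects:
    """Hypothetical description of what a Method call does."""
    modifies_address_space: bool = False
    writes_value: bool = False
    modifies_system_state: bool = False  # e.g. alarm acknowledge, batch sequencing


def method_audit_requirement(effects: MethodEffects) -> str:
    """'required': an AuditUpdateMethodEventType Event shall be generated for
    both successful and failed calls; 'optional': the Server may generate one."""
    if (effects.modifies_address_space
            or effects.writes_value
            or effects.modifies_system_state):
        return "required"
    return "optional"
```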
For Clients that support auditing, accessing the Method Service Set shall generate audit entries for both successful and failed invocations of the Service if the invocation modifies the AddressSpace, writes a value or modifies the state of the system (alarm acknowledge, batch sequencing or other system changes). Invocations of other Methods may generate audit entries. All audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.
With the exception of the TransferSubscriptions Service in the Subscription Service Set, all of the Services in these four Service Sets only provide the Client with information. In general, these Services will not generate audit entries or audit Event Messages. The TransferSubscriptions Service shall generate an audit Event of type AuditSessionEventType or subtypes of it for both successful and failed Service invocations. See OPC 10000-5 for the detailed assignment of the SourceNode, the SourceName and additional parameters. For the failure case, the Message for Events of this type should include a description of why the Service failed.
For Clients that support auditing, accessing the TransferSubscriptions Service in the Subscription Service Set shall generate audit entries for both successful and failed invocations of the Service. Invocations of the other Services in this Service Set do not require audit entries. All audit entries should be set up prior to the actual Service invocation, allowing the invocation to contain the correct audit record id.
OPC UA enables Servers, Clients and networks to be redundant. OPC UA provides the data structures and Services by which Redundancy may be achieved in a standardized manner.
Server Redundancy allows Clients to have multiple sources from which to obtain the same data. Server Redundancy can be achieved in several ways, some of which require Client interaction and others of which require no interaction from a Client. Redundant Servers could exist in systems without redundant networks or Clients. Redundant Servers could also coexist in systems with network and Client Redundancy. Server Redundancy is formally defined in 6.6.2.
Client Redundancy allows identically configured Clients to behave as if they were a single Client, with not all of the Clients obtaining data at a given time. Ideally there should be no loss of information when a Client Failover occurs. Redundant Clients could exist in systems without redundant networks or Servers. Redundant Clients could also coexist in systems with network and Server Redundancy. Client Redundancy is formally defined in 6.6.3.
Network Redundancy allows a Client and Server to have multiple communication paths to obtain the same data. Redundant networks could exist in systems without redundant Servers or Clients. Redundant networks could also coexist in systems with Client and Server Redundancy. Network Redundancy is formally defined in 6.6.4.
There are two general modes of Server Redundancy, transparent and non-transparent.
In transparent Redundancy the Failover of Server responsibilities from one Server to another is transparent to the Client. The Client is unaware that a Failover has occurred and the Client has no control over the Failover behaviour. Furthermore, the Client does not need to perform any actions to continue to send or receive data.
In non-transparent Redundancy the Failover from one Server to another and actions to continue to send or receive data are performed by the Client. The Client shall be aware of the RedundantServerSet and shall perform the required actions to benefit from the Server Redundancy.
The ServerRedundancy Object defined in OPC 10000-5 indicates the mode supported by the Server. The ServerRedundancyType ObjectType and its subtypes TransparentRedundancyType and NonTransparentRedundancyType defined in OPC 10000-5 specify information for the supported Redundancy mode.
OPC UA Servers that are part of a RedundantServerSet have certain AddressSpace requirements. These requirements allow a Client to consistently access information from Servers in a RedundantServerSet and to make intelligent choices related to the health and availability of Servers in the RedundantServerSet.
Servers in the RedundantServerSet shall have an identical AddressSpace including:
- identical NodeIds
- identical browse paths and structure of the AddressSpace
- identical logic for setting the ServiceLevel
The only Nodes that can differ between Servers in a RedundantServerSet are the Nodes that are in the local Server namespace like the Server diagnostic Nodes. A Client that fails over shall not be required to translate browse paths or otherwise resolve NodeIds. Servers are allowed to add and delete Nodes as long as all Servers in the RedundantServerSet will be updated with the same Node changes.
All Servers in a RedundantServerSet shall be synchronized with respect to time. This may mean installing an NTP service or a PTP service.
There are other important considerations for a redundant system regarding synchronization:
- EventIds: Each OPC UA Server in a Transparent (6.6.2.3) or HotAndMirrored (6.6.2.4.5.5) RedundantServerSet shall synchronize EventIds to prevent a Client from mistakenly processing the same Event multiple times simply because the EventIds are different. This is very important for Alarms & Conditions. For Cold, Warm and Hot RedundantServerSets, Clients shall be able to handle EventIds that are not synchronized. Following any Failover, the Client shall call ConditionRefresh defined in OPC 10000-9.
- Timestamp (Source/Server): If a Server is exposing data from a downstream device (PLC, DCS etc.), then the SourceTimestamp and ServerTimestamp reported by all redundant Servers should match as closely as possible. Clients should favour the use of the SourceTimestamp.
- ContinuationPoints: The behaviour of ContinuationPoints does not change; Clients shall be prepared for lost ContinuationPoints. Servers in Transparent and HotAndMirrored RedundantServerSets shall synchronize ContinuationPoints, and Servers in other modes may also do so.
To a Client the transparent RedundantServerSet appears as if it is just a single Server and the Client has no Failover actions to perform. All Servers in the RedundantServerSet have an identical ServerUri and an identical EndpointUrl.
Figure 26 shows a typical transparent Redundancy setup.
Figure 26 – Transparent Redundancy setup example
For transparent Redundancy, OPC UA provides data structures to allow Clients to identify which Servers are in the RedundantServerSet, the ServiceLevel of each Server, and which Server is currently responsible for the Client Session. This information is specified in TransparentRedundancyType ObjectType defined in OPC 10000-5. Since the ServerUri is identical for all Servers in the RedundantServerSet, the Servers are identified with a ServerId contained in the information provided in the TransparentRedundancyType Object.
In transparent Redundancy, a Client is not able to control which physical Server it actually connects to. Failover is controlled by the RedundantServerSet and a Client is also not able to actively Failover to another Server in the RedundantServerSet.
All OPC UA interactions within a given Session shall be supported by one Server and the Client is able to identify which Server that is, allowing a complete audit trail for the data. It is the responsibility of the Servers to ensure that information is synchronized between the Servers. A functional Server will take over the Session and Subscriptions from the Failed Server. Failover may require a reconnection of the Client’s SecureChannel but the EndpointUrl of the Server and the ServerUri shall not change. The Client shall be able to continue communication with the Sessions and Subscriptions created on the previously used Server.
Figure 26 provides an abstract view of a transparent RedundantServerSet. The two or more Servers in the RedundantServerSet share a virtual network address and therefore all Servers have the identical EndpointUrl. This extends to all other EndpointDescription content, such as Certificates and security settings, which are also identical. How this virtual network address is created and managed is vendor specific. There may be special hardware that mediates the network address presented to the rest of the network. There may be custom hardware in which all components are redundant and fail over automatically at the hardware level. There may even be software-based systems where the transparency is governed completely by software.
For non-transparent Redundancy, OPC UA provides the data structures to allow the Client to identify what Servers are available in the RedundantServerSet and also Server information which tells the Client what modes of Failover the Server supports. This information allows the Client to determine what actions it may need to take in order to accomplish Failover. This information is specified in NonTransparentRedundancyType ObjectType defined in OPC 10000-5.
The Servers in the non-transparent RedundantServerSet shall use the ServerCapability NTRS defined in OPC 10000-12 for discovery including the registration with a GlobalDiscoveryServer.
Figure 27 shows a typical non-transparent Redundancy setup.
Figure 27 – Non-Transparent Redundancy setup
For non-transparent Redundancy, the Servers have unique IP addresses and unique ApplicationUris. The Servers also support the additional Failover modes Cold, Warm, Hot and HotAndMirrored. The Client shall be aware of the RedundantServerSet and is required to perform some actions depending on the Failover mode. These actions are described in Table 111, and additional examples and explanations are provided in 6.6.2.4.5.2 for Cold, 6.6.2.4.5.3 for Warm, 6.6.2.4.5.4 for Hot and 6.6.2.4.5.5 for HotAndMirrored.
A Client needs to be able to expect that the SourceTimestamp associated with a value is approximately the same from all Servers in the RedundantServerSet for the same value.
The ServiceLevel provides information to a Client regarding the health of a Server and its ability to provide data. See OPC 10000-5 for a formal definition for ServiceLevel. The ServiceLevel is a byte with a range of 0 to 255, where the values fall into the sub-ranges defined in Table 109.
The algorithm used by a Server to determine its ServiceLevel within each sub-range is Server specific. However, all Servers in a RedundantServerSet shall use the same algorithm to determine the ServiceLevel. All Servers, regardless of RedundantServerSet membership, shall adhere to the sub-ranges defined in Table 109.
Table 109 – ServiceLevel ranges
Sub-range | Name | Description
0-0 | Maintenance | The Failed Server is in the Maintenance sub-range. Therefore, new Clients shall not connect and currently connected Clients shall disconnect. The Server should expose a target time at which the Clients are able to reconnect. See EstimatedReturnTime defined in OPC 10000-5 for additional information. A Server that has been set to Maintenance is typically undergoing some maintenance or updates. The main goal for the Maintenance ServiceLevel is to ensure that Clients do not generate load on the Server and to allow time for the Server to complete any actions that are required. This load includes even simple connection attempts or monitoring of the ServiceLevel. The EstimatedReturnTime indicates when the Client should check to see if the Server is available. If updates or patches are taking longer than expected, the Client may discover that the EstimatedReturnTime has been extended further into the future. If the Server does not provide the EstimatedReturnTime, or if the time has lapsed, the Client should use a much longer interval between reconnects to a Server in the Maintenance sub-range than its normal reconnect interval.
1-1 | NoData | The Failed Server is not operational. Therefore, a Client is not able to exchange any information with it. The Server most likely has no data available other than ServiceLevel, ServerStatus and diagnostic information. Clients may connect to it to obtain ServiceLevel, ServerStatus and other diagnostic information. If the underlying system has failed, typically the ServerStatus would indicate COMMUNICATION_FAULT. The Client may monitor this Server for a ServerStatus and ServiceLevel change, which would indicate that normal communication could be resumed.
2-199 | Degraded | The Server is partially operational, but is experiencing problems such that portions of the AddressSpace are out of service or unavailable. An example usage of this ServiceLevel sub-range would be if 3 of 10 devices connected to a Server are unavailable. Servers that report a ServiceLevel in the Degraded sub-range are partially able to service Client requests. The degradation could be caused by loss of connection to underlying systems or by functioning in a mode, such as a backup Server, which results in less than full functionality being available. Alternatively, it could be that the Server is overloaded to the point that it is unable to reliably deliver data to Clients in a timely manner. If Clients are experiencing difficulties obtaining required data, they shall switch to another Server if any Servers in the Healthy range are available. If no Servers are available in the Healthy range, then Clients may switch to a Server with a higher ServiceLevel or one that provides the required data. Some Clients may also be configured for higher priority data and may check all Degraded Servers to see if any of the Servers are able to report the high priority data with good quality, but this functionality would be Client specific. In some cases a Client may connect to multiple Degraded Servers to maximize the available information.
200-255 | Healthy | The Server is fully operational. Therefore, a Client can obtain all information from this Server. The sub-range allows a Server to provide information that can be used by Clients to load balance. An example usage of this ServiceLevel sub-range would be to reflect the Server’s CPU load where data is delivered as expected. Servers in the Healthy ServiceLevel sub-range are able to deliver information in a timely manner. This ServiceLevel may change for internal Server reasons or it may be used for the load balancing described in 6.6.2.4.3. Clients shall connect to the Server with the highest ServiceLevel. Once connected, the ServiceLevel may change, but a Client shall not Failover to a different Server as long as the ServiceLevel of the Server is accessible and in the Healthy sub-range.
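The sub-ranges of Table 109 and the Maintenance reconnect guidance can be sketched as follows. This is a minimal sketch, not a normative algorithm: the function names are hypothetical, and the `backoff_factor` of 10 is an arbitrary illustrative value for "a much longer interval", not a value from this specification.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def service_level_subrange(level: int) -> str:
    """Map a ServiceLevel (a Byte, 0-255) to its Table 109 sub-range."""
    if not 0 <= level <= 255:
        raise ValueError("ServiceLevel is a Byte (0-255)")
    if level == 0:
        return "Maintenance"
    if level == 1:
        return "NoData"
    if level <= 199:
        return "Degraded"
    return "Healthy"


def next_maintenance_check(now: datetime,
                           estimated_return_time: Optional[datetime],
                           normal_interval: timedelta,
                           backoff_factor: int = 10) -> datetime:
    """When a Client should next look at a Server in Maintenance."""
    if estimated_return_time is not None and estimated_return_time > now:
        # Wait for the advertised EstimatedReturnTime.
        return estimated_return_time
    # No EstimatedReturnTime, or it has lapsed: back off well beyond the
    # normal reconnect interval (hypothetical factor).
    return now + normal_interval * backoff_factor
```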
In systems where multiple Hot Servers (see 6.6.2.4.5.4) are available, the Servers in the RedundantServerSet can share the load generated by Clients by setting the ServiceLevel in the Healthy sub-range based on the current load. Clients are expected to connect to the Server with the highest ServiceLevel. Clients shall not Failover to a different Server in the RedundantServerSet as long as the Server is in the Healthy sub-range. This is the normal behaviour for all Clients when communicating with redundant Servers. Servers can adjust their ServiceLevel based on the number of Clients that are connected, CPU loading, memory utilization, or any other Server specific criteria.
For example, in a system with 3 Servers, all Servers are initially at ServiceLevel 255, but when a Client connects, the Server with the Client connection sets its level to 254. The next Client would connect to a different Server, since both of the other Servers are still at 255.
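The three-Server example can be replayed in a short simulation. This is a minimal sketch: the Server-side policy of lowering the ServiceLevel by one per connected Client is a hypothetical choice (the text leaves the algorithm to the vendor), ties are broken by the lowest ServerUri only to make the result deterministic, and the ServerUris are placeholders.

```python
HEALTHY_MIN = 200


def pick_server(levels):
    """ServerUri with the highest ServiceLevel; ties go to the lowest URI."""
    return max(sorted(levels), key=lambda uri: levels[uri])


def failover_permitted(current_level, accessible):
    """False while the current Server is accessible and Healthy: the Client
    shall not Failover in that case."""
    return not (accessible and current_level >= HEALTHY_MIN)


def simulate(server_uris, n_clients):
    """Hypothetical policy: each Server lowers its ServiceLevel by one per Client."""
    levels = {uri: 255 for uri in server_uris}
    connected = {uri: 0 for uri in server_uris}
    assignments = []
    for _ in range(n_clients):
        uri = pick_server(levels)
        assignments.append(uri)
        connected[uri] += 1
        # Stay within the Healthy sub-range while load balancing.
        levels[uri] = max(HEALTHY_MIN, 255 - connected[uri])
    return assignments, levels


assignments, levels = simulate(["urn:A", "urn:B", "urn:C"], 4)
print(assignments)  # ['urn:A', 'urn:B', 'urn:C', 'urn:A']
```

The fourth Client returns to the first Server because all three Servers are at 254 by then; none of the Clients would Failover, since every Server stays accessible and Healthy.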
It is up to the Server vendor to define the logic for spreading the load and the number of expected Clients, CPU load or other criteria on each Server before the ServiceLevel is decremented. It is envisioned that some Servers would be able to accomplish this without any communication between the Servers.
The Failover mode of a Server is provided in the ServerRedundancy Object defined in OPC 10000-5. The different Failover modes for non-transparent Redundancy are described in Table 110.
Table 110 – Server Failover modes
Name | Description
Cold | Cold Failover mode is where only one Server can be active at a time. This may mean that redundant Servers are unavailable (not powered up) or are available but not running (PC is running, but application is not started).
Warm | Warm Failover mode is where the backup Server(s) can be active but are not operating in a mode that delivers the same level of functionality available from the primary Server; for example, a backup Server cannot connect to actual data points (typically in a system where the underlying devices are limited to a single connection). Underlying devices, such as PLCs, may have limited resources that permit only a single Server connection. Therefore, only a single Server will be able to consume data. The ServiceLevel Variable defined in OPC 10000-5 and the sub-ranges defined in Table 109 indicate the ability of the Server to provide its data to the Client. The ServiceLevel of the primary Server will be in the Healthy ServiceLevel sub-range. The ServiceLevel of the available backup Server will be in the Degraded ServiceLevel sub-range.
Hot | Hot Failover mode is where all Servers are powered on, and are up and running. In scenarios where Servers acquire data from a downstream device, such as a PLC, one or more Servers are actively connected to the downstream device(s) in parallel. These Servers have minimal knowledge of the other Servers in their group and function independently. When a Server fails or encounters a serious problem, its ServiceLevel drops. On recovery, the Server returns to the RedundantServerSet with an appropriate ServiceLevel to indicate that it is available.
HotAndMirrored | In HotAndMirrored Failover mode, the Servers mirror their internal states to all Servers in the RedundantServerSet, and more than one Server can be active and fully operational. Mirroring state minimally includes Sessions, Subscriptions, registered Nodes, ContinuationPoints, sequence numbers, and sent Notifications. The ServiceLevel Variable defined in OPC 10000-5 should be used by the Client to find the Servers with the highest ServiceLevel to achieve load balancing.
Each Server maintains a list of ServerUris for all redundant Servers in the RedundantServerSet. The list is provided together with the Failover mode in the ServerRedundancy Object defined in OPC 10000-5. To enable Clients to connect to all Servers in the list, each Server in the list shall provide the ApplicationDescription for all Servers in the RedundantServerSet through the FindServers Service. This information is needed by the Client to translate the ServerUri into information needed to connect to the other Servers in the RedundantServerSet. Therefore a Client needs to connect to only one of the redundant Servers to find the other Servers based on the provided information. A Client should persist information about other Servers in the RedundantServerSet.
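The translation from ServerUri to connection information, and its persistence, can be sketched as below. This is a minimal sketch under stated assumptions: each ApplicationDescription returned by FindServers is modelled as a plain dict with only its `applicationUri` and `discoveryUrls` fields, the function names are hypothetical, and persisting as a JSON string is just one possible storage choice.

```python
import json


def build_redundant_set_directory(server_uris, application_descriptions):
    """Map each ServerUri from the ServerRedundancy Object's list to the
    DiscoveryUrls of the matching ApplicationDescription from FindServers."""
    by_uri = {d["applicationUri"]: d for d in application_descriptions}
    missing = [uri for uri in server_uris if uri not in by_uri]
    if missing:
        raise LookupError(f"no ApplicationDescription for {missing}")
    return {uri: list(by_uri[uri]["discoveryUrls"]) for uri in server_uris}


def save_directory(directory):
    # Serialise so the Client can reuse the list after a restart.
    return json.dumps(directory, sort_keys=True)


def load_directory(text):
    return json.loads(text)
```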
Table 111 defines a list of Client actions for initial connections and Failovers.
Table 111 – Redundancy Failover actions
Failover mode and Client options | Cold | Warm | Hot (a) | Hot (b) | HotAndMirrored
On initial connection in addition to actions on Active Server: | | | | |
Connect to more than one OPC UA Server. | | X | X | X | Optional for status check
Create Subscriptions and add monitored items. | | X | X | X |
Activate sampling on the Subscriptions. | | | X | X |
Activate publishing. | | | | X |
At Failover: | | | | |
OpenSecureChannel to backup OPC UA Server | X | | | | X
CreateSession on backup OPC UA Server | X | | | |
ActivateSession on backup OPC UA Server | X | | | | X
Create Subscriptions and add monitored items. | X | | | |
Activate sampling on the Subscriptions. | X | X | | |
Activate publishing. | X | X | X | |
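Table 111 can be transcribed into data so that a Client looks up its required actions per Failover mode. A minimal sketch; the dictionary and function names are hypothetical, and only the "X" entries and the one "Optional for status check" entry of the table are encoded.

```python
MODES = ("Cold", "Warm", "Hot (a)", "Hot (b)", "HotAndMirrored")

# Required actions transcribed from Table 111 (insertion order = table order).
ON_INITIAL_CONNECTION = {
    "Connect to more than one OPC UA Server": {"Warm", "Hot (a)", "Hot (b)"},
    "Create Subscriptions and add monitored items": {"Warm", "Hot (a)", "Hot (b)"},
    "Activate sampling on the Subscriptions": {"Hot (a)", "Hot (b)"},
    "Activate publishing": {"Hot (b)"},
}
AT_FAILOVER = {
    "OpenSecureChannel to backup OPC UA Server": {"Cold", "HotAndMirrored"},
    "CreateSession on backup OPC UA Server": {"Cold"},
    "ActivateSession on backup OPC UA Server": {"Cold", "HotAndMirrored"},
    "Create Subscriptions and add monitored items": {"Cold"},
    "Activate sampling on the Subscriptions": {"Cold", "Warm"},
    "Activate publishing": {"Cold", "Warm", "Hot (a)"},
}
# Marked "Optional for status check" for HotAndMirrored.
OPTIONAL_ON_INITIAL_CONNECTION = {
    "Connect to more than one OPC UA Server": {"HotAndMirrored"},
}


def client_actions(mode, phase):
    """Required actions for a Failover mode; phase is 'initial' or 'failover'."""
    if mode not in MODES:
        raise ValueError(f"unknown Failover mode: {mode}")
    table = {"initial": ON_INITIAL_CONNECTION, "failover": AT_FAILOVER}[phase]
    return [action for action, modes in table.items() if mode in modes]
```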
Clients communicating with a non-transparent RedundantServerSet of Servers require some additional logic to be able to handle Server failures and to Failover to another Server in the RedundantServerSet. Figure 28 provides an overview of the steps a Client typically performs when it is first connecting to a RedundantServerSet. The figure does not cover all possible error scenarios.
Figure 28 – Client Start-up steps
The initial Server may be obtained via standard discovery or from a persisted list of Servers in the RedundantServerSet. In either case, the Client needs to check which Server in the RedundantServerSet it should connect to. Individual actions depend on the Failover mode the Server provides and the Failover mode the Client will use.
Once connected to a redundant Server, Clients shall be aware of the Failover modes supported by the Server, since this support affects the available options related to Client behaviour. A Client may always treat a Server as if it used a lesser Failover mode; for example, a Client connected to a Server that provides Hot Redundancy might choose to treat it as if the Server were running in Warm or Cold Redundancy. This choice is up to the Client. In the case of Failover mode HotAndMirrored, the Client shall not use Failover mode Hot or Warm, as this would generate unnecessary load on the Servers.
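The rule for choosing a Client-side Failover mode can be expressed as a small check. A minimal sketch, assuming the ordering Cold < Warm < Hot < HotAndMirrored; the integer values of `FailoverMode` are illustrative only and are not the values of any enumeration defined in OPC 10000-5.

```python
from enum import IntEnum


class FailoverMode(IntEnum):
    # Illustrative ordering from least to most capable (not normative values).
    COLD = 1
    WARM = 2
    HOT = 3
    HOT_AND_MIRRORED = 4


def client_may_use(server_mode: FailoverMode, client_mode: FailoverMode) -> bool:
    """Can a Client treat a Server offering server_mode as client_mode?"""
    if client_mode > server_mode:
        # A Client may only fall back to a lesser mode, never a greater one.
        return False
    if (server_mode is FailoverMode.HOT_AND_MIRRORED
            and client_mode in (FailoverMode.HOT, FailoverMode.WARM)):
        # Forbidden: would generate unnecessary load on the Servers.
        return False
    return True
```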
In Cold Failover mode, the Client can only connect to one Server at a time. When the Client loses connectivity with the Active Server, it attempts a connection to the redundant Server(s), which may or may not be available. In this situation the Client may need to wait for the redundant Server to become available and then create Subscriptions and MonitoredItems and activate publishing. The Client shall cache any information that is required related to the list of available Servers in the RedundantServerSet. Figure 29 illustrates the actions a Client would take if it is talking to a Server using Cold Failover mode. The monitor connection logic is defined in 6.7.
NOTE There may be a loss of data from the time the connection to the Active Server is interrupted until the time the Client gets Publish Responses from the backup Server.
In Warm Failover mode, the Client should connect to one or more Servers in the RedundantServerSet, primarily to monitor the ServiceLevel. A Client can connect and create Subscriptions and MonitoredItems on more than one Server, but sampling and publishing can only be active on one Server. Only the Active Server returns actual data; the other Servers in the RedundantServerSet return an appropriate error for the MonitoredItems in the Publish response, such as Bad_NoCommunication. The Active Server can be found by reading the ServiceLevel Variable from all Servers: the Server with the highest ServiceLevel is the Active Server. For Failover, the Client activates sampling and publishing on the Server with the highest ServiceLevel. Figure 30 illustrates the steps a Client would perform when communicating with a Server using Warm Failover mode. The monitor connection logic is defined in 6.7.
NOTE There may be a temporary loss of data from the time the connection to the Active Server is interrupted until the time the Client gets Publish Responses from the backup Server.
In Hot Failover mode, the Client should connect to two or more Servers in the RedundantServerSet and subscribe to the ServiceLevel Variable defined in OPC 10000-5 to find the highest ServiceLevel and achieve load balancing; this means that Clients should issue Service requests such as Browse, Read and Write to the Server with the highest ServiceLevel. Subscription-related activities need to be invoked for each connected Server. Clients have the following choices for implementing Subscription behaviour in Hot Failover mode:
- (a) The Client connects to multiple Servers and establishes Subscription(s) in each, where only one is Reporting and the others are Sampling only. The Client should set up the queue size for the MonitoredItems such that it can buffer all changes during the Failover time. The Failover time is the time between the connection interruption and the time the Client gets Publish Responses from the backup Server. On a Failover, the Client shall enable Reporting on the Server with the next highest availability.
- (b) The Client connects to multiple Servers and establishes Subscription(s) in each, where all Subscriptions are Reporting. The Client is responsible for handling/processing multiple Subscription streams concurrently.
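For the Sampling-only option, the queue size that buffers all changes during the Failover time can be estimated from the sampling interval. A minimal sketch, assuming a MonitoredItem queues at most one value per sampling interval; the `safety_factor` is a hypothetical margin, and since the Server may revise the requested queue size, a Client would check the revised value it gets back.

```python
import math


def required_queue_size(failover_time_ms: int,
                        sampling_interval_ms: int,
                        safety_factor: float = 1.0) -> int:
    """Queue size that holds every sample taken during the Failover time."""
    if sampling_interval_ms <= 0:
        raise ValueError("sampling interval must be positive")
    # At most one sample per interval; +1 counts both ends of the window.
    samples = failover_time_ms // sampling_interval_ms + 1
    return max(1, math.ceil(samples * safety_factor))


# A 5 s Failover time with a 250 ms sampling interval:
print(required_queue_size(5000, 250))  # 21
```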
Figure 31 illustrates the functionality a Client would perform when communicating with a Server using Hot Failover mode (the figure includes both options (a) and (b)). The monitor connection logic is defined in 6.7.
Clients are not expected to automatically switch over to a Server that has recovered from a failure, but the Client should establish a connection to it.
In HotAndMirrored Failover mode, a Client only connects to one Server in the RedundantServerSet, because the Server shares its Session and state information with the other Servers. To validate the capability to connect to other redundant Servers, a Client is allowed to create Sessions with other Servers and to keep these connections open by periodically reading the ServiceLevel. A Client shall not create Subscriptions on the backup Servers for status monitoring (to prevent excessive load on the Servers). This mode allows Clients to fail over without creating a new context for communication. On a Failover, the Client simply creates a new SecureChannel on an alternate Server and then calls ActivateSession; all Client activities (browsing, Subscriptions, history reads, etc.) then resume. Figure 32 illustrates the behaviour of a Client communicating with a Server in HotAndMirrored Failover mode. The monitor connection logic is defined in 6.7.
Figure 32 – HotAndMirrored Failover
This Failover mode is similar to the transparent Redundancy. The advantage is that the Client has full control over selecting the Server. The disadvantage is that the Client needs to be able to handle Failovers.
A vendor can use the non-transparent Redundancy features to create a Server proxy running on the Client machine that provides transparent Redundancy to the Client. This reduces the amount of functionality that needs to be designed into the Client and enables simpler Clients to take advantage of non-transparent Redundancy. The Server proxy simply duplicates Subscriptions and modifications to Subscriptions by passing the calls on to both Servers, but only enables publishing and sampling on one Server. When the proxy detects a failure, it enables publishing and/or sampling on the backup Server, just as a Redundancy-aware Client would.
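The proxy behaviour can be illustrated with stand-in Server objects. This is a minimal sketch, not a real OPC UA stack: `FakeServer` and `RedundancyProxy` are hypothetical classes that only track which MonitoredItems exist and whether sampling and publishing are enabled on each Server.

```python
class FakeServer:
    """Stand-in for a connection to one Server in the RedundantServerSet."""

    def __init__(self, name):
        self.name = name
        self.monitored_items = []
        self.sampling = False
        self.publishing = False


class RedundancyProxy:
    """Duplicates Subscription changes on two Servers, enabling only one."""

    def __init__(self, primary, backup):
        self.active, self.backup = primary, backup
        self.active.sampling = self.active.publishing = True

    def add_monitored_item(self, node_id):
        # Every Subscription modification is passed on to both Servers.
        for server in (self.active, self.backup):
            server.monitored_items.append(node_id)

    def on_failure(self):
        # Swap roles and enable sampling/publishing on the former backup.
        failed = self.active
        failed.sampling = failed.publishing = False
        self.active, self.backup = self.backup, failed
        self.active.sampling = self.active.publishing = True
```

Because the backup already holds the same MonitoredItems, a Failover only flips the enable flags instead of rebuilding the Subscription.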
Figure 33 shows the Server proxy used to provide transparent Redundancy.
Figure 33 – Server proxy for Redundancy
Client Redundancy is supported in OPC UA by the TransferSubscriptions Service and by exposing Client information in the Server diagnostic information. Since Subscription lifetime is not tied to the Session in which the Subscription was created, backup Clients may use the standard diagnostic information to monitor the active Client’s Session with the Server. Upon detecting a failure of the active Client, a backup Client would then instruct the Server to transfer the Subscriptions to its own Session. If the Subscription is configured carefully, with sufficient resources to buffer data during the change-over, data loss from a Client Failover can be prevented.
OPC UA does not provide a standardized mechanism for conveying the SessionId and SubscriptionIds from the active Client to the backup Clients, but as long as the backup Clients know the Client name of the active Client, this information is readily available using the SessionDiagnostics and SubscriptionDiagnostics portions of the ServerDiagnostics data. This information is available for authorized users and for the user active on the Session. TransferSubscriptions requires the same user on all redundant Clients to succeed.
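A backup Client can derive the SubscriptionIds to pass to TransferSubscriptions from the diagnostics data. A minimal sketch under stated assumptions: diagnostics records are modelled as plain dicts, `sessionId` and `subscriptionId` stand for the corresponding SessionDiagnostics and SubscriptionDiagnostics fields, and the `clientName` key is hypothetical (a real Client would match the active Client's name against fields such as the SessionName or ClientDescription).

```python
def subscriptions_to_transfer(session_diagnostics, subscription_diagnostics,
                              active_client_name):
    """SubscriptionIds owned by the Session(s) of the named active Client."""
    session_ids = {
        s["sessionId"] for s in session_diagnostics
        if s["clientName"] == active_client_name
    }
    return sorted(
        sub["subscriptionId"] for sub in subscription_diagnostics
        if sub["sessionId"] in session_ids
    )
```

The resulting ids are only useful if the backup Client activates its Session with the same user as the active Client, since TransferSubscriptions otherwise fails.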
Redundant networks can be used with OPC UA in either transparent or non-transparent Redundancy.
Network Redundancy can be combined with Server and Client Redundancy.
In the transparent network use-case a single Server Endpoint can be reached through different network paths. This case is completely handled by the network infrastructure. The selected network path and Failover are transparent to the Client and the Server. Transparent network Redundancy is illustrated in Figure 34.
Figure 34 – Transparent network Redundancy
Examples:
- A physical appliance/device such as a router or gateway which automatically changes the network routing to maintain communications.
- A virtual adapter which automatically changes the network adapter to maintain communications.
In the non-transparent network use-case the Server provides different Endpoints for the different network paths. This requires both the Server and the Client to support multiple network connections. In this case the Client is responsible for selecting the Endpoint and for Failover. For Failover the normal reconnect scenario described in 6.7 can be used. Only the SecureChannel is created with another Endpoint. Sessions and Subscriptions can be reused. Non-transparent network Redundancy is illustrated in Figure 35.
Figure 35 – Non-transparent network Redundancy
The information about the different network paths is specified in the NonTransparentNetworkRedundancyType ObjectType defined in OPC 10000-5.
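A Client-side Failover across the Endpoints of the different network paths can be sketched as follows. The callables `open_secure_channel` and `activate_session` are hypothetical wrappers around the corresponding Services; the sketch only illustrates that the SecureChannel moves to another Endpoint while the existing Session, and with it the Subscriptions, is reattached.

```python
from typing import Callable, Optional, Sequence


def reconnect_over_network_paths(
    endpoint_urls: Sequence[str],
    open_secure_channel: Callable[[str], bool],
    activate_session: Callable[[str], bool],
) -> Optional[str]:
    """Try each Endpoint in order; return the URL now carrying the Session.

    Only the SecureChannel is created with another Endpoint; the existing
    Session is reused through ActivateSession.
    """
    for url in endpoint_urls:
        if open_secure_channel(url) and activate_session(url):
            return url
    return None  # no network path is currently usable
```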
In redundant systems, it is common to require that a particular Server be taken out of the RedundantServerSet for a period of time. Reasons for this may include:
- Certificate update
- Security reconfiguration
- Rebooting or restarting the machine for:
  - software updates and patches
  - installation of new software
- Reconfiguration of the AddressSpace
A Server can be removed from the RedundantServerSet through a complete shutdown or by setting its ServiceLevel to a value in the Maintenance sub-range. This can be done through a Server-specific configuration tool or by calling the Method RequestServerStateChange on the ServerType. The Method is formally defined in OPC 10000-5.
This Method requires that the Client provide credentials with administrative rights on the Server.
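How a Client might interpret ServiceLevel when choosing a Failover target can be sketched as below. ServiceLevel is a Byte; the sub-range boundaries used here (Maintenance 0, NoData 1, Degraded 2 to 199, Healthy 200 to 255) are taken from the ServiceLevel sub-range table of OPC 10000-4 as we understand it and should be checked against the edition you implement. Picking the highest ServiceLevel outside Maintenance is an assumed selection policy, and `pick_failover_target` is a hypothetical helper.

```python
from typing import Dict, Optional


def service_level_subrange(level: int) -> str:
    """Classify a ServiceLevel value (a Byte) into its sub-range (boundaries assumed)."""
    if not 0 <= level <= 255:
        raise ValueError("ServiceLevel is a Byte (0..255)")
    if level == 0:
        return "Maintenance"
    if level == 1:
        return "NoData"
    if level < 200:
        return "Degraded"
    return "Healthy"


def pick_failover_target(levels: Dict[str, int]) -> Optional[str]:
    """Choose the Server (by ServerUri) with the highest ServiceLevel outside Maintenance."""
    candidates = {uri: lvl for uri, lvl in levels.items()
                  if service_level_subrange(lvl) != "Maintenance"}
    return max(candidates, key=candidates.get) if candidates else None
```

A Server taken out of the RedundantServerSet by RequestServerStateChange then simply drops out of the candidate set without being shut down.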
After a Client establishes a connection to a Server and creates a Subscription, the Client monitors the connection status. Figure 36 shows the steps to connect a Client to a Server and the general logic for reconnect handling. Not all possible error scenarios are covered.
The preferred mechanism for a Client to monitor the connection status is the keep-alive of the Subscription. A Client should subscribe to the State Variable in the ServerStatus to detect shutdown or other failure states. If no Subscription is created or the Server does not support Subscriptions, the connection can be monitored by periodically reading the State Variable.
Figure 36 – Reconnect sequence
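The Server sends a keep-alive Publish response when a Subscription has had no Notifications for MaxKeepAliveCount publishing intervals, so a Client can treat a longer silence as a lost connection, provided it keeps Publish requests outstanding. A minimal watchdog sketch; the class name and the 1.5 safety margin are assumptions, not values from this specification.

```python
import time
from typing import Callable


class KeepAliveWatchdog:
    """Flags a lost connection when no Publish response arrives in time."""

    def __init__(self, publishing_interval_ms: float, max_keep_alive_count: int,
                 margin: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Longest expected gap between Publish responses, plus a safety margin.
        self.timeout_s = publishing_interval_ms * max_keep_alive_count * margin / 1000.0
        self.clock = clock
        self.last_response = clock()

    def on_publish_response(self) -> None:
        # Called for data, Event and keep-alive Publish responses alike.
        self.last_response = self.clock()

    def connection_lost(self) -> bool:
        return self.clock() - self.last_response > self.timeout_s
```

With a PublishingInterval of 1000 ms and MaxKeepAliveCount of 10, the watchdog tolerates 15 s of silence before the Client starts the reconnect sequence.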
When a Client loses the connection to the Server, the goal is to reconnect without losing information. To do this, the Client shall re-establish the connection by creating a new SecureChannel and activating the existing Session with the ActivateSession Service. If OpenSecureChannel fails, the Client should delay the retry for a configurable time. ActivateSession assigns the new SecureChannel to the existing Session and allows the Client to reuse the Session and its Subscriptions in the Server. To re-establish the SecureChannel and activate the Session, the Client shall use the same security policy and Application Instance Certificate used to create the original SecureChannel, and the same user credentials used for the original Session. As long as the queues in the MonitoredItems do not overflow, the Client then receives data and Event Notifications without losing information.
The Client shall only create a new Session if ActivateSession fails. TransferSubscriptions is used to transfer the Subscription to the new Session. If TransferSubscriptions fails, the Client needs to create a new Subscription.
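The order of recovery steps can be expressed as a small decision function. The callables are hypothetical wrappers around OpenSecureChannel, ActivateSession, CreateSession and TransferSubscriptions, `max_attempts` is an assumed limit, and the returned strings are illustrative labels only.

```python
from typing import Callable


def recover_connection(
    open_secure_channel: Callable[[], bool],
    activate_session: Callable[[], bool],
    create_session: Callable[[], bool],
    transfer_subscriptions: Callable[[], bool],
    sleep: Callable[[float], None],
    retry_delay_s: float = 5.0,
    max_attempts: int = 3,
) -> str:
    """Walk the reconnect steps in order and report which path succeeded."""
    for attempt in range(max_attempts):
        if open_secure_channel():
            break
        if attempt < max_attempts - 1:
            sleep(retry_delay_s)      # configurable delay before retrying
    else:
        return "no_channel"           # gave up after max_attempts

    if activate_session():
        return "session_reused"       # existing Session and Subscriptions kept
    # ActivateSession failed: only now create a new Session.
    if not create_session():
        return "no_session"
    if transfer_subscriptions():
        return "subscriptions_transferred"
    return "new_subscriptions_required"
```

In a real Client `sleep` would be `time.sleep`; injecting it keeps the sketch testable.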
When the connection is lost, Publish responses may have been sent but not received by the Client.
After re-establishing the connection the Client shall call Republish in a loop, starting with the next expected sequence number and incrementing the sequence number until the Server returns the status Bad_MessageNotAvailable. After receiving this status, the Client shall start sending Publish requests with the normal Publish handling. This sequence ensures that the lost NotificationMessages queued in the Server are not overwritten by new Publish responses.
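The Republish loop can be sketched as follows. `republish` is a hypothetical stand-in for the Republish Service that returns the NotificationMessage for a sequence number, or None where the Server would return Bad_MessageNotAvailable; sequence number rollover is not handled.

```python
from typing import Callable, List, Optional


def republish_missing(next_expected_seq: int,
                      republish: Callable[[int], Optional[dict]]) -> List[dict]:
    """Fetch queued NotificationMessages before resuming normal Publish handling."""
    recovered: List[dict] = []
    seq = next_expected_seq
    while True:
        message = republish(seq)
        if message is None:       # Bad_MessageNotAvailable: nothing more queued
            break
        recovered.append(message)
        seq += 1
    return recovered
```

Only after this loop ends does the Client resume sending Publish requests, so the queued messages are not overwritten.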
If the Client detects missing sequence numbers in the Publish responses and cannot retrieve the lost NotificationMessages through Republish, the Client should call the Method ResendData or read the values of all data MonitoredItems to make sure it has the latest value for every MonitoredItem. ResendData allows the Client to resynchronize its cache by receiving the current value for each MonitoredItem. The data should be sent in the next regular PublishingInterval.
The Server Object provides a Method ResendData that initiates resending of all data MonitoredItems in a Subscription. This Method is defined in OPC 10000-5. If this Method is called, subsequent Publish responses shall contain the current value for each data MonitoredItem in the Subscription where the MonitoringMode is set to Reporting. If a value is queued for a data MonitoredItem, the next value in the queue is sent in the Publish response. If no value is queued for a data MonitoredItem, the last value sent is repeated in the Publish response. The Server shall verify that the Method is called within the Session context of the Session that owns the Subscription.
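The Server-side rule for ResendData can be sketched with a hypothetical `DataItem` model keyed by client handle. This only illustrates the selection of values described above; the Session-context check and the actual Publish encoding are out of scope.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, Optional


@dataclass
class DataItem:
    reporting: bool                                   # MonitoringMode is Reporting
    queue: Deque[Any] = field(default_factory=deque)  # values not yet sent
    last_sent: Optional[Any] = None                   # last value sent to the Client


def resend_data(items: Dict[int, DataItem]) -> Dict[int, Any]:
    """Values to put in the next Publish response after ResendData is called."""
    out: Dict[int, Any] = {}
    for handle, item in items.items():
        if not item.reporting:
            continue                        # only Reporting items are resent
        if item.queue:
            value = item.queue.popleft()    # a value is queued: send the next one
        elif item.last_sent is not None:
            value = item.last_sent          # nothing queued: repeat the last value sent
        else:
            continue                        # nothing sent yet (edge case not covered by the text)
        item.last_sent = value
        out[handle] = value
    return out
```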
Independent of the detailed recovery strategy, the Client should make sure that it does not overwrite newer data in the Client with older values provided through Republish.
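One possible way to avoid overwriting newer data is to compare SourceTimestamps before updating the Client cache. This is an assumed strategy, not one mandated here; it requires the Client to request SourceTimestamps for its MonitoredItems, and the cache layout and `apply_notification` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Tuple

# client handle -> (SourceTimestamp, value)
Cache = Dict[int, Tuple[datetime, object]]


def apply_notification(cache: Cache, handle: int, source_ts: datetime, value: object) -> bool:
    """Store a value only if it is not older than what the cache already holds."""
    current = cache.get(handle)
    if current is not None and source_ts < current[0]:
        return False   # older value, e.g. delivered late through Republish, is ignored
    cache[handle] = (source_ts, value)
    return True
```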
If the Republish returns Bad_SubscriptionIdInvalid, then the Client needs to create a new Subscription.
Re-establishing the connection by creating a new SecureChannel may be rejected because of a new Server Application Instance Certificate or other security errors. OpenSecureChannel returns Bad_CertificateInvalid if the Server has a new Application Instance Certificate. In the case of security failures, the Client shall use the GetEndpoints Service to fetch the most up-to-date security information from the Server.
If the Client Application Instance Certificate is updated, the Client shall create a new Session, since a Session does not allow an update of the Client Application Instance Certificate. The Client shall try to transfer existing Subscriptions to the new Session. A Server shall accept TransferSubscriptions even for an Anonymous user if the Client does not change, i.e. the ApplicationUri of the Client remains the same, and a secure connection is used.
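The Server-side acceptance rule for TransferSubscriptions, combining the same-user requirement with the Anonymous exception above, can be sketched as a predicate. `SessionContext` and its fields are hypothetical, and a real Server would derive the ApplicationUri from the validated Client certificate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionContext:
    user_id: Optional[str]   # None for an Anonymous user
    application_uri: str     # ApplicationUri of the Client
    secure: bool             # SecureChannel uses signing and/or encryption


def may_transfer(owner: SessionContext, requester: SessionContext) -> bool:
    """Decide whether the requesting Session may take over the owner's Subscriptions."""
    if owner.user_id is not None:
        return owner.user_id == requester.user_id   # same user required
    # Anonymous: accept only if the same Client reconnects over a secure connection.
    return (requester.user_id is None
            and owner.application_uri == requester.application_uri
            and requester.secure)
```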
OPC 10000-6 defines a reverse connect mechanism where the Server initiates the logical connection. All subsequent steps, such as creating a SecureChannel, are initiated by the Client. In this scenario the Client is only able to initiate a reconnect if the Server initiates a new logical connection after a connection interruption. The Client-side reconnect handling described in Figure 36 also applies to the reverse connect case. A Server is not able to actively check the connection status; therefore the Server shall initiate a new connection at a configurable interval, even if a connection to the Client is already established. This ensures that an initiated connection is available for the reconnect handling, as well as for other scenarios where the Client needs more than one connection.
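The Server-side timing rule can be sketched as a small scheduler. `ReverseConnectScheduler` is hypothetical, and opening the TCP connection and sending the reverse connect message defined in OPC 10000-6 are not shown; the point is that existing connections deliberately do not suppress the next attempt.

```python
from typing import Optional


class ReverseConnectScheduler:
    """Decide when a Server initiates the next reverse connection."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s                  # configurable interval
        self.last_initiated: Optional[float] = None

    def should_initiate(self, now: float) -> bool:
        """True when a new connection is due; existing connections are ignored on purpose."""
        due = self.last_initiated is None or now - self.last_initiated >= self.interval_s
        if due:
            self.last_initiated = now
        return due
```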
MonitoredItems are used to monitor Variable Values for data changes and event notifier Objects for new Events. Subscriptions combine the data changes and Events of the assigned MonitoredItems into an optimized stream of network messages. Reliable delivery is ensured as long as the lifetime of the Subscription and the queues in the MonitoredItems are long enough to bridge a network interruption between OPC UA Client and Server. All queues that ensure reliable delivery are normally kept in memory, so a Server restart deletes them.
There are use cases where OPC UA Clients have no permanent network connection to the OPC UA Server or where reliable delivery of data changes and events is necessary even if the OPC UA Server is restarted or the network connection is interrupted for a longer time.
To ensure this reliable delivery, the OPC UA Server shall keep collected data and events in non-volatile storage until the OPC UA Client has confirmed reception. Data may still be lost if the Server is not shut down gracefully or in case of a power failure. To limit this risk, the OPC UA Server should persist the queues frequently, not only when the Server is shut down.
The Method SetSubscriptionDurable defined in OPC 10000-5 is used to set a Subscription into this durable mode and to allow much longer lifetimes and queue sizes than for normal Subscriptions. The Method shall be called before the MonitoredItems are created in the durable Subscription. The Server shall verify that the Method is called within the Session context of the Session that owns the Subscription.
A value of 0 for the parameter lifetimeInHours requests the highest lifetime supported by the Server.
The revisedLifetimeInHours is used to set the LifetimeCount of the Subscription.
ModifySubscription can be used to change the parameters of the durable Subscription. If the Client would like to keep the previous lifetime setting, the Client needs to calculate the LifetimeCount based on the revisedLifetimeInHours and the PublishingInterval. ModifySubscription does not change the durable mode of the Subscription.
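Because LifetimeCount counts publishing intervals, the calculation is the lifetime in milliseconds divided by the PublishingInterval in milliseconds. A minimal sketch; `lifetime_count` is a hypothetical helper, and rounding up and clamping to the UInt32 range are our choices.

```python
import math

UINT32_MAX = 0xFFFFFFFF  # LifetimeCount is a UInt32


def lifetime_count(revised_lifetime_in_hours: int, publishing_interval_ms: float) -> int:
    """LifetimeCount that preserves the durable lifetime at a given PublishingInterval."""
    lifetime_ms = revised_lifetime_in_hours * 3600 * 1000
    # Round up so the resulting lifetime is not shorter than requested.
    count = math.ceil(lifetime_ms / publishing_interval_ms)
    return min(count, UINT32_MAX)
```

For example, a revisedLifetimeInHours of 24 gives a LifetimeCount of 86400 at a 1000 ms PublishingInterval and 864000 at 100 ms. The value passed to ModifySubscription must still respect the usual rule that the lifetime count is at least three times the MaxKeepAliveCount.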
An OPC UA Server providing durable Subscriptions shall
- Support the SetSubscriptionDurable Method defined in OPC 10000-5
- Support Service TransferSubscriptions
- Support long Subscription lifetimes, with minimum requirements defined in OPC 10000-7
- Support large MonitoredItem queues, with minimum requirements defined in OPC 10000-7
- Store Subscription settings and sent NotificationMessages with their sequence numbers
- Store MonitoredItem settings and queues
An OPC UA Client using durable Subscriptions shall
- Use the SetSubscriptionDurable Method defined in OPC 10000-5 to create a durable Subscription
- Close Sessions for planned communication interruptions
- Use the Service TransferSubscriptions to assign the durable Subscription to a new Session for data transfer
- Store SubscriptionId, MonitoredItem client and server handles and the last confirmed sequence number
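The Client-side state listed above can be kept in a small record that survives a Client restart. This is a sketch under assumptions: `DurableSubscriptionState` and the JSON format are hypothetical, and writing the text to disk (ideally atomically) is left to the application.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class DurableSubscriptionState:
    subscription_id: int
    last_confirmed_seq: int  # last sequence number acknowledged to the Server
    # MonitoredItem client handle -> server handle (MonitoredItemId)
    item_handles: Dict[int, int] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "DurableSubscriptionState":
        data = json.loads(text)
        # JSON object keys are strings, so convert the client handles back to int.
        data["item_handles"] = {int(k): v for k, v in data["item_handles"].items()}
        return cls(**data)
```

After a planned interruption, the Client could reload this record, call TransferSubscriptions with the stored SubscriptionId on a new Session, and use the stored handles to match incoming Notifications to its own items.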