Implementation and deployment considerations – OPC Unified Architecture

6 Implementation and deployment considerations

6.1 Overview

Clause 6 provides guidance to vendors that implement OPC UA Applications. Since many of the countermeasures required to address the threats described above fall outside the scope of OPC UA, the advice in Clause 6 suggests how some of those countermeasures should be provided.

For each of the following areas, Clause 6 defines the problem space, identifies the consequences if appropriate countermeasures are not implemented, and recommends best practices.

6.2 Appropriate timeouts

Timeouts, i.e., the time that the implementation waits for an event (usually the arrival of a Message), play a significant role in the security of an implementation. Potential consequences of inappropriate timeouts include:

Denial of service: if timeouts are very large, a denial of service condition could exist when a Client does not reset a Session.

Resource consumption: when a Client is idle for a long period of time, the Server retains the Client’s buffered Messages or other information for that period, which can lead to resource exhaustion.

The implementer should use reasonable timeouts for each connection stage.
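
As a sketch of per-stage timeouts, the following Python fragment assigns a limit to each connection stage and reports connections that have stayed in their current stage too long. The stage names and the timeout values are hypothetical illustrations, not values taken from OPC UA; real values depend on the deployment.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    # Hypothetical connection stages used for illustration
    HELLO = "waiting for Hello"
    OPEN_SECURE_CHANNEL = "waiting for OpenSecureChannel"
    ACTIVATE_SESSION = "waiting for ActivateSession"
    IDLE_SESSION = "Session idle"


# Hypothetical limits in seconds: short for unauthenticated stages,
# longer once the Client has authenticated.
STAGE_TIMEOUTS: Dict[Stage, float] = {
    Stage.HELLO: 5.0,
    Stage.OPEN_SECURE_CHANNEL: 10.0,
    Stage.ACTIVATE_SESSION: 30.0,
    Stage.IDLE_SESSION: 600.0,
}


@dataclass
class Connection:
    conn_id: str
    stage: Stage
    entered_at: float  # monotonic time at which the current stage began


def expired_connections(conns: List[Connection],
                        now: Callable[[], float] = time.monotonic) -> List[str]:
    """Return the ids of connections that exceeded the limit of their current stage."""
    t = now()
    return [c.conn_id for c in conns if t - c.entered_at > STAGE_TIMEOUTS[c.stage]]
```

A Server loop could call `expired_connections` periodically and close the listed connections, releasing any buffered Messages they hold.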

6.3 Strict Message processing

Specifications often define the format of correct Messages but are silent on what an implementation should do with Messages that deviate from that format. Typically, implementations continue to parse such packets, which leads to vulnerabilities.

The implementer should strictly check the Message format and either drop non-conforming packets or send an error Message as described below.

Error handling uses the error code defined in OPC 10000-4 that most precisely fits the condition, and returns an error code only when doing so is appropriate. Error codes can be used as an attack vector; thus, their use shall be limited as described in the Service Behaviours clause of Part 4. Once the SecureChannel has been established, appropriate specific error codes are returned.

Timing variations are another attack vector; they are minimized by the requirement in Part 4 that the socket be closed on any error that occurs while establishing a SecureChannel. Vendors should ensure that none of the paths that result in closing the socket provides a timing hint indicating which failure path was encountered. This can be accomplished by adding a random delay before closing the socket or before returning a generic error code.

Limits on all array lengths, string lengths and recursion depths should be strictly enforced while a Message is processed.
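
The random-delay approach can be sketched as follows: every handshake failure, whatever its cause, goes through the same function, which waits for a delay drawn from the operating system's CSPRNG and then closes the socket. The function name and delay bounds are hypothetical, and using Bad_SecurityChecksFailed as the single generic code is an assumption here; Part 4 and Part 6 define which codes apply.

```python
import secrets
import time
from typing import Callable

# Assumed generic code returned for every handshake failure, so that the code
# itself does not reveal which check failed.
GENERIC_ERROR = "Bad_SecurityChecksFailed"

_rng = secrets.SystemRandom()  # backed by the OS CSPRNG, not a predictable PRNG


def fail_handshake(close_socket: Callable[[], None],
                   min_delay: float = 0.05,
                   max_delay: float = 0.25) -> str:
    """Close the socket after a random delay, identically for every failure path."""
    time.sleep(_rng.uniform(min_delay, max_delay))
    close_socket()
    return GENERIC_ERROR
```

A random delay adds noise to the timing rather than removing the difference between failure paths; padding every failure path to the same fixed minimum duration is a related option.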
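
In the OPC UA Binary encoding, Strings and arrays are prefixed with an Int32 length, where -1 denotes a null value. The decoder sketch below checks every length against a configured limit and against the bytes actually present before reading, and bounds nesting depth for recursive structures. The limit values and function names are hypothetical.

```python
import struct
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical limits; a real application takes these from its configuration
MAX_STRING_LENGTH = 65535
MAX_ARRAY_LENGTH = 10000
MAX_NESTING_DEPTH = 32


class DecodingError(Exception):
    """Raised when a Message violates the encoding rules or a configured limit."""


def read_int32(buf: bytes, pos: int) -> Tuple[int, int]:
    if pos + 4 > len(buf):
        raise DecodingError("truncated Int32")
    return struct.unpack_from("<i", buf, pos)[0], pos + 4


def read_string(buf: bytes, pos: int) -> Tuple[Optional[str], int]:
    """Decode an Int32 length-prefixed UTF-8 String; -1 encodes a null String."""
    length, pos = read_int32(buf, pos)
    if length == -1:
        return None, pos
    if length < 0 or length > MAX_STRING_LENGTH:
        raise DecodingError(f"string length {length} out of range")
    if pos + length > len(buf):
        raise DecodingError("string extends past end of Message")
    try:
        return buf[pos:pos + length].decode("utf-8"), pos + length
    except UnicodeDecodeError as exc:
        raise DecodingError("string is not valid UTF-8") from exc


def read_array(buf: bytes, pos: int,
               read_element: Callable[[bytes, int], Tuple[Any, int]],
               depth: int = 0) -> Tuple[Optional[List[Any]], int]:
    """Decode an Int32 length-prefixed array, enforcing length and nesting limits."""
    if depth > MAX_NESTING_DEPTH:
        raise DecodingError("nesting depth exceeded")
    count, pos = read_int32(buf, pos)
    if count == -1:
        return None, pos
    if count < 0 or count > MAX_ARRAY_LENGTH:
        raise DecodingError(f"array length {count} out of range")
    items = []
    for _ in range(count):
        item, pos = read_element(buf, pos)
        items.append(item)
    return items, pos
```

A nested array would pass `depth + 1` when its element reader calls `read_array` again, so a deeply nested Message is rejected before it exhausts the stack.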

6.4 Random number generation

Random numbers that meet security needs can be generated by suitable functions provided by cryptography libraries. Common random functions, such as rand() from the C standard library, do not generate enough entropy. Instead, implementers could use the random number generators provided by the Microsoft Windows Crypto library (WinCrypt library) or by OpenSSL. Even the random functions provided by cryptography libraries require a source of entropy for initialization, and the required entropy is not always available on embedded devices. PCs can use several individual pieces of information (hardware IDs such as the CPU, MAC addresses, USB devices, screen resolution, installed software, etc.) to generate entropy, but embedded devices are often built completely identically, so frequently only the time and possibly a MAC address are left as sources of entropy. These sources can be guessed or discovered, which makes embedded devices very vulnerable.

A common mistake is to generate cryptographic keys during the first boot, when even the time information is predictable (the creation time is stored, e.g., in a certificate). Some alternative solutions a vendor could consider:
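
In Python, the distinction drawn above between rand() and a cryptography library corresponds to the `random` and `secrets` modules. The sketch below is illustrative; the 32-byte nonce length is an assumption, not a value taken from an OPC UA SecurityPolicy.

```python
import random
import secrets

# random uses the Mersenne Twister: fast and reproducible, but its future output
# can be predicted from observed values, so it must not be used for keys or nonces.
simulation_value = random.random()

# secrets draws from the operating system's CSPRNG (os.urandom), which plays the
# same role as the generators in WinCrypt or OpenSSL.
nonce = secrets.token_bytes(32)        # assumed nonce length
session_token = secrets.token_hex(16)  # 16 random bytes as 32 hex characters
```

The OS generator still depends on the entropy it was seeded with, so this does not by itself solve the embedded-device problem described above.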

Add specific entropy generator hardware when designing embedded devices.

Do not generate certificates on embedded devices. Use an external tool or the GDS to generate the certificate and load it onto the device. A problem could still remain for the SymmetricKeys, which are still generated on the device, although normally not during the boot phase but when a Client connects.

Wait until enough entropy is available. Some operating systems indicate when this point has been reached.
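
On Linux, one such indication is the getrandom() system call: with the GRND_NONBLOCK flag it fails instead of blocking while the kernel CSPRNG has not yet been initialized. The sketch below, which is Linux-only and uses hypothetical function names, polls that condition before key material is generated.

```python
import os
import time
from typing import Optional


def entropy_pool_ready() -> bool:
    """Return True once the Linux kernel CSPRNG has been initialized."""
    try:
        # GRND_NONBLOCK makes getrandom() raise instead of blocking while the
        # kernel pool has not yet been seeded.
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


def wait_for_entropy(timeout: Optional[float] = None, poll_interval: float = 0.5) -> bool:
    """Poll until the pool is ready; return False if the timeout expires first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not entropy_pool_ready():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

A key-generation routine could call `wait_for_entropy()` first; calling `os.getrandom()` without the flag simply blocks until the same point is reached.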

For embedded systems without a good entropy source, it is helpful to store the cryptographic pseudo-random number generator (CPRNG) state, so that it will not produce the same random numbers after every boot.
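
On Linux, a common way to approximate this is a seed file, similar in spirit to the random-seed handling of common init systems: at boot the saved seed is mixed into the kernel pool and immediately replaced, and at shutdown fresh output is saved again. The file paths and seed size are hypothetical; writing to /dev/urandom mixes the data into the pool without crediting it as entropy.

```python
import os
import secrets

SEED_SIZE = 512  # bytes; hypothetical size


def save_seed(seed_path: str) -> None:
    """Write fresh CSPRNG output to the seed file, readable only by its owner."""
    tmp_path = seed_path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(secrets.token_bytes(SEED_SIZE))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, seed_path)  # atomic swap: a crash never leaves a partial seed


def restore_seed(seed_path: str, pool_path: str = "/dev/urandom") -> bool:
    """At boot: mix the saved seed into the pool, then replace it at once.

    Returns True if a saved seed was found.
    """
    restored = False
    if os.path.exists(seed_path):
        with open(seed_path, "rb") as f:
            seed = f.read()
        with open(pool_path, "wb") as pool:
            pool.write(seed)
        restored = True
    # Replace the seed immediately so the same value is never replayed after a crash
    save_seed(seed_path)
    return restored
```

`restore_seed` would run early during boot and `save_seed` again at orderly shutdown.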

Vendors should ensure that the cryptographic functions they use are initialized with suitable entropy and that generated certificates are not created in a predictable manner.

6.5 Special and reserved packets

The implementation understands and correctly interprets any Message types that are reserved as special (such as broadcast and multicast addresses in the IP specification). Failing to understand and interpret these special packets leads to vulnerabilities.
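
One concrete case is refusing a connection or datagram whose source address is a multicast, broadcast, unspecified or reserved address, since no ordinary peer sends from such an address. The helper below is a hypothetical sketch using Python's standard `ipaddress` module; which address classes to reject is an assumption that depends on the transport in use.

```python
import ipaddress
from typing import Optional

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_special_address(addr: str, local_network: Optional[str] = None) -> bool:
    """Return True for source addresses that no ordinary peer should use."""
    ip = ipaddress.ip_address(addr)
    if ip.is_multicast or ip.is_unspecified or ip.is_reserved:
        return True
    if ip == LIMITED_BROADCAST:
        return True
    if local_network is not None:
        net = ipaddress.ip_network(local_network)
        # Directed broadcast address of the local IPv4 subnet
        if net.version == 4 and ip == net.broadcast_address:
            return True
    return False
```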

6.6 Rate limiting and flow control

OPC UA does not provide rate control mechanisms; however, an implementation can incorporate rate control.
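
A common rate-control technique is a token bucket per Client. The sketch below is not part of OPC UA; the client identifier (for example a SecureChannel id or the Client's ApplicationUri) and the rate and burst values are assumptions.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond the burst size
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerClientRateLimiter:
    """One bucket per Client so that a single Client cannot starve the others."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

A Server would call `allow()` before dispatching each request and reject or delay the request when it returns False; in practice, idle buckets would also be evicted so the dictionary cannot grow without bound.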

6.7 Administrative access

OPC UA describes that certain functionality, such as the management of CertificateStores, should be restricted to administrators. This multi-part standard does not describe the details associated with administrative access. The nature of administrative access varies from platform to platform. Some platforms only have a single administrator. Other platforms provide multiple levels of administrative access such as backup administrator, network administrator, configuration administrator etc. The deployment site should make appropriate selections for administrator access and the implementer should allow for the configuration of appropriate administrator account access.

Administrative AccessRestrictions apply to items such as the configuration files for Servers and Clients. For example, configuration files could contain paths to certificate stores or the definitions of exposed endpoints, both of which could cause major issues if changed.

Administrative AccessRestrictions should also be used to control Audit Events (see 4.14 for additional details).
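
On POSIX platforms, one check an application could run at startup is whether its configuration file can be modified by accounts other than administrators. The sketch below is hypothetical: the policy (no group or world write access, owner among a given set of administrator uids) is an assumption, and platforms with richer administrative models would use their own access-control mechanisms instead.

```python
import os
import stat
from typing import Iterable, List, Optional


def config_file_issues(path: str, admin_uids: Optional[Iterable[int]] = None) -> List[str]:
    """Report ways in which a configuration file could be changed by non-administrators."""
    st = os.stat(path)
    issues = []
    if st.st_mode & stat.S_IWOTH:
        issues.append("writable by all users")
    if st.st_mode & stat.S_IWGRP:
        issues.append("writable by its group")
    if admin_uids is not None and st.st_uid not in set(admin_uids):
        issues.append(f"owned by non-administrator uid {st.st_uid}")
    return issues
```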

6.8 Cryptographic Keys

Security Profiles listed in Part 7 describe the required algorithms and key lengths. Key length requirements are often specified as a set, e.g., 2048, 3072 and 4096 bits. It is important that an OPC UA Application supports the entire set of values for its ApplicationInstanceCertificate. This allows an end user to generate a key (ApplicationInstanceCertificate) that meets their security requirements, which often extends the period of time for which the given Security Profile can be used. For example, a key length of 2048 bits can already be considered insecure, but if an end user generates certificates at the high end of the set (4096 bits), the application could still be considered secure (depending on the other algorithms).

6.9 Alarm related guidance

OPC UA supports a robust Alarm and Condition information model, which includes the ability to disable alarms, shelve alarms and generally manage alarms. Alarm processing and management is an important part of maintaining efficient control of a plant. From a security point of view, it is important that this avenue be adequately protected, to ensure that a rogue agent does not create a dangerous situation or cause financial harm. OPC UA provides the tools required for this protection, but it is up to the implementer to ensure that they are exercised correctly. All functions that allow changes to the running environment are able to generate Audit Events and are to be restricted to appropriate users.

The disabling of Alarms is one such function that should be restricted to personnel with appropriate access rights. Furthermore, any action that disables an alarm, whether it be initiated by personnel or some automated system, should generate an Audit Event indicating the action.

The shelving of alarms should follow guidelines similar to those for disabling alarms with regard to access and Auditing, although shelving is often available to a wider range of users (operators, engineers). Also, the implementer should ensure that appropriate timeouts are configured for Alarm Shelving. These timeouts should ensure that an Alarm cannot be shelved for a period of time that could cause safety concerns.
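
The disabling and shelving guidance above can be sketched as role checks that audit every attempt, accepted or rejected, plus an upper bound on the shelving time in the style of the Alarm model's MaxTimeShelved property. The role names, the limit, the audit record layout and the use of seconds are hypothetical; the status code names are assumed to match those used by the specification.

```python
import time
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical role names and limit for illustration only
DISABLE_ROLES = {"Engineer"}
SHELVE_ROLES = {"Operator", "Engineer"}
MAX_TIME_SHELVED = 8 * 3600.0  # seconds; chosen by the site's safety analysis


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def emit(self, user: str, action: str, alarm: str, ok: bool) -> None:
        # Every attempt is recorded, including rejected ones
        self.entries.append({"time": time.time(), "user": user,
                             "action": action, "alarm": alarm, "ok": ok})


def disable_alarm(user: str, roles: Set[str], alarm: str, audit: AuditLog) -> bool:
    allowed = bool(roles & DISABLE_ROLES)
    audit.emit(user, "Disable", alarm, allowed)
    return allowed


def timed_shelve(user: str, roles: Set[str], alarm: str,
                 shelving_time: float, audit: AuditLog) -> str:
    if not roles & SHELVE_ROLES:
        audit.emit(user, "TimedShelve", alarm, False)
        return "Bad_UserAccessDenied"
    if shelving_time <= 0 or shelving_time > MAX_TIME_SHELVED:
        audit.emit(user, "TimedShelve", alarm, False)
        return "Bad_ShelvingTimeOutOfRange"
    audit.emit(user, "TimedShelve", alarm, True)
    return "Good"
```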

Dialog Events could also be used to overload a Client. It is a best practice for Servers that support dialogs to restrict the number of concurrent dialogs that can be active. Also, dialogs should include a timeout period to ensure that they are not used to create a denial of service (DoS). Client implementers should also ensure that dialog processing cannot be used to overwhelm an operator: the maximum number of open dialogs should be restricted, and it should be possible to ignore dialogs (i.e., other processing should still be available).
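
On the Server side, the two protections above can be sketched as a bounded set of open dialogs whose entries expire after a timeout. The class, the limit and the timeout value are hypothetical; a real Server would also cancel an expired dialog rather than silently forgetting it.

```python
import time
from typing import Callable, Dict

# Hypothetical limits
MAX_ACTIVE_DIALOGS = 5
DIALOG_TIMEOUT = 300.0  # seconds


class DialogManager:
    """Bound the number of open dialogs and expire stale ones."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._active: Dict[str, float] = {}  # dialog id -> time it was raised

    def _expire(self) -> None:
        now = self._clock()
        for dialog_id, raised_at in list(self._active.items()):
            if now - raised_at > DIALOG_TIMEOUT:
                del self._active[dialog_id]

    def raise_dialog(self, dialog_id: str) -> bool:
        """Return False instead of raising a new dialog when the limit is reached."""
        self._expire()
        if dialog_id in self._active:
            return True
        if len(self._active) >= MAX_ACTIVE_DIALOGS:
            return False
        self._active[dialog_id] = self._clock()
        return True

    def respond(self, dialog_id: str) -> None:
        self._active.pop(dialog_id, None)

    def active_count(self) -> int:
        self._expire()
        return len(self._active)
```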

6.10 Program access

OPC UA describes functionality that allows programs to be executed as part of the OPC UA Server. These programs can be used to perform advanced control algorithms or other actions. The use of these programs should be restricted to personnel with appropriate access rights. Furthermore, the definition of programs should be carefully monitored. It is recommended that statistics be maintained on the number of defined programs and on their execution frequency, and that this information be available to administrative personnel. An unlimited number of program executions should not be allowed.
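
The recommendations above can be sketched as a guard that only starts defined programs for authorized roles, keeps execution statistics, and caps executions within a sliding time window. The role name, the window and the cap are hypothetical policy values.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set

# Hypothetical policy values
MAX_RUNS_PER_WINDOW = 10
WINDOW_SECONDS = 60.0
PROGRAM_ROLES = {"Engineer"}


class ProgramGuard:
    """Track program statistics and refuse unauthorized or unlimited executions."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self.defined: Set[str] = set()  # number of defined programs = len(defined)
        self.total_runs: Dict[str, int] = defaultdict(int)  # statistics for administrators
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def define(self, program: str) -> None:
        self.defined.add(program)

    def try_start(self, program: str, roles: Set[str]) -> bool:
        if program not in self.defined or not roles & PROGRAM_ROLES:
            return False
        now = self._clock()
        recent = self._recent[program]
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()  # forget executions outside the sliding window
        if len(recent) >= MAX_RUNS_PER_WINDOW:
            return False
        recent.append(now)
        self.total_runs[program] += 1
        return True
```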

6.11 Audit event management

OPC 10000-3 describes the Audit Events that are to be generated and the minimum information that these Audit Events include. However, the specification does not describe how these Audit Events are handled once they are generated. Audit Events can be subscribed to by multiple Audit tracking systems or logging systems. The OPC UA specification does not describe these systems; it is assumed that any number of vendor-provided systems could provide this functionality. As a best practice, whatever system is used to store and manage Audit Events should ensure the following:

that Audit Events are not tampered with once they are received,

that the Subscription for Audit Events is made via a SecureChannel, to ensure that the Audit Events are not tampered with while in transit,

for Clients that log Audit Events, that the logged Audit Events are persisted in such a manner that they can be authenticated and linked to the original transaction.
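
One way to make stored Audit Events tamper-evident is an HMAC chain, in which each record's MAC also covers the previous record's MAC. The sketch below assumes JSON serialization, a key managed separately from the log storage, and illustrative field names (EventId, ClientAuditEntryId) for linking a record to its original transaction.

```python
import hashlib
import hmac
import json
from typing import List, Tuple

GENESIS = b"\x00" * 32  # chain value before the first record


class AuditStore:
    """Append-only store in which each record's MAC covers the previous MAC.

    Altering, removing or reordering any record except at the tail breaks the
    chain; detecting truncation of the tail requires storing the latest MAC
    or the record count elsewhere.
    """

    def __init__(self, key: bytes):
        self._key = key  # hypothetical key, kept separate from the log storage
        self.records: List[Tuple[bytes, bytes]] = []  # (serialized event, mac)

    def append(self, event: dict) -> None:
        data = json.dumps(event, sort_keys=True).encode("utf-8")
        prev = self.records[-1][1] if self.records else GENESIS
        mac = hmac.new(self._key, prev + data, hashlib.sha256).digest()
        self.records.append((data, mac))

    def verify(self) -> bool:
        prev = GENESIS
        for data, mac in self.records:
            expected = hmac.new(self._key, prev + data, hashlib.sha256).digest()
            if not hmac.compare_digest(expected, mac):
                return False
            prev = mac
        return True
```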

An Audit event management system could have additional requirements based on the site CSMS.

6.12 OAuth2, JWT and User roles

OAuth2 defines a standard for AuthorizationServices that produce JSON Web Tokens (JWTs), also known as AccessTokens. These JWTs are passed as an Issued Token to an OPC UA Server, which uses the signature contained in the JWT to validate the token. A JWT can also provide the Server with information regarding the roles associated with the Authenticated user. The enforcement of the roles is the responsibility of the Server. OPC 10000-4, OPC 10000-5, OPC 10000-6 and OPC 10000-18 describe OAuth2 and JWTs in more detail. Sites should ensure that they follow the best practices defined in the site CSMS for OAuth2.
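
The checks a Server performs on a received JWT can be sketched with the standard library as follows: pin the expected algorithm, verify the signature, then check the registered claims iss, aud, exp and nbf before reading any roles. A real deployment would typically verify an asymmetric signature using the AuthorizationService's public key and rely on a maintained JWT library; HS256 is used here only so the sketch stays self-contained, and the "roles" claim name is an assumption.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, List, Optional


class TokenError(Exception):
    """Raised when an AccessToken fails validation."""


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def validate_jwt(token: str, key: bytes, audience: str, issuer: str,
                 now: Optional[float] = None) -> Dict[str, Any]:
    """Check algorithm, signature, issuer, audience and lifetime; return the claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise TokenError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    try:
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    except ValueError as exc:
        raise TokenError("undecodable token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise TokenError("malformed token")
    # Pin the algorithm: never let the token choose it (e.g. "none")
    if header.get("alg") != "HS256":
        raise TokenError("unexpected algorithm")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise TokenError("bad signature")
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise TokenError("unexpected issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise TokenError("token not issued for this Server")
    if not isinstance(claims.get("exp"), (int, float)) or now >= claims["exp"]:
        raise TokenError("token expired or missing exp")
    nbf = claims.get("nbf")
    if nbf is not None and (not isinstance(nbf, (int, float)) or now < nbf):
        raise TokenError("token not yet valid")
    return claims


def roles_from_claims(claims: Dict[str, Any]) -> List[str]:
    # "roles" is an assumed claim name; the Server still enforces the roles itself
    roles = claims.get("roles", [])
    return [r for r in roles if isinstance(r, str)] if isinstance(roles, list) else []
```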

If a GDS is available in the system, it could provide AuthorizationServices as defined in OPC 10000-12.

6.13 HTTPS, TLS & Websockets

HTTPS defines standard transport security. This transport security does not always ensure end-to-end security, because proxy servers or other intermediaries can exist. If end-to-end security is required, additional steps such as using a VPN should be taken.

If TLS communication is supported, the keys used for TLS must be different from the keys used for TCP communication, because reusing the keys introduces security issues. Versions of TLS older than 1.2 have security flaws and should not be enabled. It is recommended to only support the TLS configurations provided in the TransportSecurity Profiles.

SSL has security issues and should be disabled. It is important that it is disabled for all applications on the machine, not just for the OPC UA application.

Websockets is another protocol that is secured using HTTPS. If Websockets are used, all of the security guidelines for HTTPS and TLS should be followed.
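
Within a single application, the minimum protocol version can be enforced when the TLS context is created. The Python sketch below refuses SSL and TLS versions below 1.2 and loads a TLS-only key pair; the file layout is hypothetical, and restricting the cipher suites to those of the TransportSecurity Profiles, as well as disabling SSL machine-wide, would still have to be done separately.

```python
import ssl
from typing import Optional


def make_server_tls_context(certfile: Optional[str] = None,
                            keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that refuses SSL and TLS versions below 1.2."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # A TLS-only key pair, deliberately not the ApplicationInstanceCertificate
        # key used for UA TCP SecureChannels (hypothetical file layout).
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```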

6.14 Reverse Connect

Reverse connect allows a Server to initiate the connection to a Client (the Server opens the socket and sends a ReverseHello message). This results in an additional security concern for the Client, in that the Client needs to validate that the connection comes from an appropriate Server and is not a denial of service attack. The Client follows the process described in the “Client and Server Handshaking during Reverse Connect” table in Part 6, including the checks related to the ServerUri and EndpointUrl.
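
A Client's first filter on an incoming reverse connection can be sketched as: bound the number of pending reverse connections, reject unknown ServerUris, and require the EndpointUrl configured for that Server. The configuration table and names are hypothetical, and these checks do not replace validating the Server's certificate when the SecureChannel is opened.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical Client configuration: the Servers this Client expects to connect back
EXPECTED_SERVERS: Dict[str, str] = {
    "urn:example:plant:server1": "opc.tcp://server1.example:4840",
}
MAX_PENDING_REVERSE_CONNECTIONS = 4


@dataclass
class ReverseHello:
    server_uri: str
    endpoint_url: str


def accept_reverse_hello(msg: ReverseHello, pending: int) -> bool:
    """Decide whether to continue the handshake on an incoming reverse connection."""
    if pending >= MAX_PENDING_REVERSE_CONNECTIONS:
        return False  # bound the work an unknown peer can cause
    expected_url = EXPECTED_SERVERS.get(msg.server_uri)
    if expected_url is None:
        return False  # unknown ServerUri: close the socket
    return msg.endpoint_url == expected_url
```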

6.15 Passwords

One option for user security described in this document is username/password. If usernames and passwords are used, they should follow site-specific rules, and passwords shall be secured both in transit and in storage. Usernames should be changeable. Passwords shall not be hardcoded as part of an application; they shall be manageable by administrative users. Passwords should follow the password complexity and timeout rules of the site CSMS.
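
For the storage side, a common approach is to store only a salted, deliberately slow hash of each password. The sketch below uses PBKDF2-HMAC-SHA256 from Python's standard library; the iteration count and the storage format are assumptions to be aligned with the site CSMS, and the sketch addresses storage only.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; follow the site CSMS and current guidance


def hash_password(password: str) -> str:
    """Return 'iterations$salt$hash' for storage; the password itself is never stored."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))
```

Storing the iteration count with each hash lets administrators raise the work factor later without invalidating existing entries.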

6.16 Additional Security considerations

If an OPC UA Application becomes aware of compromised credentials, which could be application-level or user-level credentials, the application should terminate any connection that uses the compromised credentials. Compromised credentials could be identified via a GDS or another global service, or they could be detected by some out-of-band process.
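
Finding the connections to terminate can be sketched as matching each active Session against both kinds of credential. The data model is hypothetical: revoked certificate thumbprints might come from an updated revocation list, and compromised user names from an identity management system.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ActiveSession:
    session_id: str
    client_cert_thumbprint: str  # application-level credential
    user_name: str               # user-level credential


def sessions_to_terminate(sessions: List[ActiveSession],
                          revoked_thumbprints: Set[str],
                          compromised_users: Set[str]) -> List[str]:
    """Return ids of Sessions that use any credential now known to be compromised."""
    return [s.session_id for s in sessions
            if s.client_cert_thumbprint in revoked_thumbprints
            or s.user_name in compromised_users]
```

The Server would then close each listed Session together with its SecureChannel.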

6.17 Least privilege principle

When a Client connects to a Server, the Client should be granted the minimum privileges that it requires to function. In OPC UA, a Client can request additional privileges by changing the UserIdentityToken (see ActivateSession in OPC 10000-4). This can even be done for only a short period of time. Roles such as SecurityAdmin or ConfigureAdmin should not be granted to a user except when the user is actively performing duties associated with that Role.
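
Temporary elevation on the Client side can be sketched as a context manager that switches to an administrative identity and always switches back, even if the administrative task fails. `SessionLike.activate` is a hypothetical wrapper around an SDK's ActivateSession call; identities are plain strings here, whereas a real Client would pass UserIdentityTokens.

```python
import contextlib
from typing import Iterator, Protocol


class SessionLike(Protocol):
    """Hypothetical wrapper around a Client SDK's ActivateSession call."""

    def activate(self, identity: str) -> None: ...


@contextlib.contextmanager
def elevated(session: SessionLike, normal_identity: str, admin_identity: str) -> Iterator[None]:
    """Switch to an administrative identity only for the duration of the block."""
    session.activate(admin_identity)
    try:
        yield
    finally:
        # Drop back to least privilege even if the administrative task fails
        session.activate(normal_identity)
```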

6.18 Zero trust environments

The concept of zero trust describes an environment where the network is not trusted and all applications and the communication between them need to be approved (i.e., Authenticated and Authorized). Zero trust environments do not rely on perimeter defences. Many of the key concepts of zero trust correspond to key concepts described in this document. For a more complete overview of the core principles of zero trust, see ZeroTrustCore.

OPC UA, with its built-in security capabilities, is a very good fit for a zero trust environment. The capability to assign permissions down to individual Nodes, the ability to provide both application-level and user-level authentication, and support for central management of Authorization and Authentication (GDS functionality) are all concepts desired in a zero trust environment. Another key tenet of a zero trust architecture is the concept of least privilege, which can easily be applied using OPC UA.

Key concepts of a zero trust network are that the network is not trusted and that devices on the network are not trusted.

A key point is that information flowing between the enterprise network and non-enterprise networks needs to have consistent security policies. Furthermore, a zero trust architecture should have additional safeguards in place, such as diagnostics and monitoring systems, network logging, access policies, a PKI and user identification systems. For additional details on the architecture of a zero trust network, see ZeroTrustArchitecture.

OPC UA is designed to operate in a multi-vendor environment, where devices from many vendors (not all of which would be trusted) could be operating. The hardware and software on these devices could be owned by the enterprise or by others. OPC UA is designed to assign trust as needed, without inherently trusting any device. Having standardized security policies and settings (as defined in OPC Security Policies) provides a consistent security policy and posture.

In a zero trust architecture, OPC UA Auditing would be required as an integral part of a continuous diagnostics system. The individual privileges and roles that are available in OPC UA can be part of the data access policies. The support for a GDS in all Servers and Clients allows an enterprise PKI system to be deployed, and the GDS can be linked to identity management systems.

The key point is that even though OPC UA is not a complete zero trust environment, it provides many of the required aspects of a zero trust environment.

6.19 Diagnostic related issues

Diagnostics are an important tool in troubleshooting problems in a Server, Client or system, but it is important that security sensitive information not be provided as part of diagnostic information. Security information shall only be available to security Administrators. Providing security related information via diagnostics to non-security personnel can provide information that can be used to compromise a system.

In addition, diagnostics can provide trace information describing the overall structure of the Server. This type of diagnostic information shall only be provided to Authenticated Clients.
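
Both rules can be sketched as a filter applied before diagnostics are returned: nothing for unauthenticated callers, and security-sensitive fields only for callers holding the SecurityAdmin Role. The set of sensitive field names is a hypothetical example, and withholding all diagnostics from unauthenticated callers is a simplifying assumption.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical names of diagnostic fields considered security sensitive
SECURITY_FIELDS = {"SecurityPolicyUri", "ClientCertificate", "AuthenticationFailures"}


def diagnostics_view(diag: Dict[str, Any], authenticated: bool,
                     roles: Set[str]) -> Optional[Dict[str, Any]]:
    """Return the subset of diagnostics the caller may see."""
    if not authenticated:
        return None  # trace and structure information only for Authenticated Clients
    if "SecurityAdmin" in roles:
        return dict(diag)
    return {k: v for k, v in diag.items() if k not in SECURITY_FIELDS}
```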

6.20 Changing Users in OPC UA

OPC UA via the ActivateSession Service allows a Client to change the user that is involved with the Session. This Service can have security related implications.

Developers have to ensure that, when the user context changes, all existing activities switch to the new context. Furthermore, in multi-threaded environments, when a Server receives an ActivateSession request, it should stop processing new Service calls until it has completed the user change. For Services like Read or Browse, the Server needs to ensure that any Service calls that were issued under the old user context are completed using that context and that the new context is only applied to Service calls issued after the user context change. For the Publish Service (part of the Subscription Service Set), it is important that security checks are applied to all MonitoredItems if the user context has changed (as described in OPC 10000-4), which could result in a MonitoredItem returning Bad_AccessDenied.
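
The ordering rules above can be sketched as follows: each Service call captures the user context when it is issued, new calls wait while a user change holds a lock, and the change re-checks every MonitoredItem against the new user's rights. The class, the permission table and the NodeIds are hypothetical; a reader-writer lock would scale better than the plain lock used here.

```python
import threading
from typing import Dict, Set


class Session:
    """Hypothetical Server-side Session sketch, not an OPC UA SDK API."""

    def __init__(self, user: str, readable: Dict[str, Set[str]]):
        self._user = user
        self._readable = readable                  # user -> readable NodeIds (illustrative)
        self._switch = threading.Lock()            # held while a user change runs
        self.monitored_items: Dict[str, str] = {}  # NodeId -> status code

    def begin_call(self) -> str:
        """Capture the user context when a Service call is issued.

        Blocks while a user change is in progress, so new calls wait for it.
        """
        with self._switch:
            return self._user

    def read(self, node_id: str, caller: str) -> str:
        # Evaluated with the context captured at issue time, even if the
        # Session's user changed while this call was in flight
        if node_id in self._readable.get(caller, set()):
            return "Good"
        return "Bad_UserAccessDenied"

    def activate_session(self, new_user: str) -> None:
        with self._switch:
            self._user = new_user
            # Re-check every MonitoredItem against the new user's rights
            allowed = self._readable.get(new_user, set())
            for node_id in self.monitored_items:
                self.monitored_items[node_id] = "Good" if node_id in allowed else "Bad_AccessDenied"
```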