Clause 6 provides guidance to vendors that implement OPC UA Applications. Since many of the countermeasures required to address the threats described above fall outside the scope of the OPC UA specification, the advice in Clause 6 suggests how some of those countermeasures should be provided.
For each of the following areas, Clause 6 defines the problem space, identifies consequences if appropriate countermeasures are not implemented and recommends best practices.
A timeout is the time that an implementation waits, usually for an event such as Message arrival. Timeouts play a significant role in the security of an implementation. Potential consequences of poorly chosen timeouts include:
- Denial of service: If the timeouts are very large, denial of service conditions could exist when a Client does not reset a Session.
- Resource consumption: When a Client is idle for long periods of time, the Server keeps the Client’s buffered Message or information for that period, leading to resource exhaustion.
The implementer should use reasonable timeouts for each connection stage.
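As a minimal sketch of per-stage timeouts, assuming a plain TCP transport: the stage names and durations in `STAGE_TIMEOUTS` and the helper `recv_exact` are hypothetical and are not taken from the OPC UA specification.

```python
import socket
import time

# Hypothetical per-stage limits in seconds; real values depend on the deployment.
STAGE_TIMEOUTS = {"hello": 5.0, "open_secure_channel": 10.0, "session_idle": 60.0}

def recv_exact(sock: socket.socket, size: int, stage: str) -> bytes:
    """Read exactly `size` bytes, or raise once the stage deadline has passed."""
    deadline = time.monotonic() + STAGE_TIMEOUTS[stage]
    data = bytearray()
    while len(data) < size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"{stage} stage timed out")
        # Bound each read by the time left in the stage, not by a fresh timeout.
        sock.settimeout(remaining)
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data.extend(chunk)
    return bytes(data)
```

The sketch uses one deadline per stage rather than a timeout per read, so that a peer cannot hold the connection open indefinitely by sending one byte at a time.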
The specifications often specify the format of the correct Messages and are silent on what the implementation should do for Messages that deviate from the specification. Typically, the implementations continue to parse such packets, leading to vulnerabilities.
- The implementer should do strict checking of the Message format and should either drop the packets or send an error Message as described below.
- Error handling uses the error code defined in OPC 10000-4 that most precisely fits the condition, and only when returning an error code is appropriate. Error codes can be used as an attack vector; thus, their use shall be limited as described in the Service Behaviours clause of OPC 10000-4. Once the SecureChannel has been established, appropriate specific error codes are returned.
- Another attack vector that can be used is timing variations; this is minimized by the description in Part 4 that requires the closing of the socket for any errors when establishing a SecureChannel. Vendors should be careful in their implementation to ensure that all paths that result in the closure of the socket do not provide a timing hint indicating which failure path was encountered. This can be accomplished by having a random delay before closing the socket or before returning a generic error code.
- Limits on array lengths, string lengths and recursion depth should be strictly enforced while Messages are processed.
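The length and depth checks above can be sketched as follows. This is an illustration only: it assumes a little-endian signed 32-bit length prefix followed by UTF-8 bytes, with -1 marking a null string, and the limits `MAX_STRING_LENGTH` and `MAX_NESTING_DEPTH` as well as all function names are hypothetical.

```python
import struct

MAX_STRING_LENGTH = 4096   # hypothetical limit, configured per deployment
MAX_NESTING_DEPTH = 32     # hypothetical limit for nested structures

class DecodingError(ValueError):
    """Raised for any Message that deviates from the expected format."""

def decode_string(buf: bytes, offset: int = 0):
    """Return (text or None, new offset); reject anything out of bounds."""
    if len(buf) - offset < 4:
        raise DecodingError("truncated length prefix")
    (length,) = struct.unpack_from("<i", buf, offset)
    offset += 4
    if length == -1:
        return None, offset
    # Check the declared length before slicing or allocating anything.
    if length < 0 or length > MAX_STRING_LENGTH:
        raise DecodingError("string length out of range")
    if len(buf) - offset < length:
        raise DecodingError("declared length exceeds available bytes")
    try:
        text = buf[offset:offset + length].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise DecodingError("invalid UTF-8") from exc
    return text, offset + length

def enter_nested(depth: int) -> int:
    """Call when descending into a nested value; returns the new depth."""
    if depth >= MAX_NESTING_DEPTH:
        raise DecodingError("nesting too deep")
    return depth + 1
```

Every failure raises the same exception type, which lets the caller drop the Message or map the failure to a single generic error without revealing which check failed.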
Random numbers that meet security needs can be generated by suitable functions that are provided by cryptography libraries. Common random functions, such as rand() provided by the “C” standard library, do not generate enough entropy. As an alternative, implementers could use the random number generators provided by the Microsoft Windows Crypto library (WinCrypt library) or by OpenSSL. Even the random functions provided in cryptography libraries require a source of entropy to initialize, and the required entropy is not always available on embedded devices. PCs can use several individual pieces of information (hardware ids like CPU, MAC addresses, USB devices, screen resolution, installed software ...) to generate entropy, but embedded devices are built completely identically. Often only the time and possibly a MAC address are left as sources of entropy. These sources of entropy can be guessed or discovered. This makes embedded devices very vulnerable.
A common mistake is to generate cryptographic keys during the first boot, when even the time information is predictable (the creation time is stored, e.g., in a certificate). Some alternative solutions a vendor could consider:
- Add specific entropy generator hardware when designing embedded devices.
- Do not generate certificates on embedded devices. Use an external tool or the GDS to generate the certificate and load it onto the device. A problem could still remain for the SymmetricKeys, as these are normally not created directly during the boot phase; rather they are created when a client connects.
- Wait long enough until enough entropy information is available. Some operating systems provide hints when they have reached this point.
- For embedded systems without a good entropy source, it is helpful to store the cryptographic pseudo-random number generator (CPRNG) state, so that it will not produce the same random numbers after every boot.
Vendors should ensure that the cryptographic functions they use are initialized with suitable entropy and that the generated certificates are not created in a predictable manner.
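As an illustration in Python, where the `secrets` module draws from the operating system's cryptographic random source and the `random` module is not intended for security purposes, a nonce could be generated as sketched below. The function name `new_nonce` and the 32-byte minimum are assumptions made for this example.

```python
import secrets

MIN_NONCE_LENGTH = 32  # assumption for this sketch, not a value from the specification

def new_nonce(length: int = MIN_NONCE_LENGTH) -> bytes:
    """Return `length` random bytes from the operating system's CSPRNG."""
    if length < MIN_NONCE_LENGTH:
        raise ValueError("nonce too short")
    return secrets.token_bytes(length)
```

This only moves the problem to the operating system: on a device without a good entropy source, the quality of the result still depends on the measures listed above.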
The implementation understands and correctly interprets any Message types that are reserved as special (such as broadcast and multicast addresses in IP specification). Failing to understand and interpret those special packets leads to vulnerabilities.
OPC UA does not provide rate control mechanisms; however, an implementation can incorporate rate control.
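One common way to add rate control is a token bucket per Client or per connection. The sketch below is a generic illustration, not an OPC UA mechanism; the class name, the parameters and the injectable `clock` are assumptions.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits in the budget."""
        now = self._clock()
        # Refill according to elapsed time, never beyond the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A Server could keep one bucket per Session or per remote address and reject or delay a request when `allow()` returns False.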
OPC UA describes that certain functionality, such as the management of CertificateStores, should be restricted to administrators. This Multi-part standard does not describe the details associated with administrative access. The nature of administrative access varies from platform to platform. Some platforms only have a single administrator. Other platforms provide multiple levels of administrative access such as backup administrator, network administrator, configuration administrator etc. The deployment site should make appropriate selections for administrator access and the implementer should allow for the configuration of appropriate administrator account access.
Administrative access restrictions include items such as configuration files for Servers and Clients. For example, configuration files could contain paths to certificate stores or exposed endpoints, both of which, if changed, could cause major issues.
Administrative access should also be used to control Audit Events, see 4.14 for additional details.
Security Profiles listed in Part 7 describe required algorithms and required key lengths. Key length requirements are often specified as a range, e.g., 1024-2048. It is important that an OPC UA Application supports the entire range for its ApplicationInstanceCertificate. This allows an end user to generate a key (ApplicationInstanceCertificate) that meets their security requirements. This often extends the period of time for which the given Security Profile can be used. For example, key lengths less than 2048 are already considered insecure, but if an end user generates certificates for the high end of the range (2048), the application could still be considered secure (depending on the other algorithms).
OPC UA supports a robust Alarm and Condition information model which includes the ability to disable alarms, shelve alarms, and to generally manage alarms. Alarm processing and management is an important part of maintaining efficient control of a plant. From a security point of view it is important that this avenue be adequately protected, to ensure that a rogue agent does not create a dangerous or financially damaging situation. OPC UA provides the tools required for this protection, but the implementer needs to ensure that they are exercised correctly. All functions that allow changes to the running environment are able to generate Audit Events and are to be restricted to appropriate users.
The disabling of Alarms is one such function that should be restricted to personnel with appropriate access rights. Furthermore, any action that disables an alarm, whether it be initiated by personnel or some automated system, should generate an Audit Event indicating the action.
The shelving of alarms should follow guidelines similar to those for the disabling of alarms with regard to access and Auditing, although it is often available to a wider range of users (operators, engineers). Also, the implementer should ensure that appropriate timeouts are configured for Alarm Shelving. These timeouts should ensure that an Alarm cannot be shelved for a period of time that could cause safety concerns.
Dialog Events could also be used to overload a Client. It is a best practice for Servers that support dialogs to restrict the number of concurrent dialogs that can be active. Also, dialogs should include a timeout period to ensure that they cannot be used to create a denial of service condition. Client implementers should also ensure that dialog processing cannot be used to overwhelm an operator. The maximum number of open dialogs should be restricted, and it should be possible to ignore dialogs (i.e. other processing should still be available).
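A shelving limit of this kind can be enforced with a simple range check when a shelving request is received. In the sketch below, the limit `MAX_SHELVING_SECONDS` and the function name are hypothetical; the actual limit would come from site configuration.

```python
MAX_SHELVING_SECONDS = 8 * 60 * 60  # hypothetical site limit (eight hours)

def validate_shelving_time(requested_seconds: float) -> float:
    """Return the requested duration, or raise if it is outside the permitted range."""
    if not 0 < requested_seconds <= MAX_SHELVING_SECONDS:
        raise ValueError("shelving time outside the permitted range")
    return requested_seconds
```

The sketch rejects an out-of-range request rather than silently shortening it, so that the operator learns that the Alarm will not stay shelved for the requested period.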
OPC UA describes functionality that allows for programs to be executed as part of the OPC UA Server. These programs can be used to perform advanced control algorithms or other actions. The use of these actions should be restricted to personnel with appropriate access rights. Furthermore, the definition of Programs should be carefully monitored. It is recommended that statistics be maintained regarding the number of defined programs in addition to their execution frequency. This information is available to administrative personnel. In no case should an unlimited number of program executions be allowed.
The OPC UA specification describes the Audit Events that are to be generated and the information that these Audit Events include as a minimum; however, the specification does not describe how these Audit Events are handled once they are generated. Audit Events can be subscribed to by multiple Audit tracking systems or logging systems. The OPC UA specification does not describe these systems. It is assumed that any number of vendor provided systems could provide this functionality. As a best practice, whatever system is used to store and manage Audit Events should ensure the following:
- That Audit Events are not tampered with once they are received.
- The Subscription for Audit Events should be via a SecureChannel to ensure they are not tampered with while in transit.
- For Clients that log Audit Events, it is recommended that the logged Audit Events be persisted in such a manner that they can be authenticated and linked to the original transaction.
An Audit event management system could have additional requirements based on the site CSMS.
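One possible way to make stored Audit Events tamper-evident is to chain each record to its predecessor with a keyed hash. The sketch below is an illustration under stated assumptions: the record layout, the function names and the handling of `key` are hypothetical, and the OPC UA specification does not prescribe this scheme.

```python
import hashlib
import hmac
import json

def _tag(key: bytes, prev_tag: str, payload: str) -> str:
    """HMAC over the previous tag plus this record, so records cannot be reordered."""
    return hmac.new(key, (prev_tag + payload).encode("utf-8"), hashlib.sha256).hexdigest()

def append_record(log: list, event: dict, key: bytes) -> None:
    """Serialize the event and append it together with its chained tag."""
    prev_tag = log[-1]["tag"] if log else ""
    payload = json.dumps(event, sort_keys=True)
    log.append({"event": payload, "tag": _tag(key, prev_tag, payload)})

def verify_log(log: list, key: bytes) -> bool:
    """Recompute the chain; any edited, removed or reordered record breaks it."""
    prev_tag = ""
    for record in log:
        if not hmac.compare_digest(_tag(key, prev_tag, record["event"]), record["tag"]):
            return False
        prev_tag = record["tag"]
    return True
```

The chain detects modification, reordering and removal of records before the last one, but not truncation at the end of the log. Detecting truncation would need an additional measure, such as storing the latest tag in a separate protected location.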
OAuth2 defines a standard for Authorization Services that produce JSON Web Tokens (JWT), also known as AccessTokens. These JWTs are passed as an Issued Token to an OPC UA Server, which uses the signature contained in the JWT to validate the token. A JWT can also provide information to the Server regarding the roles associated with the Authenticated user. The enforcement of the roles is the responsibility of the Server. OPC 10000-4, OPC 10000-5, OPC 10000-6 and OPC 10000-18 describe OAuth2 and JWTs in more detail. Sites should ensure that they follow the best practices defined in the site CSMS for OAuth2.
If a GDS is available in the system, it could provide Authorization Services as defined in OPC 10000-12.
HTTPS defines a standard transport security. This transport security does not always ensure end to end security, because proxy servers or other intermediaries can exist. If end to end security is required, then additional steps, such as the use of a VPN, should be taken.
If TLS communication is supported, the keys used for TLS must be different from the keys used for TCP communication. Reusing the keys introduces security issues. Versions of TLS older than 1.2 have security flaws and should not be enabled. It is recommended to only support TLS configurations provided in the TransportSecurity Profiles.
SSL has security issues and should be disabled. It is important that it is disabled for all applications on the machine, not just for the UA application.
WebSockets is just another protocol that is secured using HTTPS. If using WebSockets, all of the security guidelines for HTTPS and TLS should be followed.
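In Python's standard `ssl` module, for example, older protocol versions can be refused by setting a minimum version on the context. The helper name `require_tls12` and the file paths in the comment are hypothetical.

```python
import ssl

def require_tls12(ctx: ssl.SSLContext) -> ssl.SSLContext:
    """Refuse SSL and TLS versions older than 1.2 on this context."""
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

server_ctx = require_tls12(ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER))
# A real Server would now load a key pair that is used for TLS only, for example:
# server_ctx.load_cert_chain("tls-cert.pem", "tls-key.pem")  # hypothetical paths
```

This restricts only the contexts created by this application. Disabling SSL for other applications on the machine is a matter of operating system or library configuration and is not covered by the sketch.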
Reverse connect allows a Server to initiate the connection to a Client (open the socket sending a HEL message). This results in an additional security concern for the Client, in that the Client needs to validate that the connection is from an appropriate Server and not a denial of service attack. The Client follows the process described in Part 6 “Client and Server Handshaking during Reverse Connect” table, including checks related to the ServerUri and EndpointUrl.
This standard describes username/password as one option for user security. If usernames and passwords are used, they should follow site specific rules, and passwords shall be secured both in transit and in storage. Usernames should be able to be changed. Passwords shall not be hardcoded as part of an application. They shall be able to be managed by administrative users. Passwords should follow the password complexity and timeout rules associated with a site CSMS.
If an OPC UA Application becomes aware of compromised credentials, which could be application level or user level credentials, the application should terminate any connection using the compromised credentials. Compromised credentials could be identified via a GDS or other global service, or they could be detected by some out of band process.
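For protection in storage, a common approach is to store only a salted, deliberately slow hash of each password. The sketch below uses PBKDF2-HMAC-SHA256 from the Python standard library; the iteration count, the salt length and the function names are assumptions for illustration, and the actual parameters should follow the site CSMS.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumption for this sketch; choose according to site policy
SALT_LENGTH = 16      # assumption for this sketch

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = secrets.token_bytes(SALT_LENGTH)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and the digest are stored. Because nothing is hardcoded, administrative users can change or reset a password by replacing the stored pair.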
When a Client connects to a Server, the Client should be granted the minimum privileges that it requires to function. In OPC UA a Client can request additional privileges by changing the UserIdentityToken (see Activate Session in OPC 10000-4). This could even be done for a short period of time. Roles such as SecurityAdmin or ConfigureAdmin should not be granted to a user except when the user is actively performing duties associated with that Role.
The concept of zero trust describes an environment where the network is not trusted and all applications, and the communication between them, need to be approved (i.e., Authenticated and Authorized). Zero trust environments do not rely on perimeter defences. Many of the key concepts of zero trust follow key concepts described in this document. For a more complete overview of the core principles in zero trust see ZeroTrustCore.
OPC UA, with its built-in security capabilities, is a very good fit for a zero trust environment. The capability to assign permissions down to individual Nodes, the ability to provide both application level and user level authentication, and support for central management of Authorization and Authentication (GDS functionality), are all concepts desired in a zero trust environment. Another key tenet of a zero trust architecture is the concept of least-privilege, which can easily be applied using OPC UA.
Some key concepts related to a zero trust network are that the network is not trusted and that devices on the network are not trusted.
A key point is that information that is flowing between the enterprise network and a non-enterprise network needs to have consistent security policies. Furthermore, for a zero trust architecture additional safeguards should be in place, such as diagnostics and monitoring systems, network logging, access policies, a PKI infrastructure and User identification systems. For additional details on the architecture of a zero trust network see ZeroTrustArchitecture.
OPC UA is designed to operate in a multi-vendor environment, where devices from many vendors (not all of which would be trusted) could be operating. The hardware and software on these devices could be owned by the enterprise or by others. OPC UA is designed to assign trust as needed, not inherently trusting any device. Having standardized security policies and settings (as defined in OPC 10000-100 OPC Security Policies) provides a consistent security policy and posture.
In a zero trust architecture, OPC UA Auditing would be required as an integral part of a continuous diagnostics system. The individual privileges and roles that are available in OPC UA can be part of the data access policies. The support for a GDS in all Servers and Clients allows an Enterprise PKI system to be deployed. The GDS can be linked to identity management systems.
The key point is that even though OPC UA is not a complete zero trust environment, it provides many of the required aspects of a zero trust environment.
Diagnostics are an important tool in troubleshooting problems in a Server, Client or system, but it is important that security sensitive information not be provided as part of diagnostic information. Security information shall only be available to security Administrators. Providing security related information via diagnostics to non-security personnel can provide information that can be used to compromise a system.
In addition, diagnostics can provide trace information describing the overall structure of the Server. This type of diagnostic shall only be provided to Authenticated Clients.