Clause 6 provides guidance to vendors that implement OPC UA Applications. Since many of the countermeasures required to address the threats described above fall outside the scope of the OPC UA specification, the advice in Clause 6 suggests how some of those countermeasures should be provided.
For each of the following areas, Clause 6 defines the problem space, identifies consequences if appropriate countermeasures are not implemented and recommends best practices.
Timeouts, that is, the time that the implementation waits (usually for an event such as Message arrival), play a very significant role in the security of an implementation. Potential consequences of inappropriate timeouts include:
- Denial of service: If timeouts are very large, a denial of service condition may arise when a Client does not reset a Session.
- Resource consumption: When a Client is idle for long periods of time, the Server keeps the Client’s buffered Messages or other information for that period, which can lead to resource exhaustion.
The implementer should use reasonable timeouts for each connection stage and should release the associated resources when a timeout expires.
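As an illustration, the Python sketch below tracks a separate deadline for each connection stage and discards connections that exceed it. The stage names and timeout values are hypothetical placeholders, not values taken from the specification; appropriate values depend on the deployment. The clock is injectable so the behaviour can be tested without waiting.

```python
import time

# Hypothetical per-stage limits in seconds; appropriate values are deployment-specific.
STAGE_TIMEOUTS = {
    "hello": 5.0,                 # socket open, waiting for the first Hello
    "open_secure_channel": 10.0,  # waiting for OpenSecureChannel
    "activate_session": 30.0,     # waiting for CreateSession/ActivateSession
    "idle_session": 300.0,        # established Session with no activity
}


class Connection:
    def __init__(self, stage: str = "hello", clock=time.monotonic):
        self.clock = clock
        self.stage = stage
        self.entered_at = clock()

    def advance(self, next_stage: str) -> None:
        """Move to the next stage (or record activity) and restart the stage timer."""
        self.stage = next_stage
        self.entered_at = self.clock()

    def expired(self) -> bool:
        """True if the connection has spent longer than allowed in its current stage."""
        return self.clock() - self.entered_at > STAGE_TIMEOUTS[self.stage]


def drop_expired(connections: list) -> list:
    """Return only the live connections; a real Server would also close each
    expired socket and free its buffered Messages at this point."""
    return [c for c in connections if not c.expired()]
```

A Server would call `drop_expired` periodically, so that a Client that stalls at any stage holds resources for a bounded time only.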
The specifications often define the format of correct Messages but are silent on what an implementation should do with Messages that deviate from the specification. Typically, implementations continue to parse such packets, which can lead to vulnerabilities.
- The implementer should strictly check the Message format and should either drop nonconforming packets or send an error Message as described below.
- Error handling uses the error code defined in OPC 10000-4 that most precisely fits the condition, and only when returning an error code is appropriate. Error codes can be used as an attack vector; thus, their use shall be limited as described in the Service Behaviours clause of Part 4. Once the secure channel has been established, appropriate specific error codes are returned.
- Timing variations can also be used as an attack vector. This risk is reduced by the requirement in Part 4 to close the socket on any error that occurs while establishing a secure channel. Vendors should ensure that no path that results in closing the socket provides a timing hint that reveals which failure path was encountered. One way to accomplish this is to insert a random delay before closing the socket or before returning a generic error code.
- Limits on all array lengths, string lengths and recursion depth should be strictly enforced during processing.
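The checks above can be sketched in Python for a simplified subset of the OPC UA binary encoding, in which a String is an Int32 byte length (-1 for null) followed by UTF-8 bytes and arrays are likewise length-prefixed. The nested element type and its tags, the limit values and the `reject` helper are hypothetical. Every violation raises one exception type, so the caller handles all failures on a single path, and that path waits a random delay before closing the connection.

```python
import secrets
import struct
import time

# Hypothetical limits; real values come from the application's configuration.
MAX_STRING_LENGTH = 4096
MAX_ARRAY_LENGTH = 1024
MAX_DEPTH = 16


class DecodingError(Exception):
    """Any malformed or out-of-bounds input; the Message is dropped."""


class Reader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        # Never read past the end of the buffer.
        if n < 0 or self.pos + n > len(self.data):
            raise DecodingError("truncated message")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def int32(self) -> int:
        return struct.unpack("<i", self.take(4))[0]

    def length(self, limit: int):
        # -1 encodes null; other negative values and oversize lengths are rejected.
        n = self.int32()
        if n == -1:
            return None
        if n < 0 or n > limit:
            raise DecodingError(f"length {n} outside 0..{limit}")
        return n

    def string(self):
        n = self.length(MAX_STRING_LENGTH)
        if n is None:
            return None
        try:
            return self.take(n).decode("utf-8")
        except UnicodeDecodeError as exc:
            raise DecodingError("invalid UTF-8") from exc

    def nested(self, depth: int = 0):
        # Hypothetical recursive type: each element is tag 0 (String) or tag 1 (nested array).
        if depth >= MAX_DEPTH:
            raise DecodingError("maximum nesting depth exceeded")
        count = self.length(MAX_ARRAY_LENGTH)
        if count is None:
            return None
        items = []
        for _ in range(count):
            tag = self.take(1)[0]
            if tag == 0:
                items.append(self.string())
            elif tag == 1:
                items.append(self.nested(depth + 1))
            else:
                raise DecodingError(f"unknown element tag {tag}")
        return items


def decode_message(data: bytes):
    reader = Reader(data)
    value = reader.nested()
    if reader.pos != len(data):
        raise DecodingError("unexpected trailing bytes")
    return value


def reject(close_socket, max_delay_s: float = 0.5) -> None:
    """Close after a random delay so the failure cause is not revealed by timing."""
    time.sleep(secrets.SystemRandom().uniform(0.0, max_delay_s))
    close_socket()
```

Lengths are checked before any bytes are read or any list is grown, so a forged length field cannot trigger a large allocation.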
Random numbers that meet security needs can be generated by suitable functions provided by cryptography libraries. Common random functions, such as rand() from the C standard library, are not suitable because their output is predictable and they do not provide enough entropy. As an alternative, implementers could use the random number generators provided by the Microsoft Windows Crypto library (WinCrypt library) or by OpenSSL. Even the random functions provided in cryptography libraries require a source of entropy for initialization, and the required entropy is not always available on embedded devices. PCs can draw on many individual pieces of information (hardware IDs such as CPU IDs and MAC addresses, attached USB devices, screen resolution, installed software and so on) to generate entropy, but embedded devices of the same type are often built completely identically. Often only the time and possibly a MAC address remain as entropy sources, and these can be guessed or discovered. This makes embedded devices very vulnerable.
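In Python, the counterpart of rand() is the `random` module, which its documentation states should not be used for security purposes; the `secrets` module draws from the most secure source of randomness the operating system provides. The sketch below contrasts the two. The boot timestamp and the 32-byte nonce size are illustrative assumptions, not values from a Security Profile.

```python
import random
import secrets


def predictable_value(boot_time: int) -> int:
    # Anti-pattern: a general-purpose PRNG seeded with a guessable value (e.g. a boot
    # time that also appears in a certificate) always produces the same output.
    return random.Random(boot_time).getrandbits(64)


def secure_nonce(n_bytes: int = 32) -> bytes:
    # secrets uses the operating system's cryptographically secure generator.
    return secrets.token_bytes(n_bytes)
```

Anyone who can guess `boot_time` can reproduce `predictable_value(boot_time)` exactly, which is why the seed, and not only the algorithm, matters.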
A common mistake is to generate cryptographic keys during the first boot. In that case even the time information is predictable, because the creation time is stored, for example, in the certificate. Some alternative solutions a vendor might want to consider:
- Add specific entropy generator hardware when designing embedded devices.
- Do not generate certificates on embedded devices. Use an external tool or the GDS to generate the certificate and load it onto the device. A problem could still remain for the symmetric keys, which must still be generated on the device, although they are normally created when a Client connects rather than directly during the boot phase.
- Wait until sufficient entropy is available. Some operating systems indicate when this point has been reached.
- For embedded systems without a good entropy source it may help to store the cryptographic pseudo-random number generator (CPRNG) state, so that it will not produce the same random numbers after every boot.
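On Linux, one such indication is the `getrandom()` system call: with the `GRND_NONBLOCK` flag it fails instead of blocking while the kernel's random pool is still uninitialized. The Python sketch below is Linux-specific, and its timeout and poll interval are arbitrary assumptions; it defers key generation until that point has been reached and refuses to generate keys otherwise.

```python
import os
import secrets
import time


def entropy_ready() -> bool:
    """True once the kernel reports that its CSPRNG has been seeded (Linux only)."""
    if not hasattr(os, "getrandom"):
        raise RuntimeError("os.getrandom is unavailable; use the platform's own readiness hint")
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


def wait_for_entropy(timeout_s: float = 120.0, poll_s: float = 1.0) -> bool:
    deadline = time.monotonic() + timeout_s
    while not entropy_ready():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def generate_key_material(n_bytes: int = 32) -> bytes:
    if not wait_for_entropy():
        # Failing loudly is preferable to creating predictable keys.
        raise RuntimeError("insufficient entropy; key generation postponed")
    return secrets.token_bytes(n_bytes)
```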
Vendors should ensure that the cryptographic functions they use are initialized with suitable entropy and that generated certificates are not created in a predictable manner.
The implementation should understand and correctly interpret any Message types or addresses that are reserved for special purposes (such as broadcast and multicast addresses in the IP specification). Failing to understand and interpret such special packets may lead to vulnerabilities.
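As a sketch of one such check, the Python code below uses the standard `ipaddress` module to reject traffic whose source address is multicast, unspecified or an IPv4 broadcast address, none of which is a legitimate source for a unicast peer. The list of local networks is a hypothetical configuration input.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_acceptable_source(addr: str, local_networks=()) -> bool:
    """Return False for source addresses that should never originate traffic."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not a valid IP address at all
    if ip.is_multicast or ip.is_unspecified:
        return False
    if ip.version == 4:
        if ip == LIMITED_BROADCAST:
            return False
        for cidr in local_networks:
            net = ipaddress.ip_network(cidr)
            # Subnet-directed broadcast; /31 and /32 networks have no broadcast address.
            if net.version == 4 and net.prefixlen < 31 and ip == net.broadcast_address:
                return False
    return True
```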
OPC UA does not provide rate control mechanisms; however, an implementation can incorporate rate control.
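One common way to add rate control is a token bucket per peer: each request consumes a token, and tokens are replenished at a fixed rate up to a burst capacity. The Python sketch below keys buckets by a hypothetical client identifier (for example, the remote address); the rate and capacity values are assumptions.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per client key; requests over the limit are rejected."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, client_key: str) -> bool:
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_key] = bucket
        return bucket.allow()
```

The bucket table itself should also be bounded (for example, by evicting idle entries); otherwise a flood of distinct client keys becomes a resource-exhaustion vector of its own.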
OPC UA describes that certain functionality, such as the management of CertificateStores, should be restricted to administrators. This Multi-part standard does not describe the details associated with administrative access. The nature of administrative access varies from platform to platform. Some platforms only have a single administrator. Other platforms provide multiple levels of administrative access such as backup administrator, network administrator, configuration administrator etc. The deployment site should make appropriate selections for administrator access and the implementer should allow for the configuration of appropriate administrator account access.
Administrative access restrictions also apply to items such as configuration files for Servers and Clients. For example, configuration files might contain paths to certificate stores or exposed endpoints, either of which could cause major issues if changed.
Administrative access should also be used to control Audit Events; see 4.14 for additional details.
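A deny-by-default mapping from administrative operations to roles is one way to let a deployment site configure administrator access at the granularity it needs, whether the platform has a single administrator or several. The operation and role names below are hypothetical examples, not names defined by OPC UA.

```python
# Hypothetical site configuration: which administrative roles may perform which operations.
ADMIN_PERMISSIONS = {
    "manage_certificate_stores": {"SecurityAdmin"},
    "edit_certificate_store_paths": {"SecurityAdmin", "ConfigurationAdmin"},
    "edit_endpoint_configuration": {"NetworkAdmin", "ConfigurationAdmin"},
    "manage_audit_events": {"AuditAdmin"},
    "backup_configuration": {"BackupAdmin", "ConfigurationAdmin"},
}


def is_authorized(user_roles: set, operation: str) -> bool:
    """Deny by default: unknown operations and users without a matching role are refused."""
    allowed = ADMIN_PERMISSIONS.get(operation, set())
    return bool(allowed & user_roles)
```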
Security Profiles defined in Part 7 describe required algorithms and required key lengths. Key length requirements may be specified as a range, e.g., 1024 to 2048 bits. It is important that an OPC UA Application supports the entire range for its ApplicationInstanceCertificate. This allows an end user to generate a key (ApplicationInstanceCertificate) that meets their security requirements, which may extend the period of time for which the given Security Profile can be used. For example, key lengths of less than 2048 bits are already considered insecure, but if an end user generates certificates at the high end of the range (2048 bits), the application might still be considered secure (depending on the other algorithms).
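The distinction between what a Security Profile allows and what a site accepts can be expressed as a small check. In the Python sketch below, the 1024 to 2048 bit range is the example from the text, and the 2048-bit site minimum is an assumption standing in for the site's own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyLengthRange:
    min_bits: int
    max_bits: int


def check_certificate_key(key_bits: int, profile: KeyLengthRange, site_min_bits: int) -> str:
    """Classify an ApplicationInstanceCertificate key length for a profile and a site policy."""
    if not profile.min_bits <= key_bits <= profile.max_bits:
        return "unsupported"  # outside the Security Profile's range
    if key_bits < site_min_bits:
        return "too-weak"     # allowed by the profile but not by site policy
    return "ok"
```

Because the application accepts the whole range, the site can raise `site_min_bits` over time without changing the application.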
OPC UA supports a robust Alarm and Condition information model which includes the ability to disable alarms, shelve alarms and generally manage alarms. Alarm processing and management is an important part of maintaining efficient control of a plant. From a security point of view, it is important that this avenue be adequately protected, to ensure that a rogue agent cannot create a dangerous situation or cause financial loss. OPC UA provides the tools required for this protection, but the implementer needs to ensure that they are exercised correctly. All functions that allow changes to the running environment should be able to generate Audit Events and should be restricted to appropriate users.
The disabling of Alarms is one such function that should be restricted to personnel with appropriate access rights. Furthermore, any action that disables an alarm, whether initiated by personnel or by an automated system, should generate an Audit Event indicating the action.
The shelving of alarms should follow guidelines similar to those for the disabling of alarms with regard to access and Auditing, although shelving may be available to a wider range of users (operators, engineers). The implementer should also ensure that appropriate timeouts are configured for Alarm Shelving, so that an Alarm cannot be shelved for a period of time that could cause safety concerns.
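A minimal Python sketch of these rules: disabling requires an engineering role, shelving is open to a wider set of roles, every attempted action (successful or not) is recorded as an audit entry, and timed shelving is capped. The role names, the audit record layout and the cap value are assumptions; the cap plays a role similar to the optional MaxTimeShelved Property of the Alarms and Conditions model.

```python
import time

DISABLE_ROLES = {"Engineer"}           # hypothetical role names
SHELVE_ROLES = {"Operator", "Engineer"}
MAX_SHELVE_SECONDS = 3600.0            # hypothetical site safety limit


class AlarmManager:
    def __init__(self, audit_log: list, clock=time.time):
        self.audit_log = audit_log
        self.clock = clock
        self.disabled = set()
        self.shelved_until = {}

    def _audit(self, user, action, alarm_id, success, detail=""):
        # Record who did what, to which alarm, when, and whether it was allowed.
        self.audit_log.append({"time": self.clock(), "user": user, "action": action,
                               "alarm": alarm_id, "success": success, "detail": detail})

    def disable(self, user: str, roles: set, alarm_id: str) -> bool:
        ok = bool(roles & DISABLE_ROLES)
        if ok:
            self.disabled.add(alarm_id)
        self._audit(user, "Disable", alarm_id, ok)
        return ok

    def timed_shelve(self, user: str, roles: set, alarm_id: str, seconds: float) -> bool:
        if not roles & SHELVE_ROLES:
            self._audit(user, "TimedShelve", alarm_id, False, "not authorized")
            return False
        if not 0 < seconds <= MAX_SHELVE_SECONDS:
            self._audit(user, "TimedShelve", alarm_id, False, "shelving time out of range")
            return False
        self.shelved_until[alarm_id] = self.clock() + seconds
        self._audit(user, "TimedShelve", alarm_id, True)
        return True

    def is_shelved(self, alarm_id: str) -> bool:
        # Shelving lapses automatically once the time limit has passed.
        return self.clock() < self.shelved_until.get(alarm_id, 0.0)
```

An automated system that disables an alarm would call `disable` with its own identity, so its actions are audited in the same way as those of personnel.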
Dialog Events could also be used to overload a Client. It is a best practice for Servers that support dialogs to restrict the number of dialogs that can be active concurrently. Dialogs should also include a timeout period to ensure that they cannot be used to create a denial of service (DoS). Client implementers should also ensure that dialog processing cannot be used to overwhelm an operator: the maximum number of open dialogs should be restricted, and it should be possible to ignore dialogs (i.e., other processing should remain available).
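A Server-side Python sketch of the two limits described above: a cap on concurrently active dialogs and a timeout after which an unanswered dialog is closed. The limits, the numeric identifiers and the method names are hypothetical.

```python
import itertools
import time


class DialogManager:
    def __init__(self, max_active: int = 5, timeout_s: float = 300.0, clock=time.monotonic):
        self.max_active = max_active
        self.timeout_s = timeout_s
        self.clock = clock
        self.active = {}                 # dialog id -> (prompt, deadline)
        self._ids = itertools.count(1)

    def expire(self) -> list:
        """Close dialogs whose timeout has passed; return their ids."""
        now = self.clock()
        expired = [d for d, (_, deadline) in self.active.items() if now >= deadline]
        for d in expired:
            del self.active[d]
        return expired

    def open(self, prompt: str):
        """Return a new dialog id, or None if the concurrency limit is reached."""
        self.expire()
        if len(self.active) >= self.max_active:
            return None
        dialog_id = next(self._ids)
        self.active[dialog_id] = (prompt, self.clock() + self.timeout_s)
        return dialog_id

    def respond(self, dialog_id: int) -> bool:
        return self.active.pop(dialog_id, None) is not None
```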
OPC UA describes functionality that allows Programs to be executed as part of the OPC UA Server. These Programs can be used to perform advanced control algorithms or other actions. Their use should be restricted to personnel with appropriate access rights, and the definition of Programs should be carefully monitored. It is recommended that statistics be maintained on the number of defined Programs and on their execution frequency, and that this information be made available to administrative personnel. In no case should an unlimited number of Program executions be allowed.
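The Python sketch below keeps the recommended statistics (the number of defined Programs and per-Program execution counts) and bounds both the number of definitions and the number of concurrent executions. The limits, the role name and the method names are hypothetical, and the OPC UA Program state machine itself is not modelled.

```python
from collections import Counter


class ProgramRegistry:
    def __init__(self, max_programs: int = 20, max_running: int = 4,
                 allowed_roles=frozenset({"ProcessEngineer"})):
        self.max_programs = max_programs
        self.max_running = max_running
        self.allowed_roles = allowed_roles
        self.programs = set()
        self.running = Counter()     # program name -> executions in progress
        self.executions = Counter()  # program name -> total executions started

    def define(self, roles: set, name: str) -> bool:
        if not roles & self.allowed_roles or len(self.programs) >= self.max_programs:
            return False
        self.programs.add(name)
        return True

    def start(self, roles: set, name: str) -> bool:
        if not roles & self.allowed_roles or name not in self.programs:
            return False
        if sum(self.running.values()) >= self.max_running:
            return False  # never allow an unbounded number of executions
        self.running[name] += 1
        self.executions[name] += 1
        return True

    def finish(self, name: str) -> None:
        if self.running[name] > 0:
            self.running[name] -= 1

    def statistics(self) -> dict:
        """Summary intended for administrative personnel."""
        return {"defined": len(self.programs), "running": sum(self.running.values()),
                "executions": dict(self.executions)}
```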
The OPC UA specification describes the Audit Events that are to be generated and the minimum information that these Audit Events include; however, the specification does not describe how these Audit Events are handled once they are generated. Audit Events can be subscribed to by multiple Audit tracking systems or logging systems. The OPC UA specification does not describe these systems; it is assumed that any number of vendor-provided systems could provide this functionality. As a best practice, whatever system is used to store and manage Audit Events should ensure the following:
- Audit Events are not tampered with once they are received.
- The Subscription for Audit Events is made via a Secure Channel, to ensure that the events are not tampered with while in transit.
- For Clients that log Audit Events, it is recommended that the logged Audit Events be persisted in such a manner that they can be authenticated and linked to the original transaction.
An Audit Event management system could have additional requirements based on the site CSMS.
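One way to make stored Audit Events tamper-evident and linked to their original transaction is an HMAC chain: each record carries a transaction identifier and the MAC of the previous record, so modifying, removing or reordering an entry breaks the chain at that point. Truncating the end of the log is only detected if the latest MAC is also kept elsewhere. The Python sketch below assumes a hypothetical secret key held by the audit store and hypothetical field names; it detects tampering but does not by itself prevent it.

```python
import hashlib
import hmac
import json


class AuditStore:
    def __init__(self, key: bytes):
        self._key = key
        self.records = []

    def _mac(self, body: dict) -> str:
        # Canonical JSON so identical content always produces the same MAC.
        data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
        return hmac.new(self._key, data, hashlib.sha256).hexdigest()

    def append(self, event: dict, transaction_id: str) -> None:
        prev = self.records[-1]["mac"] if self.records else ""
        body = {"event": event, "transaction_id": transaction_id, "prev": prev}
        self.records.append({**body, "mac": self._mac(body)})

    def verify(self) -> bool:
        prev = ""
        for rec in self.records:
            body = {k: rec[k] for k in ("event", "transaction_id", "prev")}
            if rec["prev"] != prev or not hmac.compare_digest(rec["mac"], self._mac(body)):
                return False
            prev = rec["mac"]
        return True
```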
OAuth2 defines a standard for Authorization Services that produce JSON Web Tokens (JWTs), which are used as AccessTokens. These JWTs are passed as an Issued Token to an OPC UA Server, which uses the signature contained in the JWT to validate the token. A JWT can also provide information to the Server regarding the roles associated with the authenticated user; the enforcement of the roles is the responsibility of the Server. OPC 10000-4, OPC 10000-5, OPC 10000-6 and OPC 10000-18 describe OAuth2 and JWTs in more detail. Sites should ensure that they follow the best practices for OAuth2 defined in the site CSMS.
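The validation steps a Server performs can be sketched in Python with only the standard library. To stay self-contained, the sketch uses HS256 (an HMAC with a shared key) in place of the asymmetric signatures an Authorization Service would normally use; the expected issuer, audience and the `roles` claim name are hypothetical. Production code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


class TokenError(Exception):
    pass


def b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def validate_jwt(token: str, key: bytes, issuer: str, audience: str, now=None) -> list:
    """Verify the signature and standard claims; return the (hypothetical) roles claim."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        payload = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(sig_b64)
    except ValueError as exc:
        raise TokenError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(payload, dict):
        raise TokenError("malformed token")
    # Pin the algorithm; never accept "none" or an algorithm chosen by the token.
    if header.get("alg") != "HS256":
        raise TokenError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise TokenError("bad signature")
    now = time.time() if now is None else now
    if payload.get("iss") != issuer:
        raise TokenError("wrong issuer")
    aud = payload.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise TokenError("wrong audience")
    exp = payload.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        raise TokenError("expired")
    nbf = payload.get("nbf", 0)
    if not isinstance(nbf, (int, float)) or now < nbf:
        raise TokenError("not yet valid")
    roles = payload.get("roles", [])
    if not isinstance(roles, list):
        raise TokenError("malformed roles claim")
    # Roles are only information; enforcing them remains the Server's job.
    return roles
```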
HTTPS defines standard transport security. This transport security does not always ensure end-to-end security, because proxy servers or other intermediaries may terminate the TLS connection before the message reaches its final destination. If end-to-end security is required, additional steps, such as using a VPN, should be taken.
If TLS communication is supported, the keys used for TLS must be different from the keys used for TCP communication, because reusing the keys introduces security issues. Versions of TLS older than 1.2 have security flaws and should not be enabled. It is recommended to only support the TLS configurations provided in the TransportSecurity Profiles.
SSL has security issues and should be disabled. It is important that it is disabled for all applications on the machine, not just for the UA application.
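With Python's standard `ssl` module, for example, a server-side context can be restricted to TLS 1.2 and later; the context returned by `ssl.create_default_context` already excludes SSLv2 and SSLv3. The certificate and key file paths are hypothetical and, per the rule above, should hold a key pair that is separate from the one used for TCP communication.

```python
import ssl


def make_server_context(certfile=None, keyfile=None) -> ssl.SSLContext:
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse SSL and TLS 1.0/1.1; only TLS 1.2 and newer can be negotiated.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # Hypothetical paths: a dedicated TLS key pair, not the one used for TCP communication.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

This only constrains the UA application's own context; disabling SSL for every other application on the machine remains an operating-system configuration task.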
WebSockets is another protocol that is secured using HTTPS. If WebSockets are used, all of the security guidelines for HTTPS and TLS should be followed.
Reverse connect allows a Server to initiate the connection to a Client (the Server opens the socket and sends a ReverseHello message). This creates an additional security concern for the Client, in that the Client needs to validate that the connection comes from an appropriate Server and is not a denial of service attack. The Client follows the process described in the “Client and Server Handshaking during Reverse Connect” table in Part 6, including the checks related to the ServerUri and EndpointUrl.
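A Client-side Python sketch of two such checks: the ServerUri and EndpointUrl received in the ReverseHello must match a configured, expected pair, and the number of pending reverse connections is bounded so that unsolicited connections cannot exhaust the Client. The allow-list structure, the URIs and the limit are assumptions; the normative sequence is the one in Part 6.

```python
# Hypothetical configuration: the Servers this Client expects to connect back to it.
EXPECTED_SERVERS = {
    "urn:example:plant1:server": {"opc.tcp://plant1.example:4840"},
}
MAX_PENDING = 8


class ReverseConnectListener:
    def __init__(self, expected=EXPECTED_SERVERS, max_pending=MAX_PENDING):
        self.expected = expected
        self.max_pending = max_pending
        self.pending = 0

    def on_reverse_hello(self, server_uri: str, endpoint_url: str) -> bool:
        """True: continue with the Hello. False: close the socket immediately."""
        if self.pending >= self.max_pending:
            return False
        if endpoint_url not in self.expected.get(server_uri, set()):
            return False  # unknown Server or unexpected endpoint
        self.pending += 1
        return True

    def on_handshake_done(self) -> None:
        self.pending = max(0, self.pending - 1)
```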
One option for user security described by this standard is username/password. If usernames and passwords are used, they should follow site-specific rules, and passwords should be secured both in transit and in storage. Usernames should be changeable. Passwords should not be hard-coded in an application; they should be manageable by administrative users. Passwords should follow the password complexity and timeout rules associated with the site CSMS.
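For passwords at rest, a common approach is to store only a salted, deliberately slow hash. The Python sketch below uses PBKDF2-HMAC-SHA256 from the standard `hashlib` module with a per-password random salt and a constant-time comparison. The iteration count and the record format are assumptions and should follow the site CSMS.

```python
import hashlib
import hmac
import secrets

DEFAULT_ITERATIONS = 600_000  # assumption; tune to site policy and hardware


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, record: str) -> bool:
    try:
        scheme, iterations, salt_hex, digest_hex = record.split("$")
        iterations = int(iterations)
        salt, expected = bytes.fromhex(salt_hex), bytes.fromhex(digest_hex)
    except ValueError:
        return False
    if scheme != "pbkdf2_sha256" or iterations < 1:
        return False
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(digest, expected)  # constant-time comparison
```

Storing the iteration count in each record lets administrators raise it later without invalidating existing passwords.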
If an Application becomes aware of compromised credentials, which could be application-level or user-level credentials, the Application should terminate any connection using the compromised credentials. Compromised credentials may be identified via a GDS or another global service, or they may be detected by some out-of-band process.
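A minimal Python sketch of this reaction: the application indexes live Sessions by the credential they were authenticated with, so that a revocation notice (whether from a GDS, another global service or an out-of-band process) closes every affected Session and blocks new ones. The class, the credential identifiers and the `close` callbacks are hypothetical.

```python
from collections import defaultdict


class SessionTable:
    def __init__(self):
        self.by_credential = defaultdict(dict)  # credential id -> {session id: close callback}
        self.revoked = set()

    def add(self, session_id: str, credential_id: str, close) -> bool:
        if credential_id in self.revoked:
            return False  # refuse new Sessions that use a revoked credential
        self.by_credential[credential_id][session_id] = close
        return True

    def remove(self, session_id: str, credential_id: str) -> None:
        self.by_credential.get(credential_id, {}).pop(session_id, None)

    def revoke(self, credential_id: str) -> list:
        """Mark the credential compromised and terminate every Session that uses it."""
        self.revoked.add(credential_id)
        sessions = self.by_credential.pop(credential_id, {})
        for close in sessions.values():
            close()
        return sorted(sessions)
```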