By Joel Langill
Stuxnet experts are jumping out from everywhere saying they know how to mitigate the worm and clear up any issues a system might have.
There seem to be some differences in approach, but one thing is pretty much agreed to by all: No single solution will block an attack like Stuxnet, but a comprehensive solution of countermeasures including process and policy can significantly reduce the negative consequences that result from such an attack.
Knowing this in advance means any mitigation strategy needs to be based on a solid defense-in-depth strategy that utilizes multiple, independent layers of protection. The members of the Cyber Security Forum Initiative (CSFI) Stuxnet Project agree that, while it will always be possible to find flaws in any one solution, it should be increasingly difficult to find and exploit flaws in a comprehensive solution that depends on multiple protective measures.
The concept proposed breaks the situation down into two distinct phases: Prevention and Reaction. The first set of countermeasures should be preventative in nature, and designed to minimize the likelihood that a control system could be infected by such an attack. The second, and equally important, set of countermeasures should be reactive in nature, and designed to minimize any negative consequences to the control system should the system be compromised.
Each of these sets of countermeasures should also possess passive and active components that utilize direct and indirect methods in responding to the event. These countermeasures are then implemented in real-time based on the impact of the attack and the duration of the attack (which correlates with the likelihood of greater damage or negative consequences).
Let us explore this concept more as countermeasures are applied. This list is meant to be used as guidance on possible countermeasures that could be deployed, and should not be interpreted as a list in which all items are required for every installation.
Preventative – Passive
These “preventative” countermeasures have revealed the general lack of an existing strong security policy within control systems environments, and reflect security controls that should be implemented on all systems prior to a Stuxnet-like event. These countermeasures include:
• Effective security policies and procedures. Policies and Procedures are the first step to securing control systems. These policies and procedures then need to be reviewed and updated as part of a continuous improvement program.
• Security Policies should be created that address specific host-to-host and zone-to-zone communication requirements, including protocols, ports, etc. This information is vital and will be used in subsequent countermeasures to identify suspect traffic, and is a basic requirement in complying with ISA-99 standards.
• Implement a Security Awareness Program within the organization. Consistent results require a baseline level of education relating to control systems security among operational and planning staff, and must include regular re-training as risks and technologies change.
• Disabling of USB devices within the more secure control systems “zones” (security “zones” as defined by ISA-99).
• Implementation of Software Restriction Policies (SRP) that prevent the execution of code on remote and removable media (USB, CD/DVD, network shares, etc.). Exceptions can be granted on a limited basis when required to support software maintenance and upgrades. Microsoft introduced this functionality with Windows XP, yet few implement its policies. As long as the control systems world continues to run on Windows 2000 and Windows XP platforms, this will remain a very viable countermeasure.
• Follow the vendor’s recommendations for disabling of all unnecessary services.
• Confirm that any default username/passwords have been removed or modified.
• Confirm that any unnecessary services have been removed and/or isolated based on a need to access.
• Guidelines such as those published in the U.S. Dept. of Homeland Security Control Systems Security Program document “Cyber Security Procurement Language for Control Systems” should be adhered to as much as possible, especially in regard to any hard-coded passwords that may be engineered by the software developer. Role-based Access Control should be considered to minimize risks that could be introduced from hard-coded vendor passwords.
• Utilize active vulnerability scanners on these systems (at minimum during testing or other times of non-production use) to evaluate and document the configuration against known vulnerabilities and predetermined compliance guidelines. The fact that Stuxnet is using MS08-067 shows that (1) vendors may not even be aware of the power of exploiting this vulnerability, or (2) they are assuming that no one will target these systems and that there is no need to apply this patch. This vulnerability is seen today on many systems and is one of the most common vulnerabilities used to exploit control systems.
• Implement a comprehensive Patch Management Program that regularly updates operating systems and installed applications in accordance with vendor guidelines and approval of hotfixes, patches, and relevant security updates. Systems should be audited to ensure that updates are installed on a regular basis.
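The zone-to-zone communication policy described above can be captured as data and checked mechanically. The following is a minimal Python sketch under stated assumptions, not a reference implementation: the zone names, the ports, and the names Flow, ALLOWED_FLOWS, is_allowed and audit are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """One directed conduit between two security zones (hypothetical model)."""
    src_zone: str
    dst_zone: str
    protocol: str
    port: int


# Hypothetical policy: every entry is an explicitly justified conduit.
# Zone names and ports are illustrative assumptions, not taken from ISA-99.
ALLOWED_FLOWS = {
    Flow("supervisory", "control", "tcp", 502),  # e.g. Modbus/TCP polling
    Flow("supervisory", "dmz", "tcp", 443),      # e.g. historian replication
}


def is_allowed(flow: Flow) -> bool:
    """Deny by default: a flow is permitted only if the policy lists it."""
    return flow in ALLOWED_FLOWS


def audit(flows):
    """Return the observed flows that the policy does not permit."""
    return [flow for flow in flows if not is_allowed(flow)]
```

Feeding a list of observed flows to audit() returns only those the policy does not list, which is the same information the later countermeasures use to identify suspect traffic.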
Preventative – Active
• If allowed by the system vendor, integrator and facility owner/operator, all hosts should be installed with applicable host-based firewall, anti-virus and anti-malware applications.
• Host-based intrusion detection applications should be utilized where allowed. Some tests have shown that certain activities of Stuxnet would have triggered HIDS alerts, including DLL injection and rootkit installation attempts.
• Implement non-repudiation methods for accountable logging. All uses of any of the related systems should be identified using a physical device, and every activity should be logged and directly associated with that user. Every activity, from logging in and connecting devices to data and network activities, should be traceable to a specific person in order to provide better forensic ability as well as a deterrent.
• Logging of events should not be performed on the same device which generates the logs but should write identical logs to a separate device for pre- and post-event analysis. This also helps minimize the possibility of an attacker “hiding their tracks” and altering system logs. Security Information and Event Management (SIEM) systems are designed for precisely this purpose and will securely maintain audit trails regardless of attacks against other systems on the control system network.
• “Whitelisting” based security applications should be considered over “blacklisting” or “heuristic” based solutions whenever possible to increase the likelihood that zero-days will be detected.
• Firewall rules should be implemented that “deny by default” all outbound traffic from the control system networks and zones. Justification needs to be given for outbound access, just as it is required for inbound. When outbound traffic must be allowed, it should be between specific hosts for specific services based on business needs.
• Utilize code signing of all critical systems (in addition to whitelisting). Updates and changes should go through a unique, traceable process in which the code is compared to a signature provided out-of-band by the vendor. Unless verified, code should not be allowed onto the system.
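As a rough illustration of the out-of-band verification step, the Python sketch below compares the SHA-256 digest of an update file with a digest the vendor is assumed to have supplied through a separate channel. Note the substitution: this is a digest comparison, which is simpler than true code signing and does not by itself prove who produced the file. The function names sha256_of_file and is_update_trusted are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large update packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_update_trusted(path: str, vendor_digest_hex: str) -> bool:
    """Return True only if the file matches the digest received out-of-band.

    Assumption: vendor_digest_hex arrived through a channel the attacker
    does not control (for example, printed documentation or a phone call).
    """
    actual = sha256_of_file(path).encode("utf-8")
    expected = vendor_digest_hex.strip().lower().encode("utf-8")
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected)
```

In a process like the one described above, a file for which is_update_trusted() returns False would simply never be copied into the control system zone.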
Reactive – Passive
Identification of a threat (or even a “potential” threat) is a very valuable aspect of minimizing negative consequences of an attack. It may not be possible to eliminate completely all threats that exploit zero-days, but it should be a goal to be able to identify suspect activity that could signify an attack and minimize the consequences.
• Security Information and Event Management (SIEM) systems should be installed to automatically analyze and correlate the data that is generated and stored in system logs and event journals throughout the control system network. Without SIEM it is virtually impossible to (1) analyze all this data, and (2) make any logical correlation of the data between various hosts.
• Implement intrusion monitoring systems integrated with SIEM within control systems networks. These systems will evaluate data traffic patterns between system components, creating a baseline of acceptable network use. Use the data obtained from the security policy to map out the allowed data paths that should exist within the system architecture and implement rules or alerts to identify deviations from these patterns and generate notifications via the SIEM.
• Implement “Extrusion Detection” which, when configured and deployed properly, will generate alerts and/or alarms for potential client-side attacks. Rules need to be created that look at “What is LEAVING the system: Who, What, When, Where and How.”
• Implement passive vulnerability scanners (PVS) on the control systems network that can observe any unusual traffic patterns and correlate this against previous patterns, and provide an alert mechanism to signal deviations from normal. PVS can work in conjunction with IDS with all alerts consolidated and reconciled within the SIEM.
• Implement a SCADA honeypot or identical system on the same network which can be periodically scanned and tested using more “aggressive” methods without impacting normal operation or production. Set up test and validation systems that mimic the production systems (at least for all the critical components), and implement a recurring comparison process between the production and test systems. Financial institutions have used this approach for several years, and it should be considered for critical infrastructure protection as well.
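To make the idea of correlating log data between hosts more concrete, here is a small Python sketch of one possible rule: raise an alert when the same event signature appears on several distinct hosts within a short time window. It is only an illustration of the concept and not how any particular SIEM product works. The Event fields, the correlate function, the signature strings and the default thresholds are all assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds since some common epoch
    host: str
    signature: str    # normalized event type, e.g. "new_driver_loaded"


def correlate(events, window_seconds=300.0, min_hosts=2):
    """Map each signature to the hosts that reported it close together.

    A signature is included when at least min_hosts distinct hosts logged
    it within any span of window_seconds.
    """
    by_signature = defaultdict(list)
    for event in events:
        by_signature[event.signature].append(event)

    alerts = {}
    for signature, group in by_signature.items():
        group.sort(key=lambda e: e.timestamp)
        start = 0
        for end in range(len(group)):
            # Slide the window start forward until it spans <= window_seconds.
            while group[end].timestamp - group[start].timestamp > window_seconds:
                start += 1
            hosts = {e.host for e in group[start:end + 1]}
            if len(hosts) >= min_hosts:
                alerts[signature] = sorted(hosts)
                break
    return alerts
```

A single host loading an unexpected driver may be noise; the same event on an HMI and an engineering workstation a few minutes apart is the kind of pattern that is hard to spot without automated correlation.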
Reactive – Active
At this point in the attack, it may be possible to use the knowledge of the attack to stop it. In any case, it is very important to securely maintain forensic data that can be used for active mitigation and/or post-event analysis. Most of these security measures are used for forensic purposes in learning what failed and what can be done to prevent a similar attack in the future.
• When an attack is identified, operational staff or automated procedures should use the knowledge of the attack to possibly isolate affected hosts, segments or networks until they can be trusted and placed back into service. Switches, routers, firewalls and the hosts themselves can be used to isolate an attack in progress.
• Once the attack has been confirmed, all non-essential communication conduits should be filtered and closely monitored to contain the attack while not negatively impacting plant operation.
• Reactive plans should be in place, tested on a recurring basis, and updated in order to be effective in the event of an attack.
• Incident Response and Business Continuity procedures should be in place and initiated to maintain or re-establish essential operations while recovering from an attack.
• Care needs to be taken in following established and rehearsed forensic procedures to maintain the integrity of the data contained within the infected systems.
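One way to help preserve the integrity of forensic records is to chain them together with hashes, so that altering an earlier entry invalidates everything recorded after it. The Python sketch below shows the idea under simple assumptions; it is not a substitute for established forensic tooling, and the names GENESIS, append_entry and verify_chain, as well as the record fields, are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(previous_hash, record):
    """Hash the previous entry's hash together with this record's content."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("ascii") + payload).hexdigest()


def append_entry(chain, record):
    """Append a record (a JSON-serializable dict) linked to the prior entry."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "record": record,
        "prev": previous_hash,
        "hash": _entry_hash(previous_hash, record),
    })


def verify_chain(chain):
    """Return True if no entry has been altered, removed or reordered."""
    previous_hash = GENESIS
    for entry in chain:
        if entry["prev"] != previous_hash:
            return False
        if entry["hash"] != _entry_hash(previous_hash, entry["record"]):
            return False
        previous_hash = entry["hash"]
    return True
```

An attacker who can rewrite the entire chain could still recompute every hash, so the most recent hash would need to be stored on a separate, trusted system, in keeping with the earlier advice to write logs to a separate device.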
As a last resort, it may be necessary to initiate a shutdown of the process under the control of the affected automation system to minimize the potential for environmental damage or loss of life resulting from a potential control system failure. Disaster Recovery Planning can assist by allowing an operator to place a “standby” machine in active mode during potential shutdown.
Joel Langill is a staff engineer and security consultant for ENGlobal’s Automation Segment based in Houston, TX. Langill developed this story and then the CSFI Stuxnet Project Team (www.csfi.us) reviewed and amended it as a part of their research project and paper on the Stuxnet worm. Besides Langill, members of the team include Amr Ali, Chris Blask, Joseph Patrick Schorr, Jim Mulholland, Jesus Oquendo, Izar Tarandach, David Simpson, Charles A. Penn Jr., AVV, Stefano Mele, Bill Varhol, and Ehab Fahmi.