By Ellen Fussell Policastro
When you think about the automation, asset reliability and mechanical reliability we’ve been investing in our plants for the past 40 years, it’s no wonder we’ve designed in some pretty amazing technologies to boost overall plant reliability.
While the mechanical side has seen an upturn, the human side of reliability needs some work. The human side of reliability was the subject of Tuesday’s PAS-sponsored webinar, “Prevent incidents by improving operator situation awareness.”
“Plants used to shut down once a year to replace a valve or change out a pump seal because they would break on a regular basis,” said Mark Carrigan, vice president of technology at PAS in Houston, Tex., who gave a detailed overview of the root of the problem and how his team is improving human-machine interface (HMI) technologies to bring the human side of reliability up a few notches. “Now plants run from five to seven years without a scheduled shutdown because we don’t break things like we used to.”
While overall asset management has improved, and we’re doing a good job on mechanical reliability, challenges remain. “We’ve seen exponential growth in complexity and integration with all these systems in place at a typical facility, doing more with less,” he said. “In the past, a typical refinery plant or chemical plant would have a lot more people producing less product. But when we reduced staff, we realized we could make more product with less people.”
But here’s the problem: There’s still a lack of visibility about vulnerabilities within our systems. “We constantly have to ask ourselves whether we should be working during the startup. Are we doing things potentially unsafe and increasing risks? That’s hard to measure and understand,” Carrigan said.
Another difficulty is transferring and maintaining knowledge. With more people retiring, there’s a greater gap in the workforce. How do you make sure all that operational knowledge is transferred to the upcoming group of workers within the next five years?
Not managing these situations well has led to unintended consequences, Carrigan said. “An alarm flood could take place, or equipment could shut down with an improperly managed change.”
Carrigan offered a few examples of how this can happen, one of which involved testing shutdown systems (testing pressure and measuring safety systems) while the plant was running. Because the plant was running, “they bypassed the SIS system, everyone signed off, and they continued with the test. As a result, they increased pressure, and the interlock tripped. The safety system was bypassed so it didn’t take action,” he said. But the interlock signal was also used within the integrated control system; consequently, valves closed, the system shut down the plant, and the whole process caused an environmental incident. In this case, people made a change without understanding the consequences. “This is just one example of people not managing complex systems and understanding how the work they do can impact things,” he said.
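The bypass example comes down to a question nobody asked before signing off: where else is this signal used? The Python sketch below is only an illustration of that check, not anything shown in the webinar. The tag names, the consumer descriptions, and the function name are all hypothetical.

```python
# Hypothetical map of interlock signals to everything that reads them.
# Tag names and consumer descriptions are made up for illustration.
SIGNAL_CONSUMERS = {
    "PT-101.HH": ["SIS: high-pressure trip", "DCS: close feed valves"],
    "LT-205.LL": ["SIS: pump protection trip"],
}


def unbypassed_consumers(signal, bypassed):
    """Return every consumer of `signal` that the planned bypass does not cover."""
    return [c for c in SIGNAL_CONSUMERS.get(signal, []) if c not in bypassed]


# The test plan bypasses only the safety system's use of the signal.
remaining = unbypassed_consumers("PT-101.HH", {"SIS: high-pressure trip"})
print(remaining)
```

Any entry left in the returned list is a place where the tripped signal would still cause an action during the test, which is the kind of consequence the team in the example did not see.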
Human Error, Airline Comparison
Within one graphic, Carrigan compared the rise and leveling off of safety within the automation control industry with that of the airline industry. “We can see a dramatic improvement in overall airline safety, but that improvement has leveled off over the past couple of decades. We can also see interesting trends. Those incidents attributed to human error have not seen nearly as much improvement,” he said.
While the airline and automation control industries are very different, they do have one thing in common — an operator sits in front of a screen, which conveys information about a process, and takes action to keep things on course.
“This type of reliability is hard for our industry,” he said. “But we can look at things, such as equivalent forced outage rate (EFOR), the time the equipment is not operating as it is supposed to.” There has also been an increase in reliability, which has flat-lined over the last several years. “We can see better improvement in mechanical reliability but not human reliability,” he said. “This type of information is also less public for the oil and gas industry.”
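For readers who have not worked with EFOR, it is usually reported as a percentage. The Python sketch below uses a simplified form, assumed here for illustration: forced outage hours plus equivalent forced derated hours, divided by forced outage hours plus service hours. It leaves out terms that fuller definitions include, and the function name and the figures are made up.

```python
def efor_percent(forced_outage_hours, equiv_forced_derated_hours, service_hours):
    """Simplified EFOR in percent (assumed form, for illustration only).

    EFOR ~= (FOH + EFDH) / (FOH + SH) * 100
    Derated hours during reserve shutdown are left out of this sketch.
    """
    denominator = forced_outage_hours + service_hours
    if denominator <= 0:
        raise ValueError("forced outage hours plus service hours must be positive")
    return 100.0 * (forced_outage_hours + equiv_forced_derated_hours) / denominator


# Made-up year: 200 h of forced outage, 40 h of equivalent derating, 7,800 h in service.
print(efor_percent(200, 40, 7800))  # prints 3.0
```

A lower number means the equipment spent less of its demanded time forced out of service or running below capacity.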
Carrigan showed through an integrated platform demonstration how all these tools work together in a gas plant within a refinery—how operators and engineers can use an integrated platform to get information quickly to understand what’s taking place, which will allow them to make better and faster decisions.
His team built the example graphic in an HTML environment, so any system has the ability to integrate information from disparate sources. “As an operator, I have this alarm, and now I have to deal with it. So we want to deliver the information the operator needs. We’ve implemented this on many different kinds of control platforms. It’s easy to do. And it doesn’t impact your control network traffic at all,” he said. “By better designing the HMIs, I can help operators catch things while they are still small and help mitigate them. At a Level 3 graphic, you can see the trends — the bottom levels are going up and top levels are going down.”
The graphic allowed Carrigan to see more detail at the various pumps from the overhead and flow control and pressure controls. In his demo, he right-clicked on the alarm for a menu of available options to respond to it (inbound, loop sheets, control map, correlation matrix, and more). “I can see this is a process condition, so it doesn’t make sense to shelve the alarm. If I click on alarm details, I can get all the information I need to see the consequence of not responding is loss of controls,” he said. “So clearly, I need to respond. I can look at different potential causes. The controller is not in manual; it’s 100 percent open. That’s not what the problem can be. The next one is ‘valve stuck’ or ‘pump tripped.’ A bad instrument means I’m getting a bad reading. But that’s not the problem. Yet I do want to do further investigation on ‘valve stuck.’ But I need to understand more about operational limits. I’ve been told if I don’t respond, I’ll end up shutting the process down.”
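The reasoning Carrigan walked through can be pictured as a small record attached to each alarm: whether it reflects a real process condition, what happens if nobody responds, and which potential causes have been ruled out. The Python sketch below is a loose illustration of that idea under stated assumptions, not the PAS product's data model. The class names, field names, and tag are hypothetical, and because the article does not say whether “pump tripped” was ruled out, it is left open here.

```python
from dataclasses import dataclass


@dataclass
class Cause:
    name: str
    ruled_out: bool


@dataclass
class AlarmDetails:
    tag: str                    # hypothetical tag name
    is_process_condition: bool  # True when the alarm reflects a real process upset
    consequence: str            # what happens if the operator does not respond
    causes: list                # list of Cause records


def may_shelve(alarm):
    """Shelving only makes sense when the alarm is not a real process condition."""
    return not alarm.is_process_condition


def causes_to_investigate(alarm):
    """Names of the potential causes that have not been ruled out yet."""
    return [c.name for c in alarm.causes if not c.ruled_out]


alarm = AlarmDetails(
    tag="FIC-100",
    is_process_condition=True,
    consequence="loss of control",
    causes=[
        Cause("controller in manual", ruled_out=True),
        Cause("bad instrument", ruled_out=True),
        Cause("valve stuck", ruled_out=False),
        Cause("pump tripped", ruled_out=False),  # status not stated in the demo
    ],
)

print(may_shelve(alarm))
print(causes_to_investigate(alarm))
```

Keeping the consequence and the cause list next to the alarm is what lets the operator decide in one place instead of looking things up across several systems.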
After checking other options in the dropdown menu, Carrigan could see there was just a small amount of time before the process shut down. With each option, the operator can see more information. “I can also check the impact of making changes to the controller. I have pressure indicators, which let me see various outputs. So I know if I make a change, I will cause a problem to my APC application,” he said. “I have a complex loop, so I better be careful before making changes to the process controller. Next I need to know if there are any control problems with this loop or any performance problems, such as hysteresis.”
Finally, the operator can look at incident reports to discover whether he’s seen the same problem in the past. The incident report database ties everything back to the integrity database. “So we can bring those over to the platform as well,” he said. “We can see in June of last year there was an incident report of the very same thing — the controller was stuck. So we can put in a work order.”
All in all, systems are complex and interactive, and they come from so many different vendors, he said. “Perhaps with better HMI tools, there’s a chance to improve operational reliability so operators can get the information they need quickly — without having to look it up in five different places.”
Ellen Fussell Policastro is a freelance writer in Raleigh, NC. Her email is firstname.lastname@example.org.