Human Factors in Automation Systems
When you consider the design of factory automation systems today, you have to ask yourself: Is the system an extension of its users or are the people who use it an extension of the system? The distinction is important, because we often adopt the principles and lessons of previous system deployments, and allow them to shape user behavior.
Often this means consolidating user experience into an interface or into an integrated data set that allows meaningful decisions. But the results of these patchwork efforts may be ineffective or inefficient, particularly when it comes to homegrown factory automation systems.
A fundamental flaw in the patchwork approach is lack of adherence to human factors in engineering. While general solutions are built on the premise that more information provides better insight and better problem resolution, the all-too-real phenomenon of cognitive saturation is given little regard in system design.
This discussion is centered on the design of an integrated workflow management system to help with the disposition of inline statistical process control (SPC) violations. But a similar line of thought can be applied to fault detection and classification (FDC) and contamination-free manufacturing (CFM). These factory processes are also examples of complex multivariate scenarios that are difficult to isolate.
There are two fundamental ways in which any system can negatively impact a company’s bottom line: directly and indirectly. The direct way is obvious: overt occurrences such as wafer breakage and particle contamination.
The second is more subtle, but more impactful: interpretation of signals in the fab. How well this is managed in automation system design has a direct impact on profit margins, because margin-eroding events, driven by scrap rates and excursions, are primarily products of human error. Efficiently identifying the underlying causes of variability in wafer processing and being able to measure cause and effect—not just action and effect—enable us to reduce the frequency of some of these problems.
Automation systems can help reduce human error by:
- Using proper detection screens to capture events.
- Effectively bridging communication gaps that exist within modules and across teams to facilitate the rapid and accurate communication of information to those who need to act.
- Being sensitive to signals that something is amiss.
Once such measures are in place, identifying opportunities to automate and streamline workflows more effectively becomes possible.
How Did We Get Here?
Most automation systems used in semiconductor manufacturing are homegrown efforts or a composite of internal and specialized products from various suppliers. Some components may be state of the art while others are far less sophisticated. Although the semiconductor fab has existed for over four decades, most automation systems are still designed as a patchwork of capabilities.
These capabilities typically include a manufacturing execution system (MES), a recipe management scheme, SPC, equipment interface, dispatching and material-handling systems, a computerized maintenance management system, routines for spec and quality management, and an enterprise resource planning (ERP) system, to name just a few.
All these systems, however, have a single supporting role: they are used to manufacture wafers that are sold to customers. They are built on finite business rules. Some are predictive (with varying degrees of success), but most are little more than interrupt-driven systems.
Human beings, on the other hand, come with an entirely different set of characteristics. Two people can interpret the same stimuli differently, yielding far different outcomes. People frequently try to fix production problems by modifying automation rules based on their individual observations and beliefs. Those individual differences may clash with the intentions of the manufacturing team.
Efforts have been made to integrate the highly methodical, rule-driven semiconductor production environment with the cognitively varied understandings of the individuals who work within it. For example, the industry has adopted so-called Out-of-Control Action Plans (OCAPs) as the standard method to deal with these circumstances. But they are not adequate.
Is there a better way to bridge this gap between man and machine? For the inline SPC world, bridging the gap would require the ability to quickly make good decisions according to the wafer measurements taken over the course of manufacturing a semiconductor product.
Several foundational elements are required for such a system. To begin with, it must adhere to consistent and proper SPC practices. Without a consistent approach to SPC, additional variability would be introduced, and if a violation isn’t attributed to a cause, then a misleading data point could become part of the dataset used to construct the control limits of an SPC chart.
For example, say a wafer is measured at a metrology tool in a fully automated facility. The metrology tool indicates a problem that points to a process tool. Meanwhile, because of the delay in getting the wafer to the metrology tool, additional lots have already moved through that same process tool. These lots are likely to exhibit the same problems as the original measured wafer, if the problem really is associated with that process tool.
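The exposure window described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ToolRun` record, the `lots_at_risk` function, and the lot and tool identifiers are hypothetical, and a real fab would pull this history from its MES rather than from an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ToolRun:
    """One lot processed on one tool (hypothetical record layout)."""
    lot_id: str
    tool_id: str
    processed_at: datetime


def lots_at_risk(runs, suspect_tool, flagged_lot, detected_at):
    """Return lots that ran on the suspect tool after the flagged lot
    and no later than the time the metrology result was detected."""
    flagged = next(
        (r for r in runs
         if r.lot_id == flagged_lot and r.tool_id == suspect_tool),
        None,
    )
    if flagged is None:
        raise ValueError("flagged lot has no run on the suspect tool")

    exposed = [
        r for r in runs
        if r.tool_id == suspect_tool
        and flagged.processed_at < r.processed_at <= detected_at
    ]
    # Report lots in the order they were processed.
    exposed.sort(key=lambda r: r.processed_at)
    return [r.lot_id for r in exposed]
```

Given the run history, the suspect tool, the flagged lot, and the time the violation was detected, the function returns the lots that passed through the tool during the metrology delay. Those are the lots a user would want to review alongside the original violation.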
But if the user could assign a cause to the first wafer that points to the process tool, and also associate the additional lots that have run through the tool, then these additional points could be omitted for the purpose of control limit calculations. This would serve as a fundamental step toward managing the process control limits.
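A minimal sketch of that exclusion step follows. It assumes each sample carries an optional assigned cause, and it uses the sample mean plus or minus three sample standard deviations as the limits. That is a simplification; many SPC implementations estimate sigma from moving ranges or subgroups instead. The function name and the data layout are illustrative, not taken from any particular SPC product.

```python
from statistics import mean, stdev


def control_limits(samples, k=3.0):
    """Compute (LCL, center line, UCL) from unflagged samples only.

    samples: list of (value, assigned_cause) pairs, where assigned_cause
    is None when no special cause has been assigned to the point.
    k: width of the limits in standard deviations (3.0 by convention).
    """
    baseline = [value for value, cause in samples if cause is None]
    if len(baseline) < 2:
        raise ValueError("need at least two unflagged points")

    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation (simplification)
    return center - k * sigma, center, center + k * sigma
```

If the points attributed to the process tool were left in the baseline, they would pull the center line toward the fault and widen the limits, making the chart less sensitive to the next real excursion.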
An even better approach would be for users to have the ability to modify the previous points on the SPC chart once additional data arrives confirming the actions of the first violation. In effect, the user would be able to understand whether the results of the actions taken have met the success criteria, and thus confirm that the proper decision was made. This would allow for a more accurate flagging of the SPC samples that shouldn’t be considered in the calculation. In order for this to be possible, a concise, easily defined, and auditable action trail must exist that all users can see.
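One way to picture such an action trail is an append-only log in which later entries revise earlier dispositions without erasing them. The sketch below is an assumption-laden illustration: the `Action` and `ActionTrail` names, their fields, and the in-memory storage are all hypothetical, and a production system would persist the entries and control who may write them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Action:
    """One immutable entry in the trail (hypothetical layout)."""
    sample_id: str
    user: str
    cause: Optional[str]  # None clears a previously assigned cause
    note: str
    at: datetime


class ActionTrail:
    """Append-only record of dispositions; nothing is ever overwritten."""

    def __init__(self):
        self._entries = []

    def record(self, sample_id, user, cause, note, at=None):
        entry = Action(sample_id, user, cause, note,
                       at or datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def history(self, sample_id):
        """Every action taken on a sample, oldest first."""
        return [e for e in self._entries if e.sample_id == sample_id]

    def current_cause(self, sample_id):
        """The most recently recorded cause for a sample, if any."""
        entries = self.history(sample_id)
        return entries[-1].cause if entries else None

    def excluded_samples(self):
        """Samples whose latest entry assigns a cause, i.e. the points
        to leave out of the control limit calculation."""
        latest = {}
        for e in self._entries:
            latest[e.sample_id] = e.cause
        return {s for s, cause in latest.items() if cause is not None}
```

Because revisions are new entries rather than edits, any user can see who flagged a point, who later changed the flag, and why.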
Sources of Variability
If you ask four different process experts what the violation rate (i.e., the variability) of a mature process should be, you will get four different answers. If you introduce a statistician into the mix, you will get a fifth answer that will likely not match any of the others. The reason for this is simple: sources of variability are neither well understood nor well managed.
This raises the following question: how well adapted are people to interpreting which sources cause the variability? Consider the inline SPC world.
Wafer measurement faults can be the product of a number of root causes, possibly more than one at a time. The user, typically a process technician or process engineer, is required to identify the possible sources of variability and adjust them accordingly.
The user has to overcome a number of hurdles at this point. There is the possibility that a violation occurred because of an incoming problem from an upstream process. This can stem from mismatched SPC, from tool matching at the process and metrology levels, from mismatched specifications, or from a simple process adjustment such as a recipe modification.
First, the user must consider the health of the current tool that processed the wafer by asking questions such as: Is the tool matched? Was there a fault on the tool? Was an experiment run on the wafer previously?
Next, the metrology tool should always be examined. Finally, maintenance activities on any piece of equipment should be considered as a possible contributor to additional variability.
How does a user navigate all these potential sources? Typically the information required to evaluate the situation and make a proper decision resides in five or more databases. There is also a great deal of information duplication within the four walls of the fab.
There have been attempts to aggregate this information and help visualize some of the sources, but never in a streamlined and efficient manner. For example, if you audit corporate OCAPs, you will find that none of the plans considers all of the sources and information actually required to answer the question presented. As a result of this decentralization, users frequently suffer from information overload or deprivation.
Psychology of Shiftwork
Fab modules are staffed 24/7. Most shifts are measured by the number of dispositions per hour or per shift.
There are multiple users with varying degrees of expertise manning stations and tasked with the disposition of SPC violations. Technicians and engineers are dedicated to modules that are usually vertically oriented. Metrology is often a department of its own and the upstream and downstream modules are not as closely tied as they should be.
In the absence of a robust communication network that is systematic and repeatable, a breakdown in communication generally allows issues to persist. When you consider the length of the task list for any module, you can imagine how easy it becomes to allow certain symptoms to linger, especially those with known workarounds.
This dynamic makes it impossible to build a reliable manual system for the disposition of lots. Symptoms of this inefficiency include an excessive number of re-measures and tool qualifications.
Inefficiency also breeds a culture of "Verify that it’s not my tool and process, and then move on with life because it’s not my problem." The result is a hesitancy to make decisions if there is a perceived risk in doing so.
Human Interpretation of Sources of Variability
Recently, Applied Materials looked closely at ways to handle human variability. Let’s look at the ways a human being interprets information, and also consider the overall effects of making a decision for better or worse compared to not making a decision at all.
In semiconductor manufacturing there are many faults whose effects on the overall quality of the product can’t be quantified until much farther down the line. Often, there is little evidence to relate the data to performance. Out of 160 parameters, for example, maybe only 15 have a known correlation to an electrical test parameter that affects device power consumption or reliability.
Until someone raises a flag, you as a user will learn little from your persistent SPC violations. Most aren’t correlated to electrical test parameters, and there is little opportunity to sensitize a culture to them. Therefore, if you overload users with signals that have little discernible effect, they will begin to dismiss the meaning of the violation, because in this instance doing nothing is less risky than taking action.
The interpretation of users is the most important variable there is. If it were simply a matter of evaluating data thresholds, we would be able to automate everything. But again, the biggest source of variability comes from the inability of people to make proper observations in a standardized manner. Unfortunately there are no timely and effective ways to resolve this through training.
A well-designed automation system would provide the ability to audit the interpretations of its users, who tend to gravitate to one source or another depending on their understanding of the violation.
Figure 3 is an example of the variability in SPC signatures by user for a given process. Process engineer 1 (PE1) and process engineer 2 (PE2) appear to respond to the same process differently. The same can be seen for process technician 1 (PT1) and so on.
The solution is to measure this variability as a standard KPI and institute a mentoring program within the module to connect process experts with users who demonstrate a bias. The interpretations of users will become more homogeneous in a very short period of time. Continuous improvement programs (CIPs) will then allow shift behavior to change in lockstep for a given process across the entire module staff, providing measurable and more meaningful results.
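How such a KPI might be computed is not specified here, so the following is only one plausible sketch. It assumes each disposition is logged as a (user, assigned cause) pair, builds each user's distribution of assigned causes, and scores it against the module-wide distribution using total variation distance (0 means the user matches the module, 1 means no overlap). The function names and the choice of metric are assumptions, not a description of any Applied Materials product.

```python
from collections import Counter, defaultdict


def cause_profile(causes):
    """Fraction of dispositions assigned to each cause."""
    counts = Counter(causes)
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


def user_bias(dispositions):
    """Score each user's divergence from the module-wide cause profile.

    dispositions: iterable of (user, cause) pairs for one process.
    Returns {user: total variation distance in the range 0..1}.
    """
    dispositions = list(dispositions)
    by_user = defaultdict(list)
    for user, cause in dispositions:
        by_user[user].append(cause)

    # Module profile pools every user's dispositions, including the
    # user being scored (a simplifying assumption).
    module = cause_profile(cause for _, cause in dispositions)

    scores = {}
    for user, causes in by_user.items():
        profile = cause_profile(causes)
        keys = set(module) | set(profile)
        scores[user] = 0.5 * sum(
            abs(profile.get(k, 0.0) - module.get(k, 0.0)) for k in keys
        )
    return scores
```

Tracked over time, a falling score for a given user would suggest that mentoring is bringing that user's interpretations closer to the rest of the module. A high score is only a prompt for review, since the outlier may be the one reading the process correctly.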
Applied Materials has invested in a number of solutions that target human-error and variability reduction across inline SPC, including tool qualification, fault detection, defect and contamination management, and electrical test. Field results have demonstrated a 4% reduction in human variability across all modules in less than 3 months.
This approach is closely coupled with user training that focuses on problem-solving with a new dimension. It also integrates the failure and response-detection methods, in this case inline SPC, with the equipment and process failure modes and effects analyses (FMEAs).
By reducing human error and variability, fabs can rapidly roll out methods of mitigating future scrap events. Their automation systems become extensions of the people who use them, bringing new levels of efficiency to the fab, while preserving the human factor.