
Advanced Data Mining Techniques to Improve IC Fab Yield

Lokesh Kulkarini, Sidda Kurakula, and Helen Armer

Semiconductor fabs are highly automated, with systems generating large amounts of data, often on the order of a few terabytes per day. As fabs become increasingly data driven, data mining to extract and analyze useful information is assuming more importance in solving customers’ most challenging, highest-value problems. Advanced data mining techniques have gained importance in the semiconductor industry in recent years, primarily due to rapid advances in computing technology and in data collection and storage software and hardware.

Semiconductor fabrication processes are very complex, and interactions between different variables can be difficult to fully understand. Data mining helps highlight such relationships. Each wafer is individually tracked because wafer-to-wafer (WTW) variation, while often small and elusive, still impacts device yield. Advanced statistical modeling techniques help identify and build relationships that explain such variation and its effect on wafer properties. This kind of analysis drives the identification of key sensors, which are used to monitor and control tool and wafer performance more closely.

As part of its discovery, assessment and solutions process, the FabVantage Consulting Group at Applied Materials employs a holistic data mining approach. The methodology goes beyond traditional statistical process control (SPC) techniques that primarily rely on process monitoring for change-point detection. Data mining is used in many FabVantage yield and tool-output improvement projects to help customers more quickly and efficiently pinpoint sources of loss.

A wide variety of statistical algorithms are used to model various device- and wafer-related parameters, including thickness nonuniformity; defect density/count; sheet resistance; scratch type/count; and in-line electrical test results such as transistor current and other electrical properties. This approach is also used to execute crucial chamber-matching strategies.

Laying the Groundwork

The data mining process begins by identifying a list of critical process tools affecting the desired wafer- or lot-level response. After that, a priority sensor list is prepared for each critical tool, which defines the fault detection and classification (FDC) data collected for that tool. Table 1 is an example of a tool priority sensor list drawn from Applied’s knowledge base. This example is truncated for brevity; an actual priority sensor list may include several dozen sensors.

The Applied knowledge base is a crucial resource in the process. It is a repository for methodologies, tool documentation, Applied’s best known methods (BKMs), learnings, sensor data collection plans, and models. It contains sensor lists for more than 500 variants of Applied’s 200mm and 300mm process tools. This vital resource is continuously updated with any new information from the field, internal system development, or research.

Once the tool priority sensors are obtained, the next step is to collect the relevant FDC and metrology data. This data is obtained from the available FDC software products, which may include Applied’s E3 and process equipment charting technologies or software from third-party suppliers. Tool sensor priorities are identified as P1, P2, P3 or P4, depending on the influence of the sensor parameter on yield. For the definition of sensors in each category, refer to table 2.

Before proceeding further, a data quality check must be performed to identify missing or erroneous data that might corrupt the analysis.
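
As a rough illustration of such a check, the sketch below flags missing readings, stuck (constant) sensors, and gross outliers in a hypothetical per-wafer FDC summary table using pandas; the file and column names are placeholders, not the output of any particular FDC product.

```python
import pandas as pd

def data_quality_report(fdc: pd.DataFrame) -> pd.DataFrame:
    """Flag missing values, stuck (constant) sensors, and gross outliers per column."""
    fdc = fdc.select_dtypes(include="number")      # assume numeric sensor statistics
    z = (fdc - fdc.median()) / fdc.std(ddof=1)     # standardized deviation from the median
    return pd.DataFrame({
        "missing_frac": fdc.isna().mean(),         # fraction of wafers with no reading
        "n_unique": fdc.nunique(dropna=True),      # 1 => sensor stuck at a constant value
        "outlier_count": (z.abs() > 5).sum(),      # readings far outside the typical range
    }).sort_values("missing_frac", ascending=False)

# Hypothetical usage:
# fdc = pd.read_csv("fdc_wafer_summary.csv", index_col="wafer_id")
# print(data_quality_report(fdc))
```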

Table 1. Tool priority sensor list.
Table 2. Sensor priority definitions.

Advanced Data Mining Approach

The first part of the analysis involves customized data visualization. This helps determine the correct process window to examine in order to calculate summary statistics and to report any obvious abnormalities. This is primarily done with a sensor trace analysis.

Sensor trace analysis can be performed in multiple ways, depending on the problem being examined. Examples of trace plots are overlay plots of one sensor for many wafers, or overlay plots of multiple sensors for one wafer. Note that the two plotting styles serve different purposes: one shows the spread of a single sensor across multiple wafers, while the other shows how multiple sensors behave on a single wafer. Similarly, trend charts overlaid across multiple chambers or tools, or box plots, can be very useful in developing a high-level picture of the potential problem areas.
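
A minimal sketch of these two overlay styles is shown below, assuming a hypothetical long-format trace export; the file, sensor, and wafer identifiers are placeholders for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format export: one row per (wafer_id, sensor, time_s, value)
traces = pd.read_csv("sensor_traces.csv")

# Style 1: one sensor overlaid across many wafers -- shows wafer-to-wafer spread
fig, ax = plt.subplots()
for wafer_id, grp in traces[traces["sensor"] == "RF_FWD_POWER"].groupby("wafer_id"):
    ax.plot(grp["time_s"], grp["value"], alpha=0.4)
ax.set(title="RF forward power, all wafers", xlabel="time (s)", ylabel="value")

# Style 2: several sensors overlaid for a single wafer -- shows how sensors move together
fig, ax = plt.subplots()
one_wafer = traces[traces["wafer_id"] == "W01"]
for sensor, grp in one_wafer.groupby("sensor"):
    ax.plot(grp["time_s"], grp["value"], label=sensor)
ax.legend()
ax.set(title="Wafer W01, all sensors", xlabel="time (s)", ylabel="value")
plt.show()
```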

As a basic guide to understanding the various statistics that could be used, refer to figure 1. For steady regions, we use the mean, median or standard deviation (stdev). A ramp-up step could be followed by a stabilizing spike, or the parameter could undershoot the target value, as illustrated by the orange line in figure 1. In such cases, using the max or min may highlight some of the variability present. Step or window duration, area, and moving range are additional statistics that can be instrumental in the analysis. For more detailed analysis, the outputs of multiple sensors can be compared (e.g., as ratios or absolute differences) to create a more sensitive measure. This approach of using ratios or absolute differences to create a so-called derived sensor has been used effectively to model a variety of applications and problems, including particle performance resulting from thermal stress in the seasoning of an HDP-CVD chamber.
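
The sketch below illustrates these window statistics and a derived ratio sensor for a single-sensor trace; the window boundaries and sensor names are assumptions for illustration, not values from the article.

```python
import numpy as np
import pandas as pd

def window_stats(trace: pd.DataFrame, t_start: float, t_end: float) -> dict:
    """Summary statistics over one recipe step/window of a single-sensor trace."""
    win = trace[(trace["time_s"] >= t_start) & (trace["time_s"] < t_end)]
    v, t = win["value"].to_numpy(), win["time_s"].to_numpy()
    return {
        "mean": v.mean(),
        "median": float(np.median(v)),
        "stdev": v.std(ddof=1),
        "max": v.max(),                         # captures stabilization spikes
        "min": v.min(),                         # captures undershoot
        "duration": t[-1] - t[0],               # actual step/window length
        "area": float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t))),  # trapezoidal integral
        "moving_range": float(np.mean(np.abs(np.diff(v)))),
    }

# Derived sensor: ratio of two sensors over the same window (hypothetical sensor names)
# fwd  = window_stats(traces[traces["sensor"] == "RF_FWD_POWER"], 10.0, 40.0)
# refl = window_stats(traces[traces["sensor"] == "RF_REFL_POWER"], 10.0, 40.0)
# reflected_fraction = refl["mean"] / fwd["mean"]
```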

Once appropriate process windows are identified and summary statistics calculated, the first-pass modeling results are studied. This typically involves filtering out redundant variables to reduce the dimensions to a more manageable subset. Depending on the problem being analyzed, a variety of supervised and unsupervised learning algorithms are available. Supervised algorithms, such as regression or classification techniques, help establish relationships between a dependent variable (e.g., transistor current) and a set of independent variables (e.g., RF power). Unsupervised methods, such as principal component analysis (PCA) or clustering, help highlight interactions between different variables (sensors). Table 3 provides a comparison of the advantages of some supervised modeling methods.
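
A minimal sketch of both families, assuming a hypothetical per-wafer table of sensor statistics with a transistor-current response column (scikit-learn is used here only for illustration, not as the toolset behind these projects):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# One row per wafer: sensor summary statistics plus the measured response (hypothetical file)
data = pd.read_csv("wafer_summary.csv", index_col="wafer_id")
y = data.pop("transistor_current")   # dependent variable
X = data                             # independent variables (sensor statistics)

# Supervised: relate the response to the sensor statistics
model = RandomForestRegressor(n_estimators=300, random_state=0)
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5, scoring="r2").mean())

# Unsupervised: project the scaled sensor matrix onto its principal components
pca = PCA(n_components=3)
pca.fit(StandardScaler().fit_transform(X))
print("explained variance ratio:", pca.explained_variance_ratio_)
```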

Figure 1. Trace plots as a guide to calculating appropriate statistics.
Table 3. Comparison of different modeling methods.

The first-pass modeling results are followed by a number of iterations to successively and systematically remove noise and unneeded variables to improve model quality. This is primarily done by identifying and removing redundant variables. For example, for a stable portion of the recipe, the median and mean have very similar values, so either variable can be used in the model iteration. Likewise, variables that have constant values add no explanatory power to the model; we eliminate these as well.

In addition, we disregard variables that have extremely small variation across the wafers; however, caution must be exercised in doing so or one runs the risk of missing important variables or interaction effects. These refinements result in a final dataset that has a more manageable set of variables and is almost noise-free.
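
One way to sketch this reduction step: drop constant columns, then drop one member of each nearly duplicate pair (such as a step mean and its median). The correlation threshold below is an assumption and, per the caution above, should be chosen conservatively.

```python
import pandas as pd

def reduce_variables(X: pd.DataFrame, corr_threshold: float = 0.98) -> pd.DataFrame:
    """Drop constant columns, then drop one of each highly correlated column pair."""
    # 1. Constant columns carry no explanatory power
    X = X.loc[:, X.nunique(dropna=True) > 1]

    # 2. For near-duplicate columns (e.g., step mean vs. step median), keep only one
    corr = X.corr().abs()
    cols = corr.columns
    to_drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in to_drop and b not in to_drop and corr.loc[a, b] > corr_threshold:
                to_drop.add(b)
    return X.drop(columns=sorted(to_drop))

# X_reduced = reduce_variables(X)   # X: per-wafer sensor statistics from the previous sketch
```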

Next, a model quality report is generated with the top-ranked variables and their respective contributions (see figure 2). A plot of predicted values vs. actual values (see figure 3) indicates model quality.

A high R-squared value, however, may not always indicate the best model fit. Overfitting is not uncommon and should be avoided because it can cost the model its generalizability. Validation is then performed to verify the hardware-, process- and sequence-related changes required to solve the yield/output issues and to match chamber-to-chamber performance. Finally, yield-driven control limits are determined and set for each sensor of interest, including derived sensors, and these sensors are subsequently monitored for any abnormal behavior.
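
The sketch below illustrates this kind of quality check: comparing in-sample and held-out fit to flag overfitting, plotting a contribution pareto of top-ranked variables, and plotting predicted versus actual values. The random forest, file name, and response column are assumptions for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical reduced per-wafer dataset from the previous steps
data = pd.read_csv("wafer_summary_reduced.csv", index_col="wafer_id")
y = data.pop("transistor_current")
X = data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)

# In-sample vs. held-out fit: a large gap suggests overfitting
print("train R^2:", r2_score(y_train, model.predict(X_train)))
print("test  R^2:", r2_score(y_test, model.predict(X_test)))

# Contribution pareto of top-ranked variables (cf. figure 2)
pd.Series(model.feature_importances_, index=X.columns).nlargest(10).plot.bar()

# Predicted vs. actual on held-out wafers (cf. figure 3)
plt.figure()
plt.scatter(y_test, model.predict(X_test))
plt.xlabel("actual")
plt.ylabel("predicted")
plt.show()
```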

Often, a combination of two algorithms is significantly more effective than any single algorithm. For example, consider the development of a model for predicting transistor current after an etch process. It was found that simply using a rule ensemble algorithm gave a model with poor predictive ability. Similarly, a Random Forest algorithm, which builds decision trees by splitting on sensor values, is prone to overfitting. However, a combination of the two methods proved to be significantly more powerful: a rule ensemble algorithm was used to reduce the number of variables, and then Random Forest was used to build a model with high predictive power and good generalizability.
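
The two-stage idea can be sketched as follows. scikit-learn has no built-in rule ensemble, so an L1-penalized linear model stands in here for the variable-reduction stage (a substitute for illustration, not the algorithm the team used), followed by a Random Forest on the surviving variables.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Hypothetical wafer dataset: sensor statistics as columns, transistor current as response
data = pd.read_csv("wafer_summary_reduced.csv", index_col="wafer_id")
y = data.pop("transistor_current")
X = data

# Stage 1 (stand-in for the rule ensemble): L1 penalty zeroes out weak variables
lasso = LassoCV(cv=5).fit(StandardScaler().fit_transform(X), y)
kept = X.columns[np.abs(lasso.coef_) > 1e-8]
print(f"{len(kept)} of {X.shape[1]} variables retained:", list(kept))

# Stage 2: Random Forest on the reduced variable set for predictive power
rf = RandomForestRegressor(n_estimators=500, random_state=0)
print("CV R^2 on reduced set:", cross_val_score(rf, X[kept], y, cv=5, scoring="r2").mean())
```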

Figure 2. Contribution pareto of top-ranked variables.
Figure 3. Plot of model-predicted values vs. actual values of transistor drive current.

In cases where the dependence among variables is unknown, any of the unsupervised learning methods can be employed. One example is clustering analysis, which finds similarities (using a distance measure) in variable behavior within each group, helping to differentiate one group from another; an example is provided in the next section (see figure 4). Like regression, clustering analysis may be combined with another method, such as PCA, to enhance results.
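
For example, a PCA-plus-clustering sketch like the one below can show whether wafers separate into groups that line up with process chambers; the file and column names are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical per-wafer sensor statistics, plus the chamber each wafer ran in
data = pd.read_csv("wafer_summary.csv", index_col="wafer_id")
chamber = data.pop("chamber")

# Reduce to two principal components, then cluster in that space
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(data))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

# If clusters line up with chambers, the chambers are behaving differently (cf. figure 4)
plt.scatter(scores[:, 0], scores[:, 1], c=labels)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
print(pd.crosstab(chamber, labels))
```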

Success Stories

The FabVantage data mining approach has been used in more than 40 customer engagements. The examples described below highlight the kind of high-value problems the FabVantage consulting team has been able to resolve using this data mining methodology.

Figure 4 depicts the results of implementing an algorithm to identify parameters driving mismatch between three decoupled plasma nitridation (DPN) chambers. Nitridation is a process wherein nitrogen is infused into silicon oxide, enabling effective oxide thickness scaling and reducing gate leakage while improving resistance against dopant diffusion through the gate dielectric. A drill-down analysis of the results showed that the mismatch among chambers was caused by differences in the throttle valves, foreline pressure and RF subsystems.

In another example, figure 5 shows the pre- and post-implementation results of a project on two etch chambers, based on the findings of a FabVantage data mining analysis. The customer provided Applied with gate critical dimension (CD) data they had collected over the previous six months. Analysis quickly revealed that a faulty RF generator was causing a bias impedance mismatch, which was in turn driving the variation in gate CD. By pinpointing the source of the problem rapidly, the customer was able to restore production faster and more cost-effectively.

Figure 4. Plot showing mismatch among three DPN chambers using an advanced data mining approach.
Figure 5. Identification of root cause driving variation in gate CD on two etch systems.

Another customer was experiencing film uniformity variations. Film thickness uniformity is a wafer-level metric that is highly sensitive to process parameters. In this project, the FabVantage team analyzed data from 300 wafers and identified causes of high film thickness nonuniformity. Hardware and recipe changes were identified to reduce the nonuniformity. Figure 6 shows the improvement achieved after implementing the FabVantage recommendations.

Figure 6. Improvement in thickness uniformity after implementing hardware and recipe changes recommended by the FabVantage team.

Next Steps

As more intelligent algorithms continue to be developed both in academia and in data mining communities, specialized analysis will become more accessible. Generating new advances that leverage the power of data mining requires more efficient collaboration among big data experts and process and hardware engineers. With further advances, focus will gradually shift from reactive to predictive analysis, which will help increase tool availability, reduce unscheduled downtime, and enable tighter control of processes, creating a positive impact on the device maker’s yield and output.

For additional information, contact avi_edelstein@amat.com