
The Move to Big Data and Predictive Analytics in Semiconductor Manufacturing

James Moyne and Michael Armacost

The move from reactive to predictive analytics in semiconductor manufacturing is being enabled by the corresponding move to big data solutions for advanced process control (APC) systems.

The digital universe is doubling every two years, and will reach 40,000 exabytes (40 trillion gigabytes) by 2020.[1] As requirements on data volumes, rates, quality, merging and analytics rapidly increase in semiconductor manufacturing, we need new approaches to data management and use across the fab. A number of industries are facing this problem and, in response, the "big data" effort has emerged.

In our industry, big data solutions will be key to scaling APC solutions to finer levels of control and diagnostics. However, the main impact will be to enable more effective predictive technologies such as predictive maintenance (PdM), virtual metrology (VM) and yield prediction.[2, 3]

The International Technology Roadmap for Semiconductors (ITRS) defines the dimensions of the big data problem in terms of the five V’s: volume, velocity, variety (i.e., data merging), veracity (i.e., data quality) and value (analytics).[3] While big data solutions can be any solution approach that addresses one or more of the five V’s, a typical big data solution contains the following components:

  • Real-time collection and analytics: In addition to increased data collection rates (velocity), analytics are being implemented that function in time-critical environments, providing analysis to support real-time decision-making.
  • Apache Hadoop: An open source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.[4] Hadoop leverages parallel-processing and scalability capabilities to provide solutions tailored to large time-series data sets such as trace data.
  • Hadoop Distributed File System (HDFS): A distributed file system designed and optimized for processing large data volumes and for high availability.[4]
  • MapReduce-type frameworks: A programming model for large-scale data processing such as MapReduce (which originally referred to the proprietary Google technology but has since been genericized).[5]
  • Data warehousing: An extensible storage capability that relaxes limitations on volume of data.
  • Analytics: Addresses the "value" component of big data. Analytics, particularly predictive analytics, leverage capabilities to organize and analyze large quantities of data quickly.
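To make the MapReduce-type programming model concrete, the following is a minimal single-process sketch in plain Python. It is not Hadoop code and not the Applied Materials implementation; it only mimics the map, shuffle and reduce phases in memory. The record layout, the tool and sensor names, and all function names are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical trace records: (tool_id, sensor_name, reading).
# These values are invented for illustration only.
TRACE = [
    ("CMP01", "pad_pressure", 3.0),
    ("CMP01", "pad_pressure", 5.0),
    ("CMP02", "pad_pressure", 4.0),
    ("CMP01", "slurry_flow", 10.0),
    ("CMP02", "pad_pressure", 6.0),
]


def map_record(record):
    """Map phase: emit a (key, (sum, count)) pair for one trace record."""
    tool, sensor, value = record
    return (tool, sensor), (value, 1)


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(values):
    """Reduce phase: combine partial (sum, count) pairs into a mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return total / count


def mean_by_key(records):
    """Run map, shuffle and reduce in sequence on an in-memory data set."""
    grouped = shuffle(map(map_record, records))
    return {key: reduce_group(values) for key, values in grouped.items()}


means = mean_by_key(TRACE)
for key, mean in sorted(means.items()):
    print(key, mean)
```

In a real cluster the map and reduce steps would run in parallel on separate nodes and the shuffle would move data across the network. Emitting (sum, count) pairs rather than raw readings keeps the reduce step associative, so partial results from different nodes can be combined in any order.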

Proper implementation of big data solutions can provide for more efficient data storage and processing. For example, figure 1 shows a comparison that Applied Materials performed in developing its big data APC solution, which indicated improved cost of ownership (COO) and processing speed.[6] Note that these big data solutions will likely still require a transactional component, probably best suited to a relational database; however, the vast majority of data volume and processing can be relegated to a Hadoop-type infrastructure. The benefits of implementing these types of big data solutions include (1) lower COO of data solutions, (2) more efficient data storage, (3) improved analytics performance and (4) better enabling of predictive analytics.

Figure 1. Comparison of traditional and Hadoop-type data platforms (analysis performed as part of the Applied APC big data solution development processes).[6]

Enabling Predictive Analytics with Big Data: PdM Example

Predictive analytics represent the latest layer of capabilities being added to the factory automation infrastructure. As shown in figure 2, these capabilities can be leveraged in a collaborative fashion for yield optimization and overall equipment effectiveness (OEE) optimization. While the former is largely a proprietary fab effort, the latter requires the user, OEM and predictive analytics supplier to cooperate so that equipment, process and analytical knowledge can be leveraged together to achieve cost-effective, optimized solutions. Further, techniques have been developed to ensure IP separation and protection between these parties. This means that both a technical and a business foundation are in place to deliver effective PdM services.[7]

Figure 2. Evolution of factory automation from collection to reaction, control and now prediction. VM=virtual metrology; SPC=statistical process control; EHM=equipment health monitoring; RDBMS=relational database management system; DTC=distributed tool communication (Applied tool high-speed data collection interface); MWBC=mean wafers between cleans. (The capabilities can be loosely grouped into mechanisms for yield optimization and those for OEE optimization; while yield optimization is largely controlled by the user, OEE optimization is most effective when it is addressed as a cooperative effort between user, OEM and OEE solution (e.g., PdM) provider.)

Figure 3. PdM multivariate predictor signal used to predict the need for CMP consumable replacement to avoid defects such as scratches. The blue vertical lines represent maintenance events in the historical data. The strong prediction signal indicates that the need for maintenance can be predicted with good lead time, avoiding potential scratches, reduced yield and scrap that would normally occur in the shaded area.

As an example, figure 3 shows the application of PdM techniques to predict the need for CMP consumable replacement to avoid defects such as scratches. A multivariate prediction metric is derived by combining tool and process knowledge (from the OEM and user) with statistical techniques. A lead-time horizon complete with confidence limits is then derived that best matches the prediction capability to the customer's needs and costs. A PdM trigger threshold (i.e., "prediction limit") is set up based on this horizon. This capability provides a PdM indication 3 to 4 hours in advance of a defect-triggered unscheduled downtime with 90% accuracy. Implementing this service can result in reduction of wafers at risk of defects, reduced unscheduled downtime, and potentially increased time between scheduled downs.
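The trigger logic described above can be sketched roughly as follows. The article does not specify how the multivariate predictor or the prediction limit is computed, so this example assumes a predictor signal is already available as a list of samples and only illustrates the threshold-crossing step. The signal values, the limit, the sample period, the event index and the `consecutive` debounce parameter are all hypothetical.

```python
def first_trigger(signal, limit, consecutive=1):
    """Return the index of the first sample at which the signal has been
    at or above `limit` for `consecutive` samples in a row, or None."""
    run = 0
    for index, value in enumerate(signal):
        run = run + 1 if value >= limit else 0
        if run >= consecutive:
            return index
    return None


def lead_time_hours(trigger_index, event_index, sample_period_hours):
    """Lead time between the PdM trigger and the maintenance event."""
    return (event_index - trigger_index) * sample_period_hours


# Hypothetical predictor signal sampled once per hour (invented data).
PREDICTOR = [0.1, 0.2, 0.15, 0.4, 0.7, 0.8, 0.85, 0.9, 0.95, 1.0]
PREDICTION_LIMIT = 0.6      # assumed trigger threshold
SAMPLE_PERIOD_HOURS = 1.0   # assumed sampling interval
EVENT_INDEX = 9             # assumed index of the maintenance event

# Require two consecutive samples above the limit to reduce false alarms.
trigger = first_trigger(PREDICTOR, PREDICTION_LIMIT, consecutive=2)
lead = lead_time_hours(trigger, EVENT_INDEX, SAMPLE_PERIOD_HOURS)
print(trigger, lead)
```

Requiring several consecutive samples above the limit is one simple way to trade a little lead time for fewer false triggers; in practice the limit and horizon would be tuned against historical maintenance events and the customer's cost model.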

Prediction is the Future

The advent of big data solutions combined with recent successes in predictive solutions in our industry opens the door for significant opportunities in yield optimization and OEE improvement. These opportunities come with challenges such as migrating from a reactive to predictive culture of operations, enabling cooperative (user, OEM and analytics supplier) solutions while protecting intellectual property, and addressing key big data topics such as data quality (veracity) and data merging (variety) to enable cost-effective and repeatable solutions. However, the level of success currently being achieved is a clear indication that these issues are being addressed, paving the way for prediction to become much more pervasive in the fab of the future.

For additional information, contact

[1] IDC and EMC report, 2014, available at
[2] Many references; see, for example, "TechRadar Pro talks to SAP’s VP of Marketing and Analytics James Fisher," 2014.
[3] 2014 International Technology Roadmap for Semiconductors (ITRS): Factory Integration Chapter, available at
[6] J. Samantaray, P. Sutrave and T. Bowyer, "Storing, Retrieving, and Managing Semiconductor Manufacturing Data in a Big Data Infrastructure," APC Conference XXVI, Ann Arbor, Michigan, September 2014.
[7] J. Scoville, M. Armacost, K. Subrahmanyam, P. Hawkins and J. Moyne, "Moving to Cooperative Service-based Delivery of APC Technologies," APC Conference XXVI, Ann Arbor, Michigan, September 2014.