Published by Domenic Ferns. Modified over 3 years ago.

1
Multistep Virtual Metrology Approaches for Semiconductor Manufacturing Processes
Presenter: Simone Pampuri (University of Pavia, Italy)
Authors:
- Simone Pampuri, University of Pavia, Italy
- Andrea Schirru, University of Pavia, Italy
- Gian Antonio Susto, University of Padova, Italy
- Cristina De Luca, Infineon Technologies AT, Austria
- Alessandro Beghi, University of Padova, Italy
- Giuseppe De Nicolao, University of Pavia, Italy

3
Introduction
- Collaboration between University of Pavia (Italy), University of Padova (Italy) and Infineon Technologies AT (Austria)
- Activity funded by the European project EU-IMPROVE: Implementing Manufacturing science solutions to increase equiPment pROductiVity and fab pErformance
- Duration: 42 months (since Jan 2009)
- Global funding: 37.7 M
- 32 partners, including semiconductor fabs, academic institutions, research centers and software houses
- Thematic Work Packages

4
Contents
1. Motivations
2. Machine Learning
3. Multilevel framework
4. Multistep VM
5. Results and Conclusions

7
What is Virtual Metrology?
- In semiconductor manufacturing, measurement operations are costly and time-consuming
- Only a small part of the production is actually measured
- Virtual metrology exploits sensor and logistic information to predict the process outcome
[Diagram: Sensor Data, Recipe Data and Logistic Data feed the VM, which produces Predictive Information for controllers, sampling tools and decision tasks]

11
Machine learning (in a nutshell)
- Machine learning algorithms create models from observed data (the training dataset), using little or no prior information about the physical system
- The model is then able to predict patterns similar to the observed ones
- Most famous algorithm: Ordinary Least Squares (OLS), which consists in solving the optimization problem defined by the sum-of-squared-residuals loss function
[Diagram: a learning algorithm turns the training dataset, Input (X) and Output (Y), into a model f(X); the model maps a new input (X new) to a prediction (Y new)]
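As a minimal sketch of the train-then-predict loop described on this slide, the snippet below fits OLS on a toy dataset and predicts a new point. The function names (`ols_fit`, `predict`) and the toy numbers are illustrative assumptions, not material from the presentation.

```python
import numpy as np

def ols_fit(X, y):
    """Return the coefficients that minimise ||y - X @ coef||^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(X_new, coef):
    """Apply the learned linear model to new inputs."""
    return X_new @ coef

# Toy training dataset (assumed): y = 1 + 2 * x, first column is the intercept.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

coef = ols_fit(X, y)                             # learning step
y_new = predict(np.array([[1.0, 4.0]]), coef)    # prediction for x = 4
```

With noise-free data the fit recovers the generating line exactly, so the prediction for the unseen input follows the same pattern as the training points.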

13
The curse of dimensionality
- Problem: the so-called curse of dimensionality
- Consequence: the predictive power of machine learning models decreases as the number of candidate predictors increases
- The number of selected predictors grows almost linearly with the number of candidate predictors
- In semiconductor manufacturing, it is common to have hundreds of candidate predictors: how to tackle the problem?
- Regularization (or penalization) methods

15
The curse of dimensionality
1943: Ridge (or Tikhonov) regression
- To improve on the least squares method, stable (easier) solutions are encouraged by penalizing the coefficients through the parameter a
- The best value for the hyperparameter is chosen via validation
- Computationally easy (closed-form solution)
- No sparse solution
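The closed-form solution mentioned above can be sketched in a few lines. This is an illustration under assumptions: the penalty weight is called `lam` here, no intercept is fitted, and the synthetic data are invented for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data (assumed) to show the shrinkage effect of the penalty.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)

w_small = ridge_fit(X, y, lam=0.01)    # close to the OLS solution
w_large = ridge_fit(X, y, lam=100.0)   # coefficients shrunk towards zero
```

A larger penalty shrinks all coefficients but does not set any of them exactly to zero, which is why the slide lists "no sparse solution" as a property.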

17
The curse of dimensionality
1996 – today: L1-penalized methods
- By constraining the solution to belong to a hyper-octahedron, sparse models can be obtained (variable selection)
- Most famous example: LASSO
- The best value for the hyperparameter is chosen via validation
- Sparse solution (variable selection)
- Solved by iterative algorithms (e.g. SMO)
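To illustrate how an iterative solver produces a sparse model, here is a small sketch. Note that it uses cyclic coordinate descent with soft thresholding, not the SMO solver named on the slide; it was chosen only because it is short. The objective form (0.5 * squared error + `lam` * L1 norm), the function names and the synthetic data are all assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * w[j]      # remove predictor j's contribution
            rho = X[:, j] @ residual        # correlation with the partial residual
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * w[j]      # add the updated contribution back
    return w

# Synthetic data (assumed): only the first two of ten predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]
y = X @ w_true + 0.1 * rng.normal(size=100)

w_hat = lasso_cd(X, y, lam=20.0)
selected = np.flatnonzero(w_hat)   # indices of the predictors kept by the model
```

The soft-thresholding step is what sets irrelevant coefficients exactly to zero, giving the variable selection behaviour listed on the slide.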

22
The hierarchical variability
- We deal every day with multiple levels of variability:
  - Every piece of equipment has several chambers
  - In some cases, these chambers are split into sub-chambers
  - Different process groups and recipes run on the same equipment
- Simple (naive) solution: create one model for every possible combination of factors
  - We'll never have enough data to do that, especially for low-volume recipes
- Better solution: handle those different levels of variability inside the model
- Multilevel techniques: Multilevel Ridge Regression (RR) and Multilevel Lasso

23
The Multilevel Transform
- The first step is to create an extended input matrix that reflects the relationships between the j clusters
- For instance, in the case of j mutually exclusive nodes: [Equation: extended input matrix for j mutually exclusive nodes]
- The input matrix reflects the dependency on logistic paths
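The matrix shown on the slide did not survive extraction, so the following is only one plausible reading of the transform, stated as an assumption: each row keeps a shared copy of its predictors and gains a second copy in a block reserved for its own cluster, with zeros in the blocks of all other clusters. The function name `multilevel_transform` and the toy data are hypothetical.

```python
import numpy as np

def multilevel_transform(X, cluster_ids):
    """Build [shared block | one block per cluster] for mutually exclusive clusters.

    Assumed layout: row i holds X[i] in the shared block and again in the
    block of its own cluster; every other cluster block stays at zero.
    """
    X = np.asarray(X, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    n, p = X.shape
    X_ext = np.zeros((n, p * (1 + len(clusters))))
    X_ext[:, :p] = X                                   # shared (global) effect
    for k, c in enumerate(clusters):
        rows = cluster_ids == c
        X_ext[rows, p * (k + 1): p * (k + 2)] = X[rows]  # cluster-specific effect
    return X_ext

# Toy example (assumed): three wafers, two predictors, chambers "A" and "B".
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
chambers = np.array(["A", "B", "A"])
X_ext = multilevel_transform(X, chambers)
```

A penalized regression fitted on the extended matrix can then share strength across clusters through the first block while learning per-cluster corrections in the others, instead of training one model per combination of factors.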

25
Standard scenario
- Production flow: a sequence of steps; each step represents an operation that must be performed on a wafer in order to obtain a specific result
- Each step is performed by a different piece of equipment (composed of multiple chambers):
  - Which wafer is processed by which equipment is known (logistic information)
  - Information about the processed wafer (e.g. sensor readings and recipe setup) might be available
  - On some equipment a single-step VM system is already in place (estimated measurements for each processed wafer are available)

27
Cascade Multistep VM
- This approach builds a pipeline in which the predictive information is propagated forward to contribute to the estimation of later models
- The multilevel input matrix is generated by replacing the process variables of the j-th cluster with the corresponding VM-j estimate
- Pros:
  - Small overhead added to the input space
  - Computational effort very similar to the single-step VM case
- Cons:
  - Steps without a single-step VM must be excluded
  - There might be some information loss between two or more steps
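A rough sketch of the cascade idea follows, under stated assumptions: an upstream single-step VM (here a ridge model, fitted only on the wafers that were physically measured) produces an estimate for every wafer, and that single column is appended to the downstream inputs in place of the upstream process variables. The names `ridge_fit` and `cascade_inputs`, the step names and the synthetic data are illustrative, not taken from the presentation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept), used as the upstream VM."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cascade_inputs(X_up, y_up_measured, measured_mask, X_down, lam=1.0):
    """Append the upstream VM estimate as one extra column of the downstream inputs."""
    w_up = ridge_fit(X_up[measured_mask], y_up_measured, lam)  # train on measured wafers
    vm_up = X_up @ w_up                                        # estimate for every wafer
    return np.column_stack([X_down, vm_up]), vm_up

# Synthetic example (assumed): 40 wafers, 6 upstream and 4 downstream variables.
rng = np.random.default_rng(0)
n = 40
X_cvd = rng.normal(size=(n, 6))
X_litho = rng.normal(size=(n, 4))
measured = np.zeros(n, dtype=bool)
measured[::4] = True                       # only every fourth wafer is measured
y_cvd = X_cvd[measured] @ rng.normal(size=6)

X_cascade, vm_cvd = cascade_inputs(X_cvd, y_cvd, measured, X_litho)
```

The downstream input grows by a single column per upstream step, which matches the small overhead listed among the pros, while anything in the upstream variables that the VM estimate does not capture is lost.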

29
Process and Logistic Multistep VM
- With this approach, all the relevant logistic, process and recipe information from all the considered steps is included in the input set
- In this case, the generation of the input matrix fully follows the Multilevel Transform described earlier
- Pros:
  - Steps with no (or meaningless) measurements can be included
  - All the available information is provided to the learning algorithm
- Cons:
  - The input space dimension is significantly increased by this approach
  - More observations are needed to train the learning algorithm

31
Scenario
- Production flow for methodology validation:
  1. Chemical Vapor Deposition (CVD)
  2. Thermal Oxidation
  3. Coating
  4. Lithography
- Target: post-litho CDs
- Dataset: 583 wafers (anonymized)
- Hyperparameter tuning: 10-fold cross-validation
- Multistep VM setups: CVD-Litho Cascade; CVD-Litho Process and Full Logistic
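The hyperparameter tuning step can be sketched as follows. This is a generic 10-fold cross-validation loop over a candidate grid for ridge regression, scored by validation RMSE; the grid values, the function names and the synthetic data are assumptions, and the actual dataset of 583 wafers is not reproduced.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_rmse(X, y, lam, k=10, seed=0):
    """Validation RMSE pooled over k folds for one penalty value."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # all samples outside the fold
        w = ridge_fit(X[train], y[train], lam)
        sq_err += np.sum((y[fold] - X[fold] @ w) ** 2)
    return np.sqrt(sq_err / len(y))

def select_lambda(X, y, grid, k=10):
    """Return the grid value with the lowest cross-validated RMSE, plus all scores."""
    scores = [kfold_rmse(X, y, lam, k) for lam in grid]
    return grid[int(np.argmin(scores))], scores

# Synthetic stand-in data (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=200)

grid = [1e-3, 1e-1, 1e1, 1e6]
best_lam, scores = select_lambda(X, y, grid)
```

Using the same fold split for every candidate value keeps the comparison between penalties fair; an over-large penalty shows up as a clearly higher validation RMSE.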

32
Cascade
- With RR, the cascade VM further improves VM performance. This result might be related to the additional hidden knowledge provided by the intermediate CVD metrology prediction.
- The cascade approach performs worse with the LASSO. It should be noted that this is the only case in which the extended input space does not improve predictive performance.

33
Process and Full Logistic
- Validation RMSE results for Ridge Regression: the full-step choice clearly improves predictive performance.
- LASSO is consistently outperformed by Ridge Regression on the dataset used for the experiment; nevertheless, the extended input space also proves fruitful in this case, compared with the Lithography-based approach.

34
Best Lasso and Best RR
- The best overall results for Ridge Regression are obtained with the cascade approach and by considering all the process steps.
- For the LASSO, the best overall results are obtained by considering the extended process values for all the involved steps.

35
Conclusions
- Research and design of Multistep VM strategies targeted to specific semiconductor manufacturing needs
- Main features:
  - Enhancing the precision and accuracy of a regular VM system
  - Taking into account processes without measurements
- Tests showed promising results; however, the strategy to be implemented must be carefully designed:
  - Sample size and relevance of the steps are fundamental criteria to obtain the best performance

36
Thanks for your attention!
