Statistical Issues at LHCb Yuehong Xie University of Edinburgh (on behalf of the LHCb Collaboration) PHYSTAT-LHC Workshop CERN, Geneva June 27-29, 2007.

Slides:



Advertisements
Similar presentations
Experimental Particle Physics PHYS6011 Joel Goldstein, RAL 1.Introduction & Accelerators 2.Particle Interactions and Detectors (2) 3.Collider Experiments.
Advertisements

Computing and Statistical Data Analysis / Stat 4
Investigation on Higgs physics Group Ye Li Graduate Student UW - Madison.
14 Sept 2004 D.Dedovich Tau041 Measurement of Tau hadronic branching ratios in DELPHI experiment at LEP Dima Dedovich (Dubna) DELPHI Collaboration E.Phys.J.
Measurements of sin2  from B-Factories Masahiro Morii Harvard University The BABAR Collaboration BEACH 2002, Vancouver, June 25-29, 2002.
Jörgen Sjölin Stockholm University LHC experimental sensitivity to CP violating gtt couplings November, 2002 Page 1 Why CP in gtt? Standard model contribution.
8. Statistical tests 8.1 Hypotheses K. Desch – Statistical methods of data analysis SS10 Frequent problem: Decision making based on statistical information.
16 May 2002Paul Dauncey - BaBar1 Measurements of CP asymmetries and branching fractions in B 0   +  ,  K +  ,  K + K  Paul Dauncey Imperial College,
Pakhlov Pavel (ITEP, Moscow) Why B physics is still interesting Belle detector Measurement of sin2  Rare B decays Future plans University of Lausanne.
Particle Identification in the NA48 Experiment Using Neural Networks L. Litov University of Sofia.
Measurements of Radiative Penguin B Decays at BaBar Jeffrey Berryhill University of California, Santa Barbara For the BaBar Collaboration 32 nd International.
Heavy Flavor Production at the Tevatron Jennifer Pursley The Johns Hopkins University on behalf of the CDF and D0 Collaborations Beauty University.
Optimization of Signal Significance by Bagging Decision Trees Ilya Narsky, Caltech presented by Harrison Prosper.
Search for B     with SemiExclusive reconstruction C.Cartaro, G. De Nardo, F. Fabozzi, L. Lista Università & INFN - Sezione di Napoli.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
Chris Barnes, Imperial CollegeWIN 2005 B mixing at DØ B mixing at DØ WIN 2005 Delphi, Greece Chris Barnes, Imperial College.
The BaBarians are coming Neil Geddes Standard Model CP violation BaBar Sin2  The future.
Radiative Leptonic B Decays Edward Chen, Gregory Dubois-Felsmann, David Hitlin Caltech BaBar DOE Presentation Aug 10, 2005.
Summary of Recent Results on Rare Decays of B Mesons from BaBar for the BaBar Collaboration Lake Louise Winter Institute Chateau Lake Louise February.
Beauty 2006 R. Muresan – Charm 1 Charm LHCb Raluca Mureşan Oxford University On behalf of LHCb collaboration.
Donatella Lucchesi1 B Physics Review: Part II Donatella Lucchesi INFN and University of Padova RTN Workshop The 3 rd generation as a probe for new physics.
Luca Lista L.Lista INFN Sezione di Napoli Rare and Hadronic B decays in B A B AR.
B S Mixing at Tevatron Donatella Lucchesi University and INFN of Padova On behalf of the CDF&D0 Collaborations First Workshop on Theory, Phenomenology.
G. Cowan Lectures on Statistical Data Analysis Lecture 7 page 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem 2Random variables and.
Heavy charged gauge boson, W’, search at Hadron Colliders YuChul Yang (Kyungpook National University) (PPP9, NCU, Taiwan, June 04, 2011) June04, 2011,
Measurement of CP violation with the LHCb experiment at CERN
Max Baak1 Impact of Tag-side Interference on Measurement of sin(2  +  ) with Fully Reconstructed B 0  D (*)  Decays Max Baak NIKHEF, Amsterdam For.
1 ZH Analysis Yambazi Banda, Tomas Lastovicka Oxford SiD Collaboration Meeting
QCD 与强子物理研讨会, 2010 年 8 月 4 - 10 日,威海 Prospects of Flavor Physics at the LHC 高原宁 清华大学高能物理研究中心 2010/8/61Y. Gao, Prospects of flavor physics at the LHC.
Irakli Chakaberia Final Examination April 28, 2014.
Matthew Schwartz Harvard University with J. Gallicchio, PRL, 105:022001,2010 (superstructure) with K. Black, J. Gallicchio, J. Huth, M. Kagan and B. Tweedie.
CP violation measurements with the ATLAS detector E. Kneringer – University of Innsbruck on behalf of the ATLAS collaboration BEACH2012, Wichita, USA “Determination.
E. Devetak – CERN CLIC1 LCFI: vertexing and flavour-tagging Erik Devetak Oxford University CERN-CLIC Meeting 14/05/09 Vertexing Flavour Tagging Charge.
Gavril Giurgiu, Carnegie Mellon, FCP Nashville B s Mixing at CDF Frontiers in Contemporary Physics Nashville, May Gavril Giurgiu – for CDF.
KIRTI RANJANDIS, Madison, Wisconsin, April 28, Top Quark Production Cross- Section at the Tevatron Collider On behalf of DØ & CDF Collaboration KIRTI.
Sensitivity Prospects for Light Charged Higgs at 7 TeV J.L. Lane, P.S. Miyagawa, U.K. Yang (Manchester) M. Klemetti, C.T. Potter (McGill) P. Mal (Arizona)
1 Rare Bottom and Charm Decays at the Tevatron Dmitri Tsybychev (SUNY at Stony Brook) On behalf of CDF and D0 Collaborations Flavor Physics and CP-Violation.
11 th July 2003Daniel Bowerman1 2-Body Charmless B-Decays at B A B AR and BELLE Physics at LHC Prague Daniel Bowerman Imperial College 11 th July 2003.
Radiative penguins at hadron machines Kevin Stenson University of Colorado.
CP-Violating Asymmetries in Charmless B Decays: Towards a measurement of  James D. Olsen Princeton University International Conference on High Energy.
Study of exclusive radiative B decays with LHCb Galina Pakhlova, (ITEP, Moscow) for LHCb collaboration Advanced Study Institute “Physics at LHC”, LHC Praha-2003,
Prospects for B  hh at LHCb Eduardo Rodrigues On behalf of the LHCb Collaboration CKM2008 Workshop, Rome, 9-13 September 2008 LHCb.
Measurements of Top Quark Properties at Run II of the Tevatron Erich W.Varnes University of Arizona for the CDF and DØ Collaborations International Workshop.
LHCb: Xmas 2010 Tara Shears, On behalf of the LHCb group.
CHARM MIXING and lifetimes on behalf of the BaBar Collaboration XXXVIIth Rencontres de Moriond  March 11th, 2002 at Search for lifetime differences in.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
Study of pair-produced doubly charged Higgs bosons with a four muon final state at the CMS detector (CMS NOTE 2006/081, Authors : T.Rommerskirchen and.
1 Searching for Z’ and model discrimination in ATLAS ● Motivations ● Current limits and discovery potential ● Discriminating variables in channel Z’ 
1 Introduction to Statistics − Day 2 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
RECENT RESULTS FROM THE TEVATRON AND LHC Suyong Choi Korea University.
E. Devetak - IOP 081 Anomalous – from tools to physics Erik Devetak Oxford - RAL IOP 2008 Lancaster‏ Anomalous coupling (Motivation – Theory)
G. Cowan Lectures on Statistical Data Analysis Lecture 6 page 1 Statistical Data Analysis: Lecture 6 1Probability, Bayes’ theorem 2Random variables and.
Guglielmo De Nardo for the BABAR collaboration Napoli University and INFN ICHEP 2010, Paris, 23 July 2010.
Shawn Hayes Cohorts: Tianqi Tang, Molly Herman Higgs Analysis for CLIC at 380 GeV simulations.
Single Top Quark Production at D0, L. Li (UC Riverside) EPS 2007, July Liang Li University of California, Riverside On Behalf of the DØ Collaboration.
Studies of the Higgs Boson at the Tevatron Koji Sato On Behalf of CDF and D0 Collaborations 25th Rencontres de Blois Chateau Royal de Blois, May 29, 2013.
Low Mass Standard Model Higgs Boson Searches at the Tevatron Andrew Mehta Physics at LHC, Split, Croatia, September 29th 2008 On behalf of the CDF and.
First Evidence for Electroweak Single Top Quark Production
Theoretical Motivation Analysis Procedure Systematics Results
SUSY Particle Mass Measurement with the Contransverse Mass Dan Tovey, University of Sheffield 1.
Measurement of the phase of Bs mixing with Bs ϕϕ
Martin Heck, for the CDF II collaboration
Search for WHlnbb at the Tevatron DPF 2009
Heavy Flavor Results from CMS
Multi-dimensional likelihood
B  at B-factories Guglielmo De Nardo Universita’ and INFN Napoli
CP violation in Bs→ff and Bs → K*0K*0
Measurement of the Single Top Production Cross Section at CDF
Northern Illinois University / NICADD
Presentation transcript:

Statistical Issues at LHCb Yuehong Xie University of Edinburgh (on behalf of the LHCb Collaboration) PHYSTAT-LHC Workshop CERN, Geneva June 27-29, 2007

2 LHCb: a dedicated B physics experiment at LHC

3 Physicists know where to look for new physics in flavour sector LHCb will look for effects of new physics in CP violation and rare phenomena in B meson decays LHCb will look for effects of new physics in CP violation and rare phenomena in B meson decays –CMS and ATLAS will search for new particles directly produced In the Standard Model quark flavour mixing and CP violation are fully determined by the CKM matrix with four parameters In the Standard Model quark flavour mixing and CP violation are fully determined by the CKM matrix with four parameters –Over-constraining the CKM matrix is a stringent test of the SM –Any inconsistency will mean new source of flavour mixing and CP violation FCNC (flavour changing neutral currents) are forbidden at tree level in the SM but new physics can have significant effects in FCNC processes FCNC (flavour changing neutral currents) are forbidden at tree level in the SM but new physics can have significant effects in FCNC processes –Comparing asymmetry and/or rate measurements with their SM predictions in FCNC processes is a sensitive test of the SM –Any discrepancy will indicate presence of new physics particles in FCNC processes

4 Statisticians know how to quantify new physics effects In the language of statistics, LHCb will perform hypothesis testing In the language of statistics, LHCb will perform hypothesis testing The null hypothesis: the SM is valid at the energy scale relevant to B meson decays The null hypothesis: the SM is valid at the energy scale relevant to B meson decays No alternative hypothesis is given explicitly, but rejecting the SM means new physics is needed to describe B meson decays No alternative hypothesis is given explicitly, but rejecting the SM means new physics is needed to describe B meson decays What LHCb need do is What LHCb need do is –Identify a test-statistic which has high power to separate the SM and potential NP models –Measure the test-statistic from data –Evaluate the extreme probability in the null hypothesis, called –Evaluate the extreme probability in the null hypothesis, called p-value –If the p-value is too small, reject the null hypothesis –Otherwise go for another test-statistic and repeat the test

5 Where statistics/statisticians can help B physics/physicists? B flavour tagging on a statistical basis B flavour tagging on a statistical basis Separating signal/background events Separating signal/background events Data modeling and fitting Data modeling and fitting Setting confidence intervals and limits Setting confidence intervals and limits Controlling and treating systematics Controlling and treating systematics Optimizing analyses Optimizing analyses Test of the SM Test of the SM Providing analysis tools Providing analysis tools

6 Flavour tagging CP violation measurements with neutral B decays need to know the flavour of the B at production CP violation measurements with neutral B decays need to know the flavour of the B at production Information is carried by taggers Information is carried by taggers –Charge of the particles accompanying the signal B at production: same side tagging –Charge of the ±, e ± or K ± from the decay of the opposite B hadron: lepton and kaon tagging –Weighted sum of charges of all particles found to be compatible with being from the opposite B decay: vertex charge tagging Tagging result of a signal B is a decision on statistical basis with Tagging result of a signal B is a decision on statistical basis with –Average tagging efficiency typically ~ –Average mistag probability N W /(N W +N R ), typically % –Average statistical power (1-2 ) 2, typically 4-10% How to maximize statistical power? How to maximize statistical power? –Require appropriate statistical methods

7 Statistical issues in flavour tagging Neural net used to get event-by-event mistag of each tagger. Performances depend on the way the NNs work and the way they are trained Neural net used to get event-by-event mistag of each tagger. Performances depend on the way the NNs work and the way they are trained Treatment of correlation between vertex charge and other taggers is non-trivial Treatment of correlation between vertex charge and other taggers is non-trivial –The other taggers may be included in the vertex –Compromise between correct handling of correlations and available statistics to get properties of sub-samples Hard assignment of particles to vertices causes loss of statistical power. Need to investigate probability-based assignment of particles to vertices Hard assignment of particles to vertices causes loss of statistical power. Need to investigate probability-based assignment of particles to vertices

8 Separating signal/background events A demanding task in a hadron machine experiment A demanding task in a hadron machine experiment –After trigger the ratio of inclusive bbbar background to signal in a typical channel is at the million level, even bigger for very rare decays –In each bbbar event there are not only the two B hadrons but also ~100 tracks from pp interactions Information available Information available –PID –Kinematical: momentum, invariant masses –Geometrical: vertex 2, event topology –More … Typically variables to look at, each alone with limited separation power Typically variables to look at, each alone with limited separation power –Cut-based analysis not optimal for statistical precision –Multivariate analysis more powerful but also more difficult for understanding systematics –Need trade-off between precision and accuracy

9 Multivariate analysis Essentially about how to construct a best test statistic from many input variables for a hypothesis test Essentially about how to construct a best test statistic from many input variables for a hypothesis test Neyman-Pearson-Lemma: likelihood ratio is the best Neyman-Pearson-Lemma: likelihood ratio is the best –The method of representing PDFs by multi-dimensional histograms using Monte Carlo data becomes impractical when the dimension of the PDFs is too big The alternatives is to construct estimators to approach the likelihood ratio The alternatives is to construct estimators to approach the likelihood ratio –Decorrelated likelihood method –Linear estimator: Fishers discriminants, … –Nonlinear estimator: neural networks, … –Boosted decision trees –…–…–…–… An implementation: TMVA (Toolkit for MultiVariate data Analysis) (arXiv:physics/ ) An implementation: TMVA (Toolkit for MultiVariate data Analysis) (arXiv:physics/ ) Application in LHCb: SM forbidden B s e Application in LHCb: SM forbidden B s e ± analysis – –High efficiency has higher priority than small systematics

10 Application of TMVA in B s e Application of TMVA in B s e ± Two phases Two phases –Training –Application Leave variables with clear separation power outside TMVA and cut on them Leave variables with clear separation power outside TMVA and cut on them 5 input variables in TMVA: no linear correlation expected 5 input variables in TMVA: no linear correlation expected Winner: decorrelated likelihood method Winner: decorrelated likelihood method –As Neyman-Peasrson Lemma tells us? Our interest is here Signal eff. (%) N bg /fb -1

11 Overtraining with TMVA TMVA has a mechanism to monitor overtraining using two independent training and testing samples TMVA has a mechanism to monitor overtraining using two independent training and testing samples Way to control overtraining in early phase is desirable Way to control overtraining in early phase is desirable

12 Data modeling and fitting Maximum likelihood fit is generally used in B physics measurements Maximum likelihood fit is generally used in B physics measurements Modeling and fitting made easy with RooFit ( Modeling and fitting made easy with RooFit ( LHCb can benefit from this package and wishes to LHCb can benefit from this package and wishes to –Have a goodness-of-fit for unbinned maximum likelihood fit implemented –Understand how to speed up toy event generation for complicated PDF –Understand how to make fit converge if a non- factorizable multi-dimensional PDF has no analytical normalization and can only rely on numerical integration

13 Confidence intervals/limits As in all other experiments, we need to quote confidence intervals/limits for all measurements As in all other experiments, we need to quote confidence intervals/limits for all measurements The issue is especially important when working on very rare decays with small signal and huge background The issue is especially important when working on very rare decays with small signal and huge background –significant signals: intervals to establish discrepancy with SM: –Insignificant signals: limits to exclude some new physics models An example: B s sensitivity An example: B s sensitivity –BR(B s ) SM = 3.4 x but enhanced in some new physics models –Exclusion limit: confidence limit for the signal decay branching ratio when the generated N events in an Monte Carlo experiment are all background, a measure of sensitivity

14 Exclusion limit for BR(B s Exclusion limit for BR(B s Step 1: construct geometrical, muon-id and invariant mass likelihood ratios between signal and background hypotheses for each event Step 1: construct geometrical, muon-id and invariant mass likelihood ratios between signal and background hypotheses for each event –decorrelated likelihood method Step 2: divide the 3D space into a number of bins and count events d i in each bin Step 2: divide the 3D space into a number of bins and count events d i in each bin –no cut and N-counting Step 3: estimate number of expected background events b i and signal events s i (for each assumed branching ratio) in each bin Step 3: estimate number of expected background events b i and signal events s i (for each assumed branching ratio) in each bin Step 4: construct a total likelihood ratio between the signal+background and background-only hypotheses for the whole configuration Step 4: construct a total likelihood ratio between the signal+background and background-only hypotheses for the whole configuration

15 Exclusion limit for BR(B s Exclusion limit for BR(B s Step 5: evaluate the of each hypothesis Step 5: evaluate the p-value of each hypothesis –Signal+background: probability(X<X obs ) –Background-only: probability(X>X obs ) Step 6: compute CL s (Thomas Junk, CERN-EP/99-041) Step 6: compute CL s (Thomas Junk, CERN-EP/99-041) Step 7: make statistical statement: Step 7: make statistical statement: If CL s (BR) <, the assumed BR is excluded at 1- confidence level If CL s (BR) <, the assumed BR is excluded at 1- confidence level

16 BR(B s - ) BR(B s - ) exclusion limit results

17 Exclusion limit for BR(B s Exclusion limit for BR(B s The combination of various techniques is shown to work better than the cut-and-count method The combination of various techniques is shown to work better than the cut-and-count method Statistics Review of PDG2006 claims the CL s limit is conservative and the confidence level is under- estimated Statistics Review of PDG2006 claims the CL s limit is conservative and the confidence level is under- estimated –The usual procedure requires the, not the CL s, of a hypothesis to be smaller than to exclude it at 1- confidence level –The usual procedure requires the p-value, not the CL s, of a hypothesis to be smaller than to exclude it at 1- confidence level What is the common understanding of how to set better limits in circumstance of insignificant signal? What is the common understanding of how to set better limits in circumstance of insignificant signal?

18 Controlling systematics Systematics arise from incorrect modeling of detector and/or background effects Systematics arise from incorrect modeling of detector and/or background effects Need delicate statistical methods to acquire these effects from real data and model them Need delicate statistical methods to acquire these effects from real data and model them Example: two methods to deal with efficiency as a function of decay time (t) in time-dependent analysis Example: two methods to deal with efficiency as a function of decay time (t) in time-dependent analysis –Per-event i (t): not covered in this talk –Normalization trick: –Normalization trick: Described by Stéphane TJampens (French thesis)

19 Normalization trick Factorized Signal PDF Factorized Signal PDF – A: physical parameters – t: decay time – position in phase space Likelihood Likelihood Maximization Maximization –Integrated factor i obtained using MC simulation –No need for specific shape of (t, )

20 Fitting with background Unable to know the ideal background distributions w/o detector effect from physical law Unable to know the ideal background distributions w/o detector effect from physical law Solution: use pseudo-log- likelihood to avoid need of background distributions Solution: use pseudo-log- likelihood to avoid need of background distributions Phy.Rew. D71(2005) Phy.Rew. D71(2005) N sig /N b : number of events in the signal/background region N sb : number of expected background events in the signal region

21 Fitting with background Errors return by Minuit are too optimistic Errors return by Minuit are too optimistic The true variance of a fit parameter is given by The true variance of a fit parameter is given by Three items Three items –One from signal events in signal region –One from background events in signal region –One from events in background region, vanishes as N b –One from events in background region, vanishes as N b No need for background PDF No need for background PDF (Private communication with (Private communication with Joe Boudreau)

22 Estimating systematics Every effort has been made to model the detector/background effect correctly Every effort has been made to model the detector/background effect correctly However, have to use some assumptions However, have to use some assumptions –E.g., background events in sideband and signal region have same properties Also require good agreement of MC data and real data Also require good agreement of MC data and real data –Not everything can be obtained from real data What is the proper procedure to estimate systematic errors in the treatment of detector/background effects so that at least different people will come to consistent estimates of the systematic uncertainties if the same analysis method is used? What is the proper procedure to estimate systematic errors in the treatment of detector/background effects so that at least different people will come to consistent estimates of the systematic uncertainties if the same analysis method is used? –What is the rule to set e.g. 1- systematic error? Vary what quantities to obtain it? By how much? –What is the statistical meaning of 1- systematic error?

23 Analysis optimization What is the target function to optimize in order to obtain the best statistical precision for CP measurements? What is the target function to optimize in order to obtain the best statistical precision for CP measurements? –Signal events are not with equal weights –Events with smaller mistag probability and better time resolution contribute more

24 SM test: CKM fit Different measurements of the sides and angles of the triangle should be consistent Different measurements of the sides and angles of the triangle should be consistent –UT Fit: Bayesian –CKM Fitter: frequentist –Which one better serves this purpose? Questions Questions –How to deal with theoretical uncertainties? Do they have frequentist properties? –How do we know if an inconsistency is due to NP or underestimated systematics and theoretical uncertainties? Should give global 2 as measure of agreement with SM

25 SM test: rare decays? LHCb will measure many rare decays LHCb will measure many rare decays Individually they are all good probes of NP Individually they are all good probes of NP How to combine them to get best sensitivity? How to combine them to get best sensitivity? Not a easy job Not a easy job –The SM relations between these quantities are not explicitly given –SM prediction for each of them is with big uncertainty Need a lot of work in physics Need a lot of work in physics –Understand the correlations between the SM predictions from physics and quantify the correlations using an error matrix Also some thinking in statistics Also some thinking in statistics –Construct a test statistic using the measurements and their SM predictions, latter (or both) with correlated errors

26 Analysis tools Tools we have used Tools we have used –Data storage and general processing / Root –Minimization / Minuit –Data modeling and fitting / RooFit –Multivariate analysis / TMVA –Data unfolding / sPlot –Neural networks Tools we may need Tools we may need –Frequentist limit setting tool –Bayesian analysis tool –Reliable numerical multi-dimensional integrator

27 Wish-list A well supported tool for data modeling and fitting, which can handle general multi-dimensional problems numerically A well supported tool for data modeling and fitting, which can handle general multi-dimensional problems numerically Better understanding of how to do multivariate analysis Better understanding of how to do multivariate analysis Better understanding of how to treat systematics and theoretical uncertainties in SM test Better understanding of how to treat systematics and theoretical uncertainties in SM test New statistical methods to control systematics using real data New statistical methods to control systematics using real data New statistical methods to improve flavour tagging New statistical methods to improve flavour tagging Better understanding of how to set confidence limits in case of insignificant signal Better understanding of how to set confidence limits in case of insignificant signal Statisticians recommendation on statistical procedures in data analysis Statisticians recommendation on statistical procedures in data analysis And most importantly a successful LHC(b)!

28 spare slides

29

30

31 Decorrelated likelihood method used for B s Decorrelate the input variables for signal and background separately A very similar method is described by Dean Karlen, Computers in Physics Vol 12, N.4, Jul/Aug 1998 s1 s2 s3. sn b1 b2 b3. bn x1 x2 x3. xn n input variables (IP, DOCA…) n variables for signal independent and Gaussian distributed χ 2 S = Σ s i 2 same for background χ 2 B = Σ b i 2 -2 ln(L sig /L bg ) = χ 2 S - χ 2 B Discriminating variable:

32 Lessons learned with TMVA Overtraining is a general problem in this kind of analysis when the training sample has low statistics Overtraining is a general problem in this kind of analysis when the training sample has low statistics TMVA splits the sample into two TMVA splits the sample into two –One for training and one for test and evaluation Would it make more sense to use three independent samples? Would it make more sense to use three independent samples? –Sample A for training –Sample B for deciding when to stop training by looking at the performance difference between A and B –Sample C for evaluation of performance –This is because correlation may have been introduced between A and B when B is used to decide when to stop training