Bootstrapping (non-parametric)


Bootstrapping (non-parametric)
Bootstrapping is a modern statistical technique that uses computer-intensive random resampling of data to determine sampling error or confidence intervals for some estimated parameter.
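As a general illustration of the idea, outside phylogenetics, the sketch below bootstraps the mean of a small sample. The data values, the function name `bootstrap_mean`, the number of replicates and the percentile method for the interval are all assumptions made for the example, not part of the slides.

```python
import random
import statistics

def bootstrap_mean(sample, n_replicates=2000, seed=0):
    """Bootstrap standard error and 95% percentile interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    means = []
    for _ in range(n_replicates):
        # Draw n observations with replacement from the original sample.
        replicate = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(replicate))
    means.sort()
    std_error = statistics.stdev(means)  # spread of the replicate estimates
    lower = means[int(0.025 * n_replicates)]
    upper = means[int(0.975 * n_replicates) - 1]
    return std_error, (lower, upper)

# Made-up measurements, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
se, (lo, hi) = bootstrap_mean(data)
print(f"mean={statistics.fmean(data):.2f} se={se:.2f} 95% interval=({lo:.2f}, {hi:.2f})")
```

The same pattern (resample, re-estimate, summarise the spread of the estimates) is what the phylogenetic bootstrap applies to characters and trees.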

Bootstrapping (non-parametric)
Characters are resampled with replacement to create many bootstrap replicate data sets.
Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML).
Agreement among the resulting trees is summarized with a majority-rule consensus tree.
The frequency of occurrence of groups, the bootstrap proportions (BPs), is a measure of support for those groups.
Additional information is given in partition tables.
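The summary step can be sketched as follows, assuming each replicate tree has already been reduced to the set of groups (clades) it contains. The four replicate trees, the group names and the function names are hypothetical; the tree-building analysis itself is not shown.

```python
from collections import Counter

def bootstrap_proportions(replicate_trees):
    """Fraction of replicate trees in which each group occurs."""
    counts = Counter()
    for groups in replicate_trees:
        counts.update(set(groups))  # count each group once per tree
    total = len(replicate_trees)
    return {group: count / total for group, count in counts.items()}

def majority_rule(proportions, threshold=0.5):
    """Keep only groups found in more than `threshold` of the replicates."""
    return {g: p for g, p in proportions.items() if p > threshold}

# Hypothetical groups and replicate trees, for illustration only.
AB, CD = frozenset("AB"), frozenset("CD")
AC, BD = frozenset("AC"), frozenset("BD")
replicate_trees = [{AB, CD}, {AB, CD}, {AB}, {AC, BD}]

bps = bootstrap_proportions(replicate_trees)
consensus = majority_rule(bps)
for group, bp in sorted(bps.items(), key=lambda item: -item[1]):
    print("".join(sorted(group)), f"{100 * bp:.0f}%")
```

Groups that each occur in more than half of the replicates cannot conflict with one another, which is why the strict majority threshold always yields a valid tree.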

Bootstrapping
Randomly resample characters from the original data with replacement to build many bootstrap replicate data sets of the same size as the original, and analyse each replicate data set.

Original data matrix
        Characters
Taxa    1 2 3 4 5 6 7 8
A       R R Y Y Y Y Y Y
B       R R Y Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y R R R R R R
Outgp   R R R R R R R R

Resampled data matrix
        Characters
Taxa    1 2 2 5 5 6 6 8
A       R R R Y Y Y Y Y
B       R R R Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y Y R R R R R
Outgp   R R R R R R R R

Summarise the results of multiple analyses with a majority-rule consensus tree.
Bootstrap proportions (BPs) are the frequencies with which groups are encountered in analyses of replicate data sets.

[Figure: three trees for taxa A, B, C and D plus the outgroup, with character numbers marked on the branches and bootstrap proportions of 96% and 66%.]
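The resampling of characters can be sketched directly on the slide's data matrix. The dictionary layout and the helper names `take_columns` and `resample_characters` are assumptions for this example; column indices are 0-based in the code, whereas the slide numbers characters from 1.

```python
import random

# Original data matrix from the slide: one string of character states per taxon.
matrix = {
    "A": "RRYYYYYY",
    "B": "RRYYYYYY",
    "C": "YYYYYRRR",
    "D": "YYRRRRRR",
    "Outgp": "RRRRRRRR",
}

def take_columns(matrix, columns):
    """Build a data set from the given character columns (0-based)."""
    return {taxon: "".join(states[c] for c in columns)
            for taxon, states in matrix.items()}

def resample_characters(matrix, rng):
    """Draw as many characters as the original has, with replacement."""
    n_chars = len(next(iter(matrix.values())))
    # Sorting is cosmetic: it only makes the replicate easier to read.
    columns = sorted(rng.randrange(n_chars) for _ in range(n_chars))
    return columns, take_columns(matrix, columns)

# The draw shown on the slide: characters 1 2 2 5 5 6 6 8.
slide_replicate = take_columns(matrix, [0, 1, 1, 4, 4, 5, 5, 7])

# A fresh random replicate (seed chosen arbitrarily).
columns, replicate = resample_characters(matrix, random.Random(42))
print([c + 1 for c in columns])
for taxon, states in replicate.items():
    print(f"{taxon:6s}", " ".join(states))
```

Each replicate keeps every taxon and the original number of characters; only which characters appear, and how often, changes.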

Bootstrapping - an example
Ciliate SSUrDNA - parsimony bootstrap

Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9)

Partition Table
123456789    Freq
-----------------
.**......  100.00
...**....  100.00
.....**..  100.00
...****..  100.00
...******   95.50
.......**   84.33
...****.*   11.83
...*****.    3.83
.*******.    2.50
.**....*.    1.00
.**.....*    1.00

[Figure: majority-rule consensus tree of the nine taxa, with branch labels 100, 84, 96, 100, 100 and 100.]
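A partition table of this kind can be read mechanically: each column of a pattern stands for one numbered taxon, and an asterisk marks membership of the group. The sketch below decodes the table above and keeps the groups that pass the majority-rule threshold; the parsing function and variable names are assumptions for the example.

```python
# Taxa in the order of their numbers (1-9) on the slide.
TAXA = ["Ochromonas", "Symbiodinium", "Prorocentrum", "Loxodes",
        "Tracheloraphis", "Spirostomum", "Gruberia", "Euplotes",
        "Tetrahymena"]

PARTITION_TABLE = """\
.**...... 100.00
...**.... 100.00
.....**.. 100.00
...****.. 100.00
...****** 95.50
.......** 84.33
...****.* 11.83
...*****. 3.83
.*******. 2.50
.**....*. 1.00
.**.....* 1.00
"""

def parse_partitions(text, taxa):
    """Turn 'pattern frequency' rows into (member names, frequency) pairs."""
    rows = []
    for line in text.splitlines():
        pattern, freq = line.split()
        members = tuple(name for name, mark in zip(taxa, pattern) if mark == "*")
        rows.append((members, float(freq)))
    return rows

partitions = parse_partitions(PARTITION_TABLE, TAXA)
# Groups found in more than 50% of replicates enter the consensus tree.
consensus = [(members, freq) for members, freq in partitions if freq > 50.0]
for members, freq in consensus:
    print(f"{freq:6.2f}  {', '.join(members)}")
```

The six groups that pass the threshold correspond to the six labelled branches of the consensus tree, with the frequencies rounded to whole percentages.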

Bootstrapping - random data
Randomly permuted data - parsimony bootstrap

Partition Table
123456789    Freq
-----------------
.*****.**   71.17
..**.....   58.87
....*..*.   26.43
.*......*   25.67
.***.*.**   23.83
...*...*.   21.00
.*..**.**   18.50
.....*..*   16.00
.*...*..*   15.67
.***....*   13.17
....**.**   12.67
....**.*.   12.00
..*...*..   12.00
.**..*..*   11.00
.*...*...   10.80
.....*.**   10.50
.***.....   10.00

[Figure: majority-rule consensus tree (with minority components).]

Bootstrap - interpretation
Bootstrapping was introduced as a way of establishing confidence intervals for phylogenies.
This interpretation of bootstrap proportions (BPs) depends on the assumption that the original data is a random sample from a much larger set of independent and identically distributed data.
However, several things complicate this interpretation:
Perhaps the assumptions are unreasonable, making any statistical interpretation of BPs invalid.
Some theoretical work indicates that BPs are very conservative and may underestimate confidence; the problem increases with the number of taxa.
BPs can be high for incongruent relationships in separate analyses and can therefore be misleading (misleading data -> misleading BPs).
With parsimony, BPs may be highly affected by the inclusion or exclusion of only a few characters.

Bootstrap - interpretation
Bootstrapping is a very valuable and widely used technique - it (or some suitable alternative) is demanded by some journals - but it may require a pragmatic interpretation:
BPs depend on two aspects of the support for a group: the number of characters supporting the group and the level of support for incongruent groups.
BPs thus provide an index of the relative support for groups provided by a set of data under whatever interpretation of the data (method of analysis) is used.
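The dependence of BPs on both supporting and conflicting characters can be illustrated with a deliberately crude simulation. It assumes a stand-in for the real analysis: a group counts as recovered in a replicate when more resampled characters support it than contradict it. That rule, the function name and all parameter values are assumptions for the sketch; it is not parsimony or any other method named in the slides.

```python
import random

def toy_bootstrap_proportion(n_support, n_conflict, n_chars=50,
                             n_replicates=2000, seed=1):
    """Toy BP: share of replicates where support outnumbers conflict.

    Characters 0..n_support-1 are taken to support the group, the next
    n_conflict characters to contradict it, and the rest to be
    uninformative about it (all hypothetical).
    """
    rng = random.Random(seed)
    recovered = 0
    for _ in range(n_replicates):
        support = conflict = 0
        for _ in range(n_chars):  # resample n_chars characters with replacement
            draw = rng.randrange(n_chars)
            if draw < n_support:
                support += 1
            elif draw < n_support + n_conflict:
                conflict += 1
        if support > conflict:
            recovered += 1
    return recovered / n_replicates

one_char = toy_bootstrap_proportion(1, 0)          # one uncontradicted character
five_chars = toy_bootstrap_proportion(5, 0)        # more supporting characters
with_conflict = toy_bootstrap_proportion(5, 3)     # same support, some conflict
print(one_char, five_chars, with_conflict)
```

Under this toy rule, adding supporting characters raises the proportion and adding conflicting characters lowers it, which mirrors the two aspects of support described above. A single uncontradicted character is drawn at least once with probability 1 - (1 - 1/n)^n, roughly 0.63 for large n, so even a weakly supported group need not have a low BP.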