# Bootstrapping (non-parametric)

Bootstrapping (non-parametric)
Bootstrapping is a modern statistical technique that uses computer-intensive random resampling of data to determine the sampling error or confidence intervals for some estimated parameter.
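As a minimal sketch of the general idea (not specific to phylogenetics), the following estimates a percentile confidence interval for a sample mean by resampling with replacement. The function name `bootstrap_ci`, its parameters, and the data values are all hypothetical and chosen only for illustration; the percentile method is one of several ways to turn bootstrap estimates into an interval.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    # Each replicate is the same size as the original sample,
    # drawn from it with replacement.
    estimates = sorted(
        stat([rng.choice(sample) for _ in range(n)]) for _ in range(n_reps)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_reps)]
    upper = estimates[int((1 - alpha / 2) * n_reps) - 1]
    return lower, upper


# Made-up measurements, for illustration only.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
low, high = bootstrap_ci(data)
print(f"mean = {statistics.mean(data):.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

The spread of the replicate estimates stands in for the sampling error that would otherwise need a distributional assumption.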

Bootstrapping (non-parametric)
- Characters are resampled with replacement to create many bootstrap replicate data sets.
- Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML).
- Agreement among the resulting trees is summarized with a majority-rule consensus tree.
- The frequency of occurrence of groups, the bootstrap proportions (BPs), is a measure of support for those groups.
- Additional information is given in partition tables.
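The steps on this slide can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the tiny 0/1 matrix is made up, and `toy_analysis` is a hypothetical stand-in for a real tree search (parsimony, distance, ML) that simply accepts mutually compatible groups in order of how many characters support them.

```python
import random
from collections import Counter


def resample_characters(matrix, rng):
    """One bootstrap replicate: columns drawn with replacement, same size."""
    n_chars = len(next(iter(matrix.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [states[c] for c in cols] for taxon, states in matrix.items()}


def toy_analysis(matrix, outgroup):
    """Hypothetical stand-in for a real tree-building method.

    Each character proposes the group of ingroup taxa whose state differs
    from the outgroup. Groups are accepted greedily by support, skipping
    any group that conflicts with one already accepted.
    """
    taxa = [t for t in matrix if t != outgroup]
    support = Counter()
    for c in range(len(matrix[outgroup])):
        group = frozenset(t for t in taxa if matrix[t][c] != matrix[outgroup][c])
        if 1 < len(group) < len(taxa):  # skip uninformative characters
            support[group] += 1
    accepted = []
    for group, _ in sorted(support.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        # Compatible means nested in, containing, or disjoint from each accepted group.
        if all(group <= g or g <= group or not (group & g) for g in accepted):
            accepted.append(group)
    return set(accepted)


def bootstrap_proportions(matrix, outgroup, n_reps=1000, seed=0):
    """Frequency with which each group appears across replicate analyses."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        counts.update(toy_analysis(resample_characters(matrix, rng), outgroup))
    return {group: n / n_reps for group, n in counts.items()}


def majority_rule(proportions, threshold=0.5):
    """Keep only groups found in more than half of the replicates."""
    return {g: bp for g, bp in proportions.items() if bp > threshold}


# Made-up data: four ingroup taxa, one outgroup, ten binary characters.
toy_matrix = {
    "out": "0000000000",
    "t1": "1111111000",
    "t2": "1111111000",
    "t3": "0000111100",
    "t4": "0000000110",
}
bps = bootstrap_proportions(toy_matrix, outgroup="out")
consensus = majority_rule(bps)
for group, bp in sorted(consensus.items(), key=lambda kv: -kv[1]):
    print(sorted(group), f"{bp:.0%}")
```

In practice the analysis step would be delegated to a phylogenetics package; only the resampling and the counting of groups are the bootstrap itself.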

Bootstrapping

- Randomly resample characters from the original data with replacement to build many bootstrap replicate data sets of the same size as the original, and analyse each replicate data set.
- Summarise the results of multiple analyses with a majority-rule consensus tree.
- Bootstrap proportions (BPs) are the frequencies with which groups are encountered in analyses of replicate data sets.

| Taxa | Original data matrix (characters) | Resampled data matrix (characters) |
| --- | --- | --- |
| A | R R Y Y Y Y Y Y | R R R Y Y Y Y Y |
| B | R R Y Y Y Y Y Y | R R R Y Y Y Y Y |
| C | Y Y Y Y Y R R R | Y Y Y Y Y R R R |
| D | Y Y R R R R R R | Y Y Y R R R R R |
| Outgp | R R R R R R R R | R R R R R R R R |

[Figure: three trees of taxa A, B, C, D and the outgroup, with bootstrap proportions of 96% and 66% shown.]
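One way to see what the resampling did in this example is to tally the distinct character (column) patterns in each matrix. The sketch below assumes the two flattened slide matrices are read as five taxa by eight characters, in the order A, B, C, D, Outgp; the helper name `column_patterns` is hypothetical.

```python
from collections import Counter

# Matrices as read from the slide (rows are taxa, columns are characters).
original = {
    "A": "RRYYYYYY",
    "B": "RRYYYYYY",
    "C": "YYYYYRRR",
    "D": "YYRRRRRR",
    "Outgp": "RRRRRRRR",
}
resampled = {
    "A": "RRRYYYYY",
    "B": "RRRYYYYY",
    "C": "YYYYYRRR",
    "D": "YYYRRRRR",
    "Outgp": "RRRRRRRR",
}


def column_patterns(matrix):
    """Count how often each column pattern (states in taxon order) occurs."""
    return Counter("".join(column) for column in zip(*matrix.values()))


print(column_patterns(original))
print(column_patterns(resampled))
```

Both matrices contain the same three patterns and the same total of eight characters; only the counts differ, so a bootstrap replicate amounts to a random reweighting of the original characters.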

Bootstrapping - an example
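Partition table: ciliate SSUrDNA, parsimony bootstrap.

Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9).

Partitions listed (frequencies not preserved): `.**`, `...**`, `.....**`, `...****`, `...******`, `**`, `...****.*`, `...*****`, `.*******`, `.**....*`, `.**.....*`

[Figure: majority-rule consensus tree of the nine taxa, with bootstrap proportions of 100, 84, 96, 100, 100 and 100 shown on its groups.]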

Bootstrapping - random data
Partition table: randomly permuted data, parsimony bootstrap.

Partitions listed (frequencies not preserved): `.*****.**`, `..**`, `....*..*`, `.*......*`, `.***.*.**`, `...*...*`, `.*..**.**`, `.....*..*`, `.*...*..*`, `.***....*`, `....**.**`, `....**.*`, `..*...*`, `.**..*..*`, `.*...*`, `.....*.**`, `.***`

[Figure: majority-rule consensus tree (with minority components).]

Bootstrap - interpretation
Bootstrapping was introduced as a way of establishing confidence intervals for phylogenies. This interpretation of bootstrap proportions (BPs) depends on the assumption that the original data are a random sample from a much larger set of independent and identically distributed data. However, several things complicate this interpretation:

- Perhaps the assumptions are unreasonable, making any statistical interpretation of BPs invalid.
- Some theoretical work indicates that BPs are very conservative and may underestimate confidence; the problem increases with the number of taxa.
- BPs can be high for incongruent relationships in separate analyses and can therefore be misleading (misleading data -> misleading BPs).
- With parsimony, BPs may be highly affected by the inclusion or exclusion of only a few characters.

Bootstrap - interpretation
Bootstrapping is a very valuable and widely used technique; it (or some suitable alternative) is demanded by some journals, but it may require a pragmatic interpretation:

- BPs depend on two aspects of the support for a group: the number of characters supporting the group and the level of support for incongruent groups.
- BPs thus provide an index of the relative support for groups provided by a set of data under whatever interpretation of the data (method of analysis) is used.
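The first of these two aspects can be made concrete with a rough calculation. This is a sketch under strong simplifying assumptions that are not in the slides: a group is supported by $k$ of the $n$ characters, no character conflicts with it, and the group is recovered in a replicate whenever at least one of its supporting characters is drawn. The symbols $k$ and $n$ are introduced here only for illustration.

$$
\mathrm{BP} \approx 1 - \left(1 - \frac{k}{n}\right)^{n} \approx 1 - e^{-k} \qquad (n \gg k)
$$

Under these assumptions, $k = 1, 2, 3$ give BPs of roughly 0.63, 0.86 and 0.95 once $n$ is large. Characters supporting incongruent groups lower these values, which is the second aspect of support.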