Lecture 2: Randomization techniques

[Figure: approximate random distribution of coefficients of correlation for two random variates (… = 0.03)]

Under a normal approximation we can use the Z-transformed score, $Z = (X - \mu)/\sigma$, for statistical inference. Z is standard normally distributed. The Fisherian significance levels of the standard normal distribution are:

$P(\mu - \sigma < X < \mu + \sigma) = 68\%$
$P(\mu - 1.65\sigma < X < \mu + 1.65\sigma) = 90\%$
$P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = 95\%$
$P(\mu - 2.58\sigma < X < \mu + 2.58\sigma) = 99\%$
$P(\mu - 3.29\sigma < X < \mu + 3.29\sigma) = 99.9\%$
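The coverage probabilities above can be checked numerically. This is a minimal Python sketch using only the standard library: for a standard normal Z, P(−z < Z < +z) = erf(z/√2). The function names `z_score` and `two_sided_coverage` are illustrative, not taken from the lecture.

```python
import math


def z_score(x: float, mu: float, sigma: float) -> float:
    """Z-transformed score: distance of x from the mean in units of standard deviation."""
    return (x - mu) / sigma


def two_sided_coverage(z: float) -> float:
    """P(-z < Z < +z) for a standard normal Z, computed with the error function."""
    return math.erf(z / math.sqrt(2.0))


# Recompute the Fisherian significance levels for the listed multiples of sigma.
for z in (1.0, 1.65, 1.96, 2.58, 3.29):
    print(f"P(-{z} < Z < +{z}) = {two_sided_coverage(z):.4f}")
```

The printed values are rounded to four decimals; the percentages on the slide are the usual rounded versions of these numbers.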
Average temperature difference in European countries/islands

Country                   Area (sq. km)   ΔT
Albania                   28748           17
Andorra                   468             15
Austria                   83871           20
Azores                    2200            7
Balearic Islands          5014            15
Belarus                   207650          23
Belgium                   30528           15
Bosnia and Herzegovina    51197           20
Bulgaria                  110971          21
Canary Islands            7270            5
Channel Is.               300             10
Corsica                   8680            13
Crete                     8259            13
Croatia                   56594           21
Cyclades Is.              2500            12
Cyprus                    9250            19
Czech Republic            78866           19
Denmark                   43093           16
Dodecanese Is.            2663            14
Estonia                   45227           21
Faroe Is.                 1399            7
Finland                   338145          23
France                    543965          15
Franz Josef Land          16134           27
Germany                   357021          19
Gibraltar                 6.5             10
Greece                    131992          17
Hungary                   93054           22
Iceland                   103000          12
Ireland                   70273           10
Italy                     301401          16
Kaliningrad Region        15000           19
Latvia                    64626           20
Liechtenstein             160             14
Lithuania                 65318           22
Luxembourg                2588            16
Macedonia                 25339           23
Madeira (Funchal)         789             5
Malta                     316             14
Moldova                   33709           23
Monaco                    1.95            12
…

[Diagram labels: Permutation test probability; Bootstrap probability; Probability level; Parameters and standard errors]

Consider the coefficient of correlation. Statistical significance of r > 0 (H1) is tested against the null hypothesis H0 of r = 0. Most statistics programs do this using Fisher's Z-transformation.

Reshuffling
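The Fisher Z-test that most statistics programs apply can be sketched as follows, assuming n paired observations. It uses the standard result that atanh(r) is approximately normal with standard error 1/√(n − 3). The function name and the example sample size are hypothetical, not taken from the slides.

```python
import math


def fisher_z_test(r, n):
    """Test H1: rho > 0 against H0: rho = 0 with Fisher's Z-transformation.

    atanh(r) is approximately normal with standard error 1 / sqrt(n - 3),
    so Z = atanh(r) * sqrt(n - 3) is compared with the standard normal.
    Returns (Z, one-sided upper-tail P).
    """
    if n <= 3:
        raise ValueError("Fisher's Z needs n > 3 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z' >= z) for a standard normal Z'
    return z, p_upper


# Hypothetical example: r = 0.457 with n = 40 pairs (n is illustrative only).
z, p = fisher_z_test(0.457, 40)
print(f"Z = {z:.3f}, one-sided P = {p:.4f}")
```

This test relies on the normal approximation; the reshuffling approach replaces that assumption with a distribution generated from the data.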
Permutation testing

A column of random numbers is used to reorder one of the variables; the Sim r column lists the coefficients of correlation obtained in 20 such reshuffles.

Random number  ln area    ln ΔT      Sim r
0.247012838    11.33704   2.833213   0.14894
0.303300878    12.65321   2.70805    0.014534
0.725633833    9.917045   2.995732   0.157997
0.258217857    0.667829   1.94591    0.031033
0.632451857    7.243513   2.70805    -0.14119
0.254528292    7.696213   3.135494   0.268839
0.980671601    13.01692   2.70805    0.117112
0.522396276    10.62825   2.995732   0.137361
0.683545674    11.08702   3.044522   0.21447
0.773648713    7.887209   1.609438   0.159525
0.359562515    10.3264    2.302585   -0.05251
0.128137778    12.68838   2.564949   -0.23382
0.573061911    11.7905    2.564949   0.072888
0.025421522    12.78555   3.044522   -0.04616
0.087309492    11.42796   2.484907   0.222467
0.20159921     9.132379   2.944439   -0.14329
0.438208554    12.40519   2.944439   -0.0572
0.575893524    13.13427   2.772589   0.449186
0.931176694    10.1401    2.639057   0.167553
0.0309793      10.67112   3.044522   0.234201
0.032472788    10.63432   1.94591
0.352239001    9.019059   3.135494

Observed r = 0.457176
Average r = 0.08609641 (=ŚREDNIA(H2:H21), i.e. AVERAGE)
StdDev r = 0.16530152 (=ODCH.STANDARDOWE(H2:H21), i.e. STDEV)
t = (observed r − average r) / StdDev r × √20 = 10.0393331; P(t) = 4.9403E-09 (=ROZKŁAD.T(J7,19,2), i.e. two-tailed TDIST with 19 df)
Z = (observed r − average r) / StdDev r = 2.24486312; P(Z) = 0.03687629 (=ROZKŁAD.T(J12,19,2))

We reorder one of the variables at random (at least 1000 times) and calculate the mean, the standard deviation, and the upper and lower confidence limits of the randomized r. This gives us an estimate of how probable the observed correlation is.
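The reshuffling loop can also be written outside a spreadsheet. This is a minimal Python sketch, assuming `x` and `y` hold paired values (for example ln area and ln ΔT). `rng.shuffle` takes the place of sorting the rows by a random-number column. All function names are hypothetical.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(x, y, n_perm=1000, seed=None):
    """Reshuffle y against x n_perm times and summarise the randomized r values."""
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    y_perm = list(y)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # reorder one variable at random
        sims.append(pearson_r(x, y_perm))
    mean_r = statistics.fmean(sims)
    sd_r = statistics.stdev(sims)
    return {
        "r_obs": r_obs,
        "mean_r": mean_r,
        "sd_r": sd_r,
        "z": (r_obs - mean_r) / sd_r,  # Z = (observed r - average r) / StdDev r
        "p_upper": sum(r >= r_obs for r in sims) / n_perm,  # share of reshuffles with r >= observed
        "sims": sims,
    }
```

`p_upper` is read directly from the randomized values and does not assume normality. The `z` entry mirrors the spreadsheet's Z column for comparison.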
The distribution of randomized correlation coefficients

[Figure: distribution of randomized r, marking the observed value and the lower and upper two-sided 1% confidence limits]

The distribution is not symmetric, so we can't use Z-transformed values (the normal approximation), and we can't use a t-test. Instead we have to use the upper and lower probability levels, which we get directly from the randomized distribution.

Probability level for r = 0.457: P = 0.0006
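Because the randomized distribution is asymmetric, the limits and P are read straight from its values. This is a minimal sketch that assumes `null_values` is the list of randomized coefficients. The quantile interpolation (`method="inclusive"` in Python's `statistics.quantiles`) is one convention among several, and the function names are hypothetical.

```python
import statistics


def one_percent_limits(null_values):
    """Lower and upper two-sided 1% limits: the 0.5% and 99.5% points of the randomized distribution."""
    cuts = statistics.quantiles(null_values, n=200, method="inclusive")  # 199 cut points at 0.5% steps
    return cuts[0], cuts[-1]


def tail_probabilities(null_values, observed):
    """Share of randomized values at or below, and at or above, the observed value."""
    n = len(null_values)
    lower = sum(v <= observed for v in null_values) / n
    upper = sum(v >= observed for v in null_values) / n
    return lower, upper
```

For a test of r > 0, the `upper` share is the probability level. With only 1000 reshuffles, the smallest nonzero value it can take is 0.001.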
Jackknifing

The jackknifed standard error of the coefficient of variation
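A jackknife recomputes the statistic n times, leaving out one value each time. This sketch uses the textbook jackknife standard error, SE = √((n − 1)/n · Σ(CV₍ᵢ₎ − mean of the CV₍ᵢ₎)²), which may differ in notation from the lecture's version. It assumes CV = sample standard deviation / mean, and the helper names are hypothetical.

```python
import math
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def jackknife_se(values, stat=cv):
    """Jackknife standard error: recompute stat leaving out one value at a time."""
    values = list(values)
    n = len(values)
    leave_one_out = [stat(values[:i] + values[i + 1:]) for i in range(n)]
    mean_loo = statistics.fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out))
```

As a sanity check, when `stat` is the mean, this formula reproduces the usual standard error s/√n exactly.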
Bootstrapping

1. Take the original values and calculate the parameter you need.
2. Take 1000 random samples of different size.
3. Calculate the parameter for each of the 1000 bootstrap samples.
4. Compare the observed value with the distribution of these parameters and calculate the confidence limits for the observed value.
We use at least 1000 random samples and calculate the CV for each sample. The standard deviation of these CV values is an estimate of the standard error of the original CV: the standard error of a parameter is the standard deviation of its sampling distribution, and the bootstrap distribution approximates that sampling distribution.
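The bootstrap steps can be sketched in Python. Note that this is the standard fixed-size bootstrap: each resample draws len(values) values with replacement. It does not follow the lecture's samples of different size. The 95% percentile limits and all function names are illustrative assumptions.

```python
import random
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def bootstrap_cv(values, n_boot=1000, seed=None):
    """Standard bootstrap: resample len(values) values with replacement, n_boot times."""
    rng = random.Random(seed)
    values = list(values)
    boot = [cv(rng.choices(values, k=len(values))) for _ in range(n_boot)]
    se = statistics.stdev(boot)  # SD of the bootstrap CVs = estimated standard error of the CV
    cuts = statistics.quantiles(boot, n=40, method="inclusive")  # cut points at 2.5% steps
    return se, (cuts[0], cuts[-1]), boot  # 2.5% and 97.5% points as percentile limits
```

The returned `se` is the quantity described above: the standard deviation of the bootstrap CV values.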
Bootstrap distribution

The mean CV values are based on samples of different size, so they are not equally informative. We therefore have to use weighted averages.
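The slides do not say how the weights are chosen. This sketch assumes each CV is weighted by the size of the sample it came from. The sample-size range in `variable_size_bootstrap` (`min_size` up to the number of original values) is likewise a hypothetical choice, as are the function names.

```python
import random
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def variable_size_bootstrap(values, n_boot=1000, min_size=3, seed=None):
    """Hypothetical variant: each resample gets a random size between min_size and len(values)."""
    rng = random.Random(seed)
    values = list(values)
    out = []
    for _ in range(n_boot):
        k = rng.randint(min_size, len(values))
        out.append((cv(rng.choices(values, k=k)), k))  # (CV, sample size)
    return out


def weighted_mean_sd(pairs):
    """Mean and SD of (value, weight) pairs; the weight here is the sample size (an assumption)."""
    total = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / total
    var = sum(w * (v - mean) ** 2 for v, w in pairs) / total
    return mean, var ** 0.5
```

With these weights, CVs from larger samples count proportionally more toward the average.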
Null models

[Photo: Darwin finch (Guardian Unlimited)]

Do the beak lengths of Darwin's finches, as a measure of resource use, differ more or less than expected by chance alone? The classical method to answer this question compares the observed variance in beak-length differences with the variances obtained from random draws of beak length within the observed range (the smallest and largest beak sizes being fixed). This is a null model approach. We test whether this null model approach is reliable.
We randomly assigned beak lengths (in mm) to 20 species.

[Figure: observed variance and randomized variances]

P(H0) = 21/1000 = 0.021

The null distribution directly gives us the H0 probability.
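The classical null model can be sketched in Python. The sketch assumes that "beak-length differences" means the gaps between neighbouring species once the lengths are sorted, and that the random draws are uniform within the observed range. Both are assumptions, as are the function names.

```python
import random
import statistics


def spacing_variance(lengths):
    """Variance of the differences between neighbouring beak lengths after sorting."""
    s = sorted(lengths)
    diffs = [b - a for a, b in zip(s, s[1:])]
    return statistics.variance(diffs)  # needs at least 3 lengths


def null_model_test(lengths, n_rand=1000, seed=None):
    """Compare the observed spacing variance with random draws inside the observed range."""
    rng = random.Random(seed)
    lo, hi = min(lengths), max(lengths)
    obs = spacing_variance(lengths)
    sims = []
    for _ in range(n_rand):
        # Smallest and largest beak fixed; the remaining lengths drawn uniformly between them.
        inner = [rng.uniform(lo, hi) for _ in range(len(lengths) - 2)]
        sims.append(spacing_variance([lo, hi] + inner))
    p_lower = sum(v <= obs for v in sims) / n_rand  # spacing more even than chance
    p_upper = sum(v >= obs for v in sims) / n_rand  # spacing more uneven than chance
    return obs, p_lower, p_upper
```

Feeding it randomly generated beak lengths, as on the slide, is a way to check how often the null model flags purely random data.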
[Maps: meningitis in Europe; distribution of forests in Europe]

Is the probability of meningitis infection correlated with the distribution of forests in Europe? We use a grid approach and compute the coefficient of correlation between the entries of both grids: R = 0.06; P(R = 0) > 0.1. The distance between the sites might be important, because neighbouring grid cells are not independent of each other.
[Maps: meningitis in Europe; distribution of forests in Europe]

To get the null model distribution, we reshuffle only whole rows and columns, not individual grid cells.

P(H0) = 26/1000 = 0.026
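The row-and-column null model can be sketched in Python, assuming both maps are rasterized onto grids of the same shape (nested lists). Cells that share a row or a column stay together after reshuffling, which keeps more of the spatial structure than reshuffling single cells would. All names are hypothetical.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flatten(grid):
    """Row-major list of grid entries."""
    return [v for row in grid for v in row]


def row_column_test(a, b, n_perm=1000, seed=None):
    """Correlate two grids; null model = grid b with whole rows and whole columns reshuffled."""
    rng = random.Random(seed)
    x = flatten(a)
    r_obs = pearson_r(x, flatten(b))
    n_rows, n_cols = len(b), len(b[0])
    hits = 0
    for _ in range(n_perm):
        rows = rng.sample(range(n_rows), n_rows)  # random order of rows
        cols = rng.sample(range(n_cols), n_cols)  # independent random order of columns
        b_perm = [[b[i][j] for j in cols] for i in rows]
        if pearson_r(x, flatten(b_perm)) >= r_obs:
            hits += 1
    return r_obs, hits / n_perm
```

The returned share is the probability of obtaining a correlation at least as large as the observed one under this null model.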
Mantel test

Coefficient of correlation between matrix entries. For convenience we use Z-transformed data.

The Mantel test is a test for the correlation between two distance matrices: it tests whether the distances in one matrix are correlated with those in the other.
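The Mantel test can be sketched in Python, assuming two symmetric distance matrices of the same size given as nested lists. The sketch correlates the upper-triangle entries and builds the null distribution the usual way: one random permutation is applied to both the rows and the columns of the second matrix, so it remains a valid distance matrix. The slide does not say whether the Z-transformation means standardizing the entries or Fisher's transform of r. Standardizing leaves r unchanged and Fisher's transform is monotone in r, so neither choice changes the permutation P computed here. Function names are hypothetical.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(m):
    """Entries above the diagonal, read row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, n_perm=1000, seed=None):
    """Correlation between two distance matrices with a permutation P value."""
    rng = random.Random(seed)
    x = upper_triangle(d1)
    r_obs = pearson_r(x, upper_triangle(d2))
    n = len(d2)
    hits = 0
    for _ in range(n_perm):
        p = rng.sample(range(n), n)
        # Same permutation for rows and columns keeps d2 symmetric.
        d2_perm = [[d2[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if pearson_r(x, upper_triangle(d2_perm)) >= r_obs:
            hits += 1
    return r_obs, hits / n_perm
```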