Probability and Statistics in Geology Probability and statistics are an important aspect of Earth Science. Understanding the details, population of a data.

1 Probability and Statistics in Geology Probability and statistics are an important aspect of Earth Science: understanding the details and the population of a data sample. How rounded are these pebbles? Where did they come from? How likely is an earthquake here in Northridge?

2 Probability and Statistics in Geology Topics: Statistics, Histograms, Probability, Error Analysis, Regression. Slide note: "Discuss next week."

3 Are Statistics Always Right? Can They Be Misleading? Toss a coin 6 times: "Heads or tails?" Which result is most unlikely: six tails? Which is more likely: 3 heads and 3 tails? So is HTHTHT more likely than TTTTTT? No, both are equally unlikely! Each specific sequence of six tosses has probability (1/2)^6 = 1/64.

4 Are Statistics Always Right? Can They Be Misleading? The result "3 heads and 3 tails" is more likely only because there are many combinations in which it can occur (e.g. HHTHTT, or HTHTTH, or HHHTTT...): 20 of the 64 possible sequences. Let's try it...
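
The counting argument on slides 3 and 4 can be checked by brute force. The Python sketch below is not part of the original slides; it simply enumerates all possible sequences of six tosses of a fair coin and counts those with exactly three heads. All variable names are illustrative.

```python
from itertools import product

# Enumerate every possible sequence of 6 tosses of a fair coin.
sequences = ["".join(s) for s in product("HT", repeat=6)]
total = len(sequences)            # 2**6 equally likely sequences

# Any one specific sequence (HTHTHT or TTTTTT) appears exactly once.
p_single = 1 / total

# Count the sequences that contain exactly 3 heads (and so 3 tails).
three_heads = [s for s in sequences if s.count("H") == 3]
p_three_heads = len(three_heads) / total

print(f"P(HTHTHT) = P(TTTTTT) = {p_single:.4f}")
print(f"Sequences with 3 heads: {len(three_heads)} of {total}")
print(f"P(3 heads and 3 tails) = {p_three_heads:.4f}")
```

The outcome "3 heads and 3 tails" is a group of many sequences, while HTHTHT and TTTTTT are each a single sequence, which is why the group is more likely than either one.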

5 What is a Statistic? Is this a statistic? "In 1970, the oil refining capacity of Belgium was 32.6 million tonnes per year." This is actually just a fact, not a statistic.

6 What is a Statistic? Consider a pebbly beach. How could you determine the composition, mass, length, and shape of these particular pebbles? Would these sizes be the same on every beach?

7 What is a Statistic? - Specimen Let's pick up a pebble and look at it: this is a specimen. This pebble could probably give us the composition, but would it be representative of all the pebbles? Is it typical? How could we improve on this specimen?

8 What is a Statistic? - Sample We could pick up 100 pebbles; this is a sample from the beach. This should give you a much better idea of your beach rocks. Could we do any better?

9 What is a Statistic? - Population Or we could sample ALL the pebbles on the beach! This is the population of all pebbles. Now measure the composition, size, and shape of each. Is this a realistic plan?

10 What is a Statistic? - Population
Specimen: one object
Sample: a subset of the objects
Population: all the objects
These terms are often misused in science and literature.

11 Faults in Southern California Above is a map of faults found in southern California. If we just study the San Jacinto fault, what is this called statistically? If we study the system of the San Jacinto, Elsinore, and San Andreas faults, what is this called statistically?

12 So What is a Statistic? Is the average mass of a pebble a statistic? This depends on whether the average is determined from a sample of pebbles or from the total population. If we take the average of the total population, this is considered a parameter and is simply a fact. The average of a sample, however, is a statistic.

13 So What is a Statistic? In the pebble example, a statistic is an attempt to estimate the average mass of all the pebbles by calculating the average mass of some of the pebbles. Statistics are generally based on a sample of the population.
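
The difference between a parameter and a statistic can be illustrated with a small simulation. The sketch below is not from the slides: the "population" is synthetic, drawn from an assumed distribution (mean 340 g, spread 50 g) purely for illustration, and a fixed random seed keeps the run repeatable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: every pebble on an imaginary beach (masses in g).
# These values are synthetic, generated only to illustrate the idea.
population = [random.gauss(340, 50) for _ in range(10_000)]

# Parameter: the mean of the entire population (a fact about this beach).
parameter = statistics.mean(population)

# Statistic: the mean of a sample of 100 pebbles, used to estimate the parameter.
sample = random.sample(population, 100)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.1f} g")
print(f"sample mean (statistic):     {statistic:.1f} g")
```

The two values are close but not identical; a different sample of 100 pebbles would give a slightly different statistic, while the parameter stays fixed.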

14 Election Polls Polling question: "Who did the best job in the debate?" Obama 54%, McCain 30%. Estimates of voter intentions obtained before an election are statistics: they come from a sample of the population.

15 Election Polls Obama 365, McCain 162. The final result of an election, however, is an election parameter. The final result is a fact, a measure of the entire voting population: Obama 66,882,230, McCain 58,343,671.

16 Back to the Pebbly Beach - Average, Mean, and Median
Pebble #   Mass (g)
1          374
2          389
3          395
4          364
5          224
6          250
7          378
8          376
9          330
10         310
The typical mass of pebbles on a particular beach can be described by the mean (the same as the average), $\bar{w}$:
$$\bar{w} = \frac{1}{N} \sum_{i=1}^{N} w_i$$
The mean is the total mass of the sample divided by the number of pebbles. What is the mean of these pebbles?
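
The question on this slide can be worked through directly. The short Python sketch below (an illustration, not part of the slides) applies the mean formula to the ten masses listed on slide 16.

```python
# Masses (g) of pebbles #1 to #10 as listed on slide 16.
masses = [374, 389, 395, 364, 224, 250, 378, 376, 330, 310]

N = len(masses)                 # number of pebbles
total_mass = sum(masses)        # total mass of the sample
mean_mass = total_mass / N      # w-bar = (1/N) * sum of w_i

print(f"N = {N}, total = {total_mass} g, mean = {mean_mass:.1f} g")
```

For these ten pebbles the total mass is 3390 g, so the mean is 339.0 g.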

17 Back to the Pebbly Beach - Average, Mean, and Median Pebble masses (g), pebbles #1 to #11, ranked from lightest to heaviest: 225, 250, 310, 330, 364, 374, 376, 378, 389, 395, 399. Another way of finding the typical mass of pebbles is to use the median value. Median means "middle": it is the mass of the middle pebble when all are lined up (ranked) from lightest to heaviest. With an odd number of pebbles, the median is simply the mass of the single middle pebble. In the above example, pebble #6 has a mass of 374 g, which is the median value of this pebble sample.

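18 Back to the Pebbly Beach - Average, Mean, and Median Will the median always be the same as the mean? With an even number of pebbles (say 100) there is no single middle pebble, so you average the 50th and 51st pebbles. Pebble masses (g), pebbles #1 to #11: 225, 250, 310, 330, 364, 374, 376, 378, 389, 395, 399. For these 11 pebbles the mean is about 344.5 g while the median is 374 g, so the two need not agree.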
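
The odd and even cases can be handled by one small function. The following Python sketch is an illustration only; the function name `median` and the list names are assumptions, and the data are the 11 ranked pebbles of slide 17 and the 10 pebbles of slide 16.

```python
def median(values):
    """Middle value of the ranked data; mean of the two middle values if the count is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd count: single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2  # even count: average the middle pair

eleven = [225, 250, 310, 330, 364, 374, 376, 378, 389, 395, 399]
ten = [374, 389, 395, 364, 224, 250, 378, 376, 330, 310]

print(median(eleven))  # odd count: the 6th ranked pebble
print(median(ten))     # even count: average of the 5th and 6th ranked pebbles
```

For the 11 pebbles this gives 374 g; for the 10 pebbles it averages 364 g and 374 g to give 369 g.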

19 Back to the Pebbly Beach - Dispersion What about other aspects of the distribution of pebbles? How can we tell if the pebbles are similar in size (i.e. well or poorly sorted)? Pebble masses (g), pebbles #1 to #11: 225, 250, 310, 330, 364, 374, 376, 378, 389, 395, 399. We could give the total range of sizes, which is the simplest measure of dispersion. But how much does this tell us about all the sample pebbles?

20 Back to the Pebbly Beach - Dispersion The heaviest and lightest pebbles may not be "typical". A better measure of how similar your pebbles are is the mean square deviation from the mean, computed here for the same 11 pebbles:
$$\sigma^2 = \overline{(w_i - \bar{w})^2}$$
where $w_i$ is the mass of pebble $i$ and $\bar{w}$ is the mean mass. This measures the deviation from the mean and is also known as the variance; the bar indicates the average over all pebbles. The standard deviation, $\sigma$, is the square root of this value.

21 Back to the Pebbly Beach - Dispersion
$$\sigma^2 = \overline{(w_i - \bar{w})^2}$$
Why do we square this difference? Some differences will be negative, and we just want the size of the deviation of each pebble from the average value; squaring makes every term positive. If $\sigma^2$ is small, then the masses are similar and the pebbles are well sorted. If $\sigma^2$ is large, then the masses are widely varying and the pebbles are poorly sorted.
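
The dispersion measures discussed on slides 19 to 21 can be computed for the 11 pebbles as follows. This Python sketch is illustrative; it averages over all N pebbles, as the bar in the slide's formula suggests (some texts divide by N - 1 for a sample instead, which is not what is done here).

```python
import math

# Masses (g) of the 11 pebbles from slide 17.
masses = [225, 250, 310, 330, 364, 374, 376, 378, 389, 395, 399]

n = len(masses)
mean = sum(masses) / n

# Range: heaviest minus lightest pebble.
mass_range = max(masses) - min(masses)

# Deviations from the mean: positives and negatives cancel, so their sum is ~0.
deviations = [m - mean for m in masses]

# Variance: the average of the squared deviations; std_dev is its square root.
variance = sum(d ** 2 for d in deviations) / n
std_dev = math.sqrt(variance)

print(f"range = {mass_range} g")
print(f"sum of deviations = {sum(deviations):.6f}")
print(f"variance = {variance:.1f} g^2, standard deviation = {std_dev:.1f} g")
```

The sum of the raw deviations is zero to within rounding, which is why the differences are squared before averaging.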

22 Visualizing Distribution of Data How can you display graphically the distribution of a large number of pebbles? Which sizes occur most often? Which are fairly rare?

23 Visualizing Distribution of Data: Histogram A histogram displays the pebble mass count in bins (10 bins shown). We first count the number of occurrences (frequency) in each bin and list them in a table called the frequency distribution. Then we plot this frequency as a bar chart against mass.
[Figure: histogram of frequency against pebble mass (g)]
Frequency Distribution
Range (g)   Number
200-235     1
236-260     3
261-285     7
286-315     9
316-335     16
336-365     22
366-385     19
386-415     14
416-435     6
436-465     2
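
Building a frequency distribution amounts to counting how many values fall in each bin. The Python sketch below is an illustration that reuses the bin ranges from the slide's table but applies them to the 11 pebbles of slide 17, since the full data set behind the slide's table is not given.

```python
# Bin ranges (g) taken from the slide's frequency table; both ends inclusive.
bins = [(200, 235), (236, 260), (261, 285), (286, 315), (316, 335),
        (336, 365), (366, 385), (386, 415), (416, 435), (436, 465)]

# Masses (g) of the 11 pebbles from slide 17.
masses = [225, 250, 310, 330, 364, 374, 376, 378, 389, 395, 399]

# Frequency distribution: number of pebbles whose mass falls in each bin.
counts = [sum(1 for m in masses if low <= m <= high) for low, high in bins]

# Print the table with a simple text bar chart alongside.
for (low, high), count in zip(bins, counts):
    print(f"{low}-{high} g  {count:2d}  {'#' * count}")
```

Every pebble lands in exactly one bin, so the counts add up to the number of pebbles.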

24 Histograms in Matlab (or Octave) To plot histograms in Matlab:
>> x = 200:25:500;        % set bin range and increment, here 25 (hist treats these values as bin centers)
>> y = pebblefile(:,2);   % read column 2 of file of pebble masses
>> hist(y,x)              % plots histogram shown above for data (y) and bins (x)
[Figure: histogram of frequency against pebble mass (g), count of all pebbles]

25 Visualizing Distribution of Data Marine seismic study, Weeraratne et al., 2007. We're interested in earthquake paths which come from every possible azimuth within 360° (the back azimuth). How can we graphically represent the distribution of cyclical data or direction?

26 Visualizing Distribution of Data: Rose Diagrams A rose diagram is like plotting a histogram on a polar graph. The direction is represented by the angle around the plot, and the frequency is proportional to the distance from the center. Here frequency ranges from 0 to 6, and an angle of 30° is the most frequent, occurring 6 times. A list of fault dip angles could be plotted in this way.

27 Plotting Rose Diagrams in Matlab (or Octave) To plot rose diagrams in Matlab:
>> dip = faultdipfile(:,1);      % reads first column of data input
>> dipradians = dip.*pi./180;    % converts angles to radians
>> bins = 100;                   % specify the number of bins
>> rose(dipradians,bins)         % plot the rose diagram
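
The counting behind a rose diagram is the same as for a histogram, except that the bins are angular sectors. The Python sketch below is an illustration only: the angle values are hypothetical (not data from the slides), and 30° sectors are an assumed choice. It produces the sector counts that a polar plot would then draw as petals.

```python
import math

# Hypothetical directions in degrees (illustrative values, not from the slides).
angles_deg = [12, 28, 31, 33, 35, 40, 44, 47, 95, 100, 185, 350]

sector_width = 30                       # degrees per sector, giving 12 sectors
n_sectors = 360 // sector_width
counts = [0] * n_sectors
for angle in angles_deg:
    # Wrap into 0-360 and find which sector the angle falls in.
    counts[int(angle % 360) // sector_width] += 1

# Conversion to radians, as done before the rose() call on the slide.
angles_rad = [math.radians(a) for a in angles_deg]

for i, count in enumerate(counts):
    start = i * sector_width
    print(f"{start:3d}-{start + sector_width:3d} deg  {'#' * count}")
```

With these made-up values the 30-60° sector is the most frequent, with 6 occurrences.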

28 Probability What is Probability? If I measure a large number of data points, how often do I obtain a particular result? [Figure: histogram of frequency against pebble mass (g)] For the pebble masses measured here, the most probable mass is about 350 grams (the 336-365 g bin). This mass value occurs in 22 (frequency) out of 100 cases, or 22% of the time. Thus the estimated probability of picking up a pebble in this area with a mass of about 350 grams is 22%.

31 Probability We can then add another column to the data which shows the probability for each bin. You can now plot probability in a histogram.
[Figure: histogram of probability against pebble mass (g)]
Frequency Distribution & Probability
Range (g)   Number   Probability
200-235     1        .01
236-260     3        .03
261-285     7        .07
286-315     9        .09
316-335     16       .16
336-365     22       .22
366-385     19       .19
386-415     14       .14
416-435     6        .06
436-465     2        .02
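
The probability column is just each bin count divided by the number of pebbles. The Python sketch below is illustrative and uses the counts from the slide's table with the sample size of 100 quoted on the slides; note that the tabulated counts themselves add up to 99, so the listed probabilities sum to 0.99.

```python
# Bin ranges (g) and counts from the slide's frequency table.
ranges = ["200-235", "236-260", "261-285", "286-315", "316-335",
          "336-365", "366-385", "386-415", "416-435", "436-465"]
counts = [1, 3, 7, 9, 16, 22, 19, 14, 6, 2]

n_pebbles = 100   # sample size quoted on the slides

# Estimated probability of each bin: frequency divided by sample size.
probabilities = [c / n_pebbles for c in counts]

# The most probable bin is the one with the highest frequency.
most_likely = ranges[counts.index(max(counts))]

for r, c, p in zip(ranges, counts, probabilities):
    print(f"{r} g  {c:3d}  {p:.2f}")
print(f"most probable bin: {most_likely} g")
```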

32 Probability: What is Normal? You can compare your data distribution to theoretical estimates. The most common distribution used is the normal distribution, also known as the Gaussian distribution. [Figure and table: frequency distribution and probability against pebble mass (g), as on slide 31]

33 Gaussian Distribution
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\bar{x})^2 / 2\sigma^2}$$
The Gaussian distribution is written as above and describes the relative probability of obtaining the value $x$. Here $\sigma$ is the standard deviation and $\bar{x}$ is the average of all $x$.

34 Gaussian Distribution [Figure: P(x) against x] This is a Gaussian distribution for mean $\bar{x} = 5.0$ and $\sigma = 2.0$. You are more likely to obtain a value between 4 and 6, where the graph is high, and less likely to obtain a value between 1 and 2, or 9 and 10.
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\bar{x})^2 / 2\sigma^2}$$
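
The Gaussian formula is easy to evaluate numerically. The Python sketch below is an illustration using the mean of 5.0 and standard deviation of 2.0 from slide 34; the function name `gaussian` is an assumption.

```python
import math

def gaussian(x, mean, sigma):
    """Gaussian probability density P(x) for the given mean and standard deviation."""
    exponent = -((x - mean) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / math.sqrt(2 * math.pi * sigma ** 2)

mean, sigma = 5.0, 2.0   # values used on slide 34

# Compare the height of the curve near the mean with the height in the tails.
for x in (1.5, 5.0, 9.5):
    print(f"P({x}) = {gaussian(x, mean, sigma):.4f}")
```

The curve is highest at the mean and symmetric about it, so values near 5 are more likely than values near 1.5 or 9.5.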

35 Gaussian Distribution We can quantify this by looking at the area under the curve; the total area under the curve is 1.0. The area under the curve between 1 and 2 is shown in gray. This area is much smaller than the dark gray block between 4 and 7. [Figure: P(x) against x]

36 Gaussian Distribution To quantify these "areas" we use established values for multiples of the standard deviation from the mean. The area under the curve between 3 and 7 is 0.683 and is termed $\pm 1.0\sigma$ (this is known as the 68% confidence limit). The area under the curve between 1 and 9 is 0.954 and is termed $\pm 2.0\sigma$ (this is known as the 95% confidence limit). [Figure: P(x) against x, with the $\pm 1.0\sigma$ and $\pm 2.0\sigma$ ranges marked]
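
The quoted areas can be reproduced with the error function, which gives the cumulative area under a Gaussian. The Python sketch below is an illustration for the mean of 5.0 and standard deviation of 2.0 used on these slides; the helper names are assumptions.

```python
import math

def normal_cdf(x, mean, sigma):
    """Area under the Gaussian curve from minus infinity up to x."""
    return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))

def area_between(a, b, mean, sigma):
    """Area under the Gaussian curve between a and b."""
    return normal_cdf(b, mean, sigma) - normal_cdf(a, mean, sigma)

mean, sigma = 5.0, 2.0

print(f"area 3 to 7 (+/- 1 sigma): {area_between(3, 7, mean, sigma):.3f}")
print(f"area 1 to 9 (+/- 2 sigma): {area_between(1, 9, mean, sigma):.3f}")
print(f"area 1 to 2:               {area_between(1, 2, mean, sigma):.3f}")
print(f"area 4 to 7:               {area_between(4, 7, mean, sigma):.3f}")
```

The last two lines correspond to the gray and dark gray regions compared on slide 35.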

37 Linear Regression: How to Fit a Line to Scattered Data Now that we've learned statistical analysis of a single variable, we can also consider statistical analysis of two related variables. We may be able to approximate this relationship by a straight line. How do we find this line? Which line is best? [Figure: scatter plot of pebble diameter and distance from shore (m)]

38 Linear Regression: How to Fit a Line to Scattered Data The line drawn to the right is one possibility. How can we determine, in a quantitative way, whether this line is better than another? We can calculate the mean square deviation by looking at the distance of each point from the predicted line. The deviation of one point is shown by $\Delta y$ and is measured in the "y direction" only. [Figure: scatter plot of pebble diameter and distance from shore (m), with a trial line and the deviation $\Delta y$ of one point marked]

39 Linear Regression: How to Fit a Line to Scattered Data This gives you the deviation of one point from the line. To obtain the mean square deviation, we average the squared deviations over all points, using the same equation as for the variance before:
$$\sigma^2 = \overline{(\Delta y - \overline{\Delta y})^2}$$
The line with the smallest $\sigma$ will have the best fit to the data. [Figure: scatter plot of pebble diameter and distance from shore (m), with $\Delta y$ marked]
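
Comparing lines by their mean square deviation can be sketched as follows. This Python example is an illustration under stated assumptions: the data values are hypothetical, distance from shore is taken as x and pebble diameter as y, and the best line is found with the standard least-squares formulas, which the slides do not spell out. The deviation is averaged as the mean of $\Delta y^2$ about the line; this matches the slide's expression whenever the average $\Delta y$ is zero, as it is for the least-squares line.

```python
# Hypothetical data (illustrative only): distance from shore (m) and pebble diameter.
distance = [0, 10, 20, 30, 40]
diameter = [50, 44, 41, 33, 32]

def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def mean_square_deviation(x, y, slope, intercept):
    """Average of the squared y-direction deviations of the points from the line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)) / len(x)

slope, intercept = fit_line(distance, diameter)
best_msd = mean_square_deviation(distance, diameter, slope, intercept)

# A second, hand-drawn trial line for comparison.
other_msd = mean_square_deviation(distance, diameter, -0.5, 50.0)

print(f"best fit: diameter = {slope:.2f} * distance + {intercept:.1f}")
print(f"mean square deviation, best fit:   {best_msd:.2f}")
print(f"mean square deviation, trial line: {other_msd:.2f}")
```

The trial line looks reasonable by eye, but the least-squares line has the smaller mean square deviation and is therefore the better fit by this measure.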

