EART20170 Computing, Data Analysis & Communication skills

1 EART20170 Computing, Data Analysis & Communication skills
Lecturer: Dr Paul Connolly (F18 – Sackville Building)
1. Data analysis (statistics): 3 lectures & practicals; statistics open-book test (2 hours)
2. Computing (Excel statistics/modelling): 2 lectures; assessed practical work
Course notes etc:
Recommended reading: Cheeney (1983) Statistical Methods in Geology. George Allen & Unwin

2 Recap – last lecture The four measurement scales: nominal, ordinal, interval and ratio. There are two types of errors: random errors (precision) and systematic errors (accuracy). Basic graphs: histograms, frequency polygons, bar charts, pie charts. Gaussian statistics describe random errors. The central limit theorem. Central values, dispersion, symmetry. Weighted mean.

3 Some common problems

4 Use tables 1 4 0.0278 6 1.8333 3.3611 3 1.3611 7 2.8333 8.0278 25

5 Lecture 2
Correlation between two variables. Classical linear regression. Reduced major axis regression. Propagation of errors in compound quantities.

6 Correlation Many real-life quantities depend on something else, e.g. the dependence of rock permeability on porosity. How can we quantify the strength and direction of a linear relationship between X and Y variables?

7 Correlation
Linear correlation (Pearson's coefficient) $\sum y$ = sum of all y-values; $\sum x$ = sum of all x-values; $\sum x^2$ = sum of all $x^2$ values; $\sum y^2$ = sum of all $y^2$ values; $\sum xy$ = sum of the x times y values. Like other numerical measures, the population correlation coefficient is denoted by $\rho$ (the Greek letter "rho") and the sample correlation coefficient is denoted by r.
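The five sums defined on this slide are all that is needed to compute r. The following is a minimal sketch, assuming the standard computational form of Pearson's coefficient, r = (nΣxy − ΣxΣy) / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²]); the function name and the porosity/permeability numbers are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r computed from the five sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: porosity (%) and permeability (arbitrary units).
porosity = [5.0, 10.0, 15.0, 20.0, 25.0]
permeability = [1.2, 2.1, 2.9, 4.2, 4.8]
print(f"r = {pearson_r(porosity, permeability):.3f}")
```

A value close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.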

8 Correlation Values of r
r = +1: perfect positive correlation. r = -1: perfect negative correlation. r = 0: no correlation.
[Figure: three x-y scatter plots, one for each case]

9 Correlation coefficient, r, and $r^2$, the fraction of explained variation
$r^2$ is the fraction of the variation in x and y that is explained by the linear relationship. It is often called the "goodness of fit". E.g. if r = 0.97 is obtained then $r^2$ = 0.94, so 100 x 0.94 = 94% of the total variation in x and y is explained by the linear relationship, and the remaining 6% is due to "other" causes.
[Figure: $r^2$, fraction of explained variation (0.0 to 1.0), plotted against correlation coefficient, r (-1.0 to +1.0)]

10 Regression analysis How can we fit an equation to a set of numerical data x, y such that it yields the best fit for all the data?

11 Classical linear regression
An approximate fit yields a straight line that passes through the set of points in the best possible manner without being required to pass exactly through any of the points.

12 Classical linear regression
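Linear regression: Y = mx + c, where $e_i$ is the deviation of data point i from the fitted line, c is the intercept and m is the gradient. Assumes that the error is present only in y.
[Figure: data points scattered about a straight line in the x-y plane, with gradient m, intercept c and deviation $e_i$ marked]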

13 How do we define a good fit?
If the sum of all deviations is a minimum? $\sum e_i$
If the sum of all the absolute deviations is a minimum? $\sum |e_i|$
If the maximum deviation is a minimum? $e_{max}$
If the sum of all the squares of the deviations is a minimum? $\sum e_i^2$
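One way to see why these criteria differ is to evaluate all four for two candidate lines. The sketch below uses hypothetical data lying exactly on y = x and compares the true line with a flat line through the mean of y. The plain sum of deviations is zero for both, because positive and negative deviations cancel, so it cannot tell a good fit from a bad one.

```python
def deviations(xs, ys, m, c):
    """Return e_i = y_i - (m*x_i + c) for each data point."""
    return [y - (m * x + c) for x, y in zip(xs, ys)]

# Hypothetical data lying exactly on y = x.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]

good = deviations(xs, ys, m=1.0, c=0.0)  # the true line
bad = deviations(xs, ys, m=0.0, c=1.0)   # flat line through the mean of y

for name, e in (("good", good), ("bad", bad)):
    print(name,
          "| sum e =", sum(e),
          "| sum |e| =", sum(abs(v) for v in e),
          "| max |e| =", max(abs(v) for v in e),
          "| sum e^2 =", sum(v * v for v in e))
```

The other three criteria all separate the two lines; the sum of squares is the one taken forward in the lecture.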

14 Classical linear regression
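The best way is to minimise the sum of the squares of the deviations. Formally this involves some mathematics. At each value of $x_i$ the fitted line gives $Y_i = m x_i + c$. Therefore the deviations from the curve are $e_i = y_i - (m x_i + c)$. The sum of the squares is $S = \sum e_i^2 = \sum (y_i - m x_i - c)^2$.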

15 Classical linear regression
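How do you find the minimum of a function? Use calculus: differentiate and set to zero. Differentiating the sum of the squares, $\sum (y_i - m x_i - c)^2$, with respect to m and c gives two simultaneous equations: $m \sum x_i^2 + c \sum x_i = \sum x_i y_i$ and $m \sum x_i + n c = \sum y_i$, where n is the number of data points.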

16 Classical linear regression
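Solving the two equations yields: $m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$ and $c = \frac{\sum y - m \sum x}{n}$, where n is the number of data points.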
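A minimal sketch of the least-squares solution, assuming the standard closed form in terms of the sums Σx, Σy, Σxy and Σx², namely m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and c = (Σy − mΣx) / n. The function name and the data are hypothetical.

```python
def linear_fit(xs, ys):
    """Classical least-squares line y = m*x + c (errors assumed in y only)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c = (sum_y - m * sum_x) / n
    return m, c

# Hypothetical data lying exactly on y = 2x + 1.
m, c = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f"gradient m = {m}, intercept c = {c}")
```

Tabulating x, y, xy and x² column by column and summing each column gives exactly the four sums used here, which is how the calculation is usually laid out by hand or in a spreadsheet.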

17 Classical linear regression
[Table with columns: x, y, xy, $x^2$] ?

18 Classical linear regression
Classical linear regression only considers errors in the y values of the data. How can we consider errors in both x and y values? Use reduced major axis regression.

19 Reduced major axis regression
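Method to quantify a linear relationship where both variables are dependent and have errors. Instead of minimising $e^2 = (Y - y)^2$ we minimise $e^2 = dy^2 + dx^2$.
[Figure: a data point offset from the fitted line by dx and dy in the x-y plane, with intercept c marked]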

20 Reduced major axis regression

21 Reduced major axis regression
[Table with columns: y, x-x', y-y', $(x-x')^2$, $(y-y')^2$] ?
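The table columns suggest a calculation built from the squared deviations of x and y about their means. A minimal sketch follows, assuming the commonly quoted reduced major axis result: the gradient is the ratio of the standard deviations of y and x, given the sign of the correlation, and the line passes through the point of means. Reading x' and y' as the means is an assumption, as are the function name and the data.

```python
import math

def rma_fit(xs, ys):
    """Reduced major axis line y = m*x + c (errors allowed in both x and y)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sign = 1.0 if sxy >= 0 else -1.0   # gradient takes the sign of the correlation
    m = sign * math.sqrt(syy / sxx)
    c = y_mean - m * x_mean            # line passes through the point of means
    return m, c

# Hypothetical data lying exactly on y = 2x + 1.
m_rma, c_rma = rma_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f"gradient m = {m_rma}, intercept c = {c_rma}")
```

For perfectly correlated data this agrees with classical regression; for scattered data the reduced major axis gradient is steeper in magnitude than the classical one by a factor of 1/|r|.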

22 Error propagation Every measurement of a variable has an error.
Often the error quoted is one standard deviation about the mean (mean ± standard deviation). The standard deviation of the sample is usually our best estimate of the population standard deviation.

23 Error propagation Error propagation is a way of combining two or more random errors together to get a third. The equations assume that the errors are Gaussian in nature. It can be used when you need to measure more than one quantity to get at your final result. For example, you might want to predict permeability from a measured porosity and grain size. The equations introduced here let you propagate the uncertainties on your data through the calculation and come up with an uncertainty on your results. How then do we combine variables which have errors?

24 Error propagation - quoted
[Table with columns: Relationship; Error propagation (k = constant)]
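For reference, the textbook results for independent Gaussian errors are sketched below. These are the standard forms and may not match the slide's table entry for entry; the symbols z, $\sigma_x$, $\sigma_y$ and $\sigma_z$ are notation assumed here, with k a constant.

```latex
\begin{align*}
z = x \pm y &: \quad \sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2} \\
z = kx &: \quad \sigma_z = |k|\,\sigma_x \\
z = xy \ \text{or}\ z = x/y &: \quad \frac{\sigma_z}{|z|} = \sqrt{\left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2} \\
z = x^k &: \quad \frac{\sigma_z}{|z|} = |k|\,\frac{\sigma_x}{|x|}
\end{align*}
```

Sums and differences combine absolute errors in quadrature; products and quotients combine relative errors in quadrature.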

25 Example of propagation of error
Suppose we measure the thickness of a rock bed using a tape measure. The tape measure is shorter than the bed thickness so we have to do it in two steps x and y. We repeat the measurements 100 times and obtain the following mean and standard deviation values for x and y: x = 12.1 ± 0.3 cm, y = 4.2 ± 0.2 cm. The thickness of the bed should be simply: x + y = 16.3 cm. But what about the error on the total thickness?

26 Example of propagation of error
It is given by propagating the individual errors as follows: $\sigma_{x+y} = \sqrt{0.3^2 + 0.2^2} \approx 0.4$ cm. So the final answer for the total thickness of the bed is: 16.3 ± 0.4 cm. Error propagation formulae are non-intuitive, and understanding how they are derived requires some mathematical knowledge.
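The arithmetic for this example can be checked with a few lines of code. This sketch assumes the two measurement errors are independent and Gaussian, so they add in quadrature; the helper name is hypothetical.

```python
import math

def add_in_quadrature(*sigmas):
    """Combine independent standard deviations for a sum or difference."""
    return math.sqrt(sum(s ** 2 for s in sigmas))

x, sigma_x = 12.1, 0.3   # cm
y, sigma_y = 4.2, 0.2    # cm

total = x + y
sigma_total = add_in_quadrature(sigma_x, sigma_y)
print(f"thickness = {total:.1f} ± {sigma_total:.1f} cm")  # thickness = 16.3 ± 0.4 cm
```

Note that the combined error (about 0.36 cm before rounding) is smaller than the simple sum 0.3 + 0.2 = 0.5 cm, because independent random errors partly cancel.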

27 More complex examples What if we have several functions of several variables? E.g. calculating density using Archimedes' principle: This equation contains two functions and two variables. Error propagation is best done in parts, so first work out the value and error in the denominator: Then the value and error of: In a few weeks we will use a Monte Carlo method for solving more complex functions.
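Working "in parts" can be sketched as follows. The form of the equation is an assumption: density relative to water is taken as w_air / (w_air − w_water), where w_air and w_water are the sample's weights in air and in water. The function name and all numbers are hypothetical. The sketch also treats the numerator and denominator as independent, which is only approximate because w_air appears in both.

```python
import math

def density_with_error(w_air, s_air, w_water, s_water):
    """Relative density w_air / (w_air - w_water) with its propagated error."""
    # Step 1: denominator d = w_air - w_water; absolute errors add in quadrature.
    d = w_air - w_water
    s_d = math.sqrt(s_air ** 2 + s_water ** 2)
    # Step 2: ratio rho = w_air / d; relative errors add in quadrature.
    rho = w_air / d
    s_rho = abs(rho) * math.sqrt((s_air / w_air) ** 2 + (s_d / d) ** 2)
    return rho, s_rho

# Hypothetical weighings in grams: 25.0 ± 0.1 in air, 15.0 ± 0.1 in water.
rho, s_rho = density_with_error(25.0, 0.1, 15.0, 0.1)
print(f"relative density = {rho:.2f} ± {s_rho:.2f}")  # relative density = 2.50 ± 0.04
```

Shared variables of this kind are one reason a Monte Carlo approach can be preferable for more complex functions.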

28 Reminder Statistics practical #2
Those not taking BIOL20451: Roscoe – 1300 Tuesday. Those taking BIOL20451: Williamson – 1600 Tuesday.

29 Some common problems Weighted mean f x

30 What does adding two variables really mean?

