
1
STATISTICS: Introduction
Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering, National Taiwan University

2
Lecture notes will be posted on the class website –
– Supplementary material: IPSUR by Kerns
Grades
– Homework (60%) [No copying of homework.]
– Midterm (20%), Final (20%)
The R language will be used for data analysis. A tutorial session is held on Tuesdays (6:00 – 7:00 pm). Office hours: Thursday 2:30 – 3:30 pm.
1/31/2014 Lab for Remote Sensing Hydrology and Spatial Modeling, Department of Bioenvironmental Systems Engineering, National Taiwan University

3
What is statistics? Statistics is a science of reasoning from data: a body of principles and methods for extracting useful information from data, for assessing the reliability of that information, for measuring and managing risk, and for making decisions in the face of uncertainty.

4
The major difference between statistics and mathematics is that statistics always requires observed data, while mathematics does not. An important feature of statistical methods is the uncertainty involved in the analysis.

5
Statistics is the discipline concerned with the study of variability, with the study of uncertainty and with the study of decision-making in the face of uncertainty. As these are issues that are crucial throughout the sciences and engineering, statistics is an inherently interdisciplinary science.

6
Extracting useful information from data. Assessing the reliability of that information – How sure can we be about a claim based on the data?

7
One of the objectives of this course is to help students develop a critical way of thinking.
– Accuracy of weather forecasting
– Accuracy of flood forecasting
– Not being fooled by the surface meaning of statistical terminology

8
Sources of uncertainty: data uncertainty, parameter uncertainty, and model structure uncertainty. – An illustrative example

9
You are given a set of (x, y) data. Apparently, Y depends on X.

10
Observed data with uncertainties (Linear model)

11
Observed data with uncertainties (Power model). The linear model fits the observed data better than the power model.

12
Theoretical model. The sums of squared errors (SSE) of the linear-model and power-model estimates were computed with respect to the theoretical model. By this measure, the power model performs better than the linear model, even though the linear model fits the observed data better: the model that fits the observations best is not necessarily the one closest to the true model.
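The comparison can be mimicked with a small sketch. Everything here is an assumption for illustration: the hypothetical theoretical model y = 2x^1.5, the multiplicative noise, the x grid, and the seed are not the data from the slides. The power model is fitted by least squares on log-transformed data, a common convenience that does not minimize SSE on the original scale, so which model "wins" depends on these choices.

```python
import math
import random

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_power(xs, ys):
    """Fit y = c * x**d by least squares on log y = log c + d*log x (needs x, y > 0)."""
    log_c, d = fit_linear([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_c), d

def sse(pred, target):
    """Sum of squared errors between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

# Hypothetical theoretical model y = 2 * x**1.5, observed with multiplicative noise
# (noise level, x grid and seed are illustrative assumptions).
random.seed(1)
xs = [0.5 * k for k in range(1, 21)]
truth = [2 * x ** 1.5 for x in xs]
ys = [t * math.exp(random.gauss(0, 0.2)) for t in truth]

a, b = fit_linear(xs, ys)
c, d = fit_power(xs, ys)
lin_pred = [a + b * x for x in xs]
pow_pred = [c * x ** d for x in xs]

print(f"SSE vs observed data:     linear={sse(lin_pred, ys):.2f}  power={sse(pow_pred, ys):.2f}")
print(f"SSE vs theoretical model: linear={sse(lin_pred, truth):.2f}  power={sse(pow_pred, truth):.2f}")
```

Comparing the two printed rows shows whether the model with the smaller SSE against the observations is also the one with the smaller SSE against the theoretical model.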

13
Key topics in statistics: probability, estimation, tests of hypotheses, regression, forecasting, quality control, simulation.

14
Deterministic vs. Stochastic Models. An abstract model is a description of the essential properties of a phenomenon, formulated in mathematical terms. – An abstract model is used as a theoretical approximation of reality to help us understand the world around us. All models are wrong!

15
Essentially, all models are wrong, but some are useful. Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful. (George E. P. Box) – Examples: the normal distribution for men's heights, for grades in a statistics class, etc.

16
Types of abstract models. Deterministic model – A deterministic model describes a phenomenon whose outcome is fixed. Stochastic model – A random (stochastic) model describes the unpredictable variation in the outcomes of a random experiment.

17
Examples. Deterministic model – Suppose we wish to measure the area covered by a lake that, for all practical purposes, appears to have a circular shoreline. Since we know that the area is $A = \pi r^2$, where $r$ is the radius, we would attempt to measure the radius and substitute it into the formula.

18
Stochastic model – Consider the experiment of tossing a balanced coin and observing the upper face. It is not possible to predict with certainty what the upper face will be, no matter how many times we repeat the experiment. However, it is possible to predict what will happen in the long run: we can say that the probability of heads on a single toss is ½. – P(more than 60 heads in 100 trials)
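P(more than 60 heads in 100 trials) can be computed exactly from the binomial distribution, assuming independent tosses of a balanced coin, and checked against a simulation of the long-run behaviour. A minimal sketch; the function names, number of repetitions, and seed are arbitrary choices.

```python
import math
import random

def binomial_tail(n, k, p=0.5):
    """Exact P(X > k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1, n + 1))

def simulated_tail(n, k, repetitions=10_000, seed=0):
    """Monte Carlo estimate of P(more than k heads in n tosses of a balanced coin)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        heads = sum(rng.random() < 0.5 for _ in range(n))  # one experiment of n tosses
        if heads > k:
            hits += 1
    return hits / repetitions

exact = binomial_tail(100, 60)
approx = simulated_tail(100, 60)
print(f"exact = {exact:.4f}, simulated = {approx:.4f}")
```

No single toss can be predicted, yet the probability of the event "more than 60 heads" is small and stable, which is the long-run predictability the slide refers to.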

19
Random Experiment and Sample Space. An experiment that can be repeated under the same (or uniform) conditions, but whose outcome cannot be predicted in advance even when the same experiment has been performed many times, is called a random experiment. Can a lotto draw be considered a random experiment?

20
Examples of random experiments
– The tossing of a coin.
– The roll of a die.
– The selection of a numbered ball (1–50) from an urn (selection with replacement).
– The time interval between the occurrences of two successive earthquakes stronger than scale 6.
– The total rainfall produced by typhoons in one year (annual typhoon rainfall).

21
The following items are always associated with a random experiment:
– Sample space. The set of all possible outcomes, denoted by $\Omega$.
– Outcomes. Elements of the sample space, denoted by $\omega$. These are also referred to as sample points or realizations.
– Events. Subsets of $\Omega$ for which the probability is defined. Events are denoted by capital Latin letters (e.g., A, B, C).

22
Definition of Probability
– Classical probability
– Frequency probability
– Probability model

23
Classical (or a priori) probability. If a random experiment can result in $n$ mutually exclusive and equally likely outcomes, and if $n_A$ of these outcomes have an attribute A, then the probability of A is the fraction $n_A/n$.

24
Example 1. Compute the probability of getting two heads if a fair coin is tossed twice. (1/4)
Example 2. Compute the probability that a card drawn from an ordinary, well-shuffled deck is an ace or a spade. (16/52)
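Both examples can be checked by listing the equally likely outcomes and counting those with the attribute, i.e. computing $n_A/n$ directly. A small sketch; the helper name `classical_probability` and the card encoding are hypothetical.

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, has_attribute):
    """n_A / n for a finite list of mutually exclusive, equally likely outcomes."""
    n_a = sum(1 for outcome in outcomes if has_attribute(outcome))
    return Fraction(n_a, len(outcomes))

# Example 1: a fair coin tossed twice -> HH, HT, TH, TT.
tosses = list(product("HT", repeat=2))
p_two_heads = classical_probability(tosses, lambda o: o == ("H", "H"))

# Example 2: one card from an ordinary 52-card deck (13 ranks x 4 suits).
ranks = ["A"] + [str(k) for k in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))
p_ace_or_spade = classical_probability(deck, lambda card: card[0] == "A" or card[1] == "spades")

print(p_two_heads, p_ace_or_spade)  # 1/4 4/13
```

`Fraction` reduces 16/52 to 4/13; the count of 16 is the 13 spades plus the 3 non-spade aces.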

25
Remarks. The probabilities determined by the classical definition are called a priori probabilities since they can be derived purely by deductive reasoning.

26
The equally likely assumption requires the experiment to be carried out in such a way that the assumption is realistic, e.g., using a balanced coin, a die that is not loaded, a well-shuffled deck of cards, random sampling, and so forth. This assumption also requires that the sample space be appropriately defined.

27
Troublesome limitations of the classical definition of probability: – the number of possible outcomes may be infinite; – the possible outcomes may not be equally likely.

28
Relative frequency (or a posteriori) probability. We observe the outcomes of a random experiment that is repeated many times. We postulate a number p, the probability of an event, and approximate p by the relative frequency f with which the repeated observations satisfy the event.

29
Suppose a random experiment is repeated $n$ times under uniform conditions and event A occurs $n_A$ times; then the relative frequency of A is $f_n(A) = n_A/n$. If the limit of $f_n(A)$ as $n$ approaches infinity exists, then one can assign the probability of A by $P(A) = \lim_{n \to \infty} f_n(A)$.

30
This method requires the existence of the limit of the relative frequencies, a property known as statistical regularity. It is satisfied if the trials are independent and performed under uniform conditions.
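Statistical regularity can be illustrated by simulation: the sketch below tracks $f_n(A) = n_A/n$ for an event with an assumed probability p = 0.5 on each independent trial, using Python's pseudo-random generator as a stand-in for a real experiment. The function name, seed, and checkpoints are illustrative choices.

```python
import random

def relative_frequency_path(n, p=0.5, seed=42):
    """Return [f_1(A), ..., f_n(A)], where A occurs with probability p on each
    independent, simulated trial and f_k(A) = n_A / k."""
    rng = random.Random(seed)
    n_a = 0
    path = []
    for k in range(1, n + 1):
        if rng.random() < p:  # event A occurs on trial k
            n_a += 1
        path.append(n_a / k)
    return path

path = relative_frequency_path(100_000)
for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {k:>6}: f_n(A) = {path[k - 1]:.4f}")
```

Early values fluctuate noticeably, while later ones settle near p; the simulation shows the behaviour but of course cannot prove that the limit exists.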

31
Example 3. A fair coin was tossed 100 times, with 54 occurrences of heads. The probability of heads on each toss is estimated to be 54/100 = 0.54.

32
Example 4 – Randomly draw three balls from the box at one time. What is the sample space of the random experiment? What is the probability of getting two or more blue balls in a draw? What if …

33
The chain of probability definition: random experiment → sample space → event space → probability space

34
Probability Model

35
Event and event space. An event is a subset of the sample space. The class of all events associated with a given random experiment is defined to be the event space.

36
Remarks

37
Probability is a mapping of sets (events) to numbers; it is not a mapping of individual outcomes (elements of the sample space) to numbers. – The expression $P[\omega]$ is not defined. However, for a singleton event $\{\omega\}$, $P[\{\omega\}]$ is defined.

38
Probability space. A probability space is the triplet $(\Omega, \mathcal{A}, P[\cdot])$, where $\Omega$ is a sample space, $\mathcal{A}$ is an event space, and $P[\cdot]$ is a probability function with domain $\mathcal{A}$. A probability space constitutes a complete probabilistic description of a random experiment. – The sample space $\Omega$ defines all of the possible outcomes, the event space $\mathcal{A}$ defines all possible things that could be observed as a result of the experiment, and the probability P defines the degree of belief or evidential support associated with the experiment.
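For a finite sample space, the triplet $(\Omega, \mathcal{A}, P[\cdot])$ can be written out explicitly. The sketch below assumes one roll of a balanced die, takes the power set as the event space, and makes `P` accept only events, so a bare outcome is rejected while a singleton event is not. The die example and the names `omega`, `event_space`, `P` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space Omega for one roll of a balanced die (an assumed example).
omega = frozenset(range(1, 7))

# Event space: every subset of Omega (the power set), a valid choice when Omega is finite.
event_space = {
    frozenset(subset)
    for subset in chain.from_iterable(combinations(sorted(omega), r) for r in range(len(omega) + 1))
}

def P(event):
    """Probability function with domain event_space: maps sets (events) to numbers."""
    if event not in event_space:
        raise ValueError(f"{event!r} is not an event in the event space")
    return Fraction(len(event), len(omega))  # equally likely outcomes

print(len(event_space))        # 64 events
print(P(frozenset({3})))       # singleton event {3} -> 1/6
try:
    P(3)                       # a bare outcome is not an event
except ValueError as err:
    print("undefined:", err)
```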

39
Conditional probability

40
Bayes' theorem

41
Multiplication rule

42
Independent events

43
The property that two events A and B are independent and the property that A and B are mutually exclusive are distinct, though related, properties. If A and B are mutually exclusive events, then $AB = \emptyset$; therefore, $P(AB) = 0$. By contrast, if A and B are independent events, then $P(AB) = P(A)P(B)$. Events A and B can be both mutually exclusive and independent only if $P(AB) = P(A)P(B) = 0$, that is, at least one of A or B has zero probability.

44
But if A and B are mutually exclusive events and both have nonzero probabilities then it is impossible for them to be independent events. Likewise, if A and B are independent events and both have nonzero probabilities then it is impossible for them to be mutually exclusive.
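The distinction can be checked on a concrete case. The sketch assumes one roll of a balanced die with hypothetical events A = {1, 2}, B = "even", C = "odd": A and B are independent but not mutually exclusive, B and C are mutually exclusive but not independent, and the empty event is both, in line with the zero-probability condition above.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one roll of a balanced die (assumed example)

def P(event):
    """Classical probability of an event (a subset of omega)."""
    return Fraction(len(event & omega), len(omega))

def independent(a, b):
    return P(a & b) == P(a) * P(b)

def mutually_exclusive(a, b):
    return not (a & b)  # AB is the empty set

A = frozenset({1, 2})
B = frozenset({2, 4, 6})   # even
C = frozenset({1, 3, 5})   # odd
empty = frozenset()

print(independent(A, B), mutually_exclusive(A, B))          # True False
print(independent(B, C), mutually_exclusive(B, C))          # False True
print(independent(empty, B), mutually_exclusive(empty, B))  # True True
```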

45
Summarizing data. Qualitative data – Frequency table
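A frequency table for qualitative data lists each category with its count and relative frequency. A minimal sketch using `collections.Counter`; the weather categories are hypothetical observations, not course data.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, relative frequency) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(category, count, count / n) for category, count in counts.most_common()]

# Hypothetical qualitative observations (daily weather categories), for illustration only.
weather = ["sunny", "rain", "cloudy", "rain", "sunny",
           "rain", "typhoon", "cloudy", "rain", "sunny"]

table = frequency_table(weather)
for category, count, rel in table:
    print(f"{category:<8}{count:>4}{rel:>7.2f}")
```

The same rows are what a bar chart displays, one bar per category.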

46
– Bar chart

47
Quantitative data – Histogram

48
Boxplot




52
Dealing with outliers – Should outliers be discarded or retained? – An example of outliers: Typhoon Morakot

53
Typhoon Morakot: cumulative rainfall (Aug 7, 0:00 – 24:00)

54
Cumulative rainfall (Aug 8, 0:00 – 24:00)

55
Cumulative rainfall (Aug 9, 0:00 – 24:00)

56
Cumulative rainfall in mm, 2009/08/07 00:00 ~ 2009/08/09 17:00

57
Measures of Central Tendency. Mean – Sum of the measurements divided by the number of measurements. Median – Middle value when the data are sorted (the average of the two middle values when the number of measurements is even). Mode – Value or category that occurs most frequently.
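Python's standard `statistics` module computes all three measures; the sketch below uses a small hypothetical data set with an even number of values, so the median is the average of the two middle values.

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 7, 5]   # hypothetical measurements

mean = statistics.mean(data)      # 44 / 8
median = statistics.median(data)  # sorted: 2 3 4 5 7 7 7 9 -> (5 + 7) / 2
mode = statistics.mode(data)      # 7 occurs three times

print(mean, median, mode)         # 5.5 6.0 7

# The mode also applies to qualitative data:
print(statistics.mode(["rain", "sunny", "rain", "cloudy"]))  # rain
```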

58
Measures of Variation. Standard deviation – summarizes how far from the mean the data values typically are. Range – the difference between the largest and smallest values.
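The slide does not say which divisor the standard deviation uses, so the sketch shows both the sample version (divisor n − 1, `statistics.stdev`) and the population version (divisor n, `statistics.pstdev`), together with the range, on the same hypothetical data as a central-tendency example would use.

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 7, 5]   # hypothetical measurements

s = statistics.stdev(data)        # sample standard deviation (divisor n - 1)
sigma = statistics.pstdev(data)   # population standard deviation (divisor n)
data_range = max(data) - min(data)

print(f"sample SD = {s:.3f}, population SD = {sigma:.3f}, range = {data_range}")
```

Here the squared deviations from the mean 5.5 sum to 40, so the two variances are 40/7 and 40/8 = 5.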

59
Reading assignment: IPSUR (to be covered in the tutorial session) – Chapt. 2 – Chapt , 3.1.3, , 3.4.4, 3.4.5, 3.4.6,
