Statistics 02. Normal distribution Also called normal curve or the Gaussian curve Any variable whose value comes about as the result of summing the values.


1 Statistics 02

2 Normal distribution: Also called the normal curve or the Gaussian curve. Any variable whose value comes about as the result of summing the values of several independent, or almost independent, components can be modeled successfully as a normal distribution.

3 Normal distribution

4 Three features of the normal distribution: 1. The histogram is symmetrical. 2. The mean of the set of sample means is very close to the original population mean. 3. The standard deviation of the set of sample means is very close to the original population standard deviation divided by the square root of the sample size, n.

5 Z score: A raw score converted on the basis of the standard deviation. We convert a raw score to a Z score to determine how many standard deviation units that raw score lies above or below the mean: Z = (X - M)/s, where X is the raw score, M the mean, and s the standard deviation.

6 Application of the Z score: comparison of two scores from two tests; conversion to a standardized score (T score): T = 50 + 10Z; determining the proportion below a particular raw score (X < score); statistical inference: range estimation.

7 Case: Student A takes two tests with the following data. Test 1: raw score=67, mean=63, standard deviation=3. Test 2: raw score=56, mean=51, standard deviation=4. Question: What information can we obtain?

8 Case: Two students take two different tests of English. Student A: RS=67, M=63, s=3. Student B: RS=56, M=51, s=4. Question 1: Which student is better in English? Question 2: What are their T scores?
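The two questions in this case can be worked through with the formulas from slides 5 and 6. The following is a minimal sketch; the function names `z_score` and `t_score` are illustrative and not part of the original slides.

```python
def z_score(raw, mean, sd):
    """Standard deviation units by which raw lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def t_score(z):
    """Standardized T score as defined on slide 6: T = 50 + 10Z."""
    return 50 + 10 * z


# Figures taken from the case: raw score, test mean, test standard deviation.
z_a = z_score(67, 63, 3)
z_b = z_score(56, 51, 4)

print(f"Student A: Z = {z_a:.2f}, T = {t_score(z_a):.1f}")
print(f"Student B: Z = {z_b:.2f}, T = {t_score(z_b):.1f}")
```

Because the two tests have different means and spreads, the raw scores cannot be compared directly; the Z scores place both students on a common scale, and the higher Z indicates the better standing relative to the respective test group.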

9 Table of Normal Distribution Relation between Z score and Proportion

10 Case: When we select a score randomly from the population, what is the probability that this score is below or above a certain score? That is, the probability that this score (X) is less than a certain score (say, 60): X < 60.

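11 Case: Z < ? Since Z = (X - M)/s, the inequality X < 60 gives X - M < 60 - M, and so (X - M)/s < (60 - M)/s. With (60 - M)/s = -1 (for example, M=63 and s=3, the Test 1 values), Z < -1 and P = 0.1587. The chance that we randomly select a score below 60 is about 16%.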
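The table lookup can be reproduced with the standard library. This is a sketch only: the slide does not state M and s, so the values M=63 and s=3 below are an assumption (they are the Test 1 figures and give Z = -1).

```python
from statistics import NormalDist

# ASSUMPTION: mean and standard deviation are not given on the slide;
# 63 and 3 are borrowed from Test 1 because they yield Z = -1.
mean, sd = 63, 3
threshold = 60

z = (threshold - mean) / sd          # convert the raw threshold to a Z score
p_below = NormalDist().cdf(z)        # area under the standard normal curve left of z

print(f"Z = {z:.2f}, P(X < {threshold}) = {p_below:.4f}")
```

`NormalDist().cdf` plays the role of the printed normal distribution table: it returns the proportion of scores falling below a given Z.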

12 Case: Xiamen University wants to give its freshmen a placement test upon admission and place them into 5 levels of English learning. Work out a plan for this test and inform the students, before the test, of the scores required for each level. Total freshmen: 5000. Classes for each level: B0: 4; B1: remaining; B2: 20; B3: 8; B4: 4. Normal class size: 35.

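13 Level  Classes  Number  Proportion
Sub    4        140     0.028
1      109      3810    0.762
2      20       700     0.14
3      6        210     0.042
4      4        140     0.028
Boundaries between adjacent levels (Sub/1, 1/2, 2/3, 3/4): Z = -1.90, 0.80, 1.5, 1.90; cut-off scores = 44.6, 60.8, 65, 67.4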
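The cut-off scores can be derived from the class counts by accumulating proportions and inverting the normal curve. The sketch below uses the class counts from slide 13; the test mean of 56 and standard deviation of 6 are not stated anywhere in the slides and are assumptions inferred from the listed Z values and cut-offs.

```python
from statistics import NormalDist

TOTAL = 5000
CLASS_SIZE = 35
# ASSUMPTION: mean and SD are inferred from the slide's Z/cut-off pairs,
# e.g. 56 + 0.80 * 6 = 60.8; they are not given in the source.
MEAN, SD = 56, 6

levels = ["Sub", "1", "2", "3", "4"]
fixed_classes = {"Sub": 4, "2": 20, "3": 6, "4": 4}   # level 1 takes the remainder

students = {lvl: n * CLASS_SIZE for lvl, n in fixed_classes.items()}
students["1"] = TOTAL - sum(students.values())

cutoffs = []
cumulative = 0.0
for lower, upper in zip(levels, levels[1:]):
    cumulative += students[lower] / TOTAL        # proportion at or below this level
    z = NormalDist().inv_cdf(cumulative)         # Z score with that area to its left
    cutoffs.append((lower, upper, z, MEAN + z * SD))

for lower, upper, z, score in cutoffs:
    print(f"{lower}/{upper}: Z = {z:+.2f}, cut-off = {score:.1f}")
```

The computed values differ slightly from the slide in the first decimal place because the slide rounds each Z score before converting it to a raw score.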

14 Statistical inference: Use a collection of observed values to make inferences about a larger set of potential values. The classical problem of statistical inference is how to infer from the properties of a part the likely properties of the whole. Because of the way in which samples are selected, it is often impossible to generalize beyond the samples.

15 Population: The largest class to which we can generalize the results of an investigation based on a subclass; in other words, the set of all possible values of a variable. A population, for statistical purposes, is a set of values. We need to be sure that the values that constitute the sample somehow reflect the target statistical population.

16 Sampling: Random sampling gives us reasonable confidence that our inferences from sample values to population values are valid. The most common type of sampling frame is a list (actual or notional) of all the subjects in the group to which generalization is intended. What the techniques of statistics offer is a common ground, a common measuring stick by which experimenters can measure and compare the strength of evidence for one hypothesis or another that can be obtained from a sample of subjects.

17 Sampling: Careful consideration is needed to ensure the sample represents the population. E.g., a study of the gravity of errors in written English as perceived by two different groups: native English-speaking teachers of English and Greek teachers of English. Both samples contained individuals from different institutions to avoid institutional attitude bias. Researchers have an inescapable duty to describe carefully how their experimental material, including subjects, was actually obtained. It is also good practice to attempt to foresee some of the objections that might be made about the quality of the material and either attempt to forestall criticism or admit openly to any serious defects.

18 Case Study: Identify the population and the sample for each of the following investigations: vocabulary size; listening input and listening comprehension; social background and learning strategy.

19 Random Sampling Use the Table of Random Numbers Other methods
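A pseudo-random number generator is one of the "other methods" commonly used in place of a printed table of random numbers. The sketch below draws a simple random sample without replacement; the sampling frame of 5000 labelled students and the sample size of 132 are hypothetical values chosen only for illustration.

```python
import random

# HYPOTHETICAL sampling frame: a list of every subject in the target group.
sampling_frame = [f"student_{i:04d}" for i in range(1, 5001)]

# A fixed seed makes the draw reproducible, which helps when describing
# exactly how the subjects were obtained.
rng = random.Random(42)

# Simple random sample without replacement: every subject has the same
# chance of being selected, and nobody is selected twice.
sample = rng.sample(sampling_frame, k=132)

print(len(sample), "subjects drawn, for example:", sample[:3])
```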

20 Statistical parameters: Population parameters: mean: μ (mu [mjuː], English equivalent: m); standard deviation: σ (sigma [ˈsɪɡmə], English equivalent: s). Sample statistics: mean: M; standard deviation: s.

21 Other Greek letters: Σ: capital sigma, symbol of sum, English equivalent: S. ε: epsilon, symbol of error, English equivalent: e. α: alpha. χ: chi [kaɪ], English equivalent: x.

22 Parameter estimation: Point estimator: a single number calculated from a sample and used to estimate a population parameter. Interval estimator: a likely range within which the population value may lie.

23 Standard error of the sample means: If we repeatedly draw samples of the same size from the population and calculate the mean of each sample, these means will be approximately normally distributed. The variability of these means around the population mean is called the standard error of the sample means, and is calculated as follows: σ_x̄ = σ/√n. When the population standard deviation σ is unknown, we often use the sample standard deviation s instead: σ_x̄ = s/√n.
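The repeated-sampling idea can be checked with a small simulation. This is an illustrative sketch: the population is assumed to be normal with mean 67 and standard deviation 6.5, the sample size is 132, and 2000 repetitions are drawn; none of these simulation settings come from the slide.

```python
import random
import statistics

# ASSUMED population and sampling settings, chosen for illustration only.
MU, SIGMA, N = 67, 6.5, 132
REPETITIONS = 2000

rng = random.Random(0)  # fixed seed so the run is reproducible

# Draw many samples of size N and record the mean of each one.
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPETITIONS)
]

observed = statistics.stdev(sample_means)   # spread of the simulated means
predicted = SIGMA / N ** 0.5                # standard error formula: sigma / sqrt(n)

print(f"mean of sample means: {statistics.fmean(sample_means):.2f}")
print(f"observed SD of means: {observed:.3f}, predicted: {predicted:.3f}")
```

The observed spread of the simulated means should land close to the value given by the formula, and the mean of the sample means close to the population mean.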

24 Case: The following data are obtained from a test: N=132, M=67, S=6.5. What is the standard error of the sample means?

25 Case: σ_x̄ = s/√n = 6.5/√132 = 6.5/11.49 = 0.566

26 Confidence: The probability with which we are confident that the value will fall into a given range, usually 95% or 99%. Procedure: calculate the Z score; look up in the Normal Distribution Table the critical value Z_(α/2), i.e. the Z score that leaves a probability of α/2 in each tail; compare Z with Z_(α/2).

27 Case: Given N=132, M=67, S=6.5 and α=0.05, within what range does μ lie?

28 Case: Z = (M - μ)/σ_x̄ = (67 - μ)/0.566. Since -Z_(α/2) ≤ Z ≤ Z_(α/2): -1.96 ≤ (67 - μ)/0.566 ≤ 1.96; -1.10936 ≤ 67 - μ ≤ 1.10936; -68.10936 ≤ -μ ≤ -65.89064; 65.89064 ≤ μ ≤ 68.10936.
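The same interval can be computed directly rather than by table lookup. The sketch below uses the case figures (N=132, M=67, S=6.5, α=0.05); the variable names are illustrative.

```python
from statistics import NormalDist

n, m, s = 132, 67, 6.5     # sample size, sample mean, sample standard deviation
alpha = 0.05               # 1 - alpha = 95% confidence

se = s / n ** 0.5                               # standard error of the mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value, about 1.96
margin = z_crit * se

lower, upper = m - margin, m + margin
print(f"SE = {se:.3f}, z = {z_crit:.2f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Carrying the unrounded standard error and critical value through the calculation gives bounds that agree with the hand calculation to two decimal places.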

29 t Distribution: When the sample size is less than 30, the standardized sample mean follows a t distribution rather than the normal distribution. The t distribution is a family of curves, one for each number of degrees of freedom. Degrees of freedom: the number of values that are free to vary. In the t distribution, df = n - 1.

30 Case: Given sample mean=63.16, sample standard deviation=7.25, N=19 and α=0.05, within what range does μ lie?

31 Case: Standard error = s/√19 = 7.25/4.36 = 1.66. t = (M - μ)/σ_x̄ = (63.16 - μ)/1.66. With t_(0.05/2)(18) = 2.101: -2.101 ≤ (63.16 - μ)/1.66 ≤ 2.101; -3.48766 ≤ 63.16 - μ ≤ 3.48766; -66.64766 ≤ -μ ≤ -59.67234; 59.7 ≤ μ ≤ 66.6.
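The small-sample interval follows the same steps with a t critical value in place of the Z value. The sketch below hard-codes the table value 2.101 from the slide, since the Python standard library has no t distribution; the variable names are illustrative.

```python
n, m, s = 19, 63.16, 7.25   # sample size, sample mean, sample standard deviation

df = n - 1                  # degrees of freedom for the t distribution
t_crit = 2.101              # table value from the slide: two-tailed, alpha = 0.05, df = 18

se = s / n ** 0.5           # standard error of the mean
margin = t_crit * se

lower, upper = m - margin, m + margin
print(f"df = {df}, SE = {se:.3f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Because this version does not round the standard error to 1.66 first, its bounds can differ from the hand calculation in the second decimal place.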

