
1 STATISTICS
PNP COMPTROLLERSHIP COURSE

2 Statistics
The term has two meanings. Statistics (singular) is the science of collecting, organizing, analyzing, and interpreting information. Statistics (plural) are numbers calculated from a set or collection of information.

3 General Categories
Descriptive statistics comprises the methods used to organize and describe information that has been collected.
Inferential statistics involves the theory of probability and comprises the methods and techniques for making generalizations, predictions, or estimates about a population by using limited information.

4 Descriptive Statistics

5 Organizing Data
Data are the building blocks of statistics. They are generally categorized as quantitative or qualitative. They are also classified according to the type of measurement scale used, such as:
–Nominal scale
–Ordinal scale
–Interval scale
–Ratio scale

6 Nominal Scale
The nominal scale applies to both quantitative and qualitative data.
–For quantitative data, a nominal scale assigns numbers to categories only to distinguish one from another, as with basketball jersey numbers, postal ZIP codes, and telephone numbers.
–For qualitative data, a nominal scale is an unordered grouping of data into discrete categories in which each datum can go into only one group, such as sex, blood type, or religion.

7 Ordinal Scale
Data measured on a nominal scale whose categories are ordered in some fashion are referred to as ordinal data. Examples:
–Letter grades such as A, B, C, D, and F
–Ranks such as Inspector, Senior Inspector, and Chief Inspector
–Residence number
–Performance ratings such as Poor, Fair, and Good
–Grade levels in school such as 1, 2, 3, and so on

8 Interval Scale
Data measured on an ordinal scale for which the distances between values can be calculated are called interval data. The distance between two values is meaningful. Interval data are necessarily quantitative. An interval scale does not necessarily have a true zero point, that is, a point that indicates the absence of what is being measured.

9 Example
IQ test scores. We can say that an IQ score of 180 is higher than an IQ score of 90, and that it is 90 points higher. But we cannot say that a person with an IQ score of 180 is twice as smart as a person with an IQ score of 90. Likewise, a given difference between two IQ scores does not always have the same meaning: the differences 100 − 90 and 150 − 140 may be interpreted differently even though both equal 10.

10 Another Example
Celsius temperature. A temperature of 80 °C is 40 degrees warmer than a temperature of 40 °C. But it is not correct to say that 80 °C is twice as warm as 40 °C. Note that 0 °C does not represent the absence of heat. The absence of heat is represented by 0 kelvin (0 K), which is equivalent to −273.15 °C (about −273 °C).

11 Ratio Scale
Data measured on an interval scale with a zero point meaning “none” are called ratio data. Because the zero point of the Celsius scale does not represent the absence of heat, the Celsius scale is not a ratio scale. The Kelvin scale is a ratio scale. Other examples of ratio scales are those commonly used to measure quantities in units such as feet, meters, pounds, and pesos. The results of counting objects are also ratio data.
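A small Python sketch (our own illustration, not part of the original slides) makes the interval/ratio distinction concrete: it converts the two Celsius readings from the previous example to kelvins, using the standard offset of 273.15, and compares differences and ratios on both scales. The function name `celsius_to_kelvin` is illustrative.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius reading to kelvins (0 K = absence of heat)."""
    return c + 273.15

hot_c, cool_c = 80.0, 40.0

# Differences are meaningful on both scales (the interval property).
diff_c = hot_c - cool_c
diff_k = celsius_to_kelvin(hot_c) - celsius_to_kelvin(cool_c)

# Ratios are meaningful only on a scale whose zero means "none" (Kelvin).
naive_ratio = hot_c / cool_c
true_ratio = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cool_c)

print(f"Difference: {diff_c:.2f} C-degrees = {diff_k:.2f} K")
print(f"Celsius ratio: {naive_ratio:.2f} (misleading)")
print(f"Kelvin ratio:  {true_ratio:.3f}")
```

The Celsius ratio of 2 has no physical meaning; on the Kelvin scale, 80 °C is only about 1.13 times the absolute temperature of 40 °C.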

12 Organizing Data Using Tables
The objective of organizing data is to arrange a set of data into a useful form in order to reveal its essential features and simplify certain analyses. Data that are not organized in any fashion are called raw data. One method of arranging data is to construct an ordered array, that is, to arrange the data from low to high (or from high to low). If the data set is large, an ordered array may be difficult to manage, so tables are often used as a general approach to organizing raw data.

13 Ungrouped Frequency Tables
The frequency of a measurement or category is the total number of times the measurement or category occurs in a collection of data. The symbol f is used to denote the frequency of a measurement. For example, the following sample data represent the number of free throws missed by a basketball team during the last 7 games:
7, 2, 8, 4, 2, 7, 2

14 Frequency Table of Free Throw Data
Data x    f
2         3
4         1
7         2
8         1
Total     7
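In code, the tally step is what a counting dictionary does. A minimal Python sketch, using only the standard library's `collections.Counter`, rebuilds the ordered array and the frequency table for the free-throw data:

```python
from collections import Counter

missed = [7, 2, 8, 4, 2, 7, 2]   # free throws missed in each of the last 7 games

ordered = sorted(missed)          # ordered array, low to high
freq = Counter(missed)            # maps each measurement x to its frequency f

print("x  f")
for x in sorted(freq):
    print(f"{x}  {freq[x]}")
print("Total", sum(freq.values()))
```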

15 Tally Marks
For a large data set, an intermediate step is to count the observations with tally marks to help determine the frequency f of each observation. For each observation, we place a tally mark in a tally column. After all tallies are placed, they are counted for each measurement x to determine its frequency.

16 Example
9 8 7 8 4 3
2 1 0 5 3 2
1 1 7 3 2 8
7 6 6 4 3 2
2 0 9 4 6 9
6 9 4 3 5 7
3 2 1 4 4 2

17
x        Frequency f
0        2
1        4
2        7
3        6
4        6
5        2
6        4
7        4
8        3
9        4
Total    42

18 Grouped Frequency Tables
A grouped frequency table shows frequencies according to groups, or classes, of measurements. For example, a memorial hospital wants to study whether its emergency room staffing is adequate. To start the study, the manager records the number of people visiting the emergency room each day for a 12-day period, with the following results:
7, 43, 8, 22, 13, 28, 36, 18, 23, 21, 15, 52

19 Steps
–The manager constructs six groupings, or classes: the first class represents 1-10 patients; the second, 11-20 patients; the third, 21-30; the fourth, 31-40; the fifth, 41-50; and the sixth, 51-60.
–For the first class, the lower class limit is 1 and the upper class limit is 10. The remaining classes follow the same pattern of lower and upper limits.
–Tally the daily patient counts that fall within each class.
–Construct the grouped frequency table.

20 Grouped Frequency Table for Emergency Room Data
Class    Frequency f
1-10     2
11-20    3
21-30    4
31-40    1
41-50    1
51-60    1
Total    12
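The tallying step can be sketched in Python as below. The list of daily counts is our reading of the run-together digits on slide 18 (an assumption, chosen because it reproduces the frequencies in the table), and the class limits 1-10 through 51-60 come from slide 19.

```python
# Daily emergency-room visits for 12 days (our reading of slide 18; an assumption).
visits = [7, 43, 8, 22, 13, 28, 36, 18, 23, 21, 15, 52]

# Class limits 1-10, 11-20, ..., 51-60 (six classes of width 10).
classes = [(lo, lo + 9) for lo in range(1, 60, 10)]

table = []
for lo, hi in classes:
    f = sum(1 for v in visits if lo <= v <= hi)   # tally the values in this class
    table.append((lo, hi, f))

for lo, hi, f in table:
    print(f"{lo}-{hi}\t{f}")
print("Total\t", sum(f for _, _, f in table))
```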

21 Basic Guidelines
–Each class should have the same width.
–No two classes should overlap.
–Each piece of data should belong to a class.

22 Class Boundaries and Class Widths
Class boundaries determine class widths. The class boundaries of a grouped frequency table are determined by the unit, or precision, of measurement. The lower class boundary of a class interval is located one-half unit below the lower class limit, and the upper class boundary is one-half unit above the upper class limit. The class width w of any class interval is found by subtracting the lower class boundary from the upper class boundary:
$w = l_2 - l_1$
where $l_1$ is the lower class boundary and $l_2$ is the upper class boundary of the class interval.
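For measurements recorded to the nearest whole unit, the boundary rule can be expressed as a small helper. The function name `class_boundaries` and its `unit` parameter are illustrative, not from the slides; here it is applied to the first emergency-room class, 1-10.

```python
def class_boundaries(lower_limit, upper_limit, unit=1.0):
    """Boundaries lie half a measurement unit outside the class limits."""
    l1 = lower_limit - unit / 2   # lower class boundary
    l2 = upper_limit + unit / 2   # upper class boundary
    return l1, l2

l1, l2 = class_boundaries(1, 10)   # whole-patient counts, so unit = 1
w = l2 - l1                        # class width w = l2 - l1
print(l1, l2, w)                   # 0.5 10.5 10.0
```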

23 Basic Rules in Constructing a Grouped Frequency Table
–How many classes should be used?
–What should be the width of each class?
–At what value should the first class start?
–How is the class mark, or midpoint, computed?

24 Basic Rules in Constructing a Grouped Frequency Table
–For the number of classes c, use Sturges’ rule, where n is the number of observations and the logarithm is base 10:
$c = 3.3 \log_{10} n + 1$
–For the width, the rule is $w = R/c$, where R is the range, computed by subtracting the smallest measurement L from the largest measurement U: $R = U - L$.
–The lower limit of the first class should be close to, and no larger than, the smallest measurement L.
–The class mark X, or midpoint, is computed by adding the lower class limit a and the upper class limit b and dividing the sum by 2:
$X = (a + b)/2$
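The three rules translate directly into Python. Rounding the Sturges result up to the next whole number is our assumption (texts differ on rounding), as are the function names.

```python
import math

def sturges_classes(n):
    """Sturges' rule c = 3.3 log10(n) + 1, rounded up to a whole number of classes."""
    return math.ceil(3.3 * math.log10(n) + 1)

def class_width(data, c):
    """w = R / c, where R = U - L is the range of the data."""
    R = max(data) - min(data)
    return R / c

def class_mark(a, b):
    """Midpoint X of a class with lower limit a and upper limit b."""
    return (a + b) / 2

print(sturges_classes(50))   # 7
print(class_mark(2, 21))     # 11.5
```

For n = 50, Sturges' rule suggests 7 classes; the final-examination example that follows instead assumes c = 5, so the rule serves as a guideline rather than a requirement. In practice the computed width is usually rounded up to a convenient value.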

25 Relative Frequency Table
It is sometimes useful to express each value or class in a frequency table as a fraction of the total number of observations. The relative frequency of a class is found by dividing its frequency f by the total number of observations n. The table that lists the relative frequencies is called a relative frequency table.

26 Cumulative Frequency Table
There are many occasions when we are interested in the number of observations less than or equal to some value. For example, a teacher may want to know the number of students who scored 70% or less on an examination; the cumulative frequency answers that question. The cumulative frequency of any measurement or class is the sum of the frequency of that measurement or class and the frequencies of all measurements or classes of smaller value.

27 Cumulative Relative Frequency Table
Cumulative frequency tables can also be constructed from relative frequencies or percentages. The procedure is identical to that for cumulative frequency tables, except that relative frequencies or percentages are accumulated instead of frequencies. Cumulative relative frequencies have many uses. One is in scoring standardized tests by the percentile method. A percentile score tells what fraction of the tested population scored lower. For example, if a score of 50 is the 90th percentile on an examination, then 90% of the scores were lower than 50.

28 Example
A final examination produced the following data (n = 50):
1715782110327651887
4223442998279984
4464627728145378344
7713411617138237554
7678841612292166785
In constructing the frequency table, assume c = 5.

29 Grouped Frequency Table
Class Number  Class    X      f
1             2-21     11.5   18
2             22-41    31.5   8
3             42-61    51.5   6
4             62-81    71.5   10
5             82-101   91.5   8

30 Relative Frequency Table
Class    f    Relative Frequency
2-21     18   .36
22-41    8    .16
42-61    6    .12
62-81    10   .20
82-101   8    .16
Total    50   1.00

31 Cumulative Frequency Table
Class Number  Class    Cumulative Frequency
1             2-21     18
2             22-41    26
3             42-61    32
4             62-81    42
5             82-101   50

32 Cumulative Relative Frequency Table
Class    f    Relative Frequency   Cumulative Relative Frequency
2-21     18   .36                  .36
22-41    8    .16                  .52
42-61    6    .12                  .64
62-81    10   .20                  .84
82-101   8    .16                  1.00
Total    50
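All four tables can be derived from the frequency column alone. A minimal Python sketch using `itertools.accumulate` for the running totals, with the class limits and frequencies copied from the final-examination tables:

```python
from itertools import accumulate

classes = ["2-21", "22-41", "42-61", "62-81", "82-101"]
freq = [18, 8, 6, 10, 8]

n = sum(freq)                       # total number of observations
rel = [f / n for f in freq]         # relative frequency f / n
cum = list(accumulate(freq))        # cumulative frequency (running total)
cum_rel = [c / n for c in cum]      # cumulative relative frequency

print("Class     f   rel   cum  cum rel")
for row in zip(classes, freq, rel, cum, cum_rel):
    print("{:<7} {:>3} {:>5.2f} {:>4} {:>6.2f}".format(*row))
```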

33 Graphical Representation of Data A Bar Graph

34 Graphical Representation of Data A Pie Graph

35 Graphical Representation of Data A Line Graph

36 Measures of Central Tendency
The first characteristic of a set of data that we want to measure is its center, or central tendency. The purpose is to summarize a collection of data with a single value that can serve as a representative of the rest of the data. Common measures of central tendency:
–Mean
–Median
–Mode
–Midrange

37 Mean
The mean, or arithmetic average, is found by adding the observations and dividing the sum by the number of observations n:
$\bar{x} = \sum x / n$
A population mean, where N is the population size, is denoted by:
$\mu = \sum x / N$
The mean for grouped data, where X is the class mark and f the class frequency, is:
$\bar{x} = \sum (fX) / \sum f$
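A short Python sketch of both formulas; the grouped version uses the class marks and frequencies from the final-examination table, so it only approximates the mean of the raw scores. Function names are illustrative.

```python
def mean(xs):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(xs) / len(xs)

def grouped_mean(marks, freqs):
    """Approximate mean of grouped data: sum(f * X) / sum(f)."""
    return sum(f * x for x, f in zip(marks, freqs)) / sum(freqs)

print(mean([7, 2, 8, 4, 2, 7, 2]))                   # free-throw data, about 4.571
print(grouped_mean([11.5, 31.5, 51.5, 71.5, 91.5],   # class marks X
                   [18, 8, 6, 10, 8]))                # frequencies f; 44.3
```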

38 Disadvantage of the Mean
The mean as a measure of center has a disadvantage: it is affected by extreme measurements at one end of a distribution. Because it depends on the value of every measurement, extreme values can cause the mean to misrepresent the data. In such cases the median may be a better measure than the mean, because it is not affected by extreme values.

39 Median
The median is found by first ranking (ordering) the data. If the number of observations is odd, the median is the middle value of the ordered data. If the number of observations is even, the median is computed by adding the two middle values and dividing the sum by 2.
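The ranking rule for the median, and the contrast with the mean described on slide 38, can be checked with a small Python example. The scores below are hypothetical; the second list replaces the largest value with an extreme one.

```python
def median(xs):
    s = sorted(xs)                       # rank the data first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                       # odd n: the middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2     # even n: average of the two middle values

def mean(xs):
    return sum(xs) / len(xs)

scores = [20, 22, 25, 27, 30]            # hypothetical data
skewed = [20, 22, 25, 27, 300]           # same data with one extreme value

print(mean(scores), median(scores))      # 24.8 25
print(mean(skewed), median(skewed))      # 78.8 25
```

The single extreme value more than triples the mean but leaves the median unchanged.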

40 Mode
The mode, if it exists, is the most frequent measurement or observation. The mode has the advantage of being easy to find, especially in small samples, and is usually not influenced by extreme measurements at one end of an ordered set of data. Example: in the ordered data 1, 2, 3, 3, 3, 4, 5, and 6, the mode is 3.
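A possible Python sketch of the mode, built on `collections.Counter`. Returning every tied value, and returning an empty list when no value repeats, are our conventions for the "if it exists" case; other texts handle these cases differently. The function expects a non-empty list.

```python
from collections import Counter

def modes(xs):
    """Return every value with the highest frequency; [] if no value repeats."""
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []                        # every value occurs once: no mode
    return sorted(x for x, c in counts.items() if c == top)

print(modes([1, 2, 3, 3, 3, 4, 5, 6]))   # [3]
print(modes([1, 1, 2, 2, 3]))            # [1, 2] (bimodal)
```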

41 Relationships Between Mean, Median and Mode
[Figure: three distribution curves, each marking the positions of the mean, median and mode]
–Symmetry: Mean = Median = Mode
–Rightward skewness: Mean > Median > Mode
–Leftward skewness: Mean < Median < Mode

42 Midrange
The midrange of a set of data is the average of the largest and smallest measurements:
$\text{Midrange} = (U + L)/2$
For data organized in a grouped frequency table, the midrange is approximately the average of the lower class boundary of the first class, $l_{1,\text{fc}}$, and the upper class boundary of the last class, $l_{2,\text{lc}}$:
$\text{Midrange} \approx (l_{1,\text{fc}} + l_{2,\text{lc}})/2$
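Both midrange formulas in Python, as a sketch. The ungrouped call uses the free-throw data; the grouped call uses the boundaries 1.5 (below the first exam class, 2-21) and 101.5 (above the last, 82-101), assuming whole-number scores. Function names are illustrative.

```python
def midrange(xs):
    """(U + L) / 2 for raw data."""
    return (max(xs) + min(xs)) / 2

def grouped_midrange(first_lower_boundary, last_upper_boundary):
    """Approximate midrange from the outer class boundaries of a grouped table."""
    return (first_lower_boundary + last_upper_boundary) / 2

print(midrange([7, 2, 8, 4, 2, 7, 2]))   # 5.0
print(grouped_midrange(1.5, 101.5))      # 51.5
```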

43 Measures of Dispersion or Variability
Quite often, measures of central tendency alone do not adequately describe the characteristic being observed. Hence, variability is an important concept in statistics, and there are many measures of variability for a collection of quantitative data, such as:
–Range
–Variance
–Standard deviation
–Standard score

44 Range
As previously defined, the range is the difference between the largest and the smallest measurements:
$R = U - L$
where R is the range, L is the smallest measurement, and U is the largest measurement.

45 Deviation Score
The deviation score is the quantity defined by the relationship
$x - \bar{x}$
It represents the directed distance of a measurement from the mean of a set of data. A positive deviation score means the measurement is above the mean, a negative one means the measurement is below the mean, and a zero deviation means the measurement is equal to the mean.

46 Sum of Squares
The deviation scores of a data set always add up to zero, a useless result for analyzing the data. To avoid this, the sum of squares is used. The sum of squares SS is computed by first squaring the deviation scores and then adding them:
$SS = \sum (x - \bar{x})^2$
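A quick numeric check of this reasoning, using the free-throw data: the deviation scores cancel out (up to floating-point rounding), while their squares do not.

```python
data = [7, 2, 8, 4, 2, 7, 2]            # free-throw data from the slides
xbar = sum(data) / len(data)            # sample mean

deviations = [x - xbar for x in data]   # deviation scores x - xbar
ss = sum(d ** 2 for d in deviations)    # sum of squares SS

print(sum(deviations))   # zero, up to floating-point rounding
print(ss)                # about 43.714 (exactly 306/7)
```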

47 Variance
The variance of a population of measurements, denoted by $\sigma^2$, is defined as the average of the squared deviation scores:
$\sigma^2 = SS/N$
The variance of a sample, denoted by $s^2$, is defined by:
$s^2 = SS/(n - 1)$
For data in frequency tables, the variance is computed by first deriving the sum of squares with the following formula, where X is the class mark (midpoint) of each class:
$SS = \sum (fX^2) - (\sum fX)^2 / \sum f$

48 Variance
Thus, the variance of grouped population data is:
$\sigma^2 = SS/\sum f$
The variance of grouped sample data, denoted by $s^2$, is:
$s^2 = SS/(\sum f - 1)$
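The population and sample formulas, including the shortcut for grouped data, can be sketched in Python as follows. The function names are illustrative; the grouped example reuses the class marks and frequencies of the final-examination table.

```python
def variance(xs, sample=True):
    """s^2 = SS/(n - 1) for a sample, sigma^2 = SS/N for a population."""
    m = sum(xs) / len(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - 1) if sample else ss / len(xs)

def grouped_variance(marks, freqs, sample=True):
    """Variance from class marks X and frequencies f."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))
    ss = sum_fx2 - sum_fx ** 2 / n        # SS = sum(fX^2) - (sum fX)^2 / sum f
    return ss / (n - 1) if sample else ss / n

data = [7, 2, 8, 4, 2, 7, 2]
marks = [11.5, 31.5, 51.5, 71.5, 91.5]
freqs = [18, 8, 6, 10, 8]

print(variance(data), variance(data, sample=False))
print(grouped_variance(marks, freqs), grouped_variance(marks, freqs, sample=False))
```

For the examination table this gives SS = 46,208, so $\sigma^2$ = 924.16 and $s^2 \approx 943.02$.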

49 Standard Deviation
The standard deviation is defined as the positive square root of the variance. The standard deviation of a population is denoted by $\sigma$ and that of a sample by s:
$\sigma = \sqrt{\sigma^2}$, $s = \sqrt{s^2}$
If the population standard deviation $\sigma$ is known, the standard deviation of the sample mean (the standard error) for samples of size n is:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
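A sketch of the standard deviation and of the standard error $\sigma/\sqrt{n}$ in Python; the data list and the values σ = 100, n = 25 are hypothetical.

```python
import math

def std_dev(xs, sample=True):
    """Positive square root of the sample (or population) variance."""
    m = sum(xs) / len(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return math.sqrt(ss / (len(xs) - 1) if sample else ss / len(xs))

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

vals = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data
print(std_dev(vals, sample=False))       # 2.0
print(standard_error(100, 25))           # 20.0
```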

50 Standard Score
A score that takes into account the dispersion of the data is called a standard score. Standard scores also allow analysts to compare measurements from different distributions and thus to rank them. A standard score, denoted by z, is defined as:
Standard score = Deviation score / Standard deviation
$z = (x - \mu)/\sigma$

51 Example Problem
Consider this problem. Pedro scores 700 on the math portion of national test A. Pablo scores 24 on another national test, B. The mean and standard deviation of test A are 500 and 100, respectively, while those of test B are 18 and 6. If both tests are regarded as measures of the same kind of ability, which person performed better? To answer the question, we need a method for comparing scores from different distributions.

52 Answer
Using deviation scores, Pedro has 200 (700 - 500) and Pablo has 6 (24 - 18), but these cannot be compared because they do not take the spread of each test's scores into account. Using z, we can compare:
For Pedro, $z = (700 - 500)/100 = 2$
For Pablo, $z = (24 - 18)/6 = 1$
Pedro's score lies two standard deviations above his test's mean, while Pablo's lies only one, so Pedro performed better.
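The comparison reduces to one formula; a minimal Python version with the figures from the problem:

```python
def z_score(x, mu, sigma):
    """Standard score: deviation score divided by the standard deviation."""
    return (x - mu) / sigma

pedro = z_score(700, 500, 100)   # test A
pablo = z_score(24, 18, 6)       # test B
print(pedro, pablo)              # 2.0 1.0
```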

53 Inferential Statistics

54 Knowledge Requirements
–The Concept of Probability
–The Theorem on Counting
–Binomial Distributions
–Normal Distributions
–Sampling Theory
–Analysis of Variance (ANOVA)
–Linear Regression Analysis

55 The Concept of Probability
Probability provides the foundation of inferential statistics. Using probability theory, we can deduce the likelihood that samples with specified properties will occur, which enables us to draw inferences about the population. The probability of an event is a number between 0 and 1, inclusive. If E is an event, then P(E) denotes the probability of E.

56 The Concept of Probability
Probability satisfies the following properties, where the $E_i$ are the simple events of an experiment:
$P(E_i) \ge 0$
$P(E_i) \le 1$
$\sum P(E_i) = 1$
Probabilities can be assigned to events through experiments or empirical observation. In the latter case, the probability is called empirical probability and is computed with the formula
$P(E) = f/n$
where f is the number of times E occurred in n observations.

57 Example Problem 1
An insurance company wants to estimate the probability that a police car will be involved in a car accident. Last month, 7 of the 20 insured police cars were involved in accidents. What is the estimated probability? What is the chance that a police car will not be involved in an accident?
Answers:
a. $P(E) = f/n = 7/20 = .35$
b. $1 - P(E) = 1 - .35 = .65$
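Empirical probability is just a ratio of counts. This sketch uses Python's `fractions.Fraction` to keep the result exact before converting it to a decimal; the function name is illustrative.

```python
from fractions import Fraction

def empirical_probability(f, n):
    """P(E) = f / n, where f counts occurrences of E in n observations."""
    return Fraction(f, n)

p_accident = empirical_probability(7, 20)
p_no_accident = 1 - p_accident                     # complement rule
print(float(p_accident), float(p_no_accident))     # 0.35 0.65
```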

58 Example Problem 2
Entrance test math scores for a large university are grouped as shown:
Scores     f
200-299    3,600
300-399    11,900
400-499    12,000
500-599    5,500
600-699    1,500
700-799    500

59 Requirements
If a student is selected at random, what is the probability that the student's math score:
–Exceeds 399?
–Is at most 599?
–Is between 600 and 699?
–Is not between 400 and 499?
–Is less than or equal to 699?
Hint: Construct the relative frequency table; the answers can be found from it.

60
Scores     f        Relative f
200-299    3,600    .103
300-399    11,900   .340
400-499    12,000   .343
500-599    5,500    .157
600-699    1,500    .043
700-799    500      .014
Total      35,000
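The five questions can be answered from the frequency column by adding the frequencies of the qualifying classes and dividing by 35,000. The sketch below assumes scores are whole numbers, so that "exceeds 399" means the classes from 400-499 upward; `prob` is a hypothetical helper name.

```python
# Entrance-test math scores: class lower limit -> frequency (from the slide).
freq = {200: 3_600, 300: 11_900, 400: 12_000, 500: 5_500, 600: 1_500, 700: 500}
n = sum(freq.values())   # 35,000 students

def prob(pred):
    """Probability that a random student's class satisfies pred(lower_limit)."""
    return sum(f for lo, f in freq.items() if pred(lo)) / n

print(round(prob(lambda lo: lo >= 400), 3))   # exceeds 399
print(round(prob(lambda lo: lo <= 500), 3))   # at most 599
print(round(prob(lambda lo: lo == 600), 3))   # between 600 and 699
print(round(prob(lambda lo: lo != 400), 3))   # not between 400 and 499
print(round(prob(lambda lo: lo <= 600), 3))   # at most 699
```

Rounded to three decimals, this gives .557, .943, .043, .657, and .986.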

61 The Theorem on Counting
The fundamental theorem on counting (FTC) states that if an event can occur in any one of m ways and, after it has occurred, a second event can occur in any one of n ways, then the two events can occur together, in the order stated, in mn different ways. The FTC is best illustrated with a tree diagram showing the decisions to be made.

62 Example Problem
[Figure: tree diagram with three roads from A to B, each branching into two roads to C]
There are two decisions to be made:
1. At A, which road to B?
2. At B, which road to C?
Question: How many ways are there to go from A to C?
Hint: At A, there are 3 ways. At B, there are 2 ways. Use the FTC.
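The FTC count can be confirmed by listing every route. In this Python sketch, the road labels are hypothetical; `itertools.product` pairs each of the 3 choices at A with each of the 2 choices at B.

```python
from itertools import product

roads_at_a = ["A-road 1", "A-road 2", "A-road 3"]   # m = 3 ways to reach B
roads_at_b = ["B-road 1", "B-road 2"]               # n = 2 ways onward to C

routes = list(product(roads_at_a, roads_at_b))      # every (first, second) choice
for route in routes:
    print(" then ".join(route))
print(len(routes))                                  # 6 = 3 * 2
```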

63 Binomial Distributions
The binomial distribution describes the probability distribution of the number of successes in a series of trials. A binomial experiment has the following properties:
–The experiment consists of n identical trials.
–Each trial results in exactly one of two outcomes, called success (S) or failure (F).
–The n trials are independent.
–The probability of success, p, remains constant from trial to trial.
A binomial distribution table can be used to find values of P(x) for selected values of p and n.

64 Example Problem
If a baseball player with a batting average of .600 comes to bat five times in a game, what is the probability that he will get three hits?
Given:
p = .600
n = 5
x = 3
Use the binomial table.
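Instead of a table lookup, P(x) can be computed directly from the binomial formula $P(x) = \binom{n}{x} p^x (1-p)^{n-x}$. A minimal Python sketch using `math.comb` (Python 3.8 or later):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binomial_pmf(3, 5, 0.600), 4))   # 0.3456
```

Here $\binom{5}{3} = 10$, so P(3) = 10(.600)³(.400)² = .3456.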

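65 Normal Distribution A normal distribution is mound shaped, or takes on a bell-shaped appearance, as shown in the figure.
[Figure: bell-shaped curve of y against x, centered at μ]
$$ y = \frac{1}{\delta\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\delta^2)} $$
where μ = mean of the population, δ = standard deviation of the population, e ≈ 2.718 (the base of natural logarithms), and x = any real number.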
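The density formula can be checked numerically. The sketch below codes it directly (the function name `normal_density` is hypothetical) and compares it with the standard library's `statistics.NormalDist.pdf`; μ = 50 and δ = 5 are used only as illustrative values.

```python
import math
from statistics import NormalDist

def normal_density(x: float, mu: float, delta: float) -> float:
    """Height y of the normal curve at x, for mean mu and standard deviation delta."""
    coefficient = 1.0 / (delta * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * delta**2))

# Illustrative parameters (the same values appear in the later example problem).
mu, delta = 50.0, 5.0
for x in (40.0, 50.0, 65.0):
    mine = normal_density(x, mu, delta)
    library = NormalDist(mu, delta).pdf(x)
    print(f"x = {x:5.1f}  y = {mine:.6f}  (statistics.NormalDist: {library:.6f})")
```

At x = μ the exponent is zero, so the curve's maximum height is 1/(δ√(2π)).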

66 Properties
– A normal distribution is mound or bell shaped.
– The total area under the curve is always equal to 1.
– The mean is located at the center of the distribution, and the curve is symmetrical about its mean.
– The mean, median and mode are equal.
– The curve extends indefinitely to the left and right of the mean, approaching the horizontal axis but never touching it.
– The shape and position of the curve depend on the parameters μ and δ.
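Several of these properties can be verified numerically. This sketch, assuming illustrative parameters μ = 50 and δ = 5, approximates the total area with the trapezoidal rule, checks symmetry about the mean, and confirms that the median and the peak (mode) both sit at μ.

```python
import math
from statistics import NormalDist

mu, delta = 50.0, 5.0  # illustrative parameters
curve = NormalDist(mu, delta)

# Total area: trapezoidal rule over mu +/- 10*delta, which contains
# all but a negligible fraction of the area under the curve.
lo, hi, steps = mu - 10 * delta, mu + 10 * delta, 20_000
h = (hi - lo) / steps
inner = sum(curve.pdf(lo + i * h) for i in range(1, steps))
area = h * (0.5 * curve.pdf(lo) + inner + 0.5 * curve.pdf(hi))

# Symmetry: equal heights at equal distances on either side of the mean.
symmetric = all(
    math.isclose(curve.pdf(mu - d), curve.pdf(mu + d)) for d in (0.5, 3.0, 7.25, 12.0)
)

# Mean = median = mode: the 50% point is mu, and the curve peaks at mu.
median = curve.inv_cdf(0.5)
peak_at_mean = curve.pdf(mu) > max(curve.pdf(mu - 0.01), curve.pdf(mu + 0.01))

print(f"area = {area:.6f}")
print("symmetric about the mean:", symmetric)
print("median:", median, "| peak at the mean:", peak_at_mean)
```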

67 Empirical Rule For a normal (mound-shaped) distribution:
– Approximately 68% of the measurements fall within 1 standard deviation of the mean, that is, within μ ± δ.
– Approximately 95% of the measurements fall within 2 standard deviations of the mean, that is, within μ ± 2δ.
– Approximately 99.7% of the measurements fall within 3 standard deviations of the mean, that is, within μ ± 3δ.
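The three percentages can be recovered from the standard normal distribution. Because P(μ − kδ < x < μ + kδ) equals P(−k < z < k) for any normal distribution, this sketch uses Python's `statistics.NormalDist()` with mean 0 and standard deviation 1.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(mu - k*delta < x < mu + k*delta) equals P(-k < z < k) for any normal distribution.
shares = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s) of the mean: {share:.4f}")
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
```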

68 Example Problem
[Figure: normal curve with 40, 50 and 65 marked on the x-axis]
Problem: For μ = 50 and δ = 5, find P(40 < x < 65).

69 Solution Transform x into the standard normal variable z by using the transformation formula z = (x − μ)/δ.
Hence, for x = 40, z = (40 − 50)/5 = −2,
and for x = 65, z = (65 − 50)/5 = 3.
Therefore, by using the normal distribution table (areas between 0 and z), P(40 < x < 65) = P(−2 < z < 3) = .4772 + .4987 = .9759.
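The table lookup can be reproduced in a few lines. This sketch uses Python's `statistics.NormalDist`; the two areas are measured from z = 0, mirroring how the normal table is read, and the result is then checked against the unstandardized distribution directly.

```python
from statistics import NormalDist

mu, delta = 50.0, 5.0
z_low = (40 - mu) / delta   # -2.0
z_high = (65 - mu) / delta  #  3.0

standard = NormalDist()  # mean 0, standard deviation 1
# Table areas are measured from z = 0, so split the interval at 0.
area_low = standard.cdf(0) - standard.cdf(z_low)    # area between -2 and 0
area_high = standard.cdf(z_high) - standard.cdf(0)  # area between 0 and 3
prob = area_low + area_high
print(f"{area_low:.4f} + {area_high:.4f} = {prob:.4f}")  # 0.4772 + 0.4987 = 0.9759

# Same answer without standardizing by hand.
direct = NormalDist(mu, delta).cdf(65) - NormalDist(mu, delta).cdf(40)
```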

70 Sampling Theory A major concern of inferential statistics is to estimate unknown population characteristics by examining information gathered from a subcollection of the population called a sample. To be used to study the characteristics of a population, a sample must be representative of that population. When a complete enumeration of the population, called a census, is not possible, a sample is used.

71 Random Sample Sampling bias, a type of statistical bias, can be minimized through randomization, that is, selecting the sample through an unbiased and impartial procedure. The sample generated from this process is called a random sample. Sampling techniques that use randomization include simple random sampling, stratified sampling, cluster sampling, and systematic sampling.
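The four techniques differ mainly in what gets randomized. The sketch below is illustrative only: the population of 100 numbered records and the four "offices" of 25 (used both as strata and as clusters) are hypothetical, stratified sampling assumes equal allocation per stratum, and a fixed seed keeps the run repeatable. It uses only Python's standard `random` module.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible sample of size n has the same chance of being chosen.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Random start within the first interval of length k, then every k-th unit.
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    # Independent simple random sample inside each stratum (equal allocation assumed).
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick whole clusters, then keep every unit in the chosen clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: 100 numbered records split into 4 offices of 25.
rng = random.Random(2024)  # fixed seed so the run is repeatable
population = list(range(1, 101))
offices = {f"office {i + 1}": population[i * 25:(i + 1) * 25] for i in range(4)}

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(offices, 3, rng)
clus = cluster_sample(offices, 2, rng)
print("simple random:", sorted(srs))
print("systematic:   ", sys_s)
print("stratified:   ", {k: sorted(v) for k, v in strat.items()})
print("cluster:      ", len(clus), "units from 2 whole offices")
```

In practice, stratified samples are often allocated in proportion to stratum size rather than equally; equal allocation is used here only to keep the sketch short.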

72 Analysis of Variance Analysis of variance (ANOVA) is a statistical method used to test the equality of two or more population means. It analyzes the variation between samples and the variation within samples using variances rather than ranges. Analysis of variance enables us to test hypotheses such as:
– $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
– $H_1$: At least two population means are unequal.
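The between/within comparison can be made concrete with the one-way ANOVA F statistic, F = MSB/MSW, where MSB = SSB/(k − 1) and MSW = SSW/(N − k). This is a minimal sketch on hypothetical data (three small samples); a large F relative to the F-table critical value for (k − 1, N − k) degrees of freedom leads to rejecting H0.

```python
def one_way_anova_f(groups):
    """F statistic for H0: all population means are equal (one-way ANOVA)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation between samples: sample means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation within samples: observations around their own sample mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: three small samples, one from each population.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(samples)
print(f"F = {f_stat:.2f} with ({len(samples) - 1}, {sum(map(len, samples)) - len(samples)}) df")
```

For these numbers SSB = 26 and SSW = 6, giving F = 13/1 = 13 with (2, 6) degrees of freedom.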

73 Linear Regression Analysis Linear regression analysis is a procedure for analyzing the relationship between a dependent variable and one or more independent variables on which it depends. A simple linear regression model involves one dependent variable influenced by one independent variable. It is estimated using the least squares prediction equation:
$$ y = b_0 + b_1 x $$
A multiple regression model involves one dependent variable influenced by two or more independent variables:
$$ y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_n x_n $$

74 Estimation of Parameters In simple linear regression, the parameters $b_0$ and $b_1$ can be estimated using the following formulas:
$$ SS_x = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad SS_y = \sum y^2 - \frac{(\sum y)^2}{n}, \qquad SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} $$
$$ \bar{y} = \frac{\sum y}{n}, \qquad \bar{x} = \frac{\sum x}{n} $$
$$ b_1 = \frac{SS_{xy}}{SS_x}, \qquad b_0 = \bar{y} - b_1 \bar{x} $$
In multiple linear regression, estimating the parameters is considerably more complicated, but computer programs are available that provide the solutions.
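The formulas translate directly into code. This sketch (the function name `least_squares` is illustrative) computes SSx and SSxy, then b1 and b0, and is checked on hypothetical points that lie exactly on y = 1 + 2x, so the fit should recover b0 = 1 and b1 = 2. SSy is not needed for the coefficients themselves.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for y = b0 + b1*x using the SS formulas above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x**2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x
    b0 = sum_y / n - b1 * (sum_x / n)  # b0 = y-bar - b1 * x-bar
    return b0, b1

# Hypothetical points lying exactly on y = 1 + 2x.
b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```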

75 Example Problem Information on mileage ratings was released comparing engine size (in cubic inches of displacement, cdi) and miles per gallon (mpg) estimates for eight representative compact car models in the US, as shown:
Car        cdi   mpg
Cavalier   121   30
Stanza     120   31
Omni        97   34
Escort      98   27
Mazda      122   29
Horizon     97   34
Encore      85   38
Corolla    122   32

76 Requirement Suppose we want to find out whether there is a relationship between engine displacement and miles per gallon, treating engine displacement as the variable that affects the car's mileage rating. What relationship can be established?
Hint: Use the least squares prediction equation to estimate the linear regression model.

77 Solution To find the relationship, we first define the variables: y = mpg and x = cdi. With this notation, the relationship is described by the least squares prediction equation:
$$ y = b_0 + b_1 x $$
We then find the parameters $b_0$ and $b_1$ by applying the formulas. To apply them, we first compute the following sums from the data.

78
x     y     x²       y²      xy
121   30    14,641     900   3,630
120   31    14,400     961   3,720
97    34     9,409   1,156   3,298
98    27     9,604     729   2,646
122   29    14,884     841   3,538
97    34     9,409   1,156   3,298
85    38     7,225   1,444   3,230
122   32    14,884   1,024   3,904
Σ: 862   255   94,456   8,211   27,264

79
$$ SS_x = \sum x^2 - \frac{(\sum x)^2}{n} = 94{,}456 - \frac{(862)^2}{8} = 1575.5 $$
$$ SS_y = \sum y^2 - \frac{(\sum y)^2}{n} = 8{,}211 - \frac{(255)^2}{8} = 82.875 $$
$$ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 27{,}264 - \frac{(862)(255)}{8} = -212.25 $$
$$ b_1 = \frac{SS_{xy}}{SS_x} = \frac{-212.25}{1575.5} = -.1347 $$
$$ b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y}{n} - (-.1347)\frac{\sum x}{n} = \frac{255}{8} + (.1347)\frac{862}{8} = 46.3889 $$
Therefore, $y = 46.3889 - .1347x$.
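The hand computation can be reproduced from the raw data on slide 75. This sketch recomputes the column sums, the three SS quantities, and the coefficients in plain Python.

```python
# Engine displacement (cdi) and mileage (mpg) for the eight cars on slide 75.
cdi = [121, 120, 97, 98, 122, 97, 85, 122]
mpg = [30, 31, 34, 27, 29, 34, 38, 32]

n = len(cdi)
sum_x, sum_y = sum(cdi), sum(mpg)
sum_x2 = sum(x * x for x in cdi)
sum_y2 = sum(y * y for y in mpg)
sum_xy = sum(x * y for x, y in zip(cdi, mpg))

ss_x = sum_x2 - sum_x**2 / n
ss_y = sum_y2 - sum_y**2 / n
ss_xy = sum_xy - sum_x * sum_y / n

b1 = ss_xy / ss_x
b0 = sum_y / n - b1 * sum_x / n

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 862 255 94456 8211 27264
print(ss_x, ss_y, ss_xy)                     # 1575.5 82.875 -212.25
print(f"y = {b0:.4f} {b1:+.4f}x")
```

Carrying b1 at full precision gives b0 ≈ 46.391 rather than 46.3889; the slide's value comes from rounding b1 to −.1347 before computing b0.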

80 Analysis With y = 46.3889 − .1347x, we can now estimate the mileage rating of a car model given its engine displacement. However, the estimate is reliable only if the two variables are closely correlated. Statistics, through correlation analysis, must establish this before the model can be useful for predicting the values of a variable. A commonly used measure for testing correlation is Spearman's rank correlation coefficient, denoted by $r_s$.
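Spearman's r_s is the ordinary (Pearson) correlation computed on ranks instead of raw values. This sketch applies it to the car data from slide 75, assuming the usual convention that tied values share the average of their ranks (the data contain ties: 97 and 122 cdi, 34 mpg). The shortcut formula 1 − 6Σd²/(n(n² − 1)) is exact only when there are no ties, so the rank-correlation form is used here.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman's r_s: the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

cdi = [121, 120, 97, 98, 122, 97, 85, 122]
mpg = [30, 31, 34, 27, 29, 34, 38, 32]
r_s = spearman_rs(cdi, mpg)
print(f"r_s = {r_s:.4f}")
```

The negative sign agrees with the negative slope b1: larger engines tend to have lower mileage ratings.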

81 Specification Criteria
1. Theory. Is there a theoretically sound justification for including the variable?
2. t-test. Is the variable's estimated coefficient statistically significant and of the expected sign?
3. Adjusted R². Does the overall fit of the equation improve when the variable is added?
4. Bias. Does another variable's coefficient change significantly when the variable is added to the equation?

82 PROBLEMS
– Contemporaneous correlation between the independent variables and the disturbance term
– Heteroscedasticity
– Autocorrelation

83 Further Analysis Moreover, the regression model is tested further to confirm that it is good enough for predictive purposes; the F statistic is the tool used to test its appropriateness. In other words, as this model illustrates, statistics goes well beyond what has been presented here, and there is more to learn. If you are interested in pursuing the subject, you are encouraged to read further or to study it formally.
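The slide does not say which form of the F test it means; a common one, assumed here, is the ANOVA F test for a simple linear regression: F = MSR/MSE with 1 and n − 2 degrees of freedom, where SSR = SSxy²/SSx and SSE = SSy − SSR. This sketch applies it to the car data from slide 75; the result would then be compared with a critical value from an F table.

```python
cdi = [121, 120, 97, 98, 122, 97, 85, 122]
mpg = [30, 31, 34, 27, 29, 34, 38, 32]

n = len(cdi)
sx, sy = sum(cdi), sum(mpg)
ss_x = sum(x * x for x in cdi) - sx**2 / n
ss_y = sum(y * y for y in mpg) - sy**2 / n
ss_xy = sum(x * y for x, y in zip(cdi, mpg)) - sx * sy / n

ssr = ss_xy**2 / ss_x  # variation explained by the regression line
sse = ss_y - ssr       # remaining (error) variation
msr = ssr / 1          # 1 degree of freedom for the single slope b1
mse = sse / (n - 2)    # n - 2 degrees of freedom for error
f_stat = msr / mse
print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, F = {f_stat:.3f} with (1, {n - 2}) df")
```

With these data, F comes out at about 3.16 on (1, 6) degrees of freedom.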

84 Problem Objectives and Scales of Measurement
Nominal
– Description of a single population: z test and estimator of p; χ² test of the multinomial experiment
– Comparison of two populations: z test and estimator of (p₁ − p₂)
– Comparison of two or more populations: χ² test of a contingency table
Ordinal
– Comparison of two populations: Wilcoxon rank sum test for independent samples; Wilcoxon signed rank sum test for matched pairs
– Comparison of two or more populations: Kruskal-Wallis test for the completely randomized design; Friedman test for the randomized block design
– Analysis of the relationship between two variables: Spearman rank correlation
Interval
– Description of a single population: z test and estimator of μ; t test and estimator of μ; χ² test and estimator of variance
– Comparison of two populations: z test and estimator of μ₁ − μ₂; t test and estimator of μ_p = μ₁ − μ₂; F test and estimator of the variance ratio δ₁²/δ₂²
– Comparison of two or more populations: ANOVA, completely randomized design; ANOVA, randomized block design
– Analysis of the relationship between two variables: simple linear regression and correlation
– Analysis of the relationship among two or more variables: multiple regression analysis
