Published by Lillian Wade. Modified over 3 years ago.

1
1/2/2014 (c) 2000, Ron S. Kenett, Ph.D.
Understanding Variability
Instructor: Ron S. Kenett
Course Website:
Course textbook: MODERN INDUSTRIAL STATISTICS, Kenett and Zacks, Duxbury Press, 1998

2
Course Syllabus
  Understanding Variability
  Variability in Several Dimensions
  Basic Models of Probability
  Sampling for Estimation of Population Quantities
  Parametric Statistical Inference
  Computer Intensive Techniques
  Multiple Linear Regression
  Statistical Process Control
  Design of Experiments

3
Discrete Data
A set of data is said to be discrete if the values / observations belonging to it are distinct and separate. That is, they can be counted (1, 2, 3, ...).
For example: the number of kittens in a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth; gender (male, female); blood group (O, A, B, AB).

4
Continuous Data
A set of data is said to be continuous if the values / observations belonging to it may take on any value within a finite or infinite interval. You can count, order and measure continuous data.
For example: height; weight; temperature; the amount of sugar in an orange; the time required to run a mile.

5
Types of Variables
Qualitative Variables
  Attributes, categories
  Examples: male/female, registered to vote/not, ethnicity, eye color, ...
Quantitative Variables
  Discrete - usually take on integer values but can take on fractions when the variable allows - counts, how many
  Continuous - can take on any value at any point along an interval - measurements, how much

6
Self Assessment Test
For each of the following, indicate whether the appropriate variable would be qualitative or quantitative. If the variable is quantitative, indicate whether it would be discrete or continuous.

7
Self Assessment Test
a) Whether you own an RCA Colortrak television set
   Qualitative variable: two levels (yes/no), no measurement.
b) Your status as a full-time or a part-time student
   Qualitative variable: two levels (full/part), no measurement.
c) Number of people who attended your school's graduation last year
   Quantitative, discrete variable: a countable number, only whole numbers.

8
Self Assessment Test
d) The price of your most recent haircut
   Quantitative, discrete variable: a countable number, only whole numbers.
e) Sam's travel time from his dorm to the Student Union
   Quantitative, continuous variable: time is measured and can take on any value greater than zero.

9
Self Assessment Test
f) The number of students on campus who belong to a social fraternity or sorority
   Quantitative, discrete variable: a countable number, only whole numbers.

10
Scales of Measurement
  Nominal Scale - Labels represent various levels of a categorical variable.
  Ordinal Scale - Labels represent an order that indicates either preference or ranking.
  Interval Scale - Numerical labels indicate order and distance between elements. There is no absolute zero and multiples of measures are not meaningful.
  Ratio Scale - Numerical labels indicate order and distance between elements. There is an absolute zero and multiples of measures are meaningful.

11
Self Assessment Test
Bill scored 1200 on the Scholastic Aptitude Test and entered college as a physics major. As a freshman, he changed to business because he thought it was more interesting. Because he made the dean's list last semester, his parents gave him $30 to buy a new Casio calculator.
Identify at least one piece of information in the:

12
Self Assessment Test
a) nominal scale of measurement.
   1. Bill is going to college.
   2. Bill will buy a Casio calculator.
   3. Bill was a physics major.
   4. Bill is a business major.
   5. Bill was on the dean's list.

13
Self Assessment Test
b) ordinal scale of measurement: Bill is a freshman.
c) interval scale of measurement: Bill earned a 1200 on the SAT.
d) ratio scale of measurement: Bill's parents gave him $30.

15
Histogram
A histogram is a way of summarising data that are measured on an interval scale (either discrete or continuous). It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient form.
It divides the range of possible values in a data set into classes or groups. For each group, a rectangle is constructed with a base length equal to the range of values in that group and an area proportional to the number of observations falling into that group. Because frequency is represented by area rather than height, the rectangles may have non-uniform heights when the classes differ in width.

16
Key Terms
Data array - An orderly presentation of data in either ascending or descending numerical order.
Frequency Distribution - A table that represents the data in classes and that shows the number of observations in each class.

17
Key Terms
Frequency Distribution
  Class - The category
  Frequency - Number in each class
  Class limits - Boundaries for each class
  Class interval - Width of each class
  Class mark - Midpoint of each class

18
Sturges' Rule
How to set the approximate number of classes to begin constructing a frequency distribution:
\( k = 1 + 3.322 \log_{10} n \)
where k = approximate number of classes to use and n = the number of observations in the data set.

19
Frequency Distributions
1. Number of classes
   Choose an approximate number of classes for your data. Sturges' rule can help.
2. Estimate the class interval
   Divide the approximate number of classes (from Step 1) into the range of your data to find the approximate class interval, where the range is defined as the largest data value minus the smallest data value.
3. Determine the class interval
   Round the estimate (from Step 2) to a convenient value.

20
Frequency Distributions
4. Lower Class Limit
   Determine the lower class limit for the first class by selecting a convenient number that is smaller than the lowest data value.
5. Class Limits
   Determine the other class limits by repeatedly adding the class width (from Step 3) to the prior class limit, starting with the lower class limit (from Step 4).
6. Define the classes
   Use the sequence of class limits to define the classes.

21
Relative Frequency Distributions
1. Retain the same classes defined in the frequency distribution.
2. Sum the total number of observations across all classes of the frequency distribution.
3. Divide the frequency for each class by the total number of observations, giving the proportion (relative frequency) of data values in each class.

22
Cumulative Relative Frequency Distributions
1. List the number of observations in the lowest class.
2. Add the frequency of the lowest class to the frequency of the second class. Record that cumulative sum for the second class.
3. Continue to add the prior cumulative sum to the frequency for that class, so that the cumulative sum for the final class is the total number of observations in the data set.

23
Cumulative Relative Frequency Distributions
4. Divide the accumulated frequencies for each class by the total number of observations, giving you the proportion of all observations that occurred up to and including that class.
An Alternative: Accrue the relative frequencies for each class instead of the raw frequencies. Then you don't have to divide by the total to get the cumulative proportions.

24
Example
The average daily cost to community hospitals for patient stays during 1993 for each of the 50 U.S. states is given in the next table.
a) Arrange these into a data array.
b) Construct a stem-and-leaf display.
*) Approximately how many classes would be appropriate for these data?
c & d) Construct a frequency distribution. State interval width and class mark.
e) Construct a histogram, a relative frequency distribution, and a cumulative relative frequency distribution.

25
Example – Data List
AL   $775    HI    823    MA  1,036    NM  1,046    SD    506
AK  1,136    ID    659    MI    902    NY    784    TN    859
AZ  1,091    IL    917    MN    652    NC    763    TX  1,010
AR    678    IN    898    MS    555    ND    507    UT  1,081
CA  1,221    IA    612    MO    863    OH    940    VT    676
CO    961    KS    666    MT    482    OK    797    VA    830
CT  1,058    KY    703    NE    626    OR  1,052    WA  1,143
DE  1,024    LA    875    NV    900    PA    861    WV    701
FL    960    ME    738    NH    976    RI    885    WI    744
GA    775    MD    889    NJ    829    SC    838    WY    537

26
Example – Data Array
CA  1,221    TX  1,010    RI    885    NY    784    KS    666
WA  1,143    NH    976    LA    875    AL    775    ID    659
AK  1,136    CO    961    MO    863    GA    775    MN    652
AZ  1,091    FL    960    PA    861    NC    763    NE    626
UT  1,081    OH    940    TN    859    WI    744    IA    612
CT  1,058    IL    917    SC    838    ME    738    MS    555
OR  1,052    MI    902    VA    830    KY    703    WY    537
NM  1,046    NV    900    NJ    829    WV    701    ND    507
MA  1,036    IN    898    HI    823    AR    678    SD    506
DE  1,024    MD    889    OK    797    VT    676    MT    482

27
Example – Stem and Leaf Display
Stem-and-Leaf Display, N = 50
Stems are hundreds of dollars; leaves are the remaining two digits.
      12 | 21
      11 | 43, 36
      10 | 91, 81, 58, 52, 46, 36, 24, 10
       9 | 76, 61, 60, 40, 17, 02, 00
(11)   8 | 98, 89, 85, 75, 63, 61, 59, 38, 30, 29, 23
       7 | 97, 84, 75, 75, 63, 44, 38, 03, 01
       6 | 78, 76, 66, 59, 52, 26, 12
       5 | 55, 37, 07, 06
       4 | 82
Range: $482 - $1,221

28
Example – Frequency Distribution
To approximate the number of classes we should use in creating the frequency distribution, use Sturges' Rule with n = 50:
\( k = 1 + 3.322 \log_{10} 50 \approx 6.64 \)
Sturges' rule suggests we use approximately 7 classes.
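As a quick check of this arithmetic, here is a minimal Python sketch. It assumes the common base-10 form of Sturges' rule, k = 1 + 3.322 log10(n); the function name `sturges_classes` is made up for illustration.

```python
import math


def sturges_classes(n: int) -> float:
    """Approximate number of classes, assuming k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


k = sturges_classes(50)          # 50 states in the example
print(round(k, 2))               # a value between 6 and 7
print(math.ceil(k))              # rounded up to a whole number of classes
```

The rule gives only a starting point; the example goes on to use 8 classes because that yields a convenient class width.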

29
Example – Frequency Distribution
Step 1. Number of classes
   Sturges' Rule: approximately 7 classes.
Steps 2 & 3. The Class Interval
   The range is: $1,221 – $482 = $739
   $739/7 ≈ $106 and $739/8 ≈ $92
   So, if we use 8 classes, we can make each class $100 wide.

31
Example – Frequency Distribution
Step 4. The Lower Class Limit
   If we start at $450, we can cover the range in 8 classes, each class $100 in width.
   The first class: $450 up to $550
Steps 5 & 6. Setting Class Limits
   $450 up to $550        $850 up to $950
   $550 up to $650        $950 up to $1,050
   $650 up to $750        $1,050 up to $1,150
   $750 up to $850        $1,150 up to $1,250

32
Example – Frequency Distribution
Average daily cost         Number    Mark
$450 – under $550             4      $500
$550 – under $650             3      $600
$650 – under $750             9      $700
$750 – under $850             9      $800
$850 – under $950            11      $900
$950 – under $1,050           7      $1,000
$1,050 – under $1,150         6      $1,100
$1,150 – under $1,250         1      $1,200
Interval width: $100
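The class counts can be reproduced from the raw data with a short Python sketch. The values are copied from the data list; the variable names (`costs`, `lower`, `width`, `counts`) are illustrative choices, not part of the course material.

```python
# Average daily hospital cost (USD) for the 50 states, from the data list.
costs = [
    775, 823, 1036, 1046, 506, 1136, 659, 902, 784, 859,
    1091, 917, 652, 763, 1010, 678, 898, 555, 507, 1081,
    1221, 612, 863, 940, 676, 961, 666, 482, 797, 830,
    1058, 703, 626, 1052, 1143, 1024, 875, 900, 861, 701,
    960, 738, 976, 885, 744, 775, 889, 829, 838, 537,
]

lower, width, k = 450, 100, 8    # first lower limit, class interval, number of classes

counts = [0] * k
for c in costs:
    # Integer division assigns each value to its "lower up to, but under, upper" class.
    counts[(c - lower) // width] += 1

for i, n in enumerate(counts):
    lo = lower + i * width
    mark = lo + width // 2       # class mark = midpoint of the class
    print(f"${lo:,} - under ${lo + width:,}: {n:2d}  (mark ${mark:,})")
```

Using integer division keeps the boundary rule explicit: a value exactly on a class limit, such as $550, would fall into the class that starts there.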

33
Example – Histogram

34
Example – Relative Frequency Distribution
Average daily cost         Number    Rel. Freq.
$450 – under $550             4       4/50 = .08
$550 – under $650             3       3/50 = .06
$650 – under $750             9       9/50 = .18
$750 – under $850             9       9/50 = .18
$850 – under $950            11      11/50 = .22
$950 – under $1,050           7       7/50 = .14
$1,050 – under $1,150         6       6/50 = .12
$1,150 – under $1,250         1       1/50 = .02

35
Example – Polygon

36
Example – Cumulative Frequency Distribution
Average daily cost         Number    Cum. Freq.
$450 – under $550             4          4
$550 – under $650             3          7
$650 – under $750             9         16
$750 – under $850             9         25
$850 – under $950            11         36
$950 – under $1,050           7         43
$1,050 – under $1,150         6         49
$1,150 – under $1,250         1         50

37
Example – Cumulative Relative Frequency Distribution
Average daily cost         Cum. Freq.    Cum. Rel. Freq.
$450 – under $550              4          4/50 = .08
$550 – under $650              7          7/50 = .14
$650 – under $750             16         16/50 = .32
$750 – under $850             25         25/50 = .50
$850 – under $950             36         36/50 = .72
$950 – under $1,050           43         43/50 = .86
$1,050 – under $1,150         49         49/50 = .98
$1,150 – under $1,250         50         50/50 = 1.00
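The relative and cumulative columns follow mechanically from the class counts. A minimal Python sketch, starting from the counts in the frequency distribution (variable names are illustrative):

```python
from itertools import accumulate

counts = [4, 3, 9, 9, 11, 7, 6, 1]     # class frequencies from the example
total = sum(counts)                     # 50 observations

rel_freq = [n / total for n in counts]              # relative frequencies
cum_freq = list(accumulate(counts))                 # running totals
cum_rel_freq = [c / total for c in cum_freq]        # cumulative proportions

# The alternative route: accumulate the relative frequencies directly.
cum_rel_alt = list(accumulate(rel_freq))

for n, cf, crf in zip(counts, cum_freq, cum_rel_freq):
    print(f"{n:3d} {cf:3d} {crf:.2f}")
```

Both routes give the same cumulative proportions up to floating-point rounding, and the last cumulative value is always 1.00.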

38
Example – Percentage Ogive

40
Key Terms
Measures of Central Tendency, The Center
  Mean: \( \mu \), population; \( \bar{x} \), sample
  Weighted Mean
  Median
  Mode

41
Key Terms
Measures of Dispersion, The Spread
  Range
  Mean absolute deviation
  Variance
  Standard deviation
  Interquartile range
  Interquartile deviation
  Coefficient of variation

42
Key Terms
Measures of Relative Position
  Quantiles
    Quartiles
    Deciles
    Percentiles
  Residuals
  Standardized values

43
The Mean
Mean: arithmetic average = (sum of all values) / (number of values)
  Population: \( \mu = \frac{\sum x_i}{N} \)
  Sample: \( \bar{x} = \frac{\sum x_i}{n} \)
Problem: Calculate the average number of truck shipments from the United States to five Canadian cities for the following data, given in thousands of bags: Montreal, 64.0; Ottawa, 15.0; Toronto, 285.0; Vancouver, 228.0; Winnipeg, 45.0
(Ans: 127.4)
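The stated answer can be checked with a few lines of Python. This is only a sketch of the arithmetic; the dictionary name `shipments` is an illustrative choice.

```python
# Shipments in thousands of bags, from the problem statement.
shipments = {
    "Montreal": 64.0,
    "Ottawa": 15.0,
    "Toronto": 285.0,
    "Vancouver": 228.0,
    "Winnipeg": 45.0,
}

mean = sum(shipments.values()) / len(shipments)   # (sum of all values) / (number of values)
print(round(mean, 1))                              # 127.4
```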

44
The Weighted Mean
When what you have is grouped data, compute the mean using
\( \mu = \frac{\sum w_i x_i}{\sum w_i} \)
Problem: Calculate the average profit from truck shipments, United States to Canada, for the following data given in thousands of bags and profits per thousand bags:
City         Thousands of bags    Profit per thousand bags
Montreal            64.0                 $15.00
Ottawa              15.0                 $13.50
Toronto            285.0                 $15.50
Vancouver          228.0                 $12.00
Winnipeg            45.0                 $14.00
(Ans: $14.04 per thousand bags)
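A short Python sketch of the weighted-mean calculation, using the shipment volumes as weights. Toronto's volume (285.0) is taken from the preceding slide's data, which is an assumption about this table; the list names are illustrative.

```python
# Weights: shipments in thousands of bags (Toronto's 285.0 comes from the previous problem).
bags = [64.0, 15.0, 285.0, 228.0, 45.0]
# Values being averaged: profit in dollars per thousand bags.
profit = [15.00, 13.50, 15.50, 12.00, 14.00]

weighted_mean = sum(w * x for w, x in zip(bags, profit)) / sum(bags)
simple_mean = sum(profit) / len(profit)

print(round(weighted_mean, 2))   # 14.04
print(round(simple_mean, 2))     # 14.0, ignores how much was shipped to each city
```

The two results differ because the weighted mean lets high-volume cities such as Toronto and Vancouver count for more.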

45
The Median
To find the median:
1. Put the data in an array.
2A. If the data set has an ODD number of values, the median is the middle value.
2B. If the data set has an EVEN number of values, the median is the AVERAGE of the middle two values. (Note that the median of an even set of data values is not necessarily a member of the set of values.)
The median is particularly useful if there are outliers in the data set, which otherwise tend to sway the value of an arithmetic mean.
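Both cases, and the effect of an outlier, can be illustrated with Python's standard `statistics` module. The shipment values come from the earlier mean problem; the extra value 2000.0 is a made-up outlier added only for illustration.

```python
import statistics

odd = [64.0, 15.0, 285.0, 228.0, 45.0]   # five values: the median is the middle one
even = odd + [2000.0]                     # six values, including a hypothetical outlier

print(statistics.median(odd))    # middle value of the sorted data
print(statistics.median(even))   # average of the two middle values
print(statistics.mean(even))     # the mean is pulled much further by the outlier
```

Here the outlier moves the median from 64.0 to 146.0, while the mean jumps from 127.4 to 439.5.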

46
The Mode
The mode is the most frequent value.
While there is just one value for the mean and one value for the median, there may be more than one value for the mode of a data set.
The mode tends to be less frequently used than the mean or the median.
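A data set with more than one mode can be shown with `statistics.multimode` (available in Python 3.8 and later). The values below are made up purely for illustration.

```python
import statistics

values = [2, 3, 3, 5, 7, 7, 9]           # hypothetical data: 3 and 7 each occur twice

modes = statistics.multimode(values)      # every value tied for the highest frequency
print(modes)                              # [3, 7]
```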

47
Comparing Measures of Central Tendency
If mean = median = mode, the shape of the distribution is symmetric.
If mode < median < mean (that is, mean > median > mode), the shape of the distribution trails to the right and is positively skewed.
If mean < median < mode (that is, mode > median > mean), the shape of the distribution trails to the left and is negatively skewed.

48
The Range
The range is the distance between the smallest and the largest data value in the set.
Range = largest value – smallest value
Sometimes range is reported as an interval, anchored between the smallest and largest data value, rather than the actual width of that interval.

49
Residuals
Residuals are the differences between each data value in the set and the group mean:
  for a population, \( x_i - \mu \)
  for a sample, \( x_i - \bar{x} \)

50
The MAD
The mean absolute deviation is found by summing the absolute values of all residuals and dividing by the number of values in the set:
  for a population, \( \mathrm{MAD} = \frac{\sum |x_i - \mu|}{N} \)
  for a sample, \( \mathrm{MAD} = \frac{\sum |x_i - \bar{x}|}{n} \)
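As a worked illustration, here is a small Python sketch that computes the residuals and the MAD for the five shipment values from the earlier mean problem (the choice of data set and the variable names are assumptions made for the example).

```python
values = [64.0, 15.0, 285.0, 228.0, 45.0]       # thousands of bags

mean = sum(values) / len(values)                 # 127.4
residuals = [x - mean for x in values]           # signed distances from the mean
mad = sum(abs(r) for r in residuals) / len(values)

print(round(sum(residuals), 6))   # residuals cancel out, so absolute values are needed
print(round(mad, 2))              # 103.28
```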

51
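The Variance
Variance is one of the most frequently used measures of spread,
  for a population, \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N} \)
  for a sample, \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} \)
The right side of each equation is often used as a computational shortcut.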

52
The Standard Deviation
Since variance is given in squared units, we often find uses for the standard deviation, which is the square root of variance:
  for a population, \( \sigma = \sqrt{\sigma^2} \)
  for a sample, \( s = \sqrt{s^2} \)
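Variance and standard deviation can be sketched with Python's `statistics` module. The sketch assumes the usual conventions, dividing by N for a population and by n - 1 for a sample, which is how `pvariance` and `variance` are defined; the data are the five shipment values from the earlier mean problem.

```python
import math
import statistics

values = [64.0, 15.0, 285.0, 228.0, 45.0]

pop_var = statistics.pvariance(values)    # sum of squared residuals / N
samp_var = statistics.variance(values)    # sum of squared residuals / (n - 1)
pop_sd = statistics.pstdev(values)        # square root of the population variance
samp_sd = statistics.stdev(values)        # square root of the sample variance

# Computational shortcut for the population variance: (sum of x^2 - N * mean^2) / N
n = len(values)
mean = sum(values) / n
shortcut = (sum(x * x for x in values) - n * mean ** 2) / n

print(round(pop_var, 2), round(samp_var, 2))
print(round(pop_sd, 2), round(samp_sd, 2))
print(round(shortcut, 2))                 # matches the population variance
```

The standard deviations are in the original units (thousands of bags), which is why they are often easier to interpret than the variances.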

53
Quartiles
One of the most frequently used quantiles is the quartile. Quartiles divide the values of a data set into four subsets of equal size, each comprising 25% of the observations.
To find the first, second, and third quartiles:
1. Arrange the N data values into an array.
2. First quartile, \( Q_1 \) = data value at position (N + 1)/4
3. Second quartile, \( Q_2 \) = data value at position 2(N + 1)/4
4. Third quartile, \( Q_3 \) = data value at position 3(N + 1)/4
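The position rule can be sketched in Python as follows. The slide does not say what to do when a position is fractional; this sketch assumes linear interpolation between the two neighbouring values, and the function name `quartile` is hypothetical.

```python
def quartile(values, q):
    """Return the q-th quartile (q = 1, 2, 3) using position q * (N + 1) / 4.

    Fractional positions are resolved by linear interpolation, which is an
    assumption of this sketch rather than something stated in the slides.
    """
    data = sorted(values)                 # step 1: arrange the values into an array
    pos = q * (len(data) + 1) / 4         # 1-based position in the array
    lo = int(pos)                         # whole part of the position
    frac = pos - lo                       # fractional part of the position
    if lo < 1:
        return data[0]                    # position falls before the first value
    if lo >= len(data):
        return data[-1]                   # position falls at or beyond the last value
    return data[lo - 1] + frac * (data[lo] - data[lo - 1])


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]
print([quartile(shipments, q) for q in (1, 2, 3)])   # [30.0, 64.0, 256.5]
```

With N = 5 the positions are 1.5, 3 and 4.5, so the second quartile is the median (64.0) and the other two fall halfway between neighbouring values.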

54
Quartiles

55
Standardized Values
A standardized value tells how far above or below the population mean an individual value lies, in units of standard deviation.
  How far above or below: (data value – mean), which is the residual
  In units of standard deviation: divided by \( \sigma \)
Standardized individual value: \( z = \frac{x - \mu}{\sigma} \)
A negative z means the data value falls below the mean.
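A minimal Python sketch of standardizing values, treating the five shipment figures from the earlier mean problem as a complete population (an assumption made so that the population mean and standard deviation apply):

```python
import statistics

shipments = {
    "Montreal": 64.0,
    "Ottawa": 15.0,
    "Toronto": 285.0,
    "Vancouver": 228.0,
    "Winnipeg": 45.0,
}

xs = list(shipments.values())
mu = statistics.mean(xs)          # population mean
sigma = statistics.pstdev(xs)     # population standard deviation

# z = residual divided by the standard deviation
z = {city: (x - mu) / sigma for city, x in shipments.items()}

for city, score in z.items():
    print(f"{city:10s} {score:+.2f}")
```

Cities below the mean of 127.4 (Montreal, Ottawa, Winnipeg) get negative z values, and the z values sum to zero.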


© 2017 SlidePlayer.com Inc.

All rights reserved.
