
AN OVERVIEW OF STATISTICS. WHAT IS STATISTICS? What does a statistician do? Player Games Minutes Points Rebounds FG% Bob 34 32.724 7.6.552 Andy 36 31.521.


1 AN OVERVIEW OF STATISTICS

2 WHAT IS STATISTICS? What does a statistician do?
Player   Games  Minutes  Points  Rebounds  FG%
Bob      34     32.7     24      7.6       .552
Andy     36     31.5     21      8.4       .465
Larry    30     33.0     18      5.6       .493
Michael  31     35.1     29      6.1       .422

3 JOB OF A STATISTICIAN Collects numbers or data. Systematically organizes or arranges the data. Analyzes the data: extracts relevant information to provide a complete numerical description. Infers general conclusions about the problem using this numerical description.

4 POLITICS Forecasting and predicting winners of elections. Where to concentrate campaign appearances, advertising and $$… If the election for president of the United States were held today, who would you be more likely to vote for? Rudy Giuliani 45%, Hillary Clinton 43%, Someone else 2%, Wouldn’t vote 4%, Unsure 6%.

5 INDUSTRY To market a product… Interested in the average length of life of a light bulb. Cannot test all the bulbs.

6 USES OF STATISTICS Statistics is a theoretical discipline in its own right. Statistics is a tool for researchers in other fields. Used to draw general conclusions in a large variety of applications.

7 COMMON PROBLEM Decision or prediction about a large body of measurements which cannot be totally enumerated. Examples: Light bulbs (enumerating the population is destructive). Forecasting the winner of an election (population too big; people change their minds). Solution: Collect a smaller set of measurements that will (hopefully) be representative of the larger set.

8 DATA AND STATISTICS Data consists of information coming from observations, counts, measurements, or responses. Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions. A population is the collection of all outcomes, responses, measurements, or counts that are of interest. A sample is a subset of a population.

9 Introduction to Probability and Statistics Thirteenth Edition Chapter 1 Describing Data with Graphs

10 Introduction to Statistical Terms
• Variable: something that can assume some type of value.
• Data: information coming from observations, counts, measurements, or responses.
• Data set: a collection of data values.
• Observation: the value, at a particular period, of a particular variable.
• An experimental unit is the individual or object on which a variable is measured.
• A measurement results when a variable is actually measured on an experimental unit.
• A set of measurements, called data, can be either a sample or a population.

11 Example Variable – Time until a light bulb burns out Experimental unit – Light bulb Typical Measurements – 1500 hours, 1535.5 hours, etc.

12 Populations and Samples A population is the set of all items or individuals of interest. Examples: all likely voters in the next election; all parts produced today; all sales receipts for November. A sample is a subset of the population. Examples: 1000 voters selected at random for interview; a few parts selected for destructive testing; every 100th receipt selected for audit.

13 [Diagram: population and sample, linked by sampling techniques and by inference through statistical procedures; parameters describe the population, statistics describe the sample.]

14 Parameters & Statistics A parameter is a numerical description of a population characteristic. A statistic is a numerical description of a sample characteristic. (Parameter goes with population; statistic goes with sample.)
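A minimal sketch of the parameter/statistic distinction. The population here is hypothetical: 10,000 simulated light bulb lifetimes, with a mean of 1500 hours and a spread of 100 hours chosen only for illustration (none of these numbers come from the slides). The population mean plays the role of the parameter, and the mean of a random sample of 50 bulbs plays the role of the statistic.

```python
import random
import statistics

# Hypothetical population: lifetimes (hours) of 10,000 light bulbs.
# The values are simulated for illustration, not taken from the slides.
rng = random.Random(42)
population = [rng.gauss(1500, 100) for _ in range(10_000)]

# Parameter: a numerical description of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same description computed from a sample (a subset).
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The two printed values are close but not equal: the statistic estimates the parameter without testing every bulb.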

15 Univariate data: One variable is measured on a single experimental unit. Bivariate data: Two variables are measured on a single experimental unit. Multivariate data: More than two variables are measured on a single experimental unit.

16
• Nominal: categories that are mutually exclusive (non-overlapping), with no order or ranking. For example: gender (male or female), religion.
• Ordinal: categories can be ordered, but the differences between ranks are not precise. For example: health quality (excellent, good, adequate, bad, terrible).
• Interval: involves measurements with precise differences, but there is no meaningful zero. For example: temperature in degrees Celsius or Fahrenheit.
• Ratio: involves measurements that can be ranked, with precise differences between the ranks and a meaningful zero. For example: height, time, or weight.

17 Types of Variables [Diagram: variables are either qualitative or quantitative; quantitative variables are either discrete or continuous.]

18 Qualitative variables measure a quality or characteristic on each experimental unit. Examples: hair color (black, brown, blonde…), make of car (Dodge, Honda, Ford…), gender (male, female), state of birth (California, Arizona, …). Quantitative variables measure a numerical quantity on each experimental unit. A quantitative variable is discrete if it can assume only a finite or countable number of values, and continuous if it can assume the infinitely many values corresponding to the points on a line interval.

19 Examples For each orange tree in a grove, the number of oranges is measured: quantitative discrete. For a particular day, the number of cars entering a college campus is measured: quantitative discrete. Time until a light bulb burns out: quantitative continuous.

20 Statistical Methods: Descriptive Statistics and Inferential Statistics. Descriptive statistics utilizes numerical and graphical methods to look for patterns in the data set. The data can represent either the entire population or a sample.

21 Descriptive Statistics
Graphical, qualitative data: bar chart, pie chart.
Graphical, quantitative data: bar/pie chart, line plot (time series), dotplot, stem-and-leaf plot, histogram, ogive, boxplot.
Numerical, qualitative data: tables, frequency, percentage, cumulative percentage, cross tabulation.
Numerical, quantitative data: central tendency, dispersion (variability).
Note: Some graphs require a tabular representation (frequency distribution).

22 Graphing Qualitative Variables Use a data distribution to describe: –What values of the variable have been measured –How often each value has occurred. “How often” can be measured 3 ways: –Frequency –Relative frequency = Frequency/n –Percent = 100 x Relative frequency. Graphs: Bar Chart, Pie Chart.

23 Example A bag of M&Ms contains 25 candies. Raw data: [picture of the 25 candies]. Statistical table:
Color   Frequency  Relative Frequency  Percent
Red     3          3/25 = .12          12%
Blue    6          6/25 = .24          24%
Green   4          4/25 = .16          16%
Orange  5          5/25 = .20          20%
Brown   3          3/25 = .12          12%
Yellow  4          4/25 = .16          16%
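The three measures of "how often" can be sketched in a few lines. This is an illustration only: the list of candies is rebuilt from the frequencies in the table (the original raw data was a picture), and the variable names are ours.

```python
from collections import Counter

# Colors of the 25 candies, rebuilt from the frequencies in the table.
counts = {"Red": 3, "Blue": 6, "Green": 4, "Orange": 5, "Brown": 3, "Yellow": 4}
candies = [color for color, k in counts.items() for _ in range(k)]

n = len(candies)
frequency = Counter(candies)  # tally: how many times each color occurs

table = {}
for color, freq in frequency.items():
    rel = freq / n                      # relative frequency = frequency / n
    table[color] = (freq, rel, 100 * rel)  # percent = 100 x relative frequency
    print(f"{color:<7}{freq:>3}{rel:>6.2f}{100 * rel:>5.0f}%")
```

The relative frequencies always sum to 1 and the percents to 100, which is a quick check on any hand-built table.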

24 Graphs Bar Chart Pie Chart

25 Graphing Quantitative Variables Bar/Pie Chart Line Plot (Time Series) Dotplot Stem-and-Leaf Plot Histogram Ogive Boxplot

26 Graphing Quantitative Variables (1) A single quantitative variable measured for different population segments or for different categories of classification can be graphed using a bar or pie chart. Example: A Big Mac hamburger costs $4.90 in Switzerland, $2.90 in the U.S. and $1.86 in South Africa.

27 Graphing Quantitative Variables (2) A single quantitative variable measured over time is called a time series. It can be graphed using a line or bar chart. CPI: All Urban Consumers, Seasonally Adjusted
Month  Sept    Oct     Nov     Dec     Jan     Feb     Mar
CPI    178.10  177.60  177.50  177.30  177.60  178.00  178.60

28 Graphing Quantitative Variables (3): Dotplot The simplest graph for quantitative data. Plots the measurements as points on a horizontal axis, stacking the points that duplicate existing points. Example: The set 4, 5, 5, 7, 6 [dotplot on an axis from 4 to 7, with two stacked dots at 5].
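A rough text-only sketch of a dotplot for the example set. The helper `text_dotplot` is hypothetical (not from the slides) and assumes single-digit integer values so that the columns line up with the axis labels.

```python
from collections import Counter

def text_dotplot(values):
    """Return a text dotplot: one column per integer value, dots stacked upward."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    rows = []
    # Build rows from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in axis))
    return "\n".join(rows)

print(text_dotplot([4, 5, 5, 7, 6]))
```

The duplicate 5 appears as a second dot stacked above the first, which is the only place the plot is more than one dot high.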

29 Stem and Leaf Plots (4) A simple graph for quantitative data. Uses the actual numerical values of each data point.
–Divide each measurement into two parts: the stem and the leaf.
–List the stems in a column, with a vertical line to their right.
–For each measurement, record the leaf portion in the same row as its matching stem.
–Order the leaves from lowest to highest in each stem.
–Provide a key to your coding.

30 Example: Stem-and-Leaf Plot The prices ($) of 18 brands of walking shoes:
90 70 70 70 75 70 65 68 60
74 70 95 75 70 68 65 40 65
4 | 0
5 |
6 | 0 5 5 5 8 8
7 | 0 0 0 0 0 0 4 5 5
8 |
9 | 0 5
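The stem-and-leaf steps can be sketched for the shoe prices as follows. This is one possible implementation under our own assumptions: the stem is the tens digit, the leaf is the ones digit, and the function name `stem_and_leaf` is ours.

```python
from collections import defaultdict

prices = [90, 70, 70, 70, 75, 70, 65, 68, 60,
          74, 70, 95, 75, 70, 68, 65, 40, 65]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (ones digits), lowest to highest."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting first keeps each row ordered
        stem, leaf = divmod(v, 10)    # split the measurement into stem and leaf
        leaves[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

plot = stem_and_leaf(prices)
for stem, row in plot.items():
    print(f"{stem} | {' '.join(map(str, row))}")
print("Key: 4 | 0 = $40")
```

Keeping the empty stems 5 and 8 in the output shows the gaps between the $40 shoe, the main cluster, and the two $90-range shoes.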

31 Relative Frequency Histograms (5) A relative frequency histogram for a quantitative data set is a bar graph in which the height of the bar shows “how often” (measured as a proportion or relative frequency) measurements fall in a particular class or subinterval.
–Divide the range of the data into 5-12 subintervals of equal length.
–Calculate the approximate width of the subinterval as Range/number of subintervals.
–Round the approximate width up to a convenient value.
–Use the method of left inclusion, including the left endpoint, but not the right, in your tally.
–Create a statistical table including the subintervals, their frequencies and relative frequencies.

32 Relative Frequency Histograms (5): cont’d Draw the relative frequency histogram, plotting the subintervals on the horizontal axis and the relative frequencies on the vertical axis. The height of the bar represents: –The proportion of measurements falling in that class or subinterval. –The probability that a single measurement, drawn at random from the set, will belong to that class or subinterval.

33 Example 1 The ages of 50 tenured faculty at a state university.
34 48 70 63 52 52 35 50 37 43
53 43 52 44 42 31 36 48 43 26
58 62 49 34 48 53 39 45 34 59
34 66 40 59 36 41 35 36 62 34
38 28 43 50 30 43 32 44 58 53
We choose to use 6 intervals. Range = 70 – 26 = 44. Minimum class width = (70 – 26)/6 = 7.33. Convenient class width = 8. Use 6 classes of length 8, starting at 25.

34
Age         Frequency  Relative Frequency  Percent
25 to < 33  5          5/50 = .10          10%
33 to < 41  14         14/50 = .28         28%
41 to < 49  13         13/50 = .26         26%
49 to < 57  9          9/50 = .18          18%
57 to < 65  7          7/50 = .14          14%
65 to < 73  2          2/50 = .04          4%
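A short sketch of how the class width and the left-inclusion tally for Example 1 might be computed. The starting point 25 and the choice of 6 classes are taken from the slides; using `math.ceil` for "round up to a convenient value" is our simplification and happens to give 8 here.

```python
import math

ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43,
        53, 43, 52, 44, 42, 31, 36, 48, 43, 26,
        58, 62, 49, 34, 48, 53, 39, 45, 34, 59,
        34, 66, 40, 59, 36, 41, 35, 36, 62, 34,
        38, 28, 43, 50, 30, 43, 32, 44, 58, 53]

num_classes = 6
start = 25  # first class boundary, as chosen on the slide

approx_width = (max(ages) - min(ages)) / num_classes  # 44 / 6, about 7.33
width = math.ceil(approx_width)                       # round up to 8

# Left inclusion: value v falls in class k when
# start + k*width <= v < start + (k + 1)*width.
freq = [0] * num_classes
for v in ages:
    freq[(v - start) // width] += 1

for k, f in enumerate(freq):
    lo = start + k * width
    print(f"{lo} to < {lo + width}: {f:>2}  {f / len(ages):.2f}")
```

Integer division by the class width implements left inclusion directly: an age of exactly 33 lands in the second class, not the first.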

35
Class     Class Boundaries  Midpoint  Frequency  Relative Frequency  Percent
25 to 33  24.5 – 33.5       29        5          5/50 = .10          10%
34 to 42  33.5 – 42.5       38        16         16/50 = .32         32%
43 to 51  42.5 – 51.5       47        14         14/50 = .28         28%
52 to 60  51.5 – 60.5       56        10         10/50 = .20         20%
61 to 69  60.5 – 69.5       65        4          4/50 = .08          8%
70 to 78  69.5 – 78.5       74        1          1/50 = .02          2%

36 Describing the Distribution Shape? Skewed right. Outliers? No. What proportion of the tenured faculty are younger than 42.5? (16 + 5)/50 = 21/50 = .42 = 42%. What is the probability that a randomly selected faculty member is 52 or older? (10 + 4 + 1)/50 = 15/50 = .30 = 30%.
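Counting directly from the raw ages of Example 1 gives a check on these two questions. This is a small sketch with our own variable names; it uses the 50 ages exactly as listed in the example.

```python
ages = [34, 48, 70, 63, 52, 52, 35, 50, 37, 43,
        53, 43, 52, 44, 42, 31, 36, 48, 43, 26,
        58, 62, 49, 34, 48, 53, 39, 45, 34, 59,
        34, 66, 40, 59, 36, 41, 35, 36, 62, 34,
        38, 28, 43, 50, 30, 43, 32, 44, 58, 53]

n = len(ages)

# Proportion younger than 42.5 (the upper boundary of the second class).
younger = sum(1 for a in ages if a < 42.5)

# Probability that a randomly selected faculty member is 52 or older.
older = sum(1 for a in ages if a >= 52)

print(f"younger than 42.5: {younger}/{n} = {younger / n:.2f}")  # 21/50 = 0.42
print(f"52 or older:       {older}/{n} = {older / n:.2f}")      # 15/50 = 0.30
```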

37 How Many Class Intervals? Many (narrow class intervals): may yield a very jagged distribution with gaps from empty classes; can give a poor indication of how frequency varies across classes. Few (wide class intervals): may compress variation too much and yield a blocky distribution; can obscure important patterns of variation. (X axis labels are upper class endpoints.)

38 Example 2 A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature: 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

39 Example 2: Solution (Frequency Distribution) Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58. Find range: 58 - 12 = 46. Select number of classes: 5 (usually between 5 and 12). Compute class interval (width): 10 (46/5 = 9.2, then round up). Determine class boundaries (limits): 10, 20, 30, 40, 50, 60. Compute class midpoints: 15, 25, 35, 45, 55. Count observations & assign to classes.

40 Example 2: Solution (Frequency Distribution) (continued) Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class        Frequency  Relative Frequency  Percentage
10 ≤ X < 20  3          .15                 15
20 ≤ X < 30  6          .30                 30
30 ≤ X < 40  5          .25                 25
40 ≤ X < 50  4          .20                 20
50 ≤ X < 60  2          .10                 10
Total        20         1.00                100

41 Histogram: Example 2 (No gaps between bars; the horizontal axis shows class midpoints)
Class        Class Midpoint  Frequency
10 ≤ X < 20  15              3
20 ≤ X < 30  25              6
30 ≤ X < 40  35              5
40 ≤ X < 50  45              4
50 ≤ X < 60  55              2

42 Ogive (6) An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes. Two types of ogive: (i) less-than ogive, (ii) greater-than ogive. First, build a table of cumulative frequency.

43 Cumulative Frequency Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class        Frequency  Percentage  Cumulative Frequency  Cumulative Percentage
10 ≤ X < 20  3          15          3                     15
20 ≤ X < 30  6          30          9                     45
30 ≤ X < 40  5          25          14                    70
40 ≤ X < 50  4          20          18                    90
50 ≤ X < 60  2          10          20                    100
Total        20         100

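44 Graphing Cumulative Frequencies: The Ogive (the horizontal axis shows class boundaries, not midpoints)
Class        Lower class boundary  Cumulative Percentage
<10          0                     0
10 ≤ X < 20  10                    15
20 ≤ X < 30  20                    45
30 ≤ X < 40  30                    70
40 ≤ X < 50  40                    90
50 ≤ X < 60  50                    100
Each cumulative percentage is reached at the upper boundary of its class: 0% at 10, 15% at 20, 45% at 30, 70% at 40, 90% at 50, and 100% at 60.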
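A minimal sketch of how the cumulative columns and the points of a less-than ogive could be computed for Example 2. The class boundaries and frequencies are those of the frequency distribution; the variable names are ours.

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]   # class boundaries for Example 2
frequencies = [3, 6, 5, 4, 2]           # one frequency per class
n = sum(frequencies)

# Running totals give the cumulative frequency of each class.
cum_freq = list(accumulate(frequencies))
cum_pct = [100 * c / n for c in cum_freq]

# Less-than ogive: 0% at the first boundary, then each cumulative
# percentage plotted above the upper boundary of its class.
ogive_points = list(zip(boundaries, [0.0] + cum_pct))

for x, pct in ogive_points:
    print(f"less than {x}: {pct:.0f}%")
```

Joining `ogive_points` with straight lines gives the ogive; the last point is always at 100%.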

45 Interpreting Graphs: Location and Spread Where is the data centered on the horizontal axis, and how does it spread out from the center?

46 Interpreting Graphs: Shapes Mound shaped and symmetric (mirror images). Skewed right: a few unusually large measurements. Skewed left: a few unusually small measurements. Bimodal: two local peaks.

47 Interpreting Graphs: Outliers Are there any strange or unusual measurements that stand out in the data set? [Two example graphs: one with an outlier, one with no outliers.]

48 Example A quality control process measures the diameter of a gear being made by a machine (cm). The technician records 15 diameters, but inadvertently makes a typing mistake on the second entry.
1.991 1.891 1.991 1.988 1.993 1.989 1.990 1.988
1.988 1.993 1.991 1.989 1.989 1.993 1.990 1.994

