NA387(3) Lecture 2: Populations, Samples, and Frequency Analysis (Devore, Ch. 1.1-1.2)

1 NA387(3) Lecture 2: Populations, Samples, and Frequency Analysis (Devore, Ch. 1.1-1.2)

2 Topics I. Types of Data and Variables II. Samples Vs. Populations III. Branches of Statistics IV. Descriptive Statistics: Frequency Analysis Tools – Frequency Tables – Histograms – Dot Plots – Stem and Leaf Plots

3 I. Data Characteristics Categorical Data (Qualitative) –Nominal Data (e.g., different colors, types of defect) –Binary Data (e.g., defective / not defective) –Note: often convert categorical data to numbers for analysis. Numerical Data (Quantitative) –Numbers (e.g., diameter is 25 mm, the service time is 15.2 minutes). Variable – any characteristic whose value may change from one object to another in a population. Data Sets –Univariate – observations on a single variable –Bivariate – observations made on each of 2 variables –Multivariate – observations on more than 2 variables

4 Types of Variables (Devore) A variable is discrete if its set of possible values constitutes a finite set or can be listed in an infinite sequence. A variable is continuous if its set of possible values consists of an entire interval on a number line.

5 Discrete Vs. Continuous Variables Discrete variables - vary by whole units (e.g., # of students in class, sum of rolling 2 dice). Continuous variables - vary to any degree, limited only by precision of measurement system (e.g., height of students in a class, length of an object, miles per gallon). Precision of Measurement System Concept: –Continuous variables can always be broken down further with greater measurement precision. For example, length could be noted: 10 mm, 10.0 mm, 10.01 mm, 10.008 mm …

6 II. Samples Vs. Population When describing a variable, we often collect a sample of data from a population. –Population - All items in a set (obtain via census). Describe populations using parameters such as the population mean (μ). –Sample - Subset of Population. Estimate parameters using statistics such as the sample mean (x̄). Example: suppose we produce ball bearings. We might measure a sample from among all the bearings produced (population) to assess if we are meeting our engineering specifications.

7 Population Example (all possible outputs are known) What is the population for all possible combinations of the sum of rolling two dice?
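
The dice question above can be answered by listing every outcome. The following is a minimal sketch, assuming two fair six-sided dice; the variable names are illustrative and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each sum (2 through 12).
sum_counts = Counter(a + b for a, b in outcomes)

# Population relative frequency (probability) of each sum.
sum_probs = {s: Fraction(c, len(outcomes)) for s, c in sorted(sum_counts.items())}

for s, p in sum_probs.items():
    print(f"sum {s:2d}: {sum_counts[s]} of 36 outcomes ({float(p):.3f})")
```

Because every outcome can be listed, this population is fully known: a sum of 7 is the most common (6 of 36 outcomes), while 2 and 12 each occur in only 1 of 36.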

8 Understanding Samples If you roll two dice 10 times (10 samples), you will generally observe different combinations each time. Key Sampling Concepts: –You don’t need to measure every observation to understand a population. –Knowledge of a population increases with the number and size of samples but eventually the value of this information may converge. The notion that we may understand populations by only measuring samples drives the field of statistics. www.seeingstatistics.com >> Chapter 6.2 – probability of rolling two dice
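
The sampling idea above can be tried in a short simulation. This is only a sketch: the helper name `roll_two_dice`, the seed, and the sample sizes are assumptions chosen for illustration.

```python
import random
from collections import Counter

def roll_two_dice(n_rolls, rng):
    """Return the sums observed in n_rolls rolls of two fair dice."""
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]

rng = random.Random(387)  # fixed seed (arbitrary) so the run is repeatable

for n in (10, 100, 10_000):
    sums = roll_two_dice(n, rng)
    share_of_7 = Counter(sums)[7] / n
    print(f"{n:6d} rolls: share of sevens = {share_of_7:.3f} (population value 6/36 = 0.167)")
```

With only 10 rolls the observed share of sevens can be far from the population value; with many rolls it tends to settle near 6/36, which is the convergence the slide describes.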

9 Population Example (Not all possible combinations are known) Of course, for most data sets, the possible combinations are not known. Suppose we have the width measurements for all 10,000 items produced in a furniture factory. Average (μ): 1220 mm

10 Samples from Continuous Populations Suppose you take a sample of 3 from this population –Width Measurements: 1219.1, 1220.1, 1220.5

11 Samples from Continuous Populations If you take another sample of 3 from this population, you likely will get a different set of values. As samples become larger, they likely will converge or form a pattern (if the population does NOT change.) – “Underlying Distribution” (Values shown: 1220.25, 1219.5, 1218.5)
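
A rough sketch of this behavior follows. The factory data are not available in the transcript, so the population here is simulated: the normal shape, the 0.8 mm standard deviation, and the seed are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(2)  # arbitrary fixed seed so the run is repeatable

# Hypothetical stand-in for the factory data: 10,000 widths in mm.
# The normal shape and the 0.8 mm standard deviation are assumptions.
population = [rng.gauss(1220.0, 0.8) for _ in range(10_000)]
mu = statistics.fmean(population)  # population mean (a parameter)

for n in (3, 30, 300, 3000):
    sample = rng.sample(population, n)  # draw n items without replacement
    x_bar = statistics.fmean(sample)    # sample mean (a statistic)
    print(f"n = {n:4d}: sample mean = {x_bar:.2f} mm, population mean = {mu:.2f} mm")
```

Small samples give sample means that move around from draw to draw; larger samples tend to land closer to the population mean, provided the population itself does not change.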

12 III. Branches of Statistics Descriptive Statistics – to summarize and describe data (chapter 1 in textbook) –Graphical methods: histograms, boxplots, dotplots, etc. –Numerical measures: means, medians, standard deviations, correlations Inferential Statistics – based on a sample from a population, make an inference (some conclusion or educated guess) about the population (chapters 6-16 in Devore) –Point estimation, confidence intervals, hypothesis testing, ANOVA, linear regression, SPC, reliability analysis, etc, etc… –Examples: Estimate the durability of a component based on a life test, determine if machine settings have a significant impact on a quality characteristic. –How about probability? Bridge between descriptive and inferential statistics (chapters 2-5)!

13 Relationship Between Probability and Inferential Statistics (Devore) [Diagram: Probability reasons from the Population to the Sample; Inferential Statistics reasons from the Sample back to the Population.]

14 Other Concepts: Concrete vs Conceptual Population, Enumerative vs Analytical Studies: Concrete population (well defined) versus conceptual/hypothetical population (might not yet exist). Examples: –Concrete: Newspapers published in 2002 –Conceptual: Students with a GPA of 4.0 in graduating class of 2005 Enumerative versus Analytical Studies –Enumerative - focused on a finite, unchanging collection of individuals/objects from a population (sampling frame) –Analytical – collection of individuals/objects can change. Focus is on improving a future product. For instance, study 5 parts produced in the same machine during the same time period, adjust settings based on the analysis to improve output.

15 IV. Descriptive Statistics - Frequency Analysis Frequency Analysis Tools –Frequency Table –Histograms –Dot Plot Understanding Data Patterns –Distribution Shapes –Outliers

16 Frequency Analysis Tools Frequency Analysis is used to analyze data patterns. It involves determining frequencies (# of occurrences) by classes (also called bins or frequency ranges). –Classes or Bins - values or ranges of values (continuous variables). E.g., Test Scores: 76, 82, 77, 73, 84, 93, 81, 98 Bin 70-79: ? values Bin 80-89: ? values Bin 90-99: ? values
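
The bin counts asked for above can be checked with a few lines of code. This is a minimal sketch that treats each bin as an inclusive integer range, as written on the slide; the variable names are illustrative.

```python
scores = [76, 82, 77, 73, 84, 93, 81, 98]
bins = [(70, 79), (80, 89), (90, 99)]  # inclusive integer ranges from the slide

# Count the scores that fall inside each bin.
frequencies = {
    f"{low}-{high}": sum(low <= s <= high for s in scores)
    for low, high in bins
}

for label, count in frequencies.items():
    print(f"Bin {label}: {count} values")
```

For these eight scores the counts come out to 3, 3, and 2, and they add up to the number of observations, which is a quick check that no score was missed or counted twice.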

17 Example: Miles per Gallon Data Suppose you conduct an experiment on gas mileage. spec: 21 +/- 2 mpg You collect 100 samples ~ measure number of miles per gallon based on full tanks.

18 Frequency Table To understand the distribution, we first create a frequency table. In general, Frequency (or BIN) Ranges are equal widths ~ 0.4 mpg Inclusion of end values of ranges is often software dependent. For Excel: –17.4 - 17.8 would be read as: –17.4 < X <= 17.8

19 Frequency Tables Frequency Tables also may include: –Relative Frequency – bin frequency / total observations –Cumulative Relative Frequency – running total (cumulative %) of the relative frequencies
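
As a sketch of how these two columns are computed, the hypothetical helper `frequency_table` below takes a list of bin counts. The example counts (3, 3, 2) are the bin frequencies of the eight test scores on slide 16.

```python
from itertools import accumulate

def frequency_table(counts):
    """Return (count, relative frequency, cumulative relative frequency) rows."""
    total = sum(counts)
    relative = [c / total for c in counts]   # bin frequency / total observations
    cumulative = list(accumulate(relative))  # running total of relative frequencies
    return list(zip(counts, relative, cumulative))

example_counts = [3, 3, 2]  # bins 70-79, 80-89, 90-99 from the test-score example
table = frequency_table(example_counts)

print("count  rel.freq  cum.rel.freq")
for count, rel, cum in table:
    print(f"{count:5d}  {rel:8.3f}  {cum:12.3f}")
```

The last cumulative relative frequency should always be 1 (100%), apart from floating-point rounding.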

20 Frequency Table Example

21 Treatment of Observations Falling Exactly on Frequency Range Limits In Excel, a range of 17.4 – 17.8 implies all values greater than 17.4 and less than or equal to 17.8. In other software (e.g., Minitab), a range of 17.4 – 17.8 would mean all values greater than or equal to 17.4 and less than 17.8. Neither is wrong; it is merely a matter of convention (agreement). To avoid confusion, try to use more discriminatory (precise) limits: –Example: 17.400 – 17.799, 17.800 – 18.199 etc.
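
The two conventions can be written out explicitly. The sketch below does not call Excel or Minitab; the function names are hypothetical, and it only mimics the two rules described on the slide using Python's standard `bisect` module.

```python
from bisect import bisect_left, bisect_right

edges = [17.4, 17.8, 18.2]  # two adjacent ranges: 17.4-17.8 and 17.8-18.2

def bin_right_closed(x, edges):
    """Index of the range with lower < x <= upper (the Excel rule described above).

    Returns -1 if x is at or below the first edge.
    """
    return bisect_left(edges, x) - 1

def bin_left_closed(x, edges):
    """Index of the range with lower <= x < upper (the Minitab rule described above).

    Returns len(edges) - 1 if x is at or above the last edge.
    """
    return bisect_right(edges, x) - 1

for x in (17.6, 17.8):
    print(x, "right-closed bin:", bin_right_closed(x, edges),
          "left-closed bin:", bin_left_closed(x, edges))
```

A value inside a range (17.6) lands in the same bin under both rules. A value exactly on a limit (17.8) lands in the first range under the right-closed rule and in the second range under the left-closed rule, which is why the two packages can report different frequencies for the same data.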

22 Frequency Ranges (Classes) Most statistical software (including Excel / Minitab) automatically creates frequency bin ranges given a data set. If creating your own ranges, some general rules: –# of ranges ~ –Between 5 – 20 ranges is usually sufficient

23 Histograms A histogram is a graphical representation of a frequency table. It is used to: –show distribution shape for one variable (conveys the location, dispersion, and symmetry). –identify outliers

24 Histograms: Discrete Data Determine the frequency and relative frequency for each value of x. Then mark possible x values on a horizontal scale. Above each value, draw a rectangle whose height is the relative frequency of that value.

25 Ex. (Devore) Students from a small college were asked how many charge cards they carry. x is the variable representing the number of cards. Results are in the frequency distribution below:
x    # people    Rel. Freq.
0    12          0.08
1    42          0.28
2    57          0.38
3    24          0.16
4    9           0.06
5    4           0.03
6    2           0.01

26 Histograms (Devore) Credit card results:
x    Rel. Freq.
0    0.08
1    0.28
2    0.38
3    0.16
4    0.06
5    0.03
6    0.01

27 Histograms, Continuous Data: Equal Class Widths Determine the frequency and relative frequency for each class. Then mark the class boundaries on a horizontal measurement axis. Above each class interval, draw a rectangle whose height is the relative frequency.

28 Histograms, Continuous Data: Unequal Widths After determining frequencies and relative frequencies, calculate the height of each rectangle using: Rectangle height = (class relative frequency) / (class width). The resulting heights are called densities and the vertical scale is the density scale.
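
The density calculation can be sketched as follows. The class boundaries and counts are made up for illustration, and `density_heights` is a hypothetical helper name.

```python
def density_heights(edges, counts):
    """Rectangle height = (class relative frequency) / (class width) for each class."""
    total = sum(counts)
    heights = []
    for low, high, count in zip(edges, edges[1:], counts):
        relative_frequency = count / total
        heights.append(relative_frequency / (high - low))
    return heights

# Hypothetical classes of unequal width, for illustration only.
edges = [0, 2, 4, 8, 16]
counts = [10, 20, 40, 30]
heights = density_heights(edges, counts)

# On the density scale, each rectangle's area equals its relative frequency,
# so the areas add up to 1.
areas = [h * (high - low) for h, low, high in zip(heights, edges, edges[1:])]
print("heights:", heights)
print("total area:", sum(areas))
```

Dividing by the class width keeps a wide class from looking more important than it is: the third class here holds twice as many observations as the second but is also twice as wide, so both rectangles have the same height.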

29 Histogram Example: MPG Data Typical Y-Axis: –frequency or relative frequency

30 Common Histogram Shapes: Histogram shapes may be used to help identify the underlying distribution types. Skewed Right Skewed Left Exponential Bi-Modal Normal (Bell Curve)

31 Histogram Shape Example Which shape is shown in this histogram? Is this shape likely explainable or unexplainable?

32 Dot Plots, Example Another tool (provided by most advanced statistical software) is the “Dot Plot”. Dot Plots are shown without grouping into ranges – typically are used with smaller data sets. Example: Dot Plot Using Minitab for our MPG Data.

33 Stem-and-Leaf Displays 1. Select one or more leading digits for the stem values. The trailing digits become the leaves. 2. List stem values in a vertical column. 3. Record the leaf for every observation. 4. Indicate the units for the stem and leaf on the display.

34 Stem-and-Leaf Example Observed values: 9, 10, 15, 22, 9, 15, 16, 24, 11
0 | 9 9
1 | 1 0 5 5 6
2 | 2 4
Stem: tens digit; Leaf: units digit
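
The four construction steps can be sketched in code for this example. The helper `stem_and_leaf` is hypothetical and assumes non-negative integers with the tens digit as the stem. Unlike the slide, it sorts the leaves within each stem, which is a common convention.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each tens-digit stem to its sorted list of units-digit leaves."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # stem = tens digit, leaf = units digit
        display[stem].append(leaf)
    return dict(display)

observed = [9, 10, 15, 22, 9, 15, 16, 24, 11]
display = stem_and_leaf(observed)

# List every stem from smallest to largest, including stems with no leaves,
# so that gaps in the data stay visible.
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
print("Stem: tens digit; Leaf: units digit")
```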

35 Stem-and-Leaf Displays A stem-and-leaf display conveys information about: –Typical value –Extent of spread about the typical value –Presence of gaps –Extent of symmetry –Number and location of peaks –Presence of outlying values

36 Frequency Analysis & Sample Size General rules for creating histograms and assessing distribution shapes. –minimum 30 samples ~ prefer 100 or more Avoid using relative frequency (%) unless at least 30 samples are available. Later in this course, we will address this further when discussing estimation and confidence intervals.

