ISE 261 PROBABILISTIC SYSTEMS
Chapter One Descriptive Statistics
Engineering Statistics: > Collect Data > Summarize > Draw Conclusions
Data Types: > Categorical (Qualitative), i.e., attribute data > Variable (Quantitative), i.e., numerical measurements
Population: A defined collection or group of objects.
Census: Data are available for all objects in the population.
Sample: A subset of the population.
Variable: Any characteristic whose value may change from one object to another in the population.
Empirical Data: Data based on observation.
Data Collection: Basic Principles of Design > Replication > Randomization > Blocking
Descriptive Statistics: > Graphical (Visual) > Numerical
Graphical: > Stem-and-Leaf Displays > Dotplots > Histograms > Pareto Diagrams > Scatter Diagrams
Numerical: > Mean > Median > Trimmed Means > Standard Deviation > Variance > Range
Stem-and-Leaf Displays Data Format: > Numerical > At Least Two Digits
Information Conveyed: > Identification of a typical value > Extent of spread about typical value > Presence of any gaps in the data > Extent of symmetry in the distribution > Number and location of peaks > Presence of any outlying values Information Not Displayed: > Order of Observations
Construction of Stem-and-Leaf: > Select one or more leading digits for the stem values; the trailing digits become the leaves. > List the possible stem values in a vertical column. > Record the leaf for every observation beside the corresponding stem. > Label or indicate the units for stems and leaves somewhere in the display.
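As a minimal sketch of these construction steps, the Python below builds a display for the temperature data from the dotplot example. It assumes whole-number observations with at least two digits, so the tens digit can serve as the stem and the ones digit as the leaf. The function name `stem_and_leaf` is hypothetical, and sorting the leaves on each stem is a readability choice, not one of the listed steps.

```python
def stem_and_leaf(data):
    """Build a stem-and-leaf display for non-negative whole numbers with at
    least two digits: stem = tens (and higher) digits, leaf = ones digit."""
    # List every possible stem between the smallest and largest, even empty ones.
    stems = {s: [] for s in range(min(data) // 10, max(data) // 10 + 1)}
    for x in data:
        stems[x // 10].append(x % 10)  # record each leaf beside its stem
    for leaves in stems.values():
        leaves.sort()  # optional: ordered leaves are easier to read
    return stems


temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

display = stem_and_leaf(temps)
print("Stem: tens digit, Leaf: ones digit (degrees F)")
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

For these 36 temperatures, stems 6 and 7 (the 60s and 70s) hold 23 of the observations, and the single leaf on stem 3 (31 °F) sits apart from the rest as a possible outlying value.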
Dotplots Data Format: > Numerical > Distinct or discrete values Information Conveyed: > Location > Spread > Extremes > Gaps Construction: > Each observation is a dot > Stack the dots above the value on a horizontal scale
Dotplot Example Data Set: Temperatures (°F): 84 49 61 40 83 67 45 66 70 69 80 58 68 60 67 72 73 70 57 63 70 78 52 67 53 67 75 61 70 81 76 79 75 76 58 31
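A dotplot can be sketched in plain text, assuming the data are whole numbers: one character column per value from the smallest to the largest, with an `o` stacked for each repeat of that value. The helper `dotplot` and the text rendering are hypothetical illustrations, not a standard plotting API.

```python
from collections import Counter


def dotplot(data, mark="o"):
    """Return text rows for a dotplot of whole-number data: one column per
    value from min to max, with dots stacked above each observed value."""
    counts = Counter(data)
    lo, hi = min(data), max(data)
    rows = []
    for level in range(max(counts.values()), 0, -1):  # top row first
        row = "".join(mark if counts[v] >= level else " " for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    width = hi - lo + 1
    rows.append("-" * width)                                     # horizontal scale
    rows.append(str(lo).ljust(width - len(str(hi))) + str(hi))   # end labels
    return rows


temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

for row in dotplot(temps):
    print(row)
```

The tallest stacks sit at 67 and 70 (four observations each), and the lone dot at 31 is separated from the rest by a visible gap.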
Histograms (Pareto) Data Format: Qualitative (Categorical) Frequency: The number of times that a data value occurs in the data set. Relative Frequency: The proportion of observations taking that value (frequency / number of observations).
Constructing a Pareto Histogram: > Above each value (label), draw a rectangle whose height corresponds to the frequency or relative frequency of that value. > Ordering can be natural or arbitrary; a Pareto diagram orders the categories from largest to smallest frequency.
Pareto Histogram Example: During a week's production, a total of 2,000 printed circuit boards (PCBs) are manufactured. Nonconformities found: > Blowholes = 120 > Unwetted = 80 > Insufficient solder = 440 > Pinholes = 56 > Shorts = 40 > Unsoldered = 64 Question: Where should improvement efforts, time, and money be focused?
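The tally can be turned into a Pareto table with a short Python sketch: sort the categories from largest to smallest and compute each one's share of the 800 total nonconformities. The function `pareto_table` is hypothetical, and the cumulative column is a common Pareto addition that the example itself does not list.

```python
def pareto_table(counts):
    """Sort categories from largest to smallest frequency and attach the
    relative and cumulative relative frequencies."""
    total = sum(counts.values())
    rows, running = [], 0
    for name, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += freq
        rows.append((name, freq, freq / total, running / total))
    return rows


nonconformities = {"Blowholes": 120, "Unwetted": 80, "Insufficient solder": 440,
                   "Pinholes": 56, "Shorts": 40, "Unsoldered": 64}

for name, freq, rel, cum in pareto_table(nonconformities):
    bar = "#" * round(rel * 40)  # bar length proportional to relative frequency
    print(f"{name:<20} {freq:>4} {rel:6.1%} {cum:6.1%} {bar}")
```

Insufficient solder alone is 440/800 = 55% of all nonconformities, and the top three categories together reach 80%, which points to where improvement effort, time, and money are likely to pay off first.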
Histograms Data Format: > Numerical > Discrete or continuous Data are displayed by magnitude. Each observed frequency is shown as a rectangle whose height corresponds to the frequency in that cell.
Histogram Construction Discrete Data: >Find Frequency of each x value >Find Relative Frequency >Mark possible x values on a horizontal scale >Above each value, draw a rectangle whose height corresponds to the frequency or relative frequency of that value
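The discrete-data steps can be sketched in Python as a frequency table. The data below are hypothetical (defect counts on 20 boards), made up only to exercise the steps. Every whole number between the smallest and largest value is listed, so values with zero frequency still get a place on the horizontal scale; `frequency_table` is a hypothetical name.

```python
from collections import Counter


def frequency_table(data):
    """Frequency and relative frequency for every possible whole-number value
    between the smallest and largest observation."""
    counts = Counter(data)
    n = len(data)
    return [(x, counts[x], counts[x] / n) for x in range(min(data), max(data) + 1)]


# Hypothetical data: number of defects found on each of 20 boards.
defects = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0, 2, 1, 0, 0, 1, 4, 0, 1, 2, 0]

for x, freq, rel in frequency_table(defects):
    print(f"{x:>2} {freq:>3} {rel:5.2f} {'#' * freq}")  # bar height = frequency
```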
Histogram Construction, Continuous Data (Equal Widths): > Count the number of observations (n) > Find the largest and smallest values > Find the range (largest – smallest) > Determine the number and width of the class intervals by the following rules:
Rules: > Use from 5 to 20 intervals. Rule of thumb: number of intervals ≈ √n > Use class intervals of equal width. Choose limits that leave no question about the interval in which a value falls. > Choose the lower limit for the first cell as a value slightly less than the smallest data value. > The class width can be determined by w = range / number of cells, rounded up to a convenient value so that the largest observation is still covered.
Build Histogram, Continuous Data: > Tally the data for each interval > Draw a rectangle over each interval with height equal to the number of observations (frequency) in that interval.
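Applying these rules to the temperature data from the dotplot example gives the following sketch. Assumptions beyond the slide: the data are whole numbers, the number of intervals k is √n rounded and kept within 5 to 20, the lower limit is half a unit below the smallest value, and the width is `range // k + 1`, so k widths always exceed the range and no observation lands on a boundary. The function name `histogram_classes` is hypothetical.

```python
import math


def histogram_classes(data):
    """Equal-width class intervals and their frequencies for whole-number data."""
    n = len(data)                               # count the observations
    k = min(20, max(5, round(math.sqrt(n))))    # about sqrt(n), kept within 5..20
    data_range = max(data) - min(data)          # largest - smallest
    width = data_range // k + 1                 # whole-number width with k * width > range
    lower = min(data) - 0.5                     # slightly below the smallest value
    edges = [lower + i * width for i in range(k + 1)]
    counts = [0] * k
    for x in data:                              # tally each observation
        counts[int((x - lower) // width)] += 1
    return edges, counts


temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

edges, counts = histogram_classes(temps)
for left, right, freq in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f} | {'*' * freq} ({freq})")
```

For n = 36 this gives 6 intervals of width 9 starting at 30.5, with frequencies 1, 2, 4, 7, 14, 8; the peak falls in the 66.5 to 75.5 class.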
Histogram Shapes: > Unimodal (a single peak) > Bimodal (two distinct peaks) > Multimodal (more than two peaks) > Symmetric (left and right halves are mirror images) > Positively skewed (right tail stretched out) > Negatively skewed (left tail stretched out) > Uniform (flat, no peak) > Truncated (cut off at a limit)
Scatter Diagrams Data Format: > Continuous > Two random variables measured on each object (paired data) Construction: > Each ordered pair (x, y) is plotted as a point Patterns: > Positive correlation > No correlation > Negative correlation
MEAN Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n = number of observations in the sample. Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N = number of objects in the population.
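The formula translates directly into code; the Python sketch below applies it to the temperature data from the dotplot example. The population mean μ is the same computation carried out over all N objects, so the hypothetical helper `sample_mean` serves both cases.

```python
def sample_mean(data):
    """x-bar = (sum of the data values) / (number of observations)."""
    return sum(data) / len(data)


temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

print(f"n = {len(temps)}, sum = {sum(temps)}, mean = {sample_mean(temps):.2f}")
```

Here Σx = 2371 and n = 36, so x̄ = 2371/36 ≈ 65.86 °F.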
Median: The middle value after the observations are ordered from smallest to largest; 50% of the values lie to its right and 50% to its left. > Odd number of observations: the middle value of the ordered arrangement. > Even number of observations: the average of the two middle values.
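A direct Python sketch of the odd/even rule, applied to the temperature data from the dotplot example. Because n = 36 is even, the result averages the 18th and 19th ordered values, 67 and 68. The function name `median` is a hypothetical helper.

```python
def median(data):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values


temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

print(median(temps))
```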
MODE The most frequent value that occurs in the data set.
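Python's standard library (3.8 and later) provides `statistics.multimode`, which returns every value tied for the highest frequency. The sketch below applies it to the temperature data from the dotplot example.

```python
from statistics import multimode

temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

modes = multimode(temps)  # all values sharing the highest frequency
print(modes)
```

Here 67 and 70 each occur four times, so this data set has two modes rather than one.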
Quartiles: Divide the ordered data into four equal parts. Interquartile Range (IQR) = Q3 – Q1
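Textbooks use several slightly different rules for locating quartiles. The sketch below uses Python's `statistics.quantiles` with its default `method="exclusive"` on the temperature data from the dotplot example; other conventions, including the fourths used for boxplots, may give slightly different values for the same data.

```python
from statistics import quantiles

temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

q1, q2, q3 = quantiles(temps, n=4)  # three cut points split the data into quarters
iqr = q3 - q1
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

For these data this gives Q1 = 58.5, Q2 = 67.5, Q3 = 75.0, and IQR = 16.5.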
Trimmed Means: The mean obtained after trimming a specified percentage of the observations from each end of the ordered data set.
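A minimal Python sketch of a trimmed mean. The slide does not say what to do when the percentage does not trim a whole number of observations; this sketch assumes the count per side is rounded down (10% of 36 = 3.6, so 3 values are dropped from each end). The function name `trimmed_mean` is hypothetical.

```python
def trimmed_mean(data, percent):
    """Drop percent% of the ordered observations from each end, then average.
    Assumption: the number trimmed per side (percent/100 * n) is rounded down."""
    ordered = sorted(data)
    k = int(len(ordered) * percent / 100)  # observations removed from each side
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

print(f"10% trimmed mean = {trimmed_mean(temps, 10):.2f}")
```

For the temperature data this drops 31, 40, 45 and 81, 83, 84, giving 2007/30 = 66.9 °F, a little above the ordinary mean of about 65.86 °F because the low tail lies farther from the center than the high tail.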
Range Difference between the largest & smallest values.
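Standard Deviation: The square root of (roughly) the average squared deviation from the mean, where the sum of squared deviations is divided by n – 1: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ Shortcut Method: $s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n-1}}$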
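Both formulas can be checked numerically. The Python sketch below computes s from the definition and from the shortcut for the temperature data from the dotplot example, and compares them with `statistics.stdev`, which also divides by n – 1. The function names `sd_definition` and `sd_shortcut` are hypothetical.

```python
import math
import statistics


def sd_definition(data):
    """s = sqrt( sum((x_i - xbar)^2) / (n - 1) )"""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))


def sd_shortcut(data):
    """s = sqrt( (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1) )"""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

s = sd_definition(temps)
print(f"definition:   {s:.4f}")
print(f"shortcut:     {sd_shortcut(temps):.4f}")
print(f"stdev:        {statistics.stdev(temps):.4f}")
print(f"variance s^2: {s ** 2:.4f}")  # variance = square of the standard deviation
```

The two forms agree algebraically; in floating point the shortcut can lose accuracy when the values are large compared with their spread, which is one reason to prefer the definitional form in code.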
Variance Square of the Standard Deviation.
Boxplots Information Conveyed: > Center > Spread > Nature of Symmetry > Identification of Outliers
Build Boxplots On: 1. Smallest Value 2. Lower Fourth 3. Median 4. Upper Fourth 5. Largest Value. Fourth Spread = Upper Fourth – Lower Fourth
Construction Of Boxplot: 1. Order the data from smallest to largest. 2. Separate the smallest half from the largest half (if n is odd, include the median in both halves). 3. The lower fourth is the median of the smallest half. 4. The upper fourth is the median of the largest half. 5. Fourth spread = Upper fourth – Lower fourth. 6. On a horizontal measurement scale, draw a rectangle whose left edge is at the lower fourth and whose right edge is at the upper fourth. 7. Place a vertical line inside the rectangle at the location of the median. 8. Draw whiskers out from the ends of the rectangle to the smallest and largest data values.
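Steps 1 to 5 can be sketched in Python as a five-number summary plus fourth spread. The helper names are hypothetical; the halving follows the rule above, with the median placed in both halves when n is odd. Drawing the box and whiskers (steps 6 to 8) is left to any plotting tool.

```python
def median(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """(smallest, lower fourth, median, upper fourth, largest)."""
    ordered = sorted(data)               # 1. order the data
    n = len(ordered)
    half = (n + 1) // 2                  # odd n: the median lands in both halves
    lower_half = ordered[:half]          # 2. smallest half
    upper_half = ordered[n - half:]      #    largest half
    lower_fourth = median(lower_half)    # 3. median of the smallest half
    upper_fourth = median(upper_half)    # 4. median of the largest half
    return ordered[0], lower_fourth, median(ordered), upper_fourth, ordered[-1]


temps = [84, 49, 61, 40, 83, 67, 45, 66, 70, 69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75, 61, 70, 81, 76, 79, 75, 76, 58, 31]

summary = five_number_summary(temps)
fourth_spread = summary[3] - summary[1]  # 5. upper fourth - lower fourth
print("five-number summary:", summary)
print("fourth spread:", fourth_spread)
```

For the temperature data from the dotplot example the summary is 31, 59, 67.5, 75, 84, so the fourth spread is 75 – 59 = 16.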