1
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005
Dr. John Lipp Copyright © John Lipp
2
Copyright 2002 - 2005 Dr. John Lipp
Session 1 Outline
Part 1: The Statistics You Thought You Knew.
Part 2: Probability Theory.
Part 3: Discrete Random Variables.
Part 4: Continuous Random Variables.
EMIS 7300 Copyright Dr. John Lipp
3
Today’s Session Topics
Part 1: The Statistics You Thought You Knew.
  What is Statistics?
  Mechanistic vs. Empirical Models.
  Deterministic vs. Statistical Modeling.
  Populations and Samples.
  Mean, Variance, Standard Deviation.
  Mode, Range, Quartiles, Percentiles.
  Frequency, Relative Frequency.
  Simple Linear Regression, Correlation Coefficient.
  Dot Diagram, Box Plot, Histogram, Scatter Plot.
4
What is Statistics?
Statistics is the branch of mathematics dealing with applied probability theory.
Statistics is very prescriptive.
The main emphasis of statistics is on decision making.
Engineering is well populated with decisions to be made based on random or imperfect data:
  Is this radar measurement just cosmic radiation, or is it a stealth fighter?
  How many high-pressure hoses in this lot should be destructively tested to be confident the whole lot is good?
  Is there a correlation between system performance and missile mass, antenna gain, bad FLIR pixels, etc.?
5
What is Statistics? (cont.)
Statistics is also concerned with estimation of unknown quantities (statistical parameters like mean and variance).
Estimation is the more prevalent statistics problem found in engineering:
  The Kalman filter (a course in and of itself).
  Design of Experiments (another course).
Regardless of the problem, statistics boils down to modeling!
6
Statistical Modeling
Engineering analysis often begins with a mechanistic model of a physical system built from scientific first principles, for example, F = ma, V = IR, etc.
The analysis results of such a design are deterministic, exact, and reproducible.
[Figure: transfer function block with inputs x1, x2, …, xn and output y = f(x1, x2, …, xn).]
7
Statistical Modeling (cont.)
Sources of experimental variability are many:
  Imperfect hardware and measurement devices.
  Assumptions are approximate (frictionless surfaces really aren't, missiles flex during maneuvers, etc.).
A mechanistic model can be augmented with random errors to represent this lack of knowledge:
[Figure: transfer function block with inputs x1, x2, …, xn, random errors e1, e2, …, em, and output y = f(x1, x2, …, xn, e1, e2, …, em).]
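As a rough illustration of augmenting a mechanistic model with a random error, the MATLAB sketch below uses Ohm's law, V = IR, from the earlier slide. The current, resistance, noise level, and number of measurements are all assumed values chosen for the example, not data from the course.

```matlab
% Hypothetical example: Ohm's law V = I*R with an additive random error.
I = 2.0;                       % current in amperes (assumed value)
R = 50.0;                      % resistance in ohms (assumed value)
n = 1000;                      % number of simulated measurements (assumed)

V_det   = I * R;               % deterministic model: the same answer every time
e       = 0.5 * randn(n, 1);   % random error with standard deviation 0.5 V (assumed)
V_noisy = I * R + e;           % mechanistic model augmented with the random error

fprintf('deterministic response: %.2f V\n', V_det);
fprintf('noisy response: mean %.2f V, std %.2f V\n', mean(V_noisy), std(V_noisy));
```

The deterministic response has no spread at all, while the augmented model produces a response whose mean and standard deviation must be estimated from the data.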
8
Statistical Modeling (cont.)
The primary parameters in a statistical model of a system are:
  The mean of the response, Y-bar.
  The standard deviation of the response, S.
  The random distribution of the response errors.
A deterministic model only considers the mean of the response:
  The response's standard deviation is effectively 0.
  The response's random distribution is an indeterminate concept.
9
Statistical Modeling (cont.)
When a mechanistic model is unavailable, an empirical model based on experimental evidence can be constructed by considering the system to be a "black box."
[Figure: black box with input factors x1, x2, …, xn and random errors / noise e, producing output response(s) y characterized by a response mean and a response variation.]
10
Modeling (cont.)
Neither a mechanistic nor an empirical model is appropriate in some cases!
Some phenomena are purely random, possibly even irreducibly random.
11
Populations
Missiles are built in lots of 10. The following parameters are measured as percentages of the requirement specifications.
Columns: Missile, Weight, Motor, Seeker, Range, Labor
Surviving table values by missile (column positions not recoverable):
  1: 99 96 105
  2: 101 102
  3: 98 95
  4: 103 97
  5:
  6: 100
  7: 90
  8:
  9: 94
  10:
The lot was generated in MATLAB with:
    W = * randn(1,10)
    M = * randn(1,10)
    S = * randn(1,10)
    R = 99 + (M - 100) - (W - 100)
    L = * randn(1,10) - 2 * (W - 100)
12
Populations (cont.)
Dot Diagrams
[Figure: dot diagrams of Weight, Motor, Seeker, Range, and Labor, each on an axis from 90 to 110.]
The number of dots represents the frequency of the data value.
13
Population Mean, Variance, and Standard Deviation
The size of a population is denoted N. The number of unique data values will be denoted M.
The population mean is a measure of a population's central tendency. It is commonly denoted by the Greek letter $\mu$ and is computed from data via
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
The population variance is a measure of a population's variability about the population mean. It is commonly denoted by $\sigma^2$ and is computed from data via
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
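The two definitions can be checked numerically. The MATLAB sketch below uses a small made-up population (not the missile-lot data) and compares the hand-computed values with the built-in functions, where the second argument 1 selects the divide-by-N form.

```matlab
% Hypothetical population of N = 8 values (not the missile-lot data).
x = [2 4 4 4 5 5 7 9];
N = numel(x);

mu     = sum(x) / N;              % population mean: 5
sigma2 = sum((x - mu).^2) / N;    % population variance (divide by N): 4
sigma  = sqrt(sigma2);            % population standard deviation: 2

% Cross-check against the built-ins; the argument 1 selects the 1/N form.
fprintf('mu = %g (mean gives %g)\n', mu, mean(x));
fprintf('sigma^2 = %g (var(x,1) gives %g)\n', sigma2, var(x, 1));
fprintf('sigma = %g (std(x,1) gives %g)\n', sigma, std(x, 1));
```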
14
Population Mean, Variance, and Standard Deviation (cont.)
The population standard deviation, $\sigma$, is the square root of the population variance.
It is also a measure of variability about the mean.
Unlike the population variance, the population standard deviation has the same units as the population mean and the raw data.
In engineering $\sigma^2$ is usually proportional to power, while $\sigma$ is proportional to magnitude (voltage, current, force, velocity, etc.).
  $\pm 1\sigma$ contains 68.3% of "normal" data
  $\pm 2\sigma$ contains 95.5% of "normal" data
  $\pm 3\sigma$ contains 99.7% of "normal" data
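The coverage percentages for "normal" data can be reproduced from the error function, since for a normal distribution the fraction within k standard deviations of the mean is erf(k / sqrt(2)). A minimal MATLAB sketch:

```matlab
% Fraction of a normal population within k standard deviations of the mean:
% P(|X - mu| <= k*sigma) = erf(k / sqrt(2)).
k = 1:3;
coverage = erf(k / sqrt(2));
for idx = 1:numel(k)
    fprintf('+/-%d sigma contains %.2f%% of normal data\n', k(idx), 100 * coverage(idx));
end
```

To two decimals the results are 68.27%, 95.45%, and 99.73%, in line with the figures quoted on the slide.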
15
Population Mean, Variance, and Standard Deviation (cont.)
[Figure: dot diagrams of Weight, Motor, Seeker, Range, and Labor, each on an axis from 90 to 110.]
16
Population Range, Median, Quartiles, and Percentiles
The population range is another measure of variability. It is the difference between the largest and smallest data values.
The population mode is the most frequently occurring value in the data, that is, the most probable value. Ties are allowed.
The population range and mode are rarely used. If the size of the population is infinite, they can be undefined.
17
Population Range, Median, Quartiles, and Percentiles
The population median is another measure of central tendency.
It is computed by sorting the data (from lowest to highest) and dividing this ordered data into two equal halves at the data mid-point.
  If N is odd, the median is the "left over" data point after dividing at the mid-point into equal halves.
  If N is even, the median is the average of the two data points on either side of the mid-point.
Regardless of N's value, the same number of data points are above and below the median's value (ties with the median are allocated above/below as necessary).
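The odd and even cases can be written out directly. The MATLAB sketch below uses two small made-up data sets and compares the result with the built-in median.

```matlab
% Hypothetical data sets, one with odd N and one with even N.
x_odd  = [7 1 5 3 9];         % N = 5
x_even = [7 1 5 3 9 11];      % N = 6

y = sort(x_odd);              % 1 3 5 7 9
N = numel(y);
med_odd = y((N + 1) / 2);     % the "left over" middle point: 5

y = sort(x_even);             % 1 3 5 7 9 11
N = numel(y);
med_even = (y(N / 2) + y(N / 2 + 1)) / 2;   % average of the two middle points: 6

fprintf('odd N:  %g (median gives %g)\n', med_odd, median(x_odd));
fprintf('even N: %g (median gives %g)\n', med_even, median(x_even));
```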
18
Population Range, Median, Quartiles, and Percentiles (cont.)
The division points which divide ordered data into four "equal" portions are the population quartiles:
  The first or lower quartile is denoted q1.
  The second quartile is the median, q2.
  The third or upper quartile is denoted q3.
The difference q3 - q1 is called the interquartile range and is yet another measure of variability. For "normal" data the interquartile range should be about $\frac{4}{3}\sigma$.
The division points which divide ordered data into 100 "equal" portions are the population percentiles. Percentiles are most commonly denoted with a subscript i for the i-th percentile.
19
Population Range, Median, Quartiles, and Percentiles (cont.)
What constitutes "equal" portions is defined by the following algorithm:
  Sort (order) the x data in increasing order; call the result y.
  Determine the quartile / percentile point z by computing
    $z = \frac{Q\,(N+1)}{4}$, with $Q \in \{1, 2, 3\}$ for the first quartile, second quartile (median), or third quartile, respectively.
    $z = \frac{K\,(N+1)}{100}$, with $K \in \{1, \ldots, 100\}$ for the K-th percentile.
20
Population Range, Median, Quartiles, and Percentiles (cont.)
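Linear interpolation is used to compute the quartiles and percentiles from the two data points closest to z via
$$q = (z - \lfloor z \rfloor)\, y_{\lceil z \rceil} + (\lceil z \rceil - z)\, y_{\lfloor z \rfloor}$$
  $\lfloor z \rfloor$ = closest integer less than or equal to z (floor).
  $\lceil z \rceil$ = closest integer greater than or equal to z (ceiling).
For the missile lot, N = 10. Thus the value of z for q1 is 2.75 and for q3 it is 8.25. The quartile equations are then
$$q_1 = 0.75\, y_3 + 0.25\, y_2, \qquad q_3 = 0.25\, y_9 + 0.75\, y_8$$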
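The sort-then-interpolate procedure can be sketched in MATLAB as follows. The data are made up (N = 10, so the z values match the slide), and the extra branch for an integer z is an addition of this sketch: when z is an integer both interpolation weights are zero, so the data point is taken directly. Built-in quantile functions may use a different convention and give slightly different values.

```matlab
% Hypothetical data, N = 10 (not the missile-lot data).
x = [12 7 3 9 15 1 10 5 8 14];
y = sort(x);                      % 1 3 5 7 8 9 10 12 14 15
N = numel(y);

q = zeros(1, 3);                  % q(1) = q1, q(2) = q2 (median), q(3) = q3
for Q = 1:3
    z  = Q * (N + 1) / 4;         % 2.75, 5.5, 8.25 when N = 10
    lo = floor(z);
    hi = ceil(z);
    if lo == hi
        q(Q) = y(z);              % z is an integer: use that data point directly
    else
        q(Q) = (z - lo) * y(hi) + (hi - z) * y(lo);   % linear interpolation
    end
end

fprintf('q1 = %g, q2 = %g, q3 = %g, q3 - q1 = %g\n', q(1), q(2), q(3), q(3) - q(1));
```

For this data the sketch gives q1 = 4.5, q2 = 8.5, and q3 = 12.5, and q2 agrees with median(x).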
21
Population Range, Median, Quartiles, and Percentiles (cont.)
The population statistics (means, variances, quartiles, etc.) for the missile lot are computed in the table below.
Parameter   Weight   Motor    Seeker   Range    Labor
$\mu$       100.7    99.8     100.2    98.0     99.5
$\sigma^2$  2.01     10.36    0.96     7.60     17.65
$\sigma$    1.42     3.22     0.98     2.76     4.20
range       5        11       3        9        15
q3 - q1     2.25     4.75     2.00     5.00     5.25
q1          99.75    97.50    99.00    95.00    96.75
q2          101.0 100.0 98.5 100.5 [one value missing; column alignment uncertain]
q3          102.00   102.25   101.00   100.00   102.00
22
Population Range, Median, Quartiles, and Percentiles (cont.)
Below is a common method of illustrating the statistical quantiles called a box plot.
The box is drawn from q1 to q3 (the interquartile range) and has the median, q2, marked in the middle.
A line and mark known as a whisker extends from the box's q1 end to the smallest data point within 1.5 interquartile ranges from q1.
Likewise, a whisker is drawn from the box's q3 end to the largest data point within 1.5 interquartile ranges from q3.
[Figure: box plot of Labor on an axis from 90 to 110, with one point marked as an outlier.]
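The whisker rule can be sketched numerically. The MATLAB example below uses made-up data with N = 10 and the quartile equations for N = 10 given earlier; treating every point beyond the 1.5 interquartile-range limits as an outlier is the usual reading of the figure and is an assumption here.

```matlab
% Hypothetical data, N = 10, with one low value (not the missile-lot data).
y  = sort([85 96 97 98 99 100 101 102 103 104]);

q1 = (3 * y(3) + y(2)) / 4;       % z = 2.75  -> 96.75
q3 = (3 * y(8) + y(9)) / 4;       % z = 8.25  -> 102.25
iqr_val = q3 - q1;                % interquartile range: 5.5

lo_limit = q1 - 1.5 * iqr_val;    % 88.5
hi_limit = q3 + 1.5 * iqr_val;    % 110.5

whisker_lo = min(y(y >= lo_limit));           % smallest point within the limit: 96
whisker_hi = max(y(y <= hi_limit));           % largest point within the limit: 104
outliers   = y(y < lo_limit | y > hi_limit);  % points beyond the whiskers: 85

fprintf('box %g to %g, whiskers %g to %g\n', q1, q3, whisker_lo, whisker_hi);
fprintf('outlier: %g\n', outliers);
```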
23
Population Range, Median, Quartiles, and Percentiles (cont.)
[Figure: box plots of Weight and Motor, each on an axis from 90 to 110.]
24
Population Range, Median, Quartiles, and Percentiles (cont.)
[Figure: box plots of Seeker and Range, each on an axis from 90 to 110.]
25
Population Range, Median, Quartiles, and Percentiles (cont.)
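    % Population analysis for Missile Lot
    x = [ ];                       % lot data: one row per missile, one column per parameter
    y = sort(x);                   % sort each column in increasing order
    x_bar = mean(x)                % population means
    sigma = std(x,1)               % population standard deviations (divide by N)
    sigma2 = sigma.^2              % population variances (named sigma2 so the built-in var is not shadowed)
    q2 = median(x)
    x_rng = range(x)               % range() needs the Statistics Toolbox; max(x) - min(x) is equivalent
    q1 = (3*y(3,:) + y(2,:)) / 4   % z = 2.75
    q3 = (3*y(8,:) + y(9,:)) / 4   % z = 8.25
    q31 = q3 - q1                  % interquartile range

The code above is for MATLAB. Notice that MATLAB has built-in functions to compute most of the statistical values.
std(x,1) divides by N to give the population standard deviation, while std(x) divides by N-1 to give the sample standard deviation.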
26
In Class Assignment – Deal or Dud?
The customer is unhappy with his lot of missiles and is canceling the contract. They claim the missiles don't meet the contract requirements!!!
Divide up into two teams of statisticians,
  One representing the plaintiff (the customer), and
  The other the defendant (Missile King).
Prepare to argue WITH STATISTICS the case for your side!
27
Samples
An entire population may not or cannot be measured to determine the population's statistics:
  The measurement process is destructive.
  The measurement costs are excessive.
  The population is evolving, i.e., the statistics fluctuate.
  The population is theoretical ($N = \infty$).
Instead of measuring the population, a subset or sample of the population can be measured and the parameters estimated by statistical inference.
The number of items in the sample is typically denoted as n in statistics. (Note n < N.) Denote the number of unique data values as m.
28
Sample Mean, Variance, and Standard Deviation
The sample mean is an estimate of the population mean. It is commonly denoted by $\bar{x}$ and is computed from data via
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The sample variance is an estimate of the population variance. It is commonly denoted by $s^2$ and is computed from data via
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
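The only computational difference from the population formulas is the n - 1 divisor in the variance. The MATLAB sketch below uses a made-up sample and compares the hand-computed values with var(x) and std(x), which use the n - 1 form by default.

```matlab
% Hypothetical sample of n = 8 values.
x = [2 4 4 4 5 5 7 9];
n = numel(x);

x_bar = sum(x) / n;                      % sample mean: 5
s2    = sum((x - x_bar).^2) / (n - 1);   % sample variance (divide by n - 1): 32/7
s     = sqrt(s2);                        % sample standard deviation

fprintf('x_bar = %g\n', x_bar);
fprintf('s^2 = %.4f (var(x) gives %.4f, var(x,1) gives %.4f)\n', s2, var(x), var(x, 1));
fprintf('s = %.4f (std(x) gives %.4f)\n', s, std(x));
```

For the same numbers, the divide-by-n form var(x,1) is smaller than the sample variance var(x).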
29
Sample Range, Median, Quartiles, and Percentiles (cont.)
The sample standard deviation, s, is likewise the square root of the sample variance and is an estimate of the population standard deviation.
The sample range is the difference between the largest and smallest data sample values.
Similarly, the sample median, sample quartiles, and sample percentiles are found by using the sorted data samples and substituting n for N in the population equations.
The sample mode is the most frequently occurring value in the data samples. Ties are allowed.
30
Statistical Model for Sample Mean and Variance
Simplest transfer function: Y-bar and S are constants.
[Figure: example response with Y-bar = 5 and S = 1.]
31
Sample Statistics
Consider the missile lot. If only 3 of the 10 missiles are sampled, what is the sample mean?
Let i, j, and k denote the missiles selected for the sample. Then the sample mean is
$$\bar{x} = \frac{x_i + x_j + x_k}{3}$$
Since {i, j, k} are selected at random, so are the {xi, xj, xk} data values. That implies the sample mean
  Is itself a random variable,
  Has a population, and
  Has its own mean, variance, quartiles, etc.
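A quick way to see that the sample mean is itself random is to draw a few samples of three. The MATLAB sketch below uses a made-up lot of ten values (not the actual missile data) and randperm to pick three distinct indices on each trial; the printed means change from run to run.

```matlab
% Hypothetical lot of N = 10 values (not the missile-lot data).
x = [99 101 98 103 100 102 101 100 99 104];

for trial = 1:5
    pick  = randperm(numel(x), 3);    % three distinct indices i, j, k
    x_bar = mean(x(pick));            % sample mean of the three selected values
    fprintf('missiles %2d %2d %2d -> sample mean %.2f\n', ...
            pick(1), pick(2), pick(3), x_bar);
end
```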
32
Sample Statistics (cont.)
The first step in determining the sample mean's statistics is to determine the population:
  Select i first. Since N = 10, i can take on one of 10 values.
  Select j next. Since $j \neq i$, j can take on one of 9 values.
  Finally select k. Since $k \neq i$ and $k \neq j$, k is one of 8 values.
  The total population size is $10 \times 9 \times 8 = 720$.
The process applicable to determining the population in this case is known as selection without replacement.
The general formula for the size is known as the number of permutations
$$_nP_r = \frac{n!}{(n-r)!}$$
where n is the population size and r is the sample size.
33
Sample Statistics (cont.)
However, the order in which {xi, xj, xk} are added does not change the value of the sample mean.
Regardless of the values of i, j, and k, there are six orders in which they can be randomly drawn:
  {i, j, k} {j, i, k} {i, k, j} {j, k, i} {k, i, j} {k, j, i}
Thus, the population size can be reduced to 720 / 6 = 120.
Generally, the number of orders in which r things can be arranged is r! (= rPr).
When the order of draw is not important, the formula for the number of combinations is
$$_nC_r = \frac{_nP_r}{r!} = \frac{n!}{r!\,(n-r)!}$$
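The counts for the missile lot (n = 10, r = 3) can be verified in a few lines of MATLAB. The variable names are chosen for this sketch only.

```matlab
n = 10;                                   % population size (missiles in the lot)
r = 3;                                    % sample size

nPr    = factorial(n) / factorial(n - r); % ordered selections without replacement: 720
orders = factorial(r);                    % arrangements of r items: 6
nCr    = nPr / orders;                    % unordered selections: 120

fprintf('permutations = %d, orders = %d, combinations = %d\n', nPr, orders, nCr);
fprintf('nchoosek(%d,%d) = %d\n', n, r, nchoosek(n, r));
```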
34
Sample Statistics (cont.)
Consider the n = 3 sample mean for the weight of the missile lot. Note that
  Largest possible value = ( ) / 3 = /3.
  Smallest value possible = ( ) / 3 = 96.
  Range is 7 1/3, with discrete steps every 1/3.
That is, the number of unique values is 22 !?!
What is different about the proposed populations? For
  N = 720: each permutation of i, j, and k is equally probable.
  N = 120: each combination of i, j, and k is equally probable.
  N = 22: the sample means are not equally probable!
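The unequal probabilities can be seen by enumerating every combination and counting how often each sample mean occurs. The MATLAB sketch below does this for a made-up lot of ten values, so its counts will not match the slide's numbers. The sums are kept as integers before dividing by 3 so that equal means are not split apart by rounding.

```matlab
% Hypothetical lot of N = 10 values (not the missile-lot data).
x   = [99 101 98 103 100 102 101 100 99 104];

idx = nchoosek(1:numel(x), 3);     % all 120 index combinations {i, j, k}
s3  = sum(x(idx), 2);              % integer sum of each combination

[vals, ~, ic] = unique(s3);        % distinct sums, and which one each row maps to
counts = accumarray(ic, 1);        % frequency of each distinct sum

for v = 1:numel(vals)
    fprintf('mean %7.3f occurs %3d times\n', vals(v) / 3, counts(v));
end
```

Each of the 120 combinations is equally likely, but several combinations share the same mean, so the distinct mean values carry different frequencies.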
35
Sample Statistics (cont.)
Sampling Distribution (of the Mean)
[Figure: dot diagram of the mean weight (n = 3) on an axis from 98 to 102.]
36
Sample Statistics (cont.)
[Figure: axis from 98 to 102. Instructor note: print / make transparency.]
37
Sample Statistics (cont.)
    % Population of N = 120
    %
    i = zeros(120,1); j = zeros(120,1); k = zeros(120,1);
    ijkdx = zeros(120,3);
    N = 0;
    for idx = 1:10,
      for jdx = (idx+1):10,
        for kdx = (jdx+1):10,
          N = N + 1;                    % running count of combinations
          i(N) = idx; j(N) = jdx; k(N) = kdx;
          ijkdx(N,:) = [idx jdx kdx];
        end
      end
    end
    % Check that i,j,k are all different (each sum should be 0)
    sum(i==j), sum(i==k), sum(j==k),

The code above and on the next slide is for MATLAB. This code assumes that you have already run the population analysis code shown earlier.
The first code section formulates the population for the n = 3 mean.
The second code section computes the statistics.
The third code section computes the data for dot diagrams / histograms and plots them.
38
Sample Statistics (cont.)
    % Sample mean population statistical analysis
    %
    x3 = (x(i,:) + x(j,:) + x(k,:)) / 3;   % one row per combination
    x_bar = mean(x3)
    sigma = std(x3,1)
    sigma2 = sigma.^2                      % named sigma2 so the built-in var is not shadowed
    q2 = median(x3)
    y3 = sort(x3);
    z1 = 1 * (N+1) / 4;
    z3 = 3 * (N+1) / 4;
    q1 = (z1 - floor(z1)) * y3(ceil(z1),:) + ...
         (ceil(z1) - z1) * y3(floor(z1),:)
    q3 = (z3 - floor(z3)) * y3(ceil(z3),:) + ...
         (ceil(z3) - z3) * y3(floor(z3),:)
    q31 = q3 - q1

    % Compute / plot dot diagrams (but use the
    % built-in histogram function) for all columns
    %
    for loop = 1:min(size(x3)),
      bins = (3*min(x3(:,loop))):(3*max(x3(:,loop)));   % one bin per step of 1/3
      figure(loop);
      hist(3*x3(:,loop),bins);
      xx = get(gca,'xtick');
      set(gca,'xticklabel',xx/3);                       % relabel the axis in units of the mean
    end
39
Sample Statistics (cont.)
Sample Mean (n = 3) of
Parameter   Weight   Motor    Seeker   Range    Labor
$\mu$       100.7    99.8     100.2    98.0     99.5
$\sigma^2$  0.52     2.69     0.25     1.97     4.58
$\sigma$    0.72     1.64     0.50     1.40     2.14
q3 - q1     1.0000   2.3333   0.6667   2.0000   3.0000
q1
q2
q3
40
Frequency, Relative Frequency, and the Histogram
Up to now we have been using the dot diagram, which is
  Clumsy when there is a lot of data.
  Difficult to use for seeing statistical trends in data.
Another way to show the distribution of data values is by tabulating the number of occurrences within data sub-ranges.
  The data sub-ranges are commonly referred to as bins.
  The number of data occurrences within a particular bin is referred to as the frequency.
  Frequency vs. bin is known as a frequency distribution.
  When plotted, often as a bar chart, the result is called a histogram.
41
Frequency, Relative Frequency, and the Histogram (cont.)
Histogram procedural suggestions:
  A good rule of thumb for selecting the number of bins is to use an integer close to the square root of the data set size.
  End points of a histogram normally include ALL outliers. That can distort the scaling of the histogram when the number of outliers is large. Ergo, sometimes outliers may be excluded.
  A good choice for the (center) bin of the histogram is near the mean, median, or mode.
  Use sigma, range, or quartiles as a guide to bin size.
For example, if the number of data points is 30, that suggests using around 6 bins. Choose bin 4 to be the mean, and the other bins to be each one standard deviation away.
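The suggestions above can be turned into a small frequency table. The MATLAB sketch below uses 30 made-up random data points, rounds the square root up to get 6 bins as in the example, centers bin 4 on the mean, spaces the bins one standard deviation apart, and lets the two end bins absorb any outliers. Those choices and all variable names are assumptions of this sketch.

```matlab
% Hypothetical data set of 30 points.
x = 100 + 2 * randn(30, 1);

nbins = ceil(sqrt(numel(x)));               % sqrt(30) = 5.48 -> 6 bins
m = mean(x);
s = std(x);

centers = m + ((1:nbins) - 4) * s;          % bin 4 is centered on the mean
edges   = [centers - s/2, centers(end) + s/2];
edges(1)   = -Inf;                          % end bins absorb all outliers
edges(end) =  Inf;

freq = zeros(1, nbins);
for b = 1:nbins
    freq(b) = sum(x >= edges(b) & x < edges(b+1));   % occurrences in bin b
end

rel_freq     = freq / numel(x);             % relative frequency
cum_freq     = cumsum(freq);                % cumulative frequency
cum_rel_freq = cumsum(rel_freq);            % cumulative relative frequency

disp([freq; cum_freq; rel_freq; cum_rel_freq]);
bar(centers, freq);                         % histogram as a bar chart
```

The last cumulative frequency equals the data set size, and the last cumulative relative frequency equals 1.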
42
Frequency, Relative Frequency, and the Histogram (cont.)
Consider the n = 3 weight sample mean.
Table columns: Bin Sub-Range | Bin "Label" | Frequency | Cumulative Frequency | Relative Frequency | Cumulative Relative Frequency
[Blank table. Instructor note: print / make transparency.]
43
Sample Statistics (cont.)
[Figure: axis from 98 to 102. Instructor note: print / make transparency.]
44
Homework
The Wiley web site has a student page with errata (corrections) to the textbook.
Your first assignment is to download the errata document.
This will save you heartache, as several homework answers in Appendix C of the textbook are incorrect!