Engineering 1811.01 College of Engineering Engineering Education Innovation Center Analyzing Measurement Data Rev: 20130604, MCAnalyzing Data1.


1 Engineering 1811.01, College of Engineering, Engineering Education Innovation Center
Analyzing Measurement Data (Rev: 20130604, MC)

2 Example (Rev: 20120103, AM)

3 Example
Most values fall between 14 and 20 m. This data set contains an outlier of 45.2 m.

4 Represent the Data with a Histogram
First, determine an appropriate bin size. The bin size [k] can be assigned directly or can be calculated from a suggested number of bins [h]. Let’s try the most commonly used formula first: [equation image not preserved]

If you have this many data points [n]    Use this number of bins [h]
Less than 50                             5 to 7
50 to 99                                 6 to 10
100 to 250                               7 to 12
More than 250                            10 to 20
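The table above can be turned into a small lookup. The following is a minimal sketch, not the slide's own method: the function names `suggested_bins` and `bin_width` are hypothetical, and the relation k = (max - min) / h is an assumed, commonly used way of turning a bin count into a bin size.

```python
def suggested_bins(n):
    """Return the (low, high) range of bin counts h that the table suggests for n points."""
    if n < 50:
        return (5, 7)
    if n < 100:
        return (6, 10)
    if n <= 250:
        return (7, 12)
    return (10, 20)


def bin_width(data, h):
    """Assumed relation: split the full data range evenly into h bins."""
    return (max(data) - min(data)) / h


low, high = suggested_bins(20)
print(low, high)  # 5 7
```

For a data set of 20 points, any bin count from 5 to 7 would be consistent with the table; `bin_width` then gives a starting value for k that can be rounded to a convenient number.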

5 Histogram - Example
Is this the best way to represent this data? By changing our bin size, [k], we can improve the representation.

Bin range    Frequency
0-19         17
19-24        2
24-29        0
29-34        0
34-39        0
39-44        0
44-49        1
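To see how the choice of bins changes the picture, the sketch below counts the same values with two different sets of bin edges. The measurements are hypothetical (they are not the slide's data set), the helper name `bin_counts` is made up for illustration, and the bins are assumed to be half-open, so a value on an edge falls into the bin to its right.

```python
def bin_counts(data, edges):
    """Count values in half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical measurements in metres, including one outlier.
data = [14.1, 15.3, 15.8, 16.2, 16.9, 17.4, 17.5, 18.0, 19.6, 45.2]

coarse = bin_counts(data, [0, 19, 24, 29, 34, 39, 44, 49])
fine = bin_counts(data, list(range(14, 48, 2)))  # 2 m bins from 14 m to 46 m

print(coarse)    # [8, 1, 0, 0, 0, 0, 1]
print(fine[:3])  # [3, 4, 2]
```

With the coarse edges almost everything lands in the first bin, while the 2 m bins spread the bulk of the data over several bins and still leave the outlier isolated at the far end.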

6 Histogram - Example
All 3 histograms represent the exact same data set, but the bin width and number of bins for the two shown above were selected manually. Which one is most descriptive?

7 Dealing with outliers
Engineers must carefully consider any outliers when analyzing data. It is up to the engineer to determine whether the outlier is a valid data point or if it is invalid and should be discarded. Invalid data points can result from measurement errors or recording the data incorrectly.

8 Characterizing the data
Statistics allows us to characterize the data numerically as well as graphically. We characterize data in two ways:
– Central Tendency
– Variation

9 Central Tendency (Expected Value)
Central tendency is a single value that best represents the data. But which number do we choose?
– Mean
– Median
– Mode
Note: For most engineering applications, mean and median are most relevant.

10 Central Tendency - Mean
Is the mean value a good depiction of the data? How does the outlier affect the mean?

11 Central Tendency - Mean
Problem: Outliers may decrease the usefulness of the mean as a central value. Observe how outliers can affect the mean for this simple data set:

Data set: 3, 7, 12, 17, 21, 23, 27, 32, 36, 44
– Without outliers
– Changing 3 to -112 (outlier: -112)
– Changing 44 to 212 (outlier: 212)

Solution: Look at the median.
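To make the effect concrete, the sketch below recomputes the mean for the three cases. It assumes the slide's data set reads 3, 7, 12, 17, 21, 23, 27, 32, 36, 44; the row is garbled in this transcript, so the values (and the means that follow from them) should be treated as an assumption.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Assumed reading of the slide's data set (the transcript row is garbled).
base = [3, 7, 12, 17, 21, 23, 27, 32, 36, 44]
low_outlier = [-112] + base[1:]    # 3 replaced by -112
high_outlier = base[:-1] + [212]   # 44 replaced by 212

print(mean(base))          # 22.2
print(mean(low_outlier))   # 10.7
print(mean(high_outlier))  # 39.0
```

Under that assumption, changing a single value out of ten moves the mean from 22.2 to 10.7 or to 39.0, even though the other nine values are untouched.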

12 Central Tendency - Median
n = 20 is an even number of data points, so we must take the average of the 2 middle values. Which value looks like a better representation of the data: the mean (18.47) or the median (17.4)? Why?
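The rule for an even number of points can be sketched as follows. This is a minimal illustration with made-up inputs: sort the data, take the middle value when n is odd, and average the two middle values when n is even.

```python
def median(values):
    """Middle value of the sorted data; average the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```

For n = 20, `mid` is 10, so the function averages the 10th and 11th values of the sorted data.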

13 Central Tendency - Median
Using the simple data set, observe how the median reduces the impact of outliers on the central tendency. Median = 21
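The same behaviour can be checked with Python's standard `statistics` module. The values below are hypothetical and chosen only for illustration; they are not the slide's data set.

```python
from statistics import mean, median

# Hypothetical values for illustration (not the slide's data set).
base = [2, 4, 6, 8, 10, 12, 14]
with_outlier = [2, 4, 6, 8, 10, 12, 140]  # largest value replaced by an outlier

print(median(base), median(with_outlier))  # 8 8
print(mean(base), mean(with_outlier))      # 8 26
```

Replacing the largest value leaves the middle of the sorted list where it was, so the median stays at 8 while the mean more than triples.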

14 Central Tendency – Mean and Median
Which value, the mean (18.47 m) or the median (17.4 m), is a better representation of the data?

15 Characterizing the data
We can select a value of central tendency to represent the data, but is one number enough? It is also important to know how much variation there is in the data set. Variation refers to how the data is distributed around the central tendency value.

16 Variation
As with central tendency, there are multiple ways to represent the variation of a set of data.
– ± (“Plus, Minus”) gives the range of the values.
– Standard Deviation provides a more sophisticated look at how the data is distributed around the central value.

17 Variation - Standard Deviation
Definition: how closely the values cluster around the mean; how much variation there is in the data.
Equation: [equation image not preserved]
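A commonly used form of the equation is the sample standard deviation, given here as an assumption because the transcript does not show which variant the slide uses. Here x_i are the n data values and x̄ is their mean; some texts divide by n instead of n - 1 (the population form).

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```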

18 Standard Deviation Example
[Worked example not preserved; only the labels "mean =" and "∑ =" remain.]
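A worked calculation of this kind can be sketched in a few lines. The sketch assumes the sample form of the standard deviation (dividing by n - 1), and the measurements are hypothetical; the helper name `sample_std` is made up for illustration.

```python
import math
from statistics import stdev


def sample_std(values):
    """Sample standard deviation: root of the summed squared deviations over n - 1."""
    n = len(values)
    avg = sum(values) / n                                 # the mean
    squared_diffs = sum((x - avg) ** 2 for x in values)   # the sum (∑) of squared deviations
    return math.sqrt(squared_diffs / (n - 1))


# Hypothetical measurements for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std(data))  # about 2.14
```

For this data the mean is 5 and the squared deviations sum to 32, so the result is the square root of 32/7. The standard library's `statistics.stdev` computes the same quantity and can be used as a cross-check.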

19 Standard Deviation: Interpretation
These curves describe the distribution of students’ exam grades. The average value is 83%. Which class would you rather be in?
[Figure: two grade-distribution curves, Curve A and Curve B]

20 Normal Distribution
Data that is normally distributed occurs with greatest frequency around the mean. Normal distributions are also frequently referred to as Gaussian distributions or bell curves.
[Figure: bell curve of Frequency against Bins, with bins from -5 to 5 centered on the mean]

21 Normal Distribution
Mean = Median = Mode
– About 68% of values fall within 1 SD of the mean
– About 95% of values fall within 2 SDs of the mean
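These percentages follow from the shape of the normal curve. As a quick check, the fraction of a normal distribution within k standard deviations of the mean equals erf(k / √2), which Python's `math.erf` can evaluate directly. The helper name `fraction_within` is made up for illustration.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


print(round(fraction_within(1), 4))  # 0.6827
print(round(fraction_within(2), 4))  # 0.9545
```

The rounded figures of 68% and 95% on the slide are the usual shorthand for these two values.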

22 Other Distributions
– Skewed distributions
– Multimodal distribution
– Uniform distribution

23 What we’ve learned
This lecture has introduced some basic statistical tools that engineers use to analyze data. Histograms are used to represent data graphically. Engineers use both central tendency and variation to numerically describe data.

