Frequencies and the normal distribution


1 Frequencies and the normal distribution
CSC 152 (Blum)

2 Make a histogram with a bin width of 0.1 using the House Fly data provided.

3 Is that normal?
We want to compare the distribution to a normal distribution. We will make the histogram using a more tedious procedure that will also allow us to plot the normal distribution function for a nice visual comparison.

4 Calculate the mean (average) and standard deviation of the house fly wing data
The normal distribution uses the average and standard deviation as parameters controlling the center of the distribution and how spread out it is.

5 Calculate the sample size (count) of the data
The normal distribution is about probabilities. We will use the count to turn a frequency (the number of times something occurred) into a percentage/probability that something occurred.

6 Calculate the minimum and maximum of the data
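The summary values from slides 4 to 6 can be sketched outside Excel as well. This is a minimal Python sketch, assuming a short made-up list of wing lengths (not the course data) and assuming the sample form of the standard deviation, which the slides do not specify.

```python
import statistics

# Hypothetical wing lengths for illustration only (not the course data).
wing_lengths = [3.6, 4.1, 4.4, 4.5, 4.6, 4.9, 5.5]

mean = statistics.mean(wing_lengths)      # center of the distribution
std_dev = statistics.stdev(wing_lengths)  # sample standard deviation (divides by n - 1)
count = len(wing_lengths)                 # sample size
smallest = min(wing_lengths)
largest = max(wing_lengths)

print(f"mean={mean:.3f} sd={std_dev:.3f} n={count} min={smallest} max={largest}")
```

If the population form is wanted instead, statistics.pstdev divides by n rather than n - 1.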

7 Make a range of data starting at the Min plus 0
Make a range of data starting at the Min plus and going to the Max by 0.1’s. We are making the bins that Excel generated so nicely for us in the fast histogram. There is no exact rule for what the bin width should be, which is why Excel lets you adjust that parameter. Except in the tails of the distribution, you probably don’t want frequencies of 1 or 0. We are using a bin width of 0.1.
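A minimal sketch of generating evenly spaced bins in Python. The function name make_bins is hypothetical. The first edge of 3.6 is taken from the frequency slide, and the last edge of 5.5 is an assumption chosen so that the result has 20 values, matching the 20 cells C2:C21.

```python
def make_bins(first_edge, last_edge, width):
    """Return bin upper edges from first_edge to last_edge in steps of width."""
    n_steps = round((last_edge - first_edge) / width)
    # Multiply instead of adding repeatedly to limit floating-point drift,
    # then round to tidy values such as 3.7000000000000006.
    return [round(first_edge + i * width, 10) for i in range(n_steps + 1)]

# 3.6 comes from the slides; 5.5 is an assumed maximum.
bins = make_bins(3.6, 5.5, 0.1)
print(len(bins), bins[:3], bins[-1])
```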

8 Warning: Array formula
Array formulas have results that span several cells instead of just one. They require highlighting enough cells for the complete answer and pressing Ctrl-Shift-Enter instead of just Enter. If you attempt to edit part of an array formula result, problems occur. Use the Esc key to get out of it.

9 Highlight cells D2:D21 (for the answer), insert the formula =FREQUENCY(A2:A101,C2:C21) and then hit Ctrl-Shift-Enter. The range A2:A101 holds the data and the range C2:C21 holds the bins.

10 A frequency is the number of times something occurs – in this case the values up to and including 3.6, then the values between 3.6 (exclusive) and 3.7 (inclusive), etc.
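The counting rule described here (each bin excludes its lower edge and includes its upper edge) can be sketched in Python. The function name frequency and the sample values are hypothetical. Like Excel's FREQUENCY, the sketch returns one more count than there are bins; the extra last entry holds values above the top bin.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin; bins must be sorted in ascending order.

    counts[0] holds values <= bins[0], counts[i] holds values in
    (bins[i-1], bins[i]], and the final entry holds values > bins[-1].
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left finds the first bin edge that is >= value.
        counts[bisect_left(bins, value)] += 1
    return counts

# Made-up values chosen to land on and between the bin edges.
sample = [3.5, 3.6, 3.65, 3.7, 3.71, 9.9]
print(frequency(sample, [3.6, 3.7, 3.8]))
```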

11 Make a quick (unformatted, un-designed) XY Scatter graph

12 Normal distribution

13 Use Excel’s formula for the normal distribution
Notice that we have used absolute row addressing for the mean (B$2) and the standard deviation (B$4). That way, when the formula is copied elsewhere, the rows 2 and 4 are held fixed. The 4th argument says whether we want the cumulative distribution, that is, the sum of everything from negative infinity up to and including a value; we said no.
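For reference, the non-cumulative normal distribution is the density f(x) = exp(-(x - mean)^2 / (2 sd^2)) / (sd sqrt(2 pi)). Below is a minimal Python sketch of both the density and the cumulative version. The mean and standard deviation are made-up values, not results from the fly data, and normal_pdf is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, std_dev):
    """Normal density at x (the non-cumulative form)."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))

# Hypothetical parameters for illustration only.
mean, std_dev = 4.55, 0.39
x = 4.2

density = normal_pdf(x, mean, std_dev)
# The cumulative form sums everything from negative infinity up to x.
cumulative = NormalDist(mean, std_dev).cdf(x)
print(f"density={density:.4f} cumulative={cumulative:.4f}")
```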

14 Before comparing we need two more steps
The first is to divide the frequencies by the sample size, turning them effectively into probabilities. Instead of saying the value 4.2 occurred 7 times in a sample size of 100, we say 4.2 occurred 0.07 or 7% of the time. Note that the count requires absolute addressing.
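A short Python sketch of this first step, using made-up counts that sum to 100 (they are not the fly data):

```python
# Hypothetical frequencies for seven bins; they sum to 100.
counts = [1, 7, 22, 40, 22, 7, 1]
sample_size = sum(counts)

# Dividing each frequency by the sample size gives a relative frequency.
relative = [c / sample_size for c in counts]
print(relative[1])  # 7 occurrences out of 100
```

The relative frequencies add up to 1, which is what lets them be compared with probabilities.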

15 The normal distribution has to be multiplied by the bin width – the separation between our values
Had we gone up by 0.2’s instead of by 0.1’s, there would be roughly half as many frequencies, each with roughly twice the value it has now. In math class we’d probably call the bin width Δx. This way we can get a single point to “stand in for” all the values from x to x+Δx.
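To see why multiplying by the bin width works, the sketch below compares density times Δx with the probability of the interval from x to x+Δx computed from the cumulative distribution. It is a rough illustration in Python; the mean and standard deviation are made-up values, not the fly data results.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration only.
dist = NormalDist(mu=4.55, sigma=0.39)
bin_width = 0.1
x = 4.2

# One point standing in for the whole interval from x to x + bin_width.
approx = dist.pdf(x) * bin_width
# Probability of the same interval from the cumulative distribution.
exact = dist.cdf(x + bin_width) - dist.cdf(x)
# Doubling the bin width doubles the scaled value at the same x.
doubled = dist.pdf(x) * (2 * bin_width)

print(f"approx={approx:.4f} exact={exact:.4f} doubled={doubled:.4f}")
```

The two numbers agree only roughly, and the agreement improves as the bin width shrinks.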

16 Highlight columns C, F & G (only where there’s data) and Insert an XY-Scatter chart.
To highlight non-consecutive columns: highlight the first column then hold down the Ctrl key while highlighting the second (and third) column. Then let go of Ctrl.

17 Apply a Chart Layout (e.g. 1) under Design.

18 Change the title and axis labels

19 Format the axis to have a Minimum of 3

20 Right click on a data point and choose Select Data

21 Select a data set, click Edit and give the series a name.

22 Result – see legend

23 Right click on the second (normal dist.) series and choose Format Data Series
Choose the Paint bucket (Fill & Line). Choose Line, Solid Line and pick a color.

24 And on Marker choose None

25 Result

26 Skew and Kurt
Recall that the skew (skewness) and kurt (kurtosis) calculations also provide measures of how close a distribution is to normal or in what way and by how much it deviates from a normal distribution. Calculate skew and kurt for the fly data.
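A minimal Python sketch of the two calculations, assuming the sample-adjusted formulas that Excel documents for its SKEW and KURT functions (other software may use different conventions). The function names and the test values are illustrative, not the fly data.

```python
import math

def _z_scores(data):
    """Deviations from the mean in units of the sample standard deviation."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return [(x - mean) / s for x in data]

def skew(data):
    """Sample skewness; 0 for symmetric data. Needs at least 3 points."""
    n = len(data)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in _z_scores(data))

def kurt(data):
    """Sample excess kurtosis; near 0 for normal data. Needs at least 4 points."""
    n = len(data)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(z ** 4 for z in _z_scores(data)) - correction

values = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]  # made-up values
print(f"skew={skew(values):.6f} kurt={kurt(values):.6f}")
```

A positive skew means a longer right tail, and a negative kurt means the distribution is flatter than a normal one.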

27 Skewness rule of thumb (https://brownmath.com/stat/shape.htm)

28 CSC 152 (Blum)

29 Histogram alternative
The rest shows how to make a histogram using the frequency data we just generated.

30 Histogram: highlight columns C & D and choose Insert Column Chart

31 Right click and choose Select Data

32 Highlight Series 1 and click Remove

33 Highlight Series 2 and click Edit under Horizontal Axis Labels

34 Then highlight the C column and click OK

35 Right click on the x axis, choose Format, choose Number, change the category to Number and set Decimal places to 1

36 Result so far

37 Choose a Layout (e.g. 7) and label axes

38 Add a title (if you haven’t got one already)
Design tab, Add Chart Element, …

39 Right click on the columns and choose Format Data Series
Choose the gap width to be 0%.

40 Choose a Solid Line and a Border Color.

41 Result

