4. Interpreting sets of data

Presentation on theme: "4. Interpreting sets of data"— Presentation transcript:

1 4. Interpreting sets of data
Cambridge University Press  G K Powers 2013 Study guide Chapter 2

2 Grouped frequency tables
Classes or groups are listed in the first column in ascending order. The tally column records one mark for each score that falls in a class. The frequency column shows the total count of the scores in each class.
HSC Hint – The class centre is the middle of the class and is calculated by adding the two extremes of the class and dividing by 2.
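A minimal sketch of how a grouped frequency table could be built. The scores and the class width of 10 are invented for illustration and are not from the slides; the class centre follows the hint above.

```python
from collections import Counter

# Hypothetical raw scores, invented for illustration.
scores = [52, 67, 71, 58, 63, 79, 55, 66, 74, 61, 68, 70]

CLASS_WIDTH = 10  # assumed class width


def grouped_frequency(data, width):
    """Return rows of (lower, upper, class_centre, frequency) in ascending order.

    Classes with no scores are left out of this sketch.
    """
    counts = Counter((x // width) * width for x in data)
    rows = []
    for lower in sorted(counts):
        upper = lower + width - 1
        centre = (lower + upper) / 2  # add the two extremes, divide by 2
        rows.append((lower, upper, centre, counts[lower]))
    return rows


table = grouped_frequency(scores, CLASS_WIDTH)
for lower, upper, centre, freq in table:
    print(f"{lower}-{upper}  centre={centre}  frequency={freq}")
```

With these scores the classes 50-59, 60-69 and 70-79 have frequencies 3, 5 and 4, and the frequencies add up to the 12 scores entered.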

3 Cumulative frequency
Cumulative frequency is the frequency of a score plus the frequencies of all the scores less than that score. It is the progressive total of the frequencies.

Score   Frequency   Cumulative frequency
18      1           1
19      5           6
20      3           9
21      7           16

HSC Hint – The last number in the cumulative frequency column equals the total number of scores.
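The progressive total can be sketched in a few lines. The scores and frequencies are those in the table on this slide; the variable names are illustrative only.

```python
from itertools import accumulate

# Scores and frequencies from the slide's table.
scores = [18, 19, 20, 21]
frequencies = [1, 5, 3, 7]

# Each cumulative frequency is the running total of the frequencies so far.
cumulative = list(accumulate(frequencies))

for score, freq, cum in zip(scores, frequencies, cumulative):
    print(score, freq, cum)

# The last cumulative frequency is the total number of scores.
total_scores = cumulative[-1]
```

Here `cumulative` is `[1, 6, 9, 16]`, so the last entry, 16, is the total number of scores.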

4 Cumulative frequency graphs
[Figures: cumulative frequency histogram; cumulative frequency polygon]
HSC Hint – The cumulative frequency polygon joins the top right corners of the rectangles in a cumulative frequency histogram.

5 Mean
The mean is a measure of the centre. It is calculated by summing all the scores and dividing by the number of scores: x̄ = Σx / n, or x̄ = Σfx / Σf for data in a frequency table.
Σ – ‘Sum of’ (Greek capital letter sigma)
x – A score or data value
x̄ – Mean of a set of scores
n – Total number of scores
f – Frequency
HSC Hint – Make sure all previous data has been cleared before using the calculator for statistics.
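As a worked example, the mean of the frequency table on slide 3 can be sketched as follows. The variable names are illustrative; the calculation is the sum of each score multiplied by its frequency, divided by the total number of scores.

```python
# Scores and frequencies from the cumulative frequency table on slide 3.
scores = [18, 19, 20, 21]
frequencies = [1, 5, 3, 7]

n = sum(frequencies)  # total number of scores
sum_fx = sum(f * x for f, x in zip(frequencies, scores))  # sum of f times x
mean = sum_fx / n

print(f"sum of fx = {sum_fx}, n = {n}, mean = {mean}")
```

For this table the sum of fx is 18 + 95 + 60 + 147 = 320 and n is 16, so the mean is 20.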

6 Mode
The mode is the score that occurs most often, that is, the score with the highest frequency.
To find the mode:
1. Determine the number of times each score occurs.
2. The mode is the score that occurs most often. If two or more scores share the highest frequency, each of them is regarded as a mode.
HSC Hint – Data is called bimodal if it contains two modes.
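A short sketch of finding the mode or modes, using a small data set invented for illustration. It relies on `statistics.multimode` from the Python standard library (Python 3.8 or later), which returns every score that shares the highest frequency.

```python
from collections import Counter
from statistics import multimode

# Hypothetical data set, invented for illustration.
data = [3, 5, 5, 7, 8, 8, 9]

# Step 1: count how many times each score occurs.
counts = Counter(data)

# Step 2: every score with the highest frequency is a mode.
modes = multimode(data)

is_bimodal = len(modes) == 2
print(counts, modes, is_bimodal)
```

In this data set both 5 and 8 occur twice, so the data is bimodal.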

7 Median
The median is the middle score or value when the data is arranged in order.
A cumulative frequency polygon is used to estimate the median: read across from half the total number of scores on the cumulative frequency axis to the polygon, then down to the score axis.
HSC Hint – The total number of scores is the value of the cumulative frequency for the last score or class.
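For ungrouped data the median can also be found directly. The sketch below expands the frequency table from slide 3 into an ordered list of scores and takes the middle value; this is an exact calculation for illustration, not the graphical estimate read from a polygon.

```python
from statistics import median

# Scores and frequencies from the cumulative frequency table on slide 3.
scores = [18, 19, 20, 21]
frequencies = [1, 5, 3, 7]

# Write out every score as many times as it occurs (already in ascending order).
expanded = [x for x, f in zip(scores, frequencies) for _ in range(f)]

# With 16 scores, the median is the average of the 8th and 9th scores.
middle = median(expanded)
print(expanded, middle)
```

The 8th and 9th scores are both 20, so the median is 20.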

8 Range and interquartile range
Range = Highest score – Lowest score
The interquartile range is the difference between the third quartile and the first quartile: IQR = Q3 – Q1.
To calculate the interquartile range (IQR):
1. Arrange the data in increasing order.
2. Divide the data into two equal-sized groups. If n is odd, omit the median.
3. Find Q1, the median of the first group.
4. Find Q3, the median of the second group.
5. Calculate the interquartile range, Q3 – Q1.
HSC Hint – Unlike the range, the interquartile range is not dependent on the extreme values.
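The steps on this slide can be sketched as a small function. The data set is invented for illustration, and `quartiles` is a hypothetical helper name. Library routines may use a different quartile convention and give slightly different values, so the slide's method is coded directly.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) using the method on this slide."""
    ordered = sorted(data)          # arrange in increasing order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # first group
    # If n is odd, skip the median itself when forming the second group.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


# Hypothetical data set with an odd number of scores.
data = [2, 4, 5, 7, 8, 10, 12, 13, 15]

q1, q3 = quartiles(data)
iqr = q3 - q1
data_range = max(data) - min(data)
print(q1, q3, iqr, data_range)
```

Here the median 8 is omitted, Q1 is the median of 2, 4, 5, 7 (4.5) and Q3 is the median of 10, 12, 13, 15 (12.5), giving an interquartile range of 8 and a range of 13.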

9 Standard deviation
The standard deviation is a measure of the spread of data about the mean. Two calculations are used for standard deviation.
Population standard deviation (σₙ) is the better measure when we have all of the data or the entire population.
Sample standard deviation (σₙ₋₁) is the better measure when a sample is taken from a large population.
HSC Hint – Either the population standard deviation or the sample standard deviation can be used if it is not specified.
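A brief sketch comparing the two calculations on a data set invented for illustration. `statistics.pstdev` divides by n (population) and `statistics.stdev` divides by n – 1 (sample); both are standard-library functions.

```python
from statistics import pstdev, stdev

# Hypothetical data set, invented for illustration. Its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = pstdev(data)  # divides the sum of squared deviations by n
sample_sd = stdev(data)       # divides the sum of squared deviations by n - 1

print(round(population_sd, 3), round(sample_sd, 3))
```

The squared deviations from the mean sum to 32, so the population standard deviation is the square root of 32/8, which is 2. The sample standard deviation is the square root of 32/7, about 2.14, and is always the larger of the two for the same data.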

10 Investigating sets of data
An outlier is a score that is separated from the majority of the data. Outliers have little effect on the mean, median and mode for large sets of data. However, in small data sets, the presence of an outlier will have a large effect on the mean, a smaller effect on the median and usually no effect on the mode.
The shape of a graph is described in terms of smoothness, symmetry and the number of modes.
HSC Hint – An outlier is a score that is not close to any other scores. It is not typical.
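The effect of an outlier on a small data set can be checked numerically. The scores below, including the outlier of 30, are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical small data set, and the same set with one outlier added.
small = [4, 5, 5, 6, 7]
with_outlier = small + [30]

before = (mean(small), median(small), mode(small))
after = (mean(with_outlier), median(with_outlier), mode(with_outlier))

print("mean, median, mode without outlier:", before)
print("mean, median, mode with outlier:   ", after)
```

Adding the outlier moves the mean from 5.4 to 9.5 and the median from 5 to 5.5, and leaves the mode at 5.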

11 Symmetry and skewness
No skew (symmetric): Data is symmetrical and balanced about a vertical line.
Positively skewed: Most of the data is on the left side. The long tail is on the right side.
Negatively skewed: Most of the data is on the right side. The long tail is on the left side.
HSC Hint – Mean, mode and median are equal when the data is symmetrical.

12 Number of modes
Unimodal: Data has only 1 mode or peak.
Bimodal: Data has 2 modes or peaks.
Multimodal: Data has many modes or peaks.
HSC Hint – List all the modes if the data is multimodal.

13 Double stem-and-leaf plots
A stem-and-leaf plot has the tens digits of the data (the ‘stems’) written in numerical order down the page. The ‘units’ digits become the ‘leaves’ and are written in numerical order across the page. A double stem-and-leaf plot shows two sets of data on either side of a shared stem.
HSC Hint – The numbers in the ‘leaves’ of a stem-and-leaf plot must be written in increasing order.
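A minimal sketch of building a single stem-and-leaf plot for two-digit data. The data set is invented and `stem_and_leaf` is a hypothetical helper name. A double plot would apply the same function to each data set and print the two lists of leaves on either side of the shared stems.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each tens digit (stem) to its units digits (leaves) in increasing order."""
    plot = defaultdict(list)
    for value in sorted(data):       # sorting first keeps the leaves in order
        plot[value // 10].append(value % 10)
    return dict(plot)


# Hypothetical two-digit data set.
data = [23, 31, 27, 45, 38, 21, 34, 30, 46]

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

The stems 2, 3 and 4 run down the page with leaves 1 3 7, 0 1 4 8 and 5 6 across the page.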

14 Double box-and-whisker plots
A box-and-whisker plot is a graph that uses the five-number summary: lower extreme, lower quartile, median, upper quartile and upper extreme. A double box-and-whisker plot shows two sets of data on the same scale.
HSC Hint – To draw a box plot, arrange the data in order before calculating the five-number summary.
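The five-number summary for each of two data sets can be sketched as below. The class scores are invented for illustration and `five_number_summary` is a hypothetical helper. The quartiles use the split-into-two-groups method from slide 8, so other software may give slightly different quartile values.

```python
from statistics import median


def five_number_summary(data):
    """Return (lower extreme, Q1, median, Q3, upper extreme)."""
    ordered = sorted(data)           # arrange the data in order first
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # If n is odd, the median itself is left out of both halves.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


# Hypothetical scores for two classes, to be drawn as a double box plot.
class_a = [55, 61, 64, 68, 70, 73, 77, 82]
class_b = [48, 59, 66, 71, 75, 80, 88, 93]

summary_a = five_number_summary(class_a)
summary_b = five_number_summary(class_b)
print("Class A:", summary_a)
print("Class B:", summary_b)
```

Class A gives 55, 62.5, 69, 75, 82 and class B gives 48, 62.5, 73, 84, 93, which are the values needed to draw the two boxes against one scale.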

15 Radar charts
A radar chart looks like a spider web and is used to compare the performance of one or more entities.
HSC Hint – Line segments in a radar chart must be constructed accurately to ensure the information is valid.

16 Area chart
An area chart is a graph consisting of different ‘areas’, each representing a data set over a period of time. The thickness of the area indicates the size of the data.
HSC Hint – To read data from an area chart, draw a vertical line and estimate the difference between the heights.

17 Comparison – Measures of location
Mean
Advantages: Easy to understand and calculate. Depends on every score. Varies least from sample to sample.
Disadvantages: Distorted by outliers. Not suitable for categorical data.
Median
Advantages: Easy to understand. Not affected by outliers.
Disadvantages: May not be central. Varies more than the mean from sample to sample.
Mode
Advantages: Easy to determine. Not affected by outliers. Suitable for categorical data.
Disadvantages: May be no mode or more than one mode. May not be central.

18 Comparison – Measures of spread
Range
Advantages: Easy to understand. Easy to calculate.
Disadvantages: Dependent on the smallest and largest values. May be distorted by outliers.
Interquartile range
Advantages: Easy to determine for small data sets. Not affected by outliers.
Disadvantages: Difficult to calculate for large data sets. Dependent on lower and upper quartiles. Data needs to be sorted.
Standard deviation
Advantages: Depends on every score.
Disadvantages: Difficult to determine without a calculator. Difficult to understand.

19 Two-way tables
A two-way table presents data using rows and columns. Data in a cell is interpreted by reading the headings for the row and the column.
HSC Hint – Calculate the totals across each row and down each column. Then add the row totals and add the column totals. The two results should be equal.
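The check described in the hint can be sketched as follows. The table, its headings and its counts are invented for illustration.

```python
# Hypothetical two-way table: year group (rows) by mode of travel (columns).
table = {
    "Year 11": {"Bus": 12, "Car": 8},
    "Year 12": {"Bus": 9, "Car": 11},
}
columns = ["Bus", "Car"]

# Totals across each row and down each column.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
column_totals = {col: sum(table[row][col] for row in table) for col in columns}

# Adding the row totals and adding the column totals should give the same result.
grand_from_rows = sum(row_totals.values())
grand_from_columns = sum(column_totals.values())

print(row_totals, column_totals, grand_from_rows, grand_from_columns)
```

Both grand totals are 40 here. If they differed, one of the row or column totals would contain an arithmetic error.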

