Presentation on theme: "2007© BOLD Educational Software"— Presentation transcript:

Descriptive Statistics: Frequency Distributions and Graphic Presentations
DESCRIPTIVE STATISTICS is the use of statistics to summarize and describe a group of scores or a set of data.
A FREQUENCY DISTRIBUTION is a list of all the scores for a sample or population indicating the number of times that each score occurs; the data are grouped into mutually exclusive categories showing the number of observations in each category.
A SAMPLE is a subset of elements from a population: those elements or subjects included in a given research study.
A POPULATION consists of all scores or members of a group that are of interest to a researcher; it is the group to which the researcher wishes to generalize.

What Is the Question?
Descriptive statistics answers the question: What do these data look like? Remember that research is the careful, systematic, patient investigation undertaken to discover or establish facts and relationships, and to contribute to general knowledge. It starts with a PROBLEM TO BE SOLVED; that problem is then stated in the form of a RESEARCH QUESTION. However, when the study is conducted, other researchers READING the study need to know more than just the answers found to the specific research question. They also need to know who the research population was, who was included in the sample, and what the population and sample ‘look like’ in terms of the key variables in the study. Remember that the POPULATION consists of all scores or members of a group that are of interest to a researcher (the group to which the researcher wishes to generalize), while the SAMPLE is a subset of elements from a population (those elements or subjects included in a given research study). The VARIABLES in the study are the characteristics of people, places, events, or things that change, i.e., measurements that can take on more than one value.

Frequency Distribution
Frequency distribution: a list of all the scores for a sample or population indicating the number of times that each score occurs. The data are grouped into mutually exclusive categories showing the number of observations in each category. Frequency distributions answer the research question, “Which scores occurred, and how often did each occur?” A SIMPLE FREQUENCY DISTRIBUTION shows the number of times each score occurred in a data set. The symbol for a score’s frequency is f. To find the score’s frequency, or f, simply count how many times that score occurred in the data. Remember that “mutually exclusive” means that an individual or item included in one category must be excluded from every other category.
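The counting step described above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical values invented for the example, and `simple_frequency` is a made-up helper name, not something from the presentation.

```python
from collections import Counter

def simple_frequency(scores):
    """Return (score, f) pairs sorted by score, where f is how often the score occurs."""
    counts = Counter(scores)
    return sorted(counts.items())

# Hypothetical quiz scores, invented for illustration only.
scores = [3, 5, 5, 2, 3, 5, 4, 3, 5, 2]

print("Score   f")
for score, f in simple_frequency(scores):
    print(f"{score:>5} {f:>3}")
```

Because every score is counted under exactly one value, the categories are mutually exclusive and the f column adds up to the number of scores.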

Practice: Describe the Distribution
46 students took an 18-question quiz. Describe the distribution.

Answers correct    f (students)
0 up to 3           1
3 up to 6           1
6 up to 9           4
9 up to 12          5
12 up to 15        10
15 up to 18        25
Total              46

Note: read “zero, up to, but not including 3; 3 up to, but not including 6”, etc. HINT: when trying to describe a distribution, look for tendencies in the data: are there large groups and small groups?

Practice: Describe the Distribution
Answers correct    f (students)
0 up to 3           1
3 up to 6           1
6 up to 9           4
9 up to 12          5
12 up to 15        10
15 up to 18        25
Total              46

Out of 46 students, the largest number of students (25) got 15, 16, 17, or 18 answers correct. Only 2 students got fewer than 6 answers correct. There is no one right answer, but there are some wrong ways to interpret the data. If your interpretation doesn’t simplify the chart, then it probably is TOO descriptive! Often, in research, we let the frequency distribution ‘speak’ for itself, but you, as another researcher reading the study, need to know how to interpret the distribution.

Creating a Relative Frequency Distribution
Answers correct    f     Relative Frequency (%)    Computation
0 up to 3           1      2.17                    1/46
3 up to 6           1      2.17                    1/46
6 up to 9           4      8.70                    4/46
9 up to 12          5     10.87                    5/46
12 up to 15        10     21.74                    10/46
15 up to 18        25     54.35                    25/46
Total              46    100.00

A distribution based on the relative frequency of the scores in each group is called a RELATIVE FREQUENCY DISTRIBUTION. A relative frequency distribution shows us the percentage in each class or group. To create a Relative Frequency Distribution table, first create a simple frequency table. Next, show the relative frequency of each class in the third column. To compute the relative frequency, simply divide the frequency (f) of the class by the total. So here, approximately 54% of the students fell into the 15 up to 18 class.
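As a quick check of the arithmetic, the division of each class frequency by the total can be sketched in Python. The class frequencies are the ones on this slide (the frequency of 1 for the 3 up to 6 class is inferred from the total of 46 and the 2.17% listed for that class); the function name `relative_frequencies` is an illustrative choice.

```python
def relative_frequencies(freqs):
    """Divide each class frequency by the total and express the result as a percentage."""
    total = sum(freqs.values())
    return {label: 100 * f / total for label, f in freqs.items()}

# Class frequencies from the quiz example (46 students in total).
quiz = {
    "0 up to 3": 1,
    "3 up to 6": 1,   # inferred from the total and the listed 2.17%
    "6 up to 9": 4,
    "9 up to 12": 5,
    "12 up to 15": 10,
    "15 up to 18": 25,
}

for label, pct in relative_frequencies(quiz).items():
    print(f"{label:<12} {quiz[label]:>3} {pct:>7.2f}%")
```

The percentages sum to 100, which is a handy sanity check when building such a table by hand.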

Practice: Where Do the Data Tend to Cluster?
Ages           f
10 up to 20     2
20 up to 30     1
30 up to 40     5
40 up to 50    20
50 up to 60    25
60 up to 70     3
70 up to 80     4
Total          60

The data tend to cluster between 40 and 59. HINT: look for major trends in the data.

Practice: Describe the Distribution
Ages           f
10 up to 19     2
20 up to 29     1
30 up to 39     5
40 up to 49    20
50 up to 59    25
60 up to 69     3
70 up to 79     4
Total          60

The ages range from a low of 10 to a high of 79, with some clustering in the 40 to 59 age brackets. Don’t get fancy with your writing; just say it ‘like it is’. Note that in THIS distribution, the classes are inclusive, so the first class ends in 19 and the second class begins with 20.

Practice: Determine the Relative Frequency Distribution

Ages           f     Relative Freq.
10 up to 19     2    2/60
20 up to 29     1    1/60
30 up to 39     5
40 up to 49    20
50 up to 59    25
60 up to 69     3
70 up to 79     4
Total          60    100.00

Simply take the frequency of each group or class, and divide it by the total. That will give you the relative frequency of each group.

Charts and Graphs
Remember, the purpose of DESCRIPTIVE STATISTICS is to summarize and describe a group of scores or a set of data. Often, a picture is worth a thousand words!

Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are histograms, bar charts, and frequency polygons. There are actually many types of charts and graphs. If you come across a chart or a graph you don’t understand, you may want to keep a statistics textbook handy, or refer to the HELP menu in SPSS or other statistical packages. SPSS can create the following charts/graphs:
Simple, clustered, or stacked BAR CHARTS
Simple, multiple, or drop line LINE CHARTS
Simple or stacked AREA CHARTS
PIE CHARTS
Five different types of HIGH-LOW CHARTS
Simple or stacked PARETO CHARTS
Four different types of CONTROL CHARTS
Simple or clustered BOX PLOTS
Simple or clustered ERROR BAR CHARTS
Four different types of SCATTER PLOTS
HISTOGRAMS
PROBABILITY PLOTS (P-P plots)
QUANTILE PROBABILITY PLOTS (Q-Q plots)
TIME SERIES (SEQUENCE) PLOTS
AUTOCORRELATIONS
CROSS-CORRELATIONS
SPECTRAL PLOTS
Most of these are used for advanced statistical procedures, but if you see one, you can check the help menu in SPSS to read more about it.

Graphic Presentation of a Frequency Distribution
Histogram: a graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other. A HISTOGRAM is thus a pictorial display of how many times any given score appears in the data set.
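To make the idea concrete, here is a rough text-only sketch in Python that draws one bar per class, with the bar length equal to the class frequency. It uses the age classes from the practice slide earlier in this presentation. Note the assumption: for simplicity the bars run horizontally, so the axes are swapped relative to the definition above, and `text_histogram` is a hypothetical helper name.

```python
def text_histogram(classes):
    """Return one text row per class; the bar length equals the class frequency."""
    width = max(len(label) for label, _ in classes)
    rows = []
    for label, f in classes:
        bar = "#" * f
        rows.append(f"{label:<{width}} | {bar} ({f})")
    return rows

# Age classes and frequencies from the practice slide (60 cases in total).
ages = [
    ("10 up to 20", 2),
    ("20 up to 30", 1),
    ("30 up to 40", 5),
    ("40 up to 50", 20),
    ("50 up to 60", 25),
    ("60 up to 70", 3),
    ("70 up to 80", 4),
]

for row in text_histogram(ages):
    print(row)
```

The rows are printed with no gaps between classes, which mirrors the adjacent bars of a histogram; the clustering in the 40 to 59 range is visible at a glance.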

Histogram for Years of Experience: Note the Normal Curve Superimposed
Histograms show the distribution of a single numeric variable. The histogram shows the number of cases that fall within each interval.

Common Shapes of Histograms
Symmetric: when folded vertically, both sides are (more or less) the same.

Common Shapes of Histograms
Also symmetric

Common Shapes of Histograms
Uniform (platykurtic: flat)

Leptokurtic (high peak)

Non-Symmetric Histograms
These histograms are skewed. SKEWNESS is the lack of symmetry: a positively skewed distribution has many low scores and a tail to the right; a negatively skewed distribution has many high scores and a tail to the left. Data from a positively skewed (skewed to the right) distribution have values that are bunched together below the mean, with a long tail above the mean.
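The slide does not give a formula for skewness, so the following Python sketch makes an assumption: it uses the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), which is one common measure. The data sets are hypothetical values invented for the example.

```python
def skewness(scores):
    """Moment-based skewness coefficient: m3 / m2**1.5 (positive means a right-hand tail)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data, invented for illustration only.
right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 9]        # many low scores, long tail to the right
left_tailed = [10 - x for x in right_tailed]      # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
print(skewness(symmetric))     # zero
```

The sign of the coefficient matches the direction of the tail, which is the same idea as reading the skew off the histogram.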

Common Shapes of Histograms
Skewed Histograms
Skewed left (negative skew); skewed right (positive skew).

Skewed Histograms
Skewed left (negative skew); skewed right (positive skew). Notice that the SKEW follows the TAIL.

Common Shapes of Histograms
Bimodal: the two largest rectangles are separated by at least one class. A BIMODAL DISTRIBUTION is a frequency distribution that has two modes.

Bar Graphs
A bar graph illustrates nominal data with the scores/categories along the x-axis and the frequencies on the y-axis. The scores are not ordered. The bars do not touch (unlike a histogram). The heights correspond to the number of times the score occurs. The horizontal axis is sometimes referred to as the abscissa, although the abscissa is more correctly the actual coordinates along the x-axis. The vertical axis is sometimes referred to as the ordinate, although the ordinate is more correctly the actual coordinates along the y-axis.

Practice: Describe the Data
The majority of students in the sample were White. The next largest group was Hispanic. The smallest groups represented were Black, Asian, and Indian. Note the “count” on the y-axis. The data can be distorted by changing the spread on this axis. Be careful when viewing histograms and bar charts: always look at the increments.

Graphic Presentation of a Frequency Distribution
A frequency polygon consists of line segments connecting the points formed by the class midpoint and the class frequency. A FREQUENCY POLYGON is similar to a histogram, except that line segments are used instead of bars; the segments connect the points formed by the intersections of the class midpoints and the class frequencies.
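The points of a frequency polygon can be computed directly from the class limits. The Python sketch below assumes each class is given as a (lower limit, upper limit, frequency) triple and takes the midpoint as the average of the two limits; it reuses the age classes from the earlier practice slide, and `polygon_points` is an illustrative name.

```python
def polygon_points(classes):
    """Return (class midpoint, class frequency) pairs, one per class."""
    return [((lower + upper) / 2, f) for lower, upper, f in classes]

# Age classes from the practice slide: (lower limit, upper limit, frequency).
ages = [
    (10, 20, 2),
    (20, 30, 1),
    (30, 40, 5),
    (40, 50, 20),
    (50, 60, 25),
    (60, 70, 3),
    (70, 80, 4),
]

for midpoint, f in polygon_points(ages):
    print(f"midpoint {midpoint:>5.1f}  frequency {f:>2}")
```

Joining consecutive points with straight line segments gives the polygon.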

Frequency Polygon for Hours Spent Studying
[Figure: frequency polygon; horizontal axis: hours spent studying; vertical axis: frequency]

Compare the Frequency Polygon to the Histogram
To turn a histogram into a frequency polygon, just draw line segments connecting the top centers of adjacent bars.

Histogram Using SPSS
Go to GRAPHS, HISTOGRAM, and select the variable you want to display.

Frequency Polygon Using SPSS
To create a FREQUENCY POLYGON using SPSS, use the “LINE”, “SIMPLE” selections under ‘Graphs’.

Stem and Leaf
A Stem and Leaf Plot is a frequency distribution table that provides a visual picture of the distribution. It is similar to a histogram but includes more information: it summarizes the shape of a set of data (the distribution) and provides extra detail regarding individual values.

Stem and Leaf
Each raw score has two parts: a stem, consisting of all but the last digit, and a leaf, the last digit in the number. The stems are listed in a column. The leaves (final digits) of the scores that share a stem are placed next to each other in a row beside that stem. Then each row is sorted in numerical order. This diagram was invented by John Tukey.
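The splitting rule above can be sketched in Python as follows. The sketch assumes non-negative whole-number scores, so that `divmod(score, 10)` separates "all but the last digit" from "the last digit". The sample values are the salaries, in thousands, quoted in the salary example later in this presentation; the function names are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (all but the last digit) to its sorted list of leaves (last digits)."""
    plot = defaultdict(list)
    for score in sorted(scores):        # sorting first keeps each row in numerical order
        stem, leaf = divmod(score, 10)  # assumes non-negative whole numbers
        plot[stem].append(leaf)
    return dict(plot)

def format_plot(plot):
    """Render the plot as text rows of the form 'stem | leaves'."""
    lines = []
    for stem in sorted(plot):
        leaves = "".join(str(leaf) for leaf in plot[stem])
        lines.append(f"{stem} | {leaves}")
    return lines

# Salaries in thousands, as quoted in the salary example of this presentation.
salaries_in_thousands = [30, 15, 24, 31, 30, 27, 15, 32, 34, 33]

for line in format_plot(stem_and_leaf(salaries_in_thousands)):
    print(line)
```

The length of each row plays the role of a histogram bar, while the individual digits preserve the raw values.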

Stem and Leaf
[SPSS output: Current Salary Stem-and-Leaf Plot, with Frequency and Stem & Leaf columns; 2.00 Extremes (>=81250)]
A Stem and Leaf plot is not found under ‘graphs’ in SPSS, because it is considered a different type of frequency distribution, rather than a graph.

Each stem represents 10 thousand, so the 1 (stem) = 10,000
There are two cases (frequency = 2) at 15,000; two cases in the 20,000s (24,000 and 27,000); and six cases in the 30,000s (30, 30, 31, 32, 33, and 34 thousand) in this data set.

Stem and Leaf in SPSS
To create a stem and leaf in SPSS, select the following:
Analyze
Descriptive Statistics
Explore
Select stem and leaf in “plots”
Click continue
Click OK

Scatter plots
A scatter plot illustrates the relationship between two continuous variables. A scatterplot is a visual illustration of a correlation between two variables, one being plotted on the x-axis (horizontal) and one plotted on the y-axis (vertical).

Scatter plots
A scatter plot illustrates the values of Y (vertical axis) versus the corresponding values of X (horizontal axis).

Scatter plots
Scatter plots can provide answers to the following questions: Are variables X and Y correlated? (As one variable goes up, does the other variable go up or down?)

Scatter plots
Scatter plots can provide answers to the following questions: Is there a linear relationship between X and Y? (As one variable goes up, the other variable goes up or down.) A linear relationship means that the series of points on the scatter plot form a line or a linear shape (with points on either side of the line).
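The slides judge linearity by eye. As a numerical companion, and as an assumption not stated on the slide, the Python sketch below computes Pearson's correlation coefficient r, one common measure of the strength and direction of a linear relationship. The data are hypothetical values invented for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]
quiz_scores = [52, 60, 71, 80, 88]

print(pearson_r(hours_studied, quiz_scores))  # close to +1: points lie near a rising line
print(pearson_r([1, 2, 3], [6, 4, 2]))        # exactly on a falling line
```

Values near +1 or -1 correspond to points that hug a straight line; values near 0 suggest no linear relationship, although a curvilinear one may still be present.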

Scatter plots
Scatter plots can provide answers to the following questions: Is there a curvilinear relationship between variables X and Y? (As X goes up, Y goes up; then, past a peak, as X continues to go up, Y goes down.) A curvilinear relationship is exactly what it looks like: the points on the plot form a U shape instead of a straight line. The U can be upside down, as it is in the graphic illustration on this slide.

Scatter plots
Scatter plots can provide answers to the following questions: Are there outliers? (Do one or more points stray from the trend?) The definition of an outlier: a score with a standardized residual greater than 2 in absolute value (a score significantly more deviant than most scores). The ‘residual’ is the difference between the predicted value of the score (where the line is in the middle of the scatter plot) and the actual score (where the point falls on the scatter plot). A ‘standardized residual’ is that residual expressed in standard deviation units (see modules 3 and 4); the standard deviation is a standardized ‘ruler’ that tells you how far away from the mean a given score is.
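The outlier rule above can be sketched in Python under some simplifying assumptions: the line is fitted by ordinary least squares, and each residual is divided by the residual standard error (with n - 2 in the denominator), ignoring the leverage adjustment that statistical packages may apply. The data and function names are hypothetical, invented for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def standardized_residuals(xs, ys):
    """Residuals (actual minus predicted) divided by the residual standard error."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = sqrt(sum(r ** 2 for r in residuals) / (len(residuals) - 2))
    return [r / sd for r in residuals]

def outliers(xs, ys, cutoff=2.0):
    """Points whose standardized residual exceeds the cutoff in absolute value."""
    z = standardized_residuals(xs, ys)
    return [(x, y) for x, y, zi in zip(xs, ys, z) if abs(zi) > cutoff]

# Hypothetical data: y = 2x everywhere except one stray point at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 6, 8, 25, 12, 14, 16, 18, 20]

print(outliers(xs, ys))
```

Only the stray point is flagged; the remaining points sit close enough to the fitted line that their standardized residuals stay well under 2.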
