Frequency Tables and Single variable Graphics
FREQUENCY DISTRIBUTION Listing a large set of data does not present much of a picture to the reader. Sometimes we want to condense the data into a more manageable form. This can be accomplished with the aid of a frequency distribution.
To demonstrate the concept of a frequency distribution, let's use the following set of data: [data set shown on the original slide]. A frequency distribution is used to represent this set of data by listing the x values with their frequencies. For example, the value 1 occurs in the sample three times; the frequency for x=1 is 3.
The frequency f is the number of times the value x occurs in the sample. A table with the columns x and f is an ungrouped frequency distribution. We say ungrouped because each value of x in the distribution stands alone.
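The idea of an ungrouped frequency distribution can be sketched in a few lines of Python. This is only an illustration: the sample below is hypothetical (the original data set is not reproduced here) and was chosen so that the value 1 occurs three times, as in the text. The function name `ungrouped_frequency` is also an assumption made for this sketch.

```python
from collections import Counter

# Hypothetical sample of 20 values; NOT the data from the original slide.
sample = [3, 2, 2, 3, 2, 4, 4, 1, 2, 2, 4, 3, 2, 0, 2, 2, 1, 3, 3, 1]

def ungrouped_frequency(values):
    """Return (x, f) pairs sorted by x, where f is how often x occurs."""
    counts = Counter(values)
    return sorted(counts.items())

table = ungrouped_frequency(sample)
print("x  f")
for x, f in table:
    print(f"{x}  {f}")
```

Each distinct value keeps its own row, which is what makes the distribution ungrouped; the frequencies always add up to the sample size.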
CONSTRUCTION OF A FREQUENCY TABLE Classes: When a large set of data has many different x values instead of a few repeated values, as in the previous example, we can group the data into a set of classes and construct a frequency table. Lower and upper class limits: The lower class limit is the smallest value that could go into a class. The upper class limit is the largest value that could go into that class. Number of classes: It can take a value between 8 and 15.
Class boundaries (true class limits) are numbers that do not occur in the sample data but lie halfway between the upper limit of one class and the lower limit of the next class. Relative frequency is a proportional measure of the frequency of an occurrence. Class mark (class midpoint) is the numerical value that is exactly in the middle of each class. Class interval is the difference between a lower class limit and the next lower class limit.
The two basic guidelines that should be followed in constructing a grouped frequency distribution are: 1. Each class should be of the same width (there are some exceptions). 2. Classes should be set up so that they do not overlap and so that each piece of data belongs to exactly one class.
PROCEDURE OF CLASSIFICATION n = 155 (there are 155 observations). 1. Rank the data. 2. Identify the lowest (L) and highest (H) scores and find the range (range = H - L). 3. Select the number of classes and find the class width. L = 116, H = 315. Range = 315 - 116 = 199. Number of classes = 8. Class interval = 199/8 = 24.875, rounded to 25.
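The classification steps can be sketched as follows, using L = 116, H = 315 and 8 classes from the text. The helper name `class_limits` is hypothetical, and the sketch assumes integer data and a class width rounded up to the next whole number.

```python
import math

def class_limits(lowest, highest, n_classes):
    """Return the class width and a list of (lower, upper) class limits.

    Assumes integer-valued data; the width is rounded up so that the
    classes cover the whole range from lowest to highest.
    """
    width = math.ceil((highest - lowest) / n_classes)
    limits = []
    lower = lowest
    for _ in range(n_classes):
        limits.append((lower, lower + width - 1))
        lower += width
    return width, limits

width, limits = class_limits(116, 315, 8)
print("class width:", width)
for lower, upper in limits:
    print(f"{lower}-{upper}")
```

With these inputs the first class is 116-140 and the last is 291-315, so the classes do not overlap and every observation falls into exactly one class.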
CONSTRUCTION OF THE FREQUENCY DISTRIBUTION TABLE For a class with frequency 8: Relative frequency = 100*(8/155) ≈ 5.16 (percent).
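[Frequency distribution table with class interval C = 25. Columns include the class frequency f_i and the class mark m_i; the symbols A and L are also marked on the table.]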
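A grouped frequency table with class limits, class boundaries, class marks and relative frequencies could be assembled along these lines. The ten values below are hypothetical (the 155 original observations are not available), the function name `grouped_frequency_table` is an assumption, and the boundaries are taken as limit ± 0.5, which presumes integer data.

```python
# Hypothetical integer observations between 116 and 315; NOT the original data.
values = [116, 120, 139, 141, 150, 203, 203, 210, 250, 315]

def grouped_frequency_table(data, lowest, width, n_classes):
    """Build one row per class: limits, boundaries, mark, f, relative f (%)."""
    n = len(data)
    rows = []
    for k in range(n_classes):
        lower = lowest + k * width
        upper = lower + width - 1
        f = sum(1 for v in data if lower <= v <= upper)
        rows.append({
            "limits": (lower, upper),
            # Halfway between adjacent class limits (integer data assumed).
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,
            "f": f,
            "rel_f": 100 * f / n,
        })
    return rows

rows = grouped_frequency_table(values, lowest=116, width=25, n_classes=8)
for row in rows:
    print(row["limits"], row["boundaries"], row["mark"], row["f"], row["rel_f"])
```

The relative frequency column is computed as 100 * f / n, the same calculation used for a single class above.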
Measures of central tendency: mean or median. The mode is 203.
Measures of position: What are the 25th and 75th percentiles and the median?
Finding the 25th percentile means solving for the value x (x = ?) such that P25 = Q1 = x; 25% of the observations lie below it.
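As a rough illustration of these measures of position, the quartiles of a small raw sample can be computed with Python's standard library. The seven values are hypothetical, and the sketch works on ungrouped values with the default "exclusive" method of `statistics.quantiles`; percentiles read from a grouped frequency table, as on the slides, may differ slightly.

```python
import statistics

# Hypothetical sample of 7 values; NOT the data from the slides.
data = [116, 150, 180, 203, 240, 280, 315]

def quartiles(values):
    """Return (Q1, median, Q3), i.e. the 25th, 50th and 75th percentiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q1, q2, q3

q1, median, q3 = quartiles(data)
print(f"P25 = Q1 = {q1}, median = {median}, P75 = Q3 = {q3}")
```

About 25% of the observations lie below Q1 and about 75% lie below Q3.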
Standard deviation or coefficient of variation
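A minimal sketch of both measures, assuming the usual definitions: the sample standard deviation s, and the coefficient of variation as 100 * s / mean. The three values are hypothetical and the function name is an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """Return the coefficient of variation in percent: 100 * s / mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return 100 * s / mean

# Hypothetical values chosen so the arithmetic is easy to check by hand.
data = [10, 20, 30]
s = statistics.stdev(data)
cv = coefficient_of_variation(data)
print(f"standard deviation = {s}, coefficient of variation = {cv}%")
```

Because the coefficient of variation is relative to the mean, it can be used to compare the spread of variables measured in different units.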
Graphic Presentation of Data We will learn how to present single-variable data using graphical techniques. There are several graphical ways to describe data. The method used is determined by the type of data and the idea to be presented.
BAR GRAPH AND PIE GRAPH Bar graphs and pie (circle) graphs are often used to summarize attribute data. Data are represented by frequency or proportion. In graphical presentation, proportion is more meaningful than frequency. In a bar graph, the x-axis represents the attribute, while the y-axis (bar height) represents the proportion or frequency of each attribute. In a pie graph, each slice represents the proportion of an attribute.
Example: The marital status of a group of women is given below.
Marital status    Freq.    %
Single                     46.8
Married                    23.0
Divorced                   19.4
Widowed           10       7.2
Separated         5        3.6
Total                      100.0
[Figure: Bar chart of marital status of women. x-axis: marital status (Separated, Widowed, Divorced, Married, Single); y-axis: percent.]
[Figure: Pie chart of marital status of women. Separated 3.6%, Widowed 7.2%, Divorced 19.4%, Married 23.0%, Single 46.8%.]
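To show how a bar graph maps each attribute to a bar whose length is its proportion, here is a small text-only sketch using the percentages read from the pie chart labels above. The function name `text_bar_chart` and the scale of one `#` per percentage point are assumptions made for illustration; a plotting library would normally be used instead.

```python
# Percentages as read from the pie chart labels for marital status.
percentages = {
    "Separated": 3.6,
    "Widowed": 7.2,
    "Divorced": 19.4,
    "Married": 23.0,
    "Single": 46.8,
}

def text_bar_chart(data):
    """Return one line per attribute; bar length is the rounded percentage."""
    label_width = max(len(label) for label in data)
    lines = []
    for label, pct in data.items():
        bar = "#" * round(pct)
        lines.append(f"{label:<{label_width}} | {bar} {pct:.1f}%")
    return lines

lines = text_bar_chart(percentages)
print("\n".join(lines))
```

The same percentages would give the slice sizes of the pie graph, since each slice represents the proportion of one attribute.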
STEM AND LEAF PLOT The stem is the leading digit(s) of the data, while the leaf is the trailing digit(s). For example, the numerical value 458 might be split into stem (45) and leaf (8). This plot provides a convenient means of tallying the observations and can be used as a direct display of the data or as a preliminary step in constructing a frequency table.
Let's construct a stem-and-leaf display of the following set of 20 test scores: [scores shown on the original slide]. At a quick glance we see that there are scores in the 50s, 60s, 70s, 80s and 90s. Let's use the first digit of each score as the stem and the second digit as the leaf.
We will construct the display in a vertical position. Draw a vertical line and, to the left of it, locate the stems in order. Next we place each leaf on its stem. This is accomplished by placing the trailing digit on the right side of the vertical line, opposite its corresponding leading digit.
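The construction just described can be sketched as follows. The 20 scores are illustrative stand-ins (the scores from the original slide are not reproduced here), and the function name `stem_and_leaf` is an assumption. Leaves are placed in the order the scores are read, as in the text.

```python
from collections import defaultdict

# Illustrative set of 20 two-digit test scores; treat as hypothetical.
scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]

def stem_and_leaf(values):
    """Return display lines 'stem | leaves' with the tens digit as the stem."""
    branches = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # leading digit, trailing digit
        branches[stem].append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(branches.items())]

display = stem_and_leaf(scores)
print("\n".join(display))
```

The length of each row shows at a glance how many scores fall in the 50s, 60s, 70s, 80s and 90s.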
All scores with the same tens digit are placed on the same branch. This may not always be desired. Suppose we reconstruct the display; this time, instead of grouping ten possible values on each stem, let's group the values so that only five possible values could fall on each stem: (50-54) 5 (55-59) 5 (60-64) 6 (65-69) 6 (70-74) 7 (75-79) 7 (80-84) 8 (85-89) 8 (90-94) 9 (95-99) 9
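A sketch of the split-stem version, in which each stem appears twice (leaves 0-4 and leaves 5-9). The scores are again illustrative rather than the original data, and the function name `split_stem_and_leaf` is hypothetical. Branches with no scores are simply omitted in this sketch.

```python
# Illustrative set of 20 two-digit test scores; treat as hypothetical.
scores = [82, 74, 88, 66, 58, 74, 78, 84, 96, 76,
          62, 68, 72, 92, 86, 76, 52, 76, 82, 78]

def split_stem_and_leaf(values):
    """Return display lines with five possible values on each branch."""
    branches = {}
    for v in values:
        stem, leaf = divmod(v, 10)
        half = 0 if leaf < 5 else 1          # 0: leaves 0-4, 1: leaves 5-9
        branches.setdefault((stem, half), []).append(leaf)
    lines = []
    for (stem, half), leaves in sorted(branches.items()):
        low = stem * 10 + 5 * half
        leaf_text = " ".join(str(leaf) for leaf in leaves)
        lines.append(f"({low}-{low + 4}) {stem} | {leaf_text}")
    return lines

split_display = split_stem_and_leaf(scores)
print("\n".join(split_display))
```

Splitting the stems spreads the same data over more branches, which can reveal the shape of the distribution in more detail.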
HISTOGRAM A histogram is a type of bar graph representing the frequency distribution of quantitative data. A histogram is made up of the following components: 1. A title, which identifies the sample of concern. 2. A vertical scale, which identifies the frequencies (or relative frequencies) in the various classes. 3. A horizontal scale, which identifies the variable x (class midpoints, true class limits or lower class limits).
[Figure panels: Symmetric distribution; Right-skewed distribution; Left-skewed distribution.]
BOX PLOT (BOX AND WHISKER PLOT) The median and the first and third quartiles of the distribution are used in constructing box plots. The location of the midpoint, or median, of the distribution is indicated with a horizontal line in the box. When there are outliers or extreme observations, the straight lines, or whiskers, extend up to 1.5 times the interquartile range above the 75th percentile and below the 25th percentile. If there are none, the lines extend to the minimum and maximum values. Cases with values between 1.5 and 3 box lengths from the upper or lower edge of the box are called outliers. Cases with values more than 3 box lengths from the upper or lower edge of the box are called extreme points.
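The outlier and extreme-point rules above can be sketched as follows. The data are hypothetical, the function name `classify_box_plot_points` is an assumption, and the quartiles come from the default method of `statistics.quantiles`; statistical packages may compute quartiles slightly differently, which can change borderline cases.

```python
import statistics

# Hypothetical sample with one outlier and one extreme point.
data = [10, 12, 13, 14, 15, 16, 17, 18, 20, 32, 60]

def classify_box_plot_points(values):
    """Return outliers (1.5 to 3 box lengths from the box) and extremes (> 3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    box_length = q3 - q1                      # interquartile range
    result = {"outliers": [], "extremes": []}
    for v in values:
        if v < q1:
            distance = q1 - v                 # below the lower edge of the box
        elif v > q3:
            distance = v - q3                 # above the upper edge of the box
        else:
            continue                          # inside the box
        if distance > 3 * box_length:
            result["extremes"].append(v)
        elif distance > 1.5 * box_length:
            result["outliers"].append(v)
    return result

flags = classify_box_plot_points(data)
print(flags)
```

Here the box runs from 13 to 20, so the box length is 7: the value 32 lies 12 units above the box (between 10.5 and 21) and is an outlier, while 60 lies 40 units above it and is an extreme point.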
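[Figure: box plot of BWT with a table of summary statistics (minimum, percentiles, maximum, range). Since there are no outliers, the whiskers extend to the minimum and maximum values. Values legible in the table: minimum 1588; 25th percentile 2553.5; median 3110.0; 75th percentile 3677.]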
[Figure: relative positions of the mode, median and mean in left-skewed, right-skewed and symmetric distributions.]
SCATTER PLOT WITH ONE VARIABLE A scatter plot displays the value of each observation as a small circle on an invisible line that is parallel to the y-axis, which displays the original measurement. [Figure: one-variable scatter plot of BWT.]
LINE GRAPH In a line graph, individual data points are connected by a line. Line graphs provide a simple way to visually present a sequence of many values.
The distribution of measles cases among the seasons in an area is as follows: Spring 75, Summer 25, Fall 50, Winter 100. [Figure: line graph of frequency by season (Winter, Fall, Summer, Spring).]
ERROR BARS Error bars help you visualize distributions and dispersion by indicating the variability of the measure being displayed. The mean of a scale variable is plotted for a set of categories, and the length of the bar on either side of the mean value indicates standard deviations. Error bars can extend in one direction or both directions from the mean. Error bars are sometimes displayed in the same chart with other chart elements such as bars or lines.
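The numbers behind an error bar chart can be sketched as below: for each category, the mean and the points one standard deviation on either side of it. The categories and values are hypothetical, the function name `error_bar_limits` is an assumption, and one sample standard deviation is used as the bar length purely for illustration.

```python
import statistics

# Hypothetical measurements of a scale variable in two categories.
groups = {
    "A": [10, 20, 30],
    "B": [4, 6, 8],
}

def error_bar_limits(data):
    """Return (mean - sd, mean, mean + sd) for each category."""
    limits = {}
    for name, values in data.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)   # sample standard deviation
        limits[name] = (mean - sd, mean, mean + sd)
    return limits

bars = error_bar_limits(groups)
for name, (low, mean, high) in bars.items():
    print(f"{name}: mean = {mean}, error bar from {low} to {high}")
```

A bar that extends in only one direction would use just one of the two end points.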