Statistical Analysis - Chapter 2 “Organizing and Analyzing Data” Fashion Institute of Technology Dr. Roderick Graham.


Showing Data Graphically
- When we collect a sample, we first want to get a picture of how the data “looks”.
- A graph lets us show our “stakeholders” the patterns in the data easily.
- What do we mean by “stakeholders”?
- Three of the ways to show data are histograms, frequency polygons, and circle graphs.

Showing Data Graphically
- Look at the listing of numbers on p. 17. This is called “ungrouped” data.
- Sometimes it is better to “group” data into categories, because grouped data is easier to represent graphically (p. 18).

Histograms
Let’s look at the move from “ungrouped data” to a histogram in your textbook (pp. 17 – 18):
1. Start with a survey of numbers, or “ungrouped data”.
2. Decide on the categories you want to use, and “group” each number into the category that fits it.
3. The data has now been changed from a series of ages to GROUPS of ages.
4. We can compute statistics for both grouped and ungrouped data.

Histograms
- Let’s figure out this histogram (taken from actual data I am using). The age categories are:
  1 = 18 – 24, 2 = 25 – 34, 3 = 35 – 44, 4 = 45 – 54, 5 = 55 – 64, 6 = 65+
- How many people are between ages 45 and 54?
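Steps 2 and 3 of the histogram slide (placing each ungrouped value into a category, then counting) can be sketched in Python using the age categories above. This is a minimal illustration, not the textbook’s procedure: the helper names `age_category` and `group_ages` are hypothetical, the ages in `sample_ages` are made up for the example (the data behind the histogram is not reproduced here), and ages are assumed to be whole years.

```python
from collections import Counter

# Age categories from the histogram slide: code -> (lowest age, highest age).
# None as the upper bound stands for "65 and over".
AGE_CATEGORIES = {
    1: (18, 24),
    2: (25, 34),
    3: (35, 44),
    4: (45, 54),
    5: (55, 64),
    6: (65, None),
}


def age_category(age):
    """Hypothetical helper: category code (1-6) for an age, or None if under 18."""
    for code, (low, high) in AGE_CATEGORIES.items():
        if age >= low and (high is None or age <= high):
            return code
    return None


def group_ages(ages):
    """Hypothetical helper: turn ungrouped ages into {category code: count}."""
    counts = Counter(age_category(a) for a in ages)
    return {code: counts.get(code, 0) for code in AGE_CATEGORIES}


# Made-up ages, for illustration only.
sample_ages = [19, 22, 23, 31, 38, 45, 47, 52, 60, 70]
table = group_ages(sample_ages)
for code, (low, high) in AGE_CATEGORIES.items():
    label = f"{low}+" if high is None else f"{low}-{high}"
    print(f"{code} = {label}: {table[code]}")
```

The frequency table this produces (category code and count) is exactly what the bars of a histogram display.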

Frequency Polygon (Line Graph)
- A frequency polygon is a line graph that traces the shape of a histogram.
- When you have “too many bars” (categories), you may want to use a line graph instead.
- A line graph can show trends more easily than a histogram.

Circle Graph
- Circle graphs show what percentage (proportion) of a sample falls into each category.
- Your textbook goes into some detail about how to draw circle graphs with a protractor; lucky for us, we have Excel!
- Below is an example from the CDC showing the percentage of HIV infections that came from each route of transmission.
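Excel draws the circle graph for us, but the arithmetic behind drawing one by hand is short: each slice’s percentage is its count divided by the total, and its angle is that same proportion of 360 degrees. The sketch below uses a hypothetical helper, `circle_graph_slices`, and borrows the GPA counts from the grouped-data example later in these slides.

```python
def circle_graph_slices(counts):
    """Hypothetical helper: {category: (percentage of total, slice angle in degrees)}."""
    total = sum(counts.values())
    return {
        label: (100 * n / total, 360 * n / total)
        for label, n in counts.items()
    }


# GPA counts from the grouped-data example in these slides (N = 101).
gpa_counts = {"3.5-4.0": 15, "3.0-3.49": 25, "2.0-2.9": 50, "Below 2.0": 11}
for label, (pct, degrees) in circle_graph_slices(gpa_counts).items():
    print(f"{label}: {pct:.1f}% of the sample, {degrees:.1f} degrees")
```

The angles always add up to 360 degrees and the percentages to 100%, which is a quick way to check a hand-drawn circle graph.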

Key Points
It is up to you (the researcher) to decide which graph best presents your data. For me:
1. If I am showing a small number of categories, I use a histogram.
2. If I am showing trends over time, or a large number of categories, I use a line graph.
3. If I want to show percentages, I use a circle graph (this is always the best way to show percentages).

Our first “statistics”
- Remember that statistics are values we compute from the sample of data we have collected.
- We will learn two basic and important types of statistics:
  - Measures of Central Tendency: what are the middle values of our data?
  - Measures of Dispersion or Spread: how diverse is our data, or how widely scattered is it?
- You can compute these statistics for both grouped and ungrouped data.

Measures of Central Tendency (ungrouped)
- What if we had collected data on one measure and wanted to know its middle value?
  - Ex. What is the middle value, in age, for those who listen to Lady Gaga?
  - Ex. How many times do young Hispanic women typically report shopping at H&M?
- Knowing this middle, or central, value is important for describing our data.
- There are three measures of central tendency…

Measures of Central Tendency (ungrouped)
- Mean (p. 24): the mathematical average of a set of numbers.
- Median (p. 26): the middle value of a set of data that has been arranged from lowest to highest.
- Mode (p. 27): the value that occurs most often in a set of data.
- Income is a good way to discuss these three measures. Imagine that we wanted to know the average income of FIT students, and that we took a random sample of FIT students’ incomes…

Measures of Central Tendency (ungrouped)
- The sample gives these values: 5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
- The Mean: this is the average.
  - Sum of values = 271500
  - Total N = 15
  - Mean = 271500 / 15 = 18100
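The same mean calculation as a short Python check, using the sample from this slide:

```python
incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]

total = sum(incomes)   # sum of values
n = len(incomes)       # total N
mean = total / n       # the average
print(total, n, mean)  # 271500 15 18100.0
```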

Measures of Central Tendency (ungrouped)
- The sample gives these values: 5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
- The Median: this is the middle value once the data is sorted from lowest to highest:
  5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
- With N = 15, the median is the 8th value: 11500.
- When there are two middle values (an even N), we average the two.
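A minimal median sketch in Python. `median` here is an illustrative helper (not a textbook formula) that sorts the data and, for an even N, averages the two middle values as the slide describes:

```python
def median(values):
    """Illustrative helper: middle value of the sorted data (average of two if N is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd N: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the two middle values


incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]
print(median(incomes))       # 11500
print(median([1, 2, 3, 4]))  # 2.5
```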

Measures of Central Tendency (ungrouped)
- The sample gives these values: 5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
- The Mode: this is the value that occurs most often. In the sorted list, 6000 appears four times:
  5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
- The mode here is 6000.
- Sometimes there is no mode, or even two modes!
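A sketch that handles the “no mode” and “two modes” cases the slide mentions. The helper `modes` is hypothetical, and it assumes one common convention: if every value occurs only once, there is no mode. (Python’s `statistics.multimode` also returns every tied value, but it returns all values when they are all unique, so this sketch handles that case itself.)

```python
from collections import Counter


def modes(values):
    """Hypothetical helper: every value tied for the highest count.

    Returns an empty list when there is no mode, which this sketch takes
    to mean every value occurs only once (conventions differ).
    """
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]
print(modes(incomes))          # [6000]
print(modes([1, 2, 2, 3, 3]))  # two modes: [2, 3]
print(modes([1, 2, 3]))        # no mode: []
```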

Measures of Central Tendency (ungrouped)
- So, given these values: 5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
- …what is the best measure of central tendency for this random sample of FIT students?
  - Mean? 18100
  - Median? 11500
  - Mode? 6000
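One way to reason about this question is to see how each measure reacts to the single extreme income (110000). This sketch, an illustration rather than an answer key, uses Python’s standard `statistics` module to compare the mean and median with and without that value:

```python
from statistics import mean, median

incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]
# Drop the single largest value (110000) to see which measure it was pulling on.
without_top = sorted(incomes)[:-1]

print(f"All 15 values:  mean {mean(incomes):,.0f}, median {median(incomes):,.0f}")
# All 15 values:  mean 18,100, median 11,500
print(f"Without 110000: mean {mean(without_top):,.0f}, median {median(without_top):,.0f}")
# Without 110000: mean 11,536, median 11,250
```

Removing one person moves the mean by more than 6,000 but the median by only 250; that sensitivity to extreme values is worth weighing when choosing a measure.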

Measures of Dispersion or Spread (ungrouped)
- Range (p. 29): the highest value minus the lowest value.
  - From our last example, the range is 110000 – 5000 = 105000.
- Standard Deviation (pp. 29 – 35): roughly, how far the values typically fall from the mean.
  - This is best shown through an example…
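The range calculation as a quick Python check on the same sample:

```python
incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]

# Range = highest value minus lowest value.
data_range = max(incomes) - min(incomes)
print(data_range)  # 105000
```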

Measures of Dispersion or Spread (ungrouped): Standard Deviation
- Let’s return to our FIT random sample: 5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
- Follow these steps while we calculate the standard deviation as a class on the board:
1. Calculate the mean, which is 18100.
2. Find the distance of each value from the mean (X – x-bar).
3. Square each distance.
4. Add up the squared distances and divide by the sample size minus 1 (here, 15 – 1 = 14). This number is called the variance.
5. Take the square root of the variance.
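The five steps map directly onto a short function. This is a sketch assuming the n – 1 divisor from step 4 (the sample standard deviation); `sample_standard_deviation` is an illustrative name, not a textbook function.

```python
import math


def sample_standard_deviation(values):
    """Illustrative helper following the slide's five steps (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n                    # 1. the mean
    distances = [x - mean for x in values]    # 2. distance of each value from the mean
    squared = [d ** 2 for d in distances]     # 3. square each distance
    variance = sum(squared) / (n - 1)         # 4. add up, divide by n - 1
    return math.sqrt(variance)                # 5. square root of the variance


incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]
print(round(sample_standard_deviation(incomes)))  # 26219
```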

Standard Deviation
X | Mean (x-bar) | X – x-bar | (X – x-bar)²
5000 | 18100 | -13100 | 171,610,000
6000 | 18100 | -12100 | 146,410,000
6000 | 18100 | -12100 | 146,410,000
6000 | 18100 | -12100 | 146,410,000
6000 | 18100 | -12100 | 146,410,000
8000 | 18100 | -10100 | 102,010,000
11000 | 18100 | -7100 | 50,410,000
11500 | 18100 | -6600 | 43,560,000
12000 | 18100 | -6100 | 37,210,000
13000 | 18100 | -5100 | 26,010,000
15000 | 18100 | -3100 | 9,610,000
15000 | 18100 | -3100 | 9,610,000
17000 | 18100 | -1100 | 1,210,000
30000 | 18100 | 11900 | 141,610,000
110000 | 18100 | 91900 | 8,445,610,000

Standard Deviation
- We sum the (X – x-bar)² column, which gives 9,624,100,000.
- Dividing this sum by n – 1 = 14 gives the variance, about 687,435,714.
- The square root of the variance is the standard deviation: approximately 26,219.
- Right now, this number means very little, but in the following chapters we will gain a better understanding of the standard deviation.
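To check a hand calculation, Python’s standard `statistics` module offers both versions: `statistics.stdev` divides by n – 1, as in step 4 of the standard deviation slide, and `statistics.pstdev` divides by n (the population formula). A quick comparison on the FIT sample:

```python
import statistics

incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(round(statistics.stdev(incomes)))   # 26219 (divides by n - 1, the sample formula)
print(round(statistics.pstdev(incomes)))  # 25330 (divides by n, the population formula)
```

The two results differ by almost 900 here, so with a sample of 15 it is worth knowing which divisor a formula or tool uses.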

Measures of Central Tendency and Dispersion (Grouped Data)
- Remember that grouped data is a collection of data that has been placed into categories.
- We therefore calculate the mean and standard deviation differently, but the idea is the same.
- Pages 36 – 39 show the formulas for these measures.

Calculating the Mean for Grouped Data
- Let’s say we conducted a random sample of FIT students and asked them their GPA. We decided to group GPA into categories. Here is the data:

GPA Category | Number of Students
3.5 – 4.0 | 15
3.0 – 3.49 | 25
2.0 – 2.9 | 50
Below 2.0 | 11

- So, what is the mean? Look at pages 36 – 38; I will wait for someone to tell me how to go about answering this question.

Calculating the Mean for Grouped Data
- X = the midpoint (average) of each category
- f = the number of students in each category
- So, can someone answer this question on the board (with help from classmates)?

GPA Category | X | Number of Students (f)
3.5 – 4.0 | 3.75 | 15
3.0 – 3.49 | 3.245 | 25
2.0 – 2.9 | 2.45 | 50
Below 2.0 (0 – 1.9) | 0.95 | 11
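One way to check an answer on the board: multiply each midpoint X by its frequency f, add those products, and divide by the total number of students (Σf = 101). This is the standard grouped-mean formula, sketched in Python with a hypothetical helper `grouped_mean`; compare it against the formula on pp. 36 – 38 before relying on it.

```python
# (GPA category, midpoint X, number of students f), taken from the table above.
gpa_table = [
    ("3.5-4.0", 3.75, 15),
    ("3.0-3.49", 3.245, 25),
    ("2.0-2.9", 2.45, 50),
    ("Below 2.0 (0-1.9)", 0.95, 11),
]


def grouped_mean(table):
    """Hypothetical helper: sum of f * X divided by the total frequency."""
    n = sum(f for _, _, f in table)          # total students (101 here)
    total = sum(f * x for _, x, f in table)  # sum of f * X
    return total / n


print(round(grouped_mean(gpa_table), 2))  # 2.68
```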

Calculating the Standard Deviation of Grouped Data
- Now let’s calculate the standard deviation for this same set of data.
- Who can do this one on the board?

GPA Category | Number of Students
3.5 – 4.0 | 15
3.0 – 3.49 | 25
2.0 – 2.9 | 50
Below 2.0 | 11
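A sketch of the grouped standard deviation, assuming the same approach as the ungrouped example: square each midpoint’s distance from the grouped mean, weight it by f, add, divide by N – 1, and take the square root. Whether the textbook divides by N or N – 1 for grouped data is an assumption to check on pp. 36 – 39, and `grouped_std` is a hypothetical helper. The midpoints are the ones used in the grouped-mean example.

```python
import math

# (GPA category, midpoint X, number of students f)
gpa_table = [
    ("3.5-4.0", 3.75, 15),
    ("3.0-3.49", 3.245, 25),
    ("2.0-2.9", 2.45, 50),
    ("Below 2.0 (0-1.9)", 0.95, 11),
]


def grouped_std(table):
    """Hypothetical helper: grouped standard deviation with an (assumed) N - 1 divisor."""
    n = sum(f for _, _, f in table)
    mean = sum(f * x for _, x, f in table) / n
    # Weight each squared distance by how many students share that midpoint.
    sum_sq = sum(f * (x - mean) ** 2 for _, x, f in table)
    return math.sqrt(sum_sq / (n - 1))


print(round(grouped_std(gpa_table), 2))  # 0.78
```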

Writing Research Reports (pp. 48 – 50)
- Background Statement (5 pts.)
  - I will give you data; use your imagination.
  - Why was the study performed (why was the data collected)?
- Design and Procedures of the Study (10 pts.)
  - How did you conduct the study?
  - How was the study internally valid/externally valid?
- These two sections are not the most important; simply use your imagination to complete them.

Writing Research Reports (pp. 48 – 50)
- Results (55 pts.)
  - The most important section.
  - For this first report, this is where you present your data graphically and report measures of central tendency and dispersion.
- Analysis and Discussion (10 pts.)
  - What is interesting to you about the results?
- Conclusions and Recommendations (20 pts.)
  - You will not write this section for your report. Instead, this is where you present your results and analysis to the class. The class can ask you questions, so be on point!
