Summarising data / Levels of measurement / Introduction to SPSS


Topic 2: Summarising data / Levels of measurement / Introduction to SPSS

Main issues for this session: levels of measurement; data types (nominal, ordinal, interval, ratio); linking data types to statistical analyses; introduction to SPSS.

Reading: Chapter 2 and Chapter 3 (Frequency Distributions and Graphic Representation), Fundamentals of Statistical Reasoning in Education, Coladarci et al.

Levels of measurement. The table on this slide compares the four levels by sequence, magnitude, zero point, example and descriptive statistics. Nominal: percent, ratio, frequency. Ordinal: sequential. Interval: arbitrary zero point; mean, SD, min, max. Ratio: absolute zero point. This is another way of looking at the characteristics of each type of measurement. Level of measurement has direct implications for how relationships within and between variables can be identified and described. This is why you need to think about your variables and levels of measurement as part of the design of your study. For example, if you want to compare average performance on a particular ability across two groups, you will need to be able to calculate averages. That means that the level of measurement of your data will have to lend itself to that type of manipulation.

Preparing a questionnaire and codebook. Example questionnaire: WB_Pupil_MP.doc. Example codebook: Pupil_codebooks.xls. Example codebooks: http://pisa2006.acer.edu.au/downloads.php

Codebook - 1. A codebook should be prepared as a questionnaire is developed. The purposes of a codebook are: to facilitate data entry, with codes shown on the questionnaire if possible; and to plan for analysis, helping to determine the types of analyses that are appropriate.

Codebook - 2. Numeric codes are easier to enter than alphabetic codes. Consider the appropriate field width and range of answers. These can be useful feedback to questionnaire design as well. Decide how to handle missing responses.

Getting data into SPSS - 1. The EXCEL file Pupil_data.xls contains the pupil questionnaire data. Import this data set into SPSS: start SPSS.

Getting data into SPSS - 2 Select from Menu File -> Open -> Data

Getting data into SPSS - 3 Find the folder where the EXCEL file is stored. In the file open dialog box, make sure the file type is set to xls. Select file Pupil_data.xls File type set to “xls”

Getting data into SPSS - 4. Make sure the check box for “Read variable names from the first row of data” is checked. (The EXCEL file has variable names in the first row, and these will be read in as SPSS variable names as well.) Check this box.

Toggle between data view and variable view The tab at the bottom left corner shows the data view or variable view. Data view or Variable view

Add Variable labels for variables 4 to 9 (PDOBDD to PHOMLANG)

Add Value labels for variable PSEX (The column after Variable Labels). Click in the value labels cell and the following dialog box appears

Add missing values for variable PSEX (The column after Value Labels) Click in the Missing values cell and a dialog box appears. Enter values representing missing values
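Outside SPSS, the idea behind value labels and missing-value codes can be sketched in a few lines of Python. This is only an illustration: the slides do not give the actual codes for PSEX, so the variable label, the coding (1 = Boy, 2 = Girl) and the missing code (9) below are assumptions.

```python
# Hypothetical codebook entry for PSEX; the real codes are not given in the slides.
CODEBOOK = {
    "PSEX": {
        "label": "Sex of pupil",            # variable label (assumed wording)
        "values": {1: "Boy", 2: "Girl"},    # value labels (assumed coding)
        "missing": {9},                     # codes treated as missing (assumed)
    }
}


def decode(variable, code):
    """Return the value label for a code, or None if the code is declared missing."""
    entry = CODEBOOK[variable]
    if code in entry["missing"]:
        return None
    return entry["values"][code]


raw_psex = [1, 2, 2, 9, 1]                  # made-up data as entered
decoded = [decode("PSEX", code) for code in raw_psex]
print(decoded)  # ['Boy', 'Girl', 'Girl', None, 'Boy']
```

Keeping the numeric codes in the data and the labels in a separate codebook mirrors the way SPSS stores values and value labels separately.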

Practice for other variables. Set variable labels, value labels and missing values for some other variables. Value labels and missing values can be copied from one set of cells and pasted into other cells. Make sure you save the file often!

Frequencies For which types of variables, will it be appropriate to compute frequencies? Nominal, ordinal, interval and ratio? For which types of variables, will it be appropriate to compute averages? Nominal, ordinal, interval and ratio?

Compute frequencies in SPSS -1 Select from menu Analyze -> Descriptive Statistics -> Frequencies

Compute frequencies in SPSS -2 Select the variables in the left-hand box and move them to the right-hand box. Press OK.

Compute frequencies in SPSS -3 Explore the options under the Statistics and Charts buttons, and see what kinds of output you can produce. Compute frequencies for other variables as a practice.
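For readers who want to see what a frequency table involves, here is a minimal Python sketch. It is not what SPSS runs internally; the response codes and the missing code 9 are made up for illustration.

```python
from collections import Counter


def frequency_table(values, missing=frozenset()):
    """Return rows of (category, count, valid percent), excluding missing codes."""
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    total = len(valid)
    return [(cat, n, 100.0 * n / total) for cat, n in sorted(counts.items())]


responses = [1, 2, 2, 1, 2, 9, 2, 1]        # made-up codes; 9 = missing (assumed)
table = frequency_table(responses, missing={9})
for cat, n, pct in table:
    print(f"category {cat}: n = {n}, valid % = {pct:.1f}")
```

The percentages are computed on valid cases only, which corresponds to the idea of a "valid percent" column: missing responses are left out of the denominator.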

Constructs in a questionnaire - 1. Sometimes we are interested in a measure that cannot be obtained or observed directly through a question such as “are you a boy or a girl”. For example, socio-economic status is something that we have an interest in, but it is a concept (like well-being) rather than something that we can see and directly measure. Such concepts are often called constructs, or latent variables.

Constructs in a questionnaire - 2. Sociologists and statisticians have developed methodologies to measure constructs (or latent variables). Psychometrics is the science of the measurement of latent variables. The field of psychometrics includes classical test theory (CTT) and item response theory (IRT).

Constructs in a questionnaire - 3 To measure a construct, typically a number of observable indicators are collected (e.g., through a questionnaire). The data from these indicators are aggregated in some way (e.g., to form a total score) to be used as a measure of the construct for each individual.

Constructs in a questionnaire - 4 A simple way to aggregate the indicators into a measure for a construct is just to sum the scores for the set of questions for each student. These sums (or measures of the constructs) can then be used as new variables as the basis of further statistical analysis. There are more sophisticated ways to aggregate the indicator scores into a construct score (e.g, using item response theory models).

Constructs in a questionnaire - 5 In SPSS, calculate sum scores for each construct you identified, for each student. You can then use these new variables for further analyses. Watch animated demo on how to compute sum scores. HowToComputeSumScores_demo.swf
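The sum-score idea can also be sketched in Python. The student IDs, the four-item construct and the 1 to 4 response scale are hypothetical, and returning no score when any item is missing is just one possible rule, not a recommendation from the slides.

```python
# Hypothetical item responses (1-4 scale) for three students on a four-item construct.
students = {
    "S001": [3, 4, 2, 4],
    "S002": [1, 2, 2, 1],
    "S003": [4, 4, None, 3],   # None marks a missing response
}


def sum_score(items):
    """Sum the item scores; return None if any item is missing."""
    if any(x is None for x in items):
        return None
    return sum(items)


sum_scores = {sid: sum_score(items) for sid, items in students.items()}
print(sum_scores)  # {'S001': 13, 'S002': 6, 'S003': None}
```

How to treat students with missing items is a decision to record in the codebook, since summing only the answered items would make their totals not comparable with those of other students.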

Outline. Categorical variables (ordinal and nominal). Continuous variables (interval and ratio). Once the data have been collected, entered into the data file and cleaned, the next step is analysing the data. The very first step in analysing data is summarising the data. While summarising the data often relates to descriptive questions, this step is needed for all analysis techniques. This session will cover three broad topics: how to summarise data with few categories, how to summarise data with many categories, and how to transform a continuous variable into a categorical variable and report frequencies.

Download from subject website Data file from TIMSS 2003 study for Australia TIMSS2003AUS.sav Student Questionnaire from TIMSS 2003 study for Australia T03_Student_8.pdf You should download these two data files.

Categorical data. Nominal: numbers are used only as labels for different objects within a set. For example, gender; idbook (there are 12 different test booklets). Ordinal: numbers are used to reflect the rank order of objects within a set according to a specific criterion. For example, bsbgbook (number of books in the home); bsbgmfed (mother’s education level).

Summary of categorical variables In general, summary of categorical variables addresses the questions: How many categories? How many cases in each category or What are the proportions of cases in each of the categories? If a variable is ordinal, questions regarding trends and association can be considered. Examples: For data file TIMSS2003AUS.sav, the possible questions could be: What are the proportions of female and male students in the study? What are the levels of education of parents for the students surveyed? Is there an association between levels of education of parents and number of books in the home?

Hands-on (1) Are there more girls than boys? Is there an association between Father’s education level and the number of books at home? Follow animated demo frequency_1_demo frequency_2_demo Explore_1_demo Explore_1_output_demo

Hands-on (2) Is there a difference between girls and boys in terms of whether they enjoy mathematics (variable bsbmtenj)? Follow animated demo Crosstab_1_demo Crosstab_1_output_demo

Hands-on (3) Is there a difference between girls and boys in terms of whether they enjoy SCIENCE (variable bsbstenj, (var 67))?

Things to watch out for in comparing frequencies - 1 Consider if you should compare raw frequencies or percentages. For percentages, make sure the denominator (total) is the appropriate one to use. For example, check row total, column total, overall total. Check the scale to make sure there is no exaggeration of differences

Things to watch out for in comparing frequencies – Raw score or percentage?

Things to watch out for in comparing frequencies – Raw score or percentage? Percentages are better because there are many more students speaking the test language at home than those who do not.
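A small Python sketch can make the point about denominators concrete. The counts below are invented for illustration (they are not the TIMSS figures): one group is much larger than the other, so raw counts suggest a large difference that mostly disappears once each row is expressed as a percentage of its own total.

```python
# Invented counts: enjoyment of mathematics by language spoken at home.
counts = {
    "speaks test language at home": {"agree": 600, "disagree": 300},
    "does not speak it at home": {"agree": 70, "disagree": 30},
}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100.0 * n / total for col, n in cells.items()}
    return result


row_pct = row_percentages(counts)
for row, cells in row_pct.items():
    print(row, {col: round(p, 1) for col, p in cells.items()})
```

With these numbers, 600 versus 70 students agree, but the row percentages are about 66.7% and 70.0%. Using column totals or the overall total as the denominator would answer a different question, which is why the choice of total needs checking.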

Things to watch out for in comparing frequencies – Check magnitude of scale The graph on the right shows large differences. But check the scale on the vertical axis. There are only a few students. We can’t say there is a great difference. Beware of visual deception.

Continuous data. Interval: numbers reflect both the rank order of objects and the extent of the differences between them (e.g. temperature). Ratio: the scale has an absolute zero and hence a ratio of scores is independent of the units of the scale (e.g. height, weight, age).

Summary of continuous variables. Example questions: What is the average score that the students surveyed get? (mean) What is the middle score? (median) Which is the most frequent score? (mode) What is the highest score? (max) What is the lowest score? (min) What is the range of students’ scores? (range) To what extent are the scores close to the mean? (variance and standard deviation) Think of a variable that measures student achievement in reading, with raw scores ranging from 1 to 40 used to record each student's achievement. These are the questions a researcher might ask. The first question addresses the mean score; the second question addresses the median. The third question addresses the mode of the distribution. The fourth and fifth questions address the maximum and minimum values of the variable. The sixth and seventh questions address dispersion.

Mean and Median. Mean (average, expected value): sum of observations / number of observations. Median: 50% of subjects below and 50% of subjects above. With all the scores in a distribution arranged from lowest to highest, that is, in rank order, the median is the middle score or the halfway point. If there is an odd number of subjects in the distribution, the median is simply the middle value. If there is an even number of subjects in the distribution, then the median is the halfway point between the two middle values. Suppose that a series of random samples were drawn from a large population of scores (for example, we draw a sample of 20 students from a school with 100 students) and that the mean, the median and the mode of their heights were calculated for each sample. These measures of central tendency would vary from sample to sample, but compared to the mode and the median, the mean is the most stable from sample to sample.
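The odd and even cases for the median can be checked with Python's standard `statistics` module. The scores are made up for illustration.

```python
import statistics

odd_scores = [3, 5, 8, 9, 12]     # five made-up scores: the median is the middle one
even_scores = [3, 5, 8, 9]        # four made-up scores: halfway between 5 and 8

odd_median = statistics.median(odd_scores)
even_median = statistics.median(even_scores)
odd_mean = sum(odd_scores) / len(odd_scores)      # sum of observations / number of observations
even_mean = sum(even_scores) / len(even_scores)

print(odd_median, even_median)   # 8 6.5
print(odd_mean, even_mean)       # 7.4 6.25
```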

Variance and Standard deviation. In the formulas on this slide, µ is the mean and n is the number of observations. We have looked at ways of providing a single number that represents the central tendency of a distribution. Now we will look at ways of representing and quantifying how spread out the values are in the distribution. There are two common ways of expressing the variability of a set of values: variance (Var) and standard deviation (SD). The variance is the average squared deviation from the mean, and the SD is its square root, so the SD shows the typical extent to which the values in a distribution differ from the mean. It is very important to report the standard deviation along with the mean of a distribution.
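The slide's formulas did not survive in this transcript, so the sketch below uses a standard form that is consistent with its symbols: the mean µ and division by n. This is an assumption. Many textbooks and programs, SPSS included, divide by n - 1 when the data are a sample, which gives a slightly larger value. The scores are made up.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up scores
n = len(scores)
mu = sum(scores) / n                          # mean

# Variance: average squared deviation from the mean (dividing by n, an assumed form).
variance = sum((x - mu) ** 2 for x in scores) / n
sd = math.sqrt(variance)                      # standard deviation: square root of the variance

print(mu, variance, sd)  # 5.0 4.0 2.0
```

Python's `statistics.pvariance` and `statistics.pstdev` compute the same divide-by-n quantities, while `statistics.variance` and `statistics.stdev` use n - 1.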

Normal Distribution. Many variables have a distribution shaped like a bell curve, known as the normal distribution (p. 113-119). If continuous variables will be used for further statistical analysis, it is important to report the distribution of the values of these variables. Some variables will not follow the shape of the normal distribution, and some may depart very strikingly from it. The departure is most clearly evident when the values of a distribution are clustered at either end, that is, skewed. While there are tests that can be used to evaluate skewness and kurtosis values, it is recommended to inspect the normality of the distribution of the variables using a histogram.

Example descriptive statistics Variable 154 (bsmmat01) is an estimate of a student’s mathematics achievement. Follow animated demo: descriptive_1_demo

Histogram of continuous variable Frequency analysis and bar charts may fail because there are too many categories. Use histogram. Variable 154 (bsmmat01) is an estimate of a student’s mathematics achievement. Follow animated demo: histogram_1_demo
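What a histogram does with a continuous variable can be sketched in Python: the scores are grouped into equal-width intervals and the frequency in each interval is counted. The scores, the bin width of 50 and the starting point of 400 are all made-up choices for illustration, not values taken from bsmmat01.

```python
from collections import Counter


def bin_scores(scores, width, start):
    """Count scores in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = Counter(int((s - start) // width) for s in scores)
    return {
        (start + k * width, start + (k + 1) * width): n
        for k, n in sorted(counts.items())
    }


scores = [412.5, 455.0, 478.3, 501.2, 503.9, 520.0, 547.8, 598.1]  # made-up values
bins = bin_scores(scores, width=50, start=400)
for (low, high), n in bins.items():
    print(f"{low}-{high}: {'*' * n} ({n})")
```

The same grouping turns a continuous variable into a categorical one, after which ordinary frequency tables can be reported.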

Compare histograms for groups Compare mathematics achievement distributions between groups based on father’s education level. Follow animated demo: histogram_2_demo

Box-Plots Box-plots are graphical representations of the data in a five-number summary with the addition of ‘cutoffs’ or ‘fences’ for the identification of possible outliers (individual data points are plotted beyond the fences if they occur)

Box plot for mathematics achievement Follow animated demo: boxplot_1_demo boxplot_2_demo

Output of Box-plot of mathematics scores. The rectangle represents the middle 50% of the cases, with the whiskers (the lines protruding from the box) going out to the largest and smallest values that are not outliers. The additional circles outside this range are outliers (see next slide). The line inside the rectangle is the median.
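The numbers behind a box plot can be sketched in Python. The scores are made up, and two conventions here are assumptions: quartiles are computed with the "inclusive" method of `statistics.quantiles`, and the fences are placed 1.5 times the interquartile range beyond the box. SPSS may use a slightly different quartile rule, so its values can differ a little.

```python
import statistics

scores = [35, 42, 44, 47, 50, 51, 53, 55, 58, 90]   # made-up scores

# Quartiles (inclusive method; other conventions give slightly different values).
q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                                        # height of the box

# Fences: points beyond these are plotted individually as possible outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [x for x in scores if lower_fence <= x <= upper_fence]
outliers = [x for x in scores if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)  # whisker ends

print(q1, med, q3)                  # 44.75 50.5 54.5
print(whisker_low, whisker_high)    # 35 58
print(outliers)                     # [90]
```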

Output of Box-plot of mathematics scores by father’s education level

Parametric and Non-parametric: Mean and Median. Mean: the average. Median: the score at the 50th percentile, that is, the middle value. The mean is often described as a parametric statistic, and the median as a non-parametric statistic.

Mean and Median. If the distribution of scores is symmetrical, the mean and median will be close. If the distribution is skewed, then the mean and median will be quite different. The mean is sensitive to outliers. The median is not sensitive to outliers. Example: income distribution.

Examples of income distribution What will be the mean? What will be the median?

Robust statistics. The mean will be much higher than the median, because there are four people with very high salaries. The median will not shift if the four highest salaries are in the 150K range instead of the 280K range, but the mean will change by a great deal. The median is said to be “robust”.
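The robustness of the median can be checked with a short Python sketch. The salary figures on the slide are not in this transcript, so the data below are invented: sixteen modest salaries (in thousands) plus four very high ones, first at 280 and then at 150.

```python
import statistics

# Invented salaries in thousands; only the four top salaries differ between the two sets.
modest = [30, 32, 35, 35, 38, 40, 40, 42, 45, 45, 48, 50, 50, 52, 55, 60]
salaries_280 = modest + [280] * 4
salaries_150 = modest + [150] * 4

mean_280 = statistics.mean(salaries_280)
mean_150 = statistics.mean(salaries_150)
median_280 = statistics.median(salaries_280)
median_150 = statistics.median(salaries_150)

print(f"mean:   {mean_280:.2f} vs {mean_150:.2f}")   # the mean moves a lot
print(f"median: {median_280} vs {median_150}")       # the median does not move
```

With these numbers the mean drops from 90.85 to 64.85 when the top salaries change, while the median stays at 46.5 in both cases.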

Percentile Rank. The percentile rank of a raw score s is the percentage of people whose scores are less than or equal to s. Example:
Raw: 12 14 28 34 47 50
Rank: 1 2 3 4 5 6
%Rank: 1/6 2/6 3/6 4/6 5/6 6/6
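The definition above translates directly into a few lines of Python. The sketch uses the six raw scores from the slide's example; the function name is just an illustrative choice.

```python
def percentile_rank(scores, s):
    """Percentage of scores that are less than or equal to s."""
    return 100.0 * sum(1 for x in scores if x <= s) / len(scores)


raw = [12, 14, 28, 34, 47, 50]                       # raw scores from the example
ranks = {s: percentile_rank(raw, s) for s in raw}
for s, pr in ranks.items():
    print(f"raw {s}: percentile rank {pr:.1f}")
```

For instance, three of the six scores are at or below 28, so its percentile rank is 3/6, or 50%.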

Advantages and disadvantages of percentile ranks. Simple to communicate. More “robust” (not affected by extreme scores in the distribution). Raw scores are turned into ranks, which reduces them to ordinal measurement. Percentile ranks have a uniform distribution, not a normal one. Percentile differences in the middle of the score range can exaggerate small differences.

Compute percentile ranks in SPSS Compute percentile ranks using mathematics achievement score Follow animated demo: percentile_1_demo

Histogram of percentile ranks Do a histogram of percentile ranks, what do you see? Plot a scatter graph of mathematics achievement (variable 154) with the newly created variable of percentile ranks. How do you interpret the graph?