Transforming data: Some very valuable tools S-012.

Transforming scores: Shifting scales can be a big help

Some common transformations:
1. Proportions or percentages
2. Rank order
3. The Z transformation (standardizing)
4. Square root
5. Logarithm

1. Raw scores to proportions or percentages

Probably the most common transformation. We do this all the time.

Obs   Raw score (# correct)   Total number of items   Proportion   Percentage
1     3                       15                      .20          20%
2     5                       15                      .33          33%
3     10                      15                      .67          67%
4     15                      15                      1.00         100%
..
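As a minimal Python sketch, the proportion and percentage columns can be computed directly from the raw scores and the total number of items. Only the four listed rows are used (the ".." rows are unknown), and the helper name `to_proportion` is our own, for illustration.

```python
# Raw scores (# correct) and the total number of items, from the table above.
raw_scores = [3, 5, 10, 15]
total_items = 15

def to_proportion(raw, total):
    """Divide a raw count by the total to get a proportion between 0 and 1."""
    return raw / total

proportions = [to_proportion(x, total_items) for x in raw_scores]
percentages = [100 * p for p in proportions]

for raw, p, pct in zip(raw_scores, proportions, percentages):
    # Rounding is for display only; the lists keep full precision.
    print(f"{raw:>3} of {total_items}: proportion {p:.2f}, percentage {pct:.0f}%")
```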

1. Raw scores to proportions or percentages: Another example

Another example: Analyzing conversations at dinner tables. We have recordings of the conversations, and we need to adjust for the length of each conversation, so we use the proportion of turns, or the proportion of utterances.

In the table below, the percentage is based on the number of items attempted, not on a fixed total.

Obs   Raw score (# correct)   Incorrect items   Total attempted   Percentage correct
1     10                      10                20                50%
2     5                       5                 10                50%
3     3                       2                 5                 60%
4     15                      30                45                33%
..

2. Transforming to ranks

Example: Grade 4 students reading. Number of pages reported in one week.

Obs   Pages read   Rank (high to low)   Rank (low to high)
1     15           4                    1
2     200          2                    3
3     25           3                    2
4     400          1                    4
..

(Stata likes to rank from lowest to highest.)

Ranking preserves the order, but it ignores the distances between the scores. Ranking is a very common and very useful transformation.
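A small Python sketch of both ranking directions, using the four students in the table. It assumes there are no tied scores (ties need a rule, such as averaging the tied ranks, which this sketch does not implement). `rank_low_to_high` and `rank_high_to_low` are illustrative names, not functions from Stata or any library.

```python
# Pages read by four Grade 4 students in one week (from the table above).
pages = [15, 200, 25, 400]

def rank_low_to_high(values):
    """Rank 1 = smallest value (the direction the slide says Stata prefers).
    Assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def rank_high_to_low(values):
    """Rank 1 = largest value. Assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

low_to_high = rank_low_to_high(pages)
high_to_low = rank_high_to_low(pages)
print("Low to high:", low_to_high)
print("High to low:", high_to_low)

# Ranks keep the order but drop the distances:
# 200 and 400 pages are 200 pages apart, yet their ranks differ by only 1.
```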

The “Z” transformation

My favorite! The best!

Example: Students’ scores at two different times.

Obs   Time 1   z (Time 1)   Time 2   z (Time 2)
1     10       -1.0         25       -1.0
2     15       0            35       0
3     20       +1.0         30       -0.5
4     16       0.2          40       0.5
5     13       -0.4         37       0.2
..
Time 1: Mean = 15.0, SD = 5.0
Time 2: Mean = 35.0, SD = 10.0

How well did student #1 do at time 1? How about students 2, 3, etc.? How did they do at time 2?

Use the group mean and SD to create z-scores. For student #1 at time 1, z = (10 - 15)/5 = -1.0. The z-scores now help us a lot in comparing individual performance at time 1 and time 2.
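The z columns can be reproduced with a few lines of Python. The group means and SDs are taken from the slide rather than recomputed, because they describe the whole group (including the ".." rows), not just the five students shown. `z_score` is an illustrative helper name.

```python
# Scores for the five students shown, plus the group mean and SD from the slide.
time1 = [10, 15, 20, 16, 13]
time2 = [25, 35, 30, 40, 37]
mean1, sd1 = 15.0, 5.0
mean2, sd2 = 35.0, 10.0

def z_score(x, mean, sd):
    """How many SDs the score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z1 = [z_score(x, mean1, sd1) for x in time1]
z2 = [z_score(x, mean2, sd2) for x in time2]

for obs, (a, b) in enumerate(zip(z1, z2), start=1):
    print(f"Student {obs}: z at time 1 = {a:+.1f}, z at time 2 = {b:+.1f}")
```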

The “Z” transformation formula

There are two versions of the formula.

1. Use the mean and SD of the sample: $z = \frac{X - \bar{X}}{s}$, where $\bar{X}$ is the sample mean and $s$ is the sample SD. How far is each score from the sample mean? How many standard deviations away?
2. Use the mean and SD of a population: $z = \frac{X - \mu}{\sigma}$, where $\mu$ is the population mean and $\sigma$ is the population SD. How far is each score from the population mean? How many standard deviations away?
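The two versions differ only in where the mean and SD come from. In this Python sketch, version 1 computes them from the data with `statistics.mean` and `statistics.stdev` (which uses the n - 1 denominator), and version 2 takes a known population mean and SD as arguments. Reusing the five Time 1 scores as a stand-alone sample is an assumption for illustration, and the function names are our own.

```python
import statistics

def z_scores_sample(values):
    """Version 1: standardize with the sample's own mean and SD.
    statistics.stdev uses the n - 1 denominator (the usual sample SD)."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]

def z_score_population(x, mu, sigma):
    """Version 2: standardize with a known population mean (mu) and SD (sigma)."""
    return (x - mu) / sigma

# Assumption for illustration: treat the five Time 1 scores as the whole sample.
sample = [10, 15, 20, 16, 13]
print([round(z, 2) for z in z_scores_sample(sample)])

# Known population values (the old GRE scale: mean 500, SD 100).
print(z_score_population(600, mu=500, sigma=100))
```

A handy property of version 1: within the sample, the resulting z-scores always have mean 0 and SD 1.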

Z-transformation example: GRE scores

The old version of the GRE was scaled so that the mean was 500, with a standard deviation of 100 (Mean = 500, SD = 100). If a student had a score of 600, how good is that score? X = 600, so $z = (600 - 500)/100 = 1.0$ (one SD above the GRE population mean).

The new version of the GRE is rescaled so that the mean is 150, with a standard deviation of 9.0 (Mean = 150, SD = 9). If a student had a score of 160, how good is that score? X = 160, so $z = (160 - 150)/9 \approx 1.1$ (a bit more than 1 SD above the GRE population mean).

Both examples use the population version of the formula.
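Because z is in SD units, it also lets us move between the two GRE scales. This Python sketch converts each score to z, then maps the old-scale z back onto the new scale. That last step is a hypothetical illustration: it assumes both scales describe the same population of test takers, and it is not an official concordance between the two GRE versions. `to_z` and `from_z` are illustrative names.

```python
# Scale parameters given on the slide.
OLD_GRE = {"mean": 500, "sd": 100}
NEW_GRE = {"mean": 150, "sd": 9}

def to_z(x, scale):
    """Distance from the scale's mean, in SD units."""
    return (x - scale["mean"]) / scale["sd"]

def from_z(z, scale):
    """Inverse of to_z: the score on `scale` that sits z SDs from its mean."""
    return scale["mean"] + z * scale["sd"]

z_old = to_z(600, OLD_GRE)
z_new = to_z(160, NEW_GRE)
print(f"600 on the old scale: z = {z_old:.2f}")
print(f"160 on the new scale: z = {z_new:.2f}")

# Hypothetical: the new-scale score with the same z as 600 on the old scale.
# Only meaningful if both scales describe the same population of test takers.
print(f"Same z on the new scale: {from_z(z_old, NEW_GRE):.0f}")
```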

The “Z” transformation: My favorite! The best!

Key idea:
- How many SDs away from the mean
- How far from the mean, in SD units
- What a great idea!
- Lets us compare things even when we use different tests or different scoring systems

Other examples?

The square-root transformation: Often useful when things are positively skewed

Obs   Score (X)   √X
1     4           2
2     9           3
3     1           1
4     25          5
5     64          8
6     49          7
7     4           2
8     9           3
9     1           1
10    100         10

Original scores: Mean = 26.6, Median = 9. The mean is pulled way up beyond the median. Not “normal” at all (not bell-shaped). Skewed to the right, positively skewed.

Let’s look at each score and take the square root. This will pull in the high scores.

Square roots: Mean = 4.2, Median = 3.
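A quick Python check of the numbers above, using only the standard library (`math.sqrt` and the `statistics` module). It reproduces the mean and median before and after taking square roots.

```python
import math
import statistics

# The ten scores from the table above.
scores = [4, 9, 1, 25, 64, 49, 4, 9, 1, 100]
roots = [math.sqrt(x) for x in scores]

print("Original:    mean =", statistics.mean(scores), " median =", statistics.median(scores))
print("Square root: mean =", statistics.mean(roots), " median =", statistics.median(roots))
```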

The log transformation: Often useful when things are positively skewed, or when the range is very wide (over several orders of magnitude)

Here I am using “base 10” logs. The logs (the logarithms) are the exponents:
$10^1 = 10$, $10^2 = 100$, $10^3 = 1000$, $10^4 = 10000$

What will the log of 90 be? We need $10^{?} = 90$. The answer is 1.95:
$10^{1.95} \approx 90$, $10^{0.95} \approx 9$, $10^{1.70} \approx 50$

Obs   Score (X)   Log(X)
1     10          1
2     100         2
3     1000        3
4     10000       4
5     90          1.95
6     9           0.95
7     50          1.70
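In Python, base-10 logs are available as `math.log10`. This sketch computes the log column for the table and raises 10 to each log to recover the original scores, which is the "logs are exponents" idea in code.

```python
import math

# Scores from the table above.
scores = [10, 100, 1000, 10000, 90, 9, 50]
logs = [math.log10(x) for x in scores]

for x, lg in zip(scores, logs):
    # 10 ** lg undoes the transformation and recovers the original score.
    print(f"{x:>6}  log10 = {lg:.2f}  back-transformed = {10 ** lg:.0f}")
```

`math.log10` raises `ValueError` for zero or negative inputs, which is the code-level version of the rule that the log works only with positive values.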

The log transformation has a dramatic effect on the scores. It changes the distances between the scores, and this has a huge effect on the distribution. When scores are spread out widely on the scale (e.g., 10, 100, 1000, etc.), the log pulls in the very high scores, and it can also help to spread out the low scores. For example, 1000 and 10000 are 9000 apart, but their logs (3 and 4) are only 1 apart. This is a very useful and very common transformation (widely used in economics, biology, demography, etc.).

[Figure: the original scores on their raw scale, compared with the log scores.]

On the original scale, the low scores (9, 10, 50, 100) are all clustered together at the left side; we cannot really see them. The large values are far away from the small values. Mean = 1608, Median = 90.

On the log scale, the high scores are pulled in and the lower scores are more spread out. The scale has changed.
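To see the pull-in numerically, this Python sketch compares the mean and median of the original scores with those of the base-10 log scores. Because the log keeps the order of the scores, the median of the logs is simply the log of the original median.

```python
import math
import statistics

# Scores from the log-transformation table.
scores = [10, 100, 1000, 10000, 90, 9, 50]
logs = [math.log10(x) for x in scores]

print(f"Original scores: mean = {statistics.mean(scores):.0f}, median = {statistics.median(scores)}")
print(f"Log scores:      mean = {statistics.mean(logs):.2f}, median = {statistics.median(logs):.2f}")
```

On the log scale the mean (about 2.09) sits close to the median (about 1.95), while on the original scale the mean (1608) is far above the median (90).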

The log transformation: Also often useful when we are studying growth over time

Example: Studying children’s vocabulary growth. How many words are they learning?

Age (in months) and vocabulary size (number of words):

Age  Vocabulary    Age  Vocabulary    Age  Vocabulary
6    3             17   27            28   238
7    4             18   33            29   291
8    4             19   40            30   355
9    5             20   49            31   433
10   7             21   59            32   528
11   8             22   72            33   644
12   10            23   88            34   786
13   12            24   108           35   958
14   15            25   131           36   1169
15   18            26   160
16   22            27   195

During the early months, the “scores” (the vocabulary sizes) are low, so they are bunched together. But at older ages the growth continues, and so the scores are much more spread out. The scale changes quite a bit here: the early scores are roughly 4, 5, 7, 10, 20, while the later scores are roughly 400, 600, 1100. So this is another example where the log transform may be helpful.

Check these graphs:

[Figure: vocabulary growth plotted on the original scale and on a log scale.]

On the original scale, vocabulary is growing, and it seems to be growing faster and faster! But wait: on the log scale we see that the growth is steady (here the growth is 20% per month). The log transformation is helpful here!
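One way to check the "steady growth" reading is to fit a straight line to the logged vocabulary sizes. This Python sketch uses natural logs and an ordinary least-squares slope written out by hand (`ols_slope` is our own helper). With natural logs the slope is a continuous growth rate per month; on this table it comes out close to 0.20, in line with the slide's 20% per month, although the month-to-month ratios in the table are nearer 1.22, which is what a continuous rate of 0.20 implies. Treating age as months is our reading of the slide.

```python
import math

# Vocabulary sizes at ages 6 through 36 months, from the table above.
ages = list(range(6, 37))
vocab = [3, 4, 4, 5, 7, 8, 10, 12, 15, 18, 22, 27, 33, 40, 49, 59,
         72, 88, 108, 131, 160, 195, 238, 291, 355, 433, 528, 644, 786, 958, 1169]

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# On the original scale, the month-to-month increases keep getting bigger.
raw_steps = [b - a for a, b in zip(vocab, vocab[1:])]
print("Raw monthly increases, first three:", raw_steps[:3], "last three:", raw_steps[-3:])

# On the natural-log scale, the trend is close to a straight line.
log_vocab = [math.log(v) for v in vocab]
log_slope = ols_slope(ages, log_vocab)
print(f"Slope of ln(vocabulary) on age: {log_slope:.3f} per month")
print(f"Implied month-to-month ratio: {math.exp(log_slope):.3f}")
```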

Check these graphs:

[Figure: vocabulary growth for two children, on the original scale and on a log scale.]

Growth in vocabulary for two children (both growing rapidly!). On the original scale, the gap between them gets larger and larger over time. The log transformation shows us the difference in their growth rates (here the difference is only one percent per month). But this monthly difference is steady, so it ends up producing a big difference over time.
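The two-children pattern can be simulated. The numbers here are hypothetical (the slide's data for the two children are not given): both start at 3 words at 6 months, and their continuous monthly growth rates differ by 0.01. The sketch records the gap on the raw scale and on the natural-log scale every six months.

```python
import math

# Hypothetical values, for illustration only.
START = 3.0                   # words at 6 months, for both children
RATE_A, RATE_B = 0.20, 0.21   # continuous growth rates per month (log scale)
ages = list(range(6, 37, 6))  # every six months: 6, 12, ..., 36

raw_gaps, log_gaps = [], []
for age in ages:
    t = age - 6
    a = START * math.exp(RATE_A * t)
    b = START * math.exp(RATE_B * t)
    raw_gaps.append(b - a)                      # gap in words
    log_gaps.append(math.log(b) - math.log(a))  # mathematically (RATE_B - RATE_A) * t
    print(f"age {age:>2}: raw gap = {raw_gaps[-1]:6.1f} words, log gap = {log_gaps[-1]:.2f}")
```

The raw gap keeps widening, while the log gap grows by the same 0.06 every six months, so on the log scale the difference increases in a straight line.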

Transformations: They can help, but they require lots of thought

1. Percentages are useful
- Very common
- Adjust things to rates rather than simple counts
- Easy to understand

2. Rank ordering
- Preserves the order
- Ignores the distances
- Very common
- Several important statistical tests use the rank order

3. Square-root transformation
- Often useful with count data (e.g., days absent, household size)
- Useful when there is positive skew (skewed to the right)
- Pulls in the long tail
- Works with values of zero or more

4. Log transformation
- Useful when scores are spread out over a very wide scale
- Useful when we look at things that change in percentage terms (e.g., growth rates: 1-percent growth, or 5-percent growth)
- Works only with positive values (sometimes we add a constant so we can use the square-root or log transform)
- Sometimes harder to interpret
- Very commonly used in economics, biology, ecology, etc.
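Count data often contain zeros, which matters for the last two transformations. This Python sketch uses hypothetical days-absent counts: the square root handles zeros directly, while the log needs a constant added first. Adding 1 is one common choice, assumed here; it is not the only option.

```python
import math

# Hypothetical count data (days absent), including zeros.
days_absent = [0, 0, 1, 2, 3, 5, 8, 30]

# The square root is defined at zero, so these counts can be used directly.
sqrt_days = [math.sqrt(x) for x in days_absent]

# log10(0) is undefined (math.log10(0) raises ValueError),
# so add a constant before taking the log.
CONSTANT = 1
log_days = [math.log10(x + CONSTANT) for x in days_absent]

print([round(v, 2) for v in sqrt_days])
print([round(v, 2) for v in log_days])
```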

But the best, most important, most valuable, most versatile, all-around most-cool transformation is... the Z transformation!

Call it “Zee.” Call it “Zed.” However you pronounce it, it is a great concept.
- How far away? How many SDs away?
- Z is a standard score, on a standard scale.
- Helps us compare results on different tests (different test scales).
- Helps us compare results of different studies.
- Helps us judge differences when we are comparing groups.
