Fun With Numbers: Z-scores; R code (wrt hw. 4)

Statistical Inference. The normal distribution is our first choice in most cases because it has nice properties: the distribution is symmetrical around the mean; known percentages of cases are associated with standard deviations from the mean; we can identify the probability of values under the curve; a linear combination of independent normally distributed variables is itself normally distributed; and the central limit theorem applies. The result is great flexibility in using the normal distribution.
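The central limit theorem property can be illustrated with a small simulation. This is only a sketch and not part of the original slides: the uniform population, the sample size of 30, and the names n_draws, sample_size, and sample_means are all assumptions chosen for illustration.

```r
# Illustration only: population, sample size, and variable names are assumptions.
set.seed(42)          # make the simulation reproducible
n_draws <- 5000       # number of repeated samples
sample_size <- 30     # observations per sample

# Draw repeated samples from a non-normal (uniform) population
# and keep the mean of each sample.
sample_means <- replicate(n_draws, mean(runif(sample_size, min = 0, max = 1)))

# The sample means cluster around the population mean (0.5), with a spread
# close to the theoretical standard error sqrt(1/12) / sqrt(sample_size).
clt_mean <- mean(sample_means)
clt_sd <- sd(sample_means)
theory_se <- sqrt(1 / 12) / sqrt(sample_size)
print(c(mean = clt_mean, sd = clt_sd, theory_se = theory_se))
```

A histogram of sample_means, e.g. hist(sample_means), should look roughly bell-shaped even though the underlying population is flat.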

Normal Distribution: the normal distribution and areas under it. 68-95-99.7 Percent Rule: in a normal distribution, about 68 percent of the observations will fall within about +/- 1 standard deviation... A Picture:
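The percentages in the rule's name can be checked against the standard normal curve. The following is a minimal sketch using base R's pnorm(); the variable names k and area_within are made up for illustration.

```r
# Area within k standard deviations of the mean under a standard normal curve.
k <- 1:3
area_within <- pnorm(k) - pnorm(-k)   # P(-k < Z < k)
names(area_within) <- paste0("+/-", k, " s.d.")

# Print the areas as percentages.
print(round(100 * area_within, 1))
```

The three printed values should be close to 68, 95, and 99.7 percent.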

Area (with some added stuff) http://members.aol.com/svennord/ed/normal.htm

Another Picture

What do we know? Area is useful to determine probabilities.

Fun with Numbers: Gas Prices (Let’s take a sidetrip). What are some research issues when looking at financial data over time? Inflation! 2007 dollars vs. 1990 dollars. CPI adjustment: 2007 Price = 1990 Price * (2007 CPI / 1990 CPI)
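An inflation adjustment of this kind rescales a price by the ratio of the two years' CPI values. A minimal sketch in R follows; the helper name adjust_price and the two index values are hypothetical placeholders, not official CPI figures and not the course's data.

```r
# Hypothetical helper: convert a price from base-year dollars to target-year dollars.
adjust_price <- function(price, cpi_base, cpi_target) {
  price * (cpi_target / cpi_base)
}

# Placeholder index values for illustration only (not official CPI figures).
cpi_1990 <- 130
cpi_2007 <- 208

# A $1.00 price in 1990 dollars, expressed in 2007 dollars.
price_2007 <- adjust_price(1.00, cpi_base = cpi_1990, cpi_target = cpi_2007)
print(price_2007)
```

In a real analysis, the placeholders would be replaced with the published index values for the years being compared.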

Visualizing Data is FUNdamental. [Figure panels: Unadjusted; CPI-Adjusted; CPI-Adjusted (w/o 05/06)]

Histograms

Using z-scores: taking advantage of the normal distribution. Area under the normal curve is probability area. Probabilities must sum to 1, so the full density under the normal curve is 1. Since the curve is symmetric, we know the probability of “being above” the mean is .50 (ditto for being below).

Standard Normal Distribution: Z ~ N(0,1). Easy to compute: z=(X-mean)/s.d. When X=mean, z=0. The metric of the z-score is standard deviations from the mean. Thus, if z=1, X is 1 s.d. above the mean. Now, since we know the 68-95-99.7 Rule, we can identify probabilities.

Getting Gas. Let’s look at the adjusted gas prices. Yearly means (standard deviations in parentheses):

    Year   Mean   (s.d.)
    2006   2.57   (.30)
    2005   2.34   (.32)
    2004   1.98   (.15)
    2003   1.71   (.09)
    2002   1.51   (.13)
    2001   1.62   (.20)
    2000   1.74   (.11)
    1999   1.37   (.15)
    1998   1.27   (.04)
    1997   1.51   (.04)
    1996   1.54   (.08)
    1995   1.47   (.06)
    1994   1.46   (.07)
    1993   1.49   (.03)
    1992   1.56   (.07)
    1991   1.62   (.05)
    1990   2.00   (.07)   [small n]

(Anything interesting here?)

Compute a z-score. Mean adjusted price: 1.68 (.37). To derive a z-score for any observation, substitute a value X into z=(X-1.68)/.37. Suppose “X”=1.68? Z=(1.68-1.68)/.37=0. The mean is normalized to 0. 1 s.d. above the mean? 1.68+.37=2.05, so Z=(2.05-1.68)/.37=1. The metric of z is in standard deviations.
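The same arithmetic can be sketched in R. The mean (1.68) and standard deviation (.37) come from the slide; the helper name z_score and the vector of example prices are assumptions made for illustration.

```r
# Values from the slide: mean adjusted price and its standard deviation.
mean_price <- 1.68
sd_price <- 0.37

# Hypothetical helper implementing z = (X - mean) / s.d.
z_score <- function(x, center, spread) (x - center) / spread

# Example prices: the mean, one s.d. above the mean, and 2.50.
prices <- c(1.68, 2.05, 2.50)
z_prices <- z_score(prices, mean_price, sd_price)
print(round(z_prices, 2))
```

Because z_score() is vectorized, the same call works on a whole column of weekly prices at once.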

“Standardizing” X allows us to use the “z distribution.”

The Most “Average”

      Price      z           Week     Year
    |--------------------------------------|
    | 1.680374   -.009361    Feb 12   2001 |
    | 1.681257   -.0069663   Nov 03   2003 |
    | 1.681329   -.0067707   Apr 24   2000 |
    | 1.682352   -.0039966   Aug 04   2003 |
    | 1.683292   -.001449    Jun 03   1991 |
    |                                      |
    | 1.684771    .0025612   Feb 04   1991 |
    | 1.68625     .0065716   May 27   1991 |
    | 1.688924    .0138213   Oct 27   2003 |
    | 1.689519    .0154355   Apr 17   2000 |
    | 1.69062     .0184197   Sep 24   2001 |
    |--------------------------------------|

The 10 Most “Below Average”

      Price      Z           Week     Year
    |--------------------------------------|
    | 1.096723   -1.59183    Feb 22   1999 |
    | 1.103978   -1.572159   Mar 01   1999 |
    | 1.111233   -1.552488   Feb 15   1999 |
    | 1.113652   -1.545931   Mar 08   1999 |
    | 1.120907   -1.52626    Feb 08   1999 |
    |--------------------------------------|
    | 1.123325   -1.519703   Feb 01   1999 |
    | 1.13058    -1.500032   Jan 04   1999 |
    | 1.131789   -1.496754   Jan 25   1999 |
    | 1.137835   -1.480361   Jan 11   1999 |
    | 1.141463   -1.470526   Jan 18   1999 |
    |--------------------------------------|

The 10 Most “Above Average”

      Price      Z           Week     Year
    |--------------------------------------|
    | 2.947      3.424879    May 15   2006 |
    | 2.973      3.495373    Jul 10   2006 |
    | 2.989      3.538755    Jul 17   2006 |
    | 3          3.56858     Aug 14   2006 |
    |--------------------------------------|
    | 3.003      3.576713    Jul 24   2006 |
    | 3.004      3.579425    Jul 31   2006 |
    | 3.021628   3.62722     Oct 03   2005 |
    | 3.038      3.67161     Aug 07   2006 |
    | 3.049491   3.702766    Sep 12   2005 |
    | 3.167136   4.021741    Sep 05   2005 |
    |--------------------------------------|

Finding Probabilities. What is the probability of a gas price of 2.50 or higher? The z-score is 2.22. In the z-distribution, if gas prices were truly normally distributed, a score this high or higher has a probability of occurring of .013, or about 1.3%. It’s an unlikely event. How computed? 1-.9868=.0132 gives the area above (consult a standard normal table).
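Instead of a printed table, the upper-tail area can be looked up with base R's pnorm(). A minimal sketch follows; the variable names are illustrative only.

```r
# z-score for a gas price of 2.50, as given on the slide.
z <- 2.22

p_below <- pnorm(z)      # area under the standard normal curve below z
p_above <- 1 - p_below   # area above z; same as pnorm(z, lower.tail = FALSE)

print(round(c(below = p_below, above = p_above), 4))
```

Using lower.tail = FALSE avoids the subtraction and is slightly more accurate for very large z.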

Finding Probabilities. What is the probability of a gas price having a z-score between -1.75 and 1.75? P(above)=.04; P(below)=.04. Therefore, P(in between)=1-.08=.92. The upper tail is .04; the lower tail is .04. Any probability calculation is this straightforward.
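The "in between" area can also be sketched in R as the difference of two cumulative probabilities. The names z_cut, p_lower_tail, p_upper_tail, and p_between are assumptions for illustration.

```r
# Cutoff from the slide: z-scores between -1.75 and 1.75.
z_cut <- 1.75

p_lower_tail <- pnorm(-z_cut)                     # area below -1.75
p_upper_tail <- pnorm(z_cut, lower.tail = FALSE)  # area above 1.75
p_between <- pnorm(z_cut) - pnorm(-z_cut)         # area in between

print(round(c(lower = p_lower_tail, upper = p_upper_tail, between = p_between), 2))
```

By symmetry the two tails are equal, so p_between is also 1 - 2 * p_lower_tail.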

Issues. The “gas price” example is pedagogical. Serious analysis of gas-pricing effects would require much more sophisticated statistical techniques. z is useful to compare observations from different historical eras or across disparate cases.

Hands-on examples in R

Plots and Z-scores. How to do some of the “stuff” in HW 4: multiple plots on a single page; creating z-scores and finding p-values; visualizing political data. Data: Obama vote share by county.

Dot Chart: Obama Vote

    # row.names is assumed here to hold the county labels
    dotchart(obamapercent, labels=row.names, cex=.7, xlim=c(0, 100),
             main="Support for Obama", xlab="Percent Obama")
    abline(v=50)  # vertical reference line at 50 percent

Returns:

Interpretation? Geographical patterns? Central Valley? Coastal? SoCal, NorCal? Why might you observe these patterns?

Z-scores. NB: we’re doing this for learning purposes.

Z-scores. Easy: create the mean and standard deviation, then derive the z-score using the formula from the last slide set: z=(X-mean)/s.d. R code on next slide.

Z-scores and R

    # Z scores for Obama
    meanobama <- mean(obamapercent)   # mean vote share
    sdobama <- sd(obamapercent)       # standard deviation of vote share
    zobama <- (obamapercent - meanobama) / sdobama
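To try the standardization without the course data, here is a self-contained sketch. The five vote shares are invented for illustration (they are not the county data), and the names vote_pct, z_manual, and z_scale are assumptions. Base R's scale() is shown as a cross-check because it centers and divides by the standard deviation by default.

```r
# Hypothetical vote shares (percent) for five made-up counties; not the course data.
vote_pct <- c(35.2, 48.9, 52.4, 61.0, 74.5)

# Manual z-scores: subtract the mean, divide by the standard deviation.
z_manual <- (vote_pct - mean(vote_pct)) / sd(vote_pct)

# Same calculation via scale(); as.numeric() drops the matrix attributes.
z_scale <- as.numeric(scale(vote_pct))

print(round(z_manual, 2))
print(all.equal(z_manual, z_scale))
```

After standardizing, the z-scores have mean 0 and standard deviation 1 by construction.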

Interpretation. Z-scores are in the metric of standard deviations. A large z (in absolute value) implies the observation is farther from the mean than observations with a small z. Z=0 means the observation is exactly at the mean.

Dotchart (code):

    par(mfcol=c(1,1))   # back to one plot per page
    dotchart(zobama, labels=row.names, cex=.7, xlim=c(-3, 3),
             main="Obama Vote Z-scores", xlab="Z-score")
    abline(v=0)                  # the mean
    abline(v=1, col="red")       # 1 s.d. above the mean
    abline(v=-1, col="red")      # 1 s.d. below the mean
    abline(v=2, col="darkred")   # 2 s.d. above the mean
    abline(v=-2, col="darkred")  # 2 s.d. below the mean

Probability Values. High z-scores are probabilistically less likely to be observed than smaller scores. Consult a z-distribution table, where the probability area is given. Can think about probabilities in the “tails”: one-tail (upper or lower) or two-tail (upper + lower). R can compute these areas.

R code

    # Area in the upper and lower tails of z combined
    twotailp <- 2*pnorm(-abs(zobama))

    # One-tail probability area. Subtracting this from 1 gives the area
    # below this z-score (if z is positive) or the area above this
    # z-score (if z is negative).
    onetailp <- pnorm(-abs(zobama))

    zp <- cbind(county, onetailp, twotailp, zobama)
    zp
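The tail calculations can be tried on a few hand-picked z-scores. This is a minimal sketch: the values in z_values are invented for illustration, and the data frame is just one convenient way to line up the results.

```r
# A few illustrative z-scores (not from the county data).
z_values <- c(-1.96, 0, 1, 2.5)

one_tail <- pnorm(-abs(z_values))   # area in the tail beyond |z|, one side only
two_tail <- 2 * one_tail            # area in both tails combined

# Collect the results in a data frame so the columns stay numeric.
result <- data.frame(z = z_values,
                     one_tail = round(one_tail, 4),
                     two_tail = round(two_tail, 4))
print(result)
```

A z of 0 sits exactly at the mean, so its one-tail area is .5 and its two-tail area is 1; a z near +/-1.96 leaves about .05 in the two tails combined.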

Plots. 4 plots on one page:

    par(mfcol=c(2,2))   # 2 x 2 grid, filled column by column
    boxplot(obamapercent, ylab="Vote Percent",
            main="Obama Vote: Box Plot", col="blue")
    hist(zobama, xlab="Obama Vote as Z-Scores", ylab="Frequency",
         main="Histogram of Standardized Obama Vote", col="blue")
    hist(obamapercent, ylab="Frequency", xlab="Vote Percent",
         main="Obama Vote: Histogram", col="blue")
    plot(zobama, onetailp, ylab="One-Tail p", xlab="Z-score",
         main="Z-scores and p-values", col="blue")
