EQUATIONS IN THE SAS LOG FOR THE STATISTICIAN IN YOU
NOTE: Regression equation : z_total_post = 0.13379 + 0.776552*z_total_pre.
NOTE: The above message was for the following BY group: group=CONTROL
NOTE: Regression equation : z_total_post = 1.233616 + 0.578418*z_total_pre.
NOTE: The above message was for the following BY group: group=EXPERIMENTAL
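The two log equations can be compared directly. This is a minimal Python sketch, not SAS output: the names `EQUATIONS`, `predict`, and `crossing_point` are hypothetical. It evaluates both fitted lines at a few pretest z-scores and solves for the pretest score where they would meet.

```python
# Intercepts and slopes copied from the SAS log NOTEs (one pair per BY group).
EQUATIONS = {
    "CONTROL": (0.13379, 0.776552),
    "EXPERIMENTAL": (1.233616, 0.578418),
}


def predict(group: str, z_total_pre: float) -> float:
    """Predicted z_total_post for one BY group."""
    intercept, slope = EQUATIONS[group]
    return intercept + slope * z_total_pre


def crossing_point() -> float:
    """Pretest z-score at which the two fitted lines intersect."""
    b0_c, b1_c = EQUATIONS["CONTROL"]
    b0_e, b1_e = EQUATIONS["EXPERIMENTAL"]
    return (b0_e - b0_c) / (b1_c - b1_e)


if __name__ == "__main__":
    for pre in (-2.0, -1.0, 0.0, 1.0, 2.0):
        gap = predict("EXPERIMENTAL", pre) - predict("CONTROL", pre)
        print(f"pre = {pre:+.1f}: experimental - control = {gap:.3f}")
    print(f"lines cross at z_total_pre = {crossing_point():.2f}")
```

The experimental slope is smaller, so the predicted advantage shrinks as pretest scores rise. Under these linear fits, the lines only meet at a pretest z-score of about 5.55, far out in the tail.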
Is the intervention successful under all conditions?
TRAINING WAS ADMINISTERED TO FOUR COHORTS
Admittedly, we did not train people while flying on a trapeze.
Creating the interaction graph
First, in the RESULTS window, type sgedit on.
/* Also write an editable .sge file for each ODS graph */
ods listing sge = on ;
ods graphics on ;
proc glm data = plots ;
  class TestType cohort ;
  model z_total = TestType cohort TestType*cohort ;
  where group = "EXPERIMENTAL" ;
run ;
quit ;
Click on the .sge plot to edit it.
ODDLY, THE MOST TIME-CONSUMING PART OF THIS IS MAKING THE LINES THICKER
Of course, that is kind of like being the smaller midget.
Using SGEDIT to, well, edit
1. Double-click on the .sge file in the RESULTS window.
2. Right-click in the plot area and select PLOT PROPERTIES.
3. Select the desired line thickness.
THANKS FOR ASKING!
Yes. In the other repeated-measures ANOVA, both the TestType*Cohort*Group interaction (F = 5.84, p < .0001) AND the TestType*Group interaction (F = 22.92, p < .0001) were significant.
LOOKING AT THE LITTLE PICTURE
(Especially true for small samples)
Are these tests related? r = .22
Another example
Years of education as a predictor of gain score: R-square = .46 (correlation = .68), p < .01.
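With a single predictor, R-square is the square of the correlation, so the two figures on this slide agree:

```latex
R^2 = r^2 = (0.68)^2 = 0.4624 \approx 0.46
```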
Now looky here … Is it a real relationship?
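With a small sample, one extreme case can create a correlation almost by itself. This minimal Python sketch uses made-up numbers, not the study data, and a hypothetical helper `pearson_r`. Five ordinary cases have no linear relationship at all. Adding one extreme point pushes r to about .95.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: five ordinary cases with no linear relationship ...
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 4, 2]
# ... plus one extreme case appended to each list.
x_out = x + [15]
y_out = y + [15]

if __name__ == "__main__":
    print(f"without the extreme case: r = {pearson_r(x, y):.2f}")
    print(f"with the extreme case:    r = {pearson_r(x_out, y_out):.2f}")
```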
What should we do? Throw the score out? Keep the score in? Something else?
Ignoring my partner …
Compare your answers with the people next to you.
Sometimes outliers are the most interesting part of your study
One last example on knowing your data
Not just telling a story, having a conversation
Custom Map-making
How to plot the largest category in a frequency distribution
DATA VISUALIZATION BY EXAMPLE
WHERE IS DEMOCRATIC SUPPORT BASED?
DATA VISUALIZATION IN POLITICAL SURVEYS
PROC TABULATE DATA = in.VOTE2008 OUT = SummaryVOTE2008 ;
  CLASS question3 state ;
  TABLE state, question3*RowPctN ;
RUN ;
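RowPctN in this TABLE statement gives each question3 category's share of that state's respondents. This is a minimal Python sketch of the same arithmetic, not of TABULATE itself. The responses are hypothetical, and the 0/1 coding of question3 is assumed for illustration.

```python
from collections import Counter, defaultdict


def row_percentages(rows: list[tuple[str, int]]) -> dict[str, dict[int, float]]:
    """Percentage of each question3 answer within each state (like RowPctN)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for state, answer in rows:
        counts[state][answer] += 1
    result = {}
    for state, answers in counts.items():
        total = sum(answers.values())
        # Each state's percentages are computed against that state's total.
        result[state] = {a: 100.0 * n / total for a, n in sorted(answers.items())}
    return result


# Hypothetical (state, question3) responses.
responses = [("CA", 1), ("CA", 1), ("CA", 0), ("TX", 0), ("TX", 0), ("TX", 1), ("TX", 1)]

if __name__ == "__main__":
    for state, pcts in row_percentages(responses).items():
        print(state, {k: round(v, 1) for k, v in pcts.items()})
```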
WARNING: Some observations were discarded when charting PctN_01. Only first matching observation was used. Use STATISTIC= option for summary statistics.
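proc format ;
  /* 50 <- 100 means "greater than 50, up to 100", so no value between 50 and 50.01 is left unformatted */
  value vote 50 <- 100 = "Obama"
             0 - 50 = "McCain" ;
run ;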
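The VOTE format assigns a label by comparing PctN_01 with 50. This is a minimal Python sketch of that lookup, not of PROC FORMAT itself. The helper `vote_label` is hypothetical and assumes percentages run from 0 to 100. It uses a strict "> 50" cutoff, so every value in that range, including something like 50.005, gets a label.

```python
def vote_label(pct_n: float) -> str:
    """Mirror the VOTE format: above 50 percent -> "Obama", 0 to 50 -> "McCain"."""
    if not 0 <= pct_n <= 100:
        raise ValueError(f"percentage out of range: {pct_n}")
    # Strictly greater than 50, so values between 50 and 50.01 still get a label.
    return "Obama" if pct_n > 50 else "McCain"


if __name__ == "__main__":
    for value in (12.5, 50.0, 50.005, 50.01, 88.0):
        print(value, vote_label(value))
```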
PROC GMAP DATA = SummaryVOTE2008 map = maps.us ;
  ID state ;
  CHORO PctN_01 / discrete LEGEND = LEGEND1 ;
RUN ;
QUIT ;
The ID statement names the variable shared by the response data set and the maps.us map data set (here, state). GMAP uses it to match each observation to its location on the map.
/* Fill colors for the first and second map levels */
Pattern1 c = red ;
Pattern2 c = blue ;
PROC GMAP DATA = SummaryVOTE2008 map = maps.us ;
  ID state ;
  CHORO PctN_01 / discrete LEGEND = LEGEND1 ;
  FORMAT PctN_01 vote. ;
RUN ;
QUIT ;
PROC GMAP
CHORO PctN_01 / discrete LEGEND = LEGEND1 ;
FORMAT PctN_01 vote. ;
The CHORO statement uses the first observation for each state and ignores the others.
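The warning in the log and this note describe the same rule. When the response data set has several rows for one ID value and no STATISTIC= is given, CHORO keeps only the first row for that ID. This is a hypothetical Python sketch of that rule: `first_per_id` and the summary rows are invented for illustration and are not SAS internals.

```python
def first_per_id(rows: list[tuple[str, int, float]]) -> dict[str, float]:
    """Keep the pct from the first (state, question3, pct) row seen for each state."""
    chosen: dict[str, float] = {}
    for state, _answer, pct in rows:
        # setdefault stores a value only the first time a state appears.
        chosen.setdefault(state, pct)
    return chosen


# Hypothetical TABULATE-style output: two rows per state, one per question3 value.
summary = [
    ("CA", 0, 38.0), ("CA", 1, 62.0),
    ("TX", 0, 55.0), ("TX", 1, 45.0),
]

if __name__ == "__main__":
    print(first_per_id(summary))
```

Because only one row per state survives, the mapped percentage belongs to whichever question3 row comes first for that state. The format's labels have to be read with that category in mind.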
Does Race Matter?
PROC GMAP
vote2008 coded 0 = McCain, 1 = Obama
pctmin = percentage of residents in the voter’s district from minority groups
PROC GMAP DATA = wuss map = maps.us ;
  ID state ;
  area vote2008 / discrete statistic = mean ;
  block pctmin / discrete statistic = mean ;
  format pctmin rangep. vote2008 voten. ;
RUN ;
QUIT ;
The BLOCK statement charts the pctmin variable. The height of the block will be based on the value of the variable, but the color will be determined using the format specified.
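With statistic = mean, GMAP summarizes all of a state's observations instead of keeping only the first. Because vote2008 is coded 0/1, its mean is the proportion of Obama voters. This is a hypothetical Python sketch of that per-state summary. The records are invented, and `state_means` is not a SAS function.

```python
from collections import defaultdict
from statistics import mean


def state_means(records: list[dict]) -> dict[str, dict[str, float]]:
    """Mean vote2008 and pctmin per state, like statistic = mean in GMAP."""
    by_state: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        by_state[rec["state"]].append(rec)
    return {
        state: {
            # Mean of a 0/1 variable = share of voters coded 1 (Obama).
            "vote2008": mean(r["vote2008"] for r in recs),
            "pctmin": mean(r["pctmin"] for r in recs),
        }
        for state, recs in by_state.items()
    }


# Hypothetical voters: vote2008 is 0 = McCain, 1 = Obama; pctmin is a percentage.
voters = [
    {"state": "CA", "vote2008": 1, "pctmin": 30.0},
    {"state": "CA", "vote2008": 1, "pctmin": 20.0},
    {"state": "CA", "vote2008": 0, "pctmin": 10.0},
    {"state": "UT", "vote2008": 0, "pctmin": 8.0},
    {"state": "UT", "vote2008": 1, "pctmin": 12.0},
]

if __name__ == "__main__":
    for state, stats in state_means(voters).items():
        print(state, stats)
```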
The mean minority percentage in districts where Obama voters live is 21%, versus 13% for McCain voters (t = 5.73, p < .0001).
The usefulness of visual data
With one statement, I can change the minority-percentage cutoff and re-run the chart:
value rangep 0 - 15 = "0 - 15%" 15 <- 100 = "> 15%" ;
DATA VISUALIZATION BY EXAMPLE
Decision Trees, ROC & Lift Curves to Predict Military Service
Speaking of easy, interactive graphics: JMP
libname readin "E:\crimes\readout" ;
libname writeout xport "E:\wuss2010\crimes.xpt" ;
/* The XPORT engine stores data sets only, so copy just those */
proc copy in = readin out = writeout memtype = data ;
run ;
How to get a SAS .xpt file into JMP
Step 1: File > Open
DECISION TREE
ANALYZE > MODELING > PARTITION
SELECT Y
SELECT X VARIABLES
Click on the SPLIT button
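Each click of SPLIT chooses one predictor and one cutpoint. JMP's Partition platform ranks candidate splits by its own criterion (LogWorth). This sketch swaps in a different, common criterion, Gini impurity, to show what a single split search does. The Python below is an illustration under that assumption, not JMP's algorithm. It uses hypothetical data and helpers (`gini`, `best_split`) and searches one numeric X for the "x <= cut" split that leaves a binary Y most homogeneous.

```python
from typing import Optional


def gini(labels: list[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0.0 means the node is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(x: list[float], y: list[int]) -> tuple[Optional[float], float]:
    """Find the cutpoint for 'x <= cut' that minimizes weighted Gini impurity.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    n = len(x)
    best_cut: Optional[float] = None
    best_score = gini(y)
    # Candidate cuts: every distinct x value except the largest.
    for cut in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        # Weight each child's impurity by its share of the rows.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Hypothetical data: x is a numeric predictor, y is 1 = served, 0 = did not serve.
x = [18, 19, 20, 21, 22, 23, 24, 25]
y = [1, 1, 1, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    cut, score = best_split(x, y)
    print(f"split at x <= {cut}, weighted Gini = {score:.4f}")
```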
Receiver Operating Characteristic
Click on the red arrow at the top left of the Partition window for pull-down options, which include ROC and Lift curves.
ROC
Sensitivity is the true positive rate: for example, of the people who actually died, the percentage you predicted would die. Specificity is the true negative rate: for example, of the people who actually survived, the percentage you predicted would NOT die.
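An ROC curve plots sensitivity, TP / (TP + FN), against 1 - specificity, where specificity is TN / (TN + FP), as the cutoff for predicting "died" moves. This minimal Python sketch computes both rates from counts and traces the curve's points. The outcomes, scores, and helper names are hypothetical.

```python
def sensitivity_specificity(actual: list[int], predicted: list[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity); 1 = died, 0 = survived."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # died, predicted to die
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # died, predicted to survive
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # survived, predicted to survive
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # survived, predicted to die
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(actual: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) at each cutoff, predicting 1 when score >= cutoff."""
    points = []
    for cutoff in sorted(set(scores), reverse=True):
        predicted = [1 if s >= cutoff else 0 for s in scores]
        sensitivity, specificity = sensitivity_specificity(actual, predicted)
        points.append((1.0 - specificity, sensitivity))
    return points


# Hypothetical outcomes (1 = died) and model scores (predicted probability of dying).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.2, 0.1, 0.6]

if __name__ == "__main__":
    predicted = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity_specificity(actual, predicted))
    for fpr, tpr in roc_points(actual, scores):
        print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
```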
In JMP, using training and testing datasets is REALLY easy: EXCLUDE 25% or 50% of the data, then re-run your analyses with the excluded sample.
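JMP does this through row exclusion. The same idea outside JMP is a random holdout split. This is a minimal Python sketch with a hypothetical helper (`holdout_split`) and an arbitrary seed. It randomly sets aside 25% of row indices for testing and keeps the rest for training.

```python
import random


def holdout_split(
    n_rows: int, test_fraction: float = 0.25, seed: int = 2010
) -> tuple[list[int], list[int]]:
    """Randomly split row indices 0..n_rows-1 into (training, testing) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    testing = sorted(indices[:n_test])
    training = sorted(indices[n_test:])
    return training, testing


if __name__ == "__main__":
    train, test = holdout_split(20)
    print("training rows:", train)
    print("testing rows: ", test)
```

Fit the model on the training rows, then check how well it predicts the testing rows. This is the same check as re-running the analysis on the excluded sample.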
A statistician is a person who was good at math but didn’t have enough personality to be an accountant?
It is important that people believe you
And that’s my story
AnnMaria De Mars
The Julia Group
2111 7th St #8
Santa Monica, CA 90405
ANNMARIA@THEJULIAGROUP.COM
(310) 717-9089