SAS Statistics Technology Short Courses: Spring 2010 Kentaka Aruga
Objectives of the course
– Performing simple descriptive statistics (proc means, proc freq, and proc corr)
– Performing basic test statistics (chi-square test, t-test, F-test)
– Basic commands for regression analysis and how to export the results into a table (proc reg)
Section 1 Preparation Getting data and importing data
Getting data
Download the SAS commands that will be used in this practice from
Download the data file that will be used in this course from
Save the files under the C:/ drive of your Windows computer.
Importing Excel file to SAS
Open the SAS program and copy and paste the following commands from the file you have just downloaded, sasstat.txt:
libname car "c:/";
proc import out=car.auto datafile="c:/auto.xls" dbms=excel2000 replace;
  sheet="auto";
  getnames=yes;
run;
Then highlight the command line and execute the command.
Proc import
Look at the trunk column. Do you see an empty column? SAS determines the data type based on the most common data type in the first 8 rows. The trunk column has mixed data (since the first eight rows are all zero, the remaining rows become all zero).
Proc import
Add the following statement: mixed=yes;
Now the command should look like this (the mixed=yes; line is the added statement):
proc import out=car.auto datafile="c:/auto.xls" dbms=excel2000 replace;
  sheet="auto";
  getnames=yes;
  mixed=yes;
run;
Execute this command.
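One way to confirm that the re-import worked is to check the type and the first few values of trunk. The following is a minimal sketch, assuming the import above created car.auto with a column named trunk; the choice of 12 rows is arbitrary and only meant to look past the first 8 rows that SAS scans.

```sas
/* Assumes car.auto exists and contains a variable named trunk */
proc contents data=car.auto;        /* lists every variable with its type (Num or Char) */
run;

proc print data=car.auto (obs=12);  /* prints the first 12 rows (12 is an arbitrary choice) */
  var trunk;
run;
```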
Importing Excel file from the main menu bar From the main menu click File, and then click Import Data.
Importing Excel file from the main menu bar
Under the Import Wizard, specify the data source (in this example select MS Excel) and click Next. Under the Connect to MS Excel wizard, browse to the Excel file you are importing.
Importing Excel file from the main menu bar
Under the Select Table wizard, select the name of the sheet of your Excel file and click Next. Under the Select library and member wizard, specify the library into which you want to import the Excel file. Enter a name in the Member box to name the data set that will be created in SAS.
Saving the syntax for importing Excel file You can save the syntax for what we just did to import the Excel file using the main menu bar. Browse and name the file in Create SAS Statements wizard. Open the sas file you just saved to see the commands.
How to perform simple descriptive statistics (Review from SAS basics course)
How would you see the number of observations, mean, standard deviation, minimum, and maximum of all numeric variables in SAS?
Ans. proc means data=car.auto; run;
How do you analyze the frequency of the variables?
Ans. proc freq data=car.auto; run;
Proc means
By default proc means provides the number of observations, mean, standard deviation, minimum, and maximum of all numeric variables:
proc means data=car.auto; run;
Specifying certain variables
– var variable names;
Q. How would you execute the MEANS procedure for the variables price, mpg, and weight?
Creating an output table
– output out=file name;
Q. How would you get the output for the MEANS procedure for the variables price, mpg, and weight?
Proc means (Answers)
proc means data=car.auto;
  var price mpg weight;
  output out=car.means;
run;
Proc freq
By default this procedure creates frequency tables for all variables:
proc freq data=car.auto; run;
Specifying a certain variable
– tables variable name;
Q. How would you execute the FREQ procedure for the variable foreign?
Creating an output table
– tables variable name / out=file name;
Q. How would you get the output for the FREQ procedure for the variable foreign?
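If only some statistics are needed in the output table, they can be requested by name on the output statement. This is a sketch only: the data set name car.means2 and the output variable names are made up for illustration and are not part of the course files.

```sas
/* car.means2 and the *_mean / *_std names are hypothetical */
proc means data=car.auto noprint;          /* noprint suppresses the printed table */
  var price mpg weight;
  output out=car.means2
         mean=price_mean mpg_mean weight_mean   /* one name per variable in the var list */
         std=price_std mpg_std weight_std;
run;

proc print data=car.means2;                /* one row holding the requested statistics */
run;
```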
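A possible answer to the two proc freq questions, following the same pattern as the proc means answers. The output data set name car.freq is an assumption chosen for this sketch.

```sas
/* Frequency table for the single variable foreign */
proc freq data=car.auto;
  tables foreign;
run;

/* Same table, also saved to a data set (car.freq is a hypothetical name) */
proc freq data=car.auto;
  tables foreign / out=car.freq;
run;
```

The saved data set holds one row per value of foreign, with the count and percent for that value.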
Proc corr
The CORR procedure generates simple statistics based on non-missing values, and the Pearson correlation coefficient, an index that quantifies the linear relationship between a pair of variables. An insignificant p-value indicates that there is no evidence of a linear relationship between the two variables.
Proc corr
Finding correlations between pairs of variables
1) All variables
proc corr data=car.auto; run;
2) Three specific variables
proc corr data=car.auto; var price mpg weight; run;
The low p-value indicates a statistically significant linear relationship between weight and mpg, and the negative correlation coefficient shows its direction: the heavier the car is, the lower the mpg becomes.
Chi-square test of independence
What is the chi-square test of independence?
Ans. It tests whether the variables in the row and column are independent or related.
What is the null hypothesis?
Ans. The variables in the row and column are independent: there is no relationship between row and column frequencies.
The command for SAS to test this is provided as an option of proc freq. Simply use chisq. To display the expected frequency for each cell, use the option expected.
Chi-square test of independence: exercise There are 34 students in the classroom and there was a vote on whether they wanted to have a turtle in their classroom as a pet. The data file vote.txt contains the result of the vote (Yes=y, No=n), and gender of the students (male=m, female=f). Q1 Import the file vote.txt into SAS and name the variables answers and gender. Q2 Using the option chisq, test whether or not the answers to the vote and gender are associated with each other.
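One possible way to work through Q1 and Q2 is sketched below. It assumes that vote.txt was saved to c:/, that each line holds the answer followed by the gender separated by a space, and that the file has no header row; none of this is stated in the course material, so adjust the infile and input statements to the actual file.

```sas
/* Q1: read vote.txt (location and column order are assumptions) */
data vote;
  infile "c:/vote.txt";
  input answers $ gender $;    /* $ marks both variables as character */
run;

/* Q2: cross-tabulate the two variables and request the chi-square test */
proc freq data=vote;
  tables answers*gender / chisq expected;   /* expected adds expected cell counts */
run;
```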
What does the result tell you?
The p-value is lower than 0.01, so the null hypothesis that the two variables are independent is rejected even at the 1% significance level. The two variables answers and gender are associated with each other (they are dependent).
Proc ttest
This procedure is used to test the hypothesis of equality of means for two normal populations from which independent samples have been obtained.
– Three cases in SAS
One-sample t-test
– Computes the sample mean of the variable and compares it with a given number.
Two-sample t-test
– Compares the mean of the first sample minus the mean of the second sample to a given number.
Paired observations t-test
– Compares the mean of the differences in the observations to a given number.
Assumptions of proc ttest
The observations are random samples drawn from normally distributed populations. This can be tested using the UNIVARIATE procedure.
– If the normality assumptions are not satisfied: use the NPAR1WAY procedure.
The two populations of a group comparison must be independent.
– If not independent, you should question the validity of a paired comparison.
By default the null hypothesis value is zero. To change this you can use h0=number, e.g. h0=10.
The default significance level is 5% (alpha=0.05), which gives 95% confidence limits. To change this you can use alpha=value, e.g. alpha=0.01.
Source:
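The options and procedures named above can be sketched as follows. The null value 20 is an arbitrary number picked for illustration, and the data set scores with variables before and after is hypothetical (car.auto has no natural pair of measurements), so treat this as a pattern rather than course output.

```sas
/* Normality check for mpg; the normal option requests tests for normality */
proc univariate data=car.auto normal;
  var mpg;
run;

/* One-sample t-test of H0: mean mpg = 20 (20 is an arbitrary example value),
   with 99% confidence limits */
proc ttest data=car.auto h0=20 alpha=0.01;
  var mpg;
run;

/* Paired t-test: scores, before and after are hypothetical names */
proc ttest data=scores;
  paired before*after;
run;

/* Nonparametric alternative when normality is doubtful */
proc npar1way data=car.auto wilcoxon;
  class foreign;
  var mpg;
run;
```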
Proc ttest: exercise How would you perform a t-test on mpg variable classified by foreign variable? Hint: use class and var statement What will the null hypothesis be in this case?
Proc ttest (Contd)
The command
proc ttest data=car.auto;
  class foreign;
  var mpg;
run;
– CLASS statement: contains a variable that distinguishes the groups being compared.
– VAR statement: specifies the response variable to be used in calculations.
The null hypothesis: $H_0: \mu_{\text{domestic}} = \mu_{\text{foreign}}$ (mean mpg is the same for domestic and foreign cars)
The alternative hypothesis: $H_1: \mu_{\text{domestic}} \neq \mu_{\text{foreign}}$
The first table shows the basic statistics.
The second table is the t-test for equal means. Before using this table you need to look at the third table to determine if the assumption of equal variances is reasonable.
The third table is a test of equal variances. In this example the null hypothesis of equal variances is not rejected, so you need to look at the equal-variance row of the second table.
The p-value in that row is high, so the second table gives no evidence of a difference in means between domestic and foreign cars.
Section 4 Basic commands for regression analysis and how to export the result into a table (proc reg)
Regression analysis
Regression analysis: finding a reasonable mathematical model of the relationship between a response variable ($y$) and a set of explanatory variables ($x_1, x_2, \dots, x_p$)
General model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$
Proc reg
General command
proc reg data=file name;
  model DV = IV;
run;
DV: dependent variable
IV: independent variable
This procedure also does the following testing:
– F-test: tests the null hypothesis that none of the independent variables has any effect.
– T-test: tests, for each IV, the null hypothesis that the independent variable has no effect on the dependent variable.
Proc reg: exercise Let price be a response variable (dependent variable (DV)), and mpg and length be explanatory variables (independent variables (IV)) Q1 What will be the commands? Q2 What null hypotheses will be tested? Q3 Will the model be significant?
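Proc reg: answers
Q1
proc reg data=car.auto;
  model price = mpg length;
run;
Q2
F-test: $H_0: \beta_{\text{mpg}} = \beta_{\text{length}} = 0$ (neither mpg nor length has any effect on price)
T-test: $H_0: \beta_{\text{mpg}} = 0$ and, separately, $H_0: \beta_{\text{length}} = 0$ (the individual variable has no effect on price)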
Proc reg Q3
Proc reg: Confidence and prediction intervals
Construct 95% confidence and prediction intervals by adding two options, clm and cli.
How would you add these options to the previous model?
proc reg data=car.auto;
  model price = mpg length / clm cli;
run;
Proc reg: creating an output table
Add outest=file name after the proc reg command:
proc reg data=car.auto outest=car.est1;
  model price = mpg length / clm cli;
run;
quit;
Note that there is no semicolon between data=car.auto and outest=car.est1. In order to see the output data file car.est1 you need to add the statement quit; at the end.
You can drop the variables you do not want to see by using the keep= or drop= data set option, e.g.
data car.est2 (keep=intercept mpg length);
  set car.est1;
run;
data car.est3 (drop=price _model_ _depvar_ _type_ _RMSE_);
  set car.est1;
run;
Proc reg: creating an output table To see other outputs go to Help and type in REG and go into The REG procedure. Click Syntax
Exporting the output data to Excel
General commands
proc export data=name of the SAS data file you are exporting
  outfile="drive or path to the folder on your computer, plus the file name"
  dbms=excel2000 replace;
run;
How would you export the file car.est2 into an Excel file?
Ans.
proc export data=car.est2
  outfile="c:/est.xls"
  dbms=excel2000 replace;
run;
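The course uses dbms=excel2000 throughout. As a hedged aside, more recent SAS releases with SAS/ACCESS Interface to PC Files installed also accept dbms=xlsx for the newer .xlsx workbook format; whether it is available depends on the local SAS installation, and the file name est.xlsx is chosen only for this sketch.

```sas
/* Assumes a SAS release and licence that support dbms=xlsx;
   c:/est.xlsx is a hypothetical output file name */
proc export data=car.est2
  outfile="c:/est.xlsx"
  dbms=xlsx replace;    /* replace overwrites an existing file of the same name */
run;
```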
Useful supports: other useful sites Online SAS manuals This will automatically link you to sas9doc.html Statbookstore: useful site for finding program examples