# Statistical Methods, Lecture 7: Introduction to SAS Programming Language

Lynne Stokes, Department of Statistical Science


3 Eysenck’s Data File

4 Open the SAS Program

- Double-click the lecture7.sas File
  - Press the Run Icon (Runner Image)
- Editor
  - Create and Modify SAS Command Files
  - Can Save in the Stat 6337 Folder: File / Save As …
- Log
  - Messages about the Compilation and Execution of the SAS Program
  - Contains Error Messages (in red), if any
  - Can Save in the Stat 6337 Folder: File / Save As …
- Output
  - Results of the Execution of the SAS Program
  - Can Save in the Stat 6337 Folder: File / Save As …
- To Erase the Contents of the Log or Output Files: Right Click, Select "Clear All"

5 SAS Structure

- DATA Step
  - Describe the data, provide names for variables, define new or transformed variables
- PROCs: SAS Procedures
  - Descriptive Statistics: Proc Univariate, Proc Means
  - Graphics: Proc Chart, Proc Plot
  - Regression: Proc Reg
  - Two-sample t-tests: Proc Ttest
  - Analysis of Variance: Proc Anova, Proc GLM, Proc Mixed
  - Specialized Data Operations: Proc Sort
  - etc.

6 SAS Syntax

- Every command MUST end with a semicolon
  - Commands can continue over two or more lines
  - This WILL be Your #1, #2 & #3 Mistakes !!!!
- Variable names are 1-8 characters (letters and numerals, beginning with a letter or underscore), but no blanks or special characters
  - Note: values for character variables can exceed 8 characters
- Comments
  - Begin with *, end with ;
  - Can comment several lines: begin with /* and end with */
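A minimal sketch of these syntax rules. The data set name `demo`, the variable names, and the two data lines are made up for illustration and do not come from the lecture files.

```sas
* A one-statement comment: starts with an asterisk and ends with a semicolon;

/* A block comment can span
   several lines. */

data demo;              /* hypothetical data set name */
   input id
         score;         /* one statement continued over two lines */
   datalines;
1 12
2 15
;
run;
```

If the semicolon after `score` is left out, SAS reads the next statement as part of the Input statement, which is the mistake the slide warns about.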

7 Data Input in the SAS File

- Data fname ;
  - creates temporary file with the data that are described in the data step
- Input name ... name \$ ... ;
  - list input: lists the variable names (1-8 characters/letters), name is assumed to be a quantitative variable
  - name MUST be followed by \$ if name is a character variable
  - alternatives: comma separated, column specified
- Datalines (or Cards) ;
  - indicates that the data follow, line by line
- ; (a lone semicolon)
  - indicates that the last line of data has been input, the semicolon is on a line by itself
- Example: lecture7class.sas
  - Open lecture7class.sas
    - Change filename, if necessary
  - Clear output and log files; Run lecture7class.sas
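A short sketch of list input with in-stream data. This is not the contents of lecture7class.sas; the data set name `class`, the variables, and the values are assumptions chosen only to show the pattern.

```sas
data class;                        /* hypothetical data set name */
   input name $ gender $ height;   /* $ marks name and gender as character */
   datalines;
Ann f 64
Bob m 70
Cara f 66
;
run;

proc print data=class;             /* list the data to check the input */
run;
```

With list input the values on each data line are separated by blanks and are read in the order given on the Input statement.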

8 Data Input with Multiple Responses on a Single Line of the Data File

- SAS Requires that Each Response Value be on a Separate Line of Data
- When n Responses are on One Line of Data (Creates n Data Lines with 1 Response Value on Each Line)
  - Input y1 y2 … yn ;
  - y = y1; output;
  - y = y2; output;
  - ...
  - y = yn; output;
- If y1 … yn Represent Responses for n Levels of a Factor (Creates n Data Lines with 1 Factor & Response Value on Each Line)
  - Input y1 y2 … yn ;
  - factor = 'Level 1'; y = y1; output;
  - factor = 'Level 2'; y = y2; output;
  - ...
  - factor = 'Level n'; y = yn; output;
- Example: lecture7.sas
  - Data Flow2
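The factor-level pattern above might look like the following for n = 3. The data set name `flow2` echoes the slide, but the variables and values here are invented and are not the contents of lecture7.sas.

```sas
data flow2;
   input y1 y2 y3;                        /* three responses on each data line */
   length factor $ 8;                     /* room for the level labels */
   factor = 'Level 1'; y = y1; output;
   factor = 'Level 2'; y = y2; output;
   factor = 'Level 3'; y = y3; output;
   drop y1 y2 y3;                         /* keep only factor and y */
   datalines;
10 12 15
11 14 13
;
run;

proc print data=flow2;
run;
```

Each of the two input lines is written out three times, so `flow2` holds six lines with one factor level and one response on each.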

9 Data Input from an External File

- Filename fn 'complete directory/file specification' ;
  - e.g., filename eysdata 'c:/Stat6337/EysenckRecall.dat' ;
  - Be Careful with Spaces in Directories and File Names !!!
- Data fname ;
  - creates temporary file with the data that are described in the data step
- Infile fn ;
  - input the data from the file labeled fn
- Input name ... name \$ ... ;
  - lists the variable names (1-8 characters/letters), name is assumed to be a quantitative variable
  - name MUST be followed by \$ if name is a character variable
- Run ;
  - indicates that the data step is completed
- Example: lecture7class.sas
  - Data Recall
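A sketch of reading the external file named on the slide. The order of the variables on each line of EysenckRecall.dat is an assumption here (age, group, recall); check the file before using it.

```sas
filename eysdata 'c:/Stat6337/EysenckRecall.dat';

data recall;
   length age $ 8 group $ 12;      /* assumed lengths for the character values */
   infile eysdata;                 /* read from the file labeled eysdata */
   input age $ group $ recall;     /* assumed layout of each line */
run;

proc print data=recall;
run;
```

The Length statement is included because list input stores only 8 characters of a character value by default, so longer group labels would be cut off.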

10 Program Data Vector

- One line of data is stored, as indicated on the Input statement of the Data Step
- Any calculations, deletions, etc. in the Data Step are performed on that line of data
- When the Data Step is completed, the variables in the Program Data Vector are output to a temporary (work) file
- Can force data lines to be written at any time with the Output statement
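A small sketch of the Output statement acting on the current line in the Program Data Vector. The data set name `positive` and the values are hypothetical.

```sas
data positive;              /* hypothetical data set name */
   input x y;
   if x > 0 then do;
      z = y / x;            /* calculated on the current line only */
      output;               /* write this line to the work file now */
   end;
   datalines;
2 10
0 5
4 8
;
run;

proc print data=positive;
run;
```

Once an Output statement appears in a Data Step, SAS no longer writes each line automatically at the end of the step, so the line with x = 0 is never written and `positive` holds two lines.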

11 Operations in the Data Step

- Arithmetic Operations
  - x = u + v ;
- Transformations
  - x = log(y) ;
- Logical
  - If x > 0 then z = y/x ;
- Recoding
  - If gender = 'm' then gender = 'Male'; else if gender = 'f' then gender = 'Female';
  - Note: SAS sets a variable's format (e.g., the length of a character variable) based on its first value, so longer values assigned later are truncated
  - To force a length (e.g., character variable), use length
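The operations above can be combined in one Data Step, sketched here with made-up data. The data set name `scores` and the variable `logy` are assumptions; `logy` is used so the transformation does not overwrite x.

```sas
data scores;                        /* hypothetical data set name */
   length gender $ 6;               /* long enough to hold 'Female' */
   input gender $ u v y;
   x = u + v;                       /* arithmetic */
   logy = log(y);                   /* natural log transformation */
   if x > 0 then z = y / x;         /* logical condition */
   if gender = 'm' then gender = 'Male';
   else if gender = 'f' then gender = 'Female';
   datalines;
m 1 2 10
f 3 4 20
;
run;

proc print data=scores;
run;
```

The Length statement comes before the Input statement so that the length is set before SAS first encounters the variable.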

12 Titles and Labels

- Title# '…' ;
  - Up to 10 title lines: title# 'include your title here';
  - Can be placed in Data Steps or Procs
  - Changing Title# replaces that title and eliminates Titlex, where x > #
- Label name = '…' ;
  - Can be in a Data Step or Proc Print
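A brief sketch of titles and labels. The title text, the label text, and the data values are invented for illustration.

```sas
data recall;                        /* hypothetical values, not Eysenck's data */
   input age $ group $ recall;
   datalines;
Young Counting 7
Older Counting 6
;
run;

title1 'Recall Study';
title2 'Listing of the Data';

proc print data=recall label;       /* the label option displays labels as column headings */
   label recall = 'Number of Words Recalled';
run;

title2 'Summary';                   /* replaces title2 and clears any title3 through title10 */
```

Proc Print shows variable names unless the `label` option is given on the Proc statement.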

13 Some Useful PROCs

- Proc Chart: vertical or horizontal bar charts
- Proc Freq: frequency distributions, cross tabs
- Proc Means: select summary statistics
- Proc Plot: scatterplots
- Proc Print: prints data files
- Proc Sort: sorts data files by the values of one or more variables
- Proc Univariate: a wide range of summary statistics, box plots

14 General Form of PROCs

- PROC xxxx data=fname options;
  - by groups;
  - proc-specific statements;
  - title ... ;
  - output out = fn ... ;
  - run ;
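One possible instance of this general form, using Proc Means. The data values, the title, and the names `stats`, `avg`, and `sd` are assumptions; the optional By statement is left out here.

```sas
data recall;                           /* hypothetical values, not Eysenck's data */
   input age $ group $ recall;
   datalines;
Young Counting 7
Young Imagery 18
Older Counting 6
Older Imagery 12
;
run;

proc means data=recall mean std;       /* PROC xxxx data=fname options; */
   var recall;                         /* proc-specific statement */
   title 'Summary of Recall';
   output out=stats mean=avg std=sd;   /* save the statistics in a new file */
run;

proc print data=stats;
run;
```

The file named on `output out=` is an ordinary SAS data set, so it can be printed or passed to another Proc.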

15 Printing to the Output File

- Proc Print data = fname ;
  - var ... ; lists the variables to be printed (can be omitted)
  - run ; indicates the print commands are complete

16 Group Analyses

- Sort the Groups
  - Proc Sort data= … ;
  - by group;
  - run;
- Execute the Proc, by Group
  - Proc xxx data= … ;
  - by group;
  - ...
  - run;
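The two steps might be sketched as follows, with Proc Univariate standing in for Proc xxx. The data values are hypothetical.

```sas
data recall;                  /* hypothetical values, not Eysenck's data */
   input age $ group $ recall;
   datalines;
Young Counting 7
Young Imagery 18
Older Counting 6
Older Imagery 12
Young Counting 8
Older Imagery 14
;
run;

proc sort data=recall;        /* step 1: sort by the grouping variable */
   by group;
run;

proc univariate data=recall;  /* step 2: run the Proc once per group */
   by group;
   var recall;
run;
```

If the data are not sorted by the By variable first, the second Proc stops with an error in the log.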

17 Summarize the Recall Data

- Calculate frequencies for each condition/group and each age: Proc Freq
- Graph a histogram of the recall data: Proc Chart
- Calculate the average, standard deviation, minimum, and maximum to 2 decimal places: Proc Means
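One way these three summaries could be requested. The data values are made up, and the variable names age, group, and recall are taken from the model statement used later in the lecture.

```sas
data recall;                  /* hypothetical values, not Eysenck's data */
   input age $ group $ recall;
   datalines;
Young Counting 7
Young Imagery 18
Older Counting 6
Older Imagery 12
Young Counting 8
Older Imagery 14
;
run;

proc freq data=recall;
   tables group age;          /* one frequency table per variable */
run;

proc chart data=recall;
   vbar recall;               /* vertical bar chart (histogram) of recall */
run;

proc means data=recall mean std min max maxdec=2;
   var recall;                /* maxdec=2 prints two decimal places */
run;
```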

18 Summarize the Recall Data

- Calculate descriptive statistics for each condition/group: Proc Means, Proc Univariate
  - Note: Sort First, then Use the BY Command.
- Graph Average Recall for All Combinations of Recall Condition/Group and Age: Proc Plot
  - Use a Group Identifier as the Plotting Symbol
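A sketch of the averages-and-plot task. The data values are invented, age is given an assumed numeric coding (1 = young, 2 = older) so it can serve as a plot axis, and the names `cellavg` and `avgrec` are hypothetical.

```sas
data recall;                        /* hypothetical values; age coded 1 or 2 (assumed) */
   input age group $ recall;
   datalines;
1 Counting 7
1 Counting 8
1 Imagery 18
1 Imagery 20
2 Counting 6
2 Counting 9
2 Imagery 12
2 Imagery 14
;
run;

proc sort data=recall;              /* sort first, then use BY */
   by group age;
run;

proc means data=recall noprint;     /* one average per group-age combination */
   by group age;
   var recall;
   output out=cellavg mean=avgrec;
run;

proc plot data=cellavg;
   plot avgrec*age = group;         /* first character of group is the plotting symbol */
run;
quit;
```

With these labels the points appear as C and I, one symbol per group at each age.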

19 Proc Anova

- Only for Complete Factorial Experiments in Completely Randomized Designs
  - Otherwise: Proc GLM
- MUST have an Equal Number of Repeats for Each Factor-Level Combination

20 Proc Anova

- Proc Anova data = fn ;
  - By … ;
    - Separate ANOVA Fits for Each Value of the BY variable(s).
  - Class … ;
    - List all the factors.
  - Model … / options;
    - e.g., model recall = age group age*group ;
    - factors: list individually; e.g., age group
    - interactions: connect with asterisk(s); e.g., age*group
  - Means … / options;
    - e.g., means age group age*group / t bon;
  - Run;
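A sketch of a two-factor fit using the model and means statements from the slide. The data are hypothetical and balanced (two repeats in each age-group combination), as Proc Anova requires.

```sas
data recall;                  /* hypothetical values, not Eysenck's data */
   input age $ group $ recall;
   datalines;
Young Counting 7
Young Counting 8
Young Imagery 18
Young Imagery 20
Older Counting 6
Older Counting 9
Older Imagery 12
Older Imagery 14
;
run;

proc anova data=recall;
   class age group;                       /* list all the factors */
   model recall = age group age*group;    /* main effects and interaction */
   means age group age*group / t bon;     /* factor-level averages and comparisons */
run;
quit;
```

The `quit;` statement is added because Proc Anova stays active after `run;` until another step begins or it is ended explicitly.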

21 Eysenck’s Study of Incidental Learning

- Make analysis of variance calculations, use only recall condition as factor.
- Calculate factor-level averages, with the t option.
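A sketch of the one-factor version of this task, assuming the recall condition is stored in a variable named group. The data values are hypothetical, with four repeats in each condition.

```sas
data recall;                  /* hypothetical values, not Eysenck's data */
   input group $ recall;
   datalines;
Counting 7
Counting 8
Counting 6
Counting 9
Imagery 18
Imagery 20
Imagery 12
Imagery 14
;
run;

proc anova data=recall;
   class group;               /* recall condition is the only factor */
   model recall = group;
   means group / t;           /* factor-level averages with pairwise t tests */
run;
quit;
```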

22 Effect of Cocaine Usage on Newborn Infant Body Lengths

- Research Question: Do Mean Body Lengths (cm) Differ by Cocaine Usage?
- Usage Groups:
  - First Trimester
  - Throughout Pregnancy
  - Drug-Free

23 Effect of Cocaine Usage on Newborn Infant Body Lengths

24 Assignment

- Create a Data File
- Input the Data File into a SAS Program
- Cocaine Usage Groups
  - Calculate Averages and Standard Deviations
  - Make Comparative Box Plots
  - Test the Equality of the Group Means
- Email Me ONLY the FINAL .log File
