
1 The Urban Institute - SAS Training, 6/9/2016
SAS Training
This SAS Training Course was designed to introduce users at The Urban Institute to SAS programming and provide them with the fundamental skills to begin working with SAS. The curriculum was created by the Training Committee of The Urban Institute’s SAS Users’ Group.
Frustrated?

2 OBJECTIVES
- To provide basic SAS coding tools
- To supplement the manuals written by the SAS Institute
- To serve as an accessible guide
- To serve as a reference for more advanced users
ENJOY!

3 WHY SAS?
- A simple method of organizing data
- A way to perform data analysis much more efficiently
- A means of storing and manipulating data
There are other statistical packages, such as Stata and SPSS. However, technical support for them is not provided at The Urban Institute.

4 SAS in Windows
When you open SAS, you will have three windows:
- Program Editor: where you write and correct programs
- Log: shows the program running and shows notes and error messages
- Output: displays the output you request in your program, such as tables and variable lists

5 HOW SAS ORGANIZES DATA
The SAS system stores data in SAS data sets, which are structured as records (or observations) and variables. A typical SAS data set looks like this:
    OBS  NAME      AGE  GENDER  PHONE
    1    Johnny    14   M       5516
    2    Michelle  15   F       5865
    3    Thy       16   F       5562
    4    Peter     13   M       5588
Each row is a record and each column is a variable. In this case, the data set has four records and four variables (NAME, AGE, GENDER, and PHONE). OBS is not a variable, but is internally created by SAS so that records are numbered.
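As a minimal sketch of how a data set like the one above could be built, the data step below types the four records in directly and prints them. The data set name "students" is an assumption made for illustration; the variable names and values come from the example above.

```sas
/* Hypothetical data set name "students"; values copied from the example. */
data students;
    /* A $ after a variable name marks it as character. */
    input name $ age gender $ phone;
    datalines;
Johnny 14 M 5516
Michelle 15 F 5865
Thy 16 F 5562
Peter 13 M 5588
;
run;

/* The OBS column in the listing is added by proc print, not stored as a variable. */
proc print data=students;
run;
```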

6 SAMPLE SAS PROGRAM
    /*******************************************
    Sample.sas
    Created by UISUG September 20, 1999
    A sample program for our students to look at.
    ********************************************/
    libname sasdat 'd:\data';
    data one;
        set sasdat.final;
        if var1=3;
        total=var1+var2;
        label total='total of var1 and var2';
    proc print data=one;
    run;
The program has four parts: Documentation (the comment block), Libname, Datastep, and Procedure.

7 MAIN COMPONENTS OF A SAS PROGRAM
- Library Name
- Data Step
- Procedures

8 LIBNAMES
libname: This statement defines a reference name (a libref) for a directory where a SAS data set either (1) is currently stored or (2) will be stored.
Syntax:
    libname libref 'directory';
PC Syntax:
    libname sasdat 'd:\data';
Alpha Syntax:
    libname sasdat 'diskname:[myaccount.data]';

9 DATA STEP
data statement: for creating a SAS data set.
set statement: for accessing a SAS data set.
Syntax:
    data new-dataset-name;
        set existing-dataset-name;
Example:
    libname sasdat 'd:\data';
    data tempdat;
        set sasdat.testdat;

10 DATA STEP: Temporary and Permanent SAS Datasets
There are two types of SAS data sets: permanent and temporary. To access a permanent SAS data set called "testdat", located in the directory identified in the libname statement, and create a temporary SAS data set called "tempdat":
    libname sasdat 'd:\data';
    data tempdat;              /* temporary data set */
        set sasdat.testdat;    /* permanent data set */

11 DATA STEP: Creating a Permanent SAS Dataset
To access a permanent SAS data set called "testdat2", located in the directory identified in the libname statement, and create a permanent SAS data set called "permdat" in the same directory, the data step looks like this:
    data sasdat.permdat;       /* both are permanent data sets */
        set sasdat.testdat2;
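What decides whether a data set is temporary or permanent is the form of its name. A one-level name is stored in the WORK library, which SAS clears when the session ends; a two-level name is stored in the directory that the libref points to. The sketch below puts the two cases side by side, reusing the data set names from the slides.

```sas
libname sasdat 'd:\data';

/* One-level name: SAS stores this as work.tempdat and deletes it at the end of the session. */
data tempdat;
    set sasdat.testdat;
run;

/* Two-level name: the data set is written to d:\data and remains there after the session. */
data sasdat.permdat;
    set sasdat.testdat2;
run;
```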

12 READING RAW DATA
Raw data are data that have not been organized and managed by the SAS system.
Filename statement: defines a reference for the raw data set.
Infile statement: indicates that a raw data set is being read.
Method 1:
    data dataset-name;
        infile 'raw-data-file';
Method 2:
    filename fileref 'raw-data-file';
    data dataset-name;
        infile fileref;
Both 1 and 2 accomplish the same thing.

13 READING RAW DATA with an INPUT statement
INPUT statements specify variables within raw data.
List Input:
    data tempdat;
        infile 'd:\data\data.asc';
        input var1 var2 var3;
or, using a filename statement:
    filename raw 'd:\data\data.asc';
    data tempdat;
        infile raw;
        input var1 var2 var3;

14 READING RAW DATA with an INPUT statement
List Input with Character Data
To read in character data:
    data tempdat;
        infile 'd:\data\data.asc';
        input var1 var2 var3 $;
There are 4 ways to specify variables with an input statement:
- List
- Column
- Formatted
- Named

15 READING RAW DATA with an INPUT statement
Using Column Input:
    data tempdat;
        infile 'd:\data\data.asc';
        input var1 1-8 var2 9-12 var3 $ 13-15;
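The slides show list and column input but give no example of the other two styles. The sketch below illustrates them under some assumptions: the formatted-input step assumes the same fixed layout as the column-input example (8, 4, and 3 characters wide), and the named-input step uses a made-up data set name and made-up data lines.

```sas
/* Formatted input: an informat after each variable gives its width and type. */
/* Assumes the same layout as the column-input example (columns 1-8, 9-12, 13-15). */
data tempdat;
    infile 'd:\data\data.asc';
    input var1 8. var2 4. var3 $3.;
run;

/* Named input: each value in the raw data is written as variable=value. */
/* The data set name and the data lines are hypothetical. */
data named_demo;
    input var1= var2= var3=$;
    datalines;
var1=1 var2=2 var3=abc
var1=4 var2=5 var3=def
;
run;
```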

16 READING RAW DATA
To enter data manually, use a cards statement:
    data sasdat.permdat1;
        input var1 var2 var3;
        cards;
    1 2 2
    1 4 3
    1 6 2
    ;

17 LIMITING SAS DATASETS with KEEP/DROP Dataset Options
To limit the number of variables using data set options:
    data dataset-name (keep= or drop= variable-list);
        set dataset-name (keep= or drop= variable-list);
Examples:
    data tempdat;
        set sasdat.permdat (drop=var2);
    data tempdat (keep=var1 var3);
        set sasdat.permdat;

18 LIMITING SAS DATASETS with KEEP/DROP Statements
To limit the number of variables using KEEP or DROP statements:
    data tempdat;
        set sasdat.permdat;
        keep var1 var3;
This has the same effect as using a KEEP= or DROP= data set option on a DATA statement.

19 LIMITING SAS DATASETS
To limit the number of observations with the OBS data set option:
    data tempdat;
        set sasdat.permdat (obs=20);
Limiting with a where statement:
    data tempdat;
        set sasdat.permdat2;
        where v3<=2;

20 LIMITING SAS DATASETS
Limiting with a subsetting IF statement:
    data tempdat;
        set sasdat.permdat2;
        if v3 le 2;
Operators:
    Mnemonic  Symbol  Meaning
    lt        <       less than
    gt        >       greater than
    eq        =       equal to
    le        <=      less than or equal to
    ge        >=      greater than or equal to
    ne        ~=      not equal to
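The WHERE statement and the subsetting IF often give the same result, but they are not interchangeable. In a data step, WHERE can only refer to variables that already exist in the input data set, while a subsetting IF can also test a variable created earlier in the same step. The sketch below assumes that sasdat.permdat2 contains numeric variables v1, v2, and v3; the cutoff of 9 is arbitrary.

```sas
data tempdat;
    set sasdat.permdat2;
    total = v1 + v2 + v3;    /* new variable, not present in the input data set */
    if total ge 9;           /* subsetting IF: keeps only observations where total >= 9 */
    /* "where total ge 9;" would fail here, because total is not in sasdat.permdat2. */
run;
```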

21 LIMITING SAS DATASETS
Points to remember:
- SAS character data values are case sensitive. If the value was entered in caps, you must reference it in caps.
- The difference between referencing a numeric and a character variable is that the character value must always be in quotes.
    data tempdat;
        set sasdat.permdat2;
        where initial='T' and v1=1;
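When you are not sure how a character value was entered, one common workaround is to compare an upper-cased copy of the value. The sketch below uses the UPCASE function for this; it is an illustration, not part of the original course material.

```sas
data tempdat;
    set sasdat.permdat2;
    /* upcase() converts the value to capitals before the comparison, */
    /* so both 't' and 'T' are selected. */
    where upcase(initial)='T' and v1=1;
run;
```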

22 CREATING VARIABLES
Creating the sum of variables:
    TOTAL=V1+V2+V3;
Creating variables with assigned values, either as a numeric flag:
    if total=9 then tot9=1;
    else tot9=0;
or as a character flag:
    if total=9 then tot9='Y';
    else tot9='N';

23 CREATING VARIABLES Using IF THEN ELSE Statements
Multiple processes on variables that meet the condition:
    if condition then do;
        statements;
    end;
    else do;
        statements;
    end;

24 CREATING VARIABLES Using IF THEN ELSE Statements
For example:
    if total=9 then do;
        tot9=1;
        totsqu=total*total;
    end;
    else do;
        tot9=0;
        totsqu=0;
    end;

25 CREATING VARIABLES
Points to remember:
- If the condition is not met and there is no ELSE clause, the assignment is skipped for that observation and the new variable that you have created will have a missing value.
- Remember to include an end statement to mark the end of the DO/END block.
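The first point can be checked with a small test. In the sketch below, the data set name "flags" and the two data values are made up for illustration. The second observation does not meet the condition, so tot9 is left missing, which SAS prints as a period.

```sas
/* Hypothetical two-record test data. */
data flags;
    input total;
    if total=9 then tot9=1;    /* no ELSE clause */
    datalines;
9
5
;
run;

/* tot9 is 1 for the first observation and missing (.) for the second. */
proc print data=flags;
run;
```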

26 CREATING VARIABLES Using LABEL Statements
After you have created variables, it is helpful to label them so that you or others using the SAS data set will understand what each variable is. The syntax for doing this is as follows:
    label total='sum of v1 v2 and v3'
          tot9='flags where total equals 9'
          totsqu='total squared';
NOTE: In SAS Version 6, a variable label is limited to 40 characters, including blanks. Later versions allow up to 256 characters.

27 Using the RUN Statement
RUN statements execute the previously entered SAS statements.
Final program:
    libname sasdat 'd:\data';
    data one;
        set sasdat.final;
        if var1=3;
        total=var1+var2;
        label total='total of var1 and var2';
    run;

28 PROC CONTENTS
This procedure provides basic information about a data set, including number of observations, number of variables, variable names, types, and labels. Running this procedure is usually the first step in working with a data set for the first time.
Syntax:
    proc contents data=datasetname;
        title 'titlename';
Example:
    proc contents data=sasdat.manual;
        title 'Contents of Sample data set';
    run;

29 PROC FORMAT
This procedure has no printed output, but instead is used to clarify the output of other procedures.
Syntax:
    proc format;
        value format-name
            value-or-range='formatted value';
Example:
    proc format;
        value agefmt -99    = 'Missing'
                     0-<18  = 'under 18'
                     18-<25 = '18 to 24';
    run;
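A format has no effect until it is attached to a variable. As a sketch of how the agefmt format defined above might be used, the step below applies it in a proc freq. It assumes that sasdat.manual contains a numeric variable named age, as the later slides suggest.

```sas
proc format;
    value agefmt -99    = 'Missing'
                 0-<18  = 'under 18'
                 18-<25 = '18 to 24';
run;

/* The format statement groups ages into the labeled ranges in the table. */
/* Note the period after the format name. */
proc freq data=sasdat.manual;
    tables age;
    format age agefmt.;
    title 'Age distribution using agefmt';
run;
```

Values that fall outside every range in the format (here, ages of 25 and over) are shown unformatted.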

30 PROC PRINT
This simple procedure is useful for looking at the actual values of variable(s) for individual observations.
Syntax:
    proc print data=datasetname (options);
        var variable-name;
        title 'titlename';
Example:
    proc print data=sasdat.manual(obs=10);
        title 'Print of Sample data set';
        title2 'First 10 observations';
        var age race gender initial id bbcrds87 height wt nbbal nbbnl nfoot nhoop;
    run;

31 PROC PRINT
You can restrict the observations that are printed by using a where statement in the proc print.
    proc print data=sasdat.manual;
        title 'Print of Sample data set where v3=2';
        where v3=2;
    run;
You can print the variable labels by using the label option.
    proc print data=sasdat.manual label;
        title 'Print of Sample data set with labels';
    run;

32 PROC SORT
This procedure sorts a data set by one or more variables. The syntax is:
    proc sort data=directoryname.datasetname out=directoryname.datasetname;
        by variablename;
Example:
    proc sort data=sasdat.manual out=sortman;
        by id;

33 PROC MEANS
This procedure provides information about the characteristics of the numerical variables in a data set, including the minimum, maximum and mean value. Running this procedure is often useful for checking the validity of a new data set or of newly created variables.
Syntax:
    proc means data=datasetname options;
        var variable-name;
        title 'titlename';
Example:
    proc means data=sasdat.manual;
        var nbbal nbbnl nfoot nhoop;
        title 'Means of Manual sample data';
    run;

34 PROC FREQ
This procedure provides information about the distribution of values for variables of any type in a data set, including the frequency, percent, cumulative frequency, and cumulative percent.
Syntax:
    proc freq data=datasetname;
        tables variables /options;
        format variable formatname.;
        weight variable;
        title 'titlename';

35 PROC FREQ
    proc freq data=sasdat.manual;
        tables age*race;
        weight wt;
        title 'Distribution of Variables';
    run;

36 PROC UNIVARIATE
This procedure provides detailed statistics on a variable, including many about its distribution. The most commonly used statistics provided are mean, quantiles, standard deviation, variance, range, and extreme values, although the procedure's output includes many more statistics as well. The syntax for proc univariate is:
    proc univariate data=directoryname.datasetname;
        var variable-name;
Example:
    proc univariate data=sasdat.manual;
        var age;

37 PROC SUMMARY
This procedure creates a SAS data set containing summary statistics. It creates no output by default, but statistics can be printed using the print option. Proc summary is often interchangeable with proc means, and it computes the same statistics. The syntax is:
    proc summary data=directoryname.datasetname nway;
        var variablename;
        class variablename;
        output out=outputdataset-name sum=sumvariable-name;

38 PROC SUMMARY
    proc summary data=sasdat.manual nway missing;
        var bbcrds87;
        class race;
        output out=temp sum=sumbbcrd;
This is what the data set "temp" looks like:
    OBS  RACE  _TYPE_  _FREQ_  SUMBBCRD
    1    .     1       1       .
    2    1     1       79      2112
    3    2     1       14      .
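Because proc summary and proc means compute the same statistics, the step above can also be written with proc means. The sketch below is one way to do so; the output data set name "temp2" is an assumption chosen so that it does not overwrite "temp".

```sas
/* noprint suppresses the printed table, which makes proc means behave like proc summary. */
proc means data=sasdat.manual nway missing noprint;
    var bbcrds87;
    class race;
    output out=temp2 sum=sumbbcrd;
run;

/* Print the output data set to compare it with "temp". */
proc print data=temp2;
run;
```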

39 PROC TABULATE
This procedure enables you to compute various statistics to summarize data and control the format of the report. The syntax for proc tabulate is:
    proc tabulate data=directoryname.datasetname;
        class class-variables;
        var variable-name;
        table page, row, column;
    run;

40 PROC TABULATE
Table Operators:
    comma     ,        go to new table dimension (rows, columns)
    blank              concatenate tables
    asterisk  *        cross or subgroup
Example:
    proc tabulate data=sasdat.manual;
        class gender;
        var bbcrds87;
        table gender*bbcrds87;
        title '$ spent on baseball cards in 1987, by gender';
    run;

41 PROC TABULATE
Proc Tabulate Output:
    $ spent on baseball cards in 1987, by gender
    +-------------------------+
    |         Gender          |
    +------------+------------+
    |     1      |     2      |
    +------------+------------+
    | $ spent on | $ spent on |
    |  baseball  |  baseball  |
    | cards 1987 | cards 1987 |
    +------------+------------+
    |    SUM     |    SUM     |
    +------------+------------+
    |     1475.00|      637.00|
    +------------+------------+

42 COMBINING SAS DATA SETS
Often you will wish to combine data from two or more existing SAS data sets. There are two ways to do this:
- CONCATENATE: You concatenate data sets when you want to add observations with the same variables.
- MERGE: You merge data sets when you want to add variables to existing observations.

43 MERGING SAS DATA SETS
There are three steps necessary to create a data set by merging multiple data sets:
1) Identify one or more variables common to all data sets by which you want to merge.
2) Sort each data set by these variables.
3) Merge the data sets in the data step.

44 MERGING DATA SETS
    proc sort data=sasdat.manual;
        by ID;
    proc sort data=sasdat.manual2;
        by ID;
    data newfile;
        merge sasdat.manual sasdat.manual2;
        by ID;
    run;

45 MERGING DATA SETS
To include only direct matches in your new data set:
    data newfile;
        merge sasdat.manual(IN=a) sasdat.manual2(IN=b);
        by ID;
        if a and b;
    run;
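To see what the IN= variables do, it can help to try the merge on two tiny data sets. In the sketch below, the data set names (demo_a, demo_b, matched) and all values are made up for illustration. Only IDs 2 and 3 appear in both inputs, so only those two observations are kept.

```sas
/* Hypothetical test data, already in ID order. */
data demo_a;
    input id v1;
    datalines;
1 10
2 20
3 30
;
run;

data demo_b;
    input id v2;
    datalines;
2 200
3 300
4 400
;
run;

data matched;
    merge demo_a(in=a) demo_b(in=b);
    by id;
    /* a=1 when the observation has a record in demo_a, b=1 when it has one in demo_b. */
    if a and b;
run;

proc print data=matched;
run;
```

Changing the subsetting IF to "if a;" would instead keep every ID found in demo_a (1, 2, and 3), with v2 missing for ID 1.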

46 CONCATENATING DATA SETS
The code to do this is:
    data newfile;
        set sasdat.manual sasdat.manual3;
    run;
To concatenate data sets in sort order (interleave them), use a by statement. Each data set must already be sorted by the by variable.
    data newfile;
        set sasdat.manual sasdat.manual3;
        by id;
    run;
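A set statement with a by statement stops with an error if an input data set is not in by-variable order. A safe pattern, sketched below, is to sort each data set first. The temporary data set names man1 and man3 are assumptions for illustration.

```sas
/* Sort each input into a temporary copy so the permanent data sets are left unchanged. */
proc sort data=sasdat.manual out=man1;
    by id;
run;

proc sort data=sasdat.manual3 out=man3;
    by id;
run;

/* Interleave: observations from both data sets are combined in id order. */
data newfile;
    set man1 man3;
    by id;
run;
```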

47 DOCUMENTATION
Documentation should include but is not limited to the following items:
1) The title of the program.
2) The creator and the date on which the program was created.
3) A description of what the program does.
4) Any modifications and the dates on which these were made.
5) A list of any related programs and data sets and an explanation as to how they are relevant.
6) Explanations throughout the program of what you are doing and in some instances why.
2 ways to make comments:
    *comment method 1;
    /*comment method 2*/

48 DOCUMENTATION
    *------------
    Manual.SAS
    Created by C.Y. Wilson on January 15, 1997
    This program creates a data set that includes totals.
    It uses data from final created with final.sas
    Modified January 16, 1997-added label statement
    -----------*;
    libname sasdat 'd:\data';
    data sasdat.total;
        set sasdat.final;
        if var1=3;          *selecting only cases where var1=3;
        total=var1+var2;    *create total used in next data step;
        label total='total of var1 and var2';
    /* proc print data=sasdat.total; */
    run;

49 PC ENVIRONMENT
- In PC SAS, a run statement must be at the end of each block of commands you submit.
- To run a program, use the 'running man' button or the submit command from the "Locals" menu.
- To recall the last program submitted, use the F4 key, or the Recall command from the "Locals" menu.
- PC SAS output can be saved and edited in Word or WordPerfect and formatted nicely with Courier 9 or 10 font.
- You can submit part of a program by highlighting that part and submitting.

50 PC ENVIRONMENT
- In each window, new text will be added to old text unless old text is cleared.
- When in the program editor, be sure of what you are saving. If you clear a program, start writing another, and then choose save, the new lines of the program will overwrite the old program. Use the save as command to prevent this from happening.
- When you open SAS, everything that you do is part of the same session. Certain options that you specify will carry over.
- There is a way to retrieve the contents of a data set if you think that you have overwritten it.

51 ALPHA ENVIRONMENT
- The Alpha is our large platform computer, which you can access using the PC in your office.
- The main use of the Alpha is to run SAS jobs on large data sets.
- The Alpha storage is separated into disks and each disk contains individual accounts where people store and use large data sets.
- Every project that requires using the Alpha will be assigned an account.

52 SAS TIPS
- Use proc means, proc freq and proc contents to learn about new data sets.
- Only read in the variables that you need.
- Sort only the variables that you need.
- Especially when using large data sets, programs should be tested on a subset of the data set. This saves CPU time.
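The data set options introduced earlier make these tips easy to follow. The sketch below combines OBS= and KEEP= to test a program on a small slice of a data set; the cutoff of 100 observations and the choice of variables are arbitrary assumptions.

```sas
/* Read only the first 100 observations and only three variables while testing. */
data test;
    set sasdat.manual (obs=100 keep=id age race);
run;

/* KEEP= on the input data set limits what proc sort has to carry along. */
proc sort data=sasdat.manual (keep=id age race) out=sortman;
    by id;
run;
```

Once the program runs cleanly on the subset, remove the OBS= option to process the full data set.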

53 SAS TIPS
- Label variables.
- Thoroughly document programs.
- ALWAYS read the log. Even if there are no error or warning messages, read note messages, and check things like observation and variable counts.
- When debugging, check carefully for missing semicolons and other missing operators and punctuation such as + * ( ), particularly unmatched quotes and parentheses. SAS may not always detect these mistakes and may tell you there is a different problem.

54 SAS TIPS
Most programmers use a pattern of indentation to make code more readable and easier to debug. Common indentation practices are:
- Indent everything after a data or proc statement up to the next data or proc statement.
- Indent everything after a do statement (including all do loops) up to the end statement.

55 OTHER SAS INFO
Visit the UI SAS Users Group web page on the intranet, or the SAS Institute web page at: http://www.sas.com
Need SAS manuals? Contact Rena Yount, x5695.

