
1 Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1

2 Chapter 4: Preparing Data for Analysis

3 LEARNING OBJECTIVES
• To be able to label variables with explanatory names
• To be able to create new variables
• To be able to use SAS® IF-THEN-ELSE statements
• To be able to use DROP and KEEP to select variables
• To be able to use the SET statement
• To be able to use PROC SORT
• To be able to append and merge data sets
• To be able to use PROC FORMAT
• Going Deeper: To be able to find first and last values in a group

4 The information in this chapter is about the DATA step
• RECALL the typical SAS program flow: the DATA step defines the data set, and PROCs are procedures that perform statistical analyses.
• Nearly all of the information in this chapter involves the DATA step (PROC SORT and PROC FORMAT are separate procedure steps).

5 4.1 LABELING VARIABLES WITH EXPLANATORY NAMES
• SAS labels are used to provide descriptive names for variables.
• The LABEL statement uses the format:
    LABEL VAR1 = 'Label for VAR1'
          VAR2 = 'Label for VAR2';
• Use either single or double quotation marks in the LABEL statement, but the opening and closing marks of each label must match.
• When SAS prints out information for VAR1, it also includes the label, making the output more readable.

6 Labeling Variables
• The following program (DLABEL.SAS) illustrates the use of labels:
    DATA MYDATA;
    INFILE 'C:\SASDATA\BPDATA.DAT'; * READ DATA FROM FILE;
    INPUT ID $ 1 SBP 2-4 DBP 5-7 GENDER $ 8 AGE 9-10 WT 11-13;
    LABEL ID  = 'Identification Number'
          SBP = 'Systolic Blood Pressure'
          DBP = 'Diastolic Blood Pressure'
          AGE = 'Age on Jan 1, 2000'
          WT  = 'Weight';
    PROC MEANS;
    VAR SBP DBP AGE WT;
    RUN;
• Notice that the LABEL statement is placed within the DATA step.

7 Hands-On Exercise p 77
• [Figures: PROC MEANS output without labels, and with labels]

8 4.2 CREATING NEW VARIABLES
• Arithmetic Operators (Table 4.4)
    Operator  Meaning         Example
    +         Addition        SUM = X + Y;
    -         Subtraction     DIF = X - Y;
    *         Multiplication  TWICE = X * 2;
    /         Division        HALF = X / 2;
    **        Exponentiation  CUBIC = X ** 3;

9 Order of Operations
• You may remember the following mnemonic from a math class that can help you remember the order of operations: “Please excuse my dear Aunt Sally.”
• Parentheses, Exponents, Multiplication, Division, Addition, Subtraction
• Do Hands-On Exercise p 79 (DCALC.SAS), which creates new variables by calculation.
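
As a small illustration of these precedence rules, the following sketch computes a few expressions whose values depend on the order of operations. The data set name CALC and the variable names are made up for this example and do not come from the book's exercise files.

```sas
DATA CALC;
   X = 2; Y = 3; Z = 4;
   A = X + Y * Z;       /* multiplication before addition: 2 + 12 = 14 */
   B = (X + Y) * Z;     /* parentheses first: 5 * 4 = 20 */
   C = X * Y ** 2;      /* exponentiation before multiplication: 2 * 9 = 18 */
   D = Z / X * Y;       /* equal priority, evaluated left to right: 2 * 3 = 6 */
RUN;

PROC PRINT DATA=CALC; RUN;
```

Multiplication and division share the same priority, as do addition and subtraction, so when in doubt it is usually clearer to add parentheses than to rely on the mnemonic.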

10 Creating New Variables as Constant Values
• Type in this program and run it:
    DATA PI;
    INPUT RADIUS;
    PI = 3.1415927;
    AREA = PI * RADIUS**2;
    DATALINES;
    10
    100
    1000
    ;
    PROC PRINT; RUN;
• When you look at the output, notice that PI is a variable in the data set.
• NOTE that the value of PI is a constant used in the subsequent calculation of AREA.

11 4.3 USING IF-THEN-ELSE CONDITIONAL STATEMENT ASSIGNMENTS
• Another way to create a new variable in the DATA step is to use the IF-THEN-ELSE conditional statement construct. The format is:
    IF expression THEN statement;
    ELSE statement;
• Thus
    IF SBP GE 140 THEN HIGHBP=1;
    ELSE HIGHBP=0;
  creates a variable named HIGHBP with the values 1 or 0.

12 Comparison Operators
    IF SBP GE 140 THEN HIGHBP=1;
    ELSE HIGHBP=0;
• A comparison operator tells SAS how to evaluate a condition, in this case GE (Greater Than or Equal To).

13 Logical Operators
    IF AGE GT 19 AND GENDER="M" THEN GROUP=1;
  or
    IF TREATMENT EQ "A" OR GROUP=2 THEN CATEGORY="GREEN";

14 A more complex use of operators
    IF TRT="A" THEN GROUP=1;
    ELSE IF TRT="B" OR TRT="C" THEN GROUP=2;
    ELSE GROUP=3;
• Uses an ELSE IF clause.

15 Do Hands-On Exercise p 83
• File DCONDITION.SAS
• Pay attention to this statement:
    IF SBP GE 140 THEN STATUS="HIGH";
    ELSE STATUS="OK";

16 Using IF to Assign Missing Values
• Be careful: data sets often contain missing data codes to record when data are missing. For example, for the variable AGE you might assign an impossible value, say -9, as a missing value code. Then the statement
    IF AGE EQ -9 THEN AGE = .;
  in your DATA step assigns the SAS missing value code (a dot) to AGE when the value is -9.
• You MUST do this for SAS to know how to handle missing values in statistical procedures.

17 What could go wrong here?
    IF AGE GT 12 AND AGE LT 20 THEN TEEN=1;
    ELSE TEEN=0;
• Suppose this is your data:
    ID   AGE  WHAT VALUE FOR TEEN?
    001  12
    002  20
    003  19
    004  .

18 A better way
    IF AGE GT 12 AND AGE LT 20 THEN TEEN=1;
    ELSE TEEN=0;
    IF AGE = . THEN TEEN = .;
• Guarantees that if AGE is already missing, TEEN will also be coded as missing.

19 Do Hands-On p 84
• Uses another method to create TEEN:
    IF AGE = . THEN TEEN = .;
    ELSE IF AGE GT 12 AND AGE LT 20 THEN TEEN=1;
    ELSE TEEN=0;
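
To see why the missing-value check matters, here is a minimal sketch that runs both codings on the four example records from the earlier slide. The data set name TEENS and the variable TEEN_NAIVE are assumptions made for this illustration. SAS treats a missing numeric value as smaller than any number, so a missing AGE fails the comparison and falls into the ELSE branch.

```sas
DATA TEENS;
   INPUT ID $ AGE;
   /* Naive coding: a missing AGE is not GT 12, so the ELSE branch assigns 0 */
   IF AGE GT 12 AND AGE LT 20 THEN TEEN_NAIVE = 1;
   ELSE TEEN_NAIVE = 0;
   /* Safer coding: test for missing first */
   IF AGE = . THEN TEEN = .;
   ELSE IF AGE GT 12 AND AGE LT 20 THEN TEEN = 1;
   ELSE TEEN = 0;
   DATALINES;
001 12
002 20
003 19
004 .
;
RUN;

PROC PRINT DATA=TEENS; RUN;
```

For ID 004 the naive variable is 0 (a "non-teen" that was never observed), while TEEN stays missing. Note also that ages 12 and 20 are coded 0 because GT and LT exclude the endpoints.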

20 Using IF and IF-THEN To Subset Data Sets
• Data sets can be quite large. You may have a data set that contains some group of subjects (records) that you want to eliminate from your analysis. In that case, you can subset the data so it will contain only those records you need.
• One method of eliminating certain records from a data set is to use a subsetting IF statement in the DATA step. The syntax for this statement is as follows:
    IF expression;

21 Subsetting IF
• For example, to select records containing the (character) value F (only females) from a data set, you could use this statement within a DATA step:
    IF GENDER EQ 'F';
• Note that you can use single or double quotation marks ('F' or "F") in this statement.

22 Subsetting with IF ... DELETE
• The opposite effect can be created by including THEN DELETE at the end of the statement:
    IF expression THEN DELETE;
• For example, to get rid of certain records (all males) in a data set, you could use the code
    IF GENDER EQ 'M' THEN DELETE;
• Do Hands-On Example p 86

23 Using IF-THEN and DO for Program Control
• Another use of the IF statement is to control the flow of your SAS program in conjunction with a DO statement. In this case, you can cause a group of SAS statements to be conditionally executed by using the following type of code:
    IF expression THEN DO;
       SAS code to conditionally execute;
    END;
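
A short sketch of the DO group in use: when one condition should set several variables at once, both assignments go inside the same DO ... END block. The data set name BP, the sample values, and the pairing of HIGHBP with STATUS are illustrative assumptions, reusing variable names from the earlier slides.

```sas
DATA BP;
   INPUT ID $ SBP DBP;
   LENGTH STATUS $ 4;          /* long enough for "HIGH" and "OK" */
   IF SBP GE 140 THEN DO;
      HIGHBP = 1;
      STATUS = "HIGH";
   END;
   ELSE DO;
      HIGHBP = 0;
      STATUS = "OK";
   END;
   DATALINES;
001 120 80
002 145 95
003 138 88
;
RUN;

PROC PRINT DATA=BP; RUN;
```

Without the DO group, only the single statement following THEN would be conditional; any further assignment would run for every record.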

24 Using @ and IF to Conditionally Read Lines in a Data Set
• For big data sets, it is often the case that you don't want to read in all of the data. One method you could use to conditionally read in certain records is to set up a test condition and read in the record only if it meets that condition.
• To do this, you can use the @ (at) sign in your INPUT statement:
    INPUT GP $ 5 AGE 6-9 @;
    IF GP EQ "A" AND AGE GE 10 THEN
       INPUT ID $ 1-3 GP $ 5 AGE 6-9 TIME1 10-14 TIME2 15-19;
• Notice the use of the trailing @ here: it holds the current input line so you can test it with an IF statement before reading the rest of the record.
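
The fragment above can be turned into a complete step along the following lines. This is a sketch under assumptions: the data set name SUBSET and the four data lines are invented, and the explicit OUTPUT inside a DO group is one way (not necessarily the book's) to make sure that only records passing the test are written. Without it, the automatic output at the end of each iteration would still write the rejected records, with ID, TIME1, and TIME2 missing.

```sas
DATA SUBSET;
   INPUT GP $ 5 AGE 6-9 @;          /* read only the test fields and hold the line */
   IF GP EQ "A" AND AGE GE 10 THEN DO;
      INPUT ID $ 1-3 GP $ 5 AGE 6-9 TIME1 10-14 TIME2 15-19;
      OUTPUT;                       /* write only the records that pass the test */
   END;
   DATALINES;
001 A  12 22.3 25.3
002 A   8 21.0 24.1
003 B  15 20.5 23.8
004 A  31 19.9 22.0
;
RUN;

PROC PRINT DATA=SUBSET; RUN;
```

With these sample lines, records 001 and 004 satisfy the condition; 002 fails on AGE and 003 fails on GP. The data lines must start in column 1 because the INPUT statements use fixed column positions.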

25 4.4 USING DROP AND KEEP TO SELECT VARIABLES
• The DROP and KEEP statements in the DATA step allow you to specify which variables to retain in a data set:
    DROP variables;
    KEEP variables;
• For example,
    DATA MYDATA;
    INPUT A B C D E F G;
    DROP E F;
    DATALINES;
    ... etc ...
• Do Hands-On Example p 88
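
KEEP works the other way round from DROP: you list what stays. The sketch below (data values and the variable TOTAL are made up for illustration) reads seven variables, computes a total, and keeps only four variables in the output data set.

```sas
DATA MYDATA;
   INPUT A B C D E F G;
   TOTAL = A + B + C;
   KEEP A B C TOTAL;      /* D, E, F, and G are read but not stored */
   DATALINES;
1 2 3 4 5 6 7
8 9 10 11 12 13 14
;
RUN;

PROC PRINT DATA=MYDATA; RUN;
```

Dropped variables are still available for calculations inside the DATA step; DROP and KEEP only control what is written to the resulting data set.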

26 4.5 USING THE SET STATEMENT TO READ AN EXISTING DATA SET
• Suppose you have a big data set that you want to use in modified form. Don't modify your ORIGINAL data set; modify a copy.
• [Diagram: Original data set -> Modified copy]

27 Another way to “ENTER” data
• Suppose you already have a data set named OLD. You can make a “copy” using
    DATA NEW; SET OLD;
  or
    DATA NEW; SET "C:\SASDATA\OLD";
• Now the NEW data set is identical to OLD. You can now modify NEW without changing the original data set.

28 Creating a Data Set from an Existing Data Set

29 Using SET - Example 1
• Suppose you have a data set named MYSASLIB.ALL. You want to create two subsets, MALES and FEMALES.
    DATA MALES; SET MYSASLIB.ALL;     * Creates a data set with only males;
    IF GENDER = 'M';
    RUN;
    DATA FEMALES; SET MYSASLIB.ALL;   * Creates a data set with only females;
    IF GENDER = 'F';
    RUN;
• Now you have three data sets to work with: ALL, MALES, and FEMALES.

30 Using SET - Example 2
• You receive a data set from the government, and you need to modify it before using it:
    DATA MYSASLIB.STUDY;        * This is the new (copied) data set;
    SET MYSASLIB.GOV;           * This is the original data set;
    IF AGE = -9 THEN AGE = .;
    IF SBP = -99 THEN SBP = .;  * A NUMBER OF RECODES;
    BMI = WT / (HT**2) * 703;   * A CALCULATION;
    * Etc;
• Thus you're manipulating a copy of the original data and not changing the original file.

31 Do Hands-On Exercise p 90
• Using a subsetting IF statement
• DSUBSET3.SAS

32 4.6 USING PROC SORT
• The SORT procedure can be used to rearrange the observations in a SAS data set or to create a new SAS data set containing the rearranged observations. PROC SORT is a separate procedure step, not a statement inside the DATA step.
• Sorting sequence information for SAS data sets:
    Character variables: blank !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz{|}~
    Numeric variables: missing values first, then numeric values
    Default order: ascending (or indicate DESCENDING)

33 Syntax for PROC SORT
• The syntax for PROC SORT is:
    PROC SORT options;
    BY variable(s);
• Common options for PROC SORT include:
    DATA=datasetname
    OUT=outputdatasetname

34 Example of PROC SORT
    PROC SORT DATA=MYDATA OUT=MYSORT;   * OUT= specifies that a new data set named MYSORT be created;
    BY RECTIME;                         * A BY variable is required;
• This example sorts the MYDATA data set by RECTIME and puts the resulting data set into a new data set named MYSORT. The original data set is NOT CHANGED.
• Do Hands-On Examples p 92 & 93 (DSORT1.SAS, DSORT2.SAS)
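
The sorting-sequence slide mentions that the default order is ascending. As a hedged sketch of how a descending sort can be requested, the example below sorts by a group variable in ascending order and, within each group, by RECTIME in descending order. The variable GROUP and the data values are invented for this illustration.

```sas
DATA MYDATA;
   INPUT ID $ GROUP $ RECTIME;
   DATALINES;
001 B 12.5
002 A 9.8
003 B 7.1
004 A 15.2
;
RUN;

PROC SORT DATA=MYDATA OUT=MYSORT;
   BY GROUP DESCENDING RECTIME;   /* DESCENDING applies only to the variable that follows it */
RUN;

PROC PRINT DATA=MYSORT; RUN;
```

If OUT= is omitted, PROC SORT replaces the input data set with its sorted version, which is why the slide's example uses OUT= to leave the original untouched.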

35 4.7 APPENDING AND MERGING DATA SETS
• Appending adds new records to an existing data set. (It is sometimes called a vertical merge.)
• Merging adds variables to a data set through the use of a key identifier that is present in both data sets, usually an identification code. (It is sometimes called a horizontal merge.)

36 Appending Data Sets
• Appending is accomplished by including multiple data set names in the SET statement. For example,
    DATA NEW; SET OLD1 OLD2;
• NOTE: You can append many files:
    DATA NEW; SET OLD1 OLD2 OLD3 OLD4 etc;
• Do Hands-On Example p 95
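
A self-contained sketch of appending, using two small made-up data sets (the values and the extra variable WT are assumptions for illustration). Because OLD2 has a variable that OLD1 lacks, the example also shows what happens when the variable lists differ.

```sas
DATA OLD1;
   INPUT ID $ AGE;
   DATALINES;
001 34
002 45
;
RUN;

DATA OLD2;
   INPUT ID $ AGE WT;
   DATALINES;
003 29 150
004 51 182
;
RUN;

DATA NEW;
   SET OLD1 OLD2;   /* records from OLD1 first, then records from OLD2 */
RUN;

PROC PRINT DATA=NEW; RUN;
```

NEW contains four records and three variables; WT is missing for the two records that came from OLD1.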

37 Merging Data Sets by a Key Identifier
• What is a key identifier?
• Usually a variable such as an ID number, subject number, patient number, etc.
• Must be unique for each person
• The data sets do not have to contain the same number of records to be merged
• Unmatched IDs will result in missing values
• First, we'll look at a one-to-one match

38 Two Steps to a Merge: Sort, then Merge
• The technique for merging the data sets using some key identifier (such as patient ID) is as follows:
    1. Sort each data set by the key identifier.
    2. Within a DATA step, use the MERGE statement along with a BY statement to merge.
• Example:
    PROC SORT DATA=PRE; BY CASE;    * SORT each data set BY the key identifier;
    PROC SORT DATA=POST; BY CASE;
    DATA PREPOST;
    MERGE PRE POST; BY CASE;        * Then perform the merge by the SAME key identifier;
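
The two steps can be tried end to end with a small example. In this sketch the data values and the variable names PRETEST and POSTTEST are assumptions; CASE 2 appears only in PRE and CASE 4 only in POST, to show the missing values that unmatched IDs produce.

```sas
DATA PRE;
   INPUT CASE PRETEST;
   DATALINES;
3 72
1 65
2 80
;
RUN;

DATA POST;
   INPUT CASE POSTTEST;
   DATALINES;
4 90
1 70
3 75
;
RUN;

PROC SORT DATA=PRE;  BY CASE; RUN;   /* step 1: sort each data set by the key */
PROC SORT DATA=POST; BY CASE; RUN;

DATA PREPOST;
   MERGE PRE POST;                   /* step 2: merge by the same key */
   BY CASE;
RUN;

PROC PRINT DATA=PREPOST; RUN;
```

PREPOST has one record for each of the four CASE values; POSTTEST is missing for CASE 2 and PRETEST is missing for CASE 4.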

39 Rename While Merging
• As with PROC SORT, you can RENAME, DROP, and KEEP variables during the MERGE. You can also merge many files at a time. The following shows the syntax for merging four data sets and performing a RENAME and a DROP (or KEEP) on the third data set:
    DATA newdataset;
    MERGE data1 data2
          data3 (RENAME=(oldname=newname) DROP=variables)
          data4;
    BY keyvar;
    RUN;
  KEEP=variables may be used in place of DROP=variables.
• Note that the RENAME and DROP apply only to the data3 data set in this example.
• Do Hands-On Example p 97 (DMERGE1.SAS)

40 Few-To-Many Merge
• A few-to-many merge is used when you have records in one data set that you want to merge with a table that contains (typically) a smaller number of categories.
• Suppose you own an auto parts store. You sell products to several kinds of buyers, and each kind gets a particular discount.
• You want to produce a report that shows the actual sales price for a number of purchases.

41 Data for Few-To-Many Merge (Hands-On p 99)
• [Figure: the “MANY” data set and the “FEW” data set]

42 How to set up the few-to-many (match) merge
• The discounts that make up the “FEW” data set:
    Repair shops: 33% discount
    Consumers: 0% discount
    Other auto stores: 40% discount
• Define the TYPE data set (the “FEW”):
    DATA TYPE;
    FORMAT BUYERTYPE $8.;
    INPUT BUYERTYPE DISCOUNT;
    DATALINES;
    REPAIR .33
    CONSUMER 0
    STORE .40
    ;
• Note here that because you use a FORMAT statement to specify the format of BUYERTYPE, you don't have to indicate its type in the INPUT statement. Otherwise, that statement would have to be INPUT BUYERTYPE $ DISCOUNT;

43 Define the “MANY” data set
    DATA SALES;
    FORMAT ITEM $20. BUYERTYPE $8.;    * Note the FORMAT statement;
    INPUT ITEM BUYERTYPE PRICE;
    DATALINES;
    CARBCLEANER REPAIR 2.30
    BELT CONSUMER 6.99
    MOTOROIL CONSUMER 14.34
    CHAIN STORE 18.99
    SPARKPLUGS REPAIR 28.99
    CLEANER CONSUMER 1.99
    WRENCH STORE 18.88
    ;
• This is the “MANY” data set. In real life this may be thousands of transactions.

44 Prepare the two data sets for the merge
    PROC SORT DATA=SALES; BY BUYERTYPE;
    PROC SORT DATA=TYPE; BY BUYERTYPE;
• And do the merge:
    DATA REPORT;
    MERGE SALES TYPE; BY BUYERTYPE;
    FINAL = ROUND(PRICE*(1-DISCOUNT), .01);
    RUN;
    PROC PRINT DATA=REPORT; RUN;   * GET REPORT;

45 Few-to-Many Merge Results Note: Final price reflects the proper discount

46 4.8 USING PROC FORMAT
• PROC FORMAT allows you to create your own custom formats.
• These custom formats allow you to specify the information that will be displayed for selected values of a variable.
• For example, suppose you've coded DISEASED and NOT DISEASED as 0 and 1. You can create a format where 0 means DISEASED and 1 means NOT DISEASED, so that when output is displayed the words appear instead of the number codes.

47 Using PROC FORMAT
• The steps for using formatted values are:
    1. Create a FORMAT definition using PROC FORMAT.
    2. Apply the FORMAT to one or more variables. You can apply a format (once it is defined in PROC FORMAT) in a DATA step or in a data analysis PROC.
• For example:
    PROC FORMAT;
    VALUE FMTMARRIED 0 = "No"
                     1 = "Yes";
    RUN;
• Choose any name for the format (similar restrictions as for SAS variable names). We name them FMTsomething to make the name obvious.

48 Numeric and Character Formats
    PROC FORMAT;
    VALUE fmtname1 number1="name1"
                   number2="name2" etc;
    VALUE $fmtname2 "textname1"="name1"
                    "textname2"="name2" etc;
    RUN;
• The first VALUE statement defines a format for a numeric variable.
• For a character variable, the format name must start with a $, and the textnames must be in quotes.

49 Example: Numeric and Character Definitions
    PROC FORMAT;
    VALUE FMTMARRIED 0="No" 1="Yes";              * Numeric format defined;
    VALUE $FMTGENDER "M"="Male" "F"="Female";     * Character format defined;
    RUN;
• For the character format, take note that the format name $FMTGENDER starts with a $ and that the values "M" and "F" are in quotes.

50 Ways to specify formats
• Formats may also use ranges. For example, suppose that you want to classify your AGE data using the designations Child, Teen, Adult, and Senior. You could do this with the following format:
    PROC FORMAT;
    VALUE FMTAGE LOW - 12 = 'Child'
                 13,14,15,16,17,18,19 = 'Teen'
                 20 - 59 = 'Adult'
                 60 - HIGH = 'Senior';
    RUN;
• Note the different ways to indicate ranges.
• Do Hands-On Exercise p 102 (DFORMAT1.SAS)

51 Assigning Formats to Variables
• You can also assign the same format to several variables. If you have questionnaire data with variables named Q1, Q5, and Q7, where each question is coded as 0 and 1 for answers Yes and No, respectively, and you have a format called FMTYN, you could use that format in a procedure as in the following example:
    PROC PRINT;
    FORMAT Q1 Q5 Q7 FMTYN.;
    RUN;
• This assigns the same format (FMTYN) to three variables. Note the dot at the end of the assigned format name (REQUIRED).

52 Format Assignments (Data Set vs PROC)
• Assign formats to variables within a PROC step. Example:
    PROC PRINT;
    FORMAT GENDER $FMTGENDER.;
    RUN;
• Or in a DATA step:
    DATA MYDATA; SET OLDDATA;
    FORMAT GENDER $FMTGENDER.;
    RUN;
• Assigning a FORMAT in a DATA step makes the format assignment permanent in that data set. Assigning a FORMAT in a PROC makes the assignment apply only within that PROC (temporary).
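
The difference between the two kinds of assignment can be seen in one short program. This is a sketch with invented data (the data set name SURVEY and its three records are assumptions): MARRIED gets its format in the DATA step, so it is stored with the data set, while GENDER is formatted only inside the first PROC PRINT.

```sas
PROC FORMAT;
   VALUE FMTMARRIED 0 = "No" 1 = "Yes";
   VALUE $FMTGENDER "M" = "Male" "F" = "Female";
RUN;

DATA SURVEY;
   INPUT SUBJECT $ MARRIED GENDER $;
   FORMAT MARRIED FMTMARRIED.;        /* stored with the data set */
   DATALINES;
S01 1 F
S02 0 M
S03 1 M
;
RUN;

PROC PRINT DATA=SURVEY;
   FORMAT GENDER $FMTGENDER.;         /* applies only within this PROC PRINT */
RUN;

PROC PRINT DATA=SURVEY;               /* MARRIED is still formatted, GENDER is not */
RUN;
```

In the second listing, MARRIED still prints as Yes or No, while GENDER reverts to the raw codes M and F.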

53 Creating Permanent Formats
• In all the previous examples, the formats were created with a plain PROC FORMAT statement and are temporary: they last only for the current SAS session.
• You can also store formats in a (permanent) format catalog.
• For example, to store a SAS format in a specified permanent library location, you could use code such as
    PROC FORMAT LIBRARY = MYSASLIB;
• In this case, MYSASLIB refers to a SAS library location you have previously created. The LIBRARY= option creates a format library there.

54 FORMAT Libraries
• For example, the code
    PROC FORMAT LIBRARY = MYSASLIB;
    VALUE FMTMARRIED 0="No"
                     1="Yes";
    VALUE $FMTGENDER "F" = "Female"
                     "M" = "Male";
    RUN;
  creates two subfolders, named FMTMARRIED and $FMTGENDER, in the MYSASLIB.FORMATS format library folder.

55 View Formats Folder
• That is, when you create a SAS format catalog, a folder icon appears in the designated SAS library.
• In this case, it is named FORMATS and appears in the MYSASLIB library. You can verify its existence by examining the MYSASLIB library using SAS Explorer.
• If you double-click on the FORMATS folder, you will see subfolders named with the names of the formats you have created.

56 Contents of a Format Folder
• Click on the FMTMARRIED formats folder to see its contents, the definition of the format.
• [Figure: contents of the FMTMARRIED format]

57 Tell SAS About Your Formats
• Once you have created permanent formats, you can use them in both PROC and DATA step statements. To tell SAS the location of a particular format, use the statement
    OPTIONS FMTSEARCH=(proclib);
  where proclib is the name of the SAS library where your formats folder is located.

58 Using Stored SAS Formats
• For example, if you have previously created and stored the FMTMARRIED and $FMTGENDER formats in your MYSASLIB.FORMATS folder, you could use the following code to access those formats with PROC PRINT (or any PROC):
    OPTIONS FMTSEARCH=(MYSASLIB.FORMATS);   * Tells SAS where the formats are located;
    PROC PRINT DATA="C:\SASDATA\SURVEY";
    VAR SUBJECT MARRIED GENDER;
    FORMAT MARRIED FMTMARRIED. GENDER $FMTGENDER.;
    RUN;

59 Discovering SAS Formats
• To discover what formats are in a particular format library, you can use PROC CATALOG as shown here. This code displays all of the formats stored in the MYSASLIB.FORMATS library.
    PROC CATALOG CATALOG = MYSASLIB.FORMATS;
    CONTENTS;
    RUN;
    QUIT;
• Do Hands-On Example p 106

60 When Your SAS Format Library is Missing
• Suppose you have a .SAS7BDAT file that uses custom formats, but you do not have the format library. If you attempt to use that data set, you will get the following error message in the log:
    ERROR: The format FMTMARRIED was not found or could not be loaded.
• If this occurs, use the following OPTIONS statement (above the code where you refer to the data set) to tell SAS to access the data set, or run the procedure, without using the defined formats:
    OPTIONS NOFMTERR;
• In this case, the output displays the raw values of the variables instead of the assigned format labels.

61 Know the Difference: Format vs Label
• A common mistake is to try to use labels as formats or formats as labels. Make sure you know the difference:
• LABELS are descriptions for variables:
    LABEL AGE="AGE in 2013";
• FORMATS are descriptions for values:
    VALUE FMTMARRIED 0="No" 1="Yes";

62 4.9 GOING DEEPER: FINDING FIRST AND LAST VALUES
• Suppose that your data set contains several groups and you want to identify the first and last person (ID) in each of those groups.
• In a SAS DATA step, you identify the first and last values by FIRST.GP and LAST.GP, where GP is the name of the sorted grouping (or key) variable.
• Do Hands-On Example p 107 (DFINDFIRST.SAS)
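
A minimal sketch of FIRST. and LAST. in use, not taken from DFINDFIRST.SAS: the data set names, the SCORE variable, and the data values are assumptions. FIRST.GP and LAST.GP are temporary variables that exist only while the DATA step runs, so the example copies them into ordinary variables to make them visible in the output.

```sas
DATA SCORES;
   INPUT ID $ GP $ SCORE;
   DATALINES;
001 B 14
002 A 11
003 A 19
004 B 12
005 A 16
;
RUN;

PROC SORT DATA=SCORES; BY GP; RUN;   /* the data must be sorted by the grouping variable */

DATA FIRSTLAST;
   SET SCORES;
   BY GP;                            /* the BY statement makes FIRST.GP and LAST.GP available */
   ISFIRST = FIRST.GP;               /* 1 on the first record of each group, otherwise 0 */
   ISLAST  = LAST.GP;                /* 1 on the last record of each group, otherwise 0 */
RUN;

PROC PRINT DATA=FIRSTLAST; RUN;
```

A subsetting statement such as IF FIRST.GP; inside the same DATA step would keep only the first record of each group.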

63 4.10 SUMMARY
• This chapter discussed several techniques for preparing your data for analysis. In the next chapter, we begin the discussion of SAS procedures that perform analyses on the data.
• Continue to Chapter 5: Preparing to Use SAS Procedures

