SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 8, 13, & 24 By Tasha Chapman, Oregon Health Authority.


1 SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 8, 13, & 24 By Tasha Chapman, Oregon Health Authority

2 Topics covered…
• DO Loops
• DO Groups
• Sum statement
• Iterative DO loops
• DO Until/DO While
• BY-group Processing
• FIRST. / LAST.
• Arrays

3 Cody’s rules of SAS programming “If you are writing a SAS program, and it is becoming very tedious, stop. There is a good chance that there is a SAS tool that will make your task less tedious.”

4 DO Groups

5 If, Then, Else
If Score >= 90 Then Grade = 'A';
ELSE If Score >= 80 Then Grade = 'B';
ELSE If Score >= 70 Then Grade = 'C';
ELSE If Score >= 60 Then Grade = 'D';
ELSE If Score < 60 Then Grade = 'F';

Student  Score  Grade
Jane     75     C
Dave     56     F
Jack     90     A
Sue      68     D

6 If, Then, Else
If Score >= 90 Then Pass_Fail = 'Pass';
ELSE If Score >= 80 Then Pass_Fail = 'Pass';
ELSE If Score >= 70 Then Pass_Fail = 'Pass';
ELSE If Score >= 60 Then Pass_Fail = 'Fail';
ELSE If Score < 60 Then Pass_Fail = 'Fail';

Student  Score  Grade  Pass_Fail
Jane     75     C      Pass
Dave     56     F      Fail
Jack     90     A      Pass
Sue      68     D      Fail

7 If, Then, Else

8 DO Groups
IF condition THEN DO; statement; statement; statement; END;
If Score >= 90 Then Do; Grade = 'A'; Pass_Fail = 'Pass'; End;
Get all the stuff you need done in just one pass

9 DO Groups
• DO Groups can be nested within each other

10 DO Groups DO Group #1 DO Group #2

11 DO Groups
Diagram labels: DO Group #1, DO Group #2, DO Group #A, DO Group #B, DO Group #C
Each DO Group must begin with a DO; and end with an END;
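The nesting rule can be sketched with the grade example from the earlier slides. This is only an illustration: the dataset name Grades and the extra variable Honors are hypothetical, added so that the inner group has more than one statement to hold.

```sas
data Grades;
    input Student $ Score;
    if Score >= 70 then do;            /* outer DO group: passing scores */
        Pass_Fail = 'Pass';
        if Score >= 90 then do;        /* nested DO group */
            Grade  = 'A';
            Honors = 'Y';              /* hypothetical extra variable */
        end;                           /* closes the nested group */
        else if Score >= 80 then Grade = 'B';
        else Grade = 'C';
    end;                               /* closes the outer group */
    else do;                           /* second outer DO group: failing scores */
        Pass_Fail = 'Fail';
        if Score >= 60 then Grade = 'D';
        else Grade = 'F';
    end;
    datalines;
Jane 75
Dave 56
Jack 90
Sue 68
;
run;
```

Every DO is matched by its own END, and indenting each level makes a missing END easy to spot. Note that a missing Score compares as less than any number in SAS, so it would fall into the 'F' branch here.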

12 Sum statement

13
• Adds the result of an expression to an accumulator variable
• Allows you to calculate running totals or counters in your dataset
variable + expression;

14 Sum statement How do we calculate a running total?

15 Sum statement
Creates a variable called “Total” (initial value = 0)
Adds the value of Revenue for each observation

16 Sum statement
Treats a missing value as 0 when adding, so missing data does not wipe out the running total

17 Sum statement Can be used with conditional logic
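A minimal sketch of the sum statement, covering the running total, the handling of missing data, and the conditional counter. The dataset name Sales, the variable N_Big_Days, the 1000 threshold, and the data values are all made up for illustration.

```sas
data Sales;
    input Day Revenue;
    Total + Revenue;                         /* running total: starts at 0 and is retained */
    if Revenue > 1000 then N_Big_Days + 1;   /* conditional counter (hypothetical threshold) */
    datalines;
1 500
2 1200
3 .
4 800
;
run;
```

On day 3 the missing Revenue is skipped and Total simply carries forward. Writing Total = Total + Revenue; instead would set Total to missing, both because Total is not retained and because missing plus anything is missing.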

18 Iterative DO Loops

19 Iterative DO Loop
• Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year?
Create a variable “Interest” with a value of 0.0375 (for all observations)

20 Iterative DO Loop
Create a variable “Balance” with an initial value of 100 (to be modified later by SUM statements)

21 Iterative DO Loop
Create a variable “Year”
Add 1 to “Year”
Add “Interest*Balance” to Balance
Output – explicit instruction to write out an observation to the dataset

22 Iterative DO Loop
Ditto

23 Iterative DO Loop
…but there’s an easier way…

24 Iterative DO Loop

25 Iterative DO Loop
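The "easier way" can be sketched as an iterative DO loop. The variable names Interest, Balance, and Year come from the slides; the dataset name Invest and the three-year horizon are assumptions, since the original code is not in the transcript.

```sas
data Invest;
    Interest = 0.0375;
    Balance  = 100;
    do Year = 1 to 3;                   /* assumed horizon: 3 years */
        Balance + Interest * Balance;   /* sum statement adds this year's interest */
        output;                         /* write one observation per year */
    end;
    format Balance dollar10.2;
run;
```

The loop replaces the repeated "add 1 to Year, add interest, output" blocks with a single copy of those statements, and the index variable Year is incremented automatically.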

26 Iterative DO Loop
Nested DO loops
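One way nested DO loops might be applied to the same investment problem is quarterly compounding. This is a guess at a suitable illustration, not the slide's original code; the dataset name Invest_Quarterly, the variable Quarter, and the three-year horizon are assumptions.

```sas
data Invest_Quarterly;
    Interest = 0.0375;
    Balance  = 100;
    do Year = 1 to 3;                             /* outer loop: years (assumed horizon) */
        do Quarter = 1 to 4;                      /* inner loop: quarters within a year */
            Balance + (Interest / 4) * Balance;   /* a quarter of the annual rate each time */
            output;                               /* one observation per quarter */
        end;
    end;
run;
```

The inner loop runs to completion for every pass of the outer loop, so this step writes 12 observations.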

27 DO Until
• Want to invest $100 with a 3.75% interest rate. When will the fund balance reach $200?
DO UNTIL: Keep running the loop until the condition is true

28 DO While
DO WHILE: Keep running the loop until the condition is false
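A sketch of both forms applied to the doubling question. The dataset names Double_Until and Double_While are hypothetical; the two steps are meant to produce the same observations.

```sas
data Double_Until;
    Interest = 0.0375;
    Balance  = 100;
    Year     = 0;
    do until (Balance >= 200);      /* checked at the bottom of each iteration */
        Year + 1;
        Balance + Interest * Balance;
        output;
    end;
run;

data Double_While;
    Interest = 0.0375;
    Balance  = 100;
    Year     = 0;
    do while (Balance < 200);       /* checked at the top of each iteration */
        Year + 1;
        Balance + Interest * Balance;
        output;
    end;
run;
```

The two conditions are logical opposites. The practical difference is that DO UNTIL always executes the loop body at least once, while DO WHILE can skip it entirely if the condition is already false.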

29 DO Loop Whoops
• When using UNTIL or WHILE, make sure the stopping condition is eventually met (the UNTIL condition becomes true, or the WHILE condition becomes false)
• Otherwise you could end up in an infinite loop!
Loop will run forever because the balance will never equal exactly 200

30 DO Loop Whoops
Safeguard alternative: Loop will run until condition is true or 100 times, whichever comes first
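The safeguard can be sketched by combining an iterative DO with an UNTIL clause. The dataset name Double_Safe is hypothetical, and the condition uses >= rather than = so that it can actually be met.

```sas
data Double_Safe;
    Interest = 0.0375;
    Balance  = 100;
    do Year = 1 to 100 until (Balance >= 200);   /* stops at 100 passes at the latest */
        Balance + Interest * Balance;
        output;
    end;
run;
```

Even if the UNTIL condition were mistyped as Balance = 200, the upper bound of 100 would still end the loop.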

31 A review of DO
• DO group processing
  – Designates a group of statements to be executed as a unit
• Iterative DO loop
  – Executes statements repetitively based on the value of an index variable
• DO UNTIL
  – Executes DO loop until a condition is true
  – Checks the condition after each iteration of the loop (so the loop always runs at least once)
• DO WHILE
  – Executes DO loop until a condition is false
  – Checks the condition before each iteration of the loop

32 BY-group processing

33 BY statement (PROC Print redux)
• id statement – Assigns an observation ID based on listed variable (instead of OBS number)
• by statement – Produces a separate section of the report for each BY group
• pageby statement – Creates a page break after each BY group (not shown)
  – Must be used with a BY statement
From Week 6 – Chapters 14 & 19

34 BY statement (PROC Print redux) From Week 6 – Chapters 14 & 19

35 BY statement (MERGE redux)
• DATA step merge
From Week 4 – Chapters 7 & 10

36 BY-group processing
• A BY group is a set of observations with the same BY value
• BY-group processing is a method of processing observations that are grouped by this common value
• Can be invoked in both DATA steps and PROC steps using a BY statement
• Every PROC and DATA step with a BY statement must use a dataset sorted (or indexed) by the BY variable(s)

37 Clinic dataset
ID   VisitDate   Dx  Dx_Desc         HR  SBP  DBP
101  10/21/2005  4   GI Problems     68  120  80
101  2/25/2006   2   Cold            68  122  84
255  9/1/2005    1   Routine Visit   76  188  100
255  12/18/2005  1   Routine Visit   74  180  95
255  2/1/2006    3   Heart Problems  79  210  110
255  4/1/2006    3   Heart Problems  72  180  88
303  10/10/2006  1   Routine Visit   72  138  84
409  9/1/2005    6   Injury          88  142  92
409  10/2/2005   1   Routine Visit   72  136  90
409  12/15/2006  1   Routine Visit   68  130  84
712  4/6/2006    7   Infection       58  118  70
712  4/15/2006   7   Infection       56  118  72

38 Clinic dataset (same table as slide 37)
Multiple visits per patient

39 FIRST. / LAST.
ID   First.ID  Last.ID
101  1         0
101  0         1
255  1         0
255  0         0
255  0         0
255  0         1
303  1         1
409  1         0
409  0         0
409  0         1
712  1         0
712  0         1
When using the BY statement, SAS creates two temporary variables: FIRST.var and LAST.var
These can be used for identifying the first and last observation in a group

40 FIRST. / LAST.
• When was the first visit for each patient?
Observations grouped by patient (ID) with the first visit at the top of the list

41 FIRST. / LAST.
Using the BY statement will create the temp variables FIRST.ID and LAST.ID

42 FIRST. / LAST.
The subsetting IF statement will only include the first visit for each patient in the new dataset (Initial_Visit)
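The three steps above (sort, BY, subsetting IF) can be sketched as follows. The first DATA step simply rebuilds the Clinic table from slide 37 so the example runs on its own; the informats and lengths are assumptions.

```sas
data Clinic;
    infile datalines dsd;                 /* comma-delimited input */
    input ID VisitDate :mmddyy10. Dx Dx_Desc :$20. HR SBP DBP;
    format VisitDate mmddyy10.;
    datalines;
101,10/21/2005,4,GI Problems,68,120,80
101,2/25/2006,2,Cold,68,122,84
255,9/1/2005,1,Routine Visit,76,188,100
255,12/18/2005,1,Routine Visit,74,180,95
255,2/1/2006,3,Heart Problems,79,210,110
255,4/1/2006,3,Heart Problems,72,180,88
303,10/10/2006,1,Routine Visit,72,138,84
409,9/1/2005,6,Injury,88,142,92
409,10/2/2005,1,Routine Visit,72,136,90
409,12/15/2006,1,Routine Visit,68,130,84
712,4/6/2006,7,Infection,58,118,70
712,4/15/2006,7,Infection,56,118,72
;
run;

proc sort data=Clinic;
    by ID VisitDate;                      /* earliest visit first within each patient */
run;

data Initial_Visit;
    set Clinic;
    by ID;                                /* creates FIRST.ID and LAST.ID */
    if first.ID;                          /* subsetting IF: keep only the first visit */
run;
```

Changing the last condition to if last.ID; would keep the most recent visit per patient instead.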

43 Clinic dataset (same table as slide 37)
Multiple visits for same issue per patient

44 FIRST. / LAST.
• When was the first visit for each health issue for each patient?
Observations grouped by patient (ID), then diagnosis, with the first visit at the top of the list

45 FIRST. / LAST.
Using the BY statement will create the temp variables FIRST.ID, LAST.ID, FIRST.Dx_Desc, and LAST.Dx_Desc

46 FIRST. / LAST.
ID   Dx_Desc         First.ID  Last.ID  First.Dx_Desc  Last.Dx_Desc
101  Cold            1         0        1              1
101  GI Problems     0         1        1              1
255  Heart Problems  1         0        1              0
255  Heart Problems  0         0        0              1
255  Routine Visit   0         0        1              0
255  Routine Visit   0         1        0              1
303  Routine Visit   1         1        1              1
409  Injury          1         0        1              1
409  Routine Visit   0         0        1              0
409  Routine Visit   0         1        0              1
712  Infection       1         0        1              0
712  Infection       0         1        0              1

47 FIRST. / LAST.
Using the BY statement will create the temp variables FIRST.ID, LAST.ID, FIRST.Dx_Desc, and LAST.Dx_Desc
Subsetting IF statement will only include first visit for each new diagnosis per patient
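A sketch of the two-variable version, assuming a dataset named Clinic that holds the table from slide 37. The output dataset name First_Dx_Visit is hypothetical.

```sas
proc sort data=Clinic;
    by ID Dx_Desc VisitDate;      /* patient, then diagnosis, earliest visit first */
run;

data First_Dx_Visit;              /* hypothetical output name */
    set Clinic;
    by ID Dx_Desc;                /* creates FIRST./LAST. flags for both variables */
    if first.Dx_Desc;             /* first visit for each diagnosis within a patient */
run;
```

FIRST.Dx_Desc is set to 1 whenever Dx_Desc changes or ID changes, so a diagnosis shared by two adjacent patients still starts a new group.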

48 FIRST. / LAST.
• How many visits did each patient have per diagnosis?
Every time a new Dx group is encountered (FIRST.Dx_Desc = 1), N_visits is reset to 0

49 FIRST. / LAST.
For each observation encountered in the group, N_visits is incremented by 1 (using the SUM statement)

50 FIRST. / LAST.
When the last observation in the group is encountered (LAST.Dx_Desc = 1), an observation is written to the new dataset (Count_Visits)
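The reset, increment, and output pattern can be sketched like this, again assuming a Clinic dataset with the slide 37 columns. The KEEP statement is an addition for tidiness and is not described in the slides.

```sas
proc sort data=Clinic;
    by ID Dx_Desc;
run;

data Count_Visits;
    set Clinic;
    by ID Dx_Desc;
    if first.Dx_Desc then N_visits = 0;   /* reset at the start of each group */
    N_visits + 1;                         /* sum statement: count this observation */
    if last.Dx_Desc then output;          /* write one row per patient and diagnosis */
    keep ID Dx_Desc N_visits;             /* assumption: drop the visit-level columns */
run;
```

Without the KEEP statement, each output row would also carry the vitals from the last visit in the group, which can be misleading in a summary table.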

51 Sampling
• BY-group processing can also be used as a quick and dirty way to get a random sample
• If you need to use a statistically rigorous sampling method, use PROC SurveySelect (part of SAS/STAT)

52 Sampling
• Need to randomly select 25 records per coder for proofing
Creates a dummy variable (X) that holds a random number for every observation

53 Sampling
Grouped by Coder_ID and randomly sorted by X

54 Sampling
Every time a new Coder_ID group is encountered, Count is reset to 0
For each observation encountered in the group, Count is incremented by 1

55 Sampling
If the Count is less than or equal to 25 (i.e. the first 25 observations per coder), then the observation is output to the new dataset (“Sample”)

56 Sampling
The dummy variables created for this process (X and Count) are dropped from the final dataset
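Put together, the sampling steps might look like the sketch below. The input dataset Records and the intermediate dataset Records_Rand are hypothetical names; X, Count, Coder_ID, and Sample come from the slides. The original code may have used a different random-number function.

```sas
data Records_Rand;                 /* hypothetical intermediate dataset */
    set Records;                   /* hypothetical input containing Coder_ID */
    X = rand('uniform');           /* random number between 0 and 1 for each row */
run;

proc sort data=Records_Rand;
    by Coder_ID X;                 /* random order within each coder */
run;

data Sample;
    set Records_Rand;
    by Coder_ID;
    if first.Coder_ID then Count = 0;   /* reset for each coder */
    Count + 1;
    if Count <= 25 then output;         /* first 25 rows per coder */
    drop X Count;
run;
```

Coders with fewer than 25 records contribute all of their records. Calling call streaminit with a fixed seed before rand would make the sample reproducible.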

57 A review of BY
• BY-group processing can be a useful way of dealing with groups of observations
• Can be used for:
  – De-duping observations
  – Finding the first or last observation
  – Counting or summing observations
  – Comparing observations
  – Finding a quick and dirty random sample
  – …and much more.

58 Arrays

59
• SAS Arrays are a collection of elements defined as a single group
• Arrays allow you to write SAS statements referencing a group of variables
• SAS Arrays are different than arrays in many other programming languages

60 Example array
• Let’s say you have a dataset created in SPSS where all unknown numeric data is set to 999. How do you convert this to a missing value?
Performing the same calculation on multiple variables
…maybe there’s an easier way…

61 Example array

62 Example array
Define the array: List all the variables you want to perform the manipulation on

63 Example array
Do the DO: Use an iterative DO loop to run through all seven variables

64 Example array
Drop i: i is just the temp variable created for the iterative DO loop
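The three steps (define the array, loop over it, drop the index) can be sketched as follows. The array name oldvars and the seven variable names are taken from the slide 70 table; the dataset names Cleaned and SPSS_Import are hypothetical.

```sas
data Cleaned;                         /* hypothetical output name */
    set SPSS_Import;                  /* hypothetical input with the seven variables */
    array oldvars{7} Height Weight Age SBP DBP Temp HR;   /* define the array */
    do i = 1 to 7;                    /* run through all seven variables */
        if oldvars{i} = 999 then oldvars{i} = .;
    end;
    drop i;                           /* i is only the loop index */
run;
```

Without the array, the same step would need seven nearly identical IF statements, one per variable.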

65 Array statement
ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;
• array-name: specifies the name of the array
  – Think of it as an alias for this group of variables
  – Cannot be the name of an existing SAS variable in the same DATA step
  – Should not be the name of a SAS function

66 Array statement
• subscript: describes the number and arrangement of elements in the array
  – Dimension-size(s): explicitly specify the number of elements in the array
  – Lower/Upper bounds: specify the range explicitly (the default is 1 to n)
  – Asterisk: have SAS count the variables in the array

67 Array statement
• $: specifies that the elements in the array are character (optional)
  – Useful when array creates new variables
• length: specifies the length of the elements in the array (optional)
  – Useful when array creates new variables

68 Array statement
• array-elements: the elements (variables) that make up the array (optional)
  – Must be either all character or all numeric
  – Can be listed in any order
  – Can use keywords _NUMERIC_, _CHARACTER_, or _ALL_
  – Can also use _TEMPORARY_ to create an array of temporary elements
• initial-value-list: initial values for the elements in the array (optional)

69 Array statement
• A simple (and common) array statement looks like this:
ARRAY array-name {subscript} array-elements;
  – array-name: name of the array
  – subscript: number of elements in the array
  – array-elements: list of elements in the array

70 Example array
Variable name  Array reference
Height         oldvars{1}
Weight         oldvars{2}
Age            oldvars{3}
SBP            oldvars{4}
DBP            oldvars{5}
Temp           oldvars{6}
HR             oldvars{7}

71 Example array
if oldvars{1} = 999 then oldvars{1} = .;
is equivalent to
if Height = 999 then Height = .;

72 More examples of arrays
• Convert monthly average temperature from Fahrenheit to Celsius
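A possible version of the temperature conversion, using one array for the existing variables and a second array to create the new ones. All dataset, array, and variable names here are hypothetical.

```sas
data Temps_C;                             /* hypothetical output name */
    set Temps_F;                          /* hypothetical input with Temp1-Temp12 */
    array fahr{12} Temp1-Temp12;          /* existing Fahrenheit variables */
    array cels{12} TempC1-TempC12;        /* new Celsius variables created by the array */
    do Month = 1 to 12;
        cels{Month} = (fahr{Month} - 32) * 5 / 9;
    end;
    drop Month;
run;
```

Because the two arrays share the same subscript, element Month of one always lines up with element Month of the other.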

73 More examples of arrays
• If the DART rate is missing at the full NAICS level, impute missing values with the DART rate at the 3-digit NAICS level

74 More examples of arrays
• Collapse monthly income into quarterly income
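One way to collapse twelve monthly values into four quarterly totals is to compute the month subscripts from the quarter number. The names Monthly, Quarterly, Income1-Income12, and Qtr1-Qtr4 are assumptions.

```sas
data Quarterly;                           /* hypothetical output name */
    set Monthly;                          /* hypothetical input with Income1-Income12 */
    array month_inc{12} Income1-Income12;
    array qtr_inc{4} Qtr1-Qtr4;           /* new quarterly variables */
    do q = 1 to 4;
        /* quarter q covers months 3q-2, 3q-1, and 3q */
        qtr_inc{q} = sum(month_inc{3*q - 2}, month_inc{3*q - 1}, month_inc{3*q});
    end;
    drop q;
run;
```

The SUM function ignores missing months, so a quarter with one missing month still gets a total from the other two.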

75 * and Dim()
• Use the asterisk {*} as the subscript to have SAS count the elements for you
  – Cannot use with an array of temporary elements or multidimensional arrays
• Use the DIM function in the DO Loop to return the stop value by counting the number of elements in the array
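A sketch of the asterisk and DIM together, reusing the 999-to-missing example. The dataset names are hypothetical, and the _NUMERIC_ keyword is used here only to show an array whose size is not known in advance.

```sas
data Cleaned_All;                     /* hypothetical output name */
    set SPSS_Import;                  /* hypothetical input dataset */
    array nums{*} _numeric_;          /* SAS counts the numeric variables read so far */
    do i = 1 to dim(nums);            /* DIM returns the number of elements */
        if nums{i} = 999 then nums{i} = .;
    end;
    drop i;
run;
```

The loop no longer needs editing when variables are added or removed. Be careful with _NUMERIC_, though: it also sweeps up ID or count variables where 999 may be a legitimate value.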

76 Creating character variables
• By default, newly created variables will be numeric
• Use the $ to denote that they should be character
• May also need to define the length
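A short sketch of an array that creates character variables with an explicit length. The dataset, array, and variable names and the cutoff of 70 are made up for illustration.

```sas
data Flags;                               /* hypothetical output name */
    set Scores;                           /* hypothetical input with Score1-Score3 */
    array sc{3} Score1-Score3;            /* existing numeric variables */
    array res{3} $ 4 Result1-Result3;     /* new character variables, length 4 */
    do i = 1 to 3;
        if sc{i} >= 70 then res{i} = 'Pass';
        else if not missing(sc{i}) then res{i} = 'Fail';
    end;
    drop i;
run;
```

Without the $ the new variables would be numeric and the assignments would fail to store text. Without a length they would get the default of 8, which is fine here but can truncate longer values.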

77 Temporary arrays
• You can create a temporary array of values to use during the DO Loop
• The array only exists for the duration of the DATA step
• Useful for storing constant values used in calculations

78 Temporary arrays
• How do you apply a performance bonus to monthly income?
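The bonus question might be handled with a temporary array of multipliers, as in this sketch. The bonus rates, dataset names, and variable names are all invented for illustration.

```sas
data Income_Bonus;                            /* hypothetical output name */
    set Monthly;                              /* hypothetical input with Income1-Income12 */
    array inc{12} Income1-Income12;
    /* hypothetical multipliers: 5% bonus at each quarter end, 10% in December */
    array bonus{12} _temporary_
        (1.00 1.00 1.05 1.00 1.00 1.05 1.00 1.00 1.05 1.00 1.00 1.10);
    array new_inc{12} Bonus_Income1-Bonus_Income12;
    do i = 1 to 12;
        new_inc{i} = inc{i} * bonus{i};
    end;
    drop i;
run;
```

The elements of bonus are never written to the output dataset, so there is nothing to drop afterwards, and their values are retained across observations automatically.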

79 A review of arrays
• Whenever you need to run a set of variables through the same DATA step manipulations – think arrays!
• Can be used to:
  – Read data
  – Compare variables
  – Create many variables with the same attributes
  – Perform repetitive calculations
  – Transpose datasets
  – …and more!

80 Additional reading
• Summing with SAS
• DO which? Loop, Until, or While?
• The power of the BY statement
• A closer look at FIRST.var and LAST.var
• Arrays made easy: An introduction to arrays and array processing
• Arrays in SAS
• Using SAS Arrays to Manipulate Data

81 For next week…
Read chapter 25

