SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 8, 13, & 24 By Tasha Chapman, Oregon Health Authority.

Topics covered…
• DO Loops
• DO Groups
• Sum statement
• Iterative DO loops
• DO Until/DO While
• BY-group Processing
• FIRST. / LAST.
• Arrays

Cody’s rules of SAS programming “If you are writing a SAS program, and it is becoming very tedious, stop. There is a good chance that there is a SAS tool that will make your task less tedious.”

DO Groups

If, Then, Else
If Score >= 90 Then Grade = 'A';
ELSE If Score >= 80 Then Grade = 'B';
ELSE If Score >= 70 Then Grade = 'C';
ELSE If Score >= 60 Then Grade = 'D';
ELSE If Score < 60 Then Grade = 'F';

Student  Score  Grade
Jane     75     C
Dave     56     F
Jack     90     A
Sue      68     D

If, Then, Else
If Score >= 90 Then Pass_Fail = 'Pass';
ELSE If Score >= 80 Then Pass_Fail = 'Pass';
ELSE If Score >= 70 Then Pass_Fail = 'Pass';
ELSE If Score >= 60 Then Pass_Fail = 'Fail';
ELSE If Score < 60 Then Pass_Fail = 'Fail';

Student  Score  Grade  Pass_Fail
Jane     75     C      Pass
Dave     56     F      Fail
Jack     90     A      Pass
Sue      68     D      Fail

If, Then, Else

DO Groups
IF condition THEN DO;
    statement;
    statement;
    statement;
END;

If Score >= 90 Then Do;
    Grade = 'A';
    Pass_Fail = 'Pass';
End;

Get everything you need done in just one pass
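As a rough sketch of how the two IF/THEN/ELSE chains from the earlier slides could be folded into DO groups, the step below sets both Grade and Pass_Fail in a single pass. The dataset name `grades` and the LENGTH statement are assumptions added for illustration; the scores come from the slide's table.

```sas
data grades;
    length Student $ 8 Grade $ 1 Pass_Fail $ 4;
    input Student $ Score;

    /* Each DO group sets both variables for one score band */
    if Score >= 90 then do;
        Grade = 'A';
        Pass_Fail = 'Pass';
    end;
    else if Score >= 80 then do;
        Grade = 'B';
        Pass_Fail = 'Pass';
    end;
    else if Score >= 70 then do;
        Grade = 'C';
        Pass_Fail = 'Pass';
    end;
    else if Score >= 60 then do;
        Grade = 'D';
        Pass_Fail = 'Fail';
    end;
    else if Score < 60 then do;
        Grade = 'F';
        Pass_Fail = 'Fail';
    end;
datalines;
Jane 75
Dave 56
Jack 90
Sue 68
;
run;
```

Note that SAS treats a missing Score as smaller than any number, so a missing score would fall into the last group, just as it would in the original chain.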

DO Groups  DO Groups can be nested within each other

DO Groups
[Diagram: nested DO groups, labeled DO Group #1, DO Group #2, DO Group #A, DO Group #B, and DO Group #C]
Each DO Group must begin with a DO; and end with an END;
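A minimal sketch of one DO group nested inside another. The variable `Note`, the student "Ann", and the perfect-score rule are hypothetical and only serve to show the DO/END pairing.

```sas
data nested;
    length Student $ 8 Grade $ 1 Note $ 12;
    input Student $ Score;

    if Score >= 90 then do;           /* outer DO group begins */
        Grade = 'A';
        if Score = 100 then do;       /* inner DO group begins */
            Note = 'Perfect';
        end;                          /* inner DO group ends   */
    end;                              /* outer DO group ends   */
datalines;
Jack 90
Ann 100
;
run;
```

Indenting each level makes it easier to check that every DO has its matching END.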

Sum statement

• Adds the result of an expression to an accumulator variable
• Allows you to calculate running totals or counters in your dataset
Syntax: variable + expression;

Sum statement How do we calculate a running total?

Sum statement Creates a variable called “Total” (initial value = 0) Adds the value of Revenue for each observation

Sum statement Will skip over missing data
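The running-total slides can be sketched as follows. The dataset `sales`, the variable `Day`, and the revenue values are made up for illustration; `Total` and `Revenue` follow the slide text.

```sas
data sales;
    input Day Revenue;
    /* Sum statement: Total is retained across observations,  */
    /* starts at 0, and ignores a missing Revenue value.       */
    Total + Revenue;
datalines;
1 100
2 .
3 250
4 75
;
run;

proc print data=sales noobs;
run;
```

With these values Total is 100, 100, 350, and 425. By contrast, an assignment such as `Total = Total + Revenue;` would need a RETAIN statement and would turn missing as soon as it met a missing Revenue.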

Sum statement Can be used with conditional logic
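One way a sum statement might be combined with conditional logic is as a counter. The variable `Big_Days` and the 100 threshold below are assumptions, not taken from the slide.

```sas
data big_days;
    input Day Revenue;
    /* Counter: incremented only when the condition is true */
    if Revenue >= 100 then Big_Days + 1;
datalines;
1 100
2 .
3 250
4 75
;
run;
```

Big_Days starts at 0 and holds the number of qualifying observations seen so far, so the last observation carries the final count.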

Iterative DO Loops

Iterative DO Loop
• Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year?
1. Create a variable “Interest” with a value of .0375 (for all observations)
2. Create a variable “Balance” with an initial value of 100 (to be modified later by SUM statements)
3. Create a variable “Year”; add 1 to “Year”; add “Interest*Balance” to Balance; Output – explicit instruction to write out an observation to the dataset
4. Ditto for each following year
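Written out by hand, the steps described above might look like the sketch below. The dataset name `invest_long` and the choice of three years are assumptions; the slide's own code is not part of the transcript.

```sas
data invest_long;
    Interest = .0375;
    Balance = 100;

    /* Year 1 */
    Year + 1;
    Balance + Interest*Balance;
    output;

    /* Year 2: the same three statements again */
    Year + 1;
    Balance + Interest*Balance;
    output;

    /* Year 3: ...and again */
    Year + 1;
    Balance + Interest*Balance;
    output;

    format Balance dollar10.2;
run;
```

Every additional year costs three more statements, which is the tedium an iterative DO loop removes.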

Iterative DO Loop  Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? …but there’s an easier way…

Iterative DO Loop  Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year?
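A sketch of the same calculation with an iterative DO loop, assuming a three-year horizon (the slide does not state the stop value) and a hypothetical dataset name `invest`.

```sas
data invest;
    Interest = .0375;
    Balance = 100;

    /* Year is the index variable: it runs from 1 to 3 */
    do Year = 1 to 3;
        Balance + Interest*Balance;
        output;                  /* one observation per year */
    end;

    format Balance dollar10.2;
run;
```

Changing the horizon now means editing one number rather than copying statements.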

Iterative DO Loop  Want to invest $100 with a 3.75% interest rate. What is the fund balance at the end of each year? Nested DO loops
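The slide's nested-loop code is not in the transcript. One plausible illustration, assuming interest is compounded quarterly, nests a Quarter loop inside the Year loop; the dataset name and the three-year horizon are hypothetical.

```sas
data invest_quarterly;
    Interest = .0375;
    Balance = 100;

    do Year = 1 to 3;                /* outer loop: years             */
        do Quarter = 1 to 4;         /* inner loop: quarters per year */
            /* a quarter of the annual rate is applied each quarter */
            Balance + (Interest/4)*Balance;
            output;
        end;
    end;

    format Balance dollar10.2;
run;
```

The inner loop runs to completion for every pass of the outer loop, so this writes 12 observations.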

DO Until
• Want to invest $100 with a 3.75% interest rate. When will the fund balance reach $200?
DO UNTIL: Keep running the loop until the condition is true
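A minimal DO UNTIL sketch for this question, assuming the stopping condition is written as `Balance >= 200` and using a hypothetical dataset name.

```sas
data double_until;
    Interest = .0375;
    Balance = 100;

    /* Condition is checked at the bottom of the loop, */
    /* so the body always runs at least once.          */
    do until (Balance >= 200);
        Year + 1;
        Balance + Interest*Balance;
        output;
    end;

    format Balance dollar10.2;
run;
```

The last observation written is the first year in which the balance is at least $200.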

DO While
• Want to invest $100 with a 3.75% interest rate. When will the fund balance reach $200?
DO WHILE: Keep running the loop as long as the condition is true (stop when it becomes false)
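The same question sketched with DO WHILE, where the condition is the logical opposite of the UNTIL version. The dataset name is hypothetical.

```sas
data double_while;
    Interest = .0375;
    Balance = 100;

    /* Condition is checked at the top of the loop, so the  */
    /* body is skipped entirely if it is false to begin with */
    do while (Balance < 200);
        Year + 1;
        Balance + Interest*Balance;
        output;
    end;

    format Balance dollar10.2;
run;
```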

DO Loop Whoops
• When using UNTIL or WHILE, make sure that the stopping condition is actually reached at some point (the UNTIL condition becomes true, or the WHILE condition becomes false)
• Otherwise you could end up in an infinite loop!
Loop will run forever because the balance will never equal exactly 200

DO Loop Whoops
• When using UNTIL or WHILE, make sure that the stopping condition is actually reached at some point (the UNTIL condition becomes true, or the WHILE condition becomes false)
• Otherwise you could end up in an infinite loop!
Safeguard alternative: Loop will run until condition is true or 100 times, whichever comes first
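The safeguard can be sketched by combining an iterative range with an UNTIL clause. The dataset name is hypothetical, and the condition is written here as `>= 200` rather than the exact-equality test the slide warns about.

```sas
data double_safe;
    Interest = .0375;
    Balance = 100;

    /* Stops when Balance reaches 200 or after 100 iterations,  */
    /* whichever comes first. Had the condition been written as */
    /* (Balance = 200), the 1 to 100 range would still end the  */
    /* loop instead of letting it run forever.                  */
    do Year = 1 to 100 until (Balance >= 200);
        Balance + Interest*Balance;
        output;
    end;

    format Balance dollar10.2;
run;
```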

A review of DO
• DO group processing
  • Designates a group of statements to be executed as a unit
• Iterative DO loop
  • Executes statements repetitively based on the value of an index variable
• DO UNTIL
  • Executes DO loop until a condition is true
  • Checks the condition after each iteration of the DO loop
• DO WHILE
  • Executes DO loop while a condition is true (stops when it becomes false)
  • Checks the condition before each iteration of the DO loop

BY-group processing

BY statement (PROC Print redux)
• id statement – Assigns an observation ID based on the listed variable (instead of the OBS number)
• by statement – Produces a separate section of the report for each BY group
• pageby statement – Creates a page break after each BY group (not shown)
  • Must be used with a BY statement
From Week 6 – Chapters 14 & 19

BY statement (PROC Print redux) From Week 6 – Chapters 14 & 19
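As a refresher, a PROC PRINT step with ID and BY statements might look like the sketch below. It uses the `sashelp.class` sample dataset that ships with SAS rather than the data from the original slide.

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data=sashelp.class out=class_sorted;
    by Sex;
run;

proc print data=class_sorted;
    by Sex;                  /* separate report section per value of Sex */
    id Name;                 /* Name replaces the OBS column             */
    var Age Height Weight;
run;
```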

BY statement (MERGE redux)  DATA step merge From Week 4 – Chapters 7 & 10

BY-group processing
• A BY group is a set of observations with the same BY value
• BY-group processing is a method of processing observations that are grouped by this common value
• Can be invoked in both DATA steps and PROC steps using a BY statement
• Every PROC and DATA step with a BY statement must use a dataset that is sorted (or indexed) by the BY variable

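Clinic dataset
Columns: ID, VisitDate, Dx, Dx_Desc, HR, SBP, DBP
(HR, SBP, and DBP values, most ID values, and the month of most visit dates are not shown.)

ID   VisitDate    Dx  Dx_Desc
101  10/21/2005   4   GI Problems
…    …/25/2006    2   Cold
…    …/1/2005     1   Routine Visit
…    …/18/2005    1   Routine Visit
…    …/1/2006     3   Heart Problems
…    …/1/2006     3   Heart Problems
…    …/10/2006    1   Routine Visit
…    …/1/2005     6   Injury
…    …/2/2005     1   Routine Visit
…    …/15/2006    1   Routine Visit
…    …/6/2006     7   Infection
…    …/15/2006    7   Infection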

Clinic dataset (same records as above)
Multiple visits per patient

FIRST. / LAST.
When using the BY statement, SAS creates two temporary variables: FIRST.var and LAST.var
These can be used for identifying the first and last observation in a group
[Table columns: ID, First.ID, Last.ID]

FIRST. / LAST.
• When was the first visit for each patient?
1. Observations are grouped by patient (ID) with the first visit at the top of the list
2. Using the BY statement will create the temp variables FIRST.ID and LAST.ID
3. The subsetting IF statement will only include the first visit for each patient in the new dataset (Initial_Visit)
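A sketch of how these steps could be coded. Only the first data row comes from the slide; the other rows, and the names `Clinic` and `Clinic_Sorted`, are hypothetical.

```sas
data Clinic;
    infile datalines dlm=',';
    input ID VisitDate :mmddyy10. Dx Dx_Desc :$20.;
    format VisitDate mmddyy10.;
datalines;
101,10/21/2005,4,GI Problems
101,3/5/2006,1,Routine Visit
102,6/1/2006,3,Heart Problems
102,2/25/2006,2,Cold
102,9/1/2006,3,Heart Problems
;
run;

/* Group by patient, earliest visit first within each patient */
proc sort data=Clinic out=Clinic_Sorted;
    by ID VisitDate;
run;

data Initial_Visit;
    set Clinic_Sorted;
    by ID;              /* creates FIRST.ID and LAST.ID              */
    if first.ID;        /* subsetting IF: keep first row per patient */
run;
```

FIRST.ID and LAST.ID exist only while the DATA step runs; they are not written to Initial_Visit.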

Clinic dataset (same records as shown earlier)
Multiple visits for same issue per patient

FIRST. / LAST.
• When was the first visit for each health issue for each patient?
1. Observations are grouped by patient (ID), then diagnosis, with the first visit at the top of the list
2. Using the BY statement will create the temp variables FIRST.ID, LAST.ID, FIRST.Dx_Desc, and LAST.Dx_Desc

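FIRST. / LAST.
Table columns: ID, Dx_Desc, First.ID, Last.ID, First.Dx_Desc, Last.Dx_Desc
(Most ID and flag values are not shown. Dx_Desc values in sorted order:)
GI Problems (ID 101)
Cold
Heart Problems
Heart Problems
Routine Visit
Routine Visit
Routine Visit
Injury
Routine Visit
Routine Visit
Infection
Infection (First.ID = 0, Last.ID = 1, First.Dx_Desc = 0, Last.Dx_Desc = 1)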

FIRST. / LAST.
• When was the first visit for each health issue for each patient?
Using the BY statement will create the temp variables FIRST.ID, LAST.ID, FIRST.Dx_Desc, and LAST.Dx_Desc
Subsetting IF statement will only include first visit for each new diagnosis per patient
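With two BY variables the approach might look like this sketch. Apart from the first row, the data are hypothetical, as are the dataset names `Clinic_Sorted` and `First_Dx_Visit`.

```sas
data Clinic;
    infile datalines dlm=',';
    input ID VisitDate :mmddyy10. Dx Dx_Desc :$20.;
    format VisitDate mmddyy10.;
datalines;
101,10/21/2005,4,GI Problems
102,6/1/2006,3,Heart Problems
102,2/25/2006,2,Cold
102,9/1/2006,3,Heart Problems
;
run;

/* Group by patient, then diagnosis, earliest visit first */
proc sort data=Clinic out=Clinic_Sorted;
    by ID Dx_Desc VisitDate;
run;

data First_Dx_Visit;
    set Clinic_Sorted;
    by ID Dx_Desc;          /* FIRST./LAST. flags for both BY variables   */
    if first.Dx_Desc;       /* first visit per diagnosis within a patient */
run;
```

FIRST.Dx_Desc is also set to 1 whenever FIRST.ID is 1, so a diagnosis that repeats across patients still starts a new group for each patient.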

FIRST. / LAST.
• How many visits did each patient have per diagnosis?
1. Every time a new Dx group is encountered (FIRST.Dx_Desc = 1), N_visits is reset to 0
2. For each observation encountered in the group, N_visits is incremented by 1 (using the SUM statement)
3. When the last observation in the group is encountered (LAST.Dx_Desc = 1), an observation is written to the new dataset (Count_Visits)
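These three steps could be sketched as below. The input rows (other than the first) and the KEEP statement are assumptions added for illustration.

```sas
data Clinic;
    infile datalines dlm=',';
    input ID VisitDate :mmddyy10. Dx Dx_Desc :$20.;
    format VisitDate mmddyy10.;
datalines;
101,10/21/2005,4,GI Problems
102,6/1/2006,3,Heart Problems
102,2/25/2006,2,Cold
102,9/1/2006,3,Heart Problems
;
run;

proc sort data=Clinic out=Clinic_Sorted;
    by ID Dx_Desc;
run;

data Count_Visits;
    set Clinic_Sorted;
    by ID Dx_Desc;
    if first.Dx_Desc then N_visits = 0;   /* reset at start of each group */
    N_visits + 1;                         /* sum statement: count the row */
    if last.Dx_Desc then output;          /* one row per patient/diagnosis */
    keep ID Dx_Desc N_visits;
run;
```

With these rows, patient 102 ends up with N_visits = 2 for Heart Problems and 1 for Cold.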

Sampling
• BY-group processing can also be used as a quick and dirty way to get a random sample
• If you need to use a statistically rigorous sampling method, use PROC SurveySelect (part of SAS/STAT)

Sampling
• Need to randomly select 25 records per coder for proofing
1. Create a dummy variable (X) that holds a random number for every observation
2. Sort so that observations are grouped by Coder_ID and randomly ordered by X
3. Every time a new Coder_ID group is encountered, Count is reset to 0; for each observation encountered in the group, Count is incremented by 1
4. If the Count is less than or equal to 25 (i.e. the first 25 observations per coder), then the observation is output to the new dataset (“Sample”)
5. The dummy variables created for this process (X and Count) are dropped from the final dataset
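Put together, the sampling steps might look like the sketch below. The input dataset `Coded` (3 coders with 100 records each), the seed, and the use of `rand('uniform')` are assumptions; the original slides may have used a different random-number function.

```sas
/* Hypothetical input: 3 coders, 100 records each */
data Coded;
    do Coder_ID = 1 to 3;
        do Record = 1 to 100;
            output;
        end;
    end;
run;

/* Step 1: attach a random number to every observation */
data Coded_Rand;
    call streaminit(12345);          /* fixed seed so the sample is repeatable */
    set Coded;
    X = rand('uniform');
run;

/* Step 2: group by coder, random order within coder */
proc sort data=Coded_Rand;
    by Coder_ID X;
run;

/* Steps 3-5: keep the first 25 rows per coder, drop the helpers */
data Sample;
    set Coded_Rand;
    by Coder_ID;
    if first.Coder_ID then Count = 0;
    Count + 1;
    if Count <= 25 then output;
    drop X Count;
run;
```

A coder with fewer than 25 records simply contributes all of them.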

A review of BY
• BY-group processing can be a useful way of dealing with groups of observations
• Can be used for:
  • De-duping observations
  • Finding the first or last observation
  • Counting or summing observations
  • Comparing observations
  • Finding a quick and dirty random sample
  • …and much more.

Arrays

• A SAS array is a collection of elements defined as a single group
• Arrays allow you to write SAS statements referencing a group of variables
• SAS arrays are different from arrays in many other programming languages

Example array
• Let’s say you have a dataset created in SPSS where all unknown numeric data is set to 999. How do you convert this to a missing value?
Performing the same calculation on multiple variables …maybe there’s an easier way…

Example array
1. Define the array: list all the variables you want to perform the manipulation on
2. Do the DO: use an iterative DO loop to run through all seven variables
3. Drop i: i is just the temp variable created for the iterative DO loop
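A sketch of the three steps, using the array name `oldvars` and the seven variables listed in the array-reference table in these slides. The input dataset `spss_import` and its values are hypothetical.

```sas
/* Hypothetical import in which 999 stands for "unknown" */
data spss_import;
    input Height Weight Age SBP DBP Temp HR;
datalines;
68 155 42 120 80 98.6 72
999 180 999 130 999 98.2 999
;
run;

data cleaned;
    set spss_import;

    /* 1. Define the array */
    array oldvars{7} Height Weight Age SBP DBP Temp HR;

    /* 2. Do the DO: visit each of the seven variables */
    do i = 1 to 7;
        if oldvars{i} = 999 then oldvars{i} = .;
    end;

    /* 3. Drop the loop index */
    drop i;
run;
```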

Array statement
ARRAY array-name {subscript} [$] [length] [array-elements] [(initial-value-list)];
(items in square brackets are optional)
• array-name: specifies the name of the array
  • Think of it as an alias for this group of variables
  • Cannot be the name of an existing SAS variable in the same DATA step
  • Should not be the name of a SAS function
• subscript: describes the number and arrangement of elements in the array
  • Dimension-size(s): explicitly specify the number of elements in the array
  • Lower/upper bounds: range from 1 to n
  • Asterisk: have SAS count the variables in the array
• $: specifies that the elements in the array are character (optional)
  • Useful when the array creates new variables
• length: specifies the length of the elements in the array (optional)
  • Useful when the array creates new variables
• array-elements: the elements (variables) that make up the array (optional)
  • Must be either all character or all numeric
  • Can be listed in any order
  • Can use keywords _NUMERIC_, _CHARACTER_, or _ALL_
  • Can also use _TEMPORARY_ to create an array of temporary elements
• initial-value-list: initial values for the elements in the array (optional)

Array statement
• A simple (and common) array statement looks like this:
ARRAY array-name {subscript} array-elements;
  • array-name: name of the array
  • subscript: number of elements in the array
  • array-elements: list of elements in the array

Example array

Variable name  Array reference
Height         oldvars{1}
Weight         oldvars{2}
Age            oldvars{3}
SBP            oldvars{4}
DBP            oldvars{5}
Temp           oldvars{6}
HR             oldvars{7}

Example array
if oldvars{1} = 999 then oldvars{1} = .;
is equivalent to
if Height = 999 then Height = .;

More examples of arrays  Convert monthly average temperature from Fahrenheit to Celsius
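The slide's code is not in the transcript. One way the conversion might be written, with hypothetical variable names (F1-F12, C1-C12) and made-up temperatures:

```sas
data temps;
    input City $ F1-F12;

    array Fahr{12} F1-F12;      /* existing monthly averages in Fahrenheit */
    array Cels{12} C1-C12;      /* new variables created by the array      */

    do i = 1 to 12;
        Cels{i} = (Fahr{i} - 32) * 5/9;
    end;
    drop i;
datalines;
Salem 41 44 48 52 58 64 69 69 64 54 46 40
;
run;
```

Because C1-C12 do not exist yet, the second ARRAY statement creates them as numeric variables.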

More examples of arrays  If the DART rate is missing at the full NAICS level, impute missing values with the DART rate at the 3-digit NAICS level

More examples of arrays  Collapse monthly income into quarterly income
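A possible sketch of collapsing twelve monthly values into four quarterly totals. The variable names (Inc1-Inc12, Q1-Q4) and the values are hypothetical.

```sas
data income;
    input ID Inc1-Inc12;

    array Monthly{12} Inc1-Inc12;
    array Quarterly{4} Q1-Q4;

    /* Quarter q covers months 3q-2, 3q-1, and 3q */
    do q = 1 to 4;
        Quarterly{q} = sum(Monthly{3*q - 2}, Monthly{3*q - 1}, Monthly{3*q});
    end;
    drop q;
datalines;
1 100 100 100 200 200 200 300 300 300 400 400 400
;
run;
```

For this row the result is Q1 = 300, Q2 = 600, Q3 = 900, and Q4 = 1200. The SUM function ignores missing months, which may or may not be what an analysis wants.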

* and Dim()
• Use the asterisk {*} as the subscript to have SAS count the elements for you
  • Cannot use with an array of temporary elements or multidimensional arrays
• Use the DIM function in the DO Loop to return the stop value by counting the number of elements in the array
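As an illustration, the 999-to-missing cleanup can be written without counting the variables by hand. The input dataset and the array name `nums` are hypothetical.

```sas
data spss_import;
    input Height Weight Age;
datalines;
68 999 42
999 180 999
;
run;

data cleaned_all;
    set spss_import;

    /* {*} lets SAS count the elements; _NUMERIC_ picks up every */
    /* numeric variable defined so far (i is created later, so   */
    /* it is not part of the array)                              */
    array nums{*} _numeric_;

    do i = 1 to dim(nums);       /* DIM returns the element count */
        if nums{i} = 999 then nums{i} = .;
    end;
    drop i;
run;
```

If a variable is added to the input later, the loop adjusts automatically.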

Creating character variables
• By default, newly created variables will be numeric
• Use the $ to denote that they should be character
• May also need to define the length
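A short sketch of an array that creates character variables. The names Score1-Score3 and Result1-Result3, and the pass mark of 70, are assumptions.

```sas
data flags;
    input Score1-Score3;

    array Scores{3} Score1-Score3;
    /* $ makes the new variables character; 4 sets their length */
    array Results{3} $ 4 Result1-Result3;

    do i = 1 to 3;
        if Scores{i} >= 70 then Results{i} = 'Pass';
        else Results{i} = 'Fail';
    end;
    drop i;
datalines;
75 56 90
;
run;
```

Without the `$`, SAS would create Result1-Result3 as numeric and the assignments of 'Pass' and 'Fail' would fail to convert.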

Temporary arrays
• You can create a temporary array of values to use during the DO Loop
• The array only exists for the duration of the DATA step
• Useful for storing constant values used in calculations
• How do you apply a performance bonus to monthly income?
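The slide's bonus example is not in the transcript. The sketch below assumes three months of income and a different, made-up bonus multiplier for each month, held in a temporary array.

```sas
data bonus;
    input ID Inc1-Inc3;

    array Income{3} Inc1-Inc3;
    /* Temporary array: constants only, no variables are created */
    array BonusRate{3} _temporary_ (1.05 1.10 1.15);
    array Adjusted{3} Adj1-Adj3;

    do i = 1 to 3;
        Adjusted{i} = Income{i} * BonusRate{i};
    end;
    drop i;
datalines;
1 1000 2000 3000
;
run;
```

The BonusRate values never appear in the output dataset, so there is nothing extra to drop.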

A review of arrays
• Whenever you need to run a set of variables through the same DATA step manipulations – think arrays!
• Can be used to:
  • Read data
  • Compare variables
  • Create many variables with the same attributes
  • Perform repetitive calculations
  • Transpose datasets
  • …and more!

Additional reading
• Summing with SAS
• DO which? Loop, Until, or While?
• The power of the BY statement
• A closer look at FIRST.var and LAST.var
• Arrays made easy: An introduction to arrays and array processing
• Arrays in SAS
• Using SAS Arrays to Manipulate Data

For next week…
Read chapter 25