
Use the UPDATE statement to:
–update a master dataset with new transactions (e.g., a bank account updated regularly with deposits and withdrawals). It’s not used a lot, but when you need it, it’s exactly what you need.
–the general form is:
DATA master_data_set;
   UPDATE master_data_set transaction_data_set;
   BY variable_list;

Notes on the UPDATE statement:
–only two datasets can be specified (master & transactions)
–both datasets must be SORTed by their common (BY) variables
–the values of the BY variables must be unique in the master dataset (e.g., only one account per account number in the master bank dataset; there could be many transactions per account, though)
–missing values in the transaction dataset don’t overwrite existing values in the master dataset.
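The notes above can be tried out with a small self-contained sketch. The dataset names (master, trans), the variables, and every data value here are made up for illustration and are not from the book; a period in the data lines stands for a missing value.

```sas
* Hypothetical master data: exactly one row per account;
DATA master;
   INPUT Account Name $ Balance;
   DATALINES;
101 Ames 500
102 Brown 250
103 Chen 75
;
RUN;

* Hypothetical transactions: account 101 has two, account 102 has none;
DATA trans;
   INPUT Account Name $ Balance;
   DATALINES;
103 Chen 125
101 . 450
101 Amis .
;
RUN;

* Both datasets must be sorted by the BY variable;
PROC SORT DATA = master; BY Account; RUN;
PROC SORT DATA = trans;  BY Account; RUN;

* Apply the transactions to the master;
DATA master;
   UPDATE master trans;
   BY Account;
RUN;

PROC PRINT DATA = master;
   TITLE 'Master After UPDATE';
RUN;
```

In the printed result, account 101 should show the new name from one transaction and the new balance from the other, because the missing values in each transaction leave the existing values alone. Account 102, which has no transactions, should be unchanged.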

*Go over the example in section 6.8 on page ;
LIBNAME perm 'c:\MySASLib';
DATA perm.patientmaster;
   *INFILE fill in here;
   INPUT Account LastName $ 8-16 Address $ BirthDate MMDDYY10. Sex $ InsCode $ LastUpdate MMDDYY10.;
RUN;

/* Second Program */
LIBNAME perm 'c:\MySASLib';
DATA transactions;
   *INFILE fill in here;
   INPUT Account LastName $ 8-16 Address $ BirthDate MMDDYY10. Sex $ InsCode $ LastUpdate MMDDYY10.;
PROC SORT DATA = transactions;
   BY Account;
* Update patient data with transactions;
DATA perm.patientmaster;
   UPDATE perm.patientmaster transactions;
   BY Account;
PROC PRINT DATA = 'c:\MySASLib\patientmaster';
   FORMAT BirthDate LastUpdate MMDDYY10.;
   TITLE 'Admissions Data';
RUN;

There are many SAS dataset OPTIONS. The list in section 6.9 is not comprehensive, but gives a flavor of what’s possible:
–RENAME = (oldvariable_name = newvariable_name) changes a variable’s name
–FIRSTOBS = n tells SAS the observation number on which to begin reading
–OBS = n tells SAS the observation number on which to stop reading
–IN = new_variable_name tells SAS to create a new (temporary) variable that tracks whether an observation comes from that dataset (value=1) or not (value=0).
Let’s try the example in section 6.10…
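As a quick aside before that example, here is a minimal sketch of the first three options used together on a SET statement. The dataset scores and its values are invented for illustration only.

```sas
* Hypothetical data: five observations;
DATA scores;
   INPUT ID Score;
   DATALINES;
1 90
2 85
3 70
4 95
5 60
;
RUN;

* Read observations 2 through 4 only, renaming Score to Points;
DATA subset;
   SET scores (FIRSTOBS = 2 OBS = 4 RENAME = (Score = Points));
RUN;

PROC PRINT DATA = subset;
   TITLE 'Observations 2 Through 4 with Score Renamed';
RUN;
```

Note that OBS= names the last observation number to read, not a count, so FIRSTOBS = 2 with OBS = 4 yields three observations (IDs 2, 3, and 4).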

Here’s the customer data:
101 Murphy's Sports   115 Main St.
102 Sun N Ski         2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson  4106 Arlington Blvd.
105 Sports Savers     2708 Broadway

Here’s the orders data:

Here’s the SAS code to find the customers who didn’t place any orders:
DATA customer;
   *INFILE fill-in TRUNCOVER;
   INPUT CustomerNumber Name $ 5-21 Address $ 23-42;
DATA orders;
   *INFILE why no TRUNCOVER?;
   INPUT CustomerNumber Total;
PROC SORT DATA = orders;
   BY CustomerNumber;
* Combine the data sets using the IN= option;
DATA noorders;
   MERGE customer orders (IN = Recent);
   BY CustomerNumber;
   IF Recent = 0;
PROC PRINT DATA = noorders;
   TITLE 'Customers with No Orders in the Third Quarter';
RUN;

Now modify the code so you can see the effect of the IN= option:
–take out the subsetting IF statement
–create a new variable whose values are those of the variable Recent (why do I have to do this?)
–PRINT the entire dataset, including this new variable made from Recent, to see its effect.

We may use the OUTPUT statement to create more than one dataset. Naming several datasets in the DATA statement, e.g., DATA X Y Z; INPUT … ; creates 3 identical datasets (named WORK.X, WORK.Y, and WORK.Z), because with no OUTPUT statement every observation is written to all of them. The next example uses IF … THEN statements to create different datasets with the OUTPUT statement.

/* Here’s the zoo data with feeding time as the last column. Create two datasets using the OUTPUT statement, one for each of the feeding times: morning and afternoon - be sure to put the animals in both datasets if they are fed at both times… */
bears     Mammalia E2 both
elephants Mammalia W3 am
flamingos Aves W1 pm
frogs     Amphibia S2 pm
kangaroos Mammalia N4 am
lions     Mammalia W6 pm
snakes    Reptilia S1 pm
tigers    Mammalia W9 both
zebras    Mammalia W2 am

DATA morning afternoon;
   *INFILE fill-in here;
   INPUT Animal $ 1-9 Class $ Enclosure $ FeedTime $;
   IF FeedTime = 'am' THEN OUTPUT morning;
   ELSE IF FeedTime = 'pm' THEN OUTPUT afternoon;
   ELSE IF FeedTime = 'both' THEN OUTPUT;   * no name given, so write to both datasets;
PROC PRINT DATA = morning;
   TITLE 'Animals with Morning Feedings';
PROC PRINT DATA = afternoon;
   TITLE 'Animals with Afternoon Feedings';
RUN;

We may also use OUTPUT statements to generate our own data and to create datasets from raw data formatted in unusual ways (see section 6.12 and below…)

dm log 'clear'; dm output 'clear';
options ls=80;
DATA generate;
   DO x=1 to 10;
      y=x**2;
      z=sqrt(x);
      OUTPUT;   * write one observation per pass through the loop;
   END;
PROC PRINT DATA=generate;
run; quit;

/* Put this into a raw datafile */
Jan Varsity Downtown Super
Feb Varsity Downtown Super
Mar Varsity Downtown Super

*now read it in properly…;
DATA theaters;
   *INFILE fill-in;
   INPUT Month $ Location $ Tickets @;   * trailing @ holds the line for the next INPUT;
   OUTPUT;
   INPUT Location $ Tickets @;
   OUTPUT;
   INPUT Location $ Tickets;             * no trailing @, so the line is released;
   OUTPUT;
PROC PRINT DATA = theaters;
   TITLE 'Ticket Sales';
RUN;

/* We may also convert observations to variables and vice versa… */
PROC TRANSPOSE DATA=old OUT=new;
   BY var_list;
   ID variable;
   VAR var_list;

/* go over the example on p. … Here’s the data: team name, player #, type of data, value of the salary or batting average */
Garlics 10 salary
Peaches 8 salary
Garlics 21 salary
Peaches 10 salary
Garlics 10 batavg .281
Peaches 8 batavg .252
Garlics 21 batavg .265
Peaches 10 batavg .301

/* Here’s the SAS code… */
DATA baseball;
   *INFILE fill-in here;
   INPUT Team $ Player Type $ Entry;
PROC SORT DATA = baseball;
   BY Team Player;
PROC PRINT DATA = baseball;
   TITLE 'Baseball Data After Sorting and Before Transposing';
* Transpose data so salary & batavg are vars;
PROC TRANSPOSE DATA = baseball OUT = flipped;
   BY Team Player;
   ID Type;
   VAR Entry;
PROC PRINT DATA = flipped;
   TITLE 'Baseball Data After Transposing';
RUN;

Notes on PROC TRANSPOSE:
–BY variables are included in the new dataset, not transposed. There will be one observation for each BY level per variable transposed.
–The ID variable’s values become the names of the variables in the newly transposed dataset. The ID variable’s values must be unique within each BY group.
–The VAR statement names the variables whose values are going to be transposed. SAS creates a new variable (_NAME_) whose value(s) is the name of the VAR variable(s).
See the previous example and the graphic on the top of p. 194.
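The roles of BY, ID, and VAR can also be seen in a minimal self-contained sketch. The dataset long, its variables, and its values are hypothetical and chosen only to keep the output small.

```sas
* Hypothetical long data: one row per student per test;
DATA long;
   INPUT Student $ Test $ Score;
   DATALINES;
Ann quiz 8
Ann exam 91
Bob quiz 6
Bob exam 84
;
RUN;

* PROC TRANSPOSE with a BY statement needs sorted input;
PROC SORT DATA = long;
   BY Student;
RUN;

* One output row per Student. The values of Test (quiz, exam) become variable names;
PROC TRANSPOSE DATA = long OUT = wide;
   BY Student;
   ID Test;
   VAR Score;
RUN;

PROC PRINT DATA = wide;
   TITLE 'One Row per Student';
RUN;
```

The printed dataset should contain Student, the automatic _NAME_ variable (holding the value Score), and the new variables quiz and exam.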

There are several variables that SAS creates automatically when you create a new dataset, but because they are temporary, you never see them. A short list is given on page 196:
–_N_ = the number of times SAS has looped through the DATA step
–_ERROR_ = 0 or 1, depending upon whether there is a data error for that particular observation
–FIRST.variable and LAST.variable are created when you use a BY statement in the DATA step. FIRST.variable has the value 1 when SAS is processing the first occurrence of a new value of the BY variable and 0 otherwise. LAST.variable is similar: it has the value 1 when SAS is processing the last occurrence of a value of the BY variable and 0 otherwise.
See the example program on pages …
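Because the automatic variables are never written to the output dataset, one way to look at them is to copy them into ordinary variables. The sketch below does that with an invented dataset (sales) and invented variable names (Row, IsFirst, IsLast); none of these come from the book.

```sas
* Hypothetical data: several sales per region;
DATA sales;
   INPUT Region $ Amount;
   DATALINES;
East 10
East 20
West 5
West 15
West 25
;
RUN;

* A BY statement in a DATA step requires sorted input;
PROC SORT DATA = sales;
   BY Region;
RUN;

* Copy the temporary automatic variables into permanent ones;
DATA flags;
   SET sales;
   BY Region;
   Row     = _N_;            * iteration count of the DATA step;
   IsFirst = FIRST.Region;   * 1 on the first row of each region;
   IsLast  = LAST.Region;    * 1 on the last row of each region;
RUN;

PROC PRINT DATA = flags;
   TITLE 'Automatic Variables Made Visible';
RUN;
```

In the listing, Row simply counts the observations, IsFirst is 1 only on the first East row and the first West row, and IsLast is 1 only on the last row of each region.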

Here’s the data (entry #, age group, finishing time). We want to create a new variable whose value is the overall place in which the person finished. Note that the value of Place can be determined from the _N_ variable if the new dataset is being created from a dataset sorted by finishing time. The second part of the program uses the FIRST.AgeGroup automatic variable to pick the top finisher in each age category.

54 youth adult adult senior senior youth adult youth adult senior adult youth 38.6

DATA walkers;
   *INFILE fill in here;
   INPUT Entry AgeGroup $ Time @@;   /*note >1 obs per line*/
PROC SORT DATA = walkers;
   BY Time;
* Create a new variable, Place;
DATA ordered;
   SET walkers;
   Place = _N_;
PROC PRINT DATA = ordered;
   TITLE 'Results of Walk';
PROC SORT DATA = ordered;
   BY AgeGroup Time;
* Keep first observation in each age group;
DATA winners;
   SET ordered;
   BY AgeGroup;
   IF FIRST.AgeGroup = 1;
PROC PRINT DATA = winners;
   TITLE 'Winners in Each Age Group';
RUN;