Chapter 6: Modifying and Combining Data Sets

Slides:



Advertisements
Similar presentations
Haas MFE SAS Workshop Lecture 3:
Advertisements

Slide C.1 SAS MathematicalMarketing Appendix C: SAS Software Uses of SAS  CRM  datamining  data warehousing  linear programming  forecasting  econometrics.
Chapter 9: Introducing Macro Variables 1 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
I OWA S TATE U NIVERSITY Department of Animal Science Modifying and Combing SAS Data Sets (Chapter in the 6 Little SAS Book) Animal Science 500 Lecture.
I OWA S TATE U NIVERSITY Department of Animal Science Getting Started Using SAS Software Animal Science 500 Lecture No. 2.
Chapter 18: Modifying SAS Data Sets and Tracking Changes 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Welcome to SAS…Session..!. What is SAS..! A Complete programming language with report formatting with statistical and mathematical capabilities.
Lecture 5 Sorting, Printing, and Summarizing Your Data.
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1.
Key Data Management Tasks in Stata
1 Performing Spreadsheet What-If Analysis Applications of Spreadsheets.
Chapter 15: Combining Data Horizontally 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
SAS Efficiency Techniques and Methods By Kelley Weston Sr. Statistical Programmer Quintiles.
Use the UPDATE statement to: –update a master dataset with new transactions (e.g. a bank account updated regularly with deposits and withdrawals…). Not.
1 Lab 2 and Merging Data (with SQL) HRP223 – 2009 October 19, 2009 Copyright © Leland Stanford Junior University. All rights reserved. Warning:
Chapter 16: Using Lookup Tables to Match Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 22: Using Best Practices 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Priya Ramaswami Janssen R&D US. Advantages of PROC REPORT -Very powerful -Perform lists, subsets, statistics, computations, formatting within one procedure.
Chapter 5 Reading and Manipulating SAS ® Data Sets and Creating Detailed Reports Xiaogang Su Department of Statistics University of Central Florida.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 8 Lists and Tuples.
BMTRY 789 Lecture 11: Debugging Readings – Chapter 10 (3 rd Ed) from “The Little SAS Book” Lab Problems – None Homework Due – None Final Project Presentations.
5 1 Data Files CGI/Perl Programming By Diane Zak.
Chapter 4 concerns various SAS procedures (PROCs). Every PROC operates on: –the most recently created dataset –all the observations –all the appropriate.
Chapter 17: Formatting Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Summer SAS Workshop Lecture 3. Summer SAS Workshop Website
1 Statistical Software Programming. STAT 6360 –Statistical Software Programming Modifying and Combining Datasets For most tasks we need to work with multiple.
An Introduction Katherine Nicholas & Liqiong Fan.
14b. Accessing Data Files in SAS ®. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
“LAG with a WHERE” and other DATA Step Stories Neil Howard A.
LISTS and TUPLES. Topics Sequences Introduction to Lists List Slicing Finding Items in Lists with the in Operator List Methods and Useful Built-in Functions.
17b.Accessing Data: Manipulating Variables in SAS ®
BMTRY 789 Lecture 6: Proc Sort, Random Number Generators, and Do Loops Readings – Chapters 5 & 6 Lab Problem - Brain Teaser Homework Due – HW 2 Homework.
Chapter 23: Selecting Efficient Sorting Strategies 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Use the SET statement to: –create an exact copy of a SAS dataset –modify an existing SAS dataset by creating new variables, subsetting (using a subsetting.
Chapter 6: Modifying and Combining Data Sets  The SET statement is a powerful statement in the DATA step DATA newdatasetname; SET olddatasetname;.. run;
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 16 & 17 By Tasha Chapman, Oregon Health Authority.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 7 & 10 By Tasha Chapman, Oregon Health Authority.
SAS and Other Packages SAS can interact with other packages in a variety of different ways. We will briefly discuss SPSSX (PASW) SUDAAN IML SQL will be.
Session 1 Retrieving Data From a Single Table
Chapter 11 Reading SAS Data
Chapter 5: Enhancing Your Output with ODS
Using ODS Excel Migrating from DDE to ODS
Chapter 2: Getting Data into SAS
Chapter 8: ODS Graphics ODS graphics were not available prior to SAS 9.2 They have been implemented across a wide range of procedures Functionality isn’t.
Chapter 3: Working With Your Data
ECONOMETRICS ii – spring 2018
Chapter 18: Modifying SAS Data Sets and Tracking Changes
Former Chapter 23: Selecting Efficient Sorting Strategies
By Don Henderson PhilaSUG, June 18, 2018
Chapter 1: Introduction to SAS
Instructor: Raul Cruz-Cano
Chapter 4: Sorting, Printing, Summarizing
Lesson 1: Introduction to Trifacta Wrangler
Lesson 1: Introduction to Trifacta Wrangler
Chapter 7: Macros in SAS Macros provide for more flexible programming in SAS Macros make SAS more “object-oriented”, like R Not a strong suit of text ©
Topics Introduction to File Input and Output
SAS Essentials How SAS Thinks
Programming Logic and Design Fourth Edition, Comprehensive
Introduction to Stata Spring 2017.
Introduction to DATA Step Programming SAS Basics II
Lesson 1 – Chapter 1C Trifacta Interface Navigation
How are your SAS Skills? Chapter 1: Accessing Data (Question # 1)
Introduction to DATA Step Programming: SAS Basics II
Topics Sequences Introduction to Lists List Slicing
CSCI N317 Computation for Scientific Applications Unit R
Lab 2 and Merging Data (with SQL)
Producing Descriptive Statistics
Topics Sequences Introduction to Lists List Slicing
Topics Introduction to File Input and Output
Introduction to SAS Essentials Mastering SAS for Data Analytics
Presentation transcript:

Chapter 6: Modifying and Combining Data Sets The SET statement is a powerful statement in the DATA step DATA newdatasetname; SET olddatasetname; .. run; © Fall 2011 John Grego and the University of South Carolina

The SET statement Its main use is to read in a previously created SAS data set (either in WORK or another library) to be modified and saved as a new data set We could stack (not concatenate) multiple data sets by listing several data sets in the SET statement If one data set contains variable(s) not included in the other data set(s), the observations from the other sets will have missing values for those variables in the combined data set

The SET statement If input data sets are sorted by a specific variable, stacking them may not preserve the sorting To preserve the sorting, we can interleave the data sets with a BY statement (but we must sort all data sets first)

Merging Data Sets When observations in two or more data sets are connected by having at least one common variable, it is possible to merge the data sets together DATA combineddataname; MERGE dataset1.. datasetk; BY common_variable;

Merging Data Sets Note: If the data sets have identically named variables (other than the BY variable), then the merged data will contain only the values from the last data set All data sets need to be sorted by the BY variable before they can be merged

Merging Data Sets We can also merge each observation in a smaller data set with several observations from a larger data set (one-to-many match-merge) MERGE without a BY statement saves the input data side-by-side (compare to SET command)

Merging Data Sets The IN option is typically used to track which data set an observation in a combined data set came from Variables in the IN option only exist during that data step, but can be used to create other variables

Merging Summary Statistics and Data Often we want to merge summary statistics (either statistics for entire data set or often for groups within the data set) with the observations themselves) First calculate the summary statistics using PROC MEANS (after sorting if necessary)

Merging Summary Statistics and Data Output the summary statistics to another data set with an OUTPUT statement Give the statistics meaningful names in this output data set Use a MERGE statement to combine the original data with the OUTPUT data from PROC MEANS

Merging Summary Statistics and Data Once summary statistics are merged with original data, we can calculate: centered data observations, standardized data observations, or data expressed as a percentage of BY variable sums This is done by transforming data through functions involving the summary statistics This can be tricky in R too (I have to use match() twice to make it work)

Merging the Grand Total with the Original Data When PROC MEANS is used without a BY statement, you can get the grand total, the grand mean, etc., rather than the BY group variables’ summary statistics

Merging the Grand Total with the Original Data Merging is more difficult because the original data and summary data do not have a common variable We need to trick SAS with the SET command DATA newdataset; IF _N_=1 THEN SET summarydataset; SET olddataset;

Merging the Grand Total with the Original Data Variables read from the summary data set with the first SET command are retained with all observations This is a general trick for merging one (or a few) observations with many, where no common variables are present

Merging the Grand Total with the Original Data The UPDATE statement is similar to MERGE, but is typically used when a data set changes over time—new variables are added, values of variables are changed for old observations, etc (See Pages 192-193)

Data Set Options We have already seen many of these options incorporated into our earlier examples System options are specified in the OPTIONS statement (they affect SAS’s operation, often its formatting)

Data Set Options Statement options affect the running of a step NOPRINT is often used when you use a PROC to create an output data set NOPRINT option in PROC MEANS NOWINDOWS option in PROC REPORT DATA=.. option in any procedure

Data Set Options Data set options affect the reading/writing of the data set We can use these data set options in DATA steps (with statements like DATA, SET, MERGE, UPDATE)

Data Set Options We can use data set options in PROC steps (with DATA=.. option) KEEP = keepvar1..keepvark; DROP=dropvar1..dropvark; RENAME=(oldname=newname); FIRSTOBS=firstrownumber OBS=numberobservations

Creating several data sets with the OUTPUT statement A single DATA step can create several SAS data sets (this is a trick I don’t use nearly enough) The DATA line must give multiple data set names DATA set1 set2 set3;

Creating several data sets with the OUTPUT statement The OUTPUT statement is ofen used with IF-THEN statements or within a DO loop IF..THEN OUTPUT set1; ELSE OUTPUT set2;

Creating several data sets with the OUTPUT statement The OUTPUT statement can also be used to create several observations from one It transforms “wide” data sets into “long” data sets It is often used with repeated-measures data (several values observed for each individual)

Creating several data sets with the OUTPUT statement OUTPUT is also useful for generating function values Used in a DO loop, OUTPUT will tell SAS to create an observation at each iteration of the DO loop

Using PROC TRANSPOSE to Flip Observations and Variables PROC TRANSPOSE DATA=.. OUT=..; (names the new transposed data set) BY..; (identifies variables you don’t want transposed) ID..; (values of this variable will become variable names) VAR..; (the values of thse variables will be transposed—placed as rows for each level of the BY variable)

Using PROC TRANSPOSE to Flip Observations and Variables We must first sort by the BY variable Note: If the ID statement is missing, the newly created variables will have default names col1, col2, etc. PROC TRANSPOSE is handy for converting “wide” data files into “long” data files (or vice versa), especially with longitudinal data

Automatic Variables in SAS During the DATA step, SAS creates temporary “automatic” variables. These are not typically saved as part of the data set, but they can be used in the DATA step

Automatic Variables in SAS _N_ keeps track of the number of times SAS has looped through the DATA step (i.e., the number of observations that have been read) It may be different from “obs #” if data has been “subsetted” The automatic variable _ERROR_ is binary; 1 if observation has an error, 0 otherwise

Automatic Variables in SAS FIRST.groupvariable is 1 for the first observation with a new value for groupvariable; 0 otherwise LAST.groupvariable is 1 for the first observation with a new value for groupvariable; 0 otherwise

Automatic Variables in SAS These commands can be useful for picking out the highest or lowest values for each level of groupvariable Sort by the groupvariable, then use a subsetting IF along with a BY statement to save only the first or last occurrences for each level of groupvariable