Chapter 14: Combining Data Vertically 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.

Slides:



Advertisements
Similar presentations
Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Advertisements

SAS Programming:File Merging and Manipulation. Reading External Files (review) data barf; * create the dataset BARF; infile ’s:\mysas\Table7.1'; * open.
Examples from SAS Functions by Example Ron Cody
Chapter 9: Introducing Macro Variables 1 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 17 Read Raw Data in Fixed Format using Formatted Input Objectives Distinguish between standard and nonstandard numeric data Read standard fixed-field.
Chapter 11: Creating and Using Macro Programs 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Performing Queries Using PROC SQL Chapter 1 1 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 13: Creating Samples and Indexes 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
VBA Modules, Functions, Variables, and Constants
The Class String. String Constants and Variables  There is no primitive type for strings in Java.  There is a class called String that can be used to.
Introduction to Array The fundamental unit of data in any MATLAB program is the array. 1. An array is a collection of data values organized into rows and.
Chapter 18: Modifying SAS Data Sets and Tracking Changes 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Understanding SAS Data Step Processing Alan C. Elliott stattutorials.com.
Creating SAS® Data Sets
Welcome to SAS…Session..!. What is SAS..! A Complete programming language with report formatting with statistical and mathematical capabilities.
Fortran- Subprograms Chapters 6, 7 in your Fortran book.
Chapter 5 new The Do…Loop Statement
Chapter 10:Processing Macro Variables at Execution Time 1 STAT 541 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 14: Generating Data with Do Loops OBJECTIVES Understand iterative DO loops. Construct a DO loop to perform repetitive calculations Use DO loops.
Fortran 1- Basics Chapters 1-2 in your Fortran book.
Chapter 21 Reading Hierarchical Files Reading Hierarchical Raw Data Files.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 2 Input, Processing, and Output.
Introduction to Computational Linguistics Programming I.
08/10/ Iteration Loops For … To … Next. 208/10/2015 Learning Objectives Define a program loop. State when a loop will end. State when the For.
©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina Chapter 17 supplement: Review of Formatting Data STAT 541.
CNG 140 C Programming Lecture Notes 2 Processing and Interactive Input Spring 2007.
A First Book of ANSI C Fourth Edition Chapter 3 Processing and Interactive Input.
LECTURE 1 INTRODUCTION TO PL/SQL Tasneem Ghnaimat.
Chapter 16: Using Lookup Tables to Match Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Revision Lecture Mauro Jaskelioff. AWK Program Structure AWK programs consists of patterns and procedures Pattern_1 { Procedure_1} Pattern_2 { Procedure_2}
Chapter 22: Using Best Practices 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
5 1 Data Files CGI/Perl Programming By Diane Zak.
Artificial Intelligence Lecture No. 26 Dr. Asad Ali Safi ​ Assistant Professor, Department of Computer Science, COMSATS Institute of Information Technology.
Chapter 17: Formatting Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
 2008 Pearson Education, Inc. All rights reserved JavaScript: Introduction to Scripting.
1 Chapter 3 – Examples The examples from chapter 3, combining the data types, variables, expressions, assignments, functions and methods with Windows controls.
Chapter 19: Introduction to Efficient SAS Programming 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Computing with SAS Software A SAS program consists of SAS statements. 1. The DATA step consists of SAS statements that define your data and create a SAS.
Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.
Chapter 18 Reading Free-Format Data. 2 Objectives Read free-format data not recognized in fixed fields. Read free-format data separated by non-blank delimiters,
Chapter 4: Combining Tables Vertically using PROC SQL 1 © Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 17 Supplement: Alternatives to IF-THEN/ELSE Processing STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South.
Chapter 23: Selecting Efficient Sorting Strategies 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 21: Controlling Data Storage Space 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
Chapter 4: Variables, Constants, and Arithmetic Operators Introduction to Programming with C++ Fourth Edition.
Chapter 6: Modifying and Combining Data Sets  The SET statement is a powerful statement in the DATA step DATA newdatasetname; SET olddatasetname;.. run;
Writing Basic SQL SELECT Statements Lecture
OPERATORS IN C CHAPTER 3. Expressions can be built up from literals, variables and operators. The operators define how the variables and literals in the.
Basic SAS Functions in Version 8.2 Kim Michalski Office of the Actuary Rick Andrews Office of Research, Development, and Information.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
Working Efficiently with Large SAS® Datasets Vishal Jain Senior Programmer.
© 2010 Lawrenceville Press Slide 1 Chapter 5 The Do…Loop Statement  Loop structure that executes a set of statements as long as a condition is true. 
26/06/ Iteration Loops For … To … Next. 226/06/2016 Learning Objectives Define a program loop. State when a loop will end. State when the For.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
MARK CARPENTER, Ph.D. Introduction to SAS Programming and Applications
Chapter 2: Getting Data into SAS
Chapter 13: Creating Samples and Indexes
Chapter 18: Modifying SAS Data Sets and Tracking Changes
Former Chapter 23: Selecting Efficient Sorting Strategies
Variables and Arithmetic Operations
Chapter 22 Reading Hierarchical Files
Topics Introduction to File Input and Output
Chapter 24 (4th ed.): Creating Functions
WEB PROGRAMMING JavaScript.
A First Book of ANSI C Fourth Edition
Introduction to DATA Step Programming: SAS Basics II
Writing Basic SQL SELECT Statements
Chapter 24: Querying Data Efficiently
Functions continued.
Topics Introduction to File Input and Output
Presentation transcript:

Chapter 14: Combining Data Vertically 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina

2 Combining Data Vertically Process of concatenating or interleaving data Process of concatenating or interleaving data IDName 9Abe 10Burt IDName 7Chuck 8David IDX 9Abe 10Burt 7Chuck 8David

3 Examples of Concatenating Data Vertically Create a SAS data set from multiple raw data files using a FILENAME statement Create a SAS data set from multiple raw data files using a FILENAME statement Create a SAS data set from multiple raw data files using an INFILE statement with the FILEVAR= option Create a SAS data set from multiple raw data files using an INFILE statement with the FILEVAR= option Append SAS data sets using the APPEND procedure Append SAS data sets using the APPEND procedure

4 Using a FILENAME Statement to Concatenate Raw Data Files Assign a single fileref to the raw data files that need to be combined Assign a single fileref to the raw data files that need to be combined All of the file specifications must be enclosed in one set of parentheses All of the file specifications must be enclosed in one set of parentheses FILENAME fileref (’external-file1’ ’external-file2’ … ’external-filen’); … ’external-filen’);

5 Example of Using a FILENAME Statement to Concatenate Raw Data Files filename qtr1 (‘c:\...jan.txt’ ‘c:\...feb.txt’ ‘c:…mar.txt’); data all; infile qtr1; infile qtr1; input x y z; run;

6 Using an INFILE Statement to Concatenate Raw Data Files Concatenation process can be more flexible by using an INFILE statement with the FILEVAR= option Concatenation process can be more flexible by using an INFILE statement with the FILEVAR= option FILEVAR= option can dynamically change the currently opened input file to a new input file FILEVAR= option can dynamically change the currently opened input file to a new input file

7 Using an INFILE Statement to Concatenate Raw Data Files INFILE file-specification FILEVAR=variable; file-specification is a placeholder (not an actual filename or fileref assigned previously to a file) file-specification is a placeholder (not an actual filename or fileref assigned previously to a file) variable contains a character string that is a physical filename for an input file to be opened variable contains a character string that is a physical filename for an input file to be opened FILEVAR=variable causes the INFILE statement to close the current input file and open a new input file whenever the value of variable changes FILEVAR=variable causes the INFILE statement to close the current input file and open a new input file whenever the value of variable changes

8 Assigning the Names of the Files to Be Read data combined; do i = 1 to 3; fname= ’c:\temp\year’ || put(i,1.) || ’.dat’; fname= ’c:\temp\year’ || put(i,1.) || ’.dat’; infile datafiles filevar=fname; infile datafiles filevar=fname; input x y z; input x y z;end; … *program incomplete; ifname 1c:\temp\year1.dat 2c:\temp\year2.dat 3c:\temp\year3.dat

9 Example of Using an INFILE Statement to Concatenate Raw Data Files data combined; do i = 8, 9, 10; fname= ’c:\temp\year’ || put(i,2.) || ’.dat’; fname= ’c:\temp\year’ || put(i,2.) || ’.dat’; infile datafiles filevar=fname; infile datafiles filevar=fname; input x y z; input x y z;end; … *program incomplete; Note: There is a space before 8 and 9 in fname. ifname 1c:\temp\year 8.dat 2c:\temp\year 9.dat 3c:\temp\year10.dat

10 Example of Using an INFILE Statement and the COMPRESS Function to Concatenate Files data combined; do i = 8, 9, 10; fname= compress(’c:\temp\year’ || put(i,2.) || ’.dat’); fname= compress(’c:\temp\year’ || put(i,2.) || ’.dat’); infile datafiles filevar=fname; infile datafiles filevar=fname; input x y z; input x y z;end; … *program incomplete; Note: The COMPRESS function, as shown, removes the space before 8 and 9 in fname. ifname 1c:\temp\year8.dat 2c:\temp\year9.dat 3c:\temp\year10.dat

11 COMPRESS Function (with One or Two Arguments) Eliminates the specified characters in a string Eliminates the specified characters in a string COMPRESS (source, ); 1 st argument: source specifies a string 1 st argument: source specifies a string 2 nd argument: optional characters-to-remove specifies the character or characters that SAS removes from source 2 nd argument: optional characters-to-remove specifies the character or characters that SAS removes from source When the second argument is not used, COMPRESS(source) removes blanks from the source When the second argument is not used, COMPRESS(source) removes blanks from the source Note: The function has an optional third argument called modifiers. Refer to the SAS manuals for complete syntax. Note: The function has an optional third argument called modifiers. Refer to the SAS manuals for complete syntax.

12 Preventing an Infinite Loop of the DATA Step data combined; do i = 1 to 3; fname= ’c:\temp\year’ || put(i,1.) || ’.dat’; fname= ’c:\temp\year’ || put(i,1.) || ’.dat’; infile datafiles filevar=fname; infile datafiles filevar=fname; input x y z; input x y z; output; output;end;stop; … *program incomplete; ifname 1c:\temp\year1.dat 2c:\temp\year2.dat 3c:\temp\year3.dat

13 Using the END= Option to Complete the Programming Statements data combined; do i = 1 to 3; fname= ’c:\temp\year’ || put(i,1.) || ’.dat’; fname= ’c:\temp\year’ || put(i,1.) || ’.dat’; do until (lastobs); do until (lastobs); infile datafiles filevar=fname end=lastobs; infile datafiles filevar=fname end=lastobs; input x y z; input x y z; output; output; end; end;end;stop;run; ifname 1c:\temp\year1.dat 2c:\temp\year2.dat 3c:\temp\year3.dat

14 Using the END= Option INFILE file-specification END=variable; variable names a variable variable names a variable The variable is set to 0 when the current input data record is not the last record in the input file The variable is set to 0 when the current input data record is not the last record in the input file The variable is set to 1 when the current input data record is the last record in the input file The variable is set to 1 when the current input data record is the last record in the input file

15 Using the END= Option to Complete Programming Statements The DATA step normally stops when SAS reads past the last record in a raw data file. The DATA step normally stops when SAS reads past the last record in a raw data file. In the concatenation example, SAS needs to read till the last record in the first two data files but not past the last record. Doing so will cause the DATA step to stop processing. In the concatenation example, SAS needs to read till the last record in the first two data files but not past the last record. Doing so will cause the DATA step to stop processing. The INFILE statement’s END= option determines when the last record is being read. The INFILE statement’s END= option determines when the last record is being read. The END= variable is not written to the data set and its value can be tested within the DATA step. The END= variable is not written to the data set and its value can be tested within the DATA step.

16 Using Date Functions to Automate DO TODAY() returns the current date from the system clock as a SAS date value TODAY() returns the current date from the system clock as a SAS date value MONTH(TODAY()) returns the month (1 to 12) from TODAY() MONTH(TODAY()) returns the month (1 to 12) from TODAY() MONTH(TODAY()) – 1 is the month prior to MONTH(TODAY()) (can cause a problem when MONTH(TODAY()) is 1) MONTH(TODAY()) – 1 is the month prior to MONTH(TODAY()) (can cause a problem when MONTH(TODAY()) is 1) MONTH(TODAY()) – 2 is two months prior to MONTH(TODAY()) (can cause a problem when MONTH(TODAY()) is 1 or 2) MONTH(TODAY()) – 2 is two months prior to MONTH(TODAY()) (can cause a problem when MONTH(TODAY()) is 1 or 2)

17 INTNX Function The INTNX function increments a date, time, or datetime value by a given time interval, and returns a date, time, or datetime value. The INTNX function increments a date, time, or datetime value by a given time interval, and returns a date, time, or datetime value. INTNX(interval, start-from, increment ) interval specifies a character constant, variable, or expression that contains a time interval (e.g., month) and can appear in upper or lower case. The interval must match the type of value used for start-from and increment. (The values of interval are listed in the “Intervals Used with Date and Time Functions” table in SAS Language Reference: Concepts.) interval specifies a character constant, variable, or expression that contains a time interval (e.g., month) and can appear in upper or lower case. The interval must match the type of value used for start-from and increment. (The values of interval are listed in the “Intervals Used with Date and Time Functions” table in SAS Language Reference: Concepts.)

18 INTNX Function (continued) INTNX(interval, start-from, increment ) optional multiple is an optional multiplier that sets the interval equal to a multiple of the period of the basic interval type (e.g., the interval YEAR2 consists of two- year, or biennial, periods) optional multiple is an optional multiplier that sets the interval equal to a multiple of the period of the basic interval type (e.g., the interval YEAR2 consists of two- year, or biennial, periods) optional shift-index is a shift index that shifts the interval to start at a specified subperiod starting point (e.g., YEAR.7 specifies yearly periods shifted to start on the first of July of each calendar year and to end in June of the following year) optional shift-index is a shift index that shifts the interval to start at a specified subperiod starting point (e.g., YEAR.7 specifies yearly periods shifted to start on the first of July of each calendar year and to end in June of the following year)

19 INTNX Function (continued) INTNX(interval, start-from, increment ) start-from specifies a SAS expression that represents a SAS date, time, or datetime value that identifies a starting point. start-from specifies a SAS expression that represents a SAS date, time, or datetime value that identifies a starting point. increment specifies a negative, positive, or zero integer that represents the number of date, time, or datetime intervals. Increment is the number of intervals to shift the value of start-from. increment specifies a negative, positive, or zero integer that represents the number of date, time, or datetime intervals. Increment is the number of intervals to shift the value of start-from. Optional ’alignment’ controls the position of SAS dates within the interval and must be enclosed in quotation marks. Optional ’alignment’ controls the position of SAS dates within the interval and must be enclosed in quotation marks. See SAS manuals for detailed explanation. See SAS manuals for detailed explanation.

20 Appending SAS Data Sets with PROC APPEND PROC APPEND BASE=SAS-data-set DATA=SAS-data-set; The BASE= data set is the data set to which observations are to be added to. The BASE= data set is the data set to which observations are to be added to. The DATA= data set contains the records that will be appended to the BASE= data set. The DATA= data set contains the records that will be appended to the BASE= data set. The BASE= data set may contain more variables than the DATA= data set. When that happens, missing values will be assigned to the additional variables for the observations in the DATA= data set. A warning message will also appear in the SAS log. The BASE= data set may contain more variables than the DATA= data set. When that happens, missing values will be assigned to the additional variables for the observations in the DATA= data set. A warning message will also appear in the SAS log.

21 Appending SAS Data Sets with the SET Statement Example: data first; set second third; **overwrites second and third on existing data set first; If there are several data sets listed, the result will be the concatenation of all the data sets listed. If there are several data sets listed, the result will be the concatenation of all the data sets listed. The SET statement reads all observations in all the listed input data sets in order to concatenate them. The more efficient PROC APPEND reads only the data in the DATA= data set. The SET statement reads all observations in all the listed input data sets in order to concatenate them. The more efficient PROC APPEND reads only the data in the DATA= data set.

22 Appending SAS Data Sets with PROC APPEND and the FORCE Option PROC APPEND BASE=SAS-data-set DATA=SAS-data-set ; When the DATA= data set contains more variables than the BASE= data set, use the FORCE option to concatenate. When the DATA= data set contains more variables than the BASE= data set, use the FORCE option to concatenate. The structure of the BASE= data set will be used for the appended data set, which could lead to loss of data due to dropping of variables. Variables in the DATA= data set but not in the BASE= data set will be dropped. The structure of the BASE= data set will be used for the appended data set, which could lead to loss of data due to dropping of variables. Variables in the DATA= data set but not in the BASE= data set will be dropped.

23 Appending SAS Data Sets with PROC APPEND and the FORCE Option (continued) The structure of the BASE= data set will be used for the appended data set, which could lead to loss of data due to truncation. The structure of the BASE= data set will be used for the appended data set, which could lead to loss of data due to truncation. VARIABLES WITH DIFFERENT LENGTHS: If the same variable is on both data sets but has a shorter length in the BASE= data set, then the variable values in the DATA= data set will be truncated in the appended data set. The variable label from the BASE= data set will be retained instead of the one from the DATA= data set. VARIABLES WITH DIFFERENT LENGTHS: If the same variable is on both data sets but has a shorter length in the BASE= data set, then the variable values in the DATA= data set will be truncated in the appended data set. The variable label from the BASE= data set will be retained instead of the one from the DATA= data set.

24 Appending SAS Data Sets with PROC APPEND and the FORCE Option (continued) The structure of the BASE= data set will be used for the appended data set. The structure of the BASE= data set will be used for the appended data set. VARIABLES WITH DIFFERENT TYPES: If a variable is on both the BASE= and DATA= data set but has different types, a type mismatch will require the use of the FORCE option to include the said variable in the appended data set. However, the values for the said variable in the DATA= data set records appended will be set to missing. The variable values in the BASE= data set will be intact. VARIABLES WITH DIFFERENT TYPES: If a variable is on both the BASE= and DATA= data set but has different types, a type mismatch will require the use of the FORCE option to include the said variable in the appended data set. However, the values for the said variable in the DATA= data set records appended will be set to missing. The variable values in the BASE= data set will be intact.

25 Append Raw Data Files Using a SAS Data Set with Names of Files to be Appended data combined; set sasuser.rawdata; infile in filevar=filename end=lastfile; infile in filevar=filename end=lastfile; do while(lastfile=0); do while(lastfile=0); input x y z; input x y z; output; output; end; end;run; obsfilename 1c:\temp\year1.dat 2c:\temp\year2.dat 3c:\temp\year3.dat

26 Append Raw Data Files Using an External File with Names of Files to be Appended data combined; infile ’rawdatafiles.dat’; input filename $20.; infile in filevar=filename end=lastfile; infile in filevar=filename end=lastfile; do while(lastfile=0); do while(lastfile=0); input x y z; input x y z; output; output; end; end;run; obsfilename 1c:\temp\year1.dat 2c:\temp\year2.dat 3c:\temp\year3.dat