Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 15: Combining Data Horizontally 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.

Similar presentations


Presentation on theme: "Chapter 15: Combining Data Horizontally 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina."— Presentation transcript:

1 Chapter 15: Combining Data Horizontally 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina

2 2 Terminology Table Lookup Table Lookup Base table Base table Lookup tables Lookup tables Lookup values Lookup values

3 3 Working with Lookup Values Outside of SAS Data Sets Lookup tables are not necessarily SAS data sets. Lookup tables are not necessarily SAS data sets. The following techniques can be used to hard-code lookup values into programs: The following techniques can be used to hard-code lookup values into programs: –IF-THEN/ELSE statements –SAS arrays –User-defined SAS formats

4 4 IF-THEN/ELSE Statement Advantages: easy to use and to understand, versatile Advantages: easy to use and to understand, versatile Disadvantages: Code requires maintenance. Lookup values might change. Number of statements might be very large and create inefficiencies both in execution and maintenance. Disadvantages: Code requires maintenance. Lookup values might change. Number of statements might be very large and create inefficiencies both in execution and maintenance.

5 5 IF-THEN/ELSE Statement Example data new; set old; set old; if id=1 then x=4; if id=1 then x=4; else if id=2 then x=5; else if id=2 then x=5; else if id=3 then x=6; else if id=3 then x=6; IDX 14 25 36

6 6 SAS Arrays Lookup values can be hard-coded into the program or read into the array from a data set Lookup values can be hard-coded into the program or read into the array from a data set Array elements are referenced positionally Array elements are referenced positionally Potential disadvantages: system memory requirements, only returns a single value per lookup operation, dimensions of the array must be known at compile time Potential disadvantages: system memory requirements, only returns a single value per lookup operation, dimensions of the array must be known at compile time

7 7 Scoring Example with 1-Dimensional SAS Array Item 1Item 2Item 3 Response Variabler1r2r3 Answer KeyBDC data one; input name $4. +1 (r1-r3) ($1.); array answer {3} $1 _temporary_ ('B','D','C'); array response r1-r3; score=0; do _i_=1 to 3; if answer{_i_}=response{_i_} then score+1; end;

8 8 DATA Step match-merge Familiar technique from STAT 540 Familiar technique from STAT 540 Typically introduced as Typically introduced as –a one-to-one Outer Join –A many-to-one match merge of summary data Not appropriate for a many-to-many match Not appropriate for a many-to-many match

9 9 DATA Step match-merge BY variables should match, but matching can be done during execution. BY variables should match, but matching can be done during execution. proc sort data=a; by student; proc sort data=b; by name; data gradebook; merge a(in=in_a) b(in=in_b rename=(name=student)); by student; if in_a and in_b; run;

10 10 DATA Step match-merge vs. PROC SQL Match-merge Match-merge –Unlimited data sets –More complex data management PROC SQL PROC SQL –No pre-sorting –No common variables

11 11 DATA Step match-merge vs. PROC SQL Match-merge Match-merge –Portable Data Vector (PDV) used to hold information while DATA step executes –Outputs first observation from each data set for each level of the BY group variable PROC SQL PROC SQL –Creates Cartesian product –Eliminates ineligible cases in WHERE clause

12 12 DATA Step match-merge The DATA step can be used for many-to- one match merges The DATA step can be used for many-to- one match merges –By exporting calculation of summary measures –By computing summary measures within the DATA step itself –STAT 540 example

13 13 DATA Step match-merge The DATA step tends to over-match on many-to-many match merges The DATA step tends to over-match on many-to-many match merges The text introduces a fix, but it’s cumbersome The text introduces a fix, but it’s cumbersome

14 14 Using an Index to Combine Data Useful when Useful when –One of the data sets is much larger than the other –The smaller data set contains all the cases of interest (e.g., a left/right join) Appropriate for one-to-one matches only Appropriate for one-to-one matches only

15 15 Using an Index to Combine Data Example Example –SAS uses the noobs index in Fall08 to find lookup values in Fall10ms to match values of the index. –The smaller data set has to be included first so that lookup values are available in the PDV for use by the index. –_IORC_ (Input/Output Return Code) indicates whether a match for each record in the smaller data set was found.

16 16 Using an Index to Combine Data Example Example –Full Fall08 data set –Fall10 Marine Science majors proc sql; create index noobs on fall08(noobs); quit; data msretro; set fall10ms; set fall08 key=noobs; run;

17 17 Using a Transactional Data Set The Base data set can be updated from a lookup table The Base data set can be updated from a lookup table Both data sets have to be sorted Both data sets have to be sorted The lookup table can have missing values for variables that are unchanged The lookup table can have missing values for variables that are unchanged Be careful about “mixed” information (see example) Be careful about “mixed” information (see example)


Download ppt "Chapter 15: Combining Data Horizontally 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina."

Similar presentations


Ads by Google