Presentation is loading. Please wait.

Presentation is loading. Please wait.

Use the SET statement to: –create an exact copy of a SAS dataset –modify an existing SAS dataset by creating new variables, subsetting (using a subsetting.

Similar presentations


Presentation on theme: "Use the SET statement to: –create an exact copy of a SAS dataset –modify an existing SAS dataset by creating new variables, subsetting (using a subsetting."— Presentation transcript:

1 Use the SET statement to: –create an exact copy of a SAS dataset –modify an existing SAS dataset by creating new variables, subsetting (using a subsetting IF) –concatenate datasets; i.e., stack them one on top of the other. Most useful when the datasets contain the same variables but different observations… –interleave datasets; i.e., keep sorted datasets sorted… use a BY statement with the SET statement… –the general form is DATA new_data_set; SET old_data_set; … other statements as needed…

2 *make permanent SAS dataset from the data here; * time of train, # of cars, total # of riders; *then create new dataset with new variable average number of riders per car; * see page 171; 10:10 6 21 12:15 10 56 15:30 10 25 11:30 8 34 13:15 8 12 10:45 6 13 20:30 6 32 23:15 6 12

3 *CONCATENATION OF TWO DATASETS - NOTE WHAT HAPPENS WHEN THEY HAVE DIFFERENT VARIABLES; DATA southentrance; INPUT Entrance $ PassNumber PartySize Age @@; DATALINES; S 43 3 27 S 44 3 24 S 45 3 2 ; PROC PRINT DATA = southentrance; TITLE 'South Entrance Data'; DATA northentr ance; INPUT Entrance $ PassNumber PartySize Age Lot @@; DATALINES; N 21 5 41 1 N 87 4 33 3 N 65 2 67 1 N 66 2 7 1 ; PROC PRINT DATA = northentrance; TITLE 'North Entrance Data'; DATA both; SET southentrance northentrance; IF Age =. THEN AmountPaid =.; ELSE IF Age < 3 THEN AmountPaid = 0; ELSE IF Age < 65 THEN AmountPaid = 17; ELSE AmountPaid = 12; PROC PRINT DATA = both; TITLE 'Both Entrances'; RUN; QUIT;

4 * INTERLEAVE datasets to keep an ordering in them; * The datasets to be INTERLEAVED must be SORTED by the interleaving variable; DATA S; SET southentrance; PROC SORT DATA=S; BY PassNumber; run; PROC PRINT DATA = S; TITLE 'South Entrance Data'; DATA N; SET northentrance; PROC SORT DATA = N; BY PassNumber; PROC PRINT DATA = N; TITLE 'North Entrance Data'; *now put them together in order of passnumber; DATA interleave; SET N S; BY PassNumber; PROC PRINT DATA = interleave; TITLE 'Both Entrances, By Pass Number'; RUN; QUIT; Now concatenate the two therapy datasets creating a new total number of patients variable first, without sorting; then interleave them, BY month.

5 One-to-one match merge Use the MERGE statement with the BY statement to combine two datasets with a common so-called matching variable (the BY- variable) to uniquely identify each observation in the datasets... DATA new; MERGE old1 old2; BY matching_vars; NOTE: If the two datasets have variables with the same names (besides the BY variables) the variables from the second dataset will overwrite the values of the variables in the first dataset with the same names. Also note that all observations from both datasets are included in the new dataset whether they had a match or not...

6 Try to merge the demographic information about the patients with the dataset on the visits they made to the doctor’s office… (ID, age, sex, date of birth): A001 31 M 05/22/75 A002 43 M 06/15/63 A003 34 F 08/17/72 A004. F 03/27/69 A005 54 F 02/24/52 A007 49 M 11/01/57 Next is the “visits” data (ID, visit#, sysBP, diasBP, Weight, date of visit):

7 A001 1 140 85 195 01/05/01 A001 2 138 90 198 10/13/01 A001 3 145 95 200 07/14/02 A002 1 121 75 168 04/14/01 A003 1 118 68 125 08/12/01 A003 2 112 65 123 11/15/01 A004 1 143 86 204 03/30/01 A005 1 132 76 174 02/27/01 A005 2 132 78 175 08/02/01 A005 3 134 78 176 12/04/01 A008 1 126 80 182 05/22/01 What’s the merging variable? Do you need to sort by that variable before merging? Can you merge them in descending order?? Try the next one too - see section 6.4 on p. 176-177.

8 dm log 'clear'; dm output 'clear'; options ls=80; DATA sales; INPUT CodeNum $ 1-4 PiecesSold 6-7; DATALINES; C865 15 K086 9 A536 21 S163 34 K014 1 A206 12 B713 29 DATA descriptions; INPUT CodeNum $ 1-4 Name $ 6-14 Description $ 15-60; DATALINES; A206 Mokka Coffee buttercream in dark chocolate A536 Walnoot Walnut halves in bed of dark chocolate B713 Frambozen Raspberry marzipan covered in milk chocolate C865 Vanille Vanilla-flavored rolled in ground hazelnuts K014 Kroon Milk chocolate with a mint cream center K086 Koning Hazelnut paste in dark chocolate M315 Pyramide White with dark chocolate trimming S163 Orbais Chocolate cream in dark chocolate ;

9 PROC SORT DATA = sales; BY CodeNum; PROC SORT DATA= descriptions; BY CodeNum; * Merge data sets by CodeNum; DATA chocolates; MERGE sales descriptions; BY CodeNum; PROC PRINT DATA = chocolates; TITLE "Today's Chocolate Sales"; RUN; quit; NOTE the SORTing of the two datasets by the merging variable... all observations from both datasets are included in the merged dataset, whether they had a match or not.

10 One-to-many match merge Same MERGE statement is used as in one-to-one... but the result will be different if you have many observations in one dataset to be matched with a single observation in the other... A good example of this is merging summary statistics with original data from which the statistics were computed. Go over sections 6.5-6.6 to see how this is done... Also note the output (p. 181) from the PROC PRINT when both a BY and an ID statement are used together. Make the two “shoes” programs work (in section 6.5 and 6.6).... use the code below and get the data from the book datasets and put them into files that the INFILE statements can read…

11 dm log 'clear'; dm output 'clear'; options ls=80; DATA shoes; INFILE ''; INPUT Style $ 1-15 ExerciseType $ sales; PROC SORT DATA = shoes; BY ExerciseType; DATA discount; INFILE ''; INPUT ExerciseType $ Adjustment; * Perform many-to-one match merge; DATA prices; MERGE shoes discount; BY ExerciseType; NewPrice=ROUND(RegularPrice-(RegularPrice*Adjustment),.01); PROC PRINT DATA = prices; TITLE 'Price List for May'; RUN; quit;

12 Now consider an important merging application: putting summary statistics back in the same dataset with the data used to compute the summaries. Note this is a many-to- one merge, since there are only a few statistics computed (one for each by-group value) and they are to be merged back with the many individual values used to compute those statistics… As an example, compute the sum of the sales variable for each of the three classes of shoes in the previous example, and then merge that total back into the original dataset. After this you can compute the percentage of the total that each group represents, for example… (get the sales data from page 180)

13 PROC MEANS can be used to compute a grand total for a variable, but then it can't be merged back into the original dataset since you don't have a common BY variable in the two datasets… Thus we must use a different technique (see section 6.7) … it turns out that the SET statement with an IF - THEN will do the job: DATA new; if _n_=1 then SET summary_data_set; SET original_data_set; The original dataset has many obs., while the summary dataset has only one. SAS reads this one obs. with the SET statement, but only for the first obs (i.e., when _n_ = 1)… this works because SET automatically RETAINS the observations from the first read - go over the example given in section 6.7: Combining a Grand Total with the Original Data

14 For Monday: In the “padgett” data, get the mean and standard deviation of plant height for the ”pizza hut" and ”shell island" marshes. Merge those statistics back into the original dataset and then calculate a z-score for each plant in the dataset. Recall that Z = (X - mean)/(s.d.), where X is the original height. There are two means and two standard deviations, one for each of the two marshes ”pizza hut" and ”shell island” and so you want to be sure that the correct mean and s.d. gets used with plants from the respective marshes…


Download ppt "Use the SET statement to: –create an exact copy of a SAS dataset –modify an existing SAS dataset by creating new variables, subsetting (using a subsetting."

Similar presentations


Ads by Google