Published by Trent Gossage. Modified over 2 years ago.

1
SAS Hash Object: My New Best Friend
Demonstration Of Time Savings Using A Hash Object
By Denise A. Kruse, SAS Contractor

2
Program Objectives

Datasets involved:
- campaign: 11,346 obs
- disposition: 97 obs
- program: 446 obs
- disposition category: 6 obs
- dec_offers: 6,145,029 obs

3
Matching Datasets

What is the best way to get the fields from the 4 small datasets into the main population of 6.1 million observations?
- PROC SORT followed by a DATA step MERGE
- HASH object

4
Sort / Merge

- Both datasets need to be sorted by the key prior to the merge
- Merge datasets
- Sort again (by the next key)
- Merge again

5
Sort / Merge Code

proc sort data=oms_prod.disposition out=disp ;
   by disposition_id ;
run ;

proc sort data=dec_offers ;
   by disposition_id ;
run ;

data dec_match ;
   merge dec_offers (in=a)
         disp(keep=disposition_id description touched disposition_category_code in=b) ;
   by disposition_id ;
   if a and b ;
run ;

6
Log For Sort / Merge

NOTE: Sorting was performed by the data source.
NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION.
NOTE: The data set WORK.DISP has 97 observations and 10 variables.
NOTE: Compressing data set WORK.DISP decreased size by 0.00 percent.
      Compressed is 2 pages; un-compressed would require 2 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.23 seconds
      cpu time 0.00 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.
NOTE: The data set WORK.DEC_OFFERS has 6145029 observations and 4 variables.
NOTE: Compressing data set WORK.DEC_OFFERS increased size by 58.15 percent.
      Compressed is 38412 pages; un-compressed would require 24289 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 28.44 seconds
      cpu time 39.81 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.
NOTE: There were 97 observations read from the data set WORK.DISP.
NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.
NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.
      Compressed is 27499 pages; un-compressed would require 109733 pages.
NOTE: DATA statement used (Total process time):
      real time 42.81 seconds
      cpu time 42.58 seconds

7
Sort / Merge Code Continued

proc sort data=oms_prod.campaign out=camp ;
   by campaign_id ;
run ;

proc sort data=dec_match ;
   by campaign_id ;
run ;

/* description exists in both dec_match and camp;
   rename one of them if both values are needed */
data dec_match2 ;
   merge dec_match (in=a)
         camp(keep=campaign_id program_id campaign_code description in=b) ;
   by campaign_id ;
   if a and b ;
run ;

8
Log For Sort / Merge

NOTE: Sorting was performed by the data source.
NOTE: There were 11346 observations read from the data set OMS_PROD.CAMPAIGN.
NOTE: The data set WORK.CAMP has 11346 observations and 19 variables.
NOTE: Compressing data set WORK.CAMP decreased size by 43.48 percent.
      Compressed is 143 pages; un-compressed would require 253 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.67 seconds
      cpu time 0.43 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.
NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.
NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.
      Compressed is 27496 pages; un-compressed would require 109733 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 1:09.07
      cpu time 1:59.52

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.
NOTE: There were 11346 observations read from the data set WORK.CAMP.
NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.
NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 71.53 percent.
      Compressed is 34306 pages; un-compressed would require 120491 pages.
NOTE: DATA statement used (Total process time):
      real time 51.29 seconds
      cpu time 51.05 seconds

9
Sort / Merge Code Continued

proc sort data=oms_prod.program out=pgm ;
   by program_id ;
run ;

proc sort data=dec_match2 ;
   by program_id ;
run ;

/* Merge the dataset just sorted (dec_match2), as the log for this step shows */
data dec_match3 ;
   merge dec_match2 (in=a)
         pgm(keep=program_id name in=b) ;
   by program_id ;
   if a and b ;
run ;

10
Log For Sort / Merge

NOTE: Sorting was performed by the data source.
NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM.
NOTE: The data set WORK.PGM has 446 observations and 16 variables.
NOTE: Compressing data set WORK.PGM decreased size by 40.00 percent.
      Compressed is 6 pages; un-compressed would require 10 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.25 seconds
      cpu time 0.03 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.
NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 71.53 percent.
      Compressed is 34306 pages; un-compressed would require 120491 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 1:17.37
      cpu time 2:02.37

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
NOTE: There were 446 observations read from the data set WORK.PGM.
NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 72.06 percent.
      Compressed is 26016 pages; un-compressed would require 93107 pages.
NOTE: DATA statement used (Total process time):
      real time 59.00 seconds
      cpu time 58.97 seconds

11
Sort / Merge Code

proc sort data=oms_prod.disposition_category
          out=disp_cat(rename=(description=disp_desc)) ;
   by disposition_category_code ;
run ;

proc sort data=dec_match3 ;
   by disposition_category_code ;
run ;

data dec_match4 ;
   merge dec_match3 (in=a)
         disp_cat(keep=disposition_category_code disp_desc in=b) ;
   by disposition_category_code ;
   if a and b ;
run ;

12
Log For Sort / Merge

NOTE: Sorting was performed by the data source.
NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY.
NOTE: The data set WORK.DISP_CAT has 6 observations and 2 variables.
NOTE: Compressing data set WORK.DISP_CAT increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.03 seconds
      cpu time 0.02 seconds

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 72.06 percent.
      Compressed is 26017 pages; un-compressed would require 93107 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 1:26.08
      cpu time 2:14.65

NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
NOTE: There were 6 observations read from the data set WORK.DISP_CAT.
NOTE: The data set WORK.DEC_MATCH4 has 6145029 observations and 11 variables.
NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by 71.05 percent.
      Compressed is 31209 pages; un-compressed would require 107808 pages.
NOTE: DATA statement used (Total process time):
      real time 1:03.35
      cpu time 1:03.28

13
HASH Code

data dec_match ;
   if _n_ = 1 then do ;
      /* Add the lookup variables to the program data vector without reading any rows */
      if 0 then set oms_prod.disposition(keep=disposition_id description touched disposition_category_code) ;
      /* Load the small lookup dataset into memory once */
      declare hash ht(dataset: "oms_prod.disposition") ;
      ht.defineKey("disposition_id") ;
      ht.defineData("disposition_id", "description", "touched", "disposition_category_code") ;
      ht.defineDone() ;
   end ;
   set dec_offers ;
   /* find() returns 0 on a match; keep only matching rows */
   if ht.find() = 0 ;
run ;

No sorting!

14
HASH Log

NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION.
NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.
NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.
NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.
      Compressed is 27499 pages; un-compressed would require 109733 pages.
NOTE: DATA statement used (Total process time):
      real time 48.38 seconds
      cpu time 48.14 seconds

15
HASH Code

data dec_match2 ;
   if _n_ = 1 then do ;
      if 0 then set oms_prod.campaign(keep=campaign_id program_id campaign_code description) ;
      declare hash ht(dataset: "oms_prod.campaign") ;
      ht.defineKey("campaign_id") ;
      ht.defineData("campaign_id", "program_id", "campaign_code", "description") ;
      ht.defineDone() ;
   end ;
   set dec_match ;
   /* description is already on dec_match (from disposition);
      a successful find() overwrites it with the campaign value */
   if ht.find() = 0 ;
run ;

16
HASH Log

NOTE: There were 11346 observations read from the data set OMS_PROD.CAMPAIGN.
NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.
NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.
NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 38.33 percent.
      Compressed is 39071 pages; un-compressed would require 63352 pages.
NOTE: DATA statement used (Total process time):
      real time 55.35 seconds
      cpu time 55.21 seconds

17
HASH Code

data dec_match3 ;
   if _n_ = 1 then do ;
      if 0 then set oms_prod.program(keep=program_id name) ;
      declare hash ht(dataset: "oms_prod.program") ;
      ht.defineKey("program_id") ;
      ht.defineData("program_id", "name") ;
      ht.defineDone() ;
   end ;
   set dec_match2 ;
   if ht.find() = 0 ;
run ;

18
HASH Log

NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM.
NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 48.53 percent.
      Compressed is 43928 pages; un-compressed would require 85348 pages.
NOTE: DATA statement used (Total process time):
      real time 1:00.38
      cpu time 1:00.17

19
HASH Code

/* Rename description first so it does not collide with the description already on dec_match3 */
data disposition_category (rename=(description=disp_desc)) ;
   set oms_prod.disposition_category ;
run ;

data dec_match4 ;
   if _n_ = 1 then do ;
      if 0 then set disposition_category(keep=disposition_category_code disp_desc) ;
      declare hash ht(dataset: "disposition_category") ;
      ht.defineKey("disposition_category_code") ;
      ht.defineData("disposition_category_code", "disp_desc") ;
      ht.defineDone() ;
   end ;
   set dec_match3 ;
   if ht.find() = 0 ;
run ;

20
HASH Log

NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY.
NOTE: The data set WORK.DISPOSITION_CATEGORY has 6 observations and 2 variables.
NOTE: Compressing data set WORK.DISPOSITION_CATEGORY increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time 0.02 seconds
      cpu time 0.01 seconds

NOTE: There were 6 observations read from the data set WORK.DISPOSITION_CATEGORY.
NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
NOTE: The data set WORK.DEC_MATCH4 has 6145029 observations and 11 variables.
NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by 49.47 percent.
      Compressed is 51750 pages; un-compressed would require 102418 pages.
NOTE: DATA statement used (Total process time):
      real time 1:02.45
      cpu time 1:02.30

21
Comparison Of Processing Time

Output dataset    Sort / Merge     HASH
dec_match         ~70 sec          48 sec
dec_match2        ~2 min           55 sec
dec_match3        ~2 min 16 sec    1 min
dec_match4        ~2 min 29 sec    1 min 2 sec
TOTAL             ~8 min           ~4 min

22
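Conclusion

When looking for efficiencies, HASH objects are definitely worth considering. In this demonstration, the four lookups against 6.1 million observations took about 4 minutes with HASH objects versus about 8 minutes with sort / merge. In larger programs, HASH objects can save valuable processing time.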
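The four hash steps in the presentation each read and write all 6.1 million observations. As a possible further refinement, the four lookups could be combined into one DATA step so that dec_offers is read only once. The following is a minimal sketch under stated assumptions, not the author's method, and it was not timed in the presentation. It assumes that dec_offers carries both disposition_id and campaign_id (which the variable counts in the logs suggest), and that the SAS release in use accepts data set options inside the dataset: argument (the presentation renames in a separate step instead). The output name dec_match_all, the hash names h_disp, h_camp, h_pgm and h_cat, and the renamed variables dispo_desc and camp_desc are hypothetical.

```sas
/* Sketch only: names dec_match_all, h_*, dispo_desc and camp_desc are hypothetical. */
data dec_match_all ;
   if _n_ = 1 then do ;
      /* Define the lookup variables in the program data vector without reading rows. */
      /* Each description is renamed so the three lookups do not overwrite each other. */
      if 0 then set
         oms_prod.disposition(keep=disposition_id description touched disposition_category_code
                              rename=(description=dispo_desc))
         oms_prod.campaign(keep=campaign_id program_id campaign_code description
                           rename=(description=camp_desc))
         oms_prod.program(keep=program_id name)
         oms_prod.disposition_category(keep=disposition_category_code description
                                       rename=(description=disp_desc)) ;

      /* One hash object per lookup table, each loaded once */
      declare hash h_disp(dataset: "oms_prod.disposition(rename=(description=dispo_desc))") ;
      h_disp.defineKey("disposition_id") ;
      h_disp.defineData("dispo_desc", "touched", "disposition_category_code") ;
      h_disp.defineDone() ;

      declare hash h_camp(dataset: "oms_prod.campaign(rename=(description=camp_desc))") ;
      h_camp.defineKey("campaign_id") ;
      h_camp.defineData("program_id", "campaign_code", "camp_desc") ;
      h_camp.defineDone() ;

      declare hash h_pgm(dataset: "oms_prod.program") ;
      h_pgm.defineKey("program_id") ;
      h_pgm.defineData("name") ;
      h_pgm.defineDone() ;

      declare hash h_cat(dataset: "oms_prod.disposition_category(rename=(description=disp_desc))") ;
      h_cat.defineKey("disposition_category_code") ;
      h_cat.defineData("disp_desc") ;
      h_cat.defineDone() ;
   end ;

   set dec_offers ;

   /* Order matters: program_id comes from the campaign lookup and        */
   /* disposition_category_code comes from the disposition lookup.        */
   /* A non-zero return code means no match, so the row is dropped,       */
   /* mirroring the "if ht.find()=0" filter used in the separate steps.   */
   if h_disp.find() ne 0 then delete ;
   if h_camp.find() ne 0 then delete ;
   if h_pgm.find()  ne 0 then delete ;
   if h_cat.find()  ne 0 then delete ;
run ;
```

Because the three description fields are kept under separate names here, the result would have more variables than dec_match4 in the presentation. Whether a single pass is actually faster would need to be measured on the data in question.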

24
Contact Information

Denise A. Kruse
SAS Contractor
DeniseAKruse@gmail.com
