Published by Trent Gossage. Modified over 2 years ago.

SAS Hash Object: My New Best Friend
Demonstration of Time Savings Using a Hash Object
By Denise A. Kruse, SAS Contractor

Program Objectives

    Dataset                 Observations
    campaign                      11,346
    disposition                       97
    program                          446
    disposition_category               6
    dec_offers                 6,145,029

Matching Datasets
What is the best way to get the fields from the 4 small datasets into the main population of 6.1 million observations?
Option 1: PROC SORT followed by a DATA step MERGE
Option 2: a DATA step HASH object

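PROC SORT / DATA Step MERGE
Both datasets must be sorted by the BY variable before the merge.
Merge the datasets.
Sort the result again by the next lookup key.
Merge again, and repeat for each lookup table.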

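Sort / Merge Code

    /* Sort the 97-row lookup table by the join key */
    proc sort data=oms_prod.disposition out=disp;
      by disposition_id;
    run;

    /* Sort the 6.1-million-row table by the same key */
    proc sort data=dec_offers;
      by disposition_id;
    run;

    /* Match-merge; IF A AND B keeps only offers with a matching disposition */
    data dec_match;
      merge dec_offers (in=a)
            disp (keep=disposition_id description touched disposition_category_code in=b);
      by disposition_id;
      if a and b;
    run;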

Log For Sort / Merge

    NOTE: Sorting was performed by the data source.
    NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION.
    NOTE: The data set WORK.DISP has 97 observations and 10 variables.
    NOTE: Compressing data set WORK.DISP decreased size by 0.00 percent.
          Compressed is 2 pages; un-compressed would require 2 pages.
    NOTE: PROCEDURE SORT used (Total process time):
          real time 0.23 seconds
          cpu time 0.00 seconds
    NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.
    NOTE: The data set WORK.DEC_OFFERS has 6145029 observations and 4 variables.
    NOTE: Compressing data set WORK.DEC_OFFERS increased size by 58.15 percent.
          Compressed is 38412 pages; un-compressed would require 24289 pages.
    NOTE: PROCEDURE SORT used (Total process time):
          real time 28.44 seconds
          cpu time 39.81 seconds
    NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.
    NOTE: There were 97 observations read from the data set WORK.DISP.
    NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.
    NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.
          Compressed is 27499 pages; un-compressed would require 109733 pages.
    NOTE: DATA statement used (Total process time):
          real time 42.81 seconds
          cpu time 42.58 seconds

Sort / Merge Code Continued

    proc sort data=oms_prod.campaign out=camp;
      by campaign_id;
    run;

    proc sort data=dec_match;
      by campaign_id;
    run;

    data dec_match2;
      merge dec_match (in=a)
            camp (keep=campaign_id program_id campaign_code description in=b);
      by campaign_id;
      if a and b;
    run;

Log For Sort / Merge

    NOTE: Sorting was performed by the data source.
    NOTE: There were 11346 observations read from the data set OMS_PROD.CAMPAIGN.
    NOTE: The data set WORK.CAMP has 11346 observations and 19 variables.
    NOTE: Compressing data set WORK.CAMP decreased size by 43.48 percent.
          Compressed is 143 pages; un-compressed would require 253 pages.
    NOTE: PROCEDURE SORT used (Total process time):
          real time 0.67 seconds
          cpu time 0.43 seconds
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.
    NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.
    NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.
          Compressed is 27496 pages; un-compressed would require 109733 pages.
    NOTE: PROCEDURE SORT used (Total process time):
          real time 1:09.07
          cpu time 1:59.52
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.
    NOTE: There were 11346 observations read from the data set WORK.CAMP.
    NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.
    NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 71.53 percent.
          Compressed is 34306 pages; un-compressed would require 120491 pages.
    NOTE: DATA statement used (Total process time):
          real time 51.29 seconds
          cpu time 51.05 seconds

Sort / Merge Code Continued

    proc sort data=oms_prod.program out=pgm;
      by program_id;
    run;

    proc sort data=dec_match2;
      by program_id;
    run;

    /* Merge the freshly sorted DEC_MATCH2 (not DEC_MATCH) */
    data dec_match3;
      merge dec_match2 (in=a)
            pgm (keep=program_id name in=b);
      by program_id;
      if a and b;
    run;

Log For Sort / Merge

    NOTE: Sorting was performed by the data source.
    NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM.
    NOTE: The data set WORK.PGM has 446 observations and 16 variables.
    NOTE: Compressing data set WORK.PGM decreased size by 40.00 percent.
          Compressed is 6 pages; un-compressed would require 10 pages.
    NOTE: PROCEDURE SORT used (Total process time):
          real time 0.25 seconds
          cpu time 0.03 seconds
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
    NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.
    NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 71.53 percent.
          Compressed is 34306 pages; un-compressed would require 120491 pages.
    NOTE: PROCEDURE SORT used (Total process time):
          real time 1:17.37
          cpu time 2:02.37
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
    NOTE: There were 446 observations read from the data set WORK.PGM.
    NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
    NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 72.06 percent.
          Compressed is 26016 pages; un-compressed would require 93107 pages.
    NOTE: DATA statement used (Total process time):
          real time 59.00 seconds
          cpu time 58.97 seconds

Sort / Merge Code

    proc sort data=oms_prod.disposition_category
              out=disp_cat(rename=(description=disp_desc));
      by disposition_category_code;
    run;

    proc sort data=dec_match3;
      by disposition_category_code;
    run;

    data dec_match4;
      merge dec_match3 (in=a)
            disp_cat (keep=disposition_category_code disp_desc in=b);
      by disposition_category_code;
      if a and b;
    run;

Log For Sort / Merge

    NOTE: Sorting was performed by the data source.
    NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY.
    NOTE: The data set WORK.DISP_CAT has 6 observations and 2 variables.
    NOTE: Compressing data set WORK.DISP_CAT increased size by 100.00 percent.
          Compressed is 2 pages; un-compressed would require 1 pages.
    NOTE: PROCEDURE SORT used (Total process time):
          real time 0.03 seconds
          cpu time 0.02 seconds
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
    NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
    NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 72.06 percent.
          Compressed is 26017 pages; un-compressed would require 93107 pages.
    NOTE: PROCEDURE SORT used (Total process time):
          real time 1:26.08
          cpu time 2:14.65
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
    NOTE: There were 6 observations read from the data set WORK.DISP_CAT.
    NOTE: The data set WORK.DEC_MATCH4 has 6145029 observations and 11 variables.
    NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by 71.05 percent.
          Compressed is 31209 pages; un-compressed would require 107808 pages.
    NOTE: DATA statement used (Total process time):
          real time 1:03.35
          cpu time 1:03.28

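HASH Code

    data dec_match;
      if _n_ = 1 then do;
        /* Never executes at run time; at compile time it adds the lookup
           variables to the PDV so the hash object can return values into them */
        if 0 then set oms_prod.disposition(keep=disposition_id description touched disposition_category_code);
        /* Load the 97-row lookup table into memory, keyed on DISPOSITION_ID */
        declare hash ht(dataset: "oms_prod.disposition");
        ht.defineKey("disposition_id");
        ht.defineData("disposition_id", "description", "touched", "disposition_category_code");
        ht.defineDone();
      end;
      set dec_offers;
      /* FIND() returns 0 when the key is found; the subsetting IF keeps matches only */
      if ht.find() = 0;
    run;

No sorting!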

HASH Log

    NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION.
    NOTE: There were 6145029 observations read from the data set WORK.DEC_OFFERS.
    NOTE: The data set WORK.DEC_MATCH has 6145029 observations and 7 variables.
    NOTE: Compressing data set WORK.DEC_MATCH decreased size by 74.94 percent.
          Compressed is 27499 pages; un-compressed would require 109733 pages.
    NOTE: DATA statement used (Total process time):
          real time 48.38 seconds
          cpu time 48.14 seconds
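You can try the same pattern without the OMS_PROD library. The following is a minimal sketch on made-up data. The tables DISP_LOOKUP, OFFERS and OFFERS_MATCHED and their values are hypothetical stand-ins for OMS_PROD.DISPOSITION and DEC_OFFERS. The offers are deliberately left unsorted, which shows three things. The lookup does not depend on row order. The output keeps the order of the offers table. An offer with no matching disposition is dropped, just as IF A AND B drops it in the merge version.

```sas
/* Hypothetical toy lookup table standing in for OMS_PROD.DISPOSITION */
data disp_lookup;
  input disposition_id description :$20. touched $;
  datalines;
1 Accepted Y
2 Declined N
3 NoAnswer N
;
run;

/* Hypothetical toy "big" table, deliberately NOT sorted by DISPOSITION_ID */
data offers;
  input offer_id disposition_id;
  datalines;
101 2
102 9
103 1
104 2
;
run;

data offers_matched;
  if _n_ = 1 then do;
    /* Compile-time only: adds DESCRIPTION and TOUCHED to the PDV */
    if 0 then set disp_lookup;
    declare hash ht(dataset: "disp_lookup");
    ht.defineKey("disposition_id");
    ht.defineData("description", "touched");
    ht.defineDone();
  end;
  set offers;
  /* FIND() returns 0 on a hit and copies DESCRIPTION and TOUCHED into the PDV;
     the subsetting IF keeps hits only, like IF A AND B in the merge */
  if ht.find() = 0;
run;
```

OFFERS_MATCHED should hold offers 101, 103 and 104, in that order. Offer 102 is dropped because there is no disposition 9 in the lookup table.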

HASH Code

    data dec_match2;
      if _n_ = 1 then do;
        if 0 then set oms_prod.campaign(keep=campaign_id program_id campaign_code description);
        declare hash ht(dataset: "oms_prod.campaign");
        ht.defineKey("campaign_id");
        ht.defineData("campaign_id", "program_id", "campaign_code", "description");
        ht.defineDone();
      end;
      set dec_match;
      if ht.find() = 0;
    run;

HASH Log

    NOTE: There were 11346 observations read from the data set OMS_PROD.CAMPAIGN.
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH.
    NOTE: The data set WORK.DEC_MATCH2 has 6145029 observations and 9 variables.
    NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by 38.33 percent.
          Compressed is 39071 pages; un-compressed would require 63352 pages.
    NOTE: DATA statement used (Total process time):
          real time 55.35 seconds
          cpu time 55.21 seconds

HASH Code

    data dec_match3;
      if _n_ = 1 then do;
        if 0 then set oms_prod.program(keep=program_id name);
        declare hash ht(dataset: "oms_prod.program");
        ht.defineKey("program_id");
        ht.defineData("program_id", "name");
        ht.defineDone();
      end;
      set dec_match2;
      if ht.find() = 0;
    run;

HASH Log

    NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM.
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH2.
    NOTE: The data set WORK.DEC_MATCH3 has 6145029 observations and 10 variables.
    NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by 48.53 percent.
          Compressed is 43928 pages; un-compressed would require 85348 pages.
    NOTE: DATA statement used (Total process time):
          real time 1:00.38
          cpu time 1:00.17

HASH Code

    /* Rename DESCRIPTION so it does not overwrite the DESCRIPTION already in DEC_MATCH3 */
    data disposition_category(rename=(description=disp_desc));
      set oms_prod.disposition_category;
    run;

    data dec_match4;
      if _n_ = 1 then do;
        if 0 then set disposition_category(keep=disposition_category_code disp_desc);
        declare hash ht(dataset: "disposition_category");
        ht.defineKey("disposition_category_code");
        ht.defineData("disposition_category_code", "disp_desc");
        ht.defineDone();
      end;
      set dec_match3;
      if ht.find() = 0;
    run;

HASH Log

    NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY.
    NOTE: The data set WORK.DISPOSITION_CATEGORY has 6 observations and 2 variables.
    NOTE: Compressing data set WORK.DISPOSITION_CATEGORY increased size by 100.00 percent.
          Compressed is 2 pages; un-compressed would require 1 pages.
    NOTE: DATA statement used (Total process time):
          real time 0.02 seconds
          cpu time 0.01 seconds
    NOTE: There were 6 observations read from the data set WORK.DISPOSITION_CATEGORY.
    NOTE: There were 6145029 observations read from the data set WORK.DEC_MATCH3.
    NOTE: The data set WORK.DEC_MATCH4 has 6145029 observations and 11 variables.
    NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by 49.47 percent.
          Compressed is 51750 pages; un-compressed would require 102418 pages.
    NOTE: DATA statement used (Total process time):
          real time 1:02.45
          cpu time 1:02.30
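The extra DATA step in the last HASH example exists only to rename DESCRIPTION to DISP_DESC. The DATASET: argument of DECLARE HASH also accepts data set options, so the rename can be applied while the hash is loaded. Below is a sketch that uses hypothetical toy tables, DISP_CAT_SRC and DEC_SMALL, in place of OMS_PROD.DISPOSITION_CATEGORY and DEC_MATCH3. It was not timed in this presentation. On a 6-row table the most it could save is the 0.02 seconds shown in the log above.

```sas
/* Hypothetical toy tables standing in for OMS_PROD.DISPOSITION_CATEGORY and DEC_MATCH3 */
data disp_cat_src;
  input disposition_category_code $ description :$20.;
  datalines;
A Positive
B Negative
;
run;

data dec_small;
  input offer_id disposition_category_code $;
  datalines;
1 B
2 A
3 C
;
run;

data dec_small_cat;
  if _n_ = 1 then do;
    /* Same RENAME= on both statements so the PDV and the hash agree on DISP_DESC */
    if 0 then set disp_cat_src(rename=(description=disp_desc));
    declare hash ht(dataset: "disp_cat_src(rename=(description=disp_desc))");
    ht.defineKey("disposition_category_code");
    ht.defineData("disp_desc");
    ht.defineDone();
  end;
  set dec_small;
  /* Keep matches only; offer 3 (code C) has no category and is dropped */
  if ht.find() = 0;
run;
```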

Comparison Of Processing Time (real time, from the logs above)

    Output dataset   Sort / Merge     HASH
    dec_match        ~70 sec          48 sec
    dec_match2       ~2 min           55 sec
    dec_match3       ~2 min 16 sec    1 min
    dec_match4       ~2 min 29 sec    1 min 2 sec
    TOTAL            ~8 min           ~4 min
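The rounded figures in the table can be checked against the real times in the logs. For Sort / Merge, each step adds three times: the sort of the lookup table, the sort of the 6.1-million-row table, and the merge. For HASH, the last step also includes the 0.02-second DATA step that renames DESCRIPTION.

```latex
\begin{aligned}
T_{\text{sort/merge}} &= (0.23 + 28.44 + 42.81) + (0.67 + 69.07 + 51.29) \\
  &\quad + (0.25 + 77.37 + 59.00) + (0.03 + 86.08 + 63.35) \\
  &= 71.48 + 121.03 + 136.62 + 149.46 = 478.59\ \text{s} \approx 8\ \text{min} \\
T_{\text{hash}} &= 48.38 + 55.35 + 60.38 + (0.02 + 62.45) = 226.58\ \text{s} \approx 3.8\ \text{min} \\
T_{\text{sort, large table}} &= 28.44 + 69.07 + 77.37 + 86.08 = 260.96\ \text{s} \\
\frac{T_{\text{sort/merge}}}{T_{\text{hash}}} &\approx 2.1
\end{aligned}
```

The four sorts of the large table (about 261 seconds) account for most of the roughly 252-second gap. The merge and hash DATA steps take similar times.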

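Conclusion
In this demonstration, replacing four sort/merge steps with four hash lookups roughly halved the total real time, from about 8 minutes to about 4. The savings came mainly from never having to sort the 6.1-million-row table. When looking for efficiencies, HASH objects are definitely worth considering. In larger programs, they can save valuable processing time.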
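The HASH steps above still read the 6.1 million rows four times, once per lookup. All four lookup tables are small enough to fit easily in memory, so one possible extension is to load all four hash objects in a single DATA step and read the big table once. This was not timed in this presentation, so any saving is an assumption. The sketch uses hypothetical TOY_ tables in place of the OMS_PROD tables. The order of the FIND() calls matters: PROGRAM_ID is only filled in by the campaign lookup, and DISPOSITION_CATEGORY_CODE only by the disposition lookup.

```sas
/* Hypothetical toy tables standing in for the OMS_PROD lookups and DEC_OFFERS */
data toy_disposition;
  input disposition_id description :$12. touched $ disposition_category_code $;
  datalines;
1 Accepted Y A
2 Declined Y B
;
run;

data toy_campaign;
  input campaign_id program_id campaign_code $ description :$12.;
  datalines;
10 100 C10 Spring
11 101 C11 Fall
;
run;

data toy_program;
  input program_id name :$12.;
  datalines;
100 Rewards
101 Travel
;
run;

data toy_disposition_category;
  input disposition_category_code $ description :$12.;
  datalines;
A Positive
B Negative
;
run;

data toy_offers;
  input offer_id disposition_id campaign_id;
  datalines;
1 2 11
2 1 10
3 9 10
4 1 12
;
run;

data dec_all;
  if _n_ = 1 then do;
    /* Compile-time only: puts every key and data variable in the PDV */
    if 0 then set toy_disposition toy_campaign toy_program
                  toy_disposition_category(rename=(description=disp_desc));

    declare hash hdisp(dataset: "toy_disposition");
    hdisp.defineKey("disposition_id");
    hdisp.defineData("description", "touched", "disposition_category_code");
    hdisp.defineDone();

    declare hash hcamp(dataset: "toy_campaign");
    hcamp.defineKey("campaign_id");
    hcamp.defineData("program_id", "campaign_code", "description");
    hcamp.defineDone();

    declare hash hpgm(dataset: "toy_program");
    hpgm.defineKey("program_id");
    hpgm.defineData("name");
    hpgm.defineDone();

    declare hash hcat(dataset: "toy_disposition_category(rename=(description=disp_desc))");
    hcat.defineKey("disposition_category_code");
    hcat.defineData("disp_desc");
    hcat.defineDone();
  end;

  /* One pass over the large table */
  set toy_offers;

  /* Inner-join semantics: drop the row as soon as one lookup misses.
     HCAMP must run before HPGM, and HDISP before HCAT. */
  if hdisp.find() ne 0 then delete;
  if hcamp.find() ne 0 then delete;
  if hpgm.find()  ne 0 then delete;
  if hcat.find()  ne 0 then delete;
run;
```

With this toy data, DEC_ALL should keep offers 1 and 2. Offer 3 has no matching disposition and offer 4 has no matching campaign. As in the step-by-step version, the campaign DESCRIPTION overwrites the disposition DESCRIPTION.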


Contact Information
Denise A. Kruse, SAS Contractor
DeniseAKruse@gmail.com
