SAS Hash Object: My New Best Friend Demonstration Of Time Savings Using A Hash Object By Denise A. Kruse SAS Contractor.



2 Program Objectives
Datasets to be combined:
  dec_offers            6,145,029 obs
  campaign                 11,346 obs
  program                     446 obs
  disposition                  97 obs
  disposition category          6 obs

3 Matching Datasets
What is the best way to get the fields from the 4 small datasets into the main population of 6.1 million observations?
  Sort / merge (PROC SORT followed by a DATA step MERGE)
  HASH object

4 Sort / Merge
Both datasets need to be sorted by the key variable prior to each merge, so every new lookup repeats the cycle:
  Sort the datasets
  Merge the datasets
  Sort again
  Merge again

5 Sort / Merge Code
proc sort data=oms_prod.disposition out=disp ;
  by disposition_id ;
run ;
proc sort data=dec_offers ;
  by disposition_id ;
run ;
data dec_match ;
  merge dec_offers (in=a)
        disp(keep=disposition_id description touched disposition_category_code in=b) ;
  by disposition_id ;
  if a and b ;
run ;

6 Log For Sort / Merge
NOTE: Sorting was performed by the data source.
NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION.
NOTE: The data set WORK.DISP has 97 observations and 10 variables.
NOTE: Compressing data set WORK.DISP decreased size by 0.00 percent.
      Compressed is 2 pages; un-compressed would require 2 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.23 seconds
      cpu time 0.00 seconds
NOTE: There were observations read from the data set WORK.DEC_OFFERS.
NOTE: The data set WORK.DEC_OFFERS has observations and 4 variables.
NOTE: Compressing data set WORK.DEC_OFFERS increased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time seconds
      cpu time seconds
NOTE: There were observations read from the data set WORK.DEC_OFFERS.
NOTE: There were 97 observations read from the data set WORK.DISP.
NOTE: The data set WORK.DEC_MATCH has observations and 7 variables.
NOTE: Compressing data set WORK.DEC_MATCH decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: DATA statement used (Total process time):
      real time seconds
      cpu time seconds

7 Sort / Merge Code Continued
proc sort data=oms_prod.campaign out=camp ;
  by campaign_id ;
run ;
proc sort data=dec_match ;
  by campaign_id ;
run ;
data dec_match2 ;
  merge dec_match (in=a)
        camp(keep=campaign_id program_id campaign_code description in=b) ;
  by campaign_id ;
  if a and b ;
run ;

8 Log For Sort / Merge
NOTE: Sorting was performed by the data source.
NOTE: There were observations read from the data set OMS_PROD.CAMPAIGN.
NOTE: The data set WORK.CAMP has observations and 19 variables.
NOTE: Compressing data set WORK.CAMP decreased size by percent.
      Compressed is 143 pages; un-compressed would require 253 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.67 seconds
      cpu time 0.43 seconds
NOTE: There were observations read from the data set WORK.DEC_MATCH.
NOTE: The data set WORK.DEC_MATCH has observations and 7 variables.
NOTE: Compressing data set WORK.DEC_MATCH decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 1:09.07
      cpu time 1:59.52
NOTE: There were observations read from the data set WORK.DEC_MATCH.
NOTE: There were observations read from the data set WORK.CAMP.
NOTE: The data set WORK.DEC_MATCH2 has observations and 9 variables.
NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: DATA statement used (Total process time):
      real time seconds
      cpu time seconds

9 Sort / Merge Code Continued
proc sort data=oms_prod.program out=pgm ;
  by program_id ;
run ;
proc sort data=dec_match2 ;
  by program_id ;
run ;
data dec_match3 ;
  /* merge the dataset just sorted by program_id (dec_match2, as the log confirms) */
  merge dec_match2 (in=a)
        pgm(keep=program_id name in=b) ;
  by program_id ;
  if a and b ;
run ;

10 Log For Sort / Merge
NOTE: Sorting was performed by the data source.
NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM.
NOTE: The data set WORK.PGM has 446 observations and 16 variables.
NOTE: Compressing data set WORK.PGM decreased size by percent.
      Compressed is 6 pages; un-compressed would require 10 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.25 seconds
      cpu time 0.03 seconds
NOTE: There were observations read from the data set WORK.DEC_MATCH2.
NOTE: The data set WORK.DEC_MATCH2 has observations and 9 variables.
NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 1:17.37
      cpu time 2:02.37
NOTE: There were observations read from the data set WORK.DEC_MATCH2.
NOTE: There were 446 observations read from the data set WORK.PGM.
NOTE: The data set WORK.DEC_MATCH3 has observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: DATA statement used (Total process time):
      real time seconds
      cpu time seconds

11 Sort / Merge Code Continued
proc sort data=oms_prod.disposition_category
          out=disp_cat(rename=(description=disp_desc)) ;
  by disposition_category_code ;
run ;
proc sort data=dec_match3 ;
  by disposition_category_code ;
run ;
data dec_match4 ;
  merge dec_match3 (in=a)
        disp_cat(keep=disposition_category_code disp_desc in=b) ;
  by disposition_category_code ;
  if a and b ;
run ;

12 Log For Sort / Merge
NOTE: Sorting was performed by the data source.
NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY.
NOTE: The data set WORK.DISP_CAT has 6 observations and 2 variables.
NOTE: Compressing data set WORK.DISP_CAT increased size by percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 0.03 seconds
      cpu time 0.02 seconds
NOTE: There were observations read from the data set WORK.DEC_MATCH3.
NOTE: The data set WORK.DEC_MATCH3 has observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time 1:26.08
      cpu time 2:14.65
NOTE: There were observations read from the data set WORK.DEC_MATCH3.
NOTE: There were 6 observations read from the data set WORK.DISP_CAT.
NOTE: The data set WORK.DEC_MATCH4 has observations and 11 variables.
NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: DATA statement used (Total process time):
      real time 1:03.35
      cpu time 1:03.28

13 HASH Code
data dec_match ;
  if _n_ = 1 then do ;
    /* define the lookup variables at compile time; no rows are read */
    if 0 then set oms_prod.disposition(keep=disposition_id description touched disposition_category_code) ;
    /* load the small lookup table into memory once */
    declare hash ht(dataset: "oms_prod.disposition") ;
    ht.defineKey("disposition_id") ;
    ht.defineData("disposition_id", "description", "touched", "disposition_category_code") ;
    ht.defineDone() ;
  end ;
  set dec_offers ;
  /* find() returns 0 on a match, so only matching rows are kept */
  if ht.find() = 0 ;
run ;
No sorting!

14 HASH Log
NOTE: There were 97 observations read from the data set OMS_PROD.DISPOSITION.
NOTE: There were observations read from the data set WORK.DEC_OFFERS.
NOTE: The data set WORK.DEC_MATCH has observations and 7 variables.
NOTE: Compressing data set WORK.DEC_MATCH decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: DATA statement used (Total process time):
      real time seconds
      cpu time seconds
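The subsetting IF in the hash step keeps only matching rows, which mirrors "if a and b" in the merge. As a minimal sketch of the alternative (keeping unmatched rows as well, like "if a" in a merge), the step below uses two small toy datasets; disp_demo, offers_demo, offer_id and all values in them are hypothetical stand-ins, not the presenter's data.

```sas
/* Toy lookup table standing in for oms_prod.disposition (hypothetical values) */
data disp_demo ;
  input disposition_id description :$20. ;
  datalines ;
1 Accepted
2 Declined
;
run ;

/* Toy main table standing in for dec_offers (hypothetical values) */
data offers_demo ;
  input offer_id disposition_id ;
  datalines ;
101 1
102 2
103 9
;
run ;

data offers_left ;
  if _n_ = 1 then do ;
    /* define description at compile time; no rows are read */
    if 0 then set disp_demo ;
    declare hash ht(dataset: "disp_demo") ;
    ht.defineKey("disposition_id") ;
    ht.defineData("description") ;
    ht.defineDone() ;
  end ;
  set offers_demo ;
  /* find() returns 0 on a match; on a miss, clear the value that   */
  /* would otherwise be carried over from the previous observation  */
  if ht.find() ne 0 then call missing(description) ;
run ;
```

All three toy offers are written to offers_left; offer 103, whose disposition_id is not in the lookup table, gets a blank description. The call missing matters because variables defined through a SET statement are retained from one observation to the next.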

15 HASH Code
data dec_match2 ;
  if _n_ = 1 then do ;
    if 0 then set oms_prod.campaign(keep=campaign_id program_id campaign_code description) ;
    declare hash ht(dataset: "oms_prod.campaign") ;
    ht.defineKey("campaign_id") ;
    ht.defineData("campaign_id", "program_id", "campaign_code", "description") ;
    ht.defineDone() ;
  end ;
  set dec_match ;
  if ht.find() = 0 ;
run ;

16 HASH Log
NOTE: There were observations read from the data set OMS_PROD.CAMPAIGN.
NOTE: There were observations read from the data set WORK.DEC_MATCH.
NOTE: The data set WORK.DEC_MATCH2 has observations and 9 variables.
NOTE: Compressing data set WORK.DEC_MATCH2 decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: DATA statement used (Total process time):
      real time seconds
      cpu time seconds

17 HASH Code
data dec_match3 ;
  if _n_ = 1 then do ;
    if 0 then set oms_prod.program(keep=program_id name) ;
    declare hash ht(dataset: "oms_prod.program") ;
    ht.defineKey("program_id") ;
    ht.defineData("program_id", "name") ;
    ht.defineDone() ;
  end ;
  set dec_match2 ;
  if ht.find() = 0 ;
run ;

18 HASH Log
NOTE: There were 446 observations read from the data set OMS_PROD.PROGRAM.
NOTE: There were observations read from the data set WORK.DEC_MATCH2.
NOTE: The data set WORK.DEC_MATCH3 has observations and 10 variables.
NOTE: Compressing data set WORK.DEC_MATCH3 decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: DATA statement used (Total process time):
      real time 1:00.38
      cpu time 1:00.17

19 HASH Code
data disposition_category (rename=(description=disp_desc)) ;
  set oms_prod.disposition_category ;
run ;
data dec_match4 ;
  if _n_ = 1 then do ;
    if 0 then set disposition_category(keep=disposition_category_code disp_desc) ;
    declare hash ht(dataset: "disposition_category") ;
    ht.defineKey("disposition_category_code") ;
    ht.defineData("disposition_category_code", "disp_desc") ;
    ht.defineDone() ;
  end ;
  set dec_match3 ;
  if ht.find() = 0 ;
run ;

20 HASH Log
NOTE: There were 6 observations read from the data set OMS_PROD.DISPOSITION_CATEGORY.
NOTE: The data set WORK.DISPOSITION_CATEGORY has 6 observations and 2 variables.
NOTE: Compressing data set WORK.DISPOSITION_CATEGORY increased size by percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time 0.02 seconds
      cpu time 0.01 seconds
NOTE: There were 6 observations read from the data set WORK.DISPOSITION_CATEGORY.
NOTE: There were observations read from the data set WORK.DEC_MATCH3.
NOTE: The data set WORK.DEC_MATCH4 has observations and 11 variables.
NOTE: Compressing data set WORK.DEC_MATCH4 decreased size by percent.
      Compressed is pages; un-compressed would require pages.
NOTE: DATA statement used (Total process time):
      real time 1:02.45
      cpu time 1:02.30

21 Comparison Of Processing Time
Output dataset    Sort / Merge      HASH
dec_match         ~70 sec           48 sec
dec_match2        ~2 min            55 sec
dec_match3        ~2 min 16 sec     1 min
dec_match4        ~2 min 29 sec     1 min 2 sec
TOTAL             ~8 min            ~4 min

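22 Conclusion
When looking for efficiencies, HASH objects are definitely worth considering. In larger programs they can save valuable processing time: in this demonstration the four hash lookups took about 4 minutes in total, against about 8 minutes for the equivalent sort / merge steps.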
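One possible further step, not part of the presentation and not timed here, is to load all four lookup tables into separate hash objects inside a single DATA step, so the 6.1 million row dataset is read and written once instead of four times. The sketch below assumes that dec_offers carries disposition_id and campaign_id (as the presenter's lookups imply) and that dataset options such as RENAME= are accepted inside the hash dataset: argument. The names dec_match_all, h_disp, h_cat, h_camp, h_pgm and camp_desc are hypothetical.

```sas
data dec_match_all ;
  if _n_ = 1 then do ;
    /* define every lookup variable at compile time; no rows are read */
    if 0 then set
      oms_prod.disposition(keep=disposition_id description touched disposition_category_code)
      oms_prod.disposition_category(keep=disposition_category_code description
                                    rename=(description=disp_desc))
      oms_prod.campaign(keep=campaign_id program_id campaign_code description
                        rename=(description=camp_desc))
      oms_prod.program(keep=program_id name) ;

    declare hash h_disp(dataset: "oms_prod.disposition") ;
    h_disp.defineKey("disposition_id") ;
    h_disp.defineData("description", "touched", "disposition_category_code") ;
    h_disp.defineDone() ;

    declare hash h_cat(dataset: "oms_prod.disposition_category(rename=(description=disp_desc))") ;
    h_cat.defineKey("disposition_category_code") ;
    h_cat.defineData("disp_desc") ;
    h_cat.defineDone() ;

    declare hash h_camp(dataset: "oms_prod.campaign(rename=(description=camp_desc))") ;
    h_camp.defineKey("campaign_id") ;
    h_camp.defineData("program_id", "campaign_code", "camp_desc") ;
    h_camp.defineDone() ;

    declare hash h_pgm(dataset: "oms_prod.program") ;
    h_pgm.defineKey("program_id") ;
    h_pgm.defineData("name") ;
    h_pgm.defineDone() ;
  end ;

  set dec_offers ;

  /* each lookup must succeed, matching the chained inner joins;      */
  /* h_cat and h_pgm use keys filled in by the lookup just before them */
  if h_disp.find() ne 0 then delete ;
  if h_cat.find()  ne 0 then delete ;
  if h_camp.find() ne 0 then delete ;
  if h_pgm.find()  ne 0 then delete ;
run ;
```

One deliberate difference from the four-step version: the campaign description is renamed to camp_desc here, so it sits alongside the disposition description instead of overwriting it, and the output carries one more variable as a result.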


24 Contact Information Denise A. Kruse SAS Contractor