Presentation is loading. Please wait.

Presentation is loading. Please wait.

Copyright © 2008, SAS Institute Inc. All rights reserved. Hash Objects – Why Use Them? Carolyn Cunnison SAS Technical Training Specialist.

Similar presentations


Presentation on theme: "Copyright © 2008, SAS Institute Inc. All rights reserved. Hash Objects – Why Use Them? Carolyn Cunnison SAS Technical Training Specialist."— Presentation transcript:

1 Copyright © 2008, SAS Institute Inc. All rights reserved. Hash Objects – Why Use Them? Carolyn Cunnison SAS Technical Training Specialist

2 Copyright © 2008, SAS Institute Inc. All rights reserved. Agenda  What are a Hash objects?  When should I use them?  Some sample code.

3 Copyright © 2008, SAS Institute Inc. All rights reserved. 3 What are HASH objects? Keys Data... Hash object can be thought of as rows of keys and data loaded into memory.

4 Copyright © 2008, SAS Institute Inc. All rights reserved. Advantages of Hash Objects  Values can be hard-coded or loaded from a SAS data set.  Keys and data can be a mixture of character and numeric.  Provides in-memory data storage and retrieval.  Does not require that data be sorted.  Is sized dynamically.

5 Copyright © 2008, SAS Institute Inc. All rights reserved. When to Use Hash Objects (1) Joining tables  I cut my processing time by 90% using hash tables - You can do it too!; Jennifer K. Warner- Freeman http://www.nesug.info/Proceedings/nesug07/bb/bb1 6.pdfhttp://www.nesug.info/Proceedings/nesug07/bb/bb1 6.pdf  Jennifer took an existing Proc SQL join which took between 2 and 4 hours to run. When she rewrote the program to use Hash tables, the program ran in 11 minutes.

6 Copyright © 2008, SAS Institute Inc. All rights reserved. When to Use Hash Objects (2) Summary-less summarization  Hash-Crash and Beyond; Paul Dortman et al http://www2.sas.com/proceedings/forum2008/037- 2008.pdfhttp://www2.sas.com/proceedings/forum2008/037- 2008.pdf  Compared PROC SUMMARY with NWAY option to Hash Object proc summary data = input nway ; class k1 k2 ; var num ; output out = summ_sum (drop = _:) sum = sum ;  The Hash Object did “the job more than twice as fast at the same time utilizing ⅓ the memory”

7 Copyright © 2008, SAS Institute Inc. All rights reserved. When to Use Hash Objects (3) Dynamically output to multiple files  Paul Dortman paper (continued)  Use a Hash table instead of the following: Data out1 out2; Set tablein;… If id = 1 then output out1; Else if id = 2 then output out2;

8 Copyright © 2008, SAS Institute Inc. All rights reserved. When to Use Hash Objects ( 4) Removing data extremes  Knowledge Base Sample 25990 http://support.sas.com/kb/25/990.html  Removes top and bottom 10% of data values.

9 Copyright © 2008, SAS Institute Inc. All rights reserved. When to Use Hash Objects (5) Perform data sampling without Proc Surveyselect  Better Hashing in SAS9.2; Robert Ray and Jason Secosky http://support.sas.com/rnd/base/datastep/dot/better- hashing-sas92.pdfhttp://support.sas.com/rnd/base/datastep/dot/better- hashing-sas92.pdf  Select observations from a table without replacement.  Perform sampling and data manipulation in one step.

10 Copyright © 2008, SAS Institute Inc. All rights reserved. 10 Terminology Partial list of methods: ObjectsMethods HASHDefinedata Add - a row Definekey Remove - a row Definedone Replace - data for key Find Delete - hash table HITERFirst Last Next Prev

11 11 Business Scenario You need to read orion.product_list and then look up information in the orion.supplier table. Structure of orion.product_list Structure of orion.supplier

12 12 Loading Data from a SAS Data Set data supplier_info; drop rc; length Supplier_Name $40 Supplier_Address $ 45 Country $ 2; if _N_=1 then do; declare hash S(dataset:'orion.supplier'); S.definekey('Supplier_ID'); S.definedata('Supplier_Name', 'Supplier_Address','Country'); S.definedone(); call missing(Supplier_Name, Supplier_Address,Country); end; set orion.product_list; rc=S.find(); if rc=0; run; p305d02

13 13... Partial PDV Supplier_ Name Supplier_ Address Country Product _ID Product_ Name Supplier _ID.. rc_N_. 1... D Partial HASH Object S KEY: Supplier _ID DATA: Supplier_ Name DATA: Supplier_ Address DATA: Country 50 Scandinavian Clothing A/S Kr. Augusts Gate 13 NO 109Petterson AB Blasieh- olmstorg 1 SE 316 Prime Sports Ltd 9 Carlisle Place GB........................ 3298A Team Sports 2687 Julie Ann Ct US data supplier_info; drop rc; length Supplier_Name $40 Supplier_Address $ 45 Country $ 2; if _N_=1 then do; declare hash S(dataset:'orion.supplier'); S.definekey('Supplier_ID'); S.definedata('Supplier_Name', 'Supplier_Address', 'Country'); S.definedone(); call missing(Supplier_Name, Supplier_Address, Country); end; set orion.product_list; rc=S.find(); if rc=0; run; Execution

14 14... Partial PDV Supplier_ Name Supplier_ Address Country Product _ID Product_ Name Supplier _ID 210200100009 Kids Sweat Round Neck,Large Logo 3298 rc_N_ 06... D Partial HASH Object S KEY: Supplier _ID DATA: Supplier_ Name DATA: Supplier_ Address DATA: Country 50 Scandinavian Clothing A/S Kr. Augusts Gate 13 NO 109Petterson AB Blasieh- olmstorg 1 SE 316 Prime Sports Ltd 9 Carlisle Place GB........................ 3298A Team Sports 2687 Julie Ann Ct US data supplier_info; drop rc; length Supplier_Name $40 Supplier_Address $ 45 Country $ 2; if _N_=1 then do; declare hash S(dataset:'orion.supplier'); S.definekey('Supplier_ID'); S.definedata('Supplier_Name', 'Supplier_Address', 'Country'); S.definedone(); call missing(Supplier_Name, Supplier_Address, Country); end; set orion.product_list; rc=S.find(); if rc=0; run; Execution

15 15... Partial PDV Supplier_ Name Supplier_ Address Country Product _ID Product_ Name Supplier _ID A Team Sports 2687 Julie Ann Ct US 210200100009 Kids Sweat Round Neck,Large Logo 3298 rc_N_ 06... D Partial HASH Object S KEY: Supplier _ID DATA: Supplier_ Name DATA: Supplier_ Address DATA: Country 50 Scandinavian Clothing A/S Kr. Augusts Gate 13 NO 109Petterson AB Blasieh- olmstorg 1 SE 316 Prime Sports Ltd 9 Carlisle Place GB........................ 3298A Team Sports 2687 Julie Ann Ct US data supplier_info; drop rc; length Supplier_Name $40 Supplier_Address $ 45 Country $ 2; if _N_=1 then do; declare hash S(dataset:'orion.supplier'); S.definekey('Supplier_ID'); S.definedata('Supplier_Name', 'Supplier_Address', 'Country'); S.definedone(); call missing(Supplier_Name, Supplier_Address, Country); end; set orion.product_list; rc=S.find(); if rc=0; run; Execution

16 16... Partial PDV Supplier_ Name Supplier_ Address Country Product _ID Product_ Name Supplier _ID A Team Sports 2687 Julie Ann Ct US 210200100009 Kids Sweat Round Neck,Large Logo 3298 rc_N_ 06... D Partial HASH Object S KEY: Supplier _ID DATA: Supplier_ Name DATA: Supplier_ Address DATA: Country 50 Scandinavian Clothing A/S Kr. Augusts Gate 13 NO 109Petterson AB Blasieh- olmstorg 1 SE 316 Prime Sports Ltd 9 Carlisle Place GB........................ 3298A Team Sports 2687 Julie Ann Ct US data supplier_info; drop rc; length Supplier_Name $40 Supplier_Address $ 45 Country $ 2; if _N_=1 then do; declare hash S(dataset:'orion.supplier'); S.definekey('Supplier_ID'); S.definedata('Supplier_Name', 'Supplier_Address', 'Country'); S.definedone(); call missing(Supplier_Name, Supplier_Address, Country); end; set orion.product_list; rc=S.find(); if rc=0; run; Execution True

17 17... Partial PDV Supplier_ Name Supplier_ Address Country Product _ID Product_ Name Supplier _ID A Team Sports 2687 Julie Ann Ct US 210200100009 Kids Sweat Round Neck,Large Logo 3298 rc_N_ 06... D Partial HASH Object S KEY: Supplier _ID DATA: Supplier_ Name DATA: Supplier_ Address DATA: Country 50 Scandinavian Clothing A/S Kr. Augusts Gate 13 NO 109Petterson AB Blasieh- olmstorg 1 SE 316 Prime Sports Ltd 9 Carlisle Place GB........................ 3298A Team Sports 2687 Julie Ann Ct US data supplier_info; drop rc; length Supplier_Name $40 Supplier_Address $ 45 Country $ 2; if _N_=1 then do; declare hash S(dataset:'orion.supplier'); S.definekey('Supplier_ID'); S.definedata('Supplier_Name', 'Supplier_Address', 'Country'); S.definedone(); call missing(Supplier_Name, Supplier_Address, Country); end; set orion.product_list; rc=S.find(); if rc=0; run; Execution Implicit OUTPUT; Implicit RETURN;

18 18... Partial PDV Supplier_ Name Supplier_ Address Country Product _ID Product_ Name Supplier _ID A Team Sports 2687 Julie Ann Ct US 210200100009 Kids Sweat Round Neck,Large Logo 3298 rc_N_ 06... D Partial HASH Object S KEY: Supplier _ID DATA: Supplier_ Name DATA: Supplier_ Address DATA: Country 50 Scandinavian Clothing A/S Kr. Augusts Gate 13 NO 109Petterson AB Blasieh- olmstorg 1 SE 316 Prime Sports Ltd 9 Carlisle Place GB........................ 3298A Team Sports 2687 Julie Ann Ct US data supplier_info; drop rc; length Supplier_Name $40 Supplier_Address $ 45 Country $ 2; if _N_=1 then do; declare hash S(dataset:'orion.supplier'); S.definekey('Supplier_ID'); S.definedata('Supplier_Name', 'Supplier_Address', 'Country'); S.definedone(); call missing(Supplier_Name, Supplier_Address, Country); end; set orion.product_list; rc=S.find(); if rc=0; run; Execution Continue until EOF

19 19 Results proc print data=supplier_info(obs=10); var Product_ID Supplier_ID Supplier_Name Supplier_Address Country; title "Product Information"; run; Product Information Obs Product_ID Supplier_ID Supplier_Name Supplier_Address Country 1 210200100009 3298 A Team Sports 2687 Julie Ann Ct US 2 210200100017 3298 A Team Sports 2687 Julie Ann Ct US 3 210200200022 6153 Nautlius SportsWear Inc 56 Bagwell Ave US 4 210200200023 6153 Nautlius SportsWear Inc 56 Bagwell Ave US 5 210200300006 1303 Eclipse Inc 1218 Carriole Ct US 6 210200300007 1303 Eclipse Inc 1218 Carriole Ct US 7 210200300052 1303 Eclipse Inc 1218 Carriole Ct US 8 210200400020 1303 Eclipse Inc 1218 Carriole Ct US 9 210200400070 1303 Eclipse Inc 1218 Carriole Ct US 10 210200500002 772 AllSeasons Outdoor Clothing 553 Cliffview Dr US Partial PROC PRINT Output

20 20 Could I do the same thing with a MERGE ? Yes. But …… Would have to sort both tables. Reading from disk is slower than reading from memory.

21 21 What about data size ? Scalability of Table Lookup Techniques, Rick Langston http://support.sas.com/resources/papers/proceedi ngs09/037-2009.pdf http://support.sas.com/resources/papers/proceedi ngs09/037-2009.pdf Compared Hash table, Sort/Merge, Indexing, Proc SQL and Proc Format as table lookup techniques. Hash object processing was successful up to around 1,900,000 rows and then ran out of memory.

22 22 Did you know that…. PROC SQL sometimes uses hashing to join tables. Possible processing methods are: sqxjsl - Step Loop Join (Cartesian product) sqxjm - Merge Join sqxjndx- Index Join sqxjhsh- Hash Join To view the method used: Proc sql _method;

23 23 The HITER object The HITER object must point to a HASH object. Read the HITER using the following methods.

24 Copyright © 2008, SAS Institute Inc. All rights reserved. Conclusion  Hash and Hiter objects are very flexible.  Data has to fit into memory.  Results will depend on your data, your environment, and what you are trying to do.  You have to benchmark.

25 Copyright © 2008, SAS Institute Inc. All rights reserved. 25 Want to know more? SAS Programming III: Advanced Techniques and Efficiencies https://support.sas.com/edu/schedules.html?ctry=ca&id=279 Also available as Live Web course.

26 Copyright © 2008, SAS Institute Inc. All rights reserved. Questions?


Download ppt "Copyright © 2008, SAS Institute Inc. All rights reserved. Hash Objects – Why Use Them? Carolyn Cunnison SAS Technical Training Specialist."

Similar presentations


Ads by Google