Chapter 4: Introduction to Lookup Techniques

4.1 Introduction to Lookup Techniques
4.2 In-Memory Lookup Techniques
4.3 Disk Storage Techniques


4.1 Introduction to Lookup Techniques

Objectives
- Define table lookup.
- List table lookup techniques.

Table Lookups
Lookup values for a table lookup can be stored in the following:
- array
- hash object
- format
- data set
Lookup techniques include the following:
- array subscript value
- hash object key value
- FORMAT statement, PUT function
- MERGE, SET/SET, join
[Diagram: a lookup matches data values against lookup values.]


4.01 Multiple Choice Poll
Which of these is an example of a table lookup?
a. You have the data for January sales in one data set, February sales in a second data set, and March sales in a third. You need to create a report for the entire first quarter.
b. You want to send birthday cards to employees. The employees' names and addresses are in one data set and their birthdates are in another.
c. You need to calculate the amount each customer owes for purchases. The price per item and the number of items purchased are stored in the same data set.

4.01 Multiple Choice Poll – Correct Answer
Which of these is an example of a table lookup?
b. You want to send birthday cards to employees. The employees' names and addresses are in one data set and their birthdates are in another. (Each birthdate must be looked up in a second data set by employee.)

Overview of Table Lookup Techniques
Arrays, hash objects, and formats provide an in-memory lookup table. The DATA step MERGE statement, multiple SET statements in the DATA step, and SQL procedure joins use lookup values that are stored on disk.

4.2 In-Memory Lookup Techniques

Objectives
- Describe arrays as a lookup technique.
- Describe hash objects as a lookup technique.
- Describe formats as a lookup technique.


4.02 Multiple Answer Poll
Which techniques do you currently use when you perform table lookups with a single data set?
a. Arrays
b. Hash object
c. Formats
d. None of the above

Overview of Arrays
An array is similar to a numbered row of buckets (1, 2, 3, 4, ...).
- SAS puts a value in a bucket based on the bucket number.
- A value is retrieved from a bucket based on the bucket number.

Overview of Arrays
General form of the ARRAY statement:

    DATA data-set-name;
       ARRAY array-name{subscript} <$> <length>
             <array-elements> <(initial-value-list)>;
       READ statement;
       new-variable=array-name{subscript-value};
    RUN;

The READ statement can be the SET, MERGE, or INFILE/INPUT statement.
The ARRAY statement associates variables or initial values to be retrieved using the array name and a subscript value.
The assignment statement retrieves values from the array based on the value of the subscript.

Overview of Arrays (program p304d01)

    data country_info;
       array Cont_Name{91:96} $ 30 _temporary_
             ('North America', ' ', 'Europe', 'Africa',
              'Asia', 'Australia/Pacific');
       set orion.country;
       Continent=Cont_Name{Continent_ID};
    run;

The ARRAY statement associates variables or initial values to be retrieved using the array name and a subscript value.
The assignment statement retrieves values from the array based on the value of the subscript.
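An array lookup fails with a subscript-out-of-range error if the key falls outside the declared bounds. The following is a minimal sketch, not part of the course program, of one way to guard the lookup with the LBOUND and HBOUND functions. The work data set country, its three rows, and the label 'Unknown' are assumptions made up for illustration; they stand in for orion.country.

```sas
/* Hypothetical stand-in for orion.country (illustrative values only) */
data country;
   input Country $ Continent_ID;
   datalines;
AU 96
CA 91
ZZ 99
;
run;

data country_info;
   array Cont_Name{91:96} $ 30 _temporary_
         ('North America', ' ', 'Europe', 'Africa',
          'Asia', 'Australia/Pacific');
   set country;
   length Continent $ 30;
   /* Subscript only when the key is inside the array bounds */
   if lbound(Cont_Name) <= Continent_ID <= hbound(Cont_Name) then
      Continent=Cont_Name{Continent_ID};
   else Continent='Unknown';   /* assumed fallback label */
run;
```

A missing Continent_ID also fails the range test, so it takes the fallback branch instead of stopping the step.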


4.03 Multiple Choice Poll
In p304d01, how many elements are in the array Cont_Name?
a. 0
b. 5
c. 6
d. unknown

4.03 Multiple Choice Poll – Correct Answer
In p304d01, how many elements are in the array Cont_Name?
c. 6 (the subscript range 91:96 spans six elements)

Overview of a Hash Object
A hash object is similar to rows of buckets (Key, Data) that are identified by the value of a key.
- SAS puts value(s) in the data bucket(s) based on the value(s) in the key bucket.
- Value(s) are retrieved from the data bucket(s) based on the value(s) in the key bucket.

Overview of Hash Objects
General form of the hash object:

    DATA data-set-name;
       IF _N_=1 THEN DO;
          DECLARE HASH object-name();
          object-name.DEFINEKEY('key-name');
          object-name.DEFINEDATA('data-name');
          object-name.DEFINEDONE();
       END;
       READ statement;
       return-code=object-name.FIND();
    RUN;

The READ statement can be the SET, MERGE, or INFILE/INPUT statement.
The syntax within the DO group defines and can populate the hash object.
The FIND method retrieves the data value based on the key value.

Overview of Hash Objects (program p304d02)

    data country_info;
       length Continent_Name $ 30;
       if _N_=1 then do;
          declare hash Cont_Name(dataset:'orion.continent');
          Cont_Name.definekey('Continent_ID');
          Cont_Name.definedata('Continent_Name');
          Cont_Name.definedone();
       end;
       set orion.country;
       rc=Cont_Name.find(key:Continent_ID);
       if rc=0;
    run;

The syntax within the DO group defines and populates the hash object.
The FIND method retrieves the data value based on the key value.
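In p304d02 the subsetting IF (if rc=0;) drops every observation whose key is not found in the hash object. The sketch below shows one possible variation that keeps unmatched observations and labels them instead. The work data sets continent and country, their values, and the label 'Unknown' are assumptions invented for illustration, standing in for the orion tables.

```sas
/* Hypothetical stand-ins for orion.continent and orion.country */
data continent;
   infile datalines dlm=',';
   input Continent_ID Continent_Name :$30.;
   datalines;
91,North America
93,Europe
;
run;

data country;
   input Country $ Continent_ID;
   datalines;
CA 91
DE 93
ZZ 99
;
run;

data country_info;
   length Continent_Name $ 30;
   if _N_=1 then do;
      /* Load the lookup table into memory once */
      declare hash Cont_Name(dataset:'continent');
      Cont_Name.definekey('Continent_ID');
      Cont_Name.definedata('Continent_Name');
      Cont_Name.definedone();
   end;
   set country;
   rc=Cont_Name.find(key:Continent_ID);
   /* A nonzero return code means the key was not found */
   if rc ne 0 then Continent_Name='Unknown';
   drop rc;
run;
```

Whether to drop or keep unmatched rows depends on the report; the course program drops them.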


4.04 Multiple Choice Poll
In p304d02, how many times do the statements in the DO group execute?
a. only once
b. once for every observation in the data set orion.country
c. once for every observation in the data set orion.continent

4.04 Multiple Choice Poll – Correct Answer
In p304d02, how many times do the statements in the DO group execute?
a. only once (the condition _N_=1 is true only on the first iteration of the DATA step)

Overview of a Format
A format is similar to rows of buckets (Data Value, Label) that are identified by the data value.
- SAS puts data values and label values in the buckets when the format is used in a FORMAT statement, PUT function, or PUT statement.
- SAS uses a binary search on the data value bucket in order to return the value in the label bucket.

Overview of a Format
General form of the user-defined format:

    PROC FORMAT;
       VALUE fmtname range-1=label-1
                     ...
                     range-n=label-n;
    RUN;

    DATA data-set-name;
       READ statement;
       new-variable=PUT(variable,fmtname.);
    RUN;

The READ statement can be the SET, MERGE, or INFILE/INPUT statement.
The PROC FORMAT step compiles the format and stores it on disk.
When the PUT function executes, the format is loaded into memory, and a binary search is used to retrieve the format value.

Overview of a Format (program p304d03)

    proc format;
       value Cont_Name 91='North America'
                       93='Europe'
                       94='Africa'
                       95='Asia'
                       96='Australia/Pacific';
    run;

    data country_info;
       set orion.country;
       Continent=put(Continent_ID,Cont_Name.);
    run;

The PROC FORMAT step compiles the format and stores it on disk.
When the PUT function executes, the format is loaded into memory, and a binary search is used to retrieve the format value.
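A value that matches no range in a user-defined format is returned unformatted by the PUT function. As a hedged sketch of one way to handle that, the VALUE statement accepts the keyword OTHER as a catch-all range. The work data set country, its rows, and the label 'Unknown' are assumptions for illustration only.

```sas
proc format;
   value Cont_Name 91='North America'
                   93='Europe'
                   94='Africa'
                   95='Asia'
                   96='Australia/Pacific'
                   other='Unknown';   /* assumed fallback label */
run;

/* Hypothetical stand-in for orion.country */
data country;
   input Country $ Continent_ID;
   datalines;
AU 96
CA 91
ZZ 99
;
run;

data country_info;
   set country;
   length Continent $ 30;
   /* PUT returns the label for the matching range, or the OTHER label */
   Continent=put(Continent_ID,Cont_Name.);
run;
```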

4.3 Disk Storage Techniques

Objectives
- List methods for combining data horizontally.
- Use multiple SET statements to combine data horizontally.
- Compare methods for combining SAS data sets.

Combining Data Horizontally
DATA step techniques for combining data horizontally include using the following:
- MERGE statement
- multiple SET statements
- UPDATE statement
- MODIFY statement
In addition, you can use the SQL procedure with an inner or outer join.


4.05 Multiple Answer Poll
Which techniques do you currently use when you perform table lookups with multiple data sets?
a. MERGE statement
b. Joins
c. Multiple SET statements
d. UPDATE statement
e. MODIFY statement
f. None of the above

Overview of Merges and Joins
The DATA step MERGE statement and the SQL join operators are similar to multiple stacks of buckets that are referred to by the value of one or more common variables.
[Diagram: two stacks of buckets, each with BY value(s) and data.]

DATA Step MERGE Statement
General form of the DATA step merge:

    DATA data-set-name;
       MERGE SAS-data-sets;
       BY variables;
    RUN;

The merge matches on equal values for like-named variables (for example, Continent_ID).

DATA Step MERGE Statement (program p304d04)

    proc sort data=orion.country out=country;
       by Continent_ID;
    run;

    data country_info;
       merge country orion.continent;
       by Continent_ID;
    run;

The merge matches on equal values for like-named variables.
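A match-merge writes an observation for every BY group, including groups that appear in only one input data set. The sketch below illustrates one common way to keep only the matches, using the IN= data set option. The work data sets, their values, and the variable names InCountry and InContinent are assumptions chosen for illustration.

```sas
/* Hypothetical stand-ins, already in Continent_ID order */
data country;
   input Country $ Continent_ID;
   datalines;
CA 91
DE 93
ZZ 99
;
run;

data continent;
   infile datalines dlm=',';
   input Continent_ID Continent_Name :$30.;
   datalines;
91,North America
93,Europe
94,Africa
;
run;

data country_info;
   /* IN= creates temporary flags: 1 when the data set contributes */
   merge country(in=InCountry) continent(in=InContinent);
   by Continent_ID;
   /* Keep only BY groups found in both data sets */
   if InCountry and InContinent;
run;
```

Changing the subsetting IF to "if InCountry;" would keep every country and leave Continent_Name missing where no continent matches.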


4.06 Multiple Choice Poll
In p304d04, if the data set country has seven observations and the data set orion.continent has five observations, what stops the execution of the DATA step?
a. end of file for work.country, the data set with the most observations
b. end of file for orion.continent, the last data set listed in the MERGE statement
c. end of file for the data set that contains the final value of the BY variable Continent_ID

4.06 Multiple Choice Poll – Correct Answer
In p304d04, if the data set country has seven observations and the data set orion.continent has five observations, what stops the execution of the DATA step?
c. end of file for the data set that contains the final value of the BY variable Continent_ID (a match-merge stops at the last end of file that is encountered)

The SQL Procedure
You can use an SQL procedure inner or outer join to create a SAS data set.
General form of the SQL procedure CREATE TABLE statement with an inner join:

    PROC SQL;
       CREATE TABLE SAS-data-set AS
          SELECT column-1, column-2,…,column-n
             FROM table-1, table-2,…,table-n
             WHERE joining criteria
             ORDER BY sorting criteria;
    QUIT;

This form performs an inner join based on the WHERE criteria.

The SQL Procedure (program p304d05)

    proc sql;
       create table country_info as
          select country.*, Continent_Name
             from orion.country, orion.continent
             where country.Continent_ID=continent.Continent_ID
             order by country.Continent_ID;
    quit;

The step performs an inner join where the Continent_ID values from both data sets are equal.
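The inner join returns only rows with matching Continent_ID values. As a sketch of the outer-join alternative mentioned earlier in this section, a left join keeps every country and leaves Continent_Name missing where there is no match. The work tables, their values, and the aliases c and n are assumptions for illustration, not the orion tables.

```sas
/* Hypothetical stand-ins for orion.country and orion.continent */
data country;
   input Country $ Continent_ID;
   datalines;
CA 91
DE 93
ZZ 99
;
run;

data continent;
   infile datalines dlm=',';
   input Continent_ID Continent_Name :$30.;
   datalines;
91,North America
93,Europe
;
run;

proc sql;
   create table country_info as
      select c.*, n.Continent_Name
         from country as c
              left join continent as n
              on c.Continent_ID = n.Continent_ID   /* join condition */
         order by c.Continent_ID;
quit;
```

Unlike the DATA step merge, the SQL join does not require the inputs to be sorted beforehand.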


4.07 Multiple Choice Poll
Which of the following is true of the SQL inner join?
a. The resulting data set contains only the observations with matching key values.
b. The resulting data set contains both the observations with matching key values and those observations where the key values do not match.

4.07 Multiple Choice Poll – Correct Answer
Which of the following is true of the SQL inner join?
a. The resulting data set contains only the observations with matching key values.

Multiple SET Statements
The DATA step with multiple SET statements combines data sets by performing one-to-one reading.

Multiple SET Statements
You can use multiple SET statements to combine observations from several SAS data sets.
When you use multiple SET statements, the following occurs:
- Processing stops when SAS encounters the end-of-file marker on either data set.
- The variables in the PDV are not reinitialized when a second SET statement is executed.

Multiple SET Statements
General form of the DATA step with multiple SET statements:

    DATA data-set-name;
       SET SAS-data-set-1;
       SET SAS-data-set-2;
    RUN;

Multiple SET Statements (program p304d06)

    data country_info;
       set orion.country;
       set orion.continent;
    run;

Listing of country_info

    Obs  Country  Country_Name  Population  Country_ID  Continent_ID  Country_FormerName  Continent_Name
    1    AU       Australia     20,000,000  160         91                                North America
    2    CA       Canada        .           260         93                                Europe
    3    DE       Germany       80,000,000  394         94            East/West Germany   Africa
    4    IL       Israel        5,000,000   475         95                                Asia
    5    TR       Turkey        70,000,000  905         96                                Australia/Pacific

Execution

Input data sets and program:

    one         two
    X  Y        Z
    1  2        A
    2  3        B
    3  4

    data three;
       set one;
       set two;
       Total=X+Y;
    run;

PDV contents after each step (_N_ is flagged D because it is dropped from the output data set):

    Step                                   X  Y  Z  Total  _N_
    Iteration 1: set one reads obs 1       1  2     .      1
    Iteration 1: set two reads obs 1       1  2  A  .      1
    Iteration 1: Total=X+Y                 1  2  A  3      1
    Implicit OUTPUT; implicit RETURN
    Iteration 2: initialize PDV            1  2  A  .      2
    Iteration 2: set one reads obs 2       2  3  A  .      2
    Iteration 2: set two reads obs 2       2  3  B  .      2
    Iteration 2: Total=X+Y                 2  3  B  5      2
    Implicit OUTPUT; implicit RETURN
    Iteration 3: initialize PDV            2  3  B  .      3
    Iteration 3: set one reads obs 3       3  4  B  .      3
    Iteration 3: set two reaches EOF       Processing stops.

When the PDV is initialized, only Total is reset to missing. X, Y, and Z were read with SET statements and keep their values.

Output data set three:

    X  Y  Z  Total
    1  2  A  3
    2  3  B  5
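The walk-through above can be reproduced with a short self-contained program. This is a sketch that assumes the data sets one and two are created in the WORK library with DATALINES; the slides do not show how they were built.

```sas
/* Data set one: three observations */
data one;
   input X Y;
   datalines;
1 2
2 3
3 4
;
run;

/* Data set two: two observations */
data two;
   input Z $;
   datalines;
A
B
;
run;

/* One-to-one reading: the step ends at the first end of file */
data three;
   set one;
   set two;
   Total=X+Y;
run;

proc print data=three;
run;
```

The listing should contain two observations, because the third iteration stops when the second SET statement reaches the end of data set two.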


Setup for the Poll
The previous example created a data set named three with two observations. Using the same one and two data sets, if the SET statements were reversed, how many observations would be in the data set three?

    one         two
    X  Y        Z
    1  2        A
    2  3        B
    3  4

Original program:

    data three;
       set one;
       set two;
       Total=X+Y;
    run;

Reversed program:

    data three;
       set two;
       set one;
       Total=X+Y;
    run;

4.08 Multiple Choice Poll
Using the same one and two data sets, if the SET statements were reversed, how many observations would be in the data set three?
a. 5
b. 2
c. 3
d. 6

4.08 Multiple Choice Poll – Correct Answer
Using the same one and two data sets, if the SET statements were reversed, how many observations would be in the data set three?
b. 2 (processing still stops at the first end of file, which is reached on data set two after two observations)

DATA Step Methods for Reading SAS Data

    Code                                                    | Which variables are reinitialized to missing at the top of the DATA step? | What stops the DATA step?
    data two; set one; New_Var=Value; run;                  | variables created in the DATA step | end of the file for data set one
    data three; merge one two; by Var; New_Var=Value; run;  | variables created in the DATA step; all variables when the BY value changes | the last end of file that is encountered
    data three; set one two; New_Var=Value; run;            | variables created in the DATA step; all variables when SAS finishes reading data set one and starts reading data set two | end of the file for data set two
    data three; set one; set two; New_Var=Value; run;       | variables created in the DATA step | the first end of file that is encountered

Chapter Review
1. What are the three types of in-memory table lookups?
2. What are three types of disk storage table lookups?
3. When multiple SET statements are executed, when does execution stop?

Chapter Review – Correct Answers
1. What are the three types of in-memory table lookups?
   arrays, hash objects, and formats
2. What are three types of disk storage table lookups?
   PROC SQL, the DATA step with a MERGE statement, or the DATA step with multiple SET statements
3. When multiple SET statements are executed, when does execution stop?
   Execution stops when the first end of file is encountered.

