
1 Summary
HRP223 – 2009
November 1st, 2010
Copyright © 1999-2010 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

2 You know …
How to create a table from scratch
How to import tables
– from external sources like Excel, or using export/import code from databases
How to create tables
– from a single existing table: with selected variables, with recoded variables, with or without subsets of the records
– from multiple tables: by adding columns (joins) or by adding sets of records (set operators)
With code or GUI

3 Create a New Table
The GUI is the easiest. Look in the optional textbooks for the class to learn the syntax for code.
$ means a character string. 10. means 10 letters wide. The age variable starts in column 11.
Missing numbers are just a period (.). Missing characters are just spaces (not tabs).
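A minimal sketch of the kind of code this slide describes, assuming a data set called people with a 10-column name followed by age (both names are illustrative, not taken from the class materials):

```sas
data people;
    /* name is character ($) and 10 columns wide, so age starts in column 11 */
    input name $10. age;
    datalines;
Andrew    12
Beth      .
Catherine 35
;
run;

proc print data=people;
run;
```

The period on the second data line is read as a missing age.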

4 Importing
The most bulletproof way to import is to use the import wizard. You can also write a program with proc import.
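A hedged sketch of what a proc import program can look like. The file path, sheet name, and output table name are hypothetical, and dbms=xlsx assumes a SAS installation that licenses SAS/ACCESS Interface to PC Files; older installations may need a different dbms= value.

```sas
proc import datafile="C:\projects\hrp223\patients.xlsx"  /* hypothetical path */
            out=work.patients                            /* hypothetical table name */
            dbms=xlsx
            replace;
    sheet="Sheet1";   /* hypothetical worksheet name */
    getnames=yes;     /* use the first row as variable names */
run;
```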

5 Code
If you write any code, be sure to load my keyboard macros. Once you have a program node open in a flowchart, you add the macros to both SAS and EG by using the Program menu. The import macro gives you the shell to import Excel files.

6 From a Database
If you load data that came with an import/export program, you will probably need to add the path to the infile statement.

7 Importing Advice
It is a good idea to import the source into a permanent library.
After importing, use the Query Builder or a Program node and copy all the variables into a new data set. This node can be tweaked later to fix the problems that you identify.
– If you do not do this, you will later have to re-point the links so that the analyses read from the cleaned/fixed data.

8 Creating New Datasets From 1 Table
Name the query and new table. Drag the entire table or individual variables to the Select Data pane. In the Select Data pane, pick variables, then click the properties button.

9 Changing a Variable
Computed Columns > New… > Recoded column > pick a variable. Notice the other tabs for selecting what to change to a new value.
SAS allows 28 different types of missing numbers: .A through .Z, ._ (underscore), and the ordinary period (.).

10 Bad Ages Recoded to NULL
If you get data from a program that uses bogus numbers to indicate problems in a numeric field, replace the values with different NULL values (.A, .B, etc.). When you do descriptive statistics, the null values will be automatically excluded.
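The same recoding can be sketched in code. This example assumes, purely for illustration, that the source program used -9 and 999 as its bogus ages; the data set names and the meanings attached to .A and .B are hypothetical.

```sas
data ages_raw;
    input id age;
    datalines;
1 34
2 -9
3 999
4 51
;
run;

data ages_clean;
    set ages_raw;
    if age = -9 then age = .A;         /* .A: hypothetical code for "refused" */
    else if age = 999 then age = .B;   /* .B: hypothetical code for "unknown" */
run;

/* The special missing values are left out of the mean automatically */
proc means data=ages_clean n nmiss mean;
    var age;
run;
```

Only the two real ages (34 and 51) contribute to the mean; the two recoded records are counted as missing.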

11 Removing/Choosing Records
Right click on the variables you want to use for dropping records or use the Filter Data tab.

12

13 Advanced Changes Comparisons
You can use the Advanced Expression dialog box to do complex tasks like editing and combining text variables.
– catt(), lowcase(), compress(), compbl()
SAS has built-in regular expression processing (like Perl) as well as Soundex for phonetic spellings and (Levenshtein) edit distances for measuring dissimilarity between strings.
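A small sketch of these functions in a data step. The variable names and sample strings are made up for illustration; soundex(), complev(), and prxmatch() are the SAS functions for phonetic codes, Levenshtein distance, and Perl-style pattern matching.

```sas
data cleaned;
    length raw $20 tidy $20;
    raw  = "TINKY    Winkey";
    tidy = lowcase(compbl(raw));            /* collapse repeated blanks, then lowercase */
    joined   = catt("Tinky ", "Winkey");    /* trims trailing blanks before concatenating */
    noBlanks = compress(raw);               /* removes every blank */
    sameSound = (soundex("Tinkywinkey") = soundex("Tinky Winkey"));  /* 1 if the phonetic codes match */
    editDist  = complev("Tinkywinkey", "Tinky Winkey");              /* Levenshtein edit distance */
    hasDigit  = prxmatch("/\d/", "Po2");    /* position of the first digit, 0 if none */
run;

proc print data=cleaned;
run;
```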

14 Working with Several Tables
Joins add columns to a base table.
Set operations add (or subtract) records.
[Figures: Table 1 + Table 2 → New Table, one diagram for each case]

15 Commonly Used Joins
Inner Join (Table 1 + Table 2 → New Table): Keep only records where you can match IDs in both tables.
Left Join (Table 1 + Table 2 → New Table): Keep all records from the left table and matching records from the right. Use NULL for the right table variables in the unmatched records.
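A minimal sketch of both joins in PROC SQL, using two small made-up tables (demog and labs, keyed on id):

```sas
data demog;
    input id name $;
    datalines;
1 Ann
2 Bob
3 Cy
;
run;

data labs;
    input id glucose;
    datalines;
1 90
3 110
4 95
;
run;

proc sql;
    /* Inner join: only ids found in both tables (1 and 3) */
    create table inner_tbl as
    select d.id, d.name, l.glucose
    from demog as d
         inner join labs as l
         on d.id = l.id;

    /* Left join: every demog record; glucose is missing for id 2 */
    create table left_tbl as
    select d.id, d.name, l.glucose
    from demog as d
         left join labs as l
         on d.id = l.id;
quit;
```

With these inputs inner_tbl has 2 records and left_tbl has 3, which is the kind of size check worth doing after any join.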

16 One to Many Joins
All of the SQL joins that I have mentioned work with either a 1 to 1 match of key variables across tables or a 1 to many match. But you need to be cognizant of how many records are in each table. Double check the new table size.
[Figures: inner join and left join examples]

17 Cartesian Joins
If the same key value is duplicated in both tables and you do not join on a second variable, SQL will multiply the combinations: for each key value you end up with the product of the number of matching records in each table.
[Figure: Inner Join on Family]
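A sketch of the problem with made-up tables, assuming family is the only join key and is repeated in both tables:

```sas
data parents;
    input family $ parent $;
    datalines;
Smith Ann
Smith Bob
;
run;

data kids;
    input family $ kid $;
    datalines;
Smith Cal
Smith Dee
Smith Eve
;
run;

proc sql;
    /* 2 parents x 3 kids with the same family value gives 6 records */
    create table family_join as
    select p.family, p.parent, k.kid
    from parents as p
         inner join kids as k
         on p.family = k.family;

    select count(*) as n_rows from family_join;
quit;
```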

18 PROC SQL - Set Operators
NO GUI
Outer Union Corresponding – concatenates
Union – unique rows from both queries
Except – rows that are part of the first query but not the second
Intersect – rows common to both queries
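The four operators can be compared side by side on two tiny made-up tables (t1 holds ids 1, 2, 3 and t2 holds ids 2, 3, 4):

```sas
data t1;
    input id;
    datalines;
1
2
3
;
run;

data t2;
    input id;
    datalines;
2
3
4
;
run;

proc sql;
    /* concatenate: all 6 records, columns matched by name */
    create table stacked as
    select * from t1 outer union corr select * from t2;

    /* unique rows from both queries: 1, 2, 3, 4 */
    create table uniq as
    select * from t1 union select * from t2;

    /* rows in the first query that are not in the second: 1 */
    create table only_t1 as
    select * from t1 except select * from t2;

    /* rows common to both queries: 2, 3 */
    create table in_both as
    select * from t1 intersect select * from t2;
quit;
```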

19 How does a data step typically work?
The data statement says make this (or these) data set(s).
1. SAS then reads every line down to the run statement and gathers a list of all variables used. This list is called the program data vector (PDV).
2. It then sets all the variables to missing.

20 How does a data step typically work?
3. It then does the instruction listed on each line of the data step program in the order that the lines are written.
4. Then it writes all the variables out to the new dataset.
5. It then repeats the process if there is more data.

21 How SAS Processes a Dataset (1)
In the example below, SAS will look in the existing dataset called Teletubbies and it will find two variables, teletubby and thing. Then it will find the variable called kid. Then it will do the instructions in order.

    data Teletubbies2;     *name of a new data set;
        set Teletubbies;   *load 1 observation of data;
        kid = "Andrew";    *fill in the blank;
        output;            *write the variables to teletubbies2;
        return;            *return to the top of the step;
    run;                   *end of these instructions;

22 The Set Statement

    set Teletubbies;

This line tells SAS to load one row of data from the data set Teletubbies into the PDV. The first time this line is run, the first row of data is loaded into the PDV. When there is no more data to load, the data step is done.

23 Variable Assignment
In the example the word Andrew is assigned to the variable kid. All variables are assigned from the right side into the variable named on the left.

    kid = "Andrew";

If a variable appears on both the left and right side of an equal sign, the original value is read on the right, the expression is evaluated, and the new value is written to the variable on the left.

    aNumber = aNumber + 4;

Assignment goes from right to left: the original value is on the right and the new value ends up on the left.

24 How SAS Processes a Dataset (2)
If you do not include the output and return statements, SAS will do them automatically. So, the previous data step would typically be written like this.

    data Teletubbies2;
        set Teletubbies;
        kid = "Andrew";
    run;

25 How SAS Processes a Dataset (3)
If, if-else, or select statements are typically used to conditionally assign values in a data step.
– if: one possibility
– if else: two possibilities
– select when otherwise end: multiple possibilities

26 Error Trapping
“Tinkywinkey” is not “Tinky Winkey” … Bad Teletubby.
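A sketch of the three conditional forms, with the otherwise branch of select used to trap a misspelled name. The sample data, the new variables (isSmallest, hasHat, color), and the log message are all invented for illustration.

```sas
data Teletubbies;
    input teletubby $12.;   /* read 12 columns so the embedded blank is kept */
    datalines;
Tinky Winkey
Dipsy
Laa-Laa
Po
Tinkywinkey
;
run;

data checked;
    set Teletubbies;
    length color $8;

    /* if: one possibility */
    if teletubby = "Po" then isSmallest = 1;

    /* if-else: two possibilities */
    if teletubby = "Dipsy" then hasHat = 1;
    else hasHat = 0;

    /* select: multiple possibilities; otherwise traps anything unexpected */
    select (teletubby);
        when ("Tinky Winkey") color = "purple";
        when ("Dipsy")        color = "green";
        when ("Laa-Laa")      color = "yellow";
        when ("Po")           color = "red";
        otherwise do;
            color = "";
            put "ERROR: Bad Teletubby: " teletubby=;
        end;
    end;
run;
```

The last record ("Tinkywinkey") matches none of the when clauses, so it falls through to otherwise and a message is written to the log.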

27 Test Your Understanding

    data test3a test3b;
        set source;
        if isMale = 1 then output test3a;
        hasCancer = 1;
        output test3b;
    run;

28 Common Ground … where
Both SQL and data step programming use where statements to select what records are included in the new dataset.
With data steps, the variables used in the where statement need to already exist in the source file. Use if to check variables created in the data step.
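A minimal sketch of the distinction, using a made-up source table: where filters on a variable that already exists in the source, while a subsetting if filters on a variable computed inside the step.

```sas
data source;
    input id weightKg heightM;
    datalines;
1 70 1.75
2 95 1.60
3 50 1.80
;
run;

data highBMI;
    set source;
    where weightKg > 60;             /* weightKg already exists in source */
    bmi = weightKg / heightM**2;     /* bmi is created in this step */
    if bmi >= 30;                    /* so it has to be checked with if */
run;
```

Record 3 is dropped by where before it is ever loaded, and record 1 is dropped by the if, leaving only record 2.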

29 where
The syntax for where is identical in SQL and data steps.
Differences vs. if statements (main points work in where only; sub points work in either):
– x between y and z
      x >= y and x <= z
      y <= x <= z
– string1 ? string2 or string1 contains string2
      index(string1, string2) > 0
– string1 =* string2
      soundex(string1) = soundex(string2)
– x is null or x is missing
      missing(x)
– string1 like "U%of%A%"
      use regular expressions (PRX)

30 where Syntax
The where statement, like all SAS statements, begins with a keyword (where) and ends in a semicolon.
– where isDead = "false";
– where isDead ne "true";
– where missing(gender);
– where salary > 100000;
– where country in ("USA", "Japan", "UK");
– where country in ("USA" "Japan" "UK");

31 where Syntax
Arithmetic
– where salary/12 > 10000;
– where (salary /12) * 1.20 ge 9900;
– where salary + bonus < 120000;
Logical
– where gender ne "M" and salary >= 50000;
– where gender ne "M" or salary >= 50000;
– where country = "UK" or country = "UTAH";
– where country not in ("USA", "AU");

32 Make Decisions
SAS has many operations available to help you make decisions.
Comparisons: = eq, ~= ne, > gt, < lt, >= ge, <= le, in ( )
Logical: not, & and, | or
– not requires the expression following it to not be true.
– & requires both operands to be true.
– | requires at least one operand to be true.
– in ( ) requires at least one comparison to be true.
Math operations: + - * / **

33 Logical Decisions & Compound Expressions
Common tests and common problems:

    where YODeath < YOBirth;
    where Sex = "M" and numPreg > 0;
    where Sex="M" and numPreg > 0 or ageLMP > 0;      *** bad ***;
    where Sex="M" and (numPreg > 0 or ageLMP > 0);    *** good ***;

– Moral: Use parentheses generously with ands and ors.
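The difference between the bad and good versions comes from and being evaluated before or. A sketch with four made-up records shows the two where statements selecting different rows:

```sas
data people;
    input id Sex $ numPreg ageLMP;
    datalines;
1 M 0 0
2 F 2 0
3 F 0 13
4 M 1 0
;
run;

/* Read as (Sex="M" and numPreg > 0) or ageLMP > 0: keeps ids 3 and 4 */
data bad;
    set people;
    where Sex="M" and numPreg > 0 or ageLMP > 0;
run;

/* Parentheses force the intended test: keeps only id 4 */
data good;
    set people;
    where Sex="M" and (numPreg > 0 or ageLMP > 0);
run;
```

Without parentheses the female record with ageLMP > 0 slips into a check that was meant to look only at males.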

34 Where is everywhere

35 Numeric Data and Looping
Say somebody tells you to simulate rolling dice. The formula to do this says:
– generate a random number between 0 and 1
– multiply it by 6
– round up to the next integer

    data die;
        *the 22 says which list of numbers between 0 & 1;
        aNumber = ranuni(22);
        die = ceil(6*aNumber);
        * Generate a random integer between 1 and 6.;
        dieDie = ceil(6*ranuni(78687632));
        output;    * write to the new dataset;
        return;    * go to the top and try to read in data;
    run;

36 Doing Stuff Repeatedly
How to roll two dice:

    data dice;
        do x = 1 to 2 by 1;
            roll = ceil(6*ranuni(78687632));
            output;
        end;
        return;    * go to the top and try to read in data;
    run;

37 Craps…
In the dice game “craps” you throw two dice and the number you roll determines if you win or lose. How do you simulate rolling 10 pairs of dice?

    data craps;
        do trial = 1 to 10;            * one pass per pair of dice;
            do dieNumber = 1 to 2;     * one pass per die;
                roll = ceil(6*ranuni(78687632));
                output;
            end;
        end;                           * closes the trial loop;
        return;
    run;

38 Summing
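A minimal sketch of one way to sum each pair of dice, not necessarily the approach used in class. It reuses the craps loop, adds a hypothetical total variable, and writes one record per trial instead of one per die.

```sas
data crapsTotals;
    do trial = 1 to 10;
        total = 0;                         /* reset the sum for each pair */
        do dieNumber = 1 to 2;
            roll = ceil(6*ranuni(78687632));
            total = total + roll;          /* running sum of the two dice */
        end;
        output;                            /* one record per trial */
    end;
    keep trial total;
run;
```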

