Summary HRP223 – 2009 October 28, 2009 Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected.

1 Summary HRP223 – 2009 October 28, 2009 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

2 It is broken!!! Yesterday a student had the experience where right-clicking on nodes in the flowchart brought up the wrong menu and/or the nodes did not respond when she clicked on them. I was able to replicate the problem by doing Run Branch starting from data that did not exist (because it had been in the Work library and I had restarted the project). If this happens to you, try to get a screenshot so we can send it to SAS, and then use the Project Maintenance option on the Tools menu.

3 Don’t use the same dataset name
I have not replicated it yet, but I think you can also cause EG to come unglued if you create a dataset with the Query Builder on one process flowchart and then create a different dataset with the same name (different variables but the same name) on a second process flowchart.

4 You know …
How to create a table from scratch
How to import tables
  from external sources like Excel
  or using export/import code from databases
How to create tables
  from a single existing table
    with selected variables
    with recoded variables
    with or without subsets of the records
  from multiple tables
    by adding columns (joins)
    by adding sets of records (set operators)
With code or GUI

5 Create a New Table The GUI is the easiest.
To learn the syntax for code, look in the optional textbooks for the class. In the code, $ means a character variable. 10. means 10 characters wide. Missing numbers are written as just a period (.). Missing character values are just spaces (not tabs). The age variable starts in column 11.
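To make the column conventions concrete, here is a Python sketch (not SAS syntax) that reads a hypothetical fixed-column layout: a name in columns 1-10 and an age starting in column 11, with a lone period meaning a missing number. The helper name read_fixed and the sample lines are made up for illustration; in SAS itself the input statement does this work.

```python
from typing import Optional, Tuple


def read_fixed(line: str) -> Tuple[str, Optional[float]]:
    """Hypothetical helper: name in columns 1-10, age from column 11 on."""
    name = line[:10].strip()       # a 10-character-wide text field, padding blanks removed
    age_text = line[10:].strip()   # column 11 is index 10 in Python's 0-based strings
    # A lone period (or nothing at all) stands for a missing number.
    age = None if age_text in ("", ".") else float(age_text)
    return name, age


# Made-up sample lines laid out in fixed columns.
rows = [read_fixed(s) for s in ["Alice     34", "Bob       .", "          27"]]
print(rows)  # [('Alice', 34.0), ('Bob', None), ('', 27.0)]
```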

6 Importing The most bulletproof way to import is to use the Import Wizard. You can also write a program with PROC IMPORT.

7 Code If you write any code, be sure to load my keyboard macros. The import macro gives you the shell of a program to import Excel files. Once you have a program node open in a flowchart, you add the macros to both SAS and EG by using the Program menu.

8 From a Database If you load data that came with an import/export program, you will probably need to add the path to the infile statement.

9 Importing Advice It is a good idea to import the source into a permanent library. After importing, use the Query Builder or a Program node to copy all the variables into a new data set. This node can be tweaked later to fix the problems that you identify. If you do not do this, you will have to change the links later so that the analyses come from the cleaned/fixed data instead of the raw import.

10 Creating New Datasets From 1 Table
Name the query and new table. Drag the entire table or individual variables to the Select Data pane. In the Select Data pane pick variables then click the properties button.

11 Changing a Variable
Computed Columns > New… > Recoded column > pick a variable. Notice the other tabs for selecting what to change to a new value. SAS allows 28 different types of missing numbers: the ordinary . plus the special missing values .A through .Z and ._ (underscore).

12 Bad Ages Recoded to NULL
If you get data from a program that uses bogus numbers to flag problems in a numeric field, replace those values with different special missing values (.A, .B, etc.). When you compute descriptive statistics, these missing values are automatically excluded.
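As a rough illustration of why distinct missing codes help, this Python sketch swaps hypothetical bogus codes (999 and -1, with assumed meanings) for labeled missing markers that mimic .A and .B, and then computes a mean over the real values only. The labels are plain strings here; SAS's special missing values are a built-in numeric feature that Python does not have.

```python
from statistics import mean

# Hypothetical raw ages from a source program that used bogus codes for problems.
raw_ages = [34, 999, 41, -1, 29]

# Assumed meanings: 999 = refused, -1 = unknown. Each code gets its own labeled
# missing marker, mimicking SAS's .A and .B.
MISSING_CODES = {999: ".A", -1: ".B"}
ages = [MISSING_CODES.get(a, a) for a in raw_ages]

# Descriptive statistics use only the real numbers; labeled missing values are skipped.
valid = [a for a in ages if not isinstance(a, str)]
print(ages)         # [34, '.A', 41, '.B', 29]
print(mean(valid))  # mean of 34, 41 and 29 only
```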

13 Removing/Choosing Records
Right click on the variables you want to use for dropping records or use the Filter Data tab.

14

15 Advanced Changes and Comparisons
You can use the Advanced Expression dialog box to do complex tasks like editing and combining text variables, with functions such as catt(), lowcase(), compress(), and compbl(). SAS has built-in regular expression processing (Perl-style, through the PRX functions) as well as Soundex for phonetic spellings and Levenshtein edit distances for measuring dissimilarity between strings.
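Edit distance is the least obvious of these ideas, so here is a minimal Python sketch of the classic dynamic-programming Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. It illustrates the technique, not SAS's implementation; in SAS the COMPLEV function computes this distance. The function name levenshtein is just a label for the sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Tinkywinkey", "Tinky Winkey"))  # 2: insert a space, change w to W
```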

16 Working with Several Tables
Joins add columns to a base table. Set operations add (or subtract) records.
[Diagram: Table 1 + Table 2 → New Table, shown once for a join and once for a set operation]

17 Commonly Used Joins
Inner Join: keep only the records where you can match IDs in both tables.
Left Join: keep all records from the left table (Table 1) and the matching records from the right table (Table 2). The right-table variables are NULL (missing) for left records that have no match.
[Diagrams: Table 1 and Table 2 combined into a New Table by each join]
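The two joins can be sketched in a few lines of Python (a stand-in for the Query Builder or PROC SQL, not a reproduction of either). The tables, the id key, and the helper names inner_join and left_join are hypothetical; None plays the role of NULL for unmatched right-table columns.

```python
# Hypothetical tables keyed on "id", with at most one record per id in each.
left = [{"id": 1, "name": "Po"}, {"id": 2, "name": "Lala"}, {"id": 3, "name": "Dipsy"}]
right = [{"id": 1, "thing": "scoot"}, {"id": 3, "thing": "hat"}, {"id": 4, "thing": "bag"}]


def inner_join(l, r, key):
    """Keep only the record pairs whose key values match in both tables."""
    return [{**a, **b} for a in l for b in r if a[key] == b[key]]


def left_join(l, r, key, right_cols):
    """Keep every left record; right-table columns are None (NULL) when unmatched."""
    out = []
    for a in l:
        matches = [b for b in r if b[key] == a[key]]
        if matches:
            out.extend({**a, **b} for b in matches)
        else:
            out.append({**a, **{c: None for c in right_cols}})
    return out


print(inner_join(left, right, "id"))            # ids 1 and 3 only
print(left_join(left, right, "id", ["thing"]))  # ids 1, 2 (thing is None), 3
```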

18 One to Many Joins All of the SQL joins that I have mentioned work with either a 1-to-1 match of key variables across tables or a 1-to-many match. But you need to be aware of how many records are in each table. Double-check the size of the new table. [Diagrams: inner and left one-to-many joins]

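19 Cartesian Joins If there are duplicate key values in both tables and you do not join on a second variable, SQL will pair every matching record in one table with every matching record in the other. For each key value, the number of new records is the product of the numbers of matching records in the two tables, so the total can grow toward the product of the two tables' record counts. [Example: inner join on Family]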
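A small Python sketch makes the multiplication visible. The parents and kids tables and the family key are hypothetical; the inner join matches on family alone, and expected_rows (also a made-up helper) applies the rule that each key value contributes (left matches) × (right matches) rows, which is one way to do the table-size double-check before running the real join.

```python
from collections import Counter

# Hypothetical tables: parents and kids, both with several records per family.
parents = [{"family": "F1", "parent": "Ann"}, {"family": "F1", "parent": "Bob"},
           {"family": "F2", "parent": "Cy"}]
kids = [{"family": "F1", "kid": "Dee"}, {"family": "F1", "kid": "Eve"},
        {"family": "F1", "kid": "Fay"}, {"family": "F2", "kid": "Gus"}]

# Inner join on family only: every F1 parent pairs with every F1 kid.
joined = [{**p, **k} for p in parents for k in kids if p["family"] == k["family"]]


def expected_rows(l, r, key):
    """For each key value, (# left matches) * (# right matches), summed."""
    lc = Counter(row[key] for row in l)
    rc = Counter(row[key] for row in r)
    return sum(lc[k] * rc[k] for k in lc)  # a Counter returns 0 for absent keys


print(len(parents), len(kids), len(joined))  # 3 4 7  (F1: 2*3 = 6, F2: 1*1 = 1)
```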

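20 PROC SQL - Set Operators (no GUI)
OUTER UNION CORRESPONDING: concatenates all rows from both queries, lining up columns that have the same name
UNION: unique rows from both queries
EXCEPT: unique rows from the first query that are not in the second
INTERSECT: unique rows common to both queries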
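Here is a Python sketch of what each operator keeps, using two hypothetical query results stored as lists of tuples. It ignores the column-matching-by-name part of CORRESPONDING, and it keeps first-seen row order only for readability; real SQL does not promise any particular order unless you ask for one. The helper name unique is made up.

```python
# Hypothetical query results, each row a tuple (name, id).
q1 = [("Po", 1), ("Lala", 2), ("Lala", 2), ("Dipsy", 3)]
q2 = [("Lala", 2), ("Tinkywinkey", 4)]


def unique(rows):
    """Drop duplicate rows, keeping first-seen order for readability."""
    return list(dict.fromkeys(rows))


q2_rows = set(q2)
outer_union = q1 + q2                                       # all rows, duplicates kept
union = unique(q1 + q2)                                     # unique rows from both
except_rows = [r for r in unique(q1) if r not in q2_rows]   # in q1 but not in q2
intersect = [r for r in unique(q1) if r in q2_rows]         # in both queries

print(len(outer_union), union, except_rows, intersect)
```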

21 How does a data step typically work?
The data statement says make this (or these) data set(s). SAS then reads every line down to the run statement and gathers a list of all the variables used. It sets aside a place in memory to hold one value for each of them; this holding area is called the program data vector (PDV). It then sets all the variables to missing.

22 How does a data step typically work?
It then carries out the instruction on each line of the data step program, in the order that the lines are written. Then it writes all the variables out to the new dataset, and it repeats the process if there is more data.
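The loop described on slides 21 and 22 can be modeled roughly in Python. This is a simplified, hypothetical model (data_step and assign_kid are made-up names), not how SAS is implemented: it rebuilds the PDV on each pass, whereas SAS keeps the variables read by set between passes and only resets newly created ones. For a simple step like the Teletubbies example in the next slides, the output rows come out the same.

```python
def data_step(source, statements, computed_vars):
    """Rough, hypothetical model of a DATA step that uses SET and implicit OUTPUT."""
    out = []
    for row in source:                          # set: load the next observation
        pdv = {v: None for v in computed_vars}  # new variables start out missing
        pdv.update(row)                         # variables read from the source
        for statement in statements:            # run statements in written order
            statement(pdv)
        out.append(dict(pdv))                   # implicit output: write a copy
    return out                                  # no more data: the step is done


def assign_kid(pdv):
    pdv["kid"] = "Andrew"                       # kid = "Andrew";


teletubbies = [{"teletubby": "po", "thing": "scoot"}, {"teletubby": "lala", "thing": "ball"},
               {"teletubby": "dipsy", "thing": "hat"}, {"teletubby": "tinkywinkey", "thing": "bag"}]
teletubbies2 = data_step(teletubbies, [assign_kid], ["kid"])
print(teletubbies2[0])  # {'kid': 'Andrew', 'teletubby': 'po', 'thing': 'scoot'}
```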

23 How SAS Processes a Dataset(1)
In the example below, SAS will look in the existing dataset called Teletubbies and find two variables, teletubby and thing. Then it will find the variable called kid. Then it will do the instructions in order. The first step creates the Teletubbies data set so that it exists before the second step reads it.
data teletubbies;
  input teletubby $20. thing $7.;
  datalines;
po                  scoot
lala                ball
dipsy               hat
tinkywinkey         bag
;
run;

data Teletubbies2;  *name of a new data set;
  set Teletubbies;  *load 1 observation of data;
  kid = "Andrew";   *fill in the blank;
  output;           *write the variables to teletubbies2;
  return;           *return to the top of the step;
run;                *end of these instructions;

24 The Set Statement set Teletubbies; This line tells SAS to load one row of data from the data set Teletubbies into the PDV. The first time this line is run, the first row of data is loaded into the PDV. When there is no more data to load, the data step is done.

25 Variable Assignment In the example, the word Andrew is assigned to the variable kid. Assignment always goes from right to left: the value on the right side of the equals sign is stored in the variable named on the left.
kid = "Andrew";
If a variable appears on both the left and right sides of an equals sign, its original value is used on the right to compute a new value, which is then written into the variable on the left.
aNumber = aNumber + 4;
Assignment goes this way: new value ← original value

26 How SAS Processes a Dataset(2)
If you do not include the output and return statements, SAS will do them automatically. So, the previous data step would typically be written like this:
data Teletubbies2;
  set Teletubbies;
  kid = "Andrew";
run;

27 How SAS Processes a Dataset(3)
If, if-else, or select statements are typically used to conditionally assign values in a data step.
If: one possibility
If-else: two possibilities
Select-when-otherwise-end: multiple possibilities

28 Error Trapping “Tinkywinkey” is not “Tinky Winkey” … Bad Teletubby.

29 Test Your Understanding
data test3a test3b;
  set source;
  if isMale = 1 then output test3a;
  hasCancer = 1;
  output test3b;
run;
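If you want to check your reasoning, this Python sketch (not SAS) replays the step by hand for a hypothetical source table with variables id and isMale only. It assumes hasCancer is not already in source, so it starts out missing on every pass. It also reflects a SAS rule: once a step contains an explicit output statement, SAS no longer writes a row automatically at the end of each pass.

```python
# Hypothetical source table; hasCancer is assumed NOT to be one of its variables.
source = [{"id": 1, "isMale": 1}, {"id": 2, "isMale": 0}, {"id": 3, "isMale": 1}]

test3a, test3b = [], []
for row in source:                        # set source;
    pdv = {**row, "hasCancer": None}      # new variable starts each pass as missing
    if pdv["isMale"] == 1:
        test3a.append(dict(pdv))          # output test3a; happens BEFORE the assignment
    pdv["hasCancer"] = 1                  # hasCancer = 1;
    test3b.append(dict(pdv))              # output test3b;

print(test3a)  # males only, with hasCancer still missing (None)
print(test3b)  # every record, with hasCancer = 1
```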

30 Common Ground … where Both SQL and data step programming use where statements to select what records are included in the new dataset. With data steps the variables used in the where statement need to already exist in the source file. Use if to check variables created in the data step.

31 where The syntax for where is identical in SQL and data steps.
Differences vs. if statements: the forms on the left work only in where; the equivalents on the right work in either where or if.
Works in where only | Works in either where or if
x between y and z | x >= y and x <= z, or y <= x <= z
string1 ? string2, or string1 contains string2 | index(string1,string2) > 0
string1 =* string2 | soundex(string1) = soundex(string2)
x is null, or x is missing | missing(x)
string1 like “U%of%A%” | use regular expressions (PRX)
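Two of the where-only forms translate directly into ordinary code. This Python sketch (not SAS) converts a like pattern, where % matches any run of characters and _ matches exactly one, into a regular expression, and spells out the between and contains equivalences from the table. The helper names are hypothetical, matching is case-sensitive, and escape characters inside like patterns are not handled.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Sketch of SQL like: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def between(x, y, z) -> bool:
    """x between y and z is the same test as y <= x <= z (both ends included)."""
    return y <= x <= z


def contains(s1: str, s2: str) -> bool:
    """Same test as SAS's index(s1, s2) > 0; SAS index() is 1-based and
    returns 0 when not found, while Python's find() is 0-based and returns -1."""
    return s1.find(s2) >= 0


print(sql_like("University of Arizona", "U%of%A%"))  # True
```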

32 where Syntax The where statement, like all SAS statements, begins with a keyword (where) and ends in a semicolon.
where isDead = "false";
where isDead ne "true";
where missing(gender);
where salary > ;
where country in ("USA", "Japan", "UK");
where country in ("USA" "Japan" "UK");

33 where Syntax
Arithmetic
where salary/12 > 10000;
where (salary/12) * 1.20 ge 9900;
where salary + bonus < ;
Logical
where gender ne "M" and salary >= 50000;
where gender ne "M" or salary >= 50000;
where country = "UK" or country = "UTAH";
where country not in ("USA", "AU");

34 Make Decisions SAS has many operators available to help you make decisions.
Comparisons: = eq, ~= ne, < lt, > gt, <= le, >= ge, in ( )
Not requires the expression following it to be false.
& (and) requires both operands to be true.
| (or) requires at least one operand to be true.
In ( ) requires at least one comparison to be true.
Math operations: + - * / ** (exponentiation).

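35 Logical Decisions & Compound Expressions
Common tests and common problems:
where YODeath < YOBirth;
where Sex = "M" and numPreg > 0;
where Sex="M" and numPreg > 0 or ageLMP > 0; *** bad ***;
where Sex="M" and (numPreg > 0 or ageLMP > 0); *** good ***;
The bad version fails because and is evaluated before or, so SAS reads it as (Sex="M" and numPreg > 0) or ageLMP > 0, which also selects every record with ageLMP > 0, regardless of sex.
Moral: Use parentheses generously with ands and ors.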
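Python happens to share the same precedence rule (and binds tighter than or), so the trap can be reproduced there. The record below is hypothetical: a woman with no recorded pregnancies but a positive ageLMP. The function names bad and good are just labels for the two where clauses.

```python
# Hypothetical record: female, no pregnancies recorded, positive ageLMP.
rec = {"Sex": "F", "numPreg": 0, "ageLMP": 45}


def bad(r):
    # Read as (Sex == "M" and numPreg > 0) or ageLMP > 0
    return r["Sex"] == "M" and r["numPreg"] > 0 or r["ageLMP"] > 0


def good(r):
    return r["Sex"] == "M" and (r["numPreg"] > 0 or r["ageLMP"] > 0)


print(bad(rec), good(rec))  # True False
```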

36 Where is everywhere

37 Numeric Data and Looping
Say somebody tells you to simulate rolling dice. The formula to do this says: generate a random number between 0 and 1, multiply it by 6, and round up to the next integer.
data die;
  *the 22 is the seed: it says which stream of random numbers between 0 & 1 to use;
  aNumber = ranuni(22);
  die = ceil(6*aNumber);
  * Generate a random integer between 1 and 6.;
  dieDie = ceil(6*ranuni( ));
  output; * write to the new dataset;
  return; * go to the top and try to read in data;
run;
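For comparison, here is the same formula in Python, as a sketch rather than a translation: seeding with 22 echoes ranuni(22), but Python's generator produces a different stream than SAS's. One detail matters: RANUNI returns values strictly between 0 and 1, so ceil(6*u) always lands in 1 to 6, while Python's random() can return exactly 0.0, so floor(6*u) + 1 is the safe equivalent there. roll_die is a hypothetical helper name.

```python
import math
import random

rng = random.Random(22)  # seed 22 echoes ranuni(22); the stream itself differs from SAS's


def roll_die(rng: random.Random) -> int:
    """Hypothetical helper: turn one uniform number into a die face from 1 to 6."""
    u = rng.random()               # uniform on [0, 1); can be exactly 0.0
    return math.floor(6 * u) + 1   # always 1..6, even when u == 0.0


rolls = [roll_die(rng) for _ in range(1000)]
print(rolls[:10])
```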

38 Doing Stuff Repeatedly
How to roll two dice:
data dice;
  do x = 1 to 2 by 1;
    roll = ceil(6*ranuni( ));
    output;
  end;
  return; * go to the top and try to read in data;
run;

39 Craps… In the dice game “craps” you throw two dice, and the number you roll determines if you win or lose. How do you simulate rolling 10 pairs of dice?
data craps;
  do trial = 1 to 10;
    do dieNumber = 1 to 2;
      roll = ceil(6*ranuni( ));
      output;
    end;
  end;
  return;
run;
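The nested loops map one-to-one onto Python for loops. This sketch (hypothetical seed and variable names, Python's generator rather than SAS's) writes one row per die, 20 rows for 10 pairs, and then adds up each pair's two dice, since the total is what decides a throw in craps.

```python
import math
import random

rng = random.Random(22)  # hypothetical seed; not the same stream as SAS's ranuni

rows = []  # one row per die, like each output statement in the data step
for trial in range(1, 11):                          # do trial = 1 to 10;
    for die_number in range(1, 3):                  #   do dieNumber = 1 to 2;
        roll = math.floor(6 * rng.random()) + 1     #     a die face from 1 to 6
        rows.append({"trial": trial, "dieNumber": die_number, "roll": roll})

# Total of the two dice for each throw.
totals = {}
for row in rows:
    totals[row["trial"]] = totals.get(row["trial"], 0) + row["roll"]

print(len(rows), len(totals))  # 20 10
```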

40 Summing

