
1 Data Manipulation (with SQL)
HRP223 – 2009
October 12, 2009
Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

2 Topics For Today
Organization
Sharing a SAS dataset
– As .sas7bdat files or other formats
Renaming
– Datasets
– Variables
Subsetting a dataset
– Select a few variables
– Select a few records
SQL reports for a single table of data
– Selecting/renaming variables
– Applying labels and formats
– Creating tables with SQL

3 Avoiding Spaghetti Code
Programmers refer to unstructured, poorly thought-through, unorganized code as spaghetti code. Your EG projects will literally look like a tangled mess of spaghetti if you do not structure them in advance.
– Use several named process flows
– Use lots of notes in the project
– Include a lot of comments if you write code
This is bad.
Organization

4 Process Management
Typically you will have one process flow that creates the library, imports the source file(s), cleans the data, and splits the data into subsets. If you run different sets of analyses on the subsets, add a process flow for each subset. Create a dataset called analysis that has all the information used in the analysis.
Organization

5 Right-click on the process flow and give it a meaningful name. You may want to link the library to the dataset and then uncheck Auto Arrange to have it show you the arrow.
Organization

6 The Greater Right of the Left
Your process flows should have the source of the data on the left. The left margin should have:
– A note saying what the flowchart does
– A code node that creates a toy dataset or a library (or libraries) that contains the data
Organization

7 A Good Process Flow
Organization

8 Organization in Programs
All my SAS code begins with the same header information. The /* */ delimiters are used to mark large comments.

9 Header annotations:
– Display manager deletes output text and log.
– Do not show the name of the procedures in output.
– Do X commands ASAP.
– Don’t show the date in output and reset page # to 1.
– Delete graphics in the work library.
– Specify where output will be stored.
– Make the folder where output will be stored if it does not exist.
– Delete what is there if it exists.
– Set file path to that directory.
– Make a library to store output datasets.
– Make a web page to display all output.
– Make pretty graphics.
– Run other programs.
– Turn off graphics and output.
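
The header itself was an image and is not in this transcript. The sketch below is one way the listed steps could be written, not the author's actual header. It assumes Windows, and the folder C:\blah\output, the macro variable outdir, the libref out, and the file names are all hypothetical. The dm statement only has an effect in the Display Manager, not in Enterprise Guide.

```sas
/* ------------------------------------------------------------
   Hypothetical header: a sketch of the steps listed above.
   All paths and names below are assumptions.
   ------------------------------------------------------------ */

dm "output; clear; log; clear;";   /* Display manager: clear output and log */
ods noproctitle;                   /* do not print procedure names          */
options noxwait;                   /* X commands return without waiting     */
options nodate pageno = 1;         /* no date; restart page numbering at 1  */

proc datasets library = work memtype = catalog nolist;
  delete gseg;                     /* remove old graphics from WORK         */
quit;

%let outdir = C:\blah\output;      /* assumed output folder                 */
x "if not exist &outdir mkdir &outdir";  /* make the folder if missing      */
x "del /q &outdir\*.*";            /* delete what is already there          */
x "cd &outdir";                    /* set the file path to that folder      */

libname out "&outdir";             /* library for output datasets           */

ods html path = "&outdir" body = "index.html";  /* web page for all output  */
ods graphics on;                   /* nicer statistical graphics            */

/* %include "C:\blah\other_program.sas";   run other programs (hypothetical) */

ods graphics off;                  /* turn off graphics ...                 */
ods html close;                    /* ... and close the web page            */
```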

10 Sharing Data
You can share SAS data sets just like Excel files.
– Create a library.
– Copy the data into the library.
– If the data has formats associated with it, be sure to send the formats. More on this at a later date.
Sharing

11 Exporting the Easy Way
Double-click the data set you want to export and use the Export context-dependent menu.
Sharing

12 With Code…
Create a library with the GUI or use the libname statement:
    libname blah "C:\blah";
Write a little program:
    proc copy in = work out = blah;
      select humans;
    run;
Sharing

13 This code is efficient.
Sharing

14 Alternatives
Novices underuse proc copy. Instead they typically write less efficient data steps. For example:
    data blah.humans;
      set work.humans;
    run;
Or they may write:
    data "C:\blah\humans.sas7bdat";
      set work.humans;
    run;
Sharing

15 Sharing

16 Export Code for a Different Format
Sharing
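
The export code on this slide was an image and did not survive in the transcript. As a minimal sketch of exporting to a non-SAS format, the step below writes a CSV file with proc export. It uses the built-in sashelp.class table rather than the humans dataset so that it runs as written. The output file name is hypothetical, and the folder C:\blah from the earlier slide is assumed to exist.

```sas
/* Sketch only: export a SAS dataset to a comma-separated file. */
proc export
    data    = sashelp.class           /* built-in example table     */
    outfile = "C:\blah\class.csv"     /* hypothetical file name     */
    dbms    = csv                     /* target format              */
    replace;                          /* overwrite an existing file */
run;
```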

17 Note that you have to manually connect the code node to the right place in the flow chart and the exported item does not show up on the process flow.
Sharing

18 Copy and Rename
If you want to copy and rename a file, use the GUI or write code.
– Double-click the data set.
– Choose Query Builder from the context-sensitive menu.
Renaming datasets

19 Renaming datasets

20 With code…
    data blah.test;
      set work.humans;
    run;
Renaming datasets
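
Two alternatives to the data step, as a hedged sketch. The Query Builder generates proc sql code, so the first step resembles what the GUI route produces. The second step uses proc datasets, a different technique that renames a dataset in place without copying it. The toy humans dataset and the name people are hypothetical, and everything stays in the work library so the code runs without the blah library.

```sas
/* Hypothetical stand-in for the humans dataset */
data work.humans;
  input id name $;
  datalines;
1 Ann
2 Bob
;
run;

/* Copy under a new name with SQL */
proc sql;
  create table work.test as
    select *
      from work.humans;
quit;

/* Rename in place: work.humans becomes work.people, no copy is made */
proc datasets library = work nolist;
  change humans = people;
quit;
```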

21 Select a Few Variables From Fake Data
The next task is to select a couple of variables from a data set that has a LOT of variables. If you get a premade dataset with lots of extra variables, you want to drop the ones you will never use. Do this as soon as you can. First I will make some fake data. The data set will have a simulated test value filled into 6 “month” variables.
Fake data

22 How to make a fake subject
Comments
Fake data
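
The code for the fake data was shown as an image and is missing here. The sketch below shows one way to build such a dataset; it is not the author's program. The dataset name work.fake, the variable names id and month1-month6, the seed, the number of subjects, and the normal distribution with mean 100 and standard deviation 15 are all assumptions.

```sas
/* Sketch: 10 fake subjects, each with a simulated test value in
   6 "month" variables. Every name and number here is an assumption. */
data work.fake;
  length id 8;                        /* keep id as the first column  */
  array month{6} month1-month6;       /* the six month variables      */
  call streaminit(223);               /* fixed seed: repeatable data  */
  do id = 1 to 10;                    /* one record per fake subject  */
    do i = 1 to 6;
      month{i} = round(rand("normal", 100, 15), 0.1);
    end;
    output;                           /* write this subject's record  */
  end;
  drop i;                             /* loop counter is not data     */
run;
```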

23 Fake data

24

25 You can use the Filter and Sort context-sensitive menu to select a few variables. To rename a variable or change how it prints in reports, you need to use the Query Builder or write code.
Selecting variables and renaming
Rename and label variables

26 Click on a variable name. Then use the properties button to change the name and the display label. Drag and drop the variables you want into the Select Data windowpane.
Rename and label variables

27 Rename and label variables

28 I usually display the variable names instead of the labels.
Rename and label variables

29 What it did…
Rename and label variables

30 Data Step Version
Notice where the ; is found. This is one long statement.
Rename and label variables
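
The data step on this slide was an image. The sketch below does the same kind of job (select, rename, and label) using the built-in sashelp.class table, which is swapped in here for the fake data. The output name work.class_small and the new name years are hypothetical. The set statement runs over several lines and ends at the single semicolon after the closing parenthesis, which is the kind of long statement the slide points at.

```sas
/* Sketch: keep two variables, rename one, and label it. */
data work.class_small;
  set sashelp.class (
        keep   = name age             /* keep= uses the original names */
        rename = (age = years)        /* rename= is applied after keep */
      );                              /* the one ; that ends set       */
  label years = "Age in years";       /* label uses the new name       */
run;
```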

31 Minimal SQL
Print a report showing the contents of variables from a single data set.
– After select, put a comma-delimited list of variables or * for all variables.
– After from, specify a library.table.
Note that there is no create table ____ as.
SQL reports
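
A minimal sketch of such a report, using the built-in sashelp.class table as a stand-in for the slide's data. Because there is no create table line, each query prints a report instead of making a dataset.

```sas
proc sql;
  /* a comma-delimited list of variables from one library.table */
  select name, age
    from sashelp.class;

  /* the asterisk selects every variable */
  select *
    from sashelp.class;
quit;
```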

32 What variables?
Typically you will use a comma-delimited list, but you can use an * to indicate that you want all variables selected instead of typing them all. There is no syntax to specify variables based on position in the source files. That is, you cannot specify that you want to select the 2nd and 7th variables (from left to right) or to select the first 3 variables.
SQL reports

33 Use of Minimal SQL
Note that the order of the list sets the order in the report (or the order in a new dataset).
SQL reports – selecting variables

34 Renaming and Labels
You can rename a variable in the list with the as keyword. You can also specify variable labels.
SQL reports – rename/label
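
A short sketch of both ideas, again on the built-in sashelp.class table. The new name years and the label text are hypothetical.

```sas
proc sql;
  select name,
         age as years                 /* rename age to years          */
             label = "Age (years)"    /* column heading in the report */
    from sashelp.class;
quit;
```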

35 Using Formats
Labels affect column headings and similar titles, and formats affect how values appear without changing the values themselves. Notice the lowercase i. The capitalization is set when the variable is created.
SQL reports – format
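
A sketch of applying formats in the select list, using sashelp.class in place of the slide's data. The widths chosen are arbitrary. The stored values are unchanged; only the printed values are rounded.

```sas
proc sql;
  select name,
         height format = 5.1,         /* print with 1 decimal place   */
         weight format = 6.2          /* print with 2 decimal places  */
    from sashelp.class;
quit;
```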

36 Preview of User Defined Formats
Note that the $ means a character format.
SQL reports – format
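
A sketch of a user-defined character format and its use in a query. The format name $sexfmt and its labels are hypothetical, and sashelp.class supplies the data.

```sas
/* Define a character format; the $ prefix marks it as character */
proc format;
  value $sexfmt
    "F" = "Female"
    "M" = "Male";
run;

proc sql;
  select name,
         sex format = $sexfmt.        /* stored F/M, shown as words   */
    from sashelp.class;
quit;
```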

37 blah
New table. Original table
SQL tables
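
The slide's code is not in the transcript. The sketch below shows the general pattern: the create table line names the new table and the from line names the original table. The name work.class_copy is hypothetical.

```sas
proc sql;
  create table work.class_copy as     /* new table                    */
    select name, age
      from sashelp.class;             /* original table               */
quit;
```

With the create table line present, the query writes a dataset and prints no report.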

38 More Tweaks
The from line references tables, which are in libraries. Complex queries require you to reference the table name over and over again. Instead of typing the long library and dataset names repeatedly, you can refer to a table by an alias. Print the column called dude from the table blah, which is in the fakedata library. Here the b. is optional because dude is only in one table (the query only uses one table).
SQL reports – table aliases
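
A sketch of a table alias on the built-in sashelp.class table, since the fakedata library is not available here. The alias c is hypothetical and plays the same role as b in the slide.

```sas
proc sql;
  select c.name,                      /* alias-qualified column       */
         age                          /* prefix optional: one table   */
    from sashelp.class as c;          /* c is the alias for the table */
quit;
```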

39 Data Step Version…
Rename, label, and format variables
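
The data step on this slide is also missing. The sketch below renames, labels, and formats one variable in a data step, using sashelp.class. The names work.class_report and height_in are hypothetical.

```sas
data work.class_report;
  set sashelp.class (
        keep   = name height          /* original names               */
        rename = (height = height_in)
      );
  label  height_in = "Height (inches)";   /* column heading           */
  format height_in 5.1;                   /* display 1 decimal place  */
run;
```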

