

Use the SET statement to:
– create an exact copy of a SAS dataset
– modify an existing SAS dataset by creating new variables or subsetting (using a subsetting IF)
– concatenate datasets, i.e., stack them one on top of the other. This is most useful when the datasets contain the same variables but different observations.
– interleave datasets, i.e., combine sorted datasets so that the result stays sorted. Use a BY statement with the SET statement.
The general form is:
DATA new_data_set;
   SET old_data_set;
   … other statements as needed …
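The first two uses in the list above can be sketched as follows. This is a minimal illustration only: the dataset names (trains, trains_copy, busy_trains), the variable names, the data values, and the cutoff of 76 are all invented here and do not come from the book.

```sas
* Hypothetical data: number of cars and total riders per train;
DATA trains;
   INPUT Cars Riders;
   DATALINES;
3 276
6 453
4 310
;
RUN;

* Exact copy: SET reads every observation and variable unchanged;
DATA trains_copy;
   SET trains;
RUN;

* Modify: add a new variable, then keep only some observations;
DATA busy_trains;
   SET trains;
   AvgRiders = Riders / Cars;
   * A subsetting IF keeps an observation only when the condition is true;
   IF AvgRiders > 76;
RUN;

PROC PRINT DATA = busy_trains;
   TITLE 'Trains Averaging More Than 76 Riders per Car';
RUN;
```

Because each DATA step names a new output dataset, the original trains dataset is left untouched.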

* make permanent SAS dataset from the data here;
* time of train, # of cars, total # of riders;
* then create new dataset with new variable average number of riders per car;
* see page 171;
10: : : : : : : :

* CONCATENATION OF TWO DATASETS - NOTE WHAT HAPPENS WHEN THEY HAVE DIFFERENT VARIABLES;
DATA southentrance;
   INPUT Entrance $ PassNumber PartySize Age;
   DATALINES;
S S S
;
PROC PRINT DATA = southentrance;
   TITLE 'South Entrance Data';
DATA northentrance;
   INPUT Entrance $ PassNumber PartySize Age Lot;
   DATALINES;
N N N N
;
PROC PRINT DATA = northentrance;
   TITLE 'North Entrance Data';
DATA both;
   SET southentrance northentrance;
   IF Age = . THEN AmountPaid = .;
   ELSE IF Age < 3 THEN AmountPaid = 0;
   ELSE IF Age < 65 THEN AmountPaid = 17;
   ELSE AmountPaid = 12;
PROC PRINT DATA = both;
   TITLE 'Both Entrances';
RUN;
QUIT;

* INTERLEAVE datasets to keep an ordering in them;
* The datasets to be INTERLEAVED must be SORTED by the interleaving variable;
DATA S;
   SET southentrance;
PROC SORT DATA = S;
   BY PassNumber;
RUN;
PROC PRINT DATA = S;
   TITLE 'South Entrance Data';
DATA N;
   SET northentrance;
PROC SORT DATA = N;
   BY PassNumber;
PROC PRINT DATA = N;
   TITLE 'North Entrance Data';
* now put them together in order of passnumber;
DATA interleave;
   SET N S;
   BY PassNumber;
PROC PRINT DATA = interleave;
   TITLE 'Both Entrances, By Pass Number';
RUN;
QUIT;

Now concatenate the two therapy datasets, first creating a new variable for the total number of patients, without sorting. Then interleave them BY month.

One-to-one match merge
Use the MERGE statement with a BY statement to combine two datasets that share a matching variable (the BY variable) that uniquely identifies each observation in each dataset.
DATA new;
   MERGE old1 old2;
   BY matching_vars;
NOTE: If the two datasets have variables with the same names (other than the BY variables), the values from the second dataset overwrite the values from the first. Also note that all observations from both datasets are included in the new dataset, whether or not they had a match.
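Both points in the NOTE can be checked with a small experiment. The sketch below uses invented data: old1 and old2 share the BY variable ID and a second variable, Score, and each has one ID that the other lacks. The variable names and values are assumptions made up for this illustration.

```sas
* Hypothetical first dataset: IDs A1 to A3;
DATA old1;
   INPUT ID $ Score;
   DATALINES;
A1 10
A2 20
A3 30
;
RUN;

* Hypothetical second dataset: IDs A2 to A4, same variable Score plus Grp;
DATA old2;
   INPUT ID $ Score Grp $;
   DATALINES;
A2 99 x
A3 98 y
A4 97 z
;
RUN;

* Both datasets must be sorted by the BY variable before merging;
PROC SORT DATA = old1; BY ID; RUN;
PROC SORT DATA = old2; BY ID; RUN;

DATA new;
   MERGE old1 old2;
   BY ID;
RUN;

PROC PRINT DATA = new;
   TITLE 'Result of the Match Merge';
RUN;
```

The printed result should contain four observations. A1 appears only in old1, so it keeps Score = 10 and has a missing Grp. A2 and A3 appear in both, so their Score values come from old2 (99 and 98). A4 appears only in old2 and is still included.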

Try to merge the demographic information about the patients with the dataset on the visits they made to the doctor’s office.
The demographic data (ID, age, sex, date of birth):
A M 05/22/75
A M 06/15/63
A F 08/17/72
A004. F 03/27/69
A F 02/24/52
A M 11/01/57
Next is the “visits” data (ID, visit#, sysBP, diasBP, Weight, date of visit):

A /05/01
A /13/01
A /14/02
A /14/01
A /12/01
A /15/01
A /30/01
A /27/01
A /02/01
A /04/01
A /22/01
What’s the merging variable? Do you need to sort by that variable before merging? Can you merge them in descending order?
Try the next one too - see section 6.4 on p

dm log 'clear';
dm output 'clear';
options ls=80;
DATA sales;
   INPUT CodeNum $ 1-4 PiecesSold 6-7;
   DATALINES;
C K086 9 A S K014 1 A B
;
DATA descriptions;
   INPUT CodeNum $ 1-4 Name $ 6-14 Description $ 15-60;
   DATALINES;
A206 Mokka     Coffee buttercream in dark chocolate
A536 Walnoot   Walnut halves in bed of dark chocolate
B713 Frambozen Raspberry marzipan covered in milk chocolate
C865 Vanille   Vanilla-flavored rolled in ground hazelnuts
K014 Kroon     Milk chocolate with a mint cream center
K086 Koning    Hazelnut paste in dark chocolate
M315 Pyramide  White with dark chocolate trimming
S163 Orbais    Chocolate cream in dark chocolate
;

PROC SORT DATA = sales;
   BY CodeNum;
PROC SORT DATA = descriptions;
   BY CodeNum;
* Merge data sets by CodeNum;
DATA chocolates;
   MERGE sales descriptions;
   BY CodeNum;
PROC PRINT DATA = chocolates;
   TITLE "Today's Chocolate Sales";
RUN;
QUIT;

NOTE the SORTing of the two datasets by the merging variable. All observations from both datasets are included in the merged dataset, whether or not they had a match.

One-to-many match merge
The same MERGE statement is used as in the one-to-one case, but the result is different when many observations in one dataset match a single observation in the other. In that case the values from the single observation are repeated on every matching observation. A good example is merging summary statistics with the original data from which the statistics were computed. Go over sections to see how this is done... Also note the output (p. 181) from PROC PRINT when a BY statement and an ID statement are used together.
Make the two “shoes” programs work (in sections 6.5 and 6.6). Use the code below, get the data from the book datasets, and put them into files that the INFILE statements can read.

dm log 'clear';
dm output 'clear';
options ls=80;
DATA shoes;
   INFILE '';
   * RegularPrice must be read here because the NewPrice calculation below uses it;
   INPUT Style $ 1-15 ExerciseType $ RegularPrice;
PROC SORT DATA = shoes;
   BY ExerciseType;
DATA discount;
   INFILE '';
   INPUT ExerciseType $ Adjustment;
* Perform many-to-one match merge;
DATA prices;
   MERGE shoes discount;
   BY ExerciseType;
   NewPrice = ROUND(RegularPrice - (RegularPrice*Adjustment), .01);
PROC PRINT DATA = prices;
   TITLE 'Price List for May';
RUN;
QUIT;

Now consider an important merging application: putting summary statistics back into the same dataset as the data used to compute them. This is a many-to-one merge, since only a few statistics are computed (one for each BY-group value) and they are merged back with the many individual values used to compute them. As an example, compute the sum of the sales variable for each of the three classes of shoes in the previous example, and then merge that total back into the original dataset. After this you can compute, for example, the percentage of its group's total that each observation represents. (Get the sales data from page 180.)
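The steps described above can be sketched as follows. This is only an illustration under stated assumptions: the dataset name shoesales, the style names, and the sales figures are invented stand-ins for the book's data on page 180, and the output names summarydata, Total, and Percent are likewise hypothetical.

```sas
* Hypothetical stand-in for the shoe sales data;
DATA shoesales;
   INPUT Style $ ExerciseType $ Sales;
   DATALINES;
StyleA running 100
StyleB running 300
StyleC walking 50
StyleD walking 150
;
RUN;

* BY-group processing requires sorted data;
PROC SORT DATA = shoesales;
   BY ExerciseType;
RUN;

* One output observation per ExerciseType, holding the group total;
PROC MEANS NOPRINT DATA = shoesales;
   VAR Sales;
   BY ExerciseType;
   OUTPUT OUT = summarydata SUM(Sales) = Total;
RUN;

* Many-to-one merge: each group total is repeated on every row of its group;
DATA shoesummary;
   MERGE shoesales summarydata;
   BY ExerciseType;
   Percent = Sales / Total * 100;
RUN;

PROC PRINT DATA = shoesummary;
   BY ExerciseType;
   ID ExerciseType;
   VAR Style Sales Total Percent;
   TITLE 'Sales Share by Type of Exercise';
RUN;
```

The NOPRINT option suppresses the usual PROC MEANS report, since only the output dataset is needed here.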

PROC MEANS can be used to compute a grand total for a variable, but that total can't be merged back into the original dataset, since the two datasets have no common BY variable. Thus we must use a different technique (see section 6.7). It turns out that the SET statement with an IF-THEN will do the job:
DATA new;
   IF _n_ = 1 THEN SET summary_data_set;
   SET original_data_set;
The original dataset has many observations, while the summary dataset has only one. SAS reads that single observation with the first SET statement, but only on the first iteration of the DATA step (i.e., when _n_ = 1). This works because variables read with a SET statement are automatically retained, so the summary values stay available for every observation read from the original dataset. Go over the example given in section 6.7: Combining a Grand Total with the Original Data.
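A minimal sketch of this technique follows. The data and all names (shoesales, summarydata, GrandTotal, Percent) are assumptions invented for illustration and are not the book's section 6.7 example.

```sas
* Hypothetical sales data;
DATA shoesales;
   INPUT Style $ Sales;
   DATALINES;
StyleA 100
StyleB 300
StyleC 50
StyleD 150
;
RUN;

* With no BY or CLASS statement, PROC MEANS writes a single observation;
PROC MEANS NOPRINT DATA = shoesales;
   VAR Sales;
   OUTPUT OUT = summarydata SUM(Sales) = GrandTotal;
RUN;

DATA withtotal;
   * Read the one summary observation on the first iteration only;
   * KEEP= drops the automatic _TYPE_ and _FREQ_ variables;
   IF _N_ = 1 THEN SET summarydata (KEEP = GrandTotal);
   * GrandTotal is retained, so it is available on every row read here;
   SET shoesales;
   Percent = Sales / GrandTotal * 100;
RUN;

PROC PRINT DATA = withtotal;
   TITLE 'Each Style as a Percentage of Total Sales';
RUN;
```

The IF _N_ = 1 guard matters: an unconditional SET on the one-observation summary dataset would hit end-of-file on the second iteration and stop the DATA step after a single output observation.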

For Monday: In the “padgett” data, get the mean and standard deviation of plant height for the “pizza hut” and “shell island” marshes. Merge those statistics back into the original dataset and then calculate a z-score for each plant in the dataset. Recall that Z = (X - mean)/(s.d.), where X is the original height. There are two means and two standard deviations, one for each of the two marshes, so be sure that the correct mean and s.d. are used with the plants from each marsh.