Multiple Uses for a Simple SQL Procedure Rebecca Larsen University of South Florida.


Introduction
The SQL Procedure: is it equivalent or advantageous to other SAS DATA steps and procedures?
Handy PROC SQL statements: CREATE TABLE… INSERT INTO…SELECT
Several brief examples of using these statements will be presented
SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Overview of Topics
1. Inserting Rows from Multiple Tables into One Table
2. Inserting Rows from Multiple Queries into One Table
3. Changing the Order of Variables in a SAS Data Set
4. Changing Variable Attributes and/or Names in a SAS Data Set

1. Inserting Rows from Multiple Tables into One Table
Why would you want to insert rows from multiple tables into one table?
– Information related to one customer/patient/client is stored in multiple data sets and linked by a common identifier
– Information related to one issue/problem you would like to study is stored in multiple data sets and needs to be combined

Health Care Claims Example
In this first example, there are multiple source files with different record layouts
– HomeHealth_clm – record layout 1
– Hospice_clm – record layout 1
– SkilledNursing_clm – record layout 1
– Inpatient_clm – record layout 1
– Outpatient_clm – record layout 2
– PartB_clm – record layout 2
All files contain information needed to complete your research study
You need to combine information from all source data sets into one “master” analysis data set

Record Layouts of Health Care Claims Example Data Sets
Record Layout 1:
– person_id
– health_status_cd
– service_category_cd
– race_cd
– procedure_amt*
– procedure_dt*
Record Layout 2:
– person_id
– health_status_cd
– service_category_cd
– race_cd
– service_amt*
– service_dt*
*In this example, source variables have different names in the different record layouts, although they contain analogous information and should be analyzed as one variable in the “master” data set

Create a new, blank table using the following code:
PROC SQL;
   CREATE TABLE allclaims
      ( person_id            char(11),
        health_status_cd     char(2),
        service_category_cd  char(2),
        race_cd              char(1),
        serv_amt_cu          num,
        service_dt           num
      );
QUIT;
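One way to confirm that the blank table was created with the intended columns is to ask PROC SQL to describe it. The sketch below assumes the allclaims table from the code above already exists in the WORK library. DESCRIBE TABLE writes the equivalent CREATE TABLE statement to the SAS log, and the COUNT query should report zero rows at this point.

```sas
proc sql;
   /* Write the column definitions of allclaims to the SAS log */
   describe table allclaims;

   /* A newly created table has no rows yet */
   select count(*) as n_rows
      from allclaims;
quit;
```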

Use a macro to insert values from multiple source files into one data set
/** 1. Define macro **/
/* amount = source amount variable, date = source date variable, file = source data set */
%MACRO insert(amount, date, file);
   PROC SQL;
      INSERT INTO allclaims
         SELECT person_id, health_status_cd, service_category_cd, race_cd,
                &amount., &date.
         FROM &file.;
   QUIT;
%MEND insert;

(continued)
/** 2. Call macro **/
/* Record layout 1 files carry procedure_amt and procedure_dt */
%insert(procedure_amt, procedure_dt, HomeHealth_clm);
%insert(procedure_amt, procedure_dt, Hospice_clm);
%insert(procedure_amt, procedure_dt, Inpatient_clm);
%insert(procedure_amt, procedure_dt, SkilledNursing_clm);
/* Record layout 2 files carry service_amt and service_dt */
%insert(service_amt, service_dt, Outpatient_clm);
%insert(service_amt, service_dt, PartB_clm);

Notice…
…that different source variables from the original files are inserted into a single variable in the new file
– Each source variable must be of the same type as the target variable created in the new file (e.g., numeric into numeric)
   service_amt (num) vs. procedure_amt (num)
   service_dt (num) vs. procedure_dt (num)
One master data set was created so that health care claims information from multiple source files can be analyzed together
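Because rows are inserted by position, a type mismatch between a source column and its target is worth catching before the macro runs. A minimal sketch of one way to check, assuming the six source data sets live in the WORK library (an assumption; the slides do not say where they are stored), is to compare column types in the DICTIONARY.COLUMNS table:

```sas
proc sql;
   /* List name, type, and length of the amount and date columns
      in each claims data set (the WORK library is an assumption) */
   select memname, name, type, length
      from dictionary.columns
      where libname = 'WORK'
        and upcase(memname) in ('HOMEHEALTH_CLM', 'HOSPICE_CLM',
                                'SKILLEDNURSING_CLM', 'INPATIENT_CLM',
                                'OUTPATIENT_CLM', 'PARTB_CLM')
        and upcase(name) in ('PROCEDURE_AMT', 'PROCEDURE_DT',
                             'SERVICE_AMT', 'SERVICE_DT')
      order by memname, name;
quit;
```

Every row returned should show type num, matching the serv_amt_cu and service_dt columns of allclaims.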

2. Inserting Rows from Multiple Queries into One Table Why would you want to insert rows from multiple queries into one table? –You want to create summary records from your master data set based on different “where clause” conditions in separate queries and insert them into one summary data set

Health Care Claims Example, Part 2
Continuing with the health care claims example provided above, a summary table with one record per person is needed for statistical analyses (e.g., ANOVA, PROC LOGISTIC).
You want the summary data set to be created with the following characteristics:
– One record per person,
– “Race_cd” (5 class levels) to be categorized to create the new variable, “minority_cd” (2 class levels),

(continued)
– Summary variable “mean_cu” to be created as the mean amount paid per person,
– Records with a missing service amount to be excluded,
– Persons with more than one value for minority_cd to be excluded (to eliminate persons with indistinguishable minority status)
For statistical tests with a CLASS variable such as minority status, make sure the classes are mutually exclusive, or you will compromise the validity of your results

First, create the blank summary table using the following code:
PROC SQL;
   CREATE TABLE minority
      ( person_id            char(11),
        health_status_cd     char(2),
        service_category_cd  char(2),
        minority_cd          char(4),
        mean_cu              num
      );

Next, insert values from multiple queries into one summary data set (Query 1)
/** Query 1 **/
INSERT INTO minority
   SELECT DISTINCT person_id, health_status_cd, service_category_cd,
          case when race_cd = '1' then '1' else '0' end as minority_cd,
          mean(serv_amt_cu) as mean_cu
   FROM allclaims
   WHERE serv_amt_cu is not missing and service_category_cd = 'mh'
   GROUP BY person_id
   /* summary functions are allowed only in SELECT and HAVING, so the count test goes in HAVING */
   HAVING count(distinct case when race_cd = '1' then '1' else '0' end) = 1;

Continue inserting values into the summary data set (Query 2)
/** Query 2 **/
INSERT INTO minority
   SELECT DISTINCT person_id, health_status_cd, service_category_cd,
          case when race_cd = '1' then '1' else '0' end as minority_cd,
          mean(serv_amt_cu) as mean_cu
   FROM allclaims
   WHERE serv_amt_cu is not missing and service_category_cd = 'ph'
   GROUP BY person_id
   /* summary functions are allowed only in SELECT and HAVING, so the count test goes in HAVING */
   HAVING count(distinct case when race_cd = '1' then '1' else '0' end) = 1;
QUIT;

Cool Code…
The COUNT function in the HAVING clause is an example of PROC SQL code that can accomplish in one step what would take more than one step and several more lines of regular SAS code (PROC SQL allows summary functions only in the SELECT and HAVING clauses)
   where serv_amt_cu is not missing and service_category_cd = 'ph'
   group by person_id
   having count(distinct case when race_cd = '1' then '1' else '0' end) = 1
The CASE expression can be used to create a new variable that is a “re-categorization” of the values of another variable
   case when race_cd = '1' then '1' else '0' end as minority_cd
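To see how the filter behaves, it can help to run it on a few rows. The sketch below uses a hypothetical toy data set, claims_demo, that is not part of the presentation. It places the COUNT test in a HAVING clause, since PROC SQL restricts summary functions to the SELECT and HAVING clauses, and it repeats the CASE expression inside COUNT rather than referring to the new column by name.

```sas
/* Hypothetical toy data: three persons, not taken from the presentation */
data claims_demo;
   input person_id $ race_cd $ serv_amt_cu;
   datalines;
A 1 100
A 1 300
B 1 50
B 2 70
C 3 .
C 3 40
;
run;

proc sql;
   /* One row per person: recoded race, mean of non-missing amounts */
   select person_id,
          max(case when race_cd = '1' then '1' else '0' end) as minority_cd,
          mean(serv_amt_cu) as mean_cu
      from claims_demo
      where serv_amt_cu is not missing
      group by person_id
      /* keep only persons whose recoded value is the same on every row */
      having count(distinct case when race_cd = '1' then '1' else '0' end) = 1;
quit;
```

Person B has rows that recode to both '1' and '0', so B should be dropped. Persons A and C each recode to a single value and should be kept, with C's missing amount excluded from the mean by the WHERE clause.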

3. Changing the Order of Variables in a SAS Data Set
Why would you want to change the order of the variables in your data set?
– Perhaps your data step manipulations have altered the order of the variables from the original, and you need to export the data in the same format you received it
– Or you wish to arrange certain variable types together for coding processes to be run on sequential variables in your data set

Example of Changing the Order of Variables in Your Data Set In the following example, oldfile (Table 1) has variables in a certain order that you would like to change, and newfile (Table 2) has variables in the order to which you would like to change

Table 1. Order of variables before changing (oldfile)
Table 2. Order of variables after changing (newfile)

Position   Table 1 (oldfile, before)   Table 2 (newfile, after)
1          pin                         pin
2          sex_cd                      sex_cd
3          race_cd                     race_cd
4          diagnosis_cd                DOB
5          procedure_cd                age
6          DOB                         diagnosis_cd
7          enrollment_dt               procedure_cd
8          termination_dt              enrollment_dt
9          death_dt                    termination_dt
10         age                         death_dt

The following code was used to change the variable order:
/** Step 1 **/
PROC SQL;
   CREATE TABLE newfile
      ( pin             char(11),
        sex_cd          char(1),
        race_cd         char(1),
        DOB             num,
        age             num,
        diagnosis_cd    char(5),
        procedure_cd    char(5),
        enrollment_dt   num,
        termination_dt  num,
        death_dt        num
      );
/** Step 2 **/
   INSERT INTO newfile
      SELECT pin, sex_cd, race_cd, DOB, age,
             diagnosis_cd, procedure_cd,
             enrollment_dt, termination_dt, death_dt
      FROM oldfile;
QUIT;

Remember… …the order in which you specify the variables in the INSERT INTO…SELECT statement must be the exact order of the variables that are created in the blank data set, NOT the order of the data set from which they came.
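One way to make the positional matching easier to audit is to name the target columns in the INSERT statement. This is a sketch of an alternative to Step 2, not the approach used in the slides, and it should be run instead of Step 2 rather than in addition to it, or the rows will be inserted twice. It assumes newfile and oldfile exist as defined in the example above.

```sas
proc sql;
   /* Target columns are named explicitly, so each SELECT item is
      matched to the listed column in the same position */
   insert into newfile
      (pin, sex_cd, race_cd, DOB, age,
       diagnosis_cd, procedure_cd,
       enrollment_dt, termination_dt, death_dt)
   select pin, sex_cd, race_cd, DOB, age,
          diagnosis_cd, procedure_cd,
          enrollment_dt, termination_dt, death_dt
      from oldfile;
quit;
```

With the column list written out, the two lists can be compared side by side without going back to the CREATE TABLE statement.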

4. Changing Variable Attributes and/or Names in a SAS Data Set
The names of the variables in the newly created table do not have to match those in the table from which you are selecting the records
Other attributes (e.g., formats, informats, labels, character length) can be easily added or changed
For example, we will create a new file called “newfile2”, with different variable names and attributes from the data set in the previous example, called “newfile”

Use the following code to change variable names and add date formats in your data set:
/** Step 1 **/
PROC SQL;
   CREATE TABLE newfile2
      ( pin_id          char(11),
        sex_cd          char(1),
        race_cd         char(1),
        DOB_dt          num format=date9.,
        age_iv          num,
        diagnosis_cd    char(5),
        procedure_cd    char(5),
        enrollment_dt   num format=date9.,
        termination_dt  num format=date9.,
        death_dt        num format=date9.
      );
/** Step 2 **/
   INSERT INTO newfile2
      SELECT pin, sex_cd, race_cd, DOB, age,
             diagnosis_cd, procedure_cd,
             enrollment_dt, termination_dt, death_dt
      FROM newfile;
QUIT;

Notice…
If all you want to do is change names or attributes of variables in a data set, then PROC DATASETS is a much more efficient tool
However, if you have other steps that need to be accomplished by writing a new file, then the ability to change names and attributes is available in PROC SQL
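For comparison, the PROC DATASETS route might look like the sketch below. It assumes newfile sits in the WORK library, and unlike the PROC SQL example it changes newfile in place instead of writing a separate newfile2. The formats and the renames are placed in separate RUN groups so that the FORMAT statement only ever refers to names that exist at that point.

```sas
proc datasets library=work nolist;
   /* First RUN group: attach date formats using the current names */
   modify newfile;
      format DOB enrollment_dt termination_dt death_dt date9.;
   run;

   /* Second RUN group: rename the variables */
   modify newfile;
      rename pin = pin_id
             DOB = DOB_dt
             age = age_iv;
   run;
quit;
```

Because only the data set's header information is updated, the data rows are not rewritten, which is why this route is cheaper when no new file is needed.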

Conclusion
There are many more applications for the CREATE TABLE…INSERT INTO…SELECT statements in PROC SQL than the few examples shown
– Many times it is just another way to do the same thing
– Other times it may cut down the number of sorts, data steps and procedures, or lines of code required
– Try it out for yourself…you may find something you will wish you had always known!

My preferred SQL coding resource is the following user’s guide: –SAS Institute Inc. (2000), SAS® SQL Procedure User’s Guide, Version 8, Cary, NC: SAS Institute Inc.

About the Speaker
Speaker: Rebecca Larsen, Assistant in Research
Location of Company: Bruce B. Downs Boulevard MHC2617, University of South Florida, Tampa, Florida
Telephone: (813)
Fax: (813)
To view the presentation slides online, go to