SAS SQL SAS Seminar Series


SAS SQL SAS Seminar Series Shamika Ketkar July 14th, 2008

SQL: Structured Query Language
Developed by IBM in the early 1970s.
From the 1970s to the late 1980s there were different types of SQL, based on different databases.
In 1986 the first unified SQL standard (SQL-86) was created.
In 1987 a database interface for SQL was added to the Version 6 Base SAS package.
A "language within a language".

Anatomy of a PROC SQL Statement
SQL nomenclature: tables (datasets), rows (observations), columns (variables).

    PROC SQL;
      SELECT column list
        FROM table list
        WHERE condition list
        GROUP BY column list
        ORDER BY column list
      ;
    QUIT;
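As a concrete instance of the skeleton above, here is a minimal sketch that fills in each clause. It assumes the SASHELP.CLASS sample table that ships with SAS (columns Name, Sex, Age, Height, Weight); that table is not part of the original slides.

```sas
proc sql;
  /* One output row per value of sex, restricted to students aged 12 or older. */
  select sex,
         count(*)    as n,
         avg(height) as mean_height
    from sashelp.class
    where age ge 12
    group by sex
    order by sex;
quit;
```

The clauses must appear in this order; WHERE filters rows before grouping, and ORDER BY sorts the final result.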

Features
SQL looks at datasets differently from the SAS DATA step.
The DATA step looks at a dataset one record at a time, using an implied loop that moves from the first record to the last.
SQL looks at all the records as a single object.
Because of this difference, SQL can easily do a few things that are more difficult to do in the DATA step.
SQL commands are available for creating tables, changing table structures, changing values in tables, applying functions, and more.
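One illustration of the "all records as a single object" idea is comparing each row with a group-level summary in a single query. The sketch below assumes the SASHELP.CLASS sample table, which is not part of the original slides.

```sas
proc sql;
  /* Each student's height minus the mean height of students of the same sex.
     PROC SQL computes the group mean and remerges it with the detail rows,
     and writes a NOTE about the remerge to the log. */
  select name,
         sex,
         height,
         height - avg(height) as diff_from_sex_mean
    from sashelp.class
    group by sex
    order by sex, name;
quit;
```

Doing the same with the DATA step would typically take a summary procedure, a sort, and a merge.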

Processing Large Datasets: Create View
When a table is created, the query is executed and the resulting data is stored in a file.
When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view.
By default, PROC SQL prints the result of a query (use the NOPRINT option to suppress this).
No output is produced when a view is created.
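The difference between a table and a view can be sketched side by side. This example assumes the SASHELP.CLASS sample table and hypothetical names in the WORK library; none of these names come from the original slides.

```sas
proc sql;
  /* Table: the query runs now and its result rows are written to disk. */
  create table work.teens_tbl as
    select name, age
      from sashelp.class
      where age ge 13;

  /* View: only the query text is stored; no rows are read at this point. */
  create view work.teens_vw as
    select name, age
      from sashelp.class
      where age ge 13;

  /* The view's query is executed here, when the view is actually used.
     This plain SELECT prints its result because NOPRINT is not in effect. */
  select count(*) as n_teens
    from work.teens_vw;
quit;
```

Because the view re-reads its source each time it is used, it always reflects the current contents of the underlying table.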

Create View

    PROC SQL;
      CREATE VIEW out.c1data AS
        SELECT *
          FROM data.allgenostarc1 AS a, pheno.new_gtriplet AS b
          WHERE a.subject = b.subject
          ORDER BY a.subject;
    QUIT;

Log snippet:

    NOTE: SQL view ME.C1DATA has been defined.
    NOTE: PROCEDURE SQL used (Total process time):
          real time 0.86 seconds
          cpu time 0.01 seconds

The CONTENTS Procedure (log snippet)

    Data Set Name   out.c1data    Observations   .
    Member Type     VIEW          Variables      4
    Engine          SQLVIEW       Indexes        0
    Protection                    Compressed     NO
    Data Set Type                 Sorted         YES

    #   Variable   Type   Len   Format    Informat
    3   age        Num    8
    5   pedid      Num    8     BEST12.   F12.
    4   sex        Num    8     BEST12.   F12.
    1   subject    Num    8     11.       F11.

SAS stores the view in a file with the extension 'sas7bvew'.

View from View

    PROC SQL;
      CREATE VIEW out.agecat AS
        SELECT *,
               CASE
                 WHEN .  lt age le 18 THEN 1
                 WHEN 18 lt age le 25 THEN 2
                 WHEN 25 lt age le 40 THEN 3
                 WHEN 40 lt age le 55 THEN 4
                 WHEN 55 lt age le 70 THEN 5
                 WHEN age gt 70       THEN 6
                 ELSE .
               END AS agecat format=1.
          FROM out.c1data;
    QUIT;

SQL Functions

    PROC SQL;
      SELECT COUNT(DISTINCT subject), agecat, sex
        FROM out.agecat
        GROUP BY agecat, sex;
    QUIT;

Output:

              agecat   sex
    ----------------------
         1         1     0
        79         2     0
       118         2     1
       322         3     0
       380         3     1
       608         4     0
       741         4     1
       461         5     0
       452         5     1
        42         6     0
        32         6     1

Macro Variable

    PROC SQL NOPRINT;
      SELECT COUNT(DISTINCT subject) INTO :subj1-:subj2
        FROM out.agecat
        GROUP BY sex;
    QUIT;
    %PUT "Males=" &subj1 "Female=" &subj2;

SQL Functions
PROC SQL supports most of the functions available to the SAS DATA step; they can be used in a PROC SQL SELECT statement.
Because of how SQL handles a dataset, summary functions work over the entire dataset (or over each group when GROUP BY is used), while row-level functions return one value per row.
Common summary functions: COUNT (optionally with DISTINCT), MAX, MIN, SUM, AVG, VAR, STD, STDERR, NMISS, RANGE
Common row-level functions: SUBSTR, LENGTH, UPPER, LOWER, CONCAT, ROUND, MOD
PROC SQL does not support the LAG, DIF, and SOUND functions.
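The two kinds of function can be sketched as follows. The example assumes the SASHELP.CLASS sample table, which does not appear in the original slides.

```sas
proc sql;
  /* Summary functions: one result row per group. */
  select sex,
         count(*)      as n,
         min(age)      as youngest,
         max(age)      as oldest,
         avg(weight)   as mean_weight,
         nmiss(height) as missing_heights
    from sashelp.class
    group by sex;

  /* Row-level functions: one result value per input row. */
  select name,
         upper(substr(name, 1, 3)) as name_prefix,
         length(name)              as name_length,
         round(height, 1)          as height_rounded
    from sashelp.class;
quit;
```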

Creating Index

    PROC SQL;
      CREATE UNIQUE INDEX id ON data.goldn(id);
    QUIT;

Indexes are auxiliary data structures that can be used to improve the performance of queries on large data sources.
An index is stored in the same directory as the indexed table, in a separate file with the same name and a different extension (.sas7bndx).

Why use Indexes?
No index:
Lookups must read the entire data portion of the table from start to finish to be certain of finding all matches.
This means a lot of CPU and I/O time is used to read records that are never needed.
With an index:
SAS will automatically detect and exploit the index if it can improve performance.
The index file contains a list of key variable values and their locations within the data table.
The index supplies a list of matching record positions, which is then used to interrogate the table itself.
Only the parts of the table that are needed are read, which means less CPU and I/O time.
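A minimal sketch of creating, using, and dropping indexes is shown below. It works on a copy of the SASHELP.CLASS sample table in the WORK library; the table and index names are illustrative and not from the original slides. With a table this small SAS may well decide that a full scan is cheaper, so the log note about index use is not guaranteed to appear.

```sas
/* Work on a copy so the shipped sample table is left untouched. */
data work.class;
  set sashelp.class;
run;

/* Ask SAS to write INFO notes to the log when it uses an index. */
options msglevel=i;

proc sql;
  /* A simple (one-column) index must have the same name as its column. */
  create unique index name on work.class(name);

  /* A composite index needs a name that is not a column name. */
  create index sex_age on work.class(sex, age);

  /* A WHERE clause on an indexed column is a candidate for an index lookup. */
  select name, age
    from work.class
    where name = 'Alice';

  /* Indexes cost space and update time, so drop those that are not needed. */
  drop index sex_age from work.class;
quit;
```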

Merge without Sort
No presorting required:

    PROC SQL;
      CREATE TABLE goldndata AS
        SELECT *
          FROM goldn.gtriplet AS a, goldn.blood AS b
          WHERE a.id = b.id;
    QUIT;

No requirement for common variable names to join on (the join columns should be the same type and length):

    PROC SQL;
      CREATE TABLE goldndata AS
        SELECT *
          FROM goldn.gtriplet AS a, goldn.blood AS b
          WHERE a.myid = b.id;
    QUIT;

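Combining Datasets: Joins
Each SQL join type corresponds to a DATA step MERGE with IN= flags a and b on the two datasets:

    Join type     DATA step equivalent
    Inner Join    if a and b;
    Full Join     if a or b;
    Left Join     if a;
    Right Join    if b;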
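The four join types can be sketched on two tiny tables. The table names, column names, and values below are made up for illustration and are not from the original slides.

```sas
data work.a;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data work.b;
  input id y;
  datalines;
2 200
3 300
4 400
;
run;

proc sql;
  /* Inner join: only ids found in both tables (2 and 3). */
  create table work.inner_j as
    select a.id, a.x, b.y
      from work.a as a inner join work.b as b
        on a.id = b.id;

  /* Left join: every row of a (1, 2, 3); y is missing for id 1. */
  create table work.left_j as
    select a.id, a.x, b.y
      from work.a as a left join work.b as b
        on a.id = b.id;

  /* Right join: every row of b (2, 3, 4); x is missing for id 4. */
  create table work.right_j as
    select b.id, a.x, b.y
      from work.a as a right join work.b as b
        on a.id = b.id;

  /* Full join: all ids from either table; COALESCE keeps the non-missing id. */
  create table work.full_j as
    select coalesce(a.id, b.id) as id, a.x, b.y
      from work.a as a full join work.b as b
        on a.id = b.id;
quit;
```

Neither input needs to be sorted by id, in contrast to a DATA step MERGE with a BY statement.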

Changing the Order of Variables in Your Data Set
Some genetics software requires id as the first column.

Table 1. Order of variables before changing (oldfile)

    Age   Sex   Subject

Table 2. Order of variables after changing (newfile)

    Subject   Sex   Age

Changing the order…

    PROC SQL;
      CREATE TABLE newfile
        ( subject num,
          sex     num,
          age     num );
      INSERT INTO newfile
        SELECT subject, sex, age
          FROM me.c1data;
    QUIT;

    proc contents data=newfile;
    run;

Log snippet:

    Alphabetic List of Variables and Attributes

    #   Variable   Type   Len
    3   age        Num    8
    2   sex        Num    8
    1   subject    Num    8
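A shorter variant of the same idea is CREATE TABLE ... AS SELECT, which takes the column order straight from the SELECT list. This sketch assumes the SASHELP.CLASS sample table and a hypothetical output name; it is an alternative, not the method shown on the slide.

```sas
proc sql;
  /* Columns are written in the order they are listed here. */
  create table work.reordered as
    select age, sex, name, height, weight
      from sashelp.class;
quit;

/* VARNUM lists variables by position in the table instead of alphabetically. */
proc contents data=work.reordered varnum;
run;
```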

Matching, Sounds-Like…
Phonetic matching: sounds-like operator =*
A technique for finding names that sound alike or have variations in spelling. The sounds-like operator "=*" searches and selects character data based on two expressions: the search value and the matched value.
Pattern matching: % wildcard character
The % acts as a wildcard character representing any number of characters, including any combination of uppercase or lowercase characters. Combining the LIKE predicate with the % (percent sign) permits case-sensitive searches.

Matching, Sounds-Like…

    PROC SQL;
      CREATE VIEW map AS
        SELECT *
          FROM map.map;
    QUIT;

    PROC SQL;
      SELECT *
        FROM map
        WHERE GeneSymbol LIKE 'CYP%';
        /* alternative: WHERE GeneSymbol =* "CYP19" */
    QUIT;
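Both operators can be tried on a small made-up table. The table name and the surnames below are illustrative only and are not from the original slides.

```sas
data work.people;
  length lastname $12;
  input lastname $;
  datalines;
Smith
Smyth
Smithe
Jones
Johnson
;
run;

proc sql;
  /* Pattern match: names that begin with "Sm". LIKE is case-sensitive. */
  select lastname
    from work.people
    where lastname like 'Sm%';

  /* Case-insensitive variant: fold the column to upper case before matching. */
  select lastname
    from work.people
    where upper(lastname) like 'SM%';

  /* Sounds-like: names that are phonetically similar to "Smith". */
  select lastname
    from work.people
    where lastname =* 'Smith';
quit;
```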

Creating Macro Variables with PROC SQL
Select all unique values into a macro variable. The keyword DISTINCT eliminates duplicates.

    PROC SQL NOPRINT;
      SELECT DISTINCT genesymbol
        INTO :gene SEPARATED BY ', '
        FROM map.map;
    QUIT;
    %put &gene;

List file snippet:

    GIMAP4,GIMAP5,GIMAP6,GIMAP7,GIMAP8,GIOT-1,GIP,GIPC1,GIPC2,…

Without the SEPARATED BY clause the macro variable would not hold the list at all: a single macro variable in the INTO clause receives only one value, the value from the first row returned by the query.
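The effect of SEPARATED BY can be checked with a short sketch. It assumes the SASHELP.CLASS sample table; the macro variable names are hypothetical and not from the original slides.

```sas
proc sql noprint;
  /* Single macro variable, no SEPARATED BY: only one value is stored. */
  select distinct sex
    into :one_sex trimmed
    from sashelp.class;

  /* SEPARATED BY: every distinct value is concatenated into one string. */
  select distinct sex
    into :all_sex separated by ', '
    from sashelp.class;
quit;

%put one_sex=&one_sex;
%put all_sex=&all_sex;
```

TRIMMED removes the leading and trailing blanks that would otherwise be stored with a single value; values stored with SEPARATED BY are trimmed automatically.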

Macro Variables with PROC SQL, contd…
Select all unique values into a macro variable, but this time add double quotes using the QUOTE function and collapse consecutive blanks using the COMPBL function.

    PROC SQL NOPRINT;
      SELECT DISTINCT quote(compbl(genesymbol))
        INTO :gene SEPARATED BY ', '
        FROM map.map;
    QUIT;
    %put &gene;

List file snippet:

    "GIMAP4 ","GIMAP5 ","GIMAP6 ","GIMAP7 ","GIMAP8 ","GIOT-1 ","GIP ","GIPC1 ","GIPC2
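COMPBL reduces runs of blanks to a single blank, which is why one trailing blank survives inside each pair of quotes above. If that blank is unwanted, one option is to STRIP the value before quoting. The sketch below assumes the SASHELP.CLASS sample table and hypothetical macro variable and table names; it is a variation, not the method shown on the slide.

```sas
proc sql noprint;
  /* STRIP removes leading and trailing blanks before QUOTE adds the quotes. */
  select distinct quote(strip(name))
    into :name_list separated by ', '
    from sashelp.class;
quit;

%put &=name_list;

/* A quoted, comma-separated list can be dropped straight into an IN list. */
data work.picked;
  set sashelp.class;
  where name in (&name_list);
run;
```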

CREATING MACRO ARRAYS USING PROC SQL
Select all variable names and create a macro array. The simplest way uses the output from PROC CONTENTS:

    PROC CONTENTS DATA=mydata(KEEP = diabetes -- asthma)
                  OUT=vars(KEEP = name varnum) NOPRINT;
    RUN;

    PROC SQL NOPRINT;
      SELECT name INTO :row_1 - :row_&SysMaxLong
        FROM vars
        ORDER BY varnum;
    QUIT;
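Once the macro array exists, it is usually consumed in a macro %DO loop. The sketch below is one way to do that, under stated assumptions: it reads column names for the SASHELP.CLASS sample table from DICTIONARY.COLUMNS rather than from PROC CONTENTS output, and the macro variable and macro names are hypothetical.

```sas
proc sql noprint;
  /* One macro variable per column name: var_1, var_2, ... */
  select name
    into :var_1 - :var_999
    from dictionary.columns
    where libname = 'SASHELP' and memname = 'CLASS'
    order by varnum;
quit;

/* SQLOBS holds the number of rows the last query returned,
   which here is the number of macro variables that were filled. */
%let n_vars = &sqlobs;

%macro list_vars;
  %local i;
  %do i = 1 %to &n_vars;
    /* &&var_&i resolves first to &var_1, &var_2, ... and then to the name. */
    %put Variable &i is &&var_&i;
  %end;
%mend list_vars;

%list_vars
```

Only as many macro variables are created as there are rows, so an upper bound larger than the row count is harmless.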

Finale
PROC SQL is an additional tool with its own strengths and challenges.
Many times it is just another way to do the same thing, but at other times it can be much more efficient and can cut down the number of sorts, DATA steps, procedures, or lines of code required.

Suggested Readings
Papers
SQL for People Who Don't Think They Need SQL: Erin M. Christen (PharmaSUG 2003)
Ten Great Reasons to Learn SAS® Software's SQL Procedure: Kirk Paul Lafler (SUGI 23)
Books
PROC SQL: Beyond the Basics: Kirk Paul Lafler
SAS Guide to the SQL Procedure

Thank you!