SAS SQL SAS Seminar Series


1 SAS SQL SAS Seminar Series
Shamika Ketkar July 14th, 2008

2 SQL
Structured Query Language
Developed by IBM in the early 1970s.
From the 1970s to the late 1980s there were different types of SQL, based on different databases.
In 1986 the first unified SQL standard (SQL-86) was created.
In 1987 a database interface for SQL was added to the Version 6 Base SAS package.
A "language within a language".

3 Anatomy of a PROC SQL Statement
SQL nomenclature:
Tables (datasets)
Rows (observations)
Columns (variables)
Anatomy of a PROC SQL statement:
    PROC SQL;
        SELECT column list
        FROM table list
        WHERE condition list
        GROUP BY column list
        ORDER BY column list
        ;
    QUIT;
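To make the clause order above concrete, here is a minimal sketch that fills in each clause. It assumes the sample table sashelp.class that ships with SAS (columns name, sex, age, height, weight) is available; the column aliases are illustrative only.

```sas
/* Sketch only: assumes the SAS sample table sashelp.class is available. */
proc sql;
    select sex,                          /* column list           */
           count(*)    as n,
           avg(height) as mean_height
    from sashelp.class                   /* table list            */
    where age ge 12                      /* condition list        */
    group by sex                         /* one output row per sex */
    order by sex;                        /* sort the result        */
quit;
```

The clauses must appear in this order; WHERE filters rows before grouping, and ORDER BY is applied last.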

4 Features
SQL looks at datasets differently from SAS.
SAS looks at a dataset one record at a time, using an implied loop that moves from the first record to the last.
SQL looks at all the records as a single object.
Because of this difference, SQL can easily do a few things that are more difficult to do in SAS.
SQL commands are available for creating tables, changing table structures, changing values in tables, functions and more…
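The commands mentioned above (creating tables, changing table structure, changing values) can be sketched as follows. The table work.demo and its columns are hypothetical names invented for this illustration.

```sas
/* Sketch only: work.demo, id, grp, and score are hypothetical names. */
proc sql;
    /* Create an empty table with two columns */
    create table work.demo (id num, grp char(1));

    /* Add two rows */
    insert into work.demo
        values (1, 'a')
        values (2, 'b');

    /* Change the table structure: add a column */
    alter table work.demo add score num;

    /* Change values in the table */
    update work.demo set score = id * 10;

    select * from work.demo;
quit;
```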

5 Processing Large Datasets: Create View
When a table is created, the query is executed and the resulting data is stored in a file.
When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view.
By default, PROC SQL prints the result of a query (use the NOPRINT option to suppress this). No output is produced when a view is created.
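The table/view distinction described above can be sketched side by side. This assumes the sample table sashelp.class is available; work.class_tbl and work.class_vw are hypothetical names.

```sas
/* Sketch only: assumes sashelp.class exists; output names are hypothetical. */
proc sql;
    /* Table: the query runs now and the rows are written to disk */
    create table work.class_tbl as
        select * from sashelp.class where age > 12;

    /* View: only the query text is stored; no rows are read yet */
    create view work.class_vw as
        select * from sashelp.class where age > 12;

    /* Writes the stored query definition to the log */
    describe view work.class_vw;

    /* The view's query executes here, when the view is referenced */
    select count(*) as n_rows from work.class_vw;
quit;
```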

6 Create View
    PROC SQL;
        CREATE VIEW out.c1data AS
        SELECT *
        FROM data.allgenostarc1 AS a, pheno.new_gtriplet AS b
        WHERE a.subject=b.subject
        ORDER BY a.subject;
    QUIT;
Log Snippet
    NOTE: SQL view ME.C1DATA has been defined.
    NOTE: PROCEDURE SQL used (Total process time):
          real time seconds
          cpu time seconds

7 The CONTENTS Procedure
Log Snippet
    The CONTENTS Procedure
    Data Set Name    out.c1data    Observations
    Member Type      VIEW          Variables
    Engine           SQLVIEW       Indexes
    Protection                     Compressed     NO
    Data Set Type                  Sorted         YES

    # Variable Type Len Format Informat
    3 age Num
    5 pedid Num BEST F12.
    4 sex Num BEST F12.
    1 subject Num F11.
SAS stores the view in a file with the extension .sas7bvew.

8 View from View
    PROC SQL;
        CREATE VIEW out.agecat AS
        SELECT *,
            CASE
                WHEN .  lt age le 18 THEN 1
                WHEN 18 lt age le 25 THEN 2
                WHEN 25 lt age le 40 THEN 3
                WHEN 40 lt age le 55 THEN 4
                WHEN 55 lt age le 70 THEN 5
                WHEN age gt 70 THEN 6
                ELSE .
            END AS agecat format=1.
        FROM out.c1data;
    QUIT;

9 SQL Functions
    PROC SQL;
        SELECT COUNT(DISTINCT subject), agecat, sex
        FROM out.agecat
        GROUP BY agecat, sex;
    QUIT;
Macro Variable
    PROC SQL NOPRINT;
        SELECT COUNT(DISTINCT subject) INTO :subj1-:subj2
        FROM out.agecat
        GROUP BY sex;
    QUIT;
    %PUT "Males=" &subj1 "Female =" &subj2;

10 SQL Functions
PROC SQL supports most of the functions available to the SAS DATA step; they can be used in a PROC SQL SELECT statement.
Because SQL treats a dataset as a single object, summary functions work over the entire dataset (or over each GROUP BY group).
Common functions: COUNT DISTINCT MAX MIN SUM AVG VAR STD STDERR NMISS RANGE SUBSTR LENGTH UPPER LOWER CONCAT ROUND MOD
PROC SQL does not support the LAG, DIF, and SOUND functions.
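A short sketch of the two kinds of functions listed above: summary functions that collapse many rows into one value, and row-level functions that are evaluated once per row. It assumes the sample table sashelp.class is available; the aliases are illustrative.

```sas
/* Sketch only: assumes the SAS sample table sashelp.class is available. */
proc sql;
    /* Summary functions: one result row per GROUP BY group */
    select sex,
           count(*)       as n,
           nmiss(height)  as n_missing_height,
           min(age)       as min_age,
           max(age)       as max_age,
           avg(weight)    as mean_weight format=6.1,
           std(weight)    as sd_weight   format=6.1
    from sashelp.class
    group by sex;

    /* Row-level functions: one result row per input row */
    select upcase(name)       as name_upper,
           substr(name, 1, 3) as name_prefix,
           length(name)       as name_length,
           round(height, 1)   as height_rounded
    from sashelp.class;
quit;
```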

11 Creating Index
    PROC SQL;
        CREATE UNIQUE INDEX id ON data.goldn(id);
    QUIT;
Indexes are auxiliary data structures that can be used to improve the performance of large data sources.
An index is stored in the same directory as the indexed table, in a separate file with the same name and a different extension.

12 Why use Indexes?
Without an index:
Lookups must read the entire data portion of the table from start to finish to be certain of finding all matches.
This means a lot of CPU and I/O time is used to read records that are never needed.
With an index:
SAS will automatically detect and exploit the index if it can improve performance.
The index file contains a list of key variable values and their locations within the data table.
The index supplies a list of matching record positions, which is then used to interrogate the table itself.
Only the parts of the table that are needed are read, which means less CPU and I/O time.
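As a small sketch of index creation and use, the following copies the sample table sashelp.class (assumed available) into a hypothetical work.class table and indexes it. Whether SAS actually uses an index for a given query is decided by SAS at run time; setting MSGLEVEL=I asks it to report that decision in the log.

```sas
/* Sketch only: work.class and the index name sexage are hypothetical. */
options msglevel=i;   /* request INFO notes, including index usage */

data work.class;
    set sashelp.class;
run;

proc sql;
    /* Simple index: the index name must match the column name */
    create unique index name on work.class(name);

    /* Composite index on two columns, with its own name */
    create index sexage on work.class(sex, age);

    /* A lookup on the indexed column; SAS may choose to use the index */
    select name, age
    from work.class
    where name = 'Alice';

    /* Remove an index that is no longer needed */
    drop index sexage from work.class;
quit;
```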

13 Merge without Sort
No presorting required:
    PROC SQL;
        CREATE TABLE goldndata AS
        SELECT *
        FROM goldn.gtriplet AS a, goldn.blood AS b
        WHERE a.id=b.id;
    QUIT;
No requirement for common variable names to join on (the join columns should be the same type and length):
    PROC SQL;
        CREATE TABLE goldndata AS
        SELECT *
        FROM goldn.gtriplet AS a, goldn.blood AS b
        WHERE a.myid=b.id;
    QUIT;

14 Combining Datasets: Joins
SQL join and the equivalent DATA step MERGE condition (with IN=a and IN=b):
    Inner Join    If a and b;
    Full Join     If a or b;
    Left Join     If a;
    Right Join    If b;
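The four join types can be sketched on two tiny tables. The tables work.a and work.b and their columns are hypothetical, built here only so the example is self-contained.

```sas
/* Sketch only: work.a, work.b, and all column names are hypothetical. */
data work.a;
    input id x;
    datalines;
1 10
2 20
3 30
;
run;

data work.b;
    input id y;
    datalines;
2 200
3 300
4 400
;
run;

proc sql;
    /* Rows with a matching id in both tables (ids 2 and 3) */
    create table work.inner_j as
        select a.id, a.x, b.y
        from work.a as a inner join work.b as b on a.id = b.id;

    /* All rows of a; y is missing where b has no match (ids 1, 2, 3) */
    create table work.left_j as
        select a.id, a.x, b.y
        from work.a as a left join work.b as b on a.id = b.id;

    /* All rows of b; x is missing where a has no match (ids 2, 3, 4) */
    create table work.right_j as
        select b.id, a.x, b.y
        from work.a as a right join work.b as b on a.id = b.id;

    /* All rows of both; COALESCE picks whichever id is present (ids 1 to 4) */
    create table work.full_j as
        select coalesce(a.id, b.id) as id, a.x, b.y
        from work.a as a full join work.b as b on a.id = b.id;
quit;
```

Neither input needs to be sorted beforehand, which is the point made on the previous slide.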

15 Changing the Order of Variables
Changing the order of variables in your data set: some genetics software requires id as the first column.
Table 1. Order of variables before changing (oldfile): Age, Sex, Subject
Table 2. Order of variables after changing (newfile): Subject, Sex, Age

16 Changing the order…
    PROC SQL;
        CREATE TABLE newfile (
            subject num,
            sex num,
            age num
        );
        INSERT INTO newfile
        SELECT subject, sex, age
        FROM me.c1data;
    QUIT;
    proc contents data=newfile;
    run;
Log Snippet…
    Alphabetic List of Variables and Attributes
    # Variable Type Len
    3 age Num
    2 sex Num
    1 subject Num
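A shorter alternative to the CREATE TABLE plus INSERT pattern above is to let the SELECT list define the column order directly. This is a sketch on the sample table sashelp.class (assumed available); work.newfile is a hypothetical output name.

```sas
/* Sketch only: assumes sashelp.class exists; work.newfile is hypothetical. */
proc sql;
    /* Columns are created in the order they appear in the SELECT list */
    create table work.newfile as
        select name, sex, age
        from sashelp.class;
quit;

/* VARNUM lists variables by position instead of alphabetically */
proc contents data=work.newfile varnum;
run;
```

With this form the column types and lengths are inherited from the source table rather than declared by hand.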

17 Matching, Sounds-Like…
Phonetic matching: sounds-like operator =*
A technique for finding names that sound alike or have variations in spelling. The sounds-like operator "=*" searches and selects character data based on two expressions: the search value and the matched value.
Pattern matching: % wildcard character
The % acts as a wildcard representing any number of characters, including any combination of uppercase or lowercase characters. Combining the LIKE predicate with the % (percent sign) permits case-sensitive searches.

18 Matching, Sounds-Like…
    PROC SQL;
        CREATE VIEW map AS
        SELECT * FROM map.map;
    QUIT;
    PROC SQL;
        SELECT *
        FROM map
        WHERE GeneSymbol LIKE 'CYP%';
        /* Sounds-like alternative: WHERE GeneSymbol =* "CYP19" */
    QUIT;
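Both operators can be tried on data that ships with SAS. This sketch assumes the sample table sashelp.class is available; the search strings 'J%' and 'Jon' are arbitrary choices for illustration.

```sas
/* Sketch only: assumes sashelp.class exists; search values are arbitrary. */
proc sql;
    /* Pattern matching: names that start with an uppercase J */
    select name
    from sashelp.class
    where name like 'J%';

    /* Phonetic matching: names that sound like the search value */
    select name
    from sashelp.class
    where name =* 'Jon';
quit;
```

LIKE compares characters exactly, so 'j%' would not match the same rows; the sounds-like operator compares phonetic codes and can match more than one spelling.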

19 Creating Macro Variables with Proc SQL
Select all unique values into a macro variable. The keyword DISTINCT eliminates duplicates.
    PROC SQL NOPRINT;
        SELECT DISTINCT genesymbol INTO :gene SEPARATED BY ', '
        FROM map.map;
    QUIT;
    %put &gene;
List file Snippet
    GIMAP4,GIMAP5,GIMAP6,GIMAP7,GIMAP8,GIOT-1,GIP,GIPC1,GIPC2,…
Without the SEPARATED BY clause, the macro variable would not hold a list: PROC SQL stores only a single value in it, the value from the first row of the result.
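The effect of SEPARATED BY can be seen by running the same query with and without it. This sketch assumes the sample table sashelp.class is available; the macro variable names one_value and all_values are hypothetical.

```sas
/* Sketch only: assumes sashelp.class exists; macro names are hypothetical. */
proc sql noprint;
    /* No SEPARATED BY: only one value is stored */
    select distinct sex into :one_value
    from sashelp.class;

    /* With SEPARATED BY: every value is stored, joined by the delimiter */
    select distinct sex into :all_values separated by ','
    from sashelp.class;
quit;

%put one_value=&one_value;
%put all_values=&all_values;
```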

20 Macro Variables with Proc SQL contd…
Select all unique values into a macro variable, but this time add double quotes using the QUOTE function and collapse consecutive blanks using the COMPBL function.
    PROC SQL NOPRINT;
        SELECT DISTINCT quote(compbl(genesymbol)) INTO :gene SEPARATED BY ', '
        FROM map.map;
    QUIT;
    %put &gene;
List file Snippet…
    "GIMAP4 ","GIMAP5 ","GIMAP6 ","GIMAP7 ","GIMAP8 ","GIOT-1 ","GIP ","GIPC1 ","GIPC2

21 CREATING MACRO ARRAYS USING PROC SQL
Select all variable names and create a macro array. The simplest way uses the output from PROC CONTENTS:
    PROC CONTENTS DATA=mydata(KEEP = diabetes -- asthma)
        OUT=vars(KEEP = name varnum) NOPRINT;
    RUN;
    PROC SQL NOPRINT;
        SELECT name INTO :row_1 - :row_&SysMaxLong
        FROM vars
        ORDER BY varnum;
    QUIT;
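Once a macro array exists, a macro loop can walk through it. The sketch below is a variation on the slide, not the author's code: it reads variable names from DICTIONARY.COLUMNS instead of PROC CONTENTS, uses the sample table sashelp.class (assumed available), and relies on the automatic macro variable SQLOBS for the number of rows returned. The names var_1..var_n, n_vars, and show_vars are hypothetical.

```sas
/* Sketch only: var_*, n_vars, and show_vars are hypothetical names. */
proc sql noprint;
    /* One macro variable per row: var_1, var_2, ... */
    select name into :var_1 - :var_&sysmaxlong
    from dictionary.columns
    where libname = 'SASHELP' and memname = 'CLASS'
    order by varnum;
quit;

/* SQLOBS holds the number of rows the last SELECT returned */
%let n_vars = &sqlobs;

%macro show_vars;
    %local i;
    %do i = 1 %to &n_vars;
        /* &&var_&i resolves first to &var_1, then to its value */
        %put NOTE: var_&i = &&var_&i;
    %end;
%mend show_vars;

%show_vars
```

Only as many macro variables are created as there are rows, so the large upper bound from SysMaxLong does no harm.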

22 Finale
PROC SQL is an additional tool with its own strengths and challenges.
Many times it is just another way to do the same thing, but other times it can be much more efficient and may cut down the number of sorts, DATA steps, procedures, or lines of code required.

23 Suggested Readings
Papers
SQL for People Who Don’t Think They Need SQL: Erin M. Christen (PharmaSUG 2003)
Ten Great Reasons to Learn SAS® Software's SQL Procedure: Kirk Paul Lafler (SUGI 23)
Books
Proc SQL Beyond the Basics: Kirk Paul Lafler
SAS Guide to the SQL Procedure

24 Thank you!

