SAS SQL Part 2 Alan Elliott


1 SAS SQL Part 2 Alan Elliott

2 Dealing with Missing Values

The code below treats the value -81 as a missing-value code and recodes it to the SAS numeric missing value (.).

    Title "Dealing with Missing Values in SQL";
    PROC SQL;
    select INC_KEY, GENDER, RACE, INJTYPE,
           case when ISS=-81 then . else ISS end as ISS,
           case when EDGCSTOTAL=-81 then . else EDGCSTOTAL end as EDGCSTOTAL,
           case when AGE=-81 then . else AGE end as AGE
    from "C:\sasdata\trauma_sm";
    quit;

SEE SQLMISSING.SAS
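
The recode can be tried without the trauma file. The sketch below is not from the slides: the data set names WORK.DEMO_TRAUMA and WORK.DEMO_CLEAN and the three data rows are made up for illustration, and it assumes -81 is the only missing-value code. It stores the recoded columns in a table and then uses NMISS to confirm how many values became missing.

```sas
/* Hypothetical data: -81 stands for "not recorded" */
data work.demo_trauma;
    input INC_KEY ISS AGE;
    datalines;
1 9 -81
2 -81 12.5
3 16 7.1
;
run;

proc sql;
    /* Recode -81 to the numeric missing value and keep the result */
    create table work.demo_clean as
    select INC_KEY,
           case when ISS = -81 then . else ISS end as ISS,
           case when AGE = -81 then . else AGE end as AGE
    from work.demo_trauma;

    /* NMISS with one argument counts missing values down the column */
    select nmiss(ISS) as ISS_MISSING,
           nmiss(AGE) as AGE_MISSING
    from work.demo_clean;
quit;
```

With these three rows, each column should report one missing value.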

3 Partial Output

INC_KEY  GENDER  RACE                            INJTYPE      ISS EDGCSTOTAL AGE
468879   Female  Black                           Blunt        1.9.1
468942   Female  Black                           Blunt        20.17.7
468961   Female  White, not of Hispanic Origin   Blunt        5.16.6
468971   Female  Black                           Blunt        5156.5
469030   Male    Black                                        ..6.7
469055   Male    Black                                        11510.8
487580   Male    Hispanic                        Penetrating  25617.7
597075   Male    White, not of Hispanic Origin   Blunt        91514.6
603091   Female  White, not of Hispanic Origin   Blunt        1735.1

(The three numeric columns are run together in this transcript.)

Missing Values: a recoded numeric value prints as a period (.).

4 Character variables

Take INJTYPE out of the initial list of variables on the SELECT statement and add the following code after "end as AGE":

    end as AGE,
    case when INJTYPE="Burn" then "" else INJTYPE end as INJTYPE

Then rerun the program.

5 Results

INC_KEY  GENDER  RACE                            ISS EDGCSTOTAL AGE  INJTYPE
603227   Male    White, not of Hispanic Origin   261516              Blunt
603228   Male    White, not of Hispanic Origin   1157.6              Blunt
603237   Female  White, not of Hispanic Origin   17316.8             Blunt
603238   Male    Black                           21515.7             Blunt
603243   Male    White, not of Hispanic Origin   14157.8             Blunt
603246   Male    White, not of Hispanic Origin   1151.4                           (WAS BURN)
603259   Male    White, not of Hispanic Origin   11514               Penetrating
603260   Female  White, not of Hispanic Origin   91513.3             Blunt
603262   Female  White, not of Hispanic Origin   26147.1             Blunt

(The three numeric columns are run together in this transcript.)

6 Combined CASE / Conditional

    PROC SQL;
    select INC_KEY, GENDER, RACE, AGE, DISSTATUS,
           case
             when AGE LT 10 and DISSTATUS in ("Dead") then "CHILD DEATH"
             when AGE GE 10 and DISSTATUS in ("Dead") then "OTHER DEATH"
             ELSE "Alive"
           end as TYPEDEATH
    from "C:\sasdata\trauma_sm";
    quit;

SEE SQLCASE.SAS

7 Partial Output

INC_KEY  GENDER  RACE                            AGE   DISSTATUS  TYPEDEATH
603157   Female  Black                           3.5              Alive
603158   Male    White, not of Hispanic Origin   7.7              Alive
603169   Female  White, not of Hispanic Origin   5.1              Alive
603170   Male    White, not of Hispanic Origin   14.7             Alive
603173   Male    White, not of Hispanic Origin   5.7   Dead       CHILD DEATH
603188   Male    White, not of Hispanic Origin   17.8             Alive
603192   Male    White, not of Hispanic Origin   12.8  Dead       OTHER DEATH
603196   Male    White, not of Hispanic Origin   15.6             Alive
603200   Female  White, not of Hispanic Origin   13.7             Alive

8 Order by TYPEDEATH (Descending)

Modify the end of the code to read:

    … end as TYPEDEATH
    from "C:\sasdata\trauma_sm"
    ORDER BY TYPEDEATH DESC;

9 Results

INC_KEY  GENDER  RACE                            AGE   DISSTATUS  TYPEDEATH
603192   Male    White, not of Hispanic Origin   12.8  Dead       OTHER DEATH
603122   Male    Hispanic                        14.1  Dead       OTHER DEATH
603294   Female  White, not of Hispanic Origin   10.6  Dead       OTHER DEATH
603173   Male    White, not of Hispanic Origin   5.7   Dead       CHILD DEATH
603371   Male    White, not of Hispanic Origin   7.0   Dead       CHILD DEATH
603280   Male    White, not of Hispanic Origin   18.0             Alive
603262   Female  White, not of Hispanic Origin   7.1              Alive
603463   Male    Asian or Pacific Islander       12.4             Alive

10 Summarize and Count

    PROC SQL;
    select case
             when AGE LT 10 and DISSTATUS in ("Dead") then "CHILD DEATH"
             when AGE GE 10 and DISSTATUS in ("Dead") then "OTHER DEATH"
             ELSE "Alive"
           end as TYPEDEATH,
           count(calculated TYPEDEATH) as COUNTDEATH
    from "C:\sasdata\trauma_sm"
    GROUP BY TYPEDEATH;
    quit;

SEE SQLSUMMARY1.SAS

(NOTE: Take the keyword CALCULATED out of the statement above and observe the error.)
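
The CALCULATED keyword can also be used in a WHERE clause to refer to a column computed on the same SELECT statement. The sketch below is an illustration only: the data set WORK.DEMO_STATUS and its four rows are hypothetical, not taken from the trauma file. It counts only the deaths by filtering on the calculated TYPEDEATH column.

```sas
/* Hypothetical data: two deaths and two survivors */
data work.demo_status;
    input AGE DISSTATUS $;
    datalines;
5 Dead
12 Dead
8 Alive
15 Alive
;
run;

proc sql;
    select case
             when AGE lt 10 and DISSTATUS = "Dead" then "CHILD DEATH"
             when AGE ge 10 and DISSTATUS = "Dead" then "OTHER DEATH"
             else "Alive"
           end as TYPEDEATH,
           count(*) as COUNTDEATH
    from work.demo_status
    /* CALCULATED points at the TYPEDEATH column built above */
    where calculated TYPEDEATH ne "Alive"
    group by TYPEDEATH;
quit;
```

With this data the result should have one row for CHILD DEATH and one for OTHER DEATH, each with a count of 1.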

11 Results

TYPEDEATH    COUNTDEATH
Alive        95
CHILD DEATH  2
OTHER DEATH  3

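12 Reorder the table

Change the end of the code to read:

    from "C:\sasdata\trauma_sm"
    GROUP BY TYPEDEATH
    ORDER BY TYPEDEATH DESC;

DESC is needed to reverse the order, because GROUP BY already lists the groups in ascending order.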

13 Results

TYPEDEATH    COUNTDEATH
OTHER DEATH  3
CHILD DEATH  2
Alive        95

14 SQL Summary Functions

Summary Function   Description
AVG, MEAN          Average or mean of values
COUNT, FREQ, N     Number of non-missing values
CSS                Corrected sum of squares
CV                 Coefficient of variation
MAX                Largest value
MIN                Smallest value
NMISS              Number of missing values
PRT                Probability of a greater absolute value of Student's t
RANGE              Difference between the largest and smallest values
STD                Standard deviation
STDERR             Standard error of the mean
SUM                Sum of values
SUMWGT             Sum of the weight variable values, which is 1
T                  Student's t value for testing the hypothesis that the population mean is zero
USS                Uncorrected sum of squares
VAR                Variance

15 Using Some Summary Functions

    proc sql;
    select count(brand) as Tot_Cars,
           sum(minivan) as TOT_Minivans,
           min(CITYMPG) as MIN_MPG,
           max(CITYMPG) as MAX_MPG,
           SUM(CITYMPG)/COUNT(CITYMPG) as AVG_MPG
    from sasdata.cars;
    quit;

SEE SQLSUMMARY2.SAS

Tot_Cars  TOT_Minivans  MIN_MPG  MAX_MPG  AVG_MPG
1081      30            10       60       19.28955
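
Summary functions skip missing values, which is why SUM(CITYMPG)/COUNT(CITYMPG) gives the same answer as AVG(CITYMPG). The sketch below illustrates this on a small made-up data set (WORK.DEMO_MPG is hypothetical, not the cars file), with one missing value included on purpose.

```sas
/* Hypothetical data: four rows, one with a missing CITYMPG */
data work.demo_mpg;
    input CITYMPG;
    datalines;
18
22
.
30
;
run;

proc sql;
    select count(*)       as N_ROWS,        /* counts every row */
           n(CITYMPG)     as N_NONMISSING,  /* counts non-missing values only */
           nmiss(CITYMPG) as N_MISSING,
           avg(CITYMPG)   as AVG_MPG format=8.2,
           sum(CITYMPG)/count(CITYMPG) as AVG_BY_HAND format=8.2,
           range(CITYMPG) as RANGE_MPG
    from work.demo_mpg;
quit;
```

Here COUNT(*) is 4 but N(CITYMPG) is 3, and both averages come out as 70/3, about 23.33.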

16 Add group statement

Add a GROUP BY clause at the end of the query:

    from sasdata.cars
    group by minivan;
    quit;

Tot_Cars  TOT_Minivans  MIN_MPG  MAX_MPG  AVG_MPG
1051      0             10       60       19.32445
30        30            16       20       18.06667
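
The grouped output above does not say which row belongs to which group. One option is to put the grouping variable on the SELECT statement as well. This is a sketch on made-up data (WORK.DEMO_CARS and its brand names are hypothetical); the variable names follow the slide.

```sas
/* Hypothetical data: MINIVAN is coded 0 = no, 1 = yes */
data work.demo_cars;
    input BRAND $ MINIVAN CITYMPG;
    datalines;
Acme 0 25
Bolt 0 31
Cargo 1 17
Dart 1 19
;
run;

proc sql;
    select MINIVAN,                 /* labels each summary row with its group */
           count(BRAND) as TOT_CARS,
           min(CITYMPG) as MIN_MPG,
           max(CITYMPG) as MAX_MPG,
           avg(CITYMPG) as AVG_MPG
    from work.demo_cars
    group by MINIVAN;
quit;
```

Each group has two cars in this data, so the MINIVAN=0 row should show an average of 28 and the MINIVAN=1 row an average of 18.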

17 Compare Sort (Data Step)

DATA SORT

    DATA MYDATA;
    INPUT @1 LAST $20. @21 FIRST $20. @45 PHONE $12.;
    Label LAST = 'Last Name'
          FIRST = 'First Name'
          PHONE = 'Phone Number';
    DATALINES;
    Reingold Lucius 201-876-0987
    Jones Pam 987-998-2948
    Etc…
    ;
    *-------- Modify to sort by first name within last (by last first);
    PROC SORT; BY LAST FIRST;
    PROC PRINT LABEL NOOBS;
    TITLE 'ABC Company';
    TITLE2 'Telephone Directory';
    RUN;

Run this code and observe the results. SEE DATASORT.SAS

18 Results from DATA statement

ABC Company
Telephone Directory

Last Name  First Name  Phone Number
Adams      Abby        214-876-0987
Baker      Crusty      222-324-3212
Jones      Jackie      456-987-8077
Jones      Pam         987-998-2948
Reingold   Lucius      201-876-0987
Smith      Arnold      234-321-2345
Smith      Bev         213-765-0987
Smith      John        234-943-0987
Zoll       Tim Bob     303-987-2309

19 Sort Using SQL

    PROC SQL;
    SELECT LAST LABEL="Last Name",
           FIRST LABEL="First Name",
           PHONE LABEL="Phone Number"
    from MYDATA
    ORDER by LAST, FIRST;
    QUIT;

SEE SQLSORT.SAS

Note: variables appear in the table in the order selected.
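
PROC SQL can also store the sorted rows in a new data set, much as PROC SORT does with OUT=. The following is a sketch under assumptions: WORK.DEMO_PHONE and WORK.DEMO_SORTED are hypothetical names, and the three rows are copied from the directory shown earlier. FIRST is listed before LAST on the SELECT statement to show that the column order follows the order selected.

```sas
/* Hypothetical copy of a few directory rows */
data work.demo_phone;
    length LAST FIRST $20 PHONE $12;
    input LAST $ FIRST $ PHONE $;
    datalines;
Smith John 234-943-0987
Adams Abby 214-876-0987
Smith Bev 213-765-0987
;
run;

proc sql;
    /* CREATE TABLE keeps the sorted result instead of printing it */
    create table work.demo_sorted as
    select FIRST label="First Name",
           LAST  label="Last Name",
           PHONE label="Phone Number"
    from work.demo_phone
    order by LAST, FIRST;
quit;

proc print data=work.demo_sorted label noobs;
run;
```

The printed rows should appear as Abby Adams, Bev Smith, John Smith, with First Name as the first column.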

20 Results for SQL Sort

ABC Company
Telephone Directory

First Name  Last Name  Phone Number
Abby        Adams      214-876-0987
Crusty      Baker      222-324-3212
Jackie      Jones      456-987-8077
Pam         Jones      987-998-2948
Lucius      Reingold   201-876-0987
Arnold      Smith      234-321-2345
Bev         Smith      213-765-0987
John        Smith      234-943-0987
Tim Bob     Zoll       303-987-2309

Note: it is easier to reorder the variables in SQL. This output shows First Name first, which corresponds to listing FIRST before LAST on the SELECT statement.

21 Appending/Concatenating Two Files

Recall from the Data Step that to append two data files you can use the code

    DATA NEW;
    SET OLD1 OLD2;
    RUN;

(See SQLAPPEND.SAS)

22 Results for Data Append

Obs  SUBJ  AGE  YRS_SMOKE  MARRIED
1    001   34   12         .
2    003   44   14         .
3    004   55   35         .
4    006   21   3          .
5    011   33   11         1
6    012   25   19         0
7    023   65   45         1
8    032   71   55         1

23 Append Files using SQL

    PROC SQL;
    select * from old1
    union
    select * from old2;
    QUIT;

UNION means concatenate the query results. It produces all the unique rows that result from both queries.

24 Results (same as Data Append)

SUBJ  AGE  YRS_SMOKE  MARRIED
001   34   12         .
003   44   14         .
004   55   35         .
006   21   3          .
011   33   11         1
012   25   19         0
023   65   45         1
032   71   55         1

25 Basic SQL Operators

Combine two or more queries in various ways by using the following set operators:

UNION        produces all unique rows from both queries.
EXCEPT       produces rows that are part of the first query only.
INTERSECT    produces rows that are common to both query results.
OUTER UNION  concatenates the query results.

26 Duplicate records

Suppose there are duplicate records. See SQLAPPEND2.SAS

    DATA OLD1;
    INPUT SUBJ $ AGE YRS_SMOKE;
    datalines;
    001 34 12
    003 44 14
    004 55 35
    006 21 3
    011 33 11
    ;
    DATA OLD2;
    INPUT SUBJ $ AGE YRS_SMOKE MARRIED;
    datalines;
    006 21 3 .
    011 33 11 1
    012 25 19 0
    023 65 45 1
    032 71 55 1
    ;
    RUN;

Compared with the earlier data, record 011 has been added to OLD1 and record 006 has been added to OLD2.

27 Union appends, keeps unique rows

    PROC SQL;
    select * from old1
    union
    select * from old2;
    QUIT;

This is the same code as before. The only difference is the duplicated records in the data sets.

SUBJ  AGE  YRS_SMOKE  MARRIED
001   34   12         .
003   44   14         .
004   55   35         .
006   21   3          .
011   33   11         .
011   33   11         1
012   25   19         0
023   65   45         1
032   71   55         1

Only one row for SUBJ 006 appears, because the two copies are identical. Both rows for SUBJ 011 appear, because they differ in MARRIED. (UNION keeps all unique rows.)

28 Union all

To keep all rows, use UNION ALL. Add ALL to the code and re-run.

    PROC SQL;
    select * from old1
    union all
    select * from old2;
    QUIT;

29 Results – Union All

SUBJ  AGE  YRS_SMOKE  MARRIED
001   34   12         .
003   44   14         .
004   55   35         .
006   21   3          .
011   33   11         .
006   21   3          .
011   33   11         1
012   25   19         0
023   65   45         1
032   71   55         1

Both SUBJ 006 records are included (even though they are not unique).

30 EXCEPT

To keep only the rows from the first data set that are not in the second data set (with all variables), use EXCEPT.

    PROC SQL;
    select * from old1
    except
    select * from old2;
    QUIT;

Run this code and observe the output.

31 Except Output

SUBJ  AGE  YRS_SMOKE  MARRIED
001   34   12         .
003   44   14         .
004   55   35         .
011   33   11         .

Note: Record 006 is the same in both data sets, so it is not kept. Record 011 is different, so it is kept.

32 Switch Data Set order

    PROC SQL;
    select * from old2
    except
    select * from old1;
    QUIT;

Run this code and observe the output.

33 Output

SUBJ  AGE  YRS_SMOKE  MARRIED
011   33   11         1
012   25   19         0
023   65   45         1
032   71   55         1

Note: (Same as before.) Record 006 is the same in both data sets, so it is not kept. Record 011 is different, so it is kept.

34 Except ALL

Suppose there is a duplicate record 006 in the first data set:

    datalines;
    001 34 12
    003 44 14
    004 55 35
    006 21 3
    006 21 3
    011 33 11
    ;

The second 006 line is the added duplicate record.

Using EXCEPT, record 006 would not appear in the result at all, because there is a record 006 in the second data set. If you want a duplicate record that has no matching duplicate in the second data set to appear in the result, use EXCEPT ALL.

See SQLAPPEND2a.SAS

35 Except ALL

    PROC SQL;
    select * from old1
    except all
    select * from old2;
    QUIT;

SUBJ  AGE  YRS_SMOKE  MARRIED
001   34   12         .
003   44   14         .
004   55   35         .
006   21   3          .
011   33   11         .

Record 006 occurs twice in OLD1 but only once in OLD2, so one copy of it appears in the results.
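
The difference between EXCEPT and EXCEPT ALL is easiest to see side by side. The sketch below uses two small made-up data sets (WORK.DEMO_FIRST and WORK.DEMO_SECOND are hypothetical names) with the same columns, where record 006 occurs twice in the first and once in the second.

```sas
/* Hypothetical data: 006 is duplicated in the first data set only */
data work.demo_first;
    input SUBJ $ AGE;
    datalines;
006 21
006 21
011 33
;
run;

data work.demo_second;
    input SUBJ $ AGE;
    datalines;
006 21
;
run;

proc sql;
    title "EXCEPT: any row found in the second query is dropped";
    select * from work.demo_first
    except
    select * from work.demo_second;

    title "EXCEPT ALL: each row in the second query cancels one copy";
    select * from work.demo_first
    except all
    select * from work.demo_second;
quit;
title;  /* clear the title */
```

EXCEPT should return only record 011, while EXCEPT ALL should return one copy of record 006 along with record 011.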

36 Intersect

The INTERSECT command returns only those records that occur in both data sets. Change EXCEPT ALL to INTERSECT:

    PROC SQL;
    select * from old1
    intersect
    select * from old2;
    QUIT;

37 Intersect Results

SUBJ  AGE  YRS_SMOKE  MARRIED
006   21   3          .

Only record 006 is identical in both data sets.

38 Compare Union with Outer Union

UNION        produces all unique rows from both queries.
OUTER UNION  concatenates the query results.

See SQLAPPEND3.SAS

39 Compare Output

    PROC SQL;
    select * from old1
    UNION
    select * from old2;
    QUIT;

    PROC SQL;
    select * from old1
    OUTER UNION
    select * from old2;
    QUIT;

40 Results of Union

Results of UNION

SUBJ  AGE  YRS_SMOKE  MARRIED
001   34   12         .
003   44   14         .
004   55   35         .
006   21   3          .
011   33   11         .
011   33   11         1
012   25   19         0
023   65   45         1
032   71   55         1

41 Results of Outer Union

SUBJ  AGE  YRS_SMOKE  SUBJ  AGE  YRS_SMOKE  MARRIED
001   34   12               .    .          .
003   44   14               .    .          .
004   55   35               .    .          .
006   21   3                .    .          .
006   21   3                .    .          .
011   33   11               .    .          .
      .    .          006   21   3          .
      .    .          011   33   11         1
      .    .          012   25   19         0
      .    .          023   65   45         1
      .    .          032   71   55         1

Note: the result of a SQL query can contain DUPLICATE column names.
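
If the goal is to stack the rows so that same-named columns line up (as SET OLD1 OLD2 does in the Data Step), the CORR (CORRESPONDING) keyword can be added to OUTER UNION. This is a sketch on made-up data sets (WORK.DEMO_OLD1 and WORK.DEMO_OLD2 are hypothetical names chosen to avoid overwriting OLD1 and OLD2).

```sas
/* Hypothetical data: the second data set has one extra column */
data work.demo_old1;
    input SUBJ $ AGE YRS_SMOKE;
    datalines;
001 34 12
006 21 3
;
run;

data work.demo_old2;
    input SUBJ $ AGE YRS_SMOKE MARRIED;
    datalines;
006 21 3 .
012 25 19 0
;
run;

proc sql;
    /* CORR overlays columns that have the same name in both queries */
    select * from work.demo_old1
    outer union corr
    select * from work.demo_old2;
quit;
```

The result should have four columns (SUBJ, AGE, YRS_SMOKE, MARRIED) and all four rows, including both 006 records, since OUTER UNION does not remove duplicates.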

42 Cartesian Join

Combines ALL rows from one file with ALL rows from another.

    PROC SQL;
    select * from old1, old2;
    QUIT;

SEE SQLCARTESIAN_JOIN.SAS

43 Cartesian Join

SUBJ  AGE  YRS_SMOKE  SUBJ  AGE  YRS_SMOKE  MARRIED
001   34   12         006   21   3          .
001   34   12         011   33   11         1
001   34   12         012   25   19         0
001   34   12         023   65   45         1
001   34   12         032   71   55         1
003   44   14         006   21   3          .
003   44   14         011   33   11         1
003   44   14         012   25   19

Note: SUBJ 001 appears 5 times, once for each row of old2.

44 Using Table Aliases

    select a.subj, a.age, b.subj as sub_from_b, etc…
    from old1 a, old2 b

A table alias allows you to distinguish variables from different tables without ambiguity. Table old1 is labeled as table "a" in the FROM clause, so the "a." prefix specifies that a variable comes from old1. In the same way, "b." refers to old2.

45 Inner Join (Using Table Alias)

    PROC SQL;
    select a.subj, a.age, b.subj, b.age, b.married as Married
    from old1 a, old2 b
    where a.subj=b.subj;
    QUIT;

In an INNER JOIN, only observations with both key values matching are selected.

SEE SQL_INNER_JOIN.SAS

46 Inner Join Code

    PROC SQL;
    select a.subj, a.age, b.subj, b.age, b.married as Married
    from old1 a, old2 b
    where a.subj=b.subj;
    QUIT;

The WHERE clause limits the join to records whose key is in BOTH tables.

Note: Married comes from table "b" but is called Married in the output.
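
The same inner join can be written with the explicit INNER JOIN ... ON syntax, which keeps the join condition separate from any other WHERE conditions. The sketch below is an alternative form, not the code from SQL_INNER_JOIN.SAS; the data sets WORK.DEMO_A and WORK.DEMO_B are hypothetical.

```sas
/* Hypothetical data: only 006 and 011 are in both data sets */
data work.demo_a;
    input SUBJ $ AGE;
    datalines;
001 34
006 21
011 33
;
run;

data work.demo_b;
    input SUBJ $ AGE MARRIED;
    datalines;
006 21 .
011 33 1
012 25 0
;
run;

proc sql;
    select a.subj, a.age, b.married as Married
    from work.demo_a as a
         inner join work.demo_b as b
         on a.subj = b.subj;   /* the join key goes in the ON clause */
quit;
```

Only subjects 006 and 011 should appear, because they are the only keys present in both tables.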

47 Inner Join Results

Results of INNER (Conventional) Join

SUBJ  AGE  SUBJ  AGE  Married
006   21   006   21   .
006   21   006   21   .
011   33   011   33   1

48 Inner Join (1-to-1 merge) Example 2

    PROC SQL;
    select * from old1, old2
    where old1.subj=old2.subject;
    QUIT;

Use the table names as the alias.

SEE SQL_INNER_JOIN2.SAS

49 Results

SUBJ  AGE  YRS_SMOKE  SUBJECT  SBP  MARRIED
003   44   14         003      110  1
001   34   12         001      120  .
004   55   35         004      90   0
006   21   3          006      100

Note: the data in both files were not in order. Only records in both tables with a matching key variable are included in the result.

EXERCISE: Add the phrase "order by old1.subj" to put the table in Subject order.

50 One to Many Merge

Suppose you have data like this, and you want to match buildings to employees.

    Data LOC;
    input BUILDING $ Location $;
    datalines;
    A1 DALLAS
    A2 WACO
    A3 HOUSTON
    ;
    RUN;

    DATA EMPLOYEE;
    input EID $ LOC $ ROOMNUMBER;
    datalines;
    001 A2 103
    003 A1 100
    005 A1 1001
    006 A3 12
    002 A1 101.1
    ;
    run;

51 SQL Code 1-to-Many

    PROC SQL;
    select * from LOC, EMPLOYEE
    where LOC.BUILDING=EMPLOYEE.LOC
    order by EMPLOYEE.EID;
    quit;

Use the table names as the alias.

SEE SQL_INNER_JOIN3.SAS

52 Results of 1-to-Many

BUILDING  Location  EID  LOC  ROOMNUMBER
A2        WACO      001  A2   103
A1        DALLAS    002  A1   101.1
A1        DALLAS    003  A1   100
A1        DALLAS    005  A1   1001
A3        HOUSTON   006  A3   12

Each employee is matched to a building.

53 End – Do Exercises

