Presentation is loading. Please wait.

Presentation is loading. Please wait.

This is the Modern World: Simple, Overlooked SAS® Enhancements Bruce Gilsen Federal Reserve Board June 2009.

Similar presentations


Presentation on theme: "This is the Modern World: Simple, Overlooked SAS® Enhancements Bruce Gilsen Federal Reserve Board June 2009."— Presentation transcript:

1 This is the Modern World: Simple, Overlooked SAS® Enhancements Bruce Gilsen Federal Reserve Board June 2009

2 Introduction  Smaller, less dramatic enhancements fall thru cracks at Federal Reserve (SAS consultant for 24 years) SAS-L Conferences  Users still use more cumbersome methods  Enhancements from V92, V9, V8, V7, even V6!  Impetus: yymmddn8. in earlier paper

3 Underutilized enhancements  Write date values in form yyyymmdd  Increment date values w/INTNX function  Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine  Count # times character or substring occurs in a character string  Concatenate character strings  Sort by numeric part of character values  Retrieve DB2 data on z/OS mainframes

4 1. Write date values in form yyyymmdd  Lots of informats, formats to read/write date values 9.1.3: 22 date informats, 46 date formats  Most common input form: yyyymmdd YYMMDD8. informat reads dates in form yyyymmdd  Most common output form (overwhelmingly): yyyymmdd YYMMDD8. format writes dates as yy-mm-dd YYMMDD10. format writes dates as yyyy-mm-dd

5 Writing dates as yyyymmdd, pre-V8  Some date output data one; date1 = "18oct2008"d; * 10/18/2008 date value; put date1 yymmdd8.; * writes 08-10-18 ; put date1 yymmdd10.; * writes 2008-10-18 ;  Before Version 8 PUT: character string of form yyyy-mm-dd COMPRESS: remove dashes datetemp = compress(put(date1,yymmdd10.),'-'); put datetemp $8. ; * writes 20081018 ;

6 Writing dates as yyyymmdd, V8  New format, YYMMDDxw x: optional separator w: number of bytes  x, one of: B blank N none C colon P period D dash S slash date1 = "18oct2008"d; * 10/18/2008 date value; put date1 yymmddn8.; * writes 20081018 ;

7 YYMMDDxw format, examples data _null_ ; date1 = "18oct2008"d; * 10/18/08 date value ; put date1 yymmddb8.; * 08 10 18 ; put date1 yymmddc8.; * 08:10:18 ; put date1 yymmddd10.; * 2008-10-18 ; put date1 yymmddn8.; * 20081018 ; put date1 yymmddp8.; * 08.10.18 ; put date1 yymmdds10.; * 2008/10/18 ; put date1 yymmddn6.; * 081018 ;

8 2. Increment date values with INTNX  INTNX returns SAS ® date value incremented by number of intervals (days, months, quarters, years,…)  SAS date value: integer, number of days before or after January 1, 1960 December 31, 1959 -1 January 1, 1960 0 January 2, 1960 1... October 18, 2008 16,727

9 INTNX Syntax (informal, incomplete) INTNX(interval, start-from, increment )  interval: unit (years, months, days,...) start-from is incremented by  start-from: SAS date value to increment  increment: # of intervals start-from is incremented by  alignment: where start-from aligned after incrementing. Values: BEGINNING, MIDDLE, END, SAMEDAY (v9). Default: BEGINNING.

10 Before Version 9: 2 INTNX issues  Alignment BEGINNING (start of period)  Leap year handling

11 INTNX issue 1: alignment to start of period  Increment by 2 days, months, years  Unexpected result for months, years data one; date1 = ‘18oct2008'd; dateplus2day = intnx(‘day',date1,2); dateplus2month = intnx(‘month',date1,2); dateplus2year = intnx(‘year',date1,2); run; Variable Description Expect Actual dateplus2day 10/18/08+2 days 10/20/08 10/20/08 dateplus2month 10/18/08+2 months 12/18/08 12/1/08 dateplus2year 10/18/08+2 years 10/18/10 1/1/10

12 dateplus2day = intnx(‘day',date1,2);  date1 = ‘18oct2008'd; = 17,823, date value for 10/18/2008  Increment by 2 days First, increment by 2 days Then, align to start of 10/20/2008 (beginning of interval, DAY), no effect  DATEPLUS2DAY = 17,825, date value for 10/20/2008 as expected

13 dateplus2month = intnx(‘month',date1,2);  date1 = ‘18oct2008'd; = 17,823, date value for 10/18/2008  Increment by 2 months First, increment by 2 months Then, align to start of 12/2008 (beginning of interval, MONTH)  DATEPLUS2MONTH = 17,867, date value for 12/1/2008

14 dateplus2year = intnx(‘year',date1,2);  date1 = ‘18oct2008'd; = 17,823, date value for 10/18/2008  Increment by 2 years First, increment by 2 years Then, align to start of 2010 (beginning of interval, YEAR)  DATEPLUS2MONTH = 18,263, date value for 1/1/2010

15 INTNX issue 2: leap years  Example of shifting by years offset = 2; * number of years to increment; if day(date1) = 29 and month(date1) = 2 and mod(offset,4) ne 0 and mod(offset, 400) ne 0 then date1 = mdy(2, 28, year(date1)+offset); else date1 = mdy(month(date1), day(date1), year(date1)+offset) ;  Need more generalized code for all frequencies

16 Solution: SAMEDAY  New alignment value in V9  Preserves date value’s alignment in interval after it’s incremented  No effect when interval = DAY  Use for all intervals except DAY

17 Solution: SAMEDAY  dateplus2day = intnx(‘day’, date1, 2, “sameday”) 2 days after 10/18/2008 Result: 17,825, SAS date value for 10/20/2008  dateplus2month = intnx(‘month’, date1, 2, “sameday”) 2 months after 10/18/2008 Result: 17,884, SAS date value for 12/18/2008  dateplus2year = intnx(‘year’, date1, 2, “sameday”) 2 years after 10/18/2008 Result: 18,553, SAS date value for 10/18/2010

18 Some interesting dates  date2 = intnx(‘year’, ’29feb2000’d, 1, “sameday”) 1 year after 2/29/2000 Result: 15,034, SAS date value for 2/28/2001  date3 = intnx(‘year’, ’29feb2000’d, 4, “sameday”) 4 years after 2/29/2000 Result: 16,130, SAS date value for 2/29/2004  2000, 2004 = leap years, 2001 not leap year

19 Some interesting dates  date4 = intnx(‘month’, ’31mar2003’d, -1, “sameday”) 1 month before 3/31/2003 Result: 15,764, SAS date value for 2/28/2003  date5 = intnx(‘month’, ’31mar2004’d, -1, “sameday”) 1 month before 3/31/2004 Result: 16,130, SAS date value for 2/29/2004  2000, 2004 = leap years, 2003 not leap year

20 SAMEDAY until 9.2: single, non-shifted date intervals only  Following intervals might return wrong answer, no error or warning multiple (e.g., month2 = two-month interval) shifted (e.g., month.4 = month interval starting on April 1) time datetime  Interval = DAY, WEEK, WEEKDAY, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR okay  Fixed in 9.2  http://support.sas.com/techsup/unotes/SN/016/016184.html

21 3. Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine  Transport file File organized in machine-independent format SAS can read on any operating system Use to copy SAS data sets from one operating system to another  Starting w/ Version 6, two types of transport files XPORT-style transport files CPORT-style transport files

22 XPORT-style transport files: 3 steps  1. On original host, create transport file with data set(s) libname inn 'directory_with_data_set(s)' ; libname trans xport 'transport_file_to_create'; proc copy in=inn out=trans ; run ;  2. Copy transport file to destination host with ftp command or another method  3. On destination host, transport file ---> SAS data set(s) libname trans xport 'transport_file' ; libname sasdata1 ‘directory_for_data_sets‘; proc copy in=trans out=sasdata1 ; run ;

23 CPORT-style transport files: 3 steps  1. On original host, create transport file with data set(s) libname sasdata1 'directory_with_SAS_data_set(s)'; filename trans 'transport_file_to_create'; proc cport library=sasdata1 file=trans ; run ;  2. Copy transport file to destination host with ftp command or another method  3. On destination host, transport file ---> SAS data set(s) filename trans 'transport_file' ; libname sasdata1 ‘directory_for_data_sets' ; proc cimport infile=trans library=sasdata1 ; run ;

24 Version 7 changes and transport files  Version 7 changes Variable name maximum length: 8  32 Character variable maximum length: 200  32768 Variable names display based on “first reference”  Transport files XPORT-style created in Version 6 format CPORT-style understands Version 7 changes Always use CPORT-style

25 XPORT-style transport files, Version 7 and beyond  options validvarname=v6; Documented in V8, undocumented but works in V9 Empty data set if any character variable length more than 200, error in log Variable names longer than 8 characters truncated, unique Variable names converted to upper case  Otherwise Empty data set if any character variable length more than 200 or variable name longer than 8 characters, error in log Variable names converted to upper case

26 4. Count number of times a character or substring occurs in a character string  Typical objective, count number of times "ab" occurs in "ab1aba344444abb555ab4“ (4 times) "a” or “b" occur in "ab1aba344444abb555ab4“ (10 times)  Pre-Version 9: iterative methods (loops)  Version 9 COUNT: # times substring occurs in a character string COUNTC: # times 1 or more characters occur in a character string

27 Count occurrences of substring "ab" in "ab1aba344444abb555ab4" data one; char1 = "ab1aba344444abb555ab4"; * character variable to check; * Before Version 9 ; num_ab = 0; * counter for number of occurrences ; loc=index(char1,"ab"); * find 1st occurrence ; do while (loc ne 0); num_ab+1; * count occurrences of "ab" ; char1 = substr(char1,loc+2); * resume search after "ab" ; loc = index(char1, "ab"); * find next occurrence ; end; * Or, a single statement in Version 9 ; num_ab = count(char1,"ab"); * Version 9 ; run;

28 Count occurrences of characters "a" or "b" in "ab1aba344444abb555ab4" data one; char1 = "ab1aba344444abb555ab4"; * character variable to check; * Before Version 9 ; num_a_or_b = 0; * counter for number of occurrences ; loc=indexc(char1,"ab"); * find 1st occurrence ; do while (loc ne 0); num_a_or_b+1; * count occurrences of "a” or “b“ ; char1 = substr(char1,loc+1); * resume search after "a” or “b" ; loc = indexc(char1, "ab"); * find next occurrence ; end; * Or, a single statement in Version 9 ; num_a_or_b = countc(char1,"ab"); * Version 9 ; run;

29 Countw: Version 9.2  Counts number of words in a character string  Can also replace iterative programming

30 5. Concatenate character strings  Pre-Version 9 || to concatenate TRIM removes trailing blanks TRIM(LEFT) removes leading and trailing blanks  Version 9, new functions concatenate character strings CAT does not remove leading or trailing blanks CATT removes trailing blanks CATS removes leading and trailing blanks CATX removes leading and trailing blanks, inserts delimiter (separator) between each argument Note difference in length of result

31 data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; * Leading and trailing blanks included; cat1 = char1 || char2 || char3; cat1 = cat(char1, char2, char3); * V9 ; * Remove trailing blanks ; cat2 = trim(char1) || trim(char2) || trim(char3); cat2 = catt(char1, char2, char3); * V9 ; cat1=xxxxxxxx yyy zzzz cat2=xxxxxxxx yyyzzzz

32 data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; * Remove leading and trailing blanks ; cat3 = trim(left(char1)) || trim(left(char2)) || trim(left(char3)); cat3 = cats(char1, char2, char3); cat3=xxxxxxxxyyyzzzz

33 Important difference: length of result data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; cat1 = char1 || char2 || char3; cat1 = cat(char1, char2, char3); etc.  If length of result (e.g., CAT1) not defined ||: sum of length of contributing variables: 8+8+8=24 CAT, CATT, CATS, CATX: 200

34 data one; length char1 char2 char3 $8.; char1="xxxxxxxx"; char2=" yyy"; * 1 leading blank ; char3="zzzz"; aa = '123'; * Delimiter: value of AA, can vary by observation ; char_catx1 = catx(aa, char1, char2); * Delimiter: comma ; char_catx2 = catx(',', char1, char2, char3); * Delimiter: '123' ; char_catx3 = catx('123', char1, char2, char3); * Delimiter: single blank ; char_catx4 = catx(' ', char1, char2, char3); run; char_catx1=xxxxxxxx123yyy char_catx2=xxxxxxxx,yyy,zzzz char_catx3=xxxxxxxx123yyy123zzzz char_catx4=xxxxxxxx yyy zzzz

35 6. Sort by the numeric part of character values  Variable BANK: bank2 bank100 bank1 bank10  Conventional PROC SORT left to right character by character comparison "1" in "bank100" less than "2" in "bank2“  Wanted: sort integer values w/in text by numeric value Conventional sort order Desired sort order bank1 bank10 bank2 bank100 bank10 bank2 bank100

36 Before Version 9.2: one method  Make new data set with variable w/numeric part of BANK  Sort by new variable  This example assumes numeric part of value always starts at same location (5th character) at most three digits  More general case slightly more complicated data two; set one; banktemp = input(substr(bank,5),3.); run; proc sort data=two out=three; by banktemp; run;

37 Version 9.2: linguistic collation  SORTSEQ=LINGUISTIC + NUMERIC_COLLATION sorts integer values w/in text by numeric part of value  Don’t create extra data set proc sort data=one out=two sortseq=linguistic (numeric_collation=on); by bank; run;  Sort order for BANK: bank1 bank2 bank10 bank100

38 Numeric collation not just at end of character values  ADDRESS values East 14th Street East 2nd Street  Conventional sort: “1” less than “2”  Sortseq=linguistic (numeric_collation=on): 2 less than 14 Conventional sort order Linguistic collation order East 14th Street East 2nd Street East 2nd Street East 14th Street

39 7. Retrieve DB2 data on z/OS mainframes  Old methods PROC DB2EXT PROC ACCESS  Modern methods SAS/ACCESS LIBNAME statement Pass-Through Facility

40 PROC DB2EXT  Since Version 6 (1990), undocumented but works  In Version 9, message NOTE: Beginning with SAS version 10, DB2EXT will no longer be available.  “Grandfathered” to Version 6 Long DB2 column names truncated to 8 characters Algorithm ensures unique variable names

41 PROC ACCESS  Introduced in Version 6 (1990)  Use access and view descriptors  Documented and supported, not recommended  PROC CV2VIEW: convert access and view descriptors to SQL views  “Grandfathered” to Version 6 Long DB2 column names truncated to 8 characters Algorithm ensures unique variable names

42 SAS/ACCESS LIBNAME statement  Introduced in Version 7  Assign libref directly to DB2 database  Easiest method  Long DB2 column names processed directly

43 Pass-through facility  Introduced in Version 6  Retrieve DB2 data by sending DBMS-specific SQL statements to DB2 DBMS = Database Management System  Long DB2 column names Processed directly unless options validvarname=v6; options validvarname=v6; Long name truncated to 8 characters, with unique names as per PROC DB2EXT, PROC ACCESS Documented in V8, undocumented but works in V9

44 Retrieving data: 4 methods options db2ssid=dsn; /* 1. PROC DB2EXT */ run; proc db2ext out=new1; select cntry_nm from nicua.cuv_country_nm; run; proc access dbms=db2; /* 2. PROC ACCESS */ create work.new1.access; table = nicua.cuv_country_nm; ssid = dsn; create work.new1.view; select cntry_nm; run;

45 Retrieving data: 4 methods /* 3. SAS/ACCESS to DB2 LIBNAME statement */ libname in1 db2 ssid=dsn authid=nicua; data new1; set in1.cuv_country_nm (keep = cntry_nm); run; proc sql; /* 4. Pass_Through Facility */ connect to db2 (ssid=dsn); create table new1 as select * from connection to db2 (select cntry_nm from nicua.cuv_country_nm); disconnect from db2; quit;

46 Notes  SAS/ACCESS LIBNAME statement vs Pass-Through Facility Both recommended SAS/ACCESS LIBNAME statement easiest Pass-Through Facility can be more efficient, especially for complex queries Compared in SAS/ACCESS manual  If convert from PROC DB2EXT or PROC ACCESS DB2 column names could be longer In subsequent code: RENAME or change code

47 For more information, please contact Bruce Gilsen Federal Reserve Board, mail stop 157 Washington, DC 20551 phone: 202-452-2494 e-mail: bruce.gilsen@frb.gov SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.


Download ppt "This is the Modern World: Simple, Overlooked SAS® Enhancements Bruce Gilsen Federal Reserve Board June 2009."

Similar presentations


Ads by Google