Presentation on theme: "Chapter 13 Transforming Data with SAS Functions"— Presentation transcript:
1. Chapter 13 Transforming Data with SAS Functions
Objectives
Learn to use a variety of SAS functions to perform the following tasks:
- Convert character data to numeric data and numeric data to character data
- Create SAS date values
- Extract time intervals from a SAS date value
- Perform calculations with date, datetime, and time values
- Extract, edit, concatenate, and search the values of character strings
- Replace or remove occurrences of a particular word within a character string
2. General Form of SAS Functions
SAS functions are built-in routines that complete predefined data manipulation tasks.
General syntax of a SAS function:
Function-name(argument-1, ... <, argument-n>);
Arguments may be:
- Variables
- Constants
- Expressions
- Variable lists
3. When Arguments Are in Arrays or Variable Lists
A variable list such as Var1-Var5 is the same as Var1, Var2, Var3, Var4, Var5.
Varx--Vary consists of all variables from Varx to Vary, in the order they are defined in the data set.
The syntax of a SAS function involving a variable list or an array:
Function-name(OF variable-list);
Example:
MEAN(OF Var1-Var4); computes the mean of Var1 to Var4.
MEAN(Var1-Var4); does not compute the mean of Var1 to Var4. Without OF, the hyphen is a minus sign, so the function receives the single value Var1 MINUS Var4 and returns that difference.
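The difference between the two calls can be checked with a small DATA step. This is a minimal sketch; the data set name of_demo and the values assigned to Var1 to Var4 are made up for illustration.

```sas
/* Hypothetical values, chosen only to make the two results easy to tell apart */
data of_demo;
   Var1 = 10; Var2 = 20; Var3 = 30; Var4 = 40;
   mean_list = mean(of Var1-Var4);   /* mean of the four variables: 25 */
   mean_diff = mean(Var1-Var4);      /* one argument, Var1 minus Var4: -30 */
run;

proc print data=of_demo;
run;
```

Because MEAN(Var1-Var4) is syntactically valid, SAS gives no warning when OF is forgotten, so the printed values are the only sign of the mistake.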
4. Target Variables for SAS Functions
The target variable is the variable to which the result of a SAS function is assigned. For example:
Avg_score = MEAN(OF Quiz1-Quiz5);
Avg_score is the target variable.
One important property of a target variable is its length, which depends on the function.
For a numeric target variable, the default length is 8.
For a character target variable, the default length varies greatly. It can be anywhere from 1 to 200.
It is important to place a LENGTH statement before the first appearance of the target variable:
LENGTH char_var $ n num_var n;
5. Functions for Sample Statistics
Some useful forms of the SUM function for computing sample statistics:
SUM(x1, x2, x3, x4);
SUM(OF x1-x4);
SUM(OF x--y);
SUM(y, z, OF x1-x4);
SUM(4, 24, 10, 6);
NOTE: Missing values are ignored in the computation.
6. Other Useful Functions for Computing Sample Statistics
MEAN: mean
MEDIAN: median
MIN: minimum
MAX: maximum
VAR: variance
STD: standard deviation
N: number of nonmissing values
NMISS: number of missing values
RANGE: max - min
IQR: Q3 - Q1 (3rd quartile minus 1st quartile)
PCTL(percentile, numeric-list): computes the requested percentile from the numeric list
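The effect of a missing value on these functions, compared with the + operator, can be sketched as follows. The data set name stats_demo and the values of x1 to x4 are illustrative assumptions, not part of the original slides.

```sas
/* x2 is deliberately missing to show how the functions treat it */
data stats_demo;
   x1 = 4; x2 = .; x3 = 10; x4 = 6;
   total_fn = sum(of x1-x4);        /* missing x2 is ignored: 20 */
   total_op = x1 + x2 + x3 + x4;    /* missing, because x2 is missing */
   mean_fn  = mean(of x1-x4);       /* 20 divided by the 3 nonmissing values */
   n_valid  = n(of x1-x4);          /* 3 */
   n_miss   = nmiss(of x1-x4);      /* 1 */
   spread   = range(of x1-x4);      /* 10 - 4 = 6 */
run;

proc print data=stats_demo;
run;
```

Comparing n_valid with the number of arguments is a simple way to decide whether a mean based on few nonmissing values should be trusted.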
7. Exercise 1
Run the following program and observe how SAS functions work.
Data quiz;
input name $ 1-5 q1 6-9 q q q q ;
/* COMPUTE SUM AND AVERAGE OF QUIZ SCORES FOR EACH STUDENT USING FUNCTIONS */
TOTQUIZ=SUM(Q1,Q2,Q3,Q4,Q5);
AVGQUIZ=MEAN(Q1,Q2,Q3,Q4,Q5);
SUMQUIZ=SUM(OF Q1-Q5);
meanq=SUMQUIZ/5;
MEANQUIZ=MEAN(OF Q1-Q5);
/* NOTE: The following statement computes the difference Q1 - Q5, NOT the sum of Q1 to Q5. */
DIFFQ1Q5=SUM(Q1-Q5);
/* NOTE: If we use an assignment statement with the + operator, any missing score makes the summary statistic missing as well. */
asum = q1+q2+q3+q4+q5;
amean = (q1+q2+q3+q4+q5)/5;
datalines;
AAAAABAACAADAADAAEAAFAAGAAHAAI
;
proc print; run;
8. Convert Character to Numeric Using the INPUT Function
The function used to convert character values to numeric values:
INPUT(source, informat);
Source is the character variable, constant, or expression to be converted.
Informat is the informat used to read the character value as a number.
The informat is particularly important when the character variable holds nonstandard numeric data values.
For example, suppose the variable payment is defined as a character variable and its data values are stored with a dollar sign and a comma (for example, values beginning $4,). The informat dollar9.2 reads nonstandard numeric data values such as this. The following INPUT function converts the payment variable to a numeric variable:
Num_Pay = INPUT(payment, dollar9.2);
9. Automatic Conversion from Character to Numeric WITHOUT the INPUT Function
For example: Salary = payrate*hours;
Suppose payrate is read as a character variable:
- SAS first creates a temporary numeric value for each character value.
- If the character value of payrate can be converted into a valid numeric value, the temporary numeric value is used in the computation.
- If the character value of payrate can NOT be converted into a valid numeric value, the result is missing, and the INPUT function with a suitable informat is required to obtain a valid numeric value.
10. Examples of Automatic Conversion from Character to Numeric
Character value     Numeric value by automatic conversion
24.5                24.5
-12.6               -12.6
1.45E2              145
2,595.4             .
$14,605.34          .
11. Examples Using the INPUT Function
Character value of Pay    INPUT(character, informat)    Converted numeric value
24.5                      INPUT(Pay, 4.1)               24.5
-12.6                     INPUT(Pay, 5.1)               -12.6
1.45E2                    INPUT(Pay, 6.2)               145
2,595.4                   INPUT(Pay, comma7.1)          2595.4
$14,605.34                INPUT(Pay, dollar10.2)        14605.34
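The conversions in the table can be tried in one DATA step. This is a sketch only: the data set name input_demo and the variable names n1 to n5 and auto4 are hypothetical, and pay is reassigned on each line purely to keep the example short.

```sas
data input_demo;
   length pay $ 10;
   pay = '24.5';       n1 = input(pay, 4.1);          /* 24.5 */
   pay = '-12.6';      n2 = input(pay, 5.1);          /* -12.6 */
   pay = '1.45E2';     n3 = input(pay, 6.2);          /* 145 */
   pay = '2,595.4';    n4 = input(pay, comma7.1);     /* 2595.4 */
   auto4 = pay * 1;    /* automatic conversion cannot read the comma: missing */
   pay = '$14,605.34'; n5 = input(pay, dollar10.2);   /* 14605.34 */
run;

proc print data=input_demo;
run;
```

The line that computes auto4 also writes an "Invalid numeric data" note to the log, which is the usual sign that an explicit INPUT call with a COMMA or DOLLAR informat is needed.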
12. Convert a Numeric Variable to a Character Variable Using the PUT Function
The PUT function performs numeric-to-character conversion:
PUT(source, format);
Source is the numeric variable to be converted to character.
Format is the format used to write the source as a character string.
The format must agree with the source type. Since the source is a numeric variable, the format MUST be a numeric format.
13. Automatic Numeric-to-Character Conversion
As with character-to-numeric conversion, numeric data values are converted to character values automatically when they are used in a character context. The automatic conversion writes the numeric value with the BEST12. format, and the resulting character value is RIGHT-ALIGNED.
14. A LENGTH Problem When Using Automatic Numeric-to-Character Conversion
NOTE: If the numeric value has fewer than 12 digits, the right-aligned result contains leading blanks.
For example, consider the following raw data:
ZIP (numeric): 48859
address (character): PE109, CMU
We want to concatenate these together as PE109, CMU 48859:
Com_Address = address || ZIP;
NOTE: || is the operator that concatenates strings.
The result will be: PE109, CMU       48859
NOTE: There are 7 leading blanks between CMU and 48859.
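One way to avoid the leading blanks is to convert ZIP explicitly with PUT before concatenating. The sketch below assumes the ZIP value 48859 from the slide; the data set name zip_demo and the variables auto_addr and put_addr are illustrative.

```sas
data zip_demo;
   length address $ 10;
   address = 'PE109, CMU';
   zip     = 48859;
   auto_addr = address || zip;                   /* BEST12. conversion: 7 blanks before 48859 */
   put_addr  = address || ' ' || put(zip, 5.);   /* PE109, CMU 48859 */
run;

proc print data=zip_demo;
run;
```

The explicit PUT also keeps the "numeric values have been converted to character values" note out of the log, which makes genuine conversion problems easier to spot.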
15. Exercise 2
Run the following program 1 and observe the results and variable attributes.
data C_N_Conv;
cv1='542.3'; cv2='1.456E2'; cv3='2,368'; cv4='$6,421.5';
N_Cv1 = INPUT(cv1, 5.1);
N_cv2 = INPUT(cv2, 7.3);
N_cv3 = INPUT(cv3, comma5.0);
N_cv4 = INPUT(cv4, dollar8.1);
proc contents varnum; run;
proc print; run;
Run the following program 2 and observe the results and variable attributes.
data N_C_Conv;
var1=245; var2=124.6; var3=1245;
C_var1 = put(var1, 4.);
C_var2 = put(var2, 6.);
C_var3 = put(var3, 7.1);
proc contents varnum; run;
proc print; run;
16. Manipulating SAS Date Values with Functions
Recall:
A SAS date is a numeric value that counts days, with 1/1/1960 defined as date value 0.
Ex: 1/30/1960 has the date value 29.
A SAS time is the time within a given day, stored as the number of seconds since midnight (00:00:00 is second 0 of the day).
Ex: For any given date, 1:30:25 am has the time value 5425 seconds.
A SAS datetime is the absolute time, counted in seconds from midnight on 1/1/1960.
17. SAS Date and Time Functions That Create Numeric SAS Date and Time Values
MDY(month, day, year): returns a SAS date value.
NOTE: If you use a two-digit year, the default year cutoff (1920) is applied.
Ex:
MDY(11, 1, 15); is the date value of Nov 1st, 2015.
MDY(11, 1, 35); is the date value of Nov 1st, 1935.
MDY(11, 1, 1915); is the date value of Nov 1st, 1915.
TODAY(): gives today's date value.
DATE(): gives today's date value.
TIME(): gives the current time as a SAS time (in seconds).
DATETIME(): gives the current datetime as a SAS datetime (in seconds).
18. SAS Functions That Extract Month, Quarter, Day, and Year from SAS Date Values
DAY(date) gives the day of the month (1 to 31).
MONTH(date) gives the month of the date (1 to 12).
QTR(date) gives the quarter of the year (1 to 4).
YEAR(date) gives the year of the date (4-digit year).
WEEKDAY(date) gives the day of the week (1 to 7; 1 is Sunday, and so on).
19. The INTCK Function: Counting the Time Intervals in a Given Time Span
The following function counts the number of time intervals in a given time span:
INTCK('interval', from, to);
The possible time intervals include: DAY, WEEKDAY, WEEK, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR.
From: specifies a SAS date, time, or datetime value that identifies the beginning of the time span.
To: specifies a SAS date, time, or datetime value that identifies the end of the time span.
20. Some Rules for Using the INTCK Function
It counts the number of interval boundaries crossed between FROM and TO.
Partial intervals are not counted, so two dates that are one day apart can still be 1 month or 1 year apart if they straddle a boundary.
INTCK statement                                            Value
Weeks1=INTCK('week', '31DEC2009'd, '01JAN2010'd)           0
Months=INTCK('month', '31DEC2009'd, '01JAN2010'd)          1
Years=INTCK('year', '31DEC2009'd, '01JAN2010'd)            1
Week2=INTCK('week', '31DEC2009'd, '03JAN2010'd)            1
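The boundary-counting rule is easiest to see by adding a DAY count next to the other intervals. This is a small sketch using the same dates as the table; the data set name intck_demo and the variable names are illustrative.

```sas
data intck_demo;
   days   = intck('day',   '31DEC2009'd, '01JAN2010'd);   /* 1 */
   weeks1 = intck('week',  '31DEC2009'd, '01JAN2010'd);   /* 0: Thursday and Friday of the same week */
   months = intck('month', '31DEC2009'd, '01JAN2010'd);   /* 1: crosses the first of the month */
   years  = intck('year',  '31DEC2009'd, '01JAN2010'd);   /* 1: crosses January 1 */
   weeks2 = intck('week',  '31DEC2009'd, '03JAN2010'd);   /* 1: crosses a Sunday */
run;

proc print data=intck_demo;
run;
```

Because a single day can count as a full year here, INTCK with 'year' is not a good substitute for an age calculation.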
21. INTNX('interval', start-from, increment <, alignment>);
The INTNX function returns the date or time reached by moving a given number of intervals from a starting point.
General syntax:
INTNX('interval', start-from, increment <, alignment>);
The function returns a SAS date, time, or datetime value.
Interval can be: DAY, WEEKDAY, WEEK, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR.
Start-from: specifies the starting SAS date, time, or datetime value.
Increment: specifies a negative integer (back into the past) or a positive integer (into the future).
Alignment: forces the returned value to the beginning ('b'), middle ('m'), or end ('e') of the time interval. The default is the beginning.
22. How Does INTNX Work?
The following shows some examples of using the INTNX function:
INTNX function                               Result
INTNX('month', '01NOV2010'd, 5);             18718 (April 1, 2011)
INTNX('month', '01NOV2010'd, 5, 'b');        18718 (April 1, 2011)
INTNX('month', '01NOV2010'd, 5, 'm');        18732 (April 15, 2011)
INTNX('month', '01NOV2010'd, 5, 'e');        18747 (April 30, 2011)
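The same calls can be run with a DATE9. format so that the results print as dates rather than day counts. This is a minimal sketch; the data set name intnx_demo and the variable names are made up, and the last call adds a negative increment that is not in the slide.

```sas
data intnx_demo;
   start  = '01NOV2010'd;
   next_b = intnx('month', start, 5);         /* 01APR2011 (18718) */
   next_m = intnx('month', start, 5, 'm');    /* 15APR2011 (18732) */
   next_e = intnx('month', start, 5, 'e');    /* 30APR2011 (18747) */
   prev_e = intnx('month', start, -1, 'e');   /* 31OCT2010: end of the previous month */
   format start next_b next_m next_e prev_e date9.;
run;

proc print data=intnx_demo;
run;
```

A negative increment with 'e' alignment is a common way to find the last day of the previous month without worrying about month lengths.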
23. Calculating the Day Difference and Year Difference Between Two Dates
DATDIF counts the number of days between two dates.
YRDIF counts the number of years between two dates.
General syntax:
DATDIF(start_date, end_date, basis);
YRDIF(start_date, end_date, basis);
Start_date specifies the starting date as a SAS date value.
End_date specifies the end date as a SAS date value.
Basis is a string that specifies the basis for calculating the day or year difference. The basis is 'n/m', where n is the number of days per month and m is the number of days per year. For example, '30/360' uses 30 days per month to calculate the number of months and 360 days per year to calculate the number of years.
24. Possible Basis Values for DATDIF and YRDIF
The following basis values can be applied:
Basis (string)   Meaning                                                        Valid in DATDIF   Valid in YRDIF
'30/360'         30 days per month, 360 days per year                           YES               YES
'ACT/ACT'        Actual number of days per month, actual number of days per year   YES               YES
'ACT/360'        Actual number of days per month, 360 days per year             NO                YES
'ACT/365'        Actual number of days per month, 365 days per year             NO                YES
25. Examples of Computing DATDIF and YRDIF
DATA USE_DIF;
DATEDF1=DATDIF('01SEP1984'D, '01NOV2010'D, '30/360');
DATEDF2=DATDIF('01SEP1984'D, '01NOV2010'D, 'ACT/ACT');
YEARDF1=YRDIF('01SEP1984'D, '01NOV2010'D, '30/360');
YEARDF2=YRDIF('01SEP1984'D, '01NOV2010'D, 'ACT/ACT');
PROC PRINT; RUN;
Results:
Obs DATEDF1 DATEDF2 YEARDF1 YEARDF2
26. Exercise 3
Run the following program and observe how the functions TODAY, YEAR, MONTH, QTR, WEEKDAY, and DAY work.
data datefunctions;
date1='25DEC2010'd;
date2=TODAY();
YEAR_date1=YEAR(date1);
MONTH_Date1=MONTH(Date1);
QTR_Date1=QTR(Date1);
WEEKDAY_Date1=WEEKDAY(Date1);
DAY_Date1=DAY(date1);
proc print; format date1 date2 date9.; run;
27. Exercise 4
Run the following program and observe how the INTCK and INTNX functions work.
data dateFunc2;
date1 = '25DEC2010'd;
date2 = TODAY();
NDAYS=INTCK('DAY', Date1, Date2);
NYEARS=INTCK('YEAR', date1, date2);
NMONTH=INTCK('MONTH', date1, date2);
NQTR=INTCK('QTR', date1, date2);
NWEEK=INTCK('WEEK', date1, date2);
Incmonth1=INTNX('MONTH', today(), 6, 'b');
incmonth2=INTNX('MONTH', today(), 6, 'm');
incmonth3=INTNX('MONTH', today(), 6, 'e');
Datediff=DATDIF(date1, date2, 'ACT/ACT');
Yeardiff=YRDIF(date1, date2, 'ACT/ACT');
proc print;
format date1 date2 Incmonth1 incmonth2 incmonth3 date9.;
run;
28. Modify Character Values Using SAS Functions
This section focuses on manipulating character strings. The objectives include:
- Replace the contents of a character value
- Trim trailing or leading blanks from a character value
- Search a character value and extract a portion of the value
- Convert a character value to UPPER, lower, and Proper case
29. SAS Functions for Manipulating Character Values
There are many SAS functions for manipulating character strings. This section discusses the following functions:
Function    Purpose
SCAN        Extract a specific word from a character string
SUBSTR      Extract a substring or replace character values
TRIM        Trim trailing blanks from character values
LEFT        Left-align a right-aligned string so that TRIM can remove the resulting trailing blanks
UPCASE      Convert the character value to UPPER case
LOWCASE     Convert the character value to lower case
PROPCASE    Convert the character value to Proper case
CATX        Concatenate strings, remove leading and trailing blanks, and insert a separator
INDEX       Search a character value for a specific string
FIND        Search for a specific substring within a character string, with optional modifiers and start position
TRANWRD     Replace or remove all occurrences of a pattern of characters within a character string
30. How Does the SCAN Function Work?
SCAN allows users to separate a character string into words using delimiters.
General syntax:
SCAN(argument, n <, delimiters>);
Argument is the character variable or expression to be scanned.
n specifies which word to read.
Delimiters are special characters, which must be enclosed in quotation marks. If you do not specify delimiters, default delimiters are used.
Default delimiters include: blank < ( + | & ! $ * ) ; ^ / , %
The default length of the value returned by SCAN is 200. Therefore, it is essential to place a LENGTH statement for the target variable before the SCAN function.
31. Some Examples of Using the SCAN Function
Name = 'CURTIS, BEN MIKE';
To extract the first name, we can use the SCAN function:
Fname1=SCAN(Name, 2); gives BEN
Fname2=SCAN(Name, 2, ', '); gives BEN
Fname3=SCAN(Name, 3); gives MIKE
Fname4=SCAN(Name, 2, ','); gives BEN MIKE with a leading blank, because the blank is no longer a delimiter
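The examples above, combined with the LENGTH advice from the previous slide, might look like this in a complete DATA step. The data set name scan_demo and the variable names lname, fname, mname, and rest are illustrative, and the lengths 10 and 20 are assumptions chosen to fit this one name.

```sas
data scan_demo;
   /* Declare lengths first so the targets do not get the default length of 200 */
   length lname fname mname $ 10 rest $ 20;
   name  = 'CURTIS, BEN MIKE';
   lname = scan(name, 1, ',');          /* CURTIS */
   fname = scan(name, 2, ', ');         /* BEN */
   mname = scan(name, 3, ', ');         /* MIKE */
   rest  = left(scan(name, 2, ','));    /* BEN MIKE, with the leading blank moved to the end */
run;

proc contents data=scan_demo varnum;
run;
```

PROC CONTENTS is included so the variable lengths can be compared with and without the LENGTH statement.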
32. SUBSTR Function
SUBSTR serves two purposes.
1. Extract a portion of a character string, starting at a specified position.
General syntax (right-side SUBSTR):
Target = SUBSTR(string, position <, n>);
2. Replace the content of a character string.
General syntax (left-side SUBSTR):
SUBSTR(string, position <, n>) = 'substring';
The string does not need to be marked by delimiters.
If n is omitted in the SUBSTR function, all remaining characters are included in (or replaced by) the substring.
By default, the target variable of a right-side SUBSTR has the same length as the source string.
Hence, it is important to place a LENGTH statement, as needed, before the SUBSTR function.
33. Examples of Using the Right-Side SUBSTR Function
The first purpose of SUBSTR is to extract a substring from a character string (right-side SUBSTR). Here is an example:
NAME = 'CURTIS, BEN MIKE';
To extract the middle initial, one can use SCAN to locate the middle name, MIKE, then use SUBSTR to extract the middle initial, M:
MidName = SCAN(name, 3);
Midinit = SUBSTR(MidName, 1, 1);
34. Example of Using the Left-Side SUBSTR Function
The second purpose of SUBSTR is to replace a substring in a string. For example:
NAME = 'CURTIS, BEN MIKE';
The correct middle name is MICHAEL, not MIKE. One can use the left-side SUBSTR function:
SUBSTR(Name, 13) = 'MICHAEL';
NOTE: The size of the substring is not specified. This replaces everything from the 13th position to the end of the string with 'MICHAEL'.
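The replacement can only use the positions that the variable already has, so the length of Name matters. The sketch below assumes a LENGTH of 20 for name, as Exercise 5 later suggests; the data set name substr_demo and the variable midinit are illustrative.

```sas
data substr_demo;
   /* Without this LENGTH statement, name would have length 16 and only MICH would fit */
   length name $ 20 midinit $ 1;
   name = 'CURTIS, BEN MIKE';
   substr(name, 13) = 'MICHAEL';                   /* name is now CURTIS, BEN MICHAEL */
   midinit = substr(scan(name, 3, ', '), 1, 1);    /* M */
run;

proc print data=substr_demo;
run;
```

Running the step a second time with the LENGTH statement removed shows the truncated value CURTIS, BEN MICH.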
35. TRIM Function
The TRIM function removes trailing blanks before strings are concatenated. The general syntax:
TRIM(variable);
If there are LEADING blanks, we can use the function LEFT(variable), which left-aligns the value and turns the leading blanks into trailing blanks. We can then apply the TRIM function:
TRIM(LEFT(variable));
36. Converting Character Values into UPPER, Lower, and Proper Case
UPCASE(character value) returns the character value in all UPPER case.
Ex: UPCASE('Mission street') returns 'MISSION STREET'
LOWCASE(character value) returns the string in all lower case.
Ex: LOWCASE('Mission street') returns 'mission street'
PROPCASE(character value) returns the value with the first character of each word in upper case and the rest in lower case.
Ex: PROPCASE('MISSION street') returns 'Mission Street'
These functions are very useful in IF statements that compare character values, especially when the values are stored in mixed case.
37. CATX Function
Concatenating character strings often requires trimming leading and trailing blanks and inserting a separator between words in order to obtain the correct new character string. One can combine TRIM, LEFT, and concatenated separators to do the task. Starting with SAS 9.1, a new SAS function, CATX, handles all of these in one step.
The general syntax of the CATX function:
CATX(separator, string-1 <, ... string-n>);
Separator specifies the character string used to separate the concatenated strings. It must be enclosed in quotation marks.
String-n specifies a SAS character string.
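The two approaches can be compared side by side. This is a sketch under assumed values: the data set name catx_demo, the variables old_way and new_way, and the sample city and state strings are all made up for illustration.

```sas
data catx_demo;
   length city $ 15 state $ 10;
   city  = '  MT PLEASANT';    /* leading blanks on purpose */
   state = 'michigan';
   /* TRIM, LEFT, and || with a hand-written separator */
   old_way = trim(left(propcase(city))) || ', ' || trim(left(propcase(state)));
   /* CATX strips leading and trailing blanks and inserts the separator */
   new_way = catx(', ', propcase(city), propcase(state));
run;

proc print data=catx_demo;
run;
```

Both variables hold Mt Pleasant, Michigan, but the CATX version stays readable as more pieces are added to the label.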
38. CATX Function Example
The following data consists of Name (1-20), Jobtype (22-40), city (42-53), state (55-63), zipcode (65-71):
AARON, BRAD MAC Network Technician Alma Michigan
FLEMING, TIM WAREN Computer Analyst Mt Pleasant Michigan
CHEN, DAVID MICHAEL Instructor MT PLEASANT MICHIGAN
The following program reads the data set and creates the address label for each individual using the CATX function:
Data job;
INFILE ' ';
input name $ 1-20 jobtype $ 22-40 city $ 42-53 State $ 55-63 zipcode 65-71;
Address = CATX(', ', PROPCASE(city), PROPCASE(state), ZIPCODE);
run;
39. Exercise 5
Open the c13_1 program.
Run program 1, observe how the SCAN function works, and check the variable attributes, especially the variable length. Then add a LENGTH statement that gives Fname0 to Fname10 a length of 10. Run the program and check the length again.
Run program 2 and observe the results and the variable attributes. Then add a LENGTH statement for Name with length 20. Run the program and check the results.
Run program 3 and observe the results and the variable attributes. Then add a LENGTH statement for Name with length 20. Run the program and check the results.
40. INDEX Function
The INDEX function searches a character value for a specified string.
It searches from left to right, looking for the first occurrence of the string, and returns the POSITION of the string's first character. If the string does not exist, it returns 0.
General syntax:
INDEX(source, excerpt);
Source specifies the character variable or expression to search.
Excerpt is the character string, enclosed in quotation marks, to search for in the source.
Ex: INDEX(UPCASE(jobtype), 'WORD PROCESSING'); returns the position of the W where 'WORD PROCESSING' is first found, or zero if no such string is found.
41. FIND Function
Another way to search a string is the FIND function, which searches for a specific substring within a character string.
FIND searches for the first occurrence of the substring and returns the POSITION of the substring. If there is no such substring, it returns zero.
General syntax:
FIND(string, substring <, modifiers> <, startpos>);
String is a character constant, variable, or expression to be searched.
Substring is a character constant, variable, or expression to search for within the string.
Modifiers is a character constant, variable, or expression specifying one or more modifiers.
Startpos is an integer that specifies the position at which the search starts and the direction of the search. If startpos is not given, FIND searches from left to right starting at the first position.
42. What Are the Modifiers in the FIND Function, and What Are They For?
NOTE: The FIND function is similar to the INDEX function, with some differences.
One difference is that FIND can start the search at a given position and can search backward or forward.
Another difference is the modifiers, which adjust how the search is carried out.
The modifiers include:
- Modifier 'i' causes FIND to ignore character case during the search.
- Modifier 't' trims the trailing blanks from the string and the substring.
If no modifier is specified, FIND searches for a substring with exactly the same case as the characters given.
If the modifier is a constant, enclose it in quotation marks. More than one modifier can be specified inside a single pair of quotation marks. Ex: to use both the i and t modifiers, use 'it'.
43. Examples of the FIND Function
NOTE: Without a modifier or startpos, the FIND function behaves the same as the INDEX function. Like INDEX, FIND is case sensitive by default. Use UPCASE(string) or LOWCASE(string) inside FIND, or the 'i' modifier, if the cases may be mixed.
Here is an example using the FIND function:
FIND(LOWCASE(job), 'data mining', 't');
One can combine the IF statement and the FIND function to select the observations whose job title contains 'data mining':
Data dmjob;
Set alljobs;
IF FIND(LOWCASE(job), 'data mining', 't') > 0;
Run;
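The difference between INDEX and FIND can be sketched on a single value. The data set name find_demo, the job title, and the variable names are assumptions made for this illustration.

```sas
data find_demo;
   length job $ 30;
   job = 'Senior Data Mining Analyst';
   pos_index = index(job, 'data mining');            /* 0: the case does not match */
   pos_upper = index(upcase(job), 'DATA MINING');    /* 8 */
   pos_find  = find(job, 'data mining', 'i');        /* 8: the i modifier ignores case */
   pos_from  = find(job, 'a', 'i', 10);              /* 11: search starts at position 10 */
run;

proc print data=find_demo;
run;
```

The i modifier gives the same answer as wrapping the string in UPCASE or LOWCASE, without altering the value being searched.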
44. TRANWRD Function
The TRANWRD function replaces or removes all occurrences of a pattern of characters within a character string.
A typical use of TRANWRD is to update existing variables in place, such as changing 'MISS' to 'MS.' or 'Doctor' to 'Dr.'.
General syntax:
TRANWRD(source, target, replacement);
Source is the source string to be translated or updated.
Target is the string that SAS looks for in the source, to be removed or replaced.
Replacement specifies the new string that replaces the target.
To remove the target from the source, use '' as the replacement. Note that TRANWRD then substitutes a single blank for each occurrence rather than nothing at all.
45. Examples of Using TRANWRD
NOTE: The TRANWRD function is case sensitive. Use the UPCASE or LOWCASE function as needed.
Example:
TRANWRD(name, 'Miss', 'Ms.');
TRANWRD(PROPCASE(name), 'Doctor', 'Dr.');
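The case sensitivity can be seen by running the same replacement with different spellings of the target. This is a minimal sketch; the data set name tranwrd_demo, the sample name, and the variables fixed1 to fixed3 are hypothetical.

```sas
data tranwrd_demo;
   length name fixed1 fixed2 fixed3 $ 30;
   name   = 'Miss Joan Smith';
   fixed1 = tranwrd(name, 'Miss', 'Ms.');           /* Ms. Joan Smith */
   fixed2 = tranwrd(name, 'MISS', 'Ms.');           /* unchanged: the case does not match */
   fixed3 = tranwrd(upcase(name), 'MISS', 'MS.');   /* MS. JOAN SMITH */
run;

proc print data=tranwrd_demo;
run;
```

The LENGTH statement matters here as well, since a replacement longer than the target needs room in the result variable.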
46. Nesting SAS Functions
As the previous examples show, a SAS function can be nested inside another SAS function. For example:
name = 'Curtis, Ben, mike';
Midname = TRIM(UPCASE(SUBSTR(SCAN(name, 3), 1, 1))) || '.';
Reading from the inside out, this finds the middle name, takes one character starting at its first position as the middle initial, converts it to upper case, trims the trailing blanks, and adds a period.
47. SAS Functions for Modifying Numeric Values
When manipulating numeric values, one may be interested in only the integer part of a value, may need to round to a certain number of digits, and so on. SAS has a set of functions to modify numeric values:
INT(argument); returns the integer part of the argument.
ROUND(argument, round-off-unit); returns the value rounded off to the unit specified.
CEIL(argument); rounds up, returning the smallest integer greater than or equal to the argument.
FLOOR(argument); rounds down, returning the largest integer less than or equal to the argument.
48. Examples of Modifying Numeric Values
Data value   INT    ROUND(value, .1)   CEIL   FLOOR
1.259        1      1.3                2      1
-1.259       -1     -1.3               -1     -2
20.934       20     20.9               21     20
-20.934      -20    -20.9              -20    -21
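The table can be reproduced with a short program that reads the four data values. This is a sketch; the data set name round_demo and the variable names are illustrative.

```sas
data round_demo;
   input value;
   int_v   = int(value);          /* drops the fractional part */
   round_v = round(value, .1);    /* nearest multiple of 0.1 */
   ceil_v  = ceil(value);         /* smallest integer >= value */
   floor_v = floor(value);        /* largest integer <= value */
datalines;
1.259
-1.259
20.934
-20.934
;

proc print data=round_demo;
run;
```

The negative rows show where INT and FLOOR differ: INT moves toward zero, while FLOOR always moves down.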