“LAG with a WHERE” and other DATA Step Stories Neil Howard A.

1 “LAG with a WHERE” and other DATA Step Stories Neil Howard A

2 Table of Contents
- Chapter 1 “LAG with a WHERE”
- Chapter 2 “A DIFferent LAG”
- Chapter 3 “To LAG or to LEAD”
- Chapter 4 “When RETAIN Doesn’t Retain”
- Chapter 5 “Don’t Order My Variables Around”
- Chapter 6 “The Case of the Missing Values”

3 Chapter One LAG with a WHERE

4 “LAG with a WHERE”
- an interesting fairy tale
- must first understand:
  - LAG function
  - WHERE statement
  - implications of conditional execution

5 LAG function
- Syntax: LAGn(argument)
- n specifies number of lagged values
- argument is numeric or character

6 LAG function
- LAG functions return values from a queue
- A LAGn function stores a value in a queue and returns a value stored previously in that queue
- Each occurrence of a LAGn function generates its own queue
- n is the length of the queue

7 LAG function
- LAG function is executable
- LAG function can be conditionally executed
- NOTE: storing and returning values from the queue occurs only when the function is executed

8 SIMPLE LAG
data new;
  input x @@;
  lag1 = lag1(x);
  lag2 = lag2(x);
cards;
1 2 3 4 5 6
;

9 (Note initialization to missing)
X  LAG1  LAG2
1  .     .
2  1     .
3  2     1
4  3     2
5  4     3
6  5     4

10 CONDITIONAL LAG
data new;
  input a b @@;
  LAGa = LAG(a);
  if b=2 then LAGb = LAG(a);
cards;
1 1 2 1 3 2 4 1 5 2 6 1
;

11
A  B  LAGA  LAGB
1  1  .     .
2  1  1     .
3  2  2     .
4  1  3     .
5  2  4     3
6  1  5     .

12 Every other lagged value?
data new;
  input x @@;
  * conditional;
  if mod(x,2)=0 then condLAG1 = lag(x);
  * unconditional;
  LAGx = lag(x);
  if mod(x,2)=0 then condLAG2 = LAGx;
cards;
1 2 3 4 5 6 7 8
;

13 (condLAG2 is the right answer)
X  LAGx  condLAG1  condLAG2
1  .     .         .
2  1     .         1
3  2     .         .
4  3     2         3
5  4     .         .
6  5     4         5
7  6     .         .
8  7     6         7

14 WHERE statement
- Selects observations before they’re brought into the LPDV
- After data set options applied
- Before any other data step statements executed, including SET, BY, etc.
- Functions differently with BY and first. and last.
- Only works w/ SAS data (not raw data)

15 Given this data:
VISIT      WEIGHT
01JAN2003  88     <- CUTOFF
02JAN2003  22
03JAN2003  154
04JAN2003  21
05JAN2003  112

16 DIFFERENCE?
WHERE:
data w ;
  set q ;
  lagwgt = lag(weight) ;
  where visit > "01jan2003"d ;
run ;
Subsetting IF:
data w ;
  set q ;
  lagwgt = lag(weight) ;
  if visit > "01jan2003"d ;
run ;
WHERE will not pick up first lagged value; subsetting IF will…

17 Output from WHERE
VISIT      WEIGHT  LAGWGT
02JAN2003  22      .
03JAN2003  154     22
04JAN2003  21      154
05JAN2003  112     21

18 Output from subsetting IF
VISIT      WEIGHT  LAGWGT
02JAN2003  22      88
03JAN2003  154     22
04JAN2003  21      154
05JAN2003  112     21
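
The WHERE versus subsetting IF comparison can be run end to end with a sketch like the one below. It rebuilds the five visits as data set Q and runs both steps. The output names w_where and w_if, and the DATE9. informat used to read VISIT, are assumptions made for illustration and are not in the original slides.

```sas
/* Rebuild the example data as data set Q; VISIT is read as a SAS date. */
data q;
  input visit :date9. weight;
  format visit date9.;
cards;
01JAN2003 88
02JAN2003 22
03JAN2003 154
04JAN2003 21
05JAN2003 112
;
run;

/* WHERE filters before the observation enters the DATA step,   */
/* so LAG never sees the 01JAN2003 weight.                      */
data w_where;   /* hypothetical output name */
  set q;
  lagwgt = lag(weight);
  where visit > "01jan2003"d;
run;

/* Subsetting IF filters after LAG has executed for every row,  */
/* so the 01JAN2003 weight is already in the queue.             */
data w_if;      /* hypothetical output name */
  set q;
  lagwgt = lag(weight);
  if visit > "01jan2003"d;
run;

proc print data=w_where; run;
proc print data=w_if; run;
```

Comparing the two listings shows the difference in the first LAGWGT value: WHERE removes the 01JAN2003 row before LAG can queue its weight, while the subsetting IF lets LAG execute on that row and then discards it.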

19 Chapter Two A DIFferent LAG

20 “A DIFferent LAG”
- DIF function
- Syntax: DIFn(argument)
- n specifies number of lags
- argument is numeric

21 DIF function
- DIFn returns the difference between the argument and its nth lag; DIF (n = 1) is the first difference.
- Defined as: DIFn(X) = X - LAGn(X) ;

22 DIF function
- The same storing/returning rules for LAGn queues apply
- Same caveats for conditional execution

23
data new;
  input x @@;
  lagx = lag(x);
  difx = dif(x);
cards;
1 2 8 4 3 9 7
;

24
x  lagx  difx
1  .     .
2  1     1   (2 - 1 = 1)
8  2     6
4  8     -4
3  4     -1
9  3     6
7  9     -2
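
The definition DIFn(X) = X - LAGn(X) can be checked directly. The sketch below computes DIF and DIF2 alongside hand-built differences on the same input values; the data set name check and the variables manual, dif2x, and manual2 are hypothetical names introduced here.

```sas
data check;                  /* hypothetical data set name */
  input x @@;
  difx    = dif(x);          /* first difference           */
  manual  = x - lag(x);      /* same value, built by hand  */
  dif2x   = dif2(x);         /* difference from 2nd lag    */
  manual2 = x - lag2(x);     /* same value, built by hand  */
cards;
1 2 8 4 3 9 7
;
run;

proc print data=check; run;
```

Each LAG and DIF call keeps its own queue, and all four execute on every observation, so difx should match manual and dif2x should match manual2 row by row. The hand-built subtractions also write a "missing values were generated" note to the log for the first rows, where the lag is still missing.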

25 Chapter Three To LAG or to LEAD

26 Is there a LEAD function?
- No LEAD function or negative LAG
- Several solutions at:
  - www.sconsig.com
- Including:
  - Sort in descending order (reverse)
  - …then use the LAG function
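
The reverse-sort approach might look like the sketch below. It assumes the data carry an ordering key, here a hypothetical variable seq created from _N_. The data set names master, reversed, and lead are also illustrative.

```sas
/* Sample data with an explicit ordering key (seq is an assumption). */
data master;
  input var @@;
  seq = _n_;
cards;
1 2 3 4 5 6
;
run;

/* Reverse the order so the "next" value becomes the "previous" one. */
proc sort data=master out=reversed;
  by descending seq;
run;

/* LAG on the reversed data returns what was the following value. */
data lead;
  set reversed;
  nextvar = lag(var);
run;

/* Restore the original order. */
proc sort data=lead;
  by seq;
run;

proc print data=lead; run;
```

This costs two sorts, which is one reason a single-pass alternative can be more attractive on large data.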

27 Most elegant solution:
- MERGE the data set with itself
- Read the data set twice
  - Using a 1:1 MERGE
  - No BY statement
  - Using firstobs=2

28
data lagged ;
  merge master ( keep = var )
        master ( firstobs = 2
                 rename = (var = nextvar ) ) ;
  **** no BY statement ;
run;

29 Results of merge
var  nextvar
1    2
2    3
3    4
4    5
5    6
6    .

30 Chapter Four When RETAIN Doesn’t Retain

31 Retained Variables
- all SAS special variables, e.g.
  - _N_
  - _ERROR_
- all vars in RETAIN statement
- all vars from SET or MERGE
- accumulator vars in SUM stmt

32 Variables Not Retained
- Variables from INPUT statement
- User-defined variables / vars created in DATA step
- UNLESS……what?

33 concatenation
data A ;
  input id $ site $;
cards;
10212 00
10213 00
;
data B ;
  input id $;
cards;
02001
03005
06900
;
data c;
  set A B ;
  if missing(site) then site = substr(id,1,2);
run;

34
id     site
10212  00
10213  00
02001  02
03005  02
06900  02   ??

35 Solution
data C;
  set A B (in=inb);
  if inb then site = substr(id,1,2);
run;
Test that the observation has come from B and only then extract the site value.

36
id     site
10212  00
10213  00
02001  02
03005  03
06900  06   !

37 Chapter Five Don’t Order My Variables Around “the variable order is not always declared where it seems to occur…” Ron Fehd

38 Question posed: How do I reorder the variables in my SAS data set?

39 “Don’t Order My Variables Around”
- WHY?
  - exporting / export wizard
  - SAS Viewer end users
  - manipulate groups/lists of vars (age -- diag)
  - with PUT or ARRAY
  - what else?

40 “Don’t Order My Variables Around”
- storage:
  - in LPDV
  - in SAS data set
- presentation layer

41 My question to you: What forces the order of the variables in a SAS data set in the first place? The order in which they are seen by the compiler when the data set is created.

42 “Don’t Order My Variables Around”
- RETAIN statement
- (ATTRIB statement)
- (LENGTH statement)
- (PROC TRANSPOSE)
- ??????

43 “Don’t Order My Variables Around”
- Why RETAIN?
- retain functionality implicit for vars coming from SET or MERGE
- Nothing you can mess up (attributes, etc.)!

44 Original
CONTENTS PROCEDURE
-----Variables Ordered by Position-----
#  Variable  Type  Len  Pos
1  NAME      Char  8    0
2  SEX       Char  8    8
3  AGE       Num   8    16
4  ID        Num   8    24
5  RX_GRP    Num   8    32

45 Original
NAME    SEX  AGE  ID   RX_GRP
John    M    35   101  2
Dan     M    53   206  1
Howard  M    45   321  3

46
data new;
  retain id rx_grp name sex age;
  *** 1st reference to compiler;
  set master;
run;

47 Reordered
CONTENTS PROCEDURE
-----Variables Ordered by Position-----
#  Variable  Type  Len  Pos
1  ID        Num   8    0
2  RX_GRP    Num   8    8
3  NAME      Char  8    16
4  SEX       Char  8    24
5  AGE       Num   8    32

48 Reordered
ID   RX_GRP  NAME    SEX  AGE
101  2       John    M    35
206  1       Dan     M    53
321  3       Howard  M    45
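
A self-contained sketch of the reordering, with a PROC CONTENTS call to confirm the result, is shown below. The MASTER data step is reconstructed from the listing above and relies on list input defaults; the VARNUM option is an assumption about how the positional listing was requested, since the slides show only the output.

```sas
/* Rebuild MASTER in its original variable order. */
data master;
  input name $ sex $ age id rx_grp;
cards;
John M 35 101 2
Dan M 53 206 1
Howard M 45 321 3
;
run;

/* RETAIN is the compiler's first reference to these names,       */
/* so it fixes their order; SET then supplies type and length.    */
data new;
  retain id rx_grp name sex age;
  set master;
run;

/* List variables by position rather than alphabetically. */
proc contents data=new varnum;
run;
```

Because the RETAIN statement names the variables without initial values, their attributes still come from MASTER. That is the "nothing you can mess up" point made earlier in the chapter.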

49 Chapter Six The Case of the Missing Values

50 “How do MISSINGs compare?”  QUESTION: If A > B then ; If either A or B is missing, isn’t the statement just ignored? What if both are missing?

51 28* NUMERIC Missing Values
._  .  .a  .b  .c  …  .z
- All less than all negative numbers
- >>> low … to … high >>>
* Confirmed in SAS documentation: only 28

52 To answer a question raised in the meeting about missing values: a SPECIAL MISSING VALUE is a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore. SAS accepts either uppercase or lowercase letters. Values are displayed and printed as uppercase. If you do not begin a special numeric missing value with a period, SAS identifies it as a variable name. Therefore, to use a special numeric missing value in a SAS expression or assignment statement, you must begin the value with a period, followed by the letter or underscore, as in the following example: x=.d; When SAS prints a special missing value, it prints only the letter or underscore. When data values contain characters in numeric fields that you want SAS to interpret as special missing values, use the MISSING statement to specify those characters.

53 Master File (A and B are numeric)
NAME    A   B
John    10  7
Dan     X   _
Howard  .   .

54
data subset ;
  set old;
  if A > B;
run;

55 Records deleted (A and B are numeric)
NAME    A      B
John    10  >  7
Dan     X   >  _
Howard  .   =  .   (deleted)
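
The comparison above can be reproduced with a short sketch. It assumes the master file is read from raw data, with a MISSING statement declaring X and _ as special missing values. The data set name old comes from the subsetting step; the INPUT layout is an assumption.

```sas
/* Tell SAS to read X and _ in numeric fields as special missing values. */
missing X _;

data old;
  input name $ a b;
cards;
John 10 7
Dan X _
Howard . .
;
run;

/* Missing values take part in comparisons: ._ sorts below . and .A to .Z. */
data subset;
  set old;
  if a > b;
run;

proc print data=subset; run;
```

The statement is not ignored when operands are missing. Dan's row survives because .X sorts above ._, while Howard's row is dropped because two ordinary missing values compare as equal.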

56 One (1) Character Missing Value
- less than all negative numbers
- regardless of length
- collating sequence determines where “ ” (blank) falls in order of values

57 if C=D then MSG='same';
NAME    C (length=4)  D (length=1)  MSG
John    XXXX          Y
Dan     Q             Q             same
Howard  (blank)       (blank)       same

58 Deus ex machina*
- www.sasCommunity.org
- http://groups.google.com/group/comp.soft-sys.sas/topics
- SAS-L
- www.support.sas.com
- Documentation and HELP facility
- TRIAL and ERROR
- Testing!!
* resolution

59 Thank you!! Neil.Howard@amgen.com

60 Subscribing to SAS-L
- To have the messages mailed to you as they are available, send e-mail to any of the mail servers:
  - listserv@vm.marist.edu (Marist University)
  - listserv@listserv.vt.edu (Virginia Polytechnic University)
  - listserv@listserv.uga.edu (University of Georgia)
  - listserv@AKH-WIEN.AC.AT (University of Vienna)
- The subject line is ignored and the body should contain the command: subscribe sas-l your name here
- e.g. "subscribe sas-l Tom Smith" is how Tom Smith would subscribe.

61 SAS-L Stuff
- http://www.listserv.uga.edu/archives/sas-l.html
- http://groups.google.com/group/comp.soft-sys.sas/topics
- From www.sconsig.com:
  - SAS-L On-Line
  - How to Subscribe to SAS-L

