Presentation is loading. Please wait.

Presentation is loading. Please wait.

SHRUG, F EB 2013: N ETWORKING EXERCISE Many Ways to Solve a SAS Problem.

Similar presentations


Presentation on theme: "SHRUG, F EB 2013: N ETWORKING EXERCISE Many Ways to Solve a SAS Problem."— Presentation transcript:

1 SHRUG, F EB 2013: N ETWORKING EXERCISE Many Ways to Solve a SAS Problem

2

3

4 D ATA SETS work.cohort work.sesquart

5 W HAT DO YOU CONSIDER ? How easy was the code to read and understand? How short was the code? In the code samples that follow, are there new techniques that you’ll use? Is there new syntax you learned? Does the formatting of the code help legibility? Do you see code that could create more user- friendly output?

6 S OLUTION 1 C ODE & R ESULTS – D AVID M OSES, R ON D EWAR proc rank data=COHORT out=COHORT_QUARTILE groups=4; var INCOME; ranks INCOME_QUARTILE; run; work.cohort_quartile

7 S OLUTION 2: C ODE – A RT T ABACHNECK data _for_format; set SESQUART; fmtname="income"; type="N"; do label=1 to 4; if label eq 1 then do; start=0; end=Pctl_25-.0000000001; end; else if label eq 2 then do; start=Pctl_25; end=Pctl_50-.0000000001; end; else if label eq 3 then do; start=Pctl_50; end=Pctl_75-.0000000001; end; else do; start=Pctl_75; end=99999999999; end; output; end; run; proc format cntlin = _for_format; run ; data want; set cohort; quartile=put(income,income.); run;

8 S OLUTION 2 - R ESULTS work._for_format work.want

9 S OLUTION 3: C ODE – A NONYMOUS data inc_quart; *this appends the income cutpoints to the cohort dataset; if _n_ = 1 then do; set SESQUART (keep = Pctl_25 Pctl_50 Pctl_75); end; else; set COHORT ; *assign cutpoints to an array, so income can be compard in a loop; array q(*) Pctl_25 Pctl_50 Pctl_75 ; income_quartile =.; * who knows? income may be missing; if income ^=. then do; quart = 1; do i = 1 to 3; if income > q(i) then income_quartile = i+1; end; drop i Pctl_25 Pctl_50 Pctl_75 ; run;

10 S OLUTION 3 - R ESULTS work.inc_quart

11 S OLUTION 4: C ODE – P ETER Y E proc sql noprint; select Pctl_25,Pctl_50,Pctl_75 into:q25, :q50, :q75 from sesquart; quit; data cohort; set cohort; select; when(missing(income)) quantile =.; when(income < &q25) quantile = 25; when(income < &q50) quantile = 50; when(income < &q75) quantile = 75; otherwise quantile = 100; end; run;

12 S OLUTION 4 - R ESULTS work.cohort

13 S OLUTION 5 C ODE : A NONYMOUS ( X 2) * create formats for income; proc format; value incfmt.=‘Missing’ low-29721='At or below 25th percentile' 29722-36775='Above 25th percentile to median' 36776-44924='Above median to 75th percentile' other='Above 75th percentile‘ ; run; * put income into quartiles; proc sql feedback; create table incquartiles as select a.*, put(income,incfmt.) as incquart length=35 from work.cohort as a ; quit;

14 S OLUTION 5 - R ESULTS work.incquartiles

15 S OLUTION 6: C ODE – A NONYMOUS data a; set work.sesquart; x=1; *x=1 added to this record; data b; set work.cohort; x=1; *x=1 added to each record; data c; merge a b; by x; *25th, 50th & 75th percentiles added to each record; q1=(y>pctl_75) +(y>pctl_50) +(y>pctl_25) +(income>=0); *quartile calculated by method 1; q2=(y>=pctl_75)+(y>=pctl_50)+(y>=pctl_25)+(income>=0); *quartile calculated by method 2; q =(q1+q2)/2; * 0 signifies invalid (missing or negative) income *1.5 signifies income can be classified as either 1st or 2nd quartile; *2.5 signifies income can be classified as either 2nd or 3rd quartile; *3.5 signifies income can be classified as either 3rd or 4th quartile; drop pctl_25 pctl_50 pctl_75 q1 q2 x; *drop variables that are no longer required; /* Dataset c contains the original cohort data with quartile (q) added to each record. q= 0 signifies an invalid (missing or negative) income. q=1.5 signifies ties in the value of income. Client could be classified as either 1st or 2nd cohort. q=2.5 signifies ties in the value of income. Client could be classified as either 2nd or 3rd cohort. q=3.5 signifies ties in the value of income. Client could be classified as either 3rd or 4th cohort. */ RUN;

16 S OLUTION 6 - R ESULTS work.c NOTE: Variable y is uninitialized. NOTE: There were 1 observations read from the data set WORK.A. NOTE: There were 10 observations read from the data set WORK.B. NOTE: The data set WORK.C has 10 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.07 seconds cpu time 0.01 seconds

17 S OLUTION 7 – C ODE : A NONYMOUS * create formats for income; proc format; value incfmt.=‘Missing’ low-29721='At or below 25th percentile' 29722-36775='Above 25th percentile to median' 36776-44924='Above median to 75th percentile' other='Above 75th percentile' ; run; * create a new data set with income categorized into quartiles; data quartiles_inc; set cohort; if _n_=1 then set sesquart; attrib incquart length=$35 label='Income quartile'; * categorize income based on observation; incquart=put(income,incfmt.); run;

18 S OLUTION 7 - R ESULTS work.quartiles_inc

19 S UMMARY OF T ECHNIQUES Solutio n number Highlights 1proc rank 2create proc format using data + data step 3data step with arrays 4macros + data step 5proc format + proc sql 6data steps with merge 7data step with 2 set statements


Download ppt "SHRUG, F EB 2013: N ETWORKING EXERCISE Many Ways to Solve a SAS Problem."

Similar presentations


Ads by Google