Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lesson 4 - Topics Creating new variables in the data step SAS Functions.

Similar presentations


Presentation on theme: "Lesson 4 - Topics Creating new variables in the data step SAS Functions."— Presentation transcript:

1 Lesson 4 - Topics Creating new variables in the data step SAS Functions

2 Creating New Variables Direct assignments(formulas): c = a + b ; d = 2*a + 3*b + 7*c ; bmi = weight/(height*height); Indirect assignments (if/then/else) if age < 50 then young = 1; else young = 2; if income < 15 then tax = 1; else if income < 25 then tax = 2; else if income >=25 then tax = 3;

3 Direct Assignments (Formulas) Example c = a + b ; So if a = 2, b =3, c = 5; What if a is missing, what is c? C will be missing What if b is missing?

4 If/then/else Statements With if-then-else definitions SAS stops executing after the first true statement if income < 15 then tax = 1; else if income < 25 then tax = 2; else if income >=25 then tax = 3; What if income is 10? What if income is 23? What if income is 30? What if income is missing? Tax = 1 Tax = 2 Tax = 3 Tax = 1

5 Create a new variable with 2 levels, one for college graduates and one for non-college graduates. Creating New Variables

6 Program 5 DATA tdata; INFILE ‘C:\SAS_Files\tomhs.data' ; INPUT @ 1 ptid $10. @ 49 educ 1. @123 sbp12 3. ; * This way will code missing values to the value 2; if educ < 7 then grad1 = 2 ; else if educ >=7 then grad1 = 1 ; * The next two ways are equivalent and are correct; if educ < 7 and educ ne. then grad2 = 2; else if educ >=7 then grad2 = 1; * IN is a useful function in SAS ; if educ IN(1,2,3,4,5,6) then grad3 = 2; else if educ IN(7,8,9) then grad3 = 1; New variable defines go after the input statement

7 PROC FREQ DATA=tdata; TABLES educ grad1 grad2 grad3 ; Cumulative Cumulative educ Frequency Percent Frequency Percent --------------------------------------------------------- 1 3 3.03 3 3.03 3 4 4.04 7 7.07 4 23 23.23 30 30.30 5 14 14.14 44 44.44 6 12 12.12 56 56.57 7 16 16.16 72 72.73 8 10 10.10 82 82.83 9 17 17.17 99 100.00 Frequency Missing = 1 Cumulative Cumulative grad1 Frequency Percent Frequency Percent ----------------------------------------------------------- 1 43 43.00 43 43.00 2 57 57.00 100 100.00 Cumulative Cumulative grad2 Frequency Percent Frequency Percent ----------------------------------------------------------- 1 43 43.43 43 43.43 2 56 56.57 99 100.00 Frequency Missing = 1 Cumulative Cumulative grad3 Frequency Percent Frequency Percent ----------------------------------------------------------- 1 43 43.43 43 43.43 2 56 56.57 99 100.00 Frequency Missing = 1 Coded the missing value for educ to 2

8 PROC FREQ DATA=tdata; TABLES educ*grad1 /MISSING NOCUM NOPERCENT NOROW NOCOL; TITLE 'Use Crosstabulation to Verify Recoding'; RUN; Table of educ by grad1 educ grad1 Frequency‚ 1‚ 2‚ Total ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ. ‚ 0 ‚ 1 ‚ 1 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 1 ‚ 0 ‚ 3 ‚ 3 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 3 ‚ 0 ‚ 4 ‚ 4 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 4 ‚ 0 ‚ 23 ‚ 23 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 5 ‚ 0 ‚ 14 ‚ 14 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 6 ‚ 0 ‚ 12 ‚ 12 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 7 ‚ 16 ‚ 0 ‚ 16 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 8 ‚ 10 ‚ 0 ‚ 10 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 9 ‚ 17 ‚ 0 ‚ 17 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 43 57 100 This shows that the missing value for educ got assigned a value of 2

9 * Recode sbp12 into 3 levels; if sbp12 =. then sbp12c =. ; else if sbp12 < 120 then sbp12c = 1 ; else if sbp12 < 140 then sbp12c = 2 ; else if sbp12 >=140 then sbp12c = 3 ; With if-then-else definitions SAS stops executing after the first true statement Values < 120 will be assigned value of 1 Values 120-139 will be assigned value of 2 Values >=140 will be assigned value of 3 Missing values will be assigned to missing

10 PROC FREQ DATA=tdata; TABLES sbp12c sbp12; RUN; OUTPUT Cumulative Cumulative sbp12c Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 1 36 39.13 36 39.13 2 43 46.74 79 85.87 3 13 14.13 92 100.00 Frequency Missing = 8 Cumulative Cumulative sbp12 Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 93 1 1.09 1 1.09 94 1 1.09 2 2.17 101 1 1.09 3 3.26 104 1 1.09 4 4.35 105 1 1.09 5 5.43 (more values) 147 1 1.09 87 94.57 148 1 1.09 88 95.65 149 1 1.09 89 96.74 153 1 1.09 90 97.83 154 1 1.09 91 98.91 158 1 1.09 92 100.00 Frequency Missing = 8

11 * Easy but costly error to make; if sbp12 =. then sbp12c =. ; else if sbp12 < 120 then sbp12c = 1 ; else if sbp12 < 140 then sbp12 = 2 ; else if sbp12 >=140 then sbp12c = 3 ; PROC FREQ DATA=tdata; TABLES sbp12c; RUN; The FREQ Procedure Cumulative Cumulative sbp12c Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 1 36 73.47 36 73.47 3 13 26.53 49 100.00 Frequency Missing = 51 How come no values of 2 and why so many missing?

12 Important Facts When Creating New Variable 1.New variables are initialized to missing 2.Missing values are < any value if var < value (true if var is missing) 3.Reference missing values for numeric variables as. 4.Reference missing values for character variables as ' ' if sbp =. then... (or if missing(sbp)) if clinic = ' ' then...

13 SAS Handling of Missing Data When Creating New Variables Direct assignments(formulas): c = a + b ; d = 2*a + 3*b + 7*c ; bmi = weight/(height*height); If any variable on the right-hand side is missing then the new variable will be missing Indirect assignments if age < 50 then young = 1; else young=2; New variables are initialized to missing but may be given a value if any of the IF statements are true

14 What Value to Set New Variable if age < 20 then teenager = 1; else if age >=20 then teenager = 2; if age < 20 then teenager = 1; else if age >=20 then teenager = 0; if age < 20 then teenager = ‘YES’; else if age >=20 then teenager = ‘NO’;

15 * Program 6 SAS Functions ; DATA example; INFILE ‘C:\SAS_Files\tomhs.data' ; INPUT @058 height 4.1 @085 weight 5.1 @172 ursod 3. @236 (se1-se10) (1.0 + 1); bmi = (weight*703.0768)/(height*height); rbmi1 = ROUND(bmi,1); lursod = LOG(ursod); seavg = MEAN (OF se1-se10); semax = MAX (OF se1-se10); semin = MIN (OF se1-se10);

16 * Use of dash notation ; seavg = MEAN (OF se1-se10); This is the same as seavg = MEAN (se1,se2,se3,se4,se5,se6,se7,se8,se9,se10); The OF is very important. Otherwise SAS thinks you are subtracting se10 from se1. To use this notation the ROOT of the name must be the same.

17 * Two ways of computing average ; seavg = MEAN (se1,se2,se3,se4,se5,se6,se7,se8,se9,se10); Versus seavg = (se1+se2+se3+se4+se5+se6+se7+se8+se9+se10)/10; Using mean function computes the average of non- missing values. Result is missing only if all values all missing. Using + formula requires all values be non-missing otherwise result will be missing if N(of se1-se10) > 5 then seavg = MEAN(of se1-se10); What does this statement do?

18 PROC PRINT DATA = example (OBS=15); VAR bmi rbmi1 rbmi2 seavg semin semax ; TITLE 'Listing of Selected Data for 15 Patients '; RUN; PROC FREQ DATA = example; TABLES semax; TITLE 'Distribution of Worse Side Effect Value'; TITLE2 'Side Effect Scores Range from 1 to 4'; RUN; ods graphics on; PROC UNIVARIATE DATA = example ; VAR ursod lursod; QQPLOT ursod lursod; TITLE 'Quantile Plots for Urine Sodium Data'; RUN;

19 Listing of Selected Data for 10 Patients Obs bmi rbmi1 seavg semin semax 1 28.2620 28 1.1 1 2 2 35.9963 36 1.0 1 1 3 27.0489 27 1.0 1 1 4 28.2620 28 1.1 1 2 5 33.2008 33 1.0 1 1 6 27.7691 28 1.2 1 2 7 32.6040 33 1.0 1 1 8 22.4057 22 1.2 1 2 9 37.2037 37 1.1 1 2 10 33.1717 33 1.7 1 3

20 Distribution of Worse Side Effect Value Side Effect Scores Ranges from 1 to 4 The FREQ Procedure Cumulative Cumulative semax Frequency Percent Frequency Percent ---------------------------------------------------------- 1 33 33.00 33 33.00 2 52 52.00 85 85.00 3 13 13.00 98 98.00 4 2 2.00 100 100.00 2 patients had at least 1 severe side effect

21 Log transformed value shows a better linear pattern


Download ppt "Lesson 4 - Topics Creating new variables in the data step SAS Functions."

Similar presentations


Ads by Google