Presentation is loading. Please wait.

Presentation is loading. Please wait.

Analysis of VA Health Care Data with SAS Paul G. Barnett, PhD September, 2003.

Similar presentations


Presentation on theme: "Analysis of VA Health Care Data with SAS Paul G. Barnett, PhD September, 2003."— Presentation transcript:

1 Analysis of VA Health Care Data with SAS Paul G. Barnett, PhD September, 2003

2 Overview of presentation Use of SAS to analyze VA health care data at Austin Automation Center Suggestions on how to be a more accurate programmer Suggestions on how to document work

3 Common SAS procedures used to analyze health care data DATA step PROC CONTENTS PROC MEANS PROC SUMMARY PROC FREQ

4 Definitions Record– one observation in a dataset, e.g., a hospital stay Variable- one of the characteristics of that observation, e.g. the length of stay Database- a collection of records Data types: character, numeric, SAS date format

5 DATA step The procedure to manipulate datasets –Select records –Select variables –Make new variables by transforming pre- existing variables –Combine datasets

6 Alternatives ways to read data into a DATA step Read raw data with INPUT –This is to convert text files to SAS –Some Austin financial data sets are text files e.g., CDR or FMS Read existing SAS files with: –SET –MERGE

7 DATA step Read in data Takes one record at a time Create new variables from old KEEP or DROP statement to select variables IF statement to select observations

8 Example of DATA step * Create temporary data set; DATA PM01; * Read permanent data set PTF main; SET IN1.PM01; * Select stays of one specific patient; IF SCRSSN in (‘7677569876’); * Keep the following four variables; KEEP SCRSSN STA3N ADMITDAY DISDAY;

9 Creating new variables * Example of calculation using SAS dates; * Dates are stored as number of days since 1/1/1960; * Find length of hospital stay; LOS=DISDAY-ADMITDAY;

10 Sub-setting IF statement Select records in a data step using IF [true- false condition]; Only records that meet [true-false condition] will be retained It is also possible to select records without a data step –Can select records in a proc step using WHERE [true-false condition];

11 IF-THEN-ELSE statement If [true-false condition] then [statement to carry out when condition is true]; Else [statement to carry out when condition is not true]; * Note that you can have an if-then statement without an else statement;

12 Example of IF-THEN-ELSE statement * Create count of ICU days in bedsection file; * ICUDAYS are equal to the number of days in this bed section if it is the medical (12) or surgical (63) intensive care unit; IF BEDSECN=12 OR BEDSECN=63 THEN ICUDAYS=LSB; ELSE ICUDAYS=0;

13 IN operator The IN operator is a shorthand way of stating a series of conditions separated by OR The following are equivalent IF statments –IF BEDSECN IN (12,63); –IF BEDSECN=12 OR BEDSECN=63; When data are character, they should be in quotes, e.g. –IF STA6A in (“640CNH”, “640A0 ”);

14 Conditions that use ICD-9 codes ICD-9 code has 6 characters Decimal in ICD-9 code is not recorded –Blanks are on right –Thus code ICD-9 250.1 is stored as “2501_ _”

15 Substring function Manipulate character strings Syntax: –SUBSTR(variable,start,number of chars) Example –DXLSF=‘2501_ _’ –X=SUBSTR(DXLSF,1,3); –X has the value ‘250’

16 Use of Substring function to create an indicator IF SUBSTR(DXLSF,1,3)=‘250’ THEN DIABETES=1; ELSE DIABETES=0; Checks if DXLSF takes any value beginning with 250, 250.0, 250.1, 250.2, etc. If condition is true, variable DIABETES is assigned a value of 1, otherwise it is assigned a value of zero.

17 The “=:” operator Shorthand for substring function –with starting place at one –for number of characters on the right hand side of comparison The following conditions are equivalent –SUBSTR(DXLSF,1,3)=‘250’; –DXLSF=:’250’;

18 Array Statement A collection of variables Useful to process series of variables in same fashion Syntax for array with N members ARRAY Name(N) [Variables to be mapped to the array]; Example of 10 member array ARRAY DX(10) DXLSF DXF2-DXF10; Variables can now be referred to by their array name, e.g., DXF3 can also be called DX(3)

19 Do Statement Useful for a series of statements after a condition is tested IF [condition] THEN DO; [statements]; END;

20 Other Do Loops Execute statements as long as condition is met DO WHILE [Condition]; [statements]; END; Useful to repeatedly execute statements DO I=1 to 10; [statements]; END;

21 Example of using an array in a do loop Check all 10 diagnoses to see if any of them are the ICD-9 code for diabetes * Initialize variable to zero; DIABETES=0; * Start do loop and run it 10 times; DO X=1 to 10; IF DX(X)=:’250’ THEN DIABETES=1; END; Why can’t an else statement be used to set DIABETES=0?

22 Combining files: concatenation vs. merge Concatenation –Stack data sets with different observations described by variables –SET data1 data2 Merge –Combine datasets with common elements but different variables –MERGE data1 data2 –Requires BY statement

23 Example of concatenation Data sets: Become: NameAge Joe22 NameAge Sam53 NameAge Sam53 Joe22

24 Example of merging Data sets to be merge BY name: Become: NameZip Joe95617 Sam94025 NameAge Joe22 Sam53 NameZipAge Joe9561722 Sam9402553

25 MERGE BY statement may have more than one variable Datasets must have already been sorted At least one dataset must have no duplication of BY variable values

26 Common errors with MERGE to avoid: Many-to-Many merge One-to-one merge or one-to-many merge are o.k. Many-to-many merge is not possible

27 Example of one-to-many merging Data sets to be merged BY name: Become: NameZip Joe95617 Sam94025 NameAdmitday Joe1/22/01 Sam5/10/01 Sam5/21/01 NameZipAdmitday Joe956171/22/01 Sam940255/10/01 Sam940255/21/01

28 Impossible to merge many-to-many Data sets cannot be merged BY name: NameZip Sam95617 Sam94025 NameAdmitday Sam5/10/01 Sam5/21/01

29 Common errors with MERGE to avoid- Variables in both data sets but with different value –One value will copy over the other value Lesson: The only variables that are in both datasets should appear in the BY statement Alternatives: –Rename variables that appear in both datasets or –Use PROC SQL

30 Manipulating result of merge statement with IN values * IN option indicates which data set record came from; DATA new; MERGE old1 (IN=INOLD1) old2 (IN=IN0LD2); BY NAME IF INOLD1 and INOLD2; * If statement keeps only those records that are in both data sets;

31 Another example of manipulating result of merge statement * IN option indicates which data set record came from; DATA both oneonly twoonly; MERGE old1 (IN=INOLD1) old2 (IN=IN0LD2); BY NAME IF INOLD1 and INOLD2 THEN OUTPUT both; ELSE IF INOLD1 THEN OUTPUT oneonly; ELSE OUTPUT twoonly; * If-else statements create dataset with records that merged, and datasets of records that didn’t merge;

32 PROC CONTENTS * Basic information about a new data set; PROC CONTENTS DATA=file;

33 PROC MEANS Basic information about a new data set; PROC MEANS DATA=file ; VAR (list of variables); Can specify particular statistics PROC MEANS DATA=file N MEAN MIN MAX SUM; VAR (list of variables); Minimum and maximum useful to find unexpected values

34 PROC PRINT PROC PRINT DATA=source NOOBS; VAR [names of variables to print]; FORMAT variables format; SUM [numeric variables]; NOOBS option suppresses observation number Some useful formats: –MMDDYY. To print dates –COMMA12. To print comma delmited numbers, e.g., 21,000,000

35 PROC SUMMARY Extremely useful to summarize data in a number of records –Calculate number of ICU days by medical center –Find patients who had a diagnosis of diabetes in any outpatient visit during the year

36 PROC SUMMARY EXAMPLE * Summarize data in file created from bed section; * NWAY option specifies not to create records with totals or subtotals; PROC SUMMARY DATA=PB01 NWAY; * Summarize by medical center; BY STA3N; * Count of ICU data created in prior data step; VAR ICUDAYS; * Create new data set with one record per med center; OUTPUT OUT=new SUM (ICUDAYS)=ICUDAYS;

37 PROC SUMMARY EXAMPLE 2 * Examine whether patient every had outpatient visit with diabetes diagnosis; * NWAY option specifies don’t create record with totals; PROC SUMMARY DATA=se NWAY; BY SCRSSN; VAR DIABETES; OUTPUT OUT=new MAX(DIABETES)=DIABETES; *In prior data step, variable diabetes assigned has a value of 0 or 1 based on diagnoses; * We find the maximum value achieved for this patient;

38 PROC SUMMARY DETAILS CLASS vs. BY statement –Statements have equivalent effect –BY statement is for large jobs. Must sort data first. –CLASS is for small jobs. No need to presort data. NWAY option suppresses creation of record with total –and subtotals if there are multiple CLASS variables

39 PROC SUMMARY details- ID statement Specifies variable whose value is to be carried forward; here is an appropriate use: –CLASS SCRSSN –ID DOB (birth date) This is appropriate because every record with the same SCRSSN also has the same birthrate ID statement should not be used if different records being summarized have different values Use the ID statement with care!

40 PROC FREQ Counts number of records (frequency) with each value of a categorical variable Can make two-dimensional table for all combinations of two categorical variables

41 PROC TABULATE Find sub-total of a numeric variable for each value of a categorical variable

42 How to be an accurate programmer Check Double check Audit Write down what you did (Documentation)

43 Check your work Check work by hand –Review log file. You should have no error messages or warnings! –Work with a small data set Use OBS=NN option to limit number of records processed –Print sample records and check that result is what you intended

44 Double check your work Double check –Find result by independent method –Use an alternate program- was the result the same? Audit programming –Have another programmer review your code

45 Document your work Use comments * Comment ; /* Block of comments */ Write comments in program /************************* * Program: OUTCOM1.SAS * Date: September 2, 2003 * Programmer: P. Barnett *************************/

46 Documentation (cont.) Use variable labels –Define label when creating a new variable LABEL DOB=“Birth date”; One label statement can label many variables Use dataset labels –In a Data step that creates a new dataset, use option DATA OUT1.OUTCOM02 (LABEL=“QUERI Outcomes Outpatient Visits FY02”);

47 Documentation (cont.) Use easily understood variable and file names –Consider giving program same name as dataset that it creates Use indentation –to show which “DO” statement matches to which “END” statement –To show which “IF” statements matches to which “ELSE statement

48 Documentation (cont.) Keep a list of programs –Program name –Date created –Purpose –Names of source data sets –Names of data sets created Distinguish “final” programs from draft programs

49 Documentation (cont.) You are the audience for your documentation!

50 How to resolve problems Break program into many small steps Print interim results after each small step Test syntax of program commands with a small invented data set RTFM (Read the Funny Manual) If these steps don’t work, ask for help


Download ppt "Analysis of VA Health Care Data with SAS Paul G. Barnett, PhD September, 2003."

Similar presentations


Ads by Google