Pet Fish and High Cholesterol in the WHI OS: An Analysis Example Joe Larson 5 / 6 / 09.

1 Pet Fish and High Cholesterol in the WHI OS: An Analysis Example Joe Larson 5 / 6 / 09

2 The Question In the WHI Observational Study, are women with pet fish less likely to have ever taken pills for high cholesterol at baseline?

3 What We Want to Do Find the data Download the appropriate zip files Load them into SAS Merge our sets together Do a basic Chi-Square test

4 A Few Notes: The data files used for this example are subsets of the full form data. This was done to reduce download time and ease the replication of this analysis All processes we will go through are identical to what you would do for a normal analysis

5 Finding the Data The first step is to figure out what we need to answer our question. We will need: Pet data Cholesterol data Demographic data (to help us select only women in the observational study)

6 First we want to go to the study operations web site: www.whiops.org

7 Select the Study Operations Link

8 Click on the “Data” Tab

9 The Data Screen

10 Data available for both WHI and WHIMS Images of all forms Options to look for dictionaries by category Link to the Data Distribution Agreement - Anyone who uses the data should fill it out - PIs are responsible for data at the clinics

11 Let’s Look for Our Data First, let’s hunt for the fish data. Since we don’t know what form it’s on, let’s click on the ‘Data dictionaries by analysis category’ link.

12 Where Would Fish Be? Let’s take a look in the Psychosocial/Behavioral subcategory

13 Since there are 216 variables, it will be easier to right click on the document and search for “fish”

14 Searching for Fish

15 Found It!

16 The Fish Variable is on Form 37 - We should also note that it is a subquestion of “Do you have a pet?” and is a “Mark all that apply” question!

17 Now Let’s Find High Cholesterol Going back to the ‘Data Dictionaries by Category’ screen, it will be in the Medical History section

18 Medical History is Broken Up into Subcategories It should be under Cardiovascular

19 It looks like it is on Form 30

20 Now We Just Need an Indicator to tell us which Participants are in the OS All trial flags and indicators are in the Demographics file Now We’re Ready to Download the Data!

21 Back to the Data Screen Click on ‘Datasets’

22 The Datasets Screen

23 An Aside: The Datasets Page All data is arranged by form In addition to the zip files with the data, the .pdf files of the data dictionaries can also be downloaded separately For more detailed info on what’s in a zip file, please see the Appendix at the end of the walkthrough

24 Downloading the Data For the purposes of this demo, smaller sets have been created that anyone with a WHI password can download Normally, only PIs can download the actual data files Scroll down to the bottom of the Datasets page to find these files

25 WHI Example Files for Downloading

26 Downloading the Data When you click on the zip file link, you get a pop up box Save the file in the directory of your choice

27 Downloading Data For my example, I’ve saved all of the data in a directory I created called “DataTraining”

28 Extracting the Data from the Zip Files Double click on the first zip file, the demographics file, and you should be able to see the contents Click on the ‘Extract’ button

29 Extracting the Data from the Zip Files Extract the files to the same directory as your zip files

30 Extracting the Data from the Zip Files Repeat with the other two zip files. The resulting directory should look like this:

31 Analyzing the Data We now have everything we need to look at the data For the purposes of this example, I’m going to use SAS Other software such as S-Plus, Stata, R, SPSS, and others can also be used Even if using another program, the SAS Load code provided can be used to determine the order of variables in the dataset as well as formats

32 Loading in the Data From the Default SAS screen, go up to the File menu and select ‘Open Program’

33 Loading in the Data Select all three of the files and click ‘Open’

34 Loading in the Data Let’s start with the demographics data One change needs to be made to each file to let SAS know where the data is located Find where the actual file is being read in; this is the line in the file that begins with INFILE We can also change the name of the file we are creating in the line above the INFILE statement

35 Loading in the Data In the example, we’ve put the data in ‘S:\DataTraining’ I’ve also renamed the file ‘demographics’ instead of the default, which was ‘dem_ctos_train’

36 Loading in the Data Now that the location of the datafile has been updated, we can run the SAS Code Go to the ‘running man’ icon, which is the button to submit code

37 Loading in the Data If you are concerned or unsure whether it worked or not, you can look at the SAS log. The tab is at the bottom of the screen. Any errors would show up as RED in the log

38 SAS Log for Loading in Demographics

39 Loading in the Data Now we want to repeat the process for the other two files. First for Form 30

40 Loading in the Data Then for Form 37

41 Looking at What We Have Let’s make a new SAS program file to look at the data Go to the File Menu and select ‘New Program’

42 Looking at What We Have We can also now close the three files used to load the data into SAS You should now have your new program, the log, and the output tabs

43 Looking at What We Have To know the names of the files we’ve loaded we can use some PROC DATASETS code.
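
The code on this slide was shown as a screenshot and is not in the transcript. A minimal sketch of a PROC DATASETS call that lists loaded files might look like the following, assuming the load programs wrote their data sets to the default WORK library (an assumption; adjust the library name if yours differs):

```sas
/* List every data set in the WORK library; the directory */
/* listing is written to the SAS log, not the output tab. */
/* Assumes the load code created its files in WORK.       */
proc datasets library=work;
run;
quit;
```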

44 Looking at What We Have Once the code is typed in, click the submit button again and then go to the LOG tab

45 Looking at What We Have In the log we see the three files we’ve loaded: - DEMOGRAPHICS (The Demographics File) - FORM30 (The Cholesterol Data) - FORM37 (The Pet Fish Data) Now we need to do some data manipulation to pull this all together

46 The Demographics File Let’s look at the demographics file (DEMOGRAPHICS) first PROC CONTENTS can be used to determine what variables are in a file Highlight the code and then hit the submit button
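
The PROC CONTENTS call for this step is not shown in the transcript. A sketch, following the same syntax the walkthrough uses later for ‘fishset’ and assuming the file was named ‘demographics’ as on slide 35:

```sas
/* Show the variable names, types, and formats in the    */
/* demographics file; results appear on the output tab.  */
proc contents data=demographics;
run;
```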

47 The Demographics File On the output screen we see what variables are available We only want to keep OS participants, so we will need the OSFLAG variable, which has a value of 1 for participants in the observational study We also want to keep the ID variable for merging the files later

48 The Demographics File Let’s Look at the Code to do this: We are manipulating the ‘demographics’ file and creating a new file ‘demographics_2’ with our changes We only want to keep the ID and OSFLAG variables
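
The slide's code was an image and is missing from the transcript. A data step matching the description above could be sketched as:

```sas
/* Read 'demographics' and write a new file that keeps   */
/* only the merge key (ID) and the OS indicator (OSFLAG). */
data demographics_2;
    set demographics;
    keep ID OSFLAG;
run;
```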

49 The Form 30 File This is our medical history data Looking at the data dictionary, we see that this file is a baseline file with one row per participant

50 The Form 30 File Let’s look at the contents of the Form 30 file

51 The Form 30 File The only variables we are interested in are ID and HICHOLRP
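
A sketch of the matching data step for Form 30 follows. The input name ‘form30’ comes from the log listing on slide 45; the output name ‘form30_2’ is hypothetical, chosen here only to mirror ‘demographics_2’:

```sas
/* Keep only the merge key and the high-cholesterol-pill */
/* variable. 'form30_2' is an assumed output name.       */
data form30_2;
    set form30;
    keep ID HICHOLRP;
run;
```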

52 The Form 37 File Finally, let’s get our fish data from Form 37 From the data dictionary it looks like there are multiple rows per participant, so we will have to filter down to one baseline visit per person

53 The Form 37 File To do this, we have to use some additional variables

54 The Form 37 File The F37VTYP variable can be used to select only screening visits The F37VCLO variable can be used to ensure there is only one record per participant The PET variable is also needed since the FISH variable is a mark-all-that-apply subquestion We are interested in keeping the ID, PET, and FISH variables

55 The Form 37 File Let’s add to the code and then submit all of our new work
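
The Form 37 code is not in the transcript, and neither are the codes for the two filter variables. The sketch below assumes a value of 1 marks a screening visit in F37VTYP and a value of 1 marks the record to keep in F37VCLO. Both values are assumptions and should be checked against the Form 37 data dictionary before use. The output name ‘form37_2’ is also hypothetical:

```sas
/* Reduce Form 37 to one baseline row per participant.   */
/* ASSUMPTION: F37VTYP = 1 means a screening visit and   */
/* F37VCLO = 1 flags the single record to keep; confirm  */
/* both codes in the data dictionary.                    */
data form37_2;
    set form37;
    where F37VTYP = 1 and F37VCLO = 1;
    keep ID PET FISH;
run;
```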

56 Sorting the Data Now let’s put it all together The ID variable is the key to merging the data, so we need to sort all of our files by ID.
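
A sketch of the sorting step, one PROC SORT per file. The names ‘form30_2’ and ‘form37_2’ are hypothetical names for the reduced Form 30 and Form 37 files; substitute whatever names you gave them:

```sas
/* Sort each reduced file by the merge key. Without an   */
/* OUT= option, PROC SORT replaces the file in place.    */
proc sort data=demographics_2;
    by ID;
run;

proc sort data=form30_2;   /* assumed name */
    by ID;
run;

proc sort data=form37_2;   /* assumed name */
    by ID;
run;
```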

57 Merging the Data Now we can put the three sets together It is critical to make sure you merge BY ID. If you don’t, SAS will put the files together as they stand which can lead to incorrect results.
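
The merge could be sketched as below. The output name ‘fishset’ is the one the walkthrough uses; ‘form30_2’ and ‘form37_2’ are assumed names for the reduced, sorted Form 30 and Form 37 files:

```sas
/* Match-merge the three sorted files on ID. Omitting    */
/* the BY statement would pair rows by position instead, */
/* which is the mistake the slide warns about.           */
data fishset;
    merge demographics_2 form30_2 form37_2;
    by ID;
run;
```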

58 Merging the Data Let’s submit the code we’ve just created to put the sets together

59 Did it Work? Let’s go to the SAS Log tab

60 Did It Work? From the log tab we can see that the Demographics file had 161808 participants, the correct CT/OS total Our merged file, ‘fishset’, also has this same total Combined with no error messages in the log, it looks like we are in good shape Now let’s look at the contents of this new file using another PROC CONTENTS statement: PROC CONTENTS DATA=fishset; RUN;

61 Did It Work? We have the variables we need: Pet Fish High Cholesterol OS Flag

62 But first… We do have two more items to consider: Because the fish variable is a subquestion of the ‘Do You Have a Pet?’ mark-all-that-apply question, anyone who answered ‘No’ to the pet question will have a missing value for ‘Fish’. Because these people do not have fish, we want to set their fish values to ‘No’ We also want to limit our analysis to observational study participants We’ll use one last data step to do this

63 One Last Data Step
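
The code for this slide is missing from the transcript. A sketch of the two adjustments described on the previous slide follows. It assumes PET and FISH are coded 0 for ‘No’ and 1 for ‘Yes’, which should be verified in the data dictionary, and it writes to a hypothetical file named ‘fishset_os’:

```sas
/* Keep OS participants only and fill in FISH for women  */
/* who reported no pets. ASSUMPTION: PET and FISH use    */
/* 0 = No, 1 = Yes. 'fishset_os' is an assumed name.     */
data fishset_os;
    set fishset;
    if OSFLAG = 1;               /* subsetting IF: OS only */
    if PET = 0 then FISH = 0;    /* no pets means no fish  */
run;
```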

64 Let’s Do the Analysis Finally we’re ready to look at our frequencies. PROC FREQ in SAS can be used to do this:
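
The PROC FREQ call itself appeared as a screenshot. A minimal sketch of a two-way table with a Chi-Square test, assuming the final analysis file is named ‘fishset_os’ (a hypothetical name):

```sas
/* Cross-tabulate pet fish by cholesterol pill use.      */
/* CHISQ requests the Chi-Square test; NOCOL and         */
/* NOPERCENT leave only counts and row percentages, so   */
/* each FISH row shows the share who ever took pills.    */
proc freq data=fishset_os;      /* assumed file name */
    tables FISH*HICHOLRP / chisq nocol nopercent;
run;
```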

65 What Did We Discover?

66 15.0% of OS participants without pet fish at baseline had ever taken pills for high cholesterol 14.8% of OS participants with pet fish at baseline had ever taken pills for high cholesterol p-value from a Chi-Square test: 0.74

67 What Did We Discover? Having pet fish does not appear to be associated with taking pills for high cholesterol Perhaps they are still useful, however.

68 What Did We Discover? In other ways….

69 Questions?

70 Appendix: What’s in a Zip File?

71 Zip File Components The Zip file usually consists of six parts: WHI Data Preparation Documentation (.pdf) WHI Data Collection Frequency Chart (.pdf) The Form (.pdf) The Data Dictionary (.pdf) The SAS load code (.sas) The data (.dat)

72 Looking at a Zip File

73 The Data Preparation Documentation

74 The Data Prep Document details how the files on the web are set up - Important variables are defined - Individual form issues are presented

75 WHI Collection Frequencies

76 The Collection Frequency Chart is arranged by form and lists when each form was collected and what sample it was collected on

77 Form and Data Dictionary

78 The original form is useful to determine exactly how a question was asked The data dictionary informs on any issues relating to specific variables and also lets you know the structure of the dataset

79 SAS Load Code Will put the raw data into a SAS data set Also useful if using other programs, as it includes the formats and the order of the variables

80 The Actual Data

81 Zip File Summary The data preparation notes, data collection chart, forms, and data dictionaries are all key components for understanding the data The SAS Load Code is useful for the structure of the individual data file, regardless of what program you are going to use to analyze the data

