Using Large Databases for Research


1 Using Large Databases for Research
Melissa Schiff, MD, MPH
UNM Hospitalists’ Research Club
July 16, 2019

2 How many people have used a large database for a research project?

3 Outline
Reasons for using large databases
How to evaluate whether a database will work for your research
Nuts and bolts of using large databases
Useful databases for internal med / hospitalist research projects

4 “Epidemiologists are data scavengers”
Secondary database - administrative data collected by an organization, surveillance data
Use of secondary data revolutionized by advances in information technology
“Big Data” used in health care and in many fields outside of medicine

5 Reasons to use Large Databases
Primary data collection - expensive in terms of money and time
Secondary database advantages:
  Data available on the web, from the data collection organization, or from a third party
    Ex: Medicare data available from CMS
  Data collected on a large number of people - large sample size, can ascertain rare diseases or exposures
    Ex: Cerner Health Facts - 600 hospitals, 85+ million patients

6 Population-based data - calculate incidence rates, avoid referral bias
  Ex: Birth certificates, death certificates, NM HIDD
Data less biased - collected for another purpose
  Ex: Self-reported alcohol-related admissions vs. hospitalization data
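As a rough illustration of the incidence-rate calculation that population-based data allow, the sketch below divides new cases by person-years at risk. The function name `incidence_rate` and the counts are hypothetical and chosen only for illustration. They are not taken from any New Mexico dataset.

```python
def incidence_rate(new_cases: int, person_years: float, per: int = 100_000) -> float:
    """Return new cases per `per` person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return new_cases * per / person_years


# Hypothetical counts, for illustration only (not real data).
rate = incidence_rate(new_cases=50, person_years=200_000)
print(f"{rate:.1f} per 100,000 person-years")
```

The denominator is what a population-based source supplies. A referral-center case series has the cases but no defined population at risk.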

7 Linkage of secondary database to primary data – independent measure or validation
  Ex: Occupational lung disease study linked to employment records
Use of natural language processing - search text for keywords
  Ex: OMI data searched for oil and gas worker deaths
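The keyword search mentioned above can be sketched with plain pattern matching, which is the simplest form of text search rather than full natural language processing. The keyword list, the record layout, and the narratives are all made up for illustration. A real study would define and validate its own terms against the actual data.

```python
import re

# Hypothetical keyword list; a real study would define and validate its own.
KEYWORDS = ["oil field", "oilfield", "drilling rig", "gas well"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def flag_records(records):
    """Return (record_id, matched_terms) for narratives mentioning any keyword."""
    flagged = []
    for record_id, narrative in records:
        matches = sorted({m.group(0).lower() for m in PATTERN.finditer(narrative)})
        if matches:
            flagged.append((record_id, matches))
    return flagged


# Made-up narratives, for illustration only.
sample = [
    (1, "Decedent was struck by equipment on a drilling rig."),
    (2, "Found at home; history of cardiac disease."),
    (3, "Worked at an Oilfield site near a gas well."),
]
print(flag_records(sample))
```

Flagged records would normally be reviewed by hand, since keyword matches can include false positives such as a narrative that mentions a gas well the person never worked at.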

8 Evaluating Large Database for your Research
Research question definition, variables
  Exposure(s)
  Outcome(s)
  Confounding factors
Identify the database of interest - publicly available, documentation available
Population included in database
  Demographics, geographic location
  Inclusion/exclusion criteria

9

10 Research question: Does use of preventive care differ by race among Medicare beneficiaries with early-stage endometrial cancer?
Database - SEER linked to Medicare, 2001-2011
Population - women 65+ years (Medicare), endometrial cancer (SEER), N=13,054
Exposure - race (Medicare)
Outcome - preventive care (Medicare outpatient visits)
  Well visit
  Flu vaccine
  Mammogram
  Diabetes screening

11 Time frame – months, years
Data dictionary - document listing the variables and their definitions
  Are your key variables included in the database?
  How are they measured?
  How much data is missing for key variables?
How many cases are available in the database?
May need to download the database to evaluate it
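One way to answer the missing-data question after downloading an extract is to tabulate the percent missing for each key variable. The sketch below assumes a CSV file and a particular set of missing-value codes. The column names and the rows are invented, and the real codes should come from the data dictionary.

```python
import csv
import io

# Assumed missing-value codes; check the data dictionary for the real ones.
MISSING = {"", "NA", "."}


def percent_missing(rows, variables):
    """Return {variable: percent of rows where the value is missing}."""
    rows = list(rows)
    result = {}
    for var in variables:
        n_missing = sum(1 for r in rows if (r.get(var) or "").strip() in MISSING)
        result[var] = 100 * n_missing / len(rows) if rows else 0.0
    return result


# Made-up extract standing in for a downloaded file.
extract = io.StringIO(
    "patient_id,race,flu_vaccine\n"
    "1,White,1\n"
    "2,,0\n"
    "3,Hispanic,NA\n"
    "4,NA,1\n"
)
print(percent_missing(csv.DictReader(extract), ["race", "flu_vaccine"]))
```

The row count from the same pass also answers how many cases are available.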

12 Nuts and Bolts for using Large Databases
Accessing the database
  Available online or from the organization
  May need to submit a written request with the research question
  Cost of database
Finding help
  Researchers at UNM with specific database experience
  CTSC
  State health department
  Experts at federal agencies (e.g. CMS)

13 Statistical / data management needs
Statistical package capacity for large databases
Biostatistician familiar with the database
Data collected at the individual level or encounter level - un-duplication
  Subject identifiers
  Longitudinal - analysis over time
IRB - typically considered “Exempt” status, check with UNM HRPO
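Un-duplication can be sketched as collapsing encounter-level rows to one row per person using the subject identifier. The example below keeps each patient's earliest encounter, which is only one possible rule. The field names `patient_id` and `admit_date` and the records themselves are hypothetical.

```python
from datetime import date


def first_encounter_per_patient(encounters):
    """Keep each patient's earliest encounter (one row per person)."""
    earliest = {}
    for enc in encounters:
        pid = enc["patient_id"]
        if pid not in earliest or enc["admit_date"] < earliest[pid]["admit_date"]:
            earliest[pid] = enc
    return list(earliest.values())


# Made-up encounter-level records, for illustration only.
encounters = [
    {"patient_id": "A", "admit_date": date(2018, 3, 1)},
    {"patient_id": "B", "admit_date": date(2018, 5, 9)},
    {"patient_id": "A", "admit_date": date(2017, 11, 20)},
]
people = first_encounter_per_patient(encounters)
print(len(encounters), "encounters ->", len(people), "patients")
```

Which encounter to keep (first, last, or all, for a longitudinal analysis) depends on the research question. Some databases have no person-level identifier, in which case this step is not possible.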

14 Efficient use of one database
Main research question
Specific populations - age groups, disease severity
Trends over time
Evaluation of policy change
Financial data - health economics
Identify refined research question - future primary data collection
Become the local expert

15 Useful Databases
Databases available at UNM
  Health Facts - Cerner EMR data for UNM hospital, 600 hospitals
  Truven - 240 million patients, pharmaceuticals
  I2B2 - identify number of cases in UNM EMR
  Vizient
  New Mexico Tumor Registry (SEER) - all cancer cases in NM

16 New Mexico databases
Office of the Medical Investigator
New Mexico Death Certificates
SEER-Medicare linked data
Behavioral Risk Factor Surveillance System
New Mexico Prescription Monitoring Program
Indicator-Based Information System (IBIS)

17 National databases
Healthcare Cost and Utilization Project (HCUP)
  National Inpatient Sample (NIS) - 7 million hospital stays
National Health and Nutrition Examination Survey (NHANES) - 5,000 people annually
National Ambulatory Medical Care Survey
Medicaid
Medicare
Veterans Administration

18 List of large databases - https://www.ehdp.com/vitalnet/datasets.htm

19 Summary
Research question - consider using a large database
Investigating a large database - population, variables, missing data, cost
Accessing a large database - availability, time frame to get data, data management/analysis, people to help
Multiple uses - consider a variety of research questions to answer

20 Questions? Contact information

