Doing Data Science – Chapter 12: Epidemiology

Vast amounts of individual patient medical data are available
– Detailed: visits, prescriptions, outcomes, etc.
– Records cover lifetimes
– Largest databases have records on 80 million people
However, many medical studies are observational
– Not founded on data
– Results affect the actions of doctors and insurance regulators

Confounder Problem and Stratification
Confounder: an extraneous variable that correlates with both the dependent and the independent variable, giving an incorrect impression of cause and effect
Stratification: partitioning a case into subcases and evaluating just the subcases to reach conclusions about the top-level case
– A weighted average is one way of combining the subcase results
Example [p.294-295]:
– A study in which equal numbers of women (50) and men (50) received the treatment, but the control group was unbalanced (80 women, 20 men)
– Original causal effect is 10%
– Stratified causal effect is 5% for men and 11.25% for women
– This does NOT prove that the treatment side effects are twice as strong for women
Problem: causal conclusions are unreliable if the group sizes after stratification are too different to give meaningful statistics
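
The stratification arithmetic above can be sketched in a few lines of Python. This is only an illustration: the group sizes (50/50 treated, 80/20 control) come from the slide, but the outcome counts are hypothetical, chosen so that they reproduce the stated 10%, 5%, and 11.25% effects. Weighting each stratum by its total size is likewise an assumption, one of several possible weighting schemes.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    treated_n: int       # group size taken from the slide
    treated_events: int  # hypothetical outcome count
    control_n: int       # group size taken from the slide
    control_events: int  # hypothetical outcome count

    def risk_difference(self) -> float:
        """Outcome rate among the treated minus outcome rate among controls."""
        return (self.treated_events / self.treated_n
                - self.control_events / self.control_n)

    def size(self) -> int:
        return self.treated_n + self.control_n


def crude_effect(strata) -> float:
    """Pool everyone together, ignoring the stratifying variable."""
    treated_n = sum(s.treated_n for s in strata)
    treated_events = sum(s.treated_events for s in strata)
    control_n = sum(s.control_n for s in strata)
    control_events = sum(s.control_events for s in strata)
    return treated_events / treated_n - control_events / control_n


def weighted_effect(strata) -> float:
    """Average the per-stratum effects, weighting each stratum by its size."""
    total = sum(s.size() for s in strata)
    return sum(s.risk_difference() * s.size() for s in strata) / total


# Hypothetical outcome counts that match the percentages on the slide.
women = Stratum("women", treated_n=50, treated_events=10,
                control_n=80, control_events=7)
men = Stratum("men", treated_n=50, treated_events=10,
              control_n=20, control_events=3)
strata = [women, men]

print(f"crude effect:    {crude_effect(strata):.4f}")
for s in strata:
    print(f"{s.name:<6} effect:   {s.risk_difference():.4f}")
print(f"weighted effect: {weighted_effect(strata):.4f}")
```

With these counts the size-weighted effect comes out near 9%, slightly below the crude 10%. The men's control stratum holds only 20 people, so its 5% estimate rests on very few outcomes, which is the small-group problem the slide warns about.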

Data Driven Studies
Analysis of 50 studies of drug/outcome pairs
– 5000 analyses for each pair on nine databases
– Example: ACE inhibitors (treatment for hypertension)/swelling of the heart
Results varied between databases from 3X risk to 6X risk
– For 20 of 50 pairs, risk or no risk was database dependent
– By adjusting the choice of database, confounders, and time windows, all studies can show risk or no risk
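
To make "database dependent" concrete, here is a rough sketch of how one drug/outcome pair might be checked across databases. The database names and all counts are hypothetical, and the rule "relative risk above 1 means risk" is a simplification; a real analysis would also adjust for confounders and look at confidence intervals.

```python
def relative_risk(exposed_events, exposed_n, unexposed_events, unexposed_n):
    """Outcome rate among the exposed divided by the rate among the unexposed."""
    return (exposed_events * unexposed_n) / (exposed_n * unexposed_events)


def is_database_dependent(estimates):
    """True if some databases suggest risk (RR > 1) and others do not."""
    flags = [rr > 1.0 for rr in estimates.values()]
    return any(flags) and not all(flags)


# Hypothetical counts for a single drug/outcome pair:
# (exposed_events, exposed_n, unexposed_events, unexposed_n)
counts_by_database = {
    "database_a": (30, 1000, 10, 1000),
    "database_b": (8, 1000, 10, 1000),
}

estimates = {
    name: relative_risk(*counts) for name, counts in counts_by_database.items()
}

for name, rr in estimates.items():
    print(f"{name}: relative risk = {rr:.2f}")
print("database dependent:", is_database_dependent(estimates))
```

Here the same pair looks harmful in one hypothetical database and harmless in the other, which is the situation the slide reports for 20 of the 50 pairs.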

Data Driven Studies
Observational Medical Outcomes Partnership (OMOP)
– Goal: see how well current methods predict things we already know
– 10 large medical databases containing records for 200 million people
– $25M
– Determined an ROC curve: the Area Under the Curve (AUC) was 0.65, not much better than the 0.5 of random guessing
– Databases are self-consistent: using a single database gave better accuracy (0.92 in one case)
– Graphs [p.302] show ~80% sensitivity with ~10% false-positive rate
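
For readers unfamiliar with AUC, the sketch below shows why 0.5 corresponds to random guessing. It uses the pairwise interpretation of AUC: the probability that a randomly chosen known-positive case gets a higher score than a randomly chosen known-negative case. The scores are hypothetical and are not OMOP results.

```python
def auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical method scores for drug/outcome pairs whose truth is already known.
known_harmful = [0.9, 0.8, 0.4]
known_harmless = [0.7, 0.3, 0.2]

print(f"AUC of the hypothetical method: {auc(known_harmful, known_harmless):.3f}")

# A method that gives every pair the same score cannot separate the groups.
print(f"AUC of an uninformative method: {auc([0.5, 0.5], [0.5, 0.5]):.3f}")
```

An AUC of 0.65 means the method ranks a true risk above a non-risk only about 65% of the time, not far from the 50% that an uninformative method achieves.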

“The epidemiologists in general don’t believe the results of this study.”
In other words, they prefer to rely on observational rather than data-driven conclusions.

