Time to CARE: A collaborative engine for practical disease prediction

D. Davis et al. (2009), Data Mining and Knowledge Discovery
Speaker: Sang Ho Oh, Feb. 20th, 2018

Introduction
Annual health care expenditure in the U.S. alone is an overwhelming sum, and the majority of this money is spent on disease treatment. Experts expect the burden on the medical system to keep increasing in the coming years. In 2001, 3.1 physician visits were made per patient. Researchers have shown that many conditions have recognizable indicators before onset, or preventable risk factors, so prospective medicine aimed at minimizing risk is feasible.
Current situation: Physicians can use family history, health history, and physical examination to approximate a patient's risk. Medical care is reactive, stepping in only once symptoms have emerged.
How to prevent? The prevailing model of prospective health care relies on the genome revolution, which has not yet matured.
Then what is the option? Phenotype-based and disease-history-based approaches offer the promise of advances towards disease prediction.

Purpose of the study
Aim of the study: Development of a predictive system called CARE (Collaborative Assessment and Recommendation Engine).
How? By examining the use of medical history.
For what? To examine information about disease correlations and to assess risk inexpensively.
How to predict the future diseases a patient may develop? Generate a patient’s prognosis based on the experiences of other, similar patients.
Method used in the study: Collaborative filtering (explained on the next slide).
Contributions of the study: A novel application of collaborative filtering in the medical domain for advancing the field of prospective medicine. The study presents a general system that makes predictions on all types of diseases and medical conditions (using ICD-9-CM).
*ICD-9-CM: International Classification of Diseases, 9th Revision, Clinical Modification.

Collaborative filtering
Collaborative filtering is designed to predict the preferences of one person (the active user) based on the preferences of other, similar persons (users).
Assumption: people will enjoy the same items as their similar peers. Having some common preferences is a strong predictor of additional common preferences.
Predictions are based on datasets consisting of many user profiles. They are made by calculating a similarity weight between the active user and every other user. The active user’s opinion is then determined by the weighted average of the others’ opinions.
How is it applied in the medical area? Each user is a patient whose profile is the set of diagnosed diseases. Using collaborative filtering, the authors generate predictions of other diseases for a patient based on a set of similar patients.
Difference between the original and the modified version of collaborative filtering: the rating is binary. A patient either has a disease (1) or does not (0).

Data used
The database comprises the Medicare records of 13,039,018 elderly patients in the U.S., with a total of 32,341,348 visits. The input for the methods is each patient’s diagnosis history, provided per inpatient visit. Each data record consists of a hospital visit, a patient ID, and a list of up to 10 diagnosis codes for that visit.
The diagnosis codes follow the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM). Each disease is given a unique code that can be up to 5 characters long. ICD-9 codes are hierarchical, so a code can be collapsed to fewer characters, which then identify a small family of related medical conditions. A total of 18,207 unique disease codes appear in the data.
*Example of collapsing a code: malignant hypertensive heart disease with heart failure. 4020: non-specific malignant hypertensive heart disease. 402: family of all hypertensive heart disease.
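The code collapse described above amounts to truncating a code to its leading characters. A minimal Python sketch follows; the function name `collapse_icd9` and the convention of writing codes without the decimal point are assumptions made for illustration, not part of the paper.

```python
def collapse_icd9(code: str, length: int = 3) -> str:
    """Collapse an ICD-9-CM code to its first `length` characters.

    Assumption: codes are plain strings such as "4020"; a decimal point,
    if present, is dropped before truncation. Special cases (for example
    E-codes, whose category is four characters long) are ignored here.
    """
    cleaned = code.replace(".", "").strip().upper()
    return cleaned[:length]


# The slide's example: 4020 collapses to the 402 family.
print(collapse_icd9("4020"))      # 402
print(collapse_icd9("40201", 4))  # 4020
```

Keeping the target length as a parameter makes it possible to collapse to either the 4-character or the 3-character level with the same helper.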

The CARE methodology
The testing patient (denoted 𝑎) is the individual for whom predictions are made, based on the histories of the training patients (denoted 𝐼, with each individual denoted 𝑖∈𝐼). The dotted lines in the diagram represent optional methods.
All patients are represented by their medical history. The training set is constrained to patients with at least two diseases in common with the testing patient. This results in a group of patients similar to the testing patient. Collaborative filtering is then performed, generating predictions for the future visits of the testing patient. The multiple resulting predictions are combined. The output is a list of diseases for the subsequent visit of the testing patient, ranked from the highest risk to the lowest.

Vector similarity
Collaborative filtering is used to make a prediction p(a, j) for an active user a on item j, based on the similarity between user a and every other user i who has previously given a vote v_{i,j} for that item. The symbols used in the prediction formula are:
v̄_i: the average vote of user i.
k: a normalizing constant (makes the sum of the weights equal to 1).
I: the entire training set of users.
I_j: the subset of users who have voted on j.
The similarity w(a, i) between the active user and user i is calculated by vector similarity, where J_i is the set of items rated by user i.
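The prediction and similarity formulas can be written out from the symbols defined above. The block below is a reconstruction in the standard memory-based collaborative filtering form, so the exact notation (the sum over I_j, the inner index m, and the set J_a for the active user) is an assumption rather than a copy of the paper's equations.

```latex
% Prediction for active user a on item j (assumed standard form)
p(a,j) = \bar{v}_a + k \sum_{i \in I_j} w(a,i)\,\bigl(v_{i,j} - \bar{v}_i\bigr)

% Vector (cosine) similarity between users a and i
w(a,i) = \sum_{j}
  \frac{v_{a,j}}{\sqrt{\sum_{m \in J_a} v_{a,m}^{2}}}
  \cdot
  \frac{v_{i,j}}{\sqrt{\sum_{m \in J_i} v_{i,m}^{2}}}
```

With binary votes, each squared vote is either 0 or 1, so the two denominators reduce to the square roots of the number of diseases in each patient's history, and the numerator counts the diseases the two patients share.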

Inverse frequency
The authors further extended the vector similarity equation to include inverse frequency, which gives lower weights to diseases that are very common in the training set. This is based on the intuition that sharing a rare disease has more impact on similarity than sharing a common one. Patients may share many medical diagnoses, but the most important contributions arise from uncommon connections.
The inverse frequency of disease j is defined in terms of n, the number of patients in the training set, and n_j, the number of patients who have disease j. It is incorporated into vector similarity by multiplying each disease vote by the corresponding inverse frequency factor, which yields an inverse-frequency-weighted similarity.
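A small Python sketch can make the weighting concrete. It assumes the usual inverse frequency definition f_j = log(n / n_j) and binary votes, so each patient is a set of disease codes. The function names, the dictionary layout, and the toy codes are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def inverse_frequency(training):
    """Return f_j = log(n / n_j) for every disease j seen in training.

    `training` maps a patient ID to the set of that patient's disease codes.
    The log form is an assumption (the standard inverse frequency).
    """
    n = len(training)
    counts = Counter(code for history in training.values() for code in history)
    return {code: math.log(n / n_j) for code, n_j in counts.items()}


def if_similarity(a, i, f):
    """Vector similarity with each binary vote multiplied by f_j."""
    numerator = sum(f.get(j, 0.0) ** 2 for j in a & i)
    norm_a = math.sqrt(sum(f.get(j, 0.0) ** 2 for j in a))
    norm_i = math.sqrt(sum(f.get(j, 0.0) ** 2 for j in i))
    if norm_a == 0.0 or norm_i == 0.0:
        return 0.0  # nothing but universal or unseen diseases
    return numerator / (norm_a * norm_i)


# Toy data: code "401" is shared by everyone, so it carries zero weight.
training = {
    "p1": {"401", "250"},
    "p2": {"401", "428"},
    "p3": {"401", "250", "585"},
    "p4": {"401"},
}
f = inverse_frequency(training)
print(if_similarity({"401", "250"}, training["p3"], f))
```

In this toy data the disease shared by every patient contributes nothing to the similarity, which is the behaviour the inverse frequency weighting is meant to produce.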

Grouping of training patients
Before collaborative filtering is applied, a group of relevant training patients is determined, based on the number of diagnoses they have in common with the testing patient.
Why? To remove the influence of patients who have little or no similarity. Training patients with no disease in common with the active patient do not contribute to the prediction score. Removing them does not lose information, but it effectively reduces the runtime of the algorithm.
How does it work in CARE? CARE includes all patients with 2 or more diseases in common with the testing patient. This constraint enforces stronger similarity for all patients that influence the predictions, and helps to avoid noise.
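The grouping step can be sketched as a simple filter over the training set. This is a minimal illustration assuming each patient is stored as a set of disease codes; the function name `select_training_group` and the toy data are hypothetical.

```python
def select_training_group(active, training, min_shared=2):
    """Keep training patients sharing at least `min_shared` diseases
    with the active (testing) patient.

    `active` is a set of disease codes; `training` maps a patient ID
    to that patient's set of disease codes.
    """
    return {
        patient_id: history
        for patient_id, history in training.items()
        if len(active & history) >= min_shared
    }


training = {
    "p1": {"401", "250", "428"},
    "p2": {"401"},
    "p3": {"585", "274"},
    "p4": {"250", "401", "585"},
}
active = {"401", "250"}
print(sorted(select_training_group(active, training)))  # ['p1', 'p4']
```

Raising `min_shared` shrinks the group further, trading coverage for stronger similarity among the patients that remain.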

Optional methods
ICARE: This stands for “Iterative CARE”. The method was developed to capture the effect of each individual disease with minimal noise from other diseases, but without the loss of information that removing them would cause.
ICD-9-CM code collapse: In some cases it is desirable to collapse 4- or 5-digit ICD-9-CM codes into the more general 3-digit codes, which represent small groups of related or similar diseases. There are two methods: codes are truncated to 3 digits either before (pre-collapse) or after (post-collapse) applying collaborative filtering. Pre-collapse significantly reduces the runtime of the algorithm. Post-collapse makes the result simpler to evaluate and interpret.
Time-sensitive CARE: CARE and ICARE do not take into account the order of, or the length of time between, disease diagnoses when generating vector similarity. But a match on two diseases that occurred many years apart may not be relevant. For that reason, the authors modified the method to incorporate the length of time between medical events.

Experiments
The experiments evaluate performance on predicting diseases that occur at a later date than those given to the collaborative filtering algorithm. Performance is determined from the overall list of predictions, ranked from the most likely to the least likely.
Metrics used:
Coverage: the percentage of diseases for which a prediction is made and ranked.
Average rank: it is desirable for future diseases to have low rank positions.
Half-life accuracy: measures the expected utility of the ranked list.
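The three metrics can be sketched in Python as follows. The slide does not give the formulas, so this is an illustration under stated assumptions: ranks start at 1, the half-life score uses the common exponential-decay form in which an item at rank r contributes 1 / 2^((r - 1) / (alpha - 1)), and the default half-life `alpha=5` is a placeholder rather than a value taken from the paper. All function names are hypothetical.

```python
def rank_positions(ranked, future):
    """1-based ranks of the future diseases that appear in the ranked list."""
    position = {code: r for r, code in enumerate(ranked, start=1)}
    return [position[code] for code in future if code in position]


def coverage(ranked, future):
    """Percentage of future diseases that received a ranked prediction."""
    if not future:
        return 0.0
    return 100.0 * len(rank_positions(ranked, future)) / len(future)


def average_rank(ranked, future):
    """Mean rank of the covered future diseases (lower is better)."""
    ranks = rank_positions(ranked, future)
    return sum(ranks) / len(ranks) if ranks else None


def half_life_utility(ranked, future, alpha=5):
    """Exponentially decayed utility, as a percentage of the best possible
    score (all future diseases placed at the top of the list)."""
    ranks = rank_positions(ranked, future)
    achieved = sum(1.0 / 2 ** ((r - 1) / (alpha - 1)) for r in ranks)
    best = sum(1.0 / 2 ** (r / (alpha - 1)) for r in range(len(future)))
    return 100.0 * achieved / best if best else 0.0


ranked = ["428", "585", "250", "401"]   # highest predicted risk first
future = {"585", "401", "999"}          # diseases actually diagnosed later
print(coverage(ranked, future), average_rank(ranked, future))
print(half_life_utility(ranked, future))
```

In this toy case two of the three future diseases are covered, at ranks 2 and 4, so the average rank is 3 and the half-life score falls below 100 because neither hit sits at the top of the list and one disease is missing entirely.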

Performance trends
The aim is to check how performance changes with the amount of data known about the testing patient. This provides guidelines for the minimum amount of information needed for a meaningful result (better than baseline) and a threshold for a good result. The visit and disease trends show that performance increases steadily as more information is known. In (a), just 1 visit is sufficient to outperform the baseline. (b) shows that a visit should include at least 3 diseases, but the data beyond 35 diseases are too sparse to support further conclusions. (c) shows that older diagnoses are less relevant to immediate concerns, which is an expected result.

Conclusions
The goal of the paper is to build a system that can assist a medical practitioner in decision making. The authors proposed CARE, a collaborative recommendation engine for prospective and proactive health care. CARE, ICARE, and time-sensitive CARE can predict a patient's future diagnoses and provide them to the doctor, so that appropriate medical tests can then be carried out. This improves the patient's quality of life and can also reduce health care costs.

Thank you
