STUDENTLIFE PREDICTIVE MODELING
Hongyu Chen, Jing Li, Mubing Li
CS69/169 Mobile Health, March 2015

2 Motivation
Let's go further than StudentLife 1.0!
● A standardized, normalized data set
● A proof of concept
● A scientific answer to our question: can we predict depression from a two-week window of StudentLife data collection?
Study Design
● Data cleaning/parsing
● Feature selection
● Class determination
● Predictive classifiers through supervised machine learning methods
● Validation
● Case study

3 Project Design/Workflow
● StudentLife dataset
● Data preprocessing & interpolation: linear, nearest neighbour, concatenation
● Features: EMA (sleep, mood, stress, social, exercise, etc.) and sensor (audio, conversation, activity, dark, etc.)
● Feature selection: PCA, data separation by week
● Class: PHQ-9 threshold (non-depressed whole time, depressed whole time, depression status changed)
● Prediction: SVM, etc., with N-fold cross-validation
● Result analysis: accuracy, F statistics, precision/recall, sensitivity/specificity
● Case study
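
The result-analysis measures can all be computed from the four cells of a confusion matrix. The sketch below is an illustration rather than the authors' code: it assumes "F statistics" refers to the F1 score, uses the slides' label convention (+1 depressed, -1 not depressed), and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision/recall, sensitivity/specificity and F1 for +1/-1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)

    def safe_div(a, b):
        # Avoid division by zero when a class never appears.
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # recall and sensitivity are the same quantity
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }
```

For example, `classification_metrics([1, 1, -1, -1, -1], [1, -1, -1, -1, 1])` has one true positive, one false negative, two true negatives and one false positive, giving an accuracy of 0.6.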

4 Class Determination via Thresholding
Why?
● Keeps it a classification problem, not a regression problem
● Depression presents in many different ways
● Small sample size

5 Class Determination via Thresholding
PHQ-9 scores of students before and after StudentLife

6 Class Determination via Thresholding
Threshold determined by visual inspection of a strip plot

7 Class Determination via Thresholding
Consistent with medical literature?
PHQ-9 score: Diagnosis
0-4: No Depression
5-9: Mild Depression
10-14: Moderate Depression
15-19: Moderately Severe Depression
20-27: Severe Depression
Do you have at least moderate depression?
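
A minimal sketch of the thresholding, assuming the cutoff sits at the start of the moderate band (a score of 10 or more). The slides chose their threshold by visually inspecting a strip plot, so the value 10 and all helper names here are illustrative only. The hypothetical `status_group` mirrors the three groups named on the workflow slide.

```python
# PHQ-9 severity bands from the table above (inclusive score ranges).
PHQ9_BANDS = [
    (0, 4, "No Depression"),
    (5, 9, "Mild Depression"),
    (10, 14, "Moderate Depression"),
    (15, 19, "Moderately Severe Depression"),
    (20, 27, "Severe Depression"),
]

# Assumption: "at least moderate depression" means a score of 10 or more.
THRESHOLD = 10


def phq9_band(score):
    """Return the diagnosis band for a PHQ-9 total score (0-27)."""
    for low, high, label in PHQ9_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"PHQ-9 score out of range: {score}")


def phq9_class(score, threshold=THRESHOLD):
    """Map a score to the classifier labels: +1 depressed, -1 not depressed."""
    return 1 if score >= threshold else -1


def status_group(pre_score, post_score, threshold=THRESHOLD):
    """Group a student by pre- and post-study scores (hypothetical helper)."""
    pre = phq9_class(pre_score, threshold)
    post = phq9_class(post_score, threshold)
    if pre == post == -1:
        return "non-depressed whole time"
    if pre == post == 1:
        return "depressed whole time"
    return "depression status changed"
```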

8 Linear Interpolation for EMA Data
● EMA data is very sparse
● Interpolation increases the number of points
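
One way to implement the linear interpolation, assuming each EMA response has already been mapped to an integer day index and the goal is one value per day. The edge handling (clamping before the first and after the last response) is an assumption, since the slides do not describe it, and `linear_interpolate_daily` is a hypothetical name.

```python
import bisect


def linear_interpolate_daily(days, values, start, end):
    """Fill every day in [start, end] from sparse (day, value) EMA responses.

    `days` must be sorted and strictly increasing. Days outside the observed
    range are clamped to the nearest response (an assumption).
    """
    out = []
    for d in range(start, end + 1):
        if d <= days[0]:
            out.append(float(values[0]))
        elif d >= days[-1]:
            out.append(float(values[-1]))
        else:
            i = bisect.bisect_left(days, d)  # first response on or after day d
            d0, d1 = days[i - 1], days[i]
            v0, v1 = values[i - 1], values[i]
            # Straight line between the two responses that bracket day d.
            out.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
    return out
```

With responses on days 0, 3 and 5 (values 2, 5, 1), `linear_interpolate_daily([0, 3, 5], [2, 5, 1], 0, 6)` fills seven daily points from three observations: `[2.0, 3.0, 4.0, 5.0, 3.0, 1.0, 1.0]`.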

9 Nearest-Neighbor Interpolation for Sensor Data
● Sensor data is too dense
● Interpolation decreases the number of points
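
A sketch of nearest-neighbor downsampling, assuming one representative time per day (for example, a fixed hour) is chosen as the target grid and the reading closest to it is kept. The grid choice, the tie-breaking rule, and the name `nearest_neighbor_resample` are assumptions, not taken from the slides.

```python
import bisect


def nearest_neighbor_resample(timestamps, values, grid):
    """For each grid time, keep the sensor reading closest in time.

    `timestamps` must be sorted. On a tie, the earlier reading wins.
    """
    out = []
    for t in grid:
        i = bisect.bisect_left(timestamps, t)
        # Only the readings just before and just after t can be nearest.
        candidates = [k for k in (i - 1, i) if 0 <= k < len(timestamps)]
        best = min(candidates, key=lambda k: (abs(timestamps[k] - t), k))
        out.append(values[best])
    return out
```

For instance, with readings at hours 0, 5.5, 11, 13, 30, 35 and 37.5 and a grid at hour 12 of each day (12 and 36), only two of the seven readings survive.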

10 Standardized Data Set?
In the first iteration of StudentLife, every data collection modality had:
● Different scaling
● Different periodicity
● Different quality
Now, all 15 depression-related modalities have:
● One value per 24-hour period
● Comparable scaling
● A guarantee of good quality (279 samples removed)
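
The slides do not say how "comparable scaling" was achieved. The sketch below assumes per-modality z-scoring, then concatenates the modalities into one feature row per day. The function name `standardize_modalities` is hypothetical, and the quality filter that removed 279 samples is not reproduced because its criteria are not given.

```python
from statistics import mean, pstdev


def standardize_modalities(daily):
    """Z-score each modality, then build one concatenated feature row per day.

    `daily` maps a modality name to a list with one value per 24-hour period.
    Assumption: z-scoring stands in for the unspecified scaling method.
    """
    lengths = {len(v) for v in daily.values()}
    if len(lengths) != 1:
        raise ValueError("every modality needs the same number of days")
    scaled = {}
    for name, values in daily.items():
        mu, sigma = mean(values), pstdev(values)
        # A constant modality carries no information; map it to all zeros.
        scaled[name] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    names = sorted(scaled)
    n_days = lengths.pop()
    rows = [[scaled[n][d] for n in names] for d in range(n_days)]
    return names, rows
```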

11 Feature Selection
Step 1: Decide the sliding-window time frame
● Two weeks: long enough to make a diagnosis, but short enough to leave enough time points for testing
Step 2: Feature aggregation
Step 3: Dimensionality reduction
● We cannot use 105 dimensions to classify only a couple hundred cases!
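
Steps 1 and 2 can be sketched as a sliding two-week window followed by per-modality aggregation. The aggregates below (mean, population standard deviation, and last-minus-first change) and the one-day step are hypothetical choices; the slides do not list the aggregates that produced their 105 dimensions. The feature count is always the number of modalities times the number of aggregates, which is why it grows quickly.

```python
from statistics import mean, pstdev

WINDOW_DAYS = 14  # two-week window, as on the slide


def sliding_windows(rows, window=WINDOW_DAYS, step=1):
    """Yield consecutive blocks of `window` daily rows, advancing by `step` days."""
    for start in range(0, len(rows) - window + 1, step):
        yield rows[start:start + window]


def aggregate_window(window_rows):
    """Collapse one window into a flat feature vector (hypothetical aggregates)."""
    features = []
    for column in zip(*window_rows):  # one column per modality
        features.extend([mean(column), pstdev(column), column[-1] - column[0]])
    return features
```

For 20 days of data, a 14-day window with a one-day step yields 7 windows; with 2 modalities and 3 aggregates each, every window becomes a 6-element feature vector.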

12 Principal Component Analysis (PCA)

13 Top Features from PCA
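
A minimal PCA sketch using NumPy's SVD, with one possible rule for picking "top features": ranking each original feature by its largest absolute loading on the retained components. That ranking rule is an assumption, since the slides do not say how top features were chosen, and both function names are hypothetical. PCA is sensitive to scale, so it is normally run on standardized features like those described on slide 10.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of mean-centered data (rows = samples, columns = features).

    Returns (scores, components, explained_variance_ratio).
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (X.shape[0] - 1)
    ratio = var / var.sum()
    components = Vt[:n_components]
    return Xc @ components.T, components, ratio[:n_components]


def top_features(components, feature_names, k):
    """Rank features by their largest absolute loading (one possible heuristic)."""
    weight = np.abs(components).max(axis=0)
    order = np.argsort(weight)[::-1][:k]
    return [feature_names[i] for i in order]
```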

14 Random Forest Decision Trees

15 Predictive Classifier
● Classes: not depressed (-1), depressed (+1)
● Features: top features by PCA
● Training set: all depressed samples (50%) plus selected not-depressed samples (50%)
● SVM model
● Cross-validation accuracy = 96.6667%
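
A sketch of the training-and-validation loop, assuming a linear SVM. The slides do not state the kernel, solver, or number of folds, so the full-batch subgradient solver, its hyperparameters (`lam`, `lr`, `epochs`), `k=5`, and all function names below are illustrative choices, not the authors' setup. As on the slide, the balanced set keeps every depressed sample and an equal number of randomly selected not-depressed samples, and cross-validation runs on that balanced set.

```python
import numpy as np


def balance_classes(X, y, rng):
    """Keep every +1 (depressed) sample plus an equal-size random subset of -1."""
    pos = np.flatnonzero(y == 1)
    neg = rng.choice(np.flatnonzero(y == -1), size=len(pos), replace=False)
    idx = rng.permutation(np.concatenate([pos, neg]))
    return X[idx], y[idx]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Fit (w, b) by subgradient descent on the L2-regularized hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        ya, Xa = y[active], X[active]
        grad_w = lam * w - (ya[:, None] * Xa).sum(axis=0) / n
        grad_b = -ya.sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Return +1 (depressed) or -1 (not depressed) for each row."""
    return np.where(X @ w + b >= 0, 1, -1)


def cross_val_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validation accuracy on a class-balanced subset of (X, y)."""
    rng = np.random.default_rng(seed)
    Xb, yb = balance_classes(X, y, rng)
    all_idx = np.arange(len(yb))
    correct = 0
    for fold in np.array_split(all_idx, k):
        train = np.setdiff1d(all_idx, fold)
        w, b = train_linear_svm(Xb[train], yb[train])
        correct += int((predict(Xb[fold], w, b) == yb[fold]).sum())
    return correct / len(yb)
```

Balancing before drawing folds mirrors the slide's description; with heavily imbalanced data this trades away some not-depressed samples in exchange for a classifier that cannot score well by always predicting the majority class.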

16 Case Study
● Participant No. 16:
Beginning of the term: not depressed (-1)
End of the term: depressed (+1)
(Figure labels: Not depressed, Depressed)

17 Future Directions
Why is this important?
1. Contributes (marginally) to existing medical literature about depression
2. Proof of concept for possible interventions: imagine an app that tells you when you could be depressed and connects you with resources to help
3. Standardized data set available: opens the door to future analyses, not only on depression
A small taste of the beginnings of… StudentLife 2.0?

