
Automatic Transformation of Raw Clinical Data into Clean Data Using Decision Tree Learning. Jian Zhang. Supervised by: Karen Petrie.


1 Automatic Transformation of Raw Clinical Data into Clean Data Using Decision Tree Learning. Jian Zhang. Supervised by: Karen Petrie

2 Background: Cancer research has become an extremely data-rich environment. Many analysis packages are available for analyzing the data, but the data needs preprocessing first.

3 Rich data environment: factors relating to breast cancer

4 Raw clinical data sample
Yes-No data:
  yes: yes, Yes, Ye, yed, yef …
  no: No, n, not …
  null: don’t know, no data, waiting for lab
Positive-Negative data:
  positive: +, ++, p, p++ …
  negative: -, n, neg, n--- …
  null: no data, ruined sample, waiting for lab
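To make the cleaning task concrete, here is a minimal sketch of a hand-written lookup that maps the Yes-No variants listed above to canonical labels. The names `YES_NO_VARIANTS` and `normalize` are hypothetical and do not come from the presentation; the sketch only illustrates why a fixed table fails on spellings it has never seen.

```python
# Hypothetical lookup table built only from the variants listed on the slide.
YES_NO_VARIANTS = {
    "yes": {"yes", "ye", "yed", "yef"},
    "no": {"no", "n", "not"},
    "null": {"don't know", "no data", "waiting for lab"},
}


def normalize(raw, variants):
    """Return the canonical label for a raw entry, or None if it is unknown."""
    # Ignore case, surrounding spaces, and typographic apostrophes.
    key = raw.strip().lower().replace("’", "'")
    for label, spellings in variants.items():
        if key in spellings:
            return label
    return None  # an entry the table has never seen


cleaned = [normalize(v, YES_NO_VARIANTS) for v in ["Yes", "yed", " N ", "no data", "maybe"]]
print(cleaned)
```

An entry such as "maybe" falls through to `None`, which is the kind of case that a learned model would have to handle instead.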

5 Basic version

6 Question: Could the process be automated?

7 Introduction: decision tree learning; Weka

8 Decision Tree Learning: Decision tree learning is a method for approximating discrete-valued functions and is one of the most popular inductive learning algorithms.
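As an illustration of how a decision tree learner chooses a split, the sketch below computes entropy and information gain on a toy sample. This is the criterion used by ID3-style learners. The presentation does not say how raw strings were encoded, so the boolean features `starts_with_y` and `longer_than_2` are assumptions made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one boolean feature."""
    total = len(labels)
    remainder = 0.0
    for value in (True, False):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        if subset:
            remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Toy sample: raw entries and their clean labels.
raw = ["yes", "Ye", "yed", "no", "n", "not"]
labels = ["yes", "yes", "yes", "no", "no", "no"]

# Hypothetical features derived from each raw string.
rows = [
    {"starts_with_y": r.lower().startswith("y"), "longer_than_2": len(r) > 2}
    for r in raw
]

# The root of the tree tests the feature with the highest information gain.
best = max(rows[0], key=lambda f: information_gain(rows, labels, f))
print(best)
```

On this sample `starts_with_y` separates the two classes perfectly, so it would be chosen as the root test.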

9 Decision tree sample

10 Weka: Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java. It contains a collection of algorithms for data analysis and predictive modeling.

11 Experiment
Data: a training dataset with 100 instances, and a test dataset with 100 instances that contains 17 values not present in the training dataset
Tool: Weka
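A small sketch of how test values that never occur in the training data could be identified. It assumes values are compared as exact strings; the presentation does not say how the 17 differing values were counted, and the sample lists here are invented for illustration.

```python
def unseen_values(train_values, test_values):
    """Return the distinct test values that never occur in the training data."""
    return sorted(set(test_values) - set(train_values))


# Invented sample entries, not the presentation's data.
train = ["yes", "Yes", "no", "n", "no data"]
test = ["yes", "yef", "not", "N", "ruined sample"]

print(unseen_values(train, test))
```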

12 Experiment
Experiment 1: training dataset
Experiment 2: training dataset, test dataset

13 Experiment 1
Name of Tree    Correctly Classified Instances (%)    Testing (%)    Root mean squared error
BFTree          89                                    99             0.0588
DecisionStump   47                                    55             0.422
FT              87                                    98             0.1698
J48             82                                    98             0.0976
J48graft        82                                    98             0.0976
LADTree         81                                    90             0.2317
LMT             84                                    91             0.2344
NBTree          80                                    98             0.2326
RandomForest    83                                    100            0.0781
RandomTree      83                                    100            0.0447
REPTree         82                                    98             0.0985
SimpleCart      89                                    96             0.1511

14 Experiment 2
Name of Tree    Correctly Classified Instances (%)    Testing (%)    Root mean squared error
BFTree          89                                    88             0.2813
DecisionStump   47                                    49             0.4318
FT              87                                    90             0.2194
J48             82                                    88             0.2098
J48graft        82                                    88             0.2098
LADTree         81                                    89             0.2494
LMT             84                                    89             0.234
NBTree          80                                    88             0.2569
RandomForest    83                                    88             0.2095
RandomTree      83                                    88             0.209
REPTree         82                                    88             0.2098
SimpleCart      89                                    87             0.2848
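For readers unfamiliar with the two metrics in these tables, the sketch below shows one common way to compute the percentage of correctly classified instances and a root mean squared error over predicted class probabilities. It is an assumption about how such figures are typically defined for classifiers, not a statement of how the presentation's numbers were produced; the function names and sample values are hypothetical.

```python
import math


def percent_correct(actual, predicted):
    """Percentage of instances whose predicted label equals the true label."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


def rmse(actual, probabilities, classes):
    """Root mean squared error between predicted class probabilities and the
    0/1 indicator of the true class, averaged over classes and instances."""
    total = 0.0
    for label, probs in zip(actual, probabilities):
        squared = sum((probs[c] - (1.0 if c == label else 0.0)) ** 2 for c in classes)
        total += squared / len(classes)
    return math.sqrt(total / len(actual))


# Invented example values.
classes = ["yes", "no"]
print(percent_correct(["yes", "no", "no", "yes"], ["yes", "no", "yes", "yes"]))
print(rmse(["yes", "no"], [{"yes": 1.0, "no": 0.0}, {"yes": 0.5, "no": 0.5}], classes))
```

Under this definition a classifier can have a high percentage correct and still a noticeable error if its probability estimates are not confident.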

15 Result: The results show that the decision trees classify and predict existing entries well, but their predictions for unknown entries are not as good as expected.

16 Future work: find and correct the incorrect predictions in the process; automate the transformation of unknown entries

17 Thank you!

