Published by Tanya Keel. Modified over 2 years ago.

1
Replacing Missing Values
Jukka Parviainen
Tik-61.181 Special Course in Information Technology
27.10.1999

2
Agenda
- Motivation
- Objectives
- Meaning for the conclusions
- Origin of missing values (MV)
- Detection of missing values
- Replacing missing values
- Examples

3
References
- Pyle: Data Preparation for Data Mining (DP for DM), chapter 8
- Hair, Anderson, Tatham, Black: Multivariate Data Analysis
- Bishop: Neural Networks for Pattern Recognition (NN for PR)

4
…missing values?

5
Motivation
- There are always MVs in a real data set
- MVs may have an impact on modeling; in fact, they can destroy it!
- MVs also contain information!
- Hint for the modeler: Avoid-Detect-Replace-Understand

6
“Definitions”
- Missing value: not captured in the data set (errors in feeding, transmission, ...)
- Empty value: no value in the population
- Outlier, out-of-range value

7
Objectives
- Controlled and understood by the modeler
- “Least harm”: no “new” information into a data set
- Statistical estimation of MVs is not the primary issue; DM is
- KISS: speed and simplicity
- PIE-I/O: training + testing + execution

8
Origin and Detection
- Missing data process
- Degree of randomness
  - nonrandom
  - missing at random
  - missing completely at random
- Detecting missing value patterns
  - number of MVs in each variable/case
  - compare MVP to complete sets
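The detection steps on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the method used in the course: the toy data set, the variable names, and the helper functions (`mv_per_variable`, `mv_per_case`, `mv_patterns`, `compare_groups`) are all assumptions made for the example, and `None` is assumed to mark a missing value.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data set: None marks a missing value (MV).
rows = [
    {"age": 34,   "income": 52000, "city": "Espoo"},
    {"age": None, "income": 48000, "city": "Helsinki"},
    {"age": 51,   "income": None,  "city": None},
    {"age": None, "income": 61000, "city": "Vantaa"},
]
variables = ["age", "income", "city"]

def mv_per_variable(rows, variables):
    """Number of MVs in each variable (column)."""
    return {v: sum(1 for r in rows if r[v] is None) for v in variables}

def mv_per_case(rows, variables):
    """Number of MVs in each case (row)."""
    return [sum(1 for v in variables if r[v] is None) for r in rows]

def mv_patterns(rows, variables):
    """Count each missing value pattern (MVP): 1 = missing, 0 = present."""
    return Counter(tuple(int(r[v] is None) for v in variables) for r in rows)

def compare_groups(rows, flag_var, target):
    """Mean of `target` where `flag_var` is missing vs. where it is present."""
    missing = [r[target] for r in rows
               if r[flag_var] is None and r[target] is not None]
    present = [r[target] for r in rows
               if r[flag_var] is not None and r[target] is not None]
    return (mean(missing) if missing else None,
            mean(present) if present else None)

print(mv_per_variable(rows, variables))
print(mv_per_case(rows, variables))
print(mv_patterns(rows, variables))
print(compare_groups(rows, "age", "income"))
```

A clear difference between the two group means returned by `compare_groups` is one hint that the MVs in `flag_var` may not be missing completely at random.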

9
Replacing missing values
- Randomness of MVs?
- Methods
  - Use the complete data
  - Delete variable(s)/case(s)
  - Imputation methods...
  - Model based (ML, Bayes)
  - Use robust models
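The first two methods (using only complete data, and deleting variables or cases) might look roughly like the sketch below. The data, the variable names, the 0.5 threshold, and the helpers `complete_cases` and `drop_sparse_variables` are hypothetical choices for illustration only.

```python
# Hypothetical toy data set: None marks a missing value (MV).
rows = [
    {"temp": 1510.0, "carbon": 0.12, "operator": None},
    {"temp": None,   "carbon": 0.15, "operator": None},
    {"temp": 1495.0, "carbon": 0.11, "operator": "A"},
    {"temp": 1502.0, "carbon": None, "operator": None},
]
variables = ["temp", "carbon", "operator"]

def complete_cases(rows, variables):
    """Use the complete data: keep only cases with no MVs."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

def drop_sparse_variables(rows, variables, max_missing_share):
    """Delete variables whose share of MVs exceeds max_missing_share."""
    if not rows:
        return list(variables), []
    n = len(rows)
    keep = [v for v in variables
            if sum(r[v] is None for r in rows) / n <= max_missing_share]
    return keep, [{v: r[v] for v in keep} for r in rows]

# Only one case is complete across all three variables.
print(len(complete_cases(rows, variables)))

# Dropping the mostly missing variable first leaves more complete cases.
kept, reduced = drop_sparse_variables(rows, variables, max_missing_share=0.5)
print(kept, len(complete_cases(reduced, kept)))
```

The order of deletion matters: removing a variable that is mostly missing before removing cases can save a large share of the data.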

10
Imputation methods
- Process of estimating MVs based on valid values of other variables/cases
- Techniques:
  - distribution characteristics from all available valid values
  - replacing: case, mean substitution, cold deck, regression imputation
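Two of the listed techniques, mean substitution and regression imputation, can be sketched as follows. This is an illustrative sketch under simple assumptions (one predictor, a least-squares line, `None` as the MV marker). The function names and the sample values are made up for the example and do not come from the slides.

```python
from statistics import mean

def mean_substitution(values):
    """Replace each MV with the mean of the valid values of the same variable."""
    valid = [v for v in values if v is not None]
    m = mean(valid)
    return [m if v is None else v for v in values]

def regression_imputation(x, y):
    """Replace MVs in y with predictions from a least-squares line y = a + b*x.

    The line is fitted only on cases where both x and y are valid.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y)
             if xi is not None and yi is not None]
    mx = mean(p[0] for p in pairs)
    my = mean(p[1] for p in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return [a + b * xi if yi is None and xi is not None else yi
            for xi, yi in zip(x, y)]

# Hypothetical sample: y is roughly 2*x, with one MV.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, 8.0]

print(mean_substitution(y))
print(regression_imputation(x, y))
```

Mean substitution keeps the variable's mean but shrinks its variance, while regression imputation uses the relationship to another variable and can therefore strengthen that relationship artificially. Both effects are worth weighing against the "least harm" objective.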

11
Examples
- Polls, questionnaires
  - Planning more than essential
  - human factors!
  - small amounts of data
- Data from steel plant
  - Information system
  - errors, default values
  - lots of data

12
Questions
- Do software applications help or hide the effect of missing values? (SPSS Clementine)
- Execution/prediction phase of DM process?
- What to do with alpha variables?
