Replacing Missing Values Jukka Parviainen Tik-61.181 Special Course in Information Technology 27.10.1999.


1 Replacing Missing Values. Jukka Parviainen, Tik-61.181 Special Course in Information Technology, 27.10.1999

2 Agenda
- Motivation
- Objectives
- Meaning for the conclusions
- Origin of missing values (MV)
- Detection of missing values
- Replacing missing values
- Examples

3 References
- Pyle: Data Preparation for Data Mining, chapter 8
- Hair, Anderson, Tatham, Black: Multivariate Data Analysis
- Bishop: Neural Networks for Pattern Recognition

4 …missing values?

5 Motivation
- There are always MVs in a real data set
- MVs may have an impact on modeling; in fact, they can destroy it!
- MVs also contain information!
- Hint for the modeler: Avoid, Detect, Replace, Understand

6 “Definitions”
- Missing value: a value that exists but was not captured in the data set (errors in data entry, transmission, ...)
- Empty value: no value exists in the population
- Outlier, out-of-range value

7 Objectives
- Replacement is controlled and understood by the modeler
- “Least harm”: no “new” information is introduced into the data set
- Statistical estimation of MVs is not the primary issue; data mining (DM) is
- KISS: speed and simplicity
- PIE-I/O: training + testing + execution

8 Origin and Detection
- Missing data process
- Degree of randomness
  - nonrandom
  - missing at random
  - missing completely at random
- Detecting missing value patterns (MVP)
  - number of MVs in each variable/case
  - compare MVP to complete sets
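
The detection step on this slide (counting MVs per variable and per case, and looking at which variables are missing together) can be sketched in a few lines of Python. This is only an illustration, not the presenter's method: the toy records, the use of `None` as the missing-value marker, and all function names are assumptions made for the example.

```python
# Hypothetical toy data: each dict is one case, None marks a missing value.
records = [
    {"age": 34, "income": 52000, "score": 0.7},
    {"age": None, "income": 48000, "score": None},
    {"age": 29, "income": None, "score": 0.4},
    {"age": 41, "income": 61000, "score": None},
]


def missing_per_variable(rows):
    """Count missing values in each variable (column)."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts


def missing_per_case(rows):
    """Count missing values in each case (row), in input order."""
    return [sum(value is None for value in row.values()) for row in rows]


def missing_patterns(rows):
    """Count how often each pattern of jointly missing variables occurs.

    A pattern is the sorted tuple of variable names missing in a case;
    the empty tuple stands for a complete case.
    """
    patterns = {}
    for row in rows:
        key = tuple(sorted(name for name, value in row.items() if value is None))
        patterns[key] = patterns.get(key, 0) + 1
    return patterns


print(missing_per_variable(records))
print(missing_per_case(records))
print(missing_patterns(records))
```

Comparing the pattern counts against the number of complete cases gives a first impression of whether the missing values cluster in particular variables or cases, which is a hint (not a proof) that they are not missing completely at random.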

9 Replacing missing values
- Randomness of MVs?
- Methods
  - Use the complete data
  - Delete variable(s)/case(s)
  - Imputation methods...
  - Model based (ML, Bayes)
  - Use robust models
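
The two simplest options in this list, using only the complete cases and deleting variables with too many MVs, might look as follows. This is a minimal sketch under stated assumptions: the toy data, the `None` marker, the function names and the threshold value are all hypothetical, and every case is assumed to carry the same variables.

```python
# Hypothetical toy data: each dict is one case, None marks a missing value.
cases = [
    {"age": 34, "income": 52000, "score": 0.7},
    {"age": None, "income": 48000, "score": None},
    {"age": 29, "income": None, "score": 0.4},
    {"age": 41, "income": 61000, "score": None},
]


def drop_incomplete_cases(rows):
    """Keep only the cases that have no missing value (complete-case analysis)."""
    return [row for row in rows if all(value is not None for value in row.values())]


def drop_sparse_variables(rows, max_missing_fraction):
    """Delete every variable whose share of missing values exceeds the threshold.

    Assumes all rows share the same variable names as the first row.
    """
    if not rows:
        return []
    n = len(rows)
    keep = [
        name
        for name in rows[0]
        if sum(row[name] is None for row in rows) / n <= max_missing_fraction
    ]
    return [{name: row[name] for name in keep} for row in rows]


complete = drop_incomplete_cases(cases)
reduced = drop_sparse_variables(cases, max_missing_fraction=0.4)
print(len(complete), "complete case(s) left")
print("variables kept:", list(reduced[0]))
```

Both deletions are cheap, but they throw data away: here only one of four cases survives the complete-case filter, which is why the randomness question at the top of the slide matters before choosing a method.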

10 Imputation methods
- Process of estimating MVs based on valid values of other variables/cases
- Techniques:
  - distribution characteristics from all available valid values
  - replacing: case substitution, mean substitution, cold deck imputation, regression imputation
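
Two of the replacement techniques named here, mean substitution and regression imputation, can be illustrated with plain Python. This is a sketch only: the function names, the `None` marker and the restriction to a single predictor variable fitted by ordinary least squares are assumptions for the example, not details taken from the slides.

```python
def mean_substitute(values):
    """Replace None entries with the mean of the valid (non-None) entries."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("no valid values to estimate the mean from")
    mean = sum(valid) / len(valid)
    return [mean if v is None else v for v in values]


def regression_impute(x, y):
    """Fill None entries of y from a least-squares line y = a + b * x.

    The line is fitted on the cases where both x and y are present.
    Cases where x is missing as well are left as None.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None and yi is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete cases to fit a line")
    mean_x = sum(xi for xi, _ in pairs) / len(pairs)
    mean_y = sum(yi for _, yi in pairs) / len(pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    if sxx == 0:
        raise ValueError("x is constant in the complete cases")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [
        intercept + slope * xi if yi is None and xi is not None else yi
        for xi, yi in zip(x, y)
    ]


# Hypothetical usage: fill one gap by each technique.
print(mean_substitute([1.0, None, 3.0]))
print(regression_impute([1, 2, 3, 4], [2, 4, None, 8]))
```

Mean substitution leaves the variable's mean unchanged but shrinks its variance, since every replaced value sits exactly at the centre. Regression imputation uses the relationship to another variable instead, at the cost of making that relationship look stronger than it is in the observed data.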

11 Examples
- Polls, questionnaires
  - planning more than essential
  - human factors!
  - small amounts of data
- Data from a steel plant
  - information system
  - errors, default values
  - lots of data

12 Questions
- Do software applications help with or hide the effect of missing values? (SPSS Clementine)
- Execution/prediction phase of the DM process?
- What to do with alpha variables?

