1
Replacing Missing Values
Jukka Parviainen
Tik-61.181 Special Course in Information Technology
27.10.1999

2
Agenda
- Motivation
- Objectives
- Meaning for the conclusions
- Origin of missing values (MV)
- Detection of missing values
- Replacing missing values
- Examples

3
References
- Pyle: Data Preparation for Data Mining (DP for DM), chapter 8
- Hair, Anderson, Tatham, Black: Multivariate Data Analysis
- Bishop: Neural Networks for Pattern Recognition (NN for PR)

4
…missing values?

5
Motivation
- There are always MVs in a real data set
- MVs may have an impact on modeling; in fact, they can destroy it!
- MVs also contain information!
- Hint for the modeler: Avoid-Detect-Replace-Understand

6
“Definitions”
- Missing value: a value that exists but was not captured in the data set (errors in data entry, transmission, ...)
- Empty value: no value exists in the population
- Outlier, out-of-range value

7
Objectives
- Replacement is controlled and understood by the modeler
- “Least harm”: no “new” information is introduced into the data set
- Statistical estimation of MVs is not the primary issue; DM is
- KISS: speed and simplicity
- PIE-I/O: training + testing + execution

8
Origin and Detection
- Missing data process
- Degree of randomness
  - nonrandom
  - missing at random
  - missing completely at random
- Detecting missing value patterns
  - number of MVs in each variable/case
  - compare MVP to complete sets
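
The detection step above (counting MVs in each variable and case, and tabulating missing value patterns) could be sketched roughly as follows. This is only an illustration, not the method used in the talk: the toy data set, the variable names, and the convention that `None` marks a missing value are all assumptions.

```python
from collections import Counter

# Hypothetical toy data: each row is a case; None marks a missing value (assumption).
VARIABLES = ["age", "income", "city"]
CASES = [
    [34, 52000, "Espoo"],
    [None, 48000, "Helsinki"],
    [29, None, None],
    [41, None, "Tampere"],
]


def missing_per_variable(cases, variables):
    """Count the missing values in each variable (column)."""
    return {
        name: sum(row[i] is None for row in cases)
        for i, name in enumerate(variables)
    }


def missing_per_case(cases):
    """Count the missing values in each case (row)."""
    return [sum(value is None for value in row) for row in cases]


def missing_value_patterns(cases):
    """Count how often each missing value pattern occurs.

    A pattern is a tuple of flags, one per variable; True means 'missing'.
    """
    return Counter(tuple(value is None for value in row) for row in cases)


if __name__ == "__main__":
    print(missing_per_variable(CASES, VARIABLES))
    print(missing_per_case(CASES))
    for pattern, count in missing_value_patterns(CASES).items():
        print(pattern, count)
```

The all-`False` pattern corresponds to complete cases, so comparing its count with the other patterns gives a first impression of how much data a complete-case analysis would keep.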

9
Replacing missing values
- Randomness of MVs?
- Methods
  - Use the complete data
  - Delete variable(s)/case(s)
  - Imputation methods...
  - Model based (ML, Bayes)
  - Use robust models
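
As a minimal sketch of the two simplest methods listed above (using only the complete data, and deleting variables or cases), the functions below show one possible way to do it. The function names, the `None` marker for missing values, and the 50 % threshold are assumptions made for illustration.

```python
def complete_cases(cases):
    """Keep only the cases (rows) that have no missing value."""
    return [row for row in cases if all(value is not None for value in row)]


def drop_sparse_variables(cases, variables, max_missing_fraction=0.5):
    """Delete variables whose fraction of missing values exceeds the threshold.

    The 0.5 default is an arbitrary illustrative choice, not a recommendation.
    """
    if not cases:
        return list(variables), []
    n_cases = len(cases)
    keep = [
        i for i in range(len(variables))
        if sum(row[i] is None for row in cases) / n_cases <= max_missing_fraction
    ]
    kept_names = [variables[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in cases]
    return kept_names, kept_rows


if __name__ == "__main__":
    # Hypothetical data: variable "b" is missing in three of four cases.
    variables = ["a", "b", "c"]
    cases = [
        [1, None, 3],
        [4, None, 6],
        [7, 8, None],
        [10, None, 12],
    ]
    print(complete_cases(cases))  # every case has at least one MV here
    names, rows = drop_sparse_variables(cases, variables)
    print(names, complete_cases(rows))
```

In this toy data no case is complete, so deleting cases alone would discard everything, whereas deleting the sparse variable first leaves three complete cases. Whether either deletion is acceptable depends on the randomness of the MVs.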

10
Imputation methods
- Process of estimating MVs based on valid values of other variables / cases
- Techniques:
  - distribution characteristics from all available valid values
  - replacing: case, mean substitution, cold deck, regression imputation
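
Two of the replacing techniques named above, mean substitution and regression imputation, might look like the sketch below. It is an illustration under stated assumptions rather than the procedure from the talk: `None` marks a missing value, the regression is a simple one-predictor least-squares line, and the predictor `x` is assumed to be observed wherever `y` is missing.

```python
from statistics import mean


def mean_substitution(values):
    """Replace each missing value with the mean of the valid values."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("no valid values to compute a mean from")
    fill = mean(valid)
    return [fill if v is None else v for v in values]


def regression_imputation(x, y):
    """Fill missing y values from a least-squares line y = a + b * x.

    The line is fitted on the cases where both x and y are present.
    Assumes x is present wherever y is missing (illustrative simplification).
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y)
             if xi is not None and yi is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs to fit a line")
    mean_x = mean([xi for xi, _ in pairs])
    mean_y = mean([yi for _, yi in pairs])
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    if sxx == 0:
        raise ValueError("x is constant; the slope is undefined")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs) / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]


if __name__ == "__main__":
    print(mean_substitution([1, None, 3]))
    print(regression_imputation([1, 2, 3, 4], [2, 4, None, 8]))
```

Mean substitution ignores the other variables and reduces the variance of the filled variable, while regression imputation uses the relationship to another variable; both fill in estimates, which is worth keeping in mind given the “least harm” objective.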

11
Examples
- Polls, Questionnaires
  - Planning more than essential
  - human factors!
  - small amounts of data
- Data from steel plant
  - Information system
  - errors, default values
  - lots of data

12
Questions
- Do software applications help or hide the effect of missing values? (SPSS Clementine)
- Execution/prediction phase of DM process?
- What to do with alpha variables?
