Forward Semi-Supervised Feature Selection Jiangtao Ren, Zhengyuan Qiu, Wei Fan, Hong Cheng, and Philip S. Yu.

1 Forward Semi-Supervised Feature Selection
Jiangtao Ren, Zhengyuan Qiu, Wei Fan, Hong Cheng, and Philip S. Yu

2 Feature Selection
- Challenges of high-dimensional data
  - Curse of dimensionality
  - Noise
- Objectives of feature selection
  - Improving the performance of the predictors
  - Providing more cost-effective predictors
  - Gaining a better understanding of the underlying process that generated the data

3 Supervised / unsupervised learning
- Supervised learning: uses labeled data only
- Unsupervised learning: uses unlabeled data only

4 Challenges of traditional feature selection methods
- Many existing methods are supervised
- Lack of labeled data
  - Class labels are obtained manually
  - Class labels are expensive to obtain
- Data bias
- Challenges:
  - The training set may not reflect the distribution of the real data in some cases.
  - A model constructed on the training set may not be suitable for unseen data.

5 Abundance of unlabeled data
- Easy to obtain
- Does not require manual labeling
- Can reflect the distribution of the real data

6 Then … how can unlabeled data be used effectively?

7 Forward Semi-Supervised Feature Selection
- Basic idea
  - Randomly select unlabeled examples together with their predicted labels
  - Form a new training set from the labeled data and the selected examples
  - Perform feature selection on the new training set
  - Repeat for several iterations
  - Add the most frequently selected feature to the result feature subset
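
The basic idea above can be sketched in Python. This is a minimal, hedged illustration under stated assumptions, not the authors' implementation: the classifier is assumed to be 1-nearest-neighbour, plain greedy sequential forward selection with a leave-one-out accuracy criterion stands in for SFFS, the unlabeled data are labeled using all features before any feature has been selected, and the procedure stops once `n_features` features have been added. All function and parameter names (`forward_semi_supervised_fs`, `n_iter`, `sample_rate`, `seed`) are hypothetical.

```python
import random
from collections import Counter


def nn_predict(train_X, train_y, X, feats):
    """Assumed classifier: 1-nearest-neighbour labels for X, measuring squared
    Euclidean distance only over the feature indices in `feats`."""
    preds = []
    for x in X:
        nearest = min(range(len(train_X)),
                      key=lambda i: sum((train_X[i][f] - x[f]) ** 2 for f in feats))
        preds.append(train_y[nearest])
    return preds


def loo_accuracy(X, y, feats):
    """Assumed criterion: leave-one-out 1-NN accuracy of the subset `feats`."""
    correct = 0
    for i in range(len(X)):
        pred = nn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], [X[i]], feats)[0]
        correct += pred == y[i]
    return correct / len(X)


def forward_select(X, y, k):
    """Greedy sequential forward selection of k features (a stand-in for SFFS)."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: loo_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def forward_semi_supervised_fs(XL, yL, XU, n_features, n_iter=10,
                               sample_rate=0.5, seed=0):
    """Grow the result subset by one feature per round, as on the slide."""
    if not 0 < n_features <= len(XL[0]):
        raise ValueError("n_features must be between 1 and the number of features")
    rng = random.Random(seed)
    result = []
    while len(result) < n_features:
        # Label the unlabeled data with a classifier built on the current subset
        # (all features before anything has been selected; an assumption).
        feats = result or list(range(len(XL[0])))
        yU = nn_predict(XL, yL, XU, feats)
        counts = Counter()
        m = max(1, int(sample_rate * len(XU)))
        for _ in range(n_iter):
            # Random selection of unlabeled examples with their predicted labels,
            # joined with the labeled data to form a new training set.
            idx = rng.sample(range(len(XU)), m)
            X_new = XL + [XU[i] for i in idx]
            y_new = yL + [yU[i] for i in idx]
            counts.update(forward_select(X_new, y_new, len(result) + 1))
        # Add the most frequently selected feature not already in the result.
        for f, _ in counts.most_common():
            if f not in result:
                result.append(f)
                break
    return result
```

Each round asks the inner selector for one more feature than the result currently holds, so at least one candidate outside the result is always counted; the round's winner is the feature chosen most often across the `n_iter` random samples.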

8 Forward Semi-Supervised Feature Selection
[Flowchart of the iterative procedure: train the classifier and predict labels for the unlabeled data; random selection of unlabeled data with predicted labels; new training set; select the best feature subset with SFFS; iterations; select the most frequent feature; form the new feature subset]
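
The flowchart names SFFS (sequential forward floating selection) as the selector run on each new training set. SFFS alternates greedy forward inclusion with conditional exclusion: after each addition it drops the least useful feature whenever the reduced subset beats the best subset previously recorded at that smaller size. The sketch below is a generic illustration of that technique, not the authors' code; the criterion `score(subset)` is a hypothetical placeholder (in this setting it would be some classifier-based accuracy estimate, which the slides do not specify), and the lookup table is toy data.

```python
def sffs(n_features, score, k):
    """Sequential Forward Floating Selection (after Pudil et al., 1994).

    score(subset) -> float, higher is better; returns the best size-k subset found.
    """
    if not 0 < k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    selected = []
    best = {}  # subset size -> (score, subset) of the best subset seen at that size
    while len(selected) < k:
        # Inclusion: add the feature that most improves the criterion.
        remaining = [f for f in range(n_features) if f not in selected]
        f_add = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [f_add]
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Conditional exclusion: drop the least useful feature while doing so
        # beats the best subset previously recorded at the smaller size.
        while len(selected) > 2:
            f_rem = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rem]
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best[k][1]


# Toy criterion: features 1 and 2 help mostly together; feature 0 looks best alone.
table = {
    frozenset({0}): 0.6, frozenset({1}): 0.5, frozenset({2}): 0.5, frozenset({3}): 0.1,
    frozenset({0, 1}): 0.65, frozenset({0, 2}): 0.6, frozenset({0, 3}): 0.6,
    frozenset({1, 2}): 1.0, frozenset({1, 3}): 0.5, frozenset({2, 3}): 0.5,
    frozenset({0, 1, 2}): 1.0, frozenset({0, 1, 3}): 0.7,
    frozenset({0, 2, 3}): 0.6, frozenset({1, 2, 3}): 1.05,
}
print(sffs(4, lambda s: table[frozenset(s)], 3))  # -> [1, 2, 3]
```

On this toy table, plain sequential forward selection would stop at {0, 1, 2} (score 1.0); the floating step removes feature 0 and then reaches {1, 2, 3} (score 1.05).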

9 Forward semi-supervised feature selection

10 Experiment
- Datasets: UCI
- Classifiers: NaiveBayes, NNge, and k-NN
- Comparison: FULL, SFFS (sequential forward floating selection), and SLS
  - SLS: Z. Zhao and H. Liu, "Semi-supervised Feature Selection via Spectral Analysis," SIAM International Conference on Data Mining (SDM-07), April 26-28, 2007, Minneapolis, Minnesota.
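
The comparison pits classifiers trained on the full feature set against classifiers trained on selected subsets. The slides do not describe the evaluation protocol, so the sketch below assumes a simple held-out test set, reads FULL as the baseline that uses every feature, and uses a k-NN classifier (one of the three listed) with k = 3; the helper names and the toy data are hypothetical.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, feats, k=3):
    """Majority vote of the k nearest training points, using squared Euclidean
    distance over the feature indices in `feats` only."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((train_X[i][f] - x[f]) ** 2 for f in feats))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, feats, k=3):
    """Fraction of held-out examples the k-NN classifier labels correctly."""
    hits = sum(knn_predict(train_X, train_y, x, feats, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def compare_full_vs_subset(train_X, train_y, test_X, test_y, subset, k=3):
    """Accuracy with every feature ("FULL") versus with a selected subset."""
    full = list(range(len(train_X[0])))
    return {
        "FULL": accuracy(train_X, train_y, test_X, test_y, full, k),
        "subset": accuracy(train_X, train_y, test_X, test_y, subset, k),
    }


# Feature 0 separates the classes; feature 1 is large-scale noise.
train_X = [[0.0, 0.0], [0.1, 10.0], [0.2, 20.0], [1.0, 5.0], [1.1, 15.0], [1.2, 25.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X, test_y = [[0.05, 24.0], [1.05, 1.0]], [0, 1]
print(compare_full_vs_subset(train_X, train_y, test_X, test_y, subset=[0]))
# -> {'FULL': 0.0, 'subset': 1.0}
```

The toy data only illustrates why a distance-based classifier can benefit from discarding an irrelevant, large-scale feature; it says nothing about the reported UCI results.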

11 Empirical Results

12 Conclusion
- The proposed algorithm works in an iterative procedure.
- Unlabeled examples receive labels from the classifier constructed on the currently selected feature subset.
- A joint dataset is formed from the labeled data and randomly selected unlabeled data with predicted labels.
- Experimental results show that the proposed approach can, in some cases, obtain higher accuracy than other supervised and semi-supervised feature selection algorithms.
