Modeling Ultra-high Dimensional Feature Selection as a Slow Intelligence System Wang Yingze CS 2650 Project

Outline
- Introduction
- Iterative feature selection
- Framework of Slow Intelligence System
- Tasks for project
- Midway results

Introduction
- Ultra-high-dimensional variable selection is a hot topic in statistics and machine learning.
- Goal: model the relationship between one response and its associated features, based on a sample of size n, where the number of features p can far exceed n.
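
The model itself is left implicit on this slide; for concreteness, a minimal sketch of the standard sparse linear-model formulation behind this setting (the notation is mine, not from the slides):

```latex
% Sparse linear model in the ultra-high-dimensional regime (p >> n):
% only a small index set S of the p features is assumed to be relevant.
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad i = 1, \dots, n,
\qquad p \gg n,
\qquad |S| = \#\{\, j : \beta_j \neq 0 \,\} \ll n.
```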

Application
- Association studies between phenotypes and SNPs.
- Gene selection and disease classification in bioinformatics.
[Figure: gene-expression matrix of n patients (each with a degree of disease sickness) by p genes (expression levels); a few important genes are selected.]

Challenges
- Dimensionality grows rapidly when interactions among features are considered:
  - Portfolio selection and network modeling: 2,000 stocks already involve over 2 million unknown parameters in the covariance matrix.
  - Protein-protein interaction: the sample size may be on the order of thousands, but the number of features can be on the order of millions.
- Goal: construct effective methods to learn the relationship between features and the response in high dimensions for scientific purposes.

Outline (current section: Iterative feature selection)

Existing methods
1. LASSO: L1-regularized linear regression (see the sketch below).
2. Forward regression: sequentially add variables.
3. Backward regression: start with all variables, then delete them on the basis of the smallest change in R².
4. Stepwise regression: at each step a variable can be entered (on the basis of the greatest improvement in R²), but a variable may also be removed if the change (reduction) in R² is not significant.
5. Least-angle regression: estimated parameters are increased in a direction equiangular to each variable's correlation with the residual.
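
As a concrete illustration of method 1, a minimal scikit-learn sketch; the data, dimensions, and regularization strength are illustrative assumptions, not values from the slides:

```python
# Minimal LASSO (L1-regularized linear regression) sketch with p >> n.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 2000                         # far more features than samples
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]   # only 5 truly relevant features
y = X @ beta + 0.5 * rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # indices with nonzero coefficients
print("selected features:", selected)
```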

State-of-the-Art Approach
- Iterative feature selection method proposed by Jianqing Fan at Princeton University: "Ultrahigh dimensional feature selection: beyond the linear model".
- Properties:
  - Handles ultrahigh-dimensional data
  - Accurate
  - But slow (hence the fit with a Slow Intelligence System)

Step 1: Large-scale screening
- Compute the Pearson correlation between each feature and the response, rank the features by its magnitude, and pick the set $A_1$ of top-ranked features (sketched below).
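
A minimal sketch of this screening step; the helper name and the cutoff k1 are my assumptions (k1 is a tuning parameter):

```python
# Step 1 sketch: rank features by |Pearson correlation with y|, keep top k1.
import numpy as np

def sure_independence_screening(X, y, k1):
    """Return the indices A1 of the k1 features most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Componentwise Pearson correlations in one vectorized pass
    # (assumes no feature has zero variance).
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k1]
```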

Step 2: Moderate-scale selection
- Employ an existing regression method (e.g., LASSO) to select a subset $M_1 \subset A_1$ of these indices.

Step 3: Large-scale screening
- Add the other features one at a time to the selected model: for each feature $j \notin M_1$, fit the regression
  $y = \beta_0 + \sum_{i \in M_1} \beta_i x_i + \beta_j x_j + \varepsilon$

Step 3 (cont'd)
- Rank the candidate features $j \notin M_1$ by the magnitude $|\hat{\beta}_j|$ of their fitted coefficients, select the top-ranked ones, and add them to $M_1$ to form the new feature set $A_2$.
- Repeat Steps 2-3: select a new $M_2$ from $A_2$, then form a new $A_3$, and so on, until the selected set stops changing or reaches the target size $d$ (see the loop sketched below).
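
Putting Steps 1-3 together, a schematic of the whole loop; this is a sketch of my reading of the method, with LASSO standing in for the moderate-scale selector, and it reuses the screening helper sketched under Step 1:

```python
# Iterative selection sketch: alternate moderate-scale selection (Step 2)
# with conditional screening of the leftover features (Step 3) until the
# selected set stabilizes or reaches the target size d.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def iterative_selection(X, y, k1, k2, d, alpha=0.1):
    A = set(sure_independence_screening(X, y, k1))          # Step 1
    M = set()
    while True:
        idx = sorted(A)
        coef = Lasso(alpha=alpha).fit(X[:, idx], y).coef_   # Step 2
        M_new = {j for j, c in zip(idx, coef) if c != 0}
        if M_new == M or len(M_new) >= d:
            return sorted(M_new)
        M = M_new
        # Step 3: score each leftover feature by the magnitude of its
        # coefficient when added, alone, to the current model.
        base = sorted(M)
        scores = {}
        for j in set(range(X.shape[1])) - M:
            fit = LinearRegression().fit(X[:, base + [j]], y)
            scores[j] = abs(fit.coef_[-1])
        top = sorted(scores, key=scores.get, reverse=True)[:k2]
        A = M | set(top)
```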

Outline (current section: Framework of Slow Intelligence System)

Slow Intelligence System  “A General Framework for Slow Intelligence Systems”, by S.K. Chang, International Journal of Software Engineering and Knowledge Engineering

Time Controller
- Slow decision cycle(s) complement quick decision cycle(s): an SIS possesses at least two decision cycles. Therefore, Slow Intelligence Systems usually work correctly, but not always fast.
- Time Controller design:
  - Panic button
  - Petri-net model
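
As a toy illustration of the two-cycle idea only (the project itself designs this with a Petri-net model, not with code like the following):

```python
# Toy two-cycle time controller: quick decisions by default, one slow
# (more careful) decision at most once per slow_period seconds.
import time

class TimeController:
    def __init__(self, quick_action, slow_action, slow_period=10.0):
        self.quick_action = quick_action     # fast, approximate decision
        self.slow_action = slow_action       # slow, usually-correct decision
        self.slow_period = slow_period       # seconds between slow cycles
        self._last_slow = float("-inf")

    def step(self, state):
        now = time.monotonic()
        if now - self._last_slow >= self.slow_period:
            self._last_slow = now
            return self.slow_action(state)   # slow cycle
        return self.quick_action(state)      # quick cycle
```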

Motivation
- "Modeling Human Intelligence as A Slow Intelligence System" by Tiansi Dong, DMS2010
- SIS for object mapping between scenes
- Two object-tracing results due to two different priorities:
  1. Priority on spatial changes (minimal spatial changes)
  2. Priority on object categories (objects are mapped within the same categories)

SIS1 for object tracing (priority on spatial changes)
- Enumerate all possible mappings
- Eliminate, then concentrate on the mapping with the minimal spatial changes

SIS2 for object tracing (priority on object category)
- Enumerate all possible mappings
- Eliminate, then concentrate on the mappings that keep objects within the same category

Outline (current section: Tasks for project)

Task one
- Model ultra-high dimensional feature selection as a Slow Intelligence System.
- Use the SIS framework to cast the iterative feature selection method into five phases: Enumeration, Elimination, Adaptation, Propagation, Concentration (a skeleton of this mapping is sketched below).
- The whole SIS system contains an additional sub-SIS system.
- Represent it in a mathematical formulation.
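
A skeleton of the intended mapping; the exact correspondence between the selection steps and the five SIS phases is my assumption, reusing helpers from the earlier sketches:

```python
# Skeleton mapping the iterative feature selection steps onto the five
# SIS phases; the correspondence is an assumption of this sketch.
from sklearn.linear_model import Lasso

def enumeration(X, y, k1):
    """Enumerate candidates: large-scale correlation screening (Step 1)."""
    return set(sure_independence_screening(X, y, k1))

def elimination(X, y, A, alpha=0.1):
    """Eliminate weak candidates: moderate-scale selection (Step 2)."""
    idx = sorted(A)
    coef = Lasso(alpha=alpha).fit(X[:, idx], y).coef_
    return {j for j, c in zip(idx, coef) if c != 0}

def adaptation(X, y, M, k2):
    """Adapt the candidate set: conditional screening of leftovers (Step 3)."""
    ...

def propagation(M, knowledge_base):
    """Propagate selections and parameters between cycles and the sub-SIS."""
    ...

def concentration(M, d):
    """Concentrate: finalize the selected set once it reaches size d."""
    return sorted(M)[:d]
```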

Task two
- Design the time controller in terms of a Petri net and introduce a knowledge base.
- The time controller controls when each phase of the SIS is invoked.
- The knowledge base contains five different moderate-scale selection algorithms; the KB can be changed and updated in the slow cycle (a minimal sketch follows).
- Represent the time controller as a Petri net using the ReNew editor.
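
The slide names five interchangeable moderate-scale selection algorithms; a minimal dictionary-style sketch of such a knowledge base (the concrete entries are illustrative stand-ins):

```python
# Minimal knowledge-base sketch: interchangeable moderate-scale selectors,
# replaceable or updatable in the slow cycle. Entries are illustrative.
from sklearn.linear_model import Lasso, LassoLars

knowledge_base = {
    "lasso": lambda X, y: Lasso(alpha=0.1).fit(X, y),
    "lars":  lambda X, y: LassoLars(alpha=0.1).fit(X, y),
    # forward, backward, and stepwise selectors would be registered alike
}

def slow_cycle_update(name, selector):
    """Slow cycle: replace or add a selection algorithm in the KB."""
    knowledge_base[name] = selector
```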

Future work
- Experiment: use real data (colon cancer data) and compare the results with existing feature selection methods such as LASSO and forward/backward regression.
- Weka demo.
- Use a visualization tool to visualize the result and the process of feature selection.

Outline (current section: Midway results)

Diagram: Main SIS System

Diagram: Sub-SIS System

Diagram: Petri-Net Model