A Probabilistic Model for Classification of Multiple-Record Web Documents. June Tang and Yiu-Kai Ng.



Overview
Probabilistic Model
– Bayes decision theory
– Document and query representations
– Ranking-function construction
Multivariate Statistical Analysis

Approach
Constructing a ranking function for a probabilistic model based on multivariate statistical analysis:
– Minimizing the expected cost of misclassification
– Deriving a classification rule
– Deriving a linear classification rule
– Deriving a sample linear classification rule

Application Ontology

Document Representation
Attributes: (Year, Make, Model, Mileage, Price, Feature, PhoneNr)
Total records: 60
Attribute occurrence counts: Year: 62, Make: 58, Model: 48, Mileage: 12, Price: 58, Feature: 49, PhoneNr: 33
Count vector: (62, 58, 48, 12, 58, 49, 33)
Normalized by total records: (1.03, 0.97, 0.80, 0.20, 0.97, 0.82, 0.55)
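The normalization step above can be sketched as a short computation (a minimal sketch; the attribute counts are the ones shown on the slide):

```python
# Normalize per-attribute record counts by the total number of records
# to obtain the document vector shown on the slide.
counts = {"Year": 62, "Make": 58, "Model": 48, "Mileage": 12,
          "Price": 58, "Feature": 49, "PhoneNr": 33}
total_records = 60
doc_vector = [round(c / total_records, 2) for c in counts.values()]
print(doc_vector)  # [1.03, 0.97, 0.8, 0.2, 0.97, 0.82, 0.55]
```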

Elementary Concepts
Variables are things that we measure, control, or manipulate in research.
Multivariate analysis considers multiple variables together as a single unit.
The normal distribution represents one of the empirically verified elementary "truths about the general nature of reality."

Multivariate Statistical Analysis
Let A be an application ontology,
D be a set of Web documents,
R be the set of relevant documents,
R̄ (the complement of R in D) be the set of irrelevant documents,
X = (X1, X2, …, Xp) represent a document, and
Ω be the set of all possible values that X can take, with Ω = Ω1 ∪ Ω2.

Expected Cost of Misclassification (ECM)
ECM = c(2|1) P(2|1) p1 + c(1|2) P(1|2) p2
Here, p1 and p2 are the prior probabilities of the relevant and irrelevant classes, c(i|j) is the cost of classifying a class-j document into class i, P(i|j) is the probability of doing so, and f1 and f2 are the density functions of the two classes.

Classification Rule
Classify document x as relevant (x ∈ Ω1) if
f1(x) / f2(x) ≥ (c(1|2) / c(2|1)) · (p2 / p1),
and as irrelevant otherwise; this rule minimizes the ECM.
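A sketch of the standard minimization (following the usual multivariate-analysis derivation; symbols as on the preceding slides):

```latex
\mathrm{ECM} = c(2|1)\,p_1\int_{\Omega_2} f_1(\mathbf{x})\,d\mathbf{x}
             + c(1|2)\,p_2\int_{\Omega_1} f_2(\mathbf{x})\,d\mathbf{x}
             = c(2|1)\,p_1
             + \int_{\Omega_1}\bigl[c(1|2)\,p_2\,f_2(\mathbf{x}) - c(2|1)\,p_1\,f_1(\mathbf{x})\bigr]\,d\mathbf{x}
```

using the fact that the integral of f1 over Ω2 equals one minus its integral over Ω1. The ECM is minimized by placing x in Ω1 exactly when the bracketed integrand is non-positive, which yields the rule f1(x)/f2(x) ≥ (c(1|2)/c(2|1)) (p2/p1).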

Multivariate Normal Density Functions
Assume that the density functions are normal:
fi(x) = (2π)^(−p/2) |Σi|^(−1/2) exp(−(1/2)(x − μi)′ Σi^(−1) (x − μi)), i = 1, 2,
where μi is the mean vector and Σi is the covariance matrix of class i.

Linear Classification Rule
Assume that the density functions are normal and that the covariance matrices are equal (Σ1 = Σ2 = Σ). Document x is classified as relevant if
(μ1 − μ2)′ Σ^(−1) x − (1/2)(μ1 − μ2)′ Σ^(−1)(μ1 + μ2) ≥ ln[(c(1|2)/c(2|1)) (p2/p1)].
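With equal covariance matrices the quadratic terms of the log likelihood ratio cancel, which is what makes the rule linear (a standard step, using the normal densities from the previous slide):

```latex
\ln\frac{f_1(\mathbf{x})}{f_2(\mathbf{x})}
 = -\tfrac12(\mathbf{x}-\boldsymbol\mu_1)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu_1)
   +\tfrac12(\mathbf{x}-\boldsymbol\mu_2)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu_2)
 = (\boldsymbol\mu_1-\boldsymbol\mu_2)'\Sigma^{-1}\mathbf{x}
   -\tfrac12(\boldsymbol\mu_1-\boldsymbol\mu_2)'\Sigma^{-1}(\boldsymbol\mu_1+\boldsymbol\mu_2)
```

since the x′Σ^(−1)x terms appear with opposite signs and drop out.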

Linear Discriminant Function
y = a′x with a = Σ^(−1)(μ1 − μ2)
Threshold: m = (1/2)(μ1 − μ2)′ Σ^(−1)(μ1 + μ2), adjusted by ln[(c(1|2)/c(2|1)) (p2/p1)] when costs and priors differ.

Parameter Estimation
Suppose we have n1 relevant documents and n2 irrelevant documents such that n1 + n2 ≥ p, where p is the dimension of vector x.

Parameter Estimation (Cont.)
Estimate μ1 and μ2 with the sample mean vectors x̄1 and x̄2, and Σ with the pooled sample covariance matrix
S_pooled = [(n1 − 1) S1 + (n2 − 1) S2] / (n1 + n2 − 2),
where S1 and S2 are the sample covariance matrices of the two groups.

Sample Classification Rule
Document x is classified as relevant if
(x̄1 − x̄2)′ S_pooled^(−1) x − (1/2)(x̄1 − x̄2)′ S_pooled^(−1)(x̄1 + x̄2) ≥ ln[(c(1|2)/c(2|1)) (p2/p1)].
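The sample rule can be sketched in a few lines (a minimal sketch assuming equal costs and priors, so the log term on the right-hand side is zero; the toy data are hypothetical, not from the paper's experiments):

```python
# Sample linear classification rule: standard LDA with a pooled
# covariance estimate, as in multivariate statistical analysis.
import numpy as np

def fit(relevant, irrelevant):
    """Estimate the discriminant vector a_hat and midpoint threshold m_hat."""
    x1, x2 = np.asarray(relevant, float), np.asarray(irrelevant, float)
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.cov(x1, rowvar=False)
    s2 = np.cov(x2, rowvar=False)
    s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    a_hat = np.linalg.solve(s_pooled, m1 - m2)   # a = S^-1 (x̄1 - x̄2)
    m_hat = 0.5 * a_hat @ (m1 + m2)              # midpoint threshold
    return a_hat, m_hat

def is_relevant(x, a_hat, m_hat):
    # Classify x as relevant when a'x >= m (equal costs and priors assumed).
    return float(a_hat @ np.asarray(x, float)) >= m_hat

# Toy 2-D example: relevant document vectors cluster high, irrelevant low.
rel = [[1.0, 0.9], [0.9, 0.8], [1.1, 1.0]]
irr = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
a, m = fit(rel, irr)
print(is_relevant([0.95, 0.9], a, m))   # True
print(is_relevant([0.15, 0.2], a, m))   # False
```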

Misclassification Probabilities
Lachenbruch's "holdout" procedure: hold out one document, build the classification rule from the remaining n1 + n2 − 1 documents, classify the held-out document, and repeat for every document. The misclassification probabilities are then estimated as
P̂(2|1) = n1M / n1 and P̂(1|2) = n2M / n2,
where n1M and n2M are the numbers of held-out relevant and irrelevant documents that were misclassified.
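The holdout estimate of P(2|1) can be sketched as follows (a toy 1-D midpoint rule stands in for the full sample linear rule; the scores are hypothetical):

```python
import numpy as np

# Lachenbruch's holdout: each relevant document is held out in turn,
# the rule is refit on the rest, and the held-out document is classified;
# P(2|1) is estimated as the fraction misclassified.
def lachenbruch_p21(relevant, irrelevant):
    rel = np.asarray(relevant, float)
    irr = np.asarray(irrelevant, float)
    misclassified = 0
    for i in range(len(rel)):
        rest = np.delete(rel, i)                    # leave one relevant doc out
        threshold = (rest.mean() + irr.mean()) / 2  # toy midpoint rule
        if rel[i] < threshold:                      # classified irrelevant
            misclassified += 1
    return misclassified / len(rel)

# Toy 1-D scores: one relevant document (0.4) falls below the midpoint.
p21 = lachenbruch_p21([0.9, 0.8, 0.4, 0.85], [0.1, 0.2, 0.15])
print(p21)  # 0.25
```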

Precision Measure

Experimental Result (Relevant)

Experimental Result (Irrelevant)

Conclusion
Precision: 85% (VSM: 77.5%)
Multivariate statistical analysis provides the ranking function
Extensible to multi-category classification