Berendt: Advanced databases, winter term 2007/08, 1 Advanced databases – Inferring implicit/new knowledge from data(bases): Text mining Bettina Berendt Katholieke Universiteit Leuven, Department of Computer Science Last update: 5 December 2007

Berendt: Advanced databases, winter term 2007/08, 2 Agenda
- Motivation
- Brief overview of text mining
- Preprocessing text: word level
- Preprocessing text: document level
- Application: Happiness (& intro to Naïve Bayes class.)
- More on preprocessing text: changing representation

Berendt: Advanced databases, winter term 2007/08, 3 Classification (1)

Berendt: Advanced databases, winter term 2007/08, 4 Classification (2): spam detection (Note: Typically done based not only on text!)

Berendt: Advanced databases, winter term 2007/08, 5 Topic detection (and document grouping)

Berendt: Advanced databases, winter term 2007/08, 6 Opinion mining

Berendt: Advanced databases, winter term 2007/08, 7 What characterizes / differentiates news sources? (1)

Berendt: Advanced databases, winter term 2007/08, 8 What characterizes / differentiates news sources? (2)

Berendt: Advanced databases, winter term 2007/08, 9 Information extraction: filling database slots from text (here: extracting job openings from the Web)

Berendt: Advanced databases, winter term 2007/08, 10 Information extraction: Scientific papers

Berendt: Advanced databases, winter term 2007/08, 11 Agenda
- Motivation
- Brief overview of text mining
- Preprocessing text: word level
- Preprocessing text: document level
- Application: Happiness (& intro to Naïve Bayes class.)
- More on preprocessing text: changing representation

Berendt: Advanced databases, winter term 2007/08, 12 (slides 2-6 from the Text Mining Tutorial by Grobelnik & Mladenic)

Berendt: Advanced databases, winter term 2007/08, 13 KDD phases talked about today

Berendt: Advanced databases, winter term 2007/08, 14 Agenda
- Motivation
- Brief overview of text mining
- Preprocessing text: word level
- Preprocessing text: document level
- Application: Happiness (& intro to Naïve Bayes class.)
- More on preprocessing text: changing representation

Berendt: Advanced databases, winter term 2007/08, 15 (One) goal
- Convert a „text instance“ (usually a document) into a representation amenable to mining / machine learning algorithms
- Most common form: vector-space representation / bag-of-words
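As an illustration of what such a conversion produces, here is a minimal bag-of-words sketch (the example documents are invented): each document becomes a vector of word counts over a vocabulary shared by the whole collection.

```python
from collections import Counter

# Invented example documents; any small text collection would do.
docs = ["the cat sat on the mat", "the dog chased the cat"]

# Shared vocabulary over the whole collection, in a fixed order
vocab = sorted({word for doc in docs for word in doc.split()})

def to_vector(doc):
    """Map one document to its vector of word counts over the shared vocabulary."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

for doc in docs:
    print(to_vector(doc), "<-", doc)
```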

Berendt: Advanced databases, winter term 2007/08, 16 (slides 7-19 from the Text Mining Tutorial by Grobelnik & Mladenic)

Berendt: Advanced databases, winter term 2007/08, 17 Agenda
- Motivation
- Brief overview of text mining
- Preprocessing text: word level
- Preprocessing text: document level
- Application: Happiness (& intro to Naïve Bayes class.)
- More on preprocessing text: changing representation

Berendt: Advanced databases, winter term 2007/08, 18 (slides from the Text Mining Tutorial by Grobelnik & Mladenic)

Berendt: Advanced databases, winter term 2007/08, 19 Agenda
- Motivation
- Brief overview of text mining
- Preprocessing text: word level
- Preprocessing text: document level
- Application: Happiness (& intro to Naïve Bayes class.)
- More on preprocessing text: changing representation

Berendt: Advanced databases, winter term 2007/08, 20 “What makes people happy?” – a corpus-based approach to finding happiness

Berendt: Advanced databases, winter term 2007/08, 21 Bayes' formula and its use for classification
1. Joint probabilities and conditional probabilities: basics
- P(A & B) = P(A|B) * P(B) = P(B|A) * P(A)
- ⇒ P(A|B) = ( P(B|A) * P(A) ) / P(B) (Bayes' formula)
- P(A): prior probability of A (a hypothesis, e.g. that an object belongs to a certain class)
- P(A|B): posterior probability of A (given the evidence B)
2. Estimation:
- Estimate P(A) by the frequency of A in the training set (i.e., the number of A instances divided by the total number of instances)
- Estimate P(B|A) by the frequency of B within the class-A instances (i.e., the number of A instances that have B divided by the total number of class-A instances)
3. Decision rule for classifying an instance:
- If there are two possible hypotheses/classes (A and ~A, where ~A is „not A“), choose the one that is more probable given the evidence
- If P(A|B) > P(~A|B), choose A
- The denominators are equal ⇒ if ( P(B|A) * P(A) ) > ( P(B|~A) * P(~A) ), choose A
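As a concrete illustration of steps 2 and 3, the following sketch (the training data is invented for the purpose) estimates the prior and the likelihood as relative frequencies and then applies the decision rule for a single binary piece of evidence B.

```python
# Sketch of steps 2 (estimation) and 3 (decision rule) for one binary
# feature B and two classes A / ~A. The training data is invented.

training = [                      # (instance shows evidence B?, instance is class A?)
    (True, True), (True, True), (False, True),
    (True, False), (False, False), (False, False), (False, False),
]

n_total = len(training)
n_A = sum(1 for _, is_A in training if is_A)

# Priors: relative class frequencies in the training set
p_A = n_A / n_total
p_not_A = 1 - p_A

# Likelihoods: frequency of B within each class
p_B_given_A = sum(1 for b, is_A in training if is_A and b) / n_A
p_B_given_not_A = sum(1 for b, is_A in training if not is_A and b) / (n_total - n_A)

# Decision rule for a new instance showing evidence B:
# choose A iff P(B|A) * P(A) > P(B|~A) * P(~A)   (the denominator P(B) cancels)
print("choose A" if p_B_given_A * p_A > p_B_given_not_A * p_not_A else "choose ~A")
```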

Berendt: Advanced databases, winter term 2007/08, 22 Simplifications and Naive Bayes
[Repeated from previous slide:] If ( P(B|A) * P(A) ) > ( P(B|~A) * P(~A) ), choose A
4. Simplify by setting the priors equal (i.e., by using as many instances of class A as of class ~A)
- ⇒ If P(B|A) > P(B|~A), choose A
5. More than one kind of evidence
- General formula:
  P(A | B1 & B2) = P(A & B1 & B2) / P(B1 & B2)
                 = P(B1 & B2 | A) * P(A) / P(B1 & B2)
                 = P(B1 | B2 & A) * P(B2 | A) * P(A) / P(B1 & B2)
- Enter the „naive“ assumption: B1 and B2 are independent given A
- ⇒ P(A | B1 & B2) = P(B1|A) * P(B2|A) * P(A) / P(B1 & B2)
- By reasoning as in 3. and 4. above, the last two terms can be omitted
- ⇒ If ( P(B1|A) * P(B2|A) ) > ( P(B1|~A) * P(B2|~A) ), choose A
- The generalization to n kinds of evidence is straightforward.
- These kinds of evidence are often called features in machine learning.
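A minimal numeric sketch of this rule (all probabilities and the observed evidence are invented): under the naive independence assumption, each class's score is simply its prior times the product of its per-feature likelihoods, and the class with the larger score is chosen.

```python
# Sketch of the naive decision rule with several binary features B1..Bn.
# All probabilities below are invented numbers, purely for illustration.

p_A, p_not_A = 0.5, 0.5             # equal priors (simplification 4 above)

p_B_given_A     = [0.8, 0.6, 0.1]   # P(Bi = True | A)  for i = 1..3
p_B_given_not_A = [0.3, 0.4, 0.5]   # P(Bi = True | ~A) for i = 1..3

evidence = [True, True, False]      # observed values of B1..B3 for a new instance

def score(prior, likelihoods):
    """prior * product over features of P(Bi = observed value | class),
    assuming the features are independent given the class (the 'naive' assumption)."""
    s = prior
    for p, observed in zip(likelihoods, evidence):
        s *= p if observed else (1 - p)
    return s

print("choose A" if score(p_A, p_B_given_A) > score(p_not_A, p_B_given_not_A) else "choose ~A")
```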

Berendt: Advanced databases, winter term 2007/08, 23 Naive Bayes applied to texts (modelled as bags of words)
Hypotheses and evidence
- A = The blog is a happy blog, the e-mail is a spam e-mail, etc.
- ~A = The blog is a sad blog, the e-mail is a proper e-mail, etc.
- Bi refers to the i-th word occurring in the whole corpus of texts
Estimation for the bag-of-words representation:
- Example estimation of P(B1|A): number of occurrences of the first word in all happy blogs, divided by the total number of words in happy blogs (etc.)
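A minimal sketch of exactly this estimation (the blog texts and labels are invented): P(Bi|A) is the count of word i in all happy blogs divided by the total number of word tokens in happy blogs, and a new text is classified by multiplying (here: summing the logs of) these per-word probabilities. Add-one smoothing is included as a practical extra that the slide does not mention, so that unseen words do not zero out the product.

```python
import math
from collections import Counter

# Invented toy corpus of "happy" and "sad" blog posts
happy_blogs = ["what a wonderful sunny day", "great food great friends"]
sad_blogs   = ["rainy day again", "lost my keys what a day"]

def word_counts(texts):
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts

happy_counts, sad_counts = word_counts(happy_blogs), word_counts(sad_blogs)
vocab_size = len(set(happy_counts) | set(sad_counts))

def log_likelihood(text, counts):
    """Sum of log P(word | class), where P(word | class) is the word's count in
    that class divided by the total number of word tokens in that class.
    Add-one smoothing is an addition here; the slide's plain frequency estimate
    would be counts[w] / total."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in text.split())

new_text = "a sunny day with friends"
# Equal priors assumed, so only the likelihoods need to be compared
print("happy" if log_likelihood(new_text, happy_counts) > log_likelihood(new_text, sad_counts) else "sad")
```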

Berendt: Advanced databases, winter term 2007/08, 24 WEKA – NaiveBayes and NaiveBayesMultinomial
- The WEKA classifier learning scheme NaiveBayesMultinomial implements this model of „the probability that a word occurs in a document given that the document is in that class“.
  - Its output is a table giving these probabilities
- The WEKA classifier learning scheme NaiveBayes assumes that the attributes are normally distributed.
  - Needed when the attributes are numerical and not necessarily 0 | 1
  - Its output describes the parameters of these normal distributions
  - Explanation of the annotations of the attributes:
- Explanation of the error measures:
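The original slide shows WEKA's own output. As a rough analogue rather than the WEKA calls themselves (a substitution, not part of the slides), the same modelling distinction can be sketched with scikit-learn: MultinomialNB estimates per-class word-occurrence probabilities from counts, while GaussianNB assumes normally distributed numeric attributes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, GaussianNB

texts  = ["what a wonderful sunny day", "great food great friends",
          "rainy day again", "lost my keys what a day"]      # invented examples
labels = ["happy", "happy", "sad", "sad"]

X = CountVectorizer().fit_transform(texts)     # document-term count matrix

# Multinomial model over word counts (the idea behind NaiveBayesMultinomial)
multinomial = MultinomialNB().fit(X, labels)

# Gaussian model: each numeric attribute assumed normally distributed per class
# (the assumption WEKA's NaiveBayes scheme makes for numeric attributes)
gaussian = GaussianNB().fit(X.toarray(), labels)

print(multinomial.predict(X[:1]), gaussian.predict(X[:1].toarray()))
```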

Berendt: Advanced databases, winter term 2007/08, 25 So what is happiness? (Rada Mihalcea's presentation at AAAI Spring Symposium 2006)

Berendt: Advanced databases, winter term 2007/08, 26 Agenda
- Motivation
- Brief overview of text mining
- Preprocessing text: word level
- Preprocessing text: document level
- Application: Happiness (& intro to Naïve Bayes class.)
- More on preprocessing text: changing representation

Berendt: Advanced databases, winter term 2007/08, 27 (slides from the Text Mining Tutorial by Grobelnik & Mladenic)

Berendt: Advanced databases, winter term 2007/08, 28 Next lecture
- Motivation
- Brief overview of text mining
- Preprocessing text: word level
- Preprocessing text: document level
- Application: Happiness (& intro to Naïve Bayes class.)
- More on preprocessing text: changing representation
- Coming full circle: Induction in Sem.Web & other DBs

Berendt: Advanced databases, winter term 2007/08, 29 References and background reading; acknowledgements
- Grobelnik, M. & Mladenic, D. (2004). Text-Mining Tutorial. In Learning Methods for Text Understanding and Mining, January 2004, Grenoble, France.
- Mihalcea, R. & Liu, H. (2006). A corpus-based approach to finding happiness. In Proceedings of the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs.
Picture credits:
- p. 4:
- p. 6:
- p. 8: Fortuna, B., Galleguillos, C., & Cristianini, N. (in press). Detecting the bias in media with statistical learning methods.
- p. 9: from Grobelnik & Mladenic (2004), see above