Slide 1: Advanced databases – Inferring implicit/new knowledge from data(bases): Text mining (used, e.g., for Web content mining)
Bettina Berendt, Katholieke Universiteit Leuven, Department of Computer Science
Last update: 17 November 2010

Slide 2: Agenda
- Classification: ex. decision-tree learning with ID3
- Text classification and Naïve Bayes
- More on current approaches to (Web) text mining
- Opinion mining
- Text mining and WEKA

Slide 3: Input data ... Q: when does this person play tennis?

Outlook  | Temp | Humidity | Windy | Play
---------|------|----------|-------|-----
Sunny    | Hot  | High     | False | No
Sunny    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Rainy    | Mild | High     | False | Yes
Rainy    | Cool | Normal   | False | Yes
Rainy    | Cool | Normal   | True  | No
Overcast | Cool | Normal   | True  | Yes
Sunny    | Mild | High     | False | No
Sunny    | Cool | Normal   | False | Yes
Rainy    | Mild | Normal   | False | Yes
Sunny    | Mild | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes
Rainy    | Mild | High     | True  | No

Slide 4: The goal: a decision tree for classification / prediction
In which weather will someone play (tennis etc.)?

Slide 5: Constructing decision trees
Strategy: top down, in a recursive divide-and-conquer fashion
- First: select an attribute for the root node and create a branch for each possible attribute value
- Then: split the instances into subsets, one for each branch extending from the node
- Finally: repeat recursively for each branch, using only the instances that reach that branch
- Stop if all instances have the same class
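The recursion above can be written down compactly. The following is a minimal Python sketch (not from the slides): instances are assumed to be dicts mapping attribute names to values, `target` names the class attribute, and the splitting criterion is passed in as `choose_attribute` (e.g. the information-gain selection introduced on slides 8–11). All names are illustrative.

```python
from collections import Counter

def majority_class(rows, target):
    """Most frequent class among the given instances."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def build_tree(rows, attributes, target, choose_attribute):
    """Top-down, divide-and-conquer tree construction: pick an attribute,
    branch on its values, and recurse on the instances reaching each branch."""
    classes = {r[target] for r in rows}
    if len(classes) == 1:              # all instances have the same class: stop
        return classes.pop()
    if not attributes:                 # nothing left to split on
        return majority_class(rows, target)
    best = choose_attribute(rows, attributes, target)   # e.g. highest information gain
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, target, choose_attribute)
    return tree
```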

Slide 6: Which attribute to select?

Slide 7: Which attribute to select?

Slide 8: Criterion for attribute selection
Which is the best attribute?
- We want to get the smallest tree.
- Heuristic: choose the attribute that produces the "purest" nodes.
- Popular impurity criterion: information gain. Information gain increases with the average purity of the subsets.
- Strategy: choose the attribute that gives the greatest information gain.

Slide 9: Computing information
- Measure information in bits.
- Given a probability distribution, the information required to predict an event is the distribution's entropy.
- Entropy gives the information required in bits (it can involve fractions of bits!).
- Formula for computing the entropy:
  entropy(p1, p2, ..., pn) = -p1 log2(p1) - p2 log2(p2) - ... - pn log2(pn)
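As a small sketch (not part of the original slides), the entropy formula can be computed directly from class counts; the helper name is illustrative.

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as absolute counts,
    e.g. entropy([9, 5]) for 9 'yes' and 5 'no' instances (about 0.940 bits)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
```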

Slide 10: Example: attribute Outlook

Slide 11: Computing information gain
Information gain = information before splitting – information after splitting
Information gain for the attributes of the weather data:
- gain(Outlook) = 0.247 bits
- gain(Temperature) = 0.029 bits
- gain(Humidity) = 0.152 bits
- gain(Windy) = 0.048 bits
Example: gain(Outlook) = info([9,5]) – info([2,3],[4,0],[3,2]) = 0.940 – 0.693 = 0.247 bits
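To make these numbers reproducible, here is a self-contained Python sketch (an illustration, not from the slides) that computes the four gains directly from the table on slide 3; rounding gives 0.247, 0.029, 0.152, and 0.048 bits.

```python
import math
from collections import Counter

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Weather data from slide 3: (Outlook, Temp, Humidity, Windy, Play)
data = [
    ("Sunny", "Hot", "High", False, "No"),     ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),  ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]
attrs = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Windy": 3}

def info_gain(rows, attr_index, class_index=4):
    """Information before splitting minus weighted information after splitting."""
    before = entropy(list(Counter(r[class_index] for r in rows).values()))
    after = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r for r in rows if r[attr_index] == value]
        after += len(subset) / len(rows) * entropy(
            list(Counter(r[class_index] for r in subset).values()))
    return before - after

for name, idx in attrs.items():
    print(name, round(info_gain(data, idx), 3))   # Outlook 0.247, Temp 0.029, ...
```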

Slide 12: Continuing to split
Gains within the Outlook = Sunny subset:
- gain(Temperature) = 0.571 bits
- gain(Humidity) = 0.971 bits
- gain(Windy) = 0.020 bits

Slide 13: Final decision tree
Note: not all leaves need to be pure; sometimes identical instances have different classes.
Splitting stops when the data can't be split any further.

Slide 14: Wishlist for a purity measure
Properties we require from a purity measure:
- When a node is pure, the measure should be zero.
- When impurity is maximal (i.e. all classes equally likely), the measure should be maximal.
- The measure should obey the multistage property (i.e. decisions can be made in several stages), e.g.:
  measure([2,3,4]) = measure([2,7]) + (7/9) × measure([3,4])
Entropy is the only function that satisfies all three properties!

Slide 15: Properties of the entropy
The multistage property:
  entropy(p, q, r) = entropy(p, q+r) + (q+r) × entropy(q/(q+r), r/(q+r))
Simplification of computation, e.g.:
  info([2,3,4]) = -(2/9) log2(2/9) - (3/9) log2(3/9) - (4/9) log2(4/9)
                = [-2 log2(2) - 3 log2(3) - 4 log2(4) + 9 log2(9)] / 9
Note: instead of maximizing information gain we could just minimize information.
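As a quick numeric sanity check (an addition, not on the slide), the multistage property can be verified for the distribution (2/9, 3/9, 4/9); unlike the earlier helper, this one takes probabilities rather than counts.

```python
import math

def entropy_probs(probs):
    """Entropy of a distribution given as probabilities (not counts)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p, q, r = 2/9, 3/9, 4/9
lhs = entropy_probs([p, q, r])
rhs = entropy_probs([p, q + r]) + (q + r) * entropy_probs([q / (q + r), r / (q + r)])
print(lhs, rhs, math.isclose(lhs, rhs))   # both about 1.53 bits -> True
```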

Slide 16: Discussion / outlook: decision trees
- Top-down induction of decision trees: ID3, an algorithm developed by Ross Quinlan
- Various improvements, e.g.:
  - C4.5: deals with numeric attributes, missing values, noisy data
  - Gain ratio instead of information gain [see Witten & Frank slides, ch. 4, pp.]
- Similar approach: CART
- ...

Slide 17: Agenda
- Classification: ex. decision-tree learning with ID3
- Text classification and Naïve Bayes
- More on current approaches to (Web) text mining
- Opinion mining
- Text mining and WEKA

Slide 18: (Recap: How do the basic ideas of relational-database-table mining transfer to text mining?)

Slide 19: What makes people happy?

Slide 20: Happiness in the blogosphere

Slide 21: Two example blog posts
"Well kids, I had an awesome birthday thanks to you. =D Just wanted to so thank you for coming and thanks for the gifts and junk. =) I have many pictures and I will post them later. hearts" – current mood:
"Home alone for too many hours, all week long... screaming child, headache, tears that just won't let themselves loose.... and now I've lost my wedding band. I hate this." – current mood:
What are the characteristic words of these two moods?
[Mihalcea, R. & Liu, H. (2006). A corpus-based approach to finding happiness. In Proceedings of the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs.]
Slide based on Rada Mihalcea's slides in the presentation.

Slide 22: Data, data preparation and learning
- LiveJournal.com – optional mood annotation
- 10,000 blogs: happy / sad
  - 5,000 happy entries / 5,000 sad entries
  - average size: 175 words / entry
  - post-processing: remove SGML tags, tokenization, part-of-speech tagging
- Quality of automatic "mood separation":
  - naïve Bayes text classifier
  - five-fold cross-validation
  - Accuracy: 79.13% (>> 50% baseline)
Based on Rada Mihalcea's talk at CAAW 2006
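The original study used its own pipeline; purely as an illustration of the same kind of experimental setup (bag-of-words features, a naive Bayes text classifier, five-fold cross-validation), here is a scikit-learn sketch with made-up toy data. scikit-learn is an assumption here, not a tool named in these slides (the course itself works with WEKA).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Toy stand-in for the LiveJournal corpus: texts with a happy/sad label.
texts = ["awesome birthday thanks for the gifts",
         "lost my wedding band, tears and headache",
         "lovely concert with friends",
         "home alone, upset and crying"] * 50
labels = ["happy", "sad", "happy", "sad"] * 50

# Bag-of-words features + multinomial naive Bayes, evaluated with 5-fold CV.
model = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(model, texts, labels, cv=5)
print(scores.mean())
```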

Slide 23: Results: Corpus-derived happiness factors

Happiest words (factor) | Saddest words (factor)
------------------------|-----------------------
yay                     | goodbye (18.81)
shopping (79.56)        | hurt (17.39)
awesome (79.71)         | tears (14.35)
birthday (78.37)        | cried (11.39)
lovely (77.39)          | upset (11.12)
concert (74.85)         | sad (11.11)
cool (73.72)            | cry (10.56)
cute (73.20)            | died (10.07)
lunch (73.02)           | lonely (9.50)
books (73.02)           | crying (5.50)

Based on Rada Mihalcea's talk at CAAW 2006

Slide 24: Bayes' formula and its use for classification

1. Joint probabilities and conditional probabilities: basics
- P(A & B) = P(A|B) * P(B) = P(B|A) * P(A)
- => P(A|B) = ( P(B|A) * P(A) ) / P(B)   (Bayes' formula)
- P(A): prior probability of A (a hypothesis, e.g. that an object belongs to a certain class)
- P(A|B): posterior probability of A (given the evidence B)

2. Estimation
- Estimate P(A) by the frequency of A in the training set (i.e., the number of A instances divided by the total number of instances)
- Estimate P(B|A) by the frequency of B within the class-A instances (i.e., the number of A instances that have B divided by the total number of class-A instances)

3. Decision rule for classifying an instance
- If there are two possible hypotheses/classes A and ~A (where ~A is "not A"), choose the one that is more probable given the evidence: if P(A|B) > P(~A|B), choose A
- The denominators are equal => if ( P(B|A) * P(A) ) > ( P(B|~A) * P(~A) ), choose A

Slide 25: Simplifications and Naive Bayes

4. Simplify by setting the priors equal (i.e., by using as many instances of class A as of class ~A)
- => if P(B|A) > P(B|~A), choose A

5. More than one kind of evidence
- General formula:
  P(A | B1 & B2) = P(A & B1 & B2) / P(B1 & B2)
                 = P(B1 & B2 | A) * P(A) / P(B1 & B2)
                 = P(B1 | B2 & A) * P(B2 | A) * P(A) / P(B1 & B2)
- Enter the "naive" assumption: B1 and B2 are independent given A
  => P(A | B1 & B2) = P(B1|A) * P(B2|A) * P(A) / P(B1 & B2)
- By reasoning as in 3. and 4. above, the last two terms can be omitted:
  => if ( P(B1|A) * P(B2|A) ) > ( P(B1|~A) * P(B2|~A) ), choose A
- The generalization to n kinds of evidence is straightforward.
- In machine learning, features are the evidence.

Slide 26: Example: Texts as bags of words

Common representations of texts:
- Set: can contain each element (word) at most once
- Bag (aka multiset): can contain each word multiple times (the most common representation used in text mining)

Hypotheses and evidence:
- A = the blog is a happy blog, the e-mail is a spam e-mail, etc.
- ~A = the blog is a sad blog, the e-mail is a proper e-mail, etc.
- Bi refers to the i-th word occurring in the whole corpus of texts

Estimation for the bag-of-words representation:
- Example estimation of P(B1|A): the number of occurrences of the first word in all happy blogs, divided by the total number of words in happy blogs (etc.)
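Putting slides 24–26 together, here is a minimal bag-of-words Naive Bayes sketch with equal priors; the toy data and function names are illustrative, and Laplace smoothing is added (not discussed on the slides) so that unseen words do not zero out the product.

```python
import math
from collections import Counter

def train_counts(docs):
    """Bag-of-words counts for one class: how often each word occurs in total."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts

def log_likelihood(doc, counts):
    """log P(B1|A) + log P(B2|A) + ... under the naive (independence) assumption."""
    total = sum(counts.values())
    vocab = len(counts)
    score = 0.0
    for word in doc.lower().split():
        # Laplace smoothing so unseen words do not zero out the product
        score += math.log((counts[word] + 1) / (total + vocab))
    return score

happy = train_counts(["awesome birthday thanks", "lovely concert cool"])
sad = train_counts(["tears headache upset", "lonely crying hurt"])

new_post = "what an awesome concert"
label = "happy" if log_likelihood(new_post, happy) > log_likelihood(new_post, sad) else "sad"
print(label)   # -> happy
```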

Slide 27: The "happiness factor"
"Starting with the features identified as important by the Naïve Bayes classifier (a threshold of 0.3 was used in the feature selection process), we selected all those features that had a total corpus frequency higher than 150, and consequently calculate the happiness factor of a word as the ratio between the number of occurrences in the happy blogposts and the total frequency in the corpus."
=> What is the relation to the Naïve Bayes estimators?
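The quoted definition, written out as a small (hypothetical) helper; the counts in the usage example are invented for illustration only.

```python
def happiness_factor(occurrences_in_happy_posts, total_corpus_frequency):
    """Ratio from the quoted definition: occurrences in happy blog posts
    divided by the word's total frequency in the corpus."""
    return occurrences_in_happy_posts / total_corpus_frequency

# Invented counts, for illustration only (not from the paper):
print(happiness_factor(159, 200))   # 0.795; slide 23's factors look like this ratio on a 0-100 scale
```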

Slide 28: Agenda
- Classification: ex. decision-tree learning with ID3
- Text classification and Naïve Bayes
- More on current approaches to (Web) text mining
- Opinion mining
- Text mining and WEKA

Slide 29: webversion.ppt

Slide 30: Agenda
- Classification: ex. decision-tree learning with ID3
- Text classification and Naïve Bayes
- More on current approaches to (Web) text mining
- Opinion mining
- Text mining and WEKA

Slide 31: 1stsemester/adb/Lecture/Session9/L3.pdf

Slide 32: Agenda
- Classification: ex. decision-tree learning with ID3
- Text classification and Naïve Bayes
- More on current approaches to (Web) text mining
- Opinion mining
- Text mining and WEKA

Slide 33: The steps of text mining
1. Application understanding
2. Corpus generation
3. Data understanding
4. Text preprocessing
5. Search for patterns / modelling
   - Topical analysis
   - Sentiment analysis / opinion mining
6. Evaluation
7. Deployment

Slide 34: From HTML to String to ARFF
Problem: Given a text file, how do we get to an ARFF file?
1. Remove / use formatting
   - HTML: use html2text (google for it to find an implementation in your favourite language) or a similar filter
   - XML: use, e.g., SAX, the API for XML in Java
2. Convert the text into a basic ARFF (one attribute: String)
3. Convert the String into a bag of words (this filter is also available in WEKA's own preprocessing filters; look for filters – unsupervised – attribute – StringToWordVector) (see the sketch below)
   - Documentation:
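The slide leaves steps 2 and 3 to WEKA and external filters; purely as a rough illustration of step 2, the following plain-Python sketch writes a minimal ARFF with a single String attribute plus a class label. The mini-corpus and the output file name are hypothetical.

```python
from pathlib import Path

def texts_to_arff(labelled_texts, relation="blog_posts"):
    """labelled_texts: list of (text, label) pairs -> ARFF with one String attribute."""
    labels = sorted({label for _, label in labelled_texts})
    lines = [f"@relation {relation}",
             "@attribute text string",
             "@attribute class {" + ",".join(labels) + "}",
             "@data"]
    for text, label in labelled_texts:
        # ARFF string values are quoted; keep each instance on one line
        escaped = text.replace("'", "\\'").replace("\n", " ")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines)

# Hypothetical mini-corpus; in practice the texts would come from html2text output.
docs = [("awesome birthday thanks for the gifts", "happy"),
        ("lost my wedding band, tears and headache", "sad")]
Path("posts.arff").write_text(texts_to_arff(docs), encoding="utf-8")
# Step 3 (String -> bag of words) can then be done inside WEKA with the
# StringToWordVector filter mentioned on the slide.
```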

Slide 35: Next lecture
- Classification: ex. decision-tree learning with ID3
- Text classification and Naïve Bayes
- More on current approaches to (Web) text mining
- Opinion mining
- Text mining and WEKA
How can we make all this scale up?

Slide 36: Literature
To be done