A Statistical Model for Domain-Independent Text Segmentation. Masao Utiyama and Hitoshi Isahara. Presentation by Matthew Waymost.

Introduction The algorithm finds the maximum-probability segmentation of a text using a statistical method. No training data are required, which makes the method domain-independent.

Other Methods Lexical cohesion. Statistical – Hidden Markov model (Yamron et al., 1998).

Statistical Model Find the probability of a segmentation S given a text W. Use Bayes' rule to find the maximum-probability segmentation.
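In symbols, the decomposition the slide describes is (Pr(W) is constant for a given text, so it drops out of the maximization):

\hat{S} = \arg\max_S \Pr(S \mid W) = \arg\max_S \Pr(W \mid S)\,\Pr(S)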

Definition of Pr( W | S ) Assume statistical independence of topics, and of words within the scope of a topic. Assume different topics have different word distributions. Pr( W | S ) can then be broken down into a double product of probabilities, over segments and over the words within each segment. A Laplace estimator is used to estimate the word probabilities.
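A sketch of this factorization, in notation close to the paper's: with m segments, w^i_j the j-th word of segment S_i, n_i the number of words in S_i, f_i(w) the count of w in S_i, and k the number of distinct words in W, the Laplace-smoothed model is roughly

\Pr(W \mid S) = \prod_{i=1}^{m} \prod_{j=1}^{n_i} \Pr(w^i_j \mid S_i), \qquad \Pr(w^i_j \mid S_i) \approx \frac{f_i(w^i_j) + 1}{n_i + k}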

Definition of Pr( S ) Varies depending on the prior information available; in general, assume no prior information. The prior prevents the algorithm from generating too many segments, counteracting Pr( W | S ).
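Concretely, with no prior information this amounts to a per-segment penalty: roughly \Pr(S) \propto n^{-m}, where n is the number of words in W and m is the number of segments, so each additional segment adds about \log n to the cost and balances the gain in \Pr(W \mid S) that comes from making segments smaller.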

Algorithm Convert the probability function into a cost function by taking the negative log. Given a text W, define g_i to be the gap between word w_i and word w_{i+1}. Create a directed graph whose nodes are the gaps between words; an edge from g_i to g_j represents the segment spanning the words between those two gaps. Calculate every edge weight with the cost function and find the minimum-cost path from the first node to the last node.

Algorithm The calculated path represents the minimum-cost segmentation: each edge on the path corresponds to one segment.
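Because the graph is a DAG over the word gaps, the minimum-cost path can be computed with a simple dynamic program. Below is a minimal sketch that combines the pieces above: the Laplace-smoothed per-word cost, a log(n) per-segment penalty standing in for the prior, and back-pointers that map the chosen edges back to segments. The function and variable names (segment, seg_cost, best, back) are ours, not from the paper.

import math
from collections import Counter

def segment(words, vocab_size=None):
    """Return a segmentation of `words` as a list of (start, end) index pairs."""
    n = len(words)
    k = vocab_size or len(set(words))   # vocabulary size used by the Laplace estimator

    def seg_cost(i, j):
        # Cost of making words[i:j] one segment: negative log of the
        # Laplace-smoothed in-segment word probabilities, plus a log(n)
        # penalty standing in for the prior Pr(S).
        counts = Counter(words[i:j])
        length = j - i
        cost = sum(-math.log((counts[w] + 1.0) / (length + k)) for w in words[i:j])
        return cost + math.log(n)

    # Nodes are the gaps 0..n; an edge (i, j) covers words[i:j].
    # Find the minimum-cost path from gap 0 to gap n by dynamic programming.
    INF = float("inf")
    best = [0.0] + [INF] * n      # best[j] = cost of the cheapest path to gap j
    back = [0] * (n + 1)          # back[j] = previous gap on that path
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + seg_cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i

    # Walk the back-pointers: each traversed edge is one segment.
    segments, j = [], n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return list(reversed(segments))

# Example input with two topically distinct halves.
words = "cats dogs pets cats dogs stocks bonds market stocks bonds".split()
print(segment(words))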

Algorithm – Features The algorithm determines the number of segments automatically, but the user can also fix it by specifying the number of edges in the shortest path. The user can control where segmentation may occur by using only a subset of the possible edges, namely those whose endpoint nodes meet user-specified conditions. The algorithm is insensitive to text length, which makes it useful for summarization.
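One way to realize the boundary restriction is to admit only edges whose endpoints lie in a user-supplied set of allowed gaps (for example, positions of sentence boundaries). This is a hypothetical variant of the sketch above, not code from the paper.

import math
from collections import Counter

def segment_restricted(words, allowed_gaps, vocab_size=None):
    """Like segment() above, but boundaries may only fall at the gap
    positions listed in `allowed_gaps` (plus the start and end of the text)."""
    n = len(words)
    k = vocab_size or len(set(words))
    gaps = sorted(set(allowed_gaps) | {0, n})   # candidate boundary positions

    def seg_cost(i, j):
        counts = Counter(words[i:j])
        length = j - i
        cost = sum(-math.log((counts[w] + 1.0) / (length + k)) for w in words[i:j])
        return cost + math.log(n)

    INF = float("inf")
    best, back = {0: 0.0}, {}
    for j in gaps[1:]:
        best[j] = INF
        for i in gaps:
            if i >= j:
                break
            c = best[i] + seg_cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i

    segments, j = [], n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return list(reversed(segments))

# Example: only allow a break after word 5 or word 8.
words = "cats dogs pets cats dogs stocks bonds market stocks bonds".split()
print(segment_restricted(words, allowed_gaps={5, 8}))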

Algorithm – Evaluation The algorithm was compared against C99 (Choi, 2000) on an artificial test corpus extracted from the Brown corpus. A probabilistic error metric was used to evaluate performance. The results of the Utiyama algorithm were significantly better than those of the Choi algorithm at the 1% level.

Algorithm – Evaluation Assessment of the algorithm on real texts is still needed. Advantages over HMMs: no training is required (which implies domain independence), and other probabilistic information can be incorporated into the model. The approach might also be extendable to detecting word descriptions in text.