Machine Learning Approaches to the Analysis of Large Corpora: A Survey
Xunlei Rose Hu and Eric Atwell, University of Leeds



Introduction

Significant advances in Machine Learning approaches to the automatic analysis of corpora
A range of Machine Learning approaches
Three dimensions of classification:
–Levels of linguistic analysis
–Machine Learning techniques
–Current research in Discourse Analysis
A framework for further development

Levels of linguistic analysis

Tokenisation
Part-of-Speech tagging
Parsing
Semantic analysis
Discourse analysis

Low-level Linguistic Analysis

Tokenisation: breaks up the sequence of characters in a text by locating the word boundaries
Part-of-Speech tagging: assigns the correct Part-of-Speech and additional grammatical features to each word
A forced move from hand-built to Machine Learning approaches
Many systems learn statistical models from a training corpus, e.g. CLAWS
Transformation-Based Learning is the most popular alternative approach
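The idea of learning a tagging model from a training corpus can be sketched with a minimal baseline: record each word's most frequent tag in the tagged corpus and reuse it at tagging time. The tiny corpus and tag names below are illustrative assumptions, not the CLAWS system itself.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_corpus):
    """Learn the most frequent tag for each word from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

corpus = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
          ("the", "DET"), ("bark", "NOUN")]
model = train_baseline(corpus)
print(model["the"])  # DET
```

Real statistical taggers go further by modelling tag context (see the Markov Model slide below), but even this word-frequency baseline is learned entirely from data rather than hand-built.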

Parsing and Semantic Analysis

Parsing: takes a formal grammar and a linguistic input, and applies the grammar to the input to produce a parse-tree
–Top-Down and Bottom-Up parsing reflect contrasting perspectives
Semantic Analysis: augments data to facilitate automatic recognition of the underlying semantic content and structure
–A common practice is to label documents with thesaurus classes for document classification and management
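The bottom-up perspective can be illustrated with a minimal CYK recogniser: it fills a chart from the words upward, combining adjacent constituents under the grammar rules. The toy grammar (in Chomsky Normal Form) and lexicon are illustrative assumptions, not from the survey.

```python
# Minimal bottom-up (CYK) recogniser sketch for a toy CNF grammar.
grammar = {          # (rhs1, rhs2) -> lhs
    ("NP", "VP"): "S",
    ("DET", "N"): "NP",
    ("V", "NP"): "VP",
}
lexicon = {"the": "DET", "dog": "N", "cat": "N", "saw": "V"}

def cyk_recognise(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):           # length-1 spans: lexical categories
        table[i][i + 1].add(lexicon[w])
    for span in range(2, n + 1):            # build longer spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # try every split point k
                for a in table[i][k]:
                    for b in table[k][j]:
                        if (a, b) in grammar:
                            table[i][j].add(grammar[(a, b)])
    return "S" in table[0][n]

print(cyk_recognise("the dog saw the cat".split()))  # True
```

A top-down parser would instead start from S and expand rules left-to-right, checking predictions against the input; both strategies apply the same grammar from opposite directions.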

Discourse Analysis

Discourse analysis extends beyond sentence boundaries
No universal agreement on discourse analysis categories or labels
A growing range of dialogue transcript corpora have been hand-annotated with dialogue-act or speech-act tags designed for specific applications

Machine Learning Techniques for Linguistic Annotation of Corpora

N-gram Markov models, HMMs
Neural Networks, Semantic Networks
Transformation-Based Learning
Decision-Tree classification
Vector-based clustering

N-gram and Markov Models

A Markov Model of a sequence of states or symbols (e.g. words or Part-of-Speech tags) is used to estimate the probability or likelihood of a symbol sequence
Hidden Markov Models (HMMs) are a variant with two layers of states:
–a visible layer corresponding to input symbols
–a hidden layer learnt by the system

Neural Networks, Semantic Networks

Neural networks have been developed in many fields in the hope of achieving human-like learning
A related model is the semantic network
–Typically nodes represent concepts
–Connections represent semantically meaningful associations between these concepts

Transformation-Based Learning

Brill (1995) developed a symbolic Machine Learning method called Transformation-Based Learning (TBL)
Given a tagged training corpus, TBL produces a sequence of rules that serves as a model of the training data
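The rule-sequence idea can be sketched as follows: start from an initial tagging (e.g. each word's most frequent tag) and then apply learned transformation rules in order. The rule format, lexicon, and example sentence below are illustrative assumptions, not Brill's actual learned rule list.

```python
# Sketch of Transformation-Based Learning rule application (after Brill 1995).
def initial_tag(words, most_frequent_tag):
    """Start from each word's most frequent tag; default to NOUN if unseen."""
    return [most_frequent_tag.get(w, "NOUN") for w in words]

def apply_rules(words, tags, rules):
    """Apply rules of the form: change tag `old` to `new`
    when the previous tag is `context`."""
    for old, new, context in rules:
        for i in range(1, len(tags)):
            if tags[i] == old and tags[i - 1] == context:
                tags[i] = new
    return tags

lexicon = {"the": "DET", "can": "NOUN"}
rules = [("NOUN", "VERB", "NOUN")]   # NOUN -> VERB after a NOUN
words = ["the", "can", "rusts"]
tags = apply_rules(words, initial_tag(words, lexicon), rules)
print(tags)  # ['DET', 'NOUN', 'VERB']
```

Learning, which is not shown here, consists of repeatedly choosing the candidate rule that corrects the most errors on the training corpus and appending it to the sequence.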

Decision Tree Classification and Vector-Based Clustering

A decision tree is constructed by partitioning the training set, selecting at each step the feature that most reduces the uncertainty about the class in each partition, and using it as a split
Vector-based clustering uses co-occurrence statistics to construct vectors that represent word classes or meanings by virtue of their direction in multi-dimensional word-collocation space
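The "direction in word-collocation space" idea can be sketched by building co-occurrence vectors from a toy corpus and comparing their directions with cosine similarity; words that occur in similar contexts point the same way. The corpus and window size are illustrative assumptions.

```python
# Sketch of vector-based clustering: co-occurrence vectors compared by
# cosine similarity (direction in word-collocation space). Toy corpus assumed.
import math
from collections import Counter

corpus = "the cat sat the dog sat the cat ran the dog ran".split()

def cooccurrence_vector(target, corpus, window=1):
    """Count the neighbours of `target` within `window` positions."""
    vec = Counter()
    for i, w in enumerate(corpus):
        if w == target:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    vec[corpus[j]] += 1
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

cat, dog, sat = (cooccurrence_vector(w, corpus) for w in ("cat", "dog", "sat"))
# "cat" and "dog" share contexts (the, sat, ran), so their vectors align:
print(cosine(cat, dog) > cosine(cat, sat))  # True
```

A clustering algorithm would then group words whose vectors are close in angle, recovering word classes from distribution alone.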

Discourse Analysis 1/2

1994: Woszczyna and Waibel – N-grams, Markov Model
1996: Reithinger, Engel, Kipp and Klesen – N-grams, HMM
1996: Mast et al. – Decision Trees, N-grams
1997: Reithinger and Klesen – N-grams, Bayesian network

Discourse Analysis 2/2

1998: Samuel, Carberry, and Vijay-Shanker – Transformation-Based Learning
1998: Wright – N-grams, CART Decision Tree, Neural Networks
1998: Taylor, King, Isard, and Wright – Combined N-grams and HMM
1998: Fukada et al. – Bigrams, HMM
1998: Stolcke et al. – HMM, Decision Trees

Conclusion

This survey has explored the algorithms underlying different levels of linguistic analysis, providing a framework for further research
Better to combine two or more ML approaches? In Discourse Analysis: HMM/n-grams + another technique
Future work:
–Explore systems which can be used and re-used
–Integrate such systems and comparatively evaluate Machine Learning techniques for corpus analysis