1
A Survey of NLP Toolkits Jing Jiang Mar 8, 2007
2
Outline
WordNet
Statistics-based phrases
POS taggers
Parsers
Chunkers (syntax-based phrases)
NER
SNoW, OpenNLP and LingPipe
3
Outline (cont.)
What does the tool provide?
Is the tool easy to use as a stand-alone program?
Is the tool easy to modify or integrate with my program?
4
WordNet
Background:
– Princeton, George Miller, 1985
– “WordNet: An Electronic Lexical Database”
– Current version: WordNet 3.0
What does it provide?
– A database of words and their relations
    Nouns, verbs, adjectives and adverbs
    Lexical relations: morphology
    Semantic relations: synonyms, hypernyms/hyponyms, holonyms/meronyms, etc.
5
WordNet
To use as a stand-alone program?
– A command line program
– Web interface
To modify or integrate with my program?
– API in C
– Online manual not very clear (http://wordnet.princeton.edu/doc)
– Interfaces in other languages (http://wordnet.princeton.edu/links#local)
    Java
    Perl
    Many others
6
WordNet::Similarity
Background
– Ted Pedersen et al.
What does it provide:
– Semantic similarity between two words, measured in various ways using WordNet
– Need to understand the measures to make the best use of them
Demo:
– http://marimba.d.umn.edu/cgi-bin/similarity.cgi
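To make "similarity measured using WordNet" concrete, here is a minimal sketch of one simple measure: path-based similarity, taken as the inverse of the number of nodes on the shortest is-a path between two concepts. The tiny hierarchy below is a hypothetical stand-in, not real WordNet data, and the function names are illustrative rather than the WordNet::Similarity API.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy (child -> parent); NOT real WordNet data.
HYPERNYM: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}


def ancestors(concept: str) -> List[str]:
    """Return the chain from the concept up to the root, concept first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = HYPERNYM[node]
    return chain


def path_similarity(a: str, b: str) -> float:
    """Return 1 / (number of nodes on the shortest is-a path between a and b)."""
    steps_from_b = {node: i for i, node in enumerate(ancestors(b))}
    # Walking up from a, the first node also above b is the lowest common ancestor.
    for steps_from_a, node in enumerate(ancestors(a)):
        if node in steps_from_b:
            edges = steps_from_a + steps_from_b[node]
            return 1.0 / (edges + 1)
    raise ValueError("concepts share no common ancestor")


print(path_similarity("dog", "cat"))
print(path_similarity("dog", "car"))
```

In this toy tree "dog" and "cat" are closer than "dog" and "car" because their common ancestor sits lower in the hierarchy. Other measures weigh depth or corpus frequency, which is why the slide advises understanding each measure before choosing one.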
7
WordNet::Similarity
To use as a stand-alone program?
– A Perl script to call from command line
– Web interface
To modify or integrate with my program?
– A Perl module
– Online API with details and examples
8
Ngram Statistics Package
What does it provide:
– N-grams from a corpus ranked by a user-selected statistical measure of association (e.g. mutual information, chi-squared test)
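As a rough illustration of ranking n-grams by an association measure, the sketch below scores bigrams by pointwise mutual information. It is a simplified stand-in, not NSP's implementation: the function name, the `min_count` cutoff, and the use of plain unigram frequencies for the marginals are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import List, Tuple

Bigram = Tuple[str, str]


def rank_bigrams_by_pmi(tokens: List[str], min_count: int = 2) -> List[Tuple[Bigram, float]]:
    """Rank bigrams seen at least min_count times by pointwise mutual information."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scored: List[Tuple[Bigram, float]] = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare bigrams give unreliable PMI estimates
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        scored.append(((w1, w2), math.log2(p_xy / (p_x * p_y))))
    # Highest association first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


tokens = "new york is big new york is old the sky is blue".split()
for bigram, score in rank_bigrams_by_pmi(tokens):
    print(bigram, round(score, 3))
```

On this toy input "new york" ranks above "york is", because "is" also occurs outside that bigram and so is less strongly tied to "york".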
9
Ngram Statistics Package
To use as a stand-alone program?
– count.pl, statistic.pl
– Input can be flat text
– Regular expressions to define tokens can be specified by the user
To modify or integrate with my program?
– Perl module
– Online API with details and examples
– User can define new statistical measures of association
10
LingPipe: Significant Phrases
What does it provide:
– Collocations (similar to NSP)
– Relatively new terms
    Foreground vs. background
    Web application: Amazon “SIPs”, Yahoo “Buzz Index”, Google “in the news”
http://www.alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html
11
POS Taggers
What do they provide?
– POS tags
How many POS tags are there?
– Penn Treebank Tag Set http://www.cis.upenn.edu/~treebank/
– Which tags are useful to your task?
12
Brill Tagger
Background
– Eric Brill, PhD thesis, U Penn, 1993
– Transformation-based error-driven learning
Accuracy and speed
– ~96% accuracy
– ~5000 sentences in ~4 seconds
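The sketch below illustrates the tagging side of transformation-based learning: each word first receives its most frequent tag, then an ordered list of rules rewrites tags in context. It does not show the error-driven learning of the rules. The lexicon, the single rule (change NN to VB after TO), and all names are hypothetical illustrations, not part of the Brill tagger distribution.

```python
from typing import Dict, List, NamedTuple, Tuple


class Rule(NamedTuple):
    from_tag: str  # tag to rewrite
    to_tag: str    # replacement tag
    prev_tag: str  # context: the tag of the preceding word


# Hypothetical toy lexicon of most-frequent tags; a real one is learned from a corpus.
LEXICON: Dict[str, str] = {
    "we": "PRP", "want": "VBP", "to": "TO", "race": "NN", "the": "DT",
}

# Hypothetical learned rule: a noun directly after "to" is really a verb.
RULES: List[Rule] = [Rule("NN", "VB", "TO")]


def tag(tokens: List[str], default: str = "NN") -> List[Tuple[str, str]]:
    """Assign initial tags from the lexicon, then apply each rule in order."""
    tags = [LEXICON.get(tok.lower(), default) for tok in tokens]
    for rule in RULES:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return list(zip(tokens, tags))


print(tag("We want to race".split()))
print(tag("the race".split()))
```

In training, rules of this shape are proposed from templates and kept when they reduce the number of tagging errors on annotated text, which is the "error-driven" part of the method.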
13
Brill Tagger
To use as a stand-alone program?
– Call from command line
– Input must be one sentence per line, tokenized
    E.g. We ’re going today , are you ?
To modify or integrate with my program?
– No API
14
Charniak Parser
Background
– Eugene Charniak, Brown University
– State-of-the-art
What does it provide?
– Syntactic parse tree
15
Charniak Parser
To use as a stand-alone program?
– Call from command line
– Input must be one sentence per line
To modify or integrate with my program?
– No API
16
Collins Parser
Background
– Michael Collins, PhD thesis, U Penn, 1999
– Head-driven statistical models
What does it provide?
– Syntactic parse trees
– Head word for each production (dependency relations, but no relation labels)
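To show how head words yield unlabeled dependency relations, here is a minimal sketch that walks a head-annotated tree and emits (head, dependent) pairs. The tuple-based tree encoding and the hand-assigned head indices are assumptions for illustration and do not reflect the Collins parser's actual output format.

```python
from typing import List, Tuple

# Assumed encoding (hypothetical, for illustration only):
#   leaf:          (pos_tag, word)
#   internal node: (label, index_of_head_child, [children])


def head_word(node: tuple) -> str:
    """Follow head children down to the lexical head of a node."""
    if len(node) == 2:
        return node[1]
    _, head_idx, children = node
    return head_word(children[head_idx])


def dependencies(node: tuple) -> List[Tuple[str, str]]:
    """Return unlabeled (head, dependent) pairs for the whole subtree."""
    if len(node) == 2:
        return []
    _, head_idx, children = node
    head = head_word(children[head_idx])
    pairs: List[Tuple[str, str]] = []
    for i, child in enumerate(children):
        if i != head_idx:
            # Each non-head child depends on the head word of its parent.
            pairs.append((head, head_word(child)))
        pairs.extend(dependencies(child))
    return pairs


tree = ("S", 1, [
    ("NP", 1, [("DT", "the"), ("NN", "dog")]),
    ("VP", 0, [("VBD", "chased"),
               ("NP", 1, [("DT", "a"), ("NN", "cat")])]),
])
print(dependencies(tree))
```

The pairs carry no relation names such as subject or object, which matches the slide's note that the head information gives dependency relations without labels.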
17
Collins Parser
To use as a stand-alone program?
– Call from command line
– Input must be one sentence per line, tokenized, POS tagged
To modify or integrate with my program?
– No API
18
MiniPar
Background
– Dekang Lin, U Alberta
What does it provide?
– Dependency parse trees
– Dependency relation labels
Accuracy and speed
– ~88% precision, ~80% recall for dependency relations
– 300 words / second (Pentium II 300, 128 MB)
19
Examples of Dependency Relations
The Fulton County Grand Jury said Friday an investigation of Atlanta 's recent primary election produced…

say                         V:s:N         Fulton County Grand Jury
Fulton County Grand Jury    N:det:Det     the
Fulton County Grand Jury    N:lex-mod:U   Fulton
Fulton County Grand Jury    N:lex-mod:U   County
Fulton County Grand Jury    N:lex-mod:U   Grand
say                         V:subj:N      Fulton County Grand Jury
say                         V:guest:N     Friday
produce                     V:s:N         investigation
investigation               N:det:Det     an
investigation               N:mod:Prep    of
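Triples like those on this slide are easy to post-process. The sketch below splits each one into head, head category, relation, dependent category, and dependent, assuming one triple per line in the form "head Cat:rel:Cat dependent". That line layout and every name in the code are assumptions for this example, not MiniPar's documented output format or API.

```python
import re
from typing import List, NamedTuple


class Dependency(NamedTuple):
    head: str
    head_cat: str
    relation: str
    dep_cat: str
    dependent: str


# head (possibly several words), Cat:relation:Cat, dependent (possibly several words)
TRIPLE = re.compile(r"(.+?)\s+([A-Za-z]+):([A-Za-z-]+):([A-Za-z]+)\s+(.+)")


def parse_triple(line: str) -> Dependency:
    """Parse one 'head Cat:rel:Cat dependent' line into its five fields."""
    match = TRIPLE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a dependency triple: {line!r}")
    return Dependency(*match.groups())


lines: List[str] = [
    "say V:subj:N Fulton County Grand Jury",
    "Fulton County Grand Jury N:lex-mod:U Fulton",
    "investigation N:det:Det an",
]
deps = [parse_triple(line) for line in lines]
subjects = [d.dependent for d in deps if d.relation == "subj"]
print(subjects)
```

Keeping the relation label as its own field makes it simple to filter for, say, subjects or modifiers, which is the practical benefit of labeled dependencies.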
20
MiniPar
To use as a stand-alone program?
– A command line program
– Input must be one sentence per line
To modify or integrate with my program?
– API in C
– Parse tree and dependency relations are stored in a data structure for easy access
21
Comparison of Parsers
Accuracy:
– Charniak > Collins > MiniPar
Dependency relations:
– Collins, MiniPar
Dependency relation labels:
– MiniPar
Speed:
– MiniPar is the fastest
22
Chunkers (Shallow Parsers)
What do they provide?
– Phrase structure of a sentence
– E.g. [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only 1.8 billion] [PP in] [NP September]
Compare with collocations
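Bracketed chunker output such as the example on this slide can be turned into structured data with a short regular expression. The sketch below assumes flat, non-nested chunks written as "[LABEL word word ...]"; the function name is illustrative and not tied to any particular chunker.

```python
import re
from typing import List, Tuple

CHUNK = re.compile(r"\[(\w+)\s+([^\]]+)\]")


def parse_chunks(bracketed: str) -> List[Tuple[str, List[str]]]:
    """Turn '[NP He] [VP reckons] ...' into (label, tokens) pairs."""
    return [(label, words.split()) for label, words in CHUNK.findall(bracketed)]


sentence = ("[NP He] [VP reckons] [NP the current account deficit] "
            "[VP will narrow] [PP to] [NP only 1.8 billion] [PP in] [NP September]")
chunks = parse_chunks(sentence)
noun_phrases = [" ".join(tokens) for label, tokens in chunks if label == "NP"]
print(noun_phrases)
```

Pulling out the NP chunks gives syntax-based phrases, which can then be compared with the statistics-based collocations from earlier in the talk.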
23
Named Entity Recognizers
What do they provide?
– Named entities of various pre-defined types (e.g. Person, Location, Organization, Number, etc.)
24
SNoW-based Tools
Use SNoW as the underlying learner
In C++
API available for many components
25
SNoW-based Tools
Sentence splitter
Tokenizer
POS tagger
Dependency parser
Chunker
NE tagger
SRL
26
OpenNLP
Java-based, open source project
Maximum entropy models
Pipeline structure
– Sentence detector → tokenizer → POS tagger → chunker
Java API
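The pipeline idea on this slide, where each stage consumes the output of the previous one, can be sketched as plain function composition. The stages below are naive rule-based stand-ins written for this example: they are not OpenNLP's Java API, they use no maximum entropy models, and the chain stops at a placeholder tagging step.

```python
import re
from typing import List, Tuple


def detect_sentences(text: str) -> List[str]:
    """Naive stand-in: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Naive stand-in: words (with an optional apostrophe part) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Placeholder tagger: PUNCT if the token does not start with a letter or digit, else WORD."""
    return [(tok, "WORD" if tok[0].isalnum() else "PUNCT") for tok in tokens]


def run_pipeline(text: str) -> List[List[Tuple[str, str]]]:
    """Chain the stages: sentence detection, then tokenization, then tagging."""
    return [tag(tokenize(sentence)) for sentence in detect_sentences(text)]


for tagged_sentence in run_pipeline("It rains. We stay home!"):
    print(tagged_sentence)
```

Because each stage only depends on the previous stage's output type, a single component can be swapped for a trained model without touching the rest of the chain.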
27
OpenNLP
Sentence boundary detector
Tokenizer
POS tagger
Chunker
Parser
Name Finder
Coreference
28
LingPipe
Java-based libraries for various kinds of linguistic analysis
http://www.alias-i.com/lingpipe/index.html