Course G22.2580 - Web Search Engines 3/9/2011 Wei Xu


1 Course G22.2580 - Web Search Engines 3/9/2011 Wei Xu xuwei@cs.nyu.edu

2 WordNet®
• a large lexical database of English
• a combination of dictionary and thesaurus
• created and maintained by the Cognitive Science Lab of Princeton University
• designed to establish the connections between words

3 http://wordnet.princeton.edu/

4 WORDnet
• 4 parts of speech (POS)
  ▪ noun, verb, adjective, adverb
• Synset
  ▪ the smallest unit in WordNet
  ▪ a synonym set
  ▪ represents a specific meaning of a word

5 wordNET
• Synsets are connected to one another through semantic and lexical relations
• Types of relations (depending on POS)
  ▪ hypernym (kind-of, the more general term): ‘vehicle’ is a hypernym of ‘car’
  ▪ hyponym (kind-of, the more specific term): ‘car’ is a hyponym of ‘vehicle’
  ▪ holonym (part-of, the whole): ‘building’ is a holonym of ‘window’
  ▪ meronym (part-of, the part): ‘window’ is a meronym of ‘building’
  ▪ similar to: ‘smart’ is similar to ‘intelligent’
  ▪ antonym: ‘smart’ is an antonym of ‘unintelligent’
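The paired relations above are inverses of each other, which a small sketch can make concrete. This is a toy illustration only: it links bare words rather than synsets, it does not use any WordNet API, and the entries ‘truck’ and ‘artifact’ are assumptions added for the example.

```python
from collections import defaultdict

# Toy data based on the slide's examples. "truck" and "artifact" are
# hypothetical additions; real WordNet links synsets, not bare words.
HYPERNYM = {"car": "vehicle", "truck": "vehicle", "vehicle": "artifact"}
HOLONYM = {"window": "building"}


def invert(relation):
    """Build the inverse relation (hypernym -> hyponyms, holonym -> meronyms)."""
    inverse = defaultdict(list)
    for child, parent in relation.items():
        inverse[parent].append(child)
    return dict(inverse)


HYPONYMS = invert(HYPERNYM)
MERONYMS = invert(HOLONYM)


def hypernym_chain(word):
    """Follow kind-of links upward until no more general term is recorded."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


print(hypernym_chain("car"))   # ['vehicle', 'artifact']
print(HYPONYMS["vehicle"])     # ['car', 'truck']
print(MERONYMS["building"])    # ['window']
```

Storing only one direction of each pair and deriving the other keeps the two views consistent by construction.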

6 [Figure: hypernym and hyponym]

7
• Unix-style manual
• Web interfaces
• Local interfaces/APIs
  ▪ Java
  ▪ Perl
  ▪ C#
• http://wordnet.princeton.edu/wordnet/related-projects/#web

8
• Definition:
  ▪ stemming: the process of removing suffixes from words to get their base or root form
• Example:
  ▪ ‘fishing’, ‘fished’, ‘fish’, ‘fisher’ → ‘fish’
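A minimal sketch of suffix stripping, assuming a hand-picked suffix list and a minimum stem length. It is not the Porter or Krovetz algorithm; those apply ordered rule sets with extra conditions. The names `SUFFIXES` and `naive_stem` are hypothetical.

```python
# Illustrative suffix list, checked in this order (an assumption, not a standard).
SUFFIXES = ("ing", "ed", "er", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["fishing", "fished", "fish", "fisher"]:
    print(w, "->", naive_stem(w))   # each line ends in "fish"
```

The minimum stem length keeps short words such as ‘sing’ intact, but a rule this crude still mishandles many words (for example, ‘running’ becomes ‘runn’), which is why real stemmers add further conditions.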

9
• Porter Stemmer
  ▪ http://tartarus.org/~martin/PorterStemmer/
• Krovetz Stemmer (in the Lemur package)
  ▪ http://www.lemurproject.org/phorum/read.php?11,1394
• WordNet Stemmer
  ▪ http://tipsandtricks.runicsoft.com/Other/JavaStemmer.html

10
• Tokenization
  ▪ The process of breaking a stream of text up into “words” and punctuation marks.
• Sentence Splitting
• Part of Speech Tagging
  ▪ Example: He/PRP 's/VBZ at/IN peace/NN with/IN the/DT house/NN and/CC could/MD stay/VB there/RB indefinitely/RB ./.
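The tokenization step and the word/TAG notation can be sketched with the standard library alone. This is a rough illustration under assumptions: the regular expression is a simplification (it splits ‘don't’ as ‘don’ + ‘'t’, unlike Penn Treebank conventions), and `tokenize` and `parse_tagged` are hypothetical helper names, not part of any tagger's API.

```python
import re

# Clitics such as 's, words with optional internal hyphens,
# or any single punctuation mark.
TOKEN_RE = re.compile(r"'\w+|\w+(?:-\w+)*|[^\w\s]")


def tokenize(text):
    """Break raw text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def parse_tagged(tagged):
    """Split 'word/TAG' items into (word, tag) pairs on the last slash."""
    return [tuple(item.rsplit("/", 1)) for item in tagged.split()]


print(tokenize("He's at peace."))
# ['He', "'s", 'at', 'peace', '.']
print(parse_tagged("He/PRP 's/VBZ at/IN peace/NN ./."))
# [('He', 'PRP'), ("'s", 'VBZ'), ('at', 'IN'), ('peace', 'NN'), ('.', '.')]
```

Splitting on the last slash matters because the token itself may be or contain a slash-like symbol, as with the final `./.` item.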

11
• Named Entity Recognition
  ▪ The process of labeling sequences of words that are the names of things, such as persons, companies, and locations.
  ▪ Example: Jim bought 300 shares of Acme Corp. in 2006.
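To show what the labeled output of the example sentence might look like, here is a toy rule-based sketch. Everything in it is an assumption made for illustration: the one-name gazetteer, the company-suffix pattern, the year pattern, and the label names. Statistical systems such as those listed on the next slide learn these cues from annotated data instead of hand-written rules.

```python
import re

# Hypothetical hand-written rules, chosen only to cover the slide's sentence.
PERSON_NAMES = {"Jim"}  # tiny gazetteer, assumed for the example
ORG_RE = re.compile(r"\b(?:[A-Z]\w+ )+(?:Corp|Inc|Ltd)\.")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
CAPITALIZED_RE = re.compile(r"\b[A-Z]\w+\b")


def tag_entities(text):
    """Return (start offset, surface string, label) tuples sorted by position."""
    found = []
    for m in ORG_RE.finditer(text):
        found.append((m.start(), m.group(), "ORGANIZATION"))
    for m in YEAR_RE.finditer(text):
        found.append((m.start(), m.group(), "DATE"))
    for m in CAPITALIZED_RE.finditer(text):
        if m.group() in PERSON_NAMES:
            found.append((m.start(), m.group(), "PERSON"))
    return sorted(found)


sentence = "Jim bought 300 shares of Acme Corp. in 2006."
for _, surface, label in tag_entities(sentence):
    print(f"{surface}\t{label}")
```

For the example sentence this labels ‘Jim’ as a person, ‘Acme Corp.’ as an organization, and ‘2006’ as a date, while leaving ‘300’ unlabeled because it does not look like a year.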

12
• Stanford POS tagger
  ▪ http://nlp.stanford.edu/software/tagger.shtml
• Stanford NER
  ▪ http://nlp.stanford.edu/software/CRF-NER.shtml
• GATE
  ▪ http://gate.ac.uk/
• JET
  ▪ http://cs.nyu.edu/grishman/jet/license.html
  ▪ http://www.cs.nyu.edu/courses/spring10/G22.2590-001/schedule.html

