
The Linguist’s Search Engine 02/04/2004


1 The Linguist’s Search Engine 02/04/2004

2 Background
- Address: http://lse.umiacs.umd.edu/
- Developed at the University of Maryland by Resnik, Elkiss et al. in collaboration with Fellbaum (Princeton) and Olsen (Microsoft)
- Accessible to a general audience since 20 January 2004 (brand new!)
- No fees or complicated registration process

3 Some Facts – Built-in Corpus
- Preprocessed corpus of about three million sentences taken from the Internet Archive (www.archive.org)
- Automatically annotated with Penn Treebank-style syntactic bracketing
- Relies on computational linguistics tools (such as MXTERMINATOR, MXPOST, Charniak’s stochastic parser, the Minipar parser, WordNet, etc.)
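
The slide does not show what Penn Treebank-style bracketing looks like. A parsed sentence is stored as nested labeled brackets, for example (S (NP (DT the) (NN dog)) (VP (VBZ barks))), where each innermost pair holds a part-of-speech tag and a word. The Python sketch below reads one such string into a tree and lists its (word, tag) pairs. The `Tree` class and both functions are hypothetical helpers, not part of the LSE, and the tokenizer assumes that literal parentheses inside words are escaped as -LRB-/-RRB-, as in the Penn Treebank.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tree:
    """One node of a Penn Treebank-style tree (hypothetical helper class)."""
    label: str
    children: list[Tree] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminals such as (NN dog)


def parse_bracketed(text: str) -> Tree:
    """Parse one bracketed tree, e.g. '(S (NP (DT the) (NN dog)) (VP (VBZ barks)))'."""
    # Assumes literal parentheses in words are escaped as -LRB-/-RRB-.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        if tokens[pos] == "(":
            node = Tree(label="ROOT")  # unlabeled wrapper, as in '( (S ...) )'
        else:
            node = Tree(label=tokens[pos])
            pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.word = tokens[pos]  # bare token: the word of a preterminal
                pos += 1
        pos += 1  # consume the closing ')'
        return node

    return parse_node()


def leaves(tree: Tree) -> list[tuple[str, str]]:
    """Return the (word, POS tag) pairs from left to right."""
    if tree.word is not None:
        return [(tree.word, tree.label)]
    return [pair for child in tree.children for pair in leaves(child)]


print(leaves(parse_bracketed("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")))
# [('the', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]
```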

4 Searching the built-in corpus
- Nice features:
  – Query by example
  – Limited regular expression support (e.g., disjunction, negation)
  – WordNet relations are supported
  – Queries can be saved for later reuse
  – Offensive content filter (for less embarrassing live demonstrations)
- Problems:
  – Only English is supported (and the documentation never mentions this fact anywhere!)
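
The slides do not give the LSE’s own pattern syntax. To illustrate what disjunction and negation over node labels amount to, here is a minimal Python sketch using a made-up notation: `|` separates alternatives and a leading `!` negates the whole disjunction. Both the notation and the `label_matches` function are hypothetical.

```python
def label_matches(pattern: str, label: str) -> bool:
    """Match a POS or phrase label against a tiny, hypothetical pattern language.

    'NN'        -> exactly NN
    'NNP|NNPS'  -> NNP or NNPS (disjunction)
    '!NNP|NNPS' -> anything except NNP or NNPS (negated disjunction)
    """
    negated = pattern.startswith("!")
    body = pattern[1:] if negated else pattern
    alternatives = set(body.split("|"))
    # XOR with the negation flag flips the result for '!' patterns
    return (label in alternatives) != negated


tags = ["NN", "NNP", "NNS", "NNPS"]
print([t for t in tags if label_matches("!NNP|NNPS", t)])  # ['NN', 'NNS']
```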

5 Demo – Simple Search
- Simple search of the built-in corpus
  – Query by example: search for of-genitive constructions
  – Query by hand: search for ’s-genitives where the possessor is not a proper noun (i.e., not tagged NNP / NNPS)
- Searching for synonyms of fearsome: fearsome#a#1/syns
- GO TO THE LSE
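
The hand-written query on this slide can be spelled out as a search over bracketed trees. The Python sketch below is not how the LSE evaluates queries. It assumes Penn Treebank conventions, in which a possessor such as "the dog’s" is an NP whose last child is a POS preterminal, and it treats the word immediately before POS as the possessor’s head. All function names are hypothetical.

```python
def parse(text):
    """Parse a bracketed tree into nested lists: [label, child, ...]; preterminals are [tag, word]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip '('
        if tokens[pos] == "(":
            label = "ROOT"  # unlabeled wrapper, as in '( (S ...) )'
        else:
            label = tokens[pos]
            pos += 1
        kids = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                kids.append(node())
            else:
                kids.append(tokens[pos])  # the word under a preterminal
                pos += 1
        pos += 1  # skip ')'
        return [label, *kids]

    return node()


def subtrees(tree):
    """Yield every phrase and preterminal node, top-down."""
    yield tree
    for kid in tree[1:]:
        if isinstance(kid, list):
            yield from subtrees(kid)


def preterminals(tree):
    """Yield (tag, word) pairs from left to right."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        yield label, kids[0]
        return
    for kid in kids:
        if isinstance(kid, list):
            yield from preterminals(kid)


def non_proper_s_genitives(tree, excluded=("NNP", "NNPS")):
    """Return possessor phrases of 's-genitives whose head is not tagged NNP/NNPS."""
    hits = []
    for sub in subtrees(tree):
        # Possessor NP: at least one child followed by a final POS preterminal.
        if sub[0] != "NP" or len(sub) < 3:
            continue
        last = sub[-1]
        if not (isinstance(last, list) and last[0] == "POS"):
            continue
        tagged = list(preterminals(sub))  # ends with the POS token itself
        head_tag = tagged[-2][0]  # assumption: the word right before POS is the head
        if head_tag not in excluded:
            hits.append(" ".join(word for _, word in tagged))
    return hits


tree = parse("(S (NP (NP (DT the) (NN dog) (POS 's)) (NN bone)) (VP (VBD vanished)))")
print(non_proper_s_genitives(tree))  # ["the dog 's"]
```

With a proper-noun possessor, as in (NP (NNP John) (POS 's)), the same function returns an empty list.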

6 Some Facts – Customized Corpora
- You can build your own collection of sentences and have them annotated
- Uses AltaVista (www.altavista.com) as the basis for web-wide search (about 1,000,000 pages)
- Extracts sentences from the retrieved pages and annotates them
- Job-based, with fair scheduling procedures
- Query syntax restricted to AltaVista queries plus expansion of inflectional forms
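
Expansion of inflectional forms means that a query for one verb also retrieves its other forms, e.g. give, gives, gave, given, giving. The minimal Python sketch below assumes a small hand-written table of irregular verbs and naive suffix rules for regular ones. It does not double consonants, so stop would wrongly yield stoped. The parenthesized OR query is an assumption about the AltaVista syntax, and all names are hypothetical.

```python
from __future__ import annotations

VOWELS = set("aeiou")

# Hypothetical hand-written table; a real system would use a full morphological lexicon.
IRREGULAR_VERBS = {
    "give": ["give", "gives", "gave", "given", "giving"],
    "take": ["take", "takes", "took", "taken", "taking"],
    "go": ["go", "goes", "went", "gone", "going"],
}


def inflect_verb(lemma: str) -> list[str]:
    """Return the inflected forms of an English verb (naive rules, no consonant doubling)."""
    if lemma in IRREGULAR_VERBS:
        return IRREGULAR_VERBS[lemma]
    if lemma.endswith("e"):
        third, past, ing = lemma + "s", lemma + "d", lemma[:-1] + "ing"
    elif lemma.endswith("y") and len(lemma) > 1 and lemma[-2] not in VOWELS:
        third, past, ing = lemma[:-1] + "ies", lemma[:-1] + "ied", lemma + "ing"
    elif lemma.endswith(("s", "x", "z", "ch", "sh")):
        third, past, ing = lemma + "es", lemma + "ed", lemma + "ing"
    else:
        third, past, ing = lemma + "s", lemma + "ed", lemma + "ing"
    # For regular verbs the past tense and past participle coincide, so list it once.
    return [lemma, third, past, ing]


def expand_query(lemma: str) -> str:
    """Build a boolean OR query over all forms (the OR syntax is an assumption)."""
    return "(" + " OR ".join(inflect_verb(lemma)) + ")"


print(expand_query("give"))   # (give OR gives OR gave OR given OR giving)
print(expand_query("carry"))  # (carry OR carries OR carried OR carrying)
```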

7 Demo – Customized Collection
- Demo search on a collection of sentences with the verb give
- How to start a new collection
- GO TO THE LSE

8 Further Information
- LSE Starter’s Guide: lse.umiacs.umd.edu/lse_guide.html
- LSE User’s Guide: lse.umiacs.umd.edu/lseuser/lseuser.pdf
- LSE Users’ Forum: lse.umiacs.umd.edu/forum
- AltaVista Documentation: www.altavista.com/help/search/help_adv
- Penn Tagset: www.computing.dcu.ie/~acahill/tagset.html
- Still ugly but flexible alternative: www.stanford.edu/~jstrunk/

