Trends in NL Analysis
Jim Critz, University of New York in Prague
EurOpen.CZ, 12 December 2008

2 General Technologies
- Information Retrieval: identify relevant documents in a collection of documents
- Text Mining: measures word-word, word-passage and passage-passage relations. It looks not only at the frequency of a word in a document but also at the word in its local contexts (sentence, paragraph) over very large numbers of occurrences
- Computational Linguistics
- Natural Language Processing
- Human Language Technologies
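Information retrieval in the sense used on this slide (finding the documents in a collection that are relevant to a query) is classically built on an inverted index. A minimal Python sketch follows, assuming simple alphanumeric tokenization and Boolean AND queries; `build_index` and `search` are hypothetical names, and real IR systems add ranking, stemming and stop-word handling on top of this.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown terms.
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


docs = [
    "Search open source code repositories",
    "Search the mail of open source projects",
    "Latent semantic analysis of text",
]
index = build_index(docs)
print(search(index, "open source"))  # [0, 1]
print(search(index, "source mail"))  # [1]
```

The index is built once, so each query only intersects a few posting sets instead of rescanning every document.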

3 Examples Where NL Analysis Can Help in the Future
- www.krugle.org: search open source code repositories
- www.markmail.org: search the mail of open source projects
Both of these sites use XML tagging based on simple text analysis of the source documents to add structure and improve the searchability of the documents. If you try these sites, think about how extending text analysis and semantic metadata might improve your ability to find the information you need.
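To make "XML tagging based on simple text analysis" concrete, here is a small Python sketch that wraps email addresses and URLs found by regular expressions in XML elements. It is an assumption for illustration only: the element names, the patterns and `tag_message` are hypothetical and do not describe what krugle or markmail actually do.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical patterns; the email alternative is tried first so user@host is claimed before the URL rule.
ENTITY_RE = re.compile(
    r"(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<url>https?://\S+|www\.\S+)"  # note: \S+ would also swallow trailing punctuation
)


def tag_message(text):
    """Wrap recognized emails and URLs in XML elements, escaping all other text."""
    parts = []
    pos = 0
    for m in ENTITY_RE.finditer(text):
        parts.append(escape(text[pos:m.start()]))
        kind = m.lastgroup  # name of the alternative that matched: "email" or "url"
        parts.append(f"<{kind}>{escape(m.group())}</{kind}>")
        pos = m.end()
    parts.append(escape(text[pos:]))
    return "<message>" + "".join(parts) + "</message>"


print(tag_message("Ask bob@example.org or see www.krugle.org & more"))
# <message>Ask <email>bob@example.org</email> or see <url>www.krugle.org</url> &amp; more</message>
```

Once text carries tags like these, a search front end can offer field-specific queries (for example, "messages that mention this URL") rather than only plain keyword matching.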

6 Identification of Text Content
Indexing: going beyond keywords to capture semantics
- Distributional Hypothesis: "a word is recognized by the company it keeps"
- Concept Discovery: extraction of a semantic category
  - e.g. "this article is about water, juice and Pepsi" => "drink"
  - Latent Semantic Analysis
  - Lexicons
  - Category trees, ontologies
  - WordNet, FrameNet, Semantic Web
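The "water, juice and Pepsi" => "drink" example can be read as finding the most specific category shared by all mentioned terms in a category tree. The Python sketch below does this with a lowest-common-ancestor walk. It assumes a tiny, hand-built, hypothetical tree (`PARENT`); a real system would draw on a lexicon or ontology such as WordNet.

```python
# Hypothetical category tree (child -> parent), for illustration only.
PARENT = {
    "water": "drink",
    "juice": "drink",
    "Pepsi": "soft drink",
    "soft drink": "drink",
    "drink": "food and drink",
    "bread": "food",
    "food": "food and drink",
}


def ancestors(term):
    """Return the term followed by its ancestors, nearest first."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def discover_concept(terms):
    """Return the most specific category shared by all terms, or None."""
    if not terms:
        return None
    paths = [ancestors(t) for t in terms]
    others = [set(p) for p in paths[1:]]
    for candidate in paths[0]:  # walk up from the first term, most specific first
        if all(candidate in s for s in others):
            return candidate
    return None


print(discover_concept(["water", "juice", "Pepsi"]))  # drink
print(discover_concept(["water", "bread"]))           # food and drink
```

Choosing the lowest shared ancestor keeps the label as specific as possible: "drink" rather than the broader "food and drink".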

7 Text Content: Named Entity Recognition (NER)
- People: people by, people from, people in, births, deaths, by occupation, surname, given name, biography, ...
- Organizations: company, team, business, media by, political party, club, union, newspaper, church, ...
- Places / Geopolitical Entities: city, town, village, state, province, country, territory, ...
- Money
- Vehicles
- Dates
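The entity types on this slide can be illustrated with a minimal rule-based recognizer: a gazetteer (list of known names) for people, organizations, places and vehicles, plus regular expressions for money and dates. This Python sketch is an assumption for illustration. The gazetteer entries, patterns and `recognize` are hypothetical, and practical NER systems use much larger resources or trained models.

```python
import re

# Hypothetical gazetteer of known names and their entity types.
GAZETTEER = {
    "Jim Critz": "PERSON",
    "University of New York in Prague": "ORGANIZATION",
    "Prague": "PLACE",
    "Skoda Octavia": "VEHICLE",
}
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
PATTERNS = [
    ("MONEY", re.compile(r"\$\d+(?:[.,]\d+)*|\b\d+(?:[.,]\d+)*\s?(?:CZK|EUR|USD)\b")),
    ("DATE", re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")),
]


def recognize(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    taken = set()  # character offsets already claimed by an entity

    def claim(start, end, surface, label):
        span = range(start, end)
        if not taken.intersection(span):  # skip entities nested in an earlier match
            taken.update(span)
            found.append((start, surface, label))

    # Longest names first, so "Prague" inside the university name is not tagged separately.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(re.escape(name), text):
            claim(m.start(), m.end(), name, GAZETTEER[name])
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            claim(m.start(), m.end(), m.group(), label)
    return [(surface, label) for _, surface, label in sorted(found)]


print(recognize("Jim Critz spoke at the University of New York in Prague "
                "on 12 December 2008; the fee was 500 CZK."))
```

Matching longer names first is one simple way to resolve overlaps. A statistical tagger would instead decide boundaries from context.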

8 Identification of Text Content
Indexing (cont'd)
- Latent Semantic Analysis: measures associations between words from their frequencies of occurrence within documents
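Latent Semantic Analysis builds a term-document count matrix A and reduces it with a truncated singular value decomposition, so that words used in similar documents receive similar low-dimensional vectors. Below is a minimal pure-Python sketch under stated assumptions: raw counts with no weighting (published LSA work usually applies a weighting such as log-entropy first), and the top singular directions found by power iteration on A A^T rather than with a linear-algebra library. The function names and toy corpus are hypothetical.

```python
import math


def term_document_matrix(docs):
    """Rows are terms (sorted), columns are documents, cells are raw counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[row[w]][j] += 1.0
    return vocab, A


def top_eigenpairs(S, k, iters=300):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(S)
    S = [r[:] for r in S]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(r, v)) for r in S]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        Sv = [sum(a * b for a, b in zip(r, v)) for r in S]
        lam = sum(a * b for a, b in zip(v, Sv))  # Rayleigh quotient = eigenvalue estimate
        pairs.append((lam, v))
        # Deflate: remove the found component so the next pass finds the next pair.
        S = [[S[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_term_vectors(docs, k=2):
    """Rank-k LSA term vectors U_k * Sigma_k, from the eigenpairs of A A^T."""
    vocab, A = term_document_matrix(docs)
    AAt = [[sum(x * y for x, y in zip(ri, rj)) for rj in A] for ri in A]
    pairs = top_eigenpairs(AAt, k)
    # Eigenvalues of A A^T are the squared singular values of A.
    vectors = {w: [math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
               for i, w in enumerate(vocab)}
    return vocab, A, vectors


def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / (nx * ny) if nx and ny else 0.0


docs = ["car engine wheel", "automobile engine wheel", "banana fruit", "apple fruit"]
vocab, A, vec = lsa_term_vectors(docs, k=2)
raw = {w: A[i] for i, w in enumerate(vocab)}
print(cosine(raw["car"], raw["automobile"]))            # 0.0
print(round(cosine(vec["car"], vec["automobile"]), 3))  # 1.0
```

In the toy corpus "car" and "automobile" never occur in the same document, so their raw document vectors are orthogonal. After the rank-2 reduction they point the same way because both share the contexts "engine" and "wheel".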

9 Text Content
Key Problems Addressed:
✔ Measure similarity in word meanings
✔ Classify relations between words
✔ Discover different senses of words
✔ Extract keywords from documents
Major Approaches to Identifying Meaning:
✔ Lexicon based: labor intensive
✔ Statistical semantics: focuses on the meanings of common words and the relations between them, discovered by applying algorithms to corpora
There is strong interest in incorporating learning mechanisms into the methods being used.
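For the "extract keywords from documents" problem, a common statistical baseline is tf-idf: a term scores highly if it is frequent in one document but rare across the collection. This Python sketch is an illustration of that baseline, not the author's method. It assumes whitespace tokenization, no stop-word list and the plain log(N/df) form of idf; `extract_keywords` is a hypothetical name.

```python
import math
from collections import Counter


def extract_keywords(docs, doc_index, n=3):
    """Return the n highest-scoring tf-idf terms of docs[doc_index]."""
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tokens = tokenized[doc_index]
    counts = Counter(tokens)
    scores = {
        term: (count / len(tokens)) * math.log(len(docs) / df[term])
        for term, count in counts.items()
    }
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


docs = [
    "the engine of the car",
    "the fruit of the tree",
    "the car and the tree",
]
print(extract_keywords(docs, 0, n=1))  # ['engine']
print(extract_keywords(docs, 1, n=1))  # ['fruit']
```

Because "the" occurs in every document, its idf is log(1) = 0, so it never ranks as a keyword despite being the most frequent word.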

10 Categorization and correlations of words

11 Identification of Text Content
Semantic Analysis
FrameNet -

12 Identification of Text Content
Semantic Analysis
FrameNet -

13 Text Summarization
Selection of specific sentences or phrases from the source document
Example: snippets in search results
- First sentence: assumed to give the user a basic orientation
- Whole or partial sentences containing the query keywords
- Shallow domain knowledge (such as a standard form) may guide selection and output
Traditional approach: the selected relevant segments are extracted
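The snippet heuristics above (first sentence, then sentences containing query keywords, then extraction) can be sketched in a few lines of Python. This is a simplified illustration under assumptions: a naive regular-expression sentence splitter, whole-word keyword matching, and hypothetical names `split_sentences` and `make_snippet`.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def make_snippet(text, query, max_sentences=2):
    """Extract the first sentence plus later sentences that contain query keywords."""
    sentences = split_sentences(text)
    keywords = {w.lower() for w in query.split()}
    chosen = [0] if sentences else []  # the first sentence orients the reader
    for i, sentence in enumerate(sentences[1:], start=1):
        words = set(re.findall(r"\w+", sentence.lower()))
        if keywords & words and len(chosen) < max_sentences:
            chosen.append(i)
    return " ... ".join(sentences[i] for i in chosen)


text = ("Text summarization selects sentences from a document. Many methods exist. "
        "Search engines show snippets with query keywords. Snippets help users judge relevance.")
print(make_snippet(text, "query keywords"))
# Text summarization selects sentences from a document. ... Search engines show snippets with query keywords.
```

The sentences are extracted verbatim, which matches the traditional approach on the slide: nothing is rewritten, only selected and joined.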

14 Text Summarization
A summary should accurately reflect the content and expression of the source. For less structured texts, there can therefore be significant variation. Compare:
- Form-based documents, e.g. insurance forms
- Free text: office memos, email, discussion threads
Good summarization of free text typically requires intensive human labor. Consider: summarizing discussion threads.
Basic problem: the emphasis is on identifying key content across multiple sentences, paragraphs and documents, rather than on having good representations of each individual sentence.
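One way to identify key content across many sentences and documents without modeling each sentence in depth is a simple word-frequency sentence scorer: words that recur throughout a thread are treated as central, and sentences dense in them are extracted. This Python sketch is a simplified take on classic frequency-based extraction, not a method from the slides. The stop-word list, the averaging score and `summarize` are assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "on", "we", "it", "for"}


def content_words(sentence):
    """Lowercased words of the sentence, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(documents, n=2):
    """Pick the n sentences whose content words are most frequent across all documents."""
    sentences = [s for doc in documents
                 for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average collection frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))[:n]
    return [sentences[i] for i in sorted(best)]  # keep source order so the extract reads naturally


thread = [
    "The server crashed on Monday. We restarted the server.",
    "The server crash was caused by a full disk.",
    "Lunch is at noon.",
]
print(summarize(thread, n=2))
# ['The server crashed on Monday.', 'We restarted the server.']
```

Averaging rather than summing the word frequencies keeps long sentences from winning just by length. Note that "crashed" and "crash" are counted separately here because there is no stemming.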

15 Text Summarization
Example of domain knowledge: the co-occurrence of "unidentified assailant" and "terrorist attack" leads to the assumption that the assailant carried out the attack.
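A rule of this kind can be encoded as "if these cue phrases co-occur, assert this relation". The Python sketch below assumes that co-occurrence means "in the same sentence" (the slide does not specify a scope) and uses a hypothetical rule table and function name `infer_relations`.

```python
import re

# Hypothetical domain rules: if all cue phrases occur in one sentence,
# assert the (subject, relation, object) triple.
RULES = [
    (("unidentified assailant", "terrorist attack"),
     ("unidentified assailant", "performed", "terrorist attack")),
]


def infer_relations(text, rules=RULES):
    """Apply co-occurrence rules sentence by sentence and return inferred triples."""
    inferred = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        for cues, triple in rules:
            if all(cue in sentence for cue in cues) and triple not in inferred:
                inferred.append(triple)
    return inferred


report = ("Police are searching for an unidentified assailant after the terrorist attack. "
          "The station was closed.")
print(infer_relations(report))
# [('unidentified assailant', 'performed', 'terrorist attack')]
```

Widening the scope to a paragraph or whole document would fire the rule more often, at the cost of more false inferences.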

16 Text Summarization
Process:
- Categorization (key classification terms)
- Identification of the specific domain(s)
- Patterns (e.g. scripts: joint venture creation; marketing campaign; steps in argumentation)
- Restatement of content
Therefore: shallow statistical processing, lexicons, shallow parsing, semantic webs, and natural language generation.
A user profile (query keywords, domain expertise, etc.) may constrain the output.
Challenge
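The slide 16 process (categorization, patterns such as a "joint venture creation" script, restatement of content) can be illustrated with a toy script: a trigger decides whether the script applies, regular expressions fill its slots, and a template restates the result as one generated sentence. Everything in this Python sketch is hypothetical: the script definition, the slot patterns, the company names and `fill_script`. Real systems would use shallow parsing instead of these regular expressions.

```python
import re

# Hypothetical "joint venture creation" script: trigger, slot patterns and restatement template.
JOINT_VENTURE = {
    "trigger": re.compile(r"\bjoint venture\b", re.IGNORECASE),
    "slots": {
        "partners": re.compile(r"([A-Z]\w+) and ([A-Z]\w+) (?:formed|announced|created)"),
        "location": re.compile(r"\bin ([A-Z]\w+)"),
    },
    "template": "{p1} and {p2} created a joint venture in {location}.",
}


def fill_script(text, script=JOINT_VENTURE):
    """Return a one-sentence restatement if the script applies and all slots fill."""
    if not script["trigger"].search(text):
        return None  # categorization step: this script does not apply
    partners = script["slots"]["partners"].search(text)
    location = script["slots"]["location"].search(text)
    if not (partners and location):
        return None  # pattern incomplete: do not guess missing slots
    return script["template"].format(
        p1=partners.group(1), p2=partners.group(2), location=location.group(1))


print(fill_script("Acme and Globex announced a joint venture in Prague to build software."))
# Acme and Globex created a joint venture in Prague.
```

Returning None when a slot is missing keeps the restatement faithful to the source, in line with the requirement that a summary accurately reflect the source content.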

17 Thank you for your attention!
