1 Natural Language Processing: Overview & Current Applications
Noriko Tomuro April 7, 2006

2 What is NLP? Natural Language Processing (NLP) is a field in Computer Science devoted to creating computers that use natural language as input and/or output.

3 NLP involves other disciplines
- Linguistics (NLP is also called "Computational Linguistics")
- Psychology
- Mathematics and Statistics
- Information Theory

4 Machines that Can Speak (1)
HAL 9000 in “2001: A Space Odyssey”

5 Machines that Can Speak (2)
C3PO in Star Wars KITT in Knight Rider

6 But Still Sci-Fi… "However, as 2001 approached it became clear that 2001's predictions in computer technology were far fetched. Natural language, lip reading, planning and plain common sense in computers were still the stuff of science fiction." – Wikipedia article on HAL 9000

7 NLP is AI-complete “The most difficult problems in AI manifest themselves in human language phenomena.” Use of language is the touchstone of intelligent behavior.

8 Test for Intelligence – Turing Test
Alan Turing (1950) proposed a test of a machine's capability to perform human-like conversation. A human judge engages in a natural language conversation with two other parties, one a human and the other a machine; if the judge cannot reliably tell which is which, then the machine is said to pass the test.

9 Early Conversational Programs
ELIZA (by Joseph Weizenbaum, 1966): a simulated psychotherapist. No real understanding; simple pattern-matching to respond to user input, e.g.:
- (my ?x depresses me) → (why does your ?x depress you)  5
- (life ?x) → (why do you say it ?x)  3
- (I could ?x) → (you could ?x)  2
- (because ?x) → (that is a good reason)  3
- (?x) → (tell me more)  0
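
A minimal sketch of this style of pattern matching (illustrative only; the rules and the respond helper below are not Weizenbaum's actual code):

```python
import re

# Toy ELIZA-style rules: (pattern, response template) pairs, tried in order.
# These patterns and templates are illustrative, not the original ELIZA script.
RULES = [
    (r"my (.*) depresses me", "Why does your {0} depress you?"),
    (r"I could (.*)", "You could {0}."),
    (r"because (.*)", "That is a good reason."),
    (r"(.*)", "Tell me more."),
]

def respond(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    for pattern, template in RULES:
        m = re.search(pattern, utterance, re.IGNORECASE)
        if m:
            return template.format(*m.groups())
    return "Tell me more."

if __name__ == "__main__":
    print(respond("My job depresses me"))    # Why does your job depress you?
    print(respond("I could go for a walk"))  # You could go for a walk.
```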

10 Loebner Prize & Chatterbots/Chatbots (1)
In 1990, Hugh Loebner started an annual Turing Test competition. Conversational programs are nowadays called chatterbots (or chatbots). A prize of $100,000 goes to the first bot that judges cannot distinguish from a real human in a Turing test that includes text, visual, and auditory input; the competition dissolves once that prize is won.

11 Loebner Prize & Chatterbots/Chatbots (2)
Nobody has won the prize yet.

12 NLP Applications
NLP can be stand-alone applications or components embedded in other systems. Some current stand-alone NLP applications are discussed in the following slides. NLP components/tasks include (see the sketch below):
- Part-of-speech tagging
- Named Entity identification
- Chunking (partial parsing)
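
As a concrete illustration of these component tasks (not from the original slides), here is a minimal sketch using the NLTK toolkit, assuming its default tokenizer, tagger, and named-entity models are installed:

```python
import nltk

# One-time downloads of the default models (assumed available):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
# nltk.download("maxent_ne_chunker"); nltk.download("words")

sentence = ("Saddam Hussein has dismissed evidence suggesting "
            "he approved the execution of people under 18.")

tokens = nltk.word_tokenize(sentence)   # tokenization
tagged = nltk.pos_tag(tokens)           # part-of-speech tagging
tree = nltk.ne_chunk(tagged)            # named-entity chunking (a partial parse)

print(tagged[:4])  # e.g. [('Saddam', 'NNP'), ('Hussein', 'NNP'), ('has', 'VBZ'), ...]
print(tree)        # named-entity chunks such as (PERSON Saddam/NNP Hussein/NNP)
```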

13 1. Machine Translation (MT)
One of the very earliest pursuits in computer science (after WWII). Broad application domain – military to literary to search engines. Basic approaches:
- Inter-lingual (rule-based)
- Direct translation (corpus-based) → more popular these days

14 Let's translate! Google includes an MT engine (based on the SYSTRAN system developed in the EC). Let's translate "Saddam Hussein has dismissed evidence suggesting he approved the execution of people under 18." (BBC world news, April 5, 2006)

15 2. Text Summarization Create a summary of a text or texts.
Many difficult problems, including:
- Paraphrases
- Anaphora (e.g. "it", "they")

16 3. Question Answering Finds an answer (not a document) to a question typed in as a natural language sentence (not keywords). Most systems can only answer simple, Trivial Pursuit-type questions.
- Ask.com
- FAQFinder
- Some search engines perform limited, phrase-based Q&A, e.g. Google

17 Let's ask questions!
- "Who wrote 'The Da Vinci Code'?"
- "What is the longest river in Australia?"
- "What is the name of the main character in James Joyce's 'Ulysses'?"

18–20 (slides with no transcribed text)

21 4. Analyzing Web Documents
Recently there have been many NLP applications that analyze (not just retrieve) web documents:
- Blogs – semantic analysis, sentiment (polarity/opinion) identification
- Spam filtering – most systems utilize simple word probabilities (see the sketch below)
- A general approach, "Web as a corpus" – treating the web as a vast collection of documents
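
To make the "simple word probability" approach to spam filtering concrete, here is a minimal Naive Bayes sketch; the tiny training set and function names are invented for illustration:

```python
import math
from collections import Counter

# Tiny invented training set: (label, text) pairs.
TRAIN = [
    ("spam", "win money now"),
    ("spam", "cheap money offer"),
    ("ham",  "meeting schedule for monday"),
    ("ham",  "lunch on monday"),
]

def train(examples):
    """Count word frequencies per class and how often each class occurs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for label, text in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Pick the class with the highest log-probability (add-one smoothing)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

word_counts, class_counts = train(TRAIN)
print(classify("cheap money now", word_counts, class_counts))  # likely "spam"
```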

22 5. Tutoring Systems We'll hear from Peter shortly.

23 Why is NLP so hard?
Understanding natural languages is hard … because of inherent ambiguity.
Engineering NLP systems is also hard … because of:
- the huge amount of data resources needed (e.g. grammars, dictionaries, documents to extract statistics from)
- the computational complexity (intractability) of analyzing a sentence

24 Ambiguity (1) "At last, a computer that understands you like your mother."
Three possible (syntactic) interpretations:
1. It understands you as well as your mother understands you.
2. It understands that you like your mother.
3. It understands you as well as it understands your mother.

25 Ambiguity (2) At the acoustic level:
- ".. computer that understands you like your mother"
- ".. computer that understands you lie cured mother"
At the semantic level, a "mother" is:
- a female parent?
- a cask used in vinegar-making?

26 Ambiguity (3) Word segmentation, e.g. 社長兼業務部長. Possibilities include:
- 社長 兼 業務 部長 (president / both / business / general-manager)
- 社長 兼業 務 部長 (president / multi-business / Tsutomu / general-manager)
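
One common heuristic for word segmentation is greedy longest-match against a dictionary; the sketch below is illustrative only, with a toy dictionary built from the morphemes in the example above:

```python
# Greedy longest-match (maximum matching) word segmentation.
# Toy dictionary built from the morphemes in the example above.
DICTIONARY = {"社長", "兼", "兼業", "業務", "務", "部長"}

def segment(text: str, dictionary=DICTIONARY):
    """Repeatedly take the longest dictionary word starting at the current position."""
    words = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary or length == 1:
                words.append(candidate)
                i += length
                break
    return words

print(segment("社長兼業務部長"))
# Greedy longest match prefers 兼業 over 兼 here, yielding the second reading:
# ['社長', '兼業', '務', '部長']
```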

27 Ambiguity – summary
Ambiguity occurs at different levels. To resolve ambiguities, deep linguistic (as well as common-sense) knowledge is required. Two immediate ramifications:
- Knowledge bottleneck: how do we acquire and encode ALL such information?
- Computational complexity: O(c^n), i.e. exponential in the length n of the sentence (a small illustration follows)
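
As a small illustration of this blow-up (not from the original slides): even counting only binary-branching bracketings of an n-word sentence, ignoring the grammar entirely, the number of possibilities is the Catalan number C(n-1), which grows exponentially in n:

```python
from math import comb

def catalan(n: int) -> int:
    """n-th Catalan number: the number of binary bracketings of n+1 words."""
    return comb(2 * n, n) // (n + 1)

for words in (5, 10, 20):
    print(words, "words ->", catalan(words - 1), "binary bracketings")
# 5 words -> 14, 10 words -> 4862, 20 words -> 1767263190
```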

28 The Bottom Line Complete NL understanding (and thus general intelligence) is impossible, but we can make incremental progress, and we have had successes in limited domains. [But NLP is costly – lots of work and resources are needed, and the return is sometimes not worth it.]

29 Sentence Analysis "John ate the cake"
Syntactic structure: (S (NP "John") (V "ate") (NP "the cake"))
Semantic structure: (ACTION ingest (ACTOR John-1) (PATIENT food))

30 Syntactic Parsing The process of deriving the phrase structure of a sentence is called "parsing". The structure (often represented by a Context-Free parse tree) is based on the grammar.
Parse tree for "John ate the cake":
(S (NP "John") (VP (VG (V "ate")) (NP (Det "the") (N "cake"))))
Grammar (see the runnable sketch below):
R0: S → NP VP
R1: VP → VG NP
R2: VG → V
R3: NP → Det N
R4: NP → "John"
R5: V → "ate"
R6: Det → "the"
R7: N → "cake"
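
A minimal runnable sketch of this grammar and parse, using NLTK (the choice of toolkit is an assumption, not something the slides prescribe):

```python
import nltk

# The toy grammar from the slide, written in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    VP  -> VG NP
    VG  -> V
    NP  -> Det N
    NP  -> 'John'
    V   -> 'ate'
    Det -> 'the'
    N   -> 'cake'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John ate the cake".split()):
    print(tree)  # (S (NP John) (VP (VG (V ate)) (NP (Det the) (N cake))))
```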

31 Parsing Algorithms
- Top-down parsing – (top-down) derivation
- Bottom-up parsing
- Chart parsing
  - Earley's algorithm – most efficient, O(n^3)
  - Left-corner parsing – an optimization of Earley's
… and lots more

32 (Bottom-up) Chart Parsing
"John ate the cake": the chart is built by a sequence of shift and reduce steps against the grammar, over word positions 0–4.

33 Earley's Algorithm
"John ate the cake": the parse is built by repeated predictor, scanner, and completor steps over chart positions 0–4.
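
For comparison with the demo parsers, a hedged sketch running NLTK's Earley-style chart parser on the same toy grammar; the trace output shows the chart edges NLTK adds, rather than the slide's diagram:

```python
import nltk
from nltk.parse import EarleyChartParser

# Same toy grammar as in the earlier sketch.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    VP  -> VG NP
    VG  -> V
    NP  -> Det N
    NP  -> 'John'
    V   -> 'ate'
    Det -> 'the'
    N   -> 'cake'
""")

# trace=1 prints the chart edges as they are added to the chart.
parser = EarleyChartParser(grammar, trace=1)
for tree in parser.parse("John ate the cake".split()):
    print(tree)
```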

34 Demo using my CF parser

35 Probabilistic Parsing
For ambiguous sentences, we'd like to know which parse tree is more likely than others. So we must assign a probability to each parse tree … but how? The probability of a parse tree t is p(t) = ∏_{r ∈ t} p(r), where r ranges over the rules used in t and p(r) is estimated from an (annotated) corpus.
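
A minimal sketch of this computation; the rule probabilities below are invented for illustration, standing in for relative frequencies counted from a treebank:

```python
from math import prod

# Invented rule probabilities p(r); in practice these are relative
# frequencies of rules observed in an annotated corpus (treebank).
RULE_PROB = {
    "S -> NP VP":   1.0,
    "VP -> V NP":   0.7,
    "NP -> Det N":  0.5,
    "NP -> 'John'": 0.2,
    "V -> 'ate'":   0.3,
    "Det -> 'the'": 0.6,
    "N -> 'cake'":  0.1,
}

def tree_probability(rules_used):
    """p(t) = product of p(r) over the rules r used to build tree t."""
    return prod(RULE_PROB[r] for r in rules_used)

# Rules used in one parse of "John ate the cake":
parse_rules = ["S -> NP VP", "NP -> 'John'", "VP -> V NP",
               "V -> 'ate'", "NP -> Det N", "Det -> 'the'", "N -> 'cake'"]
print(tree_probability(parse_rules))  # 1.0 * 0.2 * 0.7 * 0.3 * 0.5 * 0.6 * 0.1 = 0.00126
```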

36 Partial Parsing Parsing fails when the coverage of the grammar is not complete – but it’s almost impossible to write out all legal syntax (without accepting ungrammatical sentences). We’d like to at least get pieces even when full parsing fails. Why not abandon full parsing and aim for partial parsing from the start…
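
A minimal sketch of partial parsing (chunking) with NLTK's regular-expression chunker; the NP-chunk pattern is a common textbook example, not something taken from the slides:

```python
import nltk

# A simple NP-chunk grammar: an optional determiner, any adjectives, then nouns.
# (Assumes NLTK's default tokenizer and tagger models are installed.)
chunk_grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(chunk_grammar)

sentence = "John ate the delicious chocolate cake"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
print(chunker.parse(tagged))
# Even without attempting a full S parse, the NP chunks
# (e.g. "the delicious chocolate cake") are still recovered.
```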

37 Semantic Analysis (1) Derive the meaning of a sentence.
Often applied on the result of syntactic analysis.
"John ate the cake."  (NP V NP)
((action INGEST)   ; syntactic verb
 (actor JOHN-01)   ; syntactic subj
 (object FOOD))    ; syntactic obj
To do semantic analysis, we need a (semantic) dictionary (e.g. WordNet, …).

38 Semantic Analysis (2) Semantics is a double-edged sword…
Can resolve syntactic ambiguity:
- "I saw a man on the hill with a telescope"
- "I saw a man on the hill with a hat"
But introduces semantic ambiguity:
- "She walked towards the bank" (see the sense listing below)
But in human sentence processing, we seem to resolve both types of ambiguities simultaneously (and in linear time)…
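
To make the sense ambiguity concrete, a small sketch listing the WordNet senses of "bank" via NLTK's WordNet interface (assuming the WordNet corpus is installed):

```python
from nltk.corpus import wordnet as wn

# Each synset is one possible sense that a semantic analyzer must choose among.
for synset in wn.synsets("bank", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())
# e.g. bank.n.01 - sloping land (especially the slope beside a body of water)
#      depository_financial_institution.n.01 - a financial institution that accepts deposits ...
```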

39 Demo using my Unification parser

