Natural Language Processing

1 Natural Language Processing. A language is defined as a set of strings, without reference to any world being described or task to be performed. By studying the language, knowledge about the world is acquired. Acquisition can take the form of written text, speech/voice, images/patterns, etc. Natural language means a native language such as Hindi, English, French or Urdu. The requirement for an NLP machine is to “Generate, Understand and Translate” such language. By: Anuj Khanna (Asst. Prof.) www.uptunotes.com

2 State of the Art. NLP includes both understanding and generation. It is a subfield of AI and linguistics that deals with “problems of automated generation and understanding of language”. Generation: conversion of computer database information into normal-sounding human language. Understanding: samples of human language are converted to more formal representations that are easier for computer programs to manipulate. “NLU is an AI-complete problem.” “Definition of understanding is a major problem in an NLP system.” To understand something is to transform one representation into another.

3 The entire NLP problem can be subdivided into: (i) Processing of written text, using lexical, syntactic and semantic knowledge of the language as well as real-world information. (ii) Processing of spoken language, using all the information needed above plus additional knowledge of phonology and ambiguity resolution. The idea is to control a machine by talking to it in our native language in an interactive manner. This first requires finding the underlying task and goal. “Natural language is ambiguous, so it leads to difficulty in processing at various levels of the knowledge domain.” To date, human linguistic communication takes place mostly in spoken form rather than as written text.

4 NLP methodology and its problem domain have attracted researchers and educationalists from different areas and disciplines, such as: classical and computational linguistics; computer science and engineering; psycholinguistics; statistics. “Open-domain question answering is required. Multi-document summarization and information interaction are required in a wide variety of languages.” Current problems are: 1. Ambiguity at the written as well as the speech level. 2. Discourse analysis. 3. Generation at various degrees of complexity in an intelligent system. 4. Knowledge acquisition methods to incorporate data into WordNet, lexicon methodology, and KB systems for multilingual text classification and hyperlinking.

5 Natural language constructs are made up of an infinite number of sentences, so there is much ambiguity in natural language constructs. Levels of ambiguity: 1. Syntactic ambiguity: syntax relates to the structure of language, i.e. how the words are put together. “There can be more than one correct interpretation of the same sentence.” E.g.: “I hit the man with the hammer.” Was the hammer the weapon used, or was it in the hand of the victim? E.g.: a word's syntactic category can also be ambiguous: “back” can be an adverb (go back), an adjective (back door), a noun (the back of the room) or a verb (back up your files).

6 2. Lexical ambiguity: ambiguity in lexemes, i.e. words having more than one meaning. E.g.: “I went to the bank.” Is the bank a financial organization or a river bank? 3. Referential ambiguity: concerned with what the sentence refers to; it may refer to more than one thing. E.g.: “Ram killed Ravana because he liked Sita.” Who liked Sita, Ram or Ravana? 4. Semantic ambiguity: ambiguity in the meaning associated with a single sentence. E.g.: “He saw her duck.” Did he see her dip down, or did he see her web-footed bird? Semantic ambiguity can also occur when there is no lexical or syntactic ambiguity. E.g.: a “cat person” can be someone who likes felines, or it may be the lead of the movie “Attack of the Cat People”.

7 5. Pragmatic ambiguity: ambiguity at the level of interpretation within context, i.e. the same word or phrase may be interpreted differently in two distinct contexts or situations. E.g.: “I went to the doctor yesterday.” Here “yesterday” depends on the context, namely when the sentence was spoken. Example: (i) I waited for a long time at the bank. (ii) There is a drought because it hasn’t rained for a long time. (iii) Dinosaurs have been extinct for a long time. “In the above three sentences the phrase ‘a long time’ refers to different time intervals depending on the context.”

8 Levels of Knowledge Used in NLU. Phonological knowledge: “A phoneme is the smallest unit of sound and relates to the sound of a word.” This may lead to phonetic ambiguity in a speech recognition system, because people from different regions speak with different accents. Morphological knowledge: word construction from morphemes. Syntactic knowledge: how words are arranged together to form a coherent, grammatically correct sentence. Semantic knowledge: relates to the meaning of words and phrases and how they combine to form a meaningful sentence. Pragmatic knowledge: relates to the use of sentences in different contexts and how context affects the meaning of a sentence. World knowledge: the language and background a user must have to carry out a conversation.

9 Computational Model of Language Processing. ** Noam Chomsky developed an influential formal theory of language (generative grammar). ** He designed the Chomsky classification of grammars (the Chomsky hierarchy). Stages of analysis: syntactic analysis; semantic analysis; pragmatic analysis; morphological analysis; discourse integration. ** A discourse is any string of language, usually one that is more than one sentence long, e.g. textbooks, novels, web pages, weather reports. The meaning of a sentence may depend on preceding as well as upcoming words and phrases.

10 E.g.: “Ram wanted it.” ** In this sentence “it” depends on the prior discourse, for example a car which Ram wants to purchase. ** In a following sentence such as “He purchased the car”, “he” refers back to Ram in the previous sentence. Note: this type of interpretation concerns a pronoun or definite noun phrase which refers to a world object, entity or agent. Choosing the best referent is a process of disambiguation that combines syntactic, semantic and pragmatic information. Pronouns must agree in gender and number with their antecedents: “he” can refer to Bobby, not Arisha; “they” can refer to a group, not a single person.
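
The agreement constraint on this slide can be sketched as a simple filter. This is only an illustration: the `Entity` record, the `PRONOUNS` feature table and the sample discourse entities are assumptions made for the example, not part of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    gender: str  # "m", "f" or "n" (hypothetical feature codes)
    number: str  # "sg" or "pl"


# Hypothetical feature table for a few English pronouns.
# A gender of None means the pronoun does not constrain gender.
PRONOUNS = {
    "he": ("m", "sg"),
    "she": ("f", "sg"),
    "it": ("n", "sg"),
    "they": (None, "pl"),
}


def compatible_antecedents(pronoun, entities):
    """Return the entities that agree with the pronoun in gender and number."""
    gender, number = PRONOUNS[pronoun.lower()]
    return [
        e for e in entities
        if e.number == number and (gender is None or e.gender == gender)
    ]


discourse = [
    Entity("Bobby", "m", "sg"),
    Entity("Arisha", "f", "sg"),
    Entity("the students", "n", "pl"),
]
```

Agreement only narrows the candidate set; when several entities survive the filter, further syntactic, semantic and pragmatic information is still needed to choose among them.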

11 An Example. “Arisha dropped the cup on the plate. It broke.” Taken alone, this poses a problem: it is not clear whether the cup or the plate is the referent of “it” (ambiguity at the referential level). Now consider a larger context: “Arisha was fond of the blue cup. The cup was presented to her by her mother. Unfortunately, one day while washing utensils, Arisha dropped the cup on the plate. It broke.” Here the cup is the focus of attention and hence is the referent (ambiguity resolved).
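
One crude way to approximate “focus of attention” is to count how often each candidate is mentioned in the preceding context. The sketch below assumes that heuristic (it is not stated in the slides) and uses a hypothetical helper `pick_referent`.

```python
import re
from collections import Counter


def pick_referent(context, candidates):
    """Pick the candidate mentioned most often in the context.

    Ties are broken in favour of the candidate listed first.
    """
    words = Counter(re.findall(r"[a-z]+", context.lower()))
    return max(candidates, key=lambda c: words[c])


context = (
    "Arisha was fond of the blue cup. "
    "The cup was presented to her by her mother. "
    "Unfortunately, one day while washing utensils, "
    "Arisha dropped the cup on the plate."
)
referent = pick_referent(context, ["cup", "plate"])
```

Mention counting is only a stand-in for discourse focus; real systems also weigh recency, grammatical role and world knowledge.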

12 Parsing / Syntax Analysis. Two components: (i) a declarative representation, called a grammar, of syntactic facts about the language; (ii) a procedure, called a parser, that compares the grammar against input sentences to produce a parsed structure. Formal language: “a (possibly infinite) set of strings”. Each string is a concatenation of terminal symbols, also called words. E.g.: Java, first-order predicate logic, C, C++. These languages have strict mathematical definitions, unlike natural languages such as Hindi and English. Formal grammar: G = { V, T, S, P }. V is the set of variables or non-terminals, usually written in upper case. T is the finite set of terminals, lexemes or tokens, usually written in lower case. S is the start symbol of the grammar rules. P is the set of productions, i.e. rewrite rules such as S → NP VP in the example grammar.

13 Key Points for Natural Language Grammar (e.g. English). Most grammar rule formalisms are based on the idea of phrase structure, i.e. strings are composed of substrings called phrases. Examples: noun phrase (NP), verb phrase (VP), prepositional phrase (PP), adverb phrase (ADVP). Here NP, VP, PP and ADVP are all non-terminals (variables) of the formal grammar for an English sentence. Other non-terminals can be noun (N), verb (V), preposition (P), article (ART) and determiner (DET, e.g. a, an, the). ART and DET can be used interchangeably. Terminals (lexemes, tokens) are words such as: a, an, the, Ram, Joseph, run, upon, into, put, good, long, very, fast, and so on; the list is open-ended.

14 Example: “Joseph ate the chicken”. Grammar rules of G:
S → NP VP
NP → N | ART N
PP → PREP NP
VP → V | V NP | V NP PP | V PP
N → Ram | Joseph | tree | tea | road | chicken
V → ate | walk | drink | sit
AUXV → is | am | are | was | were
PREP → with | under | into | on
ART → a | an | the
V = { S, NP, VP, PP, PREP, ART, N, V, AUXV } is the set of non-terminals. The terminals used by this sentence are T = { Joseph, ate, the, chicken }; the full terminal set is every word on the right-hand side of the N, V, AUXV, PREP and ART rules. S is the start symbol of grammar G.
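
The grammar G can be written down as a plain data structure. The sketch below assumes that an NP may also be a bare noun and that a VP may be V NP, since the derivation of “Joseph ate the chicken” needs both; the helper `in_vocabulary` is a hypothetical name introduced for illustration.

```python
# Each non-terminal maps to a list of alternatives; each alternative is a
# list of symbols. NP -> N and VP -> V NP are assumptions needed to derive
# "Joseph ate the chicken".
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"], ["N"]],
    "PP": [["PREP", "NP"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"], ["V", "PP"], ["V"]],
    "N": [["Ram"], ["Joseph"], ["tree"], ["tea"], ["road"], ["chicken"]],
    "V": [["ate"], ["walk"], ["drink"], ["sit"]],
    "AUXV": [["is"], ["am"], ["are"], ["was"], ["were"]],
    "PREP": [["with"], ["under"], ["into"], ["on"]],
    "ART": [["a"], ["an"], ["the"]],
}
START = "S"

# Non-terminals are the left-hand sides; terminals are every right-hand-side
# symbol that never appears as a left-hand side.
non_terminals = set(GRAMMAR)
terminals = {
    symbol
    for alternatives in GRAMMAR.values()
    for rhs in alternatives
    for symbol in rhs
    if symbol not in GRAMMAR
}


def in_vocabulary(sentence):
    """Check that every word of the sentence is a terminal of the grammar."""
    return all(word in terminals for word in sentence.split())
```

Deriving the terminal set from the rules, rather than listing it by hand, keeps V and T consistent with P whenever a rule is added.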

15 Parsing Techniques
Top-down parsing: S → NP VP → N VP → Joseph VP → Joseph V NP → Joseph ate NP → Joseph ate ART N → Joseph ate the N → Joseph ate the chicken
Bottom-up parsing: Joseph ate the chicken → N ate the chicken → N V the chicken → N V ART chicken → N V ART N → NP V NP → NP VP → S
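
A top-down parse can be sketched as a small backtracking recursive-descent parser. This is an illustrative sketch rather than the method used in the slides; it assumes the grammar of the previous slide with NP → N and VP → V NP included, and the function names `expand`, `expand_sequence` and `parse` are hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"], ["N"]],
    "PP": [["PREP", "NP"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"], ["V", "PP"], ["V"]],
    "N": [["Ram"], ["Joseph"], ["tree"], ["tea"], ["road"], ["chicken"]],
    "V": [["ate"], ["walk"], ["drink"], ["sit"]],
    "PREP": [["with"], ["under"], ["into"], ["on"]],
    "ART": [["a"], ["an"], ["the"]],
}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for each way symbol derives words[pos:next_pos]."""
    if symbol not in GRAMMAR:  # terminal: must match the next input word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # non-terminal: try each production in turn
        for children, end in expand_sequence(rhs, words, pos):
            yield (symbol, children), end


def expand_sequence(symbols, words, pos):
    """Yield (list_of_trees, next_pos) for a sequence of grammar symbols."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in expand(symbols[0], words, pos):
        for rest, end in expand_sequence(symbols[1:], words, mid):
            yield [tree] + rest, end


def parse(sentence, start="S"):
    """Return the first parse tree covering the whole sentence, or None."""
    words = sentence.split()
    for tree, end in expand(start, words, 0):
        if end == len(words):
            return tree
    return None
```

The parser starts from S and expands non-terminals until it reaches words, which mirrors the top-down derivation. It terminates here because the grammar has no left-recursive rule; a left-recursive grammar would need a different strategy, such as a chart parser.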

16 Input string → Parser (consulting the Lexicon) → Output representation structure. To find the meaning of a word, the parser accesses the lexicon. While selecting a word from the input stream, the parser locates the word in the lexicon and extracts the possible meanings, attributes, syntax and semantics of that word.

17 “A lexicon is a dictionary of words (such as morphemes, tokens, lexemes, phonemes) containing syntactic, semantic and pragmatic knowledge.” The organization and entries of lexicons vary from one implementation to another. A lexicon is usually made up of variable-length data structures such as lists or dynamic arrays, arranged in alphabetical order. Because some words are used very frequently (e.g. a, an, the, to, by, of, from), the lists can be initialized with these words to minimize the search time for locating lexemes. Access to words can be facilitated by indexing, binary search or hashing.
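
Two of the access methods named above, hashing and binary search, can be sketched with the Python standard library. The lexicon entries below are hypothetical and only illustrate the shape of the data.

```python
from bisect import bisect_left

# Hypothetical lexicon entries: word -> (part of speech, gloss).
ENTRIES = [
    ("the", ("ART", "definite article")),
    ("bank", ("N", "financial institution; side of a river")),
    ("ate", ("V", "past tense of eat")),
    ("chicken", ("N", "a domestic fowl")),
]

# Hashing: a dict gives average constant-time lookup by word.
hash_lexicon = dict(ENTRIES)

# Binary search: keep entries in alphabetical order and bisect on the word.
sorted_entries = sorted(ENTRIES, key=lambda entry: entry[0])
sorted_words = [word for word, _ in sorted_entries]


def binary_lookup(word):
    """Return the entry for word using binary search, or None if absent."""
    i = bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return sorted_entries[i][1]
    return None
```

Hashing is usually faster for exact-match lookup, while the sorted list also supports prefix and range queries, which can matter for morphological analysis.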

18 Knowledge-Based System Approaches in NLP. 1. SHRDLU: a system developed by Winograd at MIT around 1970. It controls a robot in a restricted “blocks” domain containing a number of blocks of various shapes, sizes, colors and textures. The robot can manipulate the blocks world according to instructions given in natural language. Example instructions: 1. Find a block which is taller than the one you are holding and place it in the box. (Referential ambiguity: what does “it” refer to?) 2. How many blocks are on top of the green block? (Semantic ambiguity) 3. Put the red pyramid on the block in the box. (Syntactic ambiguity: either the block is in the box, or the red pyramid is.)

19 2. Information Matching and Extraction. Knowledge-based extraction and machine learning methods are deployed for rapid prototyping and for incorporating data acquisition. A set of events, objects and their attributes builds a world model. The system supports inheritance and transforms the world model into a discourse model specific to a particular text. 3. Machine Translation. It began in the 1950s, following Warren Weaver's 1949 memorandum on translation; the Georgetown-IBM experiment of 1954 translated Russian sentences into English. IBM later introduced a statistical approach to language and parameter estimation in machine translation through mathematical models, e.g. the Hidden Markov Model (HMM), the Boolean keyword model, and probabilistic models based on Bayesian classification.

20 Machine Translation Approaches: direct machine translation; rule-based translation; corpus-based translation; knowledge-based translation; transfer-based machine translation; interlingua-based machine translation.

21 Direct Machine Translation. This carries out word-by-word translation with the help of a bilingual dictionary, usually followed by some syntactic rearrangement. A monolithic approach is followed, i.e. “consider all the details of one language pair”. Little analysis of the source text is required, and no parsing. Pipeline: source text → morphological analysis → lexical transfer using bilingual dictionary → local reordering → target language text.
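
The direct-translation pipeline can be sketched in a few lines. Everything specific here is an assumption for illustration: the tiny English to French dictionary `BILINGUAL`, the choice of “la” for “the” (which presumes a feminine noun), and the single reordering rule.

```python
# Hypothetical English -> French dictionary: word -> (translation, POS tag).
BILINGUAL = {
    "the": ("la", "ART"),  # assumes a feminine noun follows
    "red": ("rouge", "ADJ"),
    "car": ("voiture", "N"),
    "is": ("est", "V"),
    "fast": ("rapide", "ADJ"),
}


def direct_translate(sentence):
    """Word-by-word translation followed by one local reordering rule."""
    # 1. Morphological step, reduced here to lowercasing and splitting.
    words = sentence.lower().split()
    # 2. Lexical transfer; unknown words pass through unchanged.
    transferred = [BILINGUAL.get(w, (w, "UNK")) for w in words]
    # 3. Local reordering: French usually places the adjective after the noun.
    i = 0
    while i < len(transferred) - 1:
        if transferred[i][1] == "ADJ" and transferred[i + 1][1] == "N":
            transferred[i], transferred[i + 1] = transferred[i + 1], transferred[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in transferred)
```

For example, `direct_translate("The red car is fast")` gives "la voiture rouge est rapide". The sketch also shows the weakness of the approach: the article is fixed in the dictionary rather than chosen by agreement, because no parsing is done.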

22 Corpus-Based Machine Translation (CBMT). Also called data-driven translation. It overcomes the problem of knowledge acquisition in rule-based machine translation (RBMT). It uses a bilingual parallel corpus to obtain knowledge for new incoming translations. It is fully automated, with less human intervention than RBMT. Statistical Machine Translation (SMT): uses a bilingual corpus to learn translation models, and a monolingual corpus to learn the grammar of the target language. SMT models are trained on a sentence-aligned translation corpus, based on: 1) n-gram modeling and 2) the probability distribution of a given language pair in a very large corpus.
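
The n-gram modeling mentioned above can be illustrated with a bigram language model estimated from counts. The three-sentence corpus is hypothetical, and the estimates are plain relative frequencies without smoothing, which a practical system would add.

```python
from collections import Counter

# Hypothetical monolingual training corpus.
corpus = [
    "joseph ate the chicken",
    "ram ate the rice",
    "joseph drank the tea",
]

context_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    # "^" and "$" mark the start and end of a sentence.
    tokens = ["^"] + sentence.split() + ["$"]
    context_counts.update(tokens[:-1])
    bigram_counts.update(zip(tokens, tokens[1:]))


def bigram_prob(prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sentence_prob(sentence):
    """Probability of a sentence as the product of its bigram probabilities."""
    tokens = ["^"] + sentence.split() + ["$"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p
```

With this corpus, "joseph ate the chicken" gets probability 2/3 × 1/2 × 1 × 1/3 × 1 = 1/9, while a word order never seen in training, such as "chicken ate joseph", gets 0.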

23 SMT model. Bilingual corpus → translation model P(S|T). Monolingual corpus → language model P(T). Both models feed the step “maximize probabilities from models”, which yields the translation result. T is the target language, S is the source language, P(S|T) is the translation probability and P(T) is the target language probability.
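
Combining the two models amounts to choosing the target sentence T that maximizes P(S|T) × P(T). The sketch below assumes a fixed, tiny candidate list and made-up probabilities for a single source sentence; real decoders search a very large space of candidates.

```python
# Hypothetical scores for candidate translations T of one source sentence S.
translation_model = {  # P(S | T)
    "the house": 0.6,
    "house the": 0.6,
    "the home": 0.3,
}
language_model = {  # P(T)
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}


def decode(candidates):
    """Return the candidate T that maximizes P(S|T) * P(T)."""
    return max(candidates, key=lambda t: translation_model[t] * language_model[t])


best = decode(list(translation_model))
```

Here "the house" and "house the" are equally good according to the translation model alone; the language model breaks the tie in favour of the fluent word order.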

24 Advantages of SMT. 1) No knowledge of linguistics is required, which saves cost and time in knowledge acquisition from domain experts. 2) Expertise transfer is minimized. 3) Fast and less costly compared with direct machine translation (DMT).

25 Intelligent Computing Model for English to Sanskrit Machine Translation. Modules in the block diagram: input; ES tokenizer; POS tagger module; GNP detection module; adverb conversion table module.

26 Block diagram (continued): tense and sentence detection module; Sanskrit rule detection; ANN-based system; Roop/Dhaatu detection; noun and object detection; word form generation; Dhaatu form generation; input from the GNP module.

27 Block diagram (continued): inputs from word form generation, dhaatu form generation and adverb conversion → concatenation of kartaa, adjective, karma, adverb and verb → output Sanskrit sentence. The GNP module detects the gender, number and person of a noun in the English sentence. The noun and object detection module gives the Sanskrit nouns equivalent to the English nouns. The Roop Dhaatu module gives the Sanskrit verbs equivalent to the English verbs. The ANN is a feed-forward network which performs encoding of the user data vector (UDV), input/output generation of the UDV, and finally decoding of the UDV.

28 What is Computer Vision? “Computing properties of the 3D world from one or more digital images.” Stockman and Shapiro: to make useful decisions about real physical objects and scenes based on sensed images. Ballard and Brown: the construction of explicit, meaningful descriptions of physical objects from images. Forsyth and Ponce: extracting descriptions of the world from pictures or sequences of pictures.

29 What is in this image? 1. A hand holding a man? 2. A hand holding a mirrored sphere? 3. An Escher drawing? Interpretations are ambiguous, whereas the forward problem (graphics) is well-posed.

30 What do you see? Changing viewpoint; moving light source; deforming shape.

31 What was happening? Changing viewpoint; moving light source; deforming shape.

32 Why study Computer Vision? Images and movies are everywhere. There is a fast-growing collection of useful applications: building representations of the 3D world from pictures; automated surveillance (who’s doing what); movie post-processing; face recognition. There are various deep and attractive scientific mysteries, e.g. how does object recognition work? It is a beautiful marriage of math, biology, physics and engineering, and it leads to a greater understanding of human vision.

33 Some Objectives. Segmentation: breaking images and video into meaningful pieces. Reconstructing the 3D world: from multiple views, from shading, from structural models. Recognition: what are the objects in a scene? What is happening in a video? Control: obstacle avoidance for robots, machines, etc.

34 Applications: Touching Your Life. Football; movies; surveillance; HCI (hand gestures, American Sign Language); face recognition and biometrics; road monitoring; industrial inspection; robotic control; autonomous driving; space (planetary exploration, docking); medicine (pathology, surgery, diagnosis); microscopy; military; remote sensing.

35 Image Interpretation: Cues. Variation in appearance across multiple views (stereo, motion); shading and highlights; shadows; contours; texture; blur; geometric constraints; prior knowledge.

36 Illumination Variability. “The variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to change in face identity.”

37 Early Vision in One Image. Representing small patches of the image matters for three reasons: (1) We wish to establish correspondence between (say) points in different images, so we need to describe the neighborhood of the points. (2) Sharp changes are important in practice; they are known as “edges”. (3) Texture can be represented by giving some statistics of the different kinds of small patch present in the texture, e.g. “Tigers have lots of bars, few spots, while leopards are the other way”.
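
The idea that edges are sharp changes in intensity can be shown on a single row of pixels. This is a minimal sketch: the grey levels and the threshold are made up, and real edge detectors smooth the image and work in two dimensions.

```python
def edge_positions(row, threshold):
    """Return indices i where |row[i + 1] - row[i]| exceeds the threshold."""
    return [
        i for i in range(len(row) - 1)
        if abs(row[i + 1] - row[i]) > threshold
    ]


# Hypothetical grey levels along one scanline: dark, bright, dark.
scanline = [10, 12, 11, 200, 198, 201, 15, 14]
edges = edge_positions(scanline, threshold=50)
```

The small fluctuations inside the dark and bright runs stay below the threshold, so only the two large jumps (after positions 2 and 5) are reported.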

38 Segmentation. Which image components “belong together”? Belong together = lie on the same object. Cues: similar color; similar texture; not separated by a contour; form a suggestive shape when assembled.
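
The “similar color” cue can be sketched as connected-component labeling: neighboring pixels with the same value are grouped into one region. This is an illustrative simplification that assumes exact color equality and 4-connectivity; the function name `segment` and the tiny image are hypothetical.

```python
from collections import deque


def segment(image):
    """Label 4-connected regions of equal value.

    Returns (labels, region_count), where labels has the same shape as image.
    """
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            # Start a new region and flood-fill it breadth-first.
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and labels[ny][nx] is None
                        and image[ny][nx] == image[y][x]
                    ):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


# Hypothetical 3x3 image with color names standing in for pixel values.
image = [
    ["R", "R", "G"],
    ["R", "G", "G"],
    ["B", "B", "G"],
]
labels, n_regions = segment(image)
```

Real images rarely contain exactly equal colors, so practical methods replace the equality test with a similarity threshold and combine it with the other cues listed on the slide.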

39 Boundary Detection: Local cues

40 Boundary Detection: Finding the Corpus Callosum

