Vishal Vachhani, CFILT and DIL, IIT Bombay. CS 671: ICT for Development, 19th Sep 2008.


1 Vishal Vachhani, CFILT and DIL, IIT Bombay. CS 671: ICT for Development, 19th Sep 2008

2 Agro Explorer: A Meaning-Based Multilingual Search Engine

3
• Web site for Indian farmers
• Farmers can submit problems related to their crops
• Queries are answered by agricultural experts at KVK, Baramati
• Languages supported: Marathi, Hindi, English

4 Why Multilingual Search Is Needed
• A vast amount of information is available on the Web
• Almost 70% of that information is in English
• The Indian rural populace is not English-literate
• “A big language barrier”
• Information has to be made available to them in their local languages

5 Why Meaning-Based Search Is Needed
• Most current search engines are keyword based
• They do not consider the semantics of the query
• The result set contains a large number of extraneous documents
• Search based on the meaning of the query helps narrow down to the desired information quickly

6 [Figure: a query in Hindi is sent to the system, which searches English and Marathi documents and returns the result in Hindi]

7 Same Keywords, Different Semantics
• Moneylenders exploit farmers: found 1 result
• Farmers exploit moneylenders: found 0 results
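The contrast on this slide can be sketched in a few lines of Python. This is only an illustration, assuming that documents and queries have already been reduced to (relation, head, dependent) triples. The function names and the lemmatized triples are hypothetical and do not reproduce Agro Explorer's actual matching code.

```python
# Hypothetical sketch: keyword matching vs. matching on semantic relations.

def keyword_match(query_words, doc_words):
    """True when every query keyword occurs in the document (bag of words)."""
    return set(query_words) <= set(doc_words)


def meaning_match(query_relations, doc_relations):
    """True when every (relation, head, dependent) triple of the query
    also occurs in the document."""
    return set(query_relations) <= set(doc_relations)


# Indexed document: "Moneylenders exploit farmers"
doc_words = ["moneylender", "exploit", "farmer"]
doc_relations = [("agt", "exploit", "moneylender"), ("obj", "exploit", "farmer")]

# Query: "Farmers exploit moneylenders"
query_words = ["farmer", "exploit", "moneylender"]
query_relations = [("agt", "exploit", "farmer"), ("obj", "exploit", "moneylender")]

keyword_hit = keyword_match(query_words, doc_words)
meaning_hit = meaning_match(query_relations, doc_relations)
print("keyword search matches:", keyword_hit)
print("meaning-based search matches:", meaning_hit)
```

The keyword search cannot tell the two sentences apart because both contain the same words. The relation triples differ in who is the agent and who is the object, so the reversed query finds nothing.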

8 Provides both
• Meaning-based search
• Cross-lingual information access

9 System Architecture


15 Conclusion
The system provides two independent features:
• Multilinguality
• Meaning-based search
Because of UNL, the multilingual and meaning-based properties can be incorporated together, rather than by adding separate language translators to a search engine.
The scheme lends itself to integrating multiple languages in a seamless, scalable manner.

16 UNL: Universal Networking Language

17 [Figure: UNL connected to English, French, Tamil, Marathi, and Hindi]

18
• Direct translation
  - Each language is translated directly into every other language
  - N*(N-1) translators are needed for N languages
• Intermediate language
  - An intermediate language is used for translation
  - Only 2*N translators are required
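The two counts can be compared with a small Python sketch. The function names are illustrative. The five languages are the ones shown around UNL in the earlier figure.

```python
# Illustrative comparison of the translator counts given on the slide.

def direct_translators(n):
    """One translator per ordered pair of distinct languages: N*(N-1)."""
    return n * (n - 1)


def interlingua_translators(n):
    """One converter into and one out of the intermediate language
    per language: 2*N."""
    return 2 * n


languages = ["English", "French", "Tamil", "Marathi", "Hindi"]
n = len(languages)
print(n, "languages:", direct_translators(n), "direct translators vs",
      interlingua_translators(n), "with an intermediate language")
```

For N = 3 both approaches need 6 translators. From N = 4 onward the intermediate language needs fewer, and the gap grows quadratically.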

19
• UNL is an acronym for “Universal Networking Language”.
• UNL is a computer language that enables computers to process information and knowledge across language barriers.
• UNL is a language for representing the information and knowledge conveyed by natural languages.
• Unlike natural languages, UNL expressions are unambiguous.

20
• Although UNL is a language for computers, it has all the components of a natural language.
• It is composed of Universal Words (UWs), Relations, and Attributes.
• Knowledge is represented as a semantic graph
  ◦ Nodes → concepts
  ◦ Arcs → relations between concepts

21
• A UW represents a simple or compound concept. There are two classes of UWs:
  ◦ unit concepts
  ◦ compound structures of binary relations grouped together (indicated with Compound UW-IDs)
• A UW is made up of a character string (an English-language word) followed by a list of constraints.
  ◦ <UW> ::= <headword> [<constraint list>]
  ◦ Examples: state(icl>express), state(icl>country)
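As a rough illustration of the headword-plus-constraints shape, the Python sketch below splits a UW string into its two parts. The function name `parse_uw` and the regular expression are assumptions made for this example. The sketch handles only a flat, comma-separated constraint list, not nested constraints.

```python
import re

# Hypothetical helper: headword, optionally followed by "(constraint list)".
UW_PATTERN = re.compile(r"^(?P<head>[^()]+?)(?:\((?P<constraints>.*)\))?$")


def parse_uw(text):
    """Split a Universal Word into (headword, [constraints])."""
    match = UW_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed UW: {text!r}")
    raw = match.group("constraints")
    constraints = [c.strip() for c in raw.split(",")] if raw else []
    return match.group("head"), constraints


for uw in ["state(icl>express)", "state(icl>country)", "here"]:
    print(uw, "->", parse_uw(uw))
```

The two example UWs share the headword "state". Only the constraint list tells the verb sense (icl>express) from the noun sense (icl>country).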

22
◦ A relation label is a string of three characters or fewer.
◦ Relations between UWs are binary: rel(UW1, UW2)
◦ Relations have different labels according to the roles they play.
◦ At present, there are 46 relations in UNL.
◦ Examples: agt (agent), ins (instrument), pur (purpose), etc.

23
• Attribute labels express additional information about the Universal Words that appear in a sentence.
  ◦ They show what is said from the speaker’s point of view, that is, how the speaker views what is said (time, reference, emphasis, attitude, etc.)
  ◦ Examples: @entry, @present, @progressive, @topic, etc.

24 Example: Ram eats rice.
{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}
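To make the structure of these expressions concrete, here is a small Python sketch that reads one binary relation into its label, optional scope ID, and two nodes with their attributes. The parser and its function names are assumptions for illustration. It covers only the notation visible on these slides, not the full UNL specification.

```python
import re

# rel or rel:scope, followed by "(UW1, UW2)". Labels have at most 3 letters.
REL_HEAD = re.compile(r"^(?P<rel>[a-z]{1,3})(?::(?P<scope>\d+))?\((?P<body>.*)\)$")


def split_top_level(body):
    """Split 'UW1, UW2' at the comma that is not inside parentheses."""
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"expected two arguments in {body!r}")


def split_attributes(node):
    """Separate a UW from its trailing .@attribute labels."""
    parts = node.split(".@")
    return parts[0], ["@" + name for name in parts[1:]]


def parse_relation(line):
    """Return (relation, scope, (uw1, attrs1), (uw2, attrs2))."""
    match = REL_HEAD.match(line.strip())
    if match is None:
        raise ValueError(f"not a UNL relation: {line!r}")
    left, right = split_top_level(match.group("body"))
    return (match.group("rel"), match.group("scope"),
            split_attributes(left), split_attributes(right))


for line in ["agt(eat.@entry.@present, Ram)",
             "obj(eat.@entry.@present, rice(icl>eatable))"]:
    print(parse_relation(line))
```

Both relations share the node eat, so collecting the parsed tuples yields the semantic graph: eat is the entry node, with an agt arc to Ram and an obj arc to rice.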

25 [Figure: UNL graph for "Ram eats rice": eat --agt--> Ram, eat --obj--> rice]

26 Example: The boy who works here went to school.
{unl}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>occur).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
{/unl}

27 [Figure: UNL graph for "The boy who works here went to school": go --agt--> :01, go --plt--> school; inside scope :01: work --agt--> boy, work --plc--> here]

28 [Figure: source language → EnConverter → intermediate language → DeConverter → target language]

29
• The DeConverter is a language-independent generator.
• It can deconvert UNL expressions into a variety of native languages, using linguistic data for each language such as a word dictionary and grammatical rules.
• It transforms a sentence represented by a UNL expression into a natural-language sentence.


31 [Figure: DeConverter architecture. A UNL document passes through the UNL Parser, Case Marking Module, Morphology Module, and Syntax Planning Module to produce a Hindi document. Resources: Dictionary, Case Marking Rules, Morphology Rules, Syntax Planning Rules. Components are marked as language-dependent or language-independent modules.]

32 The UNL parser module performs the following tasks:
– Check the input format of the UNL document
– Separate attributes from UWs
– Separate attributes from dictionary entries
– Replace UWs with Hindi root words

33
• Case: a category of morpho-syntactic properties that distinguish the various relations a noun phrase may bear to a governing head.
• Hindi case markers: ने, पर, के, से, पे, etc.
• A rule base built on:
  ◦ UNL attributes
  ◦ lexical attributes from the dictionary

34
• Case marking is implemented using rules.
• All UNL attributes and dictionary attributes are analyzed to decide the next and previous case markers.
• The relation with the parent is also used to select the right case marker.

35
• Example rule: agt:null:null:null: ने :@past#V:VINT:N:null
• Structure (colon-separated fields)
  ◦ relName
  ◦ parent previous case marker
  ◦ parent next case marker
  ◦ child previous case marker
  ◦ child next case marker
  ◦ the remaining four fields take the form attr'REL'relationname
  ◦ multiple attrs are separated by #
  ◦ multiple relation names are also separated by #
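The rule layout can be read mechanically. The Python sketch below splits the example rule into the five named fields and the four trailing condition fields. The class name, the field names, and the generic `conditions` list are assumptions for this example, because the slide does not name the last four fields individually.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseRule:
    relation: str
    parent_prev: Optional[str]   # parent previous case marker
    parent_next: Optional[str]   # parent next case marker
    child_prev: Optional[str]    # child previous case marker
    child_next: Optional[str]    # child next case marker
    conditions: List[List[str]]  # four condition fields, each split on '#'


def parse_case_rule(line):
    """Parse one colon-separated case-marking rule ('null' means empty)."""
    fields = [f.strip() for f in line.split(":")]
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")

    def marker(value):
        return None if value == "null" else value

    conditions = [[] if f == "null" else f.split("#") for f in fields[5:]]
    return CaseRule(fields[0], *(marker(f) for f in fields[1:5]), conditions)


rule = parse_case_rule("agt:null:null:null: ने :@past#V:VINT:N:null")
print(rule)
```

Read this way, the example rule concerns the agt relation and puts the marker ने in the child's next case marker slot, with the remaining fields acting as attribute conditions.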

36
• What is morphology?
  ◦ The study of morphemes
  ◦ Their formation into words, including inflection, derivation, and composition

37
• Noun, verb, and adjective morphology
  ◦ Depends on the phonetic properties of the Hindi word
• Noun morphology
  ◦ Depends on the gender, number, and vowel ending of the noun
• Adjective morphology
  ◦ अच्छा लडका, अच्छी लडकी, अच्छे लडके
  ◦ The adjective stem अच्छ changes; lexical attribute “AdjA”
• Verb morphology
  ◦ Depends on tense, gender, number, person, etc.
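The adjective example can be sketched as a lookup from gender and number to a vowel ending attached to the stem. The table and function name below are assumptions for illustration. They cover only direct-case forms of adjectives like अच्छ, not the rule base actually used by the generator.

```python
# Hypothetical ending table for inflecting adjectives such as अच्छ (direct case).
ADJ_ENDINGS = {
    ("male", "sg"): "ा",
    ("male", "pl"): "े",
    ("female", "sg"): "ी",
    ("female", "pl"): "ी",
}


def inflect_adjective(stem, gender, number):
    """Attach the vowel sign that agrees with the noun's gender and number."""
    return stem + ADJ_ENDINGS[(gender, number)]


for gender, number in [("male", "sg"), ("female", "sg"), ("male", "pl")]:
    print(gender, number, inflect_adjective("अच्छ", gender, number))
```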

38
• Verbs are categorized by
  ◦ Tense (past, present, future)
  ◦ Gender (male, female)
  ◦ Person (1st, 2nd, 3rd)
  ◦ Number (sg, pl)
• Example
  ◦ Ladaka khana kha raha hai.
  ◦ The verb is in the present continuous tense, male, singular, and third person

39
• Arranges words according to the structure of the language
• Rule-based module
• Uses priority-based graph traversal

40 Algorithm for Syntax Planning:
1) Start traversing the UNL graph from the entry node.
2) If a node has no children, add it to the final string.
3) If a node has more than one child, sort the children by the priority of their relations. The relation with the highest priority is traversed first.
4) Mark the node as visited.
5) Repeat steps 3 and 4 until all the children of the node have been visited.
6) Once all the children of a node have been visited, add the node to the final string.
7) Repeat steps 2 to 6 until all the nodes have been traversed.

41 Example: Also, spray 5% Neemark solution.
[Figure: UNL graph with nodes spray, also, solution, Neemark, percent, 5 and relations obj, man, mod, qua. Relation priorities: obj:17, man:9, mod:5, qua:5]

42 Start at the entry node: spray

43 spray has two children, reached via obj and man

44 Relation priorities: obj:17, man:9

45 Traverse obj first and visit solution

46-47 solution has a child via mod (priority mod:5)

48-49 Visit percent

50 percent has a child via qua (priority qua:5)

51 Visit 5. Output: 5

52 Output: 5 percent

53 Visit Neemark. Output: 5 percent Neemark

54 Output: 5 percent Neemark solution

55 Visit also. Output: 5 percent Neemark solution also

56 Output: 5 percent Neemark solution also spray

57 Output: 5 percent Neemark solution also spray
5 प्रतिशत नीमअर्क घोल भी छिड़क् |
5 प्रतिशत नीमअर्क घोल भी छिड़को |
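The traversal traced on the preceding slides can be reproduced with a short Python sketch. It is a minimal reading of the syntax planning algorithm as a post-order walk that visits higher-priority relations first. The edge list is an assumed reconstruction, because the transcript does not preserve the exact arcs of the graph. In particular, attaching both percent and Neemark to solution via mod is a guess, chosen because it reproduces the slide's output.

```python
from collections import defaultdict

# Relation priorities as given on the example slide.
PRIORITY = {"obj": 17, "man": 9, "mod": 5, "qua": 5}

# Assumed reconstruction of the graph for "Also, spray 5% Neemark solution."
# Each edge is (relation, parent, child).
EDGES = [
    ("obj", "spray", "solution"),
    ("man", "spray", "also"),
    ("mod", "solution", "percent"),
    ("qua", "percent", "5"),
    ("mod", "solution", "Neemark"),
]


def plan_syntax(edges, entry):
    """Return the nodes in output order: children first, highest-priority
    relation first, then the parent node itself."""
    children = defaultdict(list)
    for relation, parent, child in edges:
        children[parent].append((relation, child))

    output, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        # sorted() is stable, so equal priorities keep their listed order.
        ordered = sorted(children[node], key=lambda rc: -PRIORITY.get(rc[0], 0))
        for _relation, child in ordered:
            visit(child)
        output.append(node)

    visit(entry)
    return output


words = plan_syntax(EDGES, "spray")
print(" ".join(words))
```

Putting the verb last follows from adding each node only after its children, which matches the verb-final order of the Hindi output.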

58 Input sentence: Its roots are affected by bacterial infection.
Module: Output
UNL parser: जड़् प्रभावित जीवाण्विक संक्रमण्
Case marking: जड़् प्रभावित जीवाण्विक संक्रमण् से
Morphology: इसकी जड़ें जीवाण्विक प्रभावित होती हैं संक्रमण से |
Syntax planning: जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं |
Output: जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं |

59
• UNL 2005 Specifications: http://www.undl.org/unlsys/unl/unl2005/
• S. Singh, M. Dalal, V. Vachhani, P. Bhattacharyya and O. Damani, “Hindi generation from interlingua”, MT Summit 2007 (www.cse.iitb.ac.in/~vishalv)
• Mrugank Surve, Sarvjeet Singh, Satish Kagathara, Venkatasivaramasastry K, Sunil Dubey, Gajanan Rane, Jaya Saraswati, Salil Badodekar, Akshay Iyer, Ashish Almeida, Roopali Nikam, Carolina Gallardo Perez, Pushpak Bhattacharyya, AgroExplorer Group: “AgroExplorer: a Meaning Based Multilingual Search Engine”, International Conference on Digital Libraries (ICDL), New Delhi, India, Feb 2004.
• Agro Explorer: http://agro.mlasia.iitb.ac.in
• aAQUA: http://www.aaqua.org

