
Presented by: AKHIL GADA CSCI 572 University of Southern California Full Text Indexing Based On Lexical Relations An Application :Software Library by YS.


1 Full Text Indexing Based On Lexical Relations; An Application: Software Libraries, by Y.S. Maarek and F.A. Smadja. Presented by: Akhil Gada, CSCI 572, University of Southern California

2 Full Text Indexing Based On Lexical Relations; An Application: Software Libraries (July 15th, 2010)
REQUIREMENT FOR SEARCH IN A SOFTWARE LIBRARY
• Search for functionally similar components.
• E.g. Yahoo Search API and Google Search API for the query “I want to search pages”.

3 A.I. OR KNOWLEDGE BASE APPROACH vs. I.R. OR FREE TEXT BASED APPROACH
A.I. or Knowledge Base Approach | I.R. or Free Text Based Approach
Domain knowledge must be entered | No prior knowledge required
Manual or semi-automatic | Completely automatic
Specific and difficult to scale to a new domain | Generic and very easy to scale to a new domain
Semantic understanding of documents | No semantic understanding of documents

4 SINGLE KEYWORD vs. LEXICAL RELATION
Single keyword | Lexical relation
Context information is lost, e.g. Apple (fruit) vs. Apple (Computers) | Reveals context information
High-frequency generic terms might introduce noise, e.g. the word “file” in the UNIX manual does not characterize the functionality of any command | A high-frequency lexical relation provides highly functional information about the document, e.g. “copy file” in UNIX

5 TWO RETRIEVAL MODES
• Linear IR using an inverted index
• Clustering IR using HAC (Hierarchical Agglomerative Clustering)

6 LEXICAL RELATIONS
• A lexical relation holds between two words in a sentence that have a syntactic relationship: subject-verb, verb-direct object, verb-indirect object, etc.
• Open-class words (nouns, verbs, adjectives, adverbs) are meaning bearing.
• Closed-class words are conjunctions (and, or), articles (the, a), demonstratives (this, that), and prepositions (to, from, at, with). They do not convey any meaning to the sentence.

7 EXTRACT [1] LEXICAL RELATIONS ALGO. [2] [Figure: a 5-word window sliding over the words W1 W2 W3 W4 W5]

8 EXTRACT [1] LEXICAL RELATIONS ALGO. [2]

9 EXTRACT [1] LEXICAL RELATIONS ALGO. [2]
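
The slides show the window only as a figure, so the following is a minimal sketch of the idea rather than the EXTRACT algorithm itself. It assumes that a candidate lexical relation is any unordered pair of open-class words that fall inside the same 5-word window. The closed-class word list and the function name `candidate_relations` are hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately small closed-class list (assumption, not from the slides).
CLOSED_CLASS = {"and", "or", "the", "a", "an", "this", "that",
                "to", "from", "at", "with", "of", "in"}

def candidate_relations(sentence, window=5):
    """Count unordered pairs of open-class words that share a `window`-word span."""
    words = sentence.lower().split()
    pairs = Counter()
    for i, w1 in enumerate(words):
        if w1 in CLOSED_CLASS:
            continue
        # Look at most window-1 positions ahead so both words fit in one window.
        for w2 in words[i + 1 : i + window]:
            if w2 in CLOSED_CLASS or w2 == w1:
                continue
            pairs[tuple(sorted((w1, w2)))] += 1
    return pairs

relations = candidate_relations("copy the file to the directory")
```

Here "copy" and "file" are paired, while "copy" and "directory" are not, because they are six positions apart and never share a 5-word window.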

10 RESOLVING POWER: OUTPUT FROM THE EXTRACT [1] ALGORITHM [0]

11 BUILDING THE INDEX [2]
• Select the top N most informative lexical relations (by resolving power) for each document; these form the profile of the document.
• Create the inverted index.
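
A minimal sketch of the two steps above, assuming the resolving power of every lexical relation has already been computed. The scores, document names, and function names below are made up for illustration and do not come from GURU.

```python
from collections import defaultdict

def build_profile(scores, n):
    """Keep the n lexical relations with the highest resolving power."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return dict(top)

def build_inverted_index(profiles):
    """Map each lexical relation to {document: resolving power}."""
    index = defaultdict(dict)
    for doc, profile in profiles.items():
        for relation, power in profile.items():
            index[relation][doc] = power
    return dict(index)

# Made-up resolving-power values, for illustration only.
raw_scores = {
    "cp": {("copy", "file"): 3.1, ("file", "name"): 0.4, ("directory", "file"): 1.2},
    "mv": {("file", "move"): 2.8, ("directory", "file"): 1.5, ("file", "name"): 0.3},
}
profiles = {doc: build_profile(s, n=2) for doc, s in raw_scores.items()}
index = build_inverted_index(profiles)
```

With N = 2 the low-scoring relation ("file", "name") drops out of both profiles, so it never reaches the index.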

12 SIMILARITY MEASURE BETWEEN TWO DOCUMENTS [2]
Let X = set of the top N resolving-power lexical relations for document dx
Y = set of the top N resolving-power lexical relations for document dy
X ∩ Y = set of lexical relations common to dx and dy
$\partial(d_x, d_y) = \sum_{i \in X \cap Y} P_i(d_x)\, P_i(d_y)$
where $P_i(d_x)$ and $P_i(d_y)$ are the resolving power of lexical relation i with respect to documents dx and dy respectively.
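
The measure can be sketched in a few lines, assuming each document profile is a dictionary from lexical relation to resolving power. The profiles and values below are hypothetical.

```python
def similarity(profile_x, profile_y):
    """Sum of products of resolving powers over the relations both profiles share."""
    common = profile_x.keys() & profile_y.keys()
    return sum(profile_x[r] * profile_y[r] for r in common)

# Hypothetical profiles: lexical relation -> resolving power.
dx = {("copy", "file"): 3.0, ("directory", "file"): 1.0}
dy = {("file", "move"): 2.0, ("directory", "file"): 1.5}
score = similarity(dx, dy)
```

Only ("directory", "file") is shared, so the score is 1.0 * 1.5 = 1.5. Documents with no common relation score 0.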

13 CLUSTER FUNCTIONALLY SIMILAR COMPONENTS USING HIERARCHICAL AGGLOMERATIVE CLUSTERING [2] [Figure: dendrogram over the documents {d1}, {d2}, {d3}, {d4}, {d5}; {d1} and {d2} merge at ∂({d1},{d2}), {d3} and {d4} merge at ∂({d3},{d4}), and {d3,d4} then merges with {d5} at ∂({d3,d4},{d5})]
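
The slide does not say how the similarity between two clusters is derived from document similarities. The sketch below assumes group-average linkage (the mean of all pairwise document similarities), which is only one common choice. All names and values are hypothetical.

```python
from itertools import combinations

def similarity(px, py):
    """Sum of products of resolving powers over shared lexical relations."""
    return sum(px[r] * py[r] for r in px.keys() & py.keys())

def cluster_similarity(c1, c2, profiles):
    """Average pairwise document similarity (group-average linkage, an assumption)."""
    total = sum(similarity(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def hac(profiles):
    """Merge the two most similar clusters until one remains; return the merge history."""
    clusters = [frozenset([d]) for d in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: cluster_similarity(pair[0], pair[1], profiles))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((set(c1), set(c2)))
    return merges

# Hypothetical profiles: relation label -> resolving power.
profiles = {
    "d1": {"copy file": 2.0},
    "d2": {"copy file": 2.0, "move file": 0.5},
    "d3": {"print file": 3.0},
}
merges = hac(profiles)
```

The merge history records the hierarchy bottom-up: d1 and d2 are joined first because they share "copy file", and d3 joins last.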

14 INFORMATION RETRIEVAL [2]
1. The user specifies a free-text query.
2. Search and return results: linear I.R. using the inverted index.
3. User satisfied? If no, allow the user to traverse the clustered hierarchy.

15 LINEAR INFORMATION RETRIEVAL [2] [Figure: the query dq is compared with each document d1, d2, ..., dn through ∂(dq,d1), ∂(dq,d2), ..., ∂(dq,dn)]
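
A minimal sketch of linear retrieval, assuming the query has been turned into a profile in the same way as a document and that the inverted index maps each lexical relation to the documents containing it. The index contents and the `retrieve` function are hypothetical.

```python
from collections import defaultdict

def retrieve(query_profile, index):
    """Rank documents by ∂(dq, di), accumulated through the inverted index."""
    scores = defaultdict(float)
    for relation, q_power in query_profile.items():
        # Only documents that share this relation with the query are touched.
        for doc, d_power in index.get(relation, {}).items():
            scores[doc] += q_power * d_power
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical inverted index: relation -> {document: resolving power}.
index = {
    ("copy", "file"): {"cp": 3.0},
    ("directory", "file"): {"cp": 1.0, "mv": 1.5},
    ("file", "move"): {"mv": 2.0},
}
query = {("copy", "file"): 1.0, ("directory", "file"): 1.0}
ranking = retrieve(query, index)
```

The inverted index means documents sharing no relation with the query are never scored, which is why the result list contains only candidates with a non-zero similarity.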

16 GURU: WORKING SYSTEM SNAPSHOT [2]

17 EVALUATION [2]
• MAINTENANCE COST: incremental insertion [3] of new components is easy.
• EFFICIENCY: 2.5 secs on RT; 0.15 secs on IBM RISC for a query containing 5 to 15 lexical relations.
• RETRIEVAL EFFECTIVENESS: continued on the next slide.

18 EVALUATION: PRECISION-RECALL CURVE [2]
If c = total number of records retrieved after executing query q,
R = total number of expected correct results, determined before the query is executed,
r = total number of correct results retrieved after executing query q,
then Recall = r/R and Precision = r/c.
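
As a worked example of the two ratios, the sketch below computes them for one query. The retrieved and expected component names are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) = (r/c, r/R) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    r = len(retrieved & relevant)                         # correct results retrieved
    precision = r / len(retrieved) if retrieved else 0.0  # r / c
    recall = r / len(relevant) if relevant else 0.0       # r / R
    return precision, recall

# Hypothetical query result: 4 retrieved (c), 3 expected (R), 2 in common (r).
precision, recall = precision_recall(["cp", "mv", "ls", "cat"], ["cp", "mv", "rcp"])
```

Here precision is 2/4 and recall is 2/3. Plotting such pairs over a range of cut-off points for the result list gives a precision-recall curve.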

19 PROS:
• Easy to extend to any domain, i.e. a generic approach.
• Very simple and elegant approach.
• The paper provides adequate background by describing past research.

20 CONS:
• May fail in the following case: e.g. ‘xcalc’ and ‘bc’.

21 FURTHER RESEARCH:
• Combine the knowledge base approach with this technique, e.g. the knowledge “bc = calculator” can be added to GURU to increase recall.
• Improved algorithms for incremental updating of indices.

22 REFERENCES
[0] Yoelle S. Maarek and Frank A. Smadja. Full Text Indexing Based On Lexical Relations; An Application: Software Libraries.
[1] F. de Saussure. Cours de Linguistique Générale, quatrième édition. Librairie Payot, Paris, France, 1949.
[2] Y.S. Maarek, Daniel M. Berry, and Gail E. Kaiser. GURU: Information Retrieval For Reuse.
[3] Kaplan and Maarek, 1990. Incremental maintenance of semantic links in dynamically changing hypertext systems. Interacting with Computers.

23 Q & A

