1 Artificial Intelligence techniques for Information Retrieval in Web Presented by Hamid R. Chinaei 1 October 2007.


2 Outline
– Information Retrieval
– Document Content
– User Behavior
– Markov Chains
– The Proposed Models
– Conclusion

3 IR Architecture
[Diagram labels: Query String, Document corpus, IR System, Ranked Documents (1. Doc1, 2. Doc2, 3. Doc3)]

4 Document Content
A document's content is represented as a set of words together with their weights.
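As a minimal sketch of the representation on this slide, a document can be held as a map from words to weights, and a query scored against it by summing the weights of the query's words. The class name `WeightedDocument`, the summing rule, and any example weights are illustrative assumptions, not taken from the presentation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a document described by a set of words and their weights.
class WeightedDocument {
    private final Map<String, Double> weights = new HashMap<>();

    // Sets (or overwrites) the weight of a word in this document's description.
    void setWeight(String word, double weight) {
        weights.put(word.toLowerCase(), weight);
    }

    // Assumed scoring rule: sum the weights of the query words found in the document.
    double score(String query) {
        double total = 0.0;
        for (String word : query.toLowerCase().trim().split("\\s+")) {
            total += weights.getOrDefault(word, 0.0);
        }
        return total;
    }
}
```

Words that do not appear in the document's description contribute nothing to the score, so a ranking would order documents by how much weight they assign to the query's words.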

5 User Behavior
– Query submissions
– Clicks on documents
– Time spent reading the document
– Query refinements

6 System Modeling
[Diagram labels: User, System, Query 1, Query 2, …, Query n, Ranked Documents, Clicks + Time, Ranks, Update, Document Description]

7 Markov Chains [5]
[Diagram nodes: U, q, S, d, R]

8 Markov Chains (cont’d)

9 Inference Networks
[Diagram: Document Layer (D0, D1, …, Dn), Concept Layer (w0, w1, …, wm), Query Layer (Q); other labels: CN, R]

10 Example

11 Example (cont’d)

12 Example (cont’d)

13 Example (cont’d)

14 POMDPs
– Observations: the user's query, the document the user clicked, and the time spent on that document
– Rewards: the time spent on a document
– States: the concept the user is looking for
– Actions: rankings of the documents
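Because the concept the user is looking for cannot be observed directly, the system would keep a belief (a probability distribution over concepts) and revise it after each observation. The following is a minimal sketch of one Bayesian belief update, under the simplifying assumption that the user's concept does not change between observations. The class name `ConceptBelief` and the idea of passing the observation likelihoods P(o | s) as a map are hypothetical choices, not taken from the presentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a belief update over the hidden concept (the POMDP state).
class ConceptBelief {
    // Returns the posterior b'(s), proportional to P(o | s) * b(s).
    // Assumption: the hidden concept s stays fixed between observations.
    static Map<String, Double> update(Map<String, Double> belief,
                                      Map<String, Double> likelihood) {
        Map<String, Double> posterior = new LinkedHashMap<>();
        double norm = 0.0;
        for (Map.Entry<String, Double> e : belief.entrySet()) {
            double p = e.getValue() * likelihood.getOrDefault(e.getKey(), 0.0);
            posterior.put(e.getKey(), p);
            norm += p;
        }
        if (norm == 0.0) {
            // The observation is impossible under every concept: keep the prior.
            return new LinkedHashMap<>(belief);
        }
        for (Map.Entry<String, Double> e : posterior.entrySet()) {
            e.setValue(e.getValue() / norm);
        }
        return posterior;
    }
}
```

Each observation (a query, a click, or reading time) would shift probability toward the concepts that best explain it, and the ranking action could then be chosen from the updated belief.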

15 POMDPs (cont’d)
[Diagram nodes: q0, q1, q2; U0, U1, U2; T1, T2; a; T1, d1, …, Tn, dn; UP]

16 Example of a System Belief

17 Conclusion
– With AI techniques, the users (not the search engine) eventually rank the documents, which can improve any ranking algorithm
– This resists the effect of search engines on which web pages survive and which drop out [2,3]

18 Experiment Setup
– Data: AOL User Session Collection [4]
– Database: MySQL, 277 MB of data, 216 MB of index
– Length: experiments currently use 1,500,000 clickthrough records (one tenth of the available clickthrough data)
– Application: Java, so far more than 500 lines of code, not counting comments and test cases

19 Classes
– URL
– Query
– User (for the purpose of user modeling)
– Term
– IR (run class)

20 Class Diagrams

21 Data Schema
aolLogTable
– AnonID: 1205043
– Query: “public records”
– QueryTime: 2006-04-06 03:19:42.0
– URLRank: 1
– URL: http://www.searchsystems.net

22 Example: SearchSystems.net
Search-result snippet:
SearchSystems.net - The Largest Public Records Directory
SearchSystems.net is the internet's largest directory of public records databases. Search for all these records: public, property, Federal, State, Local, ...
www.searchsystems.net/ - 39k

23 Example (cont’d)
Result set for SearchSystems.net:
resultSet =
  SELECT a.AnonID AS AnonID, a.Query AS Query, a.QueryTime AS QueryTime,
         a.URLRank AS URLRank, a.URL AS URL
  FROM aolLogTable a
  WHERE a.URL = 'http://www.searchsystems.net';

24 Sample Results
AnonID | Query | QueryTime | Rank | URL
10422043 | germany 1850 | 2006-05-07 13:00:28.0 | 54 | http://www.searchsystems.net
10432858 | tax liens in gretna | 2006-05-28 14:30:04.0 | 2 | http://www.searchsystems.net
10434732 | search public records | 2006-05-22 21:12:41.0 | 1 | http://www.searchsystems.net
10559651 | free unclaimed propert search | 2006-03-28 17:10:35.0 | 3 | http://www.searchsystems.net
10825800 | free criminal offense search | 2006-04-06 23:15:20.0 | 1 | http://www.searchsystems.net
10971516 | public records | 2006-05-09 23:01:09.0 | 1 | http://www.searchsystems.net
11199274 | mentor ohio criminal records | 2006-05-22 19:42:12.0 | 1 | http://www.searchsystems.net
11412322 | texas public records of birth | 2006-04-14 10:51:14.0 | 6 | http://www.searchsystems.net
11412322 | free inmate locator | 2006-04-23 17:39:21.0 | 17 | http://www.searchsystems.net
11655138 | public court records bakersfield | 2006-04-09 15:56:52.0 | 2 | http://www.searchsystems.net
11752893 | free online public records | 2006-05-26 20:32:45.0 | 1 | http://www.searchsystems.net

25 Observation 1
The number of clicks for a URL increases exponentially.
[Plots for www.microsoft.com and www.searchsystems.com]

26 Getting Query Chains
resultSet =
  SELECT a.AnonID AS AnonID, a.Query AS Query, a.QueryTime AS QueryTime,
         a.URLRank AS URLRank, a.URL AS URL
  FROM aollogtable1 a
  WHERE a.AnonID = _AnonID AND a.QueryTime < _QueryTime
  ORDER BY a.QueryTime DESC;
This result set is used by the recursive calls that build query chains (see next slide).

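27 Getting Query Chains (cont’d)
// Needs java.sql.ResultSet, java.sql.SQLException and java.sql.Timestamp.
// resultSet, result and timeThresh (milliseconds) are fields of the enclosing class.
// Assumption: result is a list that accumulates the URLs of the chained rows.
void preURLsRecursive(Timestamp _QueryTime) throws SQLException {
  if (resultSet.next()) {  // move to the user's next-older row, if any
    Timestamp QueryTimePrime = resultSet.getTimestamp("QueryTime");
    // The older query joins the chain only if the gap is below the threshold.
    if (_QueryTime.getTime() - QueryTimePrime.getTime() < timeThresh) {
      result.add(resultSet.getString("URL"));
      preURLsRecursive(QueryTimePrime);
    }
  }
}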
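The recursion can also be sketched without a database, which makes the chaining rule easier to check. In this hedged version the user's earlier query times are passed in as a list sorted newest first (mirroring the descending order of the query on the previous slide). The class name `QueryChains`, the method signature, and the use of `java.time` are assumptions for illustration only.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory version of the query-chain recursion.
class QueryChains {
    // earlier: the user's earlier query times, newest first.
    // Returns the times that belong to the same chain as queryTime: each step back
    // must be less than timeThresh before the query that follows it.
    static List<LocalDateTime> preQueriesRecursive(List<LocalDateTime> earlier, int index,
                                                   LocalDateTime queryTime, Duration timeThresh) {
        List<LocalDateTime> chain = new ArrayList<>();
        if (index < earlier.size()) {
            LocalDateTime queryTimePrime = earlier.get(index);
            if (Duration.between(queryTimePrime, queryTime).compareTo(timeThresh) < 0) {
                chain.add(queryTimePrime);
                chain.addAll(preQueriesRecursive(earlier, index + 1, queryTimePrime, timeThresh));
            }
        }
        return chain;
    }
}
```

The threshold is compared against the gap between consecutive queries, not against the distance from the starting query, so a long session of closely spaced queries forms a single chain.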

28 Sample of Results
AnonID | Query | QueryTime | Rank | URL
484518 | indiana state prison | 2006-03-06 13:24:22.0 | 1 | http://www.in.gov
484518 | morgan county indiana jail | 2006-03-06 13:27:38.0 | 1 | http://scican3.scican.net
484518 | indiana inmate locator | 2006-03-06 13:28:54.0 | 1 | http://www.in.gov
484518 | fugitives of indiana | 2006-03-06 13:37:51.0 | 1 | http://www.criminalwatch.com
484518 | indiana fugitives caught | 2006-03-06 13:39:12.0 | 0 | (none)
484518 | west virgina public records wills | 2006-03-06 13:40:48.0 | 0 | (none)
484518 | west virgina public records | 2006-03-06 13:41:11.0 | 0 | (none)
484518 | west virginia public records | 2006-03-06 13:41:18.0 | 1 | http://www.searchsystems.net
Rows with rank 0 and no URL: the user has not clicked any result here.

29 Observation 2: Term Weights
We used the data logs to obtain R(w,d), the weight of word w for URL d, where:
– the q_i are the queries in which word w occurs
– the q_j are all queries for URL d
– Rank(q_i, d) is the rank of URL d for query q_i
[Formula for R(w,d) not preserved in this transcript]
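The slide's formula for R(w,d) is not available here, so the following is only a sketch built on an assumed form that uses the same ingredients: the reciprocal ranks of URL d summed over the queries containing w, divided by the reciprocal ranks summed over all queries for d. This assumed formula, the class name `TermWeights`, and the helper type `Click` are hypothetical and should not be read as the author's definition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a rank-based term weight for a URL.
class TermWeights {
    // One logged click on URL d: the query text and the rank of d for that query.
    static final class Click {
        final String query;
        final int rank;
        Click(String query, int rank) { this.query = query; this.rank = rank; }
    }

    // ASSUMED formula (not the slide's):
    // R(w,d) = sum over q_i containing w of 1/Rank(q_i,d)
    //          divided by the sum over all q_j for d of 1/Rank(q_j,d).
    static double weight(String word, List<Click> clicksForUrl) {
        double withWord = 0.0;
        double all = 0.0;
        for (Click c : clicksForUrl) {
            if (c.rank <= 0) {
                continue; // rank 0 marks a query without a click
            }
            double contribution = 1.0 / c.rank;
            all += contribution;
            List<String> words = Arrays.asList(c.query.toLowerCase().split("\\s+"));
            if (words.contains(word.toLowerCase())) {
                withWord += contribution;
            }
        }
        return all == 0.0 ? 0.0 : withWord / all;
    }
}
```

Under this assumed form, a word that appears in queries where the URL was ranked near the top receives more weight than a word from queries where the URL appeared far down the list.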

30 Observation 2 (cont’d)
Top 40 terms for URL SearchSystems.net: county, records, court, free, public, florida, cases, michigan, germany, probate, tax, pasco, oregon, nc, indiana, deeds, sheriff, ohio, search, hanover, etowah, criminal, texas, property, warrants, databases

31 Next Steps
– Obtain word weights for URLs more accurately
  – Use the information in query chains to obtain the top terms of URLs
  – Use other methods?
– Obtain document summaries for several URLs and evaluate the results

32 Thanks

33 Discussions and Questions
– Can the proposed model eventually provide a fixed document content? (Does the method converge?)
– Are there other techniques that might be helpful?

34 References
[1] Jian-Tao Sun, Dou Shen, Hua-Jun Zeng, Qiang Yang, Yuchang Lu, Zheng Chen: Web-page summarization using clickthrough data. SIGIR 2005: 194-201
[2] Alexandros Ntoulas, Junghoo Cho, Christopher Olston: What's new on the web?: the evolution of the web from a search engine perspective. WWW 2004: 1-12
[3] Junghoo Cho, Sourashis Roy: Impact of search engines on page popularity. WWW 2004: 20-29

35 References (cont’d)
[4] G. Pass et al.: A Picture of Search. The First International Conference on Scalable Information Systems, Hong Kong, June 2006. Copyright (2006) AOL
[5] J. Lafferty, C. Zhai: Document Language Models, Query Models, and Risk Minimization. SIGIR 2001

