1 University of Malta CSA1013:Information Search and Retrieval © 2003- Chris Staff 1 of 24 cstaff@cs.um.edu.mt
CSA1013 Historical Perspectives of Information Search and Retrieval
Dr. Christopher Staff
Department of Computer Science & AI
University of Malta

2 Aims and Objectives
What is Information Search and Retrieval?
What’s the “state-of-the-art”?
How did we get here?
What are the issues?
Where are we likely to go next?

3 What’s Information Search and Retrieval?
What’s information?
– Structured vs. unstructured
Where is it?
Question answering vs. information lack or information need

4 What’s the “state-of-the-art”?
Information Retrieval in the “real” world
– Web-based search engines: Google, AllTheWeb, AltaVista, etc.
– Web directories: Yahoo, Excite, etc.

5 What’s the “state-of-the-art”?
Google, and Google-like search engines
– Index > 24 billion web pages (pdf, doc, html, …)
– User expresses a “query” as keywords, a natural language question, etc.
– System “compares” the query to the indexed documents
– Returns a “list” of “relevant” documents
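The “compares query to indexed documents” step can be illustrated with a toy inverted index. This is a minimal sketch, not how Google or any real engine works: it assumes simple whitespace tokenisation and scores each document by how many distinct query terms it contains. The names `build_index` and `search` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: each lower-cased term maps to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query, k=10):
    """Score each document by the number of distinct query terms it contains and return the top k ids."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:k]]


docs = {
    "d1": "history of information retrieval",
    "d2": "web search engines index web pages",
    "d3": "information retrieval on the web",
}
index = build_index(docs)
print(search(index, "web information retrieval"))
```

With these made-up documents, `d3` matches all three query terms and is listed first, followed by `d1` (two terms) and `d2` (one term).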

6 What’s the “state-of-the-art”?
A recent study by Jansen & Spink [Jansen] shows:
– Average query length |Query| = 2.14 terms [Spink]
– 53% of queries consist of a single term!
– 54% of users are satisfied with the first page of results (a list of 10 documents)
– 80% of users view no more than 10 to 20 results
– 27.6% read only one document!
– 66% read fewer than 5 documents

7 Has life always been this good?
It would seem that we’re living in information heaven: any info we seek is just a couple of query terms away
However, although the majority of queries appear to be “trivial”, the reality is quite different

8 Has life always been this good?
What if we want to find all relevant information? (“The Invisible Web”)
What if we want to find something that is difficult to describe?
What if we don’t know what we’re looking for?
– What tools do we use to find info in encyclopaedias, dictionaries, newspapers, reference manuals, novels and other books?

9 Here beginneth the history lesson…
People have devised tools for finding information again ever since we learnt to write things down…
Think of the information stored on your personal computer… how do you find something that you wrote last month, or last year?

10 Prehistory! Well, nearly!
Early writings
– Papyrus scrolls
– No paragraphs, page numbers, etc.
– Couldn’t “scroll to the end” to read an index
– Instead, Greek/Roman libraries labelled scrolls with a “sillybus”/“index” giving the title

11 Greeks/Romans
3rd century BC: the Greeks probably used alphabetization in the Library of Alexandria
Around the 2nd century BC (Rome): evidence of hierarchies of information/classification systems
– The Greeks probably used them earlier
Tables of Contents also date from around the 2nd century BC (reported by Pliny the Elder, who died in 79 AD)

12 Printing Press
Not much else was to happen until 1455, with the advent of the printing press
Before then, it was difficult to refer to information “within” a book, because hand-made copies were inaccurate
– Info on one page in one copy could be on a different page in other copies

13 Indices and the Printing Press
Even so, alphabetization was at first by initial letter only, then by the first four letters…
Not until the 18th century did full alphabetization occur!

14 The Second World War and beyond
In 1945, Vannevar Bush publishes “As We May Think” in the Atlantic Monthly
In 1949, Warren Weaver writes that if Chinese is simply English that has been coded, then Machine Translation should be possible
These give rise to the “intelligent” and the “statistical” (or surface-based) approaches to Information Search and Retrieval respectively (amongst other things :-))

15 Intelligent vs. Surface-based
Intelligent (“Concepts”), 1950s:
– Lay in waiting for years, because the hardware/software was not yet available
Surface-based (“Words”), 1950s:
– First approaches were “Key Words in Context” (KWIC)
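A KWIC index lists every significant word of each title together with its surrounding words, aligned on that keyword and sorted alphabetically by it. The Python sketch below is illustrative only: it assumes titles are split on whitespace, uses a small hypothetical stop list to skip insignificant words, and picks an arbitrary 30-character width for the left context.

```python
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "of", "on", "the", "we"}  # hypothetical stop list


def kwic(titles, width=30):
    """Return KWIC lines: each significant word shown in its context, sorted by that word."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word.lower(), left, word, right))
    entries.sort(key=lambda e: (e[0], e[1], e[3]))
    # Right-align the left context so every keyword starts in the same column.
    return [f"{left[-width:]:>{width}}  {word} {right}".rstrip() for _, left, word, right in entries]


for line in kwic(["As We May Think", "Information Retrieval on the Web"]):
    print(line)
```

For the two sample titles the keywords come out in the order Information, May, Retrieval, Think, Web, each starting in the same column so the reader can scan down the keywords while still seeing their context.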

16 Intelligent vs. Surface-based
Intelligent, 1960s:
– Generality in AI (John McCarthy)
Surface-based, 1960s:
– Boolean Search
– Measures of performance effectiveness
– Thesaural Lookup
– Vector Space Model
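Two of the 1960s surface-based ideas, the Vector Space Model and measures of retrieval effectiveness, can be sketched together. This Python example is a sketch under stated assumptions: whitespace tokenisation, tf × log(N/df) term weighting (one common choice among many), cosine similarity for ranking, and precision/recall as the effectiveness measures. The documents and the relevance judgement are made up.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Weight each term in each document by raw term frequency times log(N / df)."""
    n = len(docs)
    counts = {d: Counter(text.lower().split()) for d, text in docs.items()}
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(n / df_t) for term, df_t in df.items()}
    vectors = {d: {t: tf * idf[t] for t, tf in c.items()} for d, c in counts.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(docs, query):
    """Return document ids ordered by decreasing cosine similarity to the query."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    return sorted(docs, key=lambda d: (-cosine(q, vectors[d]), d))


def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)


docs = {
    "d1": "boolean retrieval with exact matching",
    "d2": "vector space retrieval with cosine ranking",
    "d3": "probabilistic retrieval models",
}
ranking = rank(docs, "cosine vector ranking")
print(ranking)
print(precision_recall(ranking[:2], {"d2"}))  # assumes only d2 is judged relevant
```

Here `d2` is the only document sharing terms with the query, so it ranks first. Taking the top two results against the hypothetical judgement that only `d2` is relevant gives a precision of 0.5 and a recall of 1.0, which shows why the two measures are usually reported together.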

17 Intelligent vs. Surface-based
Intelligent, 1970s:
– Expert Systems
– Still about “understanding” information and reasoning with and about it
Surface-based, 1970s:
– Explosion in availability of electronic text collections
– Library Retrieval Systems
– Full-text indexing
– Probabilistic IR
– Relevance Feedback
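Relevance feedback lets the user mark some results as relevant or not, and then reformulates the query from those judgements. One classic formulation is Rocchio’s, which moves the query vector towards the centroid of the relevant documents and away from the centroid of the non-relevant ones. The Python sketch below assumes sparse {term: weight} vectors and illustrative weights alpha = 1.0, beta = 0.75, gamma = 0.15; dropping terms with non-positive weight is a common convention, not a requirement.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query reformulation over sparse {term: weight} vectors.

    new query = alpha * query
              + beta  * centroid(relevant)
              - gamma * centroid(non_relevant)
    """
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant)
    for doc in non_relevant:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(non_relevant)
    # Drop terms whose weight is no longer positive.
    return {t: w for t, w in new_q.items() if w > 0}


query = {"jaguar": 1.0}
judged_relevant = [{"jaguar": 1.0, "car": 1.0}]
judged_non_relevant = [{"jaguar": 1.0, "cat": 1.0}]
print(rocchio(query, judged_relevant, judged_non_relevant))
```

In this made-up example the reformulated query gains the term “car” and keeps “jaguar” with a higher weight, while “cat”, whose adjusted weight is negative, is dropped.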

18 Intelligent vs. Surface-based
Intelligent, 1980s:
– Conceptual IR
– Knowledge Representation Languages
– Lenat’s CYC
– Contextual Reasoning
– 5th Generation Computing, Japan
– LSI (Latent Semantic Indexing) feeds Statistical IR
Surface-based, 1980s:
– OPACs (Online Public Access Catalogues)
– IR used by non-specialists
– Extended Boolean IR
– Word Sense Disambiguation
– Statistical IR (LSI, etc.)
– Internet
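Extended Boolean models soften strict AND/OR so that documents can be ranked by how well they partially match. One well-known example is the p-norm model of Salton, Fox and Wu (1983). The Python sketch below assumes unweighted query terms and document term weights in [0, 1]; with p = 1 both operators reduce to a simple average, and as p grows the scores approach strict Boolean AND (minimum) and OR (maximum).

```python
def p_norm_or(weights, p):
    """p-norm OR: ((x1^p + ... + xm^p) / m) ** (1/p), for weights xi in [0, 1]."""
    return (sum(x ** p for x in weights) / len(weights)) ** (1 / p)


def p_norm_and(weights, p):
    """p-norm AND: 1 - (((1-x1)^p + ... + (1-xm)^p) / m) ** (1/p), for weights xi in [0, 1]."""
    return 1 - (sum((1 - x) ** p for x in weights) / len(weights)) ** (1 / p)


# Hypothetical document that fully matches one of two query terms and lacks the other.
weights = [1.0, 0.0]
for p in (1, 2, 100):
    print(p, round(p_norm_and(weights, p), 3), round(p_norm_or(weights, p), 3))
```

For this document strict Boolean AND gives 0, whereas p = 1 gives 0.5 and p = 100 gives a score below 0.01, close to the strict result. The parameter p therefore lets a system trade off between vector-style partial matching and exact Boolean behaviour.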

19 Intelligent vs. Surface-based
Intelligent, 1990s:
– Better language processing: information extraction, named-entity recognition
– Advances in contextual reasoning, ontologies
Surface-based, 1990s:
– WWW (1995 c. 10M pages, 2003 c. 3B!)
– Multimedia Indexing & Retrieval
– Web-based search engines

20 Intelligent vs. Surface-based
Intelligent, 2000s:
– Semantic Web
Surface-based, 2000s:
– Faster processors
– More memory
– Cheaper storage space
– More superficial comparisons

21 Intelligent vs. Surface-based
Intelligent, the future:
– Computers that can find precisely the information you seek, even if the answer is non-obvious or needs to be the result of reasoning
– MyLifeBits
Surface-based, the future:
– Computers that can approximate the information you seek, at much less cost but at the expense of “correctness”
– MyLifeBits

23 Main Issues
Architecture to handle ever-increasing numbers of docs + efficient data structures
Freshness, indexing and retrieval speed (efficient algorithms)
What is “relevance”? (Better, cheaper and more accurate algorithms to understand what the user really wants)
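The slide leaves “efficient data structures” and “efficient algorithms” open. One standard example is answering a Boolean AND query by merging sorted postings lists from an inverted index, which takes time linear in the lengths of the lists. The Python sketch below assumes each postings list is a sorted list of integer document ids and that at least one list is supplied; the postings themselves are made up.

```python
def intersect(p1, p2):
    """Merge-intersect two sorted postings lists in O(len(p1) + len(p2)) time."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def intersect_all(postings):
    """AND together several postings lists (at least one), shortest first."""
    ordered = sorted(postings, key=len)
    result = ordered[0]
    for plist in ordered[1:]:
        if not result:  # nothing can survive further intersections
            break
        result = intersect(result, plist)
    return result


# Hypothetical postings lists of document ids.
postings = {
    "information": [1, 2, 4, 11, 31, 45, 173, 174],
    "retrieval": [2, 31, 54, 101],
    "history": [2, 4, 31],
}
print(intersect_all(list(postings.values())))
```

Processing the shortest lists first is a common optimisation, since the intermediate result can never be longer than the shortest list processed so far.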

24 Main References
Paijmans, J.J., last updated 2004, “The Retrieval of Information from historical perspective”, http://pi0959.kub.nl/Paai/Onderw/V-I/Content/history.html
American Society of Indexers, last updated 2005, “How Information Retrieval Started”, http://www.asindexing.org/site/history.shtml
[Jansen] Jansen, B.J., and Spink, A., 2003, “An Analysis of Web Documents Retrieved and Viewed”, in Proceedings of the 4th International Conference on Internet Computing, Las Vegas, Nevada, 23-26 June 2003. http://ist.psu.edu/faculty_pages/jjansen/academic/pubs/pages_viewed.pdf
[Spink] Spink, A., et al., 2001, “Searching the Web: The Public and their Queries”, in JASIST 2001. http://jimjansen.tripod.com/academic/pubs/jasist2001/jasist2001.html

