Special Topics in Computer Science
The Art of Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh, www.Gelbukh.com
2 Motivation
Info: representation, storage, organization, access
Search engines (IR systems)
User information need
  o Plain English description → query
First for libraries, but now the WWW!
Modern IR:
  o modeling
  o classification, categorization, filtering
  o system architecture
  o user interfaces, visualization, query languages
3 Data vs. Information Retrieval
Data Retrieval
  o Precise description
  o Well-structured data
  o Precise results
  o Yes-or-no results
  o Science
Information Retrieval
  o Vague information need
  o Natural language, images, ...
  o Semantic interpretation
  o Approximate results
  o Relevance ranking
  o Art!
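The contrast on this slide can be illustrated with a small sketch. It is not taken from the course material: the toy records and the function names `data_retrieval` and `information_retrieval` are hypothetical, and word overlap stands in for a real relevance model.

```python
# Toy collection (hypothetical data, for illustration only).
records = [
    {"id": 1, "title": "introduction to information retrieval"},
    {"id": 2, "title": "database retrieval with exact queries"},
    {"id": 3, "title": "cooking with clay pots"},
]

def data_retrieval(records, field, value):
    """Exact match on a structured field: each record is a yes-or-no hit."""
    return [r["id"] for r in records if r[field] == value]

def information_retrieval(records, query):
    """Approximate match: rank records by how many query words they share."""
    query_words = set(query.lower().split())
    scored = []
    for r in records:
        overlap = len(query_words & set(r["title"].lower().split()))
        if overlap:
            scored.append((overlap, r["id"]))
    # Highest overlap first; ties broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

exact_hits = data_retrieval(records, "title", "information retrieval")
ranked_hits = information_retrieval(records, "information retrieval systems")
```

The exact-match lookup finds nothing because no title equals the query string, while the ranked search still returns partially matching documents in order of overlap.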
4 Basic Concepts
User task (search)
  o Can formulate what they need: Retrieval (classical)
  o Can't (or does not know): Browsing (new to IR). Still not very well integrated
  o Filtering (user passive, contents active)
Logical view of docs
  o ... (Added linguistic info)
  o Full text
  o Text operations: reduce complexity to index terms. Keywords, stopwords. Stemming, noun groups. Linguistic processing!
  o Categories
Tradeoff: slow, good vs. fast, bad
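A minimal sketch of the text operations mentioned on this slide, reducing full text to index terms. The stopword list, the suffix list, and the helper names `naive_stem` and `index_terms` are assumptions made for illustration; the crude suffix stripping is a stand-in for a real stemmer and is not the method used in the course.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for"}

# Suffixes tried in order; a real stemmer uses far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lowercase, tokenize, drop stopwords, then stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

terms = index_terms("The indexing of documents")
```

Dropping stopwords and conflating word forms shrinks the vocabulary the index has to store, which is the "reduce complexity" step; the price is that some distinctions in the full text are lost.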
5 Past, Present, and Future
Since clay tablets
  o Alphabetical index (formal)
  o Table of contents (by order)
  o Classifications (by meaning)
Libraries
  o Automation of classical techniques. Catalogs.
  o Search by fields (author, title, keywords)
Web. Digital libraries: interactive
  o Cheaper → huge amount of data
  o Networks → remote access, wider audience
  o Free publishing → unprepared, heterogeneous data
Artificial Intelligence and linguistic methods
6 Main concerns
Open audience
  o Help people to formulate their information need
  o Improve retrieval quality. Intelligent methods
Efficiency (speed)
  o Development of fast techniques
Interaction
  o Watch user behavior to improve quality
  o Privacy!
Open content
  o Legal issues. Copyright. Responsibility for info quality
  o Intelligent methods
7 Retrieval process
Database
  o Define the logical view: text operations, text model
Index (e.g., inverted file)
User query
  o Query operations (users are not good at this!)
Retrieved docs
  o Ranked by likelihood (relevance)
Feedback cycle
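The index-query-rank part of this process can be sketched as follows. This is a simplified illustration under stated assumptions: the sample documents and the names `build_inverted_index` and `search` are hypothetical, tokenization is plain whitespace splitting, and the score is a raw sum of term frequencies rather than any particular relevance model. The feedback cycle is not modeled.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency} (a simple inverted file)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def search(index, query):
    """Score docs by summed frequency of query terms; best score first."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Ties are broken by doc id so the ranking is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample collection.
docs = {
    1: "information retrieval is an art",
    2: "data retrieval is a science",
    3: "retrieval of information ranks information by relevance",
}
index = build_inverted_index(docs)
ranking = search(index, "information retrieval")
```

Only the posting lists of the query terms are touched at search time, which is why an inverted file is the usual answer to the efficiency concern: the cost depends on the query terms, not on scanning every document.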
11 Chapters: Text IR
Models and Evaluation
  o Modeling (basic concepts)
  o Retrieval Evaluation
Improvements on Retrieval
  o Query Languages
  o Query Operations
  o Text Languages and Properties
  o Text Operations
Efficiency
  o Indexing and Searching
12 Chapters: Interfaces, Applications
Interfaces
  o User Interfaces and Visualization
Applications
  o Searching the Web
  o Libraries and Bibliographical Systems
  o Digital Libraries
13 Book's web page
sunsite.dcc.uchile.cl/irbook/
Errata
Test data
Other courses, papers, and a lot more
The Korean version is NOT recommended. Read in English!
14 Conferences
General conferences on text processing
  o ACL
  o COLING
  o CICLing
  o DEXA (databases)
  o NLDB
Conferences on IR
  o ACM SIGIR
  o TREC
  o SPIRE
15 Conclusions
User Information Need
  o Vague
  o Semantic, not formal
Document Relevance
  o Order, not retrieve
Huge amount of information
  o Efficiency concerns
  o Tradeoffs
Art more than science