Real World IR Challenges (CS598-CXZ Advanced Topics in IR Presentation) Jan. 20, 2005 ChengXiang Zhai Department of Computer Science University of Illinois,


1 Real World IR Challenges (CS598-CXZ Advanced Topics in IR Presentation) Jan. 20, 2005 ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

2 There are many research problems to work on. It is more beneficial to society if we work on problems that reflect real world challenges…

3 What is a Good Research Problem?
A good research problem is a solvable challenge that is well connected to a real world need/problem
Real world challenges vs. imaginary challenges
–Not all challenges are interesting (to society)
–Real world challenges are always interesting to work on
–Imaginary challenges may (happen to) be interesting
–Spend your effort on solving interesting challenges so that you'll make more contributions to society
However, not all real world problems are challenges; some are straightforward to solve
Not all challenges/problems are solvable (with limited resources: time, money, tools, etc.)

4 Real World vs. Imaginary Challenges
[Figure labels: Real World Needs/Problems; Challenges; Imaginary Needs/Problems; Real world challenges]

5 Identify a Good Research Problem
[Figure axes: Level of Challenges; Impact/Usefulness. Scale labels: Known, Unknown]
–Good applications: not interesting for research
–High impact, low risk (easy): good short-term research problems
–High impact, high risk (hard): good long-term research problems
–Low impact, difficult: maybe publishable, but not good research problems
–Low impact, low risk: bad research problems (may or may not be publishable)

6 Three Basic Questions to Ask for an IR Problem
Who are the users?
–Everyone vs. small group of people
What data do we have?
–Web (whole web vs. sub-web)
–Email (public email vs. personal email)
–Literature (general vs. special discipline)
What functions do we want to support?
–Information access vs. knowledge acquisition
–Decision and task support
[Slide callouts: Everyone (who has an Internet connection); The whole web (indexed by Google); Search (by keywords)]

7 Map of IR Applications
[Figure labels, grouped by apparent role]
Data: Web pages; News articles; Email messages; Literature; Organization docs; Legal docs/Patents; Medical records; Customer complaint letters/transcripts; …
Users: Kids; UIUC community; Lawyers; Scientists; Customer Service People
Functions: Search; Browsing; Alert; Mining; Task/Decision support
Example applications: Email management + automatic reply; “Google Kids”; Legal Info Systems; Literature Assistant; Intranet Search; Local Web Service

8 High-Level Challenges in IR
How to make use of imperfect IR techniques to do something useful?
–Save human labor (e.g., partially automate a task)
–Create “add on” value (e.g., literature alert)
–A lot of HCI issues (e.g., allowing users to control)
How to develop robust, effective, and efficient methods for a particular application?
–Methods need to “work all the time” without failure
–Methods need to be accurate enough to be useful
–Methods need to be efficient enough to be useful

9 Challenge 1: From Search to Information Access
Search is only one way to access information
Browsing and recommendation are two other ways
How can we effectively combine these three ways to provide integrated information access?
E.g., artificially linking search results with additional hyperlinks, “literature pop-ups”…

10 Challenge 2: From Information Access to Task Support
The purpose of accessing information is often to perform some tasks
How can we go beyond information access to support a user at the task level?
E.g., automatic/semi-automatic email reply for customer service, literature information service for paper writing (suggest relevant citations, term definitions, etc.)

11 Challenge 3: Support Whole Life Cycle of Information
A life cycle of information consists of “creation”, “storage”, “transformation”, “consumption”, “recycling”, etc.
Most existing applications support one stage (e.g., search supports “consumption”)
How can we support the whole life cycle in an integrated way?
E.g., community publication/subscription service (no need for crawling, user profiling)

12 Challenge 4: Collaborative Information Management
Users (especially similar users) often have similar information needs
Users who have explored the information space can share their experiences with other users
How can we exploit the collective expertise of users and allow users to help each other?
E.g., allowing “information annotation” on the Web (“footprints”), collaborative filtering/retrieval
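To make "collaborative filtering" concrete, here is a minimal sketch of one common variant (user-based filtering with cosine similarity). It is not taken from the slides: the function names, the rating scale, and the toy data are all assumptions made for illustration, and ratings are assumed to be positive numbers.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(target, ratings, top_n=3):
    """Rank items the target user has not rated by the
    similarity-weighted ratings of other users."""
    seen = ratings[target]
    scores = {}
    for user, items in ratings.items():
        if user == target:
            continue
        sim = cosine(seen, items)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in items.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical toy data: users rate documents they have read.
ratings = {
    "alice": {"doc_a": 5, "doc_b": 4},
    "bob": {"doc_a": 5, "doc_b": 4, "doc_c": 5},
    "carol": {"doc_d": 5},
}

print(recommend("alice", ratings))
```

In this toy data, bob shares two documents with alice, so his remaining document is the only candidate; carol shares nothing with alice and is ignored.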

13 IR Problems Around Us (Web)
Finding information about our alumni (motivated by Siebel); more generally, targeted crawling
Paper filter (Can we filter out non-research pages in Google’s results?); more generally, a user-end filter
How to better design our department website? (Currently, it’s running Google; can we do better for searching our department website?)
Course information integration (Can we automatically generate a virtual Machine Learning course website that serves as a portal to all course information related to machine learning?)
UIUC Yellow Pages & White Pages; more generally, can we automatically generate such directories for any website? (Web site summarization?)
…

14 IR Problems Around Us (Email)
How to recognize and block spam?
How to better manage my personal email? (thread-based organization, appointment extraction, reply assistant)
How to better manage our newsgroups?
How to help the TSG group increase their productivity? (e.g., automatic generation of FAQs from an email archive, suggesting related answers to a question)
…
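The slides do not say how spam should be recognized. As one possible starting point, the sketch below uses a multinomial naive Bayes classifier with add-one smoothing, a standard text-classification baseline. The class name, the whitespace tokenizer, and the training messages are hypothetical, and the classifier assumes at least one training message per label.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over the two labels "spam" and "ham"."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Log prior; requires at least one training message per label.
        score = math.log(self.doc_counts[label] / total_docs)
        for tok in tokens:
            # Add-one smoothing so unseen words do not zero out the score.
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")

# Hypothetical training messages.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win cash prize now", "spam")
spam_filter.train("cheap pills win now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("paper draft review for monday", "ham")

print(spam_filter.is_spam("win cash now"))
print(spam_filter.is_spam("meeting agenda monday"))
```

A real filter would need far more training data and a decision about how costly a false positive is; blocking a legitimate message is usually worse than letting one spam message through.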

15 IR Problems Around Us (Literature)
How can we build a literature recommender/alert system?
Can we mine the CS literature to discover “what’s hot in CS”?
Can we discover emerging interdisciplinary topics between the DAIS area and the network area from literature?
Can we automatically recognize survey/review papers and collect all surveys about a topic?
…
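One naive way to approach the "what's hot" question is to compare how often a term appears in paper titles across two years. The sketch below is only an illustration under that assumption: the function names, the whitespace tokenization, and the sample titles are hypothetical, and a serious attempt would need stop-word handling, phrases, and much more data.

```python
from collections import Counter

def term_shares(papers):
    """papers: iterable of (year, title) pairs.
    Returns {year: {term: fraction of that year's titles containing it}}."""
    by_year = {}
    for year, title in papers:
        by_year.setdefault(year, []).append(set(title.lower().split()))
    shares = {}
    for year, titles in by_year.items():
        counts = Counter(term for words in titles for term in words)
        shares[year] = {term: n / len(titles) for term, n in counts.items()}
    return shares

def hot_terms(papers, old_year, new_year, top_n=3):
    """Rank terms by growth in title share from old_year to new_year."""
    shares = term_shares(papers)
    old = shares.get(old_year, {})
    new = shares.get(new_year, {})
    growth = {t: new.get(t, 0.0) - old.get(t, 0.0) for t in set(old) | set(new)}
    # Largest growth first; ties broken alphabetically for determinism.
    return sorted(growth, key=lambda t: (-growth[t], t))[:top_n]

# Hypothetical titles, for illustration only.
papers = [
    (2003, "query expansion models"),
    (2003, "query language models"),
    (2004, "language models retrieval"),
    (2004, "sensor networks retrieval"),
]

print(hot_terms(papers, 2003, 2004))
```

Using the share of titles rather than raw counts keeps the comparison fair when the number of papers differs between years.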

16 Plan for the Next 3 Classes
Goals:
–Move from real world problems to research topics and further to specific research questions
–Identify interesting research topics/questions for the 3 domains
Class format:
–Brainstorming: Everyone will bring in at least one research topic
–Discussions/debates on topics
–Select topics to cover in the course

17 Assignment
For each of the 3 domains (Web, Email, Literature), everyone identifies at least one interesting real world challenge about text information management; the more the better
If you can’t think of one
–Surf the web and see what problems are being addressed
–Ask yourself, what kind of information management tool do I wish I had that doesn’t already exist?
–Ask yourself, what features/capabilities do I wish Google had?
–…
–Randomly combine some IR function with a group of users and some data
For each challenge, identify
–Who are the users? (Who will benefit from solving this challenge?)
–What data are involved in the challenge?
–What kind of function(s) will be developed? (What exactly is the challenge?)
Write one small paragraph for each problem to state clearly what the challenge is and argue why it is an interesting problem to solve.
Email all your paragraphs and your domain preferences to me by next Monday night (11:59pm, Jan 24)
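The last brainstorming tip, randomly combining an IR function with a group of users and some data, can be tried mechanically. The sketch below draws on labels from the "Map of IR Applications" slide; assigning those labels to the three lists is my reading of the flattened slide, and the function names are hypothetical.

```python
import random

# Labels from the "Map of IR Applications" slide; the grouping into
# users, data, and functions is an assumption about the original figure.
USERS = ["Kids", "UIUC community", "Lawyers", "Scientists",
         "Customer Service People"]
DATA = ["Web pages", "News articles", "Email messages", "Literature",
        "Organization docs", "Legal docs/Patents", "Medical records"]
FUNCTIONS = ["Search", "Browsing", "Alert", "Mining",
             "Task/Decision support"]

def random_challenge(rng):
    """Pick one user group, one data source, and one function at random."""
    return {
        "users": rng.choice(USERS),
        "data": rng.choice(DATA),
        "function": rng.choice(FUNCTIONS),
    }

def describe(challenge):
    """Render a combination as a one-line prompt for brainstorming."""
    return (f"{challenge['function']} over {challenge['data']} "
            f"for {challenge['users']}")

# A seeded generator makes the suggestions reproducible.
rng = random.Random(2005)
for _ in range(3):
    print(describe(random_challenge(rng)))
```

Each printed line answers the three questions (users, data, function) in skeleton form; whether a given combination is a real world challenge still has to be argued in the paragraph the assignment asks for.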

