INFO624 - Week 4 Query Languages and Query Operations Dr. Xia Lin Associate Professor College of Information Science and Technology Drexel University.

1 INFO624 - Week 4 Query Languages and Query Operations Dr. Xia Lin Associate Professor College of Information Science and Technology Drexel University

2 Query
- A query is a representation of the user’s information needs.
  - It may not represent the information needs exactly because:
  - Information needs are difficult to describe (semantic difficulty).
  - The query must be in a format acceptable to the retrieval system (syntactic difficulty).

3 Content-based queries
- Words
- Phrases
- Proximity
- Pattern matching
  - Word matching
  - Prefix/suffix
  - Wildcard search
  - Error handling
  - Extended patterns
- Boolean
- Vector
- Natural language
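
As a rough illustration of the pattern-matching query types listed on this slide (prefix, suffix, and wildcard search), here is a minimal Python sketch. The vocabulary and the function names are hypothetical, invented for this example; a real system would match against its own index terms.

```python
import fnmatch

# Hypothetical index vocabulary, invented for this illustration.
VOCABULARY = ["retrieval", "retrieve", "retrieved",
              "information", "informatics", "reform"]

def prefix_match(prefix, vocabulary):
    """Return the terms that start with the given prefix."""
    return [term for term in vocabulary if term.startswith(prefix)]

def suffix_match(suffix, vocabulary):
    """Return the terms that end with the given suffix."""
    return [term for term in vocabulary if term.endswith(suffix)]

def wildcard_match(pattern, vocabulary):
    """Return the terms matching a shell-style pattern.

    '*' matches any run of characters and '?' matches exactly one.
    """
    return [term for term in vocabulary if fnmatch.fnmatchcase(term, pattern)]

print(prefix_match("retriev", VOCABULARY))
print(suffix_match("tion", VOCABULARY))
print(wildcard_match("inform*", VOCABULARY))
```

Scanning the whole vocabulary is only practical for small term lists; it is used here to keep the sketch short.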

4 Boolean Queries
- Request: What are the likely problems when someone gets hurt on his knees when playing basketball?
- Write your best Boolean query for this request.
- If the query returns zero hits, how do you modify the query?
- If the query returns too many hits, how do you modify the query?
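
One possible way to explore these questions is to evaluate Boolean queries as set operations over an inverted index. The sketch below is only an illustration: the four documents, the index builder, and the chosen query terms are all assumptions made for this example, not an answer key for the exercise.

```python
from collections import defaultdict

# Hypothetical mini-collection, invented for this illustration.
DOCS = {
    1: "knee injury in basketball players",
    2: "basketball rules and scoring",
    3: "common knee problems after sports injury",
    4: "ankle sprain in soccer",
}

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

INDEX = build_index(DOCS)

def postings(term):
    """Return the set of documents containing the term (empty if unseen)."""
    return INDEX.get(term, set())

# AND narrows the result: knee AND injury AND basketball
narrow = postings("knee") & postings("injury") & postings("basketball")

# Too strict a conjunction can return zero hits: knee AND basketball AND sprain
zero_hits = postings("knee") & postings("basketball") & postings("sprain")

# OR broadens the result: knee AND (injury OR problems)
broad = postings("knee") & (postings("injury") | postings("problems"))

print(narrow, zero_hits, broad)
```

In this toy collection, dropping a required term or replacing it with an OR group of alternatives recovers hits, while adding another AND term cuts them down.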

5 How does AskJeeves translate the request?
  - What are the likely problems when someone gets hurt on his knees when playing basketball?

6 Construct your best Boolean query for this request:
  - I am doing research on personal space boundaries. I want to know whether there are any sex or race differences in personal space boundaries.

7 Interaction with Queries
- Start with a seed query.
  - The system responds with a list of related terms.
- Add selected terms from the list to the query.
  - The system updates the list of related terms.
- Repeat as needed.

8 Example: MedLine Search Assistant

9 Association-based Queries
- Find documents similar to this document.
- Find documents that link to this document
  - Explicitly
  - Implicitly

10 Field-based Queries

11
- Field-based queries will likely improve search precision.
- Field-based queries require that the data source has a fixed structure and is indexed by that structure.

12 Citation-based Queries
- Retrieve all documents that document A cites.
- Find all documents that cite document A.
- Find all documents that cite this author.
- Find all documents that cite both document A and document B.
- Find documents that cite both author A and author B.
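
The document-level queries on this slide can be read as lookups on a citation graph. The following Python sketch assumes a small hypothetical graph stored as a dictionary from each document to the set of documents it cites; the data and the function names are invented for illustration.

```python
# Hypothetical citation graph: document -> set of documents it cites.
CITES = {
    "A": {"X", "Y"},
    "B": {"X"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A", "B"},
}

def references_of(doc):
    """Retrieve all documents that the given document cites."""
    return CITES.get(doc, set())

def citing(doc):
    """Find all documents that cite the given document."""
    return {source for source, refs in CITES.items() if doc in refs}

def citing_both(doc_a, doc_b):
    """Find all documents that cite both doc_a and doc_b."""
    return citing(doc_a) & citing(doc_b)

print(references_of("A"))
print(citing("A"))
print(citing_both("A", "B"))
```

The author-level queries would work the same way if the graph were keyed by cited author instead of cited document.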

13 Co-Citation
- The college has a tradition of more than 20 years of co-citation research.
- Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later third document.
[Diagram: a later Document 3 cites Document 1 and cites Document 2; the relation between Document 1 and Document 2 is marked "?".]

14 Co-Citation Analysis
- The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related.
- Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.

15 Co-Citation Mapping
- Detects patterns in the frequency with which any works by any two authors are jointly cited in later works.
- Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
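
Counting how often two authors are cited together can be sketched in a few lines of Python. The reference lists below are hypothetical placeholders, and the function name is invented for this example; it counts each author pair once per citing paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: citing paper -> authors appearing in its reference list.
REFERENCE_LISTS = {
    "paper1": ["Author A", "Author B", "Author C"],
    "paper2": ["Author A", "Author B"],
    "paper3": ["Author A", "Author C"],
    "paper4": ["Author B", "Author A"],
}

def cocitation_counts(reference_lists):
    """Count, for every pair of authors, the papers that cite both."""
    counts = Counter()
    for cited_authors in reference_lists.values():
        # Sort and deduplicate so each pair has one canonical ordering
        # and is counted at most once per citing paper.
        for pair in combinations(sorted(set(cited_authors)), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(REFERENCE_LISTS)
for pair, n in counts.most_common():
    print(pair, n)
```

A mapping step would then place frequently co-cited authors close together, for example by feeding these counts to a clustering or scaling method; that step is outside this sketch.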

16 A Map of Information Scientists

17 AuthorLinks

18 Link-Based Queries
- Hypertext structure
  - Is a link a query?
  - http://www.google.com/search?hl=en&q=information+retrieval
  - This is called a query-mediated link.
  - It is also called a “soft link.”
  - Is a query a link?
  - Many pages are dynamically generated from a database or a search engine.
- Your review pages
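
To make the idea of a query-mediated link concrete, the sketch below builds a link that carries a query in its URL and then recovers the query from the link, using Python's standard `urllib.parse` module. The helper names are hypothetical; the base URL and the `hl` and `q` parameters are taken from the example URL on this slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def query_link(base_url, query, language="en"):
    """Encode a query as a link: the query travels in the URL's query string."""
    return base_url + "?" + urlencode({"hl": language, "q": query})

def query_from_link(link):
    """Recover the query text from a query-mediated link ('' if absent)."""
    params = parse_qs(urlsplit(link).query)
    return params.get("q", [""])[0]

link = query_link("http://www.google.com/search", "information retrieval")
print(link)
print(query_from_link(link))
```

Seen this way, following the link and submitting the query are the same request, which is the point the slide raises.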

19 Queries, Links, Is there a difference – SIGCHI’97
An experiment was conducted to compare browsing behavior in query- and link-based interfaces. Results suggest that query-mediated links are as effective as explicit queries, and that strategies adopted by users affect performance. This work has implications for the design of information exploration interfaces.

20 Query Structure
- Hierarchical structure
  - What does the user want when searching for “substance abuse”?
  - We may not know, but adding narrower terms of “substance abuse” will likely get better results:
  - Alcohol Abuse
  - Drug Abuse
  - Alcohol-Related Disorders
  - Amphetamine-Related Disorders
  - Cocaine-Related Disorders
  - Marijuana Abuse

21 Automatic Expansion
- If there is a defined hierarchy, several search strategies may be defined to expand the query:
  - Search with the query term only.
  - Search with the query term and all the terms in its upper hierarchy.
  - Search with the query term and all the terms in its lower hierarchy.
  - Search with the query term and all its sibling terms.
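
The four strategies above can be sketched over a small term hierarchy. In the Python example below, the hierarchy itself is an assumption made for illustration (in particular the placement of terms under "drug abuse" and the top term "mental disorders" are hypothetical), as are the function and strategy names.

```python
# Hypothetical hierarchy: term -> its parent (broader) term.
PARENT = {
    "substance abuse": "mental disorders",
    "alcohol abuse": "substance abuse",
    "drug abuse": "substance abuse",
    "marijuana abuse": "drug abuse",
    "cocaine-related disorders": "drug abuse",
}

def ancestors(term):
    """All terms above the given term in the hierarchy."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found

def descendants(term):
    """All terms below the given term, at any depth."""
    found = []
    for child, parent in PARENT.items():
        if parent == term:
            found.append(child)
            found.extend(descendants(child))
    return found

def siblings(term):
    """Other terms that share the given term's parent."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return [t for t, p in PARENT.items() if p == parent and t != term]

STRATEGIES = {
    "none": lambda term: [],   # query term only
    "broader": ancestors,      # add the upper hierarchy
    "narrower": descendants,   # add the lower hierarchy
    "siblings": siblings,      # add sibling terms
}

def expand(term, strategy="none"):
    """Return the set of search terms for the chosen expansion strategy."""
    return {term} | set(STRATEGIES[strategy](term))

print(expand("substance abuse", "narrower"))
print(expand("drug abuse", "siblings"))
```

Adding narrower terms usually widens what is retrieved, while adding broader terms can pull in much less specific material, so which strategy helps depends on the request.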

22

23 Query Operations
- Query execution
- Query expansion
- Query translation

24 Query Expansion
- Improve the initial query automatically by
  - restructuring the query, or
  - adding other new terms, or
  - adjusting the weight of each term.

25
- Restructuring the query:
  - Identify key concepts through natural language processing.
  - Identify any field information that may be contained in the query.
  - Is this an author?
  - Is this a journal?
  - Reverse term orders in the query.

26
- Adding new terms:
  - Synonyms
  - Hierarchical terms
  - Scope terms
  - Does the query “Football” retrieve information on football or on soccer?
  - Relevant terms
  - Selected terms from relevant documents
  - Terms that co-occur most often with the query terms

27
- Adjusting term weighting
  - If relevant documents are known, increase the weights of terms assigned to the relevant documents and decrease the weights of terms assigned to non-relevant documents.
- Adjust term weights in a topic tree:
  - Fruit
  - Fruit, 0.9; apple, 0.7; orange, 0.7; banana, 0.6; ...; Macintosh, 0.1; Computer, -0.4.
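
One way to picture the weight adjustment described above is a Rocchio-style relevance feedback update. The slides do not name a specific formula, so the update rule, the `beta` and `gamma` values, and the sample documents below are all assumptions chosen for illustration.

```python
def adjust_weights(query, relevant, non_relevant, beta=0.75, gamma=0.25):
    """Return new query term weights after relevance feedback.

    query        -- dict mapping term -> weight
    relevant     -- list of documents (dicts term -> weight) judged relevant
    non_relevant -- list of documents judged non-relevant
    Terms from relevant documents gain weight (scaled by beta); terms from
    non-relevant documents lose weight (scaled by gamma). Each group's
    contribution is averaged over the number of documents in the group.
    """
    new_query = dict(query)
    for docs, signed_factor in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                change = signed_factor * weight / len(docs)
                new_query[term] = new_query.get(term, 0.0) + change
    return new_query

# Hypothetical feedback for the query "fruit".
query = {"fruit": 1.0}
relevant = [{"fruit": 1.0, "apple": 1.0}, {"fruit": 1.0, "orange": 1.0}]
non_relevant = [{"apple": 1.0, "computer": 1.0}]

updated = adjust_weights(query, relevant, non_relevant)
print(updated)
```

With this sample feedback, "computer" ends up with a negative weight, which is in the spirit of the slide's example where a computer-related term is weighted below zero for a fruit query.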

28 Query Translation
- From natural language to queries
  - AskJeeves
- From queries in one system to queries in another system
- From one natural language to another natural language
  - Altavista

29 Other types of representation for user’s needs?
- Mind-reading?
- Non-text queries?
- Gesture/motion?

30 IBM – Visualization Space
This information system understands the user. It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.

31 Multimedia Queries
- Content-based
  - Text indexing
- Attribute-based
  - Color, size, type, time period, …
- Structure-based
  - Location, shape, layout, etc.
- Cluster-based
  - Semantic groups, physical groups, structure groups
- Example: find a photo that has the White House in the center.

32 Project Discussion
- Idea 1: Install and implement an IR system
  - Focus on system and technology
  - Need to have a collection
  - Need to have hands-on experience with systems
- Idea 2: Conduct an evaluation experiment on one or two selected IR systems
  - Focus on interfaces and users
- Idea 3: Customize an IR system
  - Focus on functionality and customization

33 Project Evaluation
- Topics
  - Relevance
  - Problems identified
  - Technical difficulties
  - Solutions/ideas
- The process
  - Design
  - Implementation

34
- The report
  - Background
  - Written
  - Oral

35 Midterm
- Concepts
  - What is information retrieval?
  - Data, information, text, and documents
  - Two abstraction principles
  - User’s information needs
  - Queries and query formats
  - Precision and recall
  - Relevance

36 Midterm
- Procedures and problem solving
  - How to translate a request into a query?
  - How to expand queries for better recall or better precision?
  - How to create an inverted index?
  - How to create a vector space?
  - How to calculate similarities of documents?
  - How to match a query to documents in a vector space?

37
- Discussions
  - Challenges of IR
  - Advantages and disadvantages of Boolean search (vector space, automatic indexing, association-based queries, etc.)
  - Evaluation of IR systems
  - With or without using precision/recall
  - Difference between data retrieval and information retrieval

