Information retrieval wed sept 02 2015 data…. -start at 6.45.

Presentation on theme: "Information retrieval wed sept 02 2015 data…. -start at 6.45."— Presentation transcript:

1 information retrieval wed sept 02 2015 data…

2 -start at 6.45

3 framework for today’s lecture… data; organizing data; retrieving data; tools supporting the process

4 Structured data: information with a high degree of organization; easy to put into a relational database; search is simple and straightforward. Unstructured data: essentially the opposite of structured data; natural language / free text.

5 STRUCTURED vs unstructured data: easy to envision structured data in terms of “tables”
Employee  Manager  Salary
Smith     Jones    68000
Chang     Smith    65000
Ivy       Smith    50000
Typically allows numerical range and exact match (for text) queries, e.g., Salary < 60000 AND Manager = Smith.
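The query on this slide can be tried directly against a small relational table. The following is a minimal sketch using Python's built-in sqlite3 module; the table name `staff` and the column names are assumptions made for illustration, and the three rows are one reading of the table on the slide.

```python
import sqlite3

# In-memory table mirroring the slide's Employee / Manager / Salary example.
# The table name "staff" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (employee TEXT, manager TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("Smith", "Jones", 68000), ("Chang", "Smith", 65000), ("Ivy", "Smith", 50000)],
)

# Numerical range plus exact text match: Salary < 60000 AND Manager = Smith.
rows = conn.execute(
    "SELECT employee, manager, salary FROM staff "
    "WHERE salary < ? AND manager = ?",
    (60000, "Smith"),
).fetchall()
print(rows)  # [('Ivy', 'Smith', 50000)]
```

Because the schema is known in advance, the answer is exact: a row either satisfies both conditions or it does not, and no ranking is involved.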

6 Relational Databases vs. Information Retrieval Systems. Relational databases: structured data; designed to provide search results with exact answers; queries built on schema of structured fields; lack of ranking mechanism (initially). We know the schema in advance, so semantic correlation between queries and data is clear. We can get exact answers.

7 Tables in an MS Access relational database, each defining part of a social networking site

8 Data entry form in an MS Access relational database, used to create each record

9

10 Structured data: information with a high degree of organization; easy to put into a relational database; search is simple and straightforward. Unstructured data: essentially the opposite of structured data; natural language / free text.

11 structured vs UNSTRUCTURED data: typically refers to free text. Email is a good example of unstructured data: it's indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured. Other examples of unstructured data include books, documents, medical records, and social media posts.

12 magazine article is an example of unstructured data

13 Relational Databases vs. Information Retrieval Systems. Information retrieval systems: unstructured / semi-structured data; designed to support unstructured natural language full text search; ranking mechanism is very important, since results must be sorted by relevance in order to satisfy user’s information need. We get inexact, estimated answers.

14 [Diagram of an information retrieval system. Labels: Document collection (corpus), Index, Query, Representation function, Matching function, Results, Categories, Subject headings]
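The components named on this slide can be wired together in a few lines. This is only a sketch under stated assumptions: the three-document corpus is invented, the representation function is plain lowercase tokenization, and the matching function uses a simple tf * idf sum, which is one common choice and not necessarily the one the lecture had in mind.

```python
import math
import re
from collections import Counter, defaultdict

def represent(text):
    """Representation function: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(corpus):
    """Index: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for term, tf in Counter(represent(text)).items():
            index[term][doc_id] = tf
    return index

def match(query, index, n_docs):
    """Matching function: score documents by summed tf * idf, best first."""
    scores = defaultdict(float)
    for term in represent(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur anywhere in the corpus
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical document collection (corpus).
corpus = {
    "d1": "Metadata is data about data",
    "d2": "Facets label items in a recipe collection",
    "d3": "Relational databases store structured data",
}
index = build_index(corpus)
results = match("structured data", index, len(corpus))
print(results)
```

Here the query and the documents pass through the same representation function, and the results come back as a ranked list of estimated matches instead of an exact answer set.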

15

16 KWIC: Key Word in Context

17 KWIC: Key Word in Context
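A KWIC display lists every occurrence of a keyword together with a few words on either side. The sketch below assumes whitespace tokenization and a fixed window of three words; the function name `kwic` and the sample sentence are made up for illustration.

```python
def kwic(text, keyword, width=3):
    """Return one (left context, keyword, right context) tuple per occurrence."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring punctuation attached to the word.
        if word.lower().strip(".,;:!?\"'") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines

# Hypothetical sample text.
text = ("We organize to enable retrieval. The more effort we put into "
        "retrieval, the less we organize first.")
for left, word, right in kwic(text, "retrieval"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} | {word} | {right}")
```

Aligning the keyword in a fixed column is what makes the listing easy to scan, since the reader can compare contexts line by line.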

18 metadata

19 What is Metadata? Classic definition: data about data. Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. (NISO) 3 primary “types”: – Descriptive – Structural – Administrative (rights management, preservation)

20

21 digital forensics

22 This reading really made me think about how easily accessible and organized information is today because of the implementation of metadata. It sparked a few questions: Without metadata, how would accessing data, resources and information be different in today’s society? -Chris

23 More Metadata: A Cataloging Record http://search.lib.unc.edu/search?R=UNCb7097376

24 The Idea of Facets. Facets are a way of labeling data – A kind of Metadata (data about data) – Can be thought of as properties of items. Facets vs. Categories – Items are placed INTO a category system – Multiple facet labels are ASSIGNED TO items

25 Facets: Epicurious example http://www.epicurious.com/ Create INDEPENDENT categories (facets) – Each facet has labels (sometimes arranged in a hierarchy). Assign labels from the facets to every item – Example: recipe collection. Course: Main Course; Cooking Method: Stir-fry; Cuisine: Thai; Ingredient: Bell Pepper, Curry, Chicken
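Assigning labels from independent facets to every item can be modeled as a dictionary of label sets per item. The sketch below is illustrative only: the recipe names and most labels are invented, and `select` and `facet_counts` are hypothetical helpers, not part of any Epicurious interface.

```python
from collections import Counter

# Each item gets labels from several independent facets (hypothetical data).
recipes = {
    "Thai chicken stir-fry": {
        "Course": {"Main Course"},
        "Cooking Method": {"Stir-fry"},
        "Cuisine": {"Thai"},
        "Ingredient": {"Bell Pepper", "Chicken"},
    },
    "Baked salmon": {
        "Course": {"Main Course"},
        "Cooking Method": {"Bake"},
        "Cuisine": {"American"},
        "Ingredient": {"Salmon"},
    },
    "Mango sticky rice": {
        "Course": {"Dessert"},
        "Cooking Method": {"Steam"},
        "Cuisine": {"Thai"},
        "Ingredient": {"Mango", "Rice"},
    },
}

def select(items, wanted):
    """Return names of items carrying every requested facet label."""
    return sorted(
        name for name, facets in items.items()
        if all(label in facets.get(facet, set()) for facet, label in wanted.items())
    )

def facet_counts(items, facet):
    """Count how many items carry each label of one facet."""
    counts = Counter()
    for facets in items.values():
        counts.update(facets.get(facet, set()))
    return counts

print(select(recipes, {"Cuisine": "Thai"}))
print(select(recipes, {"Cuisine": "Thai", "Course": "Main Course"}))
print(facet_counts(recipes, "Course"))
```

Because the facets are independent, any combination of labels can be used to narrow the collection, and the per-label counts are what a faceted interface typically shows next to each choice.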

26 The Idea of Facets. Break out all the important concepts into their own facets. Sometimes the facets are hierarchical – Assign labels to items from any level of the hierarchy.
Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sorbet, Flan)
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple

27 Using Facets. Now there are multiple ways to get to each item.
Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet, Flan)
Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple
Paths to one item: Fruit > Pineapple; Dessert > Cake; Preparation > Bake
Paths to another item: Dessert > Dairy > Sherbet; Fruit > Berries > Strawberries; Preparation > Freeze
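The "multiple ways to get to each item" idea can be sketched by storing each label as a path through its facet hierarchy and browsing by path prefix. The two item names below are assumptions (the slide lists only the paths), and `browse` is a hypothetical helper.

```python
# Each item carries one path per facet; paths come from the slide,
# item names are hypothetical.
items = {
    "pineapple upside-down cake": [
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    ],
    "strawberry sherbet": [
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}

def browse(items, prefix):
    """Return items that have a facet path starting with the given prefix."""
    prefix = tuple(prefix)
    return sorted(
        name for name, paths in items.items()
        if any(path[:len(prefix)] == prefix for path in paths)
    )

print(browse(items, ["Fruit"]))                # both items
print(browse(items, ["Dessert", "Dairy"]))     # only the sherbet
print(browse(items, ["Preparation", "Bake"]))  # only the cake
```

Starting from any facet and drilling down reaches the same item, which is the practical difference from placing each item into a single category.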

28 labor intensive? expensive?

29 UNC Libraries Online Catalog http://www.lib.unc.edu/ e.g. personal crisis

30 Caveat: semi-structured data. In fact, almost no data is absolutely “unstructured”; e.g., this slide has distinctly identified zones such as the title and bullets. This facilitates “semi-structured” search such as – title contains “data” and bullets contain “structure”
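A zone-restricted query of this kind can be sketched by keeping each zone of a document as a separate field. The slide contents below are invented stand-ins, and `zone_contains` is a hypothetical helper that matches whole words within one zone.

```python
import re

# Hypothetical slides, each split into a "title" zone and a "bullets" zone.
slides = {
    "s1": {"title": "Structured data",
           "bullets": ["high degree of organization",
                       "easy to put into a relational database"]},
    "s2": {"title": "Semi-structured data",
           "bullets": ["zones such as title and bullets give structure"]},
    "s3": {"title": "Metadata",
           "bullets": ["structure that describes data"]},
}

def zone_contains(doc, zone, term):
    """True if the term occurs as a whole word inside the named zone."""
    value = doc.get(zone, "")
    text = " ".join(value) if isinstance(value, list) else value
    return term.lower() in re.findall(r"[a-z]+", text.lower())

# Query: title contains "data" AND bullets contain "structure".
hits = sorted(
    slide_id for slide_id, doc in slides.items()
    if zone_contains(doc, "title", "data") and zone_contains(doc, "bullets", "structure")
)
print(hits)  # ['s2']
```

The same words appear in several slides, but only the one with each word in the right zone matches, which is what the zones add over plain full-text search.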

31 Let’s look at a database of magazine & journal articles… …Academic Search Complete >> UNC Libraries Homepage: http://www.lib.unc.edu/ >> E-Research by Discipline >> Frequently Used >> Academic Search Premier [off-campus log in with onyen/password]

32 Organization / Search. We organize to enable retrieval. The more effort we put into organizing information, the more effectively it can be retrieved. The more effort we put into retrieving information, the less it needs to be organized first. We need to think in terms of investment: the allocation of costs and benefits between the organizer and retriever. The allocation differs according to the relationship between them: who does the work and who gets the benefit?

