LIS 7450, Searching Electronic Databases Basic: Database Structure & Database Construction Dialog: Database Construction for Dialog (FYI) Deborah A. Torres.



Database Structure: organization of data elements and records

Database Record. Record: the basic unit of information in a database (file). Example: a bibliographic record contains descriptive information such as author, title, and publisher.

Fields. Field: a distinct part or section of a record (a unit of information within the record). Example of personnel record fields: employee's name, special identifier number, address, date of hire.

Field Design Decisions. For each field: decide what information is placed in that field and the format for that information (text, numeric). Should there be subfields within a field? What should the fields be called? What field codes (abbreviations, numbering) should be used? In what order should the fields appear?

Example: MARC Record (a type of record you should be familiar with). Record Fields & Codes: the 100 field contains author information; the 245 field contains main title information.
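As a rough illustration of a record made of coded fields, the sketch below stores one record as a mapping from field code to value. Only the codes 100 (author) and 245 (main title) come from the slide; the sample values, the `FIELD_LABELS` table, and the `describe` helper are hypothetical, and a real MARC record also carries indicators and subfields that are left out here.

```python
# Hypothetical bibliographic record keyed by MARC-style field codes.
# The values are made-up sample data, not taken from a real catalog.
record = {
    "245": "An Example Title",   # 245 field: main title information
    "100": "Doe, Jane",          # 100 field: author information
}

# Illustrative mapping from field code to a human-readable label.
FIELD_LABELS = {"100": "Author", "245": "Title"}


def describe(rec):
    """Return 'Label (code): value' lines in field-code order."""
    return [
        f"{FIELD_LABELS.get(code, 'Unknown')} ({code}): {value}"
        for code, value in sorted(rec.items())
    ]


for line in describe(record):
    print(line)
```

Sorting by field code is one possible answer to the "order of the fields" design question; other orders are equally valid.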

Other Design Decisions
Hyphenated words: home-school
Stop words: high-frequency words not useful for searching
Single words and phrases: library, library science, color of money
Alternative spellings of words: color, colour
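Two of these decisions, hyphenated words and alternative spellings, can be sketched as indexing options. This is only one possible policy: the `SPELLING_VARIANTS` table, the `index_terms` function, and the `split_hyphens` flag are names invented for the illustration.

```python
import re

# Hypothetical table mapping alternative spellings to one preferred form.
SPELLING_VARIANTS = {"colour": "color"}


def index_terms(text, split_hyphens=True):
    """Return index terms for a short text under one hyphenation policy."""
    # Words are runs of letters, optionally joined by internal hyphens.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    terms = []
    for word in words:
        # Either index each half separately or keep the hyphenated form.
        parts = word.split("-") if split_hyphens else [word]
        terms.extend(SPELLING_VARIANTS.get(part, part) for part in parts)
    return terms


print(index_terms("Home-school colour"))
print(index_terms("Home-school colour", split_hyphens=False))
```

With splitting on, a search for "school" finds the record; with splitting off, only "home-school" does. In both cases "colour" is stored under "color".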

Types of Databases
Bibliographic: references and abstracts of published documents.
Full text: the complete text of articles, dictionary entries, codes of law, or other such documents.
Directory: factual information about organizations, companies, products, people, or materials.

Types of Databases
Numeric: data in a tabular or statistically manipulated form, often with some added text.
Hybrid: a mix of record types. For example, a database may have full-text records for some publications and citations and abstracts for other source documents.

Database Construction: basic steps for automatic indexing of text documents

Six Basic Steps
Step 1: Parse text into words.
Step 2: Compare to the stoplist and eliminate stop words.
Step 3: Stem content words, reducing them to root words (skip this step if you decide not to stem).
Step 4: Count stemmed word occurrences.
Step 5: Create a union list of terms.
Step 6: Create a data structure for specific retrieval techniques (e.g. an inverted file).

Example: A Simple Set of Five One-Sentence Documents ("D" stands for document)
D1: It is a dog eat dog world!
D2: While the world sleeps.
D3: Let sleeping dogs lie.
D4: I will eat my hat.
D5: My dog wears a hat.

Step 1: Parse Text into Words
D1: it is a dog eat dog world
D2: while the world sleeps
D3: let sleeping dogs lie
D4: I will eat my hat
D5: my dog wears a hat
Note: Some databases remove punctuation from words, such as the apostrophe in possessives; others preserve it. What difference would this make?
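Step 1 can be sketched in a few lines. The sketch assumes that every word is lowercased and that anything other than a letter separates words; the `parse` function and the `documents` and `parsed` names are chosen for the illustration.

```python
import re

# The five one-sentence example documents.
documents = {
    "D1": "It is a dog eat dog world!",
    "D2": "While the world sleeps.",
    "D3": "Let sleeping dogs lie.",
    "D4": "I will eat my hat.",
    "D5": "My dog wears a hat.",
}


def parse(text):
    """Lowercase the text and return its words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


parsed = {doc_id: parse(text) for doc_id, text in documents.items()}
for doc_id, words in parsed.items():
    print(f"{doc_id}: {' '.join(words)}")
```

Under this particular rule a possessive such as "dog's" is split into "dog" and "s", which is one way the punctuation decision changes what ends up in the index.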

Step 2: Eliminate Stop Words
D1: dog eat dog world
D2: world sleeps
D3: let sleeping dogs lie
D4: eat hat
D5: dog wears hat
Stop words are content-free words: those not useful in determining the content of the document. Examples: pronouns (I, my), prepositions (of, by, on), articles and demonstratives (a, the, this).
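A minimal sketch of Step 2 follows. The `STOPLIST` below is an assumption: it contains exactly the words needed to reproduce the result shown for the five example documents, whereas a real database ships its own, usually longer, stoplist.

```python
# Assumed stoplist, chosen to reproduce the example result.
STOPLIST = {"a", "i", "is", "it", "my", "the", "while", "will"}

# Output of Step 1 (lowercased words, punctuation removed).
parsed = {
    "D1": ["it", "is", "a", "dog", "eat", "dog", "world"],
    "D2": ["while", "the", "world", "sleeps"],
    "D3": ["let", "sleeping", "dogs", "lie"],
    "D4": ["i", "will", "eat", "my", "hat"],
    "D5": ["my", "dog", "wears", "a", "hat"],
}


def remove_stopwords(words, stoplist=STOPLIST):
    """Keep only the words that are not on the stoplist, in order."""
    return [word for word in words if word not in stoplist]


content_words = {doc_id: remove_stopwords(words) for doc_id, words in parsed.items()}
for doc_id, words in content_words.items():
    print(f"{doc_id}: {' '.join(words)}")
```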

Step 3: Stemming (remember that not all databases stem words)
Before stemming:
D1: dog eat dog world
D2: world sleeps
D3: let sleeping dogs lie
D4: eat hat
D5: dog wears hat
After stemming:
D1: dog eat dog world
D2: world sleep
D3: let sleep dog lie
D4: eat hat
D5: dog wear hat

Types of Stemming Decisions
No stemming: contract, contracts, contracted, contracting, contractor, contraction, contractual, and contracture are all treated as different words.
Weak stemming removes inflections: -s, -es, -ed, -ing, -'s.
Strong stemming also removes derivations: -tion, -ly, -ally.
Stemming reduces words to a root variant; there are different stemming algorithms.
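Weak stemming can be approximated by stripping the inflectional suffixes listed above. The sketch below is a deliberately naive suffix stripper, not any published stemming algorithm: the `weak_stem` function, the suffix order, and the minimum stem length of three letters are all assumptions, and the rule will mis-stem some words outside these examples.

```python
# Inflectional suffixes from the weak-stemming list, longest checked first.
WEAK_SUFFIXES = ("'s", "ing", "ed", "es", "s")
MIN_STEM_LENGTH = 3  # assumed guard so very short words are left alone


def weak_stem(word):
    """Strip one inflectional suffix if a long enough stem remains."""
    for suffix in WEAK_SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return word


family = ["contract", "contracts", "contracted", "contracting",
          "contractor", "contraction", "contractual", "contracture"]
print([weak_stem(word) for word in family])
print([weak_stem(word) for word in ["let", "sleeping", "dogs", "lie"]])
```

The inflected forms collapse to "contract", while derived forms such as "contractor" and "contraction" stay distinct; a strong stemmer would merge those as well.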

A bit more about stemming for searching… Some databases automatically search for all of the words that come from the same stem/root word unless you indicate that you only want the word you entered. Example: if you entered computer, the database would also search for computing, computers, computation, etc.
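The search-time behavior described here can be sketched as query expansion: the entered word is stemmed, and every indexed word with the same stem is searched as well. The `toy_stem` rule, the `expand_query` function, its `exact` switch, and the small vocabulary are hypothetical; real systems use their own stemmers and their own syntax for turning stemming off.

```python
# Toy suffix list, just enough to group the "computer" example words.
TOY_SUFFIXES = ("ation", "ers", "ing", "er", "s")


def toy_stem(word):
    """Strip the first matching suffix; a stand-in for a real stemmer."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def expand_query(term, vocabulary, exact=False):
    """Return the indexed words that a search for `term` would match."""
    if exact:
        return [word for word in vocabulary if word == term]
    target = toy_stem(term)
    return [word for word in vocabulary if toy_stem(word) == target]


vocabulary = ["company", "computation", "computer", "computers", "computing"]
print(expand_query("computer", vocabulary))
print(expand_query("computer", vocabulary, exact=True))
```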

Step 4: Sort Words, Count Duplicates
Sorted into alphabetical order:
D1: dog dog eat world
D2: sleep world
D3: dog let lie sleep
D4: eat hat
D5: dog hat wear
With duplicates counted:
D1: dog(2) eat world
D2: sleep world
D3: dog let lie sleep
D4: eat hat
D5: dog hat wear
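Sorting and counting can be sketched with the standard library's `collections.Counter`. The `count_terms` function and the `stemmed` input (the stemmed documents from Step 3) are names chosen for the illustration.

```python
from collections import Counter

# Stemmed content words for each document (the result of Step 3).
stemmed = {
    "D1": ["dog", "eat", "dog", "world"],
    "D2": ["world", "sleep"],
    "D3": ["let", "sleep", "dog", "lie"],
    "D4": ["eat", "hat"],
    "D5": ["dog", "wear", "hat"],
}


def count_terms(words):
    """Return (term, count) pairs in alphabetical order of term."""
    return sorted(Counter(words).items())


term_counts = {doc_id: count_terms(words) for doc_id, words in stemmed.items()}
for doc_id, pairs in term_counts.items():
    # Show the count only when a term occurs more than once, e.g. dog(2).
    shown = [term if n == 1 else f"{term}({n})" for term, n in pairs]
    print(f"{doc_id}: {' '.join(shown)}")
```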

Step 5: Create Union List of Unique Terms
Unsorted list: dog eat world sleep world dog let lie sleep eat hat dog hat wear
Sorted list: dog dog dog eat eat hat hat let lie sleep sleep wear world world
Sorted, unique list: dog eat hat let lie sleep wear world
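The union list is the sorted set of every term that occurs in any document. A minimal sketch, with `terms_by_doc` holding each document's terms from Step 4 (variable names are illustrative):

```python
# Distinct stemmed terms per document (the result of Step 4, counts dropped).
terms_by_doc = {
    "D1": ["dog", "eat", "world"],
    "D2": ["sleep", "world"],
    "D3": ["dog", "let", "lie", "sleep"],
    "D4": ["eat", "hat"],
    "D5": ["dog", "hat", "wear"],
}

# Concatenate every document's terms, then sort and remove duplicates.
unsorted_terms = [term for terms in terms_by_doc.values() for term in terms]
sorted_terms = sorted(unsorted_terms)
union_list = sorted(set(unsorted_terms))

print(" ".join(unsorted_terms))
print(" ".join(sorted_terms))
print(" ".join(union_list))
```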

Step 6: Create Inverted Index (inverted file)
Union list of unique terms: dog eat hat let lie sleep wear world
Inverted index (has pointers to the documents in which each word occurs):
dog: D1 D3 D5
eat: D1 D4
hat: D4 D5
let: D3
lie: D3
sleep: D2 D3
wear: D5
world: D1 D2
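One way to sketch the inverted index is as a mapping from each term to the list of documents that contain it. The `build_inverted_index` function and the use of plain document IDs as "pointers" are assumptions made for the illustration; production systems typically store positions and frequencies too.

```python
from collections import defaultdict

# Distinct stemmed terms per document.
terms_by_doc = {
    "D1": ["dog", "eat", "world"],
    "D2": ["sleep", "world"],
    "D3": ["dog", "let", "lie", "sleep"],
    "D4": ["eat", "hat"],
    "D5": ["dog", "hat", "wear"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id]):
            index[term].append(doc_id)
    # Return a plain dict with terms in alphabetical order.
    return dict(sorted(index.items()))


inverted_index = build_inverted_index(terms_by_doc)
for term, postings in inverted_index.items():
    print(f"{term}: {' '.join(postings)}")
```

A search for a term then becomes a single lookup, for example `inverted_index["sleep"]`, instead of a scan through every document.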

Dialog Database Construction FYI: For those interested in Dialog

Dialog Database Construction
Step 1: Create a linear file of records received from the Information Provider. Assign sequential accession numbers to the records.
Step 2: Label the fields within the records: AU for Author, TI for Title, etc. If a field is word-indexed, also label the words within each field. Exclude stop words: AN FOR THE AND FROM TO BY WITH

Dialog Database Construction
Step 3: Create the Basic Index: all words and phrases from fields containing subject-related terms.
Step 4: Create the Additional Indexes: all terms from all remaining fields.
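The four Dialog steps can be imitated in a toy form. Everything concrete in the sketch is an assumption made for illustration: the two sample records are invented, the title (TI) is treated as the only subject-related field, phrase indexing is omitted, and the `FIELD=VALUE` key format for the Additional Indexes is simply a convenient way to keep fields apart. It is not Dialog's actual software or file format.

```python
import re

# Stop words listed for Step 2.
STOP_WORDS = {"an", "for", "the", "and", "from", "to", "by", "with"}
SUBJECT_FIELDS = {"TI"}  # assumption: only the title is subject-related

# Invented records, already labeled with field codes (Step 2).
incoming = [
    {"AU": "Doe, Jane", "TI": "Indexing for the Online Searcher"},
    {"AU": "Roe, Richard", "TI": "Searching with Inverted Files"},
]


def build_indexes(records):
    """Return (linear_file, basic_index, additional_indexes)."""
    # Step 1: linear file with sequential accession numbers.
    linear_file = dict(enumerate(records, start=1))
    basic_index, additional_indexes = {}, {}
    for accession, record in linear_file.items():
        for field, value in record.items():
            if field in SUBJECT_FIELDS:
                # Step 3: word-index subject fields, skipping stop words.
                for word in re.findall(r"[a-z]+", value.lower()):
                    postings = basic_index.setdefault(word, [])
                    if word not in STOP_WORDS and accession not in postings:
                        postings.append(accession)
            else:
                # Step 4: index remaining fields as whole terms.
                key = f"{field}={value.upper()}"
                additional_indexes.setdefault(key, []).append(accession)
    # Drop the empty entries created for stop words.
    basic_index = {w: p for w, p in basic_index.items() if p}
    return linear_file, basic_index, additional_indexes


linear_file, basic_index, additional_indexes = build_indexes(incoming)
print(basic_index)
print(additional_indexes)
```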