LIS618 lecture 0 Thomas Krichel 2004-01-25

Presentation transcript:

LIS618 lecture 0 Thomas Krichel

today's lecture
A look at the course home page
administrative stuff
historical matters
about the course
about me
business of database searching
indexes
the Boolean information retrieval model
practice example on Dialog

Organization
homepage
Contents to be discussed today.
Send mail to
–Your name
–Your secret word for grades delivery
Interrupt me with as many questions as possible!
Ask for breaks!

Proposed Organization
Normal lecture
Quiz at the beginning of every lecture
–Factually oriented, around 15 minutes
–Remove worst performance
–Average to form 50%
Search exercise 50%
I may make some adjustment to the syllabus this week.

Search exercise
Find victim of an information need
Best to take someone you know in a professional capacity
Conduct interview about an information need experienced by the victim, write down expectations
Search in formal database and on web
Discuss results with the victim
Write essay, no longer than 5 pages.

about the course
This course is new wine in an old bottle
Officially a merger of
–lis566 information resources on the Internet: mailing lists, usenet news, web searching
–lis618 database searching: access and use of commercial databases

mix of theory and practice
I am not a database search practitioner.
Each database is different, practical skills are not easily transferable.
Thus my emphasis in the course is more on theory.
In the past, I did theory first, then practice.
These days I mix. Some theory and some practice in every session.

What online retrieval systems?
Dialog has been the traditional database covered.
–They were the market leaders in online databases in the past.
–Nowadays the field is much more open.
–They remain a very good teaching tool for command-based database searching.
Nexis: a news database I have covered every year.
Google: a well-known search engine that I started to cover two years ago.

other stuff
Other online IR systems that I have covered in the past
–OCLC FirstSearch
–Factiva (briefly)
–WestLaw (external speaker)
New developments
–Peer-to-peer networks
–an introduction to reference linking using OpenURL
Old developments with library potential
–relational databases

About me
Born 1965, in Völklingen (Germany)
Studied economics and social sciences at the Universities of Toulouse, Paris, Exeter and Leicester.
PhD in theoretical macroeconomics
Lecturer in Economics at the University of Surrey, 1993 to 2001
Since 2001 assistant professor at the Palmer School

Why?
During my research assistantship period (1990 to 1993), I was constantly frustrated with difficult access to scientific literature.
At the same time, I discovered easy access to freely downloadable software over the Internet.
I decided to work towards downloadable scientific documents.
This led to my library career (eventually).

Steps taken I
1993 founded the NetEc project at later available at as well as at
These are networking projects targeted to the economics community.
The bulk is
–Information about working papers
–Downloadable working papers
–Journal articles were added later

Steps taken II
Set up RePEc, a digital library for economics research.
Catalogs
–Research documents
–Collections of research documents
–Researchers themselves
–Organizations that are important to the research process
Decentralized collection, model for the open archives initiative

Steps taken III
Co-founder of Open Archives Initiative
Work on the Academic Metadata Format
Co-founded rclis (Research in Computing, Library and Information Science), a RePEc clone
Currently working on the Konz project. It uses a database of titles of papers published in journals and tries to find them on the Internet.

my interest in databases
An important emphasis of the course is still on commercial databases.
From my point of view I have two interests in database searching
–As a provider, I must understand how people search in order to provide some data that they can use and will use.
–As an economist, I have a strong interest in information as a commodity. The database market is an important market place.

online information retrieval
This subject can be thought of as a subset of information retrieval (IR).
Most IR is online or digital.
IR concentrates on textual data.
We can think of online IR as falling under two categories
–database IR
–web IR

database / web IR
Database IR looks at systems that have
–controlled set of records
–low heterogeneity
–use requires authentication
–advanced search features
Web IR has opposite characteristics

traditional social model
User goes to a library
Describes problem to the librarian
Librarian does the search
–without the user present
–with the user present
Hands over the result to the user
User fetches full-text or asks a librarian to fetch the full text.

economic rationale for traditional model
In olden days the cost of telecommunication was high.
Database use costs
–cost of communication
–cost of access time to the database
The traditional model puts an upper limit on the costs.

disintermediation
With the cost of access time gone, the traditional model is under threat
There is disintermediation, where the librarian loses her role of doing the search.
But that may not be good news for information retrieval results
–user knows subject matter best
–librarian knows searching best

Web searching
IR has received a lot of impetus through the web, which poses unprecedented search challenges.
With more and more data appearing on the web, database searching (DS) may be a subject in decline
–It is primarily concerned with non-web databases
–There are more and more web-based methods of searching

Public access vs quality
Now the public at large is able to do online searching.
At the same time the need for quality answers has grown.
Quality-filtered services will become more important.
In the current databases, there is a lot that would already be available for free mixed with quality-controlled stuff.
Publishers have direct offerings and intermediated vending is in decline.

main theory part
Literature:
–"Modern Information Retrieval" by Ricardo Baeza-Yates and Berthier Ribeiro-Neto.
–"Information Retrieval in the Digital Age" by Heting Chu.
You don't need to buy the books.
You had better spend practice time on databases rather than reading books.

components of the IR process
provider
–define data that is available: documents that can be used, document operations, document structure
–index
user
–user need
–IR system familiarity

the IR process
Query expresses user need in a query language
Processing of query yields retrieved documents
Calculation of relevance ranking
Examination of retrieved documents
Possible return to the start, another query.

main problem
User is not an expert at the formulation of a query
Garbage in, garbage out: the retrieval yields poor results
Ways around that problem
–design very intuitive interface for the query
–give expert guidance

taxonomy of classic IR models
Boolean, or set-theoretic
–fuzzy set models
–extended Boolean
Vector, or algebraic
–generalized vector model
–latent semantic indexing
–neural network model
Probabilistic
–inference network
–belief network

summary
There are three basic types of models in classic information retrieval.
Extensions of these types are a matter of research concern and require good mathematical skills.
All classic models treat documents as individual pieces.

key aid: index
An index is a list of terms, with a list of locations where the term is to be found.
The way to express locations usually depends on the form that the indexed data takes.
–for a book, it is usually the page number, e.g. "shmoo 34, 75"
–for computer files it is usually the name of the file plus the number of the byte where the indexed term starts, e.g. "krichel index.html 34, cv.html "
There is usually more than one location of the term.
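The file-based index described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_index` and the two sample files are made up for the example, and character offsets stand in for byte offsets, which is the same thing only for plain ASCII text.

```python
import re
from collections import defaultdict

def build_index(files):
    """Map each lower-cased word to a list of (file name, offset) locations."""
    index = defaultdict(list)
    for name, text in files.items():
        for match in re.finditer(r"\w+", text):
            # match.start() is the offset at which the indexed term starts
            index[match.group().lower()].append((name, match.start()))
    return dict(index)

# Hypothetical sample files, not taken from the lecture
files = {
    "index.html": "Thomas Krichel teaches LIS618",
    "cv.html": "Krichel was born in 1965",
}
index = build_index(files)
print(index["krichel"])  # [('index.html', 7), ('cv.html', 0)]
```

The entry for "krichel" has two locations, one per file, which matches the remark that a term usually has more than one location.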

key aid: index terms
The index term is a part of the document that has a meaning on its own.
It is usually a noun.
Retrieval based on index terms raises questions
–semantics in query or document is lost
–matching done in imprecise space of index terms
One way out is to specify several terms and require that they have to be close to each other.
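The "close to each other" requirement can be illustrated with a small proximity check. This is a sketch under simple assumptions: words are split on whitespace, distance is counted in words, and the helper names `positions` and `near` as well as the sample sentences are invented for the example.

```python
def positions(text):
    """Return a dict mapping each lower-cased word to its word positions."""
    found = {}
    for pos, word in enumerate(text.lower().split()):
        found.setdefault(word, []).append(pos)
    return found

def near(text, term_a, term_b, max_gap=1):
    """True if the two terms occur within max_gap words of each other, in either order."""
    pos = positions(text)
    return any(
        abs(a - b) <= max_gap
        for a in pos.get(term_a, [])
        for b in pos.get(term_b, [])
    )

print(near("a digital library for economics", "digital", "library"))       # True
print(near("digital tools in the public library", "digital", "library"))   # False
```

Both sentences contain both terms, but only the first has them next to each other, so a proximity requirement recovers some of the meaning that plain term matching loses.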

basic concept: weight of index term
Given all nouns, not all appear to have the same relevance to the text
Sometimes, we can have a simple measure of the importance of a term, example?
More generally, for each indexing term and each document we can associate a weight with the term and the document.
Usually, if the document does not contain the term, its weight is zero
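One possible simple measure, offered here as an assumption and not as the answer the slide has in mind, is the number of times a term occurs in the document. The sketch below counts every word, not only nouns, and the function name `term_weights` and the sample text are made up.

```python
from collections import Counter

def term_weights(text):
    """Weight each term by how often it occurs in the document (raw count)."""
    return Counter(text.lower().split())

doc = "index terms and the index of a book"
weights = term_weights(doc)
print(weights["index"])  # 2
print(weights["shmoo"])  # 0, a Counter returns 0 for terms that do not occur
```

A term that is absent from the document gets weight zero, as the last point on the slide says.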

Boolean model
In the Boolean model, the weight of every index term for any document is 1 if the term appears in the document. It is 0 otherwise.
This allows query terms to be combined with the Boolean operators AND, OR, and NOT, so powerful queries can be written.
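A minimal sketch of the Boolean model, using Python sets for AND, OR and NOT. The three documents and the helper names `weight` and `matching` are invented for the illustration; a real system would look terms up in an index instead of scanning every document.

```python
# Hypothetical mini-collection, not taken from the lecture
docs = {
    "d1": "current awareness in digital libraries",
    "d2": "digital library for economics",
    "d3": "current events",
}

def weight(term, doc_id):
    """Boolean model weight: 1 if the term occurs in the document, else 0."""
    return 1 if term in docs[doc_id].lower().split() else 0

def matching(term):
    """Set of documents in which the term has weight 1."""
    return {d for d in docs if weight(term, d) == 1}

all_docs = set(docs)
print(sorted(matching("current") & matching("digital")))  # AND: ['d1']
print(sorted(matching("current") | matching("digital")))  # OR:  ['d1', 'd2', 'd3']
print(sorted(all_docs - matching("digital")))             # NOT: ['d3']
```

Because every weight is either 0 or 1, each term simply defines a set of documents, and the Boolean operators become set intersection, union and complement.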

Classic implementation: dialog
The documents that I have used
– _sem/dlg1.pdf
– _sem/dlg2.pdf
– _sem/dlg3.pdf
– _sem/dlg4.pdf
I am also told that there are others at

Dialog is a databank of over 500 databases
These are also known as files and cover
–references and abstracts for published literature
–business information and financial data
–complete text of articles and news stories
–statistical tables
–directories
DIALOG uses the Boolean model

DIALOG interface
It is still rooted in "traditional" database systems.
It has been dismissed as "dial-a-dog".
It uses a command-driven interface.
It is very complicated to learn fully.
It is not suitable for the end-user.
It therefore offers a valuable skill to the information professional.

Accessing DIALOG
On the web, go to
Enter username and password
Forget about subaccount
Then click on logon
On the next screen go to command search
"continue" at the next screen

two steps in DIALOG
Step one: select databases (aka files) to look at
Step two: perform searches on the selected databases
You may wonder why one does not have one single step like in a search engine. Discuss.

sample search
We want to know something about "current awareness in digital libraries"
From dialogweb command search:
–databases
–social sciences and humanities
–library and information science
This leads you to guided&url=/cgi/dwframe?href=search.html

This is database selection…
At that screen you see a number of "files" with their number.
You can select those that you want to search
Then you click "begin database" and you get back to the command search
"b numbers" it will say. That is the command to begin working with files.

Boolean search
Do a number of searches
–s current(N)awareness
–s digital(N)library
–s digital(N)libraries
Each search retrieves a set of documents
The sets can be combined
–s s1 and (s2 or s3)

What is the deal?
There are two stages.
At stage two we make Boolean queries.
Each query splits the records into matching and non-matching records.
The set of matching records is returned.
It can be further searched or combined with other sets using Boolean operators.
Try this at home.
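The way result sets are built and then combined can be imitated in Python. This is a rough sketch and not DIALOG itself: the three records are invented, the `select` helper is hypothetical, and it assumes that (N) means the two words stand next to each other in either order.

```python
# Hypothetical records standing in for the selected files
records = {
    1: "current awareness services for digital libraries",
    2: "building a digital library",
    3: "current awareness in chemistry",
}

def adjacent(words, a, b):
    """True if a and b occur next to each other, in either order."""
    return any({x, y} == {a, b} for x, y in zip(words, words[1:]))

sets = []  # numbered result sets; sets[0] plays the role of s1

def select(a, b):
    """Store and return the set of record ids in which a and b are adjacent."""
    result = {rid for rid, text in records.items() if adjacent(text.split(), a, b)}
    sets.append(result)
    return result

s1 = select("current", "awareness")   # records 1 and 3
s2 = select("digital", "library")     # record 2
s3 = select("digital", "libraries")   # record 1
combined = s1 & (s2 | s3)             # s s1 and (s2 or s3)
print(sorted(combined))  # [1]
```

Each search splits the records into matching and non-matching ones and keeps the matching set, so later queries can refer back to earlier sets instead of repeating the search.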

Thank you for your attention!