Reference Collections: Collection Characteristics.

Presentation transcript:

Reference Collections: Collection Characteristics

CACM Collection
3204 Communications of the ACM articles
Focus of collection: computer science
Structured subfields:
– Author names
– Date information
– Word stems from title and abstract
– Categories from hierarchical classification
– Direct references between articles
– Bibliographic coupling connections
– Number of co-citations for each pair of articles
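One way to make the structure concrete when loading such a collection is to hold each document in a small record type. The sketch below is in Python; the CACMRecord class and its field names are assumptions for illustration, not part of any official CACM distribution.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CACMRecord:
    """Hypothetical in-memory representation of one CACM test-collection document."""
    doc_id: int
    authors: List[str] = field(default_factory=list)
    date: str = ""
    title_abstract_stems: List[str] = field(default_factory=list)  # word stems from title and abstract
    categories: List[str] = field(default_factory=list)            # hierarchical classification labels
    direct_references: List[int] = field(default_factory=list)     # doc ids this article cites directly
    coupled_with: List[int] = field(default_factory=list)          # bibliographic coupling partners
    co_citations: Dict[int, int] = field(default_factory=dict)     # other doc id -> co-citation count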

CACM Collection
3204 Communications of the ACM articles
Test information requests:
– 52 information requests in natural language with two Boolean query expressions
– Average of 11.4 terms per query
– Requests are rather specific, with an average of about 15 relevant documents
– Result in relatively low precision and recall
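The "precision and recall" bullet is worth making concrete: both measures compare the documents a system returns for a query with the relevance judgments shipped with the collection. A minimal sketch follows; the function name and the example document ids are illustrative only.

def precision_recall(retrieved, relevant):
    """Precision and recall for one query.

    retrieved: list of doc ids returned by the system
    relevant:  set of doc ids judged relevant in the reference collection
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# A specific CACM-style request with 15 relevant documents
relevant_docs = set(range(1, 16))
retrieved_docs = [1, 2, 3, 40, 41, 42, 43, 44, 45, 46]
print(precision_recall(retrieved_docs, relevant_docs))  # (0.3, 0.2)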

ISI Collection
1460 documents from the Institute for Scientific Information
Focus of collection: information science
Structured subfields:
– Author names
– Word stems from title and abstract
– Number of co-citations for each pair of articles

ISI Collection
1460 documents from the Institute for Scientific Information
Test information requests:
– 35 information requests in natural language with Boolean query expressions
– Average of 8.1 terms per query
– 41 information requests in natural language without Boolean query expressions
– Requests are fairly general, with an average of about 50 relevant documents
– Result in higher precision and recall than for CACM
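Because the ISI requests come with and without Boolean formulations, it helps to recall what evaluating a Boolean expression against an inverted index involves. The toy index and query terms below are made up for illustration; they are not taken from the ISI data.

# Toy inverted index: term -> set of doc ids
index = {
    "citation": {1, 2, 5, 9},
    "analysis": {2, 3, 5},
    "indexing": {4, 5, 9},
}

def boolean_and(*terms):
    """Documents containing every query term (conjunctive Boolean query)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(*terms):
    """Documents containing at least one query term (disjunctive Boolean query)."""
    result = set()
    for t in terms:
        result |= index.get(t, set())
    return result

print(boolean_and("citation", "analysis"))  # {2, 5}
print(boolean_or("citation", "indexing"))   # {1, 2, 4, 5, 9}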

Observation

Collection   # of Docs   # of Terms   Terms/Doc
CACM         3204
ISI          1460

Number of terms increases slowly with the number of documents
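This slow vocabulary growth (often described by Heaps' law) is easy to observe empirically by tracking the number of distinct terms as documents are added. The sketch below assumes the documents are available as plain-text strings and uses a deliberately simple lowercase tokenizer.

import re

def vocabulary_growth(documents):
    """Return (documents seen, distinct terms seen) after each document."""
    vocab = set()
    growth = []
    for i, text in enumerate(documents, start=1):
        vocab.update(re.findall(r"[a-z]+", text.lower()))
        growth.append((i, len(vocab)))
    return growth

docs = [
    "information retrieval evaluation uses reference collections",
    "reference collections contain documents queries and relevance judgments",
    "evaluation compares precision and recall across systems",
]
print(vocabulary_growth(docs))
# [(1, 6), (2, 12), (3, 17)] -- 17 distinct terms out of 21 tokens here;
# over thousands of documents the gap widens and vocabulary growth flattens.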

Cystic Fibrosis Collection
1239 articles indexed under “Cystic Fibrosis” in MEDLINE
Structured subfields:
– MEDLINE accession number
– Author
– Title
– Source
– Major subjects
– Minor subjects
– Abstract (or extract)
– References in the document
– Citations to the document
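Records in MEDLINE-derived collections are typically distributed as tagged text, one field per tag. The sketch below parses a simplified record of that kind; the two-letter tags and the record text are illustrative, loosely modeled on MEDLINE-style layouts rather than copied from the actual Cystic Fibrosis distribution.

def parse_record(text):
    """Split a tagged record into a dict of tag -> list of values."""
    fields = {}
    for line in text.splitlines():
        tag, _, value = line.partition(" ")
        fields.setdefault(tag, []).append(value.strip())
    return fields

sample_record = """AN 84000001
AU Smith J; Jones K
TI Airway clearance in cystic fibrosis
SO J Hypothetical Med 1984
MJ CYSTIC-FIBROSIS
MN RESPIRATORY-THERAPY
AB Short abstract text goes here."""

print(parse_record(sample_record)["TI"])  # ['Airway clearance in cystic fibrosis']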

Cystic Fibrosis Collection
1239 articles indexed under “Cystic Fibrosis” in MEDLINE
Test information requests:
– 100 information requests
– Relevance assessed by four experts on a scale of 0 (not relevant), 1 (marginal relevance), and 2 (high relevance)
– Overall relevance is the sum of the four assessments (0-8)
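Four assessors each scoring 0-2 give a graded judgment in the range 0-8; evaluations that need binary relevance typically sum the scores and apply a cutoff. The sketch below does exactly that; the threshold value is an illustrative assumption, not something specified by the collection.

def overall_relevance(assessments):
    """Sum four expert scores (each 0, 1 or 2) into an overall score in 0..8."""
    assert len(assessments) == 4 and all(a in (0, 1, 2) for a in assessments)
    return sum(assessments)

def is_relevant(assessments, threshold=1):
    """Treat a document as relevant once the summed score reaches the threshold.

    threshold=1 (any assessor saw at least marginal relevance) is only an
    illustrative choice; a stricter evaluation could demand a higher total.
    """
    return overall_relevance(assessments) >= threshold

print(overall_relevance([2, 1, 0, 2]))  # 5
print(is_relevant([0, 0, 1, 0]))        # True with the default threshold of 1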

Discussion Questions
In developing a search engine:
– How would you use metadata (e.g. author, title, abstract)?
– How would you use document structure?
– How would you use references, citations, co-citations?
– How would you use hyperlinks?
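As one concrete angle on the citation questions above, co-citation counts can be derived directly from a citation graph: two documents are co-cited whenever a later document references both. The graph below is a made-up example, not data from any of the collections discussed.

from collections import Counter
from itertools import combinations

# citing document -> documents it references
references = {
    "A": ["X", "Y", "Z"],
    "B": ["X", "Y"],
    "C": ["Y", "Z"],
}

def co_citation_counts(references):
    """Count, for each pair of documents, how many citing documents reference both."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

print(co_citation_counts(references))
# ('X', 'Y') and ('Y', 'Z') are each co-cited twice, ('X', 'Z') once.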