A comparison of two web-based document management systems
Shaoxin Yu, Columbia University, March 31, 2009

Index
I. Description of the problem
II. Google Scholar
III. CiteSeer
IV. Comparison of Google Scholar and CiteSeer

Description of the problem
With the rapid growth in the amount of online text, automatic text summarization plays an increasingly important role in the information industry. Online resources often contain similar content but exist separately, so it is worthwhile to find efficient ways to manage this information.

Description of the problem
Background of multi-document summarization techniques: four types
1. Free-style summarization
2. Sentence-extraction summarization
3. Axis (type of main topic) summarization
4. Table-style summary

Description of the problem
How can documents about the same topic be summarized manually?
1. Use a marker to mark the important phrases or sentences.
2. Figure out the main topics in the marked sentences, or make a list giving an overview of the documents.
3. Connect these main topics.
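The marking step of the manual procedure corresponds roughly to sentence-extraction summarization. Below is a minimal sketch, assuming that a sentence is important when its content words are frequent across the whole document set. The stop-word list, the scoring rule and the function names are illustrative assumptions, not a method taken from the slides.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger ones).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "for", "on", "with", "it", "that", "this", "as", "by", "be"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def extract_summary(documents, k=2):
    """Return the k highest-scoring sentences from all documents, in original order."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    # Frequency of each content word across the whole document set.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Restoring the original order of the chosen sentences keeps the summary readable, which loosely mirrors the "connect these main topics" step.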

Google Scholar
1. Released in November
2. Search engine for scholarly literature
3. Wide range of subject areas

Google Scholar
Unlike Google web search, Google Scholar does not search all publicly available Web pages. It gets its records from three sources:
1. A proprietary algorithm that identifies Web documents that "look scholarly": full-text documents and citations with abstracts.
2. Content provided by its partners: journal publishers, scholarly societies, database vendors, and academic institutions.
3. Citations extracted from the reference lists of documents found through the first two methods.
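The third source, extracting citations from reference lists, can be sketched as follows. This assumes reference lists that mark entries with bracketed numbers such as "[1]"; the function name and the input format are hypothetical, and the slide does not describe how Google Scholar actually parses references.

```python
import re


def extract_references(text):
    """Split a reference list using "[n]" markers into {n: entry text}.

    Assumes bracketed numeric markers; many real reference lists use other styles.
    """
    # A capturing group makes re.split keep each marker number in the output list:
    # [preamble, "1", entry1, "2", entry2, ...]
    parts = re.split(r"\[(\d+)\]", text)
    refs = {}
    for i in range(1, len(parts) - 1, 2):
        entry = " ".join(parts[i + 1].split())  # collapse line breaks and spaces
        if entry:
            refs[int(parts[i])] = entry
    return refs
```

Each extracted entry would still have to be matched against documents already in the index, which this sketch does not attempt.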

Google Scholar
Google File System architecture

Google Scholar
1. Chunk: a fixed-size piece of a file; chunks are 64 MB, a size optimized using statistical methods.
2. Metadata (stored in the master):
   a. file and chunk namespaces
   b. mapping from files to chunks
   c. locations of each chunk's replicas
3. Master: a single process, running on one machine, that stores all metadata.
4. Communication between the master and chunkservers: if a chunk is corrupted, the master also instructs the chunkservers to delete existing chunks and create new ones.
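The file-to-chunk mapping can be illustrated with a toy model of the master. In the Google File System design, a client converts a byte offset into a chunk index by dividing by the chunk size, then asks the master for that chunk's handle and replica locations; this sketch folds both steps into one method. The class, method and field names are hypothetical.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as on the slide


@dataclass
class Master:
    """Toy model of the master's metadata; all names here are hypothetical."""
    # File namespace plus (b) the mapping from each file to its ordered chunk handles.
    file_chunks: dict = field(default_factory=dict)
    # (c) Locations of each chunk's replicas: chunk handle -> list of chunkservers.
    replicas: dict = field(default_factory=dict)

    def lookup(self, filename, offset):
        """Translate (file name, byte offset) into (chunk handle, replica locations)."""
        index = offset // CHUNK_SIZE  # chunk index within the file
        handle = self.file_chunks[filename][index]
        return handle, list(self.replicas[handle])

    def report_corrupt(self, handle, server):
        """Drop a corrupted replica; a real master would then schedule a new copy."""
        self.replicas[handle] = [s for s in self.replicas[handle] if s != server]

    def add_replica(self, handle, server):
        """Record a newly created replica of a chunk on another chunkserver."""
        self.replicas[handle].append(server)
```

Keeping only this small amount of metadata per chunk is what lets a single master process hold all of it.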

CiteSeer
1. Public search engine for academic papers
2. Created by Steve Lawrence, Kurt Bollacker and Lee Giles
3. Developed at NEC Research Institute, Princeton, New Jersey, USA
4. Hosted by Pennsylvania State University
5. Over 700,000 documents, primarily in computer science and engineering

CiteSeer
CiteSeer features:
1. Autonomous citation indexing system
2. Indexes academic literature in PostScript or PDF files
3. Literature retrieval by following citation links
4. Evaluation and ranking of papers, authors and journals
5. Up-to-date databases that are not limited to preselected journals or delayed by journal publication
6. Autonomous operation, with a corresponding reduction in cost
7. Powerful interactive browsing of the literature using the context of citations
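Autonomous citation indexing and retrieval by following citation links both rest on inverting each paper's reference list. A minimal sketch, assuming citations have already been matched to paper identifiers (the difficult part, not shown here); ranking by raw citation count is a simplifying assumption rather than CiteSeer's documented ranking.

```python
from collections import defaultdict


def build_citation_index(papers):
    """Invert {paper: [papers it cites]} into {paper: [papers that cite it]}."""
    cited_by = defaultdict(list)
    for citing, references in papers.items():
        for cited in references:
            cited_by[cited].append(citing)
    return dict(cited_by)


def rank_by_citations(cited_by):
    """Order papers by number of citing papers, most cited first (ties by id)."""
    return sorted(cited_by, key=lambda p: (-len(cited_by[p]), p))
```

With both maps available, a reader can move backward (papers[p]: what p cites) and forward (cited_by[p]: who cites p) through the literature.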

CiteSeer
Methods CiteSeer uses for computing similarity:
1. Word vectors: only the top 20 components are used, since the truncation may not have a large effect on the distance measures.
2. String distance: the “LikeIt” string distance is used to measure edit distance.
3. Citations: common citations are used to find the research papers most closely related to a document.
4. Combination of methods: CiteSeer combines the similarity methods above.
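A rough sketch of the four ideas, under stated assumptions: word vectors use TF-IDF weights truncated to the top 20 components; plain Levenshtein distance stands in for LikeIt, which is not reproduced here; the common-citation score weights rarer shared citations more heavily (an assumed weighting); and the combination weights are arbitrary placeholders. None of the function names come from CiteSeer.

```python
import math
from collections import Counter


def tfidf_vectors(docs, top_k=20):
    """TF-IDF word vectors, each truncated to its top_k largest components."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    n = len(docs)
    vectors = []
    for toks in tokenized:
        weights = {w: c * math.log(n / df[w]) for w, c in Counter(toks).items()}
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        vectors.append(dict(top))
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def edit_distance(a, b):
    """Levenshtein distance (a stand-in for LikeIt, which is not implemented here)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def common_citation_score(refs_a, refs_b, citation_counts):
    """Sum over shared citations; rarely cited works count more (assumed weighting)."""
    return sum(1.0 / citation_counts[c] for c in set(refs_a) & set(refs_b))


def combined_similarity(word_sim, header_dist, cite_score, weights=(0.5, 0.2, 0.3)):
    """Placeholder weighted sum; the string distance is mapped to a similarity first."""
    w_word, w_str, w_cite = weights
    return w_word * word_sim + w_str / (1.0 + header_dist) + w_cite * cite_score
```

Truncating to the largest components keeps vectors small, which matters when similarity must be computed against a large collection.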

Comparison of Google Scholar & CiteSeer
Different positioning
The core purpose of CiteSeer is to find complete academic papers, with complete citations, without hefty fees. Google Scholar is one of Google's products for providing a complete solution to searching and other academic needs; its strategy focuses on completeness, so that it can serve as a final solution.

Comparison of Google Scholar & CiteSeer
Coverage and performance
Google Scholar uses only the first K bytes of a text for searching, and its links typically lead to content that requires payment. With CiteSeer, an informative paper can be traced within CiteSeer itself, and the citing papers it links to are a great help in academic work.
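Taking the claim about the first K bytes at face value, its consequence for retrieval can be shown with a small inverted index built only from each document's prefix. The value of K and the function names are hypothetical; the slide gives no value for K and does not describe Google Scholar's indexer.

```python
from collections import defaultdict

K = 16  # hypothetical prefix length in bytes; the slide does not give a value


def build_prefix_index(docs, k=K):
    """Inverted index built from only the first k bytes of each document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        raw = text.encode("utf-8")
        prefix = raw[:k].decode("utf-8", errors="ignore")
        words = prefix.lower().split()
        # If the cut fell inside a word, drop that partial word.
        if len(raw) > k and words and not raw[k:k + 1].isspace() and not prefix[-1:].isspace():
            words = words[:-1]
        for word in words:
            index[word].add(doc_id)
    return index


def search(index, word):
    """Ids of documents whose indexed prefix contains the word."""
    return sorted(index.get(word.lower(), set()))
```

For example, with docs {"d1": "citation indexing of papers", "d2": "papers about citation"} and k=16, "citation" is found only in d1 and "papers" only in d2, because the other occurrences fall past the cut.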

Comparison of Google Scholar & CiteSeer
Clicking any of the informative links leads to a single linked page.

Comparison of Google Scholar & CiteSeer
Results are provided only through topic extraction.

Comparison of Google Scholar & CiteSeer
On staleness, Google Scholar appears to fare worse than CiteSeer. The effect was more pronounced in Google Scholar's early days. Today, for most uses, staleness is no longer a major problem for either system.
