Build Your Own Research Database Using DocFetcher Open Source Software

Similar presentations
Don’t Type it! OCR it! How to use an online OCR..

Support.ebsco.com EBSCOhost Digital Archives Viewer Tutorial.
In the Format section, we have activated the Bibliographic style drop down menu. From this page, you can choose a specific journal or format (e.g. BMC.
.txt What about the Apple Apps? Save As... OR Export Pages Numbers Keynote.
Welcome to San Antonio!. halFILE 3.0 A Sneak Preview.
Alternative FILE formats
Easy, inexpensive…Effective ! Popfax. com, professional fax services, worldwide Popfax – SF – EN How to edit, sign and fax documents Easy, inexpensive…Effective.
Internet Research Techniques Graham Seibert Copyright 2006 This is a segment of the draft version of a large syllabus. I need your feedback to improve.
Single Search By Rakphao Theppan, librarian Searching Online Resources.
Using LegalTrac To Find Law Review Articles. What Is LegalTrac? A commercial service UW Libraries subscribe Indexes law reviews, other legal periodicals.
Hudson Valley Community College Marvin Library GOOGLE SCHOLAR
Premier Accessibility Suite Software for Reading and Writing.
Google Confidential and Proprietary 1 Intro to Docs Google Apps Apps.
1 The Vietnam Center and Archive Stephen Maxner, Ph.D.
Module 1.4 File management. Contents Introduction Windows Explorer The need to organise More about files Working with files Test and improve your knowledge.
Chapter 2 (HW01) Getting Started with Windows 7.
Web of Science. Copyright 2006 Thomson Corporation 2 Example: (bird* or avian) and (flu or influenz*) Enter your terms to be searched. Search fields are.
Software All parts of the computer people can NOT touch, such as programs, files, documents and any other data.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
Multimedia Digital Library Marcia Johnson. Collection 25 text documents 25 text documents In HTML, PDF, TXT formats (source: Project Gutenberg) In HTML,
Table of Contents How to save documents using various software
Lakeland Click arrow to advance show. Click on the “A” under “Listed By Name.” (“A” for Academic Search Database)
25-27 June 2003Clearing House Workshop, Paris1 Direct access to UNESCO Documents UNESDOC.
Beyond the Basics Steven Butzel, Nashua Public Library , Yahoo IM: nashuaref.
English 115 GoogleScholar/ OneSearch Hudson Valley Community College Marvin Library Learning Commons 1.
Research & Learning For Libraries and Patrons that need to stay Ahead of the Learning Curve Presenter Name Here Books24x7® for Libraries.
Welcome! Computer 101 Session 3 With Laura Crichton.
Plagiarism Detection on Canvas – VeriCite Student Guide Canvas Quick Start Guide.
Archimer Ifremer’s institutional repository Fred Merceur IAMSLIC's 32nd annual conference Every Continent, Every Ocean October 8-12, 2006 Portland, Oregon,
Student Quick Start Guide Prepared by: Information Services Division Perpustakaan Sultan Abdul Samad Universiti Putra Malaysia
Search Engine Know- How: How To Optimize Your Content, Navigation Pages, & Documents For Search Engines.
Quick Launch. Google Drive 30 GB Cloud Space Document.
Welcome to Cambridge Histories Online This unique historical reference compendium allows instant access to the renowned texts of the Cambridge Histories.
Computer Skills (1) Internet Explorer. To open the Internet Explorer: –Double click on the Internet Explorer icon on Desktop. –Or, from Start → All Programs.
Microsoft Office 2008 for Mac – Illustrated Unit D: Getting Started with Safari.
Turnitin staff snapshot Faculty of Engineering and Surveying 16 th Dec 2010.
Important skills for Computer Based Researching Peter Szluka, Attila Skulteti
Introduction to Managing Research and Personal Data.
Jacynthe Touchette, MSI JGH Health Sciences Library
Plagiarism Detection tools
Tuesday Tech Talks Skeen Library Presents Today’s Topic: Presented by:
PubMed Database Interface (Basic Course Module 4 Part B)
Computing Fundamentals
Microsoft Outlook By: Phuong Nguyen.
Locating News Resources
Adding a File to a Course
Microsoft Windows Ver.1.01.
Access Lesson 14 Import and Export Data
View, edit, create and share your documents OfficeSuite brings the power and capability of a desktop app to the iPhone and iPad like never before! This.
Computer Software Computer Software 9/15/2018
Computer Software Created by Ann Ware
1 2 3 Here we are on the Ohio Web Library’s home page. To get to Business Source Premier, use the following steps: 1. Go to Ohio Web Library 2. Click on.
NextGen Document Management
Literary reference center
File Extension Mini-Lesson
Google Drive and Gale Databases
Family Search and the scanning of OCPL’s historical book collection.
Computer Skills (1) Internet Explorer.
Making PDFs Accessible
Google Drive.
File Management Staying Organized.
EBSCOhost Digital Archives Viewer
USER MANUAL - WORLDSCINET
Important skills for Computer Based Researching
System Software: Operating system, Utility Programs, & File Management
Productivity Advantages
Tutorial Introduction to help.ebsco.com.
Presentation transcript:

Build Your Own Research Database Using DocFetcher Open Source Software Chris Sweet Illinois Wesleyan University 2019 ACRL Conference

What is DocFetcher?
Open source desktop indexing software: http://docfetcher.sourceforge.net

What are some potential uses of DocFetcher?
- Searching all of your notes, emails, files, etc. from one application
- Full-text keyword searching
- Basic digital humanities projects
- Historical research
- Indexing special collections or archival material

What did I use DocFetcher for?
Research for a book on the history of bicycling in Illinois (tens of thousands of PDF pages)

Supported formats
- Microsoft Office (doc, xls, ppt)
- Rich Text Format (rtf)
- Microsoft Office 2007 and newer (docx, xlsx, pptx, docm, xlsm, pptm)
- AbiWord (abw, abw.gz, zabw)
- Microsoft Compiled HTML Help (chm)
- Microsoft Outlook (pst)
- MP3 Metadata (mp3)
- OpenOffice.org (odt, ods, odg, odp, ott, ots, otg, otp)
- FLAC Metadata (flac)
- JPEG Exif Metadata (jpg, jpeg)
- Portable Document Format (pdf)
- Microsoft Visio (vsd)
- EPUB (epub)
- Scalable Vector Graphics (svg)
- HTML (html, xhtml, ...)
- TXT and other plain text formats (customizable)
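Before pointing DocFetcher at a large research folder, it can help to check how many files fall under the formats on this slide. The sketch below is a minimal pre-check, assuming that matching on file extensions is good enough; the extension set is transcribed from the slide, and `is_supported` and `summarize_folder` are hypothetical helpers, not part of DocFetcher. DocFetcher's own format detection, and its customizable plain-text list, may classify some files differently.

```python
from pathlib import Path

# Extensions transcribed from the slide's "Supported formats" list.
# Assumption: we match on the extension only; DocFetcher's own detection
# and its customizable plain-text extension list may differ.
SUPPORTED_EXTENSIONS = {
    "doc", "xls", "ppt", "rtf",
    "docx", "xlsx", "pptx", "docm", "xlsm", "pptm",
    "abw", "zabw", "chm", "pst", "mp3",
    "odt", "ods", "odg", "odp", "ott", "ots", "otg", "otp",
    "flac", "jpg", "jpeg", "pdf", "vsd", "epub", "svg",
    "html", "xhtml", "txt",
}


def is_supported(path: Path) -> bool:
    """Return True if the file's extension appears in the slide's list."""
    if path.name.lower().endswith(".abw.gz"):  # AbiWord's compressed variant
        return True
    return path.suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def summarize_folder(root: str) -> tuple[list[Path], list[Path]]:
    """Split every file under root (recursively) into (indexable, skipped)."""
    indexable: list[Path] = []
    skipped: list[Path] = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            (indexable if is_supported(path) else skipped).append(path)
    return indexable, skipped
```

For example, `summarize_folder("research/bicycling")` (a hypothetical path) would list scanned TIFF images under "skipped", a reminder that those scans need to be converted to searchable PDFs before DocFetcher can do anything with their text.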

Important Features
- Open source
- Full-text indexing
- Very fast searches
- Fast index building (compared with similar software)
- Automatic index updating
- You tell it which files to index
- Indexed folders are easy to turn off
- A portable version can be saved to a thumb drive
- Copyright-law compliant, since it is not web-based
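Why are searches so fast once the index is built? DocFetcher is built on the Apache Lucene search library, which stores an inverted index: a map from each word to the documents that contain it. The toy sketch below illustrates only that idea and is not DocFetcher's or Lucene's implementation; the function names are hypothetical and the tokenizer is deliberately crude.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every term to the set of document names that contain it.

    Building this once is the slow step; afterwards a query needs only a
    few dictionary lookups instead of a scan through every file.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()
```

With three short documents indexed, `search_all(index, "Anderson race")` intersects the two small posting sets for "anderson" and "race" rather than rereading any document text, which is why query time barely grows with the size of the collection.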

Limitations
- Relevancy ranking is nowhere near what we are used to in commercial library databases
- Text previews of very large PDFs take a while to load
- No previews of the PDF page images without opening the file
- Searching for a known date or a date range is not straightforward (my workaround)
- Does not perform OCR on PDF files; this must be done first
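The slide mentions a workaround for date searching without spelling it out. One generic approach, which is not necessarily the author's, is to expand a date range into an OR query of explicit date phrases that a keyword engine can match. `month_range_query` is a hypothetical helper, and it assumes the sources write dates as "Month Year" (for example "June 1895").

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def month_range_query(start: date, end: date) -> str:
    """Build an OR query of quoted "Month Year" phrases from start to end.

    Hypothetical helper: it only matches dates written out as "June 1895";
    forms such as "June 3, 1895" or "6/3/1895" would need extra patterns.
    """
    if start > end:
        raise ValueError("start must not be after end")
    phrases = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        phrases.append(f'"{MONTHS[month - 1]} {year}"')
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return " OR ".join(phrases)
```

For instance, `month_range_query(date(1895, 11, 1), date(1896, 2, 15))` produces `"November 1895" OR "December 1895" OR "January 1896" OR "February 1896"`, which can be pasted into the search box. Because a phrase like "November 1895" requires the two words to be adjacent, a source that writes "November 3, 1895" would not match, so the generated query is a starting point rather than a complete date filter.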

Search Tips
- The default setup uses the “Or” operator (I recommend changing this to “And”)
- Because the relevancy algorithm is not robust, proximity operators help. For example:
  "Tillie Anderson" Rac*~10
  This searches specifically for the name Tillie Anderson together with the terms race, racer, and racing within 10 words.
- Use the left-hand navigation to limit by file type and to specify which indices to search
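The proximity tip describes a specific match condition: a two-word name within about ten words of any word that starts with "rac". The sketch below spells out that condition in plain Python so it is clear what the query asks for; `phrase_near_prefix` is a hypothetical helper, and its distance rule is a simplification that is not the same as Lucene's "slop". DocFetcher parses queries with its own Lucene-based syntax, in which proximity is documented as a quoted group of words followed by ~N (for example `"Anderson racing"~10`), so it is worth checking how a given DocFetcher version interprets the query on the slide.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_near_prefix(text: str, phrase: str, prefix: str, window: int) -> bool:
    """True if `phrase` occurs at most `window` word positions away from a
    word starting with `prefix`.

    Distance is measured from the nearest edge of the phrase, which is a
    simplification of how a search engine scores proximity.
    """
    words = tokenize(text)
    target = tokenize(phrase)
    if not target:
        return False
    prefix = prefix.lower()
    n = len(target)
    starts = [i for i in range(len(words) - n + 1) if words[i:i + n] == target]
    hits = [j for j, w in enumerate(words) if w.startswith(prefix)]
    for i in starts:
        end = i + n - 1
        for j in hits:
            if i <= j <= end:
                continue  # the prefixed word is part of the phrase itself
            gap = i - j if j < i else j - end
            if gap <= window:
                return True
    return False
```

For example, `phrase_near_prefix("Tillie Anderson entered the six-day race at Chicago", "Tillie Anderson", "rac", 10)` is true because "race" sits five positions after "Anderson". A prefix such as "rac" also matches unrelated words like "racial" or "racket", which is one reason proximity narrows results more reliably than a bare wildcard.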

Utilizing HathiTrust and Internet Archive
- You can create an account and assemble custom lists of items to search in each repository, but it is a pain, and you have to repeat each search in both.
- Easy to download from Archive.org
- Harder (depending on institutional access) to download complete items from HathiTrust
- Save materials to topical folders, then have DocFetcher index them
These are just popular examples; there are many sources of digitized materials today.
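The "topical folders" step can be partly automated when downloaded filenames carry useful words. The sketch below moves each file from a download folder into the first topic folder whose keyword appears in its filename; `file_by_topic`, the folder names, and the keywords are all hypothetical, and files that match no topic are left in place for manual sorting. Keeping every topic folder under one library folder means a single DocFetcher index can cover them, assuming the index is set on the parent folder and includes subfolders.

```python
import shutil
from pathlib import Path


def file_by_topic(download_dir: str, library_root: str,
                  topics: dict[str, list[str]]) -> dict[str, list[str]]:
    """Move each file into the first topic folder whose keyword appears in
    its filename; unmatched files stay where they are.

    `topics` maps a folder name to filename keywords, for example
    {"racing": ["race"], "clubs": ["club", "wheelmen"]} (hypothetical).
    Returns {folder name: [names of the files moved there]}.
    """
    moved: dict[str, list[str]] = {}
    # sorted() reads the whole listing first, so moving files is safe.
    for path in sorted(Path(download_dir).iterdir()):
        if not path.is_file():
            continue
        name = path.name.lower()
        for folder, keywords in topics.items():
            if any(k.lower() in name for k in keywords):
                dest = Path(library_root) / folder
                dest.mkdir(parents=True, exist_ok=True)
                shutil.move(str(path), str(dest / path.name))
                moved.setdefault(folder, []).append(path.name)
                break  # first matching topic wins
    return moved
```

Because the first matching topic wins, the order of entries in `topics` acts as a simple priority list when a filename contains keywords from more than one topic.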

Using Your Own Notes in the Index
My personal research process:
1. Mark passages in books and articles with Post-it notes
2. Scan these materials to PDF
3. Use Acrobat Pro to perform optical character recognition (OCR) on these items
4. Copy and paste the passages I had noted into Google Docs
5. Add the files to my DocFetcher index for future searching (extremely useful!)
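The author's process collects noted passages in Google Docs. A purely local variation, sketched here as an assumption rather than the author's method, appends each passage to a plain-text file named after its source, inside a folder that DocFetcher already indexes; TXT is among the formats listed earlier. `save_passage` and the citation layout are hypothetical.

```python
from pathlib import Path


def save_passage(notes_dir: str, source: str, page: str, passage: str) -> Path:
    """Append one transcribed passage, tagged with its source and page,
    to a plain-text notes file named after the source."""
    # Replace characters that are awkward in filenames with underscores.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in source)
    path = Path(notes_dir) / f"{safe}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"[{source}, p. {page}]\n{passage.strip()}\n\n")
    return path
```

Putting the source and page in brackets before each passage means that a DocFetcher hit in the notes file already shows where the passage came from, so there is no need to reopen the scanned PDF just to cite it.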

DocFetcher in Action

DocFetcher Alternatives? Lookeen Copernic +indexed full-text +Free individual license +fast search results -Slower than DocFetcher X1 Search -slow indexing -slow document preview, no highlighted keywords -Free Trial, $50 purchase -extremely slow to load large PDFs -results just arranged alphabetically -poor relevancy ranking -no hit count -$15 per year

Questions? Chris Sweet Illinois Wesleyan University csweet@iwu.edu