Build Your Own Research Database Using DocFetcher Open Source Software

Chris Sweet, Illinois Wesleyan University
2019 ACRL Conference

What is DocFetcher?
Open source desktop indexing software

What are some potential uses of DocFetcher?
Searching all of your notes, emails, files, etc. from one application
Full-text keyword searching
Basic digital humanities projects
Historical research
Indexing special collections or archival material

What did I use DocFetcher for?
Research for a book on the history of bicycling in Illinois (tens of thousands of .pdf pages)

Supported formats
Microsoft Office (doc, xls, ppt)
Microsoft Office 2007 and newer (docx, xlsx, pptx, docm, xlsm, pptm)
Microsoft Outlook (pst)
Microsoft Visio (vsd)
Microsoft Compiled HTML Help (chm)
OpenOffice.org (odt, ods, odg, odp, ott, ots, otg, otp)
AbiWord (abw, abw.gz, zabw)
Rich Text Format (rtf)
Portable Document Format (pdf)
EPUB (epub)
HTML (html, xhtml, ...)
Scalable Vector Graphics (svg)
TXT and other plain text formats (customizable)
MP3 Metadata (mp3)
FLAC Metadata (flac)
JPEG Exif Metadata (jpg, jpeg)
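Before pointing DocFetcher at a large folder, it can help to see how many files fall under the formats listed above. The following is a minimal Python sketch, not part of DocFetcher: `survey` is a hypothetical helper name, and the `SUPPORTED` tuple is copied from the slide, with the open-ended HTML and plain-text entries trimmed to html, xhtml, and txt as an assumption.

```python
from collections import Counter
from pathlib import Path

# Extensions copied from the supported-formats slide. The slide's open-ended
# entries ("html, xhtml, ..." and customizable plain text) are cut down to
# html, xhtml, and txt here; that trimming is an assumption.
SUPPORTED = (
    "doc", "xls", "ppt", "docx", "xlsx", "pptx", "docm", "xlsm", "pptm",
    "rtf", "abw", "abw.gz", "zabw", "chm", "pst", "vsd",
    "odt", "ods", "odg", "odp", "ott", "ots", "otg", "otp",
    "pdf", "epub", "svg", "html", "xhtml", "txt",
    "mp3", "flac", "jpg", "jpeg",
)


def survey(folder):
    """Count files under `folder` by extension, split into listed and unlisted.

    Returns two Counters: (indexed, skipped). This only looks at file names;
    it does not check whether a PDF has already been through OCR.
    """
    indexed, skipped = Counter(), Counter()
    # Try the longest extensions first so "abw.gz" is matched as a whole.
    extensions = sorted(SUPPORTED, key=len, reverse=True)
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        name = path.name.lower()
        match = next((e for e in extensions if name.endswith("." + e)), None)
        if match:
            indexed[match] += 1
        else:
            skipped[path.suffix.lower() or "(no extension)"] += 1
    return indexed, skipped
```

Calling `survey("path/to/research")` on a folder of your own returns the two tallies, so a large count under `skipped` is a hint that some material may need converting before it can be indexed.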

Important Features
Open source
Full-text indexing
Very fast searches
Fast index building (in comparison to similar software)
Automatic index updating
You tell it what files to index
Easy to turn off indexed folders
Portable version can be saved to a thumb drive
Copyright law compliant since it is not web-based

Limitations
Relevancy ranking is nowhere near what we are used to in commercial library databases
Text previews of very large .pdfs take a while to load
No previews of the .pdf page images without opening the file
Known date and date range searching is not straightforward (my workaround)
Does not perform OCR on .pdf files; OCR must be done before indexing

Search Tips
Default setup is to use the "Or" operator (I recommend changing this to "And")
Because the relevancy algorithm is not robust, using proximity operators helps. For example: "Tillie Anderson" Rac*~10
This searches specifically for the name Tillie Anderson and the terms race, racer, racing within 10 words.
Use left-hand navigation to limit by file type and specify which indices to search
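To make the "within 10 words" idea concrete, here is a minimal Python sketch of a proximity check. It only illustrates the behavior described on the slide; it is not how DocFetcher evaluates the query internally, and the function names (`tokenize`, `near`) and the way the gap is counted are assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, phrase, prefix, max_gap=10):
    """Return True if a word starting with `prefix` occurs within
    `max_gap` word positions of the exact `phrase`.

    The gap is counted from the nearest edge of the phrase, which is an
    assumption of this sketch.
    """
    tokens = tokenize(text)
    words = tokenize(phrase)
    n = len(words)
    # Start positions of every exact occurrence of the phrase.
    starts = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]
    # Positions of every token matching the wildcard prefix (like rac*).
    hits = [i for i, tok in enumerate(tokens) if tok.startswith(prefix.lower())]
    for s in starts:
        for h in hits:
            if h < s:
                gap = s - h            # prefix word comes before the phrase
            elif h >= s + n:
                gap = h - (s + n - 1)  # prefix word comes after the phrase
            else:
                gap = 0                # prefix word is part of the phrase
            if gap <= max_gap:
                return True
    return False
```

For example, `near("Tillie Anderson won the six-day race in Chicago.", "Tillie Anderson", "rac")` finds "race" five positions after the name. Note that a prefix such as `rac` also matches unrelated words like "rack", which is one reason pairing the wildcard with a name and a proximity limit helps cut down false hits.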

Utilizing HathiTrust and Internet Archive
You can create an account and assemble custom lists of items to search in each repository, but it is a pain, and you have to repeat searches in each.
Easy to download from Archive.org
Harder (depending on institutional access) to download complete items from HathiTrust
Save materials to topical folders, then have DocFetcher index them
These are just popular examples. There are many sources of digitized materials today.
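The "save materials to topical folders" step can be partly automated. The sketch below is one possible approach in Python, under stated assumptions: downloads land in a single inbox folder, and topics can be guessed from keywords in the file name. The function name `sort_into_topics`, the `unsorted` fallback folder, and the example topic keywords are all hypothetical.

```python
import shutil
from pathlib import Path


def sort_into_topics(inbox, library, topics):
    """Move files from `inbox` into `library/<topic>` by file name keyword.

    `topics` maps a folder name to a list of keywords. The first topic with
    a keyword found in the file name wins; anything unmatched goes to an
    "unsorted" folder. Returns a dict of file name -> topic folder name.
    """
    inbox, library = Path(inbox), Path(library)
    placed = {}
    # sorted() builds the full list first, so moving files is safe.
    for path in sorted(inbox.iterdir()):
        if not path.is_file():
            continue
        name = path.name.lower()
        topic = next(
            (t for t, words in topics.items()
             if any(w.lower() in name for w in words)),
            "unsorted",
        )
        target_dir = library / topic
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        placed[path.name] = topic
    return placed
```

A call such as `sort_into_topics("Downloads/inbox", "Research", {"racing": ["racing"], "clubs": ["club"]})` would leave one folder per topic under `Research`, which can then be added to DocFetcher as a folder to index.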

Using Your Own Notes in the Index
My personal research process:
1. Mark passages in books and articles with post-it notes
2. Scan these materials to a .pdf format
3. Use Acrobat Pro to perform Optical Character Recognition (OCR) on these items
4. Copy and paste the passages I had noted into Google Documents
5. Add the files to my DocFetcher index for future searching (extremely useful!)

DocFetcher in Action

DocFetcher Alternatives?
Lookeen
Copernic
+indexed full-text
+Free individual license
+fast search results
-Slower than DocFetcher
X1 Search
-slow indexing
-slow document preview, no highlighted keywords
-Free Trial, $50 purchase
-extremely slow to load large pdfs
-results just arranged alphabetically
-poor relevancy ranking
-no hit count
-$15 per year

Questions? Chris Sweet Illinois Wesleyan University
