Crowdsourced Manuscript Transcription Ben Brumfield Roots and Routes 2012.

Not just crowdsourcing... Collaborative work Off-site solo work Private work

Not just manuscripts... Maps Textiles Music Flawed OCR

Not just transcription... Indexing Editing Identification Counting seals on Arctic ice caps.

What it isn't We'll concentrate on web-based tools for extracting text from images, not addressing: Oral History Video Audio Transcription Image Manipulation Transcription/Facsimile Display Tools exist for these tasks, nevertheless.

Break What materials are you working with outside of modern, printed books and websites?

Origins (Approaches) Two Approaches and one Dead End Indexing Editing Tagging

Indexing Structured Data Extracts from Text vs. Representing Text Databases for Search and Analysis Granular Quality Control Gamification
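
The contrast between extracting structured data and representing text can be sketched in code. This is a minimal Python illustration, not taken from any tool mentioned in the talk; the class names, field names, and sample menu lines are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexedRecord:
    """Indexing: one structured row extracted from a page image (hypothetical fields)."""
    page: int
    dish: str
    price_cents: int


@dataclass
class TranscribedPage:
    """Editing: the full text of a page, kept as the writer laid it out."""
    page: int
    text: str


def search_index(records, term):
    """Return records whose dish name contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r.dish.lower()]


# Invented sample data, for illustration only.
page_text = "Oysters on the half shell .... 25\nConsomme .... 15"
transcript = TranscribedPage(page=1, text=page_text)
records = [
    IndexedRecord(page=1, dish="Oysters on the half shell", price_cents=25),
    IndexedRecord(page=1, dish="Consomme", price_cents=15),
]

matches = search_index(records, "oysters")
```

The indexed records are easy to search and check field by field, but they drop the layout and wording that the transcribed page keeps.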

Editing Books, Diaries, Letters, Articles Representing Text Traditional Editorial Workflow Digital or Print Editions

Tagging Too small Too imprecise

Origins (Traditions) OCR Correction Documentary Editing Genealogy Natural Science Astronomy Split this into 5 slides

Online Tools Recent (none older than 2005) Influenced by origin Still pretty raw Most require tech expertise for set-up and customization All require making trade-offs

Lab Session 1: Breadth NYPL What's on the Menu Indexing Wikisource Editing

Selection Factors Source Material Transcript Purpose Organizational/Project Management Fit Financial and Technical Resources

Source Material Evaluating your source material: Is it of interest to anyone else? Is it under copyright? Does it need restricted access? Is it composed of documents or records? Is it non-textual? How complex is the layout? How important is that layout?

Purpose How will you be using the transcribed data? Traditional print editions Searchable online editions Do you want to use the system to analyze the text? How do you want to analyze the text? Is public engagement a goal? Should the transcripts be open?

Organizational/Project Management Fit How important is traditional editorial workflow? Will you rely on volunteers? How will you motivate them? What is the duration of the project? Is there a "final version"? Is TEI a mandate?

Financial and Technical Resources Do you have or need: System administrators to install non-hosted software? Money to pay hosting costs? Programming skills to customize a tool? Money to pay programmers for customization? Support for on-going costs to keep the site running, however small?

Lab Session 2: Markup Options FromThePage TranscribeBentham

Technical Questions to Answer Where are the images now? How do images get into the system? How do transcripts get out of the system? How mature is the underlying technology? How configurable is the technology? How does the system work with the public face of your project? Where does the metadata live? Who will maintain this? How long? How many sites are using this system?

Wikisource Pro: MediaWiki plus its add-on modules (e.g. print-on-demand, export). Wikimedia community. Incredibly mature. Con: Wikimedia policy. Public editing. Limited mark-up.

Bentham Transcription Desk Pro: MediaWiki is very mature. TEI Toolbar (can also be used on other systems). Deployed outside original project. Con: Development efforts halted.

Scripto Pro: Team at CHNM has a great track record. Your CMS is your public face. MediaWiki is very mature. Deployed and under active development. Con: Your CMS handles all metadata. Mark-up is extremely limited.

FromThePage Pro: Designed for intensive editing and indexing. Semantic mark-up and analysis. Hosting available. Con: Single developer (me). No TEI mark-up.
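
One way semantic mark-up can feed an index is sketched below. The slides do not specify FromThePage's syntax, so the double-bracket convention used here (`[[Canonical Subject|text as written]]`), the function names, and the sample diary lines are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed markup convention: [[Canonical Subject|text as written]] or [[Subject]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def build_index(pages):
    """Map each canonical subject to the sorted page numbers that mention it."""
    index = defaultdict(set)
    for number, text in pages.items():
        for match in LINK.finditer(text):
            index[match.group(1).strip()].add(number)
    return {subject: sorted(nums) for subject, nums in index.items()}


def plain_text(text):
    """Strip the markup, keeping the words as they appear in the manuscript."""
    return LINK.sub(lambda m: m.group(2) or m.group(1), text)


# Invented diary lines, for illustration only.
pages = {
    1: "Went to town with [[Josie Carr|Josie]] today.",
    2: "[[Josie Carr|Mrs. Carr]] called. Planted [[tobacco]].",
}
index = build_index(pages)
```

Because each tag separates the canonical subject from the wording on the page, the same transcript can yield both a readable text and a subject index that groups variant references to one person.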

Islandora TEI Editor Caveat: I don't know much about this tool or this team. Based on Drupal and Fedora. Supports TEI via friendly interface. Many Drupal-based projects considering it.

T-PEN Caveat: I don't know much about this tool. Designed for medieval manuscripts. Supports TEI natively. Line-by-line interface. Hosted version available.

Scribe Pro: Excellent for complex layout or non-documentary transcription. Zooniverse team is large, well-funded, experienced. Configurable. Con: No automated tool for loading images or viewing transcript database (yet!). No concept of image-as-a-text.

Pybossa Caveat: I don't know much about this tool or this team. Open Knowledge Foundation's crowdsourcing task management tool. Designed for tabular data. Google Spreadsheet data entry. Extremely young.

TextLab Caveat: I don't know much about this tool or this team. Melville Electronic Library. Direct addition of TEI tags to image.

Lab Session 3: Configuration Scribe Old Weather, What's the Score, Development deployments

Find me Ben Brumfield