Scanned Books: Annotator Training

Presentation transcript:

Scanned Books: Annotator Training

Project Overview
Untapped sources
– 100,000+ scanned/OCRed books
– Problem: how to cost-effectively extract
Extraction tools
– Read and do form-fill type-in
– Form-fill by clicking
  Copy/paste & correction
  Genealogy construction by inference
– Synergistic
  Automated form-fill with user correction
  Manual specification of rules (FROntIER)
  Machine-learned extraction rules
  – Discover author-specified patterns (ListReader)
  – Parse sentences & match concepts (OntoSoar)
  – Learn from observing users work (GreenFIE-HD)
Groundtruthing

Read and Do Form-fill Type-in

Form-fill: Click-only

Synergistic: Automatic Form-fill with Human Confirmation/Correction

Demo
Annotator Framework
– Session initialization/save/termination
– Page mode/magnification/navigation
Form Fill-in
– Person
– Marriages
– Children

Session Initialization/Save/Termination
Navigate to “dithers.cs.byu.edu/bookannotator” and log in: You will be given several username-password pairs. Each is associated with an annotation job to do.
Select page to annotate:
Save / Continue to Next Page: When you have finished all assigned pages and have saved your work, you need do nothing more. To start another job, navigate to “dithers.cs.byu.edu/fhannotator”.

Page Mode/Magnification/Navigation
– go to previous page, next page
– magnify: zoom in and out
– mode
– bounding box
– scroll bars

Rules and Hints for All Forms
Rules
1. Record only typeset information (nothing written by hand).
2. Do not fix errors (OCR, typesetting, misspellings, …).
3. Restore “real” end-of-line hyphens.
4. For items that cross page boundaries, extract complete records with the first page. (If no previous/subsequent page, extract partial records.)
5. Use click, Alt-click, or mouse-drag-and-click only (to the extent possible).
Hints
1. For click and Alt-click, hold down Ctrl to add tokens to a field. (Sometimes a click doesn’t “take”; just click again.)
2. To remove a character that should be omitted, click on the character and use Backspace/Delete, and then Esc when done.
3. The field focus changes automatically; to change manually, use Tab and Shift-Tab or click on a field and Esc.
4. A blank record or field at the end need not be deleted.
5. Be familiar with all Actions and use Keyboard Shortcuts.

Record only typeset information (nothing written by hand).
This, not that

Do not fix errors (OCR, typesetting, misspellings, …).

Restore “real” end-of-line hyphens.
Alt-click on “McKen-” properly extracts all of “McKenzie”.
Alt-click on “Latter-” in “… Latter- day Saints” also yields “Latterday”.
Restore the “real” hyphen: “Latter-day” (click on field and edit).
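
The hyphen rule can be illustrated with a short sketch. This is not the annotator tool's code: the function name join_line_break and the KNOWN_HYPHENATED set are hypothetical, and the sketch assumes the annotator knows which compounds carry a real hyphen. By default it joins the two parts without the hyphen (as Alt-click does) and keeps the hyphen only for a known compound.

```python
# Hypothetical illustration of the hyphen rule; not the annotator's actual code.
# Compounds whose hyphen is "real" and must be kept (assumed list for the example).
KNOWN_HYPHENATED = {"Latter-day"}


def join_line_break(first_part: str, second_part: str) -> str:
    """Join a token ending in '-' at end of line with the token that follows."""
    if not first_part.endswith("-"):
        raise ValueError("first_part must end with an end-of-line hyphen")
    with_hyphen = first_part + second_part          # e.g. "Latter-day"
    without_hyphen = first_part[:-1] + second_part  # e.g. "McKenzie"
    # Keep the hyphen only when the hyphenated form is a known real compound.
    return with_hyphen if with_hyphen in KNOWN_HYPHENATED else without_hyphen


print(join_line_break("McKen-", "zie"))   # McKenzie
print(join_line_break("Latter-", "day"))  # Latter-day
```

In the tool itself the second case is a manual correction: the annotator clicks on the field and types the hyphen back in.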

For items that cross page boundaries, extract complete records with the first page. (If no previous/subsequent page, extract partial records.)
page 1
page 2
record together with first page (page 418)

Rules and Hints for Person Form
Rules
1. Extract every name on a page (except names that don’t designate people, e.g., “George Washington University”).
2. Extract non-names that designate a person not otherwise named (e.g., “Baby Ely”).
3. Get full name, including any punctuation, title(s) and suffix, but not non-name components associated with the name such as footnote references. Omit possessives (i.e., ’s).
4. Extract names as written. (Do not include implied maiden names and surnames.)
5. Extract dates and place names only for birth and death events.
6. Get full date and place names, including punctuation.
7. In the absence of birth or death dates, use christening or burial dates and include designations (e.g., “bapt. 1854”).
Hints
1. For names and dates with punctuation, use Alt-click.
2. Use Ctrl to append name parts.
3. Use Keyboard Shortcut “a” to add a record.

Extract all names, but dates and places only for birth and death events.
– all names, even when a name appears more than once on a page
– this place but not these places
– this date but not this date

Extract every name on a page (except non-person names). Omit non-name components.
– not embedded reference markers
– not names used for internal designators
– not paragraph headers
– not names that are part of place or thing names

Get full name, including any punctuation, title(s) and suffix. Omit possessives.
Isaac Steel, Sr.
Azubah Tully
Joel M. Gloyd
Chief Justice Waite
David Vance CALL
Rex – omit the surname, “Call” (not written with the name “Rex”)
Arta (Shippee) Call – include the parentheses
Jolayne Lois SILLITO
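
As a rough sketch of the "omit possessives" part of this rule, the following helper drops a trailing ’s while leaving titles, suffixes, parentheses and other punctuation untouched. The function name strip_possessive is hypothetical and not part of the annotator tool. It handles only the ’s form named in the rule; a bare trailing apostrophe (as in “James’”) is left alone.

```python
def strip_possessive(name: str) -> str:
    """Remove a trailing possessive marker (’s or 's); keep all other punctuation."""
    for suffix in ("’s", "'s"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


print(strip_possessive("Joel M. Gloyd’s"))      # Joel M. Gloyd
print(strip_possessive("Isaac Steel, Sr."))     # Isaac Steel, Sr.
print(strip_possessive("Arta (Shippee) Call"))  # Arta (Shippee) Call
```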

Do not include implied maiden names and surnames.
not “Mary Ely Lathrop”
not “Mary Ely McKenzie”
not “Gerard Lathrop McKenzie”
just the names as written

Get full date and place names, including punctuation.
– include date modifiers
– not date modifiers, not date explanations (do not include)
– days of the week (do not include)
– punctuation part of date (include)
– punctuation not part of date (do not include)

In the absence of birth or death dates, use christening or burial dates and include designations.

Rules and Hints for Marriages Form
Rules
1. Record all marriages, both stated and implied (e.g., if A is mentioned as the son of B and C, then record B and C as being married).
2. For persons with multiple marriages, record each marriage in a separate record.
3. Extract names as specified for the person form: full name including punctuation, but only the name as written, not including implied maiden names and surnames.
Hints
1. Use Tab or “a” to add a new blank marriage record (when there is no MarriageDate).
2. “Spouse1” can be either the husband or the wife; record names in the order they appear in the document.

Record all marriages, both stated and implied.
– stated
– implied
– names, as written (maiden name, married name not included)

For persons with multiple marriages, record each marriage in a separate record.
Christopher with three marriages & three records
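
The marriage rules above can be sketched as a small data model. This is only an illustration under stated assumptions, not the annotator's implementation: the class MarriageRecord, the function implied_marriages and the field name spouse2 are hypothetical (the slides name only “Spouse1” and “MarriageDate”). Each statement "child is the son or daughter of parent_a and parent_b" implies one marriage, and each distinct couple gets its own record, so a person with several marriages appears in several records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarriageRecord:
    spouse1: str                         # "Spouse1": whoever is named first in the text
    spouse2: str                         # assumed name for the second spouse field
    marriage_date: Optional[str] = None  # "MarriageDate"; None when not stated


def implied_marriages(parentage: list[tuple[str, str, str]]) -> list[MarriageRecord]:
    """Turn (child, parent_a, parent_b) statements into marriage records.

    Each distinct parent pair yields one record, in order of first appearance.
    """
    records: list[MarriageRecord] = []
    seen: set[tuple[str, str]] = set()
    for _child, parent_a, parent_b in parentage:
        pair = (parent_a, parent_b)
        if pair not in seen:
            seen.add(pair)
            records.append(MarriageRecord(parent_a, parent_b))
    return records


# "A is the son of B and C" implies B and C are married; B also has a child with F.
for record in implied_marriages([("A", "B", "C"), ("D", "B", "C"), ("E", "B", "F")]):
    print(record.spouse1, record.spouse2, record.marriage_date)
```

Here B ends up in two records, one per marriage, and the names stay in the order they were given, matching the hint that “Spouse1” can be either the husband or the wife.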

Rules and Hints for Children Forms
Rules
1. Sometimes the same surname appears for every child. Be sure to properly include each separate surname with each separate name.
2. Include families only if parents and at least one child appear in the given sequence of pages.
3. Record families that extend across page boundaries with the first page.
Hints
1. When the focus is on a list field, a number key, n, adds n more blank fields to the list. Count the number of children and add the right number of fields first, then fill them in (e.g., if there are 5 children, enter 4 to add 4 more fields for the children; for 24 children, enter 9, then 9 again, and finally 5).
2. Use “a” to add a new record if you wish to jump out of the middle of the record you’re working on and start a new record.
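
The arithmetic in the first hint can be written out as a short sketch. The function name number_keys_for is hypothetical; the sketch assumes, as the hint's examples imply, that the list starts with one blank field and that each number key n (1 to 9) adds n more.

```python
def number_keys_for(children: int) -> list[int]:
    """Return the number keys to press so the list has one field per child."""
    if children < 1:
        raise ValueError("a family record needs at least one child")
    remaining = children - 1  # one blank field already exists
    keys: list[int] = []
    while remaining > 0:
        press = min(9, remaining)  # a single key press adds at most 9 fields
        keys.append(press)
        remaining -= press
    return keys


print(number_keys_for(5))   # [4]
print(number_keys_for(24))  # [9, 9, 5]
```

Both results match the examples in the hint: 4 for five children, and 9, 9, 5 for twenty-four.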

Don’t forget children not explicitly marked as “children”.

Be sure to properly include each separate surname with each separate name.
For “Michael Lawrence KIRCHGESSNER”, click here, here, and here.
For “Deborah Joan KIRCHGESSNER”, click here, here, and here.

Record children with families that extend across page boundaries with the first page.
– omit, if first page; otherwise record with parents on previous page
– omit, if last page; otherwise record children with this page
– no children, omit
– include

Good Luck! (our ancestors are waiting)