(Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents Elder David W. Embley.


1 (Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents Elder David W. Embley

2 Overview Big Picture Diagram Details & Demo Current Status and Expectations

3 Fe6: 1. Prepare, 2. Extract, 3. Merge & Split, 4. Check & Correct, 5. Generate, 6. Convert; FROntIER, ListReader, OntoSoar, GreenFIE

4 1. Prepare

5 2. Extract

6 3. Merge & Split: Person, Couple, ParentsWithChildren
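
The slide names three record types (Person, Couple, ParentsWithChildren) without showing their contents. As a rough illustration only, they could be modeled as nested records like the sketch below. Every field name (`name`, `birth_date`, `spouse1`, `spouse2`, `marriage_date`, `parents`, `children`) and the sample values are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical fields; the slide only gives the record name.
    name: str
    birth_date: Optional[str] = None


@dataclass
class Couple:
    # Hypothetical: a couple pairs two Person records.
    spouse1: Person
    spouse2: Person
    marriage_date: Optional[str] = None


@dataclass
class ParentsWithChildren:
    # Hypothetical: a family groups a couple with a list of children.
    parents: Couple
    children: List[Person] = field(default_factory=list)


# Placeholder data, not from any real document.
family = ParentsWithChildren(
    parents=Couple(Person("Parent A"), Person("Parent B")),
    children=[Person("Child 1"), Person("Child 2")],
)

# Flatten the family into the individual Person records it mentions.
people = [family.parents.spouse1, family.parents.spouse2, *family.children]
print([p.name for p in people])
```

Nesting the records this way means one ParentsWithChildren record already implies a Couple and several Persons, which is one plausible reason a merge-and-split step would be needed to keep the three views consistent.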

7 4. Check & Correct

8 5. Generate

9 6. Convert

10 Highlighted Results

11 Fe6: 1. Prepare, 2. Extract, 3. Merge & Split, 4. Check & Correct, 5. Generate, 6. Convert; FROntIER, ListReader, OntoSoar, GreenFIE; COMET

12 Precision, Recall, F-Measure Results
(values listed as Precision, Recall, F-Measure)
FROntIER
  Person: 0.86, 0.66, 0.75
  Couple: 1.00, 0.40, 0.57
  ParentsWithChildren: 0.89
GreenFIE
  Person: 0.94, 0.83, 0.88
  Couple: 1.00, 0.90, 0.95
  ParentsWithChildren: 1.00, 0.78, 0.86
OntoSoar
  Person: 0.67
  Couple: 0.75, 0.30, 0.43
  ParentsWithChildren: 1.00, 0.44, 0.62
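
The slide does not say which F-measure is reported. A minimal sketch, assuming it is the balanced F-measure (the harmonic mean of precision and recall, F = 2PR / (P + R)), reproduces the reported values for the rows used below. The function name `f_measure` and the dictionary layout are illustrative choices, not part of the presentation; the precision and recall pairs are copied from the slide.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results slide.
reported = {
    ("FROntIER", "Person"): (0.86, 0.66),
    ("FROntIER", "Couple"): (1.00, 0.40),
    ("GreenFIE", "Person"): (0.94, 0.83),
    ("GreenFIE", "Couple"): (1.00, 0.90),
    ("OntoSoar", "Couple"): (0.75, 0.30),
}

# Recompute F for each row, rounded to two decimals as on the slide.
f_scores = {key: round(f_measure(p, r), 2) for key, (p, r) in reported.items()}

for (tool, record), f in f_scores.items():
    print(f"{tool:9s} {record:7s} F = {f:.2f}")
```

Because the slide's precision and recall are themselves rounded to two decimals, recomputing F from them can differ from a reported value in the last digit for other rows.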

13 Fe6: 1. Prepare, 2. Extract, 3. Merge & Split, 4. Check & Correct, 5. Generate, 6. Convert; FROntIER, ListReader, OntoSoar, GreenFIE; Feedback Loop; Automated Check (Fix & Warn); “Sanity” Check; Name, Date, Place Standardization; Administrative and Batch-Processing Management System; COMET

14 Fe6: 1. Prepare, 2. Extract, 3. Merge & Split, 4. Check & Correct, 5. Generate, 6. Convert; FROntIER, ListReader, OntoSoar, GreenFIE; Feedback Loop; Automated Check (Fix & Warn); “Sanity” Check; Name, Date, Place Standardization; Administrative and Batch-Processing Management System; Bootstrapping, Ever-learning, Feedback Loop; Extraction Tools: Layout, Machine Learning, Non-English Languages; COMET

15 Summary: (Semi)automatic Extraction; Green, Ever-Learning System (improves with use); Status: Extraction Tools (tech-transfer of academic prototypes), Thin-Line Ensemble Prototype (being thickened)

16 Summary: (Semi)automatic Extraction; Green, Ever-Learning System (improves with use); Status: Extraction Tools (tech-transfer of academic prototypes), Thin-Line Ensemble Prototype (being thickened); BYU Data Extraction Research Group, www.deg.byu.edu

