Semiautomatic Generation of Resilient Data-Extraction Ontologies
Yihong Ding
Data Extraction Group, Brigham Young University
Sponsored by NSF


Data-Extraction Ontology
Goal: extract data from web pages
Components:
  concepts
  relations between the concepts
  participation constraints
Resilient
Difficulty: manual ontology generation is costly
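A minimal sketch of the three components as Python data structures. The class and field names are hypothetical, and `[low:high]` follows the slides' participation-constraint notation, assuming `*` means "no upper bound".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Participation:
    """Participation constraint [low:high]; high=None stands for '*' (no upper bound)."""
    low: int
    high: Optional[int]

    def __str__(self) -> str:
        return f"[{self.low}:{'*' if self.high is None else self.high}]"


@dataclass(frozen=True)
class Relation:
    """A named relation between two concepts, with a participation constraint on each side."""
    source: str
    source_part: Participation
    name: str
    target: str
    target_part: Participation

    def __str__(self) -> str:
        return f"{self.source} {self.source_part} {self.name} {self.target} {self.target_part}"


@dataclass
class Ontology:
    """Concepts plus the relations between them."""
    concepts: set = field(default_factory=set)
    relations: list = field(default_factory=list)

    def add_relation(self, rel: Relation) -> None:
        # Register both endpoints as concepts, then record the relation itself.
        self.concepts.update({rel.source, rel.target})
        self.relations.append(rel)


onto = Ontology()
onto.add_relation(Relation("City", Participation(1, 1), "PartOf", "Nation", Participation(1, None)))
print(onto.relations[0])  # City [1:1] PartOf Nation [1:*]
```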

Generation Procedure
[Figure: flow diagram with the components Knowledge Sources, Knowledge Selection Processing, Data-Extraction Ontology, Extraction Processing, Database, and Train/Test documents]

Knowledge Collection
Assumptions about the knowledge base:
  general
  contains meaningful relationships
  pre-existing XML or easy to transfer to XML
Current input:
  Mikrokosmos ontology [Mik]
  auxiliary data-frame library
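Because the knowledge base is assumed to be XML (or easily converted to it), loading it can be as simple as walking the element tree. This sketch uses a hypothetical XML layout and hypothetical concept names; it is not the actual Mikrokosmos format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout, for illustration only.
KB_XML = """
<ontology>
  <concept name="CAPITAL-CITY" isa="CITY">
    <synonym>capital</synonym>
  </concept>
  <concept name="CITY" isa="MUNICIPALITY">
    <synonym>city</synonym>
    <synonym>cities</synonym>
  </concept>
  <concept name="NATION" isa="GEOPOLITICAL-ENTITY">
    <synonym>nation</synonym>
    <synonym>country</synonym>
  </concept>
</ontology>
"""


def load_knowledge_base(xml_text):
    """Return {concept name: (parent concept or None, [lower-cased synonyms])}."""
    root = ET.fromstring(xml_text.strip())
    kb = {}
    for concept in root.iter("concept"):
        synonyms = [s.text.strip().lower() for s in concept.findall("synonym")]
        kb[concept.get("name")] = (concept.get("isa"), synonyms)
    return kb


kb = load_knowledge_base(KB_XML)
print(kb["CITY"])  # ('MUNICIPALITY', ['city', 'cities'])
```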

Selection of Concepts
PROCEDURE ConceptSelection(Tdoc, Kbase)
    SourceDoc = Parse(Tdoc);
    PrimarySelectedConceptsList = MikroSelection(M-Ontology);
    SecondarySelectedConceptsList = DataFrameSelection(DF-Library);
    ConflictHandling();
    SelectedSubgraphGeneration();
Many issues: selection strategies, conflict resolution, …
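A runnable sketch of the ConceptSelection procedure, assuming `Kbase` is split into a concept-to-synonyms map (standing in for the Mikrokosmos ontology) and a concept-to-regex map (standing in for the data-frame library). The helper names mirror the pseudocode but are hypothetical. The conflict policy is a placeholder, since the slides leave conflict resolution as an open issue, and subgraph generation is omitted.

```python
import re


def parse(tdoc):
    """Stand-in for Parse(Tdoc): lower-cased word tokens of the document."""
    return re.findall(r"[a-z]+", tdoc.lower())


def mikro_selection(tokens, ontology):
    """Primary list: concepts with a name or synonym among the document's tokens."""
    present = set(tokens)
    return [concept for concept, names in ontology.items() if present & set(names)]


def data_frame_selection(tdoc, df_library):
    """Secondary list: concepts whose value pattern matches the raw document."""
    return [concept for concept, pattern in df_library.items() if re.search(pattern, tdoc)]


def conflict_handling(primary, secondary):
    """Placeholder policy: keep both lists, primary first, without duplicates."""
    return list(dict.fromkeys(primary + secondary))


def concept_selection(tdoc, ontology, df_library):
    source_doc = parse(tdoc)
    primary = mikro_selection(source_doc, ontology)
    secondary = data_frame_selection(tdoc, df_library)
    return conflict_handling(primary, secondary)


ontology = {"CAPITAL-CITY": ["capital"], "POPULATION": ["population"],
            "CLIMATE": ["climate"], "CURRENCY": ["currency", "money"]}
df_library = {"AREA": r"\d[\d,]*\s*sq\.\s*km", "POPULATION": r"\d+(?:\.\d+)?\s*million"}
doc = "Area: 648,000 sq. km. Capital--Kabul. Climate: Dry. Population: 17.7 million."
print(concept_selection(doc, ontology, df_library))
# ['CAPITAL-CITY', 'POPULATION', 'CLIMATE', 'AREA']
```

POPULATION is found by both sources; the placeholder policy simply keeps its first occurrence.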

Basic Selection Strategy
Select from Mikrokosmos Ontology
Sample document:
  Afghanistan smaller than Texas.
  Area: 648,000 sq. km.
  Capital--Kabul, Other cities--Kandahar, Mazar-e-Sharif, Konduz
  Terrain: Landlocked; mostly mountains and desert.
  Climate: Dry, with cold winters and hot summers.
  Population: 17.7 million.
  Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

Basic Selection Strategy (cont.)
Select from Mikrokosmos Ontology: concept names and their synonyms

Basic Selection Strategy (cont.)
Select from Mikrokosmos Ontology: concept values and their synonyms
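The two Mikrokosmos strategies can be sketched as whole-word lookups: one lexicon holds words that name a concept, the other holds known instance values of a concept. Both lexicons below are hypothetical stand-ins for ontology content.

```python
import re

# Hypothetical lexicon entries, for illustration only.
CONCEPT_NAMES = {   # concept -> words that name the concept itself
    "CAPITAL-CITY": ["capital"],
    "CITY": ["city", "cities"],
    "POPULATION": ["population", "inhabitants"],
}
CONCEPT_VALUES = {  # concept -> known instance values of the concept
    "CITY": ["kabul", "kandahar", "mazar-e-sharif", "konduz"],
    "CEREAL": ["wheat", "corn", "barley", "rice"],
}


def find_terms(text, lexicon):
    """Map each concept to the lexicon terms that occur as whole words in text."""
    lowered = text.lower()
    hits = {}
    for concept, terms in lexicon.items():
        found = [t for t in terms if re.search(rf"\b{re.escape(t)}\b", lowered)]
        if found:
            hits[concept] = found
    return hits


doc = ("Capital--Kabul, Other cities--Kandahar, Mazar-e-Sharif, Konduz. "
       "Population: 17.7 million. Agriculture: Wheat, corn, barley, rice, cotton.")
print(find_terms(doc, CONCEPT_NAMES))   # matched concept names
print(find_terms(doc, CONCEPT_VALUES))  # matched concept values
```

Name matches say which concepts the document talks about; value matches can select a concept (here CITY) even when no word naming it appears.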


Basic Selection Strategy (cont.)
Select from Data Frame Libraries: extract results based on the data frames
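A minimal sketch of data-frame-based extraction, assuming a data frame can be reduced to a value pattern plus context keywords that must appear shortly before a match. The `DataFrameSpec` class, the patterns, and the 40-character window are hypothetical simplifications, not the library's actual data-frame format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataFrameSpec:
    """Hypothetical, minimal data frame: a value pattern plus context keywords."""
    concept: str
    value_pattern: str
    keywords: list = field(default_factory=list)

    def extract(self, text, window=40):
        """Return matched values whose preceding `window` characters contain a keyword."""
        results = []
        for m in re.finditer(self.value_pattern, text):
            context = text[max(0, m.start() - window):m.start()].lower()
            if any(k in context for k in self.keywords):
                results.append(m.group())
        return results


library = [
    DataFrameSpec("Area", r"\d{1,3}(?:,\d{3})*\s*sq\.\s*km", ["area"]),
    DataFrameSpec("Population", r"\d+(?:\.\d+)?\s*million", ["population"]),
]
doc = "Area: 648,000 sq. km. Capital--Kabul. Population: 17.7 million."
extracted = {df.concept: df.extract(doc) for df in library}
print(extracted)  # {'Area': ['648,000 sq. km'], 'Population': ['17.7 million']}
```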

Document-Level Conflict

Concept-Level Conflict

Relation Retrieval
Theoretical solution: all paths in the subgraph
  too expensive (NP-complete)
Heuristic solution: find the shortest path between any two nodes
  set a threshold distance
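The heuristic can be sketched with breadth-first search, assuming the selected subgraph is unweighted and undirected and that the threshold counts edges. The concept graph below is hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of one shortest path, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def retrieve_relations(graph, selected, threshold):
    """Keep shortest paths between selected concepts whose edge count <= threshold."""
    relations = {}
    for a, b in combinations(selected, 2):
        path = shortest_path(graph, a, b)
        if path is not None and len(path) - 1 <= threshold:
            relations[(a, b)] = path
    return relations


graph = {  # hypothetical undirected adjacency lists
    "CAPITAL-CITY": ["CITY"],
    "CITY": ["CAPITAL-CITY", "NATION"],
    "NATION": ["CITY", "POPULATION", "GEOPOLITICAL-ENTITY"],
    "POPULATION": ["NATION"],
    "GEOPOLITICAL-ENTITY": ["NATION", "AREA"],
    "AREA": ["GEOPOLITICAL-ENTITY"],
}
for pair, path in retrieve_relations(graph, ["CAPITAL-CITY", "NATION", "AREA"], threshold=2).items():
    print(pair, " -> ".join(path))
```

With a threshold of 2, CAPITAL-CITY and AREA (four edges apart) are left unrelated. Each BFS costs time linear in the size of the subgraph, which is what makes the heuristic cheap compared with enumerating all paths.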

Participation Constraints
CapitalCity [1:1] IsA.CITY.PartOf Nation [1:1]

Participation Constraints (cont.)
City [1:1] PartOf Nation [1:*]
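A sketch that parses the slides' constraint notation and checks it against extracted relationship instances. It assumes the reading in which `[min:max]` bounds how many times each object on that side participates, with `*` meaning unbounded. The function names and the sample pairs are hypothetical.

```python
import re
from collections import Counter

# Pattern for the notation used above, e.g. "City [1:1] PartOf Nation [1:*]".
CONSTRAINT = re.compile(r"(\S+) \[(\d+):(\d+|\*)\] (\S+) (\S+) \[(\d+):(\d+|\*)\]")


def parse_constraint(text):
    """Split a constraint into (source, min, max), relation, (target, min, max)."""
    m = CONSTRAINT.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized constraint: {text!r}")
    src, smin, smax, rel, dst, dmin, dmax = m.groups()

    def upper(v):
        return None if v == "*" else int(v)  # None means unbounded

    return (src, int(smin), upper(smax)), rel, (dst, int(dmin), upper(dmax))


def satisfies(pairs, constraint, sources, targets):
    """Check that every source and target object participates min..max times."""
    (_, smin, smax), _, (_, dmin, dmax) = constraint
    src_counts = Counter(s for s, _ in pairs)
    dst_counts = Counter(d for _, d in pairs)

    def within(objects, counts, low, high):
        # Counter returns 0 for objects that never participate.
        return all(counts[o] >= low and (high is None or counts[o] <= high) for o in objects)

    return within(sources, src_counts, smin, smax) and within(targets, dst_counts, dmin, dmax)


c = parse_constraint("City [1:1] PartOf Nation [1:*]")
cities = {"Kabul", "Kandahar", "Mazar-e-Sharif", "Konduz"}
pairs = [(city, "Afghanistan") for city in sorted(cities)]
print(satisfies(pairs, c, cities, {"Afghanistan"}))  # True
bad = pairs + [("Kabul", "Iran")]  # deliberately wrong pair: Kabul now in two nations
print(satisfies(bad, c, cities, {"Afghanistan", "Iran"}))  # False
```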

Performance Evaluation
  Speed of generation
  Precision and recall of the generation process
  Precision and recall of the generated ontology
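For reference, precision and recall over sets of generated versus expected items (for example, concepts in the generated ontology versus a hand-built one). The two sets below are made-up illustrations, not results from this evaluation.

```python
def precision_recall(predicted, relevant):
    """Precision = |P & R| / |P|, recall = |P & R| / |R| (0.0 when a set is empty)."""
    predicted, relevant = set(predicted), set(relevant)
    tp = len(predicted & relevant)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


generated = {"CAPITAL-CITY", "CITY", "NATION", "AREA", "CLIMATE"}
reference = {"CAPITAL-CITY", "CITY", "NATION", "AREA", "POPULATION", "TERRAIN"}
p, r = precision_recall(generated, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```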

Generation Time with Distance Threshold

Precision and Recall of the Generation Process

Conclusion
  Data-extraction ontologies generated
  Knowledge sources exploited
  Many issues addressed
  Many more to explore