Machines learnt how to understand tables. What happens next will shock you. Welcome to the PhD dissertation defense of Varish Mulwad!

Presentation transcript:

TABEL – Domain Independent and Extensible Framework to Infer the Semantics of Tables
Varish Mulwad, Ph.D. Dissertation Defense. Adviser: Dr. Tim Finin. January 8, 2015


Zareen Syed, Tim Finin, Varish Mulwad, and Anupam Joshi, "Exploiting a Web of Semantic Data for Interpreting Tables", In 2nd Web Science Conference (WebSci 2010), Raleigh, NC, USA, Apr

Semantics of a Table
Name | Team | Position | Height
Michael Jordan | Chicago | Shooting Guard | 1.98
Allen Iverson | Philadelphia | Point Guard | 1.83
Yao Ming | Houston | Center | 2.29
Tim Duncan | San Antonio | Power Forward | 2.11
Annotations: the Team column maps to NationalBasketballAssociationTeams; Name and Team are related by playsFor; literals are mapped as property values.

Semantics of a Table
(Same Name / Team / Position / Height table as above.)
Linked Data:
tab:cell_01 a tab:ColumnHeader;
  tab:cellLabel "Name"^^xsd:string;
  tab:columnIndex "1"^^xsd:integer;
  tab:valueType dbpedia-owl:BasketballPlayer.
tab:cell_11 a tab:DataCell;
  tab:cellLabel "Michael Jordan"^^xsd:string;
  tab:columnIndex "1"^^xsd:integer;
  tab:rowIndex "1"^^xsd:integer;
  tab:entity dbpedia:Michael_Jordan.
All this in a completely automated way!

TABEL – Domain Independent & Extensible Framework to Infer the Semantics of Tables

Thesis Statement: It is possible to generate high quality linked data from tables by jointly inferring the semantics of column headers, values (string and literal) in table cells, and relations between columns augmented with background knowledge from open data sources such as the Linked Open Data cloud.

Contributions
- A probabilistic graphical model to jointly infer the semantics, plus a novel inference technique, Semantic Message Passing
- A proof-of-concept user-interactive application to generate meta-analysis reports automatically
- Developing and exploring the Human in the Loop paradigm
- A novel technique to generate candidate properties from literal values

Outline: Why | How | Evaluation | Application | Wrap up
Section: Why

Tables are everywhere!
- 154 million high-quality relational tables on the Web
- ~400,000 CSVs on data.gov
- Healthcare, financial and other domains

The Semantic Web & the Web: spreadsheets/CSVs to RDF/OWL

Evidence-Based Medicine
Combine all studies that compare organic milk vs. grass-fed cow milk and produce a unified report ("Organic milk is better!"): a meta-analysis report.
Example meta-analyses: correlation between cardiovascular risk factors and venous thrombosis; duration of proton pump inhibitor treatment as the first line of treatment for Helicobacter pylori eradication.

Tables are valuable

Meta-Analysis: Today
Example: correlation between cardiovascular risk factors and venous thrombosis [1]
- Keyword-based search
- The initial search yields a large number of results: 1949 studies
- Irrelevant results are filtered out manually
- Final number of studies selected: 22!
[1] W. Ageno, C. Becattini, T. Brighton, R. Selby, and P. W. Kamphuisen, "Cardiovascular risk factors and venous thromboembolism: a meta-analysis," Circulation, vol. 117, no. 1, pp. 93–102.

Not restricted to healthcare…

Related Work: Databases & Spreadsheets to RDF
- Existing solutions are largely manual or semi-automatic, given the number of ontologies, classes and relations to choose from
- Automatic solutions use "row as RDF node" and local mappings, with no links to existing classes, properties or entities

Related Work: Semantics of Tables
- Existing approaches infer semantics for only parts of the table (header cells, relations between headers, data cell values, or a combination of two of these)
- They fail to generate an RDF Linked Data representation
- They offer poor support for literals

Related Work
- Limaye et al. [Sep. 2010] (Soumen Chakrabarti's group, IIT-B): RDF Linked Data representation; literal values
- Knoblock et al. [May 2012] (Craig Knoblock's group, USC-ISI): largely focuses on header cell semantics and relations between headers; requires initial user input before the system makes automatic predictions
- Venetis et al. [Sep. 2011] (Alon Halevy's group, Google): column header and relation semantics; literal values; RDF Linked Data

What TABEL brings to the “table”
- Infers the complete semantics of a table
- Generates an RDF Linked Data representation
- Supports tables with different structures over a variety of domains (e.g., medical tables)
- Incorporates user feedback to improve the quality of the inferred semantics
- Infers the semantics of literal values* (numerical values)

Section: How

TABEL – TABle Extracted as Linked Data
Pipeline: pre-processing modules (DECODE, AAD, "Your module here!") → query and rank → joint inference → generate RDF Linked Data → verify (optional) → store / publish. Input: the Name / Team / Position / Height basketball table.
Varish Mulwad, Tim Finin and Anupam Joshi, "A Domain Independent Framework for Extracting Linked Semantic Data from Tables", In Search Computing, vol. 7538, 2012.

Query – Candidate Entities
Query for the cell "Chicago", adding context: the column header {Team} and the rest of the row {Michael Jordan, Shooting Guard, 1.98}. Candidate rankings shown on the slide: (1. Chicago, 2. Judy_Chicago, 3. Chicago_Bulls); (1. Chicago_Bulls, 2. Chicago, 3. Judy_Chicago); (1. Chicago, 2. Judy_Chicago, 3. Chicago_Bulls).
Re-rank with a classifier (string similarity, popularity).
Varish Mulwad, Tim Finin, Zareen Syed and Anupam Joshi, "Using linked data to interpret tables", In 1st Int. Workshop on Consuming Linked Data, held at the 9th Int. Semantic Web Conf. (ISWC 2010), Shanghai, China, Nov
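The re-ranking step can be illustrated with a small sketch. It swaps in a fixed weighted sum for the trained classifier the slide mentions; the weights, the popularity values and the function names are hypothetical, and `difflib.SequenceMatcher` is just one possible string-similarity measure.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between a cell string and an entity label."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rerank(cell: str, candidates: dict[str, float],
           w_sim: float = 0.5, w_pop: float = 0.5) -> list[str]:
    """Order candidate entities by a weighted sum of two features.

    candidates maps an entity name to a (hypothetical) popularity in [0, 1].
    The fixed weights stand in for a trained classifier.
    """
    def score(entity: str) -> float:
        label = entity.replace("_", " ")  # "Chicago_Bulls" -> "Chicago Bulls"
        return w_sim * string_similarity(cell, label) + w_pop * candidates[entity]

    return sorted(candidates, key=score, reverse=True)
```

With made-up popularity values `{"Chicago": 0.9, "Judy_Chicago": 0.2, "Chicago_Bulls": 0.6}`, the cell "Chicago" ranks Chicago first. Context such as the Team header would have to enter through additional features that this sketch does not model.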

Query – Candidate Classes
For each cell of the Team column (Chicago, Philadelphia, Houston, San Antonio), collect the classes of its candidate entities (e.g., 1. Chicago_Bulls, 2. Chicago, 3. Judy_Chicago):
- {Place, City, WomenArtist, LivingPeople, NationalBasketballAssociationTeams}
- {Place, PopulatedPlace, Film, NationalBasketballAssociationTeams, …}
- {…}
The column's candidate classes are their union: Place, City, WomenArtist, LivingPeople, NationalBasketballAssociationTeams, PopulatedPlace, Film, …
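A minimal sketch of forming a column's candidate class set as the union of the per-cell class sets. Counting how many cells support each class is an extra signal added here for illustration, not something the slide specifies.

```python
from __future__ import annotations

from collections import Counter


def candidate_classes(cell_classes: list[set[str]]) -> Counter:
    """Union of the classes attached to each cell's candidate entities.

    Each element of cell_classes holds the classes of all candidate
    entities for one cell. The keys of the returned Counter form the
    column's candidate class set; the counts (a hypothetical extra signal)
    record how many cells support each class.
    """
    counts: Counter = Counter()
    for classes in cell_classes:
        counts.update(set(classes))  # each cell votes at most once per class
    return counts
```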

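Query – Candidate Relations
For the Name column (Michael Jordan, Allen Iverson, Yao Ming, Tim Duncan) and the Team column (Chicago, Philadelphia, Houston, San Antonio), relations are collected between pairs of candidate entities in each row: e.g., {Michael_Jordan, Michael_I_Jordan, Jordan_River} × {Chicago_Bulls, Chicago, Judy_Chicago} gives playsFor, livesIn, …
Candidate relations for the column pair: playsFor, livesIn, born, …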
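The pairwise relation lookup can be sketched as follows. The knowledge base is replaced by a hypothetical in-memory dictionary from entity pairs to relations, and the example facts are made up for illustration; no real SPARQL endpoint is queried.

```python
from __future__ import annotations

from itertools import product


def candidate_relations(
    rows: list[tuple[list[str], list[str]]],
    kb: dict[tuple[str, str], set[str]],
) -> set[str]:
    """Union of relations linking candidate entities of two columns.

    rows: for each table row, (candidates for the subject column,
          candidates for the object column).
    kb:   hypothetical lookup from a (subject, object) entity pair to the
          relations that hold between them.
    """
    relations: set[str] = set()
    for subject_cands, object_cands in rows:
        for s, o in product(subject_cands, object_cands):
            relations |= kb.get((s, o), set())
    return relations
```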

Query – Literals* (numerical data)
For a string column such as Team (Chicago, Philadelphia, Houston, San Antonio), querying "Chicago" yields candidate classes: Place, City, WomenArtist, LivingPeople, NationalBasketballAssociationTeams, PopulatedPlace, Film.

Query – Literals: ?

NumKB
Example properties: population, income, height; example domains: Person, BasketBallPlayer (?); example value: 250,
NumKB:
- encodes distributional features for Linked Data properties
- allows queries using literal values (and optionally a property name)
- provides information on property domains

Identify property domains
For seatingCapacity: get instances (Queen's_Film_Theatre, Restaurant_Gordon_Ramsay, M&T_Bank_Stadium) → get instance types (Theatre, Stadium, Restaurant) → order by frequency:
1. seatingCapacity_Stadium [1]
2. seatingCapacity_Theatre [0.70]
3. seatingCapacity_Restaurant [0.57]
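The "get instance types, order by frequency" step can be sketched by counting types and normalizing by the most frequent one, so the top domain scores 1. The normalization is an assumption chosen to mirror the shape of the slide's scores, and the instance data in the usage note is invented.

```python
from __future__ import annotations

from collections import Counter


def rank_domains(prop: str, instance_types: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Rank candidate domains for a property by how often each type occurs.

    instance_types maps each instance that has `prop` to its types.
    Scores are normalized by the most frequent type (an assumption).
    """
    counts = Counter(t for types in instance_types.values() for t in set(types))
    if not counts:
        return []
    top = counts.most_common(1)[0][1]
    return [(f"{prop}_{t}", round(c / top, 2)) for t, c in counts.most_common()]
```

For example, four stadiums, two theatres and one restaurant with a seatingCapacity would yield seatingCapacity_Stadium [1.0], seatingCapacity_Theatre [0.5], seatingCapacity_Restaurant [0.25].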

Identify values for property–domain pairs
For the pair [seatingCapacity, Stadium]: get the property values; sort them; trim the front and back tails; compute µ and σ (figure values: [86.66 %] : 38721; [6.56 %]; [4.67 %]; [2.08 %]).
Compute ranges: µ - σ to µ + σ; µ - 2σ to µ + 2σ.
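The trim-then-summarize step might look like this. The trim fraction and the choice of population standard deviation are assumptions, since the slide only says "trim front & back tails" and "compute µ & σ".

```python
from __future__ import annotations

import statistics


def value_ranges(values: list[float], trim: float = 0.05) -> dict[str, object]:
    """Trim both tails of the sorted values, then derive µ ± σ and µ ± 2σ ranges.

    `trim` is the fraction dropped from each end (hypothetical default);
    it must be below 0.5 so that some values remain.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * trim)        # number of values dropped per tail
    core = data[k:len(data) - k]
    mu = statistics.mean(core)
    sigma = statistics.pstdev(core)  # population standard deviation
    return {
        "mu": mu,
        "sigma": sigma,
        "1sigma": (mu - sigma, mu + sigma),
        "2sigma": (mu - 2 * sigma, mu + 2 * sigma),
    }
```

Trimming before computing µ and σ keeps a few extreme values (e.g., a mis-typed capacity of 1,000,000) from inflating the ranges.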

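Query – Literals
Query NumKB with the value 1.98 (column header: Height):
1. height
2. diameter
3. minimumElevation
A property is a candidate if minRange < 1.98 < maxRange; candidates are also scored with a fuzzy string match (ColHeaderString, PropertyName).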
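A sketch of the literal query, with NumKB replaced by a hypothetical dictionary of per-property ranges. `SequenceMatcher` stands in for whatever fuzzy matcher the system actually uses, and the ranges in the test are made up.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def query_literal(value: float, header: str,
                  numkb: dict[str, tuple[float, float]]) -> list[str]:
    """Return properties whose range contains `value`, ordered by fuzzy
    similarity between the column header and the property name.

    numkb is a hypothetical stand-in for NumKB: property -> (minRange, maxRange).
    """
    hits = [p for p, (lo, hi) in numkb.items() if lo < value < hi]

    def similarity(prop: str) -> float:
        return SequenceMatcher(None, header.lower(), prop.lower()).ratio()

    return sorted(hits, key=similarity, reverse=True)
```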

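Graphical Model for Tables
Column-header variables C1, C2, C3; cell variables R11, R12, R13, R21, R22, R23, R31, R32, R33.
Slide examples: Team column (Chicago, Philadelphia, Houston, San Antonio) with class and instance labels; cells "Name", "Vice-President", "Office Held"; "Beetle", "Red", "Gasoline".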

Parameterized Graphical Model
- Variable nodes: column headers (C1, C2, C3) and row values (R11 … R33)
- A factor function captures the affinity between the column headers and the row values
- A factor captures the interaction between column headers
- A factor captures the interaction between row values

Semantic Message Passing
Column annotations: C1: [BasketballPlayer], C2: [NBATeam], C3: [BasketBallPositions].
Row factor: Michael_I_Jordan receives "Change"; Chicago_Bulls, consistent with playsFor, receives "No Change".
Column factor for BasketballPlayer: Michael_I_Jordan receives "Change"; Yao_Ming and Allen_Iverson receive "No Change".

Semantic Message Passing
[V] send current values → [F] identify outliers → [F] send semantics → [V] pick new value (repeat)
V – variable nodes; F – factor nodes. The factor nodes are semantically aware.
Varish Mulwad, Tim Finin and Anupam Joshi, "Semantic Message Passing for Generating Linked Data from Tables", In 12th Int. Semantic Web Conf. (ISWC 2013), Sydney, Australia, Oct

Column Name with entities [Michael_I_Jordan, Allen_Iverson, Yao_Ming]; their classes include GeoPopulatedPlace, BasketBallPlayer, Art Work, Athlete, BasketballPlayer and ArtificialIntelligenceResearchers.
Ranked classes: 1. BasketBallPlayer 2. GeoPopulatedPlace …
Top class: BasketBallPlayer

Use the topClass in the message passing process: send topClassScore as the confidence score.
If topClassScore < threshold, update the column header annotation to "No-Annotation"; otherwise use the class (BasketBallPlayer).
Each cell of the Name column (Michael_I_Jordan, Allen_Iverson, Yao_Ming) receives "Change" or "No-Change".
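One way to picture a semantically aware column factor is the following sketch. It assumes topClassScore is the fraction of cells whose current entity has the top class. That definition, the default threshold and the function name are illustrative choices rather than TABEL's exact formula.

```python
from __future__ import annotations

from collections import Counter


def column_factor(cell_classes: dict[str, set[str]], threshold: float = 0.5):
    """Pick the column's top class and send Change / No-Change to each cell.

    cell_classes maps a cell id to the classes of its currently assigned
    entity. topClassScore is taken to be the fraction of cells whose entity
    has the top class (an assumption). Below `threshold`, the header is set
    to "No-Annotation" and no messages are sent.
    """
    if not cell_classes:
        raise ValueError("column has no cells")
    votes = Counter(c for classes in cell_classes.values() for c in set(classes))
    if not votes:
        return "No-Annotation", 0.0, {}
    top_class, count = votes.most_common(1)[0]
    top_score = count / len(cell_classes)
    if top_score < threshold:
        return "No-Annotation", top_score, {}
    messages = {
        cell: ("No-Change" if top_class in classes else "Change")
        for cell, classes in cell_classes.items()
    }
    return top_class, top_score, messages
```

With Michael_I_Jordan typed only as ArtificialIntelligenceResearchers and the other two cells typed as BasketballPlayer, the top class is BasketballPlayer (score 2/3), and only the Michael_I_Jordan cell is told to change.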

4 – Relation between Columns
Entity pairs for Name and Team: [Michael_I_Jordan, Chicago_Bulls], [Allen_Iverson, Philadelphia_76ers], [Yao_Ming, Houston_Rockets]
Relations found per pair: playsFor, livesIn, …; No-rel; playsFor
Ranked relations: 1. playsFor 2. livesIn …
Top relation: playsFor

4 – Relation between Columns
Use the topRel in the message passing process: send topRelScore as the confidence score.
If topRelScore < threshold, update the relation annotation to "No-Annotation".
For the playsFor relation between Name and Team, each Name cell (Michael_I_Jordan, Allen_Iverson, Yao_Ming) receives "Change" or "No-Change".

Variable Node Update
R11 (Michael Jordan) receives: Change [BasketBallPlayer, 0.8]; Change [playsFor, 0.6]; No-Change [0.55]. Slide labels: (Team), (Chicago), (Shooting Guard).
Change the annotation if avgChangeConfidenceScore > avgNoChangeConfidenceScore: here (0.8 + 0.6) / 2 = 0.7 > 0.55, so R11 changes.
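The averaging rule above fits in a few lines. Treating a missing message type as an average of 0.0 is an assumption made so the function is defined for every input.

```python
from __future__ import annotations


def should_change(messages: list[tuple[str, float]]) -> bool:
    """Decide whether a variable node should pick a new value.

    messages: (kind, confidence) pairs, kind being "Change" or "No-Change".
    The node changes when the average confidence of the Change messages
    exceeds that of the No-Change messages (a missing kind averages 0.0).
    """
    def average(kind: str) -> float:
        scores = [s for k, s in messages if k == kind]
        return sum(scores) / len(scores) if scores else 0.0

    return average("Change") > average("No-Change")
```

For R11, `should_change([("Change", 0.8), ("Change", 0.6), ("No-Change", 0.55)])` returns True.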

Variable Node Update
Constraints for R11 (Michael Jordan): (1) class BasketBallPlayer [0.8]; (2) relation playsFor [0.6].
Candidate entities: Michael_I_Jordan, …, Michael_Jordan, …
Pick a candidate that satisfies constraints, in this order of preference: [1, 2, 3], [1, 2], [1, 3], [2, 3], [1], [2], [3]. If no candidate satisfies any, choose "No Annotation".
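The preference order above is exactly what `itertools.combinations` produces when subsets are tried from largest to smallest. This sketch assumes each candidate's satisfied constraints have already been computed; how a constraint is checked against the knowledge base is not shown.

```python
from __future__ import annotations

from itertools import combinations


def pick_entity(candidates: list[tuple[str, set[int]]],
                constraint_ids: list[int]) -> str:
    """Return the first candidate that satisfies the most-preferred subset of
    constraints; for ids [1, 2, 3] the order tried is
    [1, 2, 3], [1, 2], [1, 3], [2, 3], [1], [2], [3].

    candidates: (entity, ids of constraints it satisfies), in rank order.
    """
    ids = sorted(constraint_ids)
    for size in range(len(ids), 0, -1):
        for subset in combinations(ids, size):
            wanted = set(subset)
            for entity, satisfied in candidates:
                if wanted <= satisfied:
                    return entity
    return "No Annotation"
```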

Halting Condition
- Ideal case: no variable node receives a "CHANGE" message
- Practical case: the fraction of variable nodes that receive a "CHANGE" message < threshold
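The practical halting test reduces to a fraction compared against a threshold. The default threshold below is hypothetical; any positive value also covers the ideal case.

```python
from __future__ import annotations


def should_halt(received_change: list[bool], threshold: float = 0.05) -> bool:
    """Practical halting test for message passing.

    received_change[i] is True if variable node i got a CHANGE message in
    the last iteration. Halt when the fraction of such nodes falls below
    `threshold`; with threshold > 0 this includes the ideal case in which
    no node receives CHANGE.
    """
    if not received_change:
        return True
    return sum(received_change) / len(received_change) < threshold
```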

Tables Ontology (diagram: dbpedia-owl:BasketBallTeam, dbpedia:Michael_Jordan, dbpedia-owl:playsFor)

RDF Linked Data Representation
tab:cell_01 a tab:ColumnHeader;
  tab:cellLabel "Name"^^xsd:string;
  tab:columnIndex "1"^^xsd:integer;
  tab:valueType dbpedia-owl:BasketballPlayer.
tab:cell_11 a tab:DataCell;
  tab:cellLabel "Michael Jordan"^^xsd:string;
  tab:columnIndex "1"^^xsd:integer;
  tab:rowIndex "1"^^xsd:integer;
  tab:entity dbpedia:Michael_Jordan.
tab:HeaderRelation_12 a tab:TableRelation;
  tab:relFromColumn tab:cell_01;
  tab:relToColumn tab:cell_02;
  tab:relLabel dbpedia-owl:team.
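A sketch of emitting these triples as Turtle text follows. The xsd, dbpedia and dbpedia-owl prefixes use the usual public namespace IRIs, but the tab: namespace IRI is a placeholder, not the one TABEL uses. The cell_<row><col> naming copies the slide and becomes ambiguous beyond nine rows or columns.

```python
from __future__ import annotations

# tab: IRI is a placeholder; the other three are the standard namespaces.
PREFIXES = (
    "@prefix tab: <http://example.org/tables#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix dbpedia: <http://dbpedia.org/resource/> .\n"
    "@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .\n"
)


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def header_cell(col: int, label: str, value_type: str) -> str:
    # Header cells sit in row 0, hence cell_0<col>.
    return (f"tab:cell_0{col} a tab:ColumnHeader;\n"
            f"  tab:cellLabel {literal(label)}^^xsd:string;\n"
            f'  tab:columnIndex "{col}"^^xsd:integer;\n'
            f"  tab:valueType {value_type} .\n")


def data_cell(row: int, col: int, label: str, entity: str) -> str:
    return (f"tab:cell_{row}{col} a tab:DataCell;\n"
            f"  tab:cellLabel {literal(label)}^^xsd:string;\n"
            f'  tab:columnIndex "{col}"^^xsd:integer;\n'
            f'  tab:rowIndex "{row}"^^xsd:integer;\n'
            f"  tab:entity {entity} .\n")


def header_relation(from_col: int, to_col: int, relation: str) -> str:
    return (f"tab:HeaderRelation_{from_col}{to_col} a tab:TableRelation;\n"
            f"  tab:relFromColumn tab:cell_0{from_col};\n"
            f"  tab:relToColumn tab:cell_0{to_col};\n"
            f"  tab:relLabel {relation} .\n")
```

Concatenating `PREFIXES`, `header_cell(1, "Name", "dbpedia-owl:BasketballPlayer")`, `data_cell(1, 1, "Michael Jordan", "dbpedia:Michael_Jordan")` and `header_relation(1, 2, "dbpedia-owl:team")` reproduces the slide's triples, using lowercase `xsd:string` / `xsd:integer` as XML Schema requires.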

Human in the loop
Feedback can be given Before, During or After joint inference, along the pipeline: pre-processing (DECODE, AAD) → query and rank → joint inference → generate RDF Linked Data → verify (optional) → store / publish. Input: the Name / Team / Position / Height table.

Human in the loop – Before
No. | Name | Team | Position | Height
1 | Michael Jordan | Chicago | Shooting Guard | 1.98
2 | Allen Iverson | Philadelphia | Point Guard | 1.83
3 | Yao Ming | Houston | Center | 2.29
4 | Tim Duncan | San Antonio | Power Forward | 2.11

Human in the loop – Before
The user picks from the candidates:
- Team: WomenArtist, BasketBallTeam, City, PopulatedPlace, SportsTeam, …
- Michael Jordan: Michael_I_Jordan, Michael_Jordan, Michael_Jackson, Michael_Wodruff, …
- Name, Team: livesIn, team, …
Assignments are treated as "true values".

Human in the loop – During
Team [0.2]; Name, Team [0.1]
Candidate classes: WomenArtist, BasketBallTeam, City, SportsTeam, …

Human in the Loop – Impact on Joint Inference
- The user fixes the Name column to [BasketballPlayer]: R11 (Michael Jordan) receives [Class: BasketBallPlayer, 1.0] [Fixed] and [Relation: playsFor, 0.6]; the Name cells (Michael_I_Jordan, Allen_Iverson, Yao_Ming) receive "Change" or "No-Change".
- The user fixes the Name, Team relation to [playsFor]: R11 receives [Class: BasketBallPlayer, 0.8] and [Relation: playsFor, 1.0] [Fixed].

Human in the Loop – Impact on Joint Inference
The user fixes R11 Chicago to [Chicago_Bulls]; candidate classes: WomenArtist, BasketBallTeam, City, PopulatedPlace, SportsTeam, …; candidate relations: livesIn, team, …

Section: Evaluation

Datasets (subset of the IIT-B dataset)
Dataset | # of tables used in column and relation annotations | # of tables used in data cell annotations | average number of columns, rows
Web_Manual | … | … | …, 36
Web_Relation | 28 | – | 4, 67
Wiki_Manual | 25 | 39 | 4, 35
Wiki_Links | – | 80 | 3, 16
Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. 36th VLDB (2010)

Ground Truth
Human annotators marked each class and relation as 'vital', 'okay' or 'incorrect'.
To compute precision, each class and relation predicted by the system is scored:
- 1 if the class was vital
- 0.5 if the class was okay, but could have been better (e.g., Place vs. City)
- 0 if it was incorrect
To compute recall, a prediction scores 1 if it is vital or okay and 0 if it is incorrect.
Ground truth for data cell value annotations comes from the IIT-B dataset.
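The scoring scheme can be turned into precision, recall and F-score. The denominators used here are assumptions: precision over the predictions the system made, and recall over all columns or column pairs that needed a label.

```python
from __future__ import annotations

PRECISION_SCORE = {"vital": 1.0, "okay": 0.5, "incorrect": 0.0}
RECALL_SCORE = {"vital": 1.0, "okay": 1.0, "incorrect": 0.0}


def precision_recall_f(judgements: list[str], n_items: int) -> tuple[float, float, float]:
    """Score predicted labels against the vital / okay / incorrect ground truth.

    judgements: one human judgement per prediction the system made.
    n_items:    number of items that needed a label, including those the
                system left unannotated (assumed recall denominator).
    """
    if not judgements or n_items == 0:
        return 0.0, 0.0, 0.0
    precision = sum(PRECISION_SCORE[j] for j in judgements) / len(judgements)
    recall = sum(RECALL_SCORE[j] for j in judgements) / n_items
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For instance, three predictions judged vital, okay and incorrect over four columns give precision 1.5 / 3 = 0.5, recall 2 / 4 = 0.5 and F-score 0.5.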

Column Header Annotations – figure: % of relevant labels at rank 1 (okay / vital) for Web_Manual, Web_Relation and Wiki_Manual

Column Header Annotations – figure: % of relevant labels at different ranks for Web_Manual, Web_Relation and Wiki_Manual

Column Header Annotations – figure: precision, recall and F-score at rank 1

Column Header Annotations – figure: precision vs. recall at ranks 1-10 for Web_Manual, Web_Relation and Wiki_Manual

Column Header Annotations – figure: F-scores of Semantic Message Passing (SMP) vs. the rest (IIT-B, GOOG) on Web_Manual and Wiki_Manual

Example Column Header Predictions
- Column: Constituency. Predicted: N.A. DBpedia classes [ranks 2-10]: OfficeHolder, PrimeMinister, Politician, Election, Event, AdministrativeRegion, PopulatedPlace, University, EducationalInstitution
- Column: Name of Elected M.P. Predicted: OfficeHolder. DBpedia classes [ranks 2-10]: Election, Event, PrimeMinister, Politician, Country, PopulatedPlace, Settlement, University, EducationalInstitution

Relation Annotations – figure: % of relevant relations at rank 1 (okay / vital)

Relation Annotations – figure: % of relevant relations at ranks 1-10 for Web_Manual, Web_Relation and Wiki_Manual (DBpedia, Yago)

Relation Annotations – figure: F-scores of Semantic Message Passing (SMP) vs. the rest (IIT-B) on Web_Manual, Wiki_Manual and Web_Relation

Example Relation Predictions
- Column pair: President – Birth state. Predicted: N.A. DBpedia relations [ranks 2-10]: location, deathPlace, locatedInArea, birthPlace, isPartOf, largestCity, almaMater, region, state
- Column pair: Name of Elected M.P. – Party Affiliation. Predicted: party. DBpedia relations [ranks 2-8]: affiliation, otherParty, primeMinister, deathPlace, birthPlace, region, NA

Data Cell Value Annotations – figure

How long did it run? Figure: each line represents a table, plotting the number of variables that received a "change" message at the end of an iteration.

Literals – Experimental Setup
- A subset of 16 tables (17 literal value columns) from the Wiki_Link dataset
- The property candidate set was generated by querying NumKB
- Each literal column was manually annotated with an appropriate DBpedia property

Header Cell Annotations for Literals – figure: percentage of correct properties at ranks 1-10

Human in the loop – Experimental Setup
- A subset of 11 tables from the Wiki_Link dataset
- User feedback: the correct column header class for 1 column in 9 tables and for 2 columns in the remaining 2 tables
- The rest of the experimental setup is the same

Data Cell Annotations – figure: Human in the Loop (HIL) vs. no Human in the Loop (No-HIL); correct entities, total, %

Section: Application

Interpreting Medical Tables as Linked Data for Generating Meta-Analysis Reports

TABEL – TABle Extracted as Linked Data
Pipeline: pre-processing modules (DECODE, AAD, Normalize, "Your module here!") → query and rank → joint inference → generate RDF Linked Data → verify (optional) → store / publish. Input: the Name / Team / Position / Height basketball table.
Varish Mulwad, Tim Finin and Anupam Joshi, "Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports", In 15th IEEE Int. Conf. on Information Reuse and Integration (IRI 2014), San Francisco, USA, Aug

Preprocessing – Normalize

Preprocessing – Normalize
- Split header cells into a query string and metadata: "Patients with Secondary Thrombosis" + N = 146; sub-header "no. (%)"
- Normalize data cells and identify types or units: for the row "Smoker", no. → 49 and % → 33.6
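A regex-based sketch of the normalization step. The raw cell layouts it accepts are assumptions: a header ending in "N = <count>" and a data cell written as "<count> (<percent>)". Real medical tables vary far more than these two patterns.

```python
from __future__ import annotations

import re

# Hypothetical cell formats: a header ending in "N = <count>" and a data
# cell written as "<count> (<percent>)", e.g. "49 (33.6)".
HEADER_N = re.compile(r"^(?P<query>.*?)\s*N\s*=\s*(?P<n>\d+)\s*$")
COUNT_PCT = re.compile(r"^\s*(?P<no>\d+)\s*\(\s*(?P<pct>\d+(?:\.\d+)?)\s*%?\s*\)\s*$")


def split_header(text: str) -> tuple[str, dict[str, int]]:
    """Split a header cell into a query string and metadata (the group size N)."""
    m = HEADER_N.match(text)
    if m is None:
        return text.strip(), {}
    return m.group("query").rstrip(" ,;"), {"N": int(m.group("n"))}


def normalize_cell(text: str) -> dict[str, float] | None:
    """Turn a "no. (%)" cell such as "49 (33.6)" into typed values."""
    m = COUNT_PCT.match(text)
    if m is None:
        return None
    return {"no.": int(m.group("no")), "%": float(m.group("pct"))}
```

The query string ("Patients with Secondary Thrombosis") is what would be sent to the class lookup, while N = 146 travels along as metadata.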

Query – Candidate Classes* (DBpedia)
Query "Hypertension" → (1) Idiopathic intracranial hypertension (2) Pulmonary hypertension (3) Hypertension
Re-rank with a classifier (string similarity, popularity) → (1) Hypertension (2) Pulmonary hypertension (3) Idiopathic intracranial hypertension
Also evaluated against SNOMED CT & UMLS

Query – Candidate Classes (Hybrid)
Query "Hypertension" → (1) Hypertension (2) Pulmonary hypertension (3) Idiopathic intracranial hypertension
No results? Query the SNOMED CT API → (1) Hypertension (2) Pulmonary hypertension (3) Idiopathic intracranial hypertension
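The hybrid strategy amounts to a lookup with a fallback. In this sketch both sources are plain functions; no real DBpedia or SNOMED CT API is called, and the dictionary-backed stand-ins and their contents are made up.

```python
from __future__ import annotations

from typing import Callable, List

Lookup = Callable[[str], List[str]]


def hybrid_candidates(term: str, primary: Lookup, fallback: Lookup) -> list[str]:
    """Query the primary source; consult the fallback only when it returns nothing."""
    results = primary(term)
    return results if results else fallback(term)


def dict_lookup(table: dict[str, list[str]]) -> Lookup:
    """Build a stand-in lookup from a dictionary (hypothetical, for testing)."""
    return lambda term: table.get(term.lower(), [])
```

In TABEL's setting, `primary` would wrap the first knowledge source and `fallback` the SNOMED CT API, so ontology-specific concepts are still found when the general-purpose source has no match.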

Modeling Medical Tables as RDF
Diagram elements: PatientGroup, owl:Thing, Attribute, Value; properties numberOfIndividuals (xsd:integer, e.g., 146), hasGroup, hasType (xsd:string, e.g., %), hasRawValue (xsd:double, e.g., 33.6); example attribute: umls:Secondary_Thrombosis.

Interactive tool to generate meta-analysis reports
- A user interface to define meta-analysis parameters and select studies
- The tool automatically generates the relevant SPARQL queries

Evaluation

Header Cell Annotations – figure: distribution of header cell concepts at different ranks for SNOMED CT, UMLS, HYBRID and DBPEDIA. NF: correct concept not found in the candidate set. Dataset: 7 tables (122 header cells).

Retrieval (Find) Evaluation
Experimental setup: generated Linked Data from four tables; executed retrieval SPARQL queries to find tables that included the correlation between venous thrombosis and four different cardiovascular risk factors.
Average precision: 0.79; average recall:

Section: Wrap up

Conclusions
I claimed: "It is possible to generate high quality linked data from tables by jointly inferring the semantics of column headers, values (string and literal) in table cells, and relations between columns augmented with background knowledge from open data sources such as the Linked Open Data cloud."

Conclusions
- "It is possible to generate high quality linked data from tables by jointly inferring the semantics": TABEL jointly inferred the semantics; a thorough evaluation showed promising results
- "… the semantics of column headers, values (string and literal) in table cells, and relations between columns": a novel technique to generate candidate properties from literal values

Conclusions
"It is possible to generate high quality linked data from tables":
- A tables ontology to represent the inferred semantics
- Demonstrated domain independence, extensibility and support for tables with different structures
- Explored different models for Human in the loop

Future Work
- Schema + data driven approach
- Build on the work on inferring literals; NumKB
- Further develop Human in the loop
- Tool to generate meta-analysis reports

Acknowledgements: Dr. Tim Finin, Dr. Anupam Joshi, Dr. Tim Oates, Dr. Yun Peng, Dr. L V Subramaniam, Dr. Indrajit Bhattacharya, lab mates & friends!

Thank You! Our papers on this research topic have garnered 93 citations!