Connecting Knowledge Silos using Federated Text Mining Guy Singh Senior Manager, Product & Strategic Alliances ©2014 Linguamatics Ltd.


1 Connecting Knowledge Silos using Federated Text Mining Guy Singh Senior Manager, Product & Strategic Alliances ©2014 Linguamatics Ltd

2 Data Silos
–External content and internal content
–Structured, semi-structured or unstructured content
–Separate interfaces to access content
–Cannot query across the silos, or exchange content

3 Possible Approaches

4 Integration using Workflow Tools
If each data source has an API, can link together using specific tools for each data source
Can program particular workflows pulling information together from different data sources
Advantages
–Can perform complex data manipulation
–Can exploit structure in data sources, or use I2E to transform the unstructured data
Disadvantages
–Workflows are fixed: can’t easily navigate and explore connections between data

5 Connecting via Linked Data
Transform databases to RDF or provide a conversion layer
Advantages
–Standardizes data format
–Can exploit structure in structured data sources
–Can use I2E to transform unstructured data into RDF
–Can reason with the RDF
Disadvantages
–Transformations are fixed
–Have to predict what information you need from the unstructured text
  –typically pull out a small proportion of the original information

6 Using a Data Warehouse
Integrate the data together into a data warehouse
–Extract, Transform and Load (ETL) each data source into a new database
Advantages
–Allows users to perform a single query across all the content
–Can use I2E to pull information out of unstructured text
–Can combine with human curation so the warehouse contains checked content
Disadvantages
–ETL can be a time-consuming and expensive process
–Lose information
  –have to predict what information you need from the unstructured text
  –typically pull out a small proportion of the original information
  –transformation of discrete fields can lose finer distinctions

7 Federated Text Mining for Data Silos
Use I2E to make data available for search, navigation, linking
–Keep data in original format without any data loss
–I2E queries become the conversion layer, dynamically transforming data into the format we want when we need it
–Ontologies convert between different identifiers, or different languages
–Configurable: just change the queries
Use other methods when you require their strengths
–RDF for reasoning with results
–Workflow tools for complex data analysis and manipulation
–Data warehouses for curated data
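The idea of queries acting as a conversion layer can be illustrated with a minimal sketch. This is not the I2E API: the ontology table, the silo names, the concept identifiers and the `query` function are all hypothetical, chosen only to show source text being left untouched while an ontology maps different surface terms to one identifier at query time.

```python
# Hypothetical sketch, not the I2E API. All names and data are illustrative.

# Toy ontology: surface term (lowercase) -> concept identifier.
ONTOLOGY = {
    "acetylsalicylic acid": "DRUG:0001",
    "aspirin": "DRUG:0001",
    "paracetamol": "DRUG:0002",
    "acetaminophen": "DRUG:0002",
}

# Two silos whose documents stay in their original form.
SILOS = {
    "internal_reports": [
        "Patient was given aspirin daily.",
        "No paracetamol was prescribed.",
    ],
    "journal_abstracts": [
        "Acetylsalicylic acid reduced risk.",
        "Acetaminophen overdose is discussed.",
    ],
}


def query(concept_id):
    """Scan the unmodified text of every silo for any term of one concept.

    The conversion to a common identifier happens here, at query time,
    rather than in an up-front transformation of the sources.
    """
    terms = [term for term, cid in ONTOLOGY.items() if cid == concept_id]
    hits = []
    for silo, docs in SILOS.items():
        for doc in docs:
            lowered = doc.lower()
            for term in terms:
                if term in lowered:
                    hits.append(
                        {"silo": silo, "concept": concept_id,
                         "matched": term, "text": doc}
                    )
    return hits


aspirin_hits = query("DRUG:0001")
```

Changing what is extracted only requires changing the query or the ontology table; the stored documents are never rewritten.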

8 Road to Federated Text Mining
[Diagram labels: Federated Text Mining, Data Normalization, Merge Results, Link the Content Servers]

9 Data Normalisation – Virtual Indexes
[Diagram labels: Pathology Reports Index, Journal Abstracts Index, Virtual Index]
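One way to picture a virtual index is as a thin view over several existing indexes, so that a single search reaches all of them without copying any data. The sketch below assumes that reading of the slide; the `Index` and `VirtualIndex` classes and the sample documents are hypothetical and do not reflect how I2E implements virtual indexes.

```python
# Hypothetical sketch of a virtual index as a view over member indexes.

class Index:
    """A toy index: a name plus a mapping of document id -> text."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents

    def search(self, term):
        """Return (index name, document id) for documents containing term."""
        term = term.lower()
        return [
            (self.name, doc_id)
            for doc_id, text in self.documents.items()
            if term in text.lower()
        ]


class VirtualIndex:
    """Presents several indexes as one; holds references, not copies."""

    def __init__(self, *indexes):
        self.indexes = indexes

    def search(self, term):
        results = []
        for index in self.indexes:
            results.extend(index.search(term))
        return results


pathology = Index(
    "pathology_reports",
    {"p1": "Biopsy shows carcinoma.", "p2": "Benign tissue."},
)
abstracts = Index(
    "journal_abstracts",
    {"a1": "Carcinoma incidence rose."},
)
virtual = VirtualIndex(pathology, abstracts)
carcinoma_results = virtual.search("carcinoma")
```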

10 Data Normalisation – Document Structure
[Diagram labels: Pathology Reports, Journal Abstracts]

11 Data Normalisation – Entities
[Diagram labels: Journal Abstracts, Pathology Reports, Combined (Normalized)]
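Entity normalisation across two sources can be sketched as mapping each surface form to a preferred term before counting, so that mentions from both sources land in one combined table. The synonym table, the sample mentions and the function names below are assumptions made for illustration; they are not taken from the presentation or from I2E.

```python
from collections import Counter

# Hypothetical synonym table: surface form (lowercase) -> preferred term.
PREFERRED_TERM = {
    "nsclc": "non-small cell lung carcinoma",
    "non-small cell lung cancer": "non-small cell lung carcinoma",
    "non-small cell lung carcinoma": "non-small cell lung carcinoma",
    "breast cancer": "breast carcinoma",
    "carcinoma of breast": "breast carcinoma",
}


def normalize(mention):
    """Lowercase, collapse whitespace, then map to the preferred term.

    Unknown mentions fall back to their cleaned-up surface form.
    """
    key = " ".join(mention.lower().split())
    return PREFERRED_TERM.get(key, key)


def combine(*sources):
    """Count normalized entity mentions across any number of sources."""
    counts = Counter()
    for mentions in sources:
        counts.update(normalize(m) for m in mentions)
    return counts


# Illustrative mentions as they might appear in each source.
journal_abstracts = ["NSCLC", "non-small cell lung cancer", "Breast cancer"]
pathology_reports = ["Non-small cell lung carcinoma", "Carcinoma of  breast"]

combined = combine(journal_abstracts, pathology_reports)
```

Without the normalisation step the five mentions would appear as five unrelated rows; with it they collapse to two entities.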

12 Road to Federated Text Mining
[Diagram labels: Federated Text Mining, Data Normalization, Merge Results, Link the Content Servers]

13 I2E 4.1/4.2: Single Client, Multiple Results
[Diagram labels: I2E Server 1, Internal Documents, internal network; I2E Server 2, FDA Drug Labels, external network; Linked server]

14 Road to Federated Text Mining
[Diagram labels: Federated Text Mining, Data Normalization, Merge Results, Link the Content Servers]

15 Each Server supplying separate set of results
[Diagram labels: Content Server 1, Content Server 2, Content Server 3, Content Server 4]
Merge into a single set of results
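The merge step can be sketched as sending one query to every content server and folding the separate result sets into a single table. Everything here is an assumption for illustration: the servers are stubbed as local functions, the row fields (`concept`, `hits`) are invented, and summing hits per concept is only one possible merge policy, not necessarily the one I2E uses.

```python
from concurrent.futures import ThreadPoolExecutor


def make_server(name, rows):
    """Build a stub content server that filters its own rows by query."""
    def run_query(query):
        needle = query.lower()
        return [
            dict(row, server=name)
            for row in rows
            if needle in row["concept"].lower()
        ]
    return run_query


# Hypothetical servers and result rows.
SERVERS = [
    make_server("content_server_1", [{"concept": "Aspirin", "hits": 4}]),
    make_server("content_server_2", [{"concept": "Aspirin", "hits": 2},
                                     {"concept": "Ibuprofen", "hits": 7}]),
    make_server("content_server_3", [{"concept": "Ibuprofen", "hits": 1}]),
    make_server("content_server_4", []),
]


def federated_query(query, servers):
    """Query all servers concurrently and merge rows by concept."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # pool.map returns results in the same order as `servers`.
        per_server = list(pool.map(lambda server: server(query), servers))

    merged = {}
    for rows in per_server:
        for row in rows:
            entry = merged.setdefault(
                row["concept"],
                {"concept": row["concept"], "hits": 0, "servers": []},
            )
            entry["hits"] += row["hits"]
            entry["servers"].append(row["server"])
    # Highest hit count first; concept name breaks ties.
    return sorted(merged.values(), key=lambda e: (-e["hits"], e["concept"]))


aspirin_rows = federated_query("aspirin", SERVERS)
```

Keeping the list of contributing servers on each merged row preserves provenance, so a user can still see which silo a result came from.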

16 Road to Federated Text Mining
[Diagram labels: Federated Text Mining, Data Normalization, Merge Results, Link the Content Servers]

17 I2E Federated Text Mining
Connected Knowledge
Extract and connect data in any format, wherever it resides

