A Framework for Ontology Learning from Big Data

Presentation transcript:

IDRA: A Framework for Ontology Learning from Big Data. Ing. Roberto Enea, Senior Software Engineer @SecureAuth

Summary. IDRA (Inductive Deductive Reasoning Architecture), a general framework for ontology learning: challenge; architecture and data model (LOM); components. Kevin, an IDRA use case that collects user data to highlight risk factors for susceptibility to social engineering attacks: Facebook crawling; evaluation criteria.

ETL (Extract, Transform, Load). The main result of the ETL process is to give data a new and useful semantics. Examples in Identity and Access Governance: abandoned accounts, orphan accounts.

ETL from the Web: challenges. Managing data evolution: a change of perspective, from documents alone to sources as well. Managing volume: new data storage approaches are needed. Managing uncertainty.

Managing Uncertainty. There are two different kinds of uncertainty: objective and subjective. Objectively uncertain data is inherently uncertain; an example is the daily weather forecast, where the chance of rain is expressed as a probability. Subjectively uncertain data represents facts that are inherently true or false, but whose value is hard to determine because of data noise. Our aim is to reduce the subjective uncertainty of extracted facts so that they can be embedded in the learning ontology.

Managing Uncertainty. Our approach can be summarized by the following features. A value in the range [0, 1000], the confidence level, represents how certain an axiom (RDF statement) coming from the ontology learning process is; the value depends on the algorithms used to extract the statement and on the reliability of the sources. A dedicated infrastructure manages the LOM (Learning Ontology Model), including the confidence level (the confidence level is not included in the learned ontology). A tool lets the user run SPARQL what-if queries in order to investigate and validate uncertain statements using an interrogative model approach.

IDRA Concept

Structural Components. Statements Generator: the module that generates the statements. Statements Repository: the repository that stores the LOM. Statements Modulator: the module that updates the confidence level of the incoming triples. Analysis tools: what-if queries and a reasoner.

Domain-dependent components. Domain Ontology: the initial ontology together with the inference rules. Analysis Engines (UIMA annotators): the set of annotators specific to each monitored source. Projection Rules: the rules that map the annotations to the domain ontology; CODA uses them to generate the triples. Inference Rules: the set of rules that infer the required categorization of the data. Crawler: the module dedicated to data extraction; it is strictly tied to the monitored source.

IDRA Architecture (diagram). Components shown: Statements Generator FB, Statements Generator Wiki, Statements Generator RG, and the LAM (Learned Axioms Manager).

Statements Generator (diagram). Components shown: REST Interface, Crawler, Analysis Engine (UIMA Annotator), CODA, Sources, Triples, Domain Ontology, Projection Rules.

Learned Axioms Manager (diagram). Components shown: REST Interface, Repository, Statements Modulator, Domain Ontology, Triples, SPARQL Parser, Reasoner, Inference Rules.

LOM Data Model. The main entity of the model is the Triple. It carries the attributes needed to manage uncertain statements: the confidence level, the number of occurrences, and the source. If only a single instance of a triple is extracted from the analyzed documents, the SM (Statements Modulator) assigns it the confidence level coming from the SG (Statements Generator); otherwise it increments the number of occurrences and computes the average of the confidence levels. The source indicates the original corpus from which the statement was extracted.
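The update rule described on this slide can be sketched as follows. This is a minimal illustration, not the IDRA implementation: the class names, the in-memory dictionary, and the choice to treat the same statement from different sources as separate entries are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LearnedTriple:
    """One LOM entry: an RDF statement plus its uncertainty bookkeeping."""
    subject: str
    predicate: str
    obj: str
    source: str            # corpus the statement was extracted from
    confidence: float      # confidence level in [0, 1000]
    occurrences: int = 1   # how many times the statement has been seen


class StatementsModulator:
    """Hypothetical in-memory stand-in for the Statements Modulator (SM)."""

    def __init__(self):
        self._triples = {}

    def ingest(self, subject, predicate, obj, source, confidence):
        # Assumption: the same statement from two sources is kept as two entries.
        key = (subject, predicate, obj, source)
        known = self._triples.get(key)
        if known is None:
            # First sighting: keep the confidence assigned by the Statements Generator.
            known = LearnedTriple(subject, predicate, obj, source, confidence)
            self._triples[key] = known
        else:
            # Repeat sighting: fold the new value into the running average.
            total = known.confidence * known.occurrences + confidence
            known.occurrences += 1
            known.confidence = total / known.occurrences
        return known
```

Storing the running average together with the occurrence count means the individual confidence values never need to be kept.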

In-Memory graph DBs. Based on binary matrices, used as adjacency matrices, to store the relationships between entities. All the indexes are stored in memory. Persistence is used only for infrequently accessed properties.

In-Memory graph DBs. Adjacency matrices make it easy to compute indirect relationships. (Diagram: three adjacency matrices over Users, Facebook Profiles, Friends and Places.)

In-Memory graph DBs. To compute the relationships between the users and the places where their friends live, it is enough to compute the Boolean matrix product of the three matrices above. (Diagram: the product yields a Users × Places matrix.)
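The Boolean matrix product mentioned here can be illustrated with a small sketch in plain Python. The matrices below are invented toy data (two users, three profiles, two places), not the ones from the slide, and the function name is hypothetical.

```python
def bool_matmul(a, b):
    """Boolean product: result[i][j] is True if some k links i -> k -> j."""
    inner = len(b)
    cols = len(b[0])
    return [
        [any(row[k] and b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Rows: users U1, U2. Columns: Facebook profiles a, b, c.
user_profile = [[1, 0, 0],
                [0, 1, 0]]
# Rows and columns: profiles a, b, c (a is friends with b and c).
friends = [[0, 1, 1],
           [1, 0, 0],
           [1, 0, 0]]
# Rows: profiles a, b, c. Columns: places P, Q.
lives_in = [[1, 0],
            [0, 1],
            [0, 1]]

# Users x Places: where do the friends of each user live?
user_friend_places = bool_matmul(bool_matmul(user_profile, friends), lives_in)
```

In this toy data U1's friends (b and c) both live in Q, while U2's only friend (a) lives in P, so the result has one True per row.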

In-Memory graph DBs. Hierarchy navigation. (Diagram: matrix over nodes a to h.)

In-Memory graph DBs OS

Use case: Kevin. A system for detecting people at risk of SE (social engineering) attacks. Sources: the main social networks. Domain ontology: based on common HR database entities. Inference rules derived from: Samar Muslah Albladi and George R. S. Weir, «User characteristics that influence judgment of social engineering attacks in social networks».

Facebook Crawling. To overcome the several blocking mechanisms that Facebook adopts to prevent crawling, I considered using: Selenium, a Java library for web UI automation, usually used by QA engineers; several agents (fake profiles with no friends, so as not to influence Facebook search); random delays between actions to simulate human behavior.

Projection Rules Strategy. The confidence level (CL) is evaluated using the edit distance between the extracted field and the corresponding information in the HR profile. It is applied to the hasFacebookProfile relationships, while the relationships between a Facebook profile and its attributes have the maximum CL. (Diagram: HR Profile, CL = 1000; Facebook Profile, CL in [0, 1000]; attributes Job, Education, Company.)
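One way to turn an edit distance into a confidence level in [0, 1000] is sketched below. The slide does not give the actual formula, so the normalization by the longer string, the case folding, and the function names are assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def confidence_level(extracted, reference, scale=1000):
    """Map string similarity to [0, scale]; identical strings give scale."""
    a = extracted.strip().lower()
    b = reference.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0  # assumption: two empty fields carry no evidence
    distance = levenshtein(a, b)
    return round(scale * (1 - distance / longest))
```

For example, a company name scraped from a Facebook profile would be compared with the company name stored in the HR profile, and the score attached to the hasFacebookProfile statement.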

Inference Rules
Rule name: Rule description
Identifiable: Person(?x) ^ kevin-ontology:hasFacebookProfile(?x, ?y) -> kevin-ontology:Identifiable(?x)
HighInformationDisseminator: kevin-ontology:Person(?x) ^ kevin-ontology:FacebookProfile(?y) ^ kevin-ontology:hasFacebookProfile(?x, ?y) ^ kevin-ontology:hasJob(?y, ?z) ^ kevin-ontology:livingPlace(?y, ?l) ^ kevin-ontology:spouse(?y, ?h) ^ kevin-ontology:studied(?y, ?d) ^ kevin-ontology:studiedAt(?y, ?m) ^ kevin-ontology:workFor(?y, ?u) -> kevin-ontology:HighInformationDisseminator(?x)
VerySusceptibleToSEA: kevin-ontology:Identifiable(?x) ^ kevin-ontology:HighInformationDisseminator(?x) -> kevin-ontology:VerySusceptibleToSEA(?x)
MediumInformationDisseminator: kevin-ontology:Person(?x) ^ kevin-ontology:FacebookProfile(?y) ^ kevin-ontology:hasFacebookProfile(?x, ?y) ^ kevin-ontology:livingPlace(?y, ?l) ^ kevin-ontology:studiedAt(?y, ?m) ^ kevin-ontology:workFor(?y, ?u) -> kevin-ontology:MediumInformationDisseminator(?x)
ModeratelySusceptibleToSEA: kevin-ontology:MediumInformationDisseminator(?x) ^ kevin-ontology:Identifiable(?x) -> kevin-ontology:ModeratelySusceptibleToSEA(?x)
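To make the effect of these rules concrete, the sketch below applies the same conditions to plain Python data. It is not how IDRA evaluates them (the slides state that the rules are SWRL run by a reasoner); the data layout and function name are hypothetical, every profile in the mapping is assumed to be a FacebookProfile, and the check is closed-world, unlike OWL reasoning.

```python
# Properties a profile must expose for each disseminator rule to fire.
HIGH_PROPS = {"hasJob", "livingPlace", "spouse", "studied", "studiedAt", "workFor"}
MEDIUM_PROPS = {"livingPlace", "studiedAt", "workFor"}


def classify(persons, has_facebook_profile, profile_properties):
    """Return {person: set of inferred class names}."""
    result = {}
    for person in persons:
        classes = set()
        for profile in has_facebook_profile.get(person, ()):
            # Identifiable: the person is linked to at least one profile.
            classes.add("Identifiable")
            filled = profile_properties.get(profile, set())
            if HIGH_PROPS <= filled:
                classes.add("HighInformationDisseminator")
            if MEDIUM_PROPS <= filled:
                classes.add("MediumInformationDisseminator")
        if {"Identifiable", "HighInformationDisseminator"} <= classes:
            classes.add("VerySusceptibleToSEA")
        if {"Identifiable", "MediumInformationDisseminator"} <= classes:
            classes.add("ModeratelySusceptibleToSEA")
        result[person] = classes
    return result
```

Because the properties required by the Medium rule are a subset of those required by the High rule, anyone classified as VerySusceptibleToSEA is also classified as ModeratelySusceptibleToSEA under the rules as written.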

Inference Rules. The inference rules are expressed in SWRL because: it is flexible (many built-in functions let the user implement advanced rules); it starts from the OWA (open-world assumption); it is embedded inside the ontology in OWL format. However, the extended specification (built-in functions) is supported by few reasoners.

Demo

Q&A