ICCS 2008, Cracow, June 23-25, 2008. Towards Large Scale Semantic Annotation Built on MapReduce Architecture. Michal Laclavík, Martin Šeleng, Ladislav Hluchý.


Towards Large Scale Semantic Annotation Built on MapReduce Architecture
Michal Laclavík, Martin Šeleng, Ladislav Hluchý
Institute of Informatics, Slovak Academy of Sciences in Bratislava
ICCS 2008, Cracow, June 23-25, 2008

Motivation
Semantic Annotation or Tagging
 - Delivers a formal understanding of text documents, one of the main focuses of the Semantic Web
 - Documents on the Web or in the enterprise are to be understood by a computer
 - The goal is to understand both content and context

Semantic Annotation
Similar to Information Extraction
Finding metadata about entities, their properties, and their relations
Ontologies
Manual tools
(Semi-)automatic tools
 - Usually tested on a few hundred documents
Needs:
 - To deliver applications on the Web or in enterprises, we need to annotate at large scale
 - The Semantic Web can be exploited only if computer-understandable metadata reaches a critical mass
Examples:
 - Geographical locations, people, organizations

MapReduce
Google's approach to large-scale information processing
Commodity PCs
The application developer needs to implement only the Map and Reduce methods
Inputs and outputs are ordered key-value pairs
Fault tolerant, easy to use, scalable to hundreds of thousands of computers
Hadoop
 - Open source implementation by Apache
 - Yahoo! is using it on cores in a production environment
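To make the Map/Reduce contract above concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: `run_mapreduce`, `count_words_map`, and `count_words_reduce` are hypothetical names chosen for illustration, and the word-count task is only a stand-in. It shows the division of labor the slide describes: the developer writes a map and a reduce function, while the framework groups the key-value pairs in between.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Toy stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Keys are visited in sorted order, mirroring ordered key-value pairs.
    return [(k, reduce_fn(k, groups[k])) for k in sorted(groups)]

def count_words_map(doc_id, line):
    """Map: emit (word, 1) for every word in the input line."""
    for word in line.lower().split():
        yield word, 1

def count_words_reduce(word, counts):
    """Reduce: sum all counts collected for one word."""
    return sum(counts)

result = run_mapreduce(
    [("doc1", "map reduce map"), ("doc2", "reduce")],
    count_words_map,
    count_words_reduce,
)
print(result)
```

In a real cluster the grouping step is distributed and fault tolerant; the sketch only mirrors the programming model.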

Ontea: Pattern Based Annotation
Information extraction and semantic annotation using patterns
Finds objects and properties in text
Results can be transformed to RDF/OWL
Similar to C-PANKOW, KIM, or GATE
A very simple solution, well suited to languages for which advanced NLP is not available
Applicable in enterprise applications

Ontea in Hadoop
Map function - Pattern.annotation()
 - Input: lines of text
 - Output: key-value pairs, e.g.
     file_name => organization:Apple
     Organization:Apple => address:Mountain View
Map function - transformers
 - E.g. lemmatization transformer
 - Input: Settlement:Bratislave, Settlement:Bratislava
 - Output: Settlement:Bratislava
Reduce function
 - Input: key-value pairs (objects and properties)
 - Output as needed: objects and their relations to files, with properties (e.g. in RDF/OWL)
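The three stages on this slide can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Ontea's implementation: the regular expressions in `PATTERNS`, the `LEMMAS` table, and all function names are hypothetical, and the pipeline runs in one process rather than on Hadoop. Only the shape of the data (file name as key, `Type:value` as annotation) follows the slide.

```python
import re
from collections import defaultdict

# Hypothetical patterns; the real Ontea pattern set is not shown in the slides.
PATTERNS = {
    "Organization": re.compile(r"\b([A-Z][a-z]+) (?:Inc|Ltd)\b"),
    "Settlement": re.compile(r"\bin ([A-Z][a-z]+)\b"),
}

# Hypothetical lemma table standing in for a lemmatization transformer.
LEMMAS = {"Bratislave": "Bratislava"}

def annotate_map(file_name, line):
    """Map: emit (file_name, 'Type:value') for every pattern match."""
    for type_name, pattern in PATTERNS.items():
        for match in pattern.finditer(line):
            yield file_name, f"{type_name}:{match.group(1)}"

def lemmatize_map(file_name, annotation):
    """Map (transformer): normalize inflected values to their lemma."""
    type_name, value = annotation.split(":", 1)
    yield file_name, f"{type_name}:{LEMMAS.get(value, value)}"

def reduce_annotations(pairs):
    """Reduce: collect, for each object, the files it was found in."""
    by_object = defaultdict(set)
    for file_name, annotation in pairs:
        by_object[annotation].add(file_name)
    return {obj: sorted(files) for obj, files in by_object.items()}

lines = [
    ("a.txt", "Apple Inc opened an office in Bratislave"),
    ("b.txt", "The meeting was held in Bratislava"),
]
annotated = [kv for name, line in lines for kv in annotate_map(name, line)]
normalized = [kv for name, ann in annotated for kv in lemmatize_map(name, ann)]
index = reduce_annotations(normalized)
print(index)
```

After the transformer step, both spellings of the settlement collapse into one object, so the reduce step links `Settlement:Bratislava` to both files. Serializing the result to RDF/OWL is left out of the sketch.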

Results & Conclusion
It works, it is portable, it is faster
12 times faster on 16 cores
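As a quick reading of the reported figure, the speedup can be turned into a parallel efficiency. The slide does not state the baseline, so this small calculation assumes the 12x speedup is measured against a single-core run; `parallel_efficiency` is a hypothetical helper name.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores

# Assumption: the 12x speedup is relative to a single-core run.
efficiency = parallel_efficiency(12, 16)
print(f"{efficiency:.0%}")  # prints 75%
```

Under that assumption, each of the 16 cores contributes about three quarters of what ideal linear scaling would give.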