7th May 2013 Primary Research Team & Capabilities Dept. of Parallel and Distributed Computing Research and Development Areas: –Large-scale HPCN, Grid.

Presentation transcript:

7th May 2013
Primary Research Team & Capabilities
Dept. of Parallel and Distributed Computing
Research and Development Areas:
–Large-scale HPCN, Grid and MapReduce applications
–Intelligent and Knowledge oriented Technologies
Experience from IST:
–3 projects in FP5: ANFAS, CrossGrid, Pellucid
–6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
–4 projects in FP7: Commius, Admire, Secricom, EGEE III
Several National Projects (SPVV, VEGA, APVT)
IKT Group Focus:
–Information Processing (Large Scale)
–Graph Processing
–Information Extraction and Retrieval
–Semantic Web
–Knowledge oriented Technologies
–Parallel and Distributed Information Processing
Solutions:
–SGDB: Simple Graph Database
–gSemSearch: Graph based Semantic Search
–Ontea: Pattern-based Semantic Annotation
–ACoMA: KM tool in email
–EMBET: Recommendation System
–Experts on MapReduce and IR (Nutch, Solr, Lucene)
Director & leader of PDC: Dr. Ladislav Hluchý
URL:

Large scale Text and Graph data processing
Core Technology
Web crawling
–Nutch + plugins
Full text indexing and search
–Lucene, Solr
Information Extraction
–Ontea, GATE
All of the above at large scale
–Hadoop, S4
Graph processing and Querying
–Simple Graph Database (SGDB)
–gSemSearch
–Neo4j
–Blueprints
Technologies developed by IISAS (underlined on the slide): Ontea, SGDB, gSemSearch

Relation to Business Intelligence
Old BI approaches
–Data integration from RDBMS
–Data warehouses
–OLAP
–…
New BI approaches
–Data structures other than RDBMS: Networks, Semantics
  Networks/Graphs in Telecom, Social Networks, Transactions, Linked Data …
  NoSQL: key-value (Tokyo Cabinet), column stores (HBase), graph databases, RDF(S)
–In-memory computing
–Commodity PC solutions for large data: MapReduce style (Hadoop), Pregel style (Giraph, Hama)
–Big unstructured data processing (on Hadoop): sentiment analysis, topic detection, named entity detection
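
The "MapReduce style" of processing mentioned above can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are assumptions made for illustration, and the shuffle step only imitates what a framework would do between the two phases.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Emit one (word, 1) pair per word of a single document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, imitating the framework step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum all counts emitted for one word."""
    return key, sum(values)


def word_count(docs):
    """Run map, shuffle and reduce over a dict of {doc_id: text}."""
    mapped = chain.from_iterable(map_phase(i, t) for i, t in docs.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical sample input.
counts = word_count({"d1": "graph data graph", "d2": "big data"})
print(counts)
```

Because each map call sees one document and each reduce call sees one key, both phases can be distributed over commodity machines, which is the property the slide refers to.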

Ontea: Information Extraction Tool
Regex patterns
Gazetteers
Results
–Key-value pairs
–Structured into trees, graphs
Transformers, Configuration
Automatic loading of extractors
Visual Annotation Tool
Integration with external tools
–GATE, Stemmers, Hadoop …
Multilingual tests
–English, Slovak, Spanish, Italian
Figure labels: Text with annotations; Tree of annotations; Network/Graph of annotations
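
A minimal sketch of pattern-based extraction that yields key-value pairs from regex patterns and a gazetteer, in the spirit of the slide above. It does not use the Ontea API: the PATTERNS and GAZETTEER tables, the annotate function and the sample text are all assumptions made for illustration.

```python
import re

# Hypothetical patterns: annotation key -> regex with one capture group.
PATTERNS = {
    "email": re.compile(r"\b([\w.+-]+@[\w-]+\.[\w.-]+)\b"),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}

# Hypothetical gazetteer: known term -> annotation key.
GAZETTEER = {"Bratislava": "city", "Slovakia": "country"}


def annotate(text):
    """Return (key, value, offset) annotations sorted by position in the text."""
    results = []
    for key, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            results.append((key, match.group(1), match.start(1)))
    for term, key in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            results.append((key, match.group(0), match.start()))
    return sorted(results, key=lambda item: item[2])


text = "Meeting in Bratislava on 2013-05-07, contact info@example.org for details."
pairs = [(key, value) for key, value, _ in annotate(text)]
print(pairs)
```

The offsets are kept so that annotations can later be nested into a tree or linked into a graph.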

Named Entity Recognition (NER)
Combination of Existing NER
–ANNIE (GATE), Apache OpenNLP,
–Illinois NER, Illinois Wikifier,
–LingPipe, Open Calais
–Stanford NER, WikiMiner,
–Miscinator
Machine Learning
–Decision Trees models
Our approach ranked among the best 6 of 17 worldwide at MSM

gSemSearch: Graph based Semantic Search
Entity relation search in semantic networks/graphs
Search, Navigation, Data Interaction
Aiming at data integration of
–Structured data (Relational data, LinkedData)
–Unstructured data (text, documents, communication)
Applications:
–Email, Web, Text documents, LinkedData

SemSets: Semantic Search
Answering list type questions: astronauts who walked on the Moon
Wikipedia as text and networks/graph
Text: IR methods, Lucene based
Graph/network: spreading activation and SemSets
Winning solution on Semantic Search Challenge
1. Eugene_Cernan
2. Alan_Bean
3. David_Scott
4. John_Young_(astronaut)
5. Neil_Armstrong
6. Pete_Conrad
7. Harrison_Schmitt
8. Alan_Shepard
9. Charles_Duke
10. Buzz_Aldrin
11. James_Irwin
12. Edgar_Mitchell
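
Spreading activation, named above as the graph-side method, can be sketched as follows. This is a generic textbook variant, not the SemSets implementation: the decay factor, the even split of activation among neighbors, the step count and the toy graph are all assumptions made for illustration.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes along edges for a fixed number of steps.

    Each active node passes decay * its activation, split evenly among neighbors.
    """
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, value in frontier.items():
            neighbors = graph.get(node, ())
            if not neighbors:
                continue
            share = decay * value / len(neighbors)
            for neighbor in neighbors:
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + share
        for node, value in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = next_frontier
    return activation


# Toy undirected graph (hypothetical sample data) given as adjacency lists.
graph = {
    "Apollo_11": ["Neil_Armstrong", "Buzz_Aldrin"],
    "Apollo_12": ["Pete_Conrad", "Alan_Bean"],
    "Neil_Armstrong": ["Apollo_11"],
    "Buzz_Aldrin": ["Apollo_11"],
    "Pete_Conrad": ["Apollo_12"],
    "Alan_Bean": ["Apollo_12"],
}

scores = spread_activation(graph, ["Apollo_11"], decay=0.5, steps=2)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Nodes that are never reached keep no score, so ranking by activation returns only entities connected to the seeds.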

SGDB: Simple Graph Database
Storage for graphs
Optimized for graph traversing and spread of activation
Faster than Neo4j for graph traversing operations
Supports Blueprints API
Graph Database Benchmarks
–Graph Traversal Benchmark for Graph Databases
–Blueprints API: possibility to test compliant graph databases
Source:
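
The kind of operation a graph traversal benchmark times can be sketched as a depth-limited breadth-first traversal. This is not the benchmark referenced on the slide and does not use SGDB, Neo4j or the Blueprints API: the traverse function, the in-memory adjacency dict and the ring graph are assumptions made for illustration.

```python
import time
from collections import deque


def traverse(graph, start, max_depth):
    """Breadth-first traversal; return the vertices reachable within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


# Toy ring graph of 1000 vertices, each linked to its two neighbors.
n = 1000
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

started = time.perf_counter()
reached = traverse(ring, 0, 3)
elapsed = time.perf_counter() - started
print(f"{len(reached)} vertices within 3 hops in {elapsed:.6f} s")
```

A real comparison would run the same traversal against each database through a common interface and over much larger graphs; the timing here only shows where the measurement would go.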

Future Direction: Relations Discovery in Large Graph Data
Motivation
–Graph/Network data are everywhere: social networks, web, LinkedData, transactions, communication (email, phone).
–Text can also be converted to a graph.
–Interconnecting graph data and searching for relations is crucial.
Approach
–Forming semantic trees and graphs from text, web, communication, databases and LinkedData
–User interaction with graph data in order to achieve integration and data cleansing
–Users will do it if their effort has an immediate impact on search results
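
One simple way to convert text to a graph, as the motivation above suggests, is to link each document to the entities extracted from it. The sketch below assumes annotations are already available as key-value pairs; the build_graph function, the node naming scheme and the sample messages are hypothetical and are not taken from the presented tools.

```python
from collections import defaultdict


def build_graph(annotated_docs):
    """Link each document node to the entity nodes found in it (undirected)."""
    graph = defaultdict(set)
    for doc_id, annotations in annotated_docs.items():
        for key, value in annotations:
            entity = f"{key}:{value}"
            graph[doc_id].add(entity)
            graph[entity].add(doc_id)
    return graph


# Hypothetical annotations for two messages.
docs = {
    "msg1": [("person", "Alice"), ("org", "ACME")],
    "msg2": [("person", "Bob"), ("org", "ACME")],
}

graph = build_graph(docs)
print(sorted(graph["org:ACME"]))
```

Here the shared entity node org:ACME connects both messages, so a relation between Alice and Bob can be found by following edges through it.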