Primary Research Team & Capabilities, Dept. of Parallel and Distributed Computing (11 October 2013)


1 Primary Research Team & Capabilities
11 October 2013
Dept. of Parallel and Distributed Computing
Research and Development Areas:
–Large-scale HPCN, Grid and MapReduce applications
–Intelligent and Knowledge oriented Technologies
Experience from IST:
–3 projects in FP5: ANFAS, CrossGrid, Pellucid
–6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
–4 projects in FP7: Commius, Admire, Secricom, EGEE III
Several National Projects (SPVV, VEGA, APVT)
IKT Group Focus:
–Information Processing (Large Scale)
–Graph Processing
–Information Extraction and Retrieval
–Semantic Web
–Knowledge oriented Technologies
–Parallel and Distributed Information Processing
Solutions:
–SGDB: Simple Graph Database
–gSemSearch: Graph based Semantic Search
–Ontea: Pattern-based Semantic Annotation
–ACoMA: KM tool in Email
–EMBET: Recommendation System
–Experts on MapReduce and IR (Nutch, Solr, Lucene)
Director & leader of PDC: Dr. Ladislav Hluchý
URL: http://ikt.ui.sav.sk

2 Towards Entity Search
Current approaches
–Confirmed human knowledge
–Google Knowledge Graph
–Facebook Graph Search
Data sets available
–Wikipedia
–DBpedia (111 languages)
–Freebase
–Linked Data cloud
Our approach
–Quite unique mix of skills: IR, Semantic Web, Graphs and Networks
–Networks, Text, metadata
–Graph algorithms
–Information Retrieval techniques
–Anchor texts: aliases, properties, types
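The use of anchor texts as a source of entity aliases can be sketched as follows. This is a minimal illustration, not the group's implementation: the function names `build_alias_index` and `resolve` are hypothetical, and the small sample of (anchor text, target page) pairs is made up for the example.

```python
from collections import Counter, defaultdict


def build_alias_index(links):
    """Map each lower-cased anchor text to a Counter of link targets."""
    index = defaultdict(Counter)
    for anchor, target in links:
        index[anchor.strip().lower()][target] += 1
    return index


def resolve(index, query):
    """Return candidate entities for a query, most frequently linked first."""
    counts = index.get(query.strip().lower())
    if not counts:
        return []
    return [target for target, _ in counts.most_common()]


# Hypothetical sample of (anchor text, target page) pairs.
LINKS = [
    ("Big Apple", "New_York_City"),
    ("NYC", "New_York_City"),
    ("New York", "New_York_City"),
    ("New York", "New_York_City"),
    ("New York", "New_York_(state)"),
]

if __name__ == "__main__":
    index = build_alias_index(LINKS)
    print(resolve(index, "new york"))
```

An ambiguous alias returns several candidates ranked by how often the anchor text points to each target; a real system would combine this with the graph and text signals listed on the slide.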

3 Entity Search Applications
https://www.linkedin.com/today/post/article/20130805134105-50510-search-what-s-cooking-in-the-lab
http://www.siliconrepublic.com/strategy/item/31182-global-enterprise-search-ma

4 Entity Search Applications
Online Advertising
–Query Categorization
–Keyword Extension
Business Intelligence
–Enterprise Search
–Knowledge Management
–Text analytics
Multilingual short text categorization
–Based on Wikipedia language versions, DBpedia, Freebase
–Query Categorization
–Social media (Twitter) categorization, analysis
Security Domain
–Information Leakage prevention
–Categorization

5 Large scale Text and Graph data processing
Core Technology
Web crawling
–Nutch + plugins
Full text indexing and search
–Lucene, Solr
Information Extraction
–Ontea, GATE
All of the above at large scale
–Hadoop, S4
Graph processing and Querying
–Simple Graph Database (SGDB)
–gSemSearch
–Neo4j
–Blueprints
Underlined are the technologies developed by IISAS

6 Relation to Business Intelligence
Old BI approaches
–Data integration from RDBMS
–Data warehouses
–OLAP
–…
New BI approaches
–Data structures other than RDBMS: networks, semantics
  Networks/graphs in telecom, social networks, transactions, Linked Data …
  NoSQL: key-value (Tokyo Cabinet), column stores (HBase), graph databases, RDF(S)
–In-memory computing
–Commodity PC solutions for large data: MapReduce style (Hadoop), Pregel style (Giraph, Hama)
–Big unstructured data processing (on Hadoop): sentiment analysis, topic detection, named entity detection
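The MapReduce style mentioned above can be illustrated with a single-process sketch. It only imitates the map, shuffle and reduce phases in plain Python and does not use the Hadoop API; the function names and the term-counting task are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit (term, 1) for every whitespace-separated token."""
    for token in text.lower().split():
        yield token, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one term."""
    return key, sum(values)


def run_job(docs):
    """Run map, shuffle and reduce over a dict of doc_id -> text."""
    emitted = []
    for doc_id, text in docs.items():
        emitted.extend(map_phase(doc_id, text))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(run_job({"d1": "graph data", "d2": "Graph search"}))
```

On a cluster, the map and reduce functions would run on many commodity machines in parallel and the shuffle would move data between them; the per-record logic keeps the same shape.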

7 Ontea: Information Extraction Tool
–Regex patterns
–Gazetteers
–Results: key-value pairs, structured into trees and graphs
–Transformers, Configuration
–Automatic loading of extractors
–Visual Annotation Tool
–Integration with external tools: GATE, stemmers, Hadoop …
–Multilingual tests: English, Slovak, Spanish, Italian
http://ontea.sf.net
Figure panels: Text with annotations; Tree of annotations; Network/Graph of annotations
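Extraction with regex patterns and gazetteers that yields key-value pairs can be sketched as below. This is not Ontea's code or API: the pattern set, the gazetteer entries and the `annotate` function are hypothetical and only show the general idea.

```python
import re

# Hypothetical patterns: each maps an annotation key to a regex with one group.
PATTERNS = {
    "email": re.compile(r"\b([\w.+-]+@[\w-]+(?:\.[\w-]+)+)\b"),
    "year": re.compile(r"\b((?:19|20)\d{2})\b"),
}

# Hypothetical gazetteer: known surface forms mapped to an annotation key.
GAZETTEER = {"bratislava": "city", "slovakia": "country"}


def annotate(text):
    """Return (key, value) pairs sorted by position in the text."""
    found = []
    # Pattern-based extraction: the first group of each match is the value.
    for key, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(1), key, match.group(1)))
    # Gazetteer lookup: case-insensitive match on single word tokens.
    for match in re.finditer(r"\w+", text):
        key = GAZETTEER.get(match.group(0).lower())
        if key:
            found.append((match.start(), key, match.group(0)))
    return [(key, value) for _, key, value in sorted(found)]


if __name__ == "__main__":
    print(annotate("Contact jan@example.sk in Bratislava, Slovakia, since 2013."))
```

The flat list of pairs is the simplest result form; grouping pairs that occur in the same sentence or document would give the tree and graph structures the slide mentions.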

8 Named Entity Recognition (NER)
Combination of existing NER tools
–ANNIE (GATE), Apache OpenNLP
–Illinois NER, Illinois Wikifier
–LingPipe, Open Calais
–Stanford NER, WikiMiner
–Miscinator
Machine Learning
–Decision tree models
Placed second at MSM 2013 among 17 teams worldwide, missing first place by 1%
http://ikt.ui.sav.sk/index.php?n=Main.IEChallenge2013
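Combining the outputs of several NER tools can be illustrated with a simple majority vote. Note that this swaps in voting for the decision tree models named on the slide, purely to keep the sketch dependency-free; the tool names in the usage example and the `combine_labels` function are hypothetical.

```python
from collections import Counter


def combine_labels(predictions, min_votes=2):
    """Combine per-tool labels for one token span by majority vote.

    predictions maps a tool name to its label for the span, or None when the
    tool found no entity. Ties and spans below min_votes yield None.
    """
    votes = Counter(label for label in predictions.values() if label)
    if not votes:
        return None
    ranked = votes.most_common(2)
    top_label, top_count = ranked[0]
    if top_count < min_votes:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between labels: abstain
    return top_label


if __name__ == "__main__":
    span_predictions = {"tool_a": "PER", "tool_b": "PER", "tool_c": "ORG"}
    print(combine_labels(span_predictions))
```

A learned combiner, such as a decision tree, would instead take the per-tool labels as input features and learn from annotated data which tools to trust for which entity types.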

9 gSemSearch: Graph based Semantic Search
Entity relation search in semantic networks/graphs
Search, Navigation, Data Interaction
Aiming at data integration of
–Structured data (relational data, LinkedData)
–Unstructured data (text, documents, communication)
Applications:
–Email, Web, Text documents, LinkedData
http://ikt.ui.sav.sk/esns/

10 SemSets: Semantic Search
Answering list-type questions: astronauts who walked on the Moon
Wikipedia as text and network/graph
Text: IR methods, Lucene based
Graph/network: spreading activation and SemSets
Winning solution in the Semantic Search Challenge 2011
1. Eugene_Cernan
2. Alan_Bean
3. David_Scott
4. John_Young_(astronaut)
5. Neil_Armstrong
6. Pete_Conrad
7. Harrison_Schmitt
8. Alan_Shepard
9. Charles_Duke
10. Buzz_Aldrin
11. James_Irwin
12. Edgar_Mitchell
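Spreading activation over a graph can be sketched as follows. This is a generic textbook-style variant, not the SemSets implementation: the decay factor, the even split among neighbours, the fixed number of steps and the function name are all assumptions made for the example.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes along outgoing edges.

    graph: dict mapping node -> list of neighbour nodes.
    seeds: dict mapping node -> initial activation.
    Each step, a node passes decay * activation, split evenly among its
    neighbours. Returns the total activation accumulated per node.
    """
    total = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, activation in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = decay * activation / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + share
        for node, activation in next_frontier.items():
            total[node] = total.get(node, 0.0) + activation
        frontier = next_frontier
    return total


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": []}
    print(spread_activation(toy_graph, {"a": 1.0}))
```

In an entity search setting, the seeds would be the entities matched by the text query, and nodes that accumulate high activation become candidate answers.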

11 SGDB: Simple Graph Database
Storage for graphs
Optimized for graph traversing and spread of activation
Faster than Neo4j for graph traversing operations
Supports Blueprints API
https://simplegdb.svn.sourceforge.net/svnroot/simplegdb/Sgdb3
Graph Database Benchmarks
–Graph Traversal Benchmark for Graph Databases
–http://ups.savba.sk/~marek/gbench.html
–Blueprints API: possibility to test compliant graph databases
Source: http://geza.kzoo.edu/bionet/html/scalefree.html
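The kind of traversal operation that a graph database is tuned and benchmarked for can be sketched with a breadth-first k-hop neighbourhood query. This is not SGDB code and does not use the Blueprints API; the in-memory adjacency dict and the function name are assumptions for illustration.

```python
from collections import deque


def k_hop_neighbourhood(adjacency, start, k):
    """Return {node: distance} for all nodes within k hops of start (BFS)."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == k:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


if __name__ == "__main__":
    toy = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
    print(k_hop_neighbourhood(toy, 1, 2))
```

A traversal benchmark would time queries like this for growing values of `k` and different start nodes, since the cost is dominated by how quickly the store can return a node's neighbours.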

12 Community Detection in Complex Networks
Task: Identify densely connected subgraphs in complex networks
Community collapsing problem
SCCD
–Near-linear time complexity
–Avoids community collapsing problem (to a certain extent)
KDD paper
–Re-weighting approach
–Better results on real networks
Marek Ciglan, Kjetil Nørvåg: Fast detection of size-constrained communities in large networks. Proceedings of WISE'10, LNCS Volume 6488/2010.
Marek Ciglan, Michal Laclavík and Kjetil Nørvåg: On Community Detection in Real-World Networks and the Importance of Degree Assortativity. 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2013.
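The KDD paper title refers to degree assortativity. As background, the sketch below computes the standard degree assortativity coefficient (the Pearson correlation of the degrees at the two ends of each edge). It is not the SCCD algorithm or the re-weighting approach; it assumes a simple undirected graph given as an edge list with no self-loops or duplicate edges, and the function name is hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    if n == 0:
        raise ValueError("edge list is empty")
    mean = sum(xs) / n  # xs and ys share the same mean and variance
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("all endpoint degrees are equal; coefficient undefined")
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var


if __name__ == "__main__":
    star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
    print(degree_assortativity(star))
```

Negative values mean that high-degree nodes tend to connect to low-degree nodes, as in a star; positive values mean that nodes tend to connect to others of similar degree.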

13 Future Direction: Entity Search in Large Graph Data
Motivation
–Graph/network data are everywhere: social networks, web, LinkedData, transactions, communication (email, phone).
–Text can also be converted to a graph.
–Interconnecting graph data and searching for relations is crucial.
Approach
–Forming semantic trees and graphs from text, web, communication, databases and LinkedData
–User interaction with graph data in order to achieve integration and data cleansing
–Users will do this if their effort has an immediate impact on search results

