Usage of Parallel Fuzzy Clustering for Performance of Distributed ESA Datasets Processing. Lycom Dmitri (1), Trzasala Milena (2). Student groups: (1) 8BM23, (2) AO-184.

Presentation transcript:

Usage of Parallel Fuzzy Clustering for Performance of Distributed ESA Datasets Processing
Lycom Dmitri (1), Trzasala Milena (2)
Student groups: (1) 8BM23, (2) AO-184
(1) Tomsk Polytechnic University, Russia; (2) Wroclaw University of Technology, Poland
Scientific advisor: Dr. Sergey Axyonov

Gaia project

European Space Agency (ESA) mission
Launch date: 2013
Mission end: after 5 years (2018)
Launch vehicle: Soyuz-Fregat
Orbit: Lissajous-type orbit around L2

Objectives: Gaia is an ambitious mission to chart a three-dimensional map of our Galaxy, the Milky Way, in the process revealing the composition, formation and evolution of the Galaxy.

Mission: Produce a stereoscopic and kinematic census of about one billion stars in our Galaxy and throughout the Local Group. Detection and orbital classification of tens of thousands of extra-solar planetary systems.

Objective & Tasks

Motivation
- Large-scale datasets are available from the Gaia space project.
- The received information must be analyzed quickly and accurately.
- The technique can also be used to process other huge industrial databases.

Objective
High-performance vector clustering techniques for rapid and reliable analysis of ESA datasets.

Tasks
- Design a technique that enables effective data distribution and supercomputer processing.
- Create a program implementation of the suggested technique that can run on a computational cluster.
- Test and estimate the performance of the developed software on ESA testing datasets.

Datasets & Attributes

Analyzed star features (attribute name: meaning)
log-f1: Log of the first frequency
log-f2: Log of the second frequency
log-af1h1-t: Log amplitude, first harmonic, first frequency
log-af1h2-t: Log amplitude, second harmonic, first frequency
log-af1h3-t: Log amplitude, third harmonic, first frequency
log-af1h4-t: Log amplitude, fourth harmonic, first frequency
log-af2h1-t: Log amplitude, first harmonic, second frequency
log-af2h2-t: Log amplitude, second harmonic, second frequency
log-crf10: Amplitude ratio between harmonics of the first frequency
pdf12: Phase difference between harmonics of the first frequency
Varrat: Variance ratio before and after first frequency subtraction
B-V: Colour index
V-I: Colour index

Testing datasets (dataset: usage)
[The "# Instances" and "# Attributes" columns and the name suffixes of the four scalability datasets are missing from the transcript.]
Synth43K: Validation of clustering quality and implementation of algorithm
Synth: Scalability performance testing
Synth: Scalability performance testing
Synth: Scalability performance testing
Synth: Scalability performance testing

DBSCAN clustering technique

- Finds clusters (groups of points) of arbitrary shape.
- Based on point density.

Disadvantage: performance, which depends heavily on dataset size.
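To make the density idea and its cost concrete, here is a minimal DBSCAN sketch in Python. It is an illustration only, not the authors' C# implementation; the names `dbscan`, `region_query`, `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point) are assumptions introduced for this example.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this sketch every `region_query` call scans all points, so the total work grows roughly quadratically with the number of points, which is one way to see why run time depends so strongly on dataset size.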

Data Distribution Problem

Each process, bound to a computational node, receives only part of the data. Processes cannot detect clusters on their own because neighboring points may reside in different memory spaces.

Our solution: Message Passing Interface (MPI)

[Figure: MPI_Scatter result]
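The problem can be reproduced without any MPI library. The Python sketch below only simulates an equal-count scatter by slicing a list; `scatter` and `cross_partition_pairs` are hypothetical helper names, and the example assumes the number of points is divisible by the number of processes.

```python
from math import dist

def scatter(points, n_procs):
    """Split points into n_procs contiguous, equally sized blocks.

    Assumes len(points) is divisible by n_procs, mirroring an
    equal-count scatter; this is a simulation, not an MPI call.
    """
    size = len(points) // n_procs
    return [points[r * size:(r + 1) * size] for r in range(n_procs)]

def cross_partition_pairs(parts, eps):
    """Pairs of points within eps of each other that sit in different blocks."""
    pairs = []
    for a in range(len(parts)):
        for b in range(a + 1, len(parts)):
            for p in parts[a]:
                for q in parts[b]:
                    if dist(p, q) <= eps:
                        pairs.append((p, q))
    return pairs

points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (9.0, 9.0), (9.5, 9.0)]
parts = scatter(points, 2)
split_neighbors = cross_partition_pairs(parts, eps=0.6)
```

Here the points (1.0, 0.0) and (1.5, 0.0) are neighbors but land in different blocks, so neither simulated process can see that they belong to the same dense region.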

Gustafson-Kessel Fuzzy Clustering

- Generates clusters of hyperellipsoidal shape ⇒ more natural clustering.
- Computes a fuzzy matrix that reflects distances between vectors and cluster centers based on point density ⇒ arrangement of points for indexing and detection of boundary points.
- The base computations of the clustering are matrix operations ⇒ easy to parallelize.
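For reference, the standard Gustafson-Kessel update rules are sketched below in LaTeX. The slide does not give formulas, so the notation is an assumption: $x_k$ are the $N$ data vectors of dimension $n$, $v_i$ the $c$ cluster centers, $\mu_{ik}$ the fuzzy membership of vector $k$ in cluster $i$, $m > 1$ the fuzzifier, and $\rho_i$ the cluster volume (commonly set to 1).

```latex
% Cluster centers (membership-weighted means)
v_i = \frac{\sum_{k=1}^{N} \mu_{ik}^{m}\, x_k}{\sum_{k=1}^{N} \mu_{ik}^{m}}

% Fuzzy covariance matrix of cluster i
F_i = \frac{\sum_{k=1}^{N} \mu_{ik}^{m}\,(x_k - v_i)(x_k - v_i)^{T}}
           {\sum_{k=1}^{N} \mu_{ik}^{m}}

% Cluster-specific norm and squared distance (gives hyperellipsoidal clusters)
A_i = \bigl[\rho_i \det(F_i)\bigr]^{1/n} F_i^{-1},
\qquad
d_{ik}^{2} = (x_k - v_i)^{T} A_i\, (x_k - v_i)

% Membership update (entries of the fuzzy matrix)
\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}}
```

Each step is a sum, product, inverse or determinant over vectors and matrices, which is consistent with the slide's point that the base computations are matrix operations.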

Preclustering

- Cluster only a part of the data ⇒ faster processing.
- Run several computational nodes on different data ⇒ more reliable analysis.

The Most General Clustering Search

Detect the most effective split in terms of density and distances between vectors. This distribution reflects the nature of the data.

Clusters join (Output Synchronization)

- Each point has a fuzzy vector that reflects the distances between the point and the cluster centers.
- Perform the "Join" operation if the distances between boundary points in two processes are less than the neighborhood radius.
- Only a very small part of the clustered points is needed, namely those located at the borders between clusters ⇒ speedup.
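The join rule can be sketched in Python as follows. This is a hedged illustration, not the authors' code: it uses plain Euclidean distance between boundary points and a union-find structure, and the names `join_clusters`, `find`, `boundary_a`, `boundary_b` and the process tags "a" and "b" are assumptions made for the example.

```python
from math import dist

def find(parent, x):
    """Return the representative of x's set, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def join_clusters(boundary_a, boundary_b, eps):
    """Merge clusters of two processes whose boundary points lie within eps.

    boundary_a, boundary_b: lists of (point, cluster_id) pairs that hold only
    the boundary points of each process. Returns a dict that maps every
    (process, cluster_id) key to a shared global label.
    """
    keys = [("a", c) for _, c in boundary_a] + [("b", c) for _, c in boundary_b]
    parent = {k: k for k in keys}
    for p, ca in boundary_a:
        for q, cb in boundary_b:
            if dist(p, q) < eps:  # closer than the neighborhood radius
                ra, rb = find(parent, ("a", ca)), find(parent, ("b", cb))
                parent[rb] = ra
    return {k: find(parent, k) for k in parent}

boundary_a = [((1.0, 0.0), 0), ((5.0, 5.0), 1)]
boundary_b = [((1.4, 0.0), 0), ((20.0, 20.0), 1)]
merged = join_clusters(boundary_a, boundary_b, eps=0.5)
```

Only boundary points are compared, so the cost of the join depends on the number of border points rather than on the full dataset, which matches the speedup argument on the slide.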

Implementation

The program is developed in MS Visual C# (both multi-core and multi-node computations are used).

- Parallel computing libraries of .NET 4 (multi-core processing)
  - Matrix operations in GK clustering
  - Cluster equivalence (the Most General Clustering search)
- Distributed programming library MPI.NET (multi-node processing)
  - Data scattering (for preclustering)
  - Data distribution (for the DBSCAN procedure)
  - Data broadcasting (sending the Most General Clustering)
  - Clusters join (point-to-point communication)

The program was tested on the SKIF-TPU computational cluster (20 computational nodes, 80 processors).

Performance

Performance of the suggested technique when analyzing the Synth-10^5 dataset in a distributed manner.

⇒ The approach increases the performance of the algorithm.

Results

- Designed a new, original approach to parallel-distributed clustering of large-scale datasets.
- Created a C# program that implements the suggested model, based on MPI.NET and the parallel libraries of .NET 4.
- Tested the software on the Gaia test datasets to estimate the performance of the suggested approach.
- The suggested technique increases DBSCAN performance. Parallel-distributed processing is much more effective than the standard DBSCAN method.