Parallel C3M
Aylin Tokuç, Erkan Okuyan, Özlem Gür



Outline
- Basics of parallel computing
- Sequential C3M
- Parallel C3M

Parallel Computation
- Decomposition: the process of dividing a computation into smaller parts.
- Task: a programmer-defined unit of computation into which the main computation is subdivided by means of decomposition.

Parallel Computation: Primary Considerations
- Load balancing
- Minimizing communication
- Task dependency optimization

Parallel Computation: Load Balancing

Parallel Computation: Minimizing Communication

Parallel Computation: Task Dependency Optimization

C3M Algorithm
1. Determine the cluster seeds of the database.
2. If d_i is not a cluster seed, find the cluster seed (if any) that maximally covers d_i.
3. If any documents remain unclustered, group them into a ragbag cluster.
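The three steps above can be sketched end-to-end. This is a toy sequential implementation, assuming a binary document-term matrix and the cover-coefficient definitions from the cited Can & Ozkarahan paper; the ragbag step for uncovered documents is omitted.

```python
def c3m_cluster(D):
    """Toy sequential C3M sketch. D: binary document-term matrix (list of 0/1 rows)."""
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]                         # row reciprocal sums
    colsum = [sum(D[i][k] for i in range(m)) for k in range(n)]
    beta = [1.0 / c if c else 0.0 for c in colsum]                # column reciprocal sums

    def cover(i, j):                                              # cover coefficient c_ij
        return alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))

    delta = [cover(i, i) for i in range(m)]                       # decoupling: delta_i = c_ii
    n_c = max(1, round(sum(delta)))                               # number of clusters
    power = [delta[i] * (1 - delta[i]) * sum(D[i]) for i in range(m)]  # seed power P_i
    seeds = sorted(range(m), key=lambda i: power[i], reverse=True)[:n_c]
    clusters = {s: [s] for s in seeds}
    for i in range(m):
        if i not in clusters:                                     # step 2: maximally covering seed
            clusters[max(seeds, key=lambda j: cover(i, j))].append(i)
    return clusters
```

On a small matrix with two clearly separated term groups, the two documents with the highest seed power become the cluster heads and the remaining documents attach to the seed that covers them best.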

C3M Formulas
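The slide's formulas did not survive the transcript; from the cited Can & Ozkarahan paper, the key C3M quantities for an $m \times n$ document-term matrix $D = (d_{ik})$ are:

```latex
% Cover coefficient between documents i and j:
c_{ij} = \alpha_i \sum_{k=1}^{n} d_{ik}\,\beta_k\,d_{jk},
\qquad \alpha_i = \Big(\sum_{k=1}^{n} d_{ik}\Big)^{-1},
\qquad \beta_k = \Big(\sum_{i=1}^{m} d_{ik}\Big)^{-1}

% Decoupling coefficient, coupling coefficient, number of clusters, seed power:
\delta_i = c_{ii}, \qquad \psi_i = 1 - \delta_i, \qquad
n_c = \sum_{i=1}^{m} \delta_i, \qquad
P_i = \delta_i\,\psi_i \sum_{k=1}^{n} d_{ik}
```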

C3M – Sample Matrices

Parallel C3M: Distribution
- Distribute rows among processors
- Load balancing by cyclic block distribution
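The cyclic block distribution of rows can be sketched as follows; the block size `b` is an assumed parameter (the slide does not give one).

```python
def cyclic_block_rows(m, p, b):
    """Deal row blocks of size b round-robin to p processors
    (cyclic block distribution of an m-row matrix)."""
    owner = [[] for _ in range(p)]
    for start in range(0, m, b):
        block = list(range(start, min(start + b, m)))
        owner[(start // b) % p].extend(block)   # block t goes to processor t mod p
    return owner
```

Dealing blocks round-robin rather than assigning one contiguous chunk per processor balances the load when work varies systematically along the row index, e.g. when non-zero density differs between the top and bottom of the matrix.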

Local Calculations
- All processors calculate α, partial β, and P_i
- The current method for the weighted matrix is too costly: it needs column vectors, but the matrix is partitioned row-wise

Seed Powers P_i
- The seed power P_i should be small for a document whose terms appear in too many or too few documents.
- The seed power P_i should be larger for a document whose terms appear in a moderate number of documents.

Minimize Communication: Proposed Heuristic
- # of non-zeros
- All processors calculate α, partial β, and β'
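The transcript keeps only the heuristic's ingredient (the number of non-zeros). A plausible sketch, under the assumption that the heuristic ranks local rows by their non-zero count so that no column-wise (global) information is needed, together with the Pearson correlation that the MATLAB comparison on the next slide would compute:

```python
def nonzero_heuristic(D_local):
    # Rank local rows by non-zero count only -- requires no communication
    # (hypothetical reading of the slide's "# of non-zeros" heuristic).
    return [sum(1 for v in row if v) for row in D_local]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```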

Effectiveness of the Heuristic
- A MATLAB script was written to compare the effectiveness of the proposed heuristic.
- Correlation coefficient = 0.95

Communication between Processors
- Partial β and β' vectors are exchanged between processors to compute the final β and β' vectors.
- Then all processors calculate c_ii = δ_i
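In MPI terms this exchange is an all-reduce with a sum operation. A minimal single-process simulation of summing the partial column counts, from which each processor then derives the final β (β_k = 1 / total_k):

```python
def allreduce_sum(partials):
    """Simulate exchanging partial beta vectors: every processor ends up with
    the element-wise sum of all partial column counts (an MPI_Allreduce(SUM))."""
    n = len(partials[0])
    total = [sum(p[k] for p in partials) for k in range(n)]
    return [list(total) for _ in partials]   # each processor gets its own copy

def finalize_beta(total):
    # beta_k is the reciprocal of the global column sum (0 for empty columns)
    return [1.0 / t if t else 0.0 for t in total]
```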

Number of Clusters
- Processors exchange local δ
- All processors calculate n_c

Cluster-Head Selection
- Calculate the seed power of local documents
- Exchange the largest n_c seed powers
- Find the largest n_c seed powers among all P_i; those documents become the cluster heads
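The exchange-and-select step is correct because each of the global top-n_c documents is necessarily among its owner's local top n_c, so exchanging only n_c candidates per processor suffices. A sketch, with a hypothetical per-processor `{doc_id: power}` layout:

```python
import heapq

def select_cluster_heads(local_powers, n_c):
    """Distributed cluster-head selection sketch: each processor contributes
    only its n_c largest (doc_id, power) pairs; the global top n_c become heads.
    local_powers: one {doc_id: power} dict per processor."""
    candidates = []
    for powers in local_powers:
        # local top-n_c -- the only values that need to be communicated
        candidates.extend(heapq.nlargest(n_c, powers.items(), key=lambda kv: kv[1]))
    heads = heapq.nlargest(n_c, candidates, key=lambda kv: kv[1])
    return sorted(doc for doc, _ in heads)
```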

Clustering Non-seed Documents
- Exchange seed documents
- Cluster non-seed documents in each processor, as in sequential C3M

Future Work
- Term-based clustering
- Overlapping clusters

C3M Summary
- Load balancing with cyclic block distribution
- Communication minimized by a new heuristic
- Task dependency minimized with block distribution and the heuristic

References
- F. Can, E. A. Ozkarahan. Concepts and the Effectiveness of the Cover Coefficient-Based Clustering Methodology.
- E. C. Jensen, S. M. Beitzel, A. J. Pilotto, N. Goharian, O. Frieder. Parallelizing the Buckshot Algorithm for Efficient Document Clustering.
- A. S. Ruocco, O. Frieder. Clustering and Classification of Large Document Bases in a Parallel Environment.
- I. S. Dhillon, J. Fan, Y. Guan. Efficient Clustering of Very Large Document Collections.

Questions?

The End
Thank you for your patience.