Advanced Optimization Techniques for Complex Problems Técnicas de Optimización Avanzadas para Problemas Complejos TRACER:ULL - 2003 Barcelona, October.


Advanced Optimization Techniques for Complex Problems Técnicas de Optimización Avanzadas para Problemas Complejos TRACER:ULL Barcelona, October 25th, TIC C05-05 University of La Laguna

Outline
- Objectives
- Researchers
- Problems
- Branch and Bound and Divide and Conquer Skeletons
  - Knapsack Problem
  - Matrix Product
  - Constrained two-dimensional cutting stock problem
- CALL and LLAC: tools for Complexity Analysis
  - Symbolic regression problem
- An analytical model for Pipeline and Master-Slave algorithms over heterogeneous clusters
  - Resource allocation problem
  - Prediction of the RNA Secondary Structure problem
- Results

TRACER::ULL Objectives
The main objective of TRACER::ULL is to solve the following complex problems efficiently by developing new optimization procedures:
- Constrained two-dimensional cutting stock problem
- Symbolic regression problem
- Prediction of the RNA secondary structure problem
We propose the design, implementation and evaluation of solving tools using exact techniques:
- Divide and Conquer
- Branch and Bound
- Dynamic Programming
A further objective is to provide sequential, parallel and distributed implementations for academic problems:
- Resource allocation problem
- Knapsack problem
- Matrix Product
A second research track concerns building a methodology, and an associated tool, for the complexity and performance analysis of both sequential and parallel algorithms.
Another goal is the implementation of:
- An Internet execution system
- A problem repository
- Performance analysis
Web site:

Researchers
ULL Staff
- Coromoto León Hernández
- Isabel Dorta González
- Daniel González Morales
- Casiano Rodríguez León
- Jesús Alberto González Martínez
Foreign
- Rumen Andonov
Students
- Juan Ramón González González
- Gara Miranda Valladares
- María Dolores Medina Barroso
Grants: Branch and Bound; Dynamic Programming; Performance Analysis Tools and Symbolic regression problem; Divide and Conquer; two-dimensional cutting stock problem; Prediction of the RNA secondary structure problem

Shared Memory Branch and Bound Skeletons
// shared variables {bqueue, bstemp, soltemp, data}
// private variables {auxSol, high, low}
// the initial subproblem is already inserted in the global shared queue
while (!bqueue.empty()) {
  nn = bqueue.getNumberOfNodes();
  nt = (nn > maxthread) ? maxthread : nn;
  data = new SubProblem[nt];
  for (int j = 0; j < nt; j++)
    data[j] = bqueue.remove();
  set.num.threads(nt);
  parallel forall (i = 0; i < nt; i++) {
    high = data[i].upper_bound(pbm, auxSol);
    if (high > bstemp) {
      low = data[i].lower_bound(pbm, auxSol);
      if (low > bstemp) {
        // critical region
        // only one thread can change the value at any time
        bstemp = low;
        soltemp = auxSol;
      }
      if (high != low) {
        // critical region
        // just one thread can insert subproblems in the queue at any time
        data[i].branch(pbm, bqueue);
      }
    }
  }
}
bestSol = bstemp;
sol = soltemp;

0-1 Knapsack Problem
The 0/1 Knapsack Problem can be stated as follows: "We have been provided with a knapsack of capacity C and with a set of N objects; p[k] and w[k] are the profit and weight associated with object k. Without exceeding the capacity of the knapsack, the objects must be inserted into the knapsack so as to obtain the maximum profit".
Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons Ltd. (1990)
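The problem statement above can be tied to the upper_bound / lower_bound / branch structure of the skeleton with a small sequential example. This is a minimal sketch under stated assumptions, not the project's skeleton code: the names `Item`, `SubProblem`, `upperBound`, `lowerBound` and `knapsackBranchAndBound` are hypothetical, weights are assumed to be positive integers, the upper bound is the usual fractional (greedy) relaxation and the lower bound is a greedy feasible completion.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical types for illustration only.
struct Item { int profit; int weight; };
struct SubProblem { std::size_t next; int profit; int capacity; };

// Upper bound: fill greedily, then add a fraction of the first item that does not fit.
static double upperBound(const std::vector<Item>& items, const SubProblem& sp) {
    double bound = sp.profit;
    int cap = sp.capacity;
    for (std::size_t k = sp.next; k < items.size(); ++k) {
        if (items[k].weight <= cap) {
            cap -= items[k].weight;
            bound += items[k].profit;
        } else {
            bound += static_cast<double>(items[k].profit) * cap / items[k].weight;
            break;
        }
    }
    return bound;
}

// Lower bound: value of a feasible solution built greedily, skipping items that do not fit.
static int lowerBound(const std::vector<Item>& items, const SubProblem& sp) {
    int value = sp.profit;
    int cap = sp.capacity;
    for (std::size_t k = sp.next; k < items.size(); ++k) {
        if (items[k].weight <= cap) {
            cap -= items[k].weight;
            value += items[k].profit;
        }
    }
    return value;
}

// Sequential branch and bound over a FIFO queue of subproblems.
int knapsackBranchAndBound(std::vector<Item> items, int capacity) {
    // Sort by decreasing profit/weight ratio so that both bounds are greedy-valid.
    std::sort(items.begin(), items.end(), [](const Item& a, const Item& b) {
        return static_cast<long long>(a.profit) * b.weight >
               static_cast<long long>(b.profit) * a.weight;
    });
    int best = 0;
    std::queue<SubProblem> bqueue;
    bqueue.push({0, 0, capacity});
    while (!bqueue.empty()) {
        SubProblem sp = bqueue.front();
        bqueue.pop();
        double high = upperBound(items, sp);
        if (high <= best) continue;            // bounded: cannot improve on best
        int low = lowerBound(items, sp);
        if (low > best) best = low;            // better feasible solution found
        if (high != low && sp.next < items.size()) {
            const Item& it = items[sp.next];
            if (it.weight <= sp.capacity)      // branch: take the next item
                bqueue.push({sp.next + 1, sp.profit + it.profit, sp.capacity - it.weight});
            bqueue.push({sp.next + 1, sp.profit, sp.capacity});  // branch: skip it
        }
    }
    return best;
}
```

For instance, calling `knapsackBranchAndBound` with items (profit, weight) = (60, 10), (100, 20), (120, 30) and capacity 50 explores six subproblems and returns 220.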

Comparison between MPI and OpenMP skeletons (Origin, CIEMAT)

Distributed Branch and Bound skeleton
Initialization Phase
Resolution Phase
- Conditional Communication
- Message Reception
- Avoiding starvation
- Compute
- Best bound Propagation
- Work querying
- Ending resolution phase
Solution Building

Distributed Branch and Bound skeleton

Matrix Product
Definition: [formula not preserved in the transcript]
Strassen algorithm: [formulas not preserved in the transcript]
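The formulas for this slide are missing from the transcript. For reference, the textbook form of Strassen's scheme for a product $C = AB$ of matrices split into $2 \times 2$ blocks is given below; the block names $A_{ij}$, $B_{ij}$, $C_{ij}$ and the intermediate products $M_1, \ldots, M_7$ are an assumed notation and may differ from the one used on the slide.

```latex
\begin{aligned}
M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) & M_5 &= (A_{11} + A_{12})\,B_{22} \\
M_2 &= (A_{21} + A_{22})\,B_{11}          & M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
M_3 &= A_{11}\,(B_{12} - B_{22})          & M_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\
M_4 &= A_{22}\,(B_{21} - B_{11})          &     & \\[4pt]
C_{11} &= M_1 + M_4 - M_5 + M_7 & C_{12} &= M_3 + M_5 \\
C_{21} &= M_2 + M_4             & C_{22} &= M_1 - M_2 + M_3 + M_6
\end{aligned}
```

Seven block products instead of eight give the recurrence $T(n) = 7\,T(n/2) + O(n^2)$, hence $T(n) = O(n^{\log_2 7})$, which is what makes the algorithm a natural test case for a divide and conquer skeleton.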

Distributed Divide and Conquer skeleton

Two-dimensional cutting stock problem: User Interface
In this problem we are given a large stock rectangle S of dimension $L \times W$ and n types of smaller rectangles (pieces), where the i-th type has dimension $l_i \times w_i$. Furthermore, each type i has an associated profit $c_i$ and a demand constraint $b_i$. The problem is to cut from the large rectangle a set of small rectangles such that:
- All pieces have a fixed orientation, i.e., a piece of length l and width w is different from a piece of length w and width l (l ≠ w).
- All applied cuts are of guillotine type, i.e., cuts that start from one edge and run parallel to the other two edges.
- There are at most $b_i$ rectangles of type i in the cutting pattern (the demand constraint of the i-th piece).
- The overall profit $\sum_{i=1}^{n} c_i x_i$, where $x_i$ denotes the number of rectangles of type i in the cutting pattern, is maximized.
Application of the Magos Project

Performance: CALL & LLAC
Standard Libraries: MPI, PVM
Parallel Architectures
We need a well-accepted Parallel Computing Model: BSP, LogP, ...

CALL & LLAC Architecture

Performance: CALL & LLAC
#pragma cll mp mp[0] + mp[1]*N + mp[2]*N*N + mp[3]*N*N*N
for (i = 0; i < N; i++) {
  for (j = 0; j < N; j++) {
    sum = 0;
    for (k = 0; k < N; k++)
      sum += A(i,k) * B(k,j);
    C(i,j) = sum;
  }
}
#pragma cll end mp
Square matrix product. A, B and C of dimension N×N

Measuring and Predicting Performance
while (!bqueue.empty()) {
  auxSp = bqueue.remove();                     // pop a problem from the local queue
  high = auxSp.upper_bound(pbm, auxSol);       // upper bound
  if (high > bestSol) {
    low = auxSp.lower_bound(pbm, auxSol);      // lower bound
    if (low > bestSol) {
      bestSol = low;
      sol = auxSol;
      outputPacket.send(MASTER, SOLVE_TAG, bestSol, sol);
    }
    if (high != low) {
      // calculate the number of required slaves
      rSlaves = bqueue.getNumberOfNodes();
      op.send(MASTER, BnB_TAG, high, rSlaves);
      inputPacket.recv(MASTER, nfSlaves, bestSol, rank {1,..., nfSlaves});
      if (nfSlaves >= 0) {
        auxSp.branch(pbm, bqueue);             // branch and save in the local queue
        for i=0, nfSlaves {                    // send subproblems to the assigned slaves
          auxSp = bqueue.remove();
          outputPacket.send(rank, PBM_TAG, auxSp, bestSol, sol);
        }
      }
      // if nfSlaves == DONE the problem is bounded (cut)
    }
  }
}
Instrumentation added to count the visited nodes:
#pragma cll code numvis++;

How to compile?
kpr.c -> call -> kpr.cll.h, kpr.cll.c -> cc -> kpr -> kpr.c.dat, kpr.c.dat.1, ..., kpr.c.dat.n
EXPERIMENT: "kps"
BEGIN_LINE: 115
END_LINE: 119
FORMULA: p 0 p 1 v 0 * +
INFORMULA: kps[0]+kps[1]*numvis
MAXTESTS:
DIMENSION: 2
PARAMETERS:
NUMIDENTS: 1
IDENTS: numvis
OBSERVABLES: CLOCK
COMPONENTS: 1 numvis
POSTFIX_COMPONENT_0: 1
POSTFIX_COMPONENT_1: v 0
NUMTESTS: 1
SAMPLE: CPU NCPUS numvis CLOCK
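The INFORMULA field, kps[0]+kps[1]*numvis, is linear in numvis, so its two coefficients can be estimated from (numvis, CLOCK) samples by ordinary least squares. The following is a minimal sketch of that fit, not the code used by CALL or LLAC; `LinearFit` and `fitLinear` are hypothetical names, and the samples are assumed to be already loaded into two vectors.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical result type: time ~ intercept + slope * numvis.
struct LinearFit { double intercept; double slope; };

// Ordinary least squares for y = a + b*x using the closed-form normal equations.
LinearFit fitLinear(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2)
        throw std::invalid_argument("need at least two (x, y) samples of equal length");
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;
    if (denom == 0.0)
        throw std::invalid_argument("all x values are identical; slope is undefined");
    const double slope = (n * sxy - sx * sy) / denom;
    const double intercept = (sy - slope * sx) / n;
    return {intercept, slope};
}
```

In this reading, `intercept` plays the role of kps[0] and `slope` the role of kps[1]; the x values would be the numvis column and the y values the CLOCK column of the sample file.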

Number of visited Nodes Study

Measuring and Predicting Performance
int main(int argc, char ** argv) {
  number sol;
  readKnap(data);
  /* next object, remaining capacity, profit */
  sol = knap(0, M, 0);
  printf("\nsol = ", sol);
  return 0;
}
CALL annotations shown on the slide:
#pragma cll code double numvis = 0.0;
#pragma cll report all
#pragma cll kps kps[0]*unknown(numvis) posteriori numvis
#pragma cll end kps

Symbolic Regression Problem
Find the unknown complexity formula starting from the experimental data gathered by CALL.
We can use Symbolic Regression: the induction of mathematical expressions on data. Rather than searching for the values of the regression constants, the object of the search is a symbolic description of the system.
See Scientific Discovery using Genetic Programming by Maarten Keijzer.
Currently we use a fitness function that measures the error of the predictions "on the asymptotic side" using linear regression on a small sub-sample.

Prediction of the RNA Secondary Structure Problem
RNA molecule: a string of n characters $R = r_1 r_2 \ldots r_n$ such that $r_i \in \{A, C, G, U\}$.
Nucleotides join to release free energy. Allowed pairs: A-U, G-U, C-G.
$$E(S_{i,j}) = \min \left\{ E(S_{i+1,j-1}) + \alpha(r_i, r_j),\ \min_{i < k \le j} \left\{ E(S_{i,k-1}) + E(S_{k,j}) \right\} \right\}$$
The iteration space is n × n triangular.
Dependences are nonuniform: there are dependences among non-consecutive stages.
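The recurrence on this slide takes, for each substring from position i to position j, the minimum between pairing the two end nucleotides and splitting the substring at some position k. It can be evaluated bottom-up over the triangular iteration space, shortest substrings first. The sketch below is an illustration under assumptions that are not in the slides: the pairing term is taken to be -1 for the pairs A-U, G-U and C-G and 0 otherwise, the energy of an empty or single-character substring is taken to be 0, and the names `pairEnergy` and `rnaMinEnergy` are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed pairing energy: -1 for A-U, G-U, C-G (in either order), 0 otherwise.
static int pairEnergy(char a, char b) {
    auto is = [a, b](char x, char y) {
        return (a == x && b == y) || (a == y && b == x);
    };
    return (is('A', 'U') || is('G', 'U') || is('C', 'G')) ? -1 : 0;
}

// Minimum energy of the whole sequence, filling the upper triangle of an
// n x n table by increasing substring length.
int rnaMinEnergy(const std::string& r) {
    const int n = static_cast<int>(r.size());
    if (n == 0) return 0;
    std::vector<std::vector<int>> E(n, std::vector<int>(n, 0));
    // Empty or single-character substrings (j <= i) have energy 0 by assumption.
    auto at = [&E](int i, int j) { return j <= i ? 0 : E[i][j]; };
    for (int len = 1; len < n; ++len) {
        for (int i = 0; i + len < n; ++i) {
            const int j = i + len;
            // Case 1: the two ends are considered as a pair.
            int best = at(i + 1, j - 1) + pairEnergy(r[i], r[j]);
            // Case 2: split at every k with i < k <= j.
            for (int k = i + 1; k <= j; ++k)
                best = std::min(best, at(i, k - 1) + at(k, j));
            E[i][j] = best;
        }
    }
    return E[0][n - 1];
}
```

Each table entry reads entries from every shorter substring that shares an endpoint with it, not only from the previous diagonal, which is the nonuniform dependence pattern the slide refers to. The triple loop gives O(n^3) time and O(n^2) space.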

TRACER::ULL 2003 Results
Journals:
- Authors: Dorta, León, Rodríguez. Title: Comparing MPI and OpenMP Implementations of the 0-1 Knapsack Problem. Journal: Parallel and Distributed Computing Practices. ISSN (Accepted). Date: 2003
- Authors: Blanco V., García L., González J.A., Rodríguez C., Rodríguez G. Title: A Performance Model for the Analysis of OpenMP Programs. Journal: Parallel and Distributed Computing Practices. ISSN (Accepted). Date: 2003

TRACER::ULL 2003 Results
International Conferences:
- Blanco V., González J. A., León C., Rodríguez C., Rodríguez G. "From Complexity Analysis to Performance Analysis". Euro-Par International Conference on Parallel and Distributed Computing. Klagenfurt, Austria. August
- Dorta I., León C., Rodríguez C., Rojas A. "Parallel Skeletons for Divide and Conquer and Branch and Bound Techniques". 11th Euromicro Conference on Parallel and Network-Based Processing. ISSN. Genova, Italy. 5-7 February,
- Dorta I., León C., Rodríguez C. "A comparison between MPI and OpenMP Branch-and-Bound Skeletons". 8th International Workshop on High-Level Parallel Programming Models and Supportive Environments. ISBN X. Nice, France. 22 April,
- Dorta I., León C., Rodríguez C., Rojas A. "Parallel Skeletons. Branch-and-Bound and Divide-and-Conquer Techniques". TAM User Group Meeting. Barcelona, Spain. 16 May, 2003
- Dorta I., León C., Rodríguez C., Rojas A. "MPI and OpenMP implementations of Branch and Bound Skeletons". ParCo2003. Dresden, Germany. 2-5 September,
- Dorta I., León C., Rodríguez C. "Parallel Branch and Bound Skeletons: Message Passing and Shared Memory Implementations". 5th International Conference on Parallel Processing and Applied Mathematics. Czestochowa, Poland. September,
- García L., González J.A., González J.C., León C., Rodríguez C., Rodríguez G. "Complexity Driven Performance Analysis". 10th EuroPVM/MPI. Venice, Italy. Sep 29 - Oct 2, 2003.

TRACER::ULL 2003 Results
National Conferences:
- Dorta I., León C., Rodríguez C., Rodríguez G., Rojas A. "Complejidad Algorítmica: de la Teoría a la Práctica". JENUI'03 (Jornadas de Enseñanza Universitaria de la Informática). ISBN. Cádiz. July, 2003
- González J.R., León C., Rodríguez C. "Un esqueleto para Ramificación y Acotación Distribuido". XIV Jornadas de Paralelismo. Leganés (Madrid). September 2003
PFC (final-year project):
- González J. R. "Esqueletos Paralelos Distribuidos. Paradigmas de Ramificación y Acotación y Divide y Vencerás". Documento de Trabajo Interno del DEIOC: DT. July 2003.