MapReduce on Matlab By: Erum Afzal

MapReduce MapReduce is a programming model devised at Google to facilitate the processing of large data sets. For example, it is used at Google for indexing websites.

Matlab Matlab is software that provides a technical computing environment. It is widely used for numerical computation, simulation, and data processing.

MapReduce on Matlab MapReduce on Matlab allows Matlab users to apply the MapReduce framework to their own data processing requirements, such as data mining tasks over dense, detailed digital images. Similarly, if Matlab data can be imported into the MapReduce framework, many Matlab computations can be run on Hadoop as well.

Working of MapReduce With MapReduce, data can be processed by multiple processors in parallel. This makes it possible to handle large volumes of input data and to speed up processing by parallelizing tasks.

Continue… Map: Each piece of input data, identified by a key and a value, is mapped to one or more intermediate key/value pairs. Reduce: Each worker processes a part of the intermediate key/value pairs to generate the final key/value pairs.
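The two steps just described can be sketched in Python with the classic word-count example (an illustration added here, not part of the original slides):

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    return [(word, 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: combine all counts emitted for one intermediate key."""
    return (key, sum(values))

def map_reduce(inputs):
    # Map phase: every (key, value) input produces intermediate pairs,
    # which are grouped by intermediate key.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Reduce phase: each intermediate key is reduced independently.
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

lines = [(0, "the quick brown fox"), (1, "the lazy dog")]
print(map_reduce(lines))
```

Because each intermediate key is reduced independently, both phases can be distributed across workers without changing the result.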

Working of Matlab The Matlab Parallel Computing Toolbox offers the framework to write programs for a cluster of computers. This enables a master computer to dispatch jobs to workers running on McGill’s cluster. The master creates a MapReduce job and passes the user-defined Map and Reduce functions to the workers. At each worker, the input key/value pairs are fed into the map function to get intermediate key/value pairs. At each worker, the intermediate key/value pairs are fed into the reduce function to produce the final key/value pairs, which form the output.

Orthogonal Matching Pursuit As an example, a sparse signal x can be stored compactly by multiplying it with a measurement matrix A: y = Ax. The measurement y can then be used to recover x using OMP.
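A minimal sequential OMP sketch in Python/numpy (an illustration added here, not the presenter's Matlab code). It greedily selects the column of A most correlated with the residual, refits by least squares, and repeats:

```python
import numpy as np

def omp(A, y, sparsity, tol=1e-10):
    """Recover a sparse x from y = A x by Orthogonal Matching Pursuit."""
    n = A.shape[1]
    residual = y.astype(float)
    support = []
    x = np.zeros(n)
    for _ in range(sparsity):
        # Select the column of A most correlated with the residual.
        correlations = np.abs(A.T @ residual)
        correlations[support] = 0          # do not reselect chosen atoms
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of y on the selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x[support] = coef
    return x

# Demo: a 3-sparse signal measured by a random 60x100 Gaussian matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))
x_true = np.zeros(100)
x_true[[5, 17, 60]] = [1.5, -2.0, 0.7]
y = A @ x_true
x_hat = omp(A, y, sparsity=3)
```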

Application with MapReduce OMP’s traditional solution becomes slow as A grows larger in size. The problem can be resolved by processing individual slices of A in parallel using MapReduce.

Continue… OMP becomes slow as A grows larger in size. This problem can be solved by processing individual slices of A in parallel, which is exactly what the MapReduce method provides.
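The slides do not spell out the decomposition. One plausible way to cast OMP's expensive correlation step (computing A'r) as map and reduce over column slices of A is sketched below; all names and the slicing scheme are illustrative assumptions:

```python
import numpy as np

def map_slice(slice_id, A_slice, residual):
    """Map: a worker holding one column slice of A returns the best atom
    in its slice together with its correlation against the residual."""
    corr = np.abs(A_slice.T @ residual)
    j_local = int(np.argmax(corr))
    return slice_id, j_local, corr[j_local]

def reduce_best(candidates, cols_per_slice):
    """Reduce: pick the globally best atom among the per-slice winners."""
    sid, j_local, _ = max(candidates, key=lambda c: c[2])
    return sid * cols_per_slice + j_local

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
r = A[:, 5] + 0.01 * rng.standard_normal(30)   # residual close to column 5
slices = [A[:, 0:4], A[:, 4:8]]
candidates = [map_slice(i, s, r) for i, s in enumerate(slices)]
best = reduce_best(candidates, cols_per_slice=4)
# best matches the serial np.argmax over all columns of A.
```

Each map task touches only its own slice of A, so the dominant cost of every OMP iteration parallelizes across workers.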

Results MapReduce was implemented on Matlab and was used to run Orthogonal Matching Pursuit. MapReduce on Matlab has the potential to improve the performance of numerous parallel processing algorithms by bringing the power of the MapReduce programming model to Matlab.

Singular Value Decomposition (SVD) The Singular Value Decomposition (SVD) is a powerful matrix decomposition frequently used for dimensionality reduction. SVD is widely used in least-squares problems, in solving linear systems, and in finding a low-rank representation of a matrix. A wide range of applications uses SVD as its main algorithmic tool.
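A short numpy illustration of the low-rank-representation use case (added here as an example; by the Eckart–Young theorem, the truncated SVD gives the best rank-k approximation in the Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(0)
# A 100x50 matrix that is rank 5 plus a little noise.
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 50))
A += 1e-6 * rng.standard_normal((100, 50))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-k approximation
err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
# err is tiny because A is essentially rank 5.
```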

Problem Finding patterns in large-scale graphs with millions or billions of edges is increasingly important in computer network security (intrusion detection), spam filtering, and web applications. One such setting is the estimation of the clustering coefficient and the transitivity ratio of a graph, which effectively translates into computing the number of triangles each node participates in, or the total number of triangles in the graph, respectively. Triangles are a frequently used network statistic in the exponential random graph model and appear naturally in models of real-world network evolution; they have been used in several applications, such as spam detection, uncovering the hidden thematic structure of the web, and link recommendation in online social networks. It is worth noting that in social networks triangles have a natural interpretation: “friends of friends are frequently friends themselves.”

MATLAB implementation, k-rank approximation

function T = EigenTriangleLocal(A, k)
    % A is the adjacency matrix, k is the required rank approximation
    n = size(A, 1);
    T = zeros(n, 1);                   % preallocate space for the triangle counts
    opts.isreal = 1;
    opts.issym = 1;                    % the matrix is real and symmetric
    [u, l] = eigs(A, k, 'LM', opts);   % top-k eigenvalues and eigenvectors of A
    l = diag(l)';
    for j = 1:n
        T(j) = sum(l.^3 .* u(j,:).^2) / 2;
    end
end
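A quick Python/numpy sanity check of the same formula on a small graph (added for illustration): on the complete graph K4, every node participates in exactly 3 triangles and the graph contains 4 in total, which the full (k = n) eigendecomposition recovers exactly:

```python
import numpy as np

# Complete graph K4: every node forms a triangle with each pair of the
# other three nodes, so it participates in C(3,2) = 3 triangles.
n = 4
A = np.ones((n, n)) - np.eye(n)

# Full eigendecomposition; with k = n the triangle counts are exact.
w, u = np.linalg.eigh(A)
local = np.array([np.sum(w**3 * u[j, :]**2) / 2 for j in range(n)])
total = local.sum() / 3    # each triangle is counted at 3 nodes
```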

Summary of network data

Results

Continue… In this work, the EIGENTRIANGLE and EIGENTRIANGLELOCAL algorithms have been proposed to estimate the total number of triangles and the number of triangles per node, respectively, in an undirected, unweighted graph. The special spectral properties that real-world networks frequently possess make both algorithms efficient for the triangle counting problem.

Fast Randomized Tensor Decompositions Many real-world problems involve multi-aspect data. For example, fMRI (functional magnetic resonance imaging) scans, one of the most popular neuroimaging techniques, produce multi-aspect data: voxels × subjects × trials × task conditions × timeticks. Monitoring systems produce three-way data: machine id × type of measurement × timeticks, where the “machine”, depending on the setting, can be for instance a sensor (sensor networks) or a computer (computer networks). The large data volumes generated by personalized web search are frequently modeled as three-way tensors, i.e., users × queries × web pages. Processing all of the above is quite time-consuming.

Problem Ignoring the multi-aspect nature of the data by flattening it into a two-way matrix and applying an exploratory analysis algorithm, e.g., singular value decomposition (SVD), is not optimal and typically hurts performance significantly. The same problem holds when applying, e.g., SVD to different 2-way slices of the tensor, as observed by [94]. On the contrary, multiway data analysis techniques succeed in capturing the multilinear structures in the data, thus achieving better performance than the aforementioned ideas.

Problem Solution Tensor decompositions have been adopted as a solution in many applications across different scientific disciplines, especially in computer vision and signal processing, as well as neuroscience, time series anomaly detection, psychometrics, graph analysis, and data mining.

Algorithm 8 MACH-HOSVD
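The algorithm listing itself is missing from the transcript. As a hedged sketch of the MACH idea, the tensor is randomly sparsified (each entry is kept with probability p and rescaled by 1/p so the result stays unbiased), and HOSVD is then run on the sparsified tensor; the Python outline below is an assumption-laden illustration, not the authors' implementation:

```python
import numpy as np

def mach_sparsify(T, p, rng):
    """Keep each tensor entry with probability p and rescale by 1/p,
    so the sparsified tensor is an unbiased estimate of T."""
    mask = rng.random(T.shape) < p
    return np.where(mask, T / p, 0.0)

def unfold(T, mode):
    """Mode-n unfolding: mode-n fibers become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_factors(T, ranks):
    """HOSVD factors: leading left singular vectors of each mode unfolding."""
    return [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

rng = np.random.default_rng(0)
T = rng.standard_normal((10, 12, 14))
T_sparse = mach_sparsify(T, p=0.5, rng=rng)
U = hosvd_factors(T_sparse, ranks=(3, 3, 3))
```

The expensive SVDs then operate on a much sparser tensor, which is where the speedup comes from.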

Results

Continue… Tensor decompositions are useful in many real-world problems. A simple randomized algorithm, MACH, is proposed, which is easily parallelizable and can be adapted to online streaming systems. The algorithm will be incorporated into the PEGASUS library, a graph and tensor mining system for handling large amounts of data.

More Applications Comparing the Performance of Clusters, Hadoop, and Active Disks on Microarray Correlation Computations. Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map- Reduce (DRAFT). Map-Reduce for Machine Learning on Multicore.

References Charalampos E. Tsourakakis, “Data Mining with MapReduce: Graph and Tensor Algorithms with Applications”, March. Arjita Madan, “MapReduce on Matlab”.