Map Reduce.


1 Map Reduce

2 History
2003 - Google publishes a paper describing the Google File System
Google publishes a white paper detailing the MapReduce paradigm
Yahoo creates Hadoop (named after its creator's son's stuffed elephant)
Hadoop is scaled to tens of thousands of nodes
Hadoop market > $800 million, big data market > $23 billion

3 MapReduce
Based on functional programming (LISP), MapReduce uses parallel operations to make processing huge amounts of data feasible.
The runtime system handles many of the messy engineering aspects: parallelization, fault tolerance, data distribution, load balancing and management of task communication.
Motivation: implementing hundreds of special-purpose computations on large datasets
  Computing inverted indexes from Web content collected via Web crawling (which pages contain a given word)
  Extracting statistics from Web logs (frequency distribution of search topics, by region, by type of user)
These computations are easy to conceptualize but hard to implement because of the size of the dataset.
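
The functional roots can be seen in Python's own built-ins. The following is a minimal single-machine sketch; the log lines are made-up sample data, and nothing here is distributed or fault tolerant. It only shows the shape of the idea: an independent per-record step followed by a combining step.

```python
from functools import reduce

# Hypothetical input: a few lines from a Web log.
log_lines = ["GET /index.html", "GET /about.html", "POST /login"]

# map: apply the same function to every record independently.
# No record depends on another, so this step could be run in parallel.
line_lengths = list(map(len, log_lines))

# reduce: fold the mapped values into a single result.
total_chars = reduce(lambda a, b: a + b, line_lengths, 0)
```

Because the map step has no shared state, a runtime is free to split the records across machines; only the reduce step needs to see the combined results.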

4 MapReduce Part 2
An underlying model of the data is assumed: each object of interest is represented by a unique key with associated content or value (a key-value pair).
Many computations can be expressed as applying a map operation to each logical record, producing a set of intermediate key-value pairs.
A reduce operation is then applied to all the values that share the same key (sharing a key is what allows derived data to be combined).
The ideas of MapReduce are not new (most are from the 1950s), but the implementation for distributed systems was.
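
The key-value model can be sketched as a small single-process driver. This is an illustration only, not how Hadoop or Google's runtime is implemented; the names run_map_reduce, topic_by_region and total, and the sample search records, are assumptions made for the example.

```python
from collections import defaultdict

def run_map_reduce(records, mapper, reducer):
    """Single-process stand-in for a MapReduce runtime (illustrative only)."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))

# Hypothetical Web-log records: (region, search topic).
searches = [("EU", "weather"), ("US", "news"), ("EU", "news"), ("US", "news")]

def topic_by_region(record):
    region, topic = record
    yield (region, topic), 1

def total(key, values):
    return key, sum(values)

counts = run_map_reduce(searches, topic_by_region, total)
```

The user supplies only the mapper and the reducer; grouping by key sits between them, which is the part a real runtime distributes across nodes.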

5 Example in Python
Problem: We want the word count (the number of occurrences of each word) for a document
Map: for each word in the document, emit a (word, 1) pair

    def get_words(part_of_doc):
        # part_of_doc is a string of text; split it into words
        for word in part_of_doc.split():
            yield word, 1

Reduce: combine the values that share a key

    def combine_counts(word, values):
        # values holds every count emitted for this word
        return word, sum(values)

Note this is a different syntax from Project 6.
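
To see the two functions work end to end, here is a minimal single-machine sketch. It assumes each part of the document is a string split on whitespace, and it uses sorting plus itertools.groupby to stand in for the shuffle step that a real runtime performs between map and reduce. The sample parts are made up.

```python
from itertools import groupby
from operator import itemgetter

def get_words(part_of_doc):
    # Map: emit (word, 1) for each whitespace-separated word.
    for word in part_of_doc.split():
        yield word, 1

def combine_counts(word, values):
    # Reduce: sum the counts emitted for one word.
    return word, sum(values)

# Hypothetical document, already split into parts (one per map task).
parts = ["the quick brown fox", "jumps over the lazy dog", "the end"]

# Map every part, then sort so that equal keys are adjacent (the shuffle).
pairs = sorted(pair for part in parts for pair in get_words(part))

# Reduce: one combine_counts call per distinct word.
word_counts = dict(
    combine_counts(word, [count for _, count in group])
    for word, group in groupby(pairs, key=itemgetter(0))
)
```

Here word_counts["the"] is 3, since "the" appears once in each part; every other word maps to 1.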

6 MapReduce Use Cases
Distributed Grep
  Grep looks for a text pattern in a document
  Map: Outputs a line if it matches the pattern
  Reduce: Identity function (outputs everything it is given)
Reverse Web-Link Graph
  Purpose is to find, for each target URL, all the source pages that link to it
  Map: Outputs a (target URL, source URL) pair for each link to a target found in a page named source
  Reduce: Concatenates the list of all source URLs associated with a given target URL and outputs (target, list of sources)
Inverted Index
  Find all of the documents that contain specific words
  Map: Parses each document and outputs a (word, document_id) pair for each word
  Reduce: Takes all the pairs for a given word, sorts them by document_id, and emits (word, list of document_ids)
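
The three use cases can be sketched on one machine as follows. The pages dictionary, the href-based link format, and the helper names (shuffle, grep_map, links_map, index_map) are all assumptions made for illustration; a real job would read crawled pages and run the map calls on many nodes.

```python
import re
from collections import defaultdict

def shuffle(pairs):
    """Group values by key, as a runtime would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Hypothetical corpus: document_id -> text, with links written as href="...".
pages = {
    "a.html": 'hadoop map <a href="b.html">b</a>',
    "b.html": 'reduce map <a href="c.html">c</a>',
    "c.html": 'hadoop <a href="b.html">b</a>',
}

def grep_map(doc_id, text, pattern):
    # Distributed grep: emit each line that matches the pattern.
    for line in text.splitlines():
        if re.search(pattern, line):
            yield doc_id, line

def links_map(source, text):
    # Reverse web-link graph: emit (target, source) for each link.
    for target in re.findall(r'href="([^"]+)"', text):
        yield target, source

def index_map(doc_id, text):
    # Inverted index: emit (word, document_id) once per distinct word.
    plain = re.sub(r"<[^>]+>", " ", text)  # drop the tags, keep the words
    for word in set(plain.split()):
        yield word, doc_id

# Grep: the reduce step is the identity, so the mapped pairs are the output.
grep_hits = [pair for d, t in pages.items() for pair in grep_map(d, t, r"hadoop")]

# Reverse links: reduce concatenates the sources seen for each target.
reverse_links = dict(
    shuffle(pair for s, t in pages.items() for pair in links_map(s, t))
)

# Inverted index: reduce sorts the document ids for each word.
inverted_index = {
    word: sorted(doc_ids)
    for word, doc_ids in shuffle(
        pair for d, t in pages.items() for pair in index_map(d, t)
    ).items()
}
```

All three jobs share the same shuffle step and differ only in what the map function emits and how the reduce step combines the grouped values.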

