6. Files of (horizontal) Records
The concept of pages or blocks suffices when doing I/O, but the higher layers of a DBMS operate on records and files of records.
FILE: a collection of pages, each containing a collection of records, which supports:
- insert, delete, and modify (on a record)
- read a particular record (specified by its Record ID, or RID)
- scan of all records (possibly with some conditions on the records to be retrieved)
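To make this interface concrete, here is a minimal in-memory sketch in Python. It is not from the slides: the class name, PAGE_CAPACITY, and the RID representation as a (page number, slot number) pair are illustrative assumptions, not an actual DBMS API.

    # Minimal sketch of a "file of records"; all names are illustrative.
    PAGE_CAPACITY = 4          # records per page (toy value)

    class RecordFile:
        def __init__(self):
            self.pages = [[]]  # each page is a list of records (None = deleted slot)

        def insert(self, record):
            """Insert a record and return its RID = (page_no, slot_no)."""
            for page_no, page in enumerate(self.pages):
                if len(page) < PAGE_CAPACITY:
                    page.append(record)
                    return (page_no, len(page) - 1)
            self.pages.append([record])          # allocate a new page
            return (len(self.pages) - 1, 0)

        def read(self, rid):
            """Read a particular record given its RID."""
            page_no, slot_no = rid
            return self.pages[page_no][slot_no]

        def delete(self, rid):
            page_no, slot_no = rid
            self.pages[page_no][slot_no] = None  # leave the slot empty

        def scan(self, predicate=lambda r: True):
            """Scan all records, optionally with a condition on the records."""
            for page in self.pages:
                for record in page:
                    if record is not None and predicate(record):
                        yield record

    f = RecordFile()
    rid = f.insert({"name": "BAID", "city": "NY"})
    print(f.read(rid))
    print(list(f.scan(lambda r: r["city"] == "NY")))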
File Types
The three basic file organizations supported by the File Manager of most DBMSs are:
- HEAP FILES (files of unordered records)
- SORTED or CLUSTERED FILES (records sorted or clustered on some field(s))
- HASHED FILES (files in which records are positioned based on a hash function on some field(s))
Unordered (Heap) Files
The simplest file structure: it contains records in no particular order. As the file grows and shrinks, disk pages are allocated and de-allocated.
To support record-level operations, the DBMS must:
- keep track of the pages in a file
- keep track of free space on pages
- keep track of the records on a page
There are many alternatives for keeping track of these.
Heap File Implemented as a Linked List
The header page id and the heap file name must be stored someplace. Each page contains two `pointers' plus data.
[Figure: a header page heading two linked lists of data pages, one list of pages with free space and one list of full pages.]
Heap File Using a Page Directory
The entry for a page can include the number of free bytes on the page. The directory is itself a collection of pages; a linked-list implementation is just one alternative.
[Figure: a DIRECTORY (a linked list of header blocks containing page IDs), starting at a header page, with one entry per data page 1 through N.]
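As a rough illustration of the directory idea (not from the slides), here is a hedged Python sketch in which each directory entry records the free bytes remaining on its page and insertion picks the first page with enough room. The names and the PAGE_SIZE value are assumptions.

    PAGE_SIZE = 4096  # bytes per data page (assumed value)

    class DirectoryHeapFile:
        def __init__(self):
            self.pages = []       # data pages: lists of records
            self.directory = []   # one entry per page: free bytes remaining

        def _new_page(self):
            self.pages.append([])
            self.directory.append(PAGE_SIZE)
            return len(self.pages) - 1

        def insert(self, record, size):
            # Scan the directory for a page with enough free space.
            for page_no, free in enumerate(self.directory):
                if free >= size:
                    break
            else:
                page_no = self._new_page()
            self.pages[page_no].append(record)
            self.directory[page_no] -= size
            return (page_no, len(self.pages[page_no]) - 1)

    hf = DirectoryHeapFile()
    print(hf.insert("record-1", 100))   # -> (0, 0)
    print(hf.directory)                 # -> [3996]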
Heap File Facts
Record insert, Method 1: the system inserts new records at the end of the file (it needs a next-open-slot indicator); after a deletion it moves the last record into the freed slot and updates the indicator.
- This doesn't allow support of the RID or RRN concept (records move).
- Alternatively, a deleted record's slot can remain empty until the file is reorganized. If records are only moved into freed slots upon reorganization, then RIDs and RRNs can be supported.
[Figure: a page with records in slots 0-2, slots 3-5 empty, and a next-open-slot indicator set to 3.]
Heap File Facts
Record insert, Method 2: insert into any open slot. The system must maintain a data structure indicating the open slots, either
- as a list, or
- as a bit filter (bit map) that identifies the open slots.
[Figure: a page whose availability bit filter is 101001, where 0 means the slot is available.]
If we want all records with a given value in a particular field, we need an "index". Of course, index files must provide a fast way to find the entries for the particular value of interest (the heap file organization would make little sense for an index file). Index files are usually sorted files. Indexes are examples of ACCESS PATHS.
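Here is a toy Python sketch of the bit-map approach, following the slide's convention that 0 means "available". It is illustrative only; the slot count and class name are assumptions.

    SLOTS_PER_PAGE = 6

    class Page:
        def __init__(self):
            self.slots = [None] * SLOTS_PER_PAGE
            self.bitmap = [0] * SLOTS_PER_PAGE   # 0 = free, 1 = occupied

        def insert(self, record):
            for i, bit in enumerate(self.bitmap):
                if bit == 0:                     # first open slot
                    self.slots[i] = record
                    self.bitmap[i] = 1
                    return i                     # slot number (part of the RID)
            return None                          # page full

        def delete(self, slot_no):
            self.slots[slot_no] = None
            self.bitmap[slot_no] = 0             # slot becomes available again

    p = Page()
    print(p.insert("rec A"), p.insert("rec B"), p.bitmap)  # 0 1 [1, 1, 0, 0, 0, 0]
    p.delete(0)
    print(p.insert("rec C"), p.bitmap)                     # reuses slot 0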
Sorted File (Clustered File) Facts
The file is sorted on one attribute (e.g., using the unpacked, record-pointer page format). Advantages over a heap file include:
- reading records in that particular order is efficient
- finding the next record in order is efficient.
For efficient value-based ordering (clustering), a level of indirection is useful (the unpacked, record-pointer page format): records stay in their slots and the slot directory lists them in sort order.
[Figure: page 3 in the unpacked record-pointer format, with a slot directory giving the sorted order of RIDs (3,3), (3,0), (3,4), (3,2), (3,1), plus an overflow page 9.]
What happens when a page fills up? Use an overflow page for the next record. When a page fills up and, e.g., a record must be inserted and clustered between (3,1) and (3,5), one solution is simply to place it on an overflow page in arrival order. The overflow page is then scanned like an unordered file page, when necessary. Periodically the primary and overflow pages can be reorganized as an unpacked record-pointer extent to improve sequential access speed (see the next slide for an example).
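A small Python sketch of the record-pointer indirection (not from the slides; the values are made up): records keep their physical slots, and only a small slot directory is kept in sort order.

    # Slot -> key value, in arrival order; the slot directory holds slot numbers
    # in sorted-key order, so records never move.
    records = {0: 30, 1: 42, 2: 17, 3: 25, 4: 11}
    slot_directory = sorted(records, key=lambda s: records[s])

    print(slot_directory)                        # [4, 2, 3, 0, 1]: logical sorted order
    print([records[s] for s in slot_directory])  # [11, 17, 25, 30, 42]

    # Inserting a new record only requires placing it in any free slot and
    # inserting its slot number at the right position in the directory.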
Sorted File (Clustered File) Facts
Reorganizing a sorted file with several overflow levels.
[Figure, BEFORE: primary page 3 plus overflow pages 2 and 9, with the clustered records scattered across the primary page and two overflow levels.]
[Figure, AFTER: the same records reorganized as an unpacked record-pointer extent, so the slot directories of page 3 and its single overflow page give the full sorted order.]
Here, the reorganization requires only 2 record swaps and 1 slot-directory rewrite.
Hash Files
A hash function is applied to the key of a record to determine which "file bucket" it goes to (the "file buckets" are usually the pages of that file). Assume there are M pages, numbered 0 through M-1. Then the hash function can be any function that converts the key to a number between 0 and M-1 (e.g., for numeric keys, MOD M is typically used; for non-numeric keys, first map the key value to a number and then apply MOD M).
Collisions, or overflows, occur when a new record hashes to a bucket that is already full. The simplest overflow method is to use separate overflow pages.
[Figure: h(key) mod M maps a key to one of the primary bucket pages 0 through M-1; overflow pages are allocated if needed, either as a separate linked list per bucket or as a single shared linked list. Page numbers are needed for the pointers.]
Long overflow chains can develop and degrade performance.
- Extendible and Linear Hashing are dynamic techniques that fix this problem.
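The following Python sketch (illustrative names and toy sizes, not from the slides) shows the basic static scheme: M primary buckets with a fixed capacity and a separate overflow list per bucket.

    M = 5          # number of primary bucket pages
    BFR = 2        # records per primary page (toy value)

    primary = [[] for _ in range(M)]
    overflow = [[] for _ in range(M)]   # one overflow chain per bucket

    def insert(key, record):
        b = key % M                     # h(key) = key mod M for numeric keys
        if len(primary[b]) < BFR:
            primary[b].append((key, record))
        else:
            overflow[b].append((key, record))   # collision: spill to overflow

    def search(key):
        b = key % M
        for k, r in primary[b] + overflow[b]:
            if k == key:
                return r
        return None

    for k in (27, 8, 15, 32, 12):
        insert(k, "rec%d" % k)
    print(search(32), overflow[2])      # 27, 32 and 12 all hashed to bucket 2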
Other Static Hashing Overflow-Handling Methods
Overflow can also be handled by open addressing (more commonly used for internal hash tables, where a bucket is an allocation of main memory rather than a page). In open addressing, upon a collision, search forward in the bucket sequence for the next open record slot.
[Figure: h(rec_key) = 1 collides; slot 2 is checked (full), then slot 3 (open), so the record goes to slot 3.]
To search, apply h; if the record is not found there, search sequentially ahead until it is found (circling around to the search's start point).
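A minimal linear-probing sketch in Python (table size and names are illustrative): on a collision, probe forward with wrap-around; the search stops at the first empty slot.

    SIZE = 7
    table = [None] * SIZE               # each slot holds (key, record) or None

    def insert(key, record):
        h = key % SIZE
        for i in range(SIZE):
            slot = (h + i) % SIZE       # probe sequence h, h+1, h+2, ... (mod SIZE)
            if table[slot] is None:
                table[slot] = (key, record)
                return slot
        raise RuntimeError("table full")

    def search(key):
        h = key % SIZE
        for i in range(SIZE):
            slot = (h + i) % SIZE
            if table[slot] is None:     # hit an empty slot: key is not present
                return None
            if table[slot][0] == key:
                return table[slot][1]
        return None

    insert(1, "a"); insert(8, "b")      # both hash to slot 1; 8 probes ahead to slot 2
    print(search(8))                    # -> "b"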
Other Overflow-Handling Methods
Overflow can also be handled by re-hashing. In re-hashing, upon a collision, apply the next hash function from a sequence of hash functions h0, h1, h2, ...
[Figure: the record is placed in the bucket page of the first hash function in the sequence that maps it to an open slot.]
To search, apply h0; if the record is not found, apply the next hash function, until it is found or the list of hash functions is exhausted.
These methods can also be combined.
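A hedged Python sketch of re-hashing; the particular hash functions below are arbitrary examples chosen only to illustrate trying h0, then h1, then h2.

    SIZE = 7
    table = [None] * SIZE
    hash_fns = [lambda k: k % SIZE,
                lambda k: (k * 3 + 1) % SIZE,
                lambda k: (k * 7 + 3) % SIZE]

    def insert(key, record):
        for h in hash_fns:              # apply h0, then h1, then h2 ...
            slot = h(key)
            if table[slot] is None:
                table[slot] = (key, record)
                return slot
        raise RuntimeError("hash function list exhausted")

    def search(key):
        for h in hash_fns:
            slot = h(key)
            if table[slot] is not None and table[slot][0] == key:
                return table[slot][1]
        return None                     # not found after trying every function

    insert(1, "a"); insert(8, "b")      # 8 collides under h0, lands under h1
    print(search(8))                    # -> "b"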
Extendible Hashing
Idea: use a directory of pointers to buckets; split just the bucket that overflowed, and double the directory when needed.
- The directory is much smaller than the file, so doubling it is cheap.
- Only one page of data entries is split. No overflow pages!
- The trick lies in how the hash function is adjusted.
Example (blocking factor bfr = 4 entries per bucket)
- Local depth of a bucket: the number of bits used to determine whether an entry belongs to that bucket.
- Global depth of the directory: the maximum number of bits needed to tell which bucket an entry belongs to (= the maximum of the local depths).
Insert: if the bucket is full, split it (allocate one new page and redistribute the entries over the two pages).
To find the bucket for a new key value r, apply the hash function h to r and follow the directory pointer given by just the last global-depth bits of h(r), not all of it (the last 2 bits in this example). For simplicity we let h(r) = r here. E.g., h(5) = 5 = 101 binary, so it is in the bucket pointed to in the directory by 01.
[Figure: directory with global depth 2 (entries 00, 01, 10, 11) pointing to buckets A, B, C, D, all with local depth 2; A holds 4*, 12*, 32*, 16*; B holds 1*, 5*, 21*, 13*; C holds 10*; D holds 15*, 7*, 19*.]
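The lookup rule can be written in a couple of lines of Python. This sketch mirrors the example's final state (h(r) = r, global depth 2); the variable names are illustrative.

    gd = 2                                       # global depth
    buckets = {"A": [4, 12, 32, 16], "B": [1, 21, 13, 5],
               "C": [10], "D": [15, 7, 19]}
    directory = ["A", "B", "C", "D"]             # directory indices 00, 01, 10, 11

    def find_bucket(r):
        index = r & ((1 << gd) - 1)              # last gd bits of h(r) = r
        return directory[index]

    print(find_bucket(5))    # 5 = 101b, last 2 bits = 01 -> bucket B
    print(find_bucket(20))   # 20 = 10100b, last 2 bits = 00 -> bucket A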
Example: how did we get there?
The first insert is 4: h(4) = 4 = 100 binary, so it goes in the bucket pointed to by directory entry 0.
[Figure: directory with global depth 1 (entries 0 and 1); bucket A holds 4*; bucket B is empty; both have local depth 1.]
Example
Insert 12, 32, 16 and 1:
- h(12) = 12 = 1100 binary: bucket pointed to in the directory by 0.
- h(32) = 32 = 10 0000 binary: bucket pointed to in the directory by 0.
- h(16) = 16 = 1 0000 binary: bucket pointed to in the directory by 0.
- h(1) = 1 = 1 binary: bucket pointed to in the directory by 1.
[Figure: bucket A now holds 4*, 12*, 32*, 16*; bucket B holds 1*.]
Example
Insert 5, 21 and 13:
- h(5) = 5 = 101 binary: bucket pointed to in the directory by 1.
- h(21) = 21 = 1 0101 binary: bucket pointed to in the directory by 1.
- h(13) = 13 = 1101 binary: bucket pointed to in the directory by 1.
[Figure: bucket A holds 4*, 12*, 32*, 16*; bucket B holds 1*, 5*, 21*, 13*.]
Example
9th insert: 10. h(10) = 10 = 1010 binary: bucket pointed to in the directory by 0. Collision!
Split bucket A into A and C. Double the directory (by copying what is there and adding a bit on the left) and reset one pointer. Redistribute the values among A and C if necessary; it is not necessary this time, since the 2's-position bit (shown in green on the slide) already places every entry correctly: 4 = 100, 12 = 1100, 32 = 100000 and 16 = 10000 stay in A (bit 0), while the new entry 10 = 1010 goes to C (bit 1).
[Figure: directory now has global depth 2; A and C have local depth 2, B still has local depth 1; C holds 10*.]
Example
Inserts: 15, 7 and 19.
- h(15) = 15 = 1111 binary
- h(7) = 7 = 111 binary
- h(19) = 19 = 1 0011 binary
All three end in binary 11; that directory entry still points to bucket B, which is already full. Split bucket B into B and D. There is no need to double the directory, because the local depth of B is less than the global depth. Reset one pointer and redistribute the values among B and D if necessary (not necessary this time); the local depths of B and D become 2.
[Figure: bucket B holds 1*, 5*, 21*, 13*; bucket D holds 15*, 7*, 19*.]
Insert 20: h(20) = 20 = 10100 binary. The bucket pointed to by directory entry 00 (bucket A) is full!
Split A: double the directory, reset one pointer, and redistribute the contents of A between A and its `split image', bucket E. The global depth becomes 3, and the local depths of A and E become 3. Entries whose last three bits are 000 (32, 16) stay in A; entries whose last three bits are 100 (4, 12, and the new 20) go to E.
[Figure: directory entries 000 through 111; local depths: A and E are 3, while B, C, D remain 2.]
Points to Note
20 = binary 10100. The last 2 bits (00) tell us that it belongs in either A or A's split image, but not which one; the last 3 bits are needed to tell which.
- Local depth of a bucket: the number of bits used to determine whether an entry belongs to this bucket.
- Global depth of the directory: the maximum number of bits needed to tell which bucket an entry belongs to (= the maximum of the local depths).
When does a bucket split cause directory doubling?
- When, before the insert, the local depth of the bucket equals the global depth: the insert causes the local depth to become greater than the global depth, and the directory is doubled by copying it over and `fixing' the pointer to the split-image page.
- Use of the least-significant bits enables efficient doubling via copying of the directory!
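The sketch below puts the split-and-double logic into runnable Python under the same assumptions as the example (h(r) = r, least-significant-bit indexing, bfr = 4). It is a minimal illustration, not a full implementation, and it starts from the example's state just before inserting 20.

    BFR = 4                                  # entries per bucket, as in the example

    class Bucket:
        def __init__(self, local_depth):
            self.local_depth = local_depth
            self.entries = []

    def insert(r):
        """Insert key r (h(r) = r), splitting a bucket and doubling as needed."""
        global gd, directory
        b = directory[r & ((1 << gd) - 1)]   # last gd bits pick the bucket
        if len(b.entries) < BFR:
            b.entries.append(r)
            return
        if b.local_depth == gd:              # no spare bit: double the directory
            directory = directory * 2        # copy it (adds a bit on the left)
            gd += 1
        old_ld = b.local_depth
        b.local_depth += 1
        image = Bucket(b.local_depth)        # the `split image' of b
        for i in range(len(directory)):      # re-point half of b's directory entries
            if directory[i] is b and (i >> old_ld) & 1:
                directory[i] = image
        old_entries, b.entries = b.entries, []
        for x in old_entries:                # redistribute on bit number old_ld
            (image if (x >> old_ld) & 1 else b).entries.append(x)
        insert(r)                            # retry the entry that caused the split

    # State of the example just before inserting 20 (global depth 2):
    A = Bucket(2); A.entries = [4, 12, 32, 16]
    B = Bucket(2); B.entries = [1, 5, 21, 13]
    C = Bucket(2); C.entries = [10]
    D = Bucket(2); D.entries = [15, 7, 19]
    gd = 2
    directory = [A, B, C, D]                 # indices 00, 01, 10, 11

    insert(20)                               # 20 = 10100b: entry 00 (bucket A) is full
    print(gd)                                # -> 3 (directory doubled)
    print(A.entries, directory[4].entries)   # A keeps [32, 16]; split image holds [4, 12, 20]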
Comments on Extendible Hashing
If the directory fits in memory, an equality search is answered with one disk access; otherwise two.
- The directory grows in spurts and, if the distribution of hash values is skewed, it can grow large.
- Multiple entries with the same hash value cause problems!
Delete: if removal of a data entry makes a bucket empty, the bucket can be merged with its `split image'.
- As soon as each directory element points to the same bucket as its (merged) split image, the directory can be halved.
Linear Hash File
Starts with M buckets (numbered 0, 1, ..., M-1) and an initial hash function h0 = mod M (or, more generally, h0(key) = h(key) mod M for any hash function h that maps into the integers). Chaining to shared overflow pages is used to handle overflows.
- At the first overflow, split bucket 0 into buckets 0 and M and rehash bucket 0's records using h1 = mod 2M. Henceforth, if h0 yields the value n = 0, rehash using h1.
- At the next overflow, split bucket 1 into buckets 1 and M+1 and rehash bucket 1's records using h1 = mod 2M. Henceforth, if h0 yields the value n = 1, use h1. And so on.
- When all of the original M buckets have been split (after M overflows), rehash all overflow records using h1, relabel h1 as h0 (discarding the old h0 forever), and start a new "round" by repeating the process for all future overflows (i.e., there are now buckets 0, ..., 2M-1 and h0 = mod 2M).
To search for a record, let n be the index of the most recently split bucket in the current round (so buckets 0 through n have already been split); if h0(key) is not greater than n, use h1, else use h0.
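The address computation can be sketched in a few lines of Python. This follows the slide's convention that n is the index of the most recently split bucket (taken as -1 before the first split of a round); the function names are illustrative.

    M = 5
    n = -1                      # no bucket split yet in this round

    def h0(key): return key % M
    def h1(key): return key % (2 * M)

    def bucket_for(key):
        b = h0(key)
        # Buckets 0..n have already been split, so their records were rehashed.
        return h1(key) if b <= n else b

    print(bucket_for(15))       # n = -1: bucket 0
    n = 0                       # after the first overflow, bucket 0 is split into 0 and 5
    print(bucket_for(15))       # now h1(15) = 5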
Linear Hash Example (M = 5, h0 = mod 5)
[Figure: buckets 0-4 on pages 45, 99, 23, 78, 98, already holding records such as 02|BAID, 25|CLAY, 33|GOOD, 14|THAISZ, 11|BROWN, 21|BARBIE, 22|ZHU, 24|CROWE.]
Insert trace:
- Insert 27: h0(27) = 2, and bucket 2 is full (collision). Split bucket 0 into buckets 0 and 5, rehashing bucket 0's records with h1 = mod 10; n = 0. Record 27 goes to an overflow page.
- Insert 8: h0(8) = 3, so it goes to bucket 3.
- Insert 15: h0(15) = 0 <= n, so use h1: h1(15) = 5, bucket 5.
- Insert 32: h0(32) = 2, collision again. Split bucket 1 into buckets 1 and 6 (rehash with mod 10); n = 1. Record 32 goes to overflow.
- Insert 39: h0(39) = 4, collision. Split bucket 2 into buckets 2 and 7; n = 2. Record 39 goes to overflow.
- Insert 31: h0(31) = 1 <= n, so use h1: h1(31) = 1, collision. Split bucket 3 into buckets 3 and 8; n = 3. Record 31 goes to overflow.
- Insert 36: h0(36) = 1 <= n, so use h1: h1(36) = 6, bucket 6.
Linear Hash Example, Second Round (M = 10, h0 = mod 10)
Once all of the original buckets have been split, the file has buckets 0-9 and h0 becomes mod 10; the overflow records from the first round are rehashed with the new h0 (e.g., h0(27) = 7, h0(39) = 9). Collisions in this round (e.g., on records 32 and 31) are handled the same way: split the next bucket in sequence and rehash its records with h1 = mod 20, creating buckets 10, 11, and so on. For example, inserting 10 gives h0(10) = 0, etc.
[Figure: the bucket/page layout after the second-round splits, with the remaining shared overflow pages.]
Summary Hash-based indexes: best for equality searches, cannot support range searches. Static Hashing can lead to performance degradation due to collision handling problems. Extendible Hashing avoids performance problems by splitting a full bucket when a new data entry is to be added to it. (Duplicates may require overflow pages.) – Directory to keep track of buckets, doubles periodically. – Can get large with skewed data; additional I/O if this does not fit in main memory.
Summary
Linear Hashing avoids a directory by splitting buckets round-robin and using overflow pages.
- Overflow pages are not likely to be long.
- Duplicates are handled easily.
- Space utilization could be lower than with Extendible Hashing, since splits are not concentrated on `dense' data areas.
Skew occurs when the hash values of the data entries are not uniform.
[Figure: three count-versus-value histograms over values v1 ... vn, illustrating distribution skew, count skew, and combined distribution & count skew.]
Map Reduce (from Wikipedia)
MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes). Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).
"Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back.
"Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve.
MapReduce allows for distributed processing of the map and reduction operations. Provided each mapping operation is independent of the others, all maps can be performed in parallel. Similarly, a set of 'reducers' can perform the reduction phase, provided all outputs of the map operation that share the same key are presented to the same reducer at the same time, or the reduction function is associative.
Another way to look at MapReduce is as a 5-step parallel and distributed computation:
1. Prepare the Map() input: the MapReduce system designates Map processors, assigns the K1 input key value each processor will work on, and provides that processor with all the input data associated with that key value.
2. Run the user-provided Map() code: Map() is run exactly once for each K1 key value, generating output organized by key values K2.
3. "Shuffle" the Map output to the Reduce processors: the MapReduce system designates Reduce processors, assigns the K2 key value each processor will work on, and provides that processor with all the Map-generated data associated with that key value.
4. Run the user-provided Reduce() code: Reduce() is run exactly once for each K2 key value produced by the Map step.
5. Produce the final output: the MapReduce system collects all the Reduce output and sorts it by K2 to produce the final outcome.
Logically these 5 steps can be thought of as running in sequence, each step starting only after the previous step is completed, though in practice they can be intertwined as long as the final result is not affected. In many situations the input data might already be distributed among many different servers, in which case step 1 could sometimes be greatly simplified by assigning Map servers that process the locally present input data. Similarly, step 3 could sometimes be sped up by assigning Reduce processors that are as close as possible to the Map-generated data they need to process.
The Map and Reduce functions of MapReduce are both defined with respect to data structured in (key, value) pairs. Map takes one pair of data with a type in one data domain and returns a list of pairs in a different domain:
    Map(k1, v1) -> list(k2, v2)
The Map function is applied in parallel to every pair in the input dataset, producing a list of pairs for each call. After that, the MapReduce framework collects all pairs with the same key from all lists and groups them together, creating one group for each key. The Reduce function is applied in parallel to each group, producing a collection of values in the same domain:
    Reduce(k2, list(v2)) -> list(v3)
Each Reduce call typically produces either one value v3 or an empty return, though one call is allowed to return more than one value. The returns of all calls are collected as the desired result list. Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values.
Map Reduce-2
The prototypical MapReduce example counts the appearances of each word in a set of documents:

    function map(String name, String document):
        // name: document name; document: document contents
        for each word w in document:
            emit (w, 1)

    function reduce(String word, Iterator partialCounts):
        // word: a word; partialCounts: a list of aggregated partial counts
        sum = 0
        for each pc in partialCounts:
            sum += ParseInt(pc)
        emit (word, sum)

Here, each document is split into words, and each word is counted by the map function, using the word as the result key. The framework puts together all the pairs with the same key and feeds them to the same call to reduce, so this function just needs to sum all of its input values to find the total appearances of that word.
As another example, imagine that for a database of 1.1 billion people, one would like to compute the average number of social contacts a person has, according to age. In SQL such a query could be expressed as:

    SELECT age AS Y, AVG(contacts) AS A
    FROM social.person
    GROUP BY age
    ORDER BY age;

Using MapReduce, the K1 key values could be the integers 1 through 1,100, each representing a batch of 1 million records, and the K2 key value could be a person's age in years. This computation could be achieved using the following functions:

    function Map is
        input: integer K1 between 1 and 1100, representing a batch of 1 million social.person records
        for each social.person record in the K1 batch do
            let Y be the person's age
            let N be the number of contacts the person has
            produce one output record (Y, N)
        repeat
    end function

    function Reduce is
        input: age (in years) Y
        for each input record do
            accumulate in S the sum of N
            accumulate in C the count of records so far
        repeat
        let A be S/C
        produce one output record (Y, A)
    end function
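For readers who want something runnable, here is an in-memory Python version of the word-count example above. The small map_reduce driver merely simulates the framework's shuffle step; it is not part of any real MapReduce API, and all names are illustrative.

    from collections import defaultdict

    def map_fn(name, document):
        # name: document name; document: document contents
        for word in document.split():
            yield (word, 1)

    def reduce_fn(word, partial_counts):
        yield (word, sum(partial_counts))

    def map_reduce(inputs, map_fn, reduce_fn):
        groups = defaultdict(list)
        for k1, v1 in inputs:
            for k2, v2 in map_fn(k1, v1):        # Map step
                groups[k2].append(v2)            # Shuffle: group values by key
        out = []
        for k2, values in groups.items():        # Reduce step
            out.extend(reduce_fn(k2, values))
        return out

    docs = [("d1", "the cat sat on the mat"), ("d2", "the dog")]
    print(sorted(map_reduce(docs, map_fn, reduce_fn)))
    # [('cat', 1), ('dog', 1), ('mat', 1), ('on', 1), ('sat', 1), ('the', 3)]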
Map Reduce-3
The MapReduce system would line up the 1,100 Map processors and provide each with its corresponding 1 million input records. The Map step would produce 1.1 billion records, with Y values ranging between, say, 8 and 103. The MapReduce system would then line up the 96 Reduce processors, shuffling the key/value pairs (because we need one average per age), and provide each with its millions of corresponding input records. The Reduce step would result in the much reduced set of only 96 output records, which would be put in the final result file, sorted by Y.
Dataflow
The frozen part of the MapReduce framework is a large distributed sort. The hot spots, which the application defines, are: an input reader, a Map function, a partition function, a compare function, a Reduce function, and an output writer.
Input reader: The input reader divides the input into appropriately sized 'splits' (in practice typically 16 MB to 128 MB), and the framework assigns one split to each Map function. The input reader reads data from stable storage (typically a distributed file system) and generates key/value pairs. A common example reads a directory full of text files and returns each line as a record.
Map function: The Map function takes a series of key/value pairs, processes each, and generates zero or more output key/value pairs. The input and output types of the map can be (and often are) different from each other. If the application is doing a word count, the map function breaks the line into words and outputs a key/value pair for each word: the word as the key and the number of instances of that word in the line as the value.
Partition function: Each Map function output is allocated to a particular reducer by the application's partition function, for sharding purposes. The partition function is given the key and the number of reducers and returns the index of the desired reducer. A typical default is to hash the key and use the hash value modulo the number of reducers (a small sketch follows this slide). It is important to pick a partition function that gives an approximately uniform distribution of data per shard for load-balancing purposes; otherwise the MapReduce operation can be held up waiting for slow reducers (reducers assigned more than their share of data) to finish.
Between the map and reduce stages, the data is shuffled (parallel-sorted / exchanged between nodes) in order to move the data from the map node that produced it to the shard in which it will be reduced. The shuffle can sometimes take longer than the computation time, depending on network bandwidth, CPU speeds, the data produced, and the time taken by the map and reduce computations.
Comparison function: The input for each Reduce is pulled from the machine where the Map ran and sorted using the application's comparison function.
Reduce function: The framework calls the application's Reduce function once for each unique key, in sorted order. The Reduce can iterate through the values associated with that key and produce zero or more outputs. In the word-count example, the Reduce function takes the input values, sums them, and generates a single output of the word and the final sum.
Output writer: The Output Writer writes the output of the Reduce to stable storage, usually a distributed file system.
Distribution and reliability: MapReduce achieves reliability by parceling out a number of operations on the set of data to each node in the network. Each node is expected to report back periodically with completed work and status updates. If a node falls silent for longer than that interval, the master node (similar to the master server in the Google File System) records the node as dead and sends out the node's assigned work to other nodes. Individual operations use atomic operations for naming file outputs as a check to ensure that there are no parallel conflicting threads running. When files are renamed, it is possible to also copy them to another name in addition to the task name (allowing for side effects).
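The sketch promised above: a hedged Python version of the default partition rule (hash the key, take it modulo the number of reducers). The function name is illustrative and not an actual framework API; Python's built-in hash() is stable within a single run, which is all the example needs.

    def partition(key, num_reducers):
        """Return the index of the reducer that should receive this key."""
        return hash(key) % num_reducers

    # Every map output pair (k2, v2) with the same k2 lands on the same reducer:
    pairs = [("the", 1), ("cat", 1), ("the", 1)]
    for k2, v2 in pairs:
        print(k2, "-> reducer", partition(k2, 4))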
Map Reduce-4
The reduce operations operate much the same way. Because of their inferior properties with regard to parallel operation, the master node attempts to schedule reduce operations on the same node, or in the same rack, as the node holding the data being operated on. This property is desirable because it conserves bandwidth across the backbone network of the datacenter.
Implementations are not necessarily highly reliable. For example, in older versions of Hadoop the NameNode was a single point of failure for the distributed filesystem. Later versions of Hadoop have high availability with an active/passive failover for the NameNode.
Uses: MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log statistics, inverted index construction, document clustering, machine learning, [5] and statistical machine translation. Moreover, the MapReduce model has been adapted to several computing environments, such as multi-core and many-core systems, [6][7] desktop grids, [8] volunteer computing environments, [9] dynamic cloud environments, [10] and mobile environments. [11]
At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web. It replaced the old ad hoc programs that updated the index and ran the various analyses. [12] MapReduce's stable inputs and outputs are usually stored in a distributed file system. The transient data is usually stored on local disk and fetched remotely by the reducers.
Criticism: David DeWitt and Michael Stonebraker, computer scientists specializing in parallel databases and shared-nothing architectures, have been critical of the breadth of problems that MapReduce can be used for. [13] They called its interface too low-level and questioned whether it really represents the paradigm shift its proponents have claimed it is. [14] They challenged the MapReduce proponents' claims of novelty, citing Teradata as an example of prior art that has existed for over two decades. They also compared MapReduce programmers to Codasyl programmers, noting both are "writing in a low-level language performing low-level record manipulation." [14] MapReduce's use of input files and lack of schema support prevents the performance improvements enabled by common database system features such as B-trees and hash partitioning, though projects such as Pig (PigLatin), Sawzall, Apache Hive, [15] YSmart, [16] HBase [17] and BigTable [17][18] are addressing these problems.
Greg Jorgensen wrote an article rejecting these views. [19] Jorgensen asserts that DeWitt and Stonebraker's analysis is groundless, since MapReduce was never designed nor intended to be used as a database. DeWitt and Stonebraker published a detailed benchmark study in 2009 comparing the performance of Hadoop's MapReduce and RDBMS approaches on several specific problems. [20] They concluded that relational databases offer real advantages for many kinds of data use, especially for complex processing or where the data is used across an enterprise, but that MapReduce may be easier for users to adopt for simple or one-time processing tasks. They have published the data and code used in their study to allow other researchers to do comparable studies.
Google has been granted a patent on MapReduce. [21] However, there have been claims that this patent should not have been granted because MapReduce is too similar to existing products; for example, map and reduce functionality can be very easily implemented in Oracle's PL/SQL database-oriented language. [22]
Conferences and users groups: The First International Workshop on MapReduce and its Applications (MAPREDUCE'10) was held with the HPDC conference and OGF'29 meeting in Chicago, IL. There are MapReduce users groups around the world.
Map Reduce-5
See also:
- Hadoop: Apache's free and open source implementation of MapReduce
- Pentaho: open source data integration (Kettle), analytics, reporting, visualization and predictive analytics directly from Hadoop nodes
- Nutch: an effort to build an open source search engine based on Lucene and Hadoop, also created by Doug Cutting
- Datameer Analytics Solution (DAS): data source integration, storage, analytics engine and visualization
- Apache Accumulo: secure BigTable
- HBase: BigTable-model database; Hypertable: an HBase alternative
- Apache Cassandra: column-oriented DB that supports access from Hadoop
- HPCC: LexisNexis Risk Solutions High Performance Computing Cluster
- Sector/Sphere: open source distributed storage and processing
- Cloud computing; Big data; Data-Intensive Computing
- Algorithmic skeleton: a high-level parallel programming model for parallel and distributed computing
- MongoDB: a scalable, high-performance, open source NoSQL database
- MapReduce-MPI Library
Specific references:
[1] "Google spotlights data center inner workings". Tech news blog, CNET News.com.
[2] "Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages." From "MapReduce: Simplified Data Processing on Large Clusters", by Jeffrey Dean and Sanjay Ghemawat; Google Research.
[3] "Google's MapReduce Programming Model -- Revisited". Paper by Ralf Lämmel; Microsoft.
[4] http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0004.html
[5] Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Ng, and Kunle Olukotun. "Map-Reduce for Machine Learning on Multicore". NIPS 2006.
[6] Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis. "Evaluating MapReduce for Multi-core and Multiprocessor Systems". HPCA 2007, Best Paper.
[7] Bingsheng He, et al. "Mars: a MapReduce framework on graphics processors". PACT'08.
[8] Bing Tang, M. Moca, S. Chevalier, Haiwu He, and G. Fedak. "Towards MapReduce for Desktop Grid Computing". 3PGCIC'10.
[9] Heshan Lin, et al. "MOON: MapReduce On Opportunistic eNvironments". HPDC'10.
[10] Fabrizio Marozzo, Domenico Talia, Paolo Trunfio. "P2P-MapReduce: Parallel data processing in dynamic Cloud environments". Journal of Computer and System Sciences, vol. 78, no. 5, pp. 1382-1402, Elsevier Science, September 2012.
[11] Adam Dou, et al. "Misco: a MapReduce framework for mobile systems". HPDC'10.
[12] "How Google Works". baselinemag.com. "As of October, Google was running about 3,000 computing jobs per day through MapReduce, representing thousands of machine-days. Among others, these batch routines analyze latest Web pages and update Google's indexes."
[13] "Database Experts Jump the MapReduce Shark".
[14] David DeWitt; Michael Stonebraker. "MapReduce: A major step backwards". craig-henderson.blogspot.com. Retrieved 2008-08-27.
[15] "Apache Hive - Index of - Apache Software Foundation".
[16] Rubao Lee, et al. "YSmart: Yet Another SQL-to-MapReduce Translator" (PDF).
Map Reduce-6
[17] "HBase - HBase Home - Apache Software Foundation".
[18] "Bigtable: A Distributed Storage System for Structured Data" (PDF).
[19] Greg Jorgensen. "Relational Database Experts Jump The MapReduce Shark". typicalprogrammer.com. Retrieved 2009-11-11.
[20] D. J. DeWitt, M. Stonebraker, et al. "A Comparison of Approaches to Large-Scale Data Analysis". Brown University. Retrieved 2010-01-11.
[21] US Patent 7,650,331: "System and method for efficient large-scale data processing".
[22] Curt Monash. "More patent nonsense -- Google MapReduce". dbms2.com. Retrieved 2010-03-07.
General references:
- Dean, Jeffrey & Ghemawat, Sanjay (2004). "MapReduce: Simplified Data Processing on Large Clusters". Retrieved Nov. 23, 2011.
- Matt Williams (2009). "Understanding Map-Reduce". Retrieved Apr. 13, 2011.
External links (papers):
- "CloudSVM: Training an SVM Classifier in Cloud Computing Systems". Paper by F. Ozgur Catak and M. Erdal Balaban; Springer, LNCS.
- "A Hierarchical Framework for Cross-Domain MapReduce Execution". Paper by Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu (Indiana University) and Wilfred Li (University of California, San Diego).
- "Interpreting the Data: Parallel Analysis with Sawzall". Paper by Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan; Google Labs.
- "Evaluating MapReduce for Multi-core and Multiprocessor Systems". Paper by Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis; Stanford University.
- "Why MapReduce Matters to SQL Data Warehousing". Introduction to MapReduce/SQL integration by Aster Data Systems and Greenplum.
- "MapReduce for the Cell B.E. Architecture". Paper by Marc de Kruijf and Karthikeyan Sankaralingam; University of Wisconsin-Madison.
- "Mars: A MapReduce Framework on Graphics Processors". Paper by Bingsheng He, et al., Hong Kong University of Science and Technology; published in Proc. PACT 2008. It presents the design and implementation of MapReduce on graphics processors.
- "A Peer-to-Peer Framework for Supporting MapReduce Applications in Dynamic Cloud Environments". Fabrizio Marozzo, et al., University of Calabria; in Cloud Computing: Principles, Systems and Applications, chapter 7, pp. 113-125, Springer, 2010, ISBN 978-1-84996-240-7.
- "Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters". Paper by Hung-Chih Yang, et al., Yahoo and UCLA; published in Proc. of ACM SIGMOD, pp. 1029-1040, 2007. (This paper shows how to extend MapReduce for relational data processing.)
- FLuX: the Fault-tolerant, Load Balancing eXchange operator from UC Berkeley provides an integration of partitioned parallelism with process pairs. This results in a more pipelined approach than Google's MapReduce with instantaneous failover, but with additional implementation cost.
- "A New Computation Model for Rack-Based Computing". Paper by Foto N. Afrati and Jeffrey D. Ullman, Stanford University; not published as of Nov 2009. This paper is an attempt to develop a general model in which one can compare algorithms for computing in an environment similar to what map-reduce expects.
- "FPMR: MapReduce framework on FPGA". Paper by Yi Shan, Bo Wang, Jing Yan, Yu Wang, Ningyi Xu, Huazhong Yang (2010), in FPGA '10, Proceedings of the 18th annual ACM/SIGDA international symposium on Field Programmable Gate Arrays.