1 MapReduce and NoSQL
CMSC 461
Michael Wilson

2 Big data  The term big data has become fairly popular as of late  There is a need to store vast quantities of data and retrieve them in a short amount of time  Images, movies, etc.  Large files

3 MapReduce  http://research.google.com/archive/map reduce.html http://research.google.com/archive/map reduce.html  Concept pioneered by Google  Performing operations on large volumes of data  Map function  Reduce function

4 Map function  Map function  Receives a set of key value pairs as input  Performs some operation (user defined)  Produces a set of new key value pairs

5 Reduce function  Receives the intermediate key value pairs  Can have multiple values for the same key  Merges the values together in some way  Produces a merged output

6 When to use MapReduce  MapReduce doesn’t work for all problems  Problems have to be parallelizable  In other words, an algorithm that involves stateful steps is not necessarily a good candidate for MapReduce

7 Commodity hardware  MapReduce clusters are commodity hardware  X86 processors, several gigabytes of RAM  In this day and age, more computers are cheap  Rather than beef up the machines, just use more

8 Hadoop  Hadoop is a Java based MapReduce implementation  Very popular  Has a secondary component, HDFS  Hadoop Distributed File System

9 HDFS  File system spread across a Hadoop MapReduce cluster  Large block sizes – 64 MB by default  Very popular base for other distributed applications  In particular, NoSQL applications

10 NoSQL  NoSQL is a somewhat nebulous term  Basically means “not SQL,” or “something other than SQL”  Many different approaches  Key-Value stores are a big part of the NoSQL movement  Focus on them here

11 Key-Value?!  This almost seems like a step backward  Key-Value stores are far less structured  Can’t establish relations between entities in a key value store  Can’t constrain data very well  Why is reducing the structure gaining popularity?

12 Distributable nature  Many Key-Value stores can be distributed amongst many nodes  By distributing these nodes, searches and operations on vast swaths of data can be performed in a sensible amount of time  Not all, however  Some can be single server applications stored in RAM

13 NoSQL Key-Value implementations  Hbase  Accumulo  Memcached  Dynamo  Many many more
