Hadoop
Ali Sharza Khan
High Performance Computing

Table of Contents
Hadoop
Where did Hadoop come from?
What problems can Hadoop solve?
Where does Hadoop apply?
How is Hadoop architected?
Two main parts of Hadoop
Conclusion

Hadoop
What is Hadoop?
– An open-source project
– Processes large data sets in parallel

Where did Hadoop come from?
Google: Hadoop grew out of ideas in Google's published papers on the Google File System and MapReduce.
Yahoo, Facebook, Twitter, and LinkedIn actively contribute to Hadoop.

What problems can Hadoop solve?
Problems where you have a lot of data
Running analytics that are deep and computationally intensive

Where does Hadoop apply?
Search engines
Finance
Online retail
Government
Media and entertainment
Research institutions and other markets

How is Hadoop architected?
Every server has 2, 4, or 8 CPUs.
Each server operates on its own small piece of the data.
Hadoop clusters at Yahoo store 25 petabytes of application data across their servers.
The largest cluster has 3,500 servers.

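
The idea that each server works only on its own piece of the data can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Hadoop code: the function names (`split_into_pieces`, `process_piece`, `run_on_cluster`) are hypothetical, threads stand in for servers, and the "work" is just a sum.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(records, num_servers):
    """Deal the records out so that every server gets its own piece."""
    return [records[i::num_servers] for i in range(num_servers)]


def process_piece(piece):
    """Work done by one server on its local piece only (here: a sum)."""
    return sum(piece)


def run_on_cluster(records, num_servers=4):
    """Split the data, process every piece in parallel, combine the results."""
    pieces = split_into_pieces(records, num_servers)
    # One worker thread per piece stands in for one server in the cluster.
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        partial_results = list(pool.map(process_piece, pieces))
    # Combine the per-server partial results into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_on_cluster(list(range(1, 101))))
```

Adding servers in this sketch only changes `num_servers`; each piece gets smaller while the per-piece work stays the same, which is the scaling behaviour the slide describes.
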
Cloudera CEO Interview NP4_ICDeqE

Two main parts of Hadoop
HDFS (Hadoop Distributed File System)
MapReduce framework
– Map phase
– Reduce phase
– JobTracker (the master)
– TaskTracker (the slave)

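
The map and reduce phases can be illustrated with the classic word-count example. The sketch below is plain Python that runs in a single process; it does not use the Hadoop API, and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical. The JobTracker and TaskTracker roles (scheduling tasks across machines) are not modeled here.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, both phases could in principle be spread across many machines, which is the property the framework relies on.
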
MapReduce Framework

Conclusion
Why is Hadoop able to deal with large amounts of data?
Why is Hadoop able to answer complex computational questions?