Programming in Hadoop Guangda HU Huayang GUO

1 Programming in Hadoop Guangda HU tarlou.gd@gmail.com Huayang GUO dragonghy@gmail.com

2 Hadoop Overview About Hadoop –Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data.

3 Hadoop Overview Architecture –HDFS (Hadoop Distributed File System) –Job Tracker –Task Tracker

4 Hadoop Overview Mechanism –Map and Reduce
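The Map and Reduce mechanism can be illustrated with word count, the default example used later in the deck. The following is a minimal sketch in plain Java that does not use the Hadoop API at all: the class name `WordCountSketch` and its `map`, `shuffle`, `reduce`, and `run` methods are hypothetical names chosen for illustration, and the grouping step that Hadoop performs between the two phases is simulated here with an in-memory map.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of map -> shuffle -> reduce.
// None of these names come from Hadoop or from the presentation.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key (sorted by key).
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts collected for one word.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Drive the three phases over a list of input lines.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, map=1, reduce=1}
        System.out.println(run(Arrays.asList("hello hadoop", "hello map reduce")));
    }
}
```

In a real Hadoop job the map calls would run on different nodes over different input splits, and the framework would handle the grouping and the transfer of pairs to the reducers.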

5 Hadoop Overview Applications –Facebook (Hadoop, Hive, Scribe) –Yahoo! (Hadoop in Yahoo Search) –Veritas (San Point Direct, Veritas File System) –IBM Transarc (Andrew File System) –UW Computer Science Alumni (Condor Project)

6 Our Work Setup running environment –Single node setup –Multi-node cluster setup –Network access Experiments and analysis –Word count –Integration –Largest number

7 Environment Setup Hardware –Two multi-core machines with Linux –Ethernet connection Software –Ubuntu 9.04 –Hadoop 0.20.1 –Five virtual machines on VirtualBox

8 Environment Setup Cluster structure –Two machines 166.111.69.85 59.66.132.161 –One master node –Three slave nodes

9 Experiments Benchmark –Word count (default example) –Super word count (SuperWordCount.java) –Integration (Integration.java) –Largest numbers (LargestGen.java)
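The transcript does not include the source of the benchmarks, so the following is only a guess at how a "largest number" job might be decomposed, not the authors' LargestGen.java. It assumes each map task scans one input split and emits its local maximum, and a single reduce step takes the maximum of those partial results. The class `LargestNumberSketch` and its method names are hypothetical, and the code is plain Java without the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: per-split maximum followed by a global maximum.
// This is an assumed decomposition, not code from the presentation.
public class LargestNumberSketch {

    // Map side: the largest value within one input split.
    public static long localMax(long[] split) {
        long best = Long.MIN_VALUE;
        for (long v : split) {
            best = Math.max(best, v);
        }
        return best;
    }

    // Reduce side: the largest of the per-split maxima.
    public static long globalMax(List<Long> localMaxima) {
        long best = Long.MIN_VALUE;
        for (long v : localMaxima) {
            best = Math.max(best, v);
        }
        return best;
    }

    // Run the map step over every split, then reduce the partial results.
    public static long run(List<long[]> splits) {
        List<Long> partial = new ArrayList<>();
        for (long[] split : splits) {
            partial.add(localMax(split));
        }
        return globalMax(partial);
    }

    public static void main(String[] args) {
        List<long[]> splits = Arrays.asList(
                new long[] {3, 17, 5},
                new long[] {42, 8},
                new long[] {-1, 0});
        System.out.println(run(splits)); // prints 42
    }
}
```

Because each split contributes only one value to the reduce step, the amount of data passed between the two phases stays small regardless of input size.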

10 Benchmark Analysis

11

12 More experiments
Files   Computation   Time (s)
24      2.4 * 10^9    102
120     2.4 * 10^9    179
Nodes   Files   Slope (sec/10^9)
4       24      ≈ 30
2       24      ≈ 40
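Two rough figures can be read off this slide, assuming its rows are (24 files, 2.4 * 10^9 computation, 102 s) and (120 files, 2.4 * 10^9 computation, 179 s), and that the slopes are about 30 and 40 sec/10^9 for 4 and 2 nodes with 24 files. Under that reading, splitting the same computation across more files costs extra time per file, and doubling the node count yields well under a twofold speedup. This is back-of-the-envelope arithmetic on two data points each, not a result stated by the authors.

```latex
% Extra time per additional file at fixed total computation
\[
\frac{179\,\mathrm{s} - 102\,\mathrm{s}}{120 - 24}
  = \frac{77\,\mathrm{s}}{96}
  \approx 0.80\ \mathrm{s\ per\ additional\ file}
\]

% Speedup when going from 2 nodes to 4 nodes (24 files)
\[
\frac{40\ \mathrm{sec}/10^{9}}{30\ \mathrm{sec}/10^{9}} \approx 1.33
\]
```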

13 Challenges & Lessons Learned –Network & virtual cluster communication –Hadoop technique survey –Cooperation

14 References –http://www.ibm.com/developerworks/cn/ –http://en.wikipedia.org/wiki/Hadoop –http://www.michael-noll.com/wiki/ –Linux man pages –Hadoop source code and Javadoc

15 Thanks

