
1 Cloud Computing Project NSYSU Sec. 1 Demo

2 Outline
 Our system's architecture
 Flow chart of the Hadoop job (web crawler) running on the Hadoop cluster
 –Basic setup
 –Flow chart
 Comparison of the crawler's efficiency on Hadoop clusters with different numbers of nodes

3 Architecture
 Hardware
 –2 ASUS servers, Intel Xeon X3330 2.66 GHz, 1 TB HD, 3 GB RAM (master, slave1)
 –1 PC, Intel Core 2 Quad Q6600 2.40 GHz, 500 GB HD, 4 GB RAM (slave2)
 Software
 –CentOS 5.3
 –Hadoop 0.20.1

4 Architecture
 Machine 01: master (x.x.x.1) – NameNode, JobTracker, DataNode, TaskTracker
 Machine 02: slave1 (x.x.x.2) – DataNode, TaskTracker
 Machine 03: slave2 (x.x.x.3) – DataNode, TaskTracker
The administrator monitors the cluster through http://x.x.x.1:50070 (HDFS) and http://x.x.x.1:50030 (job admin); the user submits the job to the master.
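As a minimal sketch, this topology would be written down in Hadoop 0.20's conf/ directory roughly as follows (the host names match the diagram above; the exact file contents are an assumption, not taken from the slides):

    ## conf/masters -- host that runs the secondary NameNode helper
    master

    ## conf/slaves -- hosts that run a DataNode and TaskTracker
    ## (master doubles as a worker node, as in the diagram)
    master
    slave1
    slave2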

5 HDFS (screenshot of the HDFS web console, http://x.x.x.1:50070)

6 HDFS (second screenshot of the HDFS web console, http://x.x.x.1:50070)

7 Job admin (screenshot of the JobTracker web console, http://x.x.x.1:50030)

8 Job admin (second screenshot of the JobTracker web console, http://x.x.x.1:50030)

9 Job admin (third screenshot of the JobTracker web console, http://x.x.x.1:50030)

10 Basic setup (Hadoop)
1. Set up passwordless communication between the nodes over the SSH protocol
2. Install Java
3. Export the Java path (and any other paths needed) in {hadoop dir}/conf/hadoop-env.sh
4. Name the NameNode and JobTracker hosts in {hadoop dir}/conf/hadoop-site.xml (a sketch follows below)
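A minimal sketch of steps 1–4, run from {hadoop dir} on the master (the Java path, port numbers, and user name are assumptions; adjust to your installation):

    # 1. Passwordless SSH from the master to every node
    ssh-keygen -t rsa -P ""
    ssh-copy-id user@slave1        # repeat for slave2 and the master itself
    # 2. (install Java through your distribution's packages)
    # 3. Export the Java path in conf/hadoop-env.sh
    echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> conf/hadoop-env.sh
    # 4. Name the NameNode and JobTracker hosts in conf/hadoop-site.xml, e.g.
    #    <property><name>fs.default.name</name><value>hdfs://master:9000</value></property>
    #    <property><name>mapred.job.tracker</name><value>master:9001</value></property>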

11 Basic setup (Hadoop)
5. Set up the masters file and the slaves file
6. Format HDFS (Hadoop Distributed File System)
7. Start Hadoop
8. Check Hadoop: HDFS at http://{namenode ip}:50070, job admin at http://{jobtracker ip}:50030
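Steps 6–8 map onto the standard Hadoop 0.20 commands, run from {hadoop dir} on the master (a sketch of the usual procedure, not the slides' exact commands):

    # 6. Format HDFS once, before the first start
    bin/hadoop namenode -format
    # 7. Start the HDFS and MapReduce daemons on every host listed in conf/slaves
    bin/start-all.sh
    # 8. Then browse http://{namenode ip}:50070 and http://{jobtracker ip}:50030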

12 Basic setup (crawler)
1. Check your web robot agent file
2. Set up the URL filter file
3. Set your seed URL file, by manual input or from a packaged URL list
(Some detailed setup steps are omitted here; a sketch follows below.)
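The slides do not name the crawler, but the workflow (robot agent file, URL filter, seed URLs, fetch lists) matches Apache Nutch, so here is a sketch assuming Nutch on this cluster; the agent name is invented for illustration:

    # 1. Web robot agent: set http.agent.name in conf/nutch-site.xml, e.g.
    #    <property><name>http.agent.name</name><value>it-lab-crawler</value></property>
    # 2. URL filter: edit conf/crawl-urlfilter.txt (or regex-urlfilter.txt),
    #    e.g. the pattern "+." accepts every URL
    # 3. Put the seed URL file into a directory on HDFS
    bin/hadoop fs -mkdir urls
    bin/hadoop fs -put seeds.txt urls/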

13 Flow chart
1. The user supplies the seed URLs and runs the crawl command as a Hadoop job.
2. The job's fragments are assigned to each TaskTracker, which goes and fetches the web data (map & reduce).
3. The results are stored in the output directory on HDFS: link log, new fetch list, doc. data, and fetch log.
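Under the Nutch assumption above, the "run crawl command" step would look like this (the depth and topN values are illustrative, not the ones used in the demo):

    # Submit the crawl as a Hadoop MapReduce job
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000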

14 Hadoop cluster – 1 node
 Machine 01: master (x.x.x.1) – NameNode, JobTracker, DataNode, TaskTracker

15 Hadoop cluster – 2 nodes
 Machine 01: master (x.x.x.1) – NameNode, JobTracker, DataNode, TaskTracker
 Machine 02: slave1 (x.x.x.2) – DataNode, TaskTracker

16 Hadoop cluster – 3 nodes
 Machine 01: master (x.x.x.1) – NameNode, JobTracker, DataNode, TaskTracker
 Machine 02: slave1 (x.x.x.2) – DataNode, TaskTracker
 Machine 03: slave2 (x.x.x.3) – DataNode, TaskTracker

17 URL set
 Get the URL package from http://dmoz.org/
 Select one URL out of every 500, ending up with around 10000 URLs
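Nutch ships a DMOZ parser that performs exactly this kind of subsampling; a sketch of how the 1-in-500 selection could be produced (assuming the Nutch tooling, which the slides do not confirm):

    # Fetch the DMOZ RDF dump and keep roughly one URL in every 500
    wget http://rdf.dmoz.org/rdf/content.rdf.u8.gz
    gunzip content.rdf.u8.gz
    bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 500 > seeds.txt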

18 Crawler input (seeds.txt)
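The seed file itself is plain text with one URL per line; the URLs below are placeholders, the real file held the roughly 9199 URLs sampled from DMOZ:

    http://www.example.com/
    http://www.example.org/
    ...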

19 Crawler output
 Output to HDFS
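Under the Nutch assumption, the output directory on HDFS can be inspected with the standard filesystem shell; the directory names follow Nutch's layout and line up with the flow chart's link log, fetch list, and doc. data:

    bin/hadoop fs -ls crawl
    #   crawl/crawldb   -- per-URL fetch status (fetch/link logs)
    #   crawl/linkdb    -- inverted link database
    #   crawl/segments  -- fetched page content, one subdirectory per fetch round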

20 Speed comparison
Hadoop job elapsed time (9199 URLs):
 –1 worker node: 1888 seconds
 –2 worker nodes: 1679 seconds
 –3 worker nodes: 1628 seconds
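From these numbers, the speedup over a single node is 1888/1679 ≈ 1.12 with two nodes and 1888/1628 ≈ 1.16 with three, well below linear; a plausible reading, not analyzed in the slides, is that fetch time is bounded by network latency and crawl politeness delays rather than by cluster compute.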

21 Thanks for your attention!!

