Learning Google foster@hf.webex.com.

1 Learning Google

2 Reference http://zh.wikipedia.org/zh/MapReduce
Engineering Case Study training by Peter Xiao and Stanley Huang:

3 Agenda
- Google Overview
- Data Center
- Google File System
- MapReduce
- BigTable
- Google App Engine Demo

4 Google Overview
Why the name Google: a misspelling of "googol", the mathematical term for a 1 followed by 100 zeros.
Mission: organize the world’s information and make it universally accessible and useful.
Infrastructure: a three-layer stack.

5 Google Data Center - Overview
A total of 35 data centers globally: 19 in the US, 12 in Europe, 3 in Asia, 1 in South America, 1 in Russia.

6 Google Data Center – Cost and Scale
According to Google’s earnings reports: $1.9 billion for 2006 and $2.4 billion for 2007.
4 data centers proposed in 2007, each costing $600 million.
Power consumption: megawatts per major data center.
Size: no standard physical size. Google’s data center at The Dalles, Oregon has:
- a 2,000-square-foot administration building
- a 1,600-square-foot “transient employee dormitory”
- a 1,800-square-foot facility for cooling towers
(Estimated power consumption is 103 megawatts.)

7 Google Data Center – Interior & Exterior
Google data center at The Dalles, Oregon

8 Google Data Center – Hardware and Software
Google customizes commodity hardware to minimize energy consumption:
- Google web servers
- Google Ethernet switches
Google builds in-house software to achieve high performance and scalability:
- Google Web Server (GWS)
- Google Front End (GFE)
- Google File System (GFS)
- Google MapReduce
- Google BigTable

9 Google Data Center – Service Availability
All 32 data centers reached four nines (99.99%) of uptime.
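Four nines of uptime means 99.99% availability. To see what that buys, the sketch below converts a number of nines into the allowed downtime per year, assuming a 365-day year and ignoring planned maintenance.

```python
# Convert "N nines" of availability into allowed downtime per year.
# Assumption: a 365-day year; leap years and maintenance windows are ignored.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability(nines: int) -> float:
    """Availability fraction for a number of nines, e.g. 4 -> 0.9999."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum minutes of downtime per year allowed at the given number of nines."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines = {availability(n):.3%} -> {downtime_minutes_per_year(n):.1f} min/year")
```

At four nines the budget is about 52.6 minutes of downtime per year; each additional nine divides it by ten.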

10 Google Data Center vs Cisco/CSG’s

11 Architecture Foundation - overview
The architecture serves the mission: organize the world’s information and make it universally accessible and useful.
- Storage for raw data: Google File System, distributed on thousands of machines in tens of data centers
- Backend computing: MapReduce
- K-V / relational data storage: BigTable

12 Architecture Foundation - GFS
Background: typical ways to store persistent file data are local disk, NFS, and storage systems.
GFS is a scalable (~100 TB) distributed file system built on top of thousands of machines.
Goal: performance and scalability for large files and concurrent access.
Workflow
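To make the GFS idea more concrete, here is a toy sketch of the read path as described in the published GFS paper: files are split into fixed 64 MB chunks, each chunk is replicated (3 copies by default) on chunkservers, and a single master keeps only metadata. The names `ToyMaster`, `allocate`, and `locate`, and the chunkserver names, are hypothetical; this is an illustration under those assumptions, not Google's interface.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed-size chunks, as in the GFS paper
REPLICATION = 3                # default number of replicas per chunk


@dataclass
class ToyMaster:
    """Hypothetical in-memory stand-in for the GFS master's metadata."""
    # (file path, chunk index) -> (chunk handle, list of chunkserver names)
    chunks: dict = field(default_factory=dict)
    next_handle: int = 0

    def allocate(self, path: str, index: int, servers: list) -> int:
        """Assign a new chunk handle and up to REPLICATION chunkservers to one chunk."""
        handle = self.next_handle
        self.next_handle += 1
        self.chunks[(path, index)] = (handle, servers[:REPLICATION])
        return handle

    def lookup(self, path: str, index: int):
        """Return (chunk handle, replica locations) for one chunk of a file."""
        return self.chunks[(path, index)]


def locate(master: ToyMaster, path: str, offset: int):
    """Client side: turn a byte offset into a chunk index, then ask the master where it lives."""
    index = offset // CHUNK_SIZE
    handle, replicas = master.lookup(path, index)
    return index, offset % CHUNK_SIZE, handle, replicas


if __name__ == "__main__":
    master = ToyMaster()
    servers = ["cs1", "cs2", "cs3", "cs4"]
    master.allocate("/logs/web.log", 0, servers)
    master.allocate("/logs/web.log", 1, servers[1:])
    # A read at byte 70 MB falls in chunk 1, 6 MB into that chunk.
    print(locate(master, "/logs/web.log", 70 * 1024 * 1024))
```

The client asks the master only for metadata; the bytes themselves would then be read directly from one of the returned chunkservers, which keeps the master off the data path.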

13 Architecture Foundation – GFS –Cont’
Dive in further:
- Hadoop: Java-based open-source software for reliable, scalable, distributed computing.
- HDFS: Hadoop Distributed File System, a distributed file system similar to GFS.
- GFS is good at handling large files with append-only writes (no random edits) and heavy reads.
- KFS (Kosmos Distributed File System): an open-source distributed file system similar to GFS and Hadoop's HDFS.
Where we are: the distributed storage concept has been used in the Queue & Dispatch service (WAPI 2.0) and Search Farm.
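The append-heavy access pattern can be illustrated with a toy in-memory file whose only write operation is `append`, which returns the offset where the record landed. The `AppendOnlyFile` class is hypothetical and much simpler than GFS record append: it has no replication, no concurrency control, and no persistence.

```python
class AppendOnlyFile:
    """Toy append-only file: data is only ever added at the end.

    There is deliberately no method that overwrites existing bytes,
    mirroring the "append, no random editing" workload. Not GFS's interface.
    """

    def __init__(self) -> None:
        self._data = bytearray()

    def append(self, record: bytes) -> int:
        """Append a record and return the offset where it was written."""
        offset = len(self._data)
        self._data.extend(record)
        return offset

    def read(self, offset: int, length: int) -> bytes:
        """Read up to `length` bytes starting at `offset`."""
        return bytes(self._data[offset:offset + length])

    def size(self) -> int:
        """Current length of the file in bytes."""
        return len(self._data)
```

Because bytes already written never change, a reader holding an offset returned by `append` always gets back exactly the record that was stored there.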

14 Architecture Foundation - MapReduce
Background: typical ways to do computing are on the local CPU, or with parallel computing at the application level.
MapReduce is a programming model for processing large data sets by distributing the work.
Goal: compute over terabytes of data on thousands of machines for performance.
Examples: Google PageRank; counting the occurrences of every word in a 1 TB file.
Workflow
Pseudocode:
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");
reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
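The word-count pseudocode can be simulated in a single Python process to show the three phases: map emits `(word, 1)` pairs, a shuffle step groups them by key, and reduce sums each group. This is a sketch assuming whitespace tokenization and integer counts in place of the string `"1"` plus `ParseInt` in the pseudocode; a real MapReduce runs the map and reduce tasks on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in the document (doc_name is unused, as in the pseudocode)."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce: sum all partial counts for one word."""
    return sum(counts)


def run_mapreduce(documents: Dict[str, str]) -> Dict[str, int]:
    """Single-process simulation of map -> shuffle (group by key) -> reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name, contents in documents.items():      # map phase
        for word, count in map_fn(name, contents):
            groups[word].append(count)             # shuffle: group values by key
    # reduce phase: one call per distinct key
    return {word: reduce_fn(word, counts) for word, counts in groups.items()}


if __name__ == "__main__":
    docs = {"a.txt": "the quick brown fox", "b.txt": "the lazy dog the end"}
    print(run_mapreduce(docs))
```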

15 Architecture Foundation – MR – cont’
Dive in further:
- MapReduce is good for simple, large-scale computing jobs.
- Hadoop MapReduce provides similar functionality.
Where we are:
- Some computing happens in the Oracle DB layer.
- Much computing happens in parallel in the application layer, for example Search Farm, Activity Server, etc.

16 Architecture Foundation – BigTable
Background: BigTable is a scalable, distributed, multi-dimensional K-V store.
Goal: high-performance one-way lookups over large data volumes.
Example: Google Earth, which retrieves all geographic data for a given location.
Workflow
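BigTable's data model can be pictured as a sorted map from (row key, column key, timestamp) to a value. The toy class below is hypothetical, in-memory, and not the Bigtable client API: it keeps row keys sorted so that a range of related rows can be scanned, and returns the newest version of a cell by default. The location-prefixed `geo/...` row keys in the usage are an assumed scheme for a Google Earth-style lookup.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyBigTable:
    """Toy model of the BigTable data model: (row, column, timestamp) -> value."""

    def __init__(self) -> None:
        self._rows: List[str] = []  # row keys, kept in sorted order
        self._cells: Dict[Tuple[str, str], Dict[int, str]] = {}  # (row, column) -> {timestamp: value}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        """Write one versioned cell, inserting the row key in sorted position if new."""
        i = bisect.bisect_left(self._rows, row)
        if i == len(self._rows) or self._rows[i] != row:
            self._rows.insert(i, row)
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the most recent version of a cell, or None if it does not exist."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row: str, end_row: str) -> List[str]:
        """Return the row keys in [start_row, end_row), in sorted order."""
        lo = bisect.bisect_left(self._rows, start_row)
        hi = bisect.bisect_left(self._rows, end_row)
        return self._rows[lo:hi]


if __name__ == "__main__":
    t = ToyBigTable()
    t.put("geo/us/or/portland", "imagery:tile", 1, "v1")
    t.put("geo/us/or/portland", "imagery:tile", 2, "v2")
    t.put("geo/us/or/the-dalles", "imagery:tile", 1, "x")
    t.put("geo/us/wa/seattle", "imagery:tile", 1, "y")
    print(t.get("geo/us/or/portland", "imagery:tile"))  # newest version
    print(t.scan("geo/us/or/", "geo/us/or0"))            # all rows under geo/us/or/
```

Sorting rows by key is what makes a location prefix useful: all data for one region sits in a contiguous key range.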

17 Architecture Foundation – BigTable – cont’
Dive in further:
- Cassandra is an open-source implementation of the BigTable concept.
- Data store design is all about CAP (Consistency, Availability, Partition tolerance). Which one do you focus on? It depends on the value and user experience you are trying to provide.
Where we are:
- Cassandra, with modifications, has been used in WAPI 2.0 for User Wall/Feed.
- Memcached is used in WAPI 2.0.
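One concrete way the consistency/availability trade-off shows up in Cassandra is tunable consistency: with replication factor N, a read that waits for R replicas and a write that waits for W replicas must overlap on at least one replica when R + W > N. The helper functions below are a hypothetical sketch of that arithmetic under a simplified model (no failed or concurrent writes), not part of any Cassandra API.

```python
def quorum(n: int) -> int:
    """Majority of n replicas, e.g. 2 of 3 (how a single-datacenter QUORUM is sized)."""
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every set of r read replicas must intersect every set of w write replicas."""
    return r + w > n


def tolerable_failures(n: int, needed: int) -> int:
    """How many replicas can be down while an operation needing `needed` acks still succeeds."""
    return n - needed


if __name__ == "__main__":
    n = 3
    for label, r, w in [("QUORUM/QUORUM", quorum(n), quorum(n)), ("ONE/ONE", 1, 1)]:
        print(label, "overlap:", reads_see_latest_write(n, r, w),
              "read survives", tolerable_failures(n, r), "replica(s) down")
```

With N = 3, QUORUM reads and writes (R = W = 2) overlap and each still succeeds with one replica down; ONE/ONE (R = W = 1) keeps working with two replicas down but may return stale data. Picking between them is picking which CAP property to favor.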

18 Google App Engine – Development and deployment demo
An experienced developer can develop and deploy a “Hello World” application to App Engine within 1-2 hours:
- Create an App Engine account
- Download the App Engine SDK, or Eclipse with the App Engine plug-in
- Develop the “Hello World” application
- Deploy the application
- Access the “Hello World” application via
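For the “Hello World” step, a minimal application can be written with only the standard library as a WSGI callable. This sketch assumes the App Engine Python 3 standard environment, which, as we understand its defaults, serves a WSGI object named `app` from `main.py` when no custom entrypoint is configured; the `app.yaml` declaring the Python runtime, which deployment also requires, is not shown.

```python
# main.py: a minimal "Hello World" WSGI application (standard library only).
# Assumption: App Engine's Python 3 runtime looks for a WSGI object named `app` here.


def app(environ, start_response):
    """Respond to every request with a plain-text greeting."""
    body = b"Hello World"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because `app` is a plain WSGI callable, it can also be run locally with `wsgiref.simple_server.make_server` before deploying.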

19 Thanks!

