The Hadoop Distributed File System (HDFS) has been widely adopted to build cloud storage systems.
It provides reliable storage and high-throughput access to large-scale data by Map/Reduce parallel applications. Based on data access patterns, the data in HDFS is classified into three types:

Hot data: popular data that receives both a large number of concurrent accesses and a high access intensity.
Cold data: unpopular data that is rarely accessed.
Normal data: the rest.

Data replication has been widely used as a means of providing high performance, reliability and availability. The triplication policy has been favored in HDFS not only because it is easy to implement, but also for its high performance and reliability. However, it has two problems:

In a large and busy HDFS cluster, hot data may be accessed by many distributed clients concurrently. Replicating hot data on only three different nodes is not enough to avoid contention for the datanodes that store it.
The triplication policy carries a high management cost for cold data. Too many replicas may not significantly improve availability, but bring unnecessary expenditure instead. The management cost of cold data, including storage and network bandwidth, increases significantly with the number of replicas.

In view of these issues, we designed and implemented ERMS, an elastic replica management system for HDFS. ERMS introduces an active/standby storage model, takes advantage of a high-performance complex event processing (CEP) engine to distinguish the real-time data types, and brings in an elastic replication policy for the different types of data. ERMS uses Condor to increase the replication number for hot data on standby nodes, and to remove the extra replicas after the data cools down. Erasure codes can be used to save storage space and network bandwidth when the data becomes cold.
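The elastic replication policy described above can be illustrated with a small sketch. It is not the ERMS implementation: the names `DataType`, `ReplicaPlan` and `plan_for` are hypothetical, and the number of extra replicas for hot data (`HOT_EXTRA_REPLICAS = 2`) is an assumed value that the poster does not state. The cold-data case follows the configuration reported in the evaluation (a replication factor of one plus erasure coding).

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    HOT = "hot"
    NORMAL = "normal"
    COLD = "cold"


@dataclass(frozen=True)
class ReplicaPlan:
    replicas: int          # number of full copies to keep
    erasure_coded: bool    # whether parity blocks protect the data
    extra_on_standby: int  # copies beyond the default, held on standby nodes


DEFAULT_REPLICAS = 3     # the HDFS triplication policy
HOT_EXTRA_REPLICAS = 2   # assumption for illustration only, not from the poster


def plan_for(data_type: DataType) -> ReplicaPlan:
    """Map a data type to a replica plan (illustrative policy only)."""
    if data_type is DataType.HOT:
        # Hot data keeps the default copies and gains extra ones on standby nodes.
        return ReplicaPlan(DEFAULT_REPLICAS + HOT_EXTRA_REPLICAS, False,
                           HOT_EXTRA_REPLICAS)
    if data_type is DataType.COLD:
        # Cold data drops to a single copy and is protected by erasure coding.
        return ReplicaPlan(1, True, 0)
    # Normal data stays on plain triplication.
    return ReplicaPlan(DEFAULT_REPLICAS, False, 0)
```

Keeping the policy as a pure function of the data type separates the decision (what each type should look like) from the mechanism (which tool creates or removes replicas).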
INTRODUCTION
Elastic Replication Management System for HDFS

EXPERIMENTAL EVALUATION
We evaluated ERMS in a private cluster of commodity computers with one namenode and fifteen datanodes (ten active nodes and five standby nodes). We implemented ERMS on Hadoop-20, Facebook's real-time distributed Hadoop, modifying the replica placement mechanism and adding configuration parameters to suit ERMS.

CONCLUSION
We present the design and implementation of ERMS, an elastic replica management system for HDFS that seeks to increase data locality by replicating the hot data while keeping a minimum number of replicas for the cold data. ERMS dynamically adapts to changes in data access patterns and data popularity, and imposes a low network overhead. The active/standby storage model and replica placement strategy used by ERMS can enhance the reliability and availability of data. In the future, we plan to:
investigate more effective solutions to detect and predict the real-time data types;
evaluate ERMS in real cloud systems, which are provided by Tencent and HuaWei.

CONTACT
Author: Zhendong Cheng
Sino-German Joint Software Institute, BeiHang University (BUAA)
No.37 XueYuan Road, HaiDian District, Beijing, P.R.China

ERMS: Elastic Replication Management System for HDFS
Zhendong Cheng, Zhongzhi Luan, You Meng, Depei Qian, Alain Roy, Gang Guan
Sino-German Joint Software Institute, Beihang University, China
University of Wisconsin–Madison, USA
Tencent Research, China

ACKNOWLEDGMENT
This work was partially supported by the National High Technology Research and Development Program ("863" Program) of China under grant No. 2011AA01A203.

System Architecture
ERMS introduces an active/standby storage model. This model classifies the storage nodes into two types: active nodes and standby nodes. Standby nodes might be better than active nodes when the active nodes are heavily used. The standby nodes store only the extra replicas of hot data.
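A minimal sketch of the active/standby idea follows, assuming a simple in-memory map from block IDs to the nodes that hold them. The function names, the block ID `blk_1`, the node names and the "first free standby node" selection rule are all hypothetical. The poster does not describe how ERMS picks a standby node, and the real system works through HDFS and Condor, not a Python dictionary.

```python
from typing import Dict, List, Set


def add_hot_replicas(block: str, placement: Dict[str, Set[str]],
                     standby_nodes: List[str], extra: int) -> List[str]:
    """Place up to `extra` additional replicas of `block` on standby nodes."""
    holders = placement.setdefault(block, set())
    chosen: List[str] = []
    for node in standby_nodes:
        if len(chosen) == extra:
            break
        if node not in holders:   # never put two copies on the same node
            holders.add(node)
            chosen.append(node)
    return chosen


def drop_standby_replicas(block: str, placement: Dict[str, Set[str]],
                          standby_nodes: List[str]) -> List[str]:
    """Remove the standby copies of `block` once the data has cooled down."""
    holders = placement.get(block, set())
    removed = sorted(holders.intersection(standby_nodes))
    holders.difference_update(removed)
    return removed


# Usage with the cluster shape from the evaluation: ten active, five standby.
active = [f"a{i}" for i in range(1, 11)]
standby = [f"s{i}" for i in range(1, 6)]
placement = {"blk_1": {"a1", "a2", "a3"}}          # default triplication
added = add_hot_replicas("blk_1", placement, standby, extra=2)
removed = drop_standby_replicas("blk_1", placement, standby)
```

Because standby nodes hold only extra replicas, cooling a block down never touches the copies on active nodes, so the default replicas remain intact throughout.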
HDFS is the basic storage appliance. The Data Judge Module obtains system metrics from the HDFS cluster and uses CEP to distinguish the current data types in real time. According to the type of data, the ERMS manager schedules the replication manager tool and the erasure coding tool to manage the replicas. Condor would be a good choice for the manager. The architecture of ERMS is shown in Fig. 1. ERMS automatically manages the replication number and the replica placement strategy in HDFS clusters.

Figure 1: System Architecture of ERMS

Active/Standby Storage Model
Figure 2: Active/Standby Storage Model

The time window is one of the major features of CEP systems. ERMS uses CEP to analyze HDFS audit logs and determine the data types in HDFS. Using the CEP time window t_w, ERMS obtains the number of concurrent accesses τ within t_w and then distinguishes the real-time data types: data is hot if τ is higher than τ_M and cold if τ is lower than τ_m.

We run jobs synthesized with SWIM, which provides a one-month job trace and replay scripts from a Facebook 3000-machine production cluster. We evaluate data locality and average reading throughput of these jobs under different thresholds (τ_M1 > τ_M2 > τ_M3). Data locality and reading throughput are two critical performance metrics for HDFS, and better data locality reduces pressure on the network fabric. The results show that ERMS effectively improves data locality and reading throughput, as shown in Fig. 3. The threshold τ_M is also an important parameter: it is a tradeoff between system performance and storage cost.

We also experiment with erasure codes. Cold data, whose number of concurrent accesses τ is lower than τ_m, is encoded with Reed-Solomon codes, using a replication factor of one and four coding parities. The results show that these erasure codes do not hurt data reliability and reduce storage overhead.
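The time-window classification can be sketched in plain Python. ERMS itself uses a CEP engine over HDFS audit logs, so this is only a stand-in for the idea, not the actual Data Judge Module. The class `AccessWindow`, its method names, and the threshold values in the usage lines are hypothetical. The sketch assumes timestamps arrive in non-decreasing order and treats τ as the number of accesses to a path within the last t_w seconds.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class AccessWindow:
    """Counts accesses per path in a sliding window and labels the path."""

    def __init__(self, window_seconds: float, tau_max: int, tau_min: int) -> None:
        if tau_min > tau_max:
            raise ValueError("tau_min must not exceed tau_max")
        self.window_seconds = window_seconds   # t_w
        self.tau_max = tau_max                 # tau_M: above this, data is hot
        self.tau_min = tau_min                 # tau_m: below this, data is cold
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, path: str, timestamp: float) -> None:
        """Register one access, e.g. one parsed audit-log entry."""
        self._events[path].append(timestamp)

    def count(self, path: str, now: float) -> int:
        """Return tau: accesses to `path` in the interval (now - t_w, now]."""
        events = self._events[path]
        while events and events[0] <= now - self.window_seconds:
            events.popleft()                   # expire accesses outside the window
        return len(events)

    def classify(self, path: str, now: float) -> str:
        tau = self.count(path, now)
        if tau > self.tau_max:
            return "hot"
        if tau < self.tau_min:
            return "cold"
        return "normal"


# Usage with made-up thresholds: t_w = 60 s, tau_M = 3, tau_m = 1.
window = AccessWindow(window_seconds=60, tau_max=3, tau_min=1)
for t in (0, 10, 20, 30):
    window.record("/data/a", t)
label_busy = window.classify("/data/a", now=30)    # four accesses in the window
label_idle = window.classify("/data/a", now=200)   # every access has expired
```

Lowering `tau_max` marks more data as hot and therefore creates more replicas, which is the performance versus storage-cost tradeoff noted for τ_M.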
Figure 3: The Performance of ERMS
Figure 4: CDF of Data Accessing and Storage utilization
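The storage saving from the erasure-coding experiment can be estimated with a short calculation. The poster gives the replication factor (one) and the number of parities (four) but not the stripe length, so the value of ten data blocks per stripe used here is an assumption for illustration only. The function name `erasure_overhead` is likewise hypothetical.

```python
def erasure_overhead(data_blocks: int, parity_blocks: int, replicas: int = 1) -> float:
    """Stored bytes per byte of user data for one erasure-coded stripe.

    Each stripe stores `replicas` copies of its data blocks plus the parity
    blocks, so the ratio is (replicas * data + parity) / data.
    """
    if data_blocks <= 0:
        raise ValueError("data_blocks must be positive")
    return (replicas * data_blocks + parity_blocks) / data_blocks


ASSUMED_STRIPE_LENGTH = 10   # assumption: the poster does not give this value

triplication = 3.0                                           # three full copies
coded = erasure_overhead(ASSUMED_STRIPE_LENGTH, parity_blocks=4)
saving = 1 - coded / triplication                            # fraction of raw storage saved
```

Under this assumed stripe length the coded layout stores 1.4 bytes per byte of data instead of 3, while a Reed-Solomon stripe with four parities can still be rebuilt after losing any four of its blocks.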

