ETRI Site Introduction Han Namgoong, 2009. 6. 8.


ETRI
- Government-sponsored research institute
- 3,000 staff, 500M USD (year 2009)
- Focus on technologies for broadcasting, software and content, IT convergence, and convergence components and materials

ETRI Cluster Topology (1/2)

ETRI Cluster Topology (2/2)
Diagram labels, in the order they appear on the slide:
- Server Pool: +Agent, +10,000 nodes
- Monitoring, +Provisioning Proxy, +DHCP, +Agents, +256 nodes
- +Provisioning Server, +LVS, +DB, +40 Group Masters
- Cluster Master, Database Master, File System Master, Group Master
- +Global Service Dispatcher, +Disaster Recovery, +100 Data Centers
- Distributed Processing Master

ETRI Cluster Software Stack and Services ( )
- Video-based Internet Application Services: UGC Search Service, IPTV Service, e-Learning Service
- Internet Services Common Components
  - Large Scale Parallel Processing: Job Partition and Merge, Distributed Job Scheduling
  - Large Scale Data Mgmt. Service: Data Management, Distributed Data Store, Data Access and Recovery
  - Global File System: File Metadata Management, File Store and Replication, File Remote Backup/Archiving
- Platform OS and HW: Low Power OS, Node Manager, Low Power HW
- Video Management Components: Production, Tagging, Store, Retrieval, Delivery
- Security: Device/kernel authen., User/service authen.
- Cluster Management: Cluster Orchestration, Provisioning, Resources Monitoring, Service Mgmt.

Research Topics ( )
1. Monitoring Tool for Large Cluster System
- Current monitoring software imposes heavy CPU/memory overhead
-> A small, lightweight monitoring tool
2. Management of Big Video Feature Data
- Google YouTube (2006): 70,000 uploads per day, 100 million plays per day
- Keyword-based retrieval (vague, imprecise, ...)
- Content-based retrieval (interface is not simple, results are slow)
-> Integrated query (keyword + content based)
3. Elimination of Duplicated Video Data
- Many identical video files occupy storage space.
-> File (NOT data) deduplication is strongly required.
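The file-level deduplication named in topic 3 could be sketched roughly as below. This is only an illustration under stated assumptions, not ETRI's method: the function names `file_digest` and `find_duplicate_files` are hypothetical, and SHA-256 over whole files is an assumed choice for deciding that two files are identical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Return groups of files under root that have identical content.

    Hypothetical helper: files are first bucketed by size so that only
    files sharing a size are hashed, then grouped by digest.
    """
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for p in paths:
            by_digest[file_digest(p)].append(p)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Each returned group lists paths with the same content, so a store could keep one copy and replace the rest with references. Whole-file hashing matches the slide's emphasis on file deduplication rather than block-level data deduplication.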

Schedule ( )
1. Phase 1 : ~
Cloud stack (OSS) for evaluation
- System management/monitoring tool
- Middleware (Web/AP/DB server)
- Linux (CentOS, ...)
- Virtualization (Xen, KVM)
- Distributed file system/DB (Hadoop, HBase)
- Authentication (OpenLDAP)
Evaluation points
- Error recovery procedure, configuration, structure
- Adding resources (planned, unexpected)
- Removing resources as load decreases, and migration
- Overhead of virtualization, distributed file system, distributed DB
- Authentication between systems
Source: Tomomi Suzuki, Status report of Cloud Computing activity, Japan OSS Promotion Forum,

Schedule ( )
2. Phase 2 : ~
Selection of requirements
Development, test and deployment
- Monitoring Tool for Large Cluster System
- Management of Big Video Feature Data
- Elimination of Duplicated Video Data
Distributed data management based on Hadoop/HBase
- Multi-dimensional map model
- Support for a composite row key
- Column-group-based storage model
- Distribution of partitions split by a composite row key
- Data access control through user and privilege management
- ...
Distributed processing
- Fail-over of task execution nodes and job management nodes
- Distributed task processing based on data location
- Configurable job scheduling: 9 policies
- ...
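To make the idea of distributing partitions split by a composite row key concrete, here is a minimal sketch. It assumes, purely for illustration, that a composite key is modeled as a tuple of fields and that partitions are contiguous key ranges; `make_row_key` and `RangePartitioner` are hypothetical names, not part of HBase or of the system described in the slides.

```python
import bisect


def make_row_key(*fields):
    """Build a composite row key as a tuple; tuples compare field by field."""
    return tuple(fields)


class RangePartitioner:
    """Map a composite row key to a partition index by sorted split points.

    Partition i holds keys k with split_keys[i-1] <= k < split_keys[i];
    partition 0 holds everything below the first split key.
    """

    def __init__(self, split_keys):
        self.split_keys = sorted(split_keys)

    def partition_for(self, row_key):
        # bisect_right counts how many split keys are <= row_key.
        return bisect.bisect_right(self.split_keys, row_key)


# Example: keys made of (channel, upload sequence number), three partitions.
partitioner = RangePartitioner([
    make_row_key("channel-b", 0),
    make_row_key("channel-c", 0),
])
partition = partitioner.partition_for(make_row_key("channel-b", 999))
```

Because the fields are compared in order, rows that share a leading field stay adjacent, which is what lets a range of composite keys be handed to one node as a unit.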

Plans, Expectations (1/3)
Columns: Category | Hadoop/MapReduce | What | Expectations

Parallel Processing Model
- Hadoop/MapReduce: Map/Reduce programming model; I/O source: HDFS, LFS, HBase
- What: Map/Reduce programming model; I/O source: + new-FS, new-DB
- Expectations: Enlargement of parallel processing target

Cluster Size
- Hadoop/MapReduce: Thousand nodes; manually configured
- What: Thousand nodes; automatically configured
- Expectations: Easy to manage parallel processing cluster

Job Control
- Hadoop/MapReduce: None
- What: Execution control based on user
- Expectations: Access control to parallel processing cluster

Job Scheduling
- Hadoop/MapReduce: Direct priority, FIFO; priority management by job
- What: 9 configurable scheduling policies; priority management by job, group and user
- Expectations: Support of various jobs

Task Distribution
- Hadoop/MapReduce: Consideration of data location and node position
- What: Consideration of data location, node position and node resource
- Expectations: Increase of node utilization

High Availability
- Hadoop/MapReduce: Fail-over of task execution node
- What: Fail-over of job management node
- Expectations: Increased availability; reduction of job execution time
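The task distribution row (data location, node position and node resource) might be approximated as in the sketch below. The scoring order is an assumption made for illustration: prefer a node that holds the data, then a node in the same rack as a replica, then any node, breaking ties by free task slots. `Node` and `pick_node` are hypothetical names, not APIs of Hadoop or of the system in the slides.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str        # stands in for "node position"
    free_slots: int  # stands in for "node resource"


def pick_node(nodes, replica_nodes, replica_racks):
    """Choose a node for a task whose input lives on replica_nodes.

    Returns None when no node has a free slot.
    """
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None

    def score(n):
        if n.name in replica_nodes:
            locality = 2   # data is local to the node
        elif n.rack in replica_racks:
            locality = 1   # data is in the same rack
        else:
            locality = 0   # data must cross racks
        return (locality, n.free_slots)

    return max(candidates, key=score)


nodes = [
    Node("a", "rack1", 0),  # holds the data but is fully busy
    Node("b", "rack1", 1),
    Node("c", "rack2", 4),
]
chosen = pick_node(nodes, replica_nodes={"a"}, replica_racks={"rack1"})
```

In this example the busy data-local node is skipped and the same-rack node is chosen over the idle remote one, which is the trade-off between locality and utilization that the table's "node resource" column adds.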

Plans, Expectations (2/3)
Columns: Category | HBase | What | Expectations

Data Model
- HBase: Multi-dimensional map; row key: single field
- What: Multi-dimensional map; row key: composite field
- Expectations: Easy to construct key

Video Management
- HBase: None
- What: High-dimensional index management; k-NN search
- Expectations: Provide large-scale content-based video retrieval

Data Storage Model
- HBase: Column oriented; per column
- What: Column oriented; per column group
- Expectations: Performance enhancement

Data Distribution
- HBase: Distribute partitions split by row key
- What: Distribute clusters by high-dimensional index
- Expectations: Performance enhancement of key-based/content-based retrieval

Access Control
- HBase: None
- What: User management; privilege management of table/column
- Expectations: Provide data security

High Availability
- HBase: Fail-over of partition management node -> serial processing of log file and parallel recovery
- What: Fail-over of partition management node -> parallel processing of log file and parallel recovery; fail-over of master node
- Expectations: Increased availability; reduction of down time

Query Language
- HBase: Use in shell
- What: Use in application
- Expectations: Easy to develop application
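As a point of reference for the k-NN search row, the sketch below does a brute-force nearest-neighbor lookup over video feature vectors. It is deliberately not the high-dimensional index the slide plans: it scans every vector, which is exactly the cost such an index is meant to avoid. The name `knn_search`, the dictionary layout and the Euclidean distance are assumptions for illustration.

```python
import heapq
import math


def knn_search(query, features, k):
    """Return the k (distance, video_id) pairs nearest to query.

    features maps a video id to its feature vector; every vector must
    have the same dimension as query.
    """
    scored = ((math.dist(query, vec), vid) for vid, vec in features.items())
    return heapq.nsmallest(k, scored)


# Toy two-dimensional features; real video features would be much longer.
features = {
    "v1": (0.0, 0.0),
    "v2": (1.0, 0.0),
    "v3": (5.0, 5.0),
}
nearest = knn_search((0.9, 0.0), features, k=2)
nearest_ids = [vid for _, vid in nearest]
```

A content-based query would compute the feature vector of the query video and call a search of this kind; an integrated query could first narrow `features` by keyword and then rank the remainder by distance.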

Plans, Expectations (3/3)
Columns: Category | Function | What | OSCAR

Cluster Orchestration
- Structure: What: Hierarchical; OSCAR: Flat
- Scalability: What: Automatic reconfiguration; OSCAR: PXE + DHCP
- Availability: What: Independent HA tool; OSCAR: Active-active (2 head nodes)
- Management Interface: What: Web; OSCAR: X-GUI, command (C3)
- Communication: What: XML; OSCAR: XDR, XML
- IP Management: What: Server configuration; OSCAR: DHCP auto/static
- Maximum Nodes: What: 10,000 per data center / max. 1,000,000; OSCAR: 440
- Load Balancing: What: Front-end LVS, back-end new-DP; OSCAR: Front-end PBS, TORQUE, MAUI

Service Management
- Node Reconfiguration by Load Balancing: What: Yes; OSCAR: None

Master Management
- Master Node Configuration: What: Hierarchy (Key Master, Cluster Master, Group Master); OSCAR: Head node

Resources Monitoring
- Monitoring Tool: What: Proprietary; OSCAR: Ganglia

Provisioning
- Provisioning (image): OS imaging
- Provisioning (streaming): What: SW streaming; OSCAR: SW tar/rpm

Thank you