ETRI Site Introduction


ETRI Site Introduction 2009. 6. 8 Han Namgoong, nghan@etri.re.kr

ETRI
Government-sponsored research institute
 - 3,000 staff, 500M USD (year 2009)
 - Focus on technologies of broadcasting, software and contents, IT convergence, and convergence components and materials

ETRI Cluster Topology (1/2)

ETRI Cluster Topology (2/2)
Diagram labels (as extracted): Key Masters Server Pool +Agent +10,000 Nodes Monitoring +Provisioning Proxy +DHCP +Agents +256 nodes +Provisioning Server +LVS +DB +40 Group Masters Cluster Master Database File System Group +Global Service Dispatcher +Disaster Recovery +100 Data Centers Distributed Processing

ETRI Cluster Software Stack and Services (2009.6.8)
Diagram labels (as extracted): Video based Internet Application Services UGC Search Service IPTV Service e-Learning Service Internet Services Common Components Video Management Components Security Cluster Management Production Tagging Store Retrieval Delivery Large Scale Parallel Processing Large Scale Data Mgmt. User/service authen. Cluster Orchestration Service Data Management Job Partition and Merge Distributed Data Store Device/kernel authen. Distributed Job Scheduling Provisioning Data Access and Recovery Global File System Service Mgmt. File Metadata Management File Store and Replication File Remote Backup/Archiving Resources Monitoring Platform OS and HW Low Power OS Node Manager Low Power HW

Research Topics (2009.6.8)
1. Monitoring Tool for Large Cluster System
 - Current monitoring SW -> heavy CPU/memory overhead -> small/light monitoring tool
2. Management of Big Video Feature Data
 - Google YouTube (2006) * Upload : 70,000 per day, Viewing : 100 million plays per day
 - Keyword based Retrieval (vague, imprecise, ..)
 - Content based Retrieval (not simple interface/slow result) -> Integrated Query (Keyword + Content based)
3. Elimination of Duplicated Video Data
 - Many identical video files occupy storage space. -> File (NOT data) deduplication is strongly required.
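The file-level deduplication named in topic 3 can be sketched as follows. This is a minimal illustration, not ETRI's implementation: it assumes that two files count as duplicates when their full contents have the same SHA-256 digest, and the function names `file_digest` and `find_duplicate_files` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a whole file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Group files under root by content digest (hypothetical helper).

    Returns a list of groups; each group is a list of paths whose
    contents are byte-for-byte identical.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Hashing whole files matches the slide's "File (NOT data)" wording: only complete identical files are detected, not shared blocks inside different files.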

Schedule (2009.6.8)
1. Phase 1 : 2009.9.1 ~ 2010.2.28
Cloud stack (OSS) for evaluation
 - System management/Monitoring tool
 - Middleware (Web/AP/DB server)
 - Linux (CentOS, ..)
 - Virtualization (Xen, KVM)
 - Distributed file-system/DB (Hadoop, HBase)
 - Authentication (OpenLDAP)
Evaluation point
 - Error recovery procedure, configuration, structure
 - Add resource (planned, unexpected)
 - Remove resource when load decreases, and migration
 - Overhead of virtualization, distributed file-system, distributed DB
 - Authentication between systems
Source : Tomomi Suzuki, Status report of Cloud Computing activity, Japan OSS Promotion Forum, 2009.6.4

Schedule (2009.6.8)
2. Phase 2 : 2010.3.1 ~ 2011.2.28
Selection of Requirements
Develop, Test and Deployment
 - Monitoring Tool for Large Cluster System
 - Management of Big Video Feature Data
 - Elimination of Duplicated Video Data
Distributed Processing
 - Fail-over of task execution node and job management node
 - Distributed task processing based on data location
 - Configurable job scheduling : 9 policies
 ……..
Distributed Data Management based on Hadoop/HBase
 - Multi dimensional map model
 - Support a composite row key
 - Column group based storage model
 - Distribute partitions split by a composite row key
 - Data access control by user and privilege management
 ………
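The composite row key and the key-based partitioning listed above can be sketched as follows. This is an illustration under stated assumptions, not the planned design: the two key fields (`channel_id`, `upload_ts`), the fixed-width big-endian encoding, and the function names are all hypothetical.

```python
import bisect
import struct


def encode_row_key(channel_id: int, upload_ts: int) -> bytes:
    """Encode a two-field composite key (hypothetical fields).

    Big-endian unsigned integers are used so that comparing the
    encoded bytes gives the same order as comparing the
    (channel_id, upload_ts) tuples.
    """
    return struct.pack(">IQ", channel_id, upload_ts)


def partition_for(key: bytes, split_points: list) -> int:
    """Return the index of the partition that holds key.

    split_points is a sorted list of encoded keys; partition i holds
    keys in [split_points[i-1], split_points[i]).
    """
    return bisect.bisect_right(split_points, key)
```

Because the encoding preserves sort order, all rows of one `channel_id` stay contiguous and a partition boundary can be expressed as a single encoded key.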

Plans, Expectations (1/3)
Columns per category: Hadoop/MapReduce, What (ETRI plan), Expectations

Parallel Processing Model
 Hadoop/MapReduce: Map/Reduce Programming Model; I/O Source : HDFS, LFS, HBase
 What: Map/Reduce Programming Model; I/O Source : + new-FS, new-DB
 Expectations: Enlargement of parallel processing target
Cluster Size
 Hadoop/MapReduce: Thousand nodes; Manually configure
 What: Thousand nodes; Automatically configure
 Expectations: Easy to manage parallel processing cluster
Job Control
 Hadoop/MapReduce: None
 What: Execution control based on user
 Expectations: Access control to parallel processing cluster
Job Scheduling
 Hadoop/MapReduce: Direct Priority, FIFO; Priority management by job
 What: 9 configurable scheduling policies; Priority management by job, group and user
 Expectations: Support of various jobs
Task Distribution
 Hadoop/MapReduce: Consideration of data location and node position
 What: Consideration of data location, node position and node resource
 Expectations: Increase of node utilization
High Availability
 Hadoop/MapReduce: Fail-over of task execution node
 What: Fail-over of task execution node; Fail-over of job management node
 Expectations: Increase availability; Reduction of job execution time
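The Task Distribution entry (data location, node position and node resource) can be sketched as a simple node-selection rule. This is a hedged illustration only: the `Node` type, the `pick_node` function, the use of a rack name for "node position" and of free task slots for "node resource" are assumptions, not details given in the slides.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str          # assumed stand-in for "node position"
    free_slots: int    # assumed stand-in for "node resource"


def pick_node(nodes, replica_nodes, replica_racks):
    """Choose a node for a task (hypothetical policy).

    Preference order: a node that holds the task's input data, then a
    node in the same rack as the data, then any node. Among equally
    local nodes, the one with the most free slots wins. Nodes without
    free slots are never chosen.
    """
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None

    def score(n):
        if n.name in replica_nodes:
            locality = 2
        elif n.rack in replica_racks:
            locality = 1
        else:
            locality = 0
        return (locality, n.free_slots)

    return max(candidates, key=score)
```

Adding the free-slot count to the score is what separates this from a purely location-based choice: a busy data-local node is skipped in favor of a rack-local node that can actually run the task.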

Plans, Expectations (2/3)
Columns per category: HBase, What (ETRI plan), Expectations

Data Model
 HBase: Multi dimensional map; Row key : single field
 What: Multi dimensional map; Row key : composite field
 Expectations: Easy to construct key
Video Management
 HBase: None
 What: High dimensional index management; k-NN search
 Expectations: Provide large scale video content based retrieval
Data Storage Model
 HBase: Column oriented; Per column
 What: Column oriented; Per column group
 Expectations: Performance enhancement
Data Distribution
 HBase: Distribute partitions split by row key
 What: Distribute partitions split by row key; Distribute clusters by high dimensional index
 Expectations: Performance enhancement of key-based/content based retrieval
Access Control
 HBase: None
 What: User management; Privilege management of table/column
 Expectations: Provide data security
High Availability
 HBase: Fail-over of partition management node -> serial processing of log file and parallel recovery
 What: Fail-over of partition management node -> parallel processing of log file and parallel recovery; Fail-over of master node
 Expectations: Increase availability; Reduction of down time
Query Language
 HBase: Use in shell
 What: Use in application
 Expectations: Easy to develop application
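The k-NN search planned for content based video retrieval can be illustrated with a brute-force version. Note the swap: this sketch scans every feature vector linearly and does not use the high dimensional index the slide plans. The `knn` function, the Euclidean distance metric, and the dictionary of feature vectors are assumptions made for illustration.

```python
import heapq
import math


def knn(query, features, k):
    """Return the ids of the k feature vectors closest to query.

    features maps a video id to its feature vector (hypothetical
    layout). Distance is Euclidean; this is a linear scan, so cost
    grows with the number of stored vectors.
    """
    nearest = heapq.nsmallest(
        k, features.items(), key=lambda item: math.dist(query, item[1])
    )
    return [video_id for video_id, _ in nearest]
```

A high dimensional index serves the same query without touching every vector, which is the point of the "Distribute clusters by high dimensional index" entry above.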

Plans, Expectations (3/3)
Columns per function: What (ETRI plan), OSCAR

Cluster Orchestration
 Structure. What: Hierarchical. OSCAR: Flat
 Scalability. What: Automatic Reconfiguration. OSCAR: PXE+DHCP
 Availability. What: Independent HA Tool. OSCAR: Active-active (2 head nodes)
 Management Interface. What: Web. OSCAR: X-GUI, Command (C3)
 Communication. What: XML. OSCAR: XDR, XML
 IP Management. What: Server Configuration. OSCAR: DHCP auto/static
 Maximum Nodes. What: 10,000 per data center / Max. 1,000,000. OSCAR: 440
 Load Balancing. What: Front-end LVS, Back-end new-DP. OSCAR: Front-end PBS, TORQUE, MAUI
Service Management
 Node Reconfiguration By Load Balancing. What: Yes. OSCAR: None
Master
 Master Node Configuration. What: Hierarchy (Key Master, Cluster Master, Group Master). OSCAR: Head node
Resources Monitoring
 Monitoring Tool. What: Proprietary. OSCAR: Ganglia
Provisioning
 Provisioning (image): OS imaging
 Provisioning (streaming). What: SW streaming. OSCAR: SW tar/rpm
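The Maximum Nodes figures can be cross-checked against the numbers on the topology slide (+256 nodes, +40 Group Masters, +100 Data Centers). The mapping of those labels to tiers is an assumption read off the flattened diagram, not something the slides state, and the function name `hierarchy_capacity` is hypothetical.

```python
from math import prod


def hierarchy_capacity(fanouts):
    """Multiply the fan-out of each tier to get the leaf node count."""
    return prod(fanouts)


# Assumed reading: one Group Master manages 256 nodes and one Cluster
# Master manages 40 Group Masters, giving one data center's node count.
nodes_per_data_center = hierarchy_capacity([40, 256])

# The slide rounds the per-data-center figure to 10,000; with 100 data
# centers under the Key Masters that gives the stated maximum.
stated_maximum = hierarchy_capacity([100, 10_000])
```

Under this reading a data center holds 10,240 nodes, consistent with the rounded "10,000 per data center", and 100 such data centers give the stated maximum of 1,000,000.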

Thank you