YZStack: Provisioning Customizable Solution for Big Data Sai Wu, Chun Chen, Gang Chen, Lidan Shou, Ke Chen Zhejiang University Hui Cao, yzBigData Co. Ltd.

YZStack: Provisioning Customizable Solution for Big Data
Sai Wu, Chun Chen, Gang Chen, Lidan Shou, Ke Chen, Zhejiang University
Hui Cao, yzBigData Co. Ltd.
He Bai, City Cloud Technology

3H Problem in Deploying the Big Data System
- How can I build and deploy a big data system without background knowledge?
- How can I migrate existing applications to the big data system?
- How can I use my big data system to do the analysis job?

Too Many Choices
- Virtualization:
  - OpenStack
  - CloudStack
  - VMware
- Cloud storage:
  - Key-value store (HBase, Cassandra, Redis, …)
  - Relational service (AWS, Spanner, …)
- Processing engine:
  - MapReduce/Hadoop
  - Dryad
  - Pregel, GraphLab
  - Spark
  - epiC
- Application service:
  - Mahout
  - Hive
  - Spatial Hadoop

Can I Deploy a Big Data System Like Installing Windows Software?
- Configure the installation as a customization process
- The installer copies the binaries to all servers and configures them automatically
- A browser-based management system starts and stops the services and monitors their status

YZStack: the Architecture
- Layers are loosely connected
- Each layer includes many selectable modules
- Modules of different layers are linked via common interfaces
- Optimizations are implemented as special plugins
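
The layering idea can be illustrated with a minimal sketch. Everything named here (`StorageModule`, `InMemoryStore`, `CachePlugin`, `word_count`) is hypothetical and is not taken from YZStack; the caching plugin is only a stand-in for the optimization plugins listed in the slides. The point is that a job in one layer depends only on a common interface, so modules can be swapped and a plugin can wrap any of them.

```python
from typing import Dict, Optional, Protocol


class StorageModule(Protocol):
    """Hypothetical common interface exposed by every storage-layer module."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Stand-in for one selectable storage module."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class CachePlugin:
    """Hypothetical optimization plugin: wraps any StorageModule, same interface."""

    def __init__(self, inner: StorageModule) -> None:
        self._inner = inner
        self._cache: Dict[str, str] = {}
        self.hits = 0

    def put(self, key: str, value: str) -> None:
        self._cache[key] = value
        self._inner.put(key, value)

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        value = self._inner.get(key)
        if value is not None:
            self._cache[key] = value
        return value


def word_count(store: StorageModule, key: str) -> int:
    """A processing-layer job that relies only on the common interface."""
    text = store.get(key)
    return len(text.split()) if text else 0
```

Because `word_count` never names a concrete module, passing `InMemoryStore()` or `CachePlugin(InMemoryStore())` works without changing the job.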

Features of YZStack
- Adaptive Image
  - Based on OpenStack; partitions a big image into small chunks
  - Different images share the same chunks
- Optimization Plugins
  - Column-oriented plugin
  - Index plugin
  - Query optimization plugin
  - Iterative job plugin
- Visualization Tool
  - Zoom in/out for different dimensions
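
A minimal sketch of the chunk-sharing idea behind the adaptive image follows. The slides do not say how YZStack identifies identical chunks, so content hashing with SHA-256 is an assumption here, as are the `ChunkStore` class and the tiny `CHUNK_SIZE`.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


class ChunkStore:
    """Stores each distinct chunk once; an image is a list of chunk ids."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}
        self.images: Dict[str, List[str]] = {}

    def add_image(self, name: str, data: bytes) -> None:
        ids = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            # Assumption: a chunk is identified by the hash of its content.
            chunk_id = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(chunk_id, chunk)
            ids.append(chunk_id)
        self.images[name] = ids

    def read_image(self, name: str) -> bytes:
        """Reassemble an image from its chunk list."""
        return b"".join(self.chunks[c] for c in self.images[name])
```

Two images that differ only in their last chunk, such as `b"BASEHDFSMAPR"` and `b"BASEHDFSSPRK"`, occupy four stored chunks rather than six.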

Optimization Plugin

Use Case: the Smart Financial System Built for the Zhejiang Provincial Department of Finance (ZPDF)

Economic Prediction
- In collaboration with researchers from the College of Economics, Zhejiang University
- Step 1:
  - Use the OLAP module to provide a basic view of each registered company

Economic Prediction (cont.)
- Step 2:
  - Healthy Model: Based on historical data, the healthy model discovers risks and predicts the prospects of an industry.
  - Energy Consumption Model: We link the financial data with the electricity, water, and environment data to rank each industry by its energy consumption per unit of output value.
  - Economic Impact Model: By connecting the financial data to the human resource data, we study how many workers an industry employs and their average salary.
  - Combine all three models to rank all industries accordingly.
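
The slides do not say how the three models are combined, so the sketch below makes its own assumptions: each model yields one score per industry, the economic impact score is approximated as total payroll (workers times average salary), and the final order is by the sum of the three ranks. All industry names and figures are invented for illustration.

```python
from typing import Dict, List


def rank(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Return rank 1 for the best industry, 2 for the next, and so on."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


def energy_intensity(energy: Dict[str, float], output: Dict[str, float]) -> Dict[str, float]:
    """Energy consumption per unit of output value; lower is better."""
    return {name: energy[name] / output[name] for name in energy}


def combine(*rankings: Dict[str, int]) -> List[str]:
    """Assumed combination rule: order industries by the sum of their ranks."""
    totals = {name: sum(r[name] for r in rankings) for name in rankings[0]}
    return sorted(totals, key=lambda name: (totals[name], name))


# Invented example data for three industries.
health = {"software": 0.9, "steel": 0.4, "textile": 0.3}
energy = {"software": 10.0, "steel": 300.0, "textile": 40.0}
output = {"software": 100.0, "steel": 150.0, "textile": 80.0}
workers = {"software": 2000, "steel": 5000, "textile": 9000}
salary = {"software": 15.0, "steel": 8.0, "textile": 5.0}

payroll = {name: workers[name] * salary[name] for name in workers}
overall = combine(
    rank(health, higher_is_better=True),
    rank(energy_intensity(energy, output), higher_is_better=False),
    rank(payroll, higher_is_better=True),
)
```

A weighted sum or a learned model would be an equally plausible combination rule; the rank sum is used only because it needs no tuning.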

Economic Prediction (cont.)
- Step 3: Economic Index (ongoing work)
  - Predict the status of the whole of Zhejiang Province using the statistics generated by the previous two steps
  - Involves multiple complex economic models
  - Our economic researchers are using the visualization tools to build and study their models

Detection of Improper Payment
- What is an improper payment?
  - A person is classified as low-income and buys a house reserved for low-and-medium wage earners, but is actually employed by an IT company.
  - A company may submit different registration files to different government departments (e.g., it registers as a high-tech company with the Department of Science, but as a labor-intensive one with the Department of Labor) to collect various allowances from the government.

Why ZPDF?
- A hub of financial data in Zhejiang Province
  - Electronic department
  - Traffic department
  - Tax department
  - …
- It is well motivated
  - Expected to save more than 1 billion CNY

Improper Payment
- Step 1 (Consistency Problem):
  - To detect improper payments from two databases, D0 and D1, we first generate two star-join queries, Q0 and Q1, which selectively merge the fact tables with the dimension tables.
  - The key is that the entities returned by Q0 should not appear in the results of Q1; any overlap signals a possible improper payment.
  - E.g., Q0 returns the high-income persons, while Q1 returns the users who own a house reserved for low-and-medium wage earners.
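
The following sketch shows what Q0 and Q1 might look like, using two in-memory SQLite databases as stand-ins for D0 and D1. Every table, column, and value is hypothetical. It also assumes the two databases share an exact `national_id` key, which is a simplification: the slides instead match entities approximately with LSH.

```python
import sqlite3
from typing import Set


def build_d0() -> sqlite3.Connection:
    """D0: hypothetical tax database (one fact table, two dimension tables)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person_dim (person_id INTEGER PRIMARY KEY, national_id TEXT);
        CREATE TABLE employer_dim (employer_id INTEGER PRIMARY KEY, sector TEXT);
        CREATE TABLE income_fact (person_id INTEGER, employer_id INTEGER, annual_income REAL);
        INSERT INTO person_dim VALUES (1, 'N001'), (2, 'N002');
        INSERT INTO employer_dim VALUES (10, 'IT'), (11, 'Retail');
        INSERT INTO income_fact VALUES (1, 10, 400000), (2, 11, 40000);
    """)
    return db


def build_d1() -> sqlite3.Connection:
    """D1: hypothetical housing database (one fact table, two dimension tables)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE owner_dim (owner_id INTEGER PRIMARY KEY, national_id TEXT);
        CREATE TABLE scheme_dim (scheme_id INTEGER PRIMARY KEY, scheme_type TEXT);
        CREATE TABLE housing_fact (owner_id INTEGER, scheme_id INTEGER);
        INSERT INTO owner_dim VALUES (1, 'N001'), (2, 'N003');
        INSERT INTO scheme_dim VALUES (1, 'low_and_medium_income'), (2, 'commercial');
        INSERT INTO housing_fact VALUES (1, 1), (2, 1);
    """)
    return db


# Q0: star join over D0 returning high-income persons in a given sector.
Q0 = """
    SELECT p.national_id
    FROM income_fact AS f
    JOIN person_dim AS p ON p.person_id = f.person_id
    JOIN employer_dim AS e ON e.employer_id = f.employer_id
    WHERE f.annual_income > ? AND e.sector = ?
"""

# Q1: star join over D1 returning owners of subsidized housing.
Q1 = """
    SELECT o.national_id
    FROM housing_fact AS f
    JOIN owner_dim AS o ON o.owner_id = f.owner_id
    JOIN scheme_dim AS s ON s.scheme_id = f.scheme_id
    WHERE s.scheme_type = ?
"""


def find_inconsistent(d0: sqlite3.Connection, d1: sqlite3.Connection) -> Set[str]:
    """Entities in both result sets contradict the expectation that they are disjoint."""
    high_income = {row[0] for row in d0.execute(Q0, (100000, "IT"))}
    subsidized = {row[0] for row in d1.execute(Q1, ("low_and_medium_income",))}
    return high_income & subsidized
```

With the sample rows above, only `N001` appears in both results, so that person would be flagged for review.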

Consistency Problem
- We apply LSH (Locality Sensitive Hashing) to generate k hash values for each tuple from T0 and T1. Tuples sharing the same hash value are considered a candidate group.
- We define a similarity function sim(ti, tj) to evaluate the probability that two tuples represent the same entity.
- If sim(ti, tj) is greater than a predefined threshold, the pair is forwarded to the verification module, where a human-aided algorithm filters out the false positives.
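
The candidate-generation step can be sketched as follows. The slides name neither the LSH family nor the similarity function, so this example assumes MinHash over character 3-grams for the k hash values and Jaccard similarity as a stand-in for sim; the function names, the threshold of 0.5, and k = 8 are all illustrative choices.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def shingles(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}


def minhash(tokens: Set[str], k: int) -> List[int]:
    """k hash values per tuple: the minimum of k seeded hashes over its tokens."""
    return [
        min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for seed in range(k)
    ]


def sim(a: str, b: str) -> float:
    """Assumed similarity function: Jaccard similarity of the shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def candidates(t0: Dict[str, str], t1: Dict[str, str], k: int = 8) -> Set[Tuple[str, str]]:
    """Pairs (one tuple from each table) that share at least one hash value."""
    buckets = defaultdict(lambda: (set(), set()))
    for side, table in enumerate((t0, t1)):
        for key, text in table.items():
            for i, h in enumerate(minhash(shingles(text), k)):
                buckets[(i, h)][side].add(key)
    return {(a, b) for left, right in buckets.values() for a in left for b in right}


def to_verify(t0: Dict[str, str], t1: Dict[str, str],
              threshold: float = 0.5, k: int = 8) -> List[Tuple[str, str]]:
    """Candidate pairs above the threshold, to be passed on for human-aided checking."""
    return sorted(
        (a, b) for a, b in candidates(t0, t1, k) if sim(t0[a], t1[b]) > threshold
    )
```

Only tuples that land in the same bucket are compared, which avoids scoring every pair across T0 and T1. Raising k makes it more likely that similar tuples collide at least once, at the cost of more hashing.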

Conclusion
- YZStack is tailored for users who have little or no experience in deploying and maintaining a cloud system.
- It reduces the development of a new big data application to a process of module selection and customization.
- To show the flexibility and usability of YZStack, we demonstrate how we built a smart financial system for the Zhejiang Provincial Department of Finance using YZStack.