InfiniDB Overview.

What is InfiniDB?
Massively parallel MySQL storage engine for fast analytics
  100% columnar architecture
  Scalable, high-performance analytics platform
  Massively parallel processing (MPP) technology
  Scales linearly with storage hardware and data volumes to handle exponential growth
  Standard ANSI SQL compliance as a MySQL storage engine
  First MySQL storage engine to support ANSI SQL11-compliant windowing functions
  Open source
  Runs on premises, on the AWS cloud, or on a Hadoop HDFS cluster
Copyright © 2014 InfiniDB. All Rights Reserved.

InfiniDB Server
Diagram labels: User Connections; User Module (MySQL, InfiniDB ExeMgr); Performance Module(s); Storage
MySQL functions:
  MySQL client
  MySQL connectivity (JDBC, ODBC)
  MySQL security
  Initial SQL statement parsing
  Initial SQL optimization
  <Custom Handler Class>
  Execute final sort and final limit
  Display final results
InfiniDB ExeMgr functions:
  SQL optimization
  Distribute work for scan, filter, join, functions, expressions, group by, aggregation, etc. to all available Performance Modules to be run in parallel
  Collect the results returned by the Performance Modules
  Return the final results to MySQL for display
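
The division of labor above (ExeMgr distributes work, Performance Modules run it in parallel, results are collected, and the final sort and limit happen at the end) can be sketched in a few lines of Python. This is only an illustration of the scatter/gather pattern, not InfiniDB code: the data, `pm_partial_sum`, and `exemgr_query` are hypothetical names, and threads stand in for separate Performance Module servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of (state, amount) rows
# owned by one Performance Module.
PM_SLICES = [
    [("NY", 10), ("CA", 5), ("NY", 7)],
    [("CA", 20), ("ME", 1)],
    [("NY", 3), ("MA", 8), ("CA", 2)],
]

def pm_partial_sum(rows):
    """Work done on one Performance Module: scan its slice and pre-aggregate."""
    partial = Counter()
    for state, amount in rows:
        partial[state] += amount
    return partial

def exemgr_query(slices, limit):
    """Coordinator: distribute work, collect partial results, merge, sort, limit."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(pm_partial_sum, slices))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums together
    # Final sort and final limit are applied only after everything is merged.
    ordered = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:limit]

top_states = exemgr_query(PM_SLICES, limit=2)
print(top_states)
```

Pre-aggregating on each slice keeps the amount of data sent back to the coordinator small, which is the point of pushing group by and aggregation down to the Performance Modules.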

InfiniDB Design Principles: Fast, Scalable, Simple

InfiniDB Parallelism
User Module: processes SQL requests
Performance Module: executes the queries
Deployment: single server or MPP

Tiered MPP Building Blocks (module: SQL)
Process: MySQL
  Functionality: hosts MySQL; connection management; SQL parsing and optimization
  Value: familiar DBMS interface; leverages existing partner integrations; delivers full SQL syntax support
Process: Extent Map
  Functionality: abstracts physical and logical storage; metadata store
  Value: enables shared-nothing and shared-everything storage; enables partition elimination; built-in failover
Process: ExeMgr
  Functionality: work distribution; final results management and aggregation
  Value: independent scalability and tunable concurrency; multi-threaded to take advantage of multi-core hardware platforms

Tiered MPP Building Blocks (module: Data Blocks)
Process: PrimProc
  Functionality: scale-out cache management; distributed scan, filter, join and aggregation operations; resource management
  Value: independent scalability and tunable performance; multi-threaded to take advantage of multi-core hardware platforms
Process: Data
  Functionality: high-speed bulk load; transactional DML and DDL; online schema extensions
  Value: enables concurrent reads and writes; reads are non-blocking

InfiniDB Foundation: Parallelism
Purpose-built C++ engine
Parallelism is at the thread level
  Example: 12 Performance Module (PM) servers with 8 cores each yield 96 parallel processing engines.
SQL is translated into thousands or tens of thousands of discrete jobs, or "primitives".
The User Module (UM) sends primitives to the processing engines.

InfiniDB Parallelism: Fixed Thread Pool
User Module: processes SQL requests
Performance Module: executes the queries
Primitives are issued into a thread queue within each Performance Module.
Deployment: single server or MPP
Storage: local disk / EBS, GlusterFS / HDFS
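
A fixed thread pool fed from a queue can be sketched as follows. This is a minimal Python illustration of the pattern, not the C++ engine: `run_primitives`, the pool size of 8, and the workload of 1,000 tiny summing jobs are all assumptions made for the example.

```python
import queue
import threading

def run_primitives(primitives, pool_size):
    """Run many small jobs on a fixed pool of worker threads fed from one queue."""
    jobs = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = jobs.get()
            if job is None:            # sentinel: no more work for this thread
                jobs.task_done()
                return
            block_id, values = job
            partial = sum(values)      # the "primitive": one tiny unit of work
            with results_lock:
                results.append((block_id, partial))
            jobs.task_done()

    # The pool is created once; threads are never dedicated to a single query.
    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for job in primitives:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)                 # one sentinel per worker
    jobs.join()
    for t in threads:
        t.join()
    return sorted(results)

# Hypothetical workload: 1,000 block-sized primitives on a pool of 8 threads.
primitives = [(i, range(i, i + 4)) for i in range(1000)]
partials = run_primitives(primitives, pool_size=8)
total = sum(p for _, p in partials)
print(len(partials), total)
```

Because each job is small, a thread is tied up for only a short time before it takes the next item from the queue, so primitives from several queries can interleave on the same pool.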

Architectural Differentiation
Greenplum, Netezza, etc.:
  Parent process: Database Layer 1, executing SQL
  Worker processes: Database Layer 2, executing SQL
InfiniDB:
  Database layer, executing SQL
  Block processing layer
  Custom DoW

Architectural Differentiation
Greenplum, Netezza, etc. (parent process with worker processes): threads are dedicated for the duration of a query.
InfiniDB: threads operate from a queue and are dedicated for a fraction of a second.

Row-Oriented vs. Column-Oriented
Row-oriented: rows stored sequentially
  Key  Fname     Lname  State  Zip    Phone           Age  Sex
  1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
  2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
  3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
  4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
  5    Witch     Hazel  MA     01970  (978) 744-0991  57   F
Column-oriented: each column is stored in a separate file
  Key:   1, 2, 3, 4, 5
  Fname: Bugs, Yosemite, Daffy, Elmer, Witch
  Lname: Bunny, Sam, Duck, Fudd, Hazel
  State: NY, CA, NY, ME, MA
  Zip:   11217, 95389, 10013, 04578, 01970
  Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
  Age:   34, 52, 35, 43, 57
  Sex:   M, M, M, M, F
In every column file, the value for a given row sits at the same offset.
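
The two layouts can be compared with a short Python sketch. It uses a subset of the slide's columns; the in-memory lists stand in for per-column files, and `to_column_store` and `fetch_row` are hypothetical helper names, not part of InfiniDB.

```python
COLUMNS = ("key", "fname", "lname", "zip", "age")

# Row-oriented layout: one tuple per row, stored one after another.
ROWS = [
    (1, "Bugs", "Bunny", "11217", 34),
    (2, "Yosemite", "Sam", "95389", 52),
    (3, "Daffy", "Duck", "10013", 35),
    (4, "Elmer", "Fudd", "04578", 43),
    (5, "Witch", "Hazel", "01970", 57),
]

def to_column_store(columns, rows):
    """Pivot row tuples into one list per column (a stand-in for one file per column)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}

def fetch_row(store, columns, offset):
    """Rebuild a full row by reading the same offset from every column."""
    return tuple(store[name][offset] for name in columns)

store = to_column_store(COLUMNS, ROWS)

# An aggregate over one column touches only that column's data.
average_age = sum(store["age"]) / len(store["age"])

# A full-row read has to visit every column at the same offset.
third_row = fetch_row(store, COLUMNS, 2)
print(average_age, third_row)
```

The aggregate reads a single list, while rebuilding one row visits all five, which is the trade-off the rest of the deck returns to.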

2-Dimensional Data Partitioning
Vertical partitioning by column
  Not column-family (no relation to HBase)
  Only do I/O for the columns requested
Horizontal partitioning by range of rows
  Metadata stored in an in-memory structure
10 TB of data maps to roughly 150k-300k discrete files.
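
Horizontal partitioning with in-memory metadata makes it possible to skip whole row ranges before any I/O happens. The sketch below assumes the metadata holds a minimum and maximum value per extent, which the slide does not spell out; `Extent`, `build_extent_map`, and `extents_to_scan` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    extent_id: int
    first_row: int
    last_row: int
    min_value: int   # assumed metadata: smallest value stored in this extent
    max_value: int   # assumed metadata: largest value stored in this extent

def build_extent_map(values, rows_per_extent):
    """Split one column into fixed-size row ranges and record min/max for each."""
    extents = []
    for start in range(0, len(values), rows_per_extent):
        chunk = values[start:start + rows_per_extent]
        extents.append(
            Extent(len(extents), start, start + len(chunk) - 1, min(chunk), max(chunk))
        )
    return extents

def extents_to_scan(extent_map, low, high):
    """Keep only the extents whose [min, max] range can overlap [low, high]."""
    return [e.extent_id for e in extent_map
            if e.max_value >= low and e.min_value <= high]

# Hypothetical column loaded in roughly increasing order (for example, a date key).
column = list(range(100, 112))
extent_map = build_extent_map(column, rows_per_extent=4)
needed = extents_to_scan(extent_map, low=105, high=106)
print(needed)
```

With three extents covering 100-103, 104-107, and 108-111, a filter on 105 to 106 needs only the middle extent; the other two are eliminated from metadata alone.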

Column Restriction and Projection
Diagram labels: Column #4, Column #6, Column #17; Extent #5, Extent #27; Filter 1, Filter 2, Filter 3; Projection
Automatic vertical partitioning + horizontal partitioning
Just-in-time materialization
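
Just-in-time (late) materialization can be sketched as: run the filters on their own columns first, carry forward only the surviving row positions, and read the projected columns for those positions at the end. The column names, values, and the `select_late` helper below are illustrative assumptions, not InfiniDB internals.

```python
def select_late(columns, filters, projection):
    """Apply column filters first, then read projected columns only for survivors."""
    n_rows = len(next(iter(columns.values())))
    survivors = list(range(n_rows))
    for name, predicate in filters:
        col = columns[name]
        # Each filter only inspects its own column, and only at surviving positions.
        survivors = [i for i in survivors if predicate(col[i])]
    # Projected columns are touched last, and only at the surviving offsets.
    return [tuple(columns[name][i] for name in projection) for i in survivors]

# Hypothetical columns, loosely named after the slide's Column #4, #6 and #17.
columns = {
    "col4": [5, 12, 7, 30, 18, 2],
    "col6": ["a", "b", "a", "a", "b", "a"],
    "col17": ["r0", "r1", "r2", "r3", "r4", "r5"],
}

result = select_late(
    columns,
    filters=[("col4", lambda v: v > 6), ("col6", lambda v: v == "a")],
    projection=["col17"],
)
print(result)
```

Only two of the six values in `col17` are ever read, because the projection is deferred until both filters have narrowed the row set.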

Simplicity: Automated Everything
Column storage
Compression / compression type
No index build or maintenance required
Extent Map partitioning (vertical / horizontal)
Distribution of data across server/disk resources
Distribution of work
Ad hoc performance

InfiniDB: What's New
Open source (GPL v2)
New company name
Funding
InfiniDB for Hadoop
Windowing analytic functions

What is InfiniDB for Hadoop?
Fast SQL-for-Hadoop offering for real-time and ad hoc reporting and analytics
Non-map/reduce engine for real-time SQL
40x to 100x faster than Hive SQL in Hadoop
Reads and writes directly to HDFS/GPFS
Best-of-breed SQL in Hadoop
Superior ad hoc usage and syntax vs. Impala/Presto
MySQL compatibility: InfiniDB presents Hadoop as a MySQL data source
Speaker note: Don't talk about every bullet; pick out key ones and relate comments back to highlights from the ask.com portion.

InfiniDB Background: InfiniDB for Hadoop
InfiniDB is a non-map/reduce engine
Reads and writes natively to HDFS
Diagram labels: HBase, Pig/Hive, InfiniDB for Hadoop, MapReduce, Hadoop Distributed File System

Value Proposition for InfiniDB for Hadoop
Enables access to Hadoop data via a familiar interface
Response to the competitive challenge from Cloudera Impala
Completes the Hadoop checklist:
  Cost-effective storage
  Robust transforms via map/reduce
  Real-time SQL for analytics with InfiniDB for Hadoop

Benchmark: Hive, Presto, Impala, InfiniDB
http://infinidb.co/system/files/RadiantAdvisors_Benchmark_SQL-on-Hadoop_2014Q1.pdf

PARTITION and FRAME
For each row, the aggregate is calculated over a FRAME of rows.
The PARTITION of a row is the group of rows that share the current row's value in a specific column.
The FRAME of a row is a subset of that row's PARTITION.
SELECT x, y, sum(x) OVER (PARTITION BY y ORDER BY x RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM a
  Row  X   Y  PARTITION                     FRAME
  1    1   1  Partition for rows 1 to 4     Frame for row 1: sum(x) = 22
  2    4   1                                Frame for row 2: sum(x) = 21
  3    7   1                                Frame for row 3: sum(x) = 17
  4    10  1                                Frame for row 4: sum(x) = 10
  5    2   2  Partition for rows 5 to 7     Frame for row 5: sum(x) = 15
  6    5   2                                Frame for row 6: sum(x) = 13
  7    8   2                                Frame for row 7: sum(x) = 8
  8    3   3  Partition for rows 8 to 10    Frame for row 8: sum(x) = 18
  9    6   3                                Frame for row 9: sum(x) = 15
  10   9   3                                Frame for row 10: sum(x) = 9
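
The partition and frame logic can be reproduced in plain Python. This is a minimal sketch that assumes the rows of each partition are ordered by x, so that "current row to unbounded following" means the current row and every row with a larger x. The x and y values are illustrative, chosen to be consistent with the frame sums shown on the slide, and `sum_to_partition_end` is a hypothetical helper name.

```python
from collections import defaultdict

def sum_to_partition_end(rows):
    """For each (x, y) row, sum x over the frame from the current row to the
    end of its partition (rows grouped by y, ordered by x)."""
    partitions = defaultdict(list)
    for x, y in rows:
        partitions[y].append(x)          # PARTITION BY y
    out = []
    for y in sorted(partitions):
        xs = sorted(partitions[y])       # assumed ordering inside the partition
        for x in xs:
            # RANGE frame: the current row, any peers with equal x, and all later rows
            frame = [v for v in xs if v >= x]
            out.append((x, y, sum(frame)))
    return out

rows = [(1, 1), (4, 1), (7, 1), (10, 1),
        (2, 2), (5, 2), (8, 2),
        (3, 3), (6, 3), (9, 3)]

frame_sums = [s for _, _, s in sum_to_partition_end(rows)]
print(frame_sums)
```

Each frame shrinks by one row as the current row advances, which is why the sums within a partition decrease (22, 21, 17, 10 for the first one).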

InfiniDB Use Cases
Who is using it?
When to use it?

InfiniDB Customers

InfiniDB's Place in the Big Data World
Designed for high-performance analytics
Provides flexibility for ad hoc queries
Not suited for OLTP, NoSQL, or key-value workloads
Copyright © 2014 Calpont. All Rights Reserved.

Workload: Query Vision/Scope
Chart: a Query Vision/Scope axis running from 1 to 10,000,000,000, spanning OLTP/NoSQL workloads and analytic workloads.
General DBMS missed the target (dated database technology is generally suboptimal).

What is your typical query?
Chart: a Query Vision/Scope axis running from 1 to 10,000,000,000, spanning OLTP/NoSQL workloads and analytic workloads.
There is no "average" query. The challenges are at the extremes:
  The challenge of high concurrency levels with OLTP/NoSQL.
  The challenge of latency for very large queries.
Most use cases imply multiple data technologies.

Columnar Appropriate Workloads
Chart: a Query Vision/Scope axis running from 1 to 10,000,000,000, spanning OLTP/NoSQL workloads and ROLAP/analytic/reporting workloads.
Pure columnar: about 10x worse I/O for single-record lookups
Pure columnar: about 10x better I/O for large data access patterns
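
A toy cost model shows where factors of this size can come from. It is only a back-of-the-envelope sketch under stated assumptions (a 10-column table, one block read per column file for a single-row lookup, and a scan that needs just one column); it is not a measurement of InfiniDB, and the function names are hypothetical.

```python
def blocks_for_lookup(layout, n_columns):
    """Blocks touched to fetch one complete row."""
    # A row store finds the whole row in one block; a column store must
    # visit one block in each column file.
    return 1 if layout == "row" else n_columns

def values_for_scan(layout, n_rows, n_columns, columns_needed):
    """Values read to scan every row when only some columns are needed."""
    # A row store drags every column along; a column store reads only what is asked for.
    width = n_columns if layout == "row" else columns_needed
    return n_rows * width

N_COLUMNS = 10          # assumed table width
N_ROWS = 1_000_000      # assumed table size

lookup_penalty = blocks_for_lookup("column", N_COLUMNS) / blocks_for_lookup("row", N_COLUMNS)
scan_gain = (values_for_scan("row", N_ROWS, N_COLUMNS, 1)
             / values_for_scan("column", N_ROWS, N_COLUMNS, 1))
print(lookup_penalty, scan_gain)
```

Under these assumptions both ratios come out to 10, and they grow or shrink with the table width and the number of columns a query actually touches.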

Benefits of InfiniDB Products
Real-time, consistent query performance
Linear scale for massive data
Removes limits to dimensions and granularity
Easy to deploy and maintain
Scale-out MPP analytic database
MySQL Columnar + Map Reduction
Commercial open core model
Products:
  InfiniDB Enterprise: forthcoming 4th major release
  InfiniDB Community: modified open source license

Core Features of InfiniDB
Scalable MPP architecture
Performant ad hoc analysis
Consistent query response time
Simplified data administration
Analytic window functions
Native MySQL® driver support
Open source license
Deployable on premises, in the cloud, and on Apache Hadoop™
Optional Enterprise support subscription