APACHE HAWQ 2.X A Hadoop Native SQL Engine


1 APACHE HAWQ 2.X A Hadoop Native SQL Engine

2 Hello!
I'm Ivan Weng, an Apache HAWQ committer at Pivotal. For any HAWQ question, feel free to reach me or

3 Agenda
Introduction
Architecture
2.X New Features
Query Processing
Future
Community

4 HAWQ History
Lineage: Postgres, then Greenplum, then HAWQ
Timeline markers: 2005, 2009, 2011, 2013, 2015, 2017
Milestones, in order: Fork Postgres 8.0.2; Rebase Postgres; Fork GPDB 4.2.0; HAWQ 1.0.0; HAWQ & Apache Incubation; HAWQ 2.1.1

5 HAWQ Motivation
Hadoop's advantages: low cost; scalability; fault tolerance; ...
Hadoop's limitations: low-level programming model; poor performance for interactive analysis; ...
SQL & MPP: SQL is the best choice for data analysis and analysts; MPP provides fast query processing capabilities; ...
Result: HAWQ

6 Challenges
Storage: in a shared HDFS, how to control data locality?
Resource: in a shared cluster, how to control resource usage (CPU, memory, IO, ...)?
Ecosystem: in the Hadoop ecosystem, how to collaborate with other components such as YARN, Hive, HBase and Ranger?
Other: how to handle larger data volumes, various formats (Parquet, Text, JSON) and integration with different languages?

7 HAWQ 2.X Architecture
[Architecture diagram. Master side: HAWQ Master with Catalog Service, YARN Resource Manager, HDFS Namenode. Worker side, shown as three nodes: each has a YARN Node Manager with containers, a HAWQ Segment running several QEs, and an HDFS DataNode.]

8 HAWQ 2.X Components
[Component diagram.]
HAWQ Master: Parser/Analyzer, Optimizer, Dispatcher, Resource Manager, Resource Broker (libyarn), Catalog Service, HDFS Catalog Cache, Fault Tolerance Service, libpq.
HAWQ Segment: Physical Segments hosting Virtual Segments (VS), Interconnect, Resource Enforcement, libhdfs3, PXF.
Outside HAWQ: Client, Yarn, Namenode, Node Manager and Datanode (NM & DN), External System.

9 HAWQ 2.X New Features
Elastic execution runtime: one physical segment per node; multiple virtual segments can be started on each node; queries can run on subsets of nodes.
Resource management: three-layer resource management (global YARN / query / operator); hierarchical resource queues; YARN integration; resource enforcement.
Dynamic expansion & shrinking: fast and without redistribution; no interruption of the current workload during expansion; new sessions will use all resources on the new segments.
New dispatcher: flexible gang management; process reuse between queries.
New fault tolerance service: heartbeat and on-demand probe.
Per-table directory: complete separation between compute and storage; eases integration with external systems.
Block-level storage: AO & Parquet.
HDFS catalog cache: accelerates data-locality computation.
New management tools: consolidate all management tools.
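The "hierarchical resource queues" item above can be pictured with a small toy model. This is only an illustrative sketch, not HAWQ's implementation: the `ResourceQueue` class, the queue names and the rule that a child queue's limit is a percentage of its parent's share are all assumptions made for the example.

```python
from fractions import Fraction


class ResourceQueue:
    """Hypothetical queue node: its limit is a percentage of its parent's share."""

    def __init__(self, name, percent_of_parent, parent=None):
        self.name = name
        self.percent_of_parent = percent_of_parent
        self.parent = parent

    def cluster_share(self):
        # Multiply this queue's fraction by the share of every ancestor.
        share = Fraction(self.percent_of_parent, 100)
        if self.parent is not None:
            share *= self.parent.cluster_share()
        return share


# Illustrative hierarchy: root -> analytics (50%) -> adhoc (40% of analytics).
root = ResourceQueue("root", 100)
analytics = ResourceQueue("analytics", 50, parent=root)
adhoc = ResourceQueue("adhoc", 40, parent=analytics)

print(adhoc.cluster_share())  # 1/5 of the cluster
```

Under these assumptions, a leaf queue's effective share is the product of the percentages along its path to the root, which is why limiting a parent queue also limits everything beneath it.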

10 Query Processing
[Diagram: the path of a query through HAWQ, with steps numbered 1 to 7.]
Client submits: SELECT COUNT(*) FROM t;
On the HAWQ Master: Parser/Analyzer (query to query tree), Planner (query tree to query plan), Dispatcher, Resource Manager. The Namenode is also shown.
Resource Manager to Yarn RM: "I need 5 containers, each with 1 CPU core and 256MB RAM"
Allocation shown: Server1: 2 containers; Server2: 1 container; ServerN: 2 containers.
Dispatcher: dispatch query plan (slices and resource quota) and get query result.
On each server (Server1, Server2, ... ServerN): a Physical Segment running virtual segments (VS) and QEs, connected through the Interconnect and co-located with NM & DN.
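The resource request on this slide comes down to simple arithmetic, sketched below. The `ContainerSpec` type and the `total_quota` helper are hypothetical names introduced for illustration; only the numbers (5 containers, 1 CPU core and 256MB each, split 2/1/2 across the servers) come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSpec:
    """Hypothetical description of one container's size."""
    cpu_cores: int
    memory_mb: int


def total_quota(allocation, spec):
    """Return (containers, cpu cores, memory in MB) summed over all servers."""
    containers = sum(allocation.values())
    return containers, containers * spec.cpu_cores, containers * spec.memory_mb


# Per-server allocation as drawn on the slide.
allocation = {"Server1": 2, "Server2": 1, "ServerN": 2}
spec = ContainerSpec(cpu_cores=1, memory_mb=256)

containers, cores, memory_mb = total_quota(allocation, spec)
print(containers, cores, memory_mb)  # 5 5 1280
```

The per-server counts add up to the 5 containers requested, so the query as a whole runs with 5 CPU cores and 1280MB of memory in this example.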

11 Future
Hybrid OLTP & OLAP
Active Geo-Replication
Multi Master
Index
Update & Delete
Rolling upgrade
...

12 Community
Birthday: HAWQ has been incubating since
Resources: Website, Wiki, Code Repo, JIRA, Mailing list
Papers:
Lei Chang et al.: HAWQ: a massively parallel processing SQL engine in Hadoop. SIGMOD Conference 2014.
Mohamed A. Soliman et al.: Orca: a modular query optimizer architecture for big data. SIGMOD Conference 2014.
Lyublena Antova et al.: Optimizing queries over partitioned tables in MPP systems. SIGMOD Conference 2014.
Amr El-Helw et al.: Optimization of Common Table Expressions in MPP Database Systems. PVLDB 8(12) (2015).

13 Thanks!
HAWQ/GPDB is hiring. Join us: pivotalrnd_china_jobs@pivotal.io

