1 Sqoop 2 Introduction Mengwei Ding, Software Engineer Intern at Cloudera.


1 Sqoop 2 Introduction
Mengwei Ding, Software Engineer Intern at Cloudera

2 What is Sqoop
Apache Top-Level Project
Name: SQL + Hadoop (SQl and hadOOP)
Transfers large volumes of data in bulk
o From relational databases and data warehouses: Teradata, MySQL, PostgreSQL, Oracle, Netezza
o To the Hadoop ecosystem: HDFS, Hive, HBase, Avro
o And vice versa
Two versions: Sqoop 1 (1.4.3) and Sqoop 2 (1.99.2)

3 Sqoop 1

4 Sqoop 1 Challenges
Command-line tool, configured entirely through arguments (60+!)
Connector-driven:
o Connectors are responsible for metadata lookups and data transfer
o JDBC vocabulary enforced (--connect)
o Implicit connector selection
Non-uniform, duplicated functionality
Client accesses Hadoop configurations and databases directly
Security concerns:
o Client needs to know the database credentials
Type mapping is not clearly defined
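To make the argument-driven style concrete, here is a minimal sketch that assembles a Sqoop 1 import command line on the client. The flags are standard Sqoop 1 import arguments, but the host, database, user, password, and table are made-up values, and the helper `sqoop1_import_argv` is a hypothetical name used only for illustration.

```python
def sqoop1_import_argv(jdbc_url, username, password, table, target_dir, num_mappers=4):
    """Build the argument list for a Sqoop 1 import launched from the client."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC vocabulary, whatever the connector
        "--username", username,
        "--password", password,         # the client has to hold the credentials
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical values throughout.
argv = sqoop1_import_argv(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl",
    password="s3cret",
    table="orders",
    target_dir="/data/sales/orders",
)
print(" ".join(argv))
```

Even this small import repeats the connection details on every invocation, and the password sits in plain sight on the client, which is the security concern listed above.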

5 Sqoop 2 - Design Goals
Same goal: transfer data around
Ease of Use
o Sqoop as a Service
o Domain-specific interactions without too many arguments
Ease of Extension
o No low-level Hadoop knowledge needed
o Uniform functionality of connectors, no functional overlap between connectors
Security and Separation of Concerns
o Role-based access and use

6 Sqoop 2 - Design Goals

7 Sqoop 2 - Connection vs Job Metadata
There are two distinct sets of options
o Connection (distinct per database)
o Job (distinct per table)

8 Sqoop 2 - Connection vs Job Metadata
Arguments also split into two further distinct sets
o Connector-specific
o Shared across all connectors
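The two splits described on these slides (connection vs job, connector-specific vs shared) can be sketched as a small data model. This is only an illustration, not Sqoop 2's actual classes: `Connection`, `Job`, and every option name below are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Options that are distinct per database."""
    name: str
    connector_options: dict = field(default_factory=dict)  # connector-specific
    shared_options: dict = field(default_factory=dict)     # shared across connectors


@dataclass
class Job:
    """Options that are distinct per table; each job points at one connection."""
    name: str
    connection: Connection
    connector_options: dict = field(default_factory=dict)  # connector-specific
    shared_options: dict = field(default_factory=dict)     # shared across connectors


# One connection per database (hypothetical values).
warehouse = Connection(
    name="warehouse",
    connector_options={"jdbc_url": "jdbc:mysql://db.example.com/sales", "username": "etl"},
    shared_options={"max_connections": 10},
)

# One job per table, all reusing the same connection.
jobs = [
    Job(
        name=f"import-{table}",
        connection=warehouse,
        connector_options={"table": table},
        shared_options={"output_directory": f"/data/sales/{table}"},
    )
    for table in ("orders", "customers")
]

for job in jobs:
    print(job.name, "->", job.connection.name, job.connector_options["table"])
```

The database details are entered once on the connection, and each job carries only what differs per table.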

9 Sqoop 2 - Security
Support for secure access to external systems via role-based access to connection objects
o Administrators create/edit/delete connections
o Operators use connections
Connections encompass credentials
o A connection is created once, then reused later
o Created by an administrator and used by an operator, which keeps the credentials away from the end user
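The admin/operator split above can be sketched as a server-side registry that checks a role before each action. This is a simplified illustration under assumed names (`ConnectionRegistry`, `submit_job`, the role strings); it does not reproduce the real Sqoop 2 server API.

```python
class ConnectionRegistry:
    """Server-side store of connections; credentials stay inside this object."""

    def __init__(self):
        self._connections = {}

    def create(self, role, name, jdbc_url, username, password):
        # Only administrators may create connections.
        if role != "admin":
            raise PermissionError("only administrators may create connections")
        self._connections[name] = {
            "jdbc_url": jdbc_url,
            "username": username,
            "password": password,
        }

    def delete(self, role, name):
        # Only administrators may delete connections.
        if role != "admin":
            raise PermissionError("only administrators may delete connections")
        del self._connections[name]

    def submit_job(self, role, connection_name, table):
        # Operators refer to a connection by name and never see its credentials.
        if role not in ("admin", "operator"):
            raise PermissionError("role may not use connections")
        if connection_name not in self._connections:
            raise KeyError(connection_name)
        # A real server would resolve the credentials here and run the transfer.
        return {"connection": connection_name, "table": table, "status": "submitted"}


registry = ConnectionRegistry()
registry.create("admin", "warehouse", "jdbc:mysql://db.example.com/sales", "etl", "s3cret")

print(registry.submit_job("operator", "warehouse", "orders"))

try:
    registry.create("operator", "scratch", "jdbc:mysql://db.example.com/tmp", "u", "p")
except PermissionError as err:
    print("denied:", err)
```

The operator only ever passes the connection name, so the password entered by the administrator is not part of anything the operator sends or receives.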

10 Sqoop 2 - Resource Management
Connections allow specification of resource policy
o Administrator can limit the total number of physical connections open at one time
o Connections can be disabled
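A resource policy of this kind can be approximated with a counter and an enabled flag attached to the connection. The sketch below assumes a hypothetical `ConnectionPolicy` class; how Sqoop 2 actually enforces the limit is not described in the slides.

```python
class ConnectionPolicy:
    """Per-connection policy: a cap on open physical connections plus an on/off switch."""

    def __init__(self, max_open, enabled=True):
        self.max_open = max_open
        self.enabled = enabled
        self.open_count = 0

    def try_acquire(self):
        """Return True and count one more open connection if the policy allows it."""
        if not self.enabled or self.open_count >= self.max_open:
            return False
        self.open_count += 1
        return True

    def release(self):
        """Give back one previously acquired connection."""
        if self.open_count > 0:
            self.open_count -= 1


policy = ConnectionPolicy(max_open=2)
print([policy.try_acquire() for _ in range(3)])  # the third request exceeds the cap

policy.release()
policy.enabled = False       # administrator disables the connection
print(policy.try_acquire())  # refused even though a slot is free
```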

11 Sqoop 2 - Current Status
Primary focus of Sqoop community
Second cut:
o bits and docs:

12 Demo Time

13 Thank You!

