Sqoop 2 Introduction Mengwei Ding, Software Engineer Intern at Cloudera.


1 Sqoop 2 Introduction Mengwei Ding, Software Engineer Intern at Cloudera

2 What is Sqoop
- Apache Top-Level Project
- SQl and hadOOP
- Transfers large volumes of data in bulk
- From relational data warehouses: Teradata, MySQL, PostgreSQL, Oracle, Netezza
- To the Hadoop ecosystem: HDFS, Hive, HBase, Avro
- And vice versa
- Sqoop 1 (1.4.3) and Sqoop 2 (1.99.2)

3 Sqoop 1

4 Sqoop 1 Challenges
- Command-line tool, configured through command-line arguments (60+!)
- Connector-driven: connectors are responsible for metadata lookups and data transfer
- JDBC vocabulary is enforced (--connect)
- Implicit connector selection
- Non-uniform, duplicated functionality
- The client accesses Hadoop configurations and databases directly
- Security concerns: the client needs to know the database credentials
- Type mapping is not clearly defined
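
To make the credential concern concrete, here is a minimal Python sketch that assembles the kind of argument list a Sqoop 1 client passes for an import. The flags shown (`--connect`, `--username`, `--password`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop 1 import arguments; the helper name, host, user, password, and paths are hypothetical values chosen only for illustration. The sketch only builds the command and does not run it.

```python
import shlex


def build_sqoop1_import(jdbc_url, username, password, table, target_dir, num_mappers=4):
    """Assemble the argument vector for a Sqoop 1 import (hypothetical helper)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,         # JDBC vocabulary, whatever the connector
        "--username", username,
        "--password", password,        # the client must hold the database secret
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


# All values below are made up for illustration.
argv = build_sqoop1_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl_user",
    password="s3cret",
    table="orders",
    target_dir="/data/sales/orders",
)

# Quote each argument so the line could be pasted into a shell.
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)
```

Every user who runs such a command sees the JDBC URL and the password, which is the situation the Sqoop 2 design goals set out to change.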

5 Sqoop 2 - Design Goals
- Same goal: transfer data around
- Ease of Use
  - Sqoop as a Service
  - Domain-specific interactions without too many arguments
- Ease of Extension
  - No low-level Hadoop knowledge needed
  - Uniform functionality of connectors, no functional overlap between connectors
- Security and Separation of Concerns
  - Role-based access and use

6 Sqoop 2 - Design Goals

7 Sqoop 2 - Connection vs Job Metadata
There are two distinct sets of options:
- Connection (distinct per database)
- Job (distinct per table)

8 Sqoop 2 - Connection vs Job Metadata
The arguments also split a second way:
- Connector-specific
- Shared across all connectors
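
The two splits described on these slides (connection vs job, and connector-specific vs shared) can be pictured as a small data model. The following Python sketch is only an illustration under stated assumptions: the class names, field names, option keys, and the `resolve` helper are all hypothetical and are not taken from the Sqoop 2 code base.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Connection:
    """Options that are distinct per database (hypothetical model)."""
    name: str
    connector: str
    connector_options: Dict[str, str] = field(default_factory=dict)  # connector-specific
    shared_options: Dict[str, str] = field(default_factory=dict)     # shared across connectors


@dataclass
class Job:
    """Options that are distinct per table (hypothetical model)."""
    name: str
    connection: Connection
    connector_options: Dict[str, str] = field(default_factory=dict)  # connector-specific
    shared_options: Dict[str, str] = field(default_factory=dict)     # shared across connectors


def resolve(job):
    """Merge the four option sets; job-level values override connection-level ones."""
    merged = {}
    merged.update(job.connection.shared_options)
    merged.update(job.connection.connector_options)
    merged.update(job.shared_options)
    merged.update(job.connector_options)
    return merged


# One connection per database, reused by one job per table (made-up option keys).
sales_db = Connection(
    name="sales-db",
    connector="generic-jdbc",
    connector_options={"jdbc_url": "jdbc:mysql://db.example.com/sales"},
    shared_options={"max_connections": "10"},
)
orders_job = Job("import-orders", sales_db,
                 connector_options={"table": "orders"},
                 shared_options={"output_dir": "/data/sales/orders"})
customers_job = Job("import-customers", sales_db,
                    connector_options={"table": "customers"},
                    shared_options={"output_dir": "/data/sales/customers"})

print(resolve(orders_job))
```

Both jobs point at the same `Connection` object, so the database details are entered once and only the per-table options differ.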

9 Sqoop 2 - Security
- Support for secure access to external systems via role-based access to connection objects
- Administrators create, edit, and delete connections
- Operators use connections
- Connections encompass credentials
- A connection is created once, then reused later
- Created by an administrator and used by operators, which safeguards credential access from the end user
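
The role split described here can be sketched as a small server-side store in which only administrators write connections and operators never see the stored password. This is a toy model, not the Sqoop 2 API: the `ConnectionStore` class, its method names, the role strings, and all sample values are assumptions made for illustration.

```python
class ConnectionStore:
    """Hypothetical server-side store of connection objects (not a Sqoop 2 API)."""

    def __init__(self):
        self._connections = {}

    def create(self, role, name, jdbc_url, username, password):
        """Only administrators may create connections."""
        if role != "admin":
            raise PermissionError("only administrators may create connections")
        self._connections[name] = {
            "jdbc_url": jdbc_url,
            "username": username,
            "password": password,
        }

    def describe(self, role, name):
        """Administrators see everything; other roles never see the password."""
        conn = self._connections[name]
        if role == "admin":
            return dict(conn)
        return {key: value for key, value in conn.items() if key != "password"}

    def use(self, role, name, table):
        """Operators and administrators may run work through an existing connection."""
        if role not in ("admin", "operator"):
            raise PermissionError("role may not use connections")
        conn = self._connections[name]
        # A real server would open the database session here using conn["password"];
        # the secret stays on the server and is not returned to the caller.
        return "import of {} via {} submitted as {}".format(table, name, conn["username"])


store = ConnectionStore()
store.create("admin", "sales-db", "jdbc:mysql://db.example.com/sales", "etl_user", "s3cret")

operator_view = store.describe("operator", "sales-db")
receipt = store.use("operator", "sales-db", "orders")
print(operator_view)
print(receipt)
```

The operator refers to the connection by name only, so the credential is entered once by the administrator and reused without being handed to end users.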

10 Sqoop 2 - Resource Management
- Connections allow specification of a resource policy
- An administrator can limit the total number of physical connections open at one time
- Connections can be disabled
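
A resource policy of this kind can be pictured as a counter with a cap and an on/off switch. The Python sketch below is a simplified, single-threaded illustration; the `ConnectionPolicy` class and its methods are hypothetical names, not part of Sqoop 2.

```python
class ConnectionPolicy:
    """Hypothetical per-connection resource policy (single-threaded sketch)."""

    def __init__(self, max_open):
        self.max_open = max_open   # cap set by the administrator
        self.open_count = 0        # physical connections currently open
        self.enabled = True        # administrators can disable the connection

    def acquire(self):
        """Try to open one more physical connection; return True on success."""
        if not self.enabled:
            raise RuntimeError("connection is disabled")
        if self.open_count >= self.max_open:
            return False
        self.open_count += 1
        return True

    def release(self):
        """Close one physical connection, never dropping below zero."""
        if self.open_count > 0:
            self.open_count -= 1


policy = ConnectionPolicy(max_open=2)
results = [policy.acquire(), policy.acquire(), policy.acquire()]
print(results)  # the third request exceeds the cap

policy.release()
after_release = policy.acquire()  # a slot is free again
policy.enabled = False            # disabling blocks any further use
```

A production version would need locking around the counter; the sketch leaves that out to keep the policy itself visible.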

11 Sqoop 2 - Current Status
- Primary focus of the Sqoop community
- Second cut: bits and docs:

12 Demo Time

13 Thank You!

