Data Warehousing in the age of Big Data (1)

The 3 Vs of Big Data

Additional Big Data characteristics.
Ambiguity: a lack of metadata creates ambiguity in Big Data. For example, in a photograph or a graph, "M" and "F" can denote gender, or they can denote Monday and Friday.
Viscosity: measures the resistance (slowdown) to flow in the volume of data. Resistance can arise in dataflows and business rules, and can even be a limitation of the technology. Social media monitoring falls into this category: many enterprises cannot understand how it affects their business, so they resist using the data, in many cases until it is too late.
Virality: measures and describes how quickly data is shared in a people-to-people (peer) network. The rate of spread is measured over time. For example, following the re-tweets of an original tweet is a good way to track a topic or a trend. How the tweet's context relates to the topic matters in this situation.
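Since virality is described as a rate of spread measured over time, one simple way to quantify it is to count how many re-tweets of a single original tweet arrive in each time window after it was posted. The sketch below assumes hourly windows and uses made-up timestamps; the function name spread_rate and the bucketing scheme are hypothetical illustrations, not a metric defined in the presentation.

```python
from collections import Counter
from datetime import datetime, timedelta


def spread_rate(original: datetime, retweets: list[datetime],
                bucket: timedelta = timedelta(hours=1)) -> dict[int, int]:
    """Count re-tweets per time bucket after the original tweet.

    Hypothetical virality metric: bucket 0 covers the first hour,
    bucket 1 the second hour, and so on.
    """
    counts: Counter[int] = Counter()
    for ts in retweets:
        if ts >= original:  # ignore timestamps earlier than the original tweet
            counts[int((ts - original) / bucket)] += 1
    return dict(sorted(counts.items()))


# Made-up example: six re-tweets at various minutes after the original.
original = datetime(2024, 1, 1, 9, 0)
retweets = [original + timedelta(minutes=m) for m in (5, 20, 45, 70, 130, 135)]
print(spread_rate(original, retweets))  # {0: 3, 1: 1, 2: 2}
```

A falling count per bucket suggests the topic is cooling down; comparing these curves across tweets is one rough way to compare how viral they are.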

Hive: an open-source data warehousing solution built on top of Hadoop
Architecture
Metastore: stores the system catalog and metadata about tables, columns, and partitions.
Driver: maintains session details, process handles, and statistics, and manages the life cycle of a HiveQL statement as it moves through Hive.
Query compiler: compiles HiveQL into map and reduce tasks.
Execution engine: processes and executes the tasks produced by the compiler in dependency order. The execution engine manages all the interactions between the compiler and Hadoop.
Thrift server: provides a Thrift interface, a JDBC/ODBC server, and a rich API for integrating Hive with other applications.
CLI and web UI: two client interfaces. The command-line interface (CLI) allows command-line execution, and the web user interface (web UI) is a management console.
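The split between the query compiler (HiveQL into map and reduce tasks) and the execution engine (running tasks in dependency order) can be illustrated with a toy simulation. The sketch below hand-compiles a hypothetical query, SELECT dept, COUNT(*) FROM emp GROUP BY dept, into map, shuffle, and reduce stages and then runs them in topological order using Python's graphlib. It is a conceptual model only, not Hive's actual compiler or execution engine; the table, columns, stage names, and the choice to model the shuffle as its own stage are all assumptions for illustration.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

# Hypothetical "emp" table as (name, dept) rows.
EMP = [("ann", "sales"), ("bob", "it"), ("cho", "sales"), ("dan", "hr"), ("eve", "it")]


def map_stage(rows):
    # Map task: emit (key, 1) per row, keyed on the GROUP BY column.
    return [(dept, 1) for _name, dept in rows]


def shuffle_stage(pairs):
    # Shuffle: group map output by key (done by Hadoop between map and reduce;
    # modeled here as a separate stage for clarity).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_stage(groups):
    # Reduce task: COUNT(*) per key.
    return {key: sum(values) for key, values in groups.items()}


# Toy "compiled plan": each stage maps to the set of stages it depends on.
PLAN = {"map": set(), "shuffle": {"map"}, "reduce": {"shuffle"}}
FUNCS = {"map": lambda _: map_stage(EMP), "shuffle": shuffle_stage, "reduce": reduce_stage}


def execute(plan):
    # Toy execution engine: run stages in dependency (topological) order,
    # feeding each stage the output of its single upstream stage (an assumption
    # that holds for this linear plan).
    outputs = {}
    for stage in TopologicalSorter(plan).static_order():
        deps = plan[stage]
        upstream = outputs[next(iter(deps))] if deps else None
        outputs[stage] = FUNCS[stage](upstream)
    return outputs["reduce"]


result = execute(PLAN)
print(result)  # {'sales': 2, 'it': 2, 'hr': 1}
```

In real deployments a query can compile into several chained jobs; the same dependency-ordered execution idea applies, with the plan becoming a larger graph.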

Data Abstractions in Hive

Query Processing in Hive

DW Architectures: Bottom-up approach
Pros
Faster implementation of multiple manageable modules
Simple design at the datamart level
Less risk of failure
Incremental approach: the most important or complex datamarts can be built first
Can be deployed on a smaller infrastructure footprint
Cons
A datamart cannot see outside its subject area of focus
A redundant data architecture can become expensive
All requirements must be completed before the project starts
Operational workflows for complex business intelligence are difficult to manage

DW Architectures: Top-down approach
Pros
Provides an enterprise view of the data
Centralized architecture
Central rules and control
Data is refreshed at a single location
Extremely high performance
Can be built in multiple steps
Cons
High risk of failure
Data quality issues can stall the loading of data into the data warehouse
Expensive to maintain
Needs a more scalable infrastructure

Data Warehouse 2.0
Four data layers
Huge amounts of data
Complex types and diverse formats of data (e.g., text, images, video, and sensor data)
The next-generation data warehouse is an integrated architecture that combines Big Data and traditional data in one heterogeneous platform.
The next-generation DW should focus on usability and scalability from the user's perspective.

Data Warehouse 2.0

DSS 2.0 architecture
Integration platform
BI + content management
BI + process management

DSS 2.0 matrix
Divides business intelligence into three areas

Big Data Platform (including DW)

Big Data & DW
