Big Data Enterprise Patterns


1 Big Data Enterprise Patterns
Dr. Ramkumar Lakshminarayanan Mr. Rajasekar Ramalingam Ministry of Higher Education, College of Applied Sciences, Sur, Sultanate of Oman

2 Big Data
Traditional BI methodology works on the principle of assembling all the enterprise data in a central server. Big data uses a distributed file system instead of the traditional BI solution. The technology relies on massively parallel processing (MPP) concepts. The processing functions are taken to the data rather than the data being taken to the functions. Data is of different formats, both structured and unstructured. Data is both real-time and offline.
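The idea of taking the processing function to the data can be sketched in a few lines. This is a minimal illustration and not Hadoop code: the node names, the `PARTITIONS` dictionary, and the helpers `local_sum` and `run_on_nodes` are all hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands for the records held on one node.
PARTITIONS = {
    "node-a": [3, 8, 15],
    "node-b": [4, 16],
    "node-c": [23, 42, 7, 1],
}


def local_sum(records):
    """The function shipped to a node; it sees only that node's records."""
    return sum(records)


def run_on_nodes(func, partitions):
    """Apply func to every partition in parallel and collect the small results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = {node: pool.submit(func, data) for node, data in partitions.items()}
        return {node: future.result() for node, future in futures.items()}


# Only the per-node partial sums travel back, never the raw records.
partial = run_on_nodes(local_sum, PARTITIONS)
total = sum(partial.values())
```

Only the small per-node results are combined centrally, which is the contrast with the traditional approach of first assembling all data on one server.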

3 High level conceptual reference architecture for Big Data
Analogous to cloud architectures, the big data landscape can be divided into four layers: BFaaS (industry business functions), DaaS (big data analysis and visualization tools), PaaS (NoSQL and relational databases), and IaaS (big data storage and infrastructure layer).

4 The Big Data Architecture

5 Data Source Layer
Data sources of different volumes, velocities, and varieties vie with each other to be included in the final big data set to be analyzed. (Figure: Variety of Data Sources)

6 Ingestion Layer
The ingestion layer is the new data sentinel of the enterprise. It is the responsibility of this layer to separate the noise from the relevant information. It should have the capability to validate, cleanse, transform, reduce, and integrate the data into the big data tech stack for further processing.
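The five ingestion steps named above can be sketched as a small pipeline. This is an illustrative sketch only: the record fields (`user`, `amount`), the function names, and the list used as a stand-in for the big data store are assumptions, not part of the presentation.

```python
def validate(record):
    """Validate: accept only dict records that carry the expected fields."""
    return isinstance(record, dict) and "user" in record and "amount" in record


def cleanse(record):
    """Cleanse: normalise whitespace and letter case in the user field."""
    return {"user": str(record["user"]).strip().lower(), "amount": record["amount"]}


def transform(record):
    """Transform: convert the amount to a float (raises on unusable values)."""
    return {"user": record["user"], "amount": float(record["amount"])}


def reduce_noise(records):
    """Reduce: drop exact duplicates, keeping the first occurrence."""
    seen, unique = set(), []
    for record in records:
        key = (record["user"], record["amount"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def ingest(raw_records, store):
    """Integrate: run all steps and append the surviving records to the store."""
    staged = []
    for record in raw_records:
        if not validate(record):
            continue
        try:
            staged.append(transform(cleanse(record)))
        except (TypeError, ValueError):
            continue  # treat unconvertible amounts as noise
    clean = reduce_noise(staged)
    store.extend(clean)
    return len(clean)


raw = [
    {"user": " Alice ", "amount": "10.5"},
    {"user": "alice", "amount": 10.5},   # duplicate after cleansing
    {"amount": 3},                       # missing field
    {"user": "bob", "amount": "n/a"},    # unusable amount
    "garbage",                           # not a record at all
    {"user": "Bob", "amount": 2},
]
store = []  # hypothetical stand-in for the downstream storage layer
accepted = ingest(raw, store)
```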

7 Distributed Storage Layer
HDFS (Hadoop Distributed File System) is a file system designed to store a very large volume of information (terabytes or petabytes) across a large number of machines in a cluster. HDFS is not accessible as a logical data structure for easy data manipulation. To facilitate that, new distributed, non-relational data stores have emerged, including key-value pair, document, graph, columnar, and geospatial databases. These are collectively referred to as NoSQL, or "not only SQL".
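How a large file ends up spread across many machines can be illustrated with a toy block planner. This is a sketch under stated assumptions: the round-robin placement is a deliberate simplification and is not the placement policy HDFS actually uses, and the function name `plan_blocks`, the node names, and the sizes are hypothetical.

```python
import math


def plan_blocks(file_size, block_size, nodes, replication):
    """Split a file into fixed-size blocks and assign each block's replicas.

    Returns one list of node names per block. Placement is simple
    round-robin, chosen only to keep the illustration short.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for block in range(n_blocks):
        replicas = [nodes[(block + r) % len(nodes)] for r in range(replication)]
        plan.append(replicas)
    return plan


# A 300 MB file with 128 MB blocks and three replicas on a four-node cluster.
plan = plan_blocks(file_size=300, block_size=128,
                   nodes=["n1", "n2", "n3", "n4"], replication=3)
```

Each block lives on several distinct nodes, so losing one machine does not lose the block.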

8 NoSQL Databases

9 Hadoop Infrastructure Layer
The Hadoop physical infrastructure layer (HPIL) is based on a distributed computing model with a share-nothing architecture. Hadoop and HDFS can manage the infrastructure layer in a virtualized cloud environment or on a distributed grid of commodity servers over a fast gigabit network.

10 Hadoop Platform Management Layer
The layer provides tools and query languages to access the NoSQL databases using the HDFS storage file system.

11 Security Layer
Big data projects are subject to security issues because of the distributed architecture. To implement a security baseline, the minimum security design considerations are: authenticate nodes using protocols like Kerberos; enable file-layer encryption; subscribe for trusted keys and certificates; use tools like Chef or Puppet for validation during deployment of data; log the communication between nodes, using a distributed logging mechanism; ensure all communication between nodes is secure.
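The baseline above can be treated as a checklist that a cluster description is audited against. The sketch below assumes a hypothetical configuration dictionary; the key names are invented for illustration and do not correspond to settings of Hadoop, Kerberos, Chef, or Puppet.

```python
# Hypothetical control names mapped to the baseline item each one represents.
BASELINE = {
    "kerberos_auth": "nodes authenticate with a protocol such as Kerberos",
    "file_encryption": "file-layer encryption is enabled",
    "trusted_certificates": "trusted keys and certificates are in use",
    "deployment_validation": "deployments are validated with a tool such as Chef or Puppet",
    "node_communication_logging": "communication between nodes is logged",
    "secure_node_communication": "all communication between nodes is secure",
}


def missing_controls(cluster_config):
    """Return the baseline controls that are absent or switched off."""
    return [name for name in BASELINE if not cluster_config.get(name, False)]


# Example cluster description (made up for this sketch).
config = {
    "kerberos_auth": True,
    "file_encryption": True,
    "secure_node_communication": False,
}
gaps = missing_controls(config)
```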

12 Monitoring Layer
With so many distributed data storage clusters and multiple data source ingestion points, it is important to get a complete picture of the big data tech stack so that the availability SLAs are met with minimum downtime. Performance is the key parameter to monitor so that there is very low overhead and high parallelism.
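Whether an availability SLA is met reduces to simple arithmetic on observed downtime. The sketch below assumes a 30-day measurement window and a 99.9% target purely as example figures; the function names are hypothetical.

```python
def availability(total_minutes, downtime_minutes):
    """Fraction of the measurement window during which the service was up."""
    return (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime(total_minutes, target):
    """Downtime budget, in minutes, implied by an availability target."""
    return total_minutes * (1 - target)


def meets_sla(total_minutes, downtime_minutes, target):
    """True when the observed availability reaches the target."""
    return availability(total_minutes, downtime_minutes) >= target


WINDOW = 30 * 24 * 60   # example window: 30 days expressed in minutes
TARGET = 0.999          # example target: 99.9% availability

budget = allowed_downtime(WINDOW, TARGET)   # about 43.2 minutes
ok_month = meets_sla(WINDOW, 40, TARGET)    # 40 minutes down: within budget
bad_month = meets_sla(WINDOW, 50, TARGET)   # 50 minutes down: over budget
```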

13 Analytics Engine
Enterprises need to adopt different approaches to solve different problems using big data. Some analyses will use a traditional data warehouse. Some need both big data and traditional business intelligence methods. Some may use only big data sources. Mediation happens when data flows between the data warehouse and big data stores.

14 Visualization Layer
A huge volume of big data can lead to information overload. If visualization is incorporated early on as an integral part of the big data tech stack, it will help data analysts and scientists gain insights faster and increase their ability to look at different aspects of the data in various visual modes.

15 Big data typical software stack

16 Big Data Deployment Patterns
Big data deployment involves distributed computing, multiple clusters, networks, and firewalls. Infrastructure for a big data implementation includes storage, network, and processing power units. Further considerations: security infrastructure; appliances or NoSQL storage layers; hybrid infrastructure.

17 Traditional tree network pattern

18 Resource negotiator pattern for Security and Data Integrity

19 Spine Fabric Pattern

20 Federation Pattern

21 References
1. Apache Hadoop
2. Big Data Application Architecture Q&A - A Problem-Solution Approach

22 Questions

23 Thanks
For queries contact us at ramkumar.sur@cas.edu.om

