Big Data Enterprise Patterns

Big Data Enterprise Patterns
Dr. Ramkumar Lakshminarayanan, Mr. Rajasekar Ramalingam
Ministry of Higher Education, College of Applied Sciences, Sur, Sultanate of Oman

Big Data vs. Traditional BI Methodology
Traditional BI methodology works on the principle of assembling all the enterprise data in a central server. Big data differs in the following ways:
1. Data is stored in a distributed file system instead of in a traditional BI solution's central server.
2. The processing functions are taken to the data rather than the data being taken to the functions.
3. Data is of different formats, both structured and unstructured.
4. Data is both real-time and offline.
5. The technology relies on massively parallel processing (MPP) concepts.
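The idea of taking the processing to the data can be sketched in a few lines of Python. This is an illustration only: the node names, the `PARTITIONS` dictionary, and the helper functions are hypothetical, and a real MPP system would run the local step in parallel on separate machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster: each "node" holds its own partition of log records.
PARTITIONS = {
    "node-1": ["error", "info", "info"],
    "node-2": ["warn", "error"],
    "node-3": ["info", "error", "error"],
}

def local_count(records):
    """Runs where the data lives and returns a small partial result."""
    return Counter(records)

def merge(a, b):
    """Combines two partial results on the coordinating node."""
    return a + b

def count_levels(partitions):
    """Ship the function to every partition, then merge the partial counts."""
    # In a real cluster these calls would execute in parallel, one per node.
    partials = [local_count(records) for records in partitions.values()]
    return reduce(merge, partials, Counter())

if __name__ == "__main__":
    print(count_levels(PARTITIONS))
```

Only the small `Counter` objects travel to the coordinator; the raw records never leave the node that stores them.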

High-Level Conceptual Reference Architecture for Big Data
Analogous to cloud architectures, the big data landscape can be divided into four layers:
1. BFaaS: Industry Business Functions
2. DaaS: Big Data Analysis and Visualization Tools
3. PaaS: NoSQL and Relational Databases
4. IaaS: Big Data Storage and Infrastructure Layer

The Big Data Architecture

Data Source Layer
Data sources of different volumes, velocities, and varieties vie with each other to be included in the final big data set to be analyzed.
Figure: Variety of Data Sources

Ingestion Layer
The ingestion layer is the new data sentinel of the enterprise. It is the responsibility of this layer to separate the noise from the relevant information. It should be able to validate, cleanse, transform, reduce, and integrate the data into the big data tech stack for further processing.
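The validate, cleanse, transform, and reduce steps can be illustrated with a minimal Python sketch. The record schema (`user` and `amount` fields) and all function names are assumptions made for this example; the returned totals stand in for what would be integrated into the storage layer.

```python
def validate(record):
    """Keep only records that carry the fields we expect (assumed schema)."""
    return isinstance(record, dict) and "user" in record and "amount" in record

def cleanse(record):
    """Normalize whitespace and case in the user field."""
    return {**record, "user": str(record["user"]).strip().lower()}

def transform(record):
    """Convert the amount to a float so later stages can aggregate it."""
    return {**record, "amount": float(record["amount"])}

def ingest(raw_records):
    """Validate, cleanse, transform, then reduce to one total per user."""
    totals = {}
    for record in raw_records:
        if not validate(record):
            continue  # noise: dropped before it reaches storage
        try:
            clean = transform(cleanse(record))
        except (TypeError, ValueError):
            continue  # non-numeric amount: also treated as noise
        totals[clean["user"]] = totals.get(clean["user"], 0.0) + clean["amount"]
    return totals

if __name__ == "__main__":
    sample = [{"user": " Ann ", "amount": "2.5"}, {"user": "ann", "amount": 1}, {"bad": 1}]
    print(ingest(sample))
```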

Distributed Storage Layer
HDFS (Hadoop Distributed File System) is a file system designed to store very large volumes of information (terabytes or petabytes) across a large number of machines in a cluster. HDFS is not accessible as a logical data structure for easy data manipulation. To facilitate that, new distributed, non-relational data stores are used, including key-value pair, document, graph, columnar, and geospatial databases. These are collectively referred to as NoSQL, or "not only SQL".
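How a large file ends up spread across many machines can be sketched as follows. This is a simplified model, not HDFS's actual placement policy (which is rack-aware): the `plan_blocks` function and the round-robin assignment are assumptions for illustration. The default replication factor of 3 mirrors the value commonly used in Hadoop deployments.

```python
import math

def plan_blocks(file_size, block_size, nodes, replication=3):
    """Return {block_index: [nodes holding a replica]} using simple round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n_blocks = math.ceil(file_size / block_size)
    plan = {}
    for block in range(n_blocks):
        # Each replica of a block goes to a different node.
        plan[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return plan

if __name__ == "__main__":
    # A 300 MB file with 128 MB blocks on a hypothetical four-node cluster.
    for block, holders in plan_blocks(300, 128, ["n1", "n2", "n3", "n4"]).items():
        print(block, holders)
```

Because every block lives on several nodes, the loss of a single machine does not make any part of the file unavailable.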

NoSQL Databases

Hadoop Infrastructure Layer
The Hadoop physical infrastructure layer (HPIL) is based on a distributed computing model with a shared-nothing architecture. Hadoop and HDFS can manage the infrastructure layer in a virtualized cloud environment or on a distributed grid of commodity servers connected by a fast gigabit network.
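In a shared-nothing design each node owns its slice of the data and no disk or memory is shared between nodes. One common way to decide ownership is hash partitioning, sketched below. The `owner` and `partition` functions are hypothetical, and this is a generic technique rather than a description of how Hadoop itself assigns data.

```python
import hashlib

def owner(key, nodes):
    """Map a key to exactly one node using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def partition(keys, nodes):
    """Group keys by owning node; no key is stored on more than one node."""
    shards = {node: [] for node in nodes}
    for key in keys:
        shards[owner(key, nodes)].append(key)
    return shards

if __name__ == "__main__":
    print(partition(["alice", "bob", "carol", "dave"], ["n1", "n2", "n3"]))
```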

Hadoop Platform Management Layer
This layer provides tools and query languages to access the NoSQL databases using the HDFS storage file system.

Security Layer
Big data projects are subject to security issues because of their distributed architecture. To implement a security baseline, the minimum security design considerations are:
1. Authenticate nodes using protocols like Kerberos.
2. Enable file-layer encryption.
3. Subscribe for trusted keys and certificates.
4. Use tools like Chef or Puppet for validation during deployment of data.
5. Log the communication between nodes, and use a distributed logging mechanism.
6. Ensure all communication between nodes is secure.
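The baseline above can be treated as a checklist that is evaluated against a cluster's settings. The sketch below assumes a hypothetical configuration dictionary; the key names in `BASELINE` are invented for this example and do not correspond to real Hadoop, Chef, or Puppet settings.

```python
# Hypothetical setting names mapped to the baseline item each one represents.
BASELINE = {
    "kerberos_enabled": "nodes authenticate with Kerberos",
    "file_encryption": "file-layer encryption is enabled",
    "trusted_certificates": "keys and certificates come from a trusted source",
    "deployment_validation": "deployments are validated (e.g. Chef or Puppet)",
    "node_logging": "communication between nodes is logged",
    "secure_transport": "communication between nodes is secure",
}

def missing_controls(config):
    """Return the baseline items that the given config does not satisfy."""
    return [desc for key, desc in BASELINE.items() if not config.get(key, False)]

if __name__ == "__main__":
    cluster = {"kerberos_enabled": True, "secure_transport": True}
    for item in missing_controls(cluster):
        print("MISSING:", item)
```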

Monitoring Layer
With so many distributed data storage clusters and multiple data source ingestion points, it is important to get a complete picture of the big data tech stack so that availability SLAs are met with minimum downtime. Performance is the key parameter to monitor so that there is very low overhead and high parallelism.
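Whether an availability SLA is met comes down to simple arithmetic on observed downtime. The sketch below is illustrative: the function names and the 99.9% target are assumptions, not figures from the presentation.

```python
def availability(total_minutes, downtime_minutes):
    """Fraction of the period during which the service was up."""
    return (total_minutes - downtime_minutes) / total_minutes

def allowed_downtime(total_minutes, sla):
    """Downtime budget, in minutes, for a given SLA target (e.g. 0.999)."""
    return total_minutes * (1 - sla)

def meets_sla(total_minutes, downtime_minutes, sla):
    """True when the observed availability reaches the SLA target."""
    return availability(total_minutes, downtime_minutes) >= sla

if __name__ == "__main__":
    month = 30 * 24 * 60  # minutes in a 30-day month
    print("budget (min):", round(allowed_downtime(month, 0.999), 1))
    print("40 min down, SLA met:", meets_sla(month, 40, 0.999))
```

For a 30-day month, a 99.9% target leaves a budget of roughly 43 minutes of downtime.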

Analytics Engine
Enterprises need to adopt different approaches to solve different problems using big data:
1. Some analyses will use a traditional data warehouse.
2. Some need both big data and traditional business intelligence methods.
3. Some may use only big data sources.
Mediation happens when data flows between the data warehouse and big data stores.

Visualization Layer
A huge volume of big data can lead to information overload. If visualization is incorporated early on as an integral part of the big data tech stack, it helps data analysts and scientists gain insights faster and look at different aspects of the data in various visual modes.

Typical Big Data Software Stack

Big Data Deployment Patterns
Big data deployment involves distributed computing, multiple clusters, networks, and firewalls. Infrastructure for a big data implementation includes storage, network, and processing power units. Further infrastructure considerations:
1. Security infrastructure
2. Appliances or NoSQL storage layers
3. Hybrid infrastructure

Traditional tree network pattern

Resource negotiator pattern for Security and Data Integrity

Spine Fabric Pattern

Federation Pattern

References
1. Apache Hadoop
2. Big Data Application Architecture Q&A - A Problem-Solution Approach

Questions

Thanks
For queries, contact us at ramkumar.sur@cas.edu.om or rajasekar.sur@cas.edu.om