Big Data Reference Architecture: Roles, Service Abstractions, and the Big Data Framework

Presentation transcript:

[Slide: reference architecture diagram. Recoverable elements:
- Data Provider Role, exposing a Data Service Abstraction (DATA flows out of the provider)
- Transformation Provider, containing Collection, Curation, and Access functions atop Big Data Application Frameworks
- Capabilities Provider, exposing a Capabilities Service Abstraction over a Big Data Framework of Scalable Infrastructures (VM cluster, H/W, storage), Legacy Infrastructures, Scalable Platforms (databases, etc.), Legacy Platforms, Scalable Processing (analytic tools, etc.), and Legacy Processing
- Data Consumer Role, receiving DATA through a Usage Service Abstraction
- System/Data Orchestration Roles (Data Mgmt Role, Data Security Role, System Mgt Role), exposing a System Service Abstraction and issuing CMD flows
- Key: DATA, SW, and CMD flow types; axes labeled IT STACK / VALUE CHAIN and INFORMATION FLOW / VALUE CHAIN]

Ref Architecture Description: Flows
- Movement of data (DATA)
- Computer-generated instructions (CMD), e.g. "Send me this data"
- Manual instructions (CMD), e.g. setting up an alert to send me data
- Transfer and execution of software (SW), e.g. a SQL query, or a JAR file with Map/Reduce code
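To make the three flow types concrete, here is a minimal sketch that models them as tagged messages passed between roles; the class names and payloads are illustrative, not taken from the NIST material.

```java
// Minimal sketch: the three reference-architecture flow types as tagged
// messages. All names and payloads here are illustrative assumptions.
import java.util.List;

public final class Flows {

    /** The three flow types keyed in the diagram. */
    enum FlowType {
        DATA,  // movement of data between roles
        CMD,   // computer-generated or manual instructions
        SW     // transfer and execution of software (SQL query, Map/Reduce JAR)
    }

    record Message(FlowType type, String payload) {}

    public static void main(String[] args) {
        Message cmd  = new Message(FlowType.CMD,  "send me this data");
        Message sw   = new Message(FlowType.SW,   "SELECT region, COUNT(*) FROM events GROUP BY region");
        Message data = new Message(FlowType.DATA, "region,count\nus-east,42");
        for (Message m : List.of(cmd, sw, data)) {
            System.out.printf("%s -> %s%n", m.type(), m.payload());
        }
    }
}
```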

Ref Architecture Description: System/Data Orchestration Roles
A collection of roles, performed by one or more actors, that manages and orchestrates the operation of the Big Data system.

Data Mgmt Role: responsible for managing data flow into the system and management of the data within the system.
- Works closely with the Data Security Role to control what data can leave the system and to ensure that ingested data is properly secured.
- Works closely with the System Mgt Role to ensure that there are adequate resources to store and operate on the data, and that data is archived/backed up as required.

Data Security Role: responsible for managing the security of the data in the system.
- Works closely with the Data Mgmt Role to identify and implement appropriate security controls on data, ensuring that no improper data enters the system and that only allowed data is released to consumers (which may vary by consumer, via attribute/role-based access control; see the sketch below).
- Works closely with the System Mgt Role to make sure that security and data access controls are implemented across the system.

System Mgt Role: responsible for managing the operations of, access to, and resources of the system (e.g. a system administrator).
- Works with the Data Mgmt Role to allocate resources to data and associated applications, and implements the data retention and backup policies identified by the Data Mgmt Role.
- Works with the Data Security Role to implement system and data access controls, ensures that appropriate audit/security logs are maintained per the Data Security Role's requirements, and manages access to system and data resources based on Data Security Role guidance.
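As a hedged illustration of the attribute/role-based release check the Data Security Role performs per consumer, here is a minimal sketch; the role names, attributes, and policy logic are all hypothetical.

```java
// Hypothetical sketch of an attribute/role-based release decision:
// the Data Security Role decides, per consumer, whether a dataset may
// leave the system. Role and attribute names are invented for illustration.
import java.util.Set;

public final class ReleasePolicy {

    record Consumer(String id, Set<String> roles) {}
    record Dataset(String name, String requiredRole, boolean containsPii) {}

    /** Release only data the consumer's roles and attributes allow. */
    static boolean mayRelease(Consumer c, Dataset d) {
        boolean roleOk = c.roles().contains(d.requiredRole());
        boolean piiOk  = !d.containsPii() || c.roles().contains("pii-cleared");
        return roleOk && piiOk;
    }

    public static void main(String[] args) {
        Consumer analyst = new Consumer("analyst-1", Set.of("analytics"));
        Dataset logs     = new Dataset("web-logs", "analytics", true);
        // Denied: the analyst lacks the "pii-cleared" attribute.
        System.out.println(mayRelease(analyst, logs)); // false
    }
}
```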

Ref Architecture Description: Data Provider and Data Service Abstraction
- This role is frequently external to the system.
- The provider may be a system (legacy or big data) or someone with a tape.
- The provider may have full or partial ownership of the data (e.g. limits on how it can be shared).
- The provider may be the direct producer of the data (a sensor) or someone who creates data from other data.

Data Service Abstraction: the interface to the provider for the data. It may be complex or simple.
- As an example, the provider may be a database with a SQL interface (the abstraction), while the backend may be Oracle, MySQL, or Hadoop fronted by Hive (see the sketch below).
- The abstraction may also be that I am submitting code for execution within a framework.
- Not all commands to the abstraction need be automated (from a machine); they might instead involve a Data Mgmt role logging into the system and providing directions on where new data should be transferred (say, via FTP).
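JDBC is one familiar, concrete instance of such a SQL-style data service abstraction: the consumer's query is unchanged whether the URL points at MySQL, Oracle, or Hive fronting Hadoop. A minimal sketch, with illustrative connection URLs, table names, and credentials:

```java
// Sketch: JDBC as a data service abstraction. The SQL below stays the same
// regardless of backend; only the connection URL (and the driver on the
// classpath) changes. URLs, schema, and credentials are illustrative.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public final class DataServiceAbstraction {
    public static void main(String[] args) throws Exception {
        // Swap one URL for another without touching the query:
        //   jdbc:mysql://host:3306/sales
        //   jdbc:oracle:thin:@host:1521:sales
        //   jdbc:hive2://host:10000/sales   (Hive fronting Hadoop/HDFS)
        String url = args.length > 0 ? args[0] : "jdbc:mysql://host:3306/sales";

        try (Connection conn = DriverManager.getConnection(url, "user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT region, SUM(amount) FROM orders GROUP BY region")) {
            while (rs.next()) {
                System.out.printf("%s: %.2f%n", rs.getString(1), rs.getDouble(2));
            }
        }
    }
}
```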

Ref Architecture Description: Data Consumer
- This role may be an end user or another system (big data or legacy).
- Consumers are typically external, or at most at the edge of the system.
- They gain access to the data via the Usage Service Abstraction layer.
- Their access is controlled by the System/Data Orchestration Roles.
- Consumption may mean:
  - downloading a raw data chunk
  - downloading the results of an analytic
  - viewing and interacting with a visualization
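Purely as a hypothetical sketch (none of these names come from the reference architecture), the Usage Service Abstraction as a consumer might see it, with one method per consumption mode listed above:

```java
// Hypothetical consumer-facing view of a Usage Service Abstraction.
// All method and type names are invented for illustration.
import java.io.InputStream;
import java.util.List;

interface UsageService {
    /** Download a raw data chunk by dataset identifier. */
    InputStream fetchRaw(String datasetId, long offset, int length);

    /** Download the results of a previously run analytic. */
    List<String> fetchAnalyticResult(String jobId);

    /** Obtain a URL for an interactive visualization of a dataset. */
    String visualizationUrl(String datasetId);
}
```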

Concerns
- Use colors consistently: roles one color, capabilities another, etc.
- I am not certain of the purpose/use of the orange arrows (Info Flow / Value Chain, IT Stack / Value Chain).
  - The info flow really is a top-to-bottom flow: data moves from provider to consumer, and value is added in that move.
  - On the IT stack, I am just not certain what it provides.
- To me, there should be a single Data/Usage Service Abstraction term: if you stack these diagrams where the consumer is itself a system, it is really interacting with the big data system as a data provider.
- I focused on providing words around the boxes that had the most discussion; we still need lots of words around the middle and right boxes.

Capabilities Provider / Big Data Framework: Infrastructures (VM cluster, H/W, storage), Platforms (databases, etc.), Processing (analytic tools, etc.), Capabilities Service Abstraction

1. Why are hardware, storage, and networking completely separate from the Big Data Framework? I ask because, in my experience, they are intimately tied to and part of the overall Big Data framework. For example, using a large centralized SAN with, say, a Hadoop implementation is generally considered very poor practice. If I am implementing any sort of grid or cluster environment, the networking between the nodes and racks is critical to the performance of the cluster (see the rack-awareness sketch below). My recommendation, which I believe will make the diagram more internally consistent, is to extend the Big Data Framework box to include the Scalable Infrastructure and delete the grey Hardware box. If nothing else, it is one less box to explain. But if I am describing a Big Data framework, I almost always have to discuss hardware components.
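As one concrete illustration of networking being part of the framework rather than beside it: Hadoop's rack awareness is configured inside the framework itself. A minimal sketch, assuming a topology script at an illustrative path; the property name is Hadoop's standard rack-awareness hook:

```java
// Sketch: in Hadoop, node/rack topology is framework configuration, not a
// separate hardware concern. The script path below is an assumption.
import org.apache.hadoop.conf.Configuration;

public final class RackAwareness {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // HDFS and the scheduler call this script to map each node to a rack,
        // so block placement spreads replicas across racks.
        conf.set("net.topology.script.file.name", "/etc/hadoop/rack-topology.sh");
        System.out.println(conf.get("net.topology.script.file.name"));
    }
}
```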

Capabilities Provider / Big Data Framework: Infrastructures (VM cluster, H/W, storage), Platforms (databases, etc.), Processing (analytic tools, etc.), Capabilities Service Abstraction

2. Looking at the Transformation Provider, all the "application" functionality is embedded there, yet the Big Data Framework lists Scalable and Legacy Applications while mentioning analytic tools. Tools themselves do not implement application functionality; they enable it. As a component of a Capabilities Provider, the indication is that general capabilities are provided rather than applications. Applications are also not frameworks: an analytic tool like R or Distributed R is a capability that allows transformation, visualization, and analytics to be run on data stored in a platform (HDFS, RDBMS, HBase, etc.). My question is: what is the purpose or meaning of the term "application" here, and how is it different from the application functions provided in the Transformation Provider?

My recommendation is to replace the word "Applications" in that box with "Processing", which brings the box more in line with the idea of a framework. To illustrate by example:
- Scalable Processing: Map/Reduce, R, Giraph, Storm, messaging (MPI, ActiveMQ)
- Legacy Processing: service buses, SOA platforms (JBoss, Tomcat, etc.), web servers
- Scalable Platforms: HDFS, Accumulo, HBase, Solr, MongoDB
- Legacy Platforms: MySQL, Oracle, Berkeley DB

Also, I should point out that in some discussions there have been objections to the word "Scalable" being used here. Lots of things in the legacy realm are scalable: Oracle and MySQL will certainly scale, not just vertically but also via fairly well-known horizontal scaling strategies. My personal thought is that we simplify the diagram and just talk about Processing, Platforms, and Infrastructure. By going to those terms, in my mind, we would align better with the Cloud Computing Reference Architecture: Infrastructure ~ IaaS, Platform ~ PaaS, Processing ~ SaaS.
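To ground the Processing-over-Platform split by example, here is the canonical Hadoop word count: the mapper/reducer classes are the processing layer, while the HDFS input and output paths belong to the platform layer.

```java
// Canonical Hadoop word count: Map/Reduce code = Scalable Processing,
// the HDFS paths it reads and writes = Scalable Platform.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // emit (word, 1) per token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum)); // total count per word
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input (platform)
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output (platform)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```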