IBM Cloud Private for Data Overview

Burt Vialpando – IBM Executive Analytics Architect August 2, 2018

Agenda
IBM Cloud Private for Data – the business case
Industry drivers
Technology drivers
IBM Cloud Private – the foundation
IBM Cloud Private for Data – what is it?
Value proposition
Architecture & component capabilities
Collect, organize, and analyze: screen shots
Packaging & use cases
IBM Cloud / © 2018 IBM Corporation

IBM Cloud Private for Data – The Business Case

Multi-cloud and AI are driving digital transformation
>85% of enterprise IT organizations will commit to multi-cloud architectures
>59% of large enterprises see improved application quality & reduced defects using containers
3 out of 4 large enterprises will have digital transformation at the center of corporate strategy within two years
4 out of 5 companies do not yet understand the data required for AI
Multi-cloud and AI are about business value: productivity, innovation, differentiation, and competitiveness. We are at the tipping point at which cloud and AI become a true business platform, on which we will:
Create new business models and consumer services
Connect and unlock billions in value in existing applications and data
Optimize every workload according to the best fit (public, private, or traditional on-premises) based on economics, flexibility, performance, location, etc.
This is the process clients are going through. Just as the Internet digitized commerce, cloud is digitizing virtually every business process, interaction, and experience. Today, 75% of companies say that digital transformation will be at the core of their corporate strategy within 2 years. Underpinning digital transformation is a shift to multi-cloud for agility, along with an increased focus on capturing the value of AI. However, 81% of companies say they do not yet understand the data needed for AI.

Multi-cloud is being driven by cloud native architectures
Microservices and containers are changing IT
The compound annual growth rate (CAGR) of traditional IT continues to shrink while public and private cloud continue to grow.
Containers and microservices are keys to this transformation.
[Figure: traditional IT (CAGR: -8%) shifting to private cloud and public cloud (CAGR: 16% and 30%) through new app development, lift and shift, public integration, public with virtual private cloud, and portable applications]
Over the past few years, there has been a steady shift in how software is developed and consumed, with increased adoption of public cloud. However, many enterprise customers are wary of fully embracing public cloud. Part of this is due to regulatory and compliance requirements, while part of it is cultural. Data gravity is also a factor: most enterprise data currently resides on-premises, and moving it to public cloud is not a viable option in the short term. Despite these concerns, customers are looking to modernize their workloads to prepare for a cloud future. With the advent of private cloud, customers can now realize the benefits of cloud behind their firewall. Companies are moving traditional on-premises workloads to private cloud for increased agility and efficiency, both by lifting and shifting existing applications and by building new ones afresh. Cloud native architectures and containers are driving multi-cloud adoption. In fact, more than 85% of companies are committing to multi-cloud.

Microservices – the first key to cloud native applications
Making development & deployment more efficient
[Figure: microservice vs. monolithic architecture; layers shown: UI, business logic, data access layer, DB]
Microservices benefits
Improves fault isolation: larger applications can remain largely unaffected by the failure of a single module
Eliminates long-term commitment to a single technology stack: try out a new technology stack on an individual service and roll it back if required
Easier development: a new developer can more easily understand the functionality of a service
Easier deployment: auto-provision, auto-scale, and provide auto-redundancy
Being cloud native means designing and running your applications to take advantage of all the benefits of a cloud system, including focusing on horizontal design and treating the instances of your applications or services as “cattle” instead of using the traditional “pet” model. The Cloud Native Computing Foundation describes cloud native applications as those that have the following three properties:
They are container packaged, running the application and processes as isolated units.
They are dynamically managed by a central orchestrating process such as Kubernetes or Apache Mesos.
They are microservices-oriented: small, focused applications designed to be composable via service endpoints (such as REST).
This approach allows enterprises to ship faster, reduce risk, and increase efficiency by building smaller, autonomous (yet composable) services that can be started and stopped on demand with little to no consequence. Like SOA (service-oriented architecture), microservices are smaller services (“Lego blocks”) within a larger architecture. For a shopping cart, for example, you might have a cart service, a shipping service, and a payment service.
Each service is really its own separate application: hosted on its own server (or in the cloud), abstracted by its own API, and operated independently of the other services. However, in contrast to SOA, the communication protocols used among microservices are more modern and tailored to the needs of the cloud (e.g., REST, Thrift, HTTP, and RPC rather than SOAP). Thus, in a cloud native architecture, applications are COMPOSED of microservices, reducing time to value and increasing the reliability of the resulting application. However, microservices require a change in the way applications are developed. A critical point is that while CONTAINER services are excellent for hosting microservices, they can equally well host a complete legacy stateful application (a monolithic application), allowing customers to lift and shift traditional applications to the cloud as part of a stepwise approach to application modernization.
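To make the shopping cart example concrete, here is a minimal Python sketch. It assumes three hypothetical in-process classes (CartService, ShippingService, PaymentService) standing in for independently deployed services; in a real deployment each would sit behind its own REST endpoint. It shows composition and the fault-isolation benefit: if the shipping service fails, checkout returns a degraded result instead of crashing.

```python
from dataclasses import dataclass
from typing import Optional


class CartService:
    """Hypothetical cart service: returns (item, quantity, unit_price) tuples."""

    def __init__(self):
        self._carts = {"c1": [("widget", 2, 9.99), ("gadget", 1, 24.50)]}

    def items(self, cart_id):
        return self._carts[cart_id]


class ShippingService:
    """Hypothetical shipping service; available=False simulates an outage."""

    def __init__(self, available=True):
        self.available = available

    def quote(self, item_count):
        if not self.available:
            raise ConnectionError("shipping service unreachable")
        return 5.00 + 0.50 * item_count


class PaymentService:
    """Hypothetical payment service that always authorizes."""

    def authorize(self, amount):
        return {"authorized": True, "amount": round(amount, 2)}


@dataclass
class CheckoutResult:
    subtotal: float
    shipping: Optional[float]
    payment: Optional[dict]
    degraded: bool


def checkout(cart_id, cart, shipping, payment):
    """Compose the three services into one checkout operation."""
    items = cart.items(cart_id)
    subtotal = round(sum(qty * price for _, qty, price in items), 2)
    item_count = sum(qty for _, qty, _ in items)
    try:
        ship_cost = shipping.quote(item_count)
    except ConnectionError:
        # Fault isolation: a shipping outage degrades checkout instead of crashing it.
        return CheckoutResult(subtotal, None, None, degraded=True)
    return CheckoutResult(subtotal, ship_cost, payment.authorize(subtotal + ship_cost), degraded=False)


if __name__ == "__main__":
    print(checkout("c1", CartService(), ShippingService(), PaymentService()))
    print(checkout("c1", CartService(), ShippingService(available=False), PaymentService()))
```

Because each service is reached only through its interface, any one of them could be replaced, scaled, or redeployed without changing the others.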

Containers – the second key to cloud native applications
Reducing operational and development costs
[Figure: virtual machines vs. containers. Virtual machines: App 1–3, each with its own bins/libs and guest OS (overhead), on a hypervisor and infrastructure. Containers: App 1–3, each with its own bins/libs, on a container engine, operating system, and infrastructure.]
Containers can be 2–3 times more resource efficient than virtual machines.
On average, Docker developers ship software 7x more frequently.
Virtualization technologies are a solution to the problem of how to get software to run reliably when moved from one computing environment to another. Though both VMware and Docker can be categorized as virtualization technologies, their optimal use cases can be quite different. For example, VMware emulates virtual hardware and must account for all the underlying system requirements; consequently, virtual machine images are significantly larger than containers (up to 2–3x more resource intensive). Docker does not create an entire virtual operating system; instead, all required components not already running on the host machine are packaged inside the container with the application. Because the host kernel is shared among Docker containers, applications ship with only what they need to run, no more and no less. This makes Docker applications lighter and easier to deploy, and faster to start up, than virtual machines. While VMware is inherently more secure because images are isolated from each other, IBM’s Vulnerability Advisor and security portfolio can make Docker containers as secure as VMware. On average, Docker users ship software 7x more frequently. Containers can accelerate development and CI/CD pipelines by eliminating the headaches of setting up environments and dealing with differences between environments. Containers virtualize software in the way that virtual machines have virtualized hardware.

Container automation and orchestration are essential
Enter: Kubernetes
Containers are revolutionizing IT, but they require orchestration.
Kubernetes (κυβερνήτης) means “helmsman” or “pilot”.
Containers by themselves can get out of hand, and they don’t naturally lend themselves to scaling, high availability, and the like. Therefore, an orchestrator is required.
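Kubernetes controllers work by repeatedly comparing the desired state declared in configuration (for example, a replica count) with the observed state and acting to close the gap. The toy control loop below illustrates that idea only; ToyCluster and reconcile are hypothetical names, not part of the Kubernetes API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCluster:
    """Toy stand-in for a cluster: maps running pod names to their app label."""
    running: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def start_pod(self, app):
        name = f"{app}-{next(self._ids)}"
        self.running[name] = app
        return name

    def stop_pod(self, name):
        self.running.pop(name, None)


def reconcile(cluster, app, desired_replicas):
    """One pass of a control loop: compare desired vs. observed replicas for
    `app` and start or stop pods to close the gap. Returns the actions taken."""
    pods = sorted(name for name, label in cluster.running.items() if label == app)
    actions = []
    if len(pods) < desired_replicas:
        for _ in range(desired_replicas - len(pods)):
            actions.append(("start", cluster.start_pod(app)))
    elif len(pods) > desired_replicas:
        for name in pods[desired_replicas:]:
            cluster.stop_pod(name)
            actions.append(("stop", name))
    return actions


if __name__ == "__main__":
    cluster = ToyCluster()
    print(reconcile(cluster, "web", 3))   # starts three pods
    crashed = next(iter(cluster.running))
    cluster.stop_pod(crashed)             # simulate a pod failure
    print(reconcile(cluster, "web", 3))   # replaces the missing pod
    print(reconcile(cluster, "web", 1))   # scales down to one replica
```

Because the loop only ever compares "what should be" with "what is", the same code handles first deployment, crash recovery, and scaling.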

Private clouds address the new IT reality
Created by digital transformation
Method              Development     Deployment        Environment
Waterfall           Monolithic      Bare metal        On-Premises
Agile Programming   N-Tier          Virtual Server    Off-Premises
Agile DevOps        Microservices   Containers        Cloud
(Axes: time to value; perception of cost)
Customers are facing a new IT reality, driven by the need for digital transformation. The changes started as a slow evolution of the status quo in the late eighties and have dramatically accelerated in the past 10 years with the advent of mobile technologies and the cloud.
Methods: we have moved from Waterfall (1980s), to Agile programming (2000), to Agile DevOps (2010+).
Application architecture and development: from monolithic applications, to client/server, to SOA, and to the evolution of SOA represented by microservices.
Deployment: from bare metal, to VMware virtualization (allowing you to package applications, middleware, and OS and transport them to different compute infrastructures), to Docker containers (a lean and much more economical virtualization offering).
Hosting environment: from on-premises, to off-premises, to the cloud (either on site or off site).
All these changes have been made to speed up time to value and to innovate to survive digital disruption while lowering costs. Containerization transforms the data center from being machine-oriented to being application-oriented. Building management APIs around containers rather than machines shifts the “primary key” of the data center from machine to application. Containers encapsulate the application environment, abstracting away many details of machines and operating systems from the application developer and the deployment infrastructure. The decoupling of image and OS makes it possible to provide the same deployment environment in both development and production, which, in turn, improves deployment reliability and speeds up development by reducing inconsistencies and friction.
How ICP helps accelerate IT’s transformation efforts:
Cost-effective infrastructure: at its core, ICP is a Docker container, Kubernetes-based platform that brings all the benefits of a containerized environment, i.e., a 2–3 times reduction in the resources needed compared with VMware or bare metal when running typical enterprise workloads.
Accelerate time to market: ICP is the perfect environment for BOTH cloud native (microservices-based) applications and for lifting, shifting, and modernizing existing monolithic applications. ICP comes with a very broad catalog of services from both open source (OS) (e.g., MongoDB, PostgreSQL, Redis, MySQL, RabbitMQ) and IBM (WAS Liberty, MQS, Db2, as well as AI services such as Watson Compare & Comply), giving developers choice and the ability to innovate by composing applications from tested and proven components. The DevOps experience is one of extreme speed and agility (e.g., 5 seconds to deploy an MQS containerized service in production).
Manage: using Kubernetes and state-of-the-art services such as Terraform (IBM CAM for multi-cloud management), Helm (the open source Kubernetes package manager), Prometheus (open source monitoring engine), Grafana (open source dashboarding engine), and other open source and IBM services for HA/DR, integrated access management, key encryption, etc., results in a platform that offers very high degrees of automation while being extremely scalable and able to run on top of multiple cloud infrastructures (both on and off premises).
Open community-based platforms (e.g., Kubernetes, Cloud Foundry, OpenWhisk) provide choice of cloud providers and flexibility in deploying on and off premises: develop ONCE, deploy anywhere (e.g., SL, AWS, Azure, Google). No vendor lock-in. This provides the ultimate approach to cloud flexibility, giving enterprises a seamless choice between private and public environments for the best of both worlds.

Public Cloud + Private Cloud = Hybrid Cloud*
Different cloud options
[Table: comparison of on-premises private cloud, hosted private cloud, and hybrid cloud by hardware deployment and management (vendor, customer, or shared between vendor and customer), hardware sharing model (shared, dedicated, or partially shared and partially dedicated), scalability, cost, predictable cost, utility billing, flexibility, customization capabilities, enhanced security and compliance, and instant provisioning]
Often called “the best of both worlds,” hybrid clouds combine on-premises infrastructure (private clouds) with public clouds so organizations can leverage the advantages of both. In a hybrid cloud, data and applications can move between private and public clouds for greater agility, flexibility, and economics.
* A “hybrid cloud” is a highly orchestrated environment in which all sources act as one. A “multi-cloud” environment simply refers to the use of multiple cloud sources of any kind, without necessarily being orchestrated.

Why care about Private Clouds? Adoption brings agility and efficiency
Cost-efficient & scalable infrastructure: build, package & deploy applications in containers; run at scale with Kubernetes
Accelerate time to market: refactor applications into microservices & modernize monolithic applications
Manage data at scale: access, govern, & analyze your data at scale; accelerate your journey to AI
[Figure: Business Value Assessment customer output, standard on-premises vs. IBM Cloud Private: 3-year $5.4 million cost savings; 255% ROI. Benefit areas: data center system utilization & server reduction; manage performance (elasticity, bursting, high availability); DevOps faster deployments; deployment efficiency (containers & microservices); improved security management & risk reduction. Benefit figures shown: 50%, 75%, 35%, 30%.]
Why are companies adopting private cloud? Adoption is being driven by the need for agility and increased efficiency (better economics). IBM Cloud Private addresses three key customer needs:
Cost-effective infrastructure: at its core, ICP is a Docker container, Kubernetes-based platform that brings all the benefits of a containerized environment, e.g., a 2–3 times reduction in the resources needed compared with VMware or bare metal when running typical enterprise workloads.
Accelerate time to market: ICP is the perfect environment for BOTH cloud native (microservices-based) applications and for lifting, shifting, and modernizing existing monolithic applications. ICP comes with a very broad catalog of services from both open source (OS) (e.g., MongoDB, PostgreSQL, Redis, MySQL, RabbitMQ) and IBM (WAS Liberty, MQS, Db2, as well as AI services such as Watson Compare & Comply), giving developers choice and the ability to innovate by composing applications from tested and proven components. The DevOps experience is one of extreme speed and agility (e.g., 5 seconds to deploy an MQS containerized service in production).
Manage: using Kubernetes and state-of-the-art services such as Terraform (IBM CAM for multi-cloud management), Helm (the open source Kubernetes package manager), Prometheus (open source monitoring engine), Grafana (open source dashboarding engine), and other open source & IBM services for HA/DR, integrated access management, key encryption, etc., results in a platform that offers very high degrees of automation while being extremely scalable and able to run on top of multiple cloud infrastructures (both on & off premises). IBM Cloud Private business value assessment (BVA): benefits roll-up map per application. This BVA is a standard customer output comparing a business-as-usual (BAU) WebSphere Application Server (WAS) customer at 1,000 PVUs.

Introducing IBM Cloud Private (ICP)
Brings the native cloud to the enterprise
Rapid innovation: Docker & Kubernetes, Cloud Foundry, DevOps
Hybrid integration: integrated, secured, consistent
Investment leverage: existing apps, data, skills, infrastructure; containerized middleware; prescriptive guidance
Management & compliance: core operational services; flexibility to integrate with other enterprise management components
Get the speed of public with the control of private: IBM Cloud Private. Fast. Flexible. Enterprise-grade. When you need that extra attention to security, speed, and control, empower your teams with IBM Cloud Private, a transformative platform for building cloud native applications and modernizing existing ones in days or hours, not weeks or months. A fully integrated, enterprise-class solution, IBM Cloud Private delivers a single platform located behind your firewall. You can leverage your on-premises software portfolio or easily integrate next-generation data and software optimized for cloud. Built on open source frameworks such as containers, Kubernetes, Helm, Terraform, and Cloud Foundry, IBM Cloud Private offers flexibility, control, security, and easy integration with public cloud. Plus, cloud management solutions are included so you can provision, orchestrate, and manage multi-cloud infrastructures and applications, all from a single pane of glass. IBM Cloud Private is also a foundational component of the larger IBM Application Modernization story.

IBM Cloud Private – the foundation for ICP for Data
The key components
Docker: the company driving the container movement and the only container platform provider to address every application across the hybrid cloud. Docker is the clear leader in container technology, with 2B+ downloads, and is the #2 most popular project. It also has an open design with contributors like IBM, Red Hat, Google, Microsoft, VMware, Amazon, Rackspace, and others.
Kubernetes: a portable, extensible open source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem, with services, support, and tools widely available. It is the clear leader in the container orchestration space.
Helm: a Kubernetes package manager that helps you define, install, and upgrade Kubernetes applications, regardless of complexity. It is the current leader, with the most mature capabilities in the Kubernetes package management space.
Docker is a computer program that performs operating-system-level virtualization, also known as containerization. It was first released in 2013 and is developed by Docker, Inc. Docker is used to run software packages called “containers”. Open software: launched March 2013, with 2.0B+ downloads of Docker images. Open contribution: #2 most popular project. Open design: contributors include IBM, Red Hat, Google, MSFT, VMware, AWS, Rackspace, and others. Open governance.
Kubernetes is an open source container orchestration system for automating deployment, scaling, and management of containerized applications. It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation.
Helm helps you manage Kubernetes applications: Helm charts help you define, install, and upgrade even the most complex Kubernetes application. Charts are easy to create, version, share, and publish, so start using Helm and stop the copy-and-paste madness.
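Helm charts ship default settings in a values file, and users override them at install or upgrade time (for example with `-f` or `--set`); nested maps are merged rather than replaced wholesale. A minimal Python sketch of that layering, assuming hypothetical chart values. It illustrates the merge behavior only and does not use Helm or its Go templating.

```python
import copy


def merge_values(defaults, overrides):
    """Overlay user-supplied values on chart defaults.
    Nested dicts are merged key by key; any other value replaces the default."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, in the spirit of a chart's values.yaml.
chart_defaults = {
    "replicaCount": 1,
    "image": {"repository": "example/web", "tag": "1.0", "pullPolicy": "IfNotPresent"},
    "service": {"type": "ClusterIP", "port": 80},
}

# Hypothetical user overrides.
user_values = {"replicaCount": 3, "image": {"tag": "1.1"}}

values = merge_values(chart_defaults, user_values)

if __name__ == "__main__":
    print(values)
```

Only the keys the user touches change; everything else keeps the chart author's default, which is what lets one chart serve many environments.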

IBM Cloud Private – the foundation for ICP for Data
Other essential components
IBM Cloud Automation Manager: a multi-cloud, self-service management platform that allows you to efficiently manage and deliver services through end-to-end automation.
IBM WebSphere Liberty: a highly composable, fast-to-start, dynamic application server runtime environment.
IBM SDK for Node.js: provides a JavaScript runtime and server-side JavaScript solution for IBM operating systems.
IBM Microservice Builder: delivers a turnkey solution that incorporates runtime, tools, DevOps, fabric, and customer-managed container orchestration.
“Discover & Try” others: Prometheus, Jenkins, Logstash, Kibana, Calico, etcd, Grafana, Heketi, Filebeat, GlusterFS, MariaDB, Nginx, Kafka, ZooKeeper, and many more.
Prometheus is an open source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project, maintained independently of any company.
Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
Cloudant is an IBM non-relational, distributed database, primarily delivered as a cloud-based service. IBM Cloudant is a NoSQL JSON document store optimized for handling heavy workloads of concurrent reads and writes in the cloud, a workload typical of large, fast-growing web and mobile apps. You can use Cloudant as a fully managed DBaaS running on IBM Cloud. Cloudant provides a seamless and cost-effective user experience online and offline, with IBM customers saving up to 95 percent in infrastructure and hosting costs.
Jenkins is the leading open source automation server, written in Java. It helps automate the non-human part of the software development process, supporting continuous integration and the technical aspects of continuous delivery, and provides hundreds of plugins to support building, deploying, and automating any project.

IBM Cloud Private for Data - What is it?
Now that the foundation for “Why ICP for Data” has been established, let’s dig down into what ICP for Data is.

What is IBM Cloud Private for Data?
ICP for Data:
is a well-integrated collection of microservices built on cloud native architecture
enables your organization to access a vast array of enterprise on-premises or cloud data sources
is a robust end-to-end platform for all data and analytic needs within your enterprise
encourages collaboration and communication within the enterprise by removing barriers
advances a data-driven culture within your organization
applies data management, governance, and analytics capabilities within a private cloud setting
Value proposition
Enables modernization of data infrastructure: optimize costs with cloud native behind the firewall; reduce the costs of managing multiple product stacks and enable new use cases; agility, scalability, elasticity, self-healing
Empowers end users with self-service: provision trusted data and analytic services in minutes; use tools of choice (open source, etc.); no vendor lock-in
Accelerates teams through governed collaboration: break down silos through governance and lineage; automate seamlessly integrated cross-functional workflows; enterprise security
IBM Cloud Private for Data is a robust end-to-end platform for all data and analytic needs within your enterprise. It can enable your organization to access a vast array of enterprise data sources on-premises and in the cloud, while applying data management, governance, and analytics capabilities within a private cloud setting. A well-integrated collection of microservices built on cloud native architecture, IBM Cloud Private for Data:
Features role-focused, purpose-built data, governance, and analytics capabilities tailored for your data scientists, business users, data engineers, CDOs or CIOs, data stewards, and application developers
Encourages collaboration and communication, removing divisional barriers within the enterprise
Helps advance a data-driven culture for your organization

ICP for Data solves your data challenges in one platform
Collect: Hybrid Data Management
Collecting structured and unstructured data, inside or outside the cluster
Managing fit-for-purpose data repositories
Organize: Unified Governance & Integration
Cataloging, masking, and finding data
Integrating, transforming, and shaping data
Analyze: Data Science & Business Analytics
Self-service analytics tooling and productivity
Descriptive & predictive models* & business reporting
The three platforms within the IBM Cloud and Analytics offerings and services are represented in the component capabilities of ICP for Data. “Collect” represents the Hybrid Data Management platform offerings and services. “Organize” represents the Unified Governance and Integration (UG&I) offerings. “Analyze” represents the Data Science and Business Analytics offerings and services.
* Prescriptive roadmap: Q4, 2018

ICP for Data – at a glance
[Figure: ICP for Data at a glance. An extensible, open API platform (app developers, data engineers, business partners) and a personalized, collaborative team platform (data scientists, business analysts, data stewards), built on IBM Cloud Private for Data cloud native microservices: administration, machine learning & model deploy, policy management, reporting and dashboarding, data science, database management, transformation services, discovery & exploration, classification & governance, and enterprise data catalog; running on IBM Cloud Private (optional)]
Many companies are embracing cloud concepts because they need reliable, scalable applications. Additionally, companies need to modernize their data workloads to use hardware effectively and efficiently. IBM Cloud Private for Data is composed of pre-configured microservices that run on a multi-node IBM Cloud Private cluster, which enables it to manage resources elastically and to run with little downtime. By bringing together data governance and analytics services, IBM Cloud Private for Data enables you to reduce the cost and burden of maintaining multiple applications on disparate hardware. It also enables you to assign resources to workloads when you need to and reclaim those resources when you don’t. With a single, managed platform, it’s easier for your enterprise to adopt modern DevOps practices while simplifying your IT operations and reducing time to value.

ICP for Data – component capabilities
Component: capability description
Collect
Database Management: 1) in-cluster databases, 2) federated databases, or 3) native connection databases and data source repositories
Organize
Transformation Services: transformation jobs, working with data source definitions, management of ETL flows
Discovery & Exploration: database schema and table extractions, metadata syncs to feed into the Enterprise Catalog
Classification & Governance: auto-classification of data assets in the catalog, assignment of terms to assets, tagging with ML & fuzzy/pattern-based classifications for governance
Policy Management: business rules and validation of data assets, compliance, definition of quality standards in the catalog
Analyze
Data Science: access to open source tooling, frameworks, and IBM value-add
Machine Learning Model Management: deployment of machine learning models with elasticity and load balancing; monitoring and management of models in production
Dashboards: aggregate & display metrics & key performance indicators (KPIs), enabling them to be examined at a glance before further exploration
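The Policy Management row mentions business rules, validation of data assets, and quality standards. A minimal Python sketch of that idea, assuming rules are expressed as hypothetical predicates over records and a quality standard is a minimum pass rate (95% here); it is not the product's rule engine.

```python
import re

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")

# Hypothetical business rules: rule name -> predicate over one record.
RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "email is well formed": lambda r: EMAIL.fullmatch(r.get("email") or "") is not None,
    "age is between 0 and 120": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}


def evaluate(records, rules, threshold=0.95):
    """Return {rule: (pass_rate, meets_standard)} for a batch of records."""
    report = {}
    for name, check in rules.items():
        passed = sum(1 for r in records if check(r))
        rate = passed / len(records) if records else 1.0
        report[name] = (rate, rate >= threshold)
    return report


# Hypothetical sample data.
records = [
    {"customer_id": "C1", "email": "ana@example.com", "age": 34},
    {"customer_id": "C2", "email": "bob@example", "age": 41},
    {"customer_id": "", "email": "cy@example.org", "age": 29},
    {"customer_id": "C4", "email": "dee@example.net", "age": 57},
]

if __name__ == "__main__":
    for rule, (rate, ok) in evaluate(records, RULES).items():
        print(f"{rule}: {rate:.0%} {'meets' if ok else 'below'} standard")
```

Keeping rules as named, independent checks means a data steward can add or tighten one rule without touching the others.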

ICP for Data – persona enablement examples
Data Engineer (Collect)
Create new Db2 (and other) relational databases and warehouses
Support data federation & virtualization best practices
Facilitate cross-cloud hybrid data access & movement
Build data movement & transformation flows to help prepare data for other users to consume
Data Steward (Organize)
Build an “enterprise catalog” by auto-discovering & classifying all existing data sources
Tag & annotate data sets & other assets and index them for search; generally make assets easy to find
Visualize data consumers & producers with lineage, metrics about consumed assets, and quality profiles
Data Scientist (Analyze)
Find data of interest from all data in the enterprise
Explore, visualize & understand data
Prepare, refine/shape, and transform data, then publish transformed data sets
Share analytics assets with other collaborators & publish them into the enterprise catalog
Train machine learning models & deploy a scoring service
System Administrator
Provision quickly with the complete set of capabilities
Allocate and manage resource usage and scale as needed
Monitor the health of the system
This slide represents only some of the personas that can be served by the ICP for Data platform.

System administration console (stand-alone deployment) Capabilities
Provisioning capabilities
Configuring, upgrading, and automating
Allocating and managing resources
Monitoring system health
Ensuring uptime and performance
Maintaining system security (e.g., SSH, SSL, LDAP, tokens, users)
Ensuring parity between dev, test, and prod environments
The way that you administer IBM Cloud Private for Data depends on how you deployed the application. If you deployed IBM Cloud Private for Data to an existing IBM Cloud Private environment, administration tasks related to monitoring your deployment environment are handled through IBM Cloud Private. However, administration tasks related to your application are handled through the embedded administration tools. If you deployed a stand-alone instance of IBM Cloud Private for Data, you can use the embedded administration tools to complete all administration tasks, such as adding users and monitoring the health of the application and the deployment environment.

System administration console (ICP)
Manage all users, services, images, and system health

In-cluster databases*
Collect
IBM proprietary database repositories
Db2: flagship, full-featured RDBMS, configured with row-organized tables, that is well suited for transactional workloads.
Db2 Warehouse: RDBMS configured with column-organized tables that is well suited for analytics workloads.
Db2 Event Store: in-memory database optimized for event-driven data processing and analysis, built on Apache Spark and Apache Parquet, capable of high-speed ingest, and well suited for IoT data sources.
Open source database repositories
MongoDB: document-oriented NoSQL JSON database.
EnterpriseDB: based on PostgreSQL, an object-relational database system that uses and extends the SQL language, combined with features to safely store and scale complicated data workloads.
In-cluster data repositories: Db2 is the RDBMS customized for transactional workloads. Db2 Warehouse is the RDBMS customized for analytical workloads. Db2 Event Store is for high-speed ingest of data, especially useful for IoT data sources. Coming soon: open source data repositories to be offered as add-ons.
* Databases in grey road mapped for early Q3, 2018
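The row-organized vs. column-organized distinction above can be sketched in Python. The example assumes a small hypothetical sales table and only illustrates the two layouts, not Db2's storage internals: a transactional lookup wants one whole record, while an analytical aggregate needs one attribute across many records, which a column layout keeps together.

```python
# A small hypothetical sales table, stored row-organized.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 45.5},
]


def to_columns(rows):
    """Pivot a row-organized table into a column-organized one."""
    return {name: [row[name] for row in rows] for name in rows[0]}


columns = to_columns(rows)


def get_order(rows, order_id):
    """Transactional access: fetch one whole record (natural in a row layout)."""
    return next(row for row in rows if row["order_id"] == order_id)


def total_amount(columns):
    """Analytical access: aggregate one attribute; only the 'amount' column is read."""
    return sum(columns["amount"])


if __name__ == "__main__":
    print(get_order(rows, 2))
    print(total_amount(columns))
```

In a real column store the benefit comes from reading fewer bytes from disk and compressing similar values together; the lists here just make the access pattern visible.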

Federated databases*
Collect
Data federation can access multiple data sources & join them together as one unified SQL processing view.
It also has a number of highly optimized performance techniques, including workload pushdown, function compensation & SQL transparency.
Federation of data with IBM Cloud Private for Data bridges data across multiple data sources without requiring a whole new integrated physical data platform. IBM Cloud Private for Data federation provides a consistent interface to data that is distributed across multiple disparate sources. These sources can include not just different databases and repositories, but different kinds of databases and repositories, such as Db2 and Oracle. When you federate data this way, you create views of data that are much easier to consume and use for other tasks such as transforming, cleansing, shaping, analyzing, and governing. IBM Cloud Private for Data federation helps you to break down silos and realize the full value of your data assets. Before federation, a data scientist might have to spend hours searching for useful data in many separate sources, collate and merge the data into one location, and pull unneeded data from tables only to filter or discard it later. With IBM Cloud Private for Data, if a data scientist requests access to data that isn’t available in a single source, the data engineer can create a set of federated data. By federating, the data engineer can pull together the data without moving unneeded data from tables in the sources, which improves speed and efficiency and reduces complexity. With the federated data, the data scientist can then work with data that is easy to use for a variety of enterprise data tasks. Other benefits include reduced risk of errors, less workload on systems from not moving data around, and less data storage required.
* Databases in grey road mapped for early Q3, 2018
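Federation with workload pushdown can be sketched with Python's built-in sqlite3 module, assuming two separate in-memory databases that stand in for distinct sources such as Db2 and Oracle. A real federation engine plans this automatically from a single SQL statement; here the pushdown is done by hand to show the idea: each source filters and aggregates locally, only the needed rows move, and the join happens in the federation layer. Table and function names are hypothetical.

```python
import sqlite3

# Two independent in-memory databases stand in for separate sources.
crm = sqlite3.connect(":memory:")
crm.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO customers VALUES (1, 'Acme', 'US'), (2, 'Globex', 'DE'), (3, 'Initech', 'US');
""")

billing = sqlite3.connect(":memory:")
billing.executescript("""
    CREATE TABLE invoices (customer_id INTEGER, amount REAL);
    INSERT INTO invoices VALUES (1, 100.0), (1, 250.0), (2, 75.0), (3, 40.0);
""")


def revenue_by_customer(country):
    """Hypothetical federated query: revenue per customer in one country."""
    # Pushdown 1: the filter runs inside the CRM source.
    customers = crm.execute(
        "SELECT id, name FROM customers WHERE country = ? ORDER BY id", (country,)
    ).fetchall()
    if not customers:
        return []
    ids = [cid for cid, _ in customers]
    placeholders = ",".join("?" * len(ids))
    # Pushdown 2: filtering and aggregation run inside the billing source,
    # so only one summary row per customer leaves it.
    totals = dict(billing.execute(
        f"SELECT customer_id, SUM(amount) FROM invoices "
        f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
        ids,
    ).fetchall())
    # The join happens in the federation layer.
    return [(name, totals.get(cid, 0.0)) for cid, name in customers]


if __name__ == "__main__":
    print(revenue_by_customer("US"))
```

Without pushdown, every customer and invoice row would be copied out of both sources before filtering, which is exactly the unneeded data movement the slide describes.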

Native connection databases and data repositories
Collect
With IBM Cloud Private for Data you can use additional data sources in supplementary services, as shown in these tables. Each table lists data sources against the services that can use them: Data Transformation, Governance, Data Science, and Dashboards.

Data sources: Amazon S3, BDFS, BigSQL, Custom JDBC, Db2 Cloud Data Set, Db2, Db2 for z/OS, File, Greenplum, HDFS, HDFS-CDH, HDFS-HDP, Hive, Hive-CDH, Hive-HDP, Informix, JDBC, Kafka, MS SQLServer, Netezza, ODBC, Oracle, PostgreSQL, Seq, Teradata, WebSphere, WebSphere MQ.

Transformation services
Organize
Perform ETL with a powerful UI that organizes work into projects, connections, table definitions, and jobs. Create, edit, load, and run transformation jobs. Metadata propagates automatically from one stage to the later stages in the job, increasing productivity. All compilation errors are highlighted at once. Find what you need fast by using the flexible Search feature.

Use IBM Cloud Private for Data to transform data into enriched information tailored to your enterprise. With IBM Cloud Private for Data, you can create, edit, load, and run transformation jobs. Features such as built-in search, automatic metadata propagation, and simultaneous highlighting of all compilation errors help developers be more productive.
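The two productivity features on this slide, metadata propagation and reporting every compilation error at once, can be modeled in a short Python sketch. The Stage class, the compile_job function, the column names, and the schema representation are all hypothetical. This is not the product's job compiler. It only illustrates the idea that each stage derives its output columns from its input columns and that checking continues past the first error.

```python
from dataclasses import dataclass
from typing import Callable

Schema = dict[str, str]  # column name -> type name


@dataclass
class Stage:
    name: str
    required: list[str]                    # columns this stage reads
    transform: Callable[[Schema], Schema]  # how the stage changes the column metadata


def compile_job(source: Schema, stages: list[Stage]) -> tuple[list[Schema], list[str]]:
    """Propagate column metadata stage by stage and collect every error, not just the first."""
    schemas, errors = [dict(source)], []
    current = dict(source)
    for stage in stages:
        missing = [c for c in stage.required if c not in current]
        errors.extend(f"{stage.name}: unknown column '{c}'" for c in missing)
        current = stage.transform(dict(current))  # downstream stages see the new metadata
        schemas.append(current)
    return schemas, errors


source = {"id": "int", "amount": "decimal", "country": "varchar"}
stages = [
    Stage("Filter", ["status"], lambda s: s),
    Stage("AddTax", ["amount"], lambda s: {**s, "amount_with_tax": "decimal"}),
    Stage("DropCountry", ["country"], lambda s: {k: v for k, v in s.items() if k != "country"}),
    Stage("Aggregate", ["region", "amount_with_tax"], lambda s: {"region": "varchar", "total": "decimal"}),
]
schemas, errors = compile_job(source, stages)
print(errors)
```

Both problems, the missing status column in the first stage and the missing region column in the last, are reported in a single pass, and AddTax's new column is already visible to Aggregate.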

Automatic asset discovery – explore and profile
Organize
Automate metadata ingestion, column profiling, and data quality assessment. Automate rules and business term assignment.

Explore and profile data quality in IBM Cloud Private for Data. With a single click, you can start automated discovery to begin exploring and profiling your data. Do you have a database, but you're not sure what's in it? Start automated discovery to import all the data in the data source. During the process, the data is profiled. Profiling includes a quality analysis that determines the quality of the data, automatic term assignment that helps you assess and classify the data, and publication of the analysis results so that you can share them with others. After the data is discovered and imported into the catalog, use the relationships graph to see how the data is connected. You can also rate assets and add comments to enrich your information catalog and help others understand the data.
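To make "column profiling" and "automatic term assignment" concrete, here is a minimal Python sketch. It assumes values arrive as strings and that a term is suggested when at least 80% of the non-empty values match a pattern. The TERM_PATTERNS rules, the 0.8 ratio, and the profile_column function are hypothetical and are not how the product's discovery service is implemented.

```python
import re

# Hypothetical term-assignment rules: a term is suggested when most values match its pattern.
TERM_PATTERNS = {
    "Email Address": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE),
    "US Postal Code": re.compile(r"^\d{5}(-\d{4})?$"),
}


def profile_column(name: str, values: list, match_ratio: float = 0.8) -> dict:
    """Profile one column: completeness, distinct count, inferred type, suggested term."""
    present = [v for v in values if v not in (None, "")]
    completeness = len(present) / len(values) if values else 0.0
    if present and all(v.lstrip("-").isdigit() for v in present):
        inferred = "integer"
    else:
        inferred = "string"
    term = None
    for candidate, pattern in TERM_PATTERNS.items():
        hits = sum(1 for v in present if pattern.match(v))
        if present and hits / len(present) >= match_ratio:
            term = candidate
            break
    return {"column": name, "completeness": completeness,
            "distinct": len(set(present)), "type": inferred, "term": term}


contact = profile_column("contact", ["a@x.com", "b@y.org", "", "c@z.net"])
zip_code = profile_column("zip", ["12345", "67890", "12345"])
print(contact)
print(zip_code)
```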

Business glossary Terms
Organize
Create a common vocabulary for the organization. Provide a framework for creating, nurturing, and promoting terms that connect business and IT. Enable the organization to create, preserve, and publish knowledge about its information.

Terms
A term is a word or phrase that describes a characteristic of the enterprise. Terms are the fundamental building blocks of the glossary. Each term has a parent category, but it can also be referenced by other categories. When you create a term, give it a meaningful name. Terms can be assigned to other terms and to other asset types as well.

Before you create terms, define their meaning and use for the organization to ensure clarity and compatibility among departments, projects, or products. You can use the following guidelines.

If a term name is long, use spaces instead of underscores or hyphens to break it up. Otherwise, when long names are shown in results tables, the name does not wrap and the adjacent columns become narrow. For example, instead of Northeast_Office_Billing_Address, use Northeast Office Billing Address.

Define term names and descriptions according to the standard of the International Organization for Standardization and the International Electrotechnical Commission, ISO/IEC 11179-4:
Write term names in the singular form.
State term descriptions as a phrase or with a few sentences.
Phrase term descriptions affirmatively. Explain what a term is rather than only what it is not.
Phrase term descriptions without embedded rationale, functional usage, procedural information, or definitions of other concepts or terms.
Avoid descriptions that use other term descriptions.
Use only commonly understood abbreviations in term descriptions.
Make the term description precise, unambiguous, and concise.
Use the same terminology and a consistent logical structure for related descriptions.
Make each description appropriate for the type of metadata item that is being defined.
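A few of these guidelines are mechanical enough to check automatically before a term is submitted. The Python sketch below is a hypothetical lint helper, not a feature of the product. It flags underscores or hyphens in the name, descriptions that start with a negation, and descriptions that mention other glossary terms by name. Guidelines such as "precise, unambiguous, and concise" still need human review.

```python
def lint_term(name: str, description: str, glossary: set[str]) -> list[str]:
    """Return guideline issues for one proposed glossary term (hypothetical helper)."""
    issues = []
    if "_" in name or "-" in name:
        issues.append("use spaces, not underscores or hyphens, in the term name")
    if description.lower().startswith(("not ", "is not ")):
        issues.append("phrase the description affirmatively")
    # Naive substring check for other term names appearing in the description.
    used = sorted(t for t in glossary if t != name and t.lower() in description.lower())
    if used:
        issues.append("description refers to other terms: " + ", ".join(used))
    return issues


bad = lint_term("Northeast_Office_Billing_Address", "Not a shipping address.", {"Shipping Address"})
good = lint_term("Northeast Office Billing Address",
                 "The postal address to which bills for the Northeast office are sent.",
                 {"Customer"})
print(bad)
print(good)
```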

Business glossary Term details (grouped under Category)
Organize
Categories
Categories provide the logical structure for the glossary so that you can browse and understand the relationships among terms and categories. Categories can be organized in a hierarchy based on their meaning and their relationships to one another. The category hierarchy reflects how a user might search for information.

A common strategy is to divide the business by subject areas, such as customers and products. A second strategy is to divide the top-level categories by departments such as Marketing, Finance, and Human Resources. For large companies that have independent business units such as Hardware, Software, and Services, another strategy is to make the business units the top-level categories. For example, the top-level category for the Hardware business unit might be Hardware, with the subcategories Automotive, Home, Electronic, and Security.

A category can contain other categories and terms. In addition, a category can reference terms that it does not contain. For example, the category named Customer Summary has a subcategory named Customer Expense Summary. The category contains a term named Credit Card Risk Score and references the term Credit Card Risk.
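The difference between containing a term (ownership through its single parent category) and referencing it can be captured in a small data model. This Python sketch is an assumption about how one might represent the slide's Customer Summary example, not the glossary's actual schema. The owning "Risk" category for Credit Card Risk is a hypothetical placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    # The single owning category; excluded from repr/eq so the object graph has no cycles there.
    parent: Category | None = field(default=None, repr=False, compare=False)


@dataclass
class Category:
    name: str
    subcategories: list[Category] = field(default_factory=list)
    terms: list[Term] = field(default_factory=list)       # terms this category contains
    references: list[Term] = field(default_factory=list)  # terms owned by other categories

    def add_term(self, term: Term) -> None:
        """Make this category the term's one and only parent."""
        if term.parent is not None:
            raise ValueError(f"{term.name!r} already belongs to {term.parent.name!r}")
        term.parent = self
        self.terms.append(term)

    def reference(self, term: Term) -> None:
        """Point at a term without taking ownership of it."""
        self.references.append(term)


risk = Category("Risk")  # hypothetical owner of Credit Card Risk
card_risk = Term("Credit Card Risk")
risk.add_term(card_risk)

summary = Category("Customer Summary")
summary.subcategories.append(Category("Customer Expense Summary"))
score = Term("Credit Card Risk Score")
summary.add_term(score)
summary.reference(card_risk)

print(card_risk.parent.name)
```

Raising an error when a term is added to a second category enforces the single-parent rule, while reference() lets any number of categories point at the same term.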

Governance policies & rules
Organize
Create governance policies and rules. Enforce compliance and data quality.

Rules
Rules formalize the declarations of a policy and establish the framework for the standardization, validation, or security of information. Unlike policies, rules are specific and concrete, so multiple rules may be required to enact a single policy or set of policies. Rules are required to manage and value information effectively, to keep information consistent, and to ensure that it conforms to given standards and quality metrics. They also allow active monitoring of compliance with corporate or regulatory initiatives and specified business objectives. Such rules may involve profiling and analyzing information, determining its classification and characteristics, or integrating reference data. For example, a scan of a customer account table alerts the data governance team to the presence of address information, which is then standardized by the applicable operational rules.
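The statement that one policy may need several concrete rules can be illustrated with the slide's address example. In this Python sketch, the Policy and Rule classes, the three rule checks, and the record layout are all hypothetical. They show the relationship between the two ideas, not the product's rule engine.

```python
import re
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Record], bool]  # True when the record satisfies the rule


@dataclass(frozen=True)
class Policy:
    name: str
    rules: tuple[Rule, ...]  # one abstract policy enacted by several concrete rules

    def violations(self, records: list[Record]) -> list[tuple[int, str]]:
        """Return (row index, rule name) for every rule a record fails."""
        return [(i, r.name) for i, rec in enumerate(records)
                for r in self.rules if not r.check(rec)]


address_policy = Policy("Standardize customer addresses", (
    Rule("postal code is 5 digits", lambda r: bool(re.fullmatch(r"\d{5}", r.get("zip", "")))),
    Rule("state is a 2-letter upper-case code", lambda r: bool(re.fullmatch(r"[A-Z]{2}", r.get("state", "")))),
    Rule("street is present", lambda r: bool(r.get("street", "").strip())),
))

rows = [
    {"street": "1 Main St", "state": "NY", "zip": "10001"},
    {"street": "", "state": "ny", "zip": "1001"},
]
found = address_policy.violations(rows)
print(found)
```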

Governance rules & policies Policy details
Organize
Policies
Policies define the parameters for the operational activities and storage of information. They are a documented set of guidelines for ensuring the proper management and usage of information, and they reflect accountability for information and its allowed or intended usage. Policies are generally exacting and purposeful, aligned with campaigns such as Data Security, Data Transformation, or Life Cycle Management within the context of a regulatory requirement. Within the enterprise, the data governance team should first establish abstract policies that reflect such campaigns and a core set of requirements, then expand on those areas or requirements with a more refined and exacting set of policies. A hierarchy of policies and references establishes the domain and specificity of each policy.
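The recommended approach, starting with abstract campaign-level policies and refining them into more exacting ones, forms a tree. In this minimal Python sketch, indentation depth stands for specificity. The Policy class and every policy name below Data Security are hypothetical examples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    children: list[Policy] = field(default_factory=list)  # more refined sub-policies

    def refine(self, name: str) -> Policy:
        """Add and return a more specific policy beneath this one."""
        child = Policy(name)
        self.children.append(child)
        return child


def outline(policy: Policy, depth: int = 0) -> list[str]:
    """List the hierarchy top-down; deeper lines are more specific policies."""
    lines = ["  " * depth + policy.name]
    for child in policy.children:
        lines.extend(outline(child, depth + 1))
    return lines


security = Policy("Data Security")  # abstract, campaign-level policy
personal = security.refine("Protect personal information")
personal.refine("Mask national ID numbers outside production")
personal.refine("Restrict access to customer contact details")
security.refine("Encrypt data at rest")

print("\n".join(outline(security)))
```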

Governance rules & policies Rule details – masking example
Organize
This is an example of a rule detail.
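The slide itself showed only the rule screen. As a hedged illustration of what a masking rule typically does, the following Python sketch hides every digit except the last four and leaves separators in place. The function name, the keep-last-four default, and the X mask character are assumptions, not the product's masking options.

```python
def mask_value(value: str, keep_last: int = 4, mask_char: str = "X") -> str:
    """Replace every digit except the last `keep_last` digits, keeping separators."""
    total = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            # Only the final `keep_last` digits survive unmasked.
            out.append(ch if seen > total - keep_last else mask_char)
        else:
            out.append(ch)
    return "".join(out)


print(mask_value("123-45-6789"))
print(mask_value("4111 1111 1111 1234"))
```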

Curation dashboard to govern and approve asset requests
Organize
Manage asset promotion and govern quality across the organization. Approve, reject, or ask for additional information. Use a single approval process across data and analytic asset types, e.g., datasets, models, and notebooks.
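A single approval process with approve, reject, and ask-for-information actions is essentially a small state machine. This is shared by every asset type because nothing in it depends on whether the request is for a dataset, a model, or a notebook. The states, action names, and transition table in this Python sketch are assumptions, not the curation dashboard's actual workflow.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    NEEDS_INFO = "needs info"
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical allowed moves; the same table serves datasets, models, and notebooks.
TRANSITIONS = {
    Status.SUBMITTED: {"approve": Status.APPROVED, "reject": Status.REJECTED, "ask": Status.NEEDS_INFO},
    Status.NEEDS_INFO: {"resubmit": Status.SUBMITTED, "reject": Status.REJECTED},
}


def act(status: Status, action: str) -> Status:
    """Apply a curator or requester action, refusing moves the workflow does not allow."""
    try:
        return TRANSITIONS[status][action]
    except KeyError:
        raise ValueError(f"cannot {action!r} a request that is {status.value}") from None


s = act(Status.SUBMITTED, "ask")   # curator asks for more information
s = act(s, "resubmit")             # requester answers and resubmits
s = act(s, "approve")              # curator approves
print(s)
```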

Enterprise catalog Find and explore governed data and analytics assets
Organize
Find and explore governed data and analytics assets. View metadata and lineage. Dig deeper into data and analytics relationships.

IBM Cloud Private for Data enables you to structure your enterprise information in a logical way, discover relationships between assets, and keep your data up to date. Your enterprise has a lot of data, and many assets are related to one another in ways that are not obvious. This data also changes every day. IBM Cloud Private for Data helps you structure your data in a logical way. You can create a data dictionary with a common business vocabulary that helps define all important aspects of your enterprise. To ensure compliance with business objectives, you can create information governance rules and policies. Your data catalog can also contain analytical assets such as models, notebooks, and R Shiny apps, which you can govern by assigning terms, information governance policies, and rules.

Relationship graph with explorer
Organize
Explore relationships between data assets, terms, analytic assets, users, and more. Gain an in-depth understanding of metadata through crowdsourcing (e.g., ratings, comments) and machine learning. Explore deeper to understand context, ratings, and usage patterns.

Graph explorer shows how assets in your organization are related to one another. When interconnections are complex, a graphical depiction of the relationships helps you understand the structure of your data. While analyzing the relationships, you might discover unexpected correlations between your assets that are crucial to making the right decisions. Each relationship has a direction and a name. You can expand and collapse nodes, and you can display asset details such as ratings and comments.
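Directed, named relationships and node expansion map naturally onto an adjacency list with a breadth-first walk. This Python sketch is an illustrative model only. The RelationshipGraph class, the relationship names, and the asset and user names (including "jsmith") are hypothetical.

```python
from collections import defaultdict, deque


class RelationshipGraph:
    """Directed, named relationships between catalog items (illustrative)."""

    def __init__(self) -> None:
        self._out: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def relate(self, source: str, name: str, target: str) -> None:
        self._out[source].append((name, target))

    def expand(self, node: str) -> list[tuple[str, str]]:
        """One click on a node: its outgoing (relationship, target) pairs."""
        return list(self._out.get(node, []))

    def reachable(self, start: str, max_depth: int) -> set[str]:
        """Nodes shown after expanding repeatedly up to max_depth hops (breadth-first)."""
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for _, target in self._out.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
        return seen


g = RelationshipGraph()
g.relate("CUSTOMER table", "is assigned term", "Customer")
g.relate("CUSTOMER table", "is used by", "churn notebook")
g.relate("churn notebook", "produces", "churn model")
g.relate("churn model", "is rated by", "jsmith")

print(g.expand("CUSTOMER table"))
print(sorted(g.reachable("CUSTOMER table", max_depth=2)))
```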

Manage data science projects
Analyze
Manage data science projects with a tailored experience that helps data science teams boost productivity. Use version control across analytic artifacts throughout the project lifecycle. Manage collaborators with the correct level of access control.

An analytics project is a collection of assets that you use to achieve a particular data analysis goal. Your project assets can include notebooks, RStudio files, models, data assets (local files, data sources, and remote data sets), and scripts.

Restriction: A project name, asset name, data source name, and remote data set name cannot contain any special characters.
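The restriction does not define "special characters", so any automated check has to pick an interpretation. This Python sketch assumes that only ASCII letters, digits, and spaces are allowed. That assumption, along with the invalid_names helper, is hypothetical and should be adjusted to the product's documented rules.

```python
import re

# Assumption: "no special characters" is read here as ASCII letters, digits, and spaces only.
_ALLOWED = re.compile(r"[A-Za-z0-9 ]+")


def invalid_names(**names: str) -> list[str]:
    """Return the kinds of name (project, asset, ...) that break the restriction."""
    return [kind for kind, value in names.items() if not _ALLOWED.fullmatch(value)]


result = invalid_names(project="Churn Analysis 2018", asset="churn_model#2", data_source="Sales DB")
print(result)
```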

Refine and shape data Visually refine and shape your data
Analyze
Visually refine and shape your data. Made simple for the data scientist, citizen data scientist, or business analyst.

Before you analyze local data sets, you can refine the data by cleansing and shaping it. To refine the data, you create a data flow that contains steps to cleanse and shape the data the way you want it:
Cleanse the data: fix or remove data that is incorrect, incomplete, improperly formatted, or duplicated.
Shape the data: filter, sort, combine, or remove columns, and perform operations.
You can refine the data in real time to build a customized data flow, and then save the data flow as either a separate JSON file or an R script. Use the Data Refiner to perform the following tasks: create a data flow, validate data, and visualize data.
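A data flow is an ordered list of cleanse and shape steps that is saved separately from the data it runs on. The following Python sketch shows that idea. The step names and the JSON layout are hypothetical and do not reflect the format the Data Refiner actually writes.

```python
import json

# A hypothetical data flow: an ordered list of cleanse and shape steps.
FLOW = [
    {"op": "drop_missing", "column": "email"},         # cleanse: incomplete rows
    {"op": "dedupe", "column": "email"},               # cleanse: duplicated rows
    {"op": "filter_min", "column": "age", "value": 18},  # shape: filter
    {"op": "sort", "column": "age"},                   # shape: sort
    {"op": "remove_column", "column": "notes"},        # shape: remove a column
]


def run_flow(rows: list[dict], flow: list[dict]) -> list[dict]:
    """Apply each step in order and return the refined rows (input is not modified)."""
    for step in flow:
        col = step["column"]
        if step["op"] == "drop_missing":
            rows = [r for r in rows if r.get(col)]
        elif step["op"] == "dedupe":
            seen, kept = set(), []
            for r in rows:
                if r[col] not in seen:
                    seen.add(r[col])
                    kept.append(r)
            rows = kept
        elif step["op"] == "filter_min":
            rows = [r for r in rows if r[col] >= step["value"]]
        elif step["op"] == "sort":
            rows = sorted(rows, key=lambda r: r[col])
        elif step["op"] == "remove_column":
            rows = [{k: v for k, v in r.items() if k != col} for r in rows]
        else:
            raise ValueError(f"unknown step {step['op']!r}")
    return rows


data = [
    {"email": "a@x.com", "age": 34, "notes": "vip"},
    {"email": "", "age": 29, "notes": ""},
    {"email": "a@x.com", "age": 34, "notes": "dup"},
    {"email": "b@y.org", "age": 17, "notes": ""},
    {"email": "c@z.net", "age": 22, "notes": ""},
]
saved = json.dumps(FLOW, indent=2)           # the flow is stored as JSON, apart from the data
refined = run_flow(data, json.loads(saved))  # reload the flow and replay it
print(refined)
```

Keeping the flow as data rather than code is what makes it possible to save it, reload it, and rerun it on a refreshed copy of the same data set.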

Models: create, train and test Notebook example
Analyze
Create models using your favorite runtimes and APIs, e.g., Python, R, Scala, TensorFlow, Keras, Spark ML. Use the open source tools of your choice, e.g., Jupyter or Zeppelin notebooks, RStudio, H2O.

A Jupyter or Zeppelin notebook is a web-based environment for interactive computing. You can run small pieces of code that process your data and immediately view the results of your computation. Notebooks include all of the building blocks you need to work with data: the data, the code computations that process the data, visualizations of the results, and text and rich media to enhance understanding. Code computations can build on each other to quickly unlock key insights from your data. Notebooks record how you worked with data, so you can understand exactly what was done, reproduce computations reliably, and share your findings with others. You can create Python, Scala, and R notebooks to analyze your data. You can collaborate with others on your notebooks, add comments, and view a history of your notebooks.

Models: create, train and test Visual modeler example
Analyze
Use IBM value-add modeling such as the visual modeler and SPSS Modeler. Version, test, and evaluate models. Manage models across dev, test, staging, and prod. Deploy models and scale automatically for online, batch, and streaming use. Monitor model performance with a feedback loop, and automatically trigger retraining and redeployment with rolling upgrades.

Model types in WML (Watson Machine Learning):
Spark ML
PMML
Custom models
scikit-learn (Python 2.7 and Python 3.5; GPU: Python 3.5), in pickle or joblib format
XGBoost 0.7.post3 (Python 2.7 and Python 3.5; GPU: Python 3.5)
Keras (Python 2.7 and Python 3.5; GPU: Python 3.5)
TensorFlow (Python 2.7 and Python 3.5; GPU: Python 3.5)
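The monitoring bullet describes a feedback loop that triggers retraining. The following Python sketch shows only the trigger logic. It assumes labeled feedback arrives as (prediction, actual) pairs and assumes an accuracy threshold of 0.8 with at least five samples. The FeedbackMonitor class and both numbers are hypothetical, and the actual retraining and rolling redeployment are out of scope.

```python
from dataclasses import dataclass


@dataclass
class FeedbackMonitor:
    """Track accuracy on labeled feedback and flag when retraining is due."""
    threshold: float = 0.8  # hypothetical minimum acceptable accuracy
    min_samples: int = 5    # avoid acting on too little feedback
    correct: int = 0
    total: int = 0

    def record(self, prediction, actual) -> None:
        self.total += 1
        self.correct += int(prediction == actual)

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 1.0

    def needs_retraining(self) -> bool:
        return self.total >= self.min_samples and self.accuracy < self.threshold


monitor = FeedbackMonitor()
feedback = [("churn", "churn"), ("stay", "churn"), ("stay", "stay"),
            ("churn", "stay"), ("stay", "churn")]
for predicted, actual in feedback:
    monitor.record(predicted, actual)

if monitor.needs_retraining():
    print(f"accuracy {monitor.accuracy:.2f} below {monitor.threshold}: trigger retraining")
```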

Publish analytics assets (and their data) to the catalog
Analyze
Publish datasets, notebooks, ML models, and projects to the Enterprise Catalog. The Enterprise Catalog enables others to understand data usage, lineage, and relationships.

Cognos Dashboard Embedded
Analyze
Cognos Dashboard Embedded offers a live connection to underlying data, smart creation of visualizations, and interactive exploration of data through filtering and navigation paths. Embed dashboards where users are without losing interactivity.

With the analytics dashboard, you can build sophisticated visualizations of your analytics results, use the dashboard to communicate the insights you've discovered in your data, and then share the dashboard with others. The analytics dashboard tool in IBM Cloud Private for Data gives line-of-business users a good way to begin investigating data for patterns and insights. The dashboard can then be handed off to a data scientist for deeper analysis and predictive modeling. The analytics dashboard tool is an easy-to-use drag-and-drop dashboarding tool for data analysts, who do not need to understand coding or SQL to explore the data and gain insights. The results can easily be shared with other users, both within your organization and externally, through a URL link.

IBM Cloud Private for Data – top use cases
Enable governed, secure access to your data regardless of where it lives. Build the information architecture you need to enable self-service analytics and AI. Shift from monolithic applications to microservices for agility and efficiency. Accelerate transformation for regulatory compliance, such as GDPR, the biggest change in data protection in 20 years.

IBM Business Partner: Building enterprise data pipelines
Organizations that ignore AI will soon be left behind by more agile competitors. Datameer and IBM have teamed up to accelerate your journey to AI through an integrated platform that delivers a foundational approach to seamlessly collect, organize, secure, and analyze data from across your enterprise.

IBM Cloud Private for Data – packaging options
Make your data ready for AI: Cloud Agility, Lightning Fast & AI-ready.

IBM Cloud Private for Data
Community Edition*: build apps and manage data in non-production.
Cloud Native Edition: collect, organize, and analyze data in production. Basic Support.
Enterprise Edition: full-function deployments with enterprise QoS, add-on modules, and AI readiness. Basic and Premium Support.
Includes "ICP Foundations".

IBM Cloud Private: build open, cloud-native apps with public services and run them anywhere.
Cloud Native Edition: create apps in production.
Enterprise Edition: modernize apps in production.

* Roadmap: mid-September 2018.

ICP for Data Enterprise Edition: IBM Cloud Private for Data is a robust end-to-end platform for all data and analytics needs within your enterprise. It can enable your organization to access a vast array of enterprise data sources on premises and in the cloud, while applying data management, governance, and analytics capabilities within a private cloud setting. A well-integrated collection of microservices built on a cloud-native architecture, IBM Cloud Private for Data:
Features role-focused, purpose-built data, governance, and analytics capabilities tailored for your data scientists, business users, data engineers, CDOs or CIOs, data stewards, and application developers
Encourages collaboration and communication, removing divisional barriers within the enterprise
Helps advance a data-driven culture for your organization

ICP for Data Cloud Native Edition: Cloud Native Edition includes the same functionality as Enterprise Edition at a significantly lower price point, with scalability licensing restrictions. It is targeted at medium enterprises (also called "commercial customers") and embedded partners for whom a competitive price is critical and scalability is not a significant requirement. The key rationale behind this edition is to create a package that is both attractive and appropriate for a subset of customers whom we could not target with Enterprise Edition. It is also an entry-level offering that customers can try before upgrading to Enterprise Edition.

Licensee's use of the Program, per Container Cluster*, is limited to the following:
1) one million records per dataset for auto discovery of meta-data
2) 500 transformation jobs per day using configuration files that specify only one data partition
3) 3,000 enterprise data catalog assets in accepted status at any given time (wherein assets include glossary categories and terms, governance policies and rules, data rule definitions and data rule set definitions, automation rules and quality rules, data classes and business labels and data science assets)
4) 50 published data science model deployments at any given time
5) 60 defined data science batch jobs at any given time
6) 40 Shiny apps at any given time, and
7) 4 active nodes across all database warehouses created by the Program at any given time.

Note*: A Container Cluster is the foundation of a container orchestration engine (for example, Kubernetes). A Container Cluster contains at least one master node for cluster management and one or more worker nodes to support containerized applications.
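The Cloud Native Edition limits are simple per-cluster ceilings, so checking a usage snapshot against them is straightforward. In this Python sketch, the limit values come from the list above, while the metric keys, the over_limit helper, and the usage numbers are hypothetical. The sketch also ignores qualifications such as "only one data partition" for transformation jobs.

```python
# Cloud Native Edition limits per Container Cluster, as listed above (key names are hypothetical).
LIMITS = {
    "records_per_dataset_for_discovery": 1_000_000,
    "transformation_jobs_per_day": 500,
    "accepted_catalog_assets": 3_000,
    "published_model_deployments": 50,
    "defined_batch_jobs": 60,
    "shiny_apps": 40,
    "active_warehouse_nodes": 4,
}


def over_limit(usage: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {metric: (used, allowed)} for every metric above its limit."""
    return {k: (v, LIMITS[k]) for k, v in usage.items() if v > LIMITS[k]}


# Hypothetical usage snapshot for one cluster; 4 warehouse nodes is at, not over, the limit.
usage = {"accepted_catalog_assets": 3_250, "published_model_deployments": 12, "active_warehouse_nodes": 4}
exceeded = over_limit(usage)
print(exceeded)
```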

IBM Cloud Private for Data Get started today!
Send questions, comments and flowers to
