Addressing Legacy Risks with Hadoop
30 October 2013
Ravi Hubbly, Principal Architect, Lockheed Martin Corporation
© 2013 Lockheed Martin Corporation. All rights reserved.
Who am I?
- I build and support data solutions for Lockheed Martin's customers, primarily federal government agencies
- I support internal research and development initiatives
- I have spent the last three years focused on so-called Big Data problems

Why am I here?
- To share my experience enabling enterprise IT to incorporate emerging Big Data solutions
Business Needs
Data: clinical, claims, monitoring, and other sources

- Descriptive analytics: How are we doing? (How many claims did we pay today?)
- Predictive analytics: What might happen in the future? (Which of tomorrow's claims might be requesting an Emergency Room (ER) admission?)
- Prescriptive analytics: What is the best course of action given objectives, requirements, and constraints? (What would be effective steps to reduce the probability of an ER admission?)

Today, analytics is about what happened. Tomorrow, analytics will be how we do business.
Organization Challenge
Can you run a high-speed train on an old railroad? The outcome is nothing but disaster.
What does the legacy environment look like?
[Diagram: source systems feed a data warehouse/mart and a derived data mart through a mix of known and shadow processes, which in turn feed target systems; numbered process flows connect each stage.]
Realize Big Data Value
Over time:
- Build out the foundation to handle data challenges (become "Big Data enabled")
- Reap the benefits of a low-cost, high-performing environment
- Deliver new and improved analytics
- Enable collaborative analytics
Misunderstandings
- "New capabilities can be implemented using the legacy data foundation."
- "It is about processing unstructured data, social media data, etc. My data is structured, hence it is not applicable."
- "It is all for new algorithms such as machine learning (recommendation engines), predictive, and prescriptive. I do not have such needs."
- "It is not meant to address my existing data management problems."
- "I do not have a big budget to do Big Data."
Observation
- Big Data tools are a game changer: they address both unstructured and structured data.
- Advanced algorithm needs are not the only reason for Big Data.
- Big Data tools optimize the existing environment.
- Big Data tools complement existing tools to provide a solid foundation for analytics.
- You do not need a big budget to implement Big Data.
Data Foundation: Ready for Big Data
[Diagram: the mainframe and operational systems feed both Hadoop (store, transform, process) and the data warehouse through data integration (ETL/ELT jobs); both platforms serve analysis/query workloads for business intelligence.]
Cost and Feature Comparison
Dimensions compared: scalability, performance, reliability, agility, skills supply, and cost.

Cost of managing 1 TB of data:
- Mainframe: $20,000-$100,000
- EDW: $15,000-$80,000
- Hadoop: $250-$2,000

Data from "Leveraging Mainframe Data with Hadoop" by Syncsort and Cloudera, Oct 2013.
Foundation Projects
Target opportunities associated with mainframe processes:
- Data integration processes
- Data warehouse opportunities
- Long-running processes and reports
- Storage, backup, and archive

Enable new capabilities, such as new data exploration and collaboration.

Maintain focus on data management:
- Data definitions, including metadata management
- Data provenance
- Data stewardship
Mainframe Presence
Mainframes are critical:
- Major government services, including Medicare, Social Security, and Treasury
- The world's top 25 banks
- The world's top 9 insurance companies
- 23 of the top 25 retail companies

Mainframes are valued:
- Reliability: the mainframe is stable and is known to run for months without any downtime
- Centralized: easy to manage and control, and more secure
- Huge data processing capabilities
- Years of built and tested business logic

* Some data from "Leveraging Mainframe Data with Hadoop" by Syncsort and Cloudera, Oct 2013.
Mainframe Components and Characteristics
- One big central system
- Highly reliable (99.999%)
- Batch processing
- Data schema independence
- Stores data mainly as files
- Tape backups
- Transaction system: Customer Information Control System (CICS)

[Diagram: data storage (DB2/DMS/others), data processing (COBOL programs with copybooks, CICS), and tape backup.]
Working with the Mainframe Is Getting Harder
Integration challenges:
- Inherently difficult to integrate
- Batch processing
- Data conversion and sharing is hard

Security challenges:
- Hosts mission-critical, sensitive data
- Difficult to install new software, especially for data extraction

Cost challenges:
- Mainframe data is (expensive) Big Data
- Any processing on the mainframe, even file transfers, costs CPU cycles (measured in Millions of Instructions Per Second, MIPS)

Business challenges:
- Mainframe expertise is retiring
- Legacy applications have poor documentation
- Difficult to adapt to changing business needs
- Crumbling under increasing data volume, velocity, and variety
Opportunity
Processing:
- Mainframe processing is expensive, and cost relates directly to MIPS
- Any MIPS reduction means instant operating expense (OpEx) savings

Storage:
- Mainframe storage is expensive; it can reach $100K/TB
- Uses expensive tape-based backup
- Uses custom data encodings such as EBCDIC and packed decimal

Hadoop opportunity:
- Reduces cost in processing and storage
- Can ingest any data type, hence ready for mainframe data
- Addresses data volume, velocity, and variety challenges
- Reduces risk tied to Subject Matter Expert (SME) availability
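The EBCDIC and packed-decimal encodings named above are mechanical to convert once the data is off the mainframe. A minimal Python sketch (illustrative only, not the tooling used in the project, which the deck does not name): Python's built-in cp037 codec handles EBCDIC text, and a COMP-3 packed-decimal field can be unpacked by hand.

```python
def decode_packed_decimal(data: bytes, scale: int = 0):
    """Decode an IBM COMP-3 (packed-decimal) field.

    Each byte holds two BCD digits; the low nibble of the final byte
    is the sign (0xC or 0xF = positive, 0xD = negative).
    """
    digits = []
    for byte in data[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(data[-1] >> 4)          # last byte: one digit + sign nibble
    sign = -1 if (data[-1] & 0x0F) == 0xD else 1
    value = 0
    for d in digits:
        value = value * 10 + d
    value *= sign
    return value / (10 ** scale) if scale else value

# EBCDIC text fields decode with Python's built-in cp037 codec:
name = b"\xd9\xc1\xe5\xc9".decode("cp037")            # "RAVI"
amount = decode_packed_decimal(b"\x12\x34\x5c", 2)    # 123.45
```

Running such conversions as distributed jobs on the Hadoop side, rather than on the mainframe, is what turns MIPS reduction into the OpEx savings described above.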
Why Apache Hadoop?*
Apache Hadoop is a platform for data storage and processing that is scalable, fault tolerant, and open source.

Core Hadoop components:
- Hadoop Distributed File System (HDFS): file sharing and data protection across physical servers
- MapReduce: distributed computing across physical servers

Has the flexibility to store and mine any type of data:
- Ask questions across structured and unstructured data that were previously impossible to ask or solve
- Not bound by a single schema

Excels at processing complex data:
- Scale-out architecture divides workloads across multiple nodes
- Flexible file system eliminates ETL bottlenecks

Scales economically:
- Can be deployed on commodity hardware
- Open source platform guards against vendor lock-in

*Slide taken from Cloudera's overview of Apache Hadoop
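The MapReduce model named above can be illustrated with a tiny local simulation in plain Python (no Hadoop required; the word-count job is the classic textbook example, not anything from this deck): the map phase emits (key, value) pairs, the shuffle groups them by key, and the reduce phase aggregates each group.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Reduce phase: sum the counts emitted for one word.
    return word, sum(counts)

def run_mapreduce(lines):
    # Shuffle phase: sort mapper output so equal keys become adjacent,
    # then feed each key group to the reducer.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(
        reducer(word, [count for _, count in group])
        for word, group in groupby(pairs, key=itemgetter(0))
    )

counts = run_mapreduce(["Hadoop stores data", "Hadoop processes data"])
# counts == {"data": 2, "hadoop": 2, "processes": 1, "stores": 1}
```

On a real cluster the same three phases run in parallel across nodes, with HDFS providing the shared, replicated storage underneath.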
Interesting Similarities between Hadoop and the Mainframe

  Mainframe        Hadoop
  Data files       Data files (HDFS)
  Copybooks        Hive metastore
  COBOL programs   MapReduce jobs (Pig programs)
  JCL              Oozie

- Both are one central system supporting various applications
- Both are built to be highly reliable systems
- The primary processing paradigm is batch processing (data processing and analytics)
- Primary data storage is file based
- Metadata is kept separate from the data, which lends itself to schema on read
Data Foundation Initiatives
- Move batch processing to Hadoop: leverage legacy code mining/analysis tools and a step-by-step, iterative approach
- Implement good data integration between the mainframe and Hadoop: leverage automation tools
- Attain immediate cost savings in storage and improved performance: focus on migrating archive and backup from tape to Hadoop
- Provide common data services, such as EBCDIC-to-ASCII conversion
- Enable Hadoop as the more effective data distribution services provider
Migration Paths
[Diagram: code migration moves COBOL programs and JCL to MapReduce jobs (Pig programs) and Oozie; metadata migration moves copybooks into the Hive metastore; data movement moves mainframe data files into Hadoop.]
Automate Mainframe Data Ingestion and Query
[Diagram: at build time, tools read mainframe copybook metadata to create the Hadoop schema in the Hive metastore; a MapReduce ETL job converts EBCDIC and packed-decimal data to ASCII for user access.]
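The build-time schema step in the diagram can be sketched as a small generator that maps parsed copybook field types to Hive column types. Everything here is a hypothetical illustration: the field list, the type mapping, the table name, and the ORC storage clause are assumptions, and a real tool would parse the full copybook grammar (level numbers, OCCURS, REDEFINES) rather than take pre-parsed pairs.

```python
# Illustrative mapping from simplified COBOL picture clauses to Hive types.
COBOL_TO_HIVE = {
    "PIC X": "STRING",          # alphanumeric text (EBCDIC on the mainframe)
    "PIC 9": "BIGINT",          # zoned/display numeric
    "COMP-3": "DECIMAL(18,2)",  # packed decimal
}

def copybook_to_hive_ddl(table, fields):
    """Emit a Hive CREATE TABLE from (field-name, picture-type) pairs."""
    cols = ",\n  ".join(
        f"{name.lower().replace('-', '_')} {COBOL_TO_HIVE[pic]}"
        for name, pic in fields
    )
    return f"CREATE TABLE {table} (\n  {cols}\n) STORED AS ORC;"

ddl = copybook_to_hive_ddl(
    "fin_rec",
    [("CUST-NAME", "PIC X"), ("CLAIM-COUNT", "PIC 9"), ("CLAIM-AMT", "COMP-3")],
)
print(ddl)
```

Generating the DDL from the copybook keeps the Hive metastore in sync with the mainframe's record layouts, which is the automation this slide is describing.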
Process for Optimization
Profile:
- Analyze all of the workload: queries, reports, objects, user communities, jobs

Prioritize & Plan:
- Identify workloads based on risk, complexity, and user impact
- Use a prioritization framework to quantify

Migrate:
- Utilize automation in migrating logic and data: move objects, move workloads, move users

Validate:
- Verify results: evaluate performance, run a side-by-side "burn-in" period, cut over

Repeat periodically (annually or bi-annually) to keep the Hadoop-based analytics environment optimized.
Target Analytics Environment
Data sources: clinical trials, university research, internal research, drug research, patient information, claim data, financial performance.

[Diagram: a Big Data store and compute platform hosts the enterprise data repository, external data, and multiple workspaces for ad hoc analytics; analysts use their tools of choice (SPSS, SAS, STATA, MicroStrategy, Cognos) in a collaboration environment managed by a research data management team.]
Example Client Project
Business need: "An effective and efficient data and analytics environment supporting goals of providing effective service at lower cost."

Today's challenges:
- Delayed access to data
- Data and analytics growth
- Inflexible legacy solutions
- Data and analytics duplication

Results by the numbers:
- Market data availability reduced from 4-6 weeks to less than an hour; processes 25 million records in under an hour
- Reduced legacy cost and improved agility to meet new business needs; incorporates new data changes in days compared to months
- Cost-effective, flexible solution supporting data and analytics growth: $8K Big Data vs. $24K data warehouse vs. $15K mainframe
Message
- A strong data foundation is needed to enable value from Big Data.
- Hadoop technology provides a great opportunity to finally address legacy risks because of inherent similarities.