Published by José Borja Sampaio. Modified over 6 years ago
1
A New Era Of Analytics Ömer Sever (omers@tr.ibm.com) IBM SWG TR
Thank you for your time today, and we hope that you’ll join us for further discussion during lunch. Ömer Sever IBM SWG TR Enterprise Content Management
2
Agenda The Myth About Big Data Case Studies on Big Data How To Start With A Big Data Project? Q & A Jervin: Big Data is not NEW; it's been around for years, and one way or another your organization already has big data, e.g. a DW. However, Big Data is more than just a DW that needs to store/analyze large volumes of data. Big Data is not just about the volume of data that resides in the DW today; the volume could be batch or real-time (trigger feed).
3
The Myth About Big Data Big Data Is New Big Data Is Only About Massive Data Volume Big Data Means Hadoop Big Data Needs A Data Warehouse Big Data Means Unstructured Data Big Data Is for Social Media & Sentiment Analysis Brian Gentile, CEO of Jaspersoft, has written an article for Mashable about the top five Big Data myths. One myth is that Big Data means Hadoop: “Hadoop is the Apache open-source software framework for working with Big Data. It was derived from Google technology and put into practice by Yahoo and others. But Big Data is too varied and complex for a one-size-fits-all solution. While Hadoop has surely captured the greatest name recognition, it is just one of three classes of technologies well suited to storing and managing Big Data. The other two classes are NoSQL and Massively Parallel Processing (MPP) data stores. (See myth number five below for more about NoSQL.) Examples of MPP data stores include EMC’s Greenplum, IBM’s Netezza, and HP’s Vertica.” Another is that NoSQL means No SQL: “NoSQL means ‘not only’ SQL because these types of data stores offer domain-specific access and query techniques in addition to SQL or SQL-like interfaces. Technologies in this NoSQL category include key-value stores, document-oriented databases, graph databases, big table structures, and caching data stores. The specific native access methods to stored data provide a rich, low-latency approach, typically through a proprietary interface. SQL access has the advantage of familiarity and compatibility with many existing tools.” == With the amount of hype around Big Data it’s easy to forget that we’re just in the first inning. More than three exabytes of new data are created each day, and market research firm IDC estimates that 1,200 exabytes of data will be generated this year alone.
The expansion of digital data has been underway for more than a decade, and for those who’ve done a little homework, they understand that Big Data references more than just Google, eBay, or Amazon-sized data sets. The opportunity for a company of any size to gain advantages from Big Data stems from data aggregation, data exhaust, and metadata, the fundamental building blocks to tomorrow’s business analytics. Combined, these data forces present an unparalleled opportunity. Yet, despite how broadly Big Data is being discussed, it appears that it is still a very big mystery to many. In fact, outside of the experts who have a strong command of this topic, the misunderstandings around Big Data seem to have reached mythical proportions. Here are the top five myths. 1. Big Data is Only About Massive Data Volume Volume is just one key element in defining Big Data, and it is arguably the least important of three elements. The other two are variety and velocity. Taken together, these three “Vs” of Big Data were originally posited by Gartner’s Doug Laney in a 2001 research report. Generally speaking, experts consider petabytes of data volumes as the starting point for Big Data, although this volume indicator is a moving target. Therefore, while volume is important, the next two “Vs” are better individual indicators. Variety refers to the many different data and file types that are important to manage and analyze more thoroughly, but for which traditional relational databases are poorly suited. Some examples of this variety include sound and movie files, images, documents, geo-location data, web logs, and text strings. Velocity is about the rate of change in the data and how quickly it must be used to create real value. Traditional technologies are especially poorly suited to storing and using high-velocity data. So new approaches are needed.
If the data in question is created and aggregated very quickly and must be used swiftly to uncover patterns and problems, the greater the velocity and the more likely that you have a Big Data opportunity. 2. Big Data Means Hadoop Hadoop is the Apache open-source software framework for working with Big Data. It was derived from Google technology and put into practice by Yahoo and others. But Big Data is too varied and complex for a one-size-fits-all solution. While Hadoop has surely captured the greatest name recognition, it is just one of three classes of technologies well suited to storing and managing Big Data. The other two classes are NoSQL and Massively Parallel Processing (MPP) data stores. (See myth number five below for more about NoSQL.) Examples of MPP data stores include EMC’s Greenplum, IBM’s Netezza, and HP’s Vertica. Plus, Hadoop is a software framework, which means it includes a number of components that were specifically designed to solve large-scale distributed data storage, analysis and retrieval tasks. Not all of the Hadoop components are necessary for a Big Data solution, and some of these components can be replaced with other technologies that better complement a user's needs. One example is MapR’s Hadoop distribution, which includes NFS as an alternative to HDFS, and offers a full random-access, read/write file system. 3. Big Data Means Unstructured Data The term “unstructured” is imprecise and doesn’t account for the many varying and subtle structures typically associated with Big Data types. Also, Big Data may well have different data types within the same set that do not contain the same structure. Therefore, Big Data is probably better termed “multi-structured”, as it could include text strings, documents of all types, audio and video files, metadata, web pages, messages, social media feeds, form data, and so on.
The consistent trait of these varied data types is that the data schema isn’t known or defined when the data is captured and stored. Rather, a data model is often applied at the time the data is used. 4. Big Data is for Social Media Feeds and Sentiment Analysis Simply put, if your organization needs to broadly analyze web traffic, IT system logs, customer sentiment, or any other type of digital shadows being created in record volumes each day, Big Data offers a way to do this. Even though the early pioneers of Big Data have been the largest, web-based, social media companies, Google, Yahoo, and Facebook, it was the volume, variety, and velocity of data generated by their services that required a radically new solution rather than the need to analyze social feeds or gauge audience sentiment. Now, thanks to rapidly increasing computer power (often cloud-based), open source software (e.g., the Apache Hadoop distribution), and a modern onslaught of data that could generate economic value if properly utilized, there is an endless stream of Big Data uses and applications. A favorite and brief primer on Big Data, which contains some thought-provoking uses, was published as an article early this year in Forbes. 5. NoSQL means No SQL NoSQL means “not only” SQL because these types of data stores offer domain-specific access and query techniques in addition to SQL or SQL-like interfaces. Technologies in this NoSQL category include key-value stores, document-oriented databases, graph databases, big table structures, and caching data stores. The specific native access methods to stored data provide a rich, low-latency approach, typically through a proprietary interface. SQL access has the advantage of familiarity and compatibility with many existing tools, although this usually comes at some expense of latency, driven by the interpretation of the query to the native “language” of the underlying system.
For example, Cassandra, the popular open source key-value store offered in commercial form by DataStax, not only includes native APIs for direct access to Cassandra data, but also CQL (its SQL-like interface) as its emerging preferred access mechanism. It’s important to choose the right NoSQL technology to fit both the business problem and data type, and the many categories of NoSQL technologies offer plenty of choice.
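Myth three's "schema applied at the time the data is used" point can be made concrete with a small schema-on-read sketch. This is a minimal illustration with invented records and field names, not any particular product's API: records are captured raw, and a schema exists only in the query, not in the storage layer.

```python
import json

# Raw events are captured as-is, with no schema enforced at write time.
raw_store = [
    '{"user": "a1", "action": "click", "page": "/home"}',
    '{"user": "b2", "action": "purchase", "amount": 19.99}',
    '{"sensor": "t-7", "reading": 21.4}',  # a different "shape" in the same store
]

def read_with_schema(store, fields):
    """Apply a schema at read time: project only the requested fields,
    skipping records that don't have them."""
    for line in store:
        record = json.loads(line)
        if all(f in record for f in fields):
            yield {f: record[f] for f in fields}

# The "schema" lives in the query; the sensor record is simply skipped here.
clicks = list(read_with_schema(raw_store, ["user", "action"]))
```

The same store can serve a completely different schema later (e.g. `["sensor", "reading"]`) without any migration, which is the essence of multi-structured data.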
4
Big Data Is.. It is all about better analytics on a broader spectrum of data, and therefore represents an opportunity to create even more differentiation among industry peers.
5
Where Is This “Big Data” Coming From ?
2+ billion people on the Web by end of 2011 30 billion RFID tags today (1.3B in 2005) 4.6 billion camera phones worldwide 100s of millions of GPS-enabled devices sold annually 76 million smart meters in 2009; 200M by 2014 12+ TBs of tweet data every day ? TBs of data every day Obviously, there are many other forms and sources of data. Let’s start with the hottest topic associated with Big Data today: social networks. Twitter generates about 12 terabytes a day of tweet data, every single day. Now, keep in mind, these numbers are hard to count on, so the point is that they’re big, right? So don’t fixate on the actual numbers, because they change all the time, and realize that even if these numbers are out of date in 2 years, it’s at a point where it’s too staggering to handle exclusively using traditional approaches. +CLICK+ Facebook over a year ago was generating 25 terabytes of log data every day, and probably about 7 to 8 terabytes of data that goes up on the Internet. Google, who knows? Look at Google Plus, YouTube, Google Maps, and all that kind of stuff. So that’s the left hand of this chart, the social network layer. Now let’s get back to instrumentation: there are massive amounts of proliferated technologies that allow us to be more interconnected than at any point in the history of the world, and it just isn’t P2P (people to people) interconnections, it’s M2M (machine to machine) as well. Again, with these numbers, who cares what the current number is; I try to keep them updated, but the point is that even if they are out of date, it’s almost unimaginable how large these numbers are. Over 4.6 billion camera phones that leverage built-in GPS to tag the location of your photos, purpose-built GPS devices, smart meters.
If you recall the bridge that collapsed in Minneapolis a number of years ago in the USA, it was rebuilt with smart sensors inside it that measure the contraction and flex of the concrete based on weather conditions, ice build-up, and so much more. So I didn’t realise how true it was when Sam P launched Smart Planet: I thought it was a marketing play. But truly the world is more instrumented, interconnected, and intelligent than it’s ever been, and this capability allows us to address new problems and gain new insight never before thought possible, and that’s what the Big Data opportunity is all about! 25+ TBs of log data every day
6
With Big Data, We’ve Moved into a New Era of Analytics
Volume: 12+ terabytes of Tweets created daily. Velocity: 5+ million trade events per second. Variety: 100s of different types of data. Veracity: only 1 in 3 decision makers trust their information. Jervin: there is so much that we can do with Big Data. Look at (VOLUME/VARIETY) the amount of data that we can use to boost our ANALYTIC IQ. It is also CRITICAL: while Big Data brings lots of opportunity, there is a VERACITY component related to TRUST in the source of data. How do we TRUST and GOVERN that data? Next is VELOCITY (the speed at which data arrives at your doorstep): what are you going to do, and how long does it take for you to REACT to it? +CLICK+ I think we can all relate to Volume when describing Big Data. Of course all of the numbers on this slide are out of date the moment I saved them, but you get the point. I think back 7 years ago when I used to maintain a TB Club for data warehouse customers; today I have 1 TB in my pocket. Big Data gives us the opportunity to include different kinds of data into our analysis, thereby boosting your analytics IQ. Veracity is another characteristic of Big Data; this goes to whether you can trust the source of the data, or understand it. It’s critical: if you are going to reach out into email, call centers, Tweets, Facebook, and more, you’re going to have to trust the source. One of the biggest differentiators for the IBM Big Data platform is around the final V, Velocity. This is about how fast data arrives at the organization’s doorstep, but more: what are you going to do about it and how long does it take? You get some details in the next slide.
7
The number of organizations who see analytics as a competitive advantage is growing.
[Chart: 57%, 63%, 70% of respondents across 2010, 2011, 2012] In this environment, organizations using analytics are gaining real competitive advantage: a 57% increase from 2010 to 2011 in respondents who say analytics creates a competitive advantage. Source: IBM IBV/MIT Sloan Management Review Study 2011. Copyright Massachusetts Institute of Technology 2011
8
Studies show that organizations competing on analytics outperform their peers
The organizations are significantly more likely to outperform their peers. Organizations achieving competitive advantage with analytics are 220% more likely to substantially outperform their industry peers. 1.6x Revenue Growth; 2.5x Stock Price Appreciation; 2.0x EBITDA Growth. Source: IBM IBV/MIT Sloan Management Review Study 2011. Copyright Massachusetts Institute of Technology 2011
9
Four Characteristics of Big Data
Cost-efficiently processing the growing Volume Responding to the increasing Velocity Collectively analyzing the broadening Variety 30 billion RFID sensors and counting 50x growth from 2010 to 2020, to 35 ZB 80% of the world's data is unstructured Big data has four key characteristics. The first is volume. Of course this may seem obvious, but it is more complex than you may think. Yes, the volume of data is growing: experts predict that the volume of data in the world will grow to 35 zettabytes by 2020. That same phenomenon affects every business; their data is growing at the same exponential rate too. But it isn’t just the volume of data that is growing, it’s the number of sources of that data. And that leads to the third characteristic of big data, variety, which we will cover later. Data is increasingly accelerating the velocity at which it is created and at which it is integrated. We’ve moved from batch to a real-time business. Data comes at you at a record or a byte level, not always in bulk. And the demands of the business have increased as well, from an answer next week to an answer in a minute. And the world is also becoming more instrumented and interconnected. The volume of data streaming off those instruments is exponentially larger than it was even 2 years ago. Variety presents an equally difficult challenge. The growth in data sources has fuelled the growth in data types. In fact, 80% of the world's data is unstructured. Yet most traditional methods apply analytics only to structured information. And finally we have veracity. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the sources and the variety grow. Establishing the veracity of big data sources: 1 in 3 business leaders don’t trust the information they use to make decisions.
10
Analytic With Data-In-Motion & Data At Rest
Opportunity Cost Starts Here Data Ingest Bootstrap Enrich Forecast Nowcast Jervin: Here’s a simple example of how Velocity (Data-In-Motion & Data-At-Rest) works. Typically, all data that needs to be analyzed MUST be stored FIRST before we can analyze it, whether we store it in Hadoop or a DW. That said, the opportunity we have should come EARLIER. Velocity is really about how fast data is being produced and changed, and the speed with which data must be received, understood, and processed. Now I want you to think about the fact that when I talk Big Data at IBM, I uniquely talk about Big Data in motion and at rest. As you can see on this slide, the opportunity cost starts way at the left, and it takes a while for you to get the insight once it hits your warehouse. This is where Hadoop and the Big Data at rest notion came from: folks wanted to speed analytics, so they turned to Hadoop or Netezza (depending on the data and the task) and, as you can see on this slide, the analysis starts to go faster. In the case of Hadoop, it’s going faster because you are willing to give up some of the consistency, and in the case of Netezza, because it’s optimized for these tasks on structured data. So you build all this insight into your business, and what’s UNIQUE about IBM is you can apply this insight to the in-motion part of the Big Data story. Notice on the slide the [T] box is the same; that’s because you just pick up analytics built with the Text Analytic Toolkit on the right and place them on the left of this slide. This allows you to create an adaptive analytics ecosystem and bootstrap or enrich the intelligence you gleaned out at the frontier. In short, once you harvest an analytic asset, you can bring it from the at-rest portion to the in-motion.
And so we have PoTs that show this, where we’re starting to pick up information we find at rest and then we put the analysis of that information out on the frontier, if you will, so that analysis is performed on that data as soon as it hits the enterprise. Adaptive Analytics Model
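The "harvest at rest, apply in motion" pattern above can be illustrated in miniature. The numbers and the deliberately simple threshold model below are invented for illustration; a real deployment would harvest analytics from a warehouse or Hadoop and push them into a streaming engine like Streams rather than use this toy.

```python
# "At rest": derive a simple anomaly threshold from historical data.
historical_latencies = [120, 130, 110, 500, 125, 118, 122]

def train_threshold(samples, k=3.0):
    """Return mean + k standard deviations as an anomaly cutoff."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean + k * var ** 0.5

threshold = train_threshold(historical_latencies)

# "In motion": apply the harvested model to each event as it arrives,
# instead of waiting for the event to land in the warehouse.
def score_stream(events, threshold):
    for e in events:
        yield (e, e > threshold)  # flag anomalies in real time

incoming = [115, 123, 900, 130]
flags = list(score_stream(incoming, threshold))
```

The point is the hand-off: the model is built once against stored data, then evaluated per-event at the frontier, which is exactly where the opportunity cost on the slide starts.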
11
Agenda Myth About Big Data.. What Is It? Case Studies on Big Data How To Start With A Big Data Project? Q & A
12
The 5 Key Big Data Use Cases
Big Data Exploration Find, visualize, understand all big data to improve decision making Enhanced 360o View of the Customer Extend existing customer views (MDM, CRM, etc) by incorporating additional internal and external information sources Security/Intelligence Extension Lower risk, detect fraud and monitor cyber security in real-time Operations Analysis Analyze a variety of machine data for improved business results Data Warehouse Augmentation Integrate big data and data warehouse capabilities to increase operational efficiency Our product management, engineering, marketing, CTPs, etc, etc teams have all been working together to help to better understand the big data market. We’ve done surveys, met with analysts and studied their findings, we’ve met in person with customers and prospects (over 300 meetings) and are confident that we found market “sweet spots” for big data. These 5 use cases are our sweet spots. These will resonate with the majority of prospects that you meet with. In the coming slides we’ll cover each of these in detail, we’ll walk through the need, the value and a customer example. © 2013 IBM Corporation
13
Big Data Exploration: Needs
Find, visualize, understand all big data to improve decision making Struggling to manage and extract value from the growing 3 V’s of data in the enterprise; need to unify information across federated sources Inability to relate “raw” data collected from system logs, sensors, clickstreams, etc., with customer and line-of-business data managed in enterprise systems Risk of exposing unsecured personally identifiable information (PII) and/or privileged data due to lack of information awareness
14
Big Data Exploration: Value & Diagram
[Diagram: InfoSphere Data Explorer federating File Systems, Relational Data, Content Management, CRM, Supply Chain, ERP, RSS Feeds, Cloud, and Custom Sources for applications and users] Find, visualize & understand all big data to improve business knowledge. Greater efficiencies in business processes. New insights from combining and analyzing data types in new ways. Develop new business models with resulting increased market presence and revenue. Enterprise-wide, InfoSphere Data Explorer addresses the ongoing challenge of “information silos,” a challenge that isn’t going away any time soon. Each of the systems in your enterprise was designed to serve a critical function, whether it’s managing customer data, managing your supply chain, securing sensitive content or any of a myriad of different functions. Systems such as CRM, ECM, supply chain management, and others are necessary to perform these specific functions. Each of these systems is a silo with its own login, user interface and way of delivering information. The problem is that almost no one in your organization can rely on only one of these silos for the information they need to do their job. [click] Velocity delivers business value by enabling everyone, from management through knowledge workers to front-line employees, to access all of the information they need in a single view, regardless of format or where it is managed. Rather than wasting time accessing each silo separately, Velocity enables them to navigate seamlessly across all available sources, and provides the added advantage of cross-repository visibility. Information is secured so that users only see the content that they are permitted to view when logged directly into the target application. [click] In addition, Velocity gives users the ability to comment, tag and rate content, as well as create shared folders for content they would like to share with other users.
[click] All of this user feedback and social content is then fed back into Velocity’s relevance analytics to ensure that the most valuable content is presented to users. The result is: better decisions, more efficient operations, better understanding of customers, and innovation.
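The federated-navigation idea above can be sketched in a few lines. The `Source` class and the sample silos are invented for illustration; Data Explorer's real connectors, relevance ranking, and per-user security model are far richer than this.

```python
class Source:
    """A stand-in for one silo (CRM, ECM, file system, ...) behind
    a common search interface."""
    def __init__(self, name, documents):
        self.name = name
        self.documents = documents

    def search(self, term):
        return [d for d in self.documents if term.lower() in d.lower()]

def federated_search(sources, term):
    """Query every silo through one interface and merge the results,
    tagging each hit with its origin for cross-repository visibility."""
    results = []
    for src in sources:
        for hit in src.search(term):
            results.append({"source": src.name, "document": hit})
    return results

silos = [
    Source("CRM", ["Acme Corp renewal contract", "Beta Ltd support ticket"]),
    Source("FileSystem", ["acme_invoice_2012.pdf", "quarterly_report.docx"]),
]
hits = federated_search(silos, "acme")
```

One query fans out to every silo, and each hit carries its source, so the user never logs into the silos one by one.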
15
Big Data Exploration: Customer Example
Airline Manufacturer Exploring 4 TB to drive point business solutions (supplier portal, call center, etc.) Single point of data fusion for all employees to use Reduced costs & improved operational performance for the business Key Questions to Ask Jervin: We all know building an airplane is not an easy task; not only are there thousands of parts, there are thousands of documents, be it correspondence, spare-part documentation, etc. If there is a defect, it is NOT easy for this manufacturer to trace or gather relevant information. Hence they need DATA EXPLORER. Can you separate the “noise” from useful content? Can you perform data exploration on large and complex data? Can you find insights in new or unstructured data types (e.g. social media and email)? Can you navigate and explore all enterprise and external content in a single user interface? Can you quickly identify areas of data risk? Do you have a logical starting point for your big data initiatives? Product Starting Point: InfoSphere Data Explorer
16
The 5 Key Big Data Use Cases
Big Data Exploration Find, visualize, understand all big data to improve decision making Enhanced 360o View of the Customer Extend existing customer views (MDM, CRM, etc) by incorporating additional internal and external information sources Security/Intelligence Extension Lower risk, detect fraud and monitor cyber security in real-time Operations Analysis Analyze a variety of machine data for improved business results Data Warehouse Augmentation Integrate big data and data warehouse capabilities to increase operational efficiency
17
Enhanced 360º View of the Customer: Needs
Extend existing customer views (MDM, CRM, etc) by incorporating additional internal and external information sources Need a deeper understanding of customer sentiment from both internal and external sources Desire to increase customer loyalty and satisfaction by understanding what meaningful actions are needed Challenged getting the right information to the right people to provide customers what they need to solve problems, cross-sell & up-sell
18
Enhanced 360º View of the Customer: Value & Diagram
[Diagram: source-system records. CRM: J Robertson, 35 West 15th, Pittsburgh, PA. ERP: Janet Robertson, 35 West 15th St. Legacy: Jan Robertson, 36 West 15th St. Master Data Management reconciles these into a 360º view of party identity: First: Janet, Last: Robertson, Address: 35 West 15th St, City: Pittsburgh, State/Zip: PA / 15213, Gender: F, Age: 48, DOB: 1/4/64] Jervin: All these extensions to MDM, CRM, etc. can be combined into a unified 360-degree view of the customer’s information, providing a single search rather than multiple places for information. Information about a client as viewed in an application built with the Data Explorer Application Builder. The Data Explorer app combines information in context from CRM, content management, supply chain, order tracking database, and many more systems to give a 360º view of the client so the user doesn’t have to log into and search multiple different systems. In this one view the customer-facing professional can see all of the contact’s information: what products she has, any recent support incidents, news about her company, recent conversations and more. An “activity feed” in the center of the screen shows up-to-the-moment updates about the customer, product or other entity that is being viewed. Analytics from BigInsights, Streams and IBM’s BI products can also be shown, with the context of the analytics defined by the information displayed in the application. This frees the CFP to interact with the customer and leverage this complete view to increase revenue and improve customer loyalty. As I mentioned a moment ago, this use case is very synergistic with IBM’s MDM offerings. MDM provides a single, consistent view of data across all of the client’s various systems. This consistency ensures that the view created by Data Explorer will incorporate consistent and accurate data about an entity.
In one sense, Data Explorer provides a business user interface to trusted master data combined with related content from other structured and unstructured data sources. The availability of MDM accelerates implementation of the Data Explorer 360º application and ensures its accuracy and consistency.
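The record-reconciliation step at the heart of this slide can be sketched with a toy survivorship rule. The source names, trust order, and records below are assumptions for illustration only; a real MDM hub uses probabilistic matching and far richer survivorship policies.

```python
def merge_party(records, survivorship=("mdm", "crm", "erp", "legacy")):
    """Build one golden record: for each attribute, take the value from
    the most trusted source that supplies it (a simple survivorship rule)."""
    golden = {}
    by_source = {src: rec for src, rec in records}
    for src in survivorship:          # iterate in decreasing trust order
        for field, value in by_source.get(src, {}).items():
            golden.setdefault(field, value)  # keep the first (most trusted) value
    return golden

# Conflicting views of the same party, as on the slide.
records = [
    ("crm", {"name": "J Robertson", "city": "Pittsburgh"}),
    ("erp", {"name": "Janet Robertson", "address": "35 West 15th St."}),
    ("legacy", {"name": "Jan Robertson", "address": "36 West 15th St."}),
]
golden = merge_party(records)
# name and city survive from CRM; address from ERP; the legacy typo is dropped
```

The consistency argument from the text follows directly: every downstream view (Data Explorer included) reads the one golden record instead of three conflicting ones.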
19
Enhanced 360º Customer View: Customer Example
Create a “Facebook”-style profile Identify 200+ different customer profiles to help in fulfillment & marketing efforts Leverage new data types in customer analysis Key Questions to Ask Can you identify and deliver all data as it relates to a customer, product, or competitor to those who need it? Can you gather insights about your customers from social data, surveys, support email, etc.? Can you combine your structured and unstructured data to run analytics? How are you driving consistency across your information assets when representing your customers, clients, partners, etc.? How can a complete view of the customer enhance your line-of-business users and result in better business outcomes? Product Starting Point: InfoSphere MDM Server, Data Explorer, BigInsights
20
The 5 Key Big Data Use Cases
Big Data Exploration Find, visualize, understand all big data to improve decision making Enhanced 360o View of the Customer Extend existing customer views (MDM, CRM, etc) by incorporating additional internal and external information sources Security/Intelligence Extension Lower risk, detect fraud and monitor cyber security in real-time Operations Analysis Analyze a variety of machine data for improved business results Data Warehouse Augmentation Integrate big data and data warehouse capabilities to increase operational efficiency
21
Security/Intelligence Extension: Needs
Security/Intelligence Extension enhances traditional security solutions by analyzing all types and sources of under-leveraged data. Enhanced Intelligence & Surveillance Insight: analyze data-in-motion & at rest to find associations, uncover patterns and facts, and maintain currency of information. Real-time Cyber Attack Prediction & Mitigation: analyze network traffic to discover new threats early, detect known complex threats, and take action in real-time. Crime Prediction & Protection: analyze Telco & social data to gather criminal evidence, prevent criminal activities, and proactively apprehend criminals.
22
Asian Telco reduces billing costs and improves customer satisfaction.
Capabilities: Stream Computing, Analytic Accelerators Real-time mediation and analysis of billions of CDRs per day Data processing time reduced from 12 hours to 1 second Hardware cost reduced to 1/8th Proactively address issues (e.g. dropped calls) impacting customer satisfaction. We have been working with an Indian telco client for some time now to help reduce their billing costs and improve customer satisfaction. Challenge: Call Detail Record (CDR) processing within their data warehouse was sub-optimal. Could not achieve real-time billing, which required handling billions of CDRs per day and de-duplication against 15 days’ worth of CDR data. Unable to support future IT and business needs with real-time analytics. Solution: A single platform for mediation and real-time analytics reduces IT complexity. The PMML standard is used to import data mining models from InfoSphere Warehouse. Offloaded the CDR processing to InfoSphere Streams, resulting in enhanced data warehouse performance and improved TCO. Each incoming CDR is analyzed using these data mining models, allowing immediate detection of events (e.g. dropped calls) that might create customer satisfaction issues. Business Benefit: Data now processed at the speed of business, from 12 hours to 1 second. HW costs reduced to 1/8th. Support for future growth without the need to re-architect: more data, more analysis. Platform in place for real-time analytics to drive revenue.
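The de-duplication plus real-time flagging described above can be sketched as follows. A bounded in-memory window stands in for 15 days of CDR history, and the field names and dropped-call rule are hypothetical; the real solution runs this logic inside InfoSphere Streams with PMML-imported models.

```python
from collections import OrderedDict

class CdrProcessor:
    """Deduplicate call detail records against a bounded window of
    already-seen IDs and flag dropped calls as they stream in."""

    def __init__(self, window_size=1_000_000):
        self.seen = OrderedDict()       # CDR id -> None, in insertion order
        self.window_size = window_size

    def process(self, cdr):
        cid = cdr["id"]
        if cid in self.seen:
            return None                 # duplicate feed: discard
        self.seen[cid] = None
        if len(self.seen) > self.window_size:
            self.seen.popitem(last=False)   # evict the oldest id (FIFO)
        # Hypothetical dropped-call rule: zero duration, abnormal end.
        cdr["dropped"] = cdr["duration_s"] == 0 and cdr["status"] == "disconnected"
        return cdr

proc = CdrProcessor(window_size=3)
stream = [
    {"id": "c1", "duration_s": 42, "status": "completed"},
    {"id": "c1", "duration_s": 42, "status": "completed"},    # duplicate feed
    {"id": "c2", "duration_s": 0, "status": "disconnected"},  # dropped call
]
out = [r for c in stream if (r := proc.process(c)) is not None]
```

Each record is handled once, as it arrives, which is what turns the 12-hour batch into the 1-second path described above.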
23
Asian Government Agency
National Intelligence Platform.
Capabilities: Stream Computing.
Analyze all Internet traffic (social media, etc.) to track persons of interest (drug/sex traffickers, terrorists, illegal refugees/immigrants) and civil/border activity.
24
The 5 Key Big Data Use Cases
Big Data Exploration: find, visualize, and understand all big data to improve decision making.
Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources.
Security/Intelligence Extension: lower risk, detect fraud, and monitor cyber security in real time.
Operations Analysis: analyze a variety of machine data for improved business results.
Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency.
Our product management, engineering, and marketing teams, among others, have been working together to better understand the big data market. Through surveys, analyst studies, and over 300 in-person meetings with customers and prospects, we are confident that we have found the market "sweet spots" for big data. These five use cases are those sweet spots, and they will resonate with the majority of prospects you meet. In the coming slides we will cover each in detail: the need, the value, and a customer example.
25
Operations Analysis: Needs
Analyze a variety of machine data for improved business results.
Business challenges: complexity and rapid growth of machine data; only a small fraction of machine data is captured for decision making; inability to analyze machine data and combine it with enterprise data for a full-view analysis.
What is Operations Analysis? It is using big data technologies to enable a new generation of applications that analyze large volumes of multi-structured, often in-motion machine data and gain insight from it, which in turn improves business results.
What are the drivers for an Operations Analysis use case? In its raw format, businesses are unable to leverage machine data. It grows at exponential rates; it arrives in large volumes, in a variety of formats, often in motion; it must be combined with existing enterprise data; it requires complex analysis and correlation across different types of data sets; and it requires visualization capabilities specific to the data type and industry/application. Organizations want to leverage machine data to improve business results and decision making.
Benefits: gain real-time visibility into operations, customer experience, transactions, and behavior; proactively plan to increase operational efficiency; identify and investigate anomalies; monitor end-to-end infrastructure to proactively avoid service degradation or outages.
26
Operations Analysis: Value & Diagram
Diagram flow:
Machine data ingestion: push data batches to HDFS; validate metadata.
Data parsing and extraction: record splitting, field extraction, event standardization, event generalization, event enrichment; customizable and extendable extraction rules.
Indexing and search over raw logs and machine data: only store what is needed.
Statistical modeling and root cause analysis: Machine Data Accelerator.
Federated navigation and discovery; real-time analysis; data available for visualization via BigSheets.
Products involved: BigInsights, which comes with a new Machine Data Analytics Accelerator; Streams (optional), for analyzing data in motion; InfoSphere Data Explorer, for federated navigation and discovery.
Value: gain deep insights into operations, customer experience, transactions, and behavior; proactively plan to increase operational efficiency; visualize data from a variety of complex systems to ensure all data is used in decision making.
NET: Big ROI here for companies that adopt this. At the moment they may be making decisions based on only 1-10% of their available information, and they are potentially storing information that they do not need. Huge volumes of machine data, in many different formats, come into HDFS (BigInsights); data can also arrive via Streams. BigInsights, with the Machine Data Accelerator, performs deep analysis of all of these complex data sources. Machine data can then be correlated with other enterprise data (customer and product information, etc.). Combining IT and business data puts it in the hands of operational decision makers, increasing operational intelligence: they can visualize data across many systems to get the most informed view, and business decisions become more informed and can happen in a fraction of a second. Note that this is not a Tivoli play where we sell big data to IT so they can monitor their machines, hardware, applications, or networks; this is about leveraging the data generated by machines to make better decisions and improve business results.
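The parsing and standardization steps listed above can be sketched in a few lines. This is an illustrative stand-in, not the Machine Data Accelerator itself: the log line format, field names, and target event schema are all assumptions made up for the example.

```python
import re
from datetime import datetime, timezone

# Hypothetical log line format, e.g.:
#   2013-05-01 12:00:00 ERROR billing-svc - timeout talking to gateway
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<component>\S+) - (?P<message>.*)"
)

def split_records(raw):
    # Record splitting: in this simple format, one event per non-empty line.
    return [line for line in raw.splitlines() if line.strip()]

def extract_and_standardize(record):
    # Field extraction via regex, then standardization into a common
    # event schema so events from different sources can be correlated.
    m = LOG_PATTERN.match(record)
    if m is None:
        return {"event_type": "unparsed", "raw": record}
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    return {
        "event_type": "log",
        "timestamp": ts.replace(tzinfo=timezone.utc).isoformat(),
        "severity": m.group("level").lower(),
        "source": m.group("component"),
        "message": m.group("message"),
    }
```

In a real pipeline the generalization and enrichment steps would follow, e.g. mapping severities onto a shared scale and joining in asset metadata before the events are indexed.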
27
Operations Analysis: Customer Example (Cisco)
Capabilities: Hadoop & Stream Computing.
Intelligent Infrastructure Management: log analytics, energy bill forecasting, energy consumption optimization, anomalous energy usage detection, presence-aware energy management. Optimized building energy consumption with centralized monitoring; automated preventive and corrective maintenance.
Jervin: IBM helped Cisco build an intelligent infrastructure management solution to optimize centralized building energy consumption.
How do you know if Operations Analysis is right for your customer? Ask: Do you deal with large volumes of machine data (raw data generated by logs, sensors, smart meters, message queues, utility systems, facility systems, clickstream data, configuration files, database audit logs and tables)? Are you unable to perform the complex analysis, often in real time, needed to correlate across different data sets? Are you unable to search and access all of this machine data? Are you able to monitor data in real time and generate alerts? Do you lack the ability to visualize streaming data and react to it in real time? Are you unable to perform root cause analysis using that data? Do you want the ability to correlate KPIs to events?
Cisco is a client that is leveraging multiple big data capabilities to develop an intelligent infrastructure management solution.
Background: Using its intelligent networking capabilities, Cisco launched a Smart+Connected Communities (S+CC) initiative to weave together people, services, community assets, and information into a single pervasive solution. There are two initial use cases out of a total of 15 planned solutions: (1) Intelligent Infrastructure Management Service (IIMS), an S+CC service that enables centralized monitoring and control of building systems through an integrated user interface while providing real-time usage information to optimize building energy resource consumption; (2) Intelligent Maintenance Management Service (IMMS), an S+CC service that automates preventive and corrective maintenance of building systems and extends equipment lifetime while reducing overall maintenance cost. The applications leveraged in these use cases include log analytics, energy bill forecasting, energy consumption optimization, detection of anomalous energy usage, presence-aware energy management, and policy management/enforcement.
Challenge: Before engaging IBM, Cisco used an internally developed web-based reporting structure, which included statistical information, to measure the effectiveness of these solutions. However, it could not use the information generated in the context of the solutions for in-depth analysis. The effective use of such information, along with relevant external information, required advanced information management and analytics tools and capabilities.
Solution: IBM stream computing software (Streams) allows user-developed applications to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. The IBM Hadoop system (BigInsights) efficiently manages and analyzes big data, digests unstructured data, and builds environmental and location data. IBM business analytics generates solution-relevant dashboards and reports to explore data in any combination and over any time period.
Benefits: A robust service delivery platform (SDP) capable of delivering improved solutions to its S+CC environment, thereby increasing operating efficiency and enhancing its service levels. Cisco significantly reduced costs, increased its revenues, and improved its competitive position.
28
The 5 Key Big Data Use Cases
Big Data Exploration: find, visualize, and understand all big data to improve decision making.
Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources.
Security/Intelligence Extension: lower risk, detect fraud, and monitor cyber security in real time.
Operations Analysis: analyze a variety of machine data for improved business results.
Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency.
© 2013 IBM Corporation
29
Data Warehouse Augmentation: Needs
Integrate big data and data warehouse capabilities to increase operational efficiency.
Jervin: Customers would like to leverage a variety of data (big data) alongside the existing data warehouse.
What is data warehouse augmentation, and what are the drivers? Data warehouse augmentation builds on an existing data warehouse infrastructure, leveraging big data technologies to "augment" its value. There are two main drivers:
1. The need to leverage a variety of data: structured, unstructured, and streaming sources are required for deep analysis; latency requirements are low (hours, not weeks or months); query access to the data is required.
2. The need to optimize the warehouse infrastructure: warehouse data volumes are reaching big data levels; a large portion of warehouse data is not accessed frequently; the warehouse investment needs to be optimized. (Note: this is not to imply that our warehousing solutions are expensive, but augmenting with big data technologies makes the warehouse a more optimal investment, since you no longer attempt to store and analyze everything within the warehouse, which can strain it from a performance and cost perspective.)
Benefits: optimized storage, maintenance, and licensing costs by migrating rarely used data to Hadoop; reduced storage costs through smart processing of streaming data; improved warehouse performance by determining what data to feed into it.
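The "migrate rarely used data to Hadoop" driver above amounts to a cold-data triage. Here is a minimal sketch, assuming table metadata with a last-accessed date and a 90-day threshold; both the threshold and the metadata shape are invented for illustration, and a real system would read this from catalog statistics.

```python
from datetime import date, timedelta

COLD_AFTER = timedelta(days=90)  # assumed threshold for "rarely used"

def plan_offload(tables, today):
    """Split tables into those kept in the warehouse and those proposed
    for offload to a Hadoop landing zone, by last access date."""
    keep, offload = [], []
    for t in tables:
        if today - t["last_accessed"] > COLD_AFTER:
            offload.append(t["name"])
        else:
            keep.append(t["name"])
    return keep, offload

# Example: one hot table stays, one cold table is proposed for offload.
tables = [
    {"name": "sales_current", "last_accessed": date(2013, 5, 30)},
    {"name": "sales_archive", "last_accessed": date(2012, 1, 15)},
]
keep, offload = plan_offload(tables, today=date(2013, 6, 1))
```

The point of the augmentation pattern is that offloaded data remains queryable in the Hadoop landing zone rather than being thrown away.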
30
Data Warehouse Augmentation: Customer Example
A large automotive company (blinded example) used BigInsights for data warehouse augmentation, primarily as a pre-processing hub but also for ad hoc analysis. A Hadoop-based landing zone is used to store, manage, and analyze structured, semi-structured, and multi-structured data before moving it to the warehouse.
Benefits: data warehouse optimized for workload and performance.
Products used: InfoSphere BigInsights, InfoSphere DataStage.
Key questions to ask — how do you know if data warehouse augmentation is right for your customer? Are you drowning in very large data sets (TBs to PBs) that are difficult and costly to store? Are you able to utilize and store new data types? Are you facing rising maintenance/licensing costs? Do you use your warehouse environment as a repository for all data? Do you have a lot of cold, or low-touch, data driving up costs or slowing performance? Do you have to throw data away because you are unable to store or process it? Do you want to perform analysis of data in motion to determine, in real time, what should be stored in the warehouse? Do you want to perform data exploration on complex and large amounts of data? Do you want to analyze non-operational data? Are you using your data for traditional and new types of analytics?
Product starting point: BigInsights, Streams.
31
Myth About Big Data.. What Is It ? Case Studies on Big Data
Agenda
The Myth About Big Data: What Is It?
Case Studies on Big Data
How To Start With A Big Data Project?
Q & A
32
IBM Big Data Strategy: Move the ANALYTICS Closer to the Data
"IBM has the deepest Hadoop platform and application portfolio. IBM, an established EDW vendor, has its own Hadoop distribution; an extensive professional services force working on Hadoop projects; extensive R&D programs developing Hadoop technologies; connections to Hadoop from its EDW." — The Forrester Wave™: 1Q12
"IBM InfoSphere BigInsights is a core capability of the most comprehensive Big Data analytics platforms out there right now…" — Krishna Roy
Analytic applications: BI/reporting, exploration/visualization, functional apps, industry apps, predictive analytics, content analytics.
IBM Big Data Platform: visualization & discovery; application development; systems management; accelerators; Hadoop system (BigInsights certified Apache Hadoop); stream computing; data warehouse; information integration & governance.
New analytic applications drive the requirements for a big data platform: integrate and manage the full variety, velocity, and volume of data; apply advanced analytics to information in its native form; visualize all available data for ad hoc analysis (even in motion!); provide a development environment for building new analytic applications; workload optimization and scheduling; security and governance.
Key points:
Integrate: the point is to have one platform to manage all of the data; there is no point in having separate silos of data, each creating separate silos of insight. From the customer (solution) point of view, big data has to be bigger than just one technology.
Analyze: a very important point. We see big data as a viable place to analyze and store data; the new technology is not just a pre-processor to get data into a structured warehouse for analysis. This is a significant area of value added by IBM, and the game has changed: unlike databases/SQL, the market is asking who gets the better answer, so the sophistication and accuracy of the analytics matter.
Visualization: bring big data to the users; the spreadsheet metaphor is key to doing so.
Development: sophisticated development tools for, and across, the engines enable the market to develop analytic applications.
Workload optimization: improvements upon open source for efficient processing and storage.
Security and governance: many are rushing into big data like the wild west, but there is sensitive data that needs to be protected and retention policies that need to be determined; all of the governance maturity of the structured world can benefit the big data world. And grow and evolve on your current IT infrastructure.
33
Four Entry Points of Big Data
Four entry points: Unlock Big Data; Preprocess Raw Data; Analyze Streaming Data; Simplify Your Warehouse.
IBM Big Data Platform layers: analytic applications (BI/reporting, exploration/visualization, functional apps, industry apps, predictive analytics, content analytics); visualization & discovery; application development; systems management; accelerators; Hadoop system; stream computing; data warehouse; information integration & governance.
34
Thank you
Thank you for your time today, and we hope that you'll join us for further discussion during lunch.