1 Applying Cognitive Computing to Understand and Answer Users' Information Needs
2.5 quintillion bytes of new data are created daily. Expertise matters more today than ever, but even top experts can't keep up.

2 Big Data Challenges
Professionals across industries struggle daily to get the information and insights they need, when they need them.
Challenges / Watson:
Information Access: data, applications, and services are distributed on-premise and in the cloud, and employees struggle to get a complete view; too many silos; inability to combine data from multiple sources; lack of insight due to reliance on a single source of information. Watson: Explore, a unified view of information from ALL sources to enable new insights and better decisions.
Unstructured Content: 80% of data is unstructured, but only a small percentage is leveraged for insights. Watson: Analyze, delivering insights from unstructured content.
Scaling Expertise: pressure to increase performance and innovation while doing more with less. Watson: Interpret, applying cognitive computing to scale human expertise.
* In today's data-driven world, professionals at all levels of management and across industries struggle to efficiently access and leverage the data they need, not because there is a lack of data. The challenge is in finding the RIGHT information at the time and place it's needed. By information, we mean all kinds, whether raw data, analytics, documents, or any of the forms that information can take today. As data in our organizations continues to grow in volume, variety, veracity, and velocity, so too does the challenge of leveraging that data in our day-to-day activities. Typical remarks from individuals facing these challenges: "I can't unlock the value in my data to drive economic value to my business." "I don't know what I don't know: where is my business exposed?"
* Data is locked up in different silos and there is no easy way to combine data from multiple sources, so we end up relying on a single source, which may not be the best one. Data can be locked up in generic databases, websites, Customer Relationship Management (CRM) systems, Product Life Cycle (PLC) systems, Content Management Systems (CMS), and so on. Information related to a specific entity, say a customer account, may be available in many of these sources. How do we make sure we have all the relevant information about that customer?
* This results not only in lower productivity for professionals, but also in lower-quality decisions and service. It costs organizations money because, depending on their roles, people spend significant amounts of time just looking for information by logging into many different systems and searching each separately. But the challenge goes deeper: when the right information is scarce, employees, whether senior management or front-line staff, have to proceed with what they have. That means customers don't get the best answers from your contact center, R&D staff end up "re-inventing the wheel," marketing plans are made without a full understanding of the target markets and customers, and so forth, right up through senior management. Let's focus now on the specific challenges we see in organizations confronted with today's complex hybrid landscape of information and applications, and on how Watson Explorer can address them.
* Today's knowledge workers struggle to get the information they need, when they need it, in the relevant context, efficiently. Watson Explorer can unify information access from all sources in the enterprise and deliver it in context. If you're like most organizations, it's a constant struggle to deliver the information and insights that your front-line employees need for top performance. This was true even before the proliferation of data and hybrid environments, and it's even more true today. The first challenge we see is one of information access. It results both from the sheer volume and variety of data in your organization and from the varied ways in which it is stored and managed. If you're like most organizations, you have hundreds of terabytes of data spread across on-premise, cloud-based, and external systems. To address this, Watson Explorer provides the ability to explore across many different systems, on-premise and off, and to deliver the right information to front-line staff at the "point of impact" and in the proper context. Done right, this means it doesn't matter where the data and analytic services reside.
* 80% of organizational data is unstructured (free text, documents, web pages, images, etc.) and only a small percentage of it is utilized; Watson Explorer can analyze this data and surface patterns and insights. Most organizations find that most of their data, 80% according to industry analysts, is "unstructured": documents, web pages, and other forms of human language, mixed in with images and other media. As we'll see in a few minutes, a tremendous amount of insight could be gained from this data if you only had a scalable way to analyze it all. To address this, Watson Explorer provides advanced content analytics capabilities.
* Expertise matters more today than ever, but it is scarce and difficult to scale; Watson Explorer's cognitive computing can help. Finally, you face the challenge of scaling the expertise in your organization. How many times have you said to yourself that you'd like to "clone" one of your experts or top performers? We're not promising anything along those lines, but much of the work that humans do to consume and process information, form hypotheses, and make judgments can be automated to a degree. Here I'm talking about the field of cognitive computing. The Watson Group has taken the key capabilities of the Watson Platform and made them accessible in the cloud, and Watson Explorer provides an integration layer that can tap into those cognitive capabilities and leverage them immediately. Let's examine each of these challenges individually and understand how Watson Explorer can help.

3 Why Are Cognitive Systems Important to Modern Life?
Digital information growth is exponential
Understand the intention of human expression
Act as a research assistant to pinpoint relevant information
Amplify human abilities to scale the democratization of expertise
Reduce uncertainty in decision-making

4 Extract Implied/Tacit Knowledge From Unstructured Data
UIMA provides the core contextual analytics through a variety of annotators; a minimal annotator sketch follows this list.
Text Analytics (NLP): a set of linguistic, statistical, and machine learning techniques that allow text to be analyzed and key information extracted for business integration
Content Analytics (Text Analytics + Mining): the ability to visually identify and explore trends, patterns, sentiment, and statistically relevant facts found in various types of content
Text Summary: encoder-decoder recurrent neural network
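To make the annotator idea concrete, here is a minimal sketch in Python. It is illustrative only: UIMA itself is a Java framework, and the annotator functions and Annotation type below are hypothetical stand-ins for UIMA's typed annotations over a shared document.

```python
import re
from dataclasses import dataclass

@dataclass
class Annotation:
    type: str     # e.g. "Date", "Money"
    begin: int    # character offset where the span starts
    end: int      # character offset where the span ends
    text: str     # covered text

def date_annotator(text):
    """Toy annotator: mark ISO-style dates (YYYY-MM-DD)."""
    return [Annotation("Date", m.start(), m.end(), m.group())
            for m in re.finditer(r"\b\d{4}-\d{2}-\d{2}\b", text)]

def money_annotator(text):
    """Toy annotator: mark dollar amounts like $1,200.50."""
    return [Annotation("Money", m.start(), m.end(), m.group())
            for m in re.finditer(r"\$\d[\d,]*(?:\.\d+)?", text)]

def run_pipeline(text, annotators):
    """Run each annotator over the shared text, pooling all annotations."""
    annotations = []
    for annotate in annotators:
        annotations.extend(annotate(text))
    return sorted(annotations, key=lambda a: a.begin)

if __name__ == "__main__":
    doc = "Invoice dated 2015-06-01 for $1,200.50 was approved."
    for a in run_pipeline(doc, [date_annotator, money_annotator]):
        print(a)
```

Each annotator contributes typed spans independently, which is what lets a pipeline mix linguistic, statistical, and machine-learning components over the same text.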

5 Hybrid Cloud Cognitive Search Engines
A Cognitive Exploration solution that leverages Big Data across diverse subject domains.
Compound documents are a feature for treating a set of documents efficiently: each document (document body or attachment file) is indexed separately but treated as if they were all one single document, as sketched below.
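A minimal sketch of the compound-document idea, assuming a hypothetical toy index layout rather than the product's actual index format: child units are matched individually, but hits collapse to the parent so the set behaves as one document.

```python
# Each indexed unit records the compound (parent) document it belongs to.
index = [
    {"id": "doc1#body", "parent": "doc1", "text": "quarterly sales report"},
    {"id": "doc1#att1", "parent": "doc1", "text": "regional sales figures"},
    {"id": "doc2#body", "parent": "doc2", "text": "engineering design notes"},
]

def search(query):
    """Match child units individually, then collapse hits to the parent."""
    terms = set(query.lower().split())
    hits = {}
    for unit in index:
        score = len(terms & set(unit["text"].split()))
        if score:
            # Keep the best child score per compound document.
            hits[unit["parent"]] = max(hits.get(unit["parent"], 0), score)
    return sorted(hits.items(), key=lambda kv: -kv[1])

print(search("sales figures"))  # [('doc1', 2)]: both children count as one document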

6 Natural Language Querying (NLQ)/Domain Adaptive Search
Uses techniques from natural language processing and artificial intelligence. (Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages, and in particular with programming computers to fruitfully process large natural language corpora.)
Offers a search tuning framework and pre-defined functional components for well-tuned, context-aware search results.
Custom query rewrite/suggestion annotators integrate at query processing for query modification and intelligent query suggestions.
Knowledge extraction annotators integrate at document ingestion to extract concepts and enrich the index (this can involve an external repository such as a triple store).
Includes essential functional components useful for context-aware search, such as POS-based tuning capabilities and phrasal processing based on POS patterns.
Example (sketched in code below): two documents are ingested; "Bring a compass and a ruler" is indexed with the concept Stationery, and "Small climbing compass" is indexed with the concept Navigational Instrument. For the query "What is a tool to draw circles?", the engine looks for the object being asked about, appends "Stationery" and "compass" with OR, and boosts nouns higher; the first document matches while the second does not.
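A minimal sketch of this flow, assuming a hand-rolled concept dictionary and a single query-rewrite rule; in the product these would come from knowledge extraction annotators and linguistic resources, not hard-coded tables. Here concept matches outrank plain term matches, so the stationery document ranks first; a stricter variant could require a concept match and exclude the climbing compass entirely.

```python
CONCEPTS = {  # surface phrase -> concept attached at ingestion (toy table)
    "compass and a ruler": "Stationery",
    "climbing compass": "Navigational Instrument",
}

def ingest(doc_id, text):
    """Index terms plus any concepts the knowledge-extraction step found."""
    concepts = {c for phrase, c in CONCEPTS.items() if phrase in text.lower()}
    return {"id": doc_id, "terms": set(text.lower().split()), "concepts": concepts}

def rewrite_query(question):
    """Toy query rewrite: expand 'tool to draw circles' with OR'd terms."""
    expansions = set()
    if "draw circles" in question.lower():
        expansions.add("compass")      # the object the question asks for
        expansions.add("Stationery")   # concept-level expansion
    return expansions

def search(index, question):
    expansions = rewrite_query(question)
    results = []
    for doc in index:
        # Concept hits are weighted above plain term hits (noun boosting analogue).
        score = 2 * len(expansions & doc["concepts"]) + len(expansions & doc["terms"])
        if score:
            results.append((doc["id"], score))
    return sorted(results, key=lambda kv: -kv[1])

index = [ingest("d1", "Bring a compass and a ruler"),
         ingest("d2", "Small climbing compass")]
print(search(index, "What is a tool to draw circles?"))  # d1 ranks above d2
```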

7 Social Search and Analytics
Social collaboration is becoming more prevalent within the enterprise.
The Social Networks and Discovery (SaND) framework supports social entities and activities. People come into play in various business activities: creating, tagging, commenting, and forming relationships. SaND discovers and analyzes relationships between content and people.
With ordinary search, you search keywords in social content such as wikis and blogs. With WCA V3.5 (SaND), results are ranked by peers' actions on documents, documents more relevant to you are recommended, and the people interested in a document are shown, so you can see who else has used that data and what they've done with it, then contact and collaborate through posts, comments, and tags.
Social search in Watson™ Explorer Content Analytics is based on Social Networks and Discovery (SaND). SaND is a social analytics and discovery framework that aggregates social information from various sources across the enterprise and extracts relationships between documents, people, and tags. To support basic aspects of social search, some crawlers can create index fields from basic social elements in the source data, such as who rated a document or who added a comment. The FileNet® P8 crawler, the Seed list crawler for IBM® Connections, and the SharePoint crawler can use SaND technology to extract relationships from social elements.
In social search, search is expanded to provide information about entities that might be related to each other through various types of relationships, such as users who both comment on a document or users who report to the same manager. Entities are independently searchable, but the results show related documents, related people, related tags, and so on. A document's ratings, comments, and tags can reveal the popularity of the document and add valuable content to the original document content.
Associations express how a person is associated with documents in the result set and how a person is related to other people. For example, two users might be authors of a document, several users might have commented on the document, and other users might have added tags to the document. All of these entities are associated, but an author might be ranked as more relevant than a commenter who, in turn, is ranked as more relevant than a tagger. Associations are the basic elements for defining relationships and calculating the most related recommendations and experts. When processing social entities, the crawlers and index processes attempt to apply "best match" associations according to a defined schema. In some cases, the indexed associations might not reflect the actual meaning of the original data in the data source. You can explore associations only when you search a single collection at a time (collection federation is not supported). The number of relationships per document is limited.
Relationships: a relationship represents how two people relate. The relationship type is based on the associations stored for each person and the document type. For example, if two users are associated with a document as contributors, and the document type is a document, then the co-contributor relationship is generated and stored in the index (see the sketch below). When you enter a query for a person, the list of related people in the results lets you explore the relationships. For example, a person might be an author and also a co-commenter or co-tagger with other people.
Identify a focus group and analyze the documents associated with the group; manage content based on the group.
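A minimal sketch of deriving co-* relationships from person-document associations like those described above; the association triples and the derivation rule are hypothetical simplifications of SaND's schema.

```python
from itertools import combinations
from collections import defaultdict

# (person, document, association) triples as a crawler might extract them.
associations = [
    ("alice", "doc1", "author"),
    ("bob",   "doc1", "author"),
    ("carol", "doc1", "commenter"),
    ("bob",   "doc2", "tagger"),
    ("carol", "doc2", "tagger"),
]

def derive_relationships(assocs):
    """Group people by (document, association) and emit co-* relationships."""
    by_doc_assoc = defaultdict(set)
    for person, doc, assoc in assocs:
        by_doc_assoc[(doc, assoc)].add(person)
    relationships = defaultdict(set)
    for (doc, assoc), people in by_doc_assoc.items():
        for a, b in combinations(sorted(people), 2):
            relationships[(a, b)].add(f"co-{assoc}")
    return dict(relationships)

for pair, kinds in derive_relationships(associations).items():
    print(pair, sorted(kinds))
# ('alice', 'bob') ['co-author']
# ('bob', 'carol') ['co-tagger']
```

In a real index these derived relationships would be stored alongside the documents, so a query for a person can surface related people ranked by the strength of their associations.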

8 Cognitive Search Engine Demo
My Home Page (70 sec)
This is my Watson Explorer home page, where I have personalized information about my Garnet product line of ICD devices and leads [Point to My Products widget]. With this consolidated view, I am able to track key performance indicators such as sales to forecast [Point to Sales widget], production against plan [Point to Production widget], in-flight marketing campaigns, and so forth [Point to Marketing widget], using data from existing systems in my organization such as Salesforce, SharePoint, Master Data Management systems, and various production and inventory databases.
[Point to Social Media widget] I can get a quick temperature check of perceptions and sentiment from social media.
[Point to News widget] I can also scan news articles on industry and competitor products right in my home page, using cognitive services available from the Watson Developer Cloud platform. Watson Explorer is the fastest, most direct way to bring cognitive capabilities to your organization. Without Watson's AlchemyNews service, I would have to look through subscriptions and competitor press releases for this information.
[Point to FDA Reports] Let's look at the FDA reports. Watson Explorer is able to pull reports filed with the FDA's public medical device adverse events reporting database. I have been following an increasing trend of reports related to my product portfolio and click through the widget to drill into the problem.
[Point to Production Health widget] [Point to Marketing Campaigns widget]

9 What’s Watson Watson combines transformational capabilities to deliver a new world experience Watson understands me. Watson engages me. Watson learns and improves over time. Watson helps me discover. Watson establishes trust. Watson has endless capacity for insight. Watson operates in a timely fashion. 1 Understands natural language and human style communication 2 Generates and evaluates evidence-based hypothesis Introduction – Today, I’m here to discuss how IBM trained Watson’s to understand cyber security domain. Watson’s cognitive technologies enable Watson to think similar to human. Watson uses three powerful characteristics - natural language, hypothesis generation, and evidence based learning. It combines these technologies and applies massive parallel probabilistic processing techniques to fundamentally change the way organization look at quickly solving problems Looking at these one by one, understanding natural language and the way we speak breaks down the communication barrier that has stood in the way between people and their machines for so long. Hypothesis generation bypasses the historic deterministic way that computers function and recognizes that there are various probabilities of various outcomes rather than a single definitive ‘right’ response. And adaptation and learning helps Watson continuously improve in the same way that humans learn….it keeps track of which of its selections were selected by users and which responses got positive feedback thus improving future response generation These cognitive technologies enable Watson to extract implied and tacit knowledge from unstructured cyber data and stores it in a knowledge corpus. When cyber incidents are detected by a SIEM tool like QRadar, that information is passed to Watson. Watson is able to correlate cyber incident data with cyber knowledge. This enables Watson provide real time threat intelligence to support SoC operations and improve efficiency in incident response. 3 Adapts and learns from training, interaction, and outcomes 8 8

10 Why Watson Is A Differentiator
Changing the world from deterministic decision making to probabilistic computing: Watson is able to understand and reason from human interactions and from evaluating data.
Today Watson is in 17 industries across 6 continents.
In 2011, Watson beat former Jeopardy! winners Brad Rutter and Ken Jennings.
Watson is trained to perform cyber intelligence and incident response.
Combining Watson with robotics lets businesses reach customers on a more personal level.
Watson Beat helped write the song "Not Easy" for Grammy-winning producer Alex Da Kid.
MSKCC is using Watson to help oncology physicians battle cancer.
Watson is changing the world, from deterministic decision making to probabilistic computing. Watson is able to understand and reason based on human interactions and on evaluating data. Many IBM Watson solutions focus on questions where the answer is known, but with cyber intelligence, IBM needed Watson to extract tacit/implied knowledge from unstructured cyber data and correlate it with cyber forensic data to explore questions where the answer is unknown.
Memorial Sloan Kettering Cancer Center (MSKCC) is using Watson to help oncology physicians battle cancer. Traditionally, oncology physicians diagnose cancer using a patient's chart, x-rays, laboratory data, and a few medical books; they might then recommend either general radiation therapy or one of three types of chemotherapy. Today, oncology physicians face a perpetually growing sea of data in their efforts to deal effectively with every aspect of their patients' care. The associated medical information doubles every five years: thousands of books and articles, electronic patient and family medical records, over 800 different cancer therapies, sequencing of 340 cancer tumors (each with multiple mutations), analysis of 20,000 genes, correspondence with over 1,000 physicians, and the exponential rise in medical publications. Traditional processes for cancer prognosis and the recommendation of therapies are no longer able to harness all of the available data effectively. Keeping up with the medical literature could take up to 160 hours per week, an unrealistic option. Hence physicians are turning to Watson to develop precision-based medicine in cancer.
This year, 2 million women will be diagnosed with breast cancer worldwide. More than 800 therapies are available for treatment. To keep up, it would take a physician 160 hours of reading per week. Over 15,000 hours of training by MSK went into Watson, which ingested more than 600,000 pieces of medical evidence, 2 million pages of text, 1.5 million patient records, and 26,000 clinical cases.
Grammy award-winning music producer Alex Da Kid used Watson Beat to create the song "Not Easy". Watson Beat is an IBM cognitive technology that understands music and lets artists change the sound of a song based on the mood they want to express.
IBM is currently testing robotics technology with certain companies in hospitality and consumer packaged goods. By combining the software with Pepper, businesses can reach customers on a more personal level.
Sesame Street and IBM (NYSE: IBM) today announced a collaboration to use IBM Watson's cognitive computing technology and Sesame's early childhood expertise to help advance preschool education around the world. Sesame and IBM are poised to lead early childhood education into the future with Watson's cognitive computing capabilities: data mining, pattern recognition, and natural language processing. These capabilities will create the one-of-a-kind, highly personalized learning experiences our children need by tapping into their personal learning preferences, including abilities, likes, dislikes, and engagement levels, bringing each child his or her own "private tutor" or "virtual learning assistant."

11 The Architecture Underlying Watson - DeepQA
Generates many hypotheses, collects a wide range of evidence, and balances the combined confidences of over 100 different analytics that analyze the evidence from different dimensions.
(Diagram: Question → Question & Topic Analysis → Decomposition → Primary Search over answer sources → Candidate Answer Generation (hypotheses) → Evidence Retrieval and Deep Evidence Scoring over evidence sources → Hypothesis and Evidence Scoring with answer scoring models → Synthesis → Final Confidence Merging & Ranking, where learned models help combine and weigh the evidence → Answer & Confidence.)
The DeepQA architecture is what enables Watson to emulate human thinking. It performs these actions:
1. When a question is first presented to Watson, it parses the question to extract its major features.
2. It generates a set of hypotheses by looking across the corpus for passages that have some potential for containing a valuable response.
3. It performs a deep comparison of the language of the question and the language of each potential response by using various reasoning algorithms. Watson uses hundreds of reasoning algorithms, each of which does a different comparison. For example, some look at the matching of terms and synonyms, some look at temporal and spatial features, and some look at relevant sources of contextual information.
4. Each reasoning algorithm produces one or more scores, indicating the extent to which the potential response is inferred by the question based on that algorithm's specific area of focus.
5. Each resulting score is then weighted against a statistical model that captures how well that algorithm did at establishing the inferences between two similar passages for that question domain during Watson's training period. That statistical model can then be used to summarize the level of confidence that Watson has in the evidence that the candidate answer is inferred by the question.
6. Watson repeats this process for each candidate answer until it finds responses that surface as stronger candidates than the others.
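To ground steps 4 and 5, here is a minimal sketch of weighted answer scoring, assuming two toy scorers and hand-set weights; DeepQA uses hundreds of scorers and learns its weights from training data.

```python
def term_match_scorer(question, candidate):
    """Fraction of question terms that appear in the candidate's evidence."""
    q = set(question.lower().split())
    return len(q & set(candidate["evidence"].lower().split())) / len(q)

def type_scorer(question, candidate):
    """Crude answer-type check: 'who' questions should yield a person."""
    wants_person = question.lower().startswith("who")
    return 1.0 if wants_person == (candidate["type"] == "person") else 0.0

SCORERS = [term_match_scorer, type_scorer]
WEIGHTS = [0.6, 0.4]   # in DeepQA these weights are learned, not hand-set

def rank(question, candidates):
    """Merge per-scorer scores into one confidence and rank the answers."""
    ranked = []
    for cand in candidates:
        scores = [s(question, cand) for s in SCORERS]
        confidence = sum(w * s for w, s in zip(WEIGHTS, scores))
        ranked.append((cand["answer"], round(confidence, 3)))
    return sorted(ranked, key=lambda kv: -kv[1])

candidates = [
    {"answer": "Ada Lovelace", "type": "person",
     "evidence": "Ada Lovelace wrote the first published computer program"},
    {"answer": "the Analytical Engine", "type": "thing",
     "evidence": "the Analytical Engine was a proposed mechanical computer"},
]
print(rank("who wrote the first computer program", candidates))
```

Each scorer judges one dimension of the evidence; the weighted merge is what produces a single answer with an attached confidence.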

12 Is there any literature to support this recommendation?
What are the key elements of my patient's health record that support this diagnosis?
What are the potential negative side effects?
What's the best treatment option for my patient?
Am I missing any key information?
Make smarter decisions faster. What if you could make more timely, accurate, and evidence-based decisions? Consider this: 1 in 5 diagnoses is wrong or incomplete, and less than 50% of medical decisions are evidence-based. Almost one quarter of CEOs say their organizations operate below par in terms of driving value from data. In the medical field, for instance, it's a challenge for oncologists to keep up with rapidly evolving standards of care, much less stay abreast of the latest research, therapies, and clinical trials. But when it comes to oncology, innovation starts with individualization: how can you design the treatment plan that's most effective for each patient?
We've collaborated with Memorial Sloan Kettering Cancer Center to train Watson as a powerful assistant and decision advisor for medical oncologists. Watson for Oncology encapsulates the clinical expertise of MSK, delves into the latest medical literature, and extracts key information from each patient's health record to offer tailored, evidence-based treatment options at the point of care, with supporting evidence at the clinician's fingertips, helping oncologists achieve better outcomes.
MSK Demo:

13 Thank You IBM US Federal Lee Angelelli ( langelel@us.ibm.com )
Justin Fessler

14 Where can you go from here?
We can direct potential candidates in the audience to sources of information and next steps. Take the next step and apply to the Ecosystem:

15 Simplify Big Data Analytics and Cognitive Development
Bluemix is an open-standards, cloud-based innovation platform for building, managing, and running applications of all types (web, mobile, big data, smart devices, and more).

16 IBM Watson on Bluemix
Watson cognitive services provide developers easy access to cognitive building blocks. Give applications the full power of cognitive using Watson's building blocks for speech, language, vision, and data insights.
Answer your customers' most pressing questions
Quickly extract key information from all documents
Reveal insights, patterns, and relationships across data
Make your apps Read, Hear, Talk, See & Learn
Rapid innovation in cognitive solutions
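As an illustration of the developer experience, here is a minimal sketch of calling a Bluemix-hosted cognitive service over REST. The service URL, credentials, request body, and response shape are hypothetical placeholders; each real Watson service defines its own endpoints and payloads, documented on the Watson Developer Cloud.

```python
import requests

SERVICE_URL = "https://example.bluemix.invalid/v1/analyze"  # placeholder URL
USERNAME, PASSWORD = "service-username", "service-password"  # placeholder credentials

def analyze_text(text):
    """POST text to the (hypothetical) service and return the parsed JSON."""
    resp = requests.post(
        SERVICE_URL,
        auth=(USERNAME, PASSWORD),  # Bluemix services issued per-instance credentials
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    result = analyze_text("IBM Watson on Bluemix gives apps language and vision skills.")
    print(result)
```

The pattern is the same across the catalog: bind a service to get credentials, then call its REST endpoint from any language.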

17 Overall Picture of Relevancy Improvement
Documents flow through crawl/parse, analyze, and parse/index stages: source documents are parsed into annotated text, and a knowledge extraction annotator (a plug-in point) extracts concepts and categories and inserts them into the index or into custom linguistic resources such as dictionaries, a TripleStore, or additional external linguistic resources, feeding the Lucene index and its facets and adding information like location and preference.
On the query side, the query text (with user preferences) passes through a query rewrite/suggestion annotator, which produces query interpretations with concepts, refines queries by the extracted concepts, and offers intelligent query suggestions for interactive query refinement. The search submit/expansion stage runs the search, and a layering/post-rank stage applies ranking filters with custom rankers to construct layered results with custom ranks that fit the query context.
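A minimal sketch of the layering/post-rank stage, assuming toy base results and two hypothetical context-driven rankers (the ranker names and boost values are illustrative only):

```python
base_results = [  # (doc_id, base_score, metadata) from the search engine
    ("d1", 0.90, {"location": "US", "category": "manual"}),
    ("d2", 0.85, {"location": "JP", "category": "faq"}),
    ("d3", 0.80, {"location": "JP", "category": "manual"}),
]

def location_ranker(context):
    """Boost documents matching the user's location preference."""
    return lambda doc: 0.1 if doc[2]["location"] == context.get("location") else 0.0

def category_ranker(context):
    """Boost the category inferred from query interpretation."""
    return lambda doc: 0.05 if doc[2]["category"] == context.get("category") else 0.0

def post_rank(results, context, ranker_factories):
    """Re-score base results with context-aware rankers, then re-sort."""
    rankers = [f(context) for f in ranker_factories]
    rescored = [(doc[0], doc[1] + sum(r(doc) for r in rankers)) for doc in results]
    return sorted(rescored, key=lambda kv: -kv[1])

context = {"location": "JP", "category": "faq"}  # from user preference + query rewrite
print(post_rank(base_results, context, [location_ranker, category_ranker]))
# d2 overtakes d1 once the location and category boosts are applied
```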

18 Our experience with data integration, analytics, and cognitive computing is leading to an emerging model for contextual systems that can be deployed across the ecosystem, from mobile to cloud.
(Diagram: information flows from data to knowledge, context, intelligence, and decisions & actions through four stages, Gather, Connect, Reason, and Adapt, over SQL, NoSQL, and graph DB stores connected by an ESB, with feedback and learning between the stages.)
Gather: collect all relevant data from a variety of sources; keep everything you can as long as you can.
Connect: dynamically extract features and create metadata from diverse data sources to continually build and update context.
Reason: analyze data in context to uncover hidden information and find new relationships. Analytics both add to context via metadata extraction and use context to broaden information exploitation.
Adapt: compose recommended interactions and use context to deliver them to the point of action. Learn from user behavior and interaction patterns to improve context.
IBM's experience over the last decade with data integration (InfoSphere, EDW, MDM), analytics (BigInsights, Streams, SPSS, Cognos), and cognitive computing (Watson) has given us a unique perspective on combining these technologies to create the contextual enterprise. There are four activities that are key to success in developing contextual computing applications (a toy end-to-end sketch follows below).
Gather: collecting all potentially relevant data from a high variety of sources, and keeping as much of that data as you can for as long as you can. The reasons for this are twofold: first, the greater the variety of potentially relevant data, the higher the probability of discovering high-value context; second, downstream insights, reasoning, and both user and application context can create new perspectives on existing data, driving new questions and analytics across multiple data stores. Note that there will be multiple stores of various kinds (relational, NoSQL, RDF, etc.) to manage the data at this stage, depending on the data types and the organization required for the downstream analytics and context accumulation pipeline. A key differentiator in this phase will be the ease of gathering data from multiple sources, using technologies like Research's Virtual Information Exchange.
Connect: using technologies such as InfoSphere Streams for high-volume ingestion analytics, G2 for real-time sense making and context accumulation, UIMA for unstructured data analytics, and MIDAS on Hadoop/BigInsights for deep contextual analytics, connect the dots across all data sources to create an operational context model of the enterprise. Since high volumes of continual input data are expected in the contextual enterprise, continual curation and analysis of the contextual model will be required. While the source data itself may remain in traditional databases, the contextual relationships that connect that data will need to be managed in special ways due to their scale, access patterns, and dynamic graph structure. New database systems will be required.
Reason: once the relevant context for all input data has been established, new classes of adaptive analytics can uncover hidden relationships and discover insights at a scale impossible for humans to achieve manually. New analytics will be required that can adapt to the ever-changing context of their input data. In addition to Watson-style reasoning and decision support, rule systems such as ILOG and application platforms such as WebSphere and Worklight will exploit changes in contextual information to deliver more insightful applications and user experiences.
Adapt: once decisions have been made and actions determined by context-aware applications, the resulting interactions between both systems and individual users must be composed and adapted to maximize effectiveness and consumability. For systems, this may require dynamic translation of interactions into automated business processes, or reformatting to support APIs with other systems. For human interaction, the dynamic context of the user must be considered, including location, timeframe, device, language, business role, and the process in play, as well as both the history of the user's prior interactions and their preference profile. For both systems and users, the latency requirements for maximizing the value of the interaction must be taken into account.
Across all these phases of information processing and applications, the high variety of data involved and the mixing of data from various sources will require deep and pervasive support for data-centric security, where the security, privacy, and provenance of each data element is tracked and respected throughout all aspects of the contextual enterprise. Support for pervasive data-centric security, privacy, provenance, and latency-appropriate delivery will be essential.
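A minimal end-to-end sketch of the four activities, with toy stand-ins for the real stores and analytics; the record schema and the co-occurrence "reasoning" below are hypothetical simplifications.

```python
def gather(sources):
    """Gather: collect all potentially relevant records from every source."""
    return [rec for src in sources for rec in src]

def connect(records):
    """Connect: extract simple metadata (entity mentions) to build context."""
    context = {}
    for rec in records:
        for entity in rec["entities"]:
            context.setdefault(entity, []).append(rec["id"])
    return context  # entity -> records that mention it

def reason(context):
    """Reason: surface hidden links, here entities that co-occur in records."""
    insights = []
    entities = sorted(context)
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            shared = set(context[a]) & set(context[b])
            if shared:
                insights.append((a, b, sorted(shared)))
    return insights

def adapt(insights, user):
    """Adapt: compose recommendations tailored to the user's context."""
    return [f"{user}: review {docs} linking {a} and {b}" for a, b, docs in insights]

sources = [[{"id": "r1", "entities": ["acct-42", "pump-7"]}],
           [{"id": "r2", "entities": ["pump-7", "site-A"]},
            {"id": "r3", "entities": ["acct-42", "site-A"]}]]
for line in adapt(reason(connect(gather(sources))), "analyst"):
    print(line)
```

In a production system each stage would be a separate service over its own stores, with the feedback-and-learning loops from the diagram updating context as users act on the recommendations.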

19 Leveraging Contextual Analytics For Water Stability Index
Contextual analytics will be used to extract water and food security entities, relations, and concepts from unstructured text. The resulting analysis will be used as input to the Water Stability Index calculation, which measures regional socioeconomic and ecological stability (a toy sketch follows).
The tooling reveals temporal patterns and time-related correlations in curated data, lets you compare terms to reveal correlations, and lets you dive deep into the data to see connections.
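A minimal sketch of how extracted text signals might feed an index calculation. The signal patterns, weights, and the index formula are entirely hypothetical, for illustration only; the actual Water Stability Index methodology is not described in this deck.

```python
import re

SIGNALS = {  # pattern -> (signal name, weight); negative weights destabilize
    r"\bdrought\b":       ("drought_mentions", -2.0),
    r"\bcrop failure\b":  ("crop_failure",     -3.0),
    r"\bnew reservoir\b": ("infrastructure",   +2.5),
    r"\birrigation\b":    ("irrigation",       +1.0),
}

def extract_signals(texts):
    """Count signal mentions across a region's curated documents."""
    counts = {}
    for text in texts:
        for pattern, (name, _) in SIGNALS.items():
            counts[name] = counts.get(name, 0) + len(re.findall(pattern, text.lower()))
    return counts

def stability_index(counts, baseline=50.0):
    """Hypothetical index: baseline plus weighted signal counts, clamped to 0-100."""
    score = baseline + sum(
        weight * counts.get(name, 0)
        for _, (name, weight) in SIGNALS.items()
    )
    return max(0.0, min(100.0, score))

region_docs = [
    "Ongoing drought has led to crop failure in the eastern district.",
    "A new reservoir and expanded irrigation were announced this spring.",
]
counts = extract_signals(region_docs)
print(counts, "index:", stability_index(counts))
```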

