In the Data Lake: Not Waving but Drowning Dr. Barry Devlin 9sight

Presentation transcript:

What is a Data Lake?
- Words have meanings
- Metaphors make images

"If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."
James Dixon, CTO, Pentaho (Forbes, 2011)

Copyright © 2014, 9sight Consulting

Data Lake – definitions and questions

"A data lake is a large object-based storage repository that holds data in its native format until it is needed."
Margaret Rouse, WhatIs.com

"A data lake is a massive, easily accessible, centralized repository of large volumes of structured and unstructured data."
Cory Janssen, Techopedia.com

- Is all data of equal value?
- Are quality and consistency no longer needed?
- Should we really store everything?
- Build it and they will come?
- What problem are we trying to solve?

The Data Lake Fallacy: All Water and Little Substance
- Gartner report, G , 23 July 2014, Nick Heudecker, Andrew White
- The main risk of using data lakes is the absence of metadata and an underlying mechanism to maintain it… the lack of which can turn a data lake into a “data swamp”
Image: anaxi.deviantart.com/art/Lostless-Swamp-Concept
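The metadata risk quoted on the slide above can be made concrete with a minimal sketch. It assumes a toy in-memory "lake" that refuses to store an object unless descriptive metadata arrives with it. The names `DataLake`, `CatalogEntry`, `ingest` and `undocumented` are hypothetical, chosen for illustration only. They do not come from the presentation, the Gartner report, or any real product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical metadata record kept alongside each stored object."""
    source: str
    description: str
    owner: str
    ingested_at: datetime


@dataclass
class DataLake:
    """Toy in-memory lake: raw objects plus a catalog describing them."""
    objects: dict = field(default_factory=dict)   # key -> raw bytes
    catalog: dict = field(default_factory=dict)   # key -> CatalogEntry

    def ingest(self, key, payload, *, source, description, owner):
        """Store payload in its native format, but only with metadata."""
        required = {"source": source, "description": description, "owner": owner}
        missing = [name for name, value in required.items() if not value.strip()]
        if missing:
            raise ValueError(f"refusing to ingest {key!r}: missing {missing}")
        self.objects[key] = payload
        self.catalog[key] = CatalogEntry(
            source=source,
            description=description,
            owner=owner,
            ingested_at=datetime.now(timezone.utc),
        )

    def undocumented(self):
        """Keys that hold data but have no catalog entry (the 'swamp')."""
        return sorted(set(self.objects) - set(self.catalog))


lake = DataLake()
lake.ingest(
    "clicks/2014-07-23.json",
    b"{}",
    source="web server logs",
    description="raw click events",
    owner="marketing",
)
```

Writing to `lake.objects` directly, bypassing `ingest`, is the situation the slide warns about: the data is stored, but nothing records what it is, and `undocumented()` starts returning keys.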

Do we need a new architecture?
- Yes!
- Original data warehouse is too restrictive
- Business needs agility, speed and consistency
- Emerging biz-tech ecosystem
  - Business / IT symbiosis
[Diagram labels: Information abundance and variety; Customer interaction and technical savvy; Speed of decision and appropriate action; Market flexibility and uncertainty; Competition; Mobile devices; Externally-sourced information]

One more time, let’s do architecture
- The IDEAL architecture consists of three conceptual “thinking spaces”.
- Characteristics
  - Integrated
  - Distributed
  - Emergent
  - Adaptive
  - Latent
- Also read as a story: People process information
[Diagram labels: Information; Process; People]

The tri-domain information model
- Process-mediated data
  - “Traditional” operational & informational data
  - Via data entry & cleansing processes
- Machine-generated data
  - Output of machines and sensors
  - The Internet of Things
- Human-sourced information
  - Subjectively interpreted record of personal experiences
  - From Tweets to Videos
[Diagram labels: Human-sourced information; Machine-generated data; Process-mediated data; Structure/Context; Timeliness/Consistency; Historical, Reconciled, Stable, Live, In-flight; Raw, Atomic, Derived, Compound, Textual, Multiplex]

Introducing information pillars
- One architecture for all types of information
  - Mix/match technology as needed
  - Relational, NoSQL, Hadoop, etc.
- Integration of sources and stores
  - Instantiation gathers inputs
  - Assimilation integrates stored info.
- Data flows as fast as needed and reconciled when necessary
  - No unnecessary storage or transformations
- Distinct data management / governance approaches as required
[Diagram labels: Transactions, Events, Measures, Messages; Instantiation; Human-sourced (information); Machine-generated (data); Process-mediated (data); Transactional (data); Context-setting (information); Assimilation]

From metadata to context-setting information
- Metadata is two four-letter words!
  - Information (not data)
  - Describes all “stuff” (not just data)
  - Indistinguishable (mostly) from “business information”
- What was the most expensive metadata error in history?
  - The Mars Climate Orbiter, lost in 1999, at a cost of $325M, due to metadata error
- Context-setting information (CSI)
  - New image – describes what it is and does
  - Provides the background to each piece of information, to every process component and to all the people that constitute the business
  - All information adds context to something else; it is all context setting
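The Mars Climate Orbiter example above is commonly attributed to a units mismatch: thruster impulse was produced in pound-force seconds but consumed as newton-seconds. The sketch below illustrates, under that assumption, how carrying the unit with the number (a small piece of context-setting information) lets the consumer convert or reject a value instead of silently misreading it. The `Impulse` class and `total_impulse_n_s` function are hypothetical and are not taken from the presentation or from any flight software.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A number plus the context needed to interpret it (hypothetical)."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        """Return the impulse in SI units, rejecting unknown units."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def total_impulse_n_s(readings):
    """Sum readings safely, whatever unit each producer used."""
    return sum(r.to_newton_seconds() for r in readings)


readings = [Impulse(2.0, "N*s"), Impulse(1.0, "lbf*s")]
total = total_impulse_n_s(readings)
```

Without the `unit` field, the second reading would be added as 1.0 rather than roughly 4.45, an error of more than a factor of four that nothing in the bare number reveals.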

m³: the modern meaning model
- Ackoff’s DIKW pyramid is no longer viable
- Information precedes data
  - Data is simply information optimized for computers
  - The Web has fully devalued “facts”
  - People process information
[Diagram labels: Locus; Structure; Physical; Loose; Mental; Strict; Interpersonal; Hard Information; Soft Information; Explicit Knowledge; Tacit Knowledge; Meaning: “The stories we tell ourselves”; Objective / universal; Subjective / unique; Sense-making; Mentoring; Understanding; Insight; Data; Content; Articulation; Practice; Documenting; Learning; Videoing; Observing; Modeling; Interpreting; From Physical World; From Human World]

Human, social and collaborative dimension
- Meaning is a personal/social interpretation based (loosely) on information and knowledge
  - Rationality is only one part
  - Gut-feel may be more effective than rationality in decision making
  - Emotional state plays an important role
- Intention drives understanding and action
- We are social animals
  - Business is a social enterprise
- Innovation is often team-based

From BI to Business unIntelligence
- Rationality of thought and far beyond it
- Logic of process, predefined and emergent
- Information, knowledge and meaning
- The confluence of
  - Reason and inspiration, emotion and intention
  - Collaboration and competition
  - All that comprises the human and social milieu that is business
- Not business intelligence… Business unIntelligence
- 25% discount with code “BIInsights25”

Conclusions
1. Speed, flexibility and quality vital in modern business
  - Biz-tech ecosystem shows direction
  - Data Lake driven by “Big Data blindness”
2. Modern information architecture is highly diverse
  - Structure and consistency where needed
  - Agility and speed when required
  - Data Lake ignores need for structure and consistency
3. Context and meaning are keystone concepts
  - Flexibility & quality bridged via context-setting information
  - Business unIntelligence provides overall structure

Not Waving but Drowning

Nobody heard him, the dead man,
But still he lay moaning:
I was much further out than you thought
And not waving but drowning.

Poor chap, he always loved larking
And now he’s dead
It must have been too cold for him his heart gave way,
They said.

Oh, no no no, it was too cold always
(Still the dead one lay moaning)
I was much too far out all my life
And not waving but drowning.

Stevie Smith (1957)