Stop Data Wrangling, Start Transforming Data to Intelligence


1 Stop Data Wrangling, Start Transforming Data to Intelligence
BABAR BHATTI DAMA linkedin.com/in/bbhatti @thebabar

2 EXCITING TIMES FOR DATA SCIENCE + AI
AI is for real: market adoption, public awareness, software and hardware improvements, data everywhere. DIVERGENCE.AI

3 Phases of Analytics Work
Every analytics project has four phases.

4 Data Wrangling Is Costly
Source: Forbes Survey

5 DATA PREPARATION
Data preparation is the process of cleaning, structuring, and enriching raw data into the desired output for analysis. Approaches: data preparation as a service vs. do it yourself vs. self-service products such as Alteryx, Datawatch, Tamr, and Google/Trifacta.
"Data prep is data access without the data management overhead." (Forrester)

6 Data Prep Tools Accelerate Insights
Source: Forrester, Vendor Landscape, Data Preparation Tools, Feb 2016

7 DATA PREPARATION
A 12-step process to prepare data, organized into three stages: Prepare, Enrich, and Publish.
Prepare: (1) Import / Ingest Data, (2) Cleanse and Normalize, (3) Schema Detection, (4) Duplicate Identification, (5) Sensitive Data Discovery
Enrich: (6) Data Profiling, (7) Data Classification, (8) Data Enrichment, (9) Attribute Extraction, (10) Schema Discovery
Publish: (11) Source / Target Definition, (12) Export Formatting
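The 12-step process chains its steps, each one consuming the previous step's output. The sketch below is a minimal illustration of that chaining, assuming records are plain Python dicts. The step functions (`ingest`, `cleanse`, `drop_duplicates`), the `PIPELINE` list, and `run_pipeline` are hypothetical names, and only three of the twelve steps are implemented.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def ingest(rows: List[Record]) -> List[Record]:
    """Import / Ingest Data: copy rows so later steps never mutate the source."""
    return [dict(r) for r in rows]


def cleanse(rows: List[Record]) -> List[Record]:
    """Cleanse and Normalize: trim whitespace and lowercase field names."""
    return [{k.strip().lower(): v.strip() for k, v in r.items()} for r in rows]


def drop_duplicates(rows: List[Record]) -> List[Record]:
    """Duplicate Identification: keep the first of any exactly repeated records."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Hypothetical pipeline: (stage, step name, function) in execution order.
# Enrich and Publish steps would be appended here in the same way.
PIPELINE: List[Tuple[str, str, Step]] = [
    ("Prepare", "Import / Ingest Data", ingest),
    ("Prepare", "Cleanse and Normalize", cleanse),
    ("Prepare", "Duplicate Identification", drop_duplicates),
]


def run_pipeline(rows: List[Record], pipeline=PIPELINE):
    """Apply each step in order and log how many rows survive it."""
    log = []
    for stage, name, step in pipeline:
        rows = step(rows)
        log.append(f"{stage}: {name} -> {len(rows)} rows")
    return rows, log
```

Keeping every step a plain function from rows to rows makes it easy to reorder steps or slot in new ones, such as a sensitive-data scan between cleansing and enrichment.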

8 USE CASES
Use cases include Risk, Compliance, and Security; Retail & Commerce; Customer Behavior Analytics; Churn Analysis; and Customer 360. We start with Risk, Compliance, and Security.

9 RISK, COMPLIANCE, AND SECURITY
Integrating our data science and cybersecurity capabilities:
Detect Corporate Fraud – Wrangle comprehensive and complex data, such as multi-layered and multi-party emails or web chats, to better understand what constitutes deviant behavior.
Enable Information Security – Keep pace with the billions of security events your institution receives each day by empowering non-technical users to wrangle, investigate, and clean datasets faster.
Risk Modeling – Standardize and quantify structured and unstructured data types quickly to ensure accurate and replicable modeling results.
Improve Compliance – Track and isolate compliance-sensitive data, from transactions to emails, to ensure that industry and government standards are met.
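One way to track and isolate compliance-sensitive data is a pattern-based scan that finds sensitive values in free text and masks them before the data is shared. This sketch is an assumption, not the presenter's method: it detects only email addresses and card-like numbers (13 to 19 digits that pass the Luhn checksum), and the names `find_sensitive`, `mask`, and `luhn_ok` are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical detectors; a real deployment would need many more classes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> Dict[str, List[str]]:
    """List email addresses and Luhn-valid card-like numbers found in `text`."""
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_ok(m.group())]
    return {"email": EMAIL_RE.findall(text), "card": cards}


def mask(text: str) -> str:
    """Replace detected sensitive values with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub(lambda m: "[CARD]" if luhn_ok(m.group()) else m.group(), text)
```

The Luhn check keeps long order or account numbers that merely look like card numbers from being masked, which reduces false positives at the cost of missing malformed card numbers.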

10 RETAIL & COMMERCE: SUPPLIER ONBOARDING AND PRODUCT INTEGRATION
Supplier Onboarding – Integrate and map data from different suppliers into a single schema. Identify and flag products that have identical attributes but differing or incorrect article numbers. Enrich product information with attributes from other data sets, e.g., package dimensions and barcodes.
Product Integration – Normalize key attributes such as color, weight, measurements, units, size, and part numbers. Standardize all product and brand names. Identify configurable attributes and cluster product variants. Categorize products according to your own taxonomy and subcategories.
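A minimal sketch of two of these retail steps: normalizing units, colors, and brand names, then flagging products whose normalized attributes are identical but whose article numbers differ. The conversion table, the color synonyms, the choice of (brand, color, weight) as the matching key, and all function names are illustrative assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical lookup tables; a real catalog would maintain these per supplier.
TO_GRAMS = {"g": 1.0, "kg": 1000.0, "lb": 453.59237, "oz": 28.349523125}
COLOR_SYNONYMS = {"blk": "black", "wht": "white", "gry": "grey", "gray": "grey"}


def normalize_weight(raw: str) -> float:
    """Convert strings such as '1 kg' or '2.2 lb' to grams, rounded to 0.1 g."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", raw)
    if not m or m.group(2).lower() not in TO_GRAMS:
        raise ValueError(f"unrecognized weight: {raw!r}")
    return round(float(m.group(1)) * TO_GRAMS[m.group(2).lower()], 1)


def normalize_product(p: Dict[str, str]) -> Dict[str, object]:
    """Standardize article number, brand, color, and weight for one supplier record."""
    color = p["color"].strip().lower()
    return {
        "article_no": p["article_no"].strip().upper(),
        "brand": " ".join(p["brand"].split()).title(),
        "color": COLOR_SYNONYMS.get(color, color),
        "weight_g": normalize_weight(p["weight"]),
    }


def flag_conflicting_article_numbers(products: List[Dict[str, str]]) -> Dict[Tuple, List[str]]:
    """Group products whose normalized attributes match but whose article numbers differ."""
    groups = defaultdict(set)
    for p in map(normalize_product, products):
        groups[(p["brand"], p["color"], p["weight_g"])].add(p["article_no"])
    return {attrs: sorted(nums) for attrs, nums in groups.items() if len(nums) > 1}
```

Normalizing before comparing is what makes "1 kg / BLK" and "1000 g / black" collide; comparing the raw supplier strings would miss the conflict entirely.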

11 DATA PREPARATION AND REPAIR
Stage One - Preparation
Statistical Profiling – standard statistical analysis of numerical data, and frequency and term analysis of text data.
Cleansing, Normalization – removing non-essential characters and standardizing content such as dates.
Data Repair – identifying inconsistencies in the data and fixing them where possible.
Data Enrichment – Knowledge Service-based enrichment with related data.
Explicit Schema Detection – identifying the schema/metadata that is explicitly defined in header, field, tag, or other information.
Duplicate Identification – identifying duplicates in the data.
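A minimal Python sketch of four Stage One operations: numeric profiling, term-frequency profiling, date standardization after stripping non-essential characters, and duplicate identification. The accepted date formats, the tokenization rule, and the function names are assumptions for illustration. Slash-separated dates are read month-first here, which is exactly the kind of ambiguity a real repair step has to resolve; values that match no format come back as `None` and can be routed to repair.

```python
import re
import statistics
from collections import Counter
from datetime import datetime
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input formats; month-first US dates are assumed for "/" dates.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def profile_numeric(values: Sequence[float]) -> Dict[str, float]:
    """Statistical profiling of a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def profile_text(values: Sequence[str], top: int = 3) -> List[Tuple[str, int]]:
    """Term-frequency profiling of a text column (lowercased alphanumeric tokens)."""
    terms = Counter(t for v in values for t in re.findall(r"[a-z0-9]+", v.lower()))
    return terms.most_common(top)


def standardize_date(raw: str) -> Optional[str]:
    """Strip non-essential characters and return an ISO 8601 date, or None for repair."""
    cleaned = re.sub(r"[^\w/ -]", "", raw).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def find_duplicates(rows: Sequence[Dict[str, str]], keys: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (first_index, duplicate_index) pairs for rows that match on `keys`."""
    first_seen: Dict[tuple, int] = {}
    pairs = []
    for i, row in enumerate(rows):
        key = tuple(str(row[k]).strip().lower() for k in keys)
        if key in first_seen:
            pairs.append((first_seen[key], i))
        else:
            first_seen[key] = i
    return pairs
```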

12 SEMANTIC METADATA DISCOVERY, ENRICHMENT, AND CORRELATION
Stage Two - Enrichment
Classification, Attribute Extraction – identifying categories in the data and identifying its characteristics in terms of attributes, properties, and schemata.
Implicit Schema Detection – the schema can often be identified from the instances associated with it, such as addresses, postal addresses, names, dates, and times. The service provides this capability out of the box for many standard classes in structured and semi-structured data.
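Implicit schema detection can be approximated by testing each column's values against patterns for standard classes and accepting the best-matching class when enough values agree. This is a hedged sketch, not the service's actual algorithm: the four regex patterns, the 80% agreement threshold, and the names `detect_class` and `infer_schema` are assumptions.

```python
import re
from typing import Dict, List, Optional

# Hypothetical instance patterns for a few standard classes.
PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4}"),
    "time": re.compile(r"\d{1,2}:\d{2}(?::\d{2})?"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_zip": re.compile(r"\d{5}(?:-\d{4})?"),
}


def detect_class(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the class matched by at least `threshold` of non-empty values, else None."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return None
    best, best_share = None, 0.0
    for name, pattern in PATTERNS.items():
        share = sum(1 for v in non_empty if pattern.fullmatch(v)) / len(non_empty)
        if share > best_share:
            best, best_share = name, share
    return best if best_share >= threshold else None


def infer_schema(columns: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Infer a class for every column from its instances."""
    return {name: detect_class(values) for name, values in columns.items()}
```

With a column such as `["10001", "94105-1234", "n/a"]`, only two of three values look like ZIP codes, so the column stays unclassified at the 0.8 threshold; lowering the threshold trades precision for coverage.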

13 PUBLISHING
Stage Three - Publish
Sources/Targets – the system supports a rich set of sources and targets, including Oracle Storage Cloud, other external cloud stores, and URL sources.
Formats – the service can export the curated datasets to commonly used formats, enabling downstream and on-premises BI, analytics, and ETL processes.
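A minimal sketch of the Formats step, assuming the curated data is a list of dicts: it renders rows as CSV or JSON Lines using only the Python standard library. Writing the result to Oracle Storage Cloud or another target is out of scope here, and `export` and the `EXPORTERS` registry are hypothetical names.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Row = Dict[str, object]


def to_csv(rows: List[Row]) -> str:
    """Serialize rows as CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows: List[Row]) -> str:
    """Serialize rows as JSON Lines (one JSON object per line)."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


# Hypothetical registry of export formats.
EXPORTERS: Dict[str, Callable[[List[Row]], str]] = {"csv": to_csv, "jsonl": to_json_lines}


def export(rows: List[Row], fmt: str) -> str:
    """Render curated rows in the requested format."""
    exporter = EXPORTERS.get(fmt)
    if exporter is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    return exporter(rows)
```

A registry keyed by format name keeps new formats (for example, Parquet via a third-party library) a one-line addition without touching the callers.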

14 NEXT STEPS - DATA PREPARATION ASSESSMENT
Assess complexity (size vs. sources) and transformation (enhancement vs. enrichment).
[Figure: two 2x2 assessment matrices, each with # of Sources (Tables) from Few to Many on the horizontal axis. Complexity matrix: Data Size from Small to Large. Transformation matrix: Data Enhance (General) to Data Enrich (Unique). Quadrant labels: SIMPLE, DIVERSIFIED, BIG, COMPLEX, ADVANCED.]

15 ENGAGEMENT MODEL - Example
Assess, provision, and deliver a data preparation service.
Assess – two-day data complexity assessment.
Provision – team (shared or dedicated); data preparation infrastructure setup (one-time or ongoing); adherence to corporate information access policies over the network.
Deliver – share transformed data.

16 THANK YOU
BABAR BHATTI 09.21.17 DAMA linkedin.com/in/bbhatti @thebabar

