Joe Caserta President Elliott Cordo Chief Architect September 30, 2015, Javits Center, New York City Building a Data Lake for Digital Music Dominance.

Presentation transcript:

Big Data Strategy Innovation Technical Implementation Awards and Recognition

The Music Maze

Build a Dynamic Platform – Paradigm Shift
OLD WAY: Structure → Ingest → Analyze; Fixed Capacity; Monolith
NEW WAY: Ingest → Analyze → Structure; Dynamic Capacity; Ecosystem
RECIPE: Cloud, Data Lake, Polyglot Warehouse
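
The "Ingest → Analyze → Structure" ordering on this slide can be illustrated with a minimal schema-on-read sketch. It is not taken from the deck: the sample records, the field names, and the `read_with_schema` helper are all hypothetical. Raw lines are stored untouched, and each consumer applies its own structure only when reading.

```python
import json

# Raw events are landed exactly as received; no schema is enforced at ingest time.
# (Hypothetical sample data, not from the presentation.)
raw_lines = [
    '{"track": "A", "plays": "12", "device": "ios"}',
    '{"track": "B", "plays": 7}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields and coerce types.

    Fields absent from a record are returned as None instead of failing.
    """
    for line in lines:
        rec = json.loads(line)
        yield {
            name: (cast(rec[name]) if name in rec else None)
            for name, cast in schema.items()
        }

# Two consumers structure the same raw data differently, after ingestion.
plays_view = list(read_with_schema(raw_lines, {"track": str, "plays": int}))
device_view = list(read_with_schema(raw_lines, {"track": str, "device": str}))
```

Because the structure lives in the reader rather than the store, a new question only needs a new schema dictionary, not a reload of the data.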

Move to the Cloud
Existing On-Premise Solution:
– Challenges with operations of Hadoop servers in the data center
– Increasing infrastructure complexity
– Keeping up with data growth
Cloud Advantages:
– Reduced upfront capital investment
– Faster speed to value
– Elasticity
“Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on.” - Matt Wood, AWS

Cost savings of dynamic capacity

Elasticity not only saves money

Essentially, Servers Suck
But more importantly, think infrastructure as code:
– Your servers should be API calls
– Use stateless processes
– Make all resources ephemeral
– Make everything scalable and elastic!

Ephemeral?
Disposable:
– Processing fleets
– Elastic MapReduce clusters
– Redshift clusters
Use distributed services and systems to maintain state and preserve your data: Cassandra, Dynamo, S3

Anatomy of our Processing Fleet: S3 Input Buckets, Auto-scaling, Queuing service, S3 Output Buckets
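
The slide names only the components, so the following is a rough in-memory sketch of how such a fleet might fit together, not the presenters' implementation. Plain dictionaries stand in for the S3 buckets and a `deque` for the queuing service. The `desired_workers` scaling rule and its thresholds are assumptions chosen for illustration.

```python
import math
from collections import deque

def desired_workers(queue_depth, msgs_per_worker=100, min_workers=0, max_workers=20):
    """Assumed scaling rule: one worker per `msgs_per_worker` queued messages,
    clamped to the [min_workers, max_workers] range."""
    wanted = math.ceil(queue_depth / msgs_per_worker)
    return max(min_workers, min(max_workers, wanted))

def process(record):
    """Stateless transform: the output depends only on the input record."""
    return record.strip().upper()

def drain(queue, input_bucket, output_bucket):
    """One worker's loop: take a key from the queue, read the input object,
    write the result to the output bucket. No state is kept on the worker."""
    while queue:
        key = queue.popleft()
        output_bucket[key] = process(input_bucket[key])

# Dictionaries stand in for S3 buckets; the deque stands in for the queue.
input_bucket = {"plays/0001.txt": " track-a ", "plays/0002.txt": "track-b"}
output_bucket = {}
queue = deque(input_bucket)  # one message per newly landed object key

workers = desired_workers(len(queue))  # fleet size follows the backlog
drain(queue, input_bucket, output_bucket)
```

Since every worker is stateless and all data lives in the buckets, any worker can be terminated or added at any point, which is what makes the fleet disposable.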

Elastic MapReduce
– Hadoop on demand
– No operations: your cluster dies, so what?
– Bootstrap whatever processing engine makes sense
– Programmatically estimate instance type and cluster size
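
The deck does not say how instance type and cluster size are estimated. One simple possibility, sketched here under stated assumptions, is to size the cluster so that total memory covers the expected working set and then pick the cheapest option. The instance catalogue, the prices, and the `expansion` factor below are invented for illustration and are not real AWS values.

```python
import math

# Hypothetical catalogue: name -> (memory_gib, hourly_cost_usd). Not real prices.
INSTANCE_TYPES = {
    "small": (8, 0.10),
    "medium": (32, 0.36),
    "large": (128, 1.30),
}

def estimate_cluster(input_gib, expansion=3.0, max_nodes=50):
    """Return the cheapest (instance_type, node_count) whose combined memory
    covers input_gib * expansion, ignoring options that need too many nodes."""
    working_set = input_gib * expansion
    best = None
    for name, (mem_gib, price) in INSTANCE_TYPES.items():
        nodes = max(1, math.ceil(working_set / mem_gib))
        if nodes > max_nodes:
            continue  # this type would need more nodes than allowed
        cost = nodes * price
        if best is None or cost < best[2]:
            best = (name, nodes, cost)
    if best is None:
        raise ValueError("input too large for the configured limits")
    return best[0], best[1]
```

A job launcher could call `estimate_cluster` with the size of the input prefix just before requesting a cluster, so that capacity tracks the data instead of being fixed in advance.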

You May Need Some Persistent Servers
If at all possible, they should be inherently scalable, distributed, and elastic

Move to a Data Lake Paradigm
Technology:
– Scalable distributed storage → S3
– Pluggable fit-for-purpose processing → EMR
Functional Capabilities:
– Remove barriers from data ingestion and analysis
– Storage and processing for all data
– Tunable governance

Putting it together: The Big Data Pyramid
[Diagram labels: Usage Pattern; Data Governance; Ingest Raw Data; Organize, Define, Complete; Munging, Blending; Machine Learning; Data Quality and Monitoring; Metadata, ILM, Security; Data Catalog; Data Integration; Fully Governed (trusted); Arbitrary/Ad-hoc Queries and Reporting]

Data Ingestion and Onboarding
Incoming to S3:
– Lightweight API wrapper
– Web front end
– Direct writes to S3
Ingest the data in a reasonable partitioning schema: bucket and keys
Turn analysts and data scientists loose → late-bind analytics
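
The slide does not spell out the partitioning schema. A common convention, shown here purely as an assumption, is to encode the source and the event date into the object key so that downstream jobs can select a day or a source by key prefix. The `raw/` prefix, the `source=`/`year=` naming, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

def day_prefix(source, event_time):
    """Key prefix shared by all objects from one source on one UTC day."""
    t = event_time.astimezone(timezone.utc)
    return f"raw/source={source}/year={t:%Y}/month={t:%m}/day={t:%d}/"

def object_key(source, event_time, filename):
    """Full object key: the day prefix followed by the file name."""
    return day_prefix(source, event_time) + filename

landed_at = datetime(2015, 9, 30, 14, 5, tzinfo=timezone.utc)
key = object_key("plays", landed_at, "batch-0001.json")
```

With keys laid out this way, an analyst can point a job at a single prefix (one source, one day) without any up-front modeling, which fits the late-binding approach the slide describes.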

But we need to feed the cash register
Data needs to be refined and mapped:
– Processing Fleet
– EMR
80/20 rule: metadata driven when possible
Abstract away “Big Data”
And make sure it’s right!
– Automated data quality checks using HAMBOT, soon to be open sourced
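
To make "metadata driven" data quality checks concrete, here is a small sketch in which the rules live in a dictionary rather than in code. This is not HAMBOT, whose interface the deck does not describe. The rule format, the field names, and `check_record` are assumptions made for illustration.

```python
# Hypothetical rule metadata: adding a check means editing data, not code.
RULES = {
    "track_id": {"required": True, "type": str},
    "plays": {"required": True, "type": int, "min": 0},
    "country": {"required": False, "type": str, "allowed": {"US", "CA", "GB"}},
}

def check_record(record, rules=RULES):
    """Return a list of human-readable violations for one record."""
    problems = []
    for field, rule in rules.items():
        if field not in record or record[field] is None:
            if rule.get("required"):
                problems.append(f"{field}: missing")
            continue  # optional and absent: nothing more to check
        value = record[field]
        if not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
            continue  # range and membership checks assume the right type
        if "min" in rule and value < rule["min"]:
            problems.append(f"{field}: below {rule['min']}")
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: unexpected value {value!r}")
    return problems
```

A refinement job could run `check_record` over every mapped record and fail or quarantine a batch when the violation count exceeds a threshold.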

Think Data Ecosystem, Not Tech Stack
“…any decent sized enterprise will have a variety of different data technologies for different kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we'll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it.” - Martin Fowler

Polyglot in Practice
Best practices from traditional EDW:
– Consolidation
– Data governance
– Master data
– Tuned for analytics
Applied to:
– Fit-for-purpose technologies and approaches: Relational, MPP, Graph, KV, TimeseriesDB, Data Lake
– Apply “tunable governance” and traditional principles
– Use the right tool for the job

The Landscape for Digital Dominance
[Diagram labels: Landing Queue Data Lake BDW Data Science API Data Providers Near Real-time Batch Data Science Clusters EDW Graph RDS Metastore]

Thank You
Joe Caserta, President, Caserta
Elliott Cordo, Chief Architect, Caserta Concepts
Award-winning company: Transformative Data Strategies, Modern Data Engineering, Advanced Architecture
Innovation Partner: Strategic Consulting, Advanced Technical Design, Build & Deploy Solutions
BDW Meetup, New York City: 3,000+ members, knowledge sharing
Data is not important, it’s what you do with it that’s important!