Data Warehousing The Easy Way with AWS Redshift

Presentation transcript:

Data Warehousing The Easy Way with AWS Redshift Case study

About Field Nation
  Field Nation is the contingent work platform for business. We are the business hub helping enterprises get their critical work done through freelancers, service providers, and their own workforce.
  Landed $30M growth equity with Susquehanna in Q4 2015
  Tekne Award winner for Top Information Technology Services (MHTA, 2015)
About Me
  Data Scientist at Field Nation
  Worked in a variety of data warehouse teams as a consultant or employee
  M.S. Predictive Analytics

Agenda
  Background
  Introduction to Amazon Redshift
  Solution Approach
  Solution Stages
    Data Pipeline
    Data Staging
    Data Presentation
  Advantages

Background
  Situation
    Multiple data warehouse platforms
    Data not accessible to users
    Short timeline for results
  Origin story
    Microsoft and Redshift w/custom scripting
    Main dashboard was Excel spreadsheet
  Big Relational
    Use cloud data warehouse platform
    Postpones need to introduce complexities of Hadoop
    Source: Lynn Langit, Big Data and Cloud Architect. Technical Author. Community technical education partner awards from AWS, Google and Microsoft. https://lynnlangit.com/ @lynnlangit
    http://www.kdnuggets.com/2015/02/big-data-trends-strata-hadoop-san-jose.html

Redshift
Amazon’s hosted data warehouse platform
  Fully managed
    AWS handles backups, resizing, fault tolerance, etc.
    Strong partner network
    Integrates well with other AWS services like S3 and Kinesis
  Familiar interface
    Acts like a standard Postgres relational database
    Use SQL for querying and management
  Optimized for performance
    Column-store
    Massively Parallel Processing (MPP) architecture
    Can scale up to multiple petabytes
    Data compression
    Interleaved sorting
  Similar to Azure SQL Data Warehouse and Google BigQuery
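To make the "column-store" and "data compression" points concrete, here is a minimal Python sketch of the idea. It is a toy illustration, not how Redshift is implemented: the sample rows, the column names, and the choice of run-length encoding as the compression scheme are all assumptions made for the example.

```python
from itertools import groupby

# Hypothetical sample rows: (work_order_id, status, state)
rows = [
    (1, "completed", "MN"),
    (2, "completed", "MN"),
    (3, "completed", "WI"),
    (4, "cancelled", "WI"),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a toy column store)."""
    return [list(col) for col in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


ids, statuses, states = to_columns(rows)
encoded_statuses = run_length_encode(statuses)

# A query that only counts statuses reads one compressed column,
# not every field of every row.
completed = sum(count for value, count in encoded_statuses if value == "completed")
print(encoded_statuses, completed)
```

Storing each column together is what lets similar values sit side by side and compress well, and it is why analytic queries that touch a few columns of a wide table tend to suit this layout.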

Solution Approach
  Philosophy
    Optimize user experience over data processing complexity
    Spend the bulk of time solving unique business problems that can’t be outsourced
    Any commodity work that can be automated or outsourced to other parties should be
  Assumptions
    Data sources accessible from the cloud
    Budget exists to cover software licensing fees
    Data is structured
    Storage / computational complexity
  Your data is stored in the cloud, either in hosted databases like Amazon Web Services (AWS) Relational Database Service (RDS) or in popular SaaS platforms such as Salesforce and Zendesk

Data Pipeline
Getting data into Redshift
  Moves data from raw data sources into Redshift
  Service provided by third-party vendor
  Can load from a variety of data sources
  Each data source is loaded into a different schema
  First of three stages
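The "one schema per data source" convention can be sketched as follows. In the talk this work is done by a third-party pipeline service, so the snippet only illustrates the resulting layout: the source names, the raw_ prefix, and both helper functions are hypothetical, and the script just builds SQL text rather than connecting to a cluster.

```python
import re

# Hypothetical source names; a real pipeline vendor manages this mapping.
SOURCES = ["Salesforce", "Zendesk", "MySQL Orders"]


def schema_name(source):
    """Turn a source label into a lowercase, underscore-separated schema name."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", source.lower()).strip("_")
    return f"raw_{cleaned}"  # the raw_ prefix is an assumed naming convention


def create_schema_statements(sources):
    """Build one CREATE SCHEMA statement per data source."""
    return [f"CREATE SCHEMA IF NOT EXISTS {schema_name(s)};" for s in sources]


statements = create_schema_statements(SOURCES)
for statement in statements:
    print(statement)
```

Keeping each source in its own schema means raw tables from different systems cannot collide on name, and later stages can tell at a glance where a table came from.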

Data Pipeline Example First of three stages

Data Pipeline Vendors
  Databases: MySQL / Aurora, Postgres, SQL Server, MongoDB, Elasticsearch
  SaaS: Salesforce, Zendesk, Google Analytics, Snowplow, MailChimp, Mixpanel

Data Staging
Transforming data inside Redshift
  All raw data available inside Redshift schemas
  Use a transform tool to convert raw data into a staging area
  Results in clean, normalized schema
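A staging transform of this kind can be sketched with plain SQL driven from Python. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for Redshift, table-name prefixes (raw_, stg_) stand in for schemas, and the work-order table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a Redshift connection
conn.executescript("""
    -- Raw table as a pipeline might land it: untyped text, messy values
    CREATE TABLE raw_work_orders (id TEXT, status TEXT, amount TEXT);
    INSERT INTO raw_work_orders VALUES
        ('1', ' Completed ', '120.50'),
        ('2', 'CANCELLED',   ''),
        ('2', 'CANCELLED',   '');

    -- Staging table: typed, trimmed, lower-cased, de-duplicated
    CREATE TABLE stg_work_orders AS
    SELECT DISTINCT
        CAST(id AS INTEGER)              AS work_order_id,
        LOWER(TRIM(status))              AS status,
        CAST(NULLIF(amount, '') AS REAL) AS amount
    FROM raw_work_orders;
""")

staged = conn.execute(
    "SELECT work_order_id, status, amount "
    "FROM stg_work_orders ORDER BY work_order_id"
).fetchall()
print(staged)
```

Because the transform runs as SQL inside the warehouse, the raw tables stay untouched and the staging table can be rebuilt at any time.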

Data Staging Example Second of three stages

Data Staging Tools
  ETL tools: Talend, SnapLogic, Informatica
  ELT tools: Matillion
  Scripts: Python scripts, SQL
  Big data: Spark

Data Presentation
Making data available for users
  Use fully denormalized schema
    Star schema is unnecessary
    Eliminates need for slowly changing dimensions, bridge tables, etc.
    Joins are expensive in Redshift
    Simple to query for end users
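The denormalized presentation layer can be sketched the same way: join the staging tables once, then let users query a single wide table. As before, SQLite stands in for Redshift, and the table names, the rpt_ prefix, and the sample providers are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a Redshift connection
conn.executescript("""
    CREATE TABLE stg_work_orders (work_order_id INTEGER, provider_id INTEGER, amount REAL);
    CREATE TABLE stg_providers (provider_id INTEGER, provider_name TEXT, state TEXT);
    INSERT INTO stg_work_orders VALUES (1, 10, 120.5), (2, 11, 80.0), (3, 10, 45.0);
    INSERT INTO stg_providers VALUES
        (10, 'Acme Field Services', 'MN'),
        (11, 'Northern Techs', 'WI');

    -- Pay for the join once, when the presentation table is built
    CREATE TABLE rpt_work_orders AS
    SELECT w.work_order_id, w.amount, p.provider_name, p.state AS provider_state
    FROM stg_work_orders w
    LEFT JOIN stg_providers p ON p.provider_id = w.provider_id;
""")

# End users query one wide table with no joins
totals = conn.execute(
    "SELECT provider_state, SUM(amount) FROM rpt_work_orders "
    "GROUP BY provider_state ORDER BY provider_state"
).fetchall()
print(totals)
```

The trade-off is repeated provider attributes on every row in exchange for simpler, join-free queries.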

Data Presentation Example Third of three stages

Data Presentation Tools

Advantages
  Low startup effort
  Can leverage robust partner network
  Easy to make changes or additions
  Wide selection of tools
  Can build temporary presentation views before committing to building full ETL

Questions
  eric.ness@fieldnation.com
  https://www.linkedin.com/in/ericnessdata