Data Warehousing The Easy Way with AWS Redshift

Presentation on theme: "Data Warehousing The Easy Way with AWS Redshift"— Presentation transcript:

1 Data Warehousing The Easy Way with AWS Redshift
Case study

2 About Field Nation
Field Nation is the contingent work platform for business. We are the business hub helping enterprises get their critical work done through freelancers, service providers & their own workforce.
  Landed $30M Growth Equity With Susquehanna in 2015 Q4
  Tekne Award Winner for Top Information Technology Services ~ MHTA, 2015
About Me
  Data Scientist at Field Nation
  Worked in a variety of data warehouse teams as a consultant or employee
  M.S. Predictive Analytics

3 Agenda
Background
Introduction to Amazon Redshift
Solution Approach
Solution Stages
  Data Pipeline
  Data Staging
  Data Presentation
Advantages

4 Background
Situation
  Multiple data warehouse platforms
  Data not accessible to users
  Short timeline for results
Lynn Langit
  Big Data and Cloud Architect. Technical Author.
  Community technical education partner awards from AWS, Google and Microsoft
  @lynnlangit
Big Relational
  Use cloud data warehouse platform
  Postpones need to introduce complexities of Hadoop
Origin story
  Microsoft and Redshift w/custom scripting
  Main dashboard was Excel spreadsheet

5 Redshift
Amazon’s hosted data warehouse platform
Fully managed
  AWS handles backups, resizing, fault tolerance, etc.
  Strong partner network
  Integrates well with other AWS services like S3, Kinesis
Familiar interface
  Acts like a Postgres-standard relational database
  Use SQL for querying and management
Optimized for performance
  Column-store
  Massively Parallel Processing (MPP) architecture
  Can scale up to multiple petabytes
  Data compression
  Interleaved sorting
Similar to Azure SQL Data Warehouse and Google BigQuery

6 Solution Approach
Philosophy
  Optimize user experience over data processing complexity
  Spend the bulk of time solving unique business problems which can’t be outsourced
  Any commodity work that can be automated or outsourced to other parties should be
Assumptions
  Data sources accessible from the cloud
  Budget exists to cover software licensing fees
  Data is structured
  Storage / computational complexity
Your data is stored in the cloud, either in hosted databases like Amazon Web Services (AWS) Relational Database Service (RDS) or in popular SaaS platforms such as Salesforce and Zendesk

7 Data Pipeline
Getting data into Redshift
First of three stages
  Moves data from raw data sources into Redshift
  Service provided by third-party vendor
  Can load from a variety of data sources
  Each data source is loaded into a different schema
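
The "one schema per data source" convention can be sketched in a few lines of Python. This is only an illustration: in the talk the pipeline is a third-party service, and the helper names (`schema_name`, `schema_ddl`) and the source list are assumptions made up for this example. The generated statements use `CREATE SCHEMA IF NOT EXISTS`, which Redshift accepts.

```python
import re


def schema_name(source: str) -> str:
    """Turn a data source label into a lowercase SQL identifier (hypothetical helper)."""
    name = re.sub(r"[^a-z0-9]+", "_", source.lower()).strip("_")
    # Identifiers should not be empty or start with a digit, so add a prefix if needed.
    if not name or name[0].isdigit():
        name = "src_" + name
    return name


def schema_ddl(sources):
    """Build one CREATE SCHEMA statement per data source."""
    return [f"CREATE SCHEMA IF NOT EXISTS {schema_name(s)};" for s in sources]


# Illustrative source list; any of the pipeline's supported sources would work.
SOURCES = ["MySQL", "Salesforce", "Zendesk", "Google Analytics"]

statements = schema_ddl(SOURCES)
for stmt in statements:
    print(stmt)
```

Keeping each source in its own schema means raw tables from different systems cannot collide on name, and the later staging step can always tell where a table came from.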

8 Data Pipeline Example
First of three stages

9 Data Pipeline Vendors
Databases
  MySQL / Aurora
  Postgres
  SQL Server
  MongoDB
  Elasticsearch
SaaS
  Salesforce
  Zendesk
  Google Analytics
  Snowplow
  MailChimp
  Mixpanel

10 Data Staging
Transforming data inside Redshift
  All raw data available inside Redshift schemas
  Use a transform tool to convert data into staging area
  Results in clean, normalized schema
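
A script-based version of this staging step might look like the sketch below. It is not the talk's implementation: SQLite attached databases stand in for Redshift schemas so the example runs anywhere, and the schema, table, and column names (`raw_app`, `staging`, `work_orders`) and the sample rows are made up for illustration. The transform itself is plain SQL run inside the database, in the ELT style.

```python
import sqlite3

# SQLite stand-in: each attached in-memory database plays the role of a Redshift schema.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS raw_app")
conn.execute("ATTACH DATABASE ':memory:' AS staging")

# Raw data as a pipeline might land it: everything is text, with duplicates and blanks.
conn.execute("CREATE TABLE raw_app.work_orders (id TEXT, status TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_app.work_orders VALUES (?, ?, ?)",
    [
        ("1", " Completed ", "120.50"),
        ("2", "OPEN", ""),
        ("2", "OPEN", ""),  # duplicate row
        ("3", "completed", "80"),
    ],
)

# The transform: cast types, normalize text, turn blanks into NULL, drop duplicates.
conn.execute(
    """
    CREATE TABLE staging.work_orders AS
    SELECT DISTINCT
        CAST(id AS INTEGER)              AS work_order_id,
        LOWER(TRIM(status))              AS status,
        CAST(NULLIF(amount, '') AS REAL) AS amount
    FROM raw_app.work_orders
    """
)

rows = conn.execute(
    "SELECT work_order_id, status, amount "
    "FROM staging.work_orders ORDER BY work_order_id"
).fetchall()
for row in rows:
    print(row)
```

The raw schema is left untouched, so the staging table can be dropped and rebuilt whenever the cleaning rules change.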

11 Data Staging Example
Second of three stages

12 Data Staging Tools
ETL tools
  Talend
  SnapLogic
  Informatica
ELT tools
  Matillion
Scripts
  Python scripts
  SQL
Big data
  Spark

13 Data Presentation
Making data available for users
Use fully denormalized schema
  Star schema is unnecessary
  Eliminates need for slowly changing dimensions, bridge tables, etc.
  Joins are expensive in Redshift
  Simple to query for end users
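
The denormalization idea can be illustrated with a small sketch: the joins are paid once when the presentation table is built, and end users then query a single wide table. SQLite is used here only as a runnable stand-in for Redshift, and every table name, column name, and row below is hypothetical sample data, not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized staging tables (made-up sample data).
    CREATE TABLE stg_work_orders (work_order_id INTEGER, buyer_id INTEGER,
                                  provider_id INTEGER, amount REAL);
    CREATE TABLE stg_buyers (buyer_id INTEGER, buyer_name TEXT, industry TEXT);
    CREATE TABLE stg_providers (provider_id INTEGER, provider_name TEXT, state TEXT);

    INSERT INTO stg_work_orders VALUES (1, 10, 100, 120.5), (2, 10, 101, 80.0),
                                       (3, 11, 100, 200.0);
    INSERT INTO stg_buyers VALUES (10, 'Acme', 'Retail'), (11, 'Globex', 'Telecom');
    INSERT INTO stg_providers VALUES (100, 'Pat', 'MN'), (101, 'Sam', 'WI');

    -- Presentation table: one wide row per work order, joins done once at build time.
    CREATE TABLE presentation_work_orders AS
    SELECT wo.work_order_id,
           wo.amount,
           b.buyer_name,
           b.industry,
           p.provider_name,
           p.state AS provider_state
    FROM stg_work_orders wo
    LEFT JOIN stg_buyers b ON b.buyer_id = wo.buyer_id
    LEFT JOIN stg_providers p ON p.provider_id = wo.provider_id;
    """
)

# An end-user query needs no joins at all.
report = conn.execute(
    "SELECT industry, SUM(amount) FROM presentation_work_orders "
    "GROUP BY industry ORDER BY industry"
).fetchall()
print(report)
```

The trade-off is repeated attribute values in every row, which a column store with compression handles well, in exchange for queries that analysts can write without knowing the underlying keys.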

14 Data Presentation Example
Third of three stages

15 Data Presentation Tools

16 Advantages
Low startup effort
  Can leverage robust partner network
Easy to make changes or additions
  Wide selection of tools
  Can build temporary presentation views before committing to building full ETL

17 Questions eric.ness@fieldnation.com

