Hadoop and Spark Dynamic Data Models Amila Kottege Software Developer


1 Hadoop and Spark: Dynamic Data Models
Amila Kottege, Software Developer
Ontario Teachers' Pension Plan

2 Agenda
What we do
What we're building
How we're building it

3 What we do
Asset Liability Model: a Monte Carlo simulation that projects the pension's liabilities. It simulates ~300 variables and projects them into the future.
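The deck doesn't show the model itself; as a rough illustration of the Monte Carlo idea only, here is a toy sketch in plain Python. The variable, its dynamics, and all parameters are hypothetical stand-ins, not the actual Asset Liability Model (which simulates ~300 variables, not one):

```python
import random

def simulate_liabilities(n_trials=1000, n_years=30, seed=42):
    """Toy Monte Carlo projection: each trial evolves one liability value
    forward year by year under random growth shocks. The real model
    simulates ~300 variables; this sketch uses a single one."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        liability = 100.0  # hypothetical starting liability, arbitrary units
        path = []
        for _ in range(n_years):
            growth = rng.gauss(0.03, 0.08)  # hypothetical drift/volatility
            liability *= (1.0 + growth)
            path.append(liability)
        results.append(path)
    return results

paths = simulate_liabilities()
final_values = [p[-1] for p in paths]
mean_final = sum(final_values) / len(final_values)
```

The real simulation distributes this kind of work and takes about 1.5 hours per run; the sketch only shows the trial/projection structure.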

4 What we do
A simulation takes about 1.5 hours. The business expects to be able to analyze the results immediately afterward, and it runs 5000+ simulations a year.

5 What we're building
A reporting system to help the business perform analysis: a reporting engine based on the Hadoop ecosystem (HDFS, Spark, Hive), plus a set of reusable calculations and algorithms in Spark (common statistical calculations and specific business calculations).

6 What we're building
Two main report types. Static (canned) reports: users provide inputs and configure canned reports. Dynamic reports: users want exploratory reports, self-serve, with the ability to manipulate the data themselves.

7 (diagram slide)

8 (pipeline diagram)
Calculation 1 → Output of Calculation 1
Calculation 2 → Output of Calculation 2
Calculation 3 → Output of Calculation 3
Output Combiner
Calculation 4 → Output of Calculation 4
Calculation 5 → Output of Calculation 5
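One way to read the pipeline above, independent calculations each producing an output that a combiner merges, is the following minimal sketch. The function names and data shapes are illustrative assumptions, not the production Spark code:

```python
def calc(name, data):
    # Stand-in for one real calculation: tag and transform the input.
    return {"source": name, "result": [x * 2 for x in data]}

def output_combiner(outputs):
    # Merge the independent calculation outputs into one result set,
    # ready to feed downstream calculations.
    combined = []
    for out in outputs:
        combined.extend(out["result"])
    return combined

data = [1, 2, 3]
outputs = [calc(f"Calculation {i}", data) for i in (1, 2, 3)]
merged = output_combiner(outputs)  # -> [2, 4, 6, 2, 4, 6, 2, 4, 6]
```

In the real system each `calc` would be a Spark job and the combiner's output would itself be input to further calculations.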

9 What we're building
Static reports are simple: perform calculations based on user input and produce an Excel file with the results. Dynamic reporting is difficult, and self-serve is difficult: how do we provide a simple interface for the business to analyse the results of the calculations in a self-serve manner?

10 What we're building
Self-serve, for us, means performing the complex calculations upon user request, generating new data, and allowing the business to slice and dice this newly created data, which sometimes includes raw output from the simulation.

11 What we're building
We looked at many self-serve BI tools: Tableau, QlikView, and Power Pivot. Each has its benefits, but all required a well-built data model, and each either loaded the whole data model to the client side or sent queries back to the server every time a filter changed.

12 (diagram slide)

13 What we're building
The data size is too large to fit on a client computer, and constantly sending queries back and forth is not the best user experience. Changing a large data model is a very difficult and slow process. Does the user even need all the data, from all previous reports?

14 How we're building it
No, the user does not need all the data; very few cases, if any, exist where they want all of it. Picking one tool for everything is difficult, so we use the correct tool when needed.

15 (diagram slide)

16 How we're building it
Each report becomes its own database (Hadoop + Hive). Databases in Hive exist upon query, so the effect on us is minimal.
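A minimal sketch of the per-report-database idea, using Hive's standard `CREATE DATABASE` / `CREATE TABLE` DDL. The report naming scheme and table layout below are hypothetical, not the deck's actual schema:

```python
def report_ddl(report_id):
    """Generate HiveQL that gives one report its own database.
    Creating a Hive database is a cheap metastore operation, which is
    why 'each report becomes its own database' costs little."""
    db = f"report_{report_id}"
    return [
        f"CREATE DATABASE IF NOT EXISTS {db}",
        f"CREATE TABLE IF NOT EXISTS {db}.fact_results "
        f"(sim_id INT, variable STRING, value DOUBLE)",
    ]

statements = report_ddl(42)
```

Because each report's tables live in their own Hive database, a BI tool can be pointed at exactly one report's data rather than one giant shared model.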

17 How we're building it

18 How we're building it

19 How we're building it
No magic here: Spark's DataFrames. Each calculation/report has a predictable output structure, and we leverage this structure to create facts and dimensions.
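As a rough sketch of deriving facts and dimensions from a predictable output structure, here is the idea in plain Python standing in for Spark DataFrame operations. The column names and rows are hypothetical:

```python
def build_star_schema(rows):
    """Split flat calculation output into a dimension table (unique
    variable names with surrogate keys) and a fact table that
    references the dimension by key."""
    dim = {}      # variable name -> surrogate key
    facts = []
    for row in rows:
        var = row["variable"]
        if var not in dim:
            dim[var] = len(dim) + 1
        facts.append({
            "variable_key": dim[var],
            "sim_id": row["sim_id"],
            "value": row["value"],
        })
    dim_table = [{"variable_key": k, "variable": v} for v, k in dim.items()]
    return facts, dim_table

rows = [
    {"sim_id": 1, "variable": "inflation", "value": 0.02},
    {"sim_id": 1, "variable": "equity_return", "value": 0.07},
    {"sim_id": 2, "variable": "inflation", "value": 0.03},
]
facts, dims = build_star_schema(rows)
```

In Spark this would be DataFrame transformations (e.g. selecting distinct values for the dimension and joining the key back onto the facts) rather than Python loops, but the fact/dimension split is the same.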

20 How we're building it
Data models can grow with no dependency on the past. We are not tied to a single tool (Tableau, QlikView, PowerPivot, etc.), and the system does most of the hard work (Spark, Hive, HDFS).

21 Where we are
We generate data models per report, and generate an Excel file that connects to the correct database. Currently in UAT.

22 Thank you.

