Presentation is loading. Please wait.

Presentation is loading. Please wait.

Building Scalable OLAP Applications with Mondrian and MySQL

Similar presentations


Presentation on theme: "Building Scalable OLAP Applications with Mondrian and MySQL"— Presentation transcript:

1 Building Scalable OLAP Applications with Mondrian and MySQL
Julian Hyde: Chief Architect, OLAP, at Pentaho and Mondrian Project Founder Tuesday, April 24th 2007

2 Agenda Pentaho Introduction What is OLAP?
Key features and architecture Getting started Schemas & queries Tuning & scalability Active Data Warehousing Case Studies Business Intelligence suite Q & A

3 Pentaho Introduction World’s most popular enterprise open source BI Suite 2 million lifetime downloads, averaging 100K / month Founded in 2004: Pioneer in professional open source BI Management - proven BI and open source veterans from Business Objects, Cognos, Hyperion, JBoss, Oracle, Red Hat, SAS Board of Directors – deep expertise and proven success in open source Larry Augustin - founder, VA Software, helped coin the phrase “open source” New Enterprise Associates – investors in SugarCRM, Xensource, others Index Ventures – investors in MySQL, Zend, others Widely recognized as the leader in open source BI Distributed worldwide by Red Hat via the Red Hat Exchange Embedded in next release of OpenOffice (40 million users worldwide) Pentaho makes the world’s most popular open source BI suite. In terms of the team, they’ve brought together experienced open source veterans like myself and others with a number of people on the engineering, support, and marketing side with a lot of experience in business intelligence. Between our customers, our projects, and our company, we’ve won a number of awards, most recently the Jolt Productivity award. We’re working with partners to rapidly expand the usage of open source in the BI market, with a global distribution agreement with Red Hat, and just last week we announced with Sun that Pentaho Reporting will be included in the next release of OpenOffice.

4 What is OLAP? View data “dimensionally” Navigate and explore
i.e. Sales by region, by channel, by time period Navigate and explore Ad Hoc analysis “Drill-down” from year to quarter Pivot Select specific members for analysis Interact with high performance Technology optimized for rapid interactive response

5 A brief history of OLAP 2010 2000 1990 1980 1970 Pentaho BEE
Open-source BI suites JRubik OpenI JPivot Palo Open-source OLAP Mondrian 2000 MySQL Microsoft OLAP Services Oracle partitions, bitmap indexes RDBMS support for DW Informix MetaCube Codd’s “12 Rules of OLAP” Sybase IQ Business Objects 1990 Desktop OLAP MicroStrategy Essbase Cognos PowerPlay – first desktop OLAP MicroStrategy – first pure ROLAP (no MD server) Business Objects Cognos PowerPlay Spreadsheets 1980 Comshare Mainframe OLAP Express 1970 APL

6 Mondrian features and architecture

7 Key Features On-Line Analytical Processing (OLAP) cubes
automated aggregation speed-of-thought response times Open Architecture 100% Java J2EE Supports any JDBC data source MDX and XML/A Analysis Viewers Enables ad-hoc, interactive data exploration Ability to slice-and-dice, drill-down, and pivot Provides insights into problems or successes

8 How Mondrian Extends MySQL for OLAP Applications
MySQL Provides Data storage SQL query execution Heavy-duty sorting, correlation, aggregation Integration point for all BI tools Mondrian Provides Dimensional view of data MDX parsing SQL generation Caching Higher-level calculations Aggregate awareness

9 Open Architecture Open Standards (Java, XML, MDX, XML/A, SQL)
Microsoft Excel (via Spreadsheet Services) Open Architecture Viewers Open Standards (Java, XML, MDX, XML/A, SQL) Cross Platform (Windows & Unix/Linux) J2EE Architecture Server Clustering Fault Tolerance Data Sources JDBC JNDI Web Server JPivot servlet J2EE Application Server File or RDBMS Repository JPivot servlet XML/A servlet Mondrian Cube Schema XML Cube Schema XML Cube Schema XML cube cube cube JDBC JDBC JDBC RDBMS RDBMS

10 Getting started

11 Getting started with Mondrian
Download mondrian from and unzip Copy mondrian.war to TOMCAT_HOME/webapps. Create a database: $ mysqladmin create foodmart $ mysql mysql> grant all privileges on *.* to identified by 'foodmart'; Query OK, 0 rows affected (0.00 sec) mysql> quit Bye Install demo dataset (300,000 rows): $ java -cp "/mondrian/lib/mondrian.jar:/mondrian/lib/log4j jar:/mondrian/lib/eigenbase-xom.jar:/mondrian/lib/eigenbase-resgen.jar:/mondrian/lib/eigenbase-properties.jar:/usr/local/mysql/mysql-connector-java bin.jar"      mondrian.test.loader.MondrianFoodMartLoader      -verbose -tables -data -indexes      -jdbcDrivers=com.mysql.jdbc.Driver      -inputFile=/mondrian/demo/FoodMartCreateData.sql      -outputJdbcURL= "jdbc:mysql://localhost/foodmart?user=foodmart&password=foodmart“ Start tomcat, and hit

12 demonstration

13 Schemas and queries

14 A Mondrian schema consists of…
A dimensional model (logical) Cubes & virtual cubes Shared & private dimensions Calculated measures in cube and in query language Parent-child hierarchies … mapped onto a star/snowflake schema (physical) Fact table Dimension tables Joined by foreign key relationships

15 Writing a Mondrian Schema
Regular cubes, dimensions, hierarchies Shared dimensions Virtual cubes Parent-child hierarchies Custom readers Access-control <!-- Shared dimensions --> <Dimension name="Region"> <Hierarchy hasAll="true“ allMemberName="All Regions"> <Table name="QUADRANT_ACTUALS"/> <Level name="Region" column="REGION“ uniqueMembers="true"/> </Hierarchy> </Dimension> <Dimension name="Department"> <Hierarchy hasAll="true“ allMemberName="All Departments"> <Level name="Department“ column="DEPARTMENT“ uniqueMembers="true"/> (Refer to )

16 Tools Schema Workbench Pentaho cube designer cmdrunner

17 MDX – Multi-Dimensional Expressions
A language for multidimensional queries Plays the same role in Mondrian’s API as SQL does in JDBC SQL-like syntax … but un-SQL-like semantics SELECT {[Measures].[Unit Sales]} ON COLUMNS, {[Store].[USA], [Store].[USA].[CA]} ON ROWS FROM [Sales] WHERE [Time].[1997].[Q1] (Refer to )

18 Scalability & Tuning

19 Tuning is a Process Some techniques:
Test performance on a realistic data set! Use tracing to identify long-running SQL statements mondrian.trace.level=2 Tune SQL queries Run the same query several times, to examine cache effectiveness When Mondrian starts, prime the cache by running queries Get to know Mondrian’s system properties

20 Tuning MySQL for Mondrian
Fact table: Index foreign keys Try indexing multiple combinations of foreign keys Use MyISAM or MERGE (also known as MRG_MyISAM) Partitioned tables Dimension tables: Index primary and foreign keys Use MyISAM for large dimension tables, MEMORY for smaller dimension tables Create aggregate tables: Contain pre-computed summaries Prevent full table scan followed by massive GROUP BY Parent-child hierarchies Create closure tables

21 Scaling to large datasets
Tune MySQL Partitions allow parallelism Aggregate tables allow you to do the hard work to load-time, not runtime Increase cache size

22 Scaling to large numbers of concurrent users
J2EE Application Server Mondrian Web Server cube RDBMS JDBC JPivot servlet Viewers XML/A servlet J2EE Application Server Mondrian cube RDBMS JDBC Viewers JPivot servlet Tune MySQL Increase JVM memory Make sure that connections are using the same cache Multiple web servers Multiple mondrian instances Multiple DBMS instances with read-only copies of the data J2EE Application Server Mondrian Web Server cube JDBC JPivot servlet XML/A servlet RDBMS Viewers J2EE Application Server Mondrian Web Server cube JDBC JPivot servlet XML/A servlet RDBMS

23 (Diagrams -- hidden) RDBMS RDBMS RDBMS RDBMS J2EE Application Server
Mondrian Web Server cube RDBMS JDBC JPivot servlet Viewers XML/A servlet J2EE Application Server Mondrian cube RDBMS JDBC Viewers JPivot servlet J2EE Application Server Mondrian Web Server cube JDBC JPivot servlet XML/A servlet RDBMS Viewers Viewers J2EE Application Server Mondrian Web Server cube JDBC JPivot servlet XML/A servlet RDBMS

24 Active Data Warehousing

25 Active Data Warehousing
An Active Data Warehouse: Extends the traditional DW to support operational processing Provides a single view of the business Is updated in near real-time Cost/benefit: High cost (technical challenges) High reward (time is money)

26 Technical challenges of Active Data Warehousing
DBMS has mixed work-load Short, fast queries to query specific records Longer queries to look at changing patterns Emphasis on recent data Background DML to keep database up to date Continuous, incremental ETL Continuous or near real-time Incremental Consider message-driven ETL OLAP cache OLAP servers use cache and/or MOLAP database for performance Cache consistency

27 Mondrian and Active Data Warehouse
“ROLAP with caching” No MOLAP data store to maintain Disable aggregate tables Unless your ETL process can maintain these incrementally Either turn off cache mondrian.rolap.star.disableCaching=true “Pure ROLAP” mode Results are cached for the lifetime of a single MDX statement Or use new CacheControl API to selectively flush mondrian’s cache

28 Cache control In an Active Warehouse, the most recent data are modified much more often than historic data Cache control API allows the application to selectively flush the cache year quarter month day unit_sales 2007 Q1 1 1500 2 1700 3 1250 4 1800 Q2 22 1378 23 1920 24 350 2007 Q2 4 24 450 2007 Q2 4 24 525 (Refer to )

29 Cache control: A simple example
Let’s run a simple MDX query: SELECT {[Time].[1997],  [Time].[1997].[Q1], [Time].[1997].[Q2]} ON COLUMNS, {[Customers].[USA],  [Customers].[USA].[OR],  [Customers].[USA].[WA]} ON ROWS FROM [Sales]

30 Cache contents after running query
year=1997, nation=USA, measure=unit_sales year=1997, nation=USA, state={OR, WA}, measure=unit_sales year nation state unit_sales 1997 USA OR 67,659 WA 124,366 year nation unit_sales 1997 USA 266,773 year=1997, quarter=any, nation=USA, measure=unit_sales year=1997, quarter=*, nation=USA, state={OR, WA}, measure=unit_sales year quarter nation unit_sales 1997 Q1 USA 66,291 Q2 62,610 year quarter nation state unit_sales 1997 Q1 USA OR 19,287 WA 30,114 Q2 15,079 29,479

31 Flushing part of a segment
Cache manager OR WA Q1 19,287 30,114 Q2 15,079 29,479 year=1997, quarter=*, nation=USA, state={OR, WA}, measure=unit_sales year=1997, quarter=*, nation=USA, state={OR, WA}, measure=unit_sales; exclude: state=WA and quarter=Q2 year quarter nation state unit_sales 1997 Q1 USA OR 19,287 WA 30,114 Q2 15,079 29,479 year=1997, quarter=*, nation=USA, state={CA, WA}, measure=unit_sales CA WA Q1 16,890 30,114 Q2 18,052 30,079

32 Code using CacheControl to flush part of a segment
Connection connection; CacheControl cacheControl = connection.getCacheControl(null); Cube salesCube = connection.getSchema().lookupCube("Sales", true); SchemaReader schemaReader = salesCube.getSchemaReader(null); // Create region containing [Customer].[USA].[OR]. Member memberOregon = schemaReader.getMemberByUniqueName( new String[]{“Customer", “USA", “OR"}, true); CacheControl.CellRegion regionOregon = cacheControl.createMemberRegion(memberOregon, true); Member memberTimeQ2 = schemaReader.getMemberByUniqueName( new String[]{“Time", “1997", “Q2"}, true); CacheControl.CellRegion regionTimeQ2 = cacheControl.createMemberRegion(memberTimeQ2, true); // Create region containing ([Customer].[USA].[OR], [Time].[1997].[Q2]) CacheControl.CellRegion regionOregonQ2 = cacheControl.createCrossjoinRegion(regionOregon, regionTimeQ2); cacheControl.flush(regionOregonQ2);

33 Case studies

34 Case Study: Frontier Airlines
Key Challenges Understanding and optimizing fares to ensure Maximum occupancy (no empty seats) Maximum profitability (revenue per seat) Pentaho Solution Pentaho Analysis (Mondrian) Chose Open Source RDBMS and Mondrian over Oracle 500 GB of data, 6 server cluster Results Comprehensive, integrated analysis to set strategic pricing Improved per-seat profitability (amount not disclosed) Why Pentaho Rich analytical and MDX functionality Cost of ownership “The competition is intense in the airline industry and Frontier is committed to staying ahead of the curve by leveraging technology that will help us offer the best prices and the best flight experience…. [the application] fits right in with our philosophy of providing world-class performance at a low price.”

35 Case Study: U.S.-based, Public Software Company
OEM Example Key Requirements Embedding analysis within a pricing planning application Incumbent Vendor MicroStrategy Decision Criteria Embeddability Extensibility Cost of ownership Best Practices Evaluate replacement costs holistically Treat migrations as an opportunity to improve a deployment, not just move it Understand and prioritize requirements – functionality, usability, architecture, etc. “The old solution (MicroStrategy) was built to be a standalone application. It expected to own the browser session, to have separate security, and its own user interface standards. We were able to make Mondrian a seamless part of the application, and it was much easier to do so.”

36 The big picture

37 Business Intelligence Suite
Mondrian OLAP Analysis tools: Pivot table Charting Dashboards ETL (extract/transform/load) Integration with operational reporting Integration with data mining Actions on operational data Design/tuning tools

38 Pentaho Open Source BI Offerings
All available in a Free Open Source license

39 A Sample of Joint MySQL-Pentaho Users
“Pentaho provided a robust, open source platform for our sales reporting application, and the ongoing support we needed. The experts at OpenBI provided outstanding services and training, and allowed us to deploy and start generating results very quickly.” “We selected Pentaho for its ease-of-use. Pentaho addressed many of our requirements -- from reporting and analysis to dashboards, OLAP and ETL, and offered our business users the Excel-based access that they wanted.”

40 Next Steps and Resources
Contact Information Julian Hyde, Chief Architect, More information and Pentaho Community Forum Go to Developer Zone Discussions Pentaho BI Platform including Mondrian Mondrian OLAP Library only

41 Birds of a Feather session: MySQL Data Warehousing and BI
Speakers: James Dixon, Senior Architect and Chief Technology (a.k.a. "Chief Geek"), Pentaho Corporation Roland Bouman, Certification Developer, MySQL A.B. Matt Casters, Data Integration Architect and Kettle Project Founder, Pentaho Julian Hyde, OLAP Architect and Mondrian Project Founder, Pentaho Brian Miezejewski, Principal Consultant, MySQL inc. Track: Data Warehousing and Business Intelligence Date: TONIGHT!! Time: 7:30pm - 9:00pm Location: Camino Real

42 Thank you for attending!


Download ppt "Building Scalable OLAP Applications with Mondrian and MySQL"

Similar presentations


Ads by Google