Maximize WebFOCUS Performance with Hyperstage. Lori Pieper, Louisville User Group Meeting, April 25, 2012.


Agenda
- The "Big Data" Business Challenge
- Pivoting Your Perspective
- Introducing WebFOCUS Hyperstage
- How Does It Work?
- So What's the Big Deal?
- Demonstration
- Wrap-Up and Q&A

Copyright 2007, Information Builders. The "Big Data" Business Challenge

Traditional Data Warehousing
- Labor intensive: heavy indexing, aggregations, and partitioning
- Hardware intensive: massive storage, big servers
- Expensive and complex

Data Warehousing Challenges
- More data, more data sources: real-time data, multiple databases, external sources
- More kinds of output needed by more users, more quickly
- Limited resources and budget

How Performance Issues Are Typically Addressed, by Pace of Data Growth
When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem. IT managers try to mitigate these response times.
Source: Keeping Up with Ever-Expanding Enterprise Data (Joseph McKendrick, Unisphere Research, October 2010)

Limitations of "Traditional" Solutions
Adding indexes:
- Increases disk space requirements; the sum of the index space requirements can even exceed the size of the source database
- Adds index management overhead and increases load times to build the indexes
- Predefines a fixed access path: reports run slowly if you haven't "anticipated" the reporting needs correctly

Limitations of "Traditional" Solutions
Building OLAP cubes:
- Cube technology has limited scalability: the number of dimensions and the amount of data are limited
- Cubes are difficult to update; adding a dimension usually requires a complete rebuild
- Cube builds are typically slow, and a new design results in a new cube
- Reports run slowly if you haven't "anticipated" the reporting needs correctly

Pivoting Your Perspective: Turn Row-Based into Column-Based

Why Is Row-Based Limiting for Analytics?
The ubiquity of rows: row-based databases are ubiquitous because so many of our most important business systems are transactional. Row-oriented databases are well suited to transactional environments, such as a call center where a customer's entire record is required when their profile is retrieved and where fields are frequently updated.
But for analytics, disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query. Picture a table of 30 columns and 50 million rows: every query must read all 30 columns.

Why Is Column-Based Perfect for Analytics?

Employee table:
Id  Name    Location  Sales
1   Smith   New York  50,000
2   Jones   New York  65,000
3   Fraser  Boston    40,000
4   Fraser  Boston    70,000

Row oriented: (1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
- Works well if all the columns are needed for every query
- Efficient for transactional processing, where all the data for the row is available

Column oriented: (1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
- Works well with aggregate results (sum, count, avg)
- Only the columns that are relevant need to be touched
- Consistent performance with any database design
- Allows for very efficient compression
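The row-versus-column trade-off above can be sketched in a few lines of Python (an illustrative sketch, not Hyperstage code): summing the Sales column in a row store touches every field of every record, while a column store reads just one array.

```python
# Illustrative sketch: the same four employee records stored row-wise and
# column-wise. Summing Sales touches every field in the row store, but
# only one list in the column store.

rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# Column store: one sequence per attribute.
columns = {
    "id":       [r[0] for r in rows],
    "name":     [r[1] for r in rows],
    "location": [r[2] for r in rows],
    "sales":    [r[3] for r in rows],
}

# Row-oriented aggregate: every tuple (all four columns) must be read.
total_row_store = sum(r[3] for r in rows)

# Column-oriented aggregate: only the 'sales' column is read.
total_col_store = sum(columns["sales"])

assert total_row_store == total_col_store == 225000
```

On disk the difference is magnified: with 30 columns, a row store reads roughly 30 times more data than the one column an aggregate actually needs.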

Why Is Column-Based Perfect for Analytics?

Data stored in rows:
1, Smith, New York, 50,000
2, Jones, New York, 65,000
3, Fraser, Boston, 40,000
4, Fraser, Boston, 70,000

Data stored in columns:
1; 2; 3; 4
Smith; Jones; Fraser; Fraser
New York; New York; Boston; Boston
50,000; 65,000; 40,000; 70,000

Introducing Hyperstage

Introducing WebFOCUS Hyperstage
Hyperstage is a high-performance analytic data store designed to handle business-driven queries on large volumes of data, with minimal IT intervention, achieving outstanding query performance with less hardware, no database tuning, and easy migration.

Introducing WebFOCUS Hyperstage: But Really, What Is It?
Easy to implement and manage, Hyperstage provides the answers to your business users' needs at a price you can afford.

Introducing WebFOCUS Hyperstage: How Is It Architected?
Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses. The Hyperstage engine comprises the Knowledge Grid, a Compressor, and a Bulk Loader.
Unmatched administrative simplicity: no indexes, no data partitioning, no materialized views.

Introducing WebFOCUS Hyperstage: How Is It Architected?
Hyperstage adds data compression of 10:1 to 40:1, so you can manage large amounts of data with a much smaller disk footprint.
Powerful data compression: store terabytes of data in only gigabytes of disk space.
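Hyperstage's compressors are proprietary, but the effect of columnar storage on compressibility is easy to demonstrate with a generic algorithm such as zlib (a stand-in here, not the engine's actual algorithm): a low-cardinality column of repeated city names shrinks dramatically when stored contiguously.

```python
import zlib

# A "city" column of 65,536 values drawn from a handful of distinct cities,
# laid out as a columnar engine would store them: contiguously, one column
# at a time (synthetic data for illustration).
cities = ["New York", "Boston", "Louisville", "Chicago"] * 16384
raw = "\n".join(cities).encode("utf-8")

# Compress the whole column with a generic algorithm (zlib, max effort).
packed = zlib.compress(raw, 9)
ratio = len(raw) / len(packed)
print(f"raw={len(raw):,} bytes packed={len(packed):,} bytes ratio={ratio:.0f}:1")
```

With real mixed-type rows interleaving names, numbers, and dates, the same compressor finds far fewer repeats; that difference is one reason column stores routinely report double-digit ratios.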

Introducing WebFOCUS Hyperstage: How Is It Architected?
Hyperstage adds a bulk loader plus an easy-to-use extraction and load tool, called HyperCopy, making data loading a breeze.
Includes embedded ETL: easy and seamless migration of existing analytical databases, with no change in queries or applications required.

How Does It Work?

How Does It Work?
The WebFOCUS Hyperstage engine:
- Column orientation
- Data Packs: data stored in manageably sized, highly compressed packs, using compression algorithms tailored to the data type
- Knowledge Grid: statistics and metadata "describing" the super-compressed data
Smarter architecture: no maintenance, no query planning, no partition schemes; easy "load and go".

Data Organization and the Knowledge Grid: 64K Data Packs and Compression
64K Data Packs:
- Each data pack contains 65,536 data values
- Compression is applied to each individual data pack
- The compression algorithm varies depending on data type and data distribution
Compression (patent-pending algorithms):
- Results vary depending on the distribution of data among data packs
- A typical overall compression ratio seen in the field is 10:1
- Some customers have seen results as high as 40:1

The Knowledge Grid (this layer is roughly 1% of the compressed data volume):
- Data Pack Nodes (DPNs): a separate DPN is created for every data pack in the database, storing basic statistical information
- Character Maps (CMAPs): every data pack that contains text gets a matrix recording the occurrence of every possible ASCII character
- Histograms: created for every data pack that contains numeric data, dividing its range into 1,024 min-max intervals
- Pack-to-Pack Nodes (PPNs): track relationships between data packs when tables are joined; query performance gets better as the database is used
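A minimal sketch of DPN-style statistics can make the idea concrete (the names and structure below are my own, inferred from the description above, not Information Builders' code): split a numeric column into 65,536-value packs and record per-pack min/max, the tiny metadata the engine consults before ever touching compressed data.

```python
# Hypothetical sketch of Data Pack Node (DPN) statistics for one numeric
# column: per-pack min/max/row-count recorded at load time.
PACK_SIZE = 65536  # each data pack holds 65,536 values

def build_dpns(values):
    """Split a column into packs and record min/max/row-count per pack."""
    dpns = []
    for i in range(0, len(values), PACK_SIZE):
        pack = values[i:i + PACK_SIZE]
        dpns.append({"min": min(pack), "max": max(pack), "rows": len(pack)})
    return dpns

# Three packs of synthetic, monotonically increasing salaries.
salaries = list(range(20000, 20000 + 3 * PACK_SIZE))
dpns = build_dpns(salaries)
print(dpns[0])  # statistics for the first pack
```

Three dictionaries stand in for 196,608 values, which is why the Knowledge Grid can stay near 1% of the compressed volume.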

WebFOCUS Hyperstage Example: Query and the Knowledge Grid
SELECT COUNT(*) FROM employees WHERE salary > … AND age < 65 AND job = 'Shipping' AND city = 'Louisville';
Using only the Knowledge Grid, each data pack of the salary, age, job, and city columns is classified as Completely Irrelevant, Suspect, or All values match.

WebFOCUS Hyperstage Example: salary
1. Find the Data Packs with salary > …
Each salary pack is marked Completely Irrelevant, Suspect, or All values match.

WebFOCUS Hyperstage Example: age < 65
1. Find the Data Packs with salary > …
2. Find the Data Packs that contain age < 65

WebFOCUS Hyperstage Example: job = 'Shipping'
1. Find the Data Packs with salary > …
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'

WebFOCUS Hyperstage Example: city = 'Louisville'
1. Find the Data Packs with salary > …
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Louisville'

WebFOCUS Hyperstage Example: Eliminate Pack Rows
1. Find the Data Packs with salary > …
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Louisville'
5. Eliminate all rows that have been flagged as irrelevant
At this point most packs across the salary, age, job, and city columns are marked "all packs ignored."

WebFOCUS Hyperstage Example: Decompress and Scan
1. Find the Data Packs with salary > …
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'Louisville'
5. Eliminate all rows that have been flagged as irrelevant
6. Finally, identify the pack that needs to be decompressed
All other packs are ignored; only this one Suspect pack will be decompressed and scanned.
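The six steps above reduce to one classification rule applied per pack (my own sketch, using only min/max statistics and an assumed example threshold of 50000, since the actual salary value is missing from the transcript): a pack is Completely Irrelevant, All values match, or Suspect, and only Suspect packs are ever decompressed.

```python
# Sketch of per-pack pruning for the predicate "salary > threshold", using
# only a pack's min/max statistics (threshold 50000 is an assumed example).

def classify(pack_min, pack_max, threshold):
    """Classify one data pack against 'value > threshold'."""
    if pack_max <= threshold:
        return "Completely Irrelevant"  # no value in the pack can qualify
    if pack_min > threshold:
        return "All values match"       # count rows without decompressing
    return "Suspect"                    # must be decompressed and scanned

# Three salary packs described only by (min, max).
packs = [(20000, 45000), (30000, 90000), (60000, 120000)]
labels = [classify(lo, hi, 50000) for lo, hi in packs]
print(labels)  # only the Suspect pack is ever decompressed
```

For a COUNT(*), "All values match" packs contribute their full row counts straight from the metadata, so I/O is spent only on the Suspect packs.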

Hyperstage: So What's the Big Deal?

WebFOCUS Hyperstage: The Big Deal
"Load and go":
- No indexes
- No partitions
- No views
- No materialized aggregates
Value proposition:
- Low IT overhead
- Reduced I/O = faster response times
- Ease of implementation; fast time to market
- Less hardware; lower TCO

Some Real-World Results
Insurance company:
- Query performance issues with SQL Server (insurance claims analysis)
- Compression achieved: 40:1
- Most queries run 3x faster in Hyperstage
Large bank:
- Query performance issues with SQL Server (web traffic analysis)
- Compression achieved: 10:1
- Queries that ran for 10 to 15 minutes in SQL Server ran in sub-seconds in Hyperstage
Government application:
- Query performance issues with Oracle (federal loan/grant tracking)
- Compression achieved: 15:1
- Queries that ran for 10 to 15 minutes in Oracle ran in 30 seconds in Hyperstage

Demonstration

Q&A