Maximize WebFOCUS Performance with Hyperstage

1 Maximize WebFOCUS Performance with Hyperstage
Apr, 2012

2 Agenda
Introduction to Hyperstage
How does it work
Recent results
Demonstration
Wrap Up and Q&A

3 Introducing Hyperstage

4 WebFOCUS Hyperstage Why?
Why Do BI Applications Fail? Typically 3 Reasons:
1. Too Complicated: Self-Service, Guided Ad hoc
2. Bad Data: Data Quality
3. Too Slow: Hyperstage
Hyperstage will improve database performance for WebFOCUS applications with less hardware, no database tuning and easy migration.

5 What is WebFOCUS Hyperstage
Embedded, columnar data store that can dramatically increase the performance of WebFOCUS applications
Columnar = reduced I/O (vs relational)
Easily implemented without the need for database administration
Disk footprint is reduced with a powerful compression algorithm
Includes embedded ETL for seamless migration of existing analytical databases
No change in query or application required
Data migrations are seamless and easy
WF M and higher includes optimized Hyperstage Adapter
Runs on commodity hardware (Intel based)
  Windows 64
  Linux (Redhat, Centos, Suse, Debian)

6 Introducing WebFOCUS Hyperstage ….
Hyperstage is an integrated columnar oriented data store that helps WebFOCUS applications achieve outstanding query performance.

7 WebFOCUS Hyperstage Engine
How does it work?
Column Orientation
Smarter Architecture
  No maintenance
  No query planning
  No partition schemes
  No DBA
Knowledge Grid – statistics and metadata “describing” the super-compressed data
Data Packs – data stored in manageably sized, highly compressed data packs
Data compressed using algorithms tailored to data type

8 Data Organization and the Knowledge Grid …

9 Pivoting Your Perspective: Columnar Technology
Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000

[Figure: the same four records shown twice, once as data stored in rows and once as data stored in columns]

Column-oriented databases allow data to be stored column-by-column rather than row-by-row. This simple pivot in perspective, looking down rather than looking across, has profound implications for analytic speed. Column-oriented databases are better suited for analytics where, unlike transactions, only portions of each record are required. By grouping the data together this way, the database only needs to retrieve columns that are relevant to the query, greatly reducing the overall I/O.

Returning to the example in the section above, we see that a columnar database would not only eliminate 43 days of data, it would also eliminate 28 columns of data. Returning only the columns for toasters and units sold, the columnar database would return only 14 million data elements, or 93% less data. By returning so much less data, columnar databases are much faster than row-based databases when analyzing large data sets.

In addition, some columnar databases compress data at high rates because each column stores a single data type (as opposed to rows, which typically contain several data types), which allows compression to be optimized for each particular data type. Row-based databases have multiple data types and a limitless range of values, which makes compression less efficient overall.
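The I/O argument can be made concrete with a minimal Python sketch. It stores the four employee records from the slide in a row layout and in a column layout, then counts how many values each layout has to touch to total the Sales column. The function names and the "values touched" counter are illustrative assumptions, not part of Hyperstage.

```python
# The four records from the slide, stored row by row.
rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# The same data stored column by column: one list per column.
columns = {
    "employee_id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "location": [r[2] for r in rows],
    "sales": [r[3] for r in rows],
}


def total_sales_row_store(rows):
    """Sum sales from a row layout; every field of every record is read."""
    values_touched = 0
    total = 0
    for row in rows:
        values_touched += len(row)
        total += row[3]
    return total, values_touched


def total_sales_column_store(columns):
    """Sum sales from a column layout; only the sales column is read."""
    sales = columns["sales"]
    return sum(sales), len(sales)


row_total, row_touched = total_sales_row_store(rows)
col_total, col_touched = total_sales_column_store(columns)
print(row_total, row_touched)  # 225000 16
print(col_total, col_touched)  # 225000 4
```

Both layouts give the same total, but the column layout reads a quarter of the values here because the query needs one of four columns.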

10 Data Organization and the Knowledge Grid ….
Data Packs
The data within each column is stored in groupings of 65,536 values called Data Packs.
Data Packs improve data compression because the optimal compression algorithm is applied based on the data contents.
An average compression ratio of 10:1 is achieved after loading data into Hyperstage. For example, 1TB of raw data can be stored in about 100GB of space.
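As a rough sketch of the grouping described above, the Python below splits one column into packs of 65,536 values and applies the quoted 10:1 ratio to a raw size. The helper names are hypothetical, and 1TB is taken as 1000GB to match the slide's "about 100GB".

```python
PACK_SIZE = 65_536  # values per Data Pack, as stated on the slide


def split_into_packs(column_values, pack_size=PACK_SIZE):
    """Group one column's values into consecutive fixed-size packs."""
    return [
        column_values[i:i + pack_size]
        for i in range(0, len(column_values), pack_size)
    ]


def estimated_size_gb(raw_gb, ratio=10.0):
    """Estimate stored size from raw size and an assumed compression ratio."""
    return raw_gb / ratio


column = list(range(200_000))  # made-up column of 200,000 values
packs = split_into_packs(column)
print(len(packs), [len(p) for p in packs])  # 4 [65536, 65536, 65536, 3392]
print(estimated_size_gb(1000))  # 100.0
```

The last pack is simply shorter when the column length is not a multiple of the pack size.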

11 Data Organization and the Knowledge Grid ….
Data Packs and Compression
Data Packs
  Each data pack contains 65,536 data values
  Compression is applied to each individual data pack
  The compression algorithm varies depending on data type and data distribution
Compression
  Results vary depending on the distribution of data among data packs
  A typical overall compression ratio seen in the field is 10:1
  Some customers have seen results as high as 40:1
Patent Pending Compression Algorithms
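The idea of choosing a compression method per pack can be sketched as follows. This is not the Hyperstage algorithm, which the slide describes only as patent pending; it is a stand-in that tries two generic encodings (zlib and a simple run-length encoding) on each pack and keeps whichever is smaller. All function names and the one-byte method tag are assumptions for illustration.

```python
import struct
import zlib


def encode_raw(values):
    """Serialize integers as little-endian 64-bit values (uncompressed form)."""
    return struct.pack(f"<{len(values)}q", *values)


def compress_zlib(values):
    """General-purpose compression, tagged with b'Z'."""
    return b"Z" + zlib.compress(encode_raw(values))


def compress_rle(values):
    """Run-length encoding as (value, run length) pairs, tagged with b'R'."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return b"R" + b"".join(struct.pack("<qI", v, n) for v, n in runs)


def compress_pack(values):
    """Compress one pack with each candidate method and keep the smallest."""
    return min((compress_zlib(values), compress_rle(values)), key=len)


# Two made-up packs with very different value distributions.
constant_pack = [7] * 65_536
varied_pack = [(i * 2654435761) % 1_000_003 for i in range(65_536)]

for name, pack in (("constant", constant_pack), ("varied", varied_pack)):
    blob = compress_pack(pack)
    ratio = len(encode_raw(pack)) / len(blob)
    print(name, "method", chr(blob[0]), f"ratio {ratio:.1f}:1")
```

The constant pack collapses to a single run, while the varied pack is better served by the general-purpose method, which is why results depend on how data is distributed among packs.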

12 Data Organization and the Knowledge Grid ….
The Knowledge Grid
[Figure: Knowledge Nodes for Column A and Column B across Pack Row 1 to Pack Row 6]
Global Knowledge (built during LOAD)
  String and character data
  Numeric data
  Distributions
Dynamic Knowledge (built per-query, e.g. for aggregates, joins)

13 Data Organization and the Knowledge Grid ….
Data Pack Nodes (DPN): A separate DPN is created for every data pack created in the database to store basic statistical information.
Character Maps (CMAPs): Every Data Pack that contains text creates a matrix that records the occurrence of every possible ASCII character.
Histograms: Histograms are created for every Data Pack that contains numeric data, each with 1024 MIN-MAX intervals.
Pack-to-Pack Nodes (PPN): PPNs track relationships between Data Packs when tables are joined. Query performance gets better as the database is used.
This metadata layer = 1% of the compressed volume
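A minimal sketch of three of these structures, under stated assumptions: the slide does not list which statistics a DPN holds, so count, nulls, min, max and sum are assumed here; the CMAP is assumed to record characters per position in the string; and the histogram marks which of 1024 equal-width intervals between the pack's minimum and maximum contain a value. None of the names below come from Hyperstage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPackNode:
    """Assumed basic statistics for one numeric data pack."""
    count: int
    nulls: int
    minimum: Optional[float]
    maximum: Optional[float]
    total: Optional[float]


def build_dpn(values):
    present = [v for v in values if v is not None]
    return DataPackNode(
        count=len(values),
        nulls=len(values) - len(present),
        minimum=min(present) if present else None,
        maximum=max(present) if present else None,
        total=sum(present) if present else None,
    )


def build_histogram(values, intervals=1024):
    """Mark which equal-width intervals between min and max hold a value."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / intervals or 1  # avoid zero width for constant packs
    occupied = [False] * intervals
    for v in present:
        occupied[min(int((v - lo) / width), intervals - 1)] = True
    return occupied


def build_cmap(strings):
    """Record, per character position, which characters occur in the pack."""
    cmap = {}
    for s in strings:
        for pos, ch in enumerate(s):
            cmap.setdefault(pos, set()).add(ch)
    return cmap


def cmap_may_contain(cmap, needle):
    """False means the pack cannot contain needle; True means it might."""
    return all(ch in cmap.get(pos, set()) for pos, ch in enumerate(needle))


salaries = [42000, 51000, None, 38000]  # made-up pack contents
dpn = build_dpn(salaries)
histogram = build_histogram(salaries)
cmap = build_cmap(["Toronto", "Boston"])
print(dpn)
print(sum(histogram), "of", len(histogram), "intervals occupied")
print(cmap_may_contain(cmap, "Toronto"), cmap_may_contain(cmap, "Ottawa"))
```

A character map like this can only rule a pack out. A positive answer still leaves the pack as a candidate that may need to be scanned.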

14 How does it work …

15 WebFOCUS Hyperstage Example: Query and Knowledge Grid
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';
[Figure: data packs for the columns salary, age, job and city. Legend: All values match, Completely Irrelevant, Suspect]
Let's examine a simple query that counts the employees with a salary greater than 50000, age less than 65 and a job of Shipping in the city of Toronto. The data consists of 4 columns and 4 sets of data packs. The first thing Hyperstage does is use the information in the knowledge grid to determine which data packs need to be accessed to answer the query.

16 WebFOCUS Hyperstage Example: salary > 50000
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';
[Figure: data packs for the columns salary, age, job and city. Legend: All values match, Completely Irrelevant]
Find the Data Packs with salary > 50000
Using “salary > 50000”, Hyperstage determines that the 1st, 2nd and 4th data packs for the SALARY column do not have values > 50000. These data packs can be ignored. In the 3rd data pack all of the values are > 50000.
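The three pack states for a predicate such as salary > 50000 can be derived from each pack's minimum and maximum alone, without touching the compressed data. The sketch below assumes the knowledge grid exposes a min and max per pack; the salary ranges are made up so that the result matches the slide (packs 1, 2 and 4 irrelevant, pack 3 all matching).

```python
from enum import Enum


class PackStatus(Enum):
    IRRELEVANT = "completely irrelevant"
    SUSPECT = "suspect"
    ALL_MATCH = "all values match"


def classify_greater_than(pack_min, pack_max, threshold):
    """Classify one pack for the predicate `value > threshold`."""
    if pack_max <= threshold:
        return PackStatus.IRRELEVANT  # no value can satisfy the predicate
    if pack_min > threshold:
        return PackStatus.ALL_MATCH   # every value satisfies the predicate
    return PackStatus.SUSPECT         # must decompress to know which rows match


# Hypothetical (min, max) salary statistics for the four SALARY packs.
salary_pack_ranges = [(20000, 45000), (30000, 50000), (52000, 90000), (18000, 41000)]
statuses = [classify_greater_than(lo, hi, 50000) for lo, hi in salary_pack_ranges]
for number, status in enumerate(statuses, start=1):
    print(f"salary pack {number}: {status.value}")
```

A pack whose range straddles the threshold, for example (40000, 60000), would come back as suspect.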

17 WebFOCUS Hyperstage Example: age < 65
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';
[Figure: data packs for the columns salary, age, job and city. Legend: All values match, Completely Irrelevant, Suspect]
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Continuing with the next criterion, age < 65, we see that the 1st and 3rd AGE data packs have all values less than 65, while the 2nd and 4th data packs have some values less than 65 and some greater than 65. These data packs could potentially need to be decompressed and scanned to find the exact values.

18 WebFOCUS Hyperstage Example: job = 'Shipping'
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';
[Figure: data packs for the columns salary, age, job and city. Legend: All values match, Completely Irrelevant, Suspect]
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = 'Shipping'
For job = 'Shipping', this example shows that the 1st and 4th data packs for the JOB column have some values that match, and in the 2nd and 3rd data packs all of the values are “Shipping”.

19 WebFOCUS Hyperstage Example: city = 'Toronto'
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';
[Figure: data packs for the columns salary, age, job and city. Legend: All values match, Completely Irrelevant, Suspect]
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = 'Shipping'
Find the Data Packs that have city = 'Toronto'
For city = 'Toronto', no values match in the 1st and 4th data packs, and some of the values equal “Toronto” in the 2nd and 3rd data packs.

20 WebFOCUS Hyperstage Example: Eliminate Pack Rows
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';
[Figure: data packs for the columns salary, age, job and city, with three pack rows marked “All packs ignored”. Legend: All values match, Completely Irrelevant, Suspect]
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = 'Shipping'
Find the Data Packs that have city = 'Toronto'
Eliminate all pack rows that have been flagged as irrelevant
Using the knowledge grid we can now completely eliminate the need to look at these data packs to answer the query.

21 WebFOCUS Hyperstage Example: Decompress and scan
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'Toronto';
[Figure: data packs for the columns salary, age, job and city, with three pack rows marked “All packs ignored” and one pack marked “Only this pack will be de-compressed”. Legend: All values match, Completely Irrelevant, Suspect]
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = 'Shipping'
Find the Data Packs that have city = 'Toronto'
Eliminate all pack rows that have been flagged as irrelevant
Finally we identify the pack that needs to be decompressed
In fact only 1 CITY data pack will need to be decompressed and scanned for Hyperstage to find the count of employees. If all the values in the data pack had matched “Toronto” then the answer would be 65,536 and the answer would have come directly from the knowledge grid in sub-second time. The Hyperstage optimizer will always try to make use of the information in the knowledge grid to ensure high levels of performance.
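The whole walk-through can be condensed into a short sketch. The per-column statuses are the ones narrated on slides 16 to 19; the planning logic is an assumption about how AND-ed predicates combine (one irrelevant pack eliminates its pack row, and only suspect packs in surviving rows need decompressing), and every name is illustrative rather than Hyperstage's own. Packs are assumed to be full, at 65,536 rows each.

```python
IRRELEVANT, SUSPECT, ALL_MATCH = "irrelevant", "suspect", "all match"
PACK_ROWS = 65_536  # assumes every pack is full

# Status of each column's pack in pack rows 1 to 4, as narrated on the slides.
grid = {
    "salary": [IRRELEVANT, IRRELEVANT, ALL_MATCH, IRRELEVANT],
    "age": [ALL_MATCH, SUSPECT, ALL_MATCH, SUSPECT],
    "job": [SUSPECT, ALL_MATCH, ALL_MATCH, SUSPECT],
    "city": [IRRELEVANT, SUSPECT, SUSPECT, IRRELEVANT],
}


def plan_count_query(grid):
    """Return (packs to decompress, row count answered from metadata alone)."""
    pack_row_count = len(next(iter(grid.values())))
    to_decompress = []
    answered_from_grid = 0
    for i in range(pack_row_count):
        statuses = {column: packs[i] for column, packs in grid.items()}
        if IRRELEVANT in statuses.values():
            continue  # predicates are AND-ed, so the whole pack row is skipped
        suspects = [c for c, s in statuses.items() if s == SUSPECT]
        if suspects:
            to_decompress.extend((c, i + 1) for c in suspects)
        else:
            answered_from_grid += PACK_ROWS  # all rows match; no scan needed
    return to_decompress, answered_from_grid


print(plan_count_query(grid))  # ([('city', 3)], 0)

# The "what if" from the slide: every value in the 3rd CITY pack is Toronto.
all_toronto = dict(grid, city=[IRRELEVANT, SUSPECT, ALL_MATCH, IRRELEVANT])
print(plan_count_query(all_toronto))  # ([], 65536)
```

Of the 16 packs in the example, only the CITY pack in pack row 3 is scheduled for decompression, which matches the slide.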

22 POC Results (Internal Use Only)
Insurance Company
  Query performance issues with SQL Server - Insurance claims analysis
  3 day POC - Compression achieved 40:1
  Most queries running 3X faster in Hyperstage
Large Bank
  Query performance issues with SQL Server - Web traffic analysis
  3 day POC - Compression achieved 10:1
  Queries that ran for 10 to 15 mins in SQL Server ran sub-second in Hyperstage
Government Application
  Query performance issues with Oracle - Federal Loan/Grant Tracking
  3 day POC - Compression achieved 15:1
  Queries that ran for 10 to 15 mins in Oracle ran in 30 secs in Hyperstage
POCs can typically be completed within 3 days

23 Beyond WebFOCUS
[Figure: architecture diagram showing WebFOCUS Client, WebFOCUS Reporting Server, WF Hyperstage Adapter and WebFOCUS Hyperstage Server; Java and .Net applications connecting through WF Connector and WF Service; Generic App (Java, C, .Net, PHP, Perl)]
Hyperstage is integrated in the WebFOCUS BI Architecture through the reporting server and is administered using the WebFOCUS console.
WebFOCUS client applications communicate directly through the reporting server.
Custom applications developed via Java or .Net can access the reporting server via WebFOCUS services and a supplied WebFOCUS connector.
Hyperstage also supports connections from any application via industry standard JDBC or ODBC connections. There are also native drivers for .NET, C, or PHP applications to connect directly to the Hyperstage engine.
Data can be loaded and maintained in Hyperstage using iWay Data Integration or using any commercial ETL tool.

24 Hyperstage vs. OLAP
Many companies are looking to migrate from legacy OLAP solutions
Hyperstage can offer excellent query performance with a commonly understood star pattern database
WebFOCUS can offer navigation and drill path navigation
Hyperstage can support large numbers of dimensional attributes and can be easily updated

OLAP                                                 WebFOCUS HyperStage
Limited number of dimensions                         Supports up to 4096 columns on a single table
Difficult to add new dimensions                      Dimension tables can be updated
Rebuilding cubes can be slow                         Bulk loads of up to 500GB per hour
Up to 10X raw data size to amount of disk consumed   Typically 10:1 compression

25 Hyperstage vs. In-Memory
WebFOCUS Hyperstage is a viable alternative to BI tools that utilize an in-memory architecture like QlikView, Tableau, Cognos TM1 and Tibco/Spotfire.
In-memory is limited to the amount of data you can store in RAM. Hyperstage is a hybrid approach that efficiently uses disk I/O without sacrificing the performance achieved by in-memory.
Tableau, for example, has approximately a 100GB limit on its in-memory cache.

In Memory Solutions            WebFOCUS HyperStage
Storage: RAM                   Storage: RAM/Disk
Expensive                      Cheap
Short term                     Long Term
Requires additional hardware   Leverage existing hardware

26 Demonstration …

27 NYSE Daily Stock Price History
Downloaded from internet
Daily history from 1970 to 2006 for 7000 stocks
14 million rows
1.4GB of raw data
Compressed to 70MB
Test query summarizes stock information for top tech companies in March 2000 and compares the information for the same period in March 2002 (dot com collapse)
Note: Hyperstage running on a Dell laptop, 1 duo core processor with 4GB of RAM

28 NYSE Daily Stock Price History (exploded)
Simulated additional stock prices up to 2043
2 billion rows
200GB of raw data
Compressed to 17GB
Test query summarizes stock information for top tech companies in March 2000 and compares the information for the same period in March 2002 (dot com collapse)
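For reference, the compression ratios implied by the two demonstration data sets can be worked out directly from the sizes quoted on slides 27 and 28. The sketch assumes decimal units (1GB = 1000MB); the helper name is hypothetical.

```python
def compression_ratio(raw_mb, compressed_mb):
    """Ratio of raw size to compressed size, both in the same unit."""
    return raw_mb / compressed_mb


# Slide 27: 1.4GB of raw data compressed to 70MB.
original_ratio = compression_ratio(1400, 70)
# Slide 28: 200GB of raw data compressed to 17GB.
exploded_ratio = compression_ratio(200_000, 17_000)

print(f"original data set: {original_ratio:.1f}:1")  # 20.0:1
print(f"exploded data set: {exploded_ratio:.1f}:1")  # 11.8:1
```

Both figures sit at or above the typical 10:1 ratio quoted earlier in the deck.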

29 WebFOCUS Hyperstage The Big Deal…
No indexes
No partitions
No views
No materialized aggregates
Value proposition
  Low IT overhead
  Allows for autonomy from IT
  Ease of implementation
  Fast time to market
  Less Hardware
  Lower TCO
No DBA Required!

30 Q&A

