WebFOCUS Hyperstage: Why?
Why do BI applications fail? Typically 3 reasons:
1. Too complicated: Self-Service, Guided Ad hoc
2. Bad data: Data Quality
3. Too slow: Hyperstage
Hyperstage will improve database performance for WebFOCUS applications with less hardware, no database tuning and easy migration.

What is WebFOCUS Hyperstage
- Embedded, columnar data store that can dramatically increase the performance of WebFOCUS applications
- Columnar = reduced I/O (vs. relational)
- Easily implemented without the need for database administration
- Disk footprint is reduced with a powerful compression algorithm
- Includes embedded ETL for seamless migration of existing analytical databases
  - No change in query or application required
  - Data migrations are seamless and easy
- WF M and higher includes an optimized Hyperstage Adapter
- Runs on commodity hardware (Intel based)
  - Windows 64
  - Linux (Red Hat, CentOS, SUSE, Debian)

Introducing WebFOCUS Hyperstage
Hyperstage is an integrated, column-oriented data store that helps WebFOCUS applications achieve outstanding query performance.

WebFOCUS Hyperstage Engine: How does it work?
- Column orientation
- Smarter architecture
  - No maintenance
  - No query planning
  - No partition schemes
  - No DBA
- Knowledge Grid: statistics and metadata "describing" the super-compressed data
- Data Packs: data stored in manageably sized, highly compressed data packs
- Data compressed using algorithms tailored to data type

Pivoting Your Perspective: Columnar Technology

Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000

[Figure: the same four records shown twice, once as data stored in rows and once as data stored in columns]

Column-oriented databases allow data to be stored column-by-column rather than row-by-row. This simple pivot in perspective, looking down rather than looking across, has profound implications for analytic speed.

Column-oriented databases are better suited for analytics where, unlike transactions, only portions of each record are required. By grouping the data together this way, the database only needs to retrieve the columns that are relevant to the query, greatly reducing the overall I/O.

Returning to the example in the section above, we see that a columnar database would not only eliminate 43 days of data, it would also eliminate 28 columns of data. Returning only the columns for toasters and units sold, the columnar database would return only 14 million data elements, or 93% less data. By returning so much less data, columnar databases are much faster than row-based databases when analyzing large data sets.

In addition, some columnar databases compress data at high rates because each column stores a single data type (as opposed to rows, which typically contain several data types), which allows compression to be optimized for each particular data type. Row-based databases have multiple data types and a limitless range of values, making compression less efficient overall.

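
The row-versus-column pivot described above can be sketched in a few lines of Python. This is only an illustration using the four records from the slide; the function names and the "fields touched" counter are hypothetical and say nothing about how Hyperstage stores data internally.

```python
# Illustration only: the four employee records from the slide.
FIELDS = ("employee_id", "name", "location", "sales")
rows = [
    (1, "Smith", "New York", 50_000),
    (2, "Jones", "New York", 65_000),
    (3, "Fraser", "Boston", 40_000),
    (4, "Fraser", "Boston", 70_000),
]


def to_columns(records, fields):
    """Pivot row tuples into one list per column."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(fields)}


def total_sales_row_store(records):
    """Sum sales from row storage; every field of every record is read."""
    fields_touched = sum(len(rec) for rec in records)
    return sum(rec[3] for rec in records), fields_touched


def total_sales_column_store(cols):
    """Sum sales from column storage; only the sales column is read."""
    sales = cols["sales"]
    return sum(sales), len(sales)


columns = to_columns(rows, FIELDS)
print(total_sales_row_store(rows))        # total and number of fields read
print(total_sales_column_store(columns))  # same total, fewer fields read
```

Both functions return the same total, but the column version reads one field per record instead of four, which is the I/O reduction the slide refers to.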
Data Organization and the Knowledge Grid: Data Packs
- The data within each column is stored in groupings of 65,536 values called Data Packs
- Data Packs improve data compression because the optimal compression algorithm is applied based on the data contents
- An average compression ratio of 10:1 is achieved after loading data into Hyperstage. For example, 1TB of raw data can be stored in about 100GB of space.

Data Organization and the Knowledge Grid: Data Packs and Compression
- Data Packs
  - Each data pack contains 65,536 (64K) data values
  - Compression is applied to each individual data pack
  - The compression algorithm varies depending on data type and data distribution
  - Patent-pending compression algorithms
- Compression
  - Results vary depending on the distribution of data among data packs
  - A typical overall compression ratio seen in the field is 10:1
  - Some customers have seen results as high as 40:1

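
As a rough sketch of the idea, the snippet below splits a column into packs of 65,536 values and compresses each pack on its own. The Hyperstage compression algorithms are not described in these slides, so `zlib` is used purely as a stand-in, and the sample column and helper names are assumptions made for illustration.

```python
import zlib
from array import array

PACK_SIZE = 65_536  # values per data pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Split one column into consecutive packs of at most pack_size values."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Return (raw_bytes, compressed_bytes) for one pack of integers.

    zlib is a stand-in here, not the product's algorithm.
    """
    raw = array("q", pack).tobytes()  # 8 bytes per signed integer
    return len(raw), len(zlib.compress(raw))


def estimated_compressed_size(raw_size, ratio):
    """Size after compression for a given ratio, e.g. ratio=10 for 10:1."""
    return raw_size / ratio


# Hypothetical column: 200,000 low-cardinality integer values.
column = [i % 100 for i in range(200_000)]
packs = split_into_packs(column)
sizes = [compress_pack(p) for p in packs]

for number, (raw, packed) in enumerate(sizes, start=1):
    print(f"pack {number}: {raw} bytes raw, {packed} bytes compressed")

# The slide's example: 1 TB (1000 GB) of raw data at 10:1.
print(estimated_compressed_size(1000, 10), "GB")
```

Because each pack is compressed independently, a pack can later be decompressed on its own without touching its neighbours.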
Data Organization and the Knowledge Grid: The Knowledge Grid
[Diagram: Knowledge Nodes shown for Column A and Column B across Pack Rows 1 to 6]
- Global Knowledge: built during LOAD
  - String and character data
  - Numeric data
  - Distributions
- Dynamic Knowledge: built per query, e.g. for aggregates and joins

Data Organization and the Knowledge Grid: Knowledge Nodes
- Data Pack Nodes (DPN): a separate DPN is created for every data pack in the database to store basic statistical information
- Character Maps (CMAPs): every Data Pack that contains text gets a matrix that records the occurrence of every possible ASCII character
- Histograms: a histogram with 1024 MIN-MAX intervals is created for every Data Pack that contains numeric data
- Pack-to-Pack Nodes (PPN): PPNs track relationships between Data Packs when tables are joined. Query performance gets better as the database is used.
- This metadata layer = 1% of the compressed volume

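
The sketch below shows one plausible shape for this metadata. The slides do not specify the exact contents of a DPN, the layout of a CMAP, or how histogram intervals are encoded, so the fields (min, max, sum, count), the per-position character sets and the presence flags are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPackNode:
    """Assumed 'basic statistical information' for one numeric data pack."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_dpn(pack):
    return DataPackNode(min(pack), max(pack), sum(pack), len(pack))


def build_histogram(pack, intervals=1024):
    """Flag which of the equal-width intervals between MIN and MAX hold a value."""
    lo, hi = min(pack), max(pack)
    width = (hi - lo) / intervals or 1  # avoid a zero width when MIN == MAX
    present = [False] * intervals
    for value in pack:
        index = min(int((value - lo) / width), intervals - 1)
        present[index] = True
    return present


def build_cmap(pack):
    """For each character position, record which characters occur there."""
    cmap = [set() for _ in range(max(len(text) for text in pack))]
    for text in pack:
        for position, char in enumerate(text):
            cmap[position].add(char)
    return cmap


def may_contain(cmap, text):
    """False means the pack cannot contain text; True means it might."""
    if len(text) > len(cmap):
        return False
    return all(char in cmap[pos] for pos, char in enumerate(text))


dpn = build_dpn([40_000, 50_000, 65_000, 70_000])
hist = build_histogram([0, 1023])
cmap = build_cmap(["Toronto", "Boston"])
print(dpn)
print(may_contain(cmap, "Toronto"), may_contain(cmap, "Shipping"))
```

The point of all three structures is the same: they are small enough to consult before touching compressed data, and they can rule a pack out without decompressing it.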
WebFOCUS Hyperstage Example: Query and Knowledge Grid

    SELECT count(*) FROM employees
    WHERE salary > 50000
    AND age < 65
    AND job = 'Shipping'
    AND city = 'Toronto';

Columns: salary, age, job, city
Pack status legend: all values match; completely irrelevant; suspect

Let's examine a simple query that counts the employees with a salary greater than 50000, an age less than 65, and a job of Shipping in the city of Toronto. The data consists of 4 columns and 4 sets of data packs.

The first thing Hyperstage does is use the information in the Knowledge Grid to determine which data packs need to be accessed to answer the query.

WebFOCUS Hyperstage Example: salary > 50000
- Step: find the Data Packs with salary > 50000
- Using "salary > 50000", Hyperstage determines that the 1st, 2nd and 4th data packs for the SALARY column do not have values > 50000. These data packs can be ignored. In the 3rd data pack, all of the values are > 50000.

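
A minimal sketch of how a single "greater than" condition could be checked against per-pack minimum and maximum values follows. The MIN/MAX figures for the four SALARY packs are invented for illustration; only the resulting statuses (1st, 2nd and 4th irrelevant, 3rd all matching) come from the slide.

```python
from enum import Enum


class PackStatus(Enum):
    ALL_MATCH = "all values match"
    IRRELEVANT = "completely irrelevant"
    SUSPECT = "suspect"


def classify_greater_than(pack_min, pack_max, threshold):
    """Classify one data pack for the condition value > threshold."""
    if pack_min > threshold:
        return PackStatus.ALL_MATCH   # every value passes
    if pack_max <= threshold:
        return PackStatus.IRRELEVANT  # no value can pass
    return PackStatus.SUSPECT         # the pack must be scanned to know


# Hypothetical (min, max) pairs for the four SALARY data packs.
salary_packs = [(20_000, 45_000), (30_000, 50_000), (52_000, 90_000), (25_000, 48_000)]
salary_status = [classify_greater_than(lo, hi, 50_000) for lo, hi in salary_packs]

for number, status in enumerate(salary_status, start=1):
    print(f"SALARY pack {number}: {status.value}")
```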
WebFOCUS Hyperstage Example: age < 65
- Step: find the Data Packs that contain age < 65
- Continuing with the next criterion, age < 65, we see that the 1st and 3rd AGE data packs have all values less than 65, while the 2nd and 4th data packs have some values less than 65 and some greater than 65. These data packs may need to be decompressed and scanned to find the exact values.

WebFOCUS Hyperstage Example: job = 'Shipping'
- Step: find the Data Packs that have job = 'Shipping'
- For a job of Shipping, this example shows that the 1st and 4th data packs for the JOB column have some values that match, and that in the 2nd and 3rd data packs all of the values are 'Shipping'.

WebFOCUS Hyperstage Example: city = 'Toronto'
- Step: find the Data Packs that have city = 'Toronto'
- For a city of Toronto, no values match in the 1st and 4th data packs, and some of the values equal 'Toronto' in the 2nd and 3rd data packs.

WebFOCUS Hyperstage Example: Eliminate Pack Rows
- Step: eliminate all pack rows that have been flagged as irrelevant
- Three pack rows are marked "All packs ignored" (in this example, the 1st, 2nd and 4th)
- Using the Knowledge Grid, we can now completely eliminate the need to look at these data packs to answer the query.

WebFOCUS Hyperstage Example: Decompress and Scan
- Step: finally, identify the pack that needs to be decompressed
- Three pack rows remain marked "All packs ignored"; one pack is marked "Only this pack will be decompressed"
- In fact, only 1 CITY data pack will need to be decompressed and scanned for Hyperstage to find the count of employees. If all the values in that data pack had matched 'Toronto', the answer would be 65,536 and it would have come directly from the Knowledge Grid in sub-second time.
- The Hyperstage optimizer will always try to make use of the information in the Knowledge Grid to ensure high levels of performance.

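
The whole walk-through can be condensed into a small sketch that combines the per-column pack statuses from the preceding slides, pack row by pack row. The statuses are the ones stated in the example; the planning function and its action labels are hypothetical and are not the Hyperstage optimizer.

```python
ALL_MATCH = "all values match"
IRRELEVANT = "completely irrelevant"
SUSPECT = "suspect"

# Status of each data pack (pack rows 1 to 4), per column, as given in the example.
grid = {
    "salary": [IRRELEVANT, IRRELEVANT, ALL_MATCH, IRRELEVANT],
    "age":    [ALL_MATCH, SUSPECT, ALL_MATCH, SUSPECT],
    "job":    [SUSPECT, ALL_MATCH, ALL_MATCH, SUSPECT],
    "city":   [IRRELEVANT, SUSPECT, SUSPECT, IRRELEVANT],
}


def plan_pack_rows(statuses):
    """Decide, for each pack row, what must be read to evaluate an AND query."""
    row_count = len(next(iter(statuses.values())))
    plan = []
    for row in range(row_count):
        row_status = {column: packs[row] for column, packs in statuses.items()}
        if IRRELEVANT in row_status.values():
            # One failing condition rules out the whole pack row.
            plan.append(("ignore", []))
        elif all(status == ALL_MATCH for status in row_status.values()):
            # Every row qualifies, so the count can come from metadata alone.
            plan.append(("answer from metadata", []))
        else:
            suspects = [col for col, status in row_status.items() if status == SUSPECT]
            plan.append(("decompress", suspects))
    return plan


plan = plan_pack_rows(grid)
for number, (action, columns_to_read) in enumerate(plan, start=1):
    print(f"pack row {number}: {action} {columns_to_read}")
```

With these statuses, three pack rows are ignored and only the CITY pack of the 3rd pack row is decompressed, matching the slide.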
POC Results (Internal Use Only)
- Insurance Company
  - Query performance issues with SQL Server: insurance claims analysis
  - 3-day POC; compression achieved 40:1
  - Most queries running 3X faster in Hyperstage
- Large Bank
  - Query performance issues with SQL Server: web traffic analysis
  - 3-day POC; compression achieved 10:1
  - Queries that ran for 10 to 15 mins in SQL Server ran sub-second in Hyperstage
- Government Application
  - Query performance issues with Oracle: Federal Loan/Grant Tracking
  - 3-day POC; compression achieved 15:1
  - Queries that ran for 10 to 15 mins in Oracle ran in 30 secs in Hyperstage
- POCs can typically be completed within 3 days

Beyond WebFOCUS
[Diagram: WebFOCUS Client, WebFOCUS Reporting Server, WF Hyperstage Adapter and WebFOCUS Hyperstage Server; generic applications (Java, C, .Net, PHP, Perl); Java and .Net applications connecting through WF Connector and WF Service]
- Hyperstage is integrated in the WebFOCUS BI architecture through the reporting server and is administered using the WebFOCUS console
- WebFOCUS client applications communicate directly through the reporting server
- Custom applications developed in Java or .Net can access the reporting server via WebFOCUS services and a supplied WebFOCUS connector
- Hyperstage also supports connections from any application via industry-standard JDBC or ODBC connections. There are also native drivers for .NET, C, or PHP applications to connect directly to the Hyperstage engine.
- Data can be loaded and maintained in Hyperstage using iWay Data Integration or any commercial ETL tool.

Hyperstage vs. OLAP
- Many companies are looking to migrate from legacy OLAP solutions
- Hyperstage can offer excellent query performance with a commonly understood star schema database
- WebFOCUS can offer navigation and drill paths
- Hyperstage can support large numbers of dimensional attributes and can be easily updated

OLAP                                                 | WebFOCUS Hyperstage
Limited number of dimensions                         | Supports up to 4096 columns on a single table
Difficult to add new dimensions                      | Dimension tables can be updated
Rebuilding cubes can be slow                         | Bulk loads of up to 500GB per hour
Up to 10X raw data size to amount of disk consumed   | Typically 10:1 compression

Hyperstage vs. In-Memory
- WebFOCUS Hyperstage is a viable alternative to BI tools that use an in-memory architecture, like QlikView, Tableau, Cognos TM1 and Tibco/Spotfire
- In-memory is limited to the amount of data you can store in RAM
- Hyperstage is a hybrid approach that efficiently uses disk I/O without sacrificing the performance achieved by in-memory
- Tableau, for example, has approximately a 100GB limit on its in-memory cache

In-Memory Solutions            | WebFOCUS Hyperstage
Storage: RAM                   | Storage: RAM/Disk
Expensive                      | Cheap
Short term                     | Long term
Requires additional hardware   | Leverages existing hardware

NYSE Daily Stock Price History
- Daily history from 1970 to 2006 for 7000 stocks, downloaded from the internet
- 14 million rows
- 1.4GB of raw data
- Compressed to 70MB
- Test query summarizes stock information for top tech companies in March 2000 and compares it with the same period in March 2002 (dot-com collapse)
- Note: Hyperstage running on a Dell laptop with one dual-core processor and 4GB of RAM

NYSE Daily Stock Price History (exploded)
- Simulated additional stock prices up to 2043
- 2 billion rows
- 200GB of raw data
- Compressed to 17GB
- Test query summarizes stock information for top tech companies in March 2000 and compares it with the same period in March 2002 (dot-com collapse)

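
For reference, the compression ratios implied by the two NYSE data sets can be worked out directly from the sizes quoted on the slides. The helper below is a trivial sketch and assumes decimal units (1GB = 1000MB).

```python
def compression_ratio(raw_mb, compressed_mb):
    """Return N for a compression ratio of N:1."""
    return raw_mb / compressed_mb


# Original data set: 1.4GB of raw data compressed to 70MB.
original = compression_ratio(1_400, 70)

# Exploded data set: 200GB of raw data compressed to 17GB.
exploded = compression_ratio(200_000, 17_000)

print(f"original: {original:.1f}:1")
print(f"exploded: {exploded:.1f}:1")
```

Both figures sit at or above the typical 10:1 ratio quoted elsewhere in the deck.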
WebFOCUS Hyperstage: The Big Deal
- No indexes
- No partitions
- No views
- No materialized aggregates
- Value proposition
  - Low IT overhead
  - Allows for autonomy from IT
  - Ease of implementation
  - Fast time to market
  - Less hardware
  - Lower TCO
- No DBA required!