Slide 1: Analyze/Report from Large Volumes of Data. WebFOCUS Hyperstage. Information Builders (Canada) Inc., May 11, 2012.
Slide 2: WebFOCUS: Higher Adoption & Reuse with Lower TCO. Extensions to the WebFOCUS platform allow you to build more application types at a lower cost. [Platform diagram with Core BI and Extended BI labels: Reporting; Dashboards; Query & Analysis; Information Delivery; MS Office & e-Publishing; Performance Management; Mobile Applications; Data Updating; Visualization & Mapping; Predictive Analytics; Enterprise Search; High Performance Data Store; Data Warehouse & ETL; Business to Business; Data Profiling & Data Quality; Business Activity Monitoring; Master Data Management.]
Slide 3: WebFOCUS High Performance Data Store. [Same platform diagram and caption as slide 2; the slide title places the focus on the High Performance Data Store.]
Slide 5: Today's Top Data-Management Challenge: Big Data and Machine-Generated Data. [Chart: data storage over time, machine-generated data vs. human-generated data.]
Slide 6: How Performance Issues Are Typically Addressed, by Pace of Data Growth. IT managers try to mitigate these response times: when organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem. [Chart.] Source: "Keeping Up with Ever-Expanding Enterprise Data," Joseph McKendrick, Unisphere Research, October 2010.
Slide 7: Classic Approaches and Challenges: Data Warehousing. Pressures: more data and more data sources (real-time data, multiple databases, external sources); limited resources and budget; more kinds of output needed by more users, more quickly. Traditional data warehousing is labour intensive (heavy indexing, aggregations and partitioning), hardware intensive (massive storage, big servers), and expensive and complex.
Slide 8: Classic Approaches and Challenges: Data Warehousing, Growing Demands. New demands: larger transaction volumes driven by the internet; the impact of cloud computing; more, faster, cheaper. Data warehousing matures: near-real-time updates; integration with master data management; data mining using discrete business transactions; provision of data for business-critical applications. Early data warehouse characteristics: integration of internal systems; monthly and weekly loads; heavy use of aggregates.
Slide 9: Classic Approaches and Challenges: Dealing with Large Data. Indexes; Cubes/OLAP.
Slide 10: Classic Approaches and Challenges: Limitations of Indexes. Increased space requirements: the combined size of the indexes can exceed the source database. Index management: building the indexes increases load times. Indexes predefine a fixed access path.
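To make the "fixed access path" and space points concrete, here is a minimal sketch using a hypothetical five-row sales table and a plain dictionary standing in for a secondary index. It is an illustration of the general idea, not how any particular database implements indexes: the index answers the query it was built for by touching only matching rows, an unanticipated query on another column falls back to a full scan, and every additional index adds one entry per row.

```python
from collections import defaultdict

# Hypothetical sales table: each row is (order_id, product, region, quantity).
ROWS = [
    (1, "toaster", "East", 3),
    (2, "kettle", "West", 1),
    (3, "toaster", "West", 2),
    (4, "blender", "East", 5),
    (5, "toaster", "East", 1),
]
COLUMNS = ("order_id", "product", "region", "quantity")


def build_index(rows, column):
    """Secondary index on one column: value -> list of row positions."""
    pos = COLUMNS.index(column)
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[pos]].append(row_id)
    return index


def lookup(rows, indexes, column, value):
    """Return (matching rows, rows examined), using an index only if one exists."""
    if column in indexes:
        row_ids = indexes[column].get(value, [])
        return [rows[i] for i in row_ids], len(row_ids)
    pos = COLUMNS.index(column)  # no access path: scan every row
    return [row for row in rows if row[pos] == value], len(rows)


# The index matches the query the DBA anticipated (by product) ...
indexes = {"product": build_index(ROWS, "product")}
hits, examined = lookup(ROWS, indexes, "product", "toaster")
print(len(hits), examined)  # 3 3

# ... but an unanticipated query (by region) scans the whole table.
hits, examined = lookup(ROWS, indexes, "region", "East")
print(len(hits), examined)  # 3 5

# Each index stores one entry per row, so indexing all four columns
# adds 4 x len(ROWS) entries on top of the table itself.
all_indexes = {c: build_index(ROWS, c) for c in COLUMNS}
extra_entries = sum(len(ids) for idx in all_indexes.values() for ids in idx.values())
print(extra_entries)  # 20
```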
Slide 11: Classic Approaches and Challenges: Limitations of OLAP. Cube technology has limited scalability: the number of dimensions is limited, and so is the amount of data. Cubes are difficult to update (for example, adding a dimension usually requires a complete rebuild). Cube builds are typically slow, and a new design results in a new cube.
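One way to see why the number of dimensions is limited: a fully materialized cube contains every GROUP BY combination of its dimensions (2^d of them), and its cell count is bounded by the product of (cardinality + 1) over the dimensions, since each dimension is either one of its values or "ALL". The sketch below uses hypothetical dimension cardinalities purely for illustration.

```python
from math import prod


def cuboid_count(num_dimensions):
    """Number of GROUP BY combinations in a full cube: every subset of dimensions."""
    return 2 ** num_dimensions


def max_cells(cardinalities):
    """Upper bound on cells in a full cube: each dimension is one of its values or 'ALL'."""
    return prod(c + 1 for c in cardinalities)


# Hypothetical dimensions: products, stores, days, customer segments.
base = [1_000, 500, 365, 10]
print(cuboid_count(len(base)), f"{max_cells(base):,}")  # 16 2,019,043,026

# One more dimension (hypothetically 20 promotions) doubles the number of cuboids
# and multiplies the cell bound by 21; every existing cuboid changes shape, which
# is why adding a dimension usually means a complete rebuild.
extended = base + [20]
print(cuboid_count(len(extended)), f"{max_cells(extended):,}")  # 32 42,399,903,546
```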
Slide 12: Limitations of Rows: These Solutions Contribute to Operational Limitations.

Impediments to business agility: users wait for DBAs to create indexes or other tuning structures, which delays access to data. Indexes significantly slow data-loading operations and increase the size of the database, sometimes doubling it.

Loss of data and time fidelity: ETL operations are typically performed in batch during non-business hours. This delays access to data and often results in mismatches between operational and analytic databases.

Limited ad hoc capability: response times for ad hoc queries increase as the volume of data grows. Unanticipated queries (where DBAs have not tuned the database in advance) can result in unacceptable response times.

Unnecessary expenditures: attempts to improve performance using hardware acceleration and database tuning schemes raise the capital costs of equipment and the operational costs of database administration. The added complexity of managing a large database diverts operational budgets away from more urgent IT projects.

Many IT organizations and technology solution providers rely on traditional relational databases for their data warehouse, data mart or analytic repository. The problem is that those databases were designed for transactional applications, not for analytics against large data volumes. As a result, many companies find that as the volume of data grows, those systems cannot meet users' performance requirements. In addition, traditional database technology requires a high degree of effort (such as creating and maintaining indexes, creating cubes or projections, or partitioning data) and is costly to license and maintain. Let's discuss row-based approaches in more detail.
Slide 13: Pivoting Your Perspective: Columnar Technology.
Slide 14: The Limitation of Rows: The Ubiquity of Rows. [Diagram: a table of 30 columns by 50 million rows.]

Row-based databases are ubiquitous because so many of our most important business systems are transactional. Row-oriented databases are well suited to transactional environments, such as a call center, where a customer's entire record is required when the profile is retrieved and/or fields are frequently updated.

Where row-based databases run into trouble is when they are used to handle analytic loads against large volumes of data, especially when user queries are dynamic and ad hoc. To see why, consider a database of sales transactions holding 50 days of data at 1 million rows per day, with 30 columns per row: 30 columns and 50 million rows in total. Say you want to see how many toasters were sold in the third week of this period. A row-based database would return 7 million rows (1 million for each day of the third week) with 30 columns each, or 210 million data elements. That's a lot of data elements to crunch to find out how many toasters were sold that week.

As the data set grows, disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query. As mentioned above, many companies try to solve this I/O problem by creating indexes to optimize queries. This may work for routine reports (for example, if you always want to know how many toasters you sold in the third week of a reporting period), but there is a point of diminishing returns: load speed degrades because the indexes need to be recreated as data is added.
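The toaster arithmetic can be checked directly, and extended to a column store for comparison. This sketch assumes, hypothetically, that the query needs only three of the 30 columns (for example product, date and quantity) and that in both layouts the third week's rows can be located without reading the other 43 days; the slide itself only gives the row-store figure.

```python
ROWS_PER_DAY = 1_000_000
DAYS_IN_WEEK = 7
TOTAL_COLUMNS = 30
# Assumption (not from the slide): the query touches only product, date and quantity.
COLUMNS_NEEDED = 3

rows_in_week = ROWS_PER_DAY * DAYS_IN_WEEK
row_store_values = rows_in_week * TOTAL_COLUMNS      # every column of every row is read
column_store_values = rows_in_week * COLUMNS_NEEDED  # only the needed columns are read

print(f"{row_store_values:,}")                   # 210,000,000
print(f"{column_store_values:,}")                # 21,000,000
print(row_store_values // column_store_values)   # 10
```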
Slide 15: Pivoting Your Perspective: Columnar Technology.

Employee Id | Name   | Location | Sales
1           | Smith  | New York | 50,000
2           | Jones  | New York | 65,000
3           | Fraser | Boston   | 40,000
4           | Fraser | Boston   | 70,000

Row oriented: (1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000). Works well if all the columns are needed for every query; efficient for transactional processing when all the data for the row is available.

Column oriented: (1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000). Works well for aggregate results (sum, count, average); only the relevant columns need to be touched; consistent performance with any database design; allows very efficient compression.
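The four-row table above can be laid out both ways in a few lines of Python. This is an in-memory illustration only: the row layout interleaves all fields, so reaching the Sales values means stepping over every other field (on disk, reading them), while the column layout keeps Sales contiguous. Run-length encoding, used here as one simple example of the compression the slide mentions, shows why grouping similar values together compresses well; it is not claimed to be the scheme any specific product uses.

```python
from itertools import groupby

# The four-row example table from the slide.
rows = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]

# Row-oriented layout: all values of a record are stored together.
row_layout = [value for row in rows for value in row]

# Column-oriented layout: all values of a column are stored together.
employee_id, name, location, sales = (list(col) for col in zip(*rows))

# SUM(Sales): the column layout reads one contiguous list; the row layout
# has to stride over every record (every 4th value, starting at Sales).
total_sales_columnar = sum(sales)
total_sales_row = sum(row_layout[3::4])


def run_length_encode(column):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


print(total_sales_columnar)         # 225000
print(run_length_encode(location))  # [('New York', 2), ('Boston', 2)]
print(run_length_encode(name))      # [('Smith', 1), ('Jones', 1), ('Fraser', 2)]
```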
Slide 17: Introducing WebFOCUS Hyperstage.

Mission: improve database performance for WebFOCUS applications with less hardware, no database tuning, and easy migration.

What is WebFOCUS Hyperstage? A high-performance analytic data store designed to handle business-driven queries on large volumes of data without IT intervention. Easy to implement and manage, Hyperstage provides the answers your business users need at a price you can afford.

Advantages: dramatically increased performance of WebFOCUS applications; a disk footprint reduced by a powerful compression algorithm, which means faster response times; embedded ETL for seamless migration of existing analytical databases; no change in query or application required; an optimized Hyperstage Adapter is included; WebFOCUS metadata can be used to define hierarchies and drill paths to navigate the star schema.
Slide 18: Introducing WebFOCUS Hyperstage: How It Is Architected. [Diagram components: Hyperstage Engine, Knowledge Grid, Compressor, Bulk Loader.] Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses. Unmatched administrative simplicity: no indexes, no data partitioning, no manual tuning.
Slide 19: Introducing WebFOCUS Hyperstage: What It Means for Customers. Self-managing: 90% less administrative effort. Low cost: more than 50% less than alternative solutions. Scalable, high performance: up to 50 TB on a single industry-standard server. Fast queries: ad hoc queries are as fast as anticipated queries, so users have total flexibility. Compression: data compression of 10:1 to 40:1 means much less storage is needed; it might even mean the entire database fits in memory.
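The "entire database in memory" remark is simple arithmetic on the quoted 10:1 to 40:1 range. The raw database size and server memory below are hypothetical figures chosen only to show how the ratio decides whether the data fits; they are not measurements.

```python
RAW_GB = 5_000       # hypothetical uncompressed database size
SERVER_RAM_GB = 256  # hypothetical memory on a single server


def compressed_size_gb(raw_gb, ratio):
    """Size after compression at a given ratio (e.g. 10 for 10:1)."""
    return raw_gb / ratio


for ratio in (10, 40):  # the low and high ends of the quoted range
    size = compressed_size_gb(RAW_GB, ratio)
    print(f"{ratio}:1 -> {size:.0f} GB, fits in {SERVER_RAM_GB} GB RAM: {size <= SERVER_RAM_GB}")
# 10:1 -> 500 GB, fits in 256 GB RAM: False
# 40:1 -> 125 GB, fits in 256 GB RAM: True
```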
Slide 20: Introducing WebFOCUS Hyperstage: How It Works. Upon load, Hyperstage automatically creates information (metadata) about the data and stores it in the Knowledge Grid (KG). The KG is loaded into memory and is less than 1% of the compressed data size. When processing a query, Hyperstage uses this metadata to eliminate or reduce the need to access the data: the less data that needs to be accessed, the faster the response, and queries the KG can answer on its own get sub-second responses. Architecture benefits: no need to partition data, create or maintain indexes or projections, or tune for performance; ad hoc queries are as fast as static queries, so users have total flexibility.
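The idea of using in-memory metadata to avoid touching data can be sketched with per-pack minimum/maximum/count/sum statistics, a common technique sometimes called zone maps. The statistics chosen, the tiny pack size and the three-way classification below are assumptions for illustration; the slide does not describe the actual contents of the Knowledge Grid. For a range-filtered SUM, packs whose range lies entirely outside the filter are skipped, packs entirely inside it are answered from the stored sum, and only the "suspect" packs that straddle the boundary are decompressed and scanned.

```python
from dataclasses import dataclass


@dataclass
class PackStats:
    """Hypothetical per-pack metadata kept in memory (a stand-in for a KG entry)."""
    minimum: int
    maximum: int
    count: int
    total: int


PACK_SIZE = 4  # tiny, for illustration only


def build(column):
    """Split a column into packs and compute the metadata for each pack."""
    packs = [column[i:i + PACK_SIZE] for i in range(0, len(column), PACK_SIZE)]
    stats = [PackStats(min(p), max(p), len(p), sum(p)) for p in packs]
    return packs, stats


def sum_in_range(packs, stats, low, high):
    """SUM(value) WHERE low <= value <= high; returns (result, packs actually read)."""
    result, packs_read = 0, 0
    for pack, s in zip(packs, stats):
        if s.maximum < low or s.minimum > high:
            continue                  # irrelevant: no value can match
        if low <= s.minimum and s.maximum <= high:
            result += s.total         # fully relevant: answered from metadata alone
            continue
        packs_read += 1               # suspect: must read and scan the pack
        result += sum(v for v in pack if low <= v <= high)
    return result, packs_read


column = [5, 7, 6, 8, 20, 25, 22, 21, 40, 41, 39, 45, 12, 30, 18, 9]
packs, stats = build(column)
print(sum_in_range(packs, stats, 15, 35))  # (136, 1): one of four packs is read
print(sum_in_range(packs, stats, 0, 100))  # (348, 0): answered from metadata only
```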
Slide 21: WebFOCUS Hyperstage Engine: How It Works. Column orientation. Smarter architecture: no maintenance, no query planning, no partition schemes, no DBA. Knowledge Grid: statistics and metadata "describing" the super-compressed data. Data Packs: data stored in manageably sized, highly compressed packs, using compression algorithms tailored to each data type.
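The slide names the ingredients (fixed-size data packs, compression chosen per data type) but not the algorithms. The sketch below swaps in two well-known techniques to show the shape of the idea: delta encoding for integer columns and dictionary encoding for string columns, each followed by Python's standard zlib. The pack size and both encodings are assumptions for illustration, not the Hyperstage implementation.

```python
import json
import zlib
from itertools import accumulate

PACK_SIZE = 65_536  # hypothetical pack size, for illustration only


def split_into_packs(column, size=PACK_SIZE):
    """Cut one column into fixed-size packs (the last pack may be shorter)."""
    return [column[i:i + size] for i in range(0, len(column), size)]


def compress_int_pack(values):
    """Delta-encode integers, then zlib: sequential ids become runs of small, repeated deltas."""
    deltas = values[:1] + [b - a for a, b in zip(values, values[1:])]
    return zlib.compress(json.dumps(deltas).encode())


def decompress_int_pack(blob):
    """Undo the delta encoding with a running sum."""
    return list(accumulate(json.loads(zlib.decompress(blob))))


def compress_str_pack(values):
    """Dictionary-encode strings: each distinct value is stored once, rows hold small codes."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    return zlib.compress(json.dumps([dictionary, codes]).encode())


def decompress_str_pack(blob):
    dictionary, codes = json.loads(zlib.decompress(blob))
    return [dictionary[c] for c in codes]


# Usage: a hypothetical order-id column, compressed pack by pack,
# compared with compressing the whole column as plain text.
order_ids = list(range(1_000_000, 1_200_000))
packs = [compress_int_pack(p) for p in split_into_packs(order_ids)]
plain = zlib.compress(json.dumps(order_ids).encode())
print(len(packs), sum(len(p) for p in packs), len(plain))
```

Because each pack is compressed independently, a query that needs only some packs can decompress just those, which is what makes pack-level metadata useful.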
Slide 24: WebFOCUS Hyperstage: The Big Deal. No indexes, no partitions, no views, no materialized aggregates. Value proposition: low IT overhead; autonomy from IT; ease of implementation; fast time to market; less hardware; lower TCO; no DBA required!