2 AGENDA
Product Success
What is Information Lifecycle Management?
SAP SYBASE IQ VLDB Option
VLDB in Use in a Large Bank
PowerDesigner ILM Model for SAP SYBASE IQ
Summary

This slide deck describes the technical details of the SAP SYBASE IQ VLDB option for managing big data. First, we will showcase SAP SYBASE IQ as a proven analytics platform for big data applications. Then we will describe the concept and practice of Information Lifecycle Management (ILM) for managing large volumes of data. The SAP SYBASE IQ VLDB option supports ILM with partitioning, placement, and data administration features that are important for big data management. We will show how the VLDB option is being used at a large bank. Finally, we will describe how Sybase PowerDesigner has been enhanced with modeling features that support building an ILM scenario that can be deployed in SAP SYBASE IQ.
4 SAP SYBASE IQ LEADERSHIP ADOPTION MOMENTUM
Mature, industrial strength analytic DBMS
LEADERSHIP
  Industry leading performance & scale benchmarks
  Recognized EDW market leader by Gartner, Forrester
  Pioneering technology with 10+ patents
ADOPTION
  4500+ installations in accounts
  ~200 new customer wins per year (last 4 years)
  Consistently 96%+ customer satisfaction rates
MOMENTUM
  2 x DW market growth rate (last 4 years)
  Fast paced product releases: v15, v15.1 (2009), v15.2 (2010), v15.3, v15.4 (2011)
Ericsson • Sungard • Nielsen • BNP Paribas • Telefonica • hmv.com • comScore • Agricultural Bank of China

SAP SYBASE IQ is a recognized leader in analytics, with a growing customer base and a high level of customer satisfaction. Our revenue growth has been double that of the general data warehouse marketplace during the last 4 years. Our performance in certified benchmarks and our recognition as a market leader by Gartner and Forrester have earned us our stripes. Sybase has never rested on its laurels, though, and has continuously improved SAP SYBASE IQ with a rapid series of innovations. The last 3 years have seen 5 major releases that have introduced significant new features and capabilities.
5 SAP SYBASE IQ
Stores and analyzes large amounts of data
Stands out as the leading enterprise data warehouse amongst the largest banks, insurance agencies, and telecom operators worldwide
Manage and analyze statistical measures for the entire nation of Canada
Analyze complex models in more than 200 financial institutions worldwide
Analyze ALL Federal tax returns in the US
Store and analyze massive amounts of industry segment data in 30 of the largest information providers in the world, including TransUnion, Nielsen and Acxiom

SAP SYBASE IQ handles big data across many industries worldwide. It is the custodian of very large societal information:
  All statistical measures in Canada
  All federal tax returns in the USA
  All citizen health information in Korea
SAP SYBASE IQ allows the largest commercial information providers to thrive, including 30 of the largest information providers in the world: Nielsen, Experian, TransUnion, Acxiom, Dun & Bradstreet, Thomson Reuters, TNS Media, …
SAP SYBASE IQ crunches through the most complex models in the financial world, and is deployed in more than 200 financial institutions: JPMC, HSBC, Goldman Sachs, Alliance Bernstein, Citigroup, CSFB, Etrade, …
6 What is Information Lifecycle Management?

First, let’s begin with a definition of Information Lifecycle Management.
7 “ILM is a management approach aimed at tackling the storage ‘information overload' problem which has so far failed to live up to its potential. The key to its success is being able to automate identification of the most valuable information contained in company data at any given time so that relatively unimportant data can be automatically demoted to lower-cost, less accessible storage media and ultimately discarded.”
Bloor Research

Here is a definition of Information Lifecycle Management from the respected research firm Bloor Research.
8 ILM in the Real World
NOAA: National Oceanic and Atmospheric Administration
A global network of sensors provides a steady stream of data on the Earth’s oceans and weather
With streams and a vast archive of historical data, NOAA manages some of the largest databases in federal government
The Princeton, NJ data center alone stores more than 20 petabytes of data
NOAA CIO Joe Klimavicz:
  “I focus much of my time on DATA LIFECYCLE MANAGEMENT”
  “The keys to ensuring that data is useable and easy to find include using accurate metadata, publishing data in standard formats, and having a WELL-CONCEIVED DATA STORAGE STRATEGY”

ILM is a real challenge for organizations that deal with large volumes of data. NOAA is one example.
9 Data Decreases in Value Over Time
Data lifecycle:
  Business Event
  Operational Transaction
  Data Transform and Load into DW
  Data is Queried, Analysed and Reported
  Data is Archived
  Data is Purged
Time axis: T=0, Minute/s, Hour/s, Day/s, Months, Year/s, Decade/s

The value of data changes over time, beginning when it first appears as a business event. Business data begins as an operational transaction that is fulfilled and closed relatively quickly. After that it becomes data required for current reporting, then for historical reporting. It is then archived for compliance and risk mitigation, and finally purged when it has no further value.
10 Information Lifecycle Management
Data partitioning and placement according to data value
Data Partitions: one table partition per month (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, ... Dec)
1. Roll-on: Load monthly table partition
2. Mark partition read-only
3. Back-up the partition
4. Drop partition
5. Drop backup files

Many companies implement a “roll on, roll off” scenario, where new data is loaded into a particular area of fast storage and then, as it ages, moved through tiered storage, with each tier implemented on cheaper and slower storage. The purpose is to spend IT resource dollars more efficiently and to acquire just the right level of service for each type of data.
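The roll-on, roll-off idea can be sketched in a few lines of Python. This is a minimal illustration and not IQ functionality: the function name `roll_on`, the `keep` retention setting, and the partition labels are all assumptions made for the example.

```python
from collections import deque


def roll_on(window, new_partition, keep):
    """Add the newest partition and return the partitions that roll off.

    `window` is a deque ordered oldest -> newest.
    `keep` is a hypothetical retention setting: how many partitions stay online.
    """
    window.append(new_partition)
    rolled_off = []
    # Anything beyond the retention window is the oldest data and rolls off.
    while len(window) > keep:
        rolled_off.append(window.popleft())
    return rolled_off


# Hypothetical monthly partitions, oldest first.
window = deque(["2011-01", "2011-02", "2011-03"])
dropped = roll_on(window, "2011-04", keep=3)
print(list(window), dropped)
```

In a real deployment the partitions returned by `roll_on` would be the candidates for steps 2 to 5 above (mark read-only, back up, drop), rather than being discarded immediately.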
12 SAP SYBASE IQ Information lifecycle management
SAP SYBASE IQ 15 Engine:
  Multiplex Grid Architecture
  Admin & Monitoring Framework
  Storage Area Network
  Communications & Security
  Column Indexing Sub-system
  Loading Engine
  Column Storage Processor
  Query Engine
  In-Database Analytics
  Text Search
  Web Enabled Analytics
  Information Lifecycle Management: Manage data through its existence in the DW

Among its many other capabilities, SAP SYBASE IQ offers information lifecycle management features that help users manage large volumes of data more effectively.
13 SAP SYBASE IQ VLDB OPTION
Data partitioning
Multiple user DBSpaces
Separate unstructured data from transactional data
Place frequently accessed data on fast storage
Granular database administration with read-only, read-write, on-line and off-line DBSpaces
Diagram labels: Catalog Store, IQ Main Store for User Data, Temp Store, Table, DBSpace, DBFile, Table Partition, Table Column, Index

Managing data according to its value requires partitioning functions to organize data, multiple user DBSpaces to map logical containers to different areas of physical storage, and placement commands to locate data in the preferred DBSpace. SAP SYBASE IQ offers all of this. In addition, DBSpaces can be marked read-only so that once data is no longer changing, it can be validated and backed up only once.
14 VLDB OPTION Benefits
Partitioned Tables
  SAP SYBASE IQ Base Product: Single table partition
  VLDB Option: Partition by range; single column partition key
Number of User DBSpaces
  SAP SYBASE IQ Base Product: Single user DBSpace with multiple DBFiles
  VLDB Option: Multiple DBSpaces, each with multiple DBFiles; unlimited data volume
Database Object Placement
  SAP SYBASE IQ Base Product: All database objects are placed in one user DBSpace
  VLDB Option: Place database objects (tables, table partitions, columns, indexes) in specific DBSpaces
DBSpace Attributes
  SAP SYBASE IQ Base Product: Single user DBSpace is read-write and on-line
  VLDB Option: DBSpaces can be marked read-only, read-write, on-line or off-line
DBSpace Management
  SAP SYBASE IQ Base Product: Validate and backup single user DBSpace as a unit
  VLDB Option: Validate read-write portions of database separately from read-only; backup read-write DBSpaces separately from read-only

This table shows what is included in the base product compared to what is provided by the VLDB option. In the base product, you cannot partition tables, and you have a single user DBSpace, albeit with multiple DBFiles and unlimited storage. The single user DBSpace is writeable and always on-line. With the VLDB option, you can partition tables based on a range of values of a column (the partition key), and you can have an unlimited number of user DBSpaces. These user DBSpaces can be read-only, read-write, on-line or off-line. You can back up read-write DBSpaces separately from read-only DBSpaces. Read-only DBSpaces need to be backed up only once.
15 ILM in SAP SYBASE IQ
Partitioning and placement
IQ provides partitioning and placement features to manage the storage and movement of data:
  Partitioning divides data into non-overlapping subsets across a dimension, such as “date”. For example, you may partition customer order data by date
  Placement maps a data partition to a particular area of storage: the partition “June Customer Orders 2009” resides in file “/opt/data/orders/june2009.dat”
Separate big, unstructured data from transactional data:
  Different levels of protection
  Different administration needs
  Use of tiered storage to control cost

Partitioning and placement are two key functions necessary for information lifecycle management. Partitioning allows you to organize your data into logical sets. Then you can place those data sets in appropriate areas of storage. Partitioning allows you to localize data that belongs together, and to separate data that is not usually accessed at the same time. You can apply the appropriate storage technology to a data set, depending on how quickly the data must be served up and what your budget is. Data that needs to be accessed quickly and frequently deserves the highest grade storage. Also, you can protect and administer data sets in different ways, according to security and risk mitigation requirements.
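To make the two ideas concrete, here is a small Python sketch of partitioning (grouping rows into non-overlapping monthly subsets) and placement (a map from partition to storage location). It only illustrates the concepts and is not how IQ implements them. The helper names and the July path are assumptions; only the June 2009 path comes from the slide.

```python
from collections import defaultdict
from datetime import date


def partition_key(order_date):
    """Month-level range key, e.g. date(2009, 6, 15) -> '2009-06'."""
    return f"{order_date.year:04d}-{order_date.month:02d}"


def partition_orders(orders):
    """Group (order_id, order_date) rows into non-overlapping monthly subsets."""
    partitions = defaultdict(list)
    for order_id, order_date in orders:
        partitions[partition_key(order_date)].append(order_id)
    return dict(partitions)


# Hypothetical placement map: partition key -> area of storage.
# The June 2009 path is quoted on the slide; the July path is invented for illustration.
placement = {
    "2009-06": "/opt/data/orders/june2009.dat",
    "2009-07": "/opt/data/orders/july2009.dat",
}

orders = [(1, date(2009, 6, 1)), (2, date(2009, 6, 30)), (3, date(2009, 7, 1))]
for key, order_ids in partition_orders(orders).items():
    print(key, order_ids, "->", placement[key])
```

Because every row maps to exactly one key, the subsets cannot overlap, which is what allows each one to be placed, protected, and administered independently.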
16 Controls for Database Administration
Database administrative operations can be performed with finer control
The database can be divided into read-only and read-write sections that are managed differently
Backup and restore time can be reduced by backing up read-only data once
Data validation can be invoked on just the read-write portions of the database
Frequently accessed data can be assigned to faster data storage, and less frequently accessed data can be segregated to cheaper, slower storage

Database administration can be a very time-consuming and costly activity. Think how much time you can save by dividing the database into read-only and read-write sections. You can validate and back up read-only data once, saving precious CPU cycles and clock time.
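The backup rule described above (read-write data on every run, read-only data only once) can be expressed as a short planning function. This is a sketch in Python under assumed names. `plan_backup`, the DBSpace names, and the mode strings are hypothetical, and the function does not call any IQ backup command.

```python
def plan_backup(dbspaces, already_backed_up):
    """Return the DBSpace names to include in this backup run.

    `dbspaces` maps a DBSpace name to "read-write" or "read-only".
    `already_backed_up` is the set of read-only DBSpaces captured by an earlier run.
    Read-write DBSpaces are always included; read-only ones are included only once.
    """
    plan = []
    for name, mode in dbspaces.items():
        if mode == "read-write" or name not in already_backed_up:
            plan.append(name)
    return sorted(plan)


# Hypothetical DBSpaces: one for current data, two holding closed years.
dbspaces = {"current": "read-write", "y2009": "read-only", "y2010": "read-only"}

first_run = plan_backup(dbspaces, already_backed_up=set())
later_run = plan_backup(dbspaces, already_backed_up={"y2009", "y2010"})
print(first_run)
print(later_run)
```

After the first run, every later run contains only the read-write DBSpace, which is where the backup time saving comes from.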
17 Partition and position a table in IQ
Partition by range: single column partition key
1) Partition table Orders
    CREATE TABLE Orders (
        OrderID INT,
        OrderDate DATE,
        Description CHAR(10)
    )
    PARTITION BY RANGE (OrderDate) (
        p2010 VALUES <= (' ') IN FIBER,
        p2011 VALUES <= (' ') IN FIBER,
        pNextYear VALUES <= (MAX) IN FIBER
    );
Over time, as data is being loaded, start migrating older data to slower, cheaper storage
2) Move p2010 to SATA storage
    ALTER TABLE Orders MOVE PARTITION p2010 TO SATA;
3) Later, drop very old partitions
    ALTER TABLE Orders DROP PARTITION p2010;

This slide shows examples of IQ DDL commands to create, move and drop partitions. The “PARTITION BY RANGE” clause on the CREATE TABLE statement at the top shows the creation of several table partitions in one statement. The ALTER TABLE…MOVE PARTITION statement shows the movement of a table partition onto a different DBSpace as it ages. The ALTER TABLE…DROP PARTITION statement removes a partition once its data is no longer needed.
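A script generator for the move and drop steps might look like the following Python sketch. It emits only the two ALTER TABLE forms shown on this slide. The function name, the age thresholds, and the mapping of partitions to years are assumptions for illustration, and the sketch does not track whether a partition has already been moved.

```python
def lifecycle_ddl(table, partitions, current_year, move_after, drop_after, slow_dbspace):
    """Build ALTER TABLE statements for partitions that have aged enough.

    `partitions` maps a partition name to the year it covers.
    `move_after` and `drop_after` are hypothetical age thresholds in years.
    """
    statements = []
    # Process oldest partitions first so the script reads chronologically.
    for name, year in sorted(partitions.items(), key=lambda item: item[1]):
        age = current_year - year
        if age >= drop_after:
            statements.append(f"ALTER TABLE {table} DROP PARTITION {name};")
        elif age >= move_after:
            statements.append(
                f"ALTER TABLE {table} MOVE PARTITION {name} TO {slow_dbspace};"
            )
    return statements


for statement in lifecycle_ddl(
    "Orders", {"p2010": 2010, "p2011": 2011},
    current_year=2012, move_after=2, drop_after=5, slow_dbspace="SATA",
):
    print(statement)
```

Generated statements should be reviewed before they are run against a database, since a DROP PARTITION cannot be undone without a backup.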
18 Virtual Data Marts
Unique, user community focused platform for big data analytics
Virtual data mart of servers and partitioned storage
Workload management
Privacy through isolation of resources
Separate big unstructured data from transactional data
Back up and restore independently
Diagram labels: Data Scientists, Business Analysts, Operations, End Users, Full Mesh High Speed Interconnect, SAN Fabric

Building upon the separation of data and storage into discrete sets, SAP SYBASE IQ Multiplex introduced the concept of “logical servers”. A logical server is a grouping of physical nodes in the Multiplex. When a query is executing on a machine in a logical server, only the nodes within that logical server participate in the query. This allows workloads to be isolated from each other for security or resource balancing purposes. Logical servers are elastic: physical machines may be added to or removed from a logical server dynamically as workload demand changes. A logical server can be used to build a “virtual data mart”, a set of storage and compute resources used for a particular purpose within an enterprise. The data mart is “virtual” because its storage and compute resources are part of a larger set, and the boundary around the mart is changeable: data can be moved to other areas of storage, and physical servers can migrate among logical servers.
20 Shorten Data Backup Times

A large bank is using the SAP SYBASE IQ VLDB option to shorten backup times. They divided the database into read-write and read-only partitions. The read-only DBSpaces are backed up just once, and after that only the read-write data needs to be backed up regularly.
21 Reclaim Valuable Storage Space

The bank also implemented a data consolidation activity that copied the data from partially used DBFiles (the physical files that make up a logical DBSpace) into other DBFiles in the same DBSpace. Each emptied DBFile was then returned to the storage team for reuse. The result was more efficient use of storage resources, and money saved.
23 ILM in PowerDesigner
Model the database
Create DBSpaces
Assign cost
Create a new lifecycle
Assign start date and phase retention periods
Associate tables with lifecycle
Select date column partition key
Estimate cost savings
Generate scripts to move partitions through DBSpaces as they age

Implementing ILM in SAP SYBASE IQ is made easier with PowerDesigner. In the Sybase PowerDesigner modeling tool, the user can define a data lifecycle: how data is partitioned, and how partitions are positioned on DBSpaces. PowerDesigner can generate cost savings reports as data is migrated over time onto cheaper storage, and can also generate the DDL scripts that move partitions at prescribed times.
24 Create Lifecycle

Here is a picture of a PowerDesigner dialog box for defining a data lifecycle. The user defines the total length of a lifecycle, how many phases make up the lifecycle, and how long a partition stays in a particular phase before moving to the next phase.
25 Lifecycle Properties
Assign a cost to the storage
Indicate which tables are part of the lifecycle

The user assigns database tables to the lifecycle and estimates the initial volume of data and how it is likely to grow over time. Each phase of a data lifecycle is associated with a particular tablespace with a particular cost.
26 Generate Data Movement Scripts

PowerDesigner will generate data partition movement scripts that implement the data lifecycle and work with SAP SYBASE IQ.
27 Generate Cost Savings Report
Generate cost savings information

Finally, PowerDesigner can generate a report that shows cost savings as data is migrated through the lifecycle phases onto cheaper and cheaper storage.
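The arithmetic behind such a report can be approximated as follows. This Python sketch is not PowerDesigner's calculation. The phase lengths, per-GB costs, and fixed data volume are invented figures, and the model ignores data growth.

```python
def lifecycle_cost(volume_gb, phases):
    """Total storage cost when data moves through the phases in order.

    `phases` is a list of (months_in_phase, cost_per_gb_per_month) pairs,
    ordered from the first (fastest, most expensive) tier to the last.
    """
    return sum(volume_gb * months * cost for months, cost in phases)


def savings(volume_gb, phases):
    """Cost avoided compared with keeping the data on the first tier throughout."""
    total_months = sum(months for months, _ in phases)
    first_tier_cost = phases[0][1]
    flat_cost = volume_gb * total_months * first_tier_cost
    return flat_cost - lifecycle_cost(volume_gb, phases)


# Hypothetical lifecycle: 12 months on fast storage, 24 on mid-tier, 48 on archive.
phases = [(12, 10), (24, 4), (48, 1)]
print(lifecycle_cost(100, phases), savings(100, phases))
```

A fuller model would add the growth estimates the user enters in the lifecycle properties, so that the volume differs from phase to phase.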
29 SAP SYBASE IQ VLDB OPTION SUMMARY
Storage strategies for managing big data, to service data requests responsively while controlling costs
Learn more
Visit:
Call:

For more information, visit the URL shown.