Presentation on theme: "Data Storage is Getting Out-of-Hand"— Presentation transcript:
0 Managing the Unimaginable: A Practical Approach to Petabyte Data Storage. Randy Cochran, Infrastructure Architect, IBM Corporation, TOD Information on Demand Infrastructure
1 Data Storage is Getting Out-of-Hand > Companies are having difficulty keeping ahead of corporate storage demands. Are storage demands starting to overpower you?
2 Most Research Firms Agree
“It is projected that just four years from now, the world’s information base will be doubling in size every 11 hours.” (“The Toxic Terabyte: How Data-Dumping Threatens Business Efficiency”, Paul Coles, Tony Cox, Chris Mackey, and Simon Richardson, IBM Global Technology Services white paper, July 2006)
“Our two-year terabyte CAGR of 52% is 3ppt (percentage points) below rolling four quarter results of 55%.” (“Enterprise Hardware: Storage Forecast & Views from CIOs”, Richard Farmer and Neal Austria, Merrill Lynch Industry Overview, 03 January 2007)
“With a 2006–2011 CAGR nearing 60%, there is no lack in demand for storage…” (“Worldwide Disk Storage Systems 2007–2011 Forecast: Mature, But Still Growing and Changing”, Research Report # IDC206662, Natalya Yezhkova, Electronics.ca Publications, May 2007)
“According to TheInfoPro… the average installed capacity in Fortune organizations has jumped from 198 TB in early 2005 to 680 TB in October… TIP found that capacity is doubling every 10 months.” (InfoStor Magazine, Kevin Komiega, October 19, 2006)
> Nearly every research firm agrees that storage is growing at an amazing rate.
> Industry estimates range from 30% to 70% per year growth.
> Regardless of which estimate you believe, it is still an amazing growth rate.
3 What’s Driving Petabyte Level Storage? The “Perfect Storm”:
General increase in demand
Disaster Recovery plans
Declining storage media costs
New digital data technologies
A desire for greater storage efficiency
More regulatory requirements
Storage technical skills scarcity
Better protection from litigation
A growing understanding of retained data’s business value
Proliferation of sophisticated applications
According to IDC, between 2006 and 2010 the information added annually to the digital universe will increase more than six fold, from 161 to 988 exabytes.
> Many factors are converging to fuel storage growth.
> The factors fueling this phenomenon will not disappear soon.
4 Just How Big is a Petabyte? > Many petabytes have already been stored on off-line tape. What’s new is maintaining a petabyte of storage on-line!
> That is one quadrillion, 125 trillion, 899 billion, 906 million, 842 thousand, 624 bytes!
Petabyte storage has been around for years – online petabyte storage has not.
“Ninety-two percent of new information is stored on magnetic media, primarily hard disks.” (“How Much Information? 2003”, UC Berkeley’s School of Information Management and Systems)
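As a quick sanity check on that figure, a binary petabyte (2^50 bytes) can be worked out directly; a minimal Python sketch:

```python
# A petabyte in binary terms is 2**50 bytes.
petabyte = 2 ** 50
print(f"{petabyte:,} bytes")            # 1,125,899,906,842,624 bytes
print(f"{petabyte / 10**12:,.1f} TB")   # roughly 1,125.9 decimal terabytes
```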
5 How Big is That in Human Terms? > According to Britannica.com, the U.S. Library of Congress contains approximately 18 million books, 2.5 million recordings, 12 million photographs, 4.5 million maps, and more than 54 million manuscripts.
> All of the printed documents contained within (100) Libraries of Congress would fit into a petabyte of storage.
6 Why is Petabyte Storage a Challenge? Areas impacted by petabyte storage:
Content and File Management
Application & Database Characteristics
Storage Management
Architectural Design Strategy
Performance and Capacity
SAN Fabric Design
Backup and Recovery Methods
Security System Complexity
Compliance with Regulatory Requirements
Operational Policies and Processes
Maintenance Requirements
> Almost every aspect of storage management is impacted by petabyte data.
> There is nothing magic about the petabyte number; similar challenges start to appear with any really large volume of stored data.
7 Content and File Management > Successful storage management begins with knowledge about, and control of, the content in the data files.
8 Management Starts With Data Classification Data classification assumptions:
Not all data is created equal
The business value of data changes over time
Performance can be improved by re-allocating data to an optimized storage configuration
The value of most business data is not fixed; it is expected to change over time
Understanding the business value of data is crucial in designing an effective data management strategy
> It will be difficult or impossible to develop a viable strategy for managing storage without classifying your company’s existing data.
> Identifying the business value data has will determine where it should be stored, and what policies will be applied to it.
Which data has greater value to the business: a client’s purchase record, or a memo about last year’s phone system upgrade?
9 Data Classification Example > There are no industry-standard definitions available for data classification; there are no universally accepted definitions for tier levels.
> The key is to develop classification categories that are relevant to your company, as in the illustrative sketch below.
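To make the idea concrete, here is a minimal sketch of what company-specific classification categories might look like. The class names, tier numbers, media types, and retention periods are purely hypothetical examples, not industry standards (none exist, as noted above).

```python
# Illustrative classification table: adjust classes, tiers, and retention to your business.
DATA_CLASSES = {
    "mission_critical":   {"tier": 1, "media": "FC 15K RPM disk", "retention_years": 7},
    "business_important": {"tier": 2, "media": "FC/SAS 10K disk", "retention_years": 5},
    "business_standard":  {"tier": 3, "media": "SATA disk",       "retention_years": 3},
    "reference":          {"tier": 4, "media": "tape library",    "retention_years": 10},
}

def placement(data_class: str) -> str:
    """Return the storage tier, media, and retention policy for a classification."""
    c = DATA_CLASSES[data_class]
    return f"Tier {c['tier']} ({c['media']}), retain {c['retention_years']} years"

print(placement("business_standard"))   # Tier 3 (SATA disk), retain 3 years
```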
10 Control Your File Content
Implement file aging
Set data retention periods
Eliminate low value data
Clean out old backup files
Eliminate outdated information
Deploy de-duplication technology
Reduce storage of low value data
Locate and purge corrupt files
Crack down on unauthorized storage usage
Periodically review log files and archive or delete obsolete information
> Data storage “best practices” start with diligently managing the content of files kept on the subsystem.
> Data de-duplication holds great promise. It replaces identical copies of data with “pointers” to the original file location. Under the right circumstances, it can reduce the total volume of stored data by as much as 15x or more.
> A certain amount of personal data cannot (and probably should not) be forbidden. However, it cannot be allowed to get out of control.
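The pointer-replacement idea behind de-duplication can be sketched in a few lines. This is a toy example that hashes whole blocks; it is not any vendor's product, which would chunk data at a finer granularity and persist its index.

```python
import hashlib

def dedup(blocks):
    """Replace duplicate blocks with pointers to the first occurrence."""
    index = {}    # content hash -> id of the block that actually stores the data
    layout = []   # per-block: ("data", id) or ("pointer", id of the original)
    for i, block in enumerate(blocks):
        digest = hashlib.sha256(block).hexdigest()
        if digest in index:
            layout.append(("pointer", index[digest]))   # duplicate: store only a pointer
        else:
            index[digest] = i
            layout.append(("data", i))                  # first copy: store the data
    return layout

print(dedup([b"report-q1", b"report-q1", b"memo"]))
# [('data', 0), ('pointer', 0), ('data', 2)]
```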
11 Application and Database Characteristics > The needs and idiosyncrasies of the business applications should dictate what direction the storage architecture will take.
12 Know Your Application and Database Needs Know your applications’ needs:
User expectations
Workload complexity
Read or write intensity
Sequential file usage
IOPS dependence
Stripe size optimization
Throughput requirements
Service prioritization
Growth expectations
Don’t allow databases to “lock up” vast amounts of storage
> The server, storage, and network infrastructure exists solely to support the requirements of the user applications and databases.
> Understand your application and database performance characteristics.
> Ensure your solution will meet or exceed user expectations. If not, little else will matter.
13 Applications Will Drive Storage Requirements Application characteristics will drive storage decisions:
Value to the business
Number of users
Usage patterns (steady, bursty, cyclical, variable)
7x24 or 9x5 access
Domestic or global access
Distributed or self-contained
High or low security data
Architectural constraints
Significant performance gains (or losses) can be achieved by matching requirements to storage characteristics
> A vast array of issues must be analyzed before making choices about storage requirements.
14 Storage Management > Like an orchestra conductor, the storage management software coordinates the operations of the storage subsystem.
15 Large Storage Systems Must Be Managed
Information Lifecycle Management (ILM)
Hierarchical Storage Management (HSM)
Storage Resource Management (SRM)
Storage Virtualization
"Enterprises can achieve better and more targeted utilization of resources by first establishing the value of their information assets and then using storage management software to execute the policies that define how resources are utilized." (Noemi Greyzdorf, research manager, Storage Software, IDC)
> There are several major storage management concepts that impact data storage operations.
> Each of these topics could be an entire presentation unto itself. For our purposes, each will be briefly introduced here.
16 Information Lifecycle Management “(ILM is) the process of managing business data throughout its lifecycle from conception until disposition across different storage media, within the constraints of the business process.” (courtesy of Veritas Corporation, Nov. 2004)
> If someone offers to sell you an ILM solution – be VERY skeptical. It is a “cradle-to-grave” program for managing data, not a product.
> An ILM plan will include storage management applications, data handling policies and procedures, classification strategies, and many related activities.
ILM is not a commercial product, but a complete set of products and processes for managing data from its initial inception to its final disposition.
17 Information Lifecycle Management
Information has business value
Its value changes over time
It ages at different rates
It has a finite life-cycle
As data ages, its performance needs change
Some information is subject to different security requirements, due to government regulatory or legal enforcements
Outdated information has different disposal criteria
A combination of processes and technologies that determine how information flows through a corporate environment
Encompasses management of information from its creation until it becomes obsolete and is destroyed
> ILM is about doing a good job of maintaining your data throughout its lifetime.
> The details of ILM are remarkably similar to the recommendations made by storage regulatory requirements.
18 “Best Practices” for ILM Implementations
Know exactly where information is stored
Be able to retrieve information quickly and efficiently
Limit access to only those who need to view data
Create policies for managing and maintaining data
Do not destroy important documents
Avoid keeping multiple copies of the same data
Retain information only until it is no longer useful
Destroy outdated files on a regular basis
Document all processes and keep them up-to-date
> “Best practices” for ILM implementation vary widely by industry type.
> Some companies feel that if they store all data forever they’ll satisfy most storage management needs. Unfortunately, they are wrong.
19 Hierarchical Storage Management “HSM is a policy-based data storage management system that automatically moves data between high-cost and low-cost storage media, without requiring the knowledge or involvement of the user.” (courtesy of …)
> HSM is the process (and product) that makes the economic repositioning of data on appropriate storage feasible.
> In most cases HSM works “behind the scenes” to ensure the repositioning occurs, while remaining transparent to the user.
> It keeps track of where the data is physically located within the storage structure.
IBM has been involved in providing HSM solutions for over 30 years and offers a wide variety of products with automated data movement capabilities.
20 File Access Activity Over Time > Most file activity displays a similar pattern. After a few weeks the number of file accesses becomes negligible.
> Why should inactive data be kept on high speed, maximum performance storage?
> As file access declines, file storage is decremented to lower performance, less expensive storage.
> When the business value of data approaches zero, its files are either systematically destroyed or transferred off the system to long-term storage.
> Keep in mind that while file activity declines over time, the volume of data stored continues to grow!
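A minimal sketch of the aging policy described above, assuming hypothetical tier mount points and age thresholds. Real HSM products (such as those listed later in this presentation) also leave a stub behind so the move stays transparent to users, which this sketch omits.

```python
import os
import shutil
import time

# Hypothetical tier mount points and age limits in days -- purely illustrative.
TIERS = [
    (30,   "/storage/tier1"),   # accessed within 30 days: keep on fast FC disk
    (180,  "/storage/tier2"),   # 30-180 days: midrange disk
    (3650, "/storage/tier3"),   # older: high-capacity SATA / archive staging
]

def target_tier(path: str) -> str:
    """Pick a tier based on days since the file was last accessed."""
    age_days = (time.time() - os.stat(path).st_atime) / 86400
    for limit, tier in TIERS:
        if age_days <= limit:
            return tier
    return TIERS[-1][1]

def migrate(path: str) -> None:
    """Move a file to the tier its access age calls for (sketch only)."""
    tier = target_tier(path)
    if not path.startswith(tier):
        shutil.move(path, os.path.join(tier, os.path.basename(path)))
```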
21 Hierarchical Storage Management HSM concepts:
Only 10%-15% of most data is actively accessed
The business value of data changes over time
Between 80% and 90% of all stored data is inactive
High performance storage (FC disks) is expensive
Lower performance media (tape, optical platters, and SATA disk) are comparatively inexpensive
> On most systems only 10%-15% of the data requires high speed, high throughput disk drives.
> The only way to correctly identify the proper distribution of data across tiers is to thoroughly analyze your own usage patterns.
22 Hierarchical Storage Management HSM concepts (cont.):
Enterprise class storage is not required for all data
Policies can be set to establish the proper frequency for transitioning aging data to less expensive media
HSM allows optimal utilization of expensive disk storage
Low cost, high density disks consume fewer resources
Overall storage system performance may improve
> Low performance, high capacity disks cost significantly less than premium disks.
> Tape media still has the lowest cost-per-GB of any storage medium, but very high capacity SATA disk has narrowed that gap significantly.
23 IBM Products with HSM Capabilities
General Parallel File System (GPFS)
IBM Content Manager for Multiplatforms
Tivoli Storage Manager HSM for Windows
Tivoli Storage Manager for Space Management (AIX)
SAN File System (SFS)
DFSMShsm (Mainframe)
High Performance Storage System (HPSS)
> IBM has extensive experience in HSM technologies, and features both breadth and depth in its HSM product offerings.
24 Storage Resource Management “Storage Resource Management (SRM) is the process of optimizing the efficiency and speed with which the available drive space is utilized in a storage area network (SAN). Functions of an SRM program include data storage, data collection, data backup, data recovery, SAN performance analysis, storage virtualization, storage provisioning, forecasting of future needs, maintenance of activity logs, user authentication, protection from hackers and worms, and management of network expansion. An SRM solution may be offered as a stand-alone product, or as part of an integrated program suite.” (definition courtesy of …)
> SRM normally encompasses a suite of storage management tools and utilities that facilitate the control of the storage environment.
IBM’s primary tool for Storage Resource Management is its TotalStorage Productivity Center suite of tools for disk, data, fabric, and replication.
25 Storage Resource Management Functions > Areas that typically fall under SRM control are Deployment, Compliance, Operational, and Service Level Management.> Storage manufacturers provide different combinations of tools to address specific SRM requirements.
26 Storage Virtualization “The act of integrating one or more (back end) services or functions with additional (front end) functionality for the purpose of providing useful abstractions. Typically virtualization hides some of the back end complexity, or adds or integrates new functionality with existing back end services. Virtualization can be nested or applied to multiple layers of a system.” (definition courtesy of …)
> Storage virtualization has been around for years in both the Mainframe and Open Systems worlds. Anyone who has worked with IBM’s LVM or the Veritas Volume Manager has worked with virtualization.
> Virtualization provides a layer of abstraction to mask the complexity of the underlying technical structure from the user.
Virtualization allows most of the complexity of a storage infrastructure to be hidden from the user.
27 Virtualization Makes Storage One Large Pool Virtualization characteristics:
Makes storage configuration details invisible to the user
Improves overall manageability of the system
Aggregates isolated storage “islands” into a unified view
Facilitates greater flexibility and scalability
Optimizes utilization of storage capacity
Provides the ability to move data on-the-fly
Improves storage subsystem flexibility
Allows rapid re-allocation of storage resources
Improves performance by providing another layer of caching
May provide additional functionality for the SAN
> Since large storage subsystems can be very complex, virtualization is usually employed as a tool to simplify the management of the structure. A simplified sketch of the pooling idea follows below.
> Virtualization normally spans heterogeneous platforms, presenting all manufacturers’ storage as one large virtual “pool”.
> Virtualization can be used to extend the useful life of aging storage equipment, but care must be taken not to make poor economic decisions based on a desire to use-what-you-have.
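A highly simplified sketch of the "one large pool" idea: a virtual volume is just a map of extents drawn from whichever backing pools have space, so the host never sees the physical layout. Pool names, extent size, and free-extent counts are hypothetical.

```python
EXTENT_MB = 256   # arbitrary extent size for illustration

class VirtualPool:
    def __init__(self, backing_pools):
        # backing_pools: pool name -> number of free extents in that physical pool
        self.free = dict(backing_pools)

    def create_volume(self, size_mb):
        """Allocate extents for a virtual volume from any pools with space."""
        needed = -(-size_mb // EXTENT_MB)          # ceiling division
        mapping = []
        for pool in self.free:
            take = min(self.free[pool], needed - len(mapping))
            mapping.extend((pool, i) for i in range(take))
            self.free[pool] -= take
            if len(mapping) == needed:
                return mapping
        raise RuntimeError("insufficient capacity across pools")

san = VirtualPool({"ds8300_fc": 4, "ds4200_sata": 8})
print(san.create_volume(1536))   # six extents, drawn from both pools, invisible to the host
```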
28 Architectural Design Strategy > A well designed architecture can mean the difference between a fast, efficient, and scalable storage infrastructure, and a “black hole” that perpetually sucks money and time from your IT organization.
29 Key Architectural Design Considerations
Resource Consumption
Storage Economics
RAID Allocation
Performance Objectives
Other Design Issues
> These are some of the key areas of concern with large storage subsystems that can have a major impact on the design.
The integrity of the architectural design will determine the overall performance, stability, economic efficiency, manageability and future scalability of the system.
30 Power Consumption vs. Storage Capacity > This chart uses a cost of $0.0874 per kWh, which is said to be the national average by the Enterprise Strategy Group. This cost can vary substantially, and depends on what part of the U.S. you happen to be located in.
> Even though capacities range from 36 GB to 1,000 GB, power consumption remains nearly identical.
> Older, lower capacity disks may be more expensive to operate.
These disks all have very similar power consumption requirements, even though the largest one features 28 times the capacity of the smallest one.
In addition, each disk will require additional electrical power to cool the heat (BTUs) it produces.
(National retail price of electricity per kWh from “Power, Cooling, Space Efficient Storage”, page 2, ESG white paper, Enterprise Strategy Group, July)
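The per-drive arithmetic behind the chart is simple enough to verify yourself. A sketch using the $0.0874/kWh figure cited above; the wattages are placeholders, not manufacturer specifications, so substitute measured values for your own drives.

```python
RATE_PER_KWH = 0.0874        # national average cited above
HOURS_PER_YEAR = 24 * 365

def annual_power_cost(watts: float) -> float:
    """Direct electricity cost of running one drive for a year."""
    return watts / 1000 * HOURS_PER_YEAR * RATE_PER_KWH

# Placeholder wattages to illustrate the point made in the slide.
for label, watts in [("36 GB FC drive", 15), ("1,000 GB SATA drive", 13)]:
    print(f"{label}: ${annual_power_cost(watts):.2f}/yr")
# Near-identical wattage means near-identical cost, despite a ~28x capacity difference.
```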
31 Comparing Storage Subsystem Power Costs > Many companies still use Enterprise subsystems for all their data storage needs.> An arbitrary distribution of storage to a tiered 10% / 20% / 70% structure reduces direct power costs by over $120,000 per year.Significant power savings may be realized by redistributing data to the appropriate type and size of disk drive.
32 Comparing Storage Subsystem Cooling Costs > For every watt of power consumed, there is a corresponding power cost for cooling the equipment.> The cost of cooling the tiered structure is over $20,000 less than the cost of cooling a traditional storage structure.Additional power savings may be realized from the reduced cooling requirements provided by high capacity, lower wattage disk drives.
33 Comparing Storage Floor-Space Cost > The footprint presented by a tiered storage structure is considerably smaller than the footprint of a traditional storage structure.> At $65 per square foot, this is an average of $440,000 less than a traditional storage structure.The DS4800 and DS4200 storage subsystems include the required number of disk expansion trays mounted in standard equipment racks.
34 Tiered Storage Approach How do the costs add up?
Traditional approach: everything on DS8300s
Tiered storage approach: DS8300, DS4800, and DS4200s with SATA disk
> Altogether the tiered storage structure yields a savings of over $600,000 per year.
Savings: $614,935 / yr.
35 A Look at Older Disk Subsystem Efficiency > Older technologies may have a higher operational cost.> Compared to the newer DS8300, a similarly configured ESS800 Shark uses more power, requires more cooling, and occupies more floor space.> This is a 100 TB example. If the storage volume was greater, the differences would be even more dramatic.Storing 100 TB of data on more modern storage subsystems results in 50% less power consumption, a 53% reduction in BTUs per hr., and a reduction in required floor space of 38%.In addition, a DS8300 system has over 7x the throughput of the ESS800.
36 Why is Tiered Storage Important?
Maps data’s business value to disk characteristics
Places data on storage appropriate to its usage
Incorporates lower cost disks
Reduces resource usage (power, cooling, etc.)
Matches user access needs to storage characteristics
Capitalizes on higher capacity disk drive technology
Increases overall performance of the system
> Tiered storage offers a promising new direction for cost-effectively managing data storage.
37 A Typical Tiered Storage Architecture (DS4200s with SATA Disk, TS3500 Tape Library)
Business Critical: High Performance / Very High Availability
Business Important: Good Performance / High Availability
Business Standard: Average Performance / Standard Availability
Reference / Historical: Near-line or Off-line
> A common tiered storage model is to go from high speed, low capacity disks to medium speed, medium capacity disks, to low performance, high capacity disks. From there data is transitioned to tape media.
Normally a tiered storage strategy is based on data’s business value.
38 Choosing the Right Controller Frame > Regardless of the manufacturer, Enterprise Storage controller systems cost more than Midrange or Low end controllers.
> Enterprise controllers provide “redundant everything”, massive internal bandwidth, huge cache structures, and vast numbers of external fabric ports. They are designed for maximum performance, throughput, durability, and scalability.
> Low end controllers are designed to complement low cost, high capacity disks with adequate, but not exceptional, capabilities.
39 Choosing the Right Disk Characteristics > Choosing the right disk characteristics for the proper tier is the key to a cost-effective storage structure.> Depending on the application requirement, one system’s tier-2 disk could be another’s tier-3 disk.
40 Comparing Disk Drive Attributes > Cost-per-GB may vary substantially between different types of disk.
> Obviously high performance FC disk (2 or 4 Gb interface) will be at the upper end and high capacity SATA will be at the lower end.
> Look for the SATA price to drop even lower as larger platter technology is released.
41 The Cost Impact of Adding Disk Trays > This chart is done in 4-disk increments, and is based on the IBM EXP810, which is a 16-disk, rack-mountable storage tray.
> The cost of the controller does have an impact on the storage cost-per-GB, but not as much as you might think.
> Beyond the second disk tray, the cost of the controller is pretty well distributed and has a minimal impact on the cost-per-GB.
Note: Calculations based on 146 GB, 10K RPM drives.
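The amortization effect described above is easy to reproduce. This sketch uses hypothetical prices (not IBM list prices), assuming 16-disk EXP810 trays populated with 146 GB drives as in the slide.

```python
# All prices below are placeholders for illustration only.
CONTROLLER_COST = 60_000     # hypothetical controller price
TRAY_COST       = 8_000      # hypothetical price per 16-disk EXP810 tray
DISK_COST       = 700        # hypothetical price per 146 GB 10K RPM drive
DISK_GB         = 146
DISKS_PER_TRAY  = 16

def cost_per_gb(trays: int) -> float:
    """Total system cost divided by raw capacity for a given number of trays."""
    disks = trays * DISKS_PER_TRAY
    total = CONTROLLER_COST + trays * TRAY_COST + disks * DISK_COST
    return total / (disks * DISK_GB)

for t in (1, 2, 4, 8):
    print(f"{t} tray(s): ${cost_per_gb(t):.2f}/GB")
# The controller's share of cost-per-GB shrinks quickly after the second tray.
```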
42 Tiered Storage Design Pros and Cons
Advantages:
Lower initial purchase price
Higher capacity per square foot
Reduced power consumption
Decreased requirement for cooling
Increased equipment flexibility
Potentially a higher performance solution
Disadvantages:
Inherently a more complex architecture
Greater up-front effort to design and implement
Requires advanced storage design skills and knowledge
> The major advantage to tiered storage is significantly reduced cost.
> The primary disadvantage to tiered storage is added complexity in the system.
43 RAID Selection Decision Drivers
Application or database characteristics
Read/write mix
Dependency on IOPS
RAID performance characteristics
Appropriate RAID level
Number of disks per array
Stripe size
Available bandwidth
Configuration rules and recommendations
Loss from data parity and hot sparing
Disk failure probability
RAID parity rebuild times
> The decision about which RAID level to use is not only about performance and protection; it is also about available storage capacity.
> In most situations the performance of RAID5 has been confirmed to be nearly as good as RAID1 (mirroring). If your application does not need the slight performance improvement RAID1 can offer, why use it?
44 Loss from Mirroring, Striping, and Sparing RAID10 = mirror plus stripe; RAID1 = mirror only.
> Any solution that includes a RAID1 configuration will provide high availability (the data has a 100% copy), but does so at a price.
> Mirroring a disk ensures that at least 50% of the associated disk storage is consumed by the duplicate image.
45 Loss from RAID5 Parity and Sparing
Note: The second tray has one 2+P array to allow for one spare drive per two trays.
Note: The second tray has one 6+P array to allow for one spare drive per two trays.
> RAID5 (striping with parity) sacrifices far less disk capacity than mirroring, but has slightly lower performance.
> The “sweet spot” for most disk array configurations is between 12 and 16 disk drives per array.
> Smaller arrays pay a penalty in disk storage while larger arrays start to suffer from cumulative internal latencies.
> IBM recommends dedicating at least one disk as a spare for every two disk trays (one spare per 32 disks). More spares are acceptable. A capacity sketch follows below.
Note: Each tray has one spare drive per tray.
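A back-of-the-envelope comparison of usable capacity under the two schemes. The 146 GB drive size and one-spare-per-32-disks policy follow the figures above, but the array layout itself is only a sketch.

```python
DISK_GB = 146

def usable_raid10(disks: int, spares: int) -> int:
    """Mirroring: half of the non-spare disks hold duplicate copies."""
    return (disks - spares) // 2 * DISK_GB

def usable_raid5(disks: int, spares: int, width: int = 8) -> int:
    """Striping with parity: each width-disk (7+P) array gives up one disk to parity."""
    arrays = (disks - spares) // width      # leftover disks would form a smaller array
    return arrays * (width - 1) * DISK_GB

disks, spares = 32, 1                       # two 16-disk trays, one spare per two trays
print("RAID10 usable:", usable_raid10(disks, spares), "GB")   # 2,190 GB
print("RAID5  usable:", usable_raid5(disks, spares), "GB")    # 3,066 GB
```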
46 Other Architectural Considerations
Compatibility
High availability
Architectural robustness
Flexibility and scalability
Stability of the technology
Vendor’s financial standing
Well defined product line roadmap
Support for industry standards
> Many other architectural factors come into play, each with its own degree of importance.
> “Bleeding edge” technology is always fascinating, but it’s definitely not well suited for a production environment.
> Everyone loves an underdog, but be absolutely certain about the financial stability of any young start-up company before making their product a cornerstone of your architecture.
> Confirm with the vendor that the technology you’re evaluating has a long and promising roadmap into the future.
47 Performance and Throughput > Everyone wants better performance and higher throughput, but where do you start?
48 Storage Subsystem Performance Drivers
Business objectives and user expectations
Applications and database characteristics
Server characteristics
SAN fabric characteristics
Storage controller characteristics
Caching characteristics
Configuration characteristics
Disk latency characteristics
> Performance always starts with user expectations. You may struggle to bring response times under a minute, and feel that’s fantastic. However, the user may need response times under 5 seconds and consider your performance efforts dramatically lacking.
> Always shoot for consistency in your design. You may have a solution that responds in under 1 second 95% of the time, but the user will only remember the times they had to wait for over 10 seconds for a response.
"We can't solve problems by using the same kind of thinking we used when we created them." (Albert Einstein)
49 Storage Performance Enhancers
Data Mover – Reassigning data transfer tasks to a specialized “engine” reduces the workload on the host processing system.
Search Engines – Systems dedicated to executing searches in vast amounts of stored data to satisfy specific requests.
Directory Services – Stores and organizes information about resources and data objects.
High Speed Interconnections – Dedicated “behind the scenes” networks devoted to the transfer of large amounts of data.
Autonomic Computing – The system must have an ability to reconfigure itself under varying and possibly unpredictable conditions.
> Moving data around in large storage structures takes processor cycles. If possible, hand data movement tasks off to a “data mover” to conserve CPU cycles for application performance.
52 SAN Fabric Overview
The SAN fabric is the interconnecting structure between associated servers and storage devices
Proper fabric design will directly impact:
Performance
Availability
Equipment cost
Manageability
Maintainability
The communications protocol can be either Fibre Channel, Ethernet, or a combination of both
The ability of the fabric to scale is critical
Monitoring of SAN fabric traffic is a necessity
53 Designing a High Performance Fabric
Select an optimal SAN fabric topology: mesh, waterfall, core/edge, fat tree, butterfly, torus, hypercube, or hybrid
Ensure the design is driven by business application requirements
Keep it as simple as possible for manageability
55 SAN Fabric Design Considerations SAN fabric design issues:
Throughput requirements
Potential bottlenecks
Port speed / port count
Port subscription rate
Maximum hop count
Redundancy for High Availability
Modularity (flexibility/scalability)
Future upgradeability
Complexity vs. overall manageability
Isolation vs. unification
Wide Area Network interconnections
Power consumption and footprint
Component cost
57 Backup and Recovery Challenges
Size of the backup window
Ability to recover files
The time to recover files
Integrity of the data backups
Required frequency of backups
Stored data retention period (functional and legal)
Available bandwidth for backup resources
Media deterioration over time
Technological obsolescence
58 The Traditional Storage Backup Approach 1.0 PB of storage; TS3500 Tape Library.
(192) of the newest LTO-4 tape drives running at a maximum native transfer rate of 120 MB/sec would need at least 13 hours to back up 1.0 PB of data.
59 Fast Tape Drives can Saturate the Network
One LTO-4 tape drive running in native mode is 20% faster than the effective usable bandwidth of Gigabit Ethernet!
Four LTO-4 tape drives running in 2:1 compressed mode will take most of the usable bandwidth of 10 Gbps Ethernet!
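The arithmetic behind the two slides above can be checked directly; the Gigabit Ethernet "usable bandwidth" figure of 100 MB/s is an assumption on my part.

```python
# Rough backup-window and bandwidth math.
PB = 2 ** 50                      # bytes in a (binary) petabyte
LTO4_NATIVE = 120 * 10**6         # bytes/sec per drive, native rate
DRIVES = 192

hours = PB / (DRIVES * LTO4_NATIVE) / 3600
print(f"{hours:.1f} hours to stream 1 PB through {DRIVES} LTO-4 drives")   # ~13.6 h

# Network side: one native-rate drive vs. assumed usable Gigabit Ethernet throughput.
GIGE_EFFECTIVE = 100 * 10**6      # ~100 MB/s usable (assumed)
print(f"One LTO-4 drive needs {LTO4_NATIVE / GIGE_EFFECTIVE:.0%} of a GigE link")   # 120%
```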
60 Large Systems Backup Approaches Point-in-Time Copies and Replication
Snapshots:
Most popular method of large storage backup
Snapshots create an exact copy of the source data
Once a bitmap is created, storage access can resume
While copying, new data is written to both source and target
Requires minimal downtime for production systems
Replication:
Replication creates a mirror image of data over distance
May be synchronous (consistent) or asynchronous (lagging)
Synchronous is distance-limited, asynchronous is not
63 Other Large Systems Backup Approaches
Object-based Backups:
Backs up only new blocks that have changed
Copies only files it has never seen before
Inserts pointers if a file exists somewhere else
Provides instant recoveries by presenting a mountable volume
Delta-Block Incremental Backups:
Evaluates changed data by breaking a file down into discrete blocks
Block-for-block comparison of a modified file with an existing file
When a difference is detected it extracts a copy of that block only
Usually copies a number of blocks, but not the entire file
Continuous Data Protection (CDP):
Copies blocks to the backup system as soon as they change
Stores data in a log that allows recovery from any point in time
Performs fast recoveries by restoring only blocks that have changed
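A toy sketch of the delta-block comparison described above: break both versions into fixed-size blocks, hash them, and keep only the blocks that differ. Real products keep the previous backup's block hashes in a catalog rather than re-reading the old file, and the 64 KB block size here is arbitrary.

```python
import hashlib

BLOCK = 64 * 1024   # arbitrary block size for illustration

def changed_blocks(old: bytes, new: bytes):
    """Return (offset, data) pairs only for blocks of `new` that differ from `old`."""
    deltas = []
    for off in range(0, max(len(old), len(new)), BLOCK):
        o, n = old[off:off + BLOCK], new[off:off + BLOCK]
        if hashlib.sha256(o).digest() != hashlib.sha256(n).digest():
            deltas.append((off, n))    # copy only the changed block, not the whole file
    return deltas
```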
65 Impact of Petabyte Storage on Security
Traditional distributed access control techniques are designed for smaller systems with general or random workloads
Petabyte storage may service tens of thousands of clients and hundreds of storage devices
Storage design must be capable of supporting I/O patterns that are highly parallel and very bursty by nature
Security solutions must be kept highly scalable to keep up with storage growth patterns
66 Impact of Petabyte Storage on Security (Cont.)
Authentication and authorization requirements can dramatically impact server performance
Performance could be further reduced if data is encrypted
Traditional security protocols perform poorly because they do not scale well
The number of security operations is closely tied to the number of devices and requests
68 The Challenge of Regulatory Compliance What’s driving storage regulatory legislation?
Corporate fraud and illegal practices
Increased emphasis on the security of personal data
The threat of terrorist activities
The global impact of the Internet
Increased reliance on stored data for defense against litigation
Increased business dependence on electronic communications (e-mail, digital voice messaging, instant messaging, VoIP, etc.)
69 Regulatory Requirements Continue to Grow
According to the Storage Networking Industry Association (SNIA) there are over 20,000 regulations worldwide addressing the storage of data
The number of government regulatory requirements increases every year
There is little chance this upward trend will reverse itself in the future
Regulatory guidelines do not dictate how you should maintain your data, only what the expected outcome should be
If you do business overseas, you must also be aware of the applicable foreign regulatory requirements
70 Common Regulatory Compliance Goals Most regulatory requirements are based upon:
Security: Maintain data in a secure environment
Efficiency: Rapid location and retrieval of data
Legibility: Recovered documents must be in a readable format that is clear and concise
Authenticity: The recovered document must be verifiable as the original
Validation: Documentation process must be available for review by a neutral third party
Regulatory compliance becomes more challenging as storage subsystems grow in size and complexity.
71 Regulatory Legislation Examples
Sarbanes-Oxley; HIPAA; USA Patriot Act; Gramm-Leach-Bliley Act; FRCP; CFR a(f); NASD 3010 and 3110; 21 CFR Part 11 (FDA); DOD; California Senate Bill 1386; Florida Sunshine Law; PCI; ISO 17799; CFR Title 18 (FERC); E-SIGN; EU Directive 95/46/EC; Basel II; NARA GRS 20; CFR Title 47, Part 42; NASD 2711 / NYSE Rule 472; JCAHO; FPC 65 COOP compliance
73 Disk Drive Reliability Misconceptions
Actual disk failure rate is usually higher than published
Vendors indicate a 0.58% - 0.88% failure rate
Actual field usage suggests a 1.5% - 3.5% (or greater) failure rate
Field studies show no appreciable difference in reliability between SCSI, FC, and SATA drives
Heat and high duty cycles do not appear to have as detrimental an effect on disk life as once thought
74 Disk Drive Reliability Misconceptions (Cont.)
Infant mortality doesn’t appear to be a significant issue for newly installed disks
Disks exhibit a fairly linear rate of failure over time, which is contrary to the standard “bathtub” model
Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) diagnostics appear to predict only about 50% of all disk failures
75 Disk Failures Are an Expected Occurrence Using the disk count from our previous Traditional vs. Tiered models, it’s easy to see that disk failures will occur on a regular basis.
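Applying the field failure rates quoted two slides back to a large drive population makes the point. The disk counts below are illustrative placeholders, not the actual counts from the earlier cost models.

```python
# Expected drive replacements per year = population size x annualized failure rate (AFR).
for label, disks in [("traditional, all enterprise FC", 7000), ("tiered mix", 3500)]:
    for afr in (0.015, 0.035):     # 1.5% - 3.5% observed in field studies
        print(f"{label}: {disks} disks x {afr:.1%} AFR -> ~{disks * afr:.0f} failures/yr")
```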
77 Other Issues to Consider
Design for minimal human intervention
Maintain extensive monitoring of the environment
Exercise control over information propagation
Architect for maximum performance and throughput
Ensure robustness and high availability
Configure for scalability and flexibility
Develop well defined SLA objectives
Implement a structured support operation
79 Emerging Technologies to Watch
Thin Provisioning
Data de-duplication
SAS Interface
InfiniBand
NPIV (N_Port ID Virtualization)
Large-platter storage technologies
2.5” disk technologies
Solid state disk drives
Virtualized file systems (e.g., ZFS, SOFS)
Grid Storage
81 Some Parting Thoughts
Online multi-petabyte storage is a reality
Data will double every two to three years
Storage media cost-per-GB will continue to decline
Storage operational management is a growing issue
Governmental regulations will increase over time
New technologies will demand additional storage
Experienced storage designers and administrators will grow increasingly harder to find
Scarce data center resources (bandwidth, floor space, power, cooling, etc.) will become more expensive
A carefully designed architecture is the key to efficient storage operations
82 Putting It All Together Captain Kirk: Scotty - We need more power!!!Mr. Scott: Capn, I'm gi'in ya all I got, she can na take much more!
83 Questions? Randy Cochran, Infrastructure Architect, IBM Global Technical Services