2A new type of data is driving this growth The Big Data RealityInformation universe in 2009:- 800 ExabytesIn 2020′s:- 35 ZettabytesInformation Explosion 2.0: New World. More than just wordsEverything is on-line. Just read a recent cartoon shows a man with wires hooked into him saying to a friend, “My washing machine just texted me, my whites are done”. Pretty funny, but we’re there today.The future of information and connected-ness has literally come to this. Everything will be connected to everything else, and everything will have an “opinion” (information) for everything else. The amount of information created in the next 2-3 years will dwarf all information generated since the beginning of mankind. This is the information explosion. The “information age” of the past 10 years will be like a firecracker as compared to the supernova of information coming in the next 10 yearsThis is the era of BIG DATA. All this data is being generated by machines – surveillance cameras, web 2.0 sites, system logs, and rich media. All this information is swamping company’s ability to store and process it.Storage systems that store and manage this data must be able to move beyond simple scalability, they must be able to scale exponentially globally across distributed environments and yet enable users around the world to collaborate and update data as if they were in the same room or in the same siteA new type of data is driving this growthStructured data – Relational tables or arraysUnstructured data — All other human generated dataMachine-Generated Data – growing as fast as Moore’s Law
3A Paradigm Shift is Needed Vs.File storageObject StorageMillions of FilesScalability100’s of Billions of ObjectsPoint to Point, LocalAccessPeer to Peer, GlobalFault-TolerantManagementSelf-Healing, AutonomousFiles, Extent ListsInformationObjects w/ Metadata75% on averageSpace UtilizationNear 100%The Era of Big DataWe are in the era of "Big Data“Machine generated data and rich unstructured data are a key driving forceCompanies are losing the data management battle. Cost of ingesting, storing, processing and managing big data is affecting the bottom line“Swimming in Sensors, Drowning in data”Why Not Traditional File Systems?In the world of Big Data, traditional file system and database storage paradigms no longer workFile system overhead, scalability limitations, mgmt cost, and configuration complexity have all contributed to an ever increasing TCODatabase Scaling is being addressed by the NoSQL movement, but needs a different approach to solve the file system problemA new paradigm is needed which provides a cost effective, scalable distributed solution for big data environmentsCME is a poster child for Big Data, and Cloud is the optimal solution for Big Data
4What Big Data NeedsHyper-scaleWorld-wide single & simple namespaceDense, efficient & greenHigh performance versatile on-ramp and off-rampGeographically distributedProcess the data close to where its generated vs. copying vast amount of data to processingCloud enablingResiliency with extremely low TCONo complexityNear zero administrationUbiquitous AccessLegacy protocolsWeb AccessBig data requires intelligent storage systems that scale in all ways, ease of use, low environmental footprint, and fast and easy on and off ramps. Data must be distributed so users can process it wherever they are located, yet they must be able to be processed close to where it is generated. And last but not least, it must be extremely simple, protected, and be self managing, and be able to be accessed and processed by applications across distributed wide area networks using a wide range of interfaces
5Storage should improve collaboration … Not make it harderMinutes to install, not hoursMilliseconds to retrieve data, not secondsReplication built in, not added onInstantaneous recovery from disk failure, not daysBuilt in data integrity, not silent data corruption
6Introducing: DDN Web Object Scaler Content Storage Building Block for Big Data InfrastructureIndustry’s leading scale-out object storage applianceUnprecedented performance & efficiencyBuilt-in single namespace & global content distributionOptimized for Collaborative EnvironmentsGeographic location intelligence optimizes access latencyJust-in-time provisioningLowest TCO in the IndustrySimple, near zero administrationAutomated “Continuity of Operations” architectureIntroducing the DDN Web Object Scaler (WOS) is the industry leading distributed storage platform to meet the needs of Big Data.WOS is a revolutionary object-based, cloud storage system that addresses the needs of content scale-out and global distribution. At its core is the WOS object clustering system - intelligent software that allows a massively scalable content delivery platform to be created out of small building blocks, enabling the system to start small and easily grow to a multi-Petabyte scale. WOS is a fully distributed system, meaning there are no single points of failure or bottlenecks. This allows the system to scale with each new cloud building block (called nodes) linearly adding to the system’s performance capabilities and storage capacity.WOS nodes are self-contained appliances configured with disk storage and CPU and memory resources. Each node is pre-configured with the WOS software and Gigabit Ethernet network interfaces. A cloud can be created out of any nodes that have Internet Protocol (IP) connectivity to each other, regardless of their physical location. Various capacities and performance configurations of WOS nodes are available. Clouds support heterogeneous node types, allowing tailoring of performance and scale to the needs of the environment.WOS changes the paradigm for storage of big data in several ways:Distributed Hyperscale object storage that not only scales up (inside a local site) but also outCollaborative, so many users across multiple sites can ingest, read, and update files concurrentlyLow (near zero) administrative overhead since big data administration, maintenance, and scale-out must be automatedData locality&Global Collaboration
7The WOS initiativeUnderstand the data usage model in a collaborative environment where immutable data is shared and studied.A simplified data access system with minimal layers.Eliminate the concept of FAT and extent lists.Reduce the instruction set to PUT, GET, & DELETE.Add the concept of locality based on latency to data.
8WOS Fundamentals No central metadata storage, distributed management Self-managed, online growth & balancing, replicationSelf-tuning, zero-intervention storageSelf-healing to resolve all problems & failures with rapid recoverySingle-Pane-of-Glass global, petabyte storage management
9WOS – Architected for Big Data Hyper-ScaleGlobal Reach & Data Locality256 billion objects per clusterScales to 23PB per clusterStart small, grow to tens of PetabytesNetwork & storage efficientUp to 4 way replicationGlobal collaborationAccess closest dataNo risk of data lossResiliency with Near Zero AdministrationUniversal AccessThis slide can be used to summarize WOS if you have limited timeHyperscale quadrantWOS nodes are self-contained appliances configured with disk storage and CPU and memory resources. Each node is pre-configured with the WOS software and Gigabit Ethernet network interfaces. A cloud can be created out of any nodes that have Internet Protocol (IP) connectivity to each other, regardless of their physical locationWOS Nodes are available in two versions,WOS 1600 – a 3U 16 drive node that provides up to 48TB (3T Drives) of SAS/SATA storage per trayWOS 6000 – a 4U 60 drive 2-node tray that provides up to 180TB of SAS/SATA storage per trayA rack containing 11 WOS 600o trays contains 2PB of storage per rack,WOS 2.0 limits – will be extended in each release Billion objects per cluster, scales to 23 PB per cluster.Easy to extend, grow brick by brick.Efficiency is key, from a density, power, disk utilization, and I/O perspectiveGlobal Reach & Data Locality quadrantWOS is architected from the ground up to be distributed across multiple sites. Data is replicated across up to 4 sites (extended in upcoming releases) based on policy. While replication is typically used for data protection & DR, with WOS, it enables something even more powerful.. Data locality and global collaboration. WOS maintains intelligence regarding the location of objects so Users at each replica site always access data closest to them (latency). Users can collaborate (ingest, access & update data) at local speeds across multiple locations. WOS is uniquely architected to enable local data access and global collaborationResiliency with Near Zero AdministrationWOS Self healing provides completely automated data protection and reduces or eliminates service calls. WOS resiliency and administration is superior to competitive approaches in severa1 ways:All drives fully utilized – Any free capacity on any drive is part of the spare pool50% shorter re-balance times – Only actual data is copiedFaster recovery times increase overall performance and reduce risk of data lossDrive failures decrease overall capacity only by the size of the failed drivesTotal capacity may be restored by replacing drives during scheduled maintenanceAll combined, WOS administrative simplicity, self healing capabilities, ease of provisioning & deployment provide the lowest TCO in the industry.Universal AccessWOS provides a NAS, cloud, and Native API interfaces to enable WOS benefits to be available to a wide range of applications.NAS Protocol GatewayUsed for internal POSIX apps that want to utilize internal cloud storageEfficiently utilizes WOS cloud storage with easy provisioning/de-provisioningFederates across multiple sites and provides collaboration & data locality servicesCloud Storage PlatformTargeted at cloud service providers or private cloudsEnables S3-enabled apps to use WOS storage at a fraction of the priceSupports full multi-tenancy, bill-back, geo-replication, encryption, and per-tenant reportingNative Object interfaceWOS quickly integrates in to existing and new environments using the simple but powerful WOS native storage interface.Available API Commands include: PUT object, GET object, DELETE object, RESERVE Object IDAccess WOS via standard tools including REST, Java, PHP, Python and C++.Self healingAll drives fully utilized50% faster recovery than traditional RAIDReduce or eliminate service calls
10WOS & the Big Data Life Cycle WOS is an intelligent, scalable object store ideal for both high-frequency transactions as well as content archiving and geo-distributionHigh PerformanceDistribution & Long Term PreservationContent DistributionReal Time ProcessingContent Growth RatesAccessDay 1 30 Days 90 Days 1 Year 2 Years 5 Years n YearsWOS delivers high performanceWOS delivers automatic replication & geo-distributionWOS delivers low TCO & massive scalability
12WOS: Distributed Data Mgmt. Application returns file to user.A file is uploaded to the application or web server.A user needs to retrieve a file.The WOS client automatically determines what nodes have the requested object, retrieves the object from the lowest latency source, and rapidly returns it to the application.Application makes a call to the WOS client to read (GET) the object. The unique Object ID is passed to the WOS client.The WOS client returns a unique Object ID which the application stores in lieu of a file path. The application registers this OID with the content database.Application makes a call to the WOS client to store (PUT) a new objectThe WOS client stores the object on a node. Subsequent objects are automatically load balanced across the cloud.App/Web ServersOID = 5718aOID = 5718aDatabaseThe system then replicates the data according to the WOS policy, in this case the file is replicated to Zone 2.LAN/WANZone 1Zone 2
14..… WOS Under the Hood WOS Tray Components WOS Software ClientsClients/AppsOther WOS NodesWOS Node / MTSClientsClients/AppsWOS-LibHTTP/REST4 GigEWOS Tray ComponentsProcessor /controller motherboardWOS Node softwareSAS or SATA Drives (2 or 3TB)WOS SoftwareServices I/O requests from clientsDirects local I/O requests to diskReplicates objects during PUTsReplicates objects to maintain policy complianceMonitors hardware healthWOS APIWOS APINode OID MapThis is the main WOS node-side process. It’s responsible for the following:Establishing & maintaining a connection to the “cluster” via network connection to the Cluster Manager (MTS)Servicing I/O requests from clientsDirecting local I/O requests to disk via the Local Data Store (detail on next slide)Replicating objects during PUTsReplicating Object Groups (ORGs) to maintain policy complianceAggregating node-side statistics for communication back to MTSMonitoring hardware health via IPMI or SESThough “Network Services” is depicted as a single component in this diagram, this is actually a set of components that is instantiated for each of the distinct communication channels. It provides TCP/IP-based socket setup, tear-down, heart-beats, reconnect-logic, message formatting, message handlers, etc...…Array Controllers2 or 3TB drives
16Efficient Data Placement WOS “Buckets”Contain objects of similar size to optimize placementWOS Object “Slots”Different sized objects are written in slots contiguouslySlots can be as small as 512B to efficiently support the smallest of files.The Result:WOS eliminates the wasted capacity seen with conventional NAS storage
17Object Storage vs. File System Space Utilization WOS utilizes an average of 25% more of the available disk space than does SAN/NAS file systems1PB deployment, stranded space totals 250TB of space, which add $50K-$100K of system cost, as well as ongoing power & space costsWOS eliminates stranded capacity Inherent in SAN/NAS File System storage
20WOS Deployment & Provisioning WOS building blocks are easy to deploy & provision – in 10 minutes or lessProvide power & network for the WOS NodeAssign IP address to WOS Node & specify cluster name (“Acme WOS 1”)Go to WOS Admin UI. WOS Node appears in “Pending Nodes” List for that clusterSan FranciscoNew YorkLondonTokyoSimply drag new nodes to any zone to extend storageDrag & Drop the node into the desired zoneAssign replication policy (if needed)NoFSCongratulations! You have just added 180TB to your WOS cluster!
21Intelligent Data Protection RAID Rebuild vs WOS Re-Balance FS + RAID 6Web Object ScalerRe-BalanceRAID SparesImmediate Service CallxCapacity Available: 118TBCapacity Available: 120TBCapacity Available: 116TBRebuildxx……x…xOptional Scheduled Service Call Restores Capacityx……xWOS (Replicated or Object Assure)Traditional RAID StorageRAID Rebuilds DrivesLost capacity - Spare drives strand capacityLong rebuild times - Whole drive must be rebuilt even though failed drive only partially fullHigher risk of data loss – if spare drive is not available, no rebuild can occurIncreased support costs - immediate service call is required to replace low spares conditionReduced write performance- RAID reduces disk write performance, especially for small filesWOS Re-Balances Data Across DrivesAll drives fully utilized – Any free capacity on any drive is part of the spare pool50% shorter re-balance times – Only actual data is copiedFaster recovery times increase overall performance and reduce risk of data lossDrive failures decrease overall capacity only by the size of the failed drivesTotal capacity may be restored by replacing drives during scheduled maintenanceDrive failure causes objects to be copied from replica or corrected data to other existing drives
23Cloud & Service Provider Tools Private or Internal CloudHosted Managed Service ProvidersPublic CloudsInternal Cloud CustomerMedium to large multi-site enterpriseProvides services to internal BU’sLowers costs by optimizing utilization of CPU & storageTransfer EC2& S3 workload in-house to improve security & lower costsBill-back internal departments for servicesManaged Service ProviderProvides hosting services for a few large customersHosts at local site or third party data centerMay share some resources across multiple customersExtremely security consciousPublic CloudShares resources across many customersHosts at third party data centersSubscription pricing for CPU, storage, & network usageOffers lowest CAPEX, Subscription pricingOpportunities & Case StudiesEnterprise IT wants to move AWS workloads in-house because of security and IP, record control concerns and to reduce costs. Examples of these types of opportunities are the Financial, Pharma, Life sciences, and state/federal government. Remote site support and data locality are important for these custiomersHosted clouds /MSPs opportunities are large enterprises who want to farm out IT operations to reduce costs while still maintaining security and location control. Hosting providers /MSPs are able to optimize resource utilization by taking over complete IT operations for a relatively small number of companies and using virtual processing and virtual storage/thin provisioning to reduce costs. Examples of hosted environments include regional, citywide / municipal, airports surveillance, as well as commercial central site and branch office surveillance, and home surveillance.Public cloud customers are fairly well known and include operations for SME and smaller commercial entities, as well as LE’s for low security processing.ServiceProvidersCommon Needs: DR, Multi-tenancy, Data Locality, Standard Interfaces, Low TCO
25Failure recovery - Data, Disk or Net Get Operation – Corrupted with RepairWOS-Lib selects replica with least latency & sends GET requestNode in Zone “San Fran” detects object corruptionWOS-Lib finds next nearest copy & retrieves it to the client appIn the background, good copy is used to replace corrupted object in San Fran zoneGet OperationWOSLib selects replica with least latency path & sends GET requestNode in Zone “San Fran” returns object A back to applicationOperation: GET “A”40 ms80 ms10 msLatency MapClient AppWOS-LibWOS Cluster Group Map1223Zone San FranWOS Nodes…WOS Nodes…Zone New YorkWOS Nodes…Zone LondonA4...XAAABest viewed in presentation mode
26Geographic Replica Distribution 1PUT with Asynchronous ReplicationWOSLib selects “shortest-path” nodeNode in Zone “San Fran” stores 2 copies of object to different disks (nodes)San Fran node returns OID to applicationLater (ASAP) Cluster asynchronously replicates to New York & London zonesOnce ACKs are received from New York & London zones, extra copy in San Fran zone is removedClient App40 ms80 ms10 msLatency MapWOS-LibWOS Cluster Group MapA234Zone San FranWOS Nodes…WOS Nodes…Zone New YorkWOS Nodes…Zone London...1AAABest viewed in presentation mode
27Multi-site Post-Production Operation Data Locality & Collaboration LA site user edits video “A”, which replicates to Mexico City & Madrid based on policyMP Gateway immediately synchronizes metadata DB with Madrid userMadrid user requests video “A” for processing, WOS-Lib selects Madrid site (lowest latency) & retrieves for the userThe Madrid user extracts frames from the video & writes to WOS (new object), which replicates to Mexico City & LALos AnglesMexico CityMadridNAS GatewayWOS-Lib40 ms80 ms10 msLos Angeles Latency MapReal Time Editing App30 ms80 ms10 msMadrid Latency Map2Los Angles User14433Zone Los AngelesThe example shows how WOS can be used to allow for collaboration and synchronization in real-time.Editing stations in Los Angeles and Madrid are making a last minute changes to a movie trailer set for a Grand Opening Premier tonight in Mexico City.The LA team makes their last changes and they are committed to the cloud. Immediately they are replicated to all zones and Madrid finished adding in the language subtitles and performs Q.C. checks. Final product is committed back to the cloud where Mexico City team access latest real-time update for tonight's premier.No waiting for facilities to down load via ftp or plane flights with new reels / disk of content. Data is delivered as needed. Allowing for last minute changes while keeping the opening premier on time, on schedule and ensuring the project stays on budget....BBBWOS-LibAAAANAS GatewayAReal Time Editing AppZone Mexico CityZone MadridCluster “Acme WOS 1”Madrid UserBest viewed in presentation mode
28iRODS IntegrationiRODS, a rules oriented distributed data management application meets WOS, an object oriented content scale-out and global distribution systemWOSPetabyte Scalability: Scale out by simply adding storage modulesUnrivaled Simplicity: Management simplicity translates directly to lower TCOSelf-Healing: Zero intervention required for failures, automatically recovers from lost drivesRapid Rebuilds: Fully recover from lost drives in momentsReplication Ready: Ingest & distribute data globallyDisaster Recoverable: For uninterrupted transactions no matter what type of disaster occursFile Layout: Capacity and performance optimizedObject Metadata: User-defined metadata makes files smarterRules oriented application meets object oriented storage
29WOS + IRODS is the simple solution for Cloud Collaboration WOS is a flat, addressable, low latency data structure.WOS creates a “trusted” environment with automated replication.WOS is not an extents based file system with layers of V- nodes and I-nodes.IRODS is the ideal complement to WOS allowing multiple client access and an incorporation of an efficient DB for metadata search activities.
30Some iRODS Examples NASA & iRODS U.S. Library of Congress Jet Propulsion LaboratorySelected for managing distribution of Planetary DataMODUS (NASA Center for Climate Simulation)Federated satellite image and reference data for climate simulationU.S. Library of CongressManages the entire digital collectionU.S. National ArchivesManages ingest and distributionFrench National LibraryiRODS rules control ingestion, access, and audit functionsAustralian Research Coordination ServiceManages data between academic institutions