Presentation on theme: "Roberto Tolini - NetApp Business Solutions Architect EMEA"— Presentation transcript:
1 NetApp Distributed Content Repositories: What Are We Doing in Real Life? Roberto Tolini - NetApp Business Solutions Architect EMEA
2 Agenda
“Big Content” and Object Storage
StorageGRID: Overview and Architecture
Where does it fit? Use cases and target markets
Competition overview
How to prove it works? PoC, test and demo capabilities
Summary, resources, and contacts for EMEA
4 What Does Your Corporate Data Look Like?
Human-generated and machine-generated file data represent ~80% of all corporate data.
This data cannot be deleted, even though 97% of it will never be touched again.
It’s too expensive to keep this data on primary storage.
6 All That Data Is Stressing the Infrastructure
Challenges:
Rapid, untamed growth of unstructured data
Perpetually retain large and growing datasets
Distributed users and app environment
Needs:
PB scale, billions of objects, reduced operational overhead, efficient management
Policy-based placement, seamless technology refresh
Predictable, location-independent access anywhere, anytime
7 So, what exactly is Object Storage?
Block: specific location on disks / memory; tracks; sectors
File: specific folder in fixed logical order; file path; file name; date
Object: flexible container size; data and metadata; unique ID
A content repository or cloud can be an “object storage” system, but what does that really mean? Every method of storing data involves ways to address, or refer to, that data.
Block-level storage refers to content stored at specific locations on disks or in memory. This can be a very efficient way to store databases with values that have a fixed length and change frequently. Values in a data table can be mapped directly to locations on disk with little translation involved.
File-level storage requires that every file be given a name and stored in a specific folder. File systems are limited in the number of files and folders they can reference, and in the length of path names. This tends not to be an issue on desktop computers that only store millions of files, but becomes a limitation for enterprise storage involving billions or trillions of files. Because file systems are hierarchical, folders must always be arranged in a fixed logical order. For example, a set of folders containing invoices might be arranged by customer, then by the type of service provided. Changing this structure to a different arrangement later can be very complex and time-consuming.
Instead of providing a block-oriented interface that reads and writes fixed-size blocks of data, or organizing data in a hierarchical series of file folders, Object Storage organizes data into flexible-sized data containers, called objects. Each object has both data (an un-interpreted sequence of bytes) and metadata (an extensible set of attributes describing the object). Object-based storage uses unique IDs to identify files and packages these along with extensible metadata about the object. This allows data to be referenced and queried based on anything about the file.
The types of identifier tags used allow for the indexing of files in quantities several orders of magnitude higher than a file system, making object storage ideal for enterprise storage distributed over wide areas.
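As a minimal sketch of the object model described above (this is not the StorageGRID API; the names `StoredObject`, `ObjectStore`, `put`, `get` and `find` are hypothetical), an object store can be pictured as a flat map from unique IDs to data plus extensible metadata, queried by attributes rather than by folder path:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes                                   # un-interpreted sequence of bytes
    metadata: dict = field(default_factory=dict)  # extensible attributes


class ObjectStore:
    """Toy in-memory object store: a flat namespace keyed by unique ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())  # unique ID instead of folder + file name
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id].data

    def find(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
invoice_id = store.put(b"%PDF...", customer="John Doe", doc_type="invoice")
store.put(b"%PDF...", customer="Jane Roe", doc_type="invoice")
matches = store.find(customer="John Doe", doc_type="invoice")
```

Note that nothing in the sketch depends on a directory layout: rearranging "by customer" versus "by document type" is just a different `find` query, not a restructuring of the stored data.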
8 Distributed Content Repositories Based on NetApp StorageGRID Software
Large content repository for big, unstructured data
Billions of data sets, dozens of petabytes
Create, manage and consume content globally
Predictable access to data independent of location
Policy-controlled data stores at each site
Intelligent data classification and access
Metadata-based management
The NetApp solution for Distributed Content Repositories is based on NetApp StorageGRID, which was designed from the ground up to solve Big Content challenges in globally distributed environments. With the Distributed Content Repository solution, customers can store petabytes of capacity and billions of files in a single, global container that can span dozens or hundreds of sites across the globe. In addition, StorageGRID leverages object storage technology to offer long retention times (often measured in decades) and the ability to store, manage and retrieve data based on metadata – or “data about your data”.
StorageGRID uses metadata-based management for data classification and access, meaning that StorageGRID manages where data is physically stored, how many copies exist (and where) for disaster recovery purposes, how long those copies are retained and when they are destroyed. Further, metadata-based access to your data means that instead of looking for a file name, you simply look for “Mortgage documents”, customer “John Doe”, account number “123456”, which greatly simplifies how your applications interact with your storage.
10 A Bit of History: StorageGRID
Acquisition of Bycast Inc. in 2010, with a decade of object storage innovation
Footprint in long-term archive, healthcare market
Since 2010, expansion from healthcare to telecom/service providers
IBM OEM customers transitioned to NetApp-branded product
Currently in version 9 (9.0.2)
First product to support CDMI, the industry-standard object storage interface
11 NetApp StorageGRID Solution
[Diagram: applications at Site 1, Site 2, Site 3 … Site N access StorageGRID over CIFS, NFS and HTTP/CDMI; StorageGRID stores data on disk storage and tape. Labels: MULTIPLE: APPLICATIONS + SITES + PROTOCOLS; MULTIPLE: TENANTS and POLICIES and ADMINISTRATORS; MULTIPLE: TARGETS and TIERS]
StorageGRID makes it possible for complex storage networks involving multiple applications using multiple protocols spread across multiple sites to all be seamlessly managed as a single entity. StorageGRID can provide secure public or private cloud storage services to multiple tenants, each with their own policies and administrators. It also allows for storage to be organized into arbitrary storage pools that can overlap and be grouped by tier.
Presenter notes: StorageGRID makes possible a complex storage network with multiple applications, multiple app protocols, multiple sites and multiple storage technologies, managed as a single entity.
NetApp Confidential – Limited Use
12 ILM Policy Management
[Diagram: App1 and App2 write File1 and File2 to StorageGRID over CIFS, NFS or HTTP. Example metadata rules: FPTH starts with “/app1share/*”; XTYP equals “bronze”. ILM evaluation yields the policy: number of copies, storage location, storage tier, retention period. Copies are shown at Site 1 and Site 2. Label: MULTIPLE: APPLICATIONS + SITES + PROTOCOLS]
Let’s walk through a quick example of how data flows on storage.
Applications write data to StorageGRID as to any network-attached file system with folders and subfolders.
When the data is written, two parallel activities take place.
First, the metadata for the file is evaluated and a policy is applied; these policies determine the number of copies to be created, the location for each copy, what tier of storage the copy resides on at each location, and what happens to it over time.
Second, the data itself is compressed and encrypted, and a digital fingerprint is generated.
Together, the data and metadata are combined to create a managed object.
In this example, one copy is stored at Site 1 and replicated, according to policy, to Site 2.
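The metadata-driven policy evaluation in this walkthrough can be sketched as follows. This is only an illustration, not StorageGRID's ILM engine: the rule format, `Placement` and `evaluate_ilm` are hypothetical, and the copy counts, tiers and retention periods are made-up values. Only the metadata keys `FPTH` and `XTYP` and the two match conditions come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    copies: int
    sites: tuple
    tier: str
    retention_years: int


# Hypothetical rules, evaluated in order; the first match wins.
# Copy counts, sites, tiers and retention periods are illustrative values only.
RULES = [
    (lambda md: md.get("FPTH", "").startswith("/app1share/"),
     Placement(copies=2, sites=("Site 1", "Site 2"), tier="disk", retention_years=10)),
    (lambda md: md.get("XTYP") == "bronze",
     Placement(copies=1, sites=("Site 1",), tier="tape", retention_years=3)),
]

DEFAULT = Placement(copies=2, sites=("Site 1", "Site 2"), tier="disk", retention_years=1)


def evaluate_ilm(metadata):
    """Return the placement for an object based only on its metadata."""
    for matches, placement in RULES:
        if matches(metadata):
            return placement
    return DEFAULT


file1 = evaluate_ilm({"FPTH": "/app1share/reports/q1.pdf"})
file2 = evaluate_ilm({"FPTH": "/other/scan.img", "XTYP": "bronze"})
```

The point of the sketch is that the application never chooses a site or tier; it only supplies metadata, and placement follows from the policy.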
13 Distributed Content Repository: StorageGRID features and capabilities summary
Technical overview:
Multi-protocol: CIFS, NFS, RESTful HTTP
Scale-out architecture: capacity, count, sites, tiers, tenants, throughput
Policy management: copies, locations and tiers on ingest / over time
Object storage: compression, encryption, fingerprint + metadata, WORM
HA & DR: NDO, active+active for data and metadata, self-healing
[Diagram: applications at Site 1, Site 2, Site 3 … Site N on top of StorageGRID, backed by NetApp E-Series Storage Systems and tape. Labels: MULTIPLE: APPLICATIONS + SITES + PROTOCOLS; MULTIPLE: TENANTS + POLICIES + ADMINISTRATORS; MULTIPLE: TARGETS + VENDORS + TIERS]
Q: Can the Grid be shared among different applications/tenants?
A: YES! Multiple sites/tenants/applications, and several methods of separating them.
Q: I struggle to understand the difference between Filesystem and API.
A: SG can be presented (that is, made accessible to applications or users) as a file system (CIFS/NFS) and/or as a URL for applications to write to (HTTP). The APIs are used to perform I/O operations on SG.
Q: SG API vs CDMI. What are the differences?
A: They do similar things, and both use the HTTP protocol. The SG API is the “Bycast API”. CDMI (Cloud Data Management Interface) is a newer API developed by SNIA that has become an ISO standard. CDMI is the future of development; however, the SG API will still be supported for a long time.
15 HTTP API via Gateway: Simple Client Implementation
Gateway load-balances sessions across available Storage Nodes
Storage Nodes perform HTTP API transactions
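A rough sketch of the gateway behaviour on this slide is shown below. The slide does not say which balancing algorithm is used, so round-robin is an assumption here, and the `Gateway` class, its methods and the node names are hypothetical.

```python
from itertools import cycle


class Gateway:
    """Toy gateway that hands each new session to the next available node."""

    def __init__(self, storage_nodes):
        self._nodes = list(storage_nodes)
        self._available = set(self._nodes)
        self._ring = cycle(self._nodes)

    def mark_down(self, node):
        self._available.discard(node)

    def mark_up(self, node):
        if node in self._nodes:
            self._available.add(node)

    def assign_session(self):
        """Round-robin over nodes, skipping any that are unavailable."""
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node in self._available:
                return node
        raise RuntimeError("no Storage Node available")


gateway = Gateway(["node-a", "node-b", "node-c"])
first_three = [gateway.assign_session() for _ in range(3)]
gateway.mark_down("node-b")
after_failure = [gateway.assign_session() for _ in range(4)]
```

After `node-b` is marked down, new sessions alternate between the two remaining nodes, while the client keeps talking to the same gateway address.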
16 Sample Code – Storing Data
Application code: HTTP “PUT” request
Grid response: “PUT” accepted
Data transfer
Grid response: object received; UUID returned to app
(c) 2005 Bycast Inc.
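The original sample code is not part of this transcript. As a stand-in, the sketch below plays out the same exchange against a tiny local HTTP server. Everything here is an assumption made for illustration: the `/objects` path, the `201` status, the plain-text reply, and the names `FakeGridHandler` and `store_object` are hypothetical and are not the SG API or CDMI. The two grid responses on the slide are also collapsed into a single HTTP response.

```python
import http.client
import threading
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

STORED = {}  # object ID -> bytes; stands in for the grid's storage layer


class FakeGridHandler(BaseHTTPRequestHandler):
    """Stand-in for a Storage Node; NOT the real StorageGRID API."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)           # data transfer
        object_id = str(uuid.uuid4())            # the grid assigns the ID
        STORED[object_id] = body
        reply = object_id.encode("ascii")
        self.send_response(201)                  # object received
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)                  # UUID returned to app

    def log_message(self, *args):                # keep the example quiet
        pass


def store_object(host, port, data):
    """Application side: PUT the bytes, get back the object's UUID."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("PUT", "/objects", body=data)
        response = conn.getresponse()
        if response.status != 201:
            raise RuntimeError(f"PUT failed with status {response.status}")
        return response.read().decode("ascii")
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), FakeGridHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    object_id = store_object("127.0.0.1", server.server_port, b"example payload")
finally:
    server.shutdown()
    server.server_close()
```

The application keeps only the returned identifier; it never learns which node, site or tier holds the data.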
17 What Is StorageGRID?
StorageGRID:
Is an Object Storage software solution
Is a software component (Bycast)
Runs on a computing layer (default option: VMs)
Holds the “intelligence”; manages data according to defined policies
Data (objects) are stored in a storage layer; different types are supported.
The whole “solution” is referred to as “NetApp DCR.”
18 DCR Solution components
StorageGRID Software
Servers or Hosts
Storage
Network
Example: 2 Sites - DC and DR solution
Notes (Q&A on this slide):
Q: Is StorageGRID available in blocks only?
A: NO! We are flexible. Using “blocks” is aimed at making things easier (think about “FlexPods”).
Q: Can I configure a customized solution?
A: YES! Depending on customer needs we can create any block. Just remember: it’s not ONLY about storage capacity; we do need to design for number of objects, I/O, HA, retention, data protection, ILM, etc. We are here to help you.
Q: Can I use different storage than E-Series?
A: YES. Under certain restrictions you can use FAS or even 3rd-party storage (special cases). FPVR needed for the moment.
Q: You mentioned a “DCR Rack”. Is it available in EMEA?
A: Not yet. It is available in the US only.
19 Building Block Software – 9.0 (main)
NetApp StorageGRID 9.0 (9.0.2 today)
SUSE Linux Enterprise Server 11 SP2
VMware ESXi 5.0 (upd1)
Q: Do we have a preferred server vendor?
A: NO! We are flexible. Depending on country and deal we can use any vendor or even customer-supplied servers. We just provide the requirements (HW resources) and the VM-to-physical mapping (see blocks as example).
Q: Can we use a different hypervisor?
A: Not yet, but it will come. For the time being we use VMware ESXi. Note that we don’t need any VMware advanced feature for SG.
Q: What do we quote as NetApp? As partner?
A: NetApp: storage and StorageGRID license; PS for implementation (*).
Partner: servers, SLES and VMware vSphere (**), unless the customer has them already; API integration services (if needed).
(*) Unless the partner is enabled to deliver PS services for SG.
(**) For small deployments the “free” ESXi is enough.
20 Where does it fit? Use cases and target markets
21 DCR Solution: Target Markets
Where is the DCR Solution a good fit?
In general: wherever there is a need for preservation, compliance, data integrity, distributed repositories (multi-site, multi-tenant), high availability, scalability, etc.
And where it’s not…
In general: highly transactional data, “dumb” storage, “none of the above”, etc.
Target markets and use case examples:
HealthCare: PACS (imaging), Electronic Health Records
File and email archiving
“Dropbox-like”, iCloud-like cloud services (sharing, synchronizing)
Cloud archiving, backup: legal archiving, service providers, knowledge preservation
RT notes:
The slide is aimed at showing which are the best targets for the DCR solution. The general rule is that the solution is the best fit where there is a need for long-term preservation of data while ensuring its integrity and accessibility. Technology refresh is another key point.
Target markets and use cases are based on current field experience and are the most common uses of this solution in real-life customer environments. The list is not exhaustive but represents 90% of the current deployments.
22 What Do I Need to Consider?
Customer’s existing application(s):
StorageGRID is mainly accessed by applications rather than by users directly. Examples: PACS, document management, etc.
Is there an existing integration/reference with the customer application? (Using API integration vs file system)
Customer needs or problems to solve:
We might need to propose an application that can solve customer problems and leverage StorageGRID capabilities. Examples: archiving, “dropbox”
It might make sense to bring in a partner ISV or develop an ad hoc solution with a NetApp partner.
23 File System vs. API: Why Should I Care?
StorageGRID can be accessed in two ways:
File system: by exporting a CIFS/NFS share to users/applications
HTTP API (SG API or CDMI): by presenting a URL
Why does it matter? What are the differences?
File system: simple and immediate. No integration needed.
API: needs a “connector” to the application (integration)
Why develop or integrate with the API?
Almost infinite scalability, a truly unique namespace, and much more efficient use of metadata.
24 Why NetApp StorageGRID? General Considerations
Main reasons for the choice:
Applications can leverage Object Storage to enhance metadata use for data management
Applications can use a truly global namespace via API
StorageGRID provides real distributed content management (DR, ILM for data, HTTP access)
Performance and scalability model (scale-out with “blocks”). Almost infinite scalability.
Long-term data integrity guarantee built in
25 Use Case 1: Healthcare (PACS and EHR)
Business requirements:
Preserve medical records for the long term (integrity guarantee)
Ensure compliance with regulations (HIPAA, EU, etc.)
Guaranteed data accessibility, distribution and sharing
Solution:
A PACS and/or EHR application implemented on NetApp StorageGRID infrastructure
26 Solution Summary: One-Pager
[Diagram with numbered steps 1 to 4; labels recovered below]
Delivery models: on premises/local; cloud service (example: Iron Mountain); managed service
Billing (for cloud model or managed service): flat subscription (level of service); per effective usage (GB or objects stored/retrieved)
Infrastructure: NetApp StorageGRID; front-end application (PACS); (optional) middleware application (e.g. DeJearnette, ForeCare, etc.)
Administrator: web-based configuration for infrastructure (local and centralized), capacity, access, etc.
Patients: medical record generation (exam, X-ray, etc.)
Hospital: interface with PACS
Local and/or cloud archive, SP facilities
Offer elements:
Description: The solution allows patient health records (PACS, others) to be stored in a «cloud» (Grid). Data can be totally offsite (only a local «cache» installed at the customer), totally onsite, or both onsite and offsite («cache» + local storage at the customer). Data integrity guarantee (self-healing), DR and compliance (HIPAA, etc.). Managed object lifecycle.
Target market segments: doctors and healthcare professionals; small-medium hospitals; large hospitals (DR); service providers
Core: managed health records repository
Optionals: levels of service (onsite infrastructure vs remote infrastructure, retention, etc.); «compliance» on data (audit, WORM, managed lifecycle, etc.)
27 Use Case 2: File and Email Archiving
Business requirements:
Offload of less accessed files from primary storage
Archiving of files (with or without legal value)
Archiving of emails (MS Outlook, Lotus Domino, etc.)
Solution:
An application for “file (or email) archiving” implemented on NetApp StorageGRID infrastructure
28 Use Case (Solution) Overview
It is a solution that includes a file (and, in some cases, email) archiving/tiering application that enables offload of user content from primary storage to other data storage tier(s).
NetApp StorageGRID is used as the secondary tier and provides the distributed content infrastructure.
The application moves data from primary storage to StorageGRID based on different parameters (age, metadata, etc.).
The solution can leverage StorageGRID data management features (ILM, data protection, self-healing, multiple-site synchronization, etc.).
29 File Archiving: Theory of Operations
Inactive files are moved to StorageGRID over HTTP/CDMI (stubbed or not stubbed, depending on the methodology used)
Triggers: event-based, policy-based, user-initiated, MS SharePoint, etc.
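The archiving flow above can be sketched roughly as follows. This is not how any of the archiving products actually work: last-modification time is used as a stand-in for "inactive", the stub format is invented, and `find_inactive_files`, `archive_and_stub` and `fake_upload` are hypothetical names. A dictionary stands in for the StorageGRID tier.

```python
import os
import tempfile
import time
from pathlib import Path


def find_inactive_files(root, max_idle_days, now=None):
    """Return files under root not modified within max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def archive_and_stub(path, upload):
    """Hand the content to `upload` and leave a small stub with the returned ID."""
    object_id = upload(path.read_bytes())
    path.write_text(f"STUB {object_id}\n")
    return object_id


# Demo on a temporary directory, with a dictionary standing in for the archive tier.
archive = {}


def fake_upload(data):
    object_id = f"obj-{len(archive) + 1}"
    archive[object_id] = data
    return object_id


with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old_report.txt"
    new_file = Path(tmp) / "new_report.txt"
    old_file.write_text("quarterly numbers")
    new_file.write_text("draft")
    a_year_ago = time.time() - 365 * 86400
    os.utime(old_file, (a_year_ago, a_year_ago))   # backdate the old file

    candidates = find_inactive_files(tmp, max_idle_days=180)
    candidate_names = [p.name for p in candidates]
    stub_ids = [archive_and_stub(p, fake_upload) for p in candidates]
    stub_text = old_file.read_text()
```

In the "not stubbed" variant the file would simply be deleted from primary storage after the upload, and the application would keep the object ID in its own index.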
31 Solutions Examples: A Real-life “Taste”
File archiving and email archiving:
Symantec EV (API integration)
File archiving/tiering:
NTP Software OSCC (API integration)
PoINT Software Storage Manager (API integration)
F5 ARX (CIFS “integration”, validated architecture)
Other solutions (general approach):
FSG: CIFS/NFS shares are used whenever there is no specific API integration (any other application)
32 Use Case Example 3: Cloud File Sharing
Business requirements:
A “Cloud File Hosting” service for retail customers (end users) and/or businesses (“private Dropbox”-style).
Solution:
A partner application for “Cloud File Hosting” implemented on NetApp StorageGRID infrastructure
33 What Is Cloud File Sharing?
In general it is a file hosting solution for individuals (retail customers) or enterprises; it was developed for NetApp StorageGRID.
It synchronizes files across desktops, laptops and smartphones (iPhone, Android, Blackberry and Windows Mobile), allowing users to share and access them from everywhere.
It can be “white labelled”, customized, run stand-alone or integrated with billing systems, CRM, and LDAP (for user access).
NetApp StorageGRID provides the distributed content infrastructure.
34 Solution Summary: One-Pager
[Diagram with numbered steps; labels recovered below]
Sales model: online (SP portal); direct/indirect sales (B2B)
Activation: self-provisioning through the provider Web portal (user creation, level of service, etc.)
Billing: flat subscription; per effective usage (GB or objects stored/retrieved)
Users: file synchronization between PC, tablet and smartphone, with search and archive capabilities; access to folders limited to selected groups (employees, suppliers, customers); online folder creation; user authentication; users invited and files shared via email or SMS (with or without password)
Infrastructure provider: NetApp StorageGRID; front-end application (partner or SP-customized)
Administrator: web-based configuration for infrastructure capacity, access, etc.
Offer elements:
Description: The solution allows documents and information to be shared within working groups inside the company or with external entities. Multi-channel access (desktop, web, mobile, etc.) with content synchronization.
Target market segments: professionals; small-medium enterprises; end users; large corporates
Core: secure file sharing; private «Dropbox»
Optionals: levels of service; «compliance» on data (audit, WORM, managed lifecycle, etc.)
35 Solutions Examples: A Real-life "Taste"
Turk Telekom “BuluttDepo” (MRD “Nimbus” application):
API-based integration with StorageGRID
Developed specifically for Service Providers
Mezeo Cloud:
Multi-purpose “Cloud” application
Other solutions (general approach):
FSG: CIFS/NFS shares are used whenever there is no specific API integration (any other application)
37 Who Is The “Enemy”?
Well, first and foremost ourselves... but we’re improving.
“Blurred” border between object storage and “scale-out NAS” solutions. It is not always easy to understand which is the best fit. We often end up competing with both Object Storage and NAS solutions.
Main competitors:
EMC: Atmos and Centera (typically in the banking sector), Isilon (typically in Service Providers)
HDS: HCP (Hitachi Content Platform) and HUS (Unified Storage, now with HTTP and object interface)
IBM SONAS (scale-out NAS)
DDN WOS (Web Object Scaler) and others (less in EMEA, more in the U.S.)
38 Strategies to Win
Understand the workload
Object counts and sizes: sweet-spot object >500KB; counts up to 8B, capacity to 35PB
Performance requirements: 100MB/s ingest per file system namespace; 10Gbps aggregate ingest/retrieve via object APIs
Look for the ISV that completes the puzzle: Enterprise Archive, Media, Healthcare
Do not undervalue E-Series!
Rock-solid enterprise arrays: 350,000 systems deployed WW
Density and performance: 1.8PB – 2.4PB per rack
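A first-pass check of a workload against the figures on this slide is simple arithmetic. The sketch below is only a back-of-the-envelope aid, assuming decimal units (1KB = 10^3 bytes, 1PB = 10^15 bytes) and ignoring replica copies, metadata overhead and throughput; `check_workload` is a hypothetical helper and the two example workloads are made up.

```python
MAX_OBJECTS = 8_000_000_000        # "counts up to 8B"
MAX_CAPACITY_BYTES = 35 * 10**15   # "capacity to 35PB", assuming decimal petabytes
SWEET_SPOT_BYTES = 500 * 10**3     # "sweet-spot object >500KB", assuming decimal KB


def check_workload(object_count, avg_object_bytes):
    """Compare a workload against the slide's limits; returns a dict of findings."""
    total_bytes = object_count * avg_object_bytes
    return {
        "total_pb": total_bytes / 10**15,
        "count_ok": object_count <= MAX_OBJECTS,
        "capacity_ok": total_bytes <= MAX_CAPACITY_BYTES,
        "in_sweet_spot": avg_object_bytes > SWEET_SPOT_BYTES,
    }


# 2 billion objects averaging 5 MB each -> 10 PB in total.
imaging = check_workload(2_000_000_000, 5 * 10**6)
# 8 billion objects averaging 10 KB each: the count fits, but objects are small.
small_files = check_workload(8_000_000_000, 10 * 10**3)
```

Under the same assumptions, a grid holding the full 8 billion objects reaches the 35PB ceiling at an average object size of about 4.4MB (35 × 10^15 / 8 × 10^9 bytes), which is why both object count and object size need to be captured early in a deal.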
40 Competitive Resources
We have some good material on FieldPortal:
Forrester Report: Total Economic Impact of NetApp DCR Solution
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=91451&contentID=122053
ESG Lab Validation report NetApp DCR
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=80710&contentID=99326
EMC Atmos: CAT Competitive presentation
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=94618&contentID=129854
Other resources are available internally at the moment (just ask if you need them), but they will soon be made available on FieldPortal.
41 How to prove it works? Test, PoC and performance testing guidelines
42 StorageGRID: demo/PoC capabilities
Lab-on-demand:
Targeted for online demos (1-2 hours)
Requires access to NetApp LoD
StorageGRID-in-a-Laptop (SiL):
Complete set of functions, can be done «on the fly»
Grid Nodes consolidated and pre-configured
Fits in a laptop, low resource consumption
StorageGRID-in-a-Box (SiB):
Complete set of functions, can be done onsite
Needs a server (or servers)
Allows for higher performance
43 StorageGRID: demo/PoC capabilities (cont.)
“Full system” (SG full stack of components):
Full Grid deployment
Allows for full performance
Needs a server (or servers) and E-Series storage
Needs onsite work for implementation
Lead time impacted: purchase of demo equipment, HW delivery time
Talk to your TPM!
44 NetApp StorageGRID: Lab-on-Demand
https://labondemand.netapp.com
Needs registration (partner, possibly customer)
Fully guided lab (1h) or «free session»
Session is: «Insight 2012 BD Hands-On StorageGRID 9.0»
Lab needs to be reserved (timeframe must be specified)
After the session is done, the lab is reset to its initial status
Full guide for the lab available online or as PDF
45 StorageGRID-in-a-Laptop (SiL): overview
Pre-packaged set of 2 VMs to be installed on VMware Workstation 7/8 or ESXi 5
Can be deployed at customer site
Can be installed on a laptop or server
Includes a set of prepackaged test cases/scripts (additional Linux VM) and PoC guide
Limited customization options
Compressed images: ~7GB
Needed space: ~120GB (max)
Ask us for details
46 NetApp StorageGRID: SiL topology
Additional Linux VM for running tests/scripts
Individual VMs can be spread across multiple servers if needed (smaller servers, using existing resources)
Some nodes can be turned down if needed
VMware ESX/ESXi/Workstation/Server
47 NetApp StorageGRID: SiL overview
48 StorageGRID-in-a-Box (SiB): overview
Pre-packaged set of VMs to install on VMware ESXi 5
Can be deployed at customer site
Needs at least one server to be installed on
Includes a set of prepackaged test cases/scripts (additional Linux VM) and PoC guide
Can be customized according to test needs
Can be spread across multiple servers/sites
Compressed images: ~40GB; needed space: 800GB-1.2TB
Ask us for details
49 SiB Configuration (example)
One Intel server, with two 6-core Xeon processors, 48GB memory, 8x600GB SAS disks, four GbE NICs
One 8-port 1GbE switch (or customer switch)
VMware ESXi 5
StorageGRID Software License
This is just a configuration example. The server specifications can vary according to the desired capacity. It can also be deployed on more than a single server.
50 Typical Test Cases and Proof Points
File system access: CIFS/NFS basic access, FSG cache behavior, replication, WORM, etc.
HTTP API access: SG API and/or CDMI ingest/retrieve, metadata update, etc.
ILM (object lifecycle management): automatic content placement based on metadata, etc.
Data integrity (content self-healing and inclusive protection)
HA and “no single-point-of-failure design”
Integration with application(s): involve application vendors or developers!
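For the data integrity proof point, the idea of fingerprint-based self-healing can be illustrated with a small sketch. It is not StorageGRID's mechanism: SHA-256 as the fingerprint, the replica dictionary and the functions `fingerprint` and `verify_and_heal` are all assumptions made for the example.

```python
import hashlib


def fingerprint(data):
    """Digital fingerprint of the content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify_and_heal(replicas, expected):
    """Replace corrupted copies with a good one; return the sites repaired."""
    good = next((d for d in replicas.values() if fingerprint(d) == expected), None)
    if good is None:
        raise ValueError("no intact copy left to heal from")
    repaired = []
    for site, data in list(replicas.items()):
        if fingerprint(data) != expected:
            replicas[site] = good      # overwrite the bad copy with a verified one
            repaired.append(site)
    return repaired


original = b"patient study 42"
expected = fingerprint(original)
replicas = {"Site 1": original, "Site 2": b"patient stuXy 42"}  # Site 2 is corrupted
repaired_sites = verify_and_heal(replicas, expected)
```

A PoC test along these lines would deliberately damage one copy, wait for the background verification, and then confirm that the object is still retrievable and that the damaged copy has been replaced.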
52 Key Takeaways: What Should I Remember?
Learn what StorageGRID is and in which use cases we can position it most effectively
Understand the critical points to address in each of them, and leverage existing experience
Get in touch with people who can help you in EMEA. (You’re welcome.)
53 Resources and Information
NetApp DCR Solution: repositories.html
NetApp Fieldportal DCR solution landing page: https://fieldportal.netapp.com/applications/storagegrid.aspx#14550
Contacts for EMEA:
Roberto Tolini:
Philippe Wackers:
Or contact your local NetApp office/TPM