OceanStore Global-Scale Persistent Storage John Kubiatowicz University of California at Berkeley.

1 OceanStore Global-Scale Persistent Storage John Kubiatowicz University of California at Berkeley

2 OStore, John Kubiatowicz, IRAM Summer Retreat 2000. Context: Project Endeavour. Interdisciplinary, Technology-Centered Team: Alex Aiken, PL; Eric Brewer, OS; John Canny, AI; David Culler, OS/Arch; Joseph Hellerstein, DB; Michael Jordan, Learning; Anthony Joseph, OS; Randy Katz, Nets; John Kubiatowicz, Arch; James Landay, UI; Jitendra Malik, Vision; George Necula, PL; Christos Papadimitriou, Theory; David Patterson, Arch; Kris Pister, MEMS; Larry Rowe, MM; Alberto Sangiovanni-Vincentelli, CAD; Doug Tygar, Security; Robert Wilensky, DL/AI

3 Endeavour Goals: Enhancing human understanding –Help people to interact with information, devices, and people - exploit Moore’s law growth in everything –Enable new approaches for problem solving & learning –Figure of merit: how effectively we amplify and leverage human intellect Enabling and exploiting ubiquitous computing –Small devices, sensors, smart materials, cars, etc New methods for design, construction, and administration of ultra-scale systems –“Planetary-scale” Information Utilities Infrastructure is transparent and always active Extensive use of redundancy of hardware and data –Devices that negotiate their interfaces automatically –Elements that tune, repair, and maintain themselves

4 Endeavour Maxims Exploit Moore’s law growth for better behavior –Use of “excess” capacity for better human interface Personal Information Mgmt is the Killer App –Not corporate processing but management, analysis, aggregation, dissemination, filtering for the individual –Automated extraction and organization of daily activities to assist people Time to move beyond the Desktop –Community computing: infer relationships among information, delegate control, establish authority Information Technology as a Utility –Continuous service delivery, on a planetary-scale, on top of a highly dynamic information base

5 Endeavour Approach “Fluid”, Network-Centric System Software –Partitioning and management of state between soft and persistent state –Data processing placement and movement –Component discovery and negotiation –Flexible capture, self-organization, and re-use of information Information Devices –Beyond desktop computers to MEMS-sensors/actuators with capture/display to yield enhanced activity spaces Information Utility Information Applications –High Speed/Collaborative Decision Making and Learning –Augmented “Smart” Spaces: Rooms and Vehicles Design Methodology –User-centric Design with HW/SW Co-design –Formal methods for safe and trustworthy decomposable and reusable components

6 OceanStore Context: Ubiquitous Computing Computing everywhere: –Desktop, Laptop, Palmtop –Cars, Cellphones –Shoes? Clothing? Walls? Connectivity everywhere: –Rapid growth of bandwidth in the interior of the net –Broadband to the home and office –Wireless technologies such as CDMA, satellite, laser Rise of the thin-client metaphor: –Services provided by interior of network –Incredibly thin clients on the leaves MEMS devices -- sensors+CPU+wireless net in 1 mm^3 Mobile society: people move and devices are disposable

7 Questions about information: Where is persistent information stored? –20th-century tie between location and content outdated (we all survived the Feb 29th bug -- let’s move on!) –In world-scale system, locality is key How is it protected? –Can disgruntled employee of ISP sell your secrets? –Can’t trust anyone (how paranoid are you?) Can we make it indestructible? –Want our data to survive “the big one”! –Highly resistant to hackers (denial of service) –Wide-scale disaster recovery Is it hard to manage? –Worst failures are human-related –Want automatic (introspective) diagnosis and repair

8 First Observation: Want Utility Infrastructure Mark Weiser from Xerox: Transparent computing is the ultimate goal –Computers should disappear into the background In storage context: –Don’t want to worry about backup –Don’t want to worry about obsolescence –Need lots of resources to make data secure and highly available, BUT don’t want to own them –Outsourcing of storage already becoming popular Pay monthly fee and your “data is out there” –Simple payment interface → one bill from one company

9 Second Observation: Need wide-scale deployment Many components with geographic separation –System not disabled by natural disasters –Can adapt to changes in demand and regional outages –Gain in stability through statistics –Difference between thermodynamics and mechanics → surprising stability of temperature and pressure given 10^30 molecules with highly variable behavior! Wide-scale use and sharing also requires wide-scale deployment –Bandwidth increasing rapidly, but latency bounded by speed of light Handling many people with same system leads to economies of scale

10 OceanStore: Everyone’s data, One big Utility “The data is just out there” Separate information from location –Locality is only an optimization (an important one!) –Wide-scale coding and replication for durability All information is globally identified –Unique identifiers are hashes over names & keys –Single uniform lookup interface replaces: DNS, server location, data location –No centralized namespace required (such as SDSI)

11 Basic Structure: Irregular Mesh of “Pools”

12 Amusing back of the envelope calculation (courtesy Bill Bolosky, Microsoft) How many files in the OceanStore? –Assume 10^10 people in world –Say 10,000 files/person (very conservative?) –So 10^14 files in OceanStore! –If 1 gig files (not likely), get 10^23 bytes, roughly 1 mole of bytes! Truly impressive number of elements… … but small relative to physical constants
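
The arithmetic on this slide can be checked in a few lines. The sketch below uses only the slide's own assumptions (population, files per person, file size); the constant names are illustrative, and Avogadro's number is the one outside value brought in.

```python
# Back-of-the-envelope check of the slide's estimate.
# All three inputs are the slide's assumptions, not measurements.
PEOPLE = 10**10            # assumed number of people in the world
FILES_PER_PERSON = 10**4   # "10,000 files/person"
BYTES_PER_FILE = 10**9     # "1 gig files (not likely)"
AVOGADRO = 6.022e23        # entities per mole

total_files = PEOPLE * FILES_PER_PERSON
total_bytes = total_files * BYTES_PER_FILE
moles_of_bytes = total_bytes / AVOGADRO

print(f"files: {total_files:.0e}")
print(f"bytes: {total_bytes:.0e}")
print(f"moles of bytes: {moles_of_bytes:.2f}")
```

With these inputs the total is 10^23 bytes, about one sixth of a mole, so "1 mole of bytes" holds as an order-of-magnitude statement.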

13 Utility-based Infrastructure Service provided by confederation of companies –Monthly fee paid to one service provider –Companies buy and sell capacity from each other [Figure: interconnected provider networks labeled Pac Bell, Sprint, IBM, AT&T, and Canadian OceanStore]

14 Outline Motivation Properties of the OceanStore and Assumptions Specific Technologies and approaches: –Conflict resolution on encrypted data –Replication and Deep archival storage –Naming and Data Location –Introspective computing for optimization and repair –Economic models Conclusion

15 Ubiquitous Devices → Ubiquitous Storage Consumers of data move, change from one device to another, work in cafes, cars, airplanes, the office, etc. Properties REQUIRED for OceanStore storage substrate: –Strong Security: data encrypted in the infrastructure; resistance to monitoring and denial of service attacks –Coherence: too much data for naïve users to keep coherent “by hand” –Automatic replica management and optimization: huge quantities of data cannot be managed manually –Simple and automatic recovery from disasters: probability of failure increases with size of system –Utility model: world-scale system requires cooperation across administrative boundaries

16 OceanStore Assumptions Untrusted Infrastructure: –The OceanStore is composed of untrusted components –Only ciphertext within the infrastructure –Information must not be “leaked” over time Principal Party: –There is one organization that is financially responsible for the integrity of your data Mostly Well-Connected: –Data producers and consumers are connected to a high-bandwidth network most of the time –Exploit multicast for quicker consistency when possible Promiscuous Caching: –Data may be cached anywhere, anytime Operations Interface with Conflict Resolution: –Applications employ an operations-oriented interface, rather than a file-systems interface –Coherence is centered around conflict resolution

17 OceanStore Technologies I: Naming and Data Location Requirements: –System-level names should help to authenticate data –Route to nearby data without global communication –Don’t inhibit rapid relocation of data OceanStore approach: Two-level search with embedded routing –Underlying namespace is flat and built from secure cryptographic hashes (160-bit SHA-1) –Search process combines quick, probabilistic search with slower guaranteed search –Long-distance data location and routing are integrated Every source/destination pair has multiple routing paths Continuous, on-line optimization adapts for hot spots, denial of service, and inefficiencies in routing
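
A flat namespace built from 160-bit SHA-1 hashes over names and keys might look like the sketch below. The function names (`make_guid`, `verify_guid`) and the exact byte layout (length-prefixed owner key followed by the name) are assumptions for illustration only; the slides do not specify how OceanStore combines the two inputs.

```python
import hashlib


def make_guid(owner_public_key: bytes, name: str) -> bytes:
    """Hypothetical GUID: SHA-1 over an owner's key and a human-readable name.

    The length prefix keeps (key, name) pairs from colliding through
    ambiguous concatenation. This layout is an assumption, not the
    documented OceanStore format.
    """
    h = hashlib.sha1()
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.digest()  # 20 bytes = 160 bits


def verify_guid(guid: bytes, owner_public_key: bytes, name: str) -> bool:
    """Anyone holding the key and the name can recompute and check the GUID."""
    return make_guid(owner_public_key, name) == guid


guid = make_guid(b"example-public-key-bytes", "/home/alice/notes.txt")
print(guid.hex(), len(guid) * 8, "bits")
```

Because the identifier is derived from the key, a client can check that a name really belongs to the claimed owner without consulting a central authority, which is one way system-level names can "help to authenticate data".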

18 Universal Location Facility Takes 160-bit unique identifier (GUID) and returns the nearest object that matches [Figure labels: Universal Name; Name OID; Root Structure; Update OID; Archive versions (Version OID 1, Version OID 2, Version OID 3); Global Object Resolution; Floating Replica; Active Data; Commit Logs; Checkpoint OID; Version OID; Archival copy or snapshot (erasure coded)]

19 Some current results: Have a working algorithm for local search –Uses attenuated Bloom filters –Performs search by passing messages from node to node. All state kept in messages! –Updates filters through semi-chaotic passing of information between neighbors Resembles compiler dataflow algorithm Can be shown to converge Have candidate for “backing store” index –Randomized data structure with locality properties Every document has multiple roots in the OceanStore Searches “close” to copy tend to find copy quickly –Redundant, insensitive to faults, and repairable –Investigating algorithms to continually adapt routing structure to adjust for faults and denial of service
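
One common reading of an attenuated Bloom filter is an array of Bloom filters kept per neighbor link, where the filter at level i summarizes objects reachable i hops away through that link. The sketch below follows that reading; the class names, filter sizes, and the `pick_next_hop` helper are illustrative assumptions rather than the OceanStore implementation.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; a Python int serves as the bit vector."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key: bytes):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class AttenuatedBloomFilter:
    """levels[i] summarizes objects (i + 1) hops away through one link."""

    def __init__(self, depth=3):
        self.levels = [BloomFilter() for _ in range(depth)]

    def add(self, key: bytes, distance: int):
        self.levels[distance - 1].add(key)

    def first_match(self, key: bytes):
        """Smallest hop count at which the key may be present, else None."""
        for i, level in enumerate(self.levels):
            if key in level:
                return i + 1
        return None


def pick_next_hop(link_filters, key: bytes):
    """Forward toward the neighbor advertising the key at the fewest hops.

    Bloom filters can give false positives, so a real search needs the
    slower guaranteed lookup as a fallback when this returns a bad lead.
    """
    best = None
    for neighbor, abf in link_filters.items():
        distance = abf.first_match(key)
        if distance is not None and (best is None or distance < best[1]):
            best = (neighbor, distance)
    return best[0] if best else None
```

A node would keep one `AttenuatedBloomFilter` per outgoing link and call `pick_next_hop` on each query message, which matches the slide's point that search state travels in the messages rather than being stored per query.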

20 OceanStore Technologies II: Rapid Update in an Untrusted Infrastructure Requirements: –Scalable coherence mechanism which can operate directly on encrypted data without revealing information –Handle Byzantine failures –Rapid dissemination of committed information OceanStore Approach: –Operations-based interface using conflict resolution Modeled after Xerox Bayou → update packets include: Predicate/update pairs which operate on encrypted data Use of oblivious function techniques to perform this update Use of incremental cryptographic techniques –User signs updates and principal party signs commits –Committed data multicast to clients
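
The predicate/update structure can be illustrated on plaintext. The sketch below is a deliberate simplification: it shows only the Bayou-style control flow (apply the first update whose predicate holds, otherwise report a conflict) and does not attempt the encrypted-data or oblivious-function techniques the slide mentions. The `Update` class and `apply_update` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Predicate = Callable[[State], bool]
Action = Callable[[State], None]


@dataclass
class Update:
    """An ordered list of (predicate, action) pairs carried in one packet."""
    pairs: List[Tuple[Predicate, Action]]


def apply_update(state: State, update: Update) -> bool:
    """Run the first action whose predicate holds; return False on conflict."""
    for predicate, action in update.pairs:
        if predicate(state):
            action(state)
            return True
    return False  # no predicate matched: the update conflicts and is rejected


# Example: book the 10:00 slot if it is free, otherwise fall back to 11:00.
calendar: State = {"10:00": "bob"}
booking = Update(pairs=[
    (lambda s: "10:00" not in s, lambda s: s.update({"10:00": "alice"})),
    (lambda s: "11:00" not in s, lambda s: s.update({"11:00": "alice"})),
])
print(apply_update(calendar, booking), calendar)
```

Packaging the fallback with the update lets a replica resolve the conflict locally, without a round trip to the client that issued it.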

21 Tentative Updates: Epidemic Dissemination

22 Committed Updates: Multicast Dissemination

23 Our State of the Art Have techniques for protecting metadata –Uses encryption and signatures to provide protection against substitution attacks –Provides “secure pointer” technology Have a working scheme that can do some forms of conflict resolution directly on encrypted data –Uses new technique for searching on encrypted data –Can be generalized to perform optimistic concurrency, but at cost in performance and possibly privacy Byzantine assumptions for update commitment: –Signatures on update requests from clients Compromised servers are unable to produce valid updates Uncompromised second-tier servers can make consistent ordering decision with respect to tentative commits –Use of threshold cryptography in inner-tier of servers –Signatures on update stream from inner-tier Use of chained MACs to reduce overhead

24 OceanStore Technologies III: High-Availability and Disaster Recovery Requirements: –Handle diverse, unstable participants in OceanStore –Mitigate denial of service attacks –Automatic “disaster recovery” for everyone OceanStore Approach: –Components are self-configuring and self-tuning Just plug in new servers/remove old ones/bounce old ones –Use of erasure codes to provide stable archival storage –Mobile replicas are self-contained centers for logging and conflict resolution Version-based consistency for painless recovery –Continuous introspection repairs data structures and degree of redundancy

25 Floating Replicas and “Deep Archival Coding” Floating Replicas are per-object virtual servers –Complete copy of data –Logging for updates/conflict resolution –Interaction with other centers to keep data consistent –May appear and disappear like bubbles Erasure coded fragments provide very stable store –Multi-level codes spread over 1000s of nodes –Could lose 1/2 of nodes and still recover data –Archive: old versions of data and checkpoints –Inactive data may only be in erasure-coded form
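
The claim that half the nodes can be lost is what a rate-1/2 erasure code gives if any k of n = 2k fragments suffice to rebuild the data. The sketch below computes the resulting recovery probability under two assumptions that are not in the slides: an ideal "any k of n" code, and independent node failures with the same survival probability. The function name and the sample parameters are illustrative.

```python
from math import comb


def recovery_probability(n: int, k: int, p_survive: float) -> float:
    """Probability that at least k of n fragments survive.

    Assumes an ideal erasure code (any k fragments reconstruct the data)
    and independent fragment survival with probability p_survive.
    """
    return sum(
        comb(n, i) * p_survive**i * (1 - p_survive) ** (n - i)
        for i in range(k, n + 1)
    )


# Same 2x storage overhead, two layouts:
two_replicas = recovery_probability(n=2, k=1, p_survive=0.9)
rate_half_code = recovery_probability(n=32, k=16, p_survive=0.9)
print(f"2 full replicas:     {two_replicas:.6f}")
print(f"16-of-32 fragments:  {rate_half_code:.12f}")
```

For the same storage cost, spreading coded fragments over many nodes makes loss far less likely than keeping two whole copies, which is the "stability through statistics" argument made earlier in the talk.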

26 Floating Replica and Deep Archival Coding [Figure: three floating replicas, each holding a full copy, conflict resolution logs, and a version list (Ver1: 0x34243, Ver2: 0x49873, Ver3: …), shown alongside erasure-coded fragments]

27 Structure of Archival Checkpoints All blocks and fragments signed “Copy on Write” behavior Older metablocks fragmented also NOTE: Each block needs a GUID [Figure labels: Checkpoint Reference (GUID); Metadata; Redo Logs; Blocks; Fragments; Unit of Archival Storage; Unit of Coding; Checkpoint Reference (Later Version) with its own Metadata and Redo Logs]

28 Proactive Self-Maintenance Continuous testing and repair of information –Slow sweep through all information to make sure there are sufficient erasure-coded fragments –Continuous reevaluation of risk and redistribution of data –Slow sweep and repair of metadata/search trees Continuous online self-testing of HW and SW –Detects flaky, failing, or buggy components via: fault injection: triggering hardware and software error handling paths to verify their integrity/existence correctness probes: to correct and optimize routing and data location structures stress testing: pushing HW/SW components past normal operating parameters scrubbing: periodic restoration of potentially “decaying” software state –Automates preventive maintenance
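
The slow sweep over erasure-coded fragments can be pictured as a periodic classification pass. The sketch below is a hypothetical illustration: the function name, the input shapes, and the two thresholds (`k` fragments to reconstruct, `repair_threshold` to trigger regeneration) are assumptions, not details from the slides.

```python
def sweep_fragments(fragment_hosts, live_nodes, k, repair_threshold):
    """Classify objects by how many of their fragments are still reachable.

    fragment_hosts:   dict of object id -> list of node ids, one per fragment
    live_nodes:       set of node ids currently believed reachable
    k:                fragments needed to reconstruct an object
    repair_threshold: regenerate fragments when the live count drops below this
    """
    healthy, needs_repair, lost = [], [], []
    for obj, hosts in fragment_hosts.items():
        live = sum(1 for node in hosts if node in live_nodes)
        if live < k:
            lost.append(obj)          # too few fragments left to rebuild
        elif live < repair_threshold:
            needs_repair.append(obj)  # still recoverable: re-encode and re-spread
        else:
            healthy.append(obj)
    return healthy, needs_repair, lost


placement = {"a": [1, 2, 3, 4], "b": [1, 2, 5, 6], "c": [5, 6, 7, 8]}
print(sweep_fragments(placement, live_nodes={1, 2, 3, 4}, k=2, repair_threshold=3))
```

Setting `repair_threshold` above `k` is the point of sweeping proactively: objects are repaired while they can still be reconstructed.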

29 OceanStore Technologies IV: Introspective Optimization Requirements: –Reasonable job on global-scale optimization problem Take advantage of locality whenever possible Sensitivity to limited storage and bandwidth at endpoints –Repair of data structures, increasing of redundancy –Stability in chaotic environment → Active Feedback OceanStore Approach: –Event-based gathering -- short handlers on every event: Packet latencies, rates, acknowledgement times Relationships between user accesses –Periodic, thread-based analysis algorithms: Clustering by relatedness Time-series analysis of user and data motion –Adaptation: Clustered prefetching: fetch related objects Proactive prefetching: get data there before needed Rearrangement in response to overload and attack
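
The split between short event handlers and later adaptation can be sketched with a co-access counter. This is a toy stand-in for "clustering by relatedness": it only counts which object each user touches next, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class AccessObserver:
    """Cheap per-event bookkeeping plus a query used for clustered prefetching."""

    def __init__(self):
        self.follows = defaultdict(Counter)  # object -> counts of next objects
        self.last = {}                       # user -> most recent object

    def on_access(self, user, obj):
        """Short handler run on every access event."""
        prev = self.last.get(user)
        if prev is not None and prev != obj:
            self.follows[prev][obj] += 1
        self.last[user] = obj

    def prefetch_candidates(self, obj, limit=2):
        """Objects most often accessed right after obj, most frequent first."""
        return [o for o, _ in self.follows[obj].most_common(limit)]


observer = AccessObserver()
for name in ["inbox", "calendar", "inbox", "calendar", "inbox", "contacts"]:
    observer.on_access("alice", name)
print(observer.prefetch_candidates("inbox"))
```

A fuller analysis would run periodically over these counts and feed replica placement as well as prefetching; the handler itself stays constant-time so it can run on every event.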

30 OceanStore Technologies V: The oceanic data market Properties: –Utility providers have resources (storage and bandwidth) –Clients use resources both directly and indirectly Use of data storage and bandwidth on demand Data movement “on behalf” of users –Some customers are more important than others Techniques that we are exploring (very preliminary) –Data market driven by principal party Tradeoff between performance (replication) and cost –Secure signatures on data packets permit: Accounting of bandwidth and CPU utilization Access control policies (Bays in OceanStore nomenclature) –Use of challenge-response protocols (similar to zero-knowledge proofs) to demonstrate possession of data
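
A challenge-response check of data possession can be sketched with a keyed hash: the verifier sends a fresh random challenge, and the storage provider must answer with a value it can only compute from the full data. This is a simplified stand-in, not the protocol the slides have in mind: it is not zero-knowledge, and it assumes the verifier either keeps a reference copy or has precomputed challenge/response pairs. All function names are hypothetical.

```python
import hashlib
import hmac
import os


def make_challenge() -> bytes:
    """Fresh random nonce, so old responses cannot be replayed."""
    return os.urandom(16)


def respond(challenge: bytes, stored_data: bytes) -> bytes:
    """Provider side: HMAC-SHA256 of the stored data, keyed by the challenge."""
    return hmac.new(challenge, stored_data, hashlib.sha256).digest()


def verify(challenge: bytes, response: bytes, reference_copy: bytes) -> bool:
    """Verifier side: recompute the expected answer and compare in constant time."""
    expected = hmac.new(challenge, reference_copy, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


data = b"archived fragment contents"
challenge = make_challenge()
print(verify(challenge, respond(challenge, data), data))
```

Because the challenge changes every time, a provider that discarded the data and kept only an earlier response cannot pass a later audit.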

31 First Implementation [Java]: Included Components –Initial floating replica design Conflict resolution on encrypted data Version 1 Byzantine agreement for commits using timestamps Simple multicast construction for second-tier –Initial data location facility Bloom filter location algorithm finished Plaxton-based locate and route data structures –Introspective gathering of tacit info and adaptation Clustering, prefetching, adaptation of network routing –Initial archival facilities Methods for signing and validating fragments Possibly Lotus Notes-style multiversioning Applications –Unix file-system interface under Linux (“legacy apps”) –Email application –Proxy for Web caches

32 OceanStore Conclusion The Time is now for a Universal Data Utility –Ubiquitous computing and connectivity is (almost) here! –Confederation of utility providers is right model OceanStore holds all data, everywhere –Local storage is a cache on global storage –Provides security in an untrusted infrastructure –Large scale system has good statistical properties Use of introspection for performance and stability Quality of individual servers enhances reliability Exploits economies of scale to: –Provide high-availability and extreme survivability –Lower maintenance cost: self-diagnosis and repair Insensitivity to technology changes: Just unplug one set of servers, plug in others

