Storage management and caching in PAST Antony Rowstron and Peter Druschel Presented to cs294-4 by Owen Cooper
Outline: PAST goals; PAST API; File storage overview; File and replica diversion; Replica management; Caching; Performance; Discussion
PAST (non)goals P2P global storage network –Use properties of existing p2p systems (Pastry) –Support for strong persistence Via a core set of replicas –High availability Via local caching –Scalable Obtain high storage utilization via local cooperation –Secure Design goals do not include –Replacing the file system –Updatable files –Directory or lookup service
Security Model Pastry node ids are a hash of a public key Smartcard-based security –Provides keys –Quota management Nodeid and fileid generation is controlled –Tries to stop nodes from obtaining consecutive ids –Or clients from overloading parts of the network But node id and real-world identity may not be linked Data is not encrypted
PAST API In PAST, files are immutable Fileid=Insert(filename, credentials, k, file) –Insert k copies of the file into the network, or fail. –Fileid is a secure hash of (filename, credentials, salt) –Successful if acked with receipts from k nodes File=Lookup(fileid) –Return a copy of the file if it exists Reclaim(fileid, credentials) –Reclaim is accepted if requested by the owner –Allows, but does not require, storage reclamation
File insertion Insert(name, c, k, file) –Computes a storage certificate Contains fileid, hash of content, k, salt –Deducts k*filesize from the client's quota –Routes the file and storage certificate via Pastry, using the fileid as the key –The receiving node verifies the integrity of the file, stores it, and asks the k-1 closest nodes to store it as well These k-1 nodes are in its leaf set (k-1 <= l) –Node returns an ack with k signed storage receipts, or a nak.
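The client-side steps of an insert (derive a fileid, build the storage certificate, charge the quota) can be sketched as follows. This is a minimal illustration, not PAST's actual code: the use of SHA-1, the length-prefixed encoding, the 8-byte salt, and the names `make_fileid`, `StorageCertificate`, `build_certificate`, and `charge_quota` are all assumptions, and certificate signing by the smartcard is omitted.

```python
import hashlib
import os
from dataclasses import dataclass


def make_fileid(filename: str, owner_public_key: bytes, salt: bytes) -> bytes:
    """Hash (filename, owner key, salt) into a fileid. SHA-1 is assumed here."""
    h = hashlib.sha1()
    for part in (filename.encode("utf-8"), owner_public_key, salt):
        # Length-prefix each field so field boundaries are unambiguous.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


@dataclass(frozen=True)
class StorageCertificate:
    """Fields listed on the slide; the signature is left out of this sketch."""
    fileid: bytes
    content_hash: bytes
    k: int
    salt: bytes


def build_certificate(filename, owner_public_key, k, content, salt=None):
    """Build an unsigned certificate; a fresh random salt is drawn if none is given."""
    salt = os.urandom(8) if salt is None else salt
    return StorageCertificate(
        fileid=make_fileid(filename, owner_public_key, salt),
        content_hash=hashlib.sha1(content).digest(),
        k=k,
        salt=salt,
    )


def charge_quota(quota_bytes: int, k: int, file_size: int) -> int:
    """Deduct k * file_size from the quota, or raise if the quota is too small."""
    cost = k * file_size
    if cost > quota_bytes:
        raise ValueError("insufficient quota")
    return quota_bytes - cost
```

Because the salt feeds into the hash, changing only the salt yields a different fileid for the same file, which is what file diversion relies on.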
Lookup and Reclamation Pastry ensures a replica is found –Since a lookup is routed to the node whose nodeid is closest to the fileid Reclamation –Client generates a reclaim certificate –Routes it toward the fileid via Pastry –Recipients verify the certificate & issue a reclaim receipt –Client uses the receipt to reclaim quota
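The reason a lookup finds a replica is that inserts and lookups both target the nodes whose ids are numerically closest to the fileid. A toy sketch of that selection, assuming a small circular 16-bit id space (the real id width is much larger) and hypothetical helper names `ring_distance` and `k_closest`:

```python
ID_BITS = 16            # toy width chosen for readability; an assumption
ID_SPACE = 1 << ID_BITS


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two ids on a circular id space."""
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def k_closest(node_ids, fileid: int, k: int):
    """Return the k node ids numerically closest to fileid (ties broken by id)."""
    return sorted(node_ids, key=lambda n: (ring_distance(n, fileid), n))[:k]


# The replica holders for fileid 5 with k = 2 wrap around the ring.
holders = k_closest([10, 200, 65530, 3000], fileid=5, k=2)
```

In this toy example `holders` is `[10, 65530]`: node 65530 is only 11 steps away from 5 once the wrap-around is taken into account.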
Diversion A file or replica can be relocated For a replica, to another nearby node –If one of the k closest nodes is overloaded For a file, to another set of nodes in the id space –If the nodes around a fileid are (possibly locally) congested Why is this necessary? –Differing storage capacity at nodes –Differing file sizes for inserted files
Replica Diversion Node responsible for fileid asks k-1 neighbors to store the file Neighbor (N) may divert a copy to a node in its leaf set –Pointer to copy inserted at N –N issues the storage receipt –N also inserts a pointer on the k+1th closest node So the diverted copy is not orphaned if N fails N remains responsible for pointer maintenance
File Diversion Replica diversion is local –Allows storage choice between nodes around fileid File Diversion –Triggered when an insert with a fileid fails –Insert is tried a total of three times –New fileid generated by changing the salt
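The retry loop behind file diversion can be sketched as below. The attempt limit of three follows the slide; the callback `try_insert`, the function name, the SHA-1 hash, and the 8-byte salt are assumptions made for illustration.

```python
import hashlib
import os


def insert_with_file_diversion(filename, owner_key, try_insert, max_attempts=3):
    """Retry an insert under fresh salts until it succeeds or attempts run out.

    try_insert(fileid) is a hypothetical callback that returns True when
    k replicas were stored near fileid and False otherwise.
    Returns (fileid, attempts_used) on success, (None, max_attempts) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        salt = os.urandom(8)  # a new salt moves the file elsewhere in the id space
        fileid = hashlib.sha1(filename.encode("utf-8") + owner_key + salt).digest()
        if try_insert(fileid):
            return fileid, attempt
    return None, max_attempts
```

Each retry lands on an unrelated set of nodes, so a locally congested region of the id space does not doom the insert.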
Storage Policy How does a node choose to accept or reject a replica? –Computes sizeof(file)/sizeof(free_space) –Rejects the replica if this ratio exceeds T_pri or T_div, depending on the node's role (primary or diverted store) –T_pri > T_div How is a node chosen for replica diversion? –Search the leaf set for the node that Has maximal free space Doesn't already hold a diverted or primary replica of the file File diversion –Used when k copies cannot be placed (via primary storage or replica diversion)
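The acceptance test and the choice of diversion target can be sketched as follows. The `Node` record, the function names, and any threshold values passed in are illustrative assumptions; only the ratio test and the "maximal free space, no existing replica" rule come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    free_space: int  # bytes still available on this node


def accepts(file_size: int, free_space: int, threshold: float) -> bool:
    """Accept a replica only if file_size / free_space does not exceed the threshold."""
    if free_space <= 0:
        return False
    return file_size / free_space <= threshold


def pick_diversion_target(leaf_set, holders, file_size, t_div):
    """Pick the leaf-set node with the most free space that holds no replica yet.

    holders is the set of node ids that already store a primary or diverted
    replica of this file. Returns None if even the best candidate rejects the
    file under the stricter diversion threshold t_div.
    """
    candidates = [n for n in leaf_set if n.node_id not in holders]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: n.free_space)
    return best if accepts(file_size, best.free_space, t_div) else None
```

Using a smaller threshold for diverted replicas than for primary ones means a node gives up space to its neighbors' overflow more reluctantly than to files it is directly responsible for.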
Replica maintenance Node join/leave causes a responsibility shift –Pastry node failure detection will cause leaf set updates PAST detects responsibility shifts this way Newly responsible node must copy files –Make a copy immediately, OR –Keep a pointer to the old owner & copy lazily Diverted replicas –Target of diversion may move out of the leaf set Node storing the replica can be any node in the leaf set –The two nodes must then exchange keepalive messages themselves –The replica should be relocated
Replica maintenance (2) Node failure may cause storage shortage –No node in leaf set can take over ownership Search space is widened –Ask most extreme nodes to locate storage Increases search space to 2l nodes –If no storage space found, fail.
Caching Pastry’s locality based routing will tend to direct requests to nearby copies PAST also stores cached copies –Along routing path between client and fileid –For insert and lookup operations –Cache maintained using GD-size algorithm Weight per file: 1/size(file) Eviction: –Pick file with minimum weight –Subtract weight of evicted file from all others
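The eviction rule on this slide can be sketched as a small cache class. It follows the slide literally (weight 1/size, evict the minimum, subtract the evicted weight from the rest); resetting a file's weight to 1/size on a hit is the usual GreedyDual-Size behavior and is an assumption here, as are the class and method names.

```python
class GDSCache:
    """Toy GreedyDual-Size cache with unit cost, so weight = 1 / size."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.size = {}    # fileid -> size in bytes
        self.weight = {}  # fileid -> current weight

    def lookup(self, fileid) -> bool:
        """Return True on a hit and restore the file's weight (assumed behavior)."""
        if fileid not in self.size:
            return False
        self.weight[fileid] = 1.0 / self.size[fileid]
        return True

    def insert(self, fileid, size: int) -> bool:
        """Cache a file, evicting minimum-weight files until it fits."""
        if size <= 0 or size > self.capacity or fileid in self.size:
            return False
        while self.used + size > self.capacity:
            self._evict()
        self.size[fileid] = size
        self.weight[fileid] = 1.0 / size
        self.used += size
        return True

    def _evict(self) -> None:
        """Drop the minimum-weight file and age all remaining files by its weight."""
        victim = min(self.weight, key=self.weight.get)
        w = self.weight.pop(victim)
        self.used -= self.size.pop(victim)
        for f in self.weight:
            self.weight[f] -= w
```

Large files start with small weights and are evicted first, while the subtraction step ages files that have not been hit recently. A production version would typically keep a running offset instead of touching every entry on each eviction.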
Experiments: without diversion Experiments use –Large trace from web server –Files from local web server The case for diversion with the web trace –Without diversion: 51.1% of insertions failed, and storage utilization reached only 60.8%
Experiments (2): with diversion –A larger leaf set size helps
Experiments (3): varying T_pri –Effect of varying T_pri: number of files stored vs. file size
Experiments (4): varying T_div –T_pri is held constant