Object Naming & Content based Object Search 2/3/2003.
System Architecture To meet requirements for scalability, reliability and performance, the system uses a hybrid architecture. A regional manager provides centralized management of the OSD devices, clients and objects within its region. Regional managers maintain a P2P relationship with one another. Object location across regions is based on a Distributed Hash Table (DHT). DHTs scale well and are used by Tapestry, Pastry, CAN, Chord… Current DHT-based systems do not consider locality, differences in peers' processing power, or network topology, including bandwidth, LAN versus WAN links, and enterprise-specific requirements.
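The DHT-based location step can be illustrated with a minimal consistent-hashing sketch. It is not the system's actual design: the class RegionRing, the function ring_position, the manager names, and the choice of SHA-1 are all assumptions made for illustration, and the sketch deliberately ignores the locality and topology concerns listed above.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map any string onto a 160-bit identifier ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class RegionRing:
    """Hypothetical ring of regional managers (consistent hashing)."""

    def __init__(self, managers):
        # Place every manager on the ring and keep the ring sorted.
        self._ring = sorted((ring_position(m), m) for m in managers)
        self._positions = [pos for pos, _ in self._ring]

    def home_manager(self, guid: str) -> str:
        """Return the first manager at or after the GUID's ring position."""
        idx = bisect.bisect_left(self._positions, ring_position(guid))
        # Wrap around to the first manager when the GUID hashes past the end.
        return self._ring[idx % len(self._ring)][1]


# Manager names are placeholders, not names from the presentation.
ring = RegionRing(["region-a", "region-b", "region-c"])
owner = ring.home_manager("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
```

With this layout, adding a regional manager only moves the GUIDs that fall between the new manager and its predecessor on the ring; every other GUID keeps its home manager.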
Object Placement Objects are organized within a region for efficient creation, update, and search. An object is stored within its local region, and the object or its metadata is also stored outside the region. Dynamic replication/migration is a major component of object placement.
Object Naming In our proposal, an object is identified by a Globally Unique Identifier (GUID, such as f81d4fae-7dec-11d0-a765-00a0c91e6bf6). Clients, including users and traditional file systems, may need symbolic object names. The same object may have different names for different users. The same object name may refer to different objects in different environments.
Object Naming (Cont.) A mapping between GUIDs and object names is a must. This mapping should be distributed, scalable, efficient, reliable and secure. The mapping also supports each client's individual view of the objects it can access. GUIDs live in a flat name space, but a client's view of objects may have a hierarchical tree structure. Mappings between the object names a client accesses and their GUIDs will be cached for performance.
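The naming requirements above can be sketched as a per-client mapping from hierarchical names to flat GUIDs, with a small cache in front. This is only an illustration under stated assumptions: the NameService class, its methods, the client names, the paths, and the second GUID are all hypothetical, and a real mapping would be distributed and secured rather than held in one in-memory dictionary.

```python
import uuid


class NameService:
    """Hypothetical per-client mapping from symbolic paths to GUIDs."""

    def __init__(self):
        self._views = {}   # client id -> {hierarchical path: GUID}
        self._cache = {}   # (client id, path) -> GUID
        self.lookups = 0   # counts resolutions that missed the cache

    def bind(self, client: str, path: str, guid: str) -> None:
        """Give an object a name inside one client's view."""
        self._views.setdefault(client, {})[path] = guid
        # Drop any stale cached entry for this name.
        self._cache.pop((client, path), None)

    def resolve(self, client: str, path: str) -> str:
        """Return the GUID for a name; raises KeyError if not in the view."""
        key = (client, path)
        if key in self._cache:
            return self._cache[key]
        self.lookups += 1
        guid = self._views[client][path]
        self._cache[key] = guid
        return guid


names = NameService()
report = "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
other = str(uuid.uuid4())  # made-up GUID for a second object

# The same object under different names for different clients.
names.bind("alice", "/projects/osd/report.doc", report)
names.bind("bob", "/inbox/from-alice.doc", report)
# The same name referring to a different object in another client's view.
names.bind("carol", "/projects/osd/report.doc", other)
```

Keeping the view per client is what lets a flat GUID space sit underneath hierarchical, client-specific names; the cache only saves repeated resolutions of the same name.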
Object Retrieval based on Partial Information A client may want to access an object but know only part of its name or some key words from its contents. DHT-based P2P systems scale well but have poor query facilities. Currently, these systems support only exact match.
Object Retrieval based on Partial Information (Cont.) Options: –Centralized index management –Flooding search –GUID resolution based on a DNS-like mechanism –Content-based routing on overlay networks Flooding search is not the solution: Gnutella, which relies on it, does not scale well. DNS has many known problems, including a single point of failure.
P2P Content Based Routing Goal: based on key words, an object request is efficiently routed to the regional manager that holds the object. –The number of regional managers contacted should be as small as possible. –With object replicas distributed among regions/OSD devices, a nearby copy should be found. The definition of a nearby copy considers bandwidth, object size, and node processing power as well as node state. –Role-based access control should be honored. –An object should be found as long as it exists in the system.
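One way to read the "nearby copy" goal is as a cost estimate over the available replicas. The sketch below is an assumption, not the presentation's definition: the Replica fields, the cost formula (transfer time plus a load-adjusted processing term), and the sample numbers are all invented to show how bandwidth, object size, processing power and node state could be combined.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node: str
    bandwidth_mbps: float   # link bandwidth to the client
    cpu_score: float        # relative processing power, higher is faster
    load: float             # fraction of capacity in use, 0.0 to 1.0
    online: bool = True     # node state


def estimated_cost(replica: Replica, object_size_mb: float) -> float:
    """Rough cost: transfer time plus a load-adjusted processing term."""
    transfer = object_size_mb * 8 / replica.bandwidth_mbps
    processing = 1.0 / (replica.cpu_score * (1.0 - replica.load))
    return transfer + processing


def pick_nearby(replicas, object_size_mb: float):
    """Return the cheapest usable replica, or None if none is usable."""
    usable = [r for r in replicas if r.online and r.load < 1.0]
    if not usable:
        return None
    return min(usable, key=lambda r: estimated_cost(r, object_size_mb))


replicas = [
    Replica("lan-node", bandwidth_mbps=1000, cpu_score=1.0, load=0.5),
    Replica("wan-node", bandwidth_mbps=10, cpu_score=4.0, load=0.1),
    Replica("down-node", bandwidth_mbps=10000, cpu_score=8.0, load=0.0,
            online=False),
]
large = pick_nearby(replicas, object_size_mb=100)
small = pick_nearby(replicas, object_size_mb=0.001)
```

With these sample numbers the large object is served from the high-bandwidth LAN node, while the tiny object goes to the less loaded, faster WAN node, which is why object size belongs in the definition of "nearby".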
P2P Content Based Routing (Cont.) An efficient index on the key words of objects needs to be constructed and distributed, and its freshness maintained. One useful index scheme may rely on a unique hashing function. Object requests can then be routed based on the constructed index. Approximate routing based on statistics is a possible solution.
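A hash-based key word index of the kind mentioned above might look like the following sketch. It assumes a design the slides do not spell out: each key word is hashed to one regional manager, which keeps the list of GUIDs for that key word. The KeywordIndex class, the manager names and the GUID strings are hypothetical, and the sketch handles only whole key words, so partial matching and index freshness remain open.

```python
import hashlib
from collections import defaultdict


class KeywordIndex:
    """Hypothetical inverted index sharded across regional managers."""

    def __init__(self, managers):
        self.managers = list(managers)
        # Each manager holds: key word -> set of GUIDs.
        self.shards = {m: defaultdict(set) for m in self.managers}

    def _manager_for(self, keyword: str) -> str:
        """Hash a key word to the manager responsible for it."""
        digest = hashlib.sha1(keyword.lower().encode("utf-8")).digest()
        return self.managers[int.from_bytes(digest, "big") % len(self.managers)]

    def publish(self, guid: str, keywords) -> None:
        """Register an object under each of its key words."""
        for kw in keywords:
            self.shards[self._manager_for(kw)][kw.lower()].add(guid)

    def search(self, keywords):
        """Return (GUIDs matching all key words, managers contacted)."""
        contacted = set()
        result = None
        for kw in keywords:
            manager = self._manager_for(kw)
            contacted.add(manager)
            postings = self.shards[manager].get(kw.lower(), set())
            result = set(postings) if result is None else result & postings
        return (result or set()), contacted


index = KeywordIndex(["region-a", "region-b", "region-c"])
index.publish("guid-1", ["storage", "osd"])
index.publish("guid-2", ["storage", "naming"])
matches, contacted = index.search(["storage", "naming"])
```

A query contacts at most one manager per key word, which keeps the number of regional managers involved small, at the price of hot spots for very common key words.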
P2P & DB Systems P2P: –Flexibility –Decentralized –Fault Tolerance –Lightweight DB: –Strong Semantics –Powerful query facilities –Transactions & Concurrency Control
Focusing Issues Efficient object placement mechanism within a region and among regions; this could be an innovative DHT or indexing scheme. Scalable object search based on partial match that minimizes bandwidth and processing costs. Query processing optimization, including caching and client satisfaction.