Presentation on theme: "Intelligent File System Changgyu Oh 04/02/02. Problem Domain(1) Scalability of Current Decentralized P2P Networks similar to Gnutella –A total number."— Presentation transcript:
Intelligent File System Changgyu Oh 04/02/02
Problem Domain(1) Scalability of Current Decentralized P2P Networks similar to Gnutella –A total number of messages generated in the network uses a lot of network bandwidth. –In Frode’s R&D report, “The story tells that an employee at Nullsoft asid in an IRC chat that GnutellaNet probably would not scale to more than 250 or so clients. Gene Kan, a highly profiled spokesman in the Gnutella community, states that the technology was initially designed to support file sharing in a small network between friends.” –Marius at al claimed in  that Gnutella protocol cannot exceed more than a few thousand peer-nodes.
Problem Domain(2) Violation of User Anonymity From queryHit packet of the Gnutella protocol, everyone knows a publisher’s address.
Problem Domain(3) Lack of information for the resources
Goal of the Project Autonomous approach to control messages flowing over network by grouping and managing messages generated by peer hosts using caching mechanism. Fast response for desired file and for related information using association rules and data mining. It provides flexible query mechanism. Efficient data representation: represents pointless file association rules. More enhanced anonymous features to the decentralized systems: with the new approach for the IP address field of the “queryHit” packet. Algorithms are provided the above results.
Related Works Anonymous Publication Service –The scheme of Publius system  for the anonymity is based on a static, system-wide list of available servers. It doesn’t support the adding of the new servers or purging dead servers because of static feature. –The Eternity system  was based on the Anderson’s seminal paper on the Eternity Service. According to the Anderson, the basic idea of the eternity service is to use the redundancy and the scattering techniques to distribute replicas over a large number of hosts. It adds the anonymity mechanism to drive up the cost of selective service denial attacks. –Q/A(Query and Advertising System)  also tried fixing the weakness of the Gnutella protocol, but still first level of server knows the IP address of clients.
Meta-search engines EEM  is similar in the sense that it builds representatives for each database and is an optimizing relationship hierarchy. ETCR  is very similar in the sense that class hierarchies are used for inheritance, classification and transitive closure reasoning. BLDLC  uses classification hierarchies to increase capabilities of the data browsing in digital libraries.
Caching The Distributed File System (DFS) discusses detecting network failures. It ensures that caches are consistent when they occur . Some file systems choose a simple model, where the failure detection is not required such as the Network File System , whose clients poll the server to find out when the file was last modified, and determine if the cached version is valid. The Andrew file system  and Sprite  file system are other file systems using caching schema. Hint-Based Cooperative Caching file system was introduced to help clients to make decisions based on local state, enabling a loosely coordinated system and to reduce overhead and access latency .
Our Approach for the flexible query mechanism New File Association Rules are introduced. Every node(resource) is annotated with data elements, –consisting of a Set of Number pairs(Ν), –a Relation Type(Ŗ), –Constraint Rule(Ω), –and Hierachy Identifier(Њ).
Message Cost Function F=a total number of replicas Maximum hop A number of replicas of message generated by peer hosts A number of peer hosts for message forwarding in a routing table of each peer host.
Our New Grouping Algorithm
Properties of Distributed Grouping Structure search is robust against node failure; It is completely decentralized; Peer host in the same group share same group information; All peers serve as entry points for search; Division of a group is automatically occurs when it is necessary.
Searching in DGS
Replicas generated at each hop
Caching mechanism adapted in IFS 3: if(CheckMessage(MsgHeader) = SeenBefore ) 4: Begin 5:Drop the message; 6: End // Add the Query msg to the cache. 7: cachingQuery( getMsgID(),RemoteHost); // Forward the Query msg to other connected hosts except // the one sent it in the same group. 8: decreasingTTL(msg); forwardMsg(msg,toPeerHosts);
Enhancing anonymous feature of Gnutella Protocol 23:IPClue = encodingIPClue(address); //create a queryHit Message 24: qrm = CreateQueryHitMessage(MsgID of the query Packet, ClientID(), IPclue,port,Minimum Speed,records );
Enhanced query mechanism
Comparison with Other approach
Conclusion IFS is a fully decentralized, server-less, highly scalable, and a fully distributed file system that provides a high degrees of resource availability and flexible queries for the P2P end users. The system continuously maintains an entire network scale in the decentralized manner.
Discussion Further Research on the latency due to the grouping File registration strategy on heterogeneous environment