Digital Library Service – An overview Introduction System Architecture Components and their functionalities Experimental Results.


1 Digital Library Service – An overview
- Introduction
- System Architecture
- Components and their functionalities
- Experimental Results

2 Introduction
Peer-to-Peer (P2P) Information Retrieval framework
- Peers that share information
- Cumulative bandwidth
- High processing power and storage
- Absence of high-cost hardware
Three generations of P2P networks

3 1st Generation
- Centralized DB for coordinated lookup
- Napster
2nd Generation
- Flooding to search every node on the network
- Gnutella
3rd Generation
- Distributed Hash Tables
- Tapestry, Chord, Pastry, CAN, Kademlia
- Uses routing tables to maintain the addresses of its neighbours

4 In 3G P2P networks, between log N and N nodes have to be contacted to reach the destination.
Proposed method
- The target peer can be contacted directly from the source peer.
- A search within the target peer retrieves the file reference using keyword indices in a B+ tree.

5 System Architecture
P2P cluster and Hadoop cluster
Hadoop cluster
- Extracts keywords for efficient searching
- MapReduce programming paradigm
P2P cluster
- Uploading files
- Servicing search requests

6 System Architecture
[Figure: the Hadoop cluster consists of a MapReduce master (JobTracker) with a DFS master (NameNode), and slave nodes that each run a MapReduce slave (TaskTracker) and a DFS slave (DataNode). A "Keyword extraction" link connects the Hadoop cluster to the P2P cluster.]

7 Hadoop
- Software platform to handle vast amounts of data
- Moves computation to the place of the data rather than moving large data blocks to the place of computation
- HDFS and MapReduce framework
  - HDFS: NameNode and DataNode
  - MapReduce computation
    - Map: splits the input data set into fragments and assigns each fragment to a map task, which emits (K, V) pairs
    - Reduce: merges all intermediate values associated with a key
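The map and reduce roles described on this slide can be sketched in plain Python, without Hadoop. This is only an illustration of the paradigm applied to keyword extraction: the stop-word list, the function names and the choice of "one pair per distinct word" are assumptions, since the slides do not say how keywords are selected.

```python
from collections import defaultdict

# Hypothetical stop-word list; the slides do not define what counts as a keyword.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def map_phase(doc_name, text):
    """Emit one (keyword, document) pair per distinct non-stop word in a fragment."""
    words = {w.strip(".,;:!?()\"'").lower() for w in text.split()}
    return [(w, doc_name) for w in sorted(words) if w and w not in STOP_WORDS]


def shuffle(pairs):
    """Sort and group intermediate pairs by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Merge all intermediate values associated with one key."""
    return key, sorted(set(values))


def extract_keywords(docs):
    """Run map, shuffle and reduce sequentially over {doc_name: text}."""
    pairs = []
    for name, text in docs.items():
        pairs.extend(map_phase(name, text))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map calls would run in parallel on the nodes that hold the data blocks; here they run one after another purely to show the data flow.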

8 [Figure: MapReduce data flow. Blocks of documents D1 to D3 (D1,B1 through D3,B2) are fed to map tasks 1 to 3, which emit (K, C) pairs such as (K1, C1). The pairs are sorted and grouped per document, for example K3,[C1,C3] for D1 and K3,[C2,C6] for D2. Reduce tasks 1 and 2 then emit one (K, I) entry per key.]

9 B+ Tree – IP and its hash
- Represents sorted data indexed by a key for efficient insertion, retrieval and removal of records
- Inserting or searching a record requires O(log_B N) operations in the worst case
  - B: order, N: nodes
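To give a feel for the O(log_B N) bound, the following sketch counts how many levels a lookup has to descend. It assumes an idealised tree in which every node has the full fan-out B; in a real B+ tree nodes may be only half full, so the effective fan-out can be closer to B/2. The function name is illustrative.

```python
def levels_needed(num_entries, fan_out):
    """Smallest h such that fan_out ** h >= num_entries (integer arithmetic only)."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    levels, capacity = 0, 1
    while capacity < num_entries:
        capacity *= fan_out
        levels += 1
    return levels
```

For example, with a fan-out of 100 a million entries are reachable in 3 levels, while a binary fan-out needs 20.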

10 DLS Components
Start-up component:
- Starting up the Hadoop cluster
- Identifying nodes to participate in the P2P cluster
- Determining the IP hash values for the peers
  - Using SHA-1 (160-bit → 40-bit)
- Forming the B+ tree
- Uploading the B+ tree to the other peers
- Starting the web server
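A minimal sketch of the IP hashing step follows. The slides state only that the 160-bit SHA-1 value is reduced to 40 bits, not how; keeping the first 5 bytes of the digest is an assumption made here. The names `hash40` and `build_peer_index` are illustrative, and a sorted list stands in for the B+ tree.

```python
import hashlib


def hash40(value):
    """Reduce a SHA-1 digest (160 bits) to a 40-bit integer.

    Assumption: truncation to the first 5 bytes; the slides do not
    specify the reduction method.
    """
    digest = hashlib.sha1(value.encode("utf-8")).digest()  # 20 bytes
    return int.from_bytes(digest[:5], "big")               # 5 bytes = 40 bits


def build_peer_index(ip_addresses):
    """Sorted (ip_hash, ip) pairs; a stand-in for the B+ tree of peers."""
    return sorted((hash40(ip), ip) for ip in ip_addresses)
```

Every peer can rebuild the same index from the same list of IP addresses, which is what lets a source peer identify a target without consulting intermediate nodes.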

11 DB Distribution Component
- Keyword extraction using the Hadoop cluster
- Hashing keywords with SHA-1 (160-bit → 40-bit)
- Find the peer with the relatively closest match
- Upload to the target peer
- Update the B+ tree (keyword to file reference) in the target

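12 [Figure: DB distribution flow. Documents Doc 1 to Doc n enter the Hadoop cluster, which outputs each file name with its list of keywords. The search keys are hashed, the target is identified, and the document is uploaded to the target node among the peers in the P2P network.]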
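The distribution flow can be sketched as below. Several points are assumptions rather than details from the slides: the 40-bit reduction is taken as the first 5 bytes of the SHA-1 digest, "closest match" is read as the smallest absolute difference between hash values, and plain dictionaries stand in for the per-peer B+ trees. All function names are hypothetical.

```python
import bisect
import hashlib


def hash40(value):
    """Assumed reduction: first 40 bits of the SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:5], "big")


def closest_peer(peer_index, key_hash):
    """Return the IP whose hash is nearest to key_hash.

    peer_index is a non-empty sorted list of (ip_hash, ip) pairs.
    """
    hashes = [h for h, _ in peer_index]
    pos = bisect.bisect_left(hashes, key_hash)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = peer_index[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda entry: abs(entry[0] - key_hash))[1]


def distribute(peer_ips, file_keywords):
    """Map each keyword to its target peer: {peer_ip: {keyword: [file names]}}."""
    peer_index = sorted((hash40(ip), ip) for ip in peer_ips)
    stores = {ip: {} for ip in peer_ips}
    for file_name, keywords in file_keywords.items():
        for keyword in keywords:
            target = closest_peer(peer_index, hash40(keyword))
            stores[target].setdefault(keyword, []).append(file_name)
    return stores
```

Because every peer applies the same rule to the same index, a keyword always lands on one well-defined peer, whichever peer performs the upload.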

13 Search Component
- Process keywords
- Find the 40-bit hash value
- Search the B+ tree in the peer to identify the target node
- Search the B+ tree in the target node to retrieve the file reference

14 [Figure: search flow. A search request with a list of keywords arrives at PEER1 in the P2P network. The search keys are hashed, and the search node is identified using the relative difference between the hash values of the keywords and of the IP addresses in the B+ tree. The document is then searched for in the target peer, PEER2.]
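The two-step search (identify the target peer, then look up the file reference inside it) might look roughly like this. The `Peer` class and its methods are hypothetical; a sorted list searched with `bisect` stands in for the in-peer B+ tree, the 40-bit reduction is assumed to be the first 5 bytes of the SHA-1 digest, and "relative difference" is interpreted as absolute difference between hash values.

```python
import bisect
import hashlib


def hash40(value):
    """Assumed reduction: first 40 bits of the SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:5], "big")


class Peer:
    """Hypothetical peer holding a sorted keyword index (stand-in for its B+ tree)."""

    def __init__(self, ip):
        self.ip = ip
        self.ip_hash = hash40(ip)
        self._keys = []   # sorted 40-bit keyword hashes
        self._refs = {}   # keyword hash -> list of file references

    def store(self, keyword, file_ref):
        key = hash40(keyword)
        if key not in self._refs:
            bisect.insort(self._keys, key)
            self._refs[key] = []
        self._refs[key].append(file_ref)

    def lookup(self, keyword):
        """Binary search over the sorted keys; returns [] when absent."""
        key = hash40(keyword)
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return list(self._refs[key])
        return []


def target_peer(peers, keyword):
    """Step 1: the peer whose IP hash is nearest to the keyword hash."""
    key = hash40(keyword)
    return min(peers, key=lambda peer: abs(peer.ip_hash - key))


def search(peers, keywords):
    """Step 2: query each keyword's target peer directly for file references."""
    return {kw: target_peer(peers, kw).lookup(kw) for kw in keywords}
```

Note that `target_peer` scans all peers linearly for brevity; in the design described on the slides this step is itself a B+ tree search.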

15 Add/Delete Peer
- Update the IP address table
- Compute the IP hash of the newly added peer
- Reconstruct the B+ tree and update it in the peers
- Relocate the appropriate files to the new peer
- Modify metadata in the peers
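One way to picture the relocation step when a peer joins is sketched below. It assumes keywords are owned by the peer whose hash is nearest in absolute difference, works directly on integer hash values, and uses nested dictionaries in place of B+ trees; the function name and data layout are illustrative, not taken from the slides.

```python
def add_peer(stores, new_peer_hash):
    """Add a peer and move to it every entry that is now closer to it.

    stores maps peer_hash -> {keyword_hash: [file references]} and is
    modified in place. Entries equally close to old and new owner stay put.
    """
    stores[new_peer_hash] = {}
    for peer_hash, index in stores.items():
        if peer_hash == new_peer_hash:
            continue
        moving = [k for k in index
                  if abs(k - new_peer_hash) < abs(k - peer_hash)]
        for key in moving:
            stores[new_peer_hash][key] = index.pop(key)
    return stores
```

Only entries that are strictly closer to the new peer move, so the cost grows with the number of keywords held by the existing peers.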

16 Experimental Results – Keyword Extraction from multiple files (1 MB each)
Observation: depends on the number of keywords

17 Cluster Set-up Time
Depends on the number of nodes

18 Add a New Peer
Depends on the number of keywords (for 1 peer)

19 Performance of Data Distribution Component
Load time depends on the number of keywords

20 Performance of Search Component
Search time remains constant (9 msec), owing to the B+ tree and search distribution

21 Conclusion
- P2P Information Retrieval Framework uses the 3G P2P DHT approach
- B+ trees are maintained in peers
- Hadoop is used for keyword extraction from multiple files in parallel
- Efficient search on peers

22 THANK YOU

