
1 Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems
Kjetil Nørvåg, Norwegian University of Science and Technology, Trondheim, Norway
Christos Doulkeridis and Michalis Vazirgiannis, Athens University of Economics and Business, Athens, Greece

2 Outline (ICPS'2006, June 28, 2006)
• Motivation and example application
• Taxonomies and taxonomy-based querying
• Taxonomy-based query routing
• Taxonomy caching: architecture and maintenance
• Experimental results
• Summary and further work

3 Motivation
• Mobile devices have high storage capacity and wireless support
• They contain multimedia documents that can be shared
• Possibly other data/services:
  - Temperature or other environmental data
• Important challenge: find the files and services!
• Problem:
  - Dynamic contents, location, and visibility
  - Limited bandwidth
• Centralized indexing/search engines not applicable
• P2P network & search

4 Example application: MobiShare
• Devices share resources by hosting web services
• Each device is connected to a CAS
• CASs are connected to each other P2P
• [More details in Valavanis et al., Web Intelligence'2003]

5 Outline of basic idea
1) Describe contents according to a taxonomy
2) Taxonomy info is cached at remote peers
3) Use cached knowledge to route queries to appropriate peers
Why?
1) Should reduce latency
2) Should increase recall at the same cost

6 Resource description
• Taxonomy-based resource description
• Also applicable for audio/video
• More than one taxonomy might exist in the system
• Resource description: taxonomy ID and set of categories

7 Taxonomy-based querying
Query:
1) Request for all resources belonging to category C_j, or
2) Request for all resources belonging to category C_j and satisfying some additional property
Example properties: text contents, metadata
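A minimal Python sketch of the resource description (slide 6) and the two query forms (slide 7). The class names, field names, and the choice to model the "additional property" as a predicate function are assumptions made for illustration. The slides do not say whether a query on C_j also matches subcategories; this sketch uses exact category matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Resource:
    """A shared resource: taxonomy ID plus a set of categories (slide 6)."""
    name: str
    taxonomy_id: str
    categories: Set[str]
    metadata: Dict[str, str] = field(default_factory=dict)  # hypothetical field


@dataclass
class TaxonomyQuery:
    """Query form 1 (category only) or form 2 (category + extra property)."""
    taxonomy_id: str
    category: str
    extra: Optional[Callable[[Resource], bool]] = None  # assumed representation

    def matches(self, resource: Resource) -> bool:
        if resource.taxonomy_id != self.taxonomy_id:
            return False
        if self.category not in resource.categories:
            return False
        return self.extra is None or self.extra(resource)


def local_execute(query: TaxonomyQuery, resources: List[Resource]) -> List[Resource]:
    """Local execution of a query against the resources held by one peer."""
    return [r for r in resources if query.matches(r)]


shared = [
    Resource("song.mp3", "dmoz", {"Arts/Music"}, {"artist": "X"}),
    Resource("photo.jpg", "dmoz", {"Arts/Photography"}),
]
print([r.name for r in local_execute(TaxonomyQuery("dmoz", "Arts/Music"), shared)])
```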

8 Searching in unstructured P2P networks
• Basic search technique: local execution of the query, then forwarding if TTL > 0
  - Naïve flooding (all neighbors)
  - Normalized flooding (only K neighbors)
  - Random walks: only one random neighbor, but W walks initiated
• Problem: only a limited number of peers can be searched (query horizon)
• Possible improvements:
  - Routing indices
  - Summary indexing (Bloom filters etc.)
  - Result caching
• However: still limited scalability and coverage
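The query horizon can be illustrated with a small sketch of TTL-limited forwarding. The function name `flood`, the adjacency-dict graph representation, and the duplicate suppression through a visited set are assumptions, not details from the slides. With `k=None` the sketch behaves like naïve flooding; with an integer `k` it approximates normalized flooding. Random walks are not modeled.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Set


def flood(graph: Dict[int, List[int]], origin: int, ttl: int,
          k: Optional[int] = None,
          rng: Optional[random.Random] = None) -> Set[int]:
    """Return the set of peers that execute the query locally.

    Each peer executes the query, then forwards it if its remaining TTL > 0.
    k=None forwards to all neighbors; an integer k forwards to at most k.
    """
    rng = rng or random.Random(0)
    visited = {origin}
    queue = deque([(origin, None, ttl)])  # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining <= 0:
            continue  # horizon reached: execute locally, do not forward
        candidates = [n for n in graph[peer] if n != sender]
        if k is not None and len(candidates) > k:
            candidates = rng.sample(candidates, k)
        for neighbor in candidates:
            if neighbor not in visited:  # assumed duplicate suppression
                visited.add(neighbor)
                queue.append((neighbor, peer, remaining - 1))
    return visited


# A chain of five peers: with TTL=2 the query never gets past peer 2.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(flood(chain, origin=0, ttl=2)))
```

Peers 3 and 4 in the chain lie beyond the query horizon, which is the limitation that the taxonomy cache is meant to address.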

9 Taxonomy caching
• Basic idea:
  - Maintain taxonomic information about remote contents in a taxonomy cache (TCache)
• Mapping from taxonomic concept to set of peers
• Advantages:
  - Cheaper to maintain than a full-text index
  - More applicable to multimedia data
  - More robust with respect to changes in contents
• Used to improve query routing
• Higher recall and reduced latency

10 Query routing using taxonomy cache (TCache)
1) Basis: one of the traditional routing strategies
2) Query forward peers: P_F
3) Starting point: P_F = neighbors = P_N = {P_N1, …, P_Nn}
4) Lookup in TCache: Lookup(category) → P_C = {P_C1, …, P_Cm}
5) P_F = P_N + P_C
6) Query forwarded to (a subset of) P_F
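Steps 3 to 5 of slide 10 can be sketched as follows. The function name `forward_peers`, the dict-of-sets cache, and the exclusion of the local peer and the previous hop from the candidate set are assumptions for illustration; the slides only state that P_F is built from P_N and P_C.

```python
from typing import Dict, List, Optional, Set


def forward_peers(neighbors: List[str],
                  tcache: Dict[str, Set[str]],
                  category: str,
                  self_id: str,
                  previous: Optional[str] = None) -> List[str]:
    """Build the candidate set P_F = P_N + P_C for one query.

    P_N: neighbors, excluding the peer the query came from.
    P_C: peers the TCache lists for the queried category.
    """
    p_n = [p for p in neighbors if p != previous]
    p_c = sorted(tcache.get(category, set()) - {self_id, previous})
    p_f = list(p_n)
    for peer in p_c:
        if peer not in p_f:  # avoid forwarding twice to the same peer
            p_f.append(peer)
    return p_f


tcache = {"Arts/Music": {"C", "Z"}}
print(forward_peers(["B", "C"], tcache, "Arts/Music", self_id="A"))
```

In this example peer Z is not a neighbor of A, so forwarding to it would be a jump in the terminology of slide 11. Step 6 (choosing a subset of P_F) depends on the forwarding strategy.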

11 Query forwarding alternatives (1)
• Query forward peers: P_F
• Number of neighbors (excluding the previous peer): N_n
• Number of matches from lookup: N_c
• Ranking of peers in P_C:
  - Based on the number of resources within a category
  - High number of resources: considered experts
• TCB:
  - Highest ranked in P_C + the N_n neighbors in {P_N1, …, P_Nn}
  - Forwarding to a peer in P_C is called a jump
  - A jump can be to a peer beyond the query horizon!
• TCA:
  - If N_c ≥ N_n: forward to the N_n highest ranked peers in P_C
  - If N_c < N_n: forward to all N_c peers in P_C + (N_n - N_c) randomly selected neighbors
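A hedged sketch of the TCB and TCA selection rules from slide 11. Here `p_c` maps each cached peer to its number of resources in the queried category. Two points are assumptions: "highest ranked in P_C" is read as the single top-ranked peer, and ties in the ranking are broken by peer name. The function names are hypothetical.

```python
import random
from typing import Dict, List


def rank(p_c: Dict[str, int]) -> List[str]:
    """Rank cached peers by number of resources, highest first."""
    return sorted(p_c, key=lambda peer: (-p_c[peer], peer))


def tcb(neighbors: List[str], p_c: Dict[str, int]) -> List[str]:
    """TCB: all N_n neighbors plus the highest ranked peer in P_C."""
    targets = list(neighbors)
    ranked = rank(p_c)
    if ranked and ranked[0] not in targets:
        targets.append(ranked[0])  # this forward is a "jump"
    return targets


def tca(neighbors: List[str], p_c: Dict[str, int],
        rng: random.Random) -> List[str]:
    """TCA: N_n highest ranked cached peers, padded with random neighbors."""
    n_n, n_c = len(neighbors), len(p_c)
    ranked = rank(p_c)
    if n_c >= n_n:
        return ranked[:n_n]
    pool = [p for p in neighbors if p not in p_c]
    return ranked + rng.sample(pool, min(n_n - n_c, len(pool)))


cached = {"X": 5, "Y": 9, "Z": 1}
print(tcb(["B", "C"], cached))
print(tca(["B", "C"], cached, random.Random(0)))
```

Under this reading TCA keeps the number of forwards per hop at N_n, while TCB adds one forward per hop on top of the neighbors.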

12 Query forwarding alternatives (2)
• TCCN:
  - If N_c ≥ N_n: forward to all N_c peers in P_C
  - If N_c < N_n: forward to all N_c peers in P_C + (N_n - N_c) neighbors
• TCDN:
  - If N_c ≥ N_n: forward to the N_n/2 highest ranked peers in P_C + a random selection of N_n/2 other peers in P_C
  - If N_c < N_n: forward to all N_c peers in P_C + (N_n - N_c) neighbors
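The TCCN and TCDN rules of slide 12 might be sketched like this. Several details are assumptions because the slide leaves them open: the padding neighbors are chosen at random, N_n/2 is rounded down for the ranked half with the remainder going to the random half, and ranking ties are broken by peer name. All function names are hypothetical.

```python
import random
from typing import Dict, List


def rank(p_c: Dict[str, int]) -> List[str]:
    """Rank cached peers by number of resources, highest first."""
    return sorted(p_c, key=lambda peer: (-p_c[peer], peer))


def pad_with_neighbors(chosen: List[str], neighbors: List[str],
                       count: int, rng: random.Random) -> List[str]:
    """Add up to `count` neighbors that are not already chosen."""
    pool = [p for p in neighbors if p not in chosen]
    return chosen + rng.sample(pool, min(count, len(pool)))


def tccn(neighbors: List[str], p_c: Dict[str, int],
         rng: random.Random) -> List[str]:
    """TCCN: forward to every cached peer; pad up to N_n with neighbors."""
    n_n, n_c = len(neighbors), len(p_c)
    ranked = rank(p_c)
    if n_c >= n_n:
        return ranked
    return pad_with_neighbors(ranked, neighbors, n_n - n_c, rng)


def tcdn(neighbors: List[str], p_c: Dict[str, int],
         rng: random.Random) -> List[str]:
    """TCDN: half top-ranked cached peers, half random other cached peers."""
    n_n, n_c = len(neighbors), len(p_c)
    ranked = rank(p_c)
    if n_c < n_n:
        return pad_with_neighbors(ranked, neighbors, n_n - n_c, rng)
    top, rest = ranked[: n_n // 2], ranked[n_n // 2:]
    return top + rng.sample(rest, n_n - len(top))


cached = {"X": 5, "Y": 9, "Z": 1, "W": 7}
print(tccn(["B", "C"], cached, random.Random(0)))
print(tcdn(["B", "C"], cached, random.Random(0)))
```

The random half in TCDN gives lower ranked cached peers a chance to be contacted, which plausibly helps keep their cache entries revalidated, although the slides do not state the motivation.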

13 Distributing taxonomic information
• Basic mechanism: piggyback the matching category on the query result
  - Result is returned through the original path, possibly involving jumps
  - Makes revalidation of contents in intermediate TCaches possible
  - Coverage will be gradually extended (beyond the query horizon)
• Lazy distribution by gossiping is also possible
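A small sketch of the piggybacking mechanism, assuming that a result travels back along the recorded query path and that every peer on that path stores the (category, provider, resource count) triple in its own TCache. The nested-dict layout and the function name `return_result` are illustrative assumptions; gossiping is not modeled.

```python
from typing import Dict, List

# peer -> category -> providing peer -> number of resources (assumed layout)
TCaches = Dict[str, Dict[str, Dict[str, int]]]


def return_result(path: List[str], category: str, provider: str,
                  num_resources: int, tcaches: TCaches) -> None:
    """Send a result back along `path`, updating each TCache on the way.

    `path` lists the peers from the query originator to the provider.
    """
    for peer in reversed(path):
        if peer == provider:
            continue  # the provider does not cache itself
        by_category = tcaches.setdefault(peer, {}).setdefault(category, {})
        by_category[provider] = num_resources  # insert or revalidate


tcaches: TCaches = {}
return_result(["A", "B", "D"], "Arts/Music", provider="D",
              num_resources=12, tcaches=tcaches)
print(tcaches)
```

After this call both A and B know that D holds resources in the category, so a later query from A could jump directly to D even if D is outside A's query horizon.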

14 TCache architecture and maintenance
• Aim: provide an efficient mapping C → {P_C1, …, P_Cm}
• For each category: peers, number of resources, and TTL
• TTL:
  - Regularly decremented
  - Reset to start value at revalidation
• Caching policy: aggressive vs. selective
• Compacting techniques: peer upgrade and non-expert pruning
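The TTL handling on slide 14 can be sketched as a small class. The names `TCache`, `Entry`, `tick`, and `revalidate`, the start TTL of 3, and the eviction of entries whose TTL reaches zero are assumptions; the slides only say that the TTL is regularly decremented and reset at revalidation. Caching policies and compacting techniques are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    num_resources: int
    ttl: int


class TCache:
    """Mapping category -> {peer: (number of resources, TTL)}."""

    def __init__(self, start_ttl: int = 3) -> None:
        self.start_ttl = start_ttl
        self._entries: Dict[str, Dict[str, Entry]] = {}

    def revalidate(self, category: str, peer: str, num_resources: int) -> None:
        """Insert or refresh an entry and reset its TTL to the start value."""
        peers = self._entries.setdefault(category, {})
        peers[peer] = Entry(num_resources, self.start_ttl)

    def tick(self) -> None:
        """Regular decrement; entries that reach TTL 0 are dropped (assumed)."""
        for category in list(self._entries):
            peers = self._entries[category]
            for peer in list(peers):
                peers[peer].ttl -= 1
                if peers[peer].ttl <= 0:
                    del peers[peer]
            if not peers:
                del self._entries[category]

    def lookup(self, category: str) -> List[str]:
        """Return cached peers for a category, highest resource count first."""
        peers = self._entries.get(category, {})
        return sorted(peers, key=lambda p: (-peers[p].num_resources, p))


cache = TCache(start_ttl=2)
cache.revalidate("Arts/Music", "X", 5)
cache.revalidate("Arts/Music", "Y", 9)
print(cache.lookup("Arts/Music"))
```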

15 Experimental setup
• Simulations
• Excerpts of the DMOZ taxonomy
• Synthetic network topologies
• Resource allocation: 80/20 rule
• Queries are taxonomic categories
• A number of peers have the role of querying peers
• Measured: contacted peers, messages, recall, and latency
• In this presentation: results using flooding and TCDN query routing

16 Improvements in recall
         N_M (F)   N_M (TC)   Recall (F)   Recall (TC)
TTL=1        7.8        7.0       0.0022        0.0019
TTL=3      166.7      166.0       0.0117        0.0149
TTL=5      524.7      523.9       0.0282        0.0717
TTL=7     1058.6     1057.7       0.0506        0.1835
TTL=9     1721.0     1719.6       0.0773        0.2930
TTL=11    2566.3     2566.0       0.1104        0.4012
TTL=13    3536.5     3535.8       0.1477        0.4891
TTL=15    4560.2     4558.7       0.1864        0.5755

17 Primary reason for improvement: More intelligent query forwarding
         N_C (F)   N_C (TC)   Recall (F)   Recall (TC)
TTL=1        7.8        6.7       0.0022        0.0019
TTL=3       45.3       53.4       0.0117        0.0149
TTL=5      110.6      158.0       0.0282        0.0717
TTL=7      199.9      346.8       0.0506        0.1835
TTL=9      305.6      583.1       0.0773        0.2930
TTL=11     437.7      840.3       0.1104        0.4012
TTL=13     586.7     1120.6       0.1477        0.4891
TTL=15     741.6     1372.4       0.1864        0.5755
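The relative gains in the two tables can be computed with a few lines of Python. The numbers are copied from slides 16 and 17. Reading N_M as the number of messages, N_C as the number of contacted peers, F as flooding, and TC as TCache-based (TCDN) routing is an assumption based on the measurements listed on slide 15.

```python
from typing import List

ttl = [1, 3, 5, 7, 9, 11, 13, 15]
messages_f = [7.8, 166.7, 524.7, 1058.6, 1721.0, 2566.3, 3536.5, 4560.2]
messages_tc = [7.0, 166.0, 523.9, 1057.7, 1719.6, 2566.0, 3535.8, 4558.7]
contacted_f = [7.8, 45.3, 110.6, 199.9, 305.6, 437.7, 586.7, 741.6]
contacted_tc = [6.7, 53.4, 158.0, 346.8, 583.1, 840.3, 1120.6, 1372.4]
recall_f = [0.0022, 0.0117, 0.0282, 0.0506, 0.0773, 0.1104, 0.1477, 0.1864]
recall_tc = [0.0019, 0.0149, 0.0717, 0.1835, 0.2930, 0.4012, 0.4891, 0.5755]


def ratios(baseline: List[float], improved: List[float]) -> List[float]:
    """Element-wise ratio improved / baseline."""
    return [new / old for old, new in zip(baseline, improved)]


message_ratio = ratios(messages_f, messages_tc)
contacted_ratio = ratios(contacted_f, contacted_tc)
recall_gain = ratios(recall_f, recall_tc)

for t, m, c, r in zip(ttl, message_ratio, contacted_ratio, recall_gain):
    print(f"TTL={t:2d}  messages x{m:.3f}  contacted x{c:.2f}  recall x{r:.2f}")
```

At TTL=15 the recall ratio is roughly 3 and the contacted-peer ratio is below 2, while the message count is essentially unchanged. At TTL=1 the TC figures are slightly lower than the flooding figures.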

18 Improvement and scalability

19 Latency reduction
• TCache results in very fast retrieval of the first results
• Finding all results: approximately similar performance, because flooding is used in both techniques

20 Summary and further work
• Presented motivation and context
• Taxonomy-based querying and query routing
• TCache architecture and maintenance
• Experimental results proving our claims
• Future/ongoing work:
  - Employing the techniques for XML/XPath querying in a P2P context (to appear at IEEE P2P'2006)
  - Integration of different taxonomies

