1 15-829A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 9 – Lookup Algorithms (DNS & DHTs)

2 Overview: Lookup Overview; Hierarchical Lookup: DNS; Routed Lookups: Chord and CAN.

3 The Lookup Problem. [Figure: nodes N1–N6 connected through the Internet; a publisher stores Key=“title”, Value=MP3 data…; a client issues Lookup(“title”).] The same problem appears in DNS, P2P systems, routing, etc. Sensor networks: how do you find sensor readings?

4 Centralized Lookup (Napster). [Figure: the publisher at N4 registers SetLoc(“whale”, N4) with a central DB; the client sends Lookup(“whale”) to the DB; Key=“whale”, Value=image data….] Simple, but O(N) state at the central server and a single point of failure.

5 Hierarchical Queries (DNS). [Figure: the client’s Lookup(“whale”) starts at a Root node and descends the hierarchy to the publisher holding Key=“whale”, Value=image data….] Scalable (O(log n)), but could be brittle.

6 Flooded Queries (Gnutella, /etc/hosts). [Figure: the client’s Lookup(“whale”) is flooded to every node until it reaches the publisher holding Key=“whale”, Value=image data….] Robust, but worst case O(N) messages per lookup.

7 Routed Queries (DHTs). [Figure: the client’s Lookup(“whale”) is routed hop by hop through the nodes to the publisher holding Key=“whale”, Value=image data….] Scalable (O(log n)), but limited search styles.

8 Overview: Lookup Overview; Hierarchical Lookup: DNS; Routed Lookups: Chord and CAN.

9 Obvious Solutions (1). Objective: provide name-to-address translations. Why not centralize DNS? Single point of failure, traffic volume, a distant centralized database, and a single point of update: it doesn’t scale!

10 Obvious Solutions (2). Why not use /etc/hosts? It was the original name-to-address mapping: a flat namespace in /etc/hosts, with SRI keeping the main copy that every host downloaded regularly. As the host count grew (from a machine per domain toward a machine per user), this meant many more downloads and many more updates.

11 Domain Name System Goals. DNS is basically a wide-area distributed database. Goals: scalability, decentralized maintenance, robustness, and global scope (names mean the same thing everywhere). Not needed: atomicity and strong consistency.

12 DNS Records. The DNS database contains tuples called resource records (RRs), with format (class, name, value, type, ttl). Classes include Internet (IN), Chaosnet (CH), etc., and each class defines the value associated with each type. For the IN class:
Type=A: name is a hostname, value is an IP address.
Type=NS: name is a domain (e.g. foo.com), value is the name of the authoritative name server for that domain.
Type=CNAME: name is an alias for some “canonical” (real) name, value is the canonical name.
Type=MX: value is the hostname of the mail server associated with name.
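
To make the record structure concrete, here is a minimal sketch that models RRs as plain tuples over a toy zone; the domain names, addresses, and the lookup() helper are illustrative inventions, not real DNS data or a real resolver API.

from collections import namedtuple

# A resource record as described on the slide: (class, name, value, type, ttl).
RR = namedtuple("RR", ["rr_class", "name", "value", "rr_type", "ttl"])

# A toy zone with made-up values, one record per type discussed above.
zone = [
    RR("IN", "foo.com",     "ns1.foo.com",  "NS",    86400),  # authoritative server for foo.com
    RR("IN", "www.foo.com", "192.0.2.10",   "A",     3600),   # hostname -> IP address
    RR("IN", "ftp.foo.com", "www.foo.com",  "CNAME", 3600),   # alias -> canonical name
    RR("IN", "foo.com",     "mail.foo.com", "MX",    3600),   # mail server for foo.com
]

def lookup(name, rr_type, records=zone):
    """Return all IN-class records matching (name, type)."""
    return [r for r in records
            if r.rr_class == "IN" and r.name == name and r.rr_type == rr_type]

print(lookup("www.foo.com", "A"))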

13 DNS Design: Hierarchy Definitions. [Figure: name hierarchy with root at the top; below it net, edu, org, uk, com; under edu: gwu, ucb, cmu, bu, mit; under cmu: cs and ece; under cs: cmcl.] Each node in the hierarchy stores the list of names that end with the same suffix, where the suffix is the path up the tree. E.g., given this tree, where would the following be stored: Fred.com, Fred.edu, Fred.cmu.edu, Fred.cmcl.cs.cmu.edu, Fred.cs.mit.edu?

14 DNS Design: Zone Definitions. [Figure: the same hierarchy (root; net, edu, org, uk, com, ca; gwu, ucb, cmu, bu, mit; cs, ece; cmcl), with example zones marked: a single node, a subtree, and the complete tree.] A zone is a contiguous section of the name space, e.g. the complete tree, a single node, or a subtree. A zone has an associated set of name servers.

15 DNS Design (cont.). Zones are created by convincing the owner node to create or delegate a subzone. Records within a zone are stored on multiple redundant name servers: a primary/master name server that is updated manually, and secondary/redundant servers that are updated by a zone transfer of the name space. A zone transfer is a bulk transfer of the “configuration” of a DNS server and uses TCP to ensure reliability. Example: CS.CMU.EDU was created by the CMU.EDU administrators.

16 Servers/Resolvers. Each host has a resolver, typically a library that applications link against; the local name servers it uses are hand-configured (e.g. in /etc/resolv.conf). Name servers are either responsible for some zone, or are local servers that do lookups of distant host names on behalf of local hosts and typically answer queries about the local zone.

17 DNS: Root Name Servers. Responsible for the “root” zone. There are approximately a dozen root name servers worldwide, currently {a-m}.root-servers.net. Local name servers contact the root servers when they cannot resolve a name; they are configured with the well-known root servers.

18 Lookup Methods. Recursive query: the server goes out and searches for more information on the client’s behalf, and returns only the final answer or “not found”. Iterative query: the server responds with as much as it knows: “I don’t know this name, but ask this server.” What is the workload impact of this choice? The local server typically resolves recursively, while root and other distant servers answer iteratively. [Figure: requesting host surf.eurecom.fr resolves gaia.cs.umass.edu in 8 numbered steps via its local name server dns.eurecom.fr, a root name server, the intermediate name server dns.umass.edu, and the authoritative name server dns.cs.umass.edu; the queries sent outward by the local server are labeled as iterated queries.]
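
A minimal sketch of the iterative side of this process, assuming a hypothetical ask(server, name) helper that returns either a final answer or a referral to other servers; a real recursive resolver also handles CNAME chasing, timeouts, and caching.

def resolve_iteratively(name, root_servers, ask):
    """Walk the hierarchy the way a local (recursive) server does: each remote
    server answers iteratively, returning either the final record or a referral
    ("I don't know this name, but ask these servers")."""
    servers = list(root_servers)
    for _ in range(20):                        # hop limit to avoid referral loops
        server = servers[0]
        answer, referral = ask(server, name)   # hypothetical one-exchange helper
        if answer is not None:
            return answer                      # final record (or "not found")
        servers = referral                     # follow the referral down the tree
    raise RuntimeError("too many referrals")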

19 Typical Resolution. [Figure: the client asks its local DNS server for www.cs.cmu.edu; the root & edu DNS server replies with NS ns1.cmu.edu; the ns1.cmu.edu DNS server replies with NS ns1.cs.cmu.edu; the ns1.cs.cmu.edu DNS server replies with A www=IPaddr.]

20 Typical Resolution. Steps for resolving www.cmu.edu: the application calls gethostbyname() (the RESOLVER); the resolver contacts the local name server (S1); S1 queries a root server (S2) for www.cmu.edu; S2 returns an NS record for cmu.edu (S3). What about the A record for S3? This is what the additional-information section is for (PREFETCHING). S1 then queries S3 for www.cmu.edu, and S3 returns an A record for www.cmu.edu. It can return multiple A records: what does this mean?
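
The multiple-A-record case is visible from the standard library: socket.gethostbyname_ex() returns the canonical name, any aliases, and the full list of addresses, which a client can use for load balancing or failover. The hostname below is just the slide’s example; the actual answer depends on whatever DNS returns when you run it.

import socket

# Returns (canonical_name, alias_list, ip_address_list); a name with several
# A records yields several addresses in the last element.
canonical, aliases, addresses = socket.gethostbyname_ex("www.cmu.edu")
print(canonical, aliases, addresses)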

21 Workload and Caching. What workload do you expect for different servers and names? Why might this be a problem, and how can we solve it? DNS responses are cached: repeated translations get quick answers, and other queries may reuse parts of a lookup (e.g. cached NS records for domains). Negative answers are also cached, so we don’t have to repeat past mistakes (e.g. misspellings or search strings from resolv.conf). Cached data periodically times out: the lifetime (TTL) of the data is controlled by its owner and is passed with every record.
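
A minimal sketch of TTL-based caching of both positive and negative answers, under the simplifying assumption that entries just expire and are re-fetched; the class name and interface are invented for illustration.

import time

class DnsCache:
    """Cache answers (including negative 'no such name' answers) until their TTL expires."""
    def __init__(self):
        self._entries = {}                     # (name, rr_type) -> (answer, expiry time)

    def put(self, name, rr_type, answer, ttl):
        self._entries[(name, rr_type)] = (answer, time.time() + ttl)

    def get(self, name, rr_type):
        entry = self._entries.get((name, rr_type))
        if entry is None:
            return None                        # miss: a fresh lookup is needed
        answer, expiry = entry
        if time.time() >= expiry:              # TTL chosen by the data's owner has elapsed
            del self._entries[(name, rr_type)]
            return None
        return answer                          # may be a negative answer, e.g. "NXDOMAIN"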

22 Prefetching. Name servers can add additional data to any response, and this is typically used for prefetching: CNAME/MX/NS records typically point to another host name, so responses include the address of that host in the “additional” section.

23 Typical Resolution. [Figure: the same exchange as on slide 19: the client asks its local DNS server for www.cs.cmu.edu; the root & edu DNS server returns NS ns1.cmu.edu; ns1.cmu.edu returns NS ns1.cs.cmu.edu; ns1.cs.cmu.edu returns A www=IPaddr.]

24 Subsequent Lookup Example. [Figure: the client asks its local DNS server for ftp.cs.cmu.edu; with the NS records from the previous lookup cached, the query goes to the cs.cmu.edu DNS server, which returns ftp=IPaddr; the root & edu and cmu.edu servers are not contacted again.]

25 Reliability. DNS servers are replicated: the name service is available if at least one replica is up, and queries can be load-balanced between replicas. UDP is used for queries, so any needed reliability must be implemented on top of UDP (why not just use TCP?). Clients try alternate servers on timeout and use exponential backoff when retrying the same server. The same identifier is used for all queries, so we don’t care which server responds. Is it still a brittle design?
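
A minimal sketch of that retry discipline, assuming a hypothetical send_query(server, name, query_id, timeout) stand-in for one UDP exchange that raises socket.timeout on loss; the same query ID is reused so an answer from any replica is acceptable.

import random
import socket

def query_with_retries(name, servers, send_query, max_rounds=3):
    """Rotate through replica servers on timeout, doubling the per-attempt
    timeout on each subsequent round (exponential backoff)."""
    query_id = random.randrange(1 << 16)       # same identifier for every attempt
    timeout = 1.0
    for _ in range(max_rounds):
        for server in servers:                 # try alternate servers first
            try:
                return send_query(server, name, query_id, timeout)
            except socket.timeout:
                continue
        timeout *= 2                           # back off before retrying the same servers
    raise socket.timeout(f"no reply for {name}")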

26 Overview: Lookup Overview; Hierarchical Lookup: DNS; Routed Lookups: Chord and CAN.

27 Routing: Structured Approaches. Goal: make sure that an identified item (file) is always found in a reasonable number of steps. Abstraction: a distributed hash table (DHT) data structure supporting insert(id, item) and item = query(id); note that the item can be anything: a data object, document, file, or pointer to a file. Proposals: CAN (ICIR/Berkeley), Chord (MIT/Berkeley), Pastry (Rice), Tapestry (Berkeley).
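
A minimal sketch of the abstraction exactly as stated, insert(id, item) and query(id); here a single in-process dict stands in for the table that a real DHT (Chord, CAN, Pastry, Tapestry) partitions across nodes.

class ToyDHT:
    """The DHT interface from the slide; only the abstraction, not the distribution."""
    def __init__(self):
        self._table = {}

    def insert(self, key_id, item):
        self._table[key_id] = item             # item: data, document, or pointer to a file

    def query(self, key_id):
        return self._table.get(key_id)         # exact-match lookup only

dht = ToyDHT()
dht.insert(80, "whale image is on N90")
print(dht.query(80))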

28 Routing: Chord. Associate with each node and each item a unique id in a one-dimensional space. Properties: routing table size is O(log N), where N is the total number of nodes, and a file is guaranteed to be found in O(log N) steps.

29 Aside: Consistent Hashing [Karger 97]. [Figure: a circular 7-bit ID space containing nodes N32, N90, N105 (“Node 105”) and keys K5, K20, K80 (“Key 5”).] A key is stored at its successor: the node with the next-higher ID.
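
A minimal sketch of the successor rule in a 7-bit circular ID space, using the node and key IDs from the figure.

MODULUS = 2 ** 7                               # circular 7-bit ID space

def successor(key, node_ids):
    """A key is stored at its successor: the node with the next-higher ID,
    wrapping around the circle."""
    nodes = sorted(node_ids)
    for n in nodes:
        if n >= key % MODULUS:
            return n
    return nodes[0]                            # wrap around past the highest node ID

nodes = [32, 90, 105]
for key in (5, 20, 80):
    print(f"K{key} -> N{successor(key, nodes)}")
# K5 -> N32, K20 -> N32, K80 -> N90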

30 Routing: Chord Basic Lookup. [Figure: nodes N10, N32, N60, N90, N105, N120 on the ring; the question “Where is key 80?” is passed from node to node around the ring until the answer “N90 has K80” comes back.]

31 Routing: Finger Table - Faster Lookups. [Figure: node N80 keeps fingers that point ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.]

32 Routing: Chord Summary. Assume the identifier space is 0…2^m. Each node maintains a finger table and a predecessor pointer; entry i in the finger table of node n is the first node that succeeds or equals n + 2^i. An item identified by id is stored on the successor node of id.
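
A minimal sketch of building a finger table straight from that definition: entry i of node n points to the first node that succeeds or equals n + 2^i, modulo 2^m. The example node set matches the worked example on the following slides.

def successor(ident, node_ids, m):
    """First node whose ID succeeds or equals ident on a ring of size 2^m."""
    ident %= 2 ** m
    nodes = sorted(node_ids)
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]                            # wrap around the ring

def finger_table(n, node_ids, m):
    """Entry i points at successor(n + 2^i)."""
    return [successor(n + 2 ** i, node_ids, m) for i in range(m)]

# With m = 3 and nodes 0, 1, 2, 6 (as in the example below):
print(finger_table(1, [0, 1, 2, 6], 3))        # [2, 6, 6]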

33 Routing: Chord Example. Assume an identifier space of size 8 (m = 3, ids 0–7). Node n1:(1) joins; all entries in its finger table are initialized to itself. Its succ. table (i, id+2^i, succ) is: (0, 2, 1), (1, 3, 1), (2, 5, 1). [Figure: ring with positions 0–7.]

34 Routing: Chord Example. Node n2:(2) joins. Succ. tables (i, id+2^i, succ):
node 1: (0, 2, 2), (1, 3, 1), (2, 5, 1)
node 2: (0, 3, 1), (1, 4, 1), (2, 6, 1)

35 Routing: Chord Example. Nodes n3:(0) and n4:(6) join. Succ. tables (i, id+2^i, succ):
node 1: (0, 2, 2), (1, 3, 6), (2, 5, 6)
node 2: (0, 3, 6), (1, 4, 6), (2, 6, 6)
node 0: (0, 1, 1), (1, 2, 2), (2, 4, 0)
node 6: (0, 7, 0), (1, 0, 0), (2, 2, 2)

36 Routing: Chord Examples. Nodes: n1:(1), n2:(2), n3:(0), n4:(6). Items: f1:(7), f2:(2). By the successor rule, item 7 is stored at node 0 and item 2 at node 2. Succ. tables (i, id+2^i, succ):
node 1: (0, 2, 2), (1, 3, 6), (2, 5, 6)
node 2: (0, 3, 6), (1, 4, 6), (2, 6, 6)
node 0: (0, 1, 1), (1, 2, 2), (2, 4, 0)
node 6: (0, 7, 0), (1, 0, 0), (2, 2, 2)

37 Routing: Query. Upon receiving a query for item id, a node checks whether it stores the item locally; if not, it forwards the query to the largest node in its successor table that does not exceed id. [Figure: the same ring and succ. tables as on the previous slide, with an example query(7).]
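
A minimal sketch of that forwarding rule exactly as stated (wrap-around corner cases are glossed over); the function name and data layout are illustrative.

def handle_query(node_id, item_id, succ_table, local_items):
    """Answer locally if possible, otherwise forward to the largest node in the
    successor table that does not exceed the queried id."""
    if item_id in local_items:
        return ("answer", node_id)
    candidates = [s for s in succ_table if s <= item_id]
    next_hop = max(candidates) if candidates else max(succ_table)
    return ("forward", next_hop)

# Node 1's successor-table targets in the example are 2, 6, 6, so a query(7)
# arriving at node 1 is forwarded toward node 6.
print(handle_query(1, 7, [2, 6, 6], local_items=set()))   # ('forward', 6)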

38 Overview: Lookup Overview; Hierarchical Lookup: DNS; Routed Lookups: Chord and CAN.

39 CAN. Associate with each node and each item a unique id in a d-dimensional space (a virtual Cartesian coordinate space). The entire space is partitioned among all the nodes: every node “owns” a zone in the overall space. Abstraction: data can be stored at “points” in the space, and we can route from one “point” to another, where a point corresponds to the node that owns the enclosing zone. Properties: routing table size is O(d), and a file is guaranteed to be found in at most d·n^(1/d) steps, where n is the total number of nodes.
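
A minimal sketch of how an item id could be mapped to a point in the d-dimensional space, hashing the key once per coordinate; the choice of hash and the grid size are illustrative assumptions, not CAN’s specification.

import hashlib

def key_to_point(key, d=2, side=8):
    """Map a key to a point in a d-dimensional grid of the given side length
    by hashing the key once per dimension."""
    point = []
    for dim in range(d):
        digest = hashlib.sha1(f"{dim}:{key}".encode()).digest()
        point.append(int.from_bytes(digest[:4], "big") % side)
    return tuple(point)

print(key_to_point("whale"))                   # a deterministic point in the 8 x 8 grid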

40 CAN Example: Two-Dimensional Space. The space is divided among the nodes; together they cover the entire space, and each node covers either a square or a rectangle with aspect ratio 1:2 or 2:1. Example: assume the space is 8 x 8; node n1:(1, 2) is the first node to join and covers the entire space. [Figure: 8 x 8 grid with n1 owning all of it.]

41 CAN Example: Two-Dimensional Space. Node n2:(4, 2) joins, and the space is divided between n1 and n2. [Figure: the 8 x 8 grid split between n1 and n2.]

42 CAN Example: Two-Dimensional Space. Node n3:(3, 5) joins, and the space is divided again. [Figure: the grid now split among n1, n2, and n3.]

43 CAN Example: Two-Dimensional Space. Nodes n4:(5, 5) and n5:(6, 6) join. [Figure: the grid split among n1–n5.]

44 CAN Example: Two-Dimensional Space. Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6). Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5). [Figure: the nodes and items placed at their coordinates in the 8 x 8 space.]

45 CAN Example: Two-Dimensional Space. Each item is stored by the node that owns the zone containing the item’s point in the space. [Figure: the same layout, with each item shown inside its owner’s zone.]

46 CAN: Query Example. Each node knows its neighbors in the d-dimensional space and forwards a query to the neighbor that is closest to the query id. Example: assume n1 queries f4. [Figure: n1’s query for f4 at (7, 5) is routed greedily across the zones toward f4’s owner.]
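
A minimal sketch of one greedy forwarding step, assuming each node knows a representative coordinate for each neighbor and using plain Euclidean distance (the real coordinate space wraps around like a torus, which this ignores).

import math

def next_hop(current, neighbors, target):
    """Forward toward the neighbor whose coordinates are closest to the target
    point; stay put if no neighbor improves on the current node."""
    best = min(neighbors, key=lambda n: math.dist(n, target))
    return best if math.dist(best, target) < math.dist(current, target) else current

# n1 at (1, 2) forwarding a query for f4 at (7, 5), given neighbors at (4, 2) and (3, 5):
print(next_hop((1, 2), [(4, 2), (3, 5)], (7, 5)))   # (3, 5): the query moves toward that zone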

47 Routing: Concerns. Each hop in a routing-based lookup can be expensive: there is no correlation between overlay neighbors and their physical location, so a query can repeatedly jump from Europe to North America even though both the initiator and the node that stores the item are in Europe! Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies of each entry in their routing tables and choose the closest in terms of network distance. What types of lookups are supported? Only exact match: no beatles.*.

48 Summary. The key challenge in building wide-area P2P systems is a scalable and robust location service. Solutions covered in this lecture: Napster (centralized location service), Gnutella (broadcast-based decentralized location service), Freenet (intelligent-routing decentralized solution), and DHTs (structured intelligent-routing decentralized solution).

49 Overview: Centralized/Flooded Lookups.

50 Centralized: Napster. A simple centralized scheme, motivated by the ability to sell and control the service. How to find a file: on startup, the client contacts the central server and reports its list of files; a query to the index system returns a machine that stores the required file (ideally the closest or least-loaded machine); the file is then fetched directly from that peer.

51 Centralized: Napster. Advantages: simple, and easy to implement sophisticated search engines on top of the index system. Disadvantages: robustness, scalability, and it is easy to sue!

52 Flooding: Gnutella. On startup, a client contacts any servent (server + client) in the network; servent interconnections are used to forward control traffic (queries, hits, etc.). Idea: broadcast the request. How to find a file: send the request to all neighbors; neighbors recursively forward the request; eventually a machine that has the file receives the request and sends back the answer. Transfers are then done with HTTP directly between peers.
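
A minimal sketch of TTL-limited flooding over a neighbor graph: each servent forwards a given message id only once, decrements the TTL at each hop, and hits propagate back (here simply returned up the recursion). The topology and file placement below are made up, loosely following the example on the later slide.

def flood_query(graph, files, start, wanted, ttl=4, seen=None, msg_id="q1"):
    """Flood 'wanted' from 'start' to all neighbors; return the peers that hold it."""
    if seen is None:
        seen = set()
    if (msg_id, start) in seen or ttl < 0:
        return set()
    seen.add((msg_id, start))                  # forward each message id only once
    hits = {start} if wanted in files.get(start, ()) else set()
    if ttl > 0:
        for peer in graph.get(start, ()):      # recursive forwarding == flooding
            hits |= flood_query(graph, files, peer, wanted, ttl - 1, seen, msg_id)
    return hits

graph = {"m1": ["m2", "m3"], "m2": ["m1"], "m3": ["m1", "m4", "m5"],
         "m4": ["m3"], "m5": ["m3", "m6"], "m6": ["m5"]}
files = {"m2": {"B"}, "m4": {"E"}, "m5": {"E"}, "m6": {"F"}}
print(flood_query(graph, files, "m1", "E"))    # {'m4', 'm5'}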

53 Flooding: Gnutella. Advantages: totally decentralized and highly robust. Disadvantages: not scalable (the entire network can be swamped with requests; to alleviate this, each request carries a TTL), and it is especially hard on slow clients. At some point broadcast traffic on Gnutella exceeded 56 kbps: what happened? Modem users were effectively cut off!

54 Flooding: Gnutella Details. Basic message header: unique ID, TTL, hops. Message types:
Ping: probes the network for other servents.
Pong: response to a Ping; contains the IP address, number of files, and number of KBytes shared.
Query: search criteria plus a speed requirement for the responding servent.
QueryHit: successful response to a Query; contains the address and port to transfer from, the servent’s speed, the number of hits, the hit results, and the servent ID.
Push: a request to a given servent ID to initiate a connection; used to traverse firewalls.
Pings and Queries are flooded; QueryHit, Pong, and Push follow the reverse path of the message they answer.

55 Flooding: Gnutella Example. Assume m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5; and so on. [Figure: peers m1–m6 holding files A–F; m1 floods the query “E?”, and the responses for E travel back toward m1.]

56 Flooding: FastTrack (aka Kazaa). Modifies the Gnutella protocol into a two-level hierarchy: a hybrid of Gnutella and Napster. Supernodes are nodes with better connections to the Internet; they act as temporary indexing servers for other nodes and help improve the stability of the network. Standard nodes connect to supernodes and report their lists of files, which allows slower nodes to participate. Search is a broadcast (Gnutella-style) search across the supernodes. Disadvantage: it kept a centralized registration, which allowed for lawsuits.

