p2pWeb Slide 1: Peer-To-Peer : Concept, Tools and Applications. The p2pweb Project: low cost peer-to-peer solutions for high availability web hosting. 19 May 2005, seminar « Peer-To-Peer : Concept, Tools and Applications », Ecole d'ingénieurs de Genève
p2pWeb Slide 2: Agenda. 1. The project goals 2. Web hosting solutions and architecture 3. The p2pweb solution 4. Project constraints and key technologies 5. Related projects 6. The project components: global server load balancing system; distributed set of web servers; monitoring system; node architecture and hardware 7. Conclusion
p2pWeb Slide 3: The project goals. To explore and implement low cost solutions for high availability web hosting: do more with less. Our targets are small or medium structures (associations, NGOs, etc.) with limited resources (money, IT people) but significant web hosting needs (available bandwidth): rich and complex web sites; medium to high web traffic; high availability and visibility needs. It may fit very well the needs of many projects in Least Developed Countries: TeleCentre networks, rural organisations, universities, cultural centres, public libraries, community multimedia centres, health networks, etc.
p2pWeb Slide 4: Examples of hosted web sites. Afromix.org (personal web site): a portal of African and Caribbean cultures since 1993; a complex web site using multiple technologies: an in-house Perl Content Management System (CMS); an extended discographic database (1600 artists, more than 50 styles from all of Africa and the French West Indies); a multilingual (French, English, Spanish) site running on a Java application server (Tomcat); about 25 000 files, 400 000 pages/month, 2 million hits/month, 60 000 unique visitors/month. Mediaport.net (community web site): one of the first French web pioneers, first developed at INA; mostly static content (nearly 10 000 files); a multilingual (French, English) site running on a PHP CMS (ezpublish); it's the main p2pweb test platform and it will evolve into an open web hosting solution for artistic and cultural web projects (an editorial committee is forming).
p2pWeb Slide 5: The web hosting market. Free web hosting: very limited; static HTML or small PHP sites (limited computing resources); you can't use your own domain name. Professional web hosting: a broad range of services (private virtual server, dedicated server, colocation), but the price is quite high (100-200/month for one dedicated server) and maintenance can be complex.
p2pWeb Slide 6: Centralized architecture. Server in one location: the server and the Internet link are single points of failure (SPOF).
p2pWeb Slide 7: Centralized architecture (cont.). High availability architecture (diagram labels: multi-homing with BGP routing; load balancers; reverse proxy / cache / SSL accelerators; web servers; load balancers; application servers; database cluster; SAN storage). Datacenter hosting: BGP routing, hardware load balancing, SAN storage. In theory there is no SPOF, but the architecture is very complex and the cost very high.
p2pWeb Slide 8: CDN architecture. Content Delivery Network: a service delivered by companies like Akamai, Speedera, and others. Edge servers provide caching and data replication for fast delivery to clients worldwide. A solution for very high traffic web sites. Very expensive solution.
p2pWeb Slide 9: Alternative web hosting. Community based web hosting: initiatives from various associations (ouvaton.coop, globenet.net, autre.net, altern.net, ...); most of the time, people share their money and knowledge to buy and administer one or two dedicated servers. Home server: we now have sufficient bandwidth (ADSL), computing power (PCs) and good software (Apache, Linux …). We lack reliability!
p2pWeb Slide 10: First idea : big home server
p2pWeb Slide 11: Second idea (a better one). Lots of people (family, friends, co-workers, …) already have: an ADSL Internet access or a permanent high speed connection; one or more PCs (with a lot of unused disk space). So, what about sharing those resources to build a more powerful and resilient network of web servers?
p2pWeb Slide 12: Web hosting : the p2pweb way. Each member of the p2pweb network shares a portion of his Internet bandwidth (most of the time an ADSL line) and hosts a small server. The result is a powerful network that is the sum of the bandwidth and computing resources of all the members. (Diagram: nodes connected through ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3.)
p2pWeb Slide 13: A peer to peer solution. Somehow, it's a return to the very fundamental principles of the Internet: a cooperative solution (network of servers); a distributed solution (no central control); a fault tolerant solution (resilience). But with all the power of existing Internet and open source technologies: consumer computers and Internet access; overlay networks and services over the Internet. It is a peer to peer solution!
p2pWeb Slide 14: The project constraints. Unreliable components: node failure is not an exception, it's the rule (Internet link failure, power outage, server crash …). Automatic operation: Murphy's law says servers will always crash when there is nobody to fix the problem (at night, when you are on vacation …). Pragmatic approach: build from existing components; simple and efficient solutions are preferred.
p2pWeb Slide 15: Key technologies. Mass market products are now available at low cost! ADSL lines: 1 Mb/s up, 15 Mb/s down for 30 / month (free.fr). ADSL router / firewall / Ethernet or wifi: D-LINK, NetGear, LINKSYS from 75 to 150. Small servers: PC barebones (Asus, Biostar, Shuttle …) from 300 to 500; Mac mini (Apple) 499. Open source software: BSD, Linux, Apache, Tomcat, etc.
p2pWeb Slide 16: Related projects. YouServ (IBM) http://www.almaden.ibm.com/cs/people/bayardo/userv/ YouServ is software that forms a webserving "grid" by allowing its users to pool their desktop computing resources to create one large, virtual web-space. An intranet project, more oriented towards desktop file sharing. Unfortunately not open source. Vergenet (Simon Horman) http://www.vergenet.net/ Vergenet has servers located in Sydney, Amsterdam, London, Tokyo and Indiana. These servers are all running Linux and a variant of Super Sparrow to load balance traffic between them. Super Sparrow enables users to load balance traffic between geographically separated points of presence by finding the site network-wise closest to clients. This is done by accessing BGP routing information (but it requires direct access to a BGP router).
p2pWeb Slide 17: Related projects (cont.). Coral (New York University) http://www.coralcdn.org/ Coral is a peer-to-peer content distribution network, comprising a world-wide network of web proxies and name servers. Publishing through Coral is as simple as appending a short string to the hostname of objects' URLs; a peer-to-peer DNS layer transparently redirects browsers to participating caching proxies. A URL like www.myserver.com/some/path.html becomes www.myserver.com.nyud.net:8090/some/path.html. Coral is in fact running on top of the planet-lab network (a grid computing research network: http://www.planet-lab.org/). Globule (Vrije Universiteit Amsterdam) http://www.globule.org/ Globule is a module for the Apache Web server that allows a given server to replicate its documents to other Globule servers. Clients are automatically redirected to one of the available replicas. The project provides both content replication and HTTP or DNS based redirection mechanisms.
p2pWeb Slide 18: P2PWeb : project components. A global server load balancing system, with two main functions: load balance the traffic across the web servers; provide failover (only send traffic to live web servers). A distributed set of web servers, and a set of tools to: publish content on the servers; keep all servers in sync (replication mechanism). Monitoring services.
p2pWeb Slide 19: Global server load balancing. Load balancing: achieved using round robin DNS, a simple system with well known limits (http://www.tenereillo.com/GSLBPageOfShame.htm). Failover: achieved by coupling a monitoring system (NAGIOS) with the DNS. DNS entries have a short TTL (time to live). NAGIOS monitors each web server. When a server changes state (for example DOWN), a special handler is called that updates the DNS entry and reloads the DNS. The failed server is no longer announced by the DNS. To have a fully redundant system, we use 3 independent DNS servers (all primary), each running its own NAGIOS instance.
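The handler step described on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's actual http_p2pweb_handler: the function name update_zone, the one-record-per-line zone layout and the 192.0.2.x addresses (a documentation range) are all assumptions. The state and state type arguments correspond to the values Nagios passes to event handlers through its $SERVICESTATE$ and $SERVICESTATETYPE$ macros. Bumping the SOA serial and reloading the name server are deliberately left out.

```python
def update_zone(zone_text: str, ip: str, state: str, state_type: str) -> str:
    """Return zone_text with the A record for `ip` disabled or restored.

    Hypothetical helper: a record is disabled by prefixing it with ';'
    (a zone-file comment), as in the trace shown on the failover slide.
    """
    out = []
    for line in zone_text.splitlines():
        bare = line.lstrip(";").strip()          # record without comment marker
        fields = bare.split()
        is_target = len(fields) >= 2 and fields[-2:] == ["A", ip]
        if is_target and state == "CRITICAL" and state_type == "HARD":
            line = ";" + bare                    # stop announcing the failed server
        elif is_target and state == "OK":
            line = bare                          # announce it again after recovery
        out.append(line)                         # SOFT states leave the zone as is
    return "\n".join(out) + "\n"


zone = "www 300 IN A 192.0.2.1\nwww 300 IN A 192.0.2.2\n"
down = update_zone(zone, "192.0.2.2", "CRITICAL", "HARD")
```

Acting only on HARD states mirrors the trace on the next slide, where the handler fires after the third failed check; SOFT failures are retried first so a single lost probe does not remove a healthy server from the DNS.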
p2pWeb Slide 20: GSLB : Failover illustrated
Initial DNS entries (all servers are up):
www 300 IN A 18.104.22.168
www 300 IN A 22.214.171.124
www 300 IN A 126.96.36.199
www 300 IN A 188.8.131.52
Server 184.108.40.206 fails. In the syslog trace, we can see:
22:22:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;1;Connection refused by host
22:23:47 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;2;Connection refused by host
22:24:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;Connection refused by host
After 3 unsuccessful tries, a notification is sent by email to the admin:
22:24:46 nagios: SERVICE NOTIFICATION: nagios;ns1;HTTP-P2PWEB;CRITICAL;notify-by-email;Connection refused by host
The specific handler is called:
22:24:47 nagios: SERVICE EVENT HANDLER: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;http_p2pweb_handler
And the DNS is reloaded:
22:24:47 named: master/p2pweb.net.zone:1: no TTL specified; using SOA MINTTL instead
And now we can verify that the DNS entries are:
www 300 IN A 220.127.116.11
;www 300 IN A 18.104.22.168
www 300 IN A 22.214.171.124
www 300 IN A 126.96.36.199
Failover time is: 2 or 3 minutes (NAGIOS) + DNS max TTL (here 5 minutes) = less than 10 minutes
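The failover estimate above is simple arithmetic, sketched here in Python. The one-minute check interval and three attempts are read off the timestamps in the trace; the function name is hypothetical, and the model assumes, as a simplification, that clients honour the TTL exactly.

```python
def worst_case_failover_seconds(check_interval_s: int,
                                max_attempts: int,
                                dns_ttl_s: int) -> int:
    """Upper bound on the time a failed server keeps receiving traffic.

    Detection takes at most `max_attempts` check intervals (the failure
    may happen right after a successful check); resolvers may then keep
    the old record cached for one full TTL.
    """
    detection = check_interval_s * max_attempts
    return detection + dns_ttl_s


# Values from the slide: checks about every 60 s, 3 attempts, TTL 300 s.
total = worst_case_failover_seconds(60, 3, 300)
print(total / 60, "minutes")
```

With these values the bound is 480 seconds, i.e. 8 minutes, consistent with the "less than 10 minutes" stated on the slide.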
p2pWeb Slide 21: GSLB : next steps. Improvements: better service provisioning (a manual process for now); better support for long downtime: when a server crashes for a long period of time and then recovers, its content may be outdated, so we must not announce it again until it has re-synchronized itself; proximity load balancing: the goal is to load balance traffic between geographically distributed servers by finding the site network-wise closest to clients, a technology used in the CDN (Content Delivery Network) world. We can use part of the Globule project, as Globule supports DNS redirection based on an 'AS-path length' policy (used in BGP routing), which tries to redirect clients to a server close to them. This BGP information can be collected through routeviews.org (no direct access to a BGP router needed).
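The 'AS-path length' policy mentioned above can be illustrated with a small Python sketch. It is not Globule's implementation: the function name, the lookup table of path lengths and all values are hypothetical (private-use AS numbers and documentation IP ranges), and in practice the table would have to be derived from BGP data such as that published by routeviews.org.

```python
def pick_closest(client_as: int,
                 replicas: dict[str, int],
                 path_len: dict[tuple[int, int], int]) -> str:
    """Return the replica IP whose AS is the fewest AS hops from the client.

    replicas maps replica IP -> AS number of the replica's ISP.
    path_len maps (client AS, replica AS) -> AS-path length.
    Unknown pairs count as infinitely far; ties are broken by IP string
    so the choice is deterministic.
    """
    def cost(ip: str):
        return (path_len.get((client_as, replicas[ip]), float("inf")), ip)
    return min(replicas, key=cost)


# Hypothetical data for illustration only.
replicas = {"192.0.2.10": 64512, "198.51.100.20": 64513}
path_len = {(64600, 64512): 4, (64600, 64513): 2}
best = pick_closest(64600, replicas, path_len)
```

A DNS server applying this policy would answer each query with the selected replica instead of rotating through all of them, which is the difference from plain round robin DNS.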
p2pWeb Slide 22: Web server content management. We have a set of web servers and we need tools to: publish content on all servers; keep them in sync (content replication). Two main replication strategies: primary backup, where one master server feeds the replicas; active replication, where a change made on any replica is propagated by that replica to all the others. (Diagram: servers behind ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3.)
p2pWeb Slide 23: Static content replication. One server plays the master's role: content is published first on the master (for example via FTP); then the content is either pushed to or pulled by the replicas. The easiest way is to use rsync (rsync.samba.org): content can be pulled via anonymous rsync from the master; content can be pushed via rsync over ssh (using a private/public key pair for security). (Diagram: master and replicas behind ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3.)
p2pWeb Slide 24: Content replication : rsync
rsync is a file transfer program for Unix systems. It provides a very fast method for bringing remote files into sync, by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.
Anonymous rsync server (pull mode): runs as a standalone daemon or can be launched by inetd; advanced security options (read-only, chroot, IP access list). Use: run from crontab on each mirror:
rsync -a master.mydomain.com::www/ /data/www/
Rsync over SSH (push mode): needs ssh access on each mirror, and an exchange of ssh cryptographic keys for unattended operation. Use: run on demand or from crontab on the master:
rsync -a /data/www/ firstname.lastname@example.org:/data/www/
Useful options:
--compress compress file data during the transfer
--bwlimit=KBPS limit I/O bandwidth; KBytes per second
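The two modes above differ only in how the remote end is addressed: a double colon (host::module) talks to an rsync daemon, a single colon (user@host:path) goes through a remote shell such as ssh. The Python sketch below only builds the argument lists; the helper names, host names and user name are hypothetical, and the flags are the ones listed on the slide plus -e ssh.

```python
def rsync_pull_cmd(master, module, dest, bwlimit_kbps=None, compress=True):
    """Command a mirror would run to pull from the master's rsync daemon."""
    cmd = ["rsync", "-a"]
    if compress:
        cmd.append("--compress")
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [f"{master}::{module}/", dest]        # '::' selects daemon mode
    return cmd


def rsync_push_cmd(src, user, mirror, dest, bwlimit_kbps=None, compress=True):
    """Command the master would run to push to one mirror over ssh."""
    cmd = ["rsync", "-a", "-e", "ssh"]
    if compress:
        cmd.append("--compress")
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [src, f"{user}@{mirror}:{dest}"]      # ':' selects remote-shell mode
    return cmd


pull = rsync_pull_cmd("master.mydomain.com", "www", "/data/www/")
push = rsync_push_cmd("/data/www/", "www", "mirror1.example.org",
                      "/data/www/", bwlimit_kbps=64)
```

Each list can be passed to subprocess.run from a cron-driven script. A bandwidth limit is worth setting on ADSL nodes, where the upstream link is shared with the web traffic being served.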
p2pWeb Slide 25: Content distribution : satellite. For a large number of geographically distributed mirrors, an interesting solution can be datacasting over satellite. This technology is used by some CDN vendors (Skycache, cidera, Skystream.com, panamsat.com) and is now available at lower cost from worldspace.fr (SatPost Solution).
p2pWeb Slide 26: Use of CMS. Nowadays most webmasters use CMS (Content Management System) tools for publishing. There are a lot of open source and commercial tools: Spip, mambo, typo3, phpnuke, … (PHP); Bricolage, metadot, slashcode, … (Perl); Cofax, opencms, magnolia, jahia, … (Java); Plone, cps, zwook, … (Python). But none of them has direct support for a distributed architecture. Most use a database as a back-end store, and distributed database transactions and replication are a hard problem.
p2pWeb Slide 27: CMS : a pragmatic solution. The webmaster publishes using the CMS as usual; the content is exported as static HTML files; it is then distributed to the replicas using rsync. Constraint: the CMS must support export with static-like URLs, either directly or through URL rewriting: /article/sport/2005/4/13/football.html (good); /article.php?id_category=3&id_article=25 (bad for mirroring). (Diagram: the webmaster uses the CMS back office; an HTML export feeds the master, which holds static HTML files; the replicas sit behind ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3.)
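The URL constraint above can be made concrete with a short Python sketch that maps the "bad" dynamic URL to the "good" static path from the slide. The function name and the two lookup tables are hypothetical stand-ins for whatever the CMS database holds; a real export step would query the CMS itself.

```python
from urllib.parse import urlparse, parse_qs


def static_path(dynamic_url: str,
                categories: dict[int, str],
                articles: dict[int, tuple[int, int, int, str]]) -> str:
    """Translate a query-string article URL into a mirror-friendly path.

    categories maps id_category -> category name.
    articles maps id_article -> (year, month, day, slug).
    """
    query = parse_qs(urlparse(dynamic_url).query)
    category = categories[int(query["id_category"][0])]
    year, month, day, slug = articles[int(query["id_article"][0])]
    return f"/article/{category}/{year}/{month}/{day}/{slug}.html"


# Hypothetical CMS data matching the example URLs on the slide.
categories = {3: "sport"}
articles = {25: (2005, 4, 13, "football")}
path = static_path("/article.php?id_category=3&id_article=25",
                   categories, articles)
```

Static paths matter here because each exported page becomes a plain file that rsync can copy and any replica can serve without access to the CMS database.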
p2pWeb Slide 28: CMS : distributed architecture (1). Example: a non-governmental organization has activities in 4 countries and wants to provide a global web presence. The same global web design and tools are used on all servers. Local publishing: each local webmaster publishes news about his country using the CMS on the local server. Content exchange using web services: each local web server collects (pulls) new articles from the other servers using RSS (Really Simple Syndication) web services. Global web presence: global content is (re)constructed on each server (from the data of all the others) and served on the Internet. Such a solution may be constructed by hacking/customizing an existing CMS. (Diagram: servers in Ivory Coast, Senegal, Burkina Faso and Mali, behind ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3, exchanging XML content.)
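The pull-and-rebuild step described above might look like the following Python sketch, which merges RSS 2.0 feeds from several country servers into one list sorted newest first. The function names, feed contents and example.org links are hypothetical, and fetching the feeds over HTTP is left out so the sketch stays self-contained.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_items(rss_xml: str, source: str) -> list[dict]:
    """Extract title, link and date from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # pubDate uses the RFC 822 date format
            "date": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def merge_feeds(feeds: dict[str, str]) -> list[dict]:
    """Merge feeds from all servers, drop duplicate links, newest first."""
    seen, merged = set(), []
    for source, xml_text in feeds.items():
        for entry in parse_items(xml_text, source):
            if entry["link"] not in seen:
                seen.add(entry["link"])
                merged.append(entry)
    merged.sort(key=lambda e: e["date"], reverse=True)
    return merged


def feed(title, link, date):
    # Tiny helper to build a one-item test feed.
    return (f"<rss version='2.0'><channel><item><title>{title}</title>"
            f"<link>{link}</link><pubDate>{date}</pubDate></item></channel></rss>")


feeds = {
    "senegal": feed("Dakar news", "http://sn.example.org/1",
                    "Wed, 13 Apr 2005 10:00:00 +0000"),
    "mali": feed("Bamako news", "http://ml.example.org/1",
                 "Thu, 14 Apr 2005 09:00:00 +0000"),
}
merged = merge_feeds(feeds)
```

Running the same merge on every server gives each node the same global view without any central master, which is what distinguishes this architecture from the master/replica export.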
p2pWeb Slide 29: CMS : distributed architecture (2). CMS + message-oriented middleware (MOM). A MOM is a client/server infrastructure that increases the interoperability, portability and flexibility of an application by allowing the application to be distributed over multiple heterogeneous platforms. Through the use of a queue system, a MOM can provide asynchronous, reliable data exchange. MOM is typically asynchronous and peer-to-peer and supports: point to point communication; publish and subscribe communication. There is a standardized interface in Java: the JMS (Java Message Service) API. Various open source implementations exist in the Java world: ActiveMQ (activemq.codehaus.org), OpenJMS (openjms.sourceforge.net), Joram (joram.objectweb.org), MantaRay (mantamq.org). No CMS uses it now (as far as I know), but it may be a very good solution.
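The publish and subscribe model mentioned above can be illustrated with a toy in-memory broker in Python. This is not JMS and not any of the listed products: the Broker class and the topic name are hypothetical, and a real MOM adds persistence, network transport and delivery guarantees that this sketch does not have.

```python
import queue
from collections import defaultdict


class Broker:
    """Toy publish/subscribe broker: one queue per subscriber per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str) -> queue.Queue:
        """Register a subscriber and return the queue it will read from."""
        inbox = queue.Queue()
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic: str, message) -> None:
        """Copy the message into every subscriber's queue and return at once."""
        for inbox in self._subscribers[topic]:
            inbox.put(message)


broker = Broker()
senegal = broker.subscribe("articles")
mali = broker.subscribe("articles")
broker.publish("articles", {"id": 25, "title": "football"})
```

The queues are what make the exchange asynchronous: the publisher does not wait for the subscribers, and a node that is temporarily unreachable could catch up later by draining its queue, which suits a network where node failure is the rule.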
p2pWeb Slide 30: Performance monitoring. We collaborate with the webperf.org project. WebPerf is a system for measuring the response time of specified URLs from multiple locations on the Internet. The project is founded on the premise that there are a lot of other companies who also require such a monitoring service. If the other companies are willing to monitor our URLs, we will monitor theirs (a free co-peering arrangement). Some Perl scripts installed on the local node collect data from other web sites; the data are then pushed to a central repository for further analysis. A web interface allows members to display various statistics: a view of one's web site as seen from all over the world.
p2pWeb Slide 34: Node architecture and security. (Diagram labels: Internet; ADSL or cable modem; Ethernet router/firewall; optional wifi access point; private Ethernet LAN; Ethernet link; web server; p2pweb traffic.) Security: a hardware router/firewall with NAT capabilities is mandatory; the internal private network uses RFC 1918 IP addresses (192.168.x.y); no incoming traffic from the outside other than what is required, controlled via redirects on the firewall: http (port 80) and ssh (port 22, optional).
p2pWeb Slide 35: Node hardware (example). Runs on the corner of a desk. An Ethernet and wifi switch: connects other computers (not shown here). A web and application server: Mac mini (Apple) running apache2 and tomcat. A firewall: embedded PC (www.pcengines.ch) running pf (packet filter) on OpenBSD from a compact flash card. No noise, and low electric power consumption (near 50W).
p2pWeb Slide 36: Conclusion. It can be done (at low cost). It runs, with good results (service uptime measured by siteuptime.com). www.p2pweb.net, hosted by the p2pweb network: monitored since 9/23/2004; outages: 40; total uptime: 99.560%; downtime/year: 38.5 hours. www.afromix.org, hosted on a single node: monitored since 9/23/2004; outages: 37; total uptime: 97.634%; downtime/year: 207.3 hours. There are still a lot of improvements to make. It is not yet an easy to use solution: node administration still requires good Unix knowledge. Most important: it is a new way to design web applications.
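The downtime figures above follow directly from the uptime percentages, as this short Python check shows. The function name is hypothetical, and a 365-day year (8760 hours) is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into expected hours of downtime per year."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


network = downtime_hours_per_year(99.560)      # the p2pweb network
single_node = downtime_hours_per_year(97.634)  # one node alone
print(round(network, 1), round(single_node, 1))
```

The two results, about 38.5 and 207.3 hours, match the slide: the distributed network suffered slightly more outages (40 against 37) but roughly a fifth of the downtime of the single node.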
p2pWeb Slide 37: The future. What we can provide right now: P2pweb.net, a global load balancing solution for any distributed web project (just provide the servers' IP addresses and a health check URL); Mediaport.net, a community web hosting solution (we can host various web projects). We are looking for partnerships in the following domains: packaging an easy, ready to use solution for deploying web mirrors (industrializing the solution): a dedicated Linux or BSD distribution with preinstalled packages; an all in one solution: Java CMS + MOM in one web application; helping to deploy such a solution in Least Developed Countries. The P2PWeb solution is a very good fit for Least Developed Countries with weak bandwidth and low connectivity.
p2pWeb Slide 38: Contacts. P2pweb is a SourceForge project (BSD license): www.p2pweb.net or mediaport.sourceforge.net. Contacts: about the project: email@example.com; if you want to be hosted on mediaport.net: firstname.lastname@example.org email@example.com
p2pWeb Slide 39: Questions. Thank you. Questions?