Published by Stone Casey
Modified about 1 year ago
© 2013 A. Haeberlen, Z. Ives
Welcome to CIS 455 / 555 – Internet and Web Systems
Zachary G. Ives, University of Pennsylvania
January 14, 2015

What this Course Is About
- How do we build services like Google, Akamai, iTunes, Facebook, eBay, …?
- What are the principles behind them? (This is NOT a course on building Web sites! See CIS 450/550…)
- How do “cloud computing,” P2P, and Web services relate?
- The main themes of the course:
  - Distributed systems concepts, with emphasis on data, scalability and interoperability (including “the cloud”)
  - Data representation fundamentals, with emphasis on XML
  - Information retrieval concepts, including ranking and indexing
- It’s a course that involves building software using the principles learned, evaluating it, and programming in teams

How Does this Relate to Other CIS Courses?
- NETS 212
  - Cloud service layers
  - Key/value stores, in particular
  - MapReduce, Spark, and data-parallel programming basics
- CIS 450/550
  - Data representation and management
  - Relational querying with SQL; XML querying with XQuery
  - DBMS-backed web sites
  - 455/555 focuses on data with respect to interoperability
- CIS 350/573: software engineering and mashups
- CIS 505: focuses on distributed systems and algorithms
  - CIS 505 is less project-oriented than CIS 555
  - CIS 555 covers Web services, cloud architectures in more detail

Some Things We’ll Look at
- What are the principles behind building systems that work on the Internet?
- How do these relate to many of today’s hot technologies?
  - Web servers, DHTML, Servlets, JSP, …
  - XML Web services
  - Peer-to-peer
  - Application servers
  - Cloud computing environments
  - Content distribution networks
  - Web search
  - Mash-ups
  - The cloud
  - …

Staff
- Instructor: Zack Ives, Office: 576 Levine North
  - Office hours W 1:30-2:30 (and by arrangement)
- TAs: Avani Deshpande, Akshay Hegde, Mounica Maddela, Shruthi Gorantala, Shenga Ding
- Piazza: piazza.com/upenn/spring2015/cis455555
- Will have custom homework submission platform (coming soon)

Textbooks
- Distributed Systems: Principles and Paradigms, 2nd ed., Tanenbaum and van Steen
  - We’ll read from the book ~50% of the time
- Frequent supplementary handouts
  - Excerpts from several books
  - Many recent research papers
  - Your first one, which you should read by Wed: Hints/Acrobat.pdf (linked off the CIS 555 page)

Prerequisites, Workload, etc.
- Necessary skills:
  - Ability to code in Java: there is a substantial implementation project
  - Good debugging skills – this will be the biggest time sink!
  - The ability to work as a team with classmates (towards the end)
  - A willingness to learn how to read API documentation
  - Some exposure to threads and concurrent programming
  - A willingness to “push the envelope”
- Workload:
  - Several programming/debugging-based homework assignments
  - A substantial term project with experimental evaluation and a report
  - Two midterms
- Payoff:
  - Lots of practical development and debugging experience
  - A good working knowledge of the fundamentals behind scalable systems
  - A working “academic clone of Google,” hosted on Amazon EC2!
- WARNING: this course should be considered 1.5 CU!

A Disclaimer…
- This remains a “bleeding edge” course!
  - Goal 0: an understanding of scalable distributed data-centric systems
  - Goal 1: a look under the covers of today’s hottest topics – in lectures and in projects
  - Goal 2: a level of comfort in managing large, complex software development with others’ code
- Part of this means doing a substantial implementation project
  - As in the real world: learning APIs, dealing with inadequate tools
  - Most of you will find this a struggle! You’ll spend many hours debugging!
- We will be using some immature technology
  - Not everything will have been validated ahead of time
  - We’ll do the best we can to smooth over the bugs!
- We hope it will be a fun course, though…
- … And an interesting one!

A Bit of Context for the Course

What Exactly Is the Web?
- The Web consists of HTTP servers that publish HTML, XML, and a few other content types
  - These are hyperlinked via URLs (a subset of URIs)
  - Plus there are a huge number of web clients
- The Web is built on a number of Internet protocols: DNS, TCP, IP
- Other Internet services use other protocols
  - SMTP, IMAP, POP, AIM, FTP, …
  - Streaming media, music swapping protocols, …
  - Web services, custom applications may actually also use HTTP in ways it wasn’t designed for

The Internet is Built in Layers
(layer diagram, listed from bottom to top)
- Link: WiFi, ZigBee, Ethernet, WiMax
- IP: IPv4, IPv6; unicast, (multicast)
- Transport: TCP (session-based), UDP (sessionless)
- Session: SSH, FTP, HTTP, IM, P2P, …; lightweight streaming, etc.
- Middleware: Web Services, distrib transactions, …
- Your Application: …

What Is an Internet System?
- Not just a web server or web application…
- An application built over the Internet, whose functionality is distributed across more than one machine
  - Typically, at least in a client-server or server-to-server fashion, but may have many more participants
  - Typically, data and/or code must be exchanged in distributed fashion for the functioning of the application
  - Often, the data must be partitioned, replicated, translated, etc. (“shards” in Google-speak)
  - Often, the code is written in multiple different environments, languages, etc.
  - Often, there are concerns about handling failures, firewalls, attacks, …

Why Are Internet System Topics Interesting?
- Understanding what’s underneath today’s Web
  - How does it work?
  - What are its shortcomings?
  - What are its strengths?
- Understanding distributed algorithms
- Using the right approach when designing new protocols and web systems
- Being able to anticipate what’s actually possible in the future

Example: Web Search, a Cloud Service
[Diagram. Components: client, Search Interface Servers, Index Servers, Crawlers, Web Pages. Edge labels: HTML forms; results, queries, query results, pages, keywords + locations]
- Uses a model of document/word similarity to rank matches
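
The index-server role in the search example can be sketched as a toy inverted index. This is only an illustration under stated assumptions: the class name `InvertedIndexSketch`, its methods, and the simple TF-IDF score are hypothetical choices made here, not the ranking model used in the course or by any real search engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy inverted index (hypothetical sketch): keyword -> documents containing it. */
final class InvertedIndexSketch {
    // keyword -> (document id -> number of occurrences in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();
    private int docCount = 0;

    /** Stands in for a crawler handing one fetched page to the index. Assumes each docId is added once. */
    void addDocument(String docId, String text) {
        docCount++;
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            postings.computeIfAbsent(word, w -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    /** Returns matching document ids, highest TF-IDF score first (ties broken by id). */
    List<String> search(String query) {
        Map<String, Double> scores = new HashMap<>();
        for (String word : query.toLowerCase().split("\\W+")) {
            Map<String, Integer> docs = postings.get(word);
            if (docs == null) continue; // keyword not indexed
            // Rare keywords get a larger weight than common ones.
            double idf = Math.log(1.0 + (double) docCount / docs.size());
            for (Map.Entry<String, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }
}
```

A query such as `search("web search")` would rank a page containing both words above a page containing only "web", because the rarer keyword contributes a larger weight.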

Example: Social Networking (Facebook / Twitter), a Cloud Service
[Diagram. Components: client, User Page Servers, Recommender, Users & entities. Edge labels: clicks; pages & notifications; updates, posts; suggestions; common properties, usage logs, …]

Example: Enterprise (or Web) Information Integration
[Diagram. Components: client, Mediator System, XML sources, Relational sources, HTML sources. Edge labels: queries; results in “mediated schema”; XQuery + XPath over XML; SQL, ODBC results; HTTP POST, HTML]
- Maps all data into a single format and virtual schema

Example: Problem Partitioning
[Diagram. Components: clients, Data Aggregation. Edge labels: New sub-problems, Computed subresults]
- Breaks computation into many parts and distributes them to the clients
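
The partition / compute / aggregate pattern can be sketched on a single machine, with worker threads standing in for the remote clients. The class name `PartitionedSumSketch` and the choice of summing an integer range are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: split one computation into sub-problems, then aggregate the subresults. */
final class PartitionedSumSketch {
    /** Sums the integers in [from, to) using up to `parts` sub-problems (parts must be >= 1). */
    static long parallelSum(long from, long to, int parts) {
        ExecutorService workers = Executors.newFixedThreadPool(parts); // stand-in for remote clients
        try {
            long chunk = Math.max(1, (to - from + parts - 1) / parts);
            List<Future<Long>> subresults = new ArrayList<>();
            for (long start = from; start < to; start += chunk) {
                final long lo = start;
                final long hi = Math.min(to, start + chunk);
                // Each task is one "new sub-problem" handed to a worker.
                subresults.add(workers.submit(() -> {
                    long sum = 0;
                    for (long i = lo; i < hi; i++) sum += i;
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : subresults) total += f.get(); // data aggregation
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for subresults", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-problem failed", e);
        } finally {
            workers.shutdown();
        }
    }
}
```

In a real deployment the sub-problems would travel over the network and some clients would fail or return late, which this sketch does not model.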

Example: P2P File Sharing
[Diagram. Components: client nodes. Edge labels: request, data]
- Processes name-based requests for data; each node can make requests, forward requests, return data
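
The three node roles (make requests, forward requests, return data) can be sketched with in-memory peers. Everything named here (`PeerSketch`, the hop limit, the visited set) is a hypothetical simplification; it forwards depth-first through neighbors and is not the protocol of any particular file-sharing network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical peer that answers name-based requests locally or forwards them to neighbors. */
final class PeerSketch {
    private final String id;
    private final Map<String, String> localFiles = new HashMap<>();
    private final List<PeerSketch> neighbors = new ArrayList<>();

    PeerSketch(String id) { this.id = id; }

    /** Creates a bidirectional overlay link between two peers. */
    void connect(PeerSketch other) {
        neighbors.add(other);
        other.neighbors.add(this);
    }

    void store(String name, String data) { localFiles.put(name, data); }

    /** A peer making a request on its own behalf; ttl bounds the number of forwarding hops. */
    Optional<String> request(String name, int ttl) {
        return handle(name, ttl, new HashSet<>());
    }

    private Optional<String> handle(String name, int ttl, Set<String> visited) {
        if (!visited.add(id)) return Optional.empty();   // this peer already saw the request
        String data = localFiles.get(name);
        if (data != null) return Optional.of(data);      // return data
        if (ttl == 0) return Optional.empty();           // hop limit reached
        for (PeerSketch n : neighbors) {                 // forward request
            Optional<String> found = n.handle(name, ttl - 1, visited);
            if (found.isPresent()) return found;
        }
        return Optional.empty();
    }
}
```

With peers connected in a chain a, b, c and the file stored only at c, a request from a succeeds with a hop limit of 2 and fails with a hop limit of 1.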

What are the Hard Problems?
- Disclaimer: most of the hard problems AREN’T solved (or solvable) – and there often isn’t any single BEST solution
  - Much of systems design is about finding the right compromise for each specific problem
- We can divide them into:
  - Scalability
  - Availability / reliability
  - Consistency
  - Interoperability
  - Location and resource discovery

Scalability
- How do we support a large number of clients or requests?
  - Distribute work!
- Challenges:
  - Coordination – takes significant overhead in the general case
  - Load balancing – avoid having bottlenecks
- Parts of the solution:
  - Client-server, multi-tier, P2P architectures
  - Restricted programming models, e.g., MapReduce
  - Data partitioning, replication, remote procedure calls, …
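
To make the "restricted programming model" idea concrete, here is a single-process word count written in the MapReduce style. It only illustrates the map / shuffle / reduce structure; `MapReduceSketch` and its method names are hypothetical, and it does not use Hadoop or any other real framework API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process word count in the MapReduce style. */
final class MapReduceSketch {
    /** Map: emit a (word, 1) pair for every word in one input record. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) out.add(Map.entry(w, 1));
        }
        return out;
    }

    /** Reduce: combine all values that were emitted for one key. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    /** Driver: runs map on every record, groups pairs by key (the shuffle), then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }
}
```

Because `map` looks at one record at a time and `reduce` at one key at a time, a framework is free to run many copies of each on different machines, which is where the scalability comes from.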

Availability/Reliability
- How do we ensure the system is “up” when we want it to be, and doing the “right” thing?
  - Replication and redundancy
  - Security measures against attacks
  - Ability to undo/redo
- Challenges:
  - Keeping things consistent
  - Performance vs. security
  - Acknowledgments
- Parts of the solution:
  - Data partitioning, replication, …
  - Logging, transactions, …
  - Redundant hardware, multiple sites, …
  - Quorum and consensus algorithms
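
One way to see why quorums help is a toy replicated register with N replicas, write quorum W, and read quorum R, where R + W > N guarantees that every read quorum overlaps the latest write quorum. The sketch below is an assumption-laden illustration: `QuorumSketch` is a hypothetical name, replicas are array slots in one process, and version numbers come from a single counter, which a real distributed system could not rely on.

```java
/** Hypothetical in-memory register replicated on N slots with read/write quorums. */
final class QuorumSketch {
    private final long[] versions;   // version stored at each replica (0 = never written)
    private final String[] values;   // value stored at each replica
    private final int w;
    private final int r;
    private long nextVersion = 1;    // simplification: one global version counter

    QuorumSketch(int n, int w, int r) {
        if (n < 1 || w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("need 1 <= W, R <= N");
        }
        if (r + w <= n) {
            throw new IllegalArgumentException("need R + W > N so that quorums overlap");
        }
        this.versions = new long[n];
        this.values = new String[n];
        this.w = w;
        this.r = r;
    }

    /** Writes to W consecutive replicas starting at firstReplica; the others stay stale. */
    void write(String value, int firstReplica) {
        long version = nextVersion++;
        for (int i = 0; i < w; i++) {
            int idx = Math.floorMod(firstReplica + i, versions.length);
            versions[idx] = version;
            values[idx] = value;
        }
    }

    /** Reads R consecutive replicas and returns the value carrying the highest version. */
    String read(int firstReplica) {
        long best = -1;
        String result = null;
        for (int i = 0; i < r; i++) {
            int idx = Math.floorMod(firstReplica + i, versions.length);
            if (versions[idx] > best) {
                best = versions[idx];
                result = values[idx];
            }
        }
        return result;
    }
}
```

With N = 5 and W = R = 3, any three replicas chosen for a read include at least one of the three that received the latest write, so the read sees it even though two replicas are stale.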

Consistency / Consensus
- Replication, distribution, and failures make it difficult to keep a unified, consistent view of the world – how do we combat this?
  - Locking, concurrency control, and invalidation schemes
  - Clock synchronization
- Challenges:
  - Locking has huge performance overhead
  - Network partitions, disconnected operation
- Parts of the solution:
  - Optimistic concurrency control, 2-phase locking
  - Distributed clock sync
  - Conflict resolvers
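
Optimistic concurrency control can be sketched as read, compute without holding a lock, then validate at commit time and retry on conflict. The class below is a hypothetical single-value illustration (`OptimisticCellSketch` and its version counter are assumptions made here), not a full transaction manager.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical versioned cell illustrating optimistic concurrency control. */
final class OptimisticCellSketch {
    private int value;
    private long version;

    /** A value together with the version at which it was read. */
    static final class Snapshot {
        final int value;
        final long version;

        Snapshot(int value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Read phase: no lock is held after this call returns. */
    synchronized Snapshot read() {
        return new Snapshot(value, version);
    }

    /** Validation + write phase: succeeds only if nobody committed since the snapshot was taken. */
    synchronized boolean tryCommit(Snapshot basis, int newValue) {
        if (basis.version != version) return false; // conflict detected
        value = newValue;
        version++;
        return true;
    }

    /** Applies f optimistically, retrying on conflict. Returns the number of attempts used. */
    int update(IntUnaryOperator f) {
        int attempts = 0;
        while (true) {
            attempts++;
            Snapshot s = read();
            if (tryCommit(s, f.applyAsInt(s.value))) return attempts;
        }
    }
}
```

The trade-off versus locking is that uncontended updates pay almost nothing, while heavily contended ones may have to redo their work several times.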

Interoperability
- How do we coordinate the efforts of components that have different data formats and/or source languages, and are on different machines?
  - Standardization!
- Challenges:
  - Everything has a different semantics!
- Parts of the solution:
  - Standard data formats: XML, XML schemas
  - “Schema mediation” and data translation
  - Remote procedure calls: CORBA, XML-RPC, …
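
As a small illustration of a standard data format at work, the sketch below reads XML produced by some other component using the DOM parser that ships with the JDK (`javax.xml.parsers`). The class name `XmlExchangeSketch` and the `<title>` element are hypothetical; any agreed-upon vocabulary would do.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical consumer that extracts <title> text from XML, whatever system produced it. */
final class XmlExchangeSketch {
    static List<String> titles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            // Parser configuration, I/O, and well-formedness errors all end up here.
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

Agreeing on syntax is the easy part: the parser cannot tell whether two sources mean the same thing by "title", which is the semantics challenge noted above.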

Location & Resource Discovery
- How do you find what you’re looking for?
  - Naming
  - Declarative queries over standard schemas
  - Advertisements
- Challenges:
  - Naming has implicit semantics
  - What do you do when you don’t know what to call something?
- Parts of the solution:
  - Directory systems – DNS, LDAP, etc.
  - Resource discovery and advertising protocols
  - Overlay networks, sharding schemes
  - Standardized schemas
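
One common sharding scheme is consistent hashing, sketched below as one possible example of mapping names to nodes. The class `ConsistentHashRingSketch`, the use of virtual nodes, and the hash based on `String.hashCode` are assumptions chosen for brevity; a production system would typically use a stronger hash function.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring that maps keys to node names. */
final class ConsistentHashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRingSketch(int virtualNodes) {
        if (virtualNodes < 1) throw new IllegalArgumentException("need at least one virtual node");
        this.virtualNodes = virtualNodes;
    }

    /** Illustrative hash only: String.hashCode with extra bit mixing. */
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x45d9f3b;
        h ^= (h >>> 16);
        return h;
    }

    /** Places the node at several positions on the ring to even out the load. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    /** A key belongs to the first node position at or after its hash, wrapping around the ring. */
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes on the ring");
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

The useful property is that removing a node only reassigns the keys that node owned; every other key keeps its placement, so little data has to move.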

Our First Focus: Single Machines, aka Servers
- How do you handle large numbers of concurrent users?
  - Processes
  - Threads
  - Events
  - Hybrids (e.g., thread pools)
  - Staged architectures
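
The thread-pool hybrid can be sketched with the standard `java.util.concurrent` executors: a fixed number of worker threads drains a queue of requests, so the thread count stays bounded no matter how many users arrive. `ThreadPoolServerSketch` and its simulated requests are hypothetical; a real server would read from sockets instead.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical server loop that handles simulated requests on a fixed-size thread pool. */
final class ThreadPoolServerSketch {
    /** Returns { requests handled, distinct worker threads used }. */
    static int[] serve(int requests, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger handled = new AtomicInteger();
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < requests; i++) {
            // Each submitted task stands in for one client request waiting in the queue.
            pool.execute(() -> {
                threadsUsed.add(Thread.currentThread().getName());
                handled.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new work, but finish what is queued
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new int[] { handled.get(), threadsUsed.size() };
    }
}
```

Compared with one thread per request, the pool caps memory and context-switch overhead; compared with a purely event-driven design, each handler can still be written as straight-line blocking code.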

Next Time…
- We’ll look under the covers of an HTTP server
  - Key ideas in building scalable systems
  - Principles of HTTP and web servers
  - Management of concurrent sessions
- To read by next Wednesday:
  - Lampson and Saltzer paper: Hints/Acrobat.pdf
  - Tanenbaum Ch. 3.1
  - If necessary: Review Tanenbaum “Modern OS,” Ch. 2.3 or a similar OS book on interprocess communication