Distributed Computing Primer. UMBC CMSC 491: Hadoop-Based Distributed Computing, Spring 2016. Adam Shook. Some content adapted from Dr. Kalpakis’s CMSC 621 slides.
Agenda: Distributed Computing – Evolution of Computing Infrastructure – Networking Infrastructure – Properties of Distributed Systems – Example System Architectures
EVOLUTION OF COMPUTING INFRASTRUCTURE
Mainframe (1950s to 1970s): Custom hardware. Custom, low-level, specialized code. Very expensive solutions.
Client/Server (1980s to 2000s): IT-led architectures. More portable solutions. Scalable solutions based on demand. Reign of the Enterprise Data Warehouse.
Cloud (2000s to today): Consumer-grade (commodity) infrastructure. Growing IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) markets. Data revolution. Focus on applications, not infrastructure.
Where does Hadoop fit? A piece of your data infrastructure: it can crunch data for analytics and expose data to web applications. Exploration of raw data. Augments today’s infrastructure. In my opinion, a big toolbox that can do a bit of everything.
NETWORKING INFRASTRUCTURE
Single Server: Each server has an HDD, CPU, RAM, and a NIC. Scale up: faster CPUs, bigger storage. Scale out: more servers.
Local-Area Network (LAN): [Figure: two racks of four servers each (HDD, CPU, RAM, NIC), connected to the outside world through a WAN gateway]
Wide Area Network (WAN): [Figure: sites in London, England; Beijing, China; and New York, NY, linked over a WAN]
PROPERTIES OF DISTRIBUTED SYSTEMS
Distributed Systems: The development of low-cost, powerful microprocessors, together with the invention of high-speed networks, enables us to construct computer systems by connecting a large number of computers. A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Properties of Distributed Systems: Reliability, Scalability, Availability, Efficiency, CAP Theorem
Reliability: Can the system keep delivering services in the face of several component failures?
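One common way to stay reliable despite component failures is replication; HDFS, for example, keeps several copies of each block (three by default). The sketch below is a simplified model, not how HDFS itself computes anything: it assumes each replica fails independently with the same probability `p_fail`, so data is lost only when every replica fails. The function name is hypothetical.

```python
def prob_all_replicas_lost(p_fail: float, replicas: int) -> float:
    """Chance that every copy of a piece of data is lost.

    Assumes (simplification) that each replica fails independently
    with the same probability p_fail over the period of interest.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # Data survives if at least one replica survives, so a loss requires
    # all `replicas` independent failures to happen together.
    return p_fail ** replicas


if __name__ == "__main__":
    # Hypothetical 1% chance that any single node fails in the window.
    for r in (1, 2, 3):
        print(f"replicas={r}: P(loss)={prob_all_replicas_lost(0.01, r):.0e}")
```

Each extra replica multiplies the loss probability by another factor of `p_fail`. Real failures are often correlated (a whole rack losing power), which is why HDFS also spreads replicas across racks.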
Scalability: Can the system scale to support a growing number of tasks?
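Amdahl's law is one standard model (not part of the original slide) for how far scaling out can go: if part of a job must run serially, adding servers gives diminishing returns. A minimal sketch, with a hypothetical function name and a made-up 95%-parallel job:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup from spreading a job over `workers` servers.

    Amdahl's law: only `parallel_fraction` of the work can be split up;
    the rest runs serially no matter how many servers are added.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("need at least one worker")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical job that is 95% parallelizable.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} servers -> {amdahl_speedup(0.95, n):6.2f}x")
```

Even with 1,000 servers the 95%-parallel job stays below 20x, because the serial 5% caps the speedup at 1 / 0.05.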
Availability: Can the system keep responding to requests when a failure occurs, and how much latency (for example, failover time) does the failure impose?
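A common way to put a number on availability (an addition here, not from the slide) is the steady-state formula A = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR the mean time to recover. The recovery time is where failure-induced latency shows up: faster failover means higher availability. Function names and numbers are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time the system is up.

    mtbf_hours: mean time between failures
    mttr_hours: mean time to recover (detect, fail over, or repair)
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical: a failure every ~999 h with a 1-hour manual recovery,
    # versus the same failure rate with a 6-minute automatic failover.
    for mttr in (1.0, 0.1):
        a = availability(999.0, mttr)
        print(f"MTTR={mttr} h: availability={a:.5f}, "
              f"downtime={downtime_hours_per_year(a):.2f} h/year")
```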
Efficiency: How efficient is the system in terms of latency and throughput?
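Latency and throughput are linked. Little's law (a general queueing result, not from the slide) says that, in steady state, the average number of requests in the system equals throughput times latency. That makes it a quick sanity check for efficiency, such as how much concurrency a service needs to reach a target throughput. A minimal sketch with hypothetical names and numbers:

```python
def in_flight(throughput_per_s: float, latency_s: float) -> float:
    """Little's law: average requests in the system = throughput x latency."""
    return throughput_per_s * latency_s


def max_throughput(concurrency: int, latency_s: float) -> float:
    """Upper bound on requests/s when at most `concurrency` requests
    can be in progress and each takes `latency_s` seconds on average."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_s


if __name__ == "__main__":
    # Hypothetical service with 50 ms average latency.
    print(in_flight(200.0, 0.050))   # requests in flight at 200 req/s
    print(max_throughput(8, 0.050))  # best case with 8 worker threads
```

Cutting latency in half doubles the throughput ceiling for the same number of workers, which is why the two metrics are usually reported together.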
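CAP Theorem: A distributed data store cannot guarantee Consistency, Availability, and Partition tolerance all at once. Because network partitions cannot be ruled out in practice, the real trade-off is between consistency and availability while a partition is in effect.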
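The trade-off can be made concrete with a toy model (hypothetical class names; real systems use quorums, leases, and conflict resolution rather than this all-or-nothing rule). Two replicas hold one value. When the link between them fails, a CP store refuses requests, while an AP store keeps answering and lets the replicas diverge:

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a consistency-first (CP) store refuses a request."""


@dataclass
class ReplicatedRegister:
    """Toy two-replica store holding a single value.

    mode="CP": during a partition, refuse requests rather than risk
    serving or accepting data the other replica cannot see.
    mode="AP": during a partition, keep serving from the local replica,
    even though the replicas may disagree.
    """
    mode: str
    replicas: list = field(default_factory=lambda: [None, None])
    partitioned: bool = False

    def __post_init__(self):
        if self.mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")

    def write(self, replica: int, value) -> None:
        if not self.partitioned:
            # Healthy network: replicate the write everywhere.
            self.replicas = [value] * len(self.replicas)
        elif self.mode == "CP":
            raise Unavailable("partitioned: write rejected to stay consistent")
        else:
            # Only the local replica sees the new value.
            self.replicas[replica] = value

    def read(self, replica: int):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("partitioned: read rejected to stay consistent")
        return self.replicas[replica]


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = ReplicatedRegister(mode)
        store.write(0, "v1")
        store.partitioned = True  # the link between replicas goes down
        try:
            store.write(0, "v2")
            print(mode, "replica 0:", store.read(0), "replica 1:", store.read(1))
        except Unavailable as err:
            print(mode, "->", err)
```

With only two replicas neither side of the partition holds a majority, so the CP store rejects everything; real CP systems typically keep serving on whichever side holds a majority of replicas.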
Stateful vs. Stateless: Whether or not a distributed system saves its state (for example, to an attached storage device) so that it can recover after a failure.
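A minimal sketch of the difference, with hypothetical names: a stateful counter checkpoints its value to a local file so a restarted process can pick up where it left off, while a stateless handler computes its answer from the request alone, so any copy of it can serve any request. The JSON file format is an assumption for illustration.

```python
import json
import os
import tempfile


class CheckpointedCounter:
    """Stateful component: persists its count so a restart can recover it."""

    def __init__(self, path: str):
        self.path = path
        self.count = 0
        if os.path.exists(path):
            # Recovery: reload the last checkpoint written before the crash.
            with open(path) as f:
                self.count = json.load(f)["count"]

    def increment(self) -> int:
        self.count += 1
        # Write to a temporary file, then rename it over the old checkpoint;
        # os.replace is atomic on the same filesystem, so a crash mid-write
        # leaves the previous checkpoint intact.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"count": self.count}, f)
        os.replace(tmp_path, self.path)
        return self.count


def stateless_add(a: int, b: int) -> int:
    """Stateless handler: the answer depends only on the request,
    so any replica can serve it and nothing needs recovering."""
    return a + b


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "counter.json")
        counter = CheckpointedCounter(path)
        counter.increment()
        counter.increment()
        # Simulate a crash and restart by building a fresh object.
        restarted = CheckpointedCounter(path)
        print("recovered count:", restarted.count)
    print("stateless:", stateless_add(2, 3))
```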
EXAMPLE SYSTEM ARCHITECTURES
Simple Client/Server
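A minimal client/server exchange over TCP, using only the Python standard library: the server waits for connections, reads one line, and replies with it upper-cased; the client connects, sends a request, and waits for the response. The upper-casing behavior and the function names are hypothetical, chosen only to keep the sketch small.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.StreamRequestHandler):
    """Server side: read one line from the client, reply with it upper-cased."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def start_server() -> socketserver.ThreadingTCPServer:
    """Start the server in a background thread on a free local port."""
    # Port 0 asks the operating system for any free port.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(host: str, port: int, message: str) -> str:
    """Client side: connect, send one line, and return the server's reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(message.encode() + b"\n")
        with sock.makefile("rb") as reply:
            return reply.readline().decode().rstrip("\n")


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    print(request(host, port, "hello"))
    server.shutdown()
    server.server_close()
```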
Multi-Tiered Client/Server
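In a multi-tiered design, each tier talks only to the tier directly below it: a presentation tier faces clients, an application (logic) tier enforces rules, and a data tier owns storage. In a real deployment each tier typically runs on its own servers and calls the next over the network; this sketch collapses them into one process so the layering is visible. All class names, the validation rules, and the HTTP-style status strings are hypothetical.

```python
class DataTier:
    """Bottom tier: owns storage (a dict standing in for a database)."""

    def __init__(self):
        self._rows = {}

    def get(self, key):
        return self._rows.get(key)

    def put(self, key, value) -> None:
        self._rows[key] = value


class LogicTier:
    """Middle tier: business rules; only talks to the data tier."""

    def __init__(self, data: DataTier):
        self.data = data

    def register(self, username: str) -> str:
        if not username.isalnum():
            raise ValueError("username must be alphanumeric")
        if self.data.get(username) is not None:
            raise ValueError("username taken")
        self.data.put(username, {"name": username})
        return username


class PresentationTier:
    """Top tier: formats responses for clients; only talks to the logic tier."""

    def __init__(self, logic: LogicTier):
        self.logic = logic

    def handle_signup(self, username: str) -> str:
        try:
            return f"201 Created: {self.logic.register(username)}"
        except ValueError as err:
            return f"400 Bad Request: {err}"


if __name__ == "__main__":
    app = PresentationTier(LogicTier(DataTier()))
    for name in ("alice", "alice", "bad name!"):
        print(app.handle_signup(name))
```

Because each tier depends only on the one below it, any tier can be scaled out or replaced (for example, swapping the dict for a real database) without changing the others.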
Round-Robin Client/Server
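Round-robin distribution hands each incoming request to the next server in turn, so load spreads evenly when requests cost about the same. A minimal sketch of the balancer's selection logic, with hypothetical server names; it ignores server health and current load, which production load balancers usually also take into account.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Sends each new request to the next server in a fixed rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("need at least one server")
        self._cycle = itertools.cycle(servers)
        self._lock = threading.Lock()  # many client threads may call pick()

    def pick(self) -> str:
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app1", "app2", "app3"])  # hypothetical hosts
    print([lb.pick() for _ in range(5)])
    # prints ['app1', 'app2', 'app3', 'app1', 'app2']
```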
Linux Reference: A free and open-source operating system. In this course, we live in Eclipse and on the command line. Mastery of 'vi' gets you +4 charisma. lpic1-v / uickref/linux.pdf
References: Google Images