1
Distributed Operating Systems
Luke Wood
2
What is a distributed operating system?
3
Distributed Operating Systems
- Runs across multiple physical or virtual machines
- Utilizes the processing power of multiple machines
- Raises huge synchronization issues during development
- Plays a huge role in the world of "big data" (big daters)
4
What is driving this development?
We just have so much data!
5
90% of the world's data was generated over the past two years
6
Why Distributed?
For real: why didn't we just increase our clock speed?
Oh, that's why.
7
Today's Solution: Hadoop
Hadoop is the most widely used distributed OS in industry. It is made up of:
- Hadoop Common
- Hadoop Distributed File System (HDFS)
- MapReduce
- and so much more...
8
Hadoop History
- "The Google File System" published in October 2003
- "MapReduce: Simplified Data Processing on Large Clusters" published in December 2004
- Named after Doug Cutting's son's toy elephant, Hadoop!
9
Used to Process Data Such As
- Surveillance data
- Social media data
- Stock exchange data
- Power grid data
- Transport data
- Search engine data
10
Hadoop Case Study
- Incredibly impressive results
- Insane performance gains using the cluster
Results from "Cloud Hadoop Map Reduce For Remote Sensing Image Analysis" by Mohamed Almeer
11
The end goal of a distributed OS is to harness the power of multiple machines
12
What? How!?
We utilize the MapReduce paradigm
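The paradigm can be sketched on a single machine as a word count. This is only an illustration of the map, shuffle, and reduce steps, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for this example, and a real cluster would run each phase on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for a split handled by a separate worker.
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data is big", "data is distributed"])
```

Because `map_phase` only looks at its own document and `reduce_phase` only looks at one key, both steps could in principle run on separate machines without coordinating with each other.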
15
The End.
16
Just Kidding.
17
Issues and Solutions
From an OS- and application-level perspective
18
#1: Shared Data
- When we use a map function, how do we access shared state?
- What if our operations are not commutative?
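A small sketch of why this matters, assuming a hypothetical `distributed_reduce` helper that reduces each worker's chunk locally and then combines the partial results. Addition gives the same answer however the data is split; subtraction, which is neither commutative nor associative, does not. On a real cluster the partial results may also arrive in any order, which is where commutativity comes in.

```python
from functools import reduce
import operator


def distributed_reduce(op, values, num_workers):
    """Reduce each worker's chunk locally, then combine the partial results."""
    chunk = -(-len(values) // num_workers)  # ceiling division
    partials = [
        reduce(op, values[i:i + chunk])
        for i in range(0, len(values), chunk)
    ]
    return reduce(op, partials)


values = [10, 4, 3, 2]

# Addition: the split across workers does not change the result.
sum_one = distributed_reduce(operator.add, values, 1)
sum_two = distributed_reduce(operator.add, values, 2)

# Subtraction: the result depends on how the data was split.
diff_one = distributed_reduce(operator.sub, values, 1)  # ((10 - 4) - 3) - 2
diff_two = distributed_reduce(operator.sub, values, 2)  # (10 - 4) - (3 - 2)
```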
19
Programmer Dependent Solution:
- Just use pure functions
- This can be a challenge
- Not super "general population friendly"
Operating System Solution:
- Operating system provides broadcast functionality
- Can we update the broadcast data?
- How expensive is this broadcasting?
- Is this a programmer invoked function?
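Both ideas can be sketched together on one machine. Here `broadcast` is a hypothetical stand-in for an OS or framework facility that ships a value to every worker once; it is modeled as a read-only mapping, which is one possible answer to "can we update the broadcast data?" (in this sketch: no). The `convert` function and the rates are invented for illustration.

```python
from types import MappingProxyType


def broadcast(data):
    """Return a read-only view, standing in for a value sent once to every worker."""
    return MappingProxyType(dict(data))


def convert(record, rates):
    """Pure map function: the output depends only on the arguments."""
    currency, amount = record
    return amount * rates[currency]


# Hypothetical lookup table that every worker needs to read.
rates = broadcast({"USD": 1.0, "EUR": 2.0})
records = [("USD", 5.0), ("EUR", 3.0)]
totals = [convert(r, rates) for r in records]

# Writes to the broadcast value are rejected, so workers cannot diverge.
try:
    rates["GBP"] = 1.3
    update_allowed = True
except TypeError:
    update_allowed = False
```

Keeping the broadcast value immutable trades flexibility for safety: workers never need to synchronize on it, but any change means broadcasting a new value.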
20
#2: Data Distribution
- How do we distribute data between devices?
21
Data Distribution Architectures
Master to workers:
- Only useful in MapReduce
- Much simpler than other architectures
Peer-to-peer file distribution:
- Much harder to implement
22
Programmer Dependent Solution:
- Explicitly broadcast data
- Prevents unnecessary data distribution
Operating System Solution:
- Try to intelligently distribute data
- Delegate specific tasks to specific systems
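One simple way a system might "delegate specific tasks to specific systems" is hash partitioning: every record with the same key goes to the same worker. The sketch below assumes that approach; `assign_worker`, `partition`, and the sensor records are hypothetical, and `zlib.crc32` is used only because it gives the same hash on every machine (Python's built-in `hash` for strings varies between processes).

```python
import zlib
from collections import defaultdict


def assign_worker(key, num_workers):
    """Pick a worker from a stable hash so every node computes the same answer."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def partition(records, num_workers):
    """Group (key, value) records by the worker responsible for each key."""
    shards = defaultdict(list)
    for key, value in records:
        shards[assign_worker(key, num_workers)].append((key, value))
    return dict(shards)


records = [("sensor-a", 1), ("sensor-b", 2), ("sensor-a", 3)]
shards = partition(records, num_workers=3)
```

Since the assignment depends only on the key, no central lookup table is needed, though a skewed key distribution can still overload one worker.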
23
Conclusion
- Distributed operating systems have allowed companies to crunch insane amounts of data in reasonable time frames
- Parallel and distributed computing are made significantly easier through the use of the MapReduce paradigm
- Many of the synchronization problems we have studied in this class are taken care of by the MapReduce implementation
24
Thank you - check out distributed OS programming - it's a ton of fun