1 CS 294-42: Project Suggestions Ion Stoica (http://www.cs.berkeley.edu/~istoica/classes/cs294/11/) September 14, 2011.



2 Projects
- This is a project-oriented class
- Reading papers should be a means to a great project, not a goal in itself!
- Strongly prefer groups of two
- Perfectly fine to use the same project for CS262
- Today, I’ll present some suggestions
- But you are free to come up with your own proposal
- Main goal: just do a great project

3 Where I’m Coming From?
- Key challenge: maximize the economic value of data, i.e., extract value from data while reducing costs (e.g., storage, computation)

4 Where I’m Coming From?
- Tools to extract value from big data
  - Scalability
  - Response time
  - Accuracy
- Provide high cluster utilization for heterogeneous workloads
  - Support diverse SLAs
  - Predictable performance
  - Isolation
  - Consistency

5 Caveats
- Cloud computing is HOT, but there is a lot of NOISE!
- Not easy to
  - differentiate between narrow engineering solutions and fundamental tradeoffs
  - predict the importance of the problem you solve
- Cloud computing is akin to a Gold Rush!

6 Background: Mesos
- Rapid innovation in cloud computing
- No single framework is optimal for all applications
- Running each framework on its own dedicated cluster is
  - Expensive
  - Hard to share data
- Need to run multiple frameworks on the same cluster
- [Figure labels: Dryad, Pregel, Cassandra, Hypertable]

7 Background: Mesos – Where We Want to Go
- [Figure: Hadoop, Pregel, and MPI on a shared cluster]
- Today: static partitioning (uniprogramming)
- Mesos: dynamic sharing (multiprogramming)

8 Background: Mesos – Solution
- Mesos is a common resource sharing layer over which diverse frameworks can run
- [Figure: Hadoop, MPI, … running on top of Mesos across nodes]

9 Background: Workload in Datacenters
- Frontend (web servers, databases)
- Decision-driven processes
- Exploratory queries (e.g., Dremel)
- Production jobs (e.g., compute summaries)
- Analytics jobs
- [Figure axes: Priority (High to Low); Response (Interactive, low-latency, to Batch)]

10 Datacenter OS: Resource Management, Scheduling

11 Hierarchical Scheduler (for Mesos)
- Allow administrators to organize into groups
- Provide resource guarantees per group
- Share available resources (fairly) across groups
- Research questions
  - Abstraction (when using multiple resources)?
  - How to implement using resource offers?
  - What policies are compatible at different levels in the hierarchy?
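The "share fairly across groups" idea can be sketched for the simplest case. This is only an illustration under stated assumptions: a single resource, static weights, and unique leaf names. The function `hierarchical_shares`, the tree layout, and the example weights are hypothetical and are not part of Mesos.

```python
def hierarchical_shares(tree, capacity):
    """Split `capacity` among siblings in proportion to their weights, recursively.

    `tree` maps a group name to (weight, children), where `children` is either
    another tree of the same shape or None for a leaf.
    Returns a dict mapping each leaf name to its share of the capacity.
    """
    shares = {}
    total_weight = sum(weight for weight, _ in tree.values())
    for name, (weight, children) in tree.items():
        part = capacity * weight / total_weight
        if children:
            # A group passes its own share down to its members.
            shares.update(hierarchical_shares(children, part))
        else:
            shares[name] = part
    return shares


# Hypothetical organization: two top-level groups with equal weight.
example_tree = {
    "ads": (2, {"batch": (1, None), "interactive": (1, None)}),
    "search": (2, None),
}
example_shares = hierarchical_shares(example_tree, 100)
```

The sketch deliberately leaves out the open questions on the slide: per-group guarantees, multiple resource types, and implementation on top of resource offers.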

12 Cross Application Resource Management
- An app uses many services (e.g., file systems, key-value storage, databases, etc.)
- If an app has high priority and a service it uses doesn’t, the app’s SLA (Service Level Agreement) might be violated
- Research questions
  - Abstraction, e.g., resource delegation, priority propagation?
  - Clean-slate mechanisms vs. incremental deployability
- This is also highly challenging in single-node OSes!

13 Resource Management using VMs
- Most cluster resource managers use Linux containers (e.g., Mesos)
- Thus, schedulers assume no task migration
- Research questions:
  - Develop scheduler for VM environments (e.g., extend DRF)
  - Tradeoffs between migration, delay, and preemption

14 Task Granularity Selection (Yanpei Chen)
- Problem: the number of tasks per stage in today’s MapReduce apps is (highly) sub-optimal
- Research question: derive algorithms to pick the number of tasks to
  - optimize various performance metrics, e.g., utilization, response time, network traffic
  - subject to various constraints, e.g., capacity, network

15 Resource Revocation
- Which task should we revoke/preempt?
- Two questions
  - Which slot has the least impact on the giving framework?
  - Is the slot acceptable to the receiving framework?
- Research questions
  - Identify a feasible slot for the receiving framework with the least impact on the giving framework
  - Light-weight protocol design

16 Control Plane Consistency Model
- What type of consistency is “good enough” for various control plane functions?
  - File system metadata (Hadoop)
  - Routing (Nicira)
  - Scheduling
  - Coordinated caching
  - …
- Research question
  - What are the trade-offs between performance and consistency?
  - Develop a generic framework for the control plane

17 Decentralized vs. Centralized Scheduling
- Decentralized schedulers
  - E.g., Mesos, Hadoop 2.0
  - Delegate decisions to apps (i.e., frameworks, jobs)
  - Advantages: scale and separation of concerns (i.e., apps know best where and which tasks to run)
- Centralized schedulers
  - Know all app requirements
  - Advantages: optimal
- Research challenge:
  - Evaluate centralized vs. decentralized schedulers
  - Characterize the class of workloads for which a decentralized scheduler is good enough

18 Opportunistic Scheduling
- Goal: schedule interactive jobs (e.g., <100ms latency)
- Existing schedulers: high overhead (e.g., Mesos needs to decide on every offer)
- Research challenge:
  - Tradeoff between utilization and response time
  - Evaluate hybrid approach

19 Background: Dominant Resource Fairness
- Implement fair (proportional) allocation for multiple types of resources
- Key properties
  - Strategy-proof: users cannot get an advantage by lying about their demands
  - Sharing incentive: users are incentivized to share a cluster rather than partition it
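A minimal sketch of a DRF-style allocation may help make the idea concrete. It repeatedly gives one task to the user whose dominant share (the largest fraction of any single resource they hold) is currently smallest. Assumptions: every user has unlimited tasks of one fixed demand vector, all capacities are positive, and every demand has at least one positive component. The names `dominant_share` and `drf_allocate` are hypothetical, and dropping a user whose next task no longer fits is a simplification chosen for this sketch.

```python
def dominant_share(num_tasks, demand, capacity):
    """Largest fraction of any single resource held by a user running `num_tasks` tasks."""
    return max(num_tasks * d / c for d, c in zip(demand, capacity))


def drf_allocate(capacity, demands):
    """Return the number of tasks launched per user.

    `capacity` is a tuple of total resources, e.g. (cpus, memory_gb).
    `demands` maps a user name to that user's per-task demand tuple.
    """
    used = [0] * len(capacity)
    tasks = {user: 0 for user in demands}
    active = sorted(demands)  # sorted so that ties are broken deterministically
    while active:
        # Pick the active user with the smallest dominant share.
        user = min(active, key=lambda name: dominant_share(tasks[name], demands[name], capacity))
        demand = demands[user]
        if all(u + d <= c for u, d, c in zip(used, demand, capacity)):
            used = [u + d for u, d in zip(used, demand)]
            tasks[user] += 1
        else:
            # The next task of this user does not fit any more.
            active.remove(user)
    return tasks


# Illustrative workload: 9 CPUs and 18 GB; A's tasks need (1 CPU, 4 GB), B's need (3 CPUs, 1 GB).
example_allocation = drf_allocate((9, 18), {"A": (1, 4), "B": (3, 1)})
```

In this example A ends with 3 tasks and B with 2, so both users hold a dominant share of 2/3 (A in memory, B in CPU).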

20 DRF for Non-linear Resources/Demands
- DRF assumes resources & demands are additive
  - E.g., task 1 needs (1CPU, 1GB) and task 2 needs (1CPU, 3GB) → both tasks need (2CPU, 4GB)
- Sometimes demands are non-linear
  - E.g., shared memory
- Sometimes resources are non-linear
  - E.g., disk throughput, caches
- Research challenge:
  - DRF-like scheduler for non-linear resources & demands (could be two projects here!)
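The difference between additive and non-additive demands can be shown with a small sketch. The additive case reproduces the slide's example. The shared-memory model is purely an assumption made for illustration: each task's memory figure is taken to include one common region of `shared_gb` that only needs to be resident once. Both function names are hypothetical.

```python
def additive_demand(task_demands):
    """Total demand when resources simply add up, as DRF assumes."""
    return tuple(sum(values) for values in zip(*task_demands))


def shared_memory_demand(task_demands, shared_gb):
    """Total (cpus, memory_gb) when every task maps the same shared region.

    Assumed model: each per-task memory figure includes `shared_gb`,
    which is counted once for the whole group of tasks.
    """
    cpus = sum(cpu for cpu, _ in task_demands)
    private_gb = sum(mem - shared_gb for _, mem in task_demands)
    return (cpus, shared_gb + private_gb)


two_tasks = [(1, 1), (1, 3)]  # (CPUs, GB) for task 1 and task 2
additive_total = additive_demand(two_tasks)
shared_total = shared_memory_demand(two_tasks, shared_gb=1)
```

Under this assumed model the two tasks together need (2 CPU, 3 GB) rather than (2 CPU, 4 GB), which is the kind of non-linearity a DRF-like scheduler would have to account for.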

21 DRF for OSes
- DRF was designed for clusters using the resource offer mechanism
- Redesign DRF to support multi-core OSes
- Research questions:
  - Is the resource offer the best abstraction?
  - How to best leverage preemption? (in Mesos, tasks are not preempted by default)
  - How to support gang scheduling?

22 Storage & Data Processing

23 Resource Isolation for Storage Services
- Share storage (e.g., key-value store) between
  - Frontend, e.g., web services
  - Backend, e.g., analytics on freshest data
- Research challenge
  - Isolation mechanism: protect front-end performance from back-end workload

24 “Quicksilver” DB
- Goal: interactive queries with bounded error on “unbounded” data
  - Trade between efficiency and accuracy
  - Query response time target: < 100ms
- Approach: random pre-sampling across different dimensions (columns)
- Research question: given a query and an error bound, find
  - Smallest sample to compute result
  - Sample minimizing disk (or memory) access times
- (Talk with Sameer, if interested)
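For one very simple query type, an average over a uniform random sample, the "smallest sample for a given error bound" question has a textbook answer. The sketch below uses the standard normal-approximation formula n = (z·σ/ε)², which is a swapped-in classical technique and not the project's method. It assumes the column's standard deviation is known or estimated; the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(std_dev, max_error, confidence=0.95):
    """Smallest uniform sample size so that the sample mean is within
    `max_error` of the true mean with the given confidence, under the
    normal approximation (assumes `std_dev` is known or estimated)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * std_dev / max_error) ** 2)


rows_needed = sample_size_for_mean(std_dev=10, max_error=1)
```

Note that the required sample size depends on the variance and the error bound, not on the size of the underlying table, which is what makes bounded-error answers on "unbounded" data plausible.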

25 Split-Privacy DB (1/2)
- Partition data & computation
  - Private
  - Public (stored on cloud)
- Goal: use cloud without revealing the computation result
- Example:
  - Operation f(x, y) = x + y, where x: private, y: public
  - Pick a random number a, and compute x’ = x + a
  - Compute f(x’, y) = r’ = x’ + y
  - Recover result: r = r’ – a = (x’ – a) + y = x + y
- [Figure: Private DB and Public DB feed f private and f public, which produce the result]
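The masking steps in the example above can be sketched directly. The function names and the 64-bit mask range are assumptions made for illustration; the slide does not say how the random number a should be drawn.

```python
import secrets


def mask(x):
    """Private side: hide x behind a random offset a. Returns (x', a)."""
    a = secrets.randbelow(2**64)  # mask range is an arbitrary choice for this sketch
    return x + a, a


def public_add(x_masked, y):
    """Cloud side: computes f(x', y) = x' + y and never sees x or the true result."""
    return x_masked + y


def unmask(r_masked, a):
    """Private side: recover r = r' - a = x + y."""
    return r_masked - a


x_private, y_public = 41, 1
x_masked, a = mask(x_private)
result = unmask(public_add(x_masked, y_public), a)
```

Only the masked value x’ and the masked result r’ ever leave the private side; the offset a stays private and is needed to recover the true result.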

26 Split-Privacy DB (2/2)
- Partition data & computation
  - Private
  - Public (stored on cloud)
- Example: patient data (private), public clinical and genomics data sets
- Goal: use cloud without revealing the computation result
- Research questions:
  - What types of computation can be implemented?
  - Any more powerful than privacy-preserving computation / data mining?
- [Figure: Private DB and Public DB feed f private and f public, which produce the result]

27 RDDs as an OS Abstraction
- Resilient Distributed Datasets (RDDs)
  - Fault-tolerant (in-memory) parallel data structures
  - Allow Spark apps to efficiently reuse data
- Design cross-application RDDs
- Research questions
  - RDD reconstruction (track software and platform changes)
  - Enable users to share intermediate results of queries (identify when two apps compute the same RDD)
  - RDD cluster-wide caching
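One possible starting point for "identify when two apps compute the same RDD" is to fingerprint a dataset by its lineage: the input it was derived from plus the ordered list of transformations. The sketch below is a hypothetical illustration in plain Python and does not use any Spark API; the lineage representation, the function name, and the example path are assumptions.

```python
import hashlib
import json


def lineage_fingerprint(source, transformations):
    """Hash a dataset's lineage: its input plus the ordered transformations.

    `source` is an identifier for the input data (e.g. a path plus version),
    `transformations` is a list of [operation, argument] pairs.
    Two datasets with identical lineage get the same fingerprint.
    """
    payload = json.dumps({"source": source, "ops": transformations}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two apps that filter the same (hypothetical) log file in the same way.
app1 = lineage_fingerprint("hdfs://logs/2011-09-14", [["filter", "status == 500"], ["map", "url"]])
app2 = lineage_fingerprint("hdfs://logs/2011-09-14", [["filter", "status == 500"], ["map", "url"]])
app3 = lineage_fingerprint("hdfs://logs/2011-09-14", [["map", "url"]])
```

Matching on textual lineage is conservative: it misses equivalent computations written differently, and it is only sound if the transformations are deterministic and the software and platform versions are part of the lineage, which ties back to the reconstruction question on the slide.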

28 Provenance-based Efficient Storage (Peter B and Patrick W)
- Reduce storage by deleting data that can be recreated
  - Generalization of previous project
- Research challenges:
  - Identify data that can be deterministically recreated and the code to do so
    - Use hints?
  - Tradeoff between re-creation and storage
    - May take into account access pattern, frequency, performance
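The re-creation versus storage tradeoff can be framed as a simple cost comparison. The sketch below is a deliberately naive model under stated assumptions: costs are expressed in one common unit over one billing period, and the expected number of accesses is known. The function name, parameters, and example numbers are all hypothetical.

```python
def should_delete(size_gb, storage_cost_per_gb, recompute_cost, expected_accesses):
    """Return True if recreating the data on demand is cheaper than keeping it.

    All costs are for the same period and in the same (arbitrary) unit.
    Assumes the data can be deterministically recreated.
    """
    keep_cost = size_gb * storage_cost_per_gb
    recreate_cost = recompute_cost * expected_accesses
    return recreate_cost < keep_cost


rarely_read = should_delete(size_gb=100, storage_cost_per_gb=0.1, recompute_cost=2.0, expected_accesses=1)
often_read = should_delete(size_gb=100, storage_cost_per_gb=0.1, recompute_cost=2.0, expected_accesses=20)
```

A real policy would also need to weigh the latency of re-creation against performance targets, which this cost-only model ignores.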

29 Very-low Latency Streaming
- Challenge: stragglers, failures
- Approaches to reduce latency:
  - Redundant computations
  - Speculative execution
- Research questions
  - Theoretical trade-off between response time and accuracy?
  - Achieve target latency and accuracy, while minimizing the overhead
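The overhead of redundant computation can be estimated with a back-of-the-envelope model. The sketch below assumes that each replica of a task independently meets the latency target with probability `p_on_time`, which is a strong assumption since stragglers are often correlated. The function name and parameters are hypothetical.

```python
def replicas_needed(p_on_time, target):
    """Smallest number of independent replicas k such that at least one
    finishes on time with probability >= target, i.e. 1 - (1 - p)^k >= target."""
    if not 0 < p_on_time <= 1 or not 0 <= target < 1:
        raise ValueError("need 0 < p_on_time <= 1 and 0 <= target < 1")
    k = 1
    while 1 - (1 - p_on_time) ** k < target:
        k += 1
    return k


# If a single copy meets the deadline 80% of the time, how many copies for 99%?
copies = replicas_needed(p_on_time=0.8, target=0.99)
```

The replica count is the overhead to minimize: every extra copy consumes cluster resources that another job could have used.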

