CS240B Midterm: Winter 2017 Your Name: and your ID:

Problem      Max score   Score
Problem 1    40%
Problem 2    36%
Problem 3    24%
Total        100%

Problem 1. 40%
[Figure: query graph with Source, operators A, B, C, D, buffers B1, B2, B3, and outputs Sink1 and Sink2]
A and B process 1000 tuples per second, whereas C processes 400 tuples per second and D processes 200 tuples per second. Say that initially buffer B1 contains 10,000 tuples whereas the other buffers are empty.
1. Compute the time T needed to complete the processing of these tuples (by a single processor).
2. Describe a scheduling inspired by the chain algorithm that will optimize the average usage of memory during time T, and illustrate your answer with a simple graph.
3. Define a scheduling that optimizes response time, i.e. minimizes the average time until tuples are delivered to the output. Illustrate your answer with a simple graph.
4. Assume that all the parameters of our problem remain the same except that we now assume that C is a selection operation that, on the average, filters out half of its input tuples: e.g., for 1000 input tuples only 500 are output to Sink1. Will any of your answers to questions 1-3 change under the new assumption? Please justify your answers, and give the new answer for those that have changed.
Answer: If there is no idle-waiting, the time is all spent in processing tuples. Thus if N1 = N2 = 0.5 N, these N tuples will be processed in time 0.5*N/1000 + 0.5*N/1000 + 0.5*N/500 + 0.5*N/1000 = 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2 = 0, so the union group processes the N tuples in source1 in N/1000 + N/1000 = N/500, much faster. We will break to optimize memory and the response time will suffer. Now say that N1 = 0, so the union group processes the N tuples in source2 in N/500 + N/1000 = 3*N/1000 = N/333. Then C

Problem 1 (40%), solutions to questions 1 and 2
A and B process 1000 tuples per second, whereas C processes 400 tuples per second and D processes 200 tuples per second. Initially buffer B1 contains 10,000 tuples and the other buffers are empty. Assume that A and B produce one output tuple for each input tuple.
1. Compute the time T needed to complete the processing of these tuples (by a single processor).
   T = 10000/1000 + 10000/1000 + 10000/400 + 10000/200 = 10 + 10 + 25 + 50 = 95s
2. Describe a scheduling inspired by the chain algorithm that will optimize the average usage of memory during time T, and illustrate your answer with a simple graph.
   The slope for A and B is flat, since they release no memory. Memory for tuples in B3 is released only after they have been processed by both C and D. Thus we have a single slope of 10000/95 tuples per second. The average memory is 5000 tuples over 95s, so the overall memory used is 5000*95 = 475,000 tuples*seconds, i.e. about 132 tuples for one hour.

A Naïve serial Schedule
[Figure: memory used vs. time, starting at 10000 tuples, with segments A (10s), B (10s), C (25s), D (50s); T=95s]

Alternative serial Schedules: Memory Used
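[Figure: memory used vs. time for alternative serial schedules, starting at 10000 tuples, with segments A (10s), B (10s), C (25s), D (50s); T=95s]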
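The memory comparison between schedules can be checked numerically. The following is a minimal sketch, not part of the original solution: it assumes that the naive serial schedule runs A, B, C and D to completion one after another (so memory stays at 10,000 tuples until D starts releasing them), and that the chain-inspired schedule pushes each tuple through A, B, C and D before taking the next one (so memory falls linearly to zero over T). The function names are illustrative.

```python
# Rates (tuples/second) and the initial buffer size come from the problem statement.
RATES = {"A": 1000, "B": 1000, "C": 400, "D": 200}
N = 10_000


def stage_times(n=N, rates=RATES):
    """Seconds each operator needs to process n tuples on its own."""
    return {op: n / rate for op, rate in rates.items()}


def total_time(n=N, rates=RATES):
    """Single-processor completion time T: the sum of all stage times."""
    return sum(stage_times(n, rates).values())


def naive_serial_area(n=N, rates=RATES):
    """Tuple-seconds for A, B, C, D run one after another (assumed schedule).

    Memory stays at n while A, B and C run, because a tuple in B3 is freed
    only after both C and D have consumed it; it then falls linearly during D.
    """
    t = stage_times(n, rates)
    flat = (t["A"] + t["B"] + t["C"]) * n
    ramp = t["D"] * n / 2
    return flat + ramp


def pipelined_area(n=N, rates=RATES):
    """Tuple-seconds when memory falls linearly from n to 0 over T (assumed)."""
    return total_time(n, rates) * n / 2


if __name__ == "__main__":
    print("T =", total_time(), "s")
    print("naive serial:", naive_serial_area(), "tuple-seconds")
    print("chain-inspired:", pipelined_area(), "tuple-seconds")
    print("chain-inspired, in tuple-hours:", pipelined_area() / 3600)
```

Under these assumptions the naive serial schedule uses 700,000 tuple-seconds against 475,000 for the chain-inspired one.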

Problem 1 (40%), solutions to questions 3 and 4
A and B process 1000 tuples per second, whereas C processes 400 tuples per second and D processes 200 tuples per second. Initially buffer B1 contains 10,000 tuples and the other buffers are empty.
3. Define a scheduling that optimizes the response time, i.e. minimizes the average time until tuples are delivered to the output. Illustrate your answer with a simple graph.
   The path A>B>(C||D) delivers 20,000 tuples in T=95s. The path A>B>C delivers 10,000 tuples to Sink1 in 45s. The path A>B>D delivers 10,000 tuples in 70s. Now: 10000/45 = 20000/90 > 20000/95. Thus the processing is done in two phases: A>B>C first and D next.
4. Assume that all the parameters of our problem remain the same except that we now assume that C is a selection operation that, on the average, filters out half of its input tuples: e.g., for 1000 input tuples only 500 are output to Sink1. Will any of the answers to questions 1-3 change under the new assumption? Please justify your answer, and give the new answer for those that have changed.
   Neither the processing times nor the speeds of memory release for the input buffers have changed, so the answers to questions 1 and 2 have not changed. However, the output productivity of C has changed: A>B>C now delivers 5000 tuples in 45s, i.e. 5000/45 ≈ 111 tuples per second. Moreover, A>B>D delivers 10000/(10+10+50) = 10000/70 ≈ 143 tuples per second. Finally, A>B>(C||D) delivers 15000/95 ≈ 157.9 tuples per second. This last one wins, so the processing is A>B>(C||D).

Problem 1, question 3: Minimizing Latency
[Figure: schedule timeline with segments A (10s), B (10s), C (25s), then D; output levels 10000 and 20000; T=95s]

Question 4: C is a selection operation that, on the average, filters out half of its input tuples: e.g., for 10,000 input tuples only 5000 are output to Sink1.
[Figure: schedule timeline with segments A (10s), B (10s), C (25s), D (50s); output level 15000; T=95s]
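The comparison of delivery rates in questions 3 and 4 can be reproduced with a few lines of arithmetic. This is a sketch under the assumptions of the problem (10,000 tuples, one output per input for A, B and D); the `selectivity_c` parameter and the function names are illustrative and model the fraction of tuples that C lets through.

```python
# Rates (tuples/second) and the initial buffer size come from the problem statement.
RATES = {"A": 1000, "B": 1000, "C": 400, "D": 200}
N = 10_000

# Candidate execution paths and the operators each one runs.
PATHS = {"A>B>C": "ABC", "A>B>D": "ABD", "A>B>(C||D)": "ABCD"}


def delivery_rates(selectivity_c=1.0):
    """Output tuples delivered per second of processing, for each path."""
    outputs = {"C": N * selectivity_c, "D": N}  # only C and D feed a sink
    rates = {}
    for name, ops in PATHS.items():
        elapsed = sum(N / RATES[op] for op in ops)
        delivered = sum(outputs[op] for op in ops if op in outputs)
        rates[name] = delivered / elapsed
    return rates


def best_path(selectivity_c=1.0):
    """Path with the highest delivery rate."""
    rates = delivery_rates(selectivity_c)
    return max(rates, key=rates.get)


if __name__ == "__main__":
    print(delivery_rates(), "->", best_path())          # question 3
    print(delivery_rates(0.5), "->", best_path(0.5))    # question 4
```

With full selectivity the path A>B>C has the highest rate, so it is scheduled first; with C filtering out half of its input, the combined path A>B>(C||D) comes out ahead.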

Problem 2: 36%
Sensors in a building generate the stream sensd(ID, Etype, Timestamp), which describes when people (identified by their IDs) triggered the following types of sensor-detected events:
enter: the person enters the building
exit: the person exits the building
wlab: the person visits a wet lab
dlab: the person visits a decontamination lab
Of course, there are other types of labs and events in our stream, but we are focusing on these four because the strict_rule is that everybody who entered the building and then visited one or more wet labs must go through a decontamination lab as the very last event before he or she exits the building. Thus, for people who visited some wlab, dlab must be their very last event before exit.
Problem 2a: Write an SQL-MR query to detect all the people who satisfied the strict_rule by not visiting any wlab while in the building (i.e. after their enter event and before their exit event). Your query should return the ID of those people and the time when they exited the building.
Problem 2b: Write an SQL-MR query to detect all the people who actually violated the strict_rule, i.e., people who visited a wlab and then exited the building, but whose last action before that was not a visit to a dlab. Your query should return the ID of those people and the time they exited the building.
Problem 2c: Are either of these two queries blocking, or are they non-blocking, or something in between? (Hint: The solution of Problem 3 might also be useful in answering this question.)

2a: no wlab. 2b: wlab and no dlab before exit.

%Problem 2a:
Select A_ID, exitTime
from sensd match_recognize (
    partition by ID
    order by Timestamp
    measures A.ID as A_ID, C.Timestamp as exitTime
    one row per match
    after match skip past last row
    maximal match
    pattern (A B* C)
    define A as (A.Etype = 'enter'),
           B as (B.Etype <> 'wlab'),
           C as (C.Etype = 'exit')
)

%Problem 2b:
Select A_ID, exitTime
from sensd match_recognize (
    partition by ID
    order by Timestamp
    measures A.ID as A_ID, C.Timestamp as exitTime
    one row per match
    after match skip past last row
    maximal match
    pattern (A X* V Y* C)
    define A as (A.Etype = 'enter'),
           V as (V.Etype = 'wlab'),
           C as (C.Etype = 'exit' and previous(C.Etype) <> 'dlab')
)

Problem 2b: another solution

Select A_ID, exitTime
from sensd match_recognize (
    partition by ID
    order by Timestamp
    measures A.ID as A_ID, C.Timestamp as exitTime
    one row per match
    after match skip past last row
    maximal match
    pattern (A X* V+ C)
    define A as (A.Etype = 'enter'),
           V as (first(V.Etype) = 'wlab' and last(V.Etype) <> 'dlab'),
           C as (C.Etype = 'exit')
)
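Row-pattern queries of this kind behave much like regular expressions over the ordered rows of each partition. The sketch below emulates the two patterns with Python's `re` module by encoding each event as one letter; it is an illustration of the idea, not an SQL-MR implementation. The letter codes, the `O` code for any other event type, and the choice to exclude enter and exit from the wildcard parts (so that one match stays within one visit) are assumptions of this sketch.

```python
import re
from collections import defaultdict

# One letter per event type; anything else is encoded as "O" (other).
CODES = {"enter": "E", "exit": "X", "wlab": "W", "dlab": "D"}

# 2a: enter, then no wlab (and no new enter/exit), then exit.
NO_WLAB = re.compile(r"E[^EXW]*X")
# 2b: enter, ..., wlab, ..., exit where the event just before exit is not dlab.
# The negative lookbehind plays the role of previous(C.Etype) <> dlab.
VIOLATION = re.compile(r"E[^EX]*W[^EX]*(?<!D)X")


def find_exits(events, pattern):
    """Return (ID, exit timestamp) for every match of pattern.

    events is an iterable of (ID, Etype, Timestamp) tuples in any order.
    """
    by_id = defaultdict(list)  # emulates "partition by ID"
    for pid, etype, ts in events:
        by_id[pid].append((ts, etype))
    hits = []
    for pid, rows in by_id.items():
        rows.sort()  # emulates "order by Timestamp"
        encoded = "".join(CODES.get(etype, "O") for _, etype in rows)
        for match in pattern.finditer(encoded):
            exit_ts = rows[match.end() - 1][0]  # the last matched row is the exit
            hits.append((pid, exit_ts))
    return hits


if __name__ == "__main__":
    sample = [
        (1, "enter", 1), (1, "office", 2), (1, "exit", 3),
        (2, "enter", 1), (2, "wlab", 2), (2, "dlab", 3), (2, "exit", 4),
        (3, "enter", 1), (3, "wlab", 2), (3, "dlab", 3), (3, "office", 4), (3, "exit", 5),
    ]
    print(find_exits(sample, NO_WLAB))
    print(find_exits(sample, VIOLATION))
```

Because `finditer` resumes after the end of each match, it also mirrors the "after match skip past last row" option.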

Problem 3: 24%
Here too we have the stream sensd(ID, Etype, Timestamp), as in Problem 2. But here your solutions must use UDAs (written using SQL, as we have done in many examples). The SQL query calling those UDAs is:
select ID, Timestamp, uda3x(Etype) over (partition by ID order by Timestamp rows unbounded preceding) from sensd
Please write the following UDAs to detect all occurrences of the pattern of interest, not just the first:
Problem 3a: Write uda3a to solve Problem 2b. When detecting a violation of the strict rule, your uda3a should return the message "strict_rule_violated".
Problem 3b: Write uda3b to solve the extended version of Problem 2a, whereby uda3b will produce the message "strict_rule_observed" for people who never visited a wlab and for those who visited a dlab just before exiting the building.

3a: wlab and no dlab just before exit
Problem 3a: Write uda3a to solve Problem 2b. When detecting a violation of the strict rule, your uda3a should return the message "strict_rule_violated".

aggregate uda3a(Next): char
{   table memo(Wlab int, Previous char);
    Initialize: Iterate: {
        insert into memo values (0, 'enter') where Next = 'enter';
        /* report a violation on exit, before Previous is overwritten */
        insert into return values ('strict_rule_violated')
            where Next = 'exit' and 1 = (select Wlab from memo)
                  and 'dlab' <> (select Previous from memo);
        /* ignore events between enter and wlab */
        update memo set Previous = Next where 1 = (select Wlab from memo);
        update memo set Wlab = 1
            where Next = 'wlab' and 'enter' = (select Previous from memo);
        /* reset the state once the person has left */
        update memo set Wlab = 0, Previous = 'exit' where Next = 'exit';
    }
    return: {}
}

3b: no wlab, or dlab just before exit
Problem 3b: Write uda3b to solve the extended version of Problem 2a, whereby uda3b will produce the message "strict_rule_observed" for people who never visited a wlab and for those who visited a dlab just before exiting the building.

aggregate uda3b(Next): char
{   table memo(Wlab int, Previous char);
    Initialize: Iterate: {
        insert into memo values (0, 'enter') where Next = 'enter';
        /* report compliance on exit, before Previous is overwritten */
        insert into return values ('strict_rule_observed')
            where Next = 'exit' and (0 = (select Wlab from memo)
                  or 'dlab' = (select Previous from memo));
        /* ignore events between enter and wlab */
        update memo set Previous = Next where 1 = (select Wlab from memo);
        update memo set Wlab = 1
            where Next = 'wlab' and 'enter' = (select Previous from memo);
        /* reset the state once the person has left */
        update memo set Wlab = 0, Previous = 'exit' where Next = 'exit';
    }
    return: {}
}
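The state kept by these UDAs is small: a flag recording whether a wlab has been visited since the last enter, and the previous event. The Python class below is a hedged sketch of that logic as a per-person state machine; it is not ESL or SQL, and the class and function names are illustrative. It emits both messages, so it covers uda3a and uda3b together, and it assumes that every exit follows an enter for the same person.

```python
class StrictRuleState:
    """Per-person state, mirroring the memo(Wlab, Previous) table."""

    def __init__(self):
        self.wlab = 0          # 1 once a wlab has been visited in this visit
        self.previous = None   # the event seen just before the current one

    def iterate(self, next_event):
        """Consume one event; return a message on exit, otherwise None."""
        message = None
        if next_event == "enter":
            self.wlab = 0
        elif next_event == "wlab":
            self.wlab = 1
        elif next_event == "exit":
            # Check the rule before self.previous is overwritten below.
            if self.wlab == 1 and self.previous != "dlab":
                message = "strict_rule_violated"
            else:
                message = "strict_rule_observed"
            self.wlab = 0
        self.previous = next_event
        return message


def run(events):
    """Apply the state machine per ID, in timestamp order.

    events is an iterable of (ID, Etype, Timestamp); the result lists
    (ID, Timestamp, message) for every exit event.
    """
    states = {}
    results = []
    for pid, etype, ts in sorted(events, key=lambda e: e[2]):
        state = states.setdefault(pid, StrictRuleState())
        message = state.iterate(etype)
        if message is not None:
            results.append((pid, ts, message))
    return results


if __name__ == "__main__":
    sample = [
        (1, "enter", 1), (1, "exit", 2),
        (2, "enter", 3), (2, "wlab", 4), (2, "dlab", 5), (2, "exit", 6),
        (3, "enter", 7), (3, "wlab", 8), (3, "exit", 9),
    ]
    for row in run(sample):
        print(row)
```

A message is produced as soon as the exit event arrives, for every visit, which is what "all occurrences, not just the first" asks for.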

Problem 2c
Blocking operator: one that returns no answer until it has seen the end of the input, e.g. max. Even if we have actually seen the whole input, we can only return the result once we are told that there is no more.
Nonblocking operator: one that returns the whole answer before it has seen the end of the input, i.e. the knowledge that there is no more input will not add anything to the answer.
In our example: enter, ..., dlab, exit, …, enter, ..., dlab, … | eof
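The difference between the two kinds of operator can be made concrete by counting how many input items an operator consumes before it emits its first result. The sketch below uses Python generators for this; the function names are illustrative, and the exit detector is only a stand-in for a pattern query that reports each exit as it arrives.

```python
def blocking_max(stream):
    """Blocking: the maximum is known only once the input has ended."""
    best = None
    for item in stream:
        if best is None or item > best:
            best = item
    yield best  # emitted only after the loop has exhausted the input


def nonblocking_exits(stream):
    """Nonblocking: each exit is reported as soon as it is seen."""
    for etype in stream:
        if etype == "exit":
            yield etype


def consumed_before_first_output(operator, items):
    """Number of input items read before the operator emits its first result."""
    consumed = 0

    def counted():
        nonlocal consumed
        for item in items:
            consumed += 1
            yield item

    next(operator(counted()))
    return consumed


if __name__ == "__main__":
    print(consumed_before_first_output(blocking_max, [3, 1, 2]))
    print(consumed_before_first_output(
        nonblocking_exits, ["enter", "dlab", "exit", "enter", "dlab"]))
```

The blocking operator reads all of its input first, while the exit detector answers after the third of five events, without waiting for the end of the stream.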
