UCLA, Winter 2017. Sample from CS240B Past Midterms


1 UCLA, Winter 2017. Sample from CS240B Past Midterms

2 Problem A. 36%
Top path: Source -> A -> B -> C. Bottom path: Source -> D -> E -> F.
The times required by these operators to process one tuple are: A, B, D, E: 10 ms; C: 20 ms; F: 30 ms. A, C, and D are selection operators that send 50% of their input tuples to output and discard the rest. The other operators let 100% of their input through.
A1: Show how the execution is scheduled to minimize average tuple latency, and show the resulting latency diagram, assuming that initially you have 1000 tuples in each input.
Problem A1, latency minimization: A+B+C takes 10 + 5 + 10 = 25 s to complete and delivers 250 tuples, i.e., 10 tuples per second. D+E+F takes 10 + 5 + 15 = 30 s to complete and delivers 500 tuples, i.e., 16.7 tuples per second. Thus, to minimize latency, the second path is executed first.
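The A1 arithmetic can be checked with a short Python sketch. The function name `path_stats` and the `(cost_ms, selectivity)` encoding are illustrative assumptions, not part of the exam; the sketch also assumes each operator processes its whole input before the next one starts.

```python
def path_stats(n_tuples, ops):
    """Return (total seconds, tuples delivered, tuples delivered per second).

    ops is a list of (cost_ms_per_tuple, selectivity) pairs in pipeline order.
    Hypothetical helper: the encoding is an assumption made for this sketch.
    """
    total_s = 0.0
    n = float(n_tuples)
    for cost_ms, selectivity in ops:
        total_s += n * cost_ms / 1000.0   # time this operator spends on its input
        n *= selectivity                  # tuples that survive to the next operator
    return total_s, n, n / total_s


# A, B, C: 10 ms (50%), 10 ms (100%), 20 ms (50%)
top = path_stats(1000, [(10, 0.5), (10, 1.0), (20, 0.5)])
# D, E, F: 10 ms (50%), 10 ms (100%), 30 ms (100%)
bottom = path_stats(1000, [(10, 0.5), (10, 1.0), (30, 1.0)])

# The path with the higher output rate is scheduled first.
first = "D+E+F" if bottom[2] > top[2] else "A+B+C"
print(top, bottom, first)
```

The path with the higher delivery rate goes first because it reduces the number of missing answers fastest.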

3 Problem A3: Maximize memory release first and minimize latency second.
Top path: Source -> A -> B -> C. Bottom path: Source -> D -> E -> F.
The times required by these operators to process one tuple are: A, B, D, E: 10 ms; C: 20 ms; F: 30 ms. A, C, and D are selection operators that send 50% of their input tuples to output and discard the rest. The other operators let 100% of their input through.
A2: Show how the execution will be scheduled to minimize memory. While memory release is the primary objective, you should still use latency reduction as your secondary one. Show the memory release diagram.
Problem A2: A has a memory release rate (mrr) of 500 tuples in 10 s, i.e., 50 tuples per second. B alone has mrr = 0, and B+C has mrr = 1000/30 = 500/15 = 33.33 tuples per second. So the top path is broken after A. The bottom path is also broken after D (D has mrr = 50 tuples/s). E alone has mrr = 0, and E+F has mrr = 1000/40 = 500/20 = 25 tuples/s. Thus, the overall schedule that maximizes mrr is: A and D first (in either order), then B+C, and finally E+F.
Problem A3: The mrr determines the schedule, leaving no room for latency optimization. But if F took 20 ms per tuple, E+F would have the same mrr as B+C (500/15 = 33.33 tuples/s), and then E+F would go before B+C, which only returns 50% of its tuples.
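The memory release rates used in A2 can be reproduced with a small Python sketch. It assumes that a tuple leaves memory either when a selection discards it or when the last operator of a path delivers it to the output; `segment_mrr` and its parameters are hypothetical names introduced here.

```python
def segment_mrr(n_in, ops, ends_at_output):
    """Memory release rate (tuples/s) of a segment fed with n_in tuples.

    ops: list of (cost_ms_per_tuple, selectivity) pairs.
    ends_at_output: True if the segment's survivors are delivered to the
    output (and so also leave memory). Assumption made for this sketch.
    """
    time_s = 0.0
    n = float(n_in)
    for cost_ms, selectivity in ops:
        time_s += n * cost_ms / 1000.0
        n *= selectivity
    released = n_in if ends_at_output else n_in - n
    return released / time_s


rates = {
    "A": segment_mrr(1000, [(10, 0.5)], False),
    "B+C": segment_mrr(500, [(10, 1.0), (20, 0.5)], True),
    "D": segment_mrr(1000, [(10, 0.5)], False),
    "E+F": segment_mrr(500, [(10, 1.0), (30, 1.0)], True),
}
b_alone = segment_mrr(500, [(10, 1.0)], False)                          # B releases nothing
whole_top = segment_mrr(1000, [(10, 0.5), (10, 1.0), (20, 0.5)], True)  # A+B+C unbroken

# Schedule segments by decreasing mrr; sorted() is stable, so A stays ahead of D.
order = sorted(rates, key=rates.get, reverse=True)
print(rates, order)
```

Under these assumptions the unbroken top path releases 1000 tuples in 25 s (40 tuples/s), which is lower than A alone at 50 tuples/s; this is why the path is broken after A.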

4 B2. It is partially blocking; see picture above.
Problem B, 32%: TSQL2 introduced a new temporal aggregate that, given empsal(Eno, Sal, Tstart, Tend), returns for each employee the periods in which his/her salary was growing. Here, instead of those periods, we want to know the total raises during those periods. Salaries can increase or decrease, but an employee cannot be rehired.
B1. Write the aggregate totraise(Sal): DeltaSal to perform such computation on the input stream empsal(Eno, Sal, Tstart, Tend) ordered by Tstart and partitioned by Eno.
[Figure: salary (Y-axis) over time (X-axis), with time points TA, TB, TC, TD.]
We want the difference between the salaries at TC and TA, and we can return that value when we see the third segment. But if that segment is missing, we have to wait, possibly until Terminate. Punctuation timestamps could be useful here.
Solution: B2. It is partially blocking; see picture above.

5 Problem B, 32%: TSQL2 introduced a new temporal aggregate that, given empsal(Eno, Sal, Tstart, Tend), returns for each employee the periods in which his/her salary was growing. Here, instead of the periods, we want to know the total raises during those periods. Salaries can increase or decrease, but an employee cannot be rehired.
B1. Write the aggregate totraise(Sal): DeltaSal to perform such computation on the input stream empsal(Eno, Sal, Tstart, Tend) ordered by Tstart and partitioned by Eno.
aggregate totraise(Sal real): (DeltaSal real)
{ table history(Previous real, Initial real);
  Initialize: {
    insert into history values (Sal, Sal);
  }
  Iterate: {
    % beginning of a growing phase: the raise is measured from the salary before the growth
    update history set Initial=Previous, Previous=Sal where Sal > Previous and Initial = 0;
    % a growing phase continues
    update history set Previous=Sal where Sal > Previous and Initial > 0;
    % a growing phase ends: return its total raise
    insert into return select Previous - Initial from history where Sal < Previous and Initial > 0 and Previous > Initial;
    % a shrinking phase begins or continues
    update history set Previous=Sal, Initial=0 where Sal < Previous;
  }
  Terminate: {
    % the last growing phase, if any, is only known to be complete here
    insert into return select Previous - Initial from history where Initial > 0 and Previous > Initial;
  }
}
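To sanity-check the intended logic of the aggregate, here is a minimal Python sketch of the same state machine for a single employee. It is an illustration under assumptions, not the course's solution: `total_raises` is a hypothetical name, `None` plays the role of the Initial=0 flag, the raise of a growing phase is measured from the last salary before the growth, phases with no actual growth are not reported, and the input is assumed to be a non-empty list ordered by Tstart.

```python
def total_raises(salaries):
    """Return the total raise of each growing phase in a salary sequence.

    salaries: non-empty list of one employee's salaries ordered by Tstart
    (an assumption of this sketch).
    """
    it = iter(salaries)
    previous = next(it)
    initial = previous            # Initialize: a growing phase may start at once
    out = []
    for sal in it:                # Iterate
        if sal > previous:
            if initial is None:   # growth starts from the salary before it
                initial = previous
            previous = sal
        elif sal < previous:
            if initial is not None and previous > initial:
                out.append(previous - initial)   # a growing phase just ended
            previous = sal
            initial = None        # now in a shrinking phase
    # Terminate: a growing phase still open at end of input is flushed here
    if initial is not None and previous > initial:
        out.append(previous - initial)
    return out


print(total_raises([100, 110, 120, 90, 95, 100]))
```

The first raise is emitted as soon as the drop to 90 arrives, while the second is emitted only at end of input, which is the partially blocking behavior discussed in B2.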

6 Problem C. 32%
Another solution to the previous problem is to have a stream of events such as evnt(Eno, Time, Sal), where Sal=0 denotes that the employee just quit the company, whereas Sal>0 denotes the salary of a just-hired employee, or the salary just updated for a current employee.
C1. Write a SQL-MR query to express the raise computation described in B. Remember that salaries can increase or decrease.
C2. Determine the blocking/non-blocking properties of your query, assuming that the history of each employee in evnt terminates with a quit (Sal=0) tuple.
C2. In the previous problem, at time TC we do not know whether we are going to see another tuple for the same employee. But if Sal=0, the employee just quit and we know that no more tuples are coming: blocking is no longer needed!
C1.
select eno, FnlSal - InitSal as Raise
from evnt match-recognize(
  partition by Eno, ordered by Time
  measures X.Eno as eno, X.Sal as InitSal, previous(Z.Sal) as FnlSal
  One row per match
  After match skip past last row
  Maximal match
  pattern (X Y* Z)
  define Y as Y.Sal > previous(Y.Sal)
)

7 Extra Problem B
[Figure: Layout I, showing Source1, Source2, Source3, operators A, B, and C, and the Sink.]
The processing speeds of these operators are: A: 100 tuples/sec; B: 100 tuples/sec; C: 25 tuples/sec.
Consider Layout I and assume that the buffers feeding A, B, and C contain respectively 1000 tuples, 400 tuples, and 600 tuples. Also, B and C deliver one output tuple for each input tuple, while A is a selection that eliminates about 50% of its input tuples.
C1: In which order should the operators be scheduled to minimize the average memory usage? Illustrate your answer with a memory diagram and estimate the resulting average memory usage, assuming that each tuple takes 100 bytes.
C2: In which order should the operators be scheduled to minimize the average latency, measured in the number of missing query answers? Illustrate your answer with a latency diagram that shows the missing tuples over time and estimate the resulting average latency.
C1: A and B have the same memory release rate (100 tuples/sec), so they form a first segment that processes 1400 tuples in 14 seconds; C follows.
Area(A+B) = 14*1400/2 = 9800
Area of rectangle (the 600 tuples waiting for C): 600*14 = 8400
C: 600 tuples in 600*4/100 = 24 sec. Area(C) = 24*600/2 = 7200
Total area = 9800 + 8400 + 7200 = 25400 tuples*sec. Average: 25400/38 = 668 tuples, i.e., about 66,800 bytes at 100 bytes per tuple.
Answer: If there is no idle-waiting, the time is all spent in processing tuples. Thus, if N1 = N2 = 0.5*N, these N tuples will be processed in time 0.5*N/1000 + 0.5*N/1000 + 0.5*N/500 + 0.5*N/1000 = 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2 = 0, so the union group processes the N tuples in source1 in N/1000 + N/1000 = N/500, much faster. We will break to optimize memory and the response time will suffer. Now say that N1 = 0, so the union group processes the N tuples in source2 in N/500 + N/1000 = 3*N/1000 = N/333. Then C
C2: A delivers 50 answer tuples per sec. B delivers 100 per sec, so it goes first and completes in 4 sec. A goes next and completes in 10 sec. C goes last and completes in 24 sec. Average tuples missing: 4*( ) + 10*( ) + 24*300 = 23,000. On average, 605 tuples are missing over 38 sec.
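The C1 memory-diagram area can be recomputed with a brief Python sketch. It assumes, as the C1 answer does, that A and B together drain their 1400 buffered tuples at 100 tuples/sec and that memory falls linearly within each segment; `average_memory` is a hypothetical helper name.

```python
def average_memory(segments):
    """Return (area in tuple-seconds, total seconds, average tuples in memory).

    segments: list of (buffered_tuples, release_rate_tuples_per_s) in
    schedule order. Memory is assumed to fall linearly within each segment.
    """
    remaining = float(sum(n for n, _ in segments))
    area = 0.0
    total_time = 0.0
    for n, rate in segments:
        duration = n / rate
        # trapezoid: memory goes from `remaining` down to `remaining - n`
        area += duration * (remaining + (remaining - n)) / 2.0
        remaining -= n
        total_time += duration
    return area, total_time, area / total_time


# A and B first (1400 tuples at 100/s), then C (600 tuples at 25/s)
area, seconds, avg_tuples = average_memory([(1400, 100), (600, 25)])
avg_bytes = avg_tuples * 100      # 100 bytes per tuple, as stated in C1
print(area, seconds, round(avg_tuples), round(avg_bytes))
```

The first trapezoid equals the triangle (9800) plus the rectangle (8400) in the worked answer, and the second equals the triangle for C (7200).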

