Data flow analysis in the Processor Farmlet: Transient Behavior of the M/M/1 process. Gustavo Cancelo, 3/25/2002


1 Data flow analysis in the Processor Farmlet: Transient Behavior of the M/M/1 process

2 Review of a stationary M/M/1 queue
The input to and the output from the queue are Poisson distributed.
– λ: input data rate (w/s, pix/BCO)
– μ: output data rate
– ρ = λ/μ: traffic intensity
[Figure: state diagram with states S0, S1, S2, ...]
[Steady State equations]
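For reference, the steady-state relations of a stationary M/M/1 queue can be written in the usual textbook form below. The slide's own equations are not preserved in this transcript, so the notation here is an assumption: P_k is the probability of finding k items in the queue and \bar{N} is the mean number of items.

```latex
% Balance equations for the birth-death chain S0, S1, S2, ...
\lambda P_0 = \mu P_1, \qquad
(\lambda + \mu) P_k = \lambda P_{k-1} + \mu P_{k+1}, \quad k \ge 1

% Solution for \rho = \lambda / \mu < 1
P_k = (1 - \rho)\,\rho^{k}, \qquad
\bar{N} = \sum_{k=0}^{\infty} k P_k = \frac{\rho}{1 - \rho}
```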

3 Problem statement
A Farmlet has N processing nodes. When one node fails, the remaining N-1 nodes will see an increase in the average data flow they must process. Several questions come up:
– What are the data flow dynamics in the remaining N-1 working processors?
– Is the new system stable or do we need to throttle data?
– If the node's failure can be fixed by reinitializing the processor, how much time do we have before running into problems with the other nodes?
– How much processing idle-time do we need to provide to each processor to prevent data loss by instability (i.e. overflow) during a fault?
Note that processing idle-time translates almost linearly into $$$.

4 Problem statement (cont.)
[Figure: Before failure (BF), as t → ∞, the Buffer Manager receives λ_i and feeds each of the N nodes with λ_o = λ_i/N. After failure (AF), as t → ∞, the failed node receives λ_o = 0 and each remaining node receives λ_o = λ_i/(N-1).]
[Equations (1a), (1b), (1c), (1d)]
If ρ_AF > 1 the system becomes unstable and must throttle data in the long term. Here the term "unstable" implies that N-1 nodes cannot process the average input rate and in the "long term" (i.e. after 4 or 5 time constants of the system dynamics) the buffers will grow unbounded. Equations 1b, 1c, and 1d are valid only if the system remains stable.
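The before/after rates above imply that the per-node traffic intensity grows by a factor N/(N-1) when one node fails, so the system stays stable only if the pre-fault intensity is below (N-1)/N. A minimal Python sketch of that arithmetic follows; the function names are illustrative and not taken from the slides, and the service rate of each node is assumed unchanged by the fault.

```python
def post_fault_intensity(rho_before: float, n_nodes: int) -> float:
    """Per-node traffic intensity after one of n_nodes fails.

    Assumes the input rate is shared equally, so each node's load
    goes from lambda_i / N to lambda_i / (N - 1).
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes to survive one failure")
    return rho_before * n_nodes / (n_nodes - 1)


def max_stable_intensity(n_nodes: int) -> float:
    """Largest pre-fault intensity that keeps the post-fault intensity below 1."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes to survive one failure")
    return (n_nodes - 1) / n_nodes


for n in (4, 6):
    print(n, round(max_stable_intensity(n), 2), round(post_fault_intensity(0.9, n), 3))
```

With N = 4 the threshold is 0.75 and with N = 6 it is about 0.83; a node running at ρ = 0.9 before the fault ends up above 1 in both cases.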

5 Transient M/M/1
The M/M/1 process dynamics can be modeled by a differential-difference equation:
[Equation (2)]
This equation shows that the M/M/1 state space is continuous in the time domain and discrete in the state that describes the buffer size. To solve (2) we must use both the Laplace and the Z-transforms:
[Equation (3)]
where i represents the buffer size at t_0 and P_0*(s) is the Laplace transform of the initial state.
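The slide's equations (2) and (3) are not preserved in this transcript. As a hedged reference, the standard textbook form of the M/M/1 differential-difference equations and of their transform is sketched below. The notation is an assumption: P_k(t) is the probability of k items at time t, P(z,t) is the z-transform over k, a star marks a Laplace transform in t, and here P_0^*(s) is taken to be the transform of the empty-queue probability P_0(t).

```latex
% Differential-difference (Kolmogorov forward) equations
\frac{dP_k(t)}{dt} = -(\lambda + \mu) P_k(t) + \lambda P_{k-1}(t) + \mu P_{k+1}(t), \quad k \ge 1
\frac{dP_0(t)}{dt} = -\lambda P_0(t) + \mu P_1(t)

% z-transform over k, with P(z,t) = \sum_k P_k(t) z^k
z \frac{\partial P(z,t)}{\partial t} = (1 - z)\left[ (\mu - \lambda z) P(z,t) - \mu P_0(t) \right]

% Laplace transform over t, starting with i items in the buffer
P^{*}(z,s) = \frac{z^{\,i+1} - \mu (1 - z) P_0^{*}(s)}{s z - (1 - z)(\mu - \lambda z)}
```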

6 Transient M/M/1 (continued)
The solution to (3) is:
[Equation (4)]
where I_k(x) is the modified Bessel function of the first kind. Equation (4) not only includes Bessel functions but also an infinite sum of them! The increasing and decreasing exponential terms in P_k(t) generate numerical problems when we calculate P_k(t) for t → ∞.
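Equation (4) itself is missing from this transcript. The classical textbook transient solution has the shape sketched below and matches the description (Bessel functions, an infinite sum, a decaying exponential multiplying growing Bessel terms). It is offered as an assumption about what the slide showed, with i the initial buffer size and a = 2μ√ρ.

```latex
P_k(t) = e^{-(\lambda + \mu) t} \Big[
    \rho^{(k-i)/2} I_{k-i}(a t)
  + \rho^{(k-i-1)/2} I_{k+i+1}(a t)
  + (1 - \rho)\,\rho^{k} \sum_{j = k+i+2}^{\infty} \rho^{-j/2} I_j(a t)
\Big],
\qquad a = 2 \mu \sqrt{\rho}
```

The factor e^{-(λ+μ)t} shrinks while each I_j(at) grows roughly like e^{at}, which is the source of the numerical trouble at large t.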

7 Transient M/M/1 (continued)
P_k(t) represents the entire distribution at each time instant. If we let t → ∞, (4) provides the M/M/1 steady state equations. Now we'll focus on P_k(t)'s 1st moment, P_mean(t). There isn't a closed expression for P_mean(t). Standard methods use numeric integration in the time domain or in the transformed domain. An alternative is approximation of P_mean(t) with simpler functions.
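One way to picture the "numeric integration in the time domain" route is to integrate the differential-difference equations directly on a truncated state space and take the first moment at each step. The Python sketch below does this with a fixed-step RK4 scheme; the truncation size, step size and function name are assumptions made for illustration, not the method used in the presentation.

```python
def mm1_transient_mean(lam, mu, t_end, i0=0, n_states=100, dt=None):
    """Mean queue length of an M/M/1 queue over time.

    Integrates the forward equations for P_k(t), k = 0..n_states-1,
    starting with i0 items in the buffer. Returns (times, means).
    """
    if dt is None:
        dt = 0.1 / (lam + mu)  # well inside the RK4 stability region

    def deriv(p):
        d = [0.0] * n_states
        for k in range(n_states):
            rate_out = (lam if k < n_states - 1 else 0.0) + (mu if k > 0 else 0.0)
            d[k] = -rate_out * p[k]
            if k > 0:
                d[k] += lam * p[k - 1]      # arrival from state k-1
            if k < n_states - 1:
                d[k] += mu * p[k + 1]       # departure from state k+1
        return d

    p = [0.0] * n_states
    p[i0] = 1.0
    steps = int(round(t_end / dt))
    times, means = [0.0], [float(i0)]
    for n in range(1, steps + 1):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * dt * y for x, y in zip(p, k1)])
        k3 = deriv([x + 0.5 * dt * y for x, y in zip(p, k2)])
        k4 = deriv([x + dt * y for x, y in zip(p, k3)])
        p = [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
        times.append(n * dt)
        means.append(sum(k * pk for k, pk in enumerate(p)))
    return times, means


# Example with rho = 0.9 and mu = 3.3 events/ms (time in ms).
times, means = mm1_transient_mean(lam=0.9 * 3.3, mu=3.3, t_end=5.0)
print(round(means[-1], 3))
```

The truncation must be large enough that the probability of reaching the last state is negligible; for ρ close to 1 that means more states and longer runs.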

8 Optimal Least-Squares Approximation to P_mean(t)
Assuming that P_mean(t) is stable (i.e. ρ = λ/μ < 1), it can be demonstrated that P_mean(t) is a non-decreasing function in t with exponential behavior. Hence it makes sense to approximate it with a function like
[Equation (5)]
The approximation is done using the L2 norm
[Equation (6)]
The approximation involves inserting equation (5) into (6), differentiating, and equating the result to zero to solve for the optimal coefficients. Since q_n(t) is an infinite series it must be truncated. The truncation defines our model, that is, the order of the approximation and the number of coefficients we need to solve for.
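Equations (5) and (6) are not preserved here, so the following Python sketch only illustrates the general idea under a stated assumption: the first-order model is taken to be q_1(t) = q_inf · (1 − exp(−b_1 t)), and b_1 is chosen to minimize the sum of squared errors against sampled values of the mean. The functional form, the golden-section search and all names are illustrative choices, not the presentation's derivation.

```python
import math


def fit_first_order(times, values, q_inf, b_lo=1e-6, b_hi=10.0, iters=80):
    """Least-squares fit of q1(t) = q_inf * (1 - exp(-b1 * t)); returns b1.

    Uses a golden-section search on the sum of squared errors, which
    assumes the error has a single minimum between b_lo and b_hi.
    """
    def sse(b):
        return sum((v - q_inf * (1.0 - math.exp(-b * t))) ** 2
                   for t, v in zip(times, values))

    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, c = b_lo, b_hi
    x1 = c - phi * (c - a)
    x2 = a + phi * (c - a)
    f1, f2 = sse(x1), sse(x2)
    for _ in range(iters):
        if f1 < f2:              # minimum lies in [a, x2]
            c, x2, f2 = x2, x1, f1
            x1 = c - phi * (c - a)
            f1 = sse(x1)
        else:                    # minimum lies in [x1, c]
            a, x1, f1 = x1, x2, f2
            x2 = a + phi * (c - a)
            f2 = sse(x2)
    return 0.5 * (a + c)


# Synthetic check: data generated from the model itself with b1 = 0.5.
ts = [0.1 * k for k in range(200)]
vs = [2.0 * (1.0 - math.exp(-0.5 * t)) for t in ts]
b1 = fit_first_order(ts, vs, q_inf=2.0)
print(round(b1, 4), round(1.0 / b1, 4))  # fitted rate and time constant 1/b1
```

In practice the samples would come from a numerical evaluation of the mean queue length rather than from the model, and the fitted 1/b_1 would serve as the time constant of the transient.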

9 Optimal Least-Squares Approximation to P_mean(t) (cont.)
[Figure: plot with curves P_k(t)/q_mean, q_1/q_mean and q_2/q_mean]
Due to the exponential behavior of P_mean(t), 1st and 2nd order approximations work pretty well. The coefficients b_1 in q_1, and a_1, b_1 and b_2 in q_2, are functions of ρ. An important result of this approximation is that a time constant for the dynamic process can be obtained. Let τ_1 = 1/b_1 be the time constant of the 1st order approximation.

10 Optimal Least-Squares Approximation to P_mean(t) (cont.)
Example: Let τ_1 = 1/b_1, with
– ρ = 0.9
– b_1 = 0.0123μ
– μ = 3.3 events/ms
Then τ_1 = 24.4 ms. 24.4 ms => about 2.44 million clocks of a 100 MHz clock system (not bad!)
[Figure: plot with curves P_k(t)/q_mean, q_1/q_mean and q_2/q_mean]
If we want only a moderate increase in queue sizes and processor workload, we should attempt to recover from a fault condition as fast as possible. A fault recovery time of ~0.1τ_1 will increase the queue by ~20%.
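The arithmetic of this example can be checked with a few lines of Python. The helper names are illustrative. With the rounded inputs quoted on the slide the time constant comes out near 24.6 ms, in line with the roughly 24.4 ms stated, and at 100 MHz that corresponds to about 2.4 million clock cycles.

```python
def time_constant_ms(b1_over_mu: float, mu_per_ms: float) -> float:
    """tau_1 = 1 / b_1, with b_1 given as a multiple of mu (events/ms)."""
    return 1.0 / (b1_over_mu * mu_per_ms)


def clock_cycles(duration_ms: float, clock_hz: float) -> float:
    """Number of clock cycles that fit in duration_ms."""
    return duration_ms * 1e-3 * clock_hz


tau1 = time_constant_ms(0.0123, 3.3)      # rho = 0.9 case from the slide
print(f"tau_1 = {tau1:.1f} ms")
print(f"cycles at 100 MHz = {clock_cycles(tau1, 100e6):.3g}")
```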

11 Summary
– Upon fault of a processing node the remaining N-1 nodes see an increase of their input queue size and processing workload (Eq. 1a-1d).
– The fault condition may bring the system to instability or saturation in the long term (Eq. 1a).
– However, if we can recover fast enough from the fault condition, we may be able to keep the queue size and workload within reasonable bounds, even when the process is unstable in the long term. This allows a design that keeps the processors' idle-time low.
– A throttling system must be available for when the system cannot recover from a fault, such as faults caused by hardware problems.

12 Some Costing Considerations
– Motherboard + 4 Daughterboards (M+4D) = $2380
– Motherboard + 6 Daughterboards (M+6D) = $2820
Cost of Idle Time
– M+4D case: $17K per 1% (10% idle time = $170K)
– M+6D case: $13K per 1% (10% idle time = $130K)
How much does it cost to remain stable after a processor fault?
– M+4D case: ρ_2 ρ_1 < 0.75: ~$250K
– M+6D case: ρ_2 ρ_1 < 0.83: ~$113K

13 The Triplet's File
If, for instance, dim(x) = 16 bits, dim(y) = 16 bits, dim(z) = 7 bits, each Pixel Point occupies about 40 bits. Three Pixel Points occupy about 120 bits. We can use 11 bits for tags => 128 bits = four 32-bit words per Triplet line.
Example of a Triplet entry in the file:
-888
-1 8 -0.35682 -0.68696 -34.17005 -0.39532 -0.83705 -38.42006 -0.38984 -1.00026 -42.67006
0 8 -0.35761 -0.68310 -33.77005 -0.38282 -0.81598 -38.02005 -0.41201 -0.99012 -42.27005
…
-888
[Figure callouts on the entry: Non-bend view plane; Bend view plane; Station No; Pixel Points for the N-1 plane (x, y, z), N plane (x, y, z) and N+1 plane (x, y, z); Inner/Outer, Left/Right going triplet]

14 Triplet's Data Statistics
Average event size:
– In number of triplets: 88.90 triplets
– In number of words (four 32-bit words/triplet): 355.61 words (1.4KB)
– σ event size:
  In number of triplets: 80.06
  In number of words (four 32-bit words/triplet): 320.25 words (1.3KB)
Largest event size of a sample of 2500 events:
– In number of triplets: 633 triplets (7.12 times the average)
– In number of words (four 32-bit words/triplet): 2532 words (~10KB)
Average execution time:
– In µs: 90.96 µs
– In number of BCO clocks: 688.83 clocks
– σ execution time:
  In µs: 141.7 µs
  In number of BCO clocks: 1073.8 clocks

15 Triplet's Data Statistics (2)
The throughput is based on the average execution time of 90.96 µs.
– If we can execute at this speed we'd only need 690 processors!
Average data throughput to an M+4D:
– In number of triplets: 3.52 million triplets/s
– In number of words (four 32-bit words/triplet): 14.08 Mw/s
– In bits/s: 450.56 Mb/s
Average data throughput to an M+6D:
– In number of triplets: 5.28 million triplets/s
– In number of words (four 32-bit words/triplet): 21.12 Mw/s
– In bits/s: 675.84 Mb/s
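The unit conversions behind these throughput figures are simple enough to verify in a short Python sketch. The helper names are illustrative, and the processor count assumes one event arriving per BCO, which the transcript implies but does not state explicitly.

```python
import math

WORDS_PER_TRIPLET = 4    # one 128-bit triplet line = four 32-bit words
BITS_PER_WORD = 32


def triplet_throughput(triplets_per_s):
    """Convert a triplet rate into (words/s, bits/s)."""
    words_per_s = triplets_per_s * WORDS_PER_TRIPLET
    return words_per_s, words_per_s * BITS_PER_WORD


def processors_needed(exec_time_bco_clocks, events_per_bco=1.0):
    """Processors required to keep up (assumes events_per_bco arrivals per BCO)."""
    return math.ceil(exec_time_bco_clocks * events_per_bco)


def bco_period_us(exec_time_us, exec_time_bco_clocks):
    """BCO clock period implied by the two execution-time figures."""
    return exec_time_us / exec_time_bco_clocks


print(triplet_throughput(3.52e6))    # M+4D
print(triplet_throughput(5.28e6))    # M+6D
print(processors_needed(688.83))     # close to the "690 processors" quoted
print(round(bco_period_us(90.96, 688.83), 4))
```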

16 Triplet's Data Statistics (3)

17 Triplet's Data Statistics (4)

18 Farmlet simulation run
Simulates a processor failure
– The simulation runs for 800 BCOs.
– Processor No. 4 "failed" at BCO=100 and was operative again starting at BCO=600.
Simulation parameters
– Processor's internal queue maximum size is 2 events deep.
– Buffer Manager's individual queues are just one event deep.
– The FIFO input buffer size is not restricted.
– The data is moved around in 32-bit words, one word per clock cycle. The system clock is set at 106 MHz.
Simulation Results
– Input FIFO queue size
– Processor idle time
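To give a feel for what such a run produces, here is a deliberately simplified Python toy model. It is not the simulator used for the presentation. The assumptions are mine: four processors, deterministic arrivals of 9 events per 10 BCOs, a fixed service time of 4 BCOs per event, and 3 event slots per processor (2 internal plus 1 in the Buffer Manager queue). Word-level data movement and the 106 MHz clock are not modeled.

```python
def simulate_farmlet(n_proc=4, n_bco=800, fail_proc=3, fail_start=100,
                     fail_end=600, arrivals_per_10_bco=9, service_bcos=4,
                     slots_per_proc=3):
    """Toy farmlet run; returns (fifo_trace, idle_ticks).

    fail_proc is the zero-based index of the processor that is down for
    fail_start <= bco < fail_end (None disables the fault).
    """
    fifo = 0                      # events waiting in the unrestricted input FIFO
    held = [0] * n_proc           # events queued at or inside each processor
    remaining = [0] * n_proc      # BCOs left on the event being processed
    idle_ticks = [0] * n_proc     # BCOs each working processor spent idle
    credit = 0                    # integer accumulator for fractional arrivals
    fifo_trace = []
    for bco in range(n_bco):
        credit += arrivals_per_10_bco
        while credit >= 10:
            fifo += 1
            credit -= 10
        for p in range(n_proc):
            if p == fail_proc and fail_start <= bco < fail_end:
                continue          # failed node: accepts and processes nothing
            while fifo > 0 and held[p] < slots_per_proc:
                fifo -= 1
                held[p] += 1
            if held[p] == 0:
                idle_ticks[p] += 1
                continue
            if remaining[p] == 0:
                remaining[p] = service_bcos   # start the next event
            remaining[p] -= 1
            if remaining[p] == 0:
                held[p] -= 1                  # event finished
        fifo_trace.append(fifo)
    return fifo_trace, idle_ticks


with_fault, _ = simulate_farmlet()
no_fault, _ = simulate_farmlet(fail_proc=None)
print(max(no_fault), max(with_fault), with_fault[-1])
```

In this toy setting the pre-fault intensity is 0.9 and the post-fault intensity is 1.2, so the input FIFO grows steadily while the node is down and starts draining once it returns.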

19 Simulation output (1)

20 Simulation output (2)

21 Simulation output (3)

