Presentation is loading. Please wait.

Presentation is loading. Please wait.

A guided wave approach to plane-to-plane optical interconnects for multistage networks and multiprocessor computers Alvaro Cassinelli*, Makoto Naruse*,

Similar presentations


Presentation on theme: "A guided wave approach to plane-to-plane optical interconnects for multistage networks and multiprocessor computers Alvaro Cassinelli*, Makoto Naruse*,"— Presentation transcript:

1 A guided wave approach to plane-to-plane optical interconnects for multistage networks and multiprocessor computers Alvaro Cassinelli*, Makoto Naruse*, ** and Masatoshi Ishikawa* Univ. of Tokyo*, CRL** input output Fiber module Multistage hypercube computer Processor arrays 2D folded perfect shuffle permutation  (2)

2 I. Multistage architecture for optical parallel computers Reconfigurable multi-stage architecture Hypercube and omega network examples. II. Optical fiber-based interconnection module. Why guided optics? Module decomposition III. Prototype fabrication and test 4x4 exchange prototype.. Transmittance, alignment tolerances IV. Conclusion. Present and future research directions Plan of the presentation

3 I. Multistage architectures for optical parallel computers Interconnection module … Elementary Processor Array VCSEL array Photo- detector array Data flow Interconnection/ switch module Data flow … All optical Hybrid optoelectronic

4 I.1 Reconfigurable multi-stage architecture: principle A C : We will concentrate on network-based parallel computers (or “direct-connection machines”) rather than on shared memory model (PRAM) as an efficient way to implement parallel computer architectures. This choice is dictated by the fact that dealing with read/write conflicts in PRAM machines is more related with control and routing, and we are primarily interested in topology and communication primitives from a hardware point of view in our research –enhanced communication primitives is what optical technology offers. A C : We will concentrate on network-based parallel computers (or “direct-connection machines”) rather than on shared memory model (PRAM) as an efficient way to implement parallel computer architectures. This choice is dictated by the fact that dealing with read/write conflicts in PRAM machines is more related with control and routing, and we are primarily interested in topology and communication primitives from a hardware point of view in our research –enhanced communication primitives is what optical technology offers. network-based parallel computers Optical technology offers enhanced parallel communication primitives Static Dynamic Reconfigurable interconnection (X, Y or Z). …switches inside processors (local control) …switches outside processors (local or global/external control possible) P1P1 P2P2 PnPn Y Z X Fixed interconnection (X, Y, and Z) mux ULA Mem control P1P1 P2P2 PnPn … … … … … X Y Z … … … … controller … …of great benefit for = distributed memory  shared memory

5 Static networks can be redesigned as single-stage dynamic networks… I.2Dynamic architecture vs. static In an n-degree static topology, each processor should have n distinct optoelectronic I/O ports… Technologically challenging Non reusable architecture Bad scalability P1P1 P2P2 PnPn …processors, switches and interconnections located in distinct modules Optimal use of electronic, optoelectronic and optics Scalability, hardware reusability in other topologies possible introduction of multiple stages… switches interconnections processors P1P1 P2P2 PnPn … … … … …… … … … Feed-back loop … A C : Actually, in our former research, we studied single-stage dynamic interconnections with global control of the switch, using spatial light modulators (OCULAR II). A C : Actually, in our former research, we studied single-stage dynamic interconnections with global control of the switch, using spatial light modulators (OCULAR II).

6 Multi-Stages Single-Stage S & I - m … S & I - 2 … S & I - 1 P1P1 P2P2 PnPn … … I.3 The multi-stage paradigm architecture can be “spanned” into The cost of multiplying the processors is paid back as Simplicity (switches can be elemental 2x2 cross-bars) Scalability / Reconfigurability for different topologies Possibility of pipelining Theoretical background: Multi-stage architectures have been studied for decades in networking applications Hypercube Mesh Cube Cycle Shuffle/exchange Delta Benes De Bruijn [computing] [computing & networking] Tree Pyramid Omega Clos Banyan S witch & P1P1 P2P2 PnPn … I nterconnection Stage m Stage 1 P1P1 P2P2 PnPn P1P1 P2P2 PnPn Stage 2

7 connection module … I.2 The theoretically best optical architecture (connectivity) MOA n (X) = A (n).I (n) … A (k).I (k) … A (1).I (1) (X) Computation made on PE array Optical shuffle of data between PE arrays Optical (2D) Data flow Processor Elements (PE) Array VCSEL array Photo-detector (PD) array PD and VCSEL flip-chip bonded to processor array Matrix representation of computations on the Multistage Optical Architecture X A C : The linear architecture may be “sub-optimal” (Ozatkas) when addressing thermal dissipation issues, but offers much easier “scalability”. Also, the “flat” optimal architecture, will work well with reflective holograms, but would be much difficult to build using fiber arrays. A C : The linear architecture may be “sub-optimal” (Ozatkas) when addressing thermal dissipation issues, but offers much easier “scalability”. Also, the “flat” optimal architecture, will work well with reflective holograms, but would be much difficult to build using fiber arrays.

8 reconfigurable interconnection module … Elementary Processor Array VCSEL array Photo-detector array SLM-based reconfigurable interconnection Optoelectronic processing module a) Free-Space reconfigurable interconnections OCULAR-II Space-invariant interconnections – good/bad? Free-space – alignment issues? Multi-level CGH – good diffraction efficiency Reconfiguration (“switch”) freq. – 100 Hz…

9 Fixed interconnection modules... Processor array in charge of the switching function… OCULAR – III Interconnection module … Elementary Processor Array VCSEL array Photo- detector array Data flow Some examples: shuffle/exchange networks, Clos and Benes crossbars, etc… No lost of interconnection capacity if things are designed properly b) Fixed interconnections (hybrid opto-electronic)

10 Indirect Binary Cube (“multistage hypercube”) Binary Hypercube… X Y W Z I.3 Two well known examples: Self routing: “switches” are set locally by packet address (destination – input)  (Omega) network It is full access, but not full connection. Also useful on computing (FFT)… A C : - Ring, Mesh, and Hypercube are all classes of k-dimensional nearest-neighbor networks. -In the indirect binary hypercube network (as well as in the Generalized Cube), we CAN NOT find the exchange permutation E(k) at the end of stage k, we have to “wait” till the end (the unshuffle is necessary…). - The FFT is an algorithm that is easily embedded in a hypercube topology. Moreover, the shuffle-exchange “direct” binary hypercube architecture can be used to demonstrate a “pipelined” FFT algorithm very easily, because: FFT=(IBnC) -1. , where E(1) has been replaced by W(1). Of course, (IBnC) -1 = “Direct” Binary n-Cube or “generalized Cube network”. -The Omega network is also very useful for primitives of parallel computing, like FFT algorithms!! Omega network is NOT full connection, it is full access BUT blocking. A non blocking network (rearrangeable, no “strict-non blocking” nor “wide sense non-blocking”) is the BENES network. CLOS is another which is strict-non blocking (but the network is not constructed using 2x2 cross-bar switches). -Other interesting architecture would be the Batcher-Banyan, which is “wide-sense” non-blocking… In fact, the OMEGA and IBnC demonstrated here are EQUIVALENT networks (so here the examples are not the best ones) REM: for OC2002, build the hypercube using the exchange, so that pipelining the quad-three compression becomes possible. Then, build other multi-stages. A C : - Ring, Mesh, and Hypercube are all classes of k-dimensional nearest-neighbor networks. -In the indirect binary hypercube network (as well as in the Generalized Cube), we CAN NOT find the exchange permutation E(k) at the end of stage k, we have to “wait” till the end (the unshuffle is necessary…). - The FFT is an algorithm that is easily embedded in a hypercube topology. Moreover, the shuffle-exchange “direct” binary hypercube architecture can be used to demonstrate a “pipelined” FFT algorithm very easily, because: FFT=(IBnC) -1. , where E(1) has been replaced by W(1). Of course, (IBnC) -1 = “Direct” Binary n-Cube or “generalized Cube network”. -The Omega network is also very useful for primitives of parallel computing, like FFT algorithms!! Omega network is NOT full connection, it is full access BUT blocking. A non blocking network (rearrangeable, no “strict-non blocking” nor “wide sense non-blocking”) is the BENES network. CLOS is another which is strict-non blocking (but the network is not constructed using 2x2 cross-bar switches). -Other interesting architecture would be the Batcher-Banyan, which is “wide-sense” non-blocking… In fact, the OMEGA and IBnC demonstrated here are EQUIVALENT networks (so here the examples are not the best ones) REM: for OC2002, build the hypercube using the exchange, so that pipelining the quad-three compression becomes possible. Then, build other multi-stages. [ computing ] [ networking ]

11 …an optical “3D optical wiring” module between 2D VLSI arrays. II. Optical fiber-based plane-to-plane interconnection modules input output Fiber module  (2)

12 II.1 Fiber-based interconnection blocks for multistage architectures. Inter-stage connection fixed and point-to-point: channels can be fibers. Fibers have better efficiency and just like free-space optics, no cross-talk in 3D. No space-invariance required. Precise and robust alignment possible. Theoretically more volume efficient than free-space equivalent! Maybe “hard” to build? Boring, but not a fundamentally difficult - can be automated… Alignment of both output and input needed… Power dissipation may be a fundamental limitation, but we are far from these limits… input output Prototype Fiber module (fibers and holders) “integrated” 2D folded perfect shuffle permutation module  (2) …wave-guide arrays for fixed, point-to-point and space variant interconnections are an interesting alternative to free-space optics “Volume-consumption comparisons of free-space and guided-wave optical interconnections”, Y.Li and J. Popelek, p.1815-1825, Appl.Opt. Vol 39, n.11, april 2000.

13 II.2 “Decomposition” of the interconnect into modules Many multistage networks use simple, regular interconnections… [ Problem ] A C : In group-theoretic-based construction of MINs (giving symmetric networks), the most useful permutations are the exchange, the shuffle, the butterfly, the bit reversal and the shift permutation. The nature of the decomposition of the interconnect into EITHER the column, row or the diagonal dimensions may also reintroduce the use of light- efficient one dimensional, non pixilated, rapid reconfigurable diffractive elements (such as acousto optics). A C : In group-theoretic-based construction of MINs (giving symmetric networks), the most useful permutations are the exchange, the shuffle, the butterfly, the bit reversal and the shift permutation. The nature of the decomposition of the interconnect into EITHER the column, row or the diagonal dimensions may also reintroduce the use of light- efficient one dimensional, non pixilated, rapid reconfigurable diffractive elements (such as acousto optics). However, when folded in a plane, these may materialize as non-regular, non- scalable and non-reusable interconnection modules! 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 4 5 2 3 6 7 8 9 12 13 10 11 14 15 columns rows Scan map Fractal map

14 Permutations are decomposed “ad-hoc” into their “row” and “column” exclusive permutations parts, plus some simple-to- fold “link” permutation… Because it may be possible to cascade fiber-based modules without too much loss of light power, let’s “break” these into simple to fold modules. [Solution ] “simple to fold” means: 1) Simple to implement by stacking planer wave-guide structures 2) Or simple to implement using previously built modules (scalability) Permutations are decomposed “recursively” verticalhorizontal diagonal

15 The idea is to define permutation “constructors” that correspond to basic building steps using PLC circuits (stacking, grouping modules). This decomposition methodology also applies to the switching stages (no other thing that a set of possible permutations) Z P n “zoom” constructor Q P n “quadrant” constructor L n/2 P n/2 R n/2 P n/2 Permutation layer P n/2 Permutation module P n Vertical replicator Horizontal replicator

16 Let’s try that on the previous examples: Indirect Binary n-Cube Network  (Omega) network …uses the butterfly  (k) and perfect shuffle  (k) permutations …uses only the perfect shuffle  (k) permutation

17 Example: shuffle and butterfly decomposition …It is easy to see that the ad-hoc folding of a “regular” permutation needs a maximum of three concatenated “stacked” modules {b n, … b k+1, b k, b k-1, … b 2, b 1 } shuffle  n (k) {b n, … b k+1, b k-1, b k-2, … b 1, b k } butterfly  n (k)  n (k)  n (k) {b n, … b k+1, b k, b k-1, … b 2, b 1 } {b n, … b k+1, b 1, b k-1, … b 2,b k }  n (k) = L n/2  n/2 ; R n/2  n/2 ; L Decomposition using constructors:  n (k) = R n/2  n L n/2  n/2 ;  n/2 ; L “ad-hoc” “recursive”  2p (2p) = Q p-1 T 2 (1,2)

18  (k) =  row (k) Folding the shuffle permutation  (k) =  row (n/2).  col (k-n/2).L - and L is the “link” permutation.  col(2) If k  n/2, the shuffle “acts” over rows : - where  col is a column shuffle,  row (k) If k > n/2, the shuffle can be written as: Link 0 1=001 2 3 4=100 8 12 Can be built by stacking “slices”

19  (2) =  row(2) Folding a butterfly into a 4x4 array  (k) =  col (k-n/2).L.  col(k-n/2) - and L exchanges row and column LSB Link 0 1=001 2 3 4=100 8 12  col(2) If k  n/2, the butterfly  (k) “acts” over rows : - where  col is a column butterfly,  row(2) If k > n/2, the butterfly  (k) can be written as: A C : REM: the modules can be built by stacking layers, to that planar-optics technology used. In particular, we can think again about fan-in and fan-out channels… (cf. NHK company. A C : REM: the modules can be built by stacking layers, to that planar-optics technology used. In particular, we can think again about fan-in and fan-out channels… (cf. NHK company.

20 …back to examples:  network shuffle Processor arrays (exchange switches and more)  row (2 L  90 º  col (2 pair of PE implement elemental exchange switch

21 I.3 Indirect Binary 4-Cube  (2)  (3) PE array 1 (exchange) PE array 2 (exchange) PE array 3 (exchange)  (4) PE array 4 (exchange)  -1 (4) Processor arrays (exchange switches and more)

22 III. 4x4 prototype fiber module. Preliminary tests Two holder prototypes: Zirconium, SiO 2 Pitch: 250±5  m Multimode graded index fibers: NA=0,21 (core 50  m, cladding 126  m) Transmission loss: 3dB/km Length: 30 cm

23 Input (VCSEL 854±4nm) Output (CCD) III.1 Preliminary tests on a 4x4 prototype module [ Interconnection pattern ] [ Transmittance (one channel) ] Max. transmittance 38,45% for I=9,5 mA 0 5 10 15 20 25 30 35 40 45 678910111213 VCSEL driving current (mA) Transmittance (%) 38,45 9,5 LED regime LASER regime  (2) A C : REM: the light coming from the non-addressed channels is mainly due to some default functioning of the neighboring VCSELs which emits LED light though they are OFF! A C : REM: the light coming from the non-addressed channels is mainly due to some default functioning of the neighboring VCSELs which emits LED light though they are OFF!

24  (2 ) input output No relay optics between VCSEL array and fiber module input III.2 Alignment tolerances (test performed on a single channel)  x  50  m VCSEL array X,Y, and Z translation stage Power meter VCSEL ON Horizontal excursion 0 0.05 0.1 0.15 0.2 0.25 -105-90-75-60-45-30-1501530456075 X (microns) exit power (mW) xx Alignment tolerances (half peak power)  y  70  m A C : The differences on alignment tolerances are probably due to the non-circular shape of the VCSEL mode. A C : The differences on alignment tolerances are probably due to the non-circular shape of the VCSEL mode.

25 IV. Conclusion Input/output alignment of modules Microlenses, Fibers with round ends. Modules built from fiber bundles. Active alignment. Demonstrator architectures using smart pixel arrays (2x2 or 4x4 electronic switches) [ Present research ] A C : Multi-function modules: the use of optical fiber modules fits well with the all optical approach; for instance, one can imagine a module with several different interconnection patterns, but also other “optical-functions” like optical delay lines: However, in all-optical networks the “switches” may be very fast (electro optical devices, not MEMS), because the delay time for avoiding the drop of ATM cells is ?? for a typical Gigabit network!!! A C : Multi-function modules: the use of optical fiber modules fits well with the all optical approach; for instance, one can imagine a module with several different interconnection patterns, but also other “optical-functions” like optical delay lines: However, in all-optical networks the “switches” may be very fast (electro optical devices, not MEMS), because the delay time for avoiding the drop of ATM cells is ?? for a typical Gigabit network!!! 0 1 2 3 Optical interconnection

26 The shuffle-exchange networks are degree 2 bidelta-networks: the exchange stage is composed of elemental 2x2 crossbars 0 1 2 3 Optical interconnection 2x2 electronic or optical crossbars (Mach-Zenhder, directional couplers) are: - simple - adapted to plane topologies But for 3d structures, the crossbar element can be “naturally” larger! (3x3, 4x4). The degree 4 bidelta-network uses the 4-shuffle permutation… [Delta networks adapted to 3d interconnections]

27 [ Future research directions ] Multi-interconnection modules “Mixed” interconnections, and other optical functions Circuit switching for all optical networks Packet switching in a buffered architecture with globally controlled stages Integrated plane-to-plane multistage paradigm using permutation “slices” for intra-chip massive, regular interconnections. Guided-wave interconnects can be “modulated” and integrated !

28 Interleaved permutations into the same module: multi-permutation/switch module A C : Rem: Dynamic alignment is tightly coupled with dynamic reconfiguration of the interconnect. Cf. Naruse’s presentation. A C : Rem: Dynamic alignment is tightly coupled with dynamic reconfiguration of the interconnect. Cf. Naruse’s presentation.  “Normal” directional coupling between waveguides? Multi-permutation module A small controlled mechanic or optical perturbation can produce a drastic change of the interconnection pattern from input to output. ( …optical switches does not need to be “local” –i.e,2x2)  Use of MEMS technology?

29 Transparent circuit switching by TDM interconnections We are now building a demonstrator using mechanical displacement of modules containing a by-pass interconnection and cube interconnections “spanned” hypercube with weak-communication “all-optical” multistage architecture …optical switches does not need to be “local” (2x2)

30 …globally controlled exchange stages + Intermediate buffers A new paradigm for packet switching in multistage networks

31 waveguide “permutation slices” IC layer … - a nice application for photonic bandgap coupling structures - 3d IC integration of regular interconnected circuits Integrated multistage architecture? WG Normal coupling photonic structure


Download ppt "A guided wave approach to plane-to-plane optical interconnects for multistage networks and multiprocessor computers Alvaro Cassinelli*, Makoto Naruse*,"

Similar presentations


Ads by Google