DAS 3 and StarPlane have Landed
  Architecture, Status...
  Freek Dijkstra
DAS history
  Project to prove that distributed clusters are as effective as supercomputers
  Simple Computer Science grid that works
  DAS-1 (1997-2002): 4 sites; 200 MHz Pentium Pro; Myrinet; BSD, Linux; wide-area links: 6 Mb/s ATM (full mesh)
  DAS-2 (2002-2006): 5 sites; 1 GHz Pentium 3; 1 GB memory; Myrinet; Red Hat Linux; wide-area links: 1 Gb/s SURFnet (routed)
  DAS-3 (2006-future): 4 sites, 5 clusters; 2.2+ GHz Opteron; 4 GB memory; Myrinet + WAN; not uniform!; wide-area links: 8×10 Gb/s dedicated
Parallel to Distributed Computing
  Cluster Computing: parallel languages (Orca, Spar); parallel applications
  Distributed Computing: parallel processing on multiple clusters; study non-trivially parallel applications; exploit hierarchical structure for locality optimizations
  Grid Computing
DAS-2 Usage
  200 users; 25 Ph.D. theses
  Simple, clean, laboratory-like system
  Example applications:
    Solving Awari (3500-year-old game)
    HIRLAM: weather forecasting
    GRAPE: simulation hardware for astrophysics
    Manta: distributed supercomputing in Java
    Ensflow: stochastic ocean flow model
  http://www.cs.vu.nl/das2/
Grid Computing
  Ibis: Java-centric grid computing
  Satin: divide-and-conquer on grids
  Zorilla: P2P distributed supercomputing
  KOALA: co-allocation of grid resources
  CrossGrid: interactive simulation and visualization of a biomedical system
  VL-e: scientific collaboration using the grid (e-Science)
  LambdaRAM: share memory among cluster nodes
  Layers (diagram): Applications; Grid Middleware; Computing Clusters + Network
Colourful Future: DAS-3 Timeline
  Years shown on the timeline: 2004, 2005, 2006
  Autumn: DAS-3 proposal initiated
  Summer: proposal accepted
  September: European tender preparation
  December: tender call
  February: five proposals received
  April: ClusterVision chosen
  June: pilot cluster at VU
  August: intended installation
  End: official ending of DAS-2
  Funding: NWO, NCF, VL-e (UvA, Delft, part VU), MultimediaN (UvA), Universiteit Leiden
DAS-2 Cluster
  Diagram labels: 32-72 compute nodes; head node; fast interconnect: Myrinet, 2 Gbit/s; local interconnect: 100 Mb/s Ethernet; 1 Gbit/s Ethernet to the local University and wide area interconnect
DAS-3 Cluster
  Diagram labels: 32-85 compute nodes; head node; fast interconnect: Myrinet, 10 Gbit/s; local interconnect: 1 Gbit/s Ethernet; 10 Gbit/s Ethernet; Nortel; links to the local University and to SURFnet
Problem space
  Diagram labels: CPU; Data; Network; regions marked DAS-2 and DAS-3 & StarPlane
SURFnet6
  In the Netherlands, SURFnet connects about 180 institutions: universities; academic hospitals; most polytechnics; research centers
  User base of ~750k users
  ~6000 km of fiber, comparable to the railway system
Common Photonic Layer (CPL)
  5 rings
  Initially 36 lambdas (4×9); later 72 lambdas (8×9)
  Throughput of each lambda is up to 10 Gb/s now; later up to 40 Gb/s per lambda
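As a rough illustration of what these figures imply, the sketch below multiplies the lambda counts by the per-lambda rates quoted on the slide. It assumes that the counts (4×9 and 8×9) refer to a single fiber, that every lambda runs at its maximum rate, and that the later lambda count and the later rate coincide; the function and parameter names are made up for this example.

```python
def aggregate_gbps(groups: int, lambdas_per_group: int, gbps_per_lambda: int) -> int:
    """Upper bound on total throughput in Gb/s if every lambda runs at full rate."""
    return groups * lambdas_per_group * gbps_per_lambda


# Figures from the slide; reading "4x9" as "groups x lambdas per group" is an assumption.
initial_capacity = aggregate_gbps(4, 9, 10)  # 36 lambdas at 10 Gb/s each
later_capacity = aggregate_gbps(8, 9, 40)    # 72 lambdas at 40 Gb/s each

print(initial_capacity, "Gb/s initially")
print(later_capacity, "Gb/s later")
```

These are upper bounds only; the slide gives per-lambda rates as "up to" values.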
Quality of Service (QoS) by providing wavelengths
  Old Quality of Service: one fiber with a single lambda; set part of it aside on request; the rest gets less service
  New Quality of Service: one fiber with multiple lambdas (separate colours); move requests to other lambdas as needed; the rest also gets happier!
StarPlane Topology
  4 DAS-3 sites, with 5 clusters
  Interconnected with 4 to 8 dedicated lambdas of 10 Gb/s each
  Same fiber as for regular Internet
  External connectivity: Grid 5000; GridLab; media archives in Hilversum
StarPlane Project
  StarPlane will use the SURFnet6 infrastructure to interconnect the DAS-3 sites
  The novelty: to give flexibility directly to the applications by allowing them to choose the logical topology in real time
  Ultimately, configure within a fraction of a second
  People and timeline: 1 postdoc, 1 AIO (Ph.D. student), 1 scientific programmer (Jason Maassen - VU; Li Xu - UvA; JP Velders - UvA); February 2006 - February 2010
  Funding: NWO, with major contributions from SURFnet and Nortel
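Application - Network Interaction
  Diagram: the Application sends a request to the Control Plane, the Control Plane configures the Network, and the Application uses the Network
  Requested topologies: star, ring, full mesh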
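To make the three topology choices concrete, here is a minimal sketch that lists the point-to-point links each one would need. It is not the StarPlane interface: the function name, the use of the first site as the star's hub, and the site labels are all assumptions made for illustration, as is the idea that each link maps to one lambda.

```python
from itertools import combinations


def topology_links(sites, kind):
    """Return the point-to-point links of a logical topology as (a, b) pairs."""
    sites = list(sites)
    if kind == "star":
        # Assumption: the first site acts as the hub.
        hub, *leaves = sites
        return [(hub, leaf) for leaf in leaves]
    if kind == "ring":
        if len(sites) < 3:
            raise ValueError("a ring needs at least 3 sites")
        # Connect each site to its successor, wrapping around at the end.
        return [(sites[i], sites[(i + 1) % len(sites)]) for i in range(len(sites))]
    if kind == "full mesh":
        # Every unordered pair of sites gets its own link.
        return list(combinations(sites, 2))
    raise ValueError(f"unknown topology: {kind!r}")


# Illustrative site labels only.
SITES = ["VU", "UvA", "Delft", "Leiden"]

for kind in ("star", "ring", "full mesh"):
    links = topology_links(SITES, kind)
    print(f"{kind}: {len(links)} links -> {links}")
```

For n sites this gives n-1 links for a star, n for a ring and n(n-1)/2 for a full mesh, which is why the choice of topology matters when the number of dedicated lambdas is limited.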
Application - Network Interaction
  Diagram: network use by App1, App2 and App3 over time, shown for two cases: Application Initiated Network Configuration, and Workflow Initiated Network Configuration (with a Work Flow Manager)
StarPlane Applications
  Large stand-alone file transfers: user-driven file transfers; nightly backups; transfer of medical data files (MRI)
  Large file (speedier) stage-in/stage-out: MEG modeling (magnetoencephalography); analysis of video data
  Applications with static bandwidth requirements: distributed game-tree search; remote data access for analysis of video data; remote visualization
  Applications with dynamic bandwidth requirements: remote data access for MEG modeling; SCARI
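For the large file transfers, a back-of-the-envelope estimate shows why the number of lambdas matters. The sketch below computes an ideal transfer time over the 10 Gb/s lambdas mentioned for the StarPlane topology. The 1 TB file size is a hypothetical value, the function name is invented, and the model ignores protocol overhead, disk speed and setup time.

```python
def transfer_seconds(size_bytes: float, lambdas: int, gbps_per_lambda: float = 10.0) -> float:
    """Ideal time to move size_bytes over a number of parallel lambdas."""
    if lambdas < 1:
        raise ValueError("at least one lambda is required")
    bits = size_bytes * 8
    return bits / (lambdas * gbps_per_lambda * 1e9)


ONE_TERABYTE = 1e12  # hypothetical file size in bytes, not a figure from the talk

for n in (1, 4, 8):
    print(f"{n} lambda(s): {transfer_seconds(ONE_TERABYTE, n):.0f} s")
```

Under these idealized assumptions the time scales inversely with the number of lambdas; real transfers would be slower.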
Conclusions
  This fall, DAS-3 will be available at a university near you
  StarPlane allows applications to configure the network
  We aim for fast (subsecond) lambda switching
  Workflow systems and/or applications need to become network aware
  For details: see the StarPlane poster this evening!
DAS 3 and StarPlane have Landed
  Architecture, Status... and Application Research
Network Memory
  LambdaRAM software uses memory in the local cluster as a local cache
  Faster than caching on disk (access time ~1 ms for network; ~10 ms for disk)
  Figure: (very) high-resolution remote image; blue box: active (visualized) zoom region; green area: cached on other cluster nodes
  http://www.evl.uic.edu/cavern/optiputer/lambdaram.html
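The access-time figures can be turned into a simple weighted average to see how much network memory helps. This is a sketch of the arithmetic only, not of LambdaRAM itself: it assumes every access is served either from memory on another cluster node (~1 ms) or from disk (~10 ms), and the function name and the example fractions are invented for illustration.

```python
NETWORK_MS = 1.0  # ~1 ms per access to memory on another node (figure from the slide)
DISK_MS = 10.0    # ~10 ms per access to disk (figure from the slide)


def mean_access_ms(network_fraction: float) -> float:
    """Average access time when network_fraction of accesses hit network memory."""
    if not 0.0 <= network_fraction <= 1.0:
        raise ValueError("network_fraction must be between 0 and 1")
    return network_fraction * NETWORK_MS + (1.0 - network_fraction) * DISK_MS


# Example fractions are arbitrary, chosen only to show the trend.
for fraction in (0.0, 0.5, 1.0):
    print(f"{fraction:.0%} from network memory: {mean_access_ms(fraction):.1f} ms on average")
```

With these numbers, each access moved from disk to network memory saves about 9 ms.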