Discovery Net Yike Guo, John Darlington (Dept. of Computing), John Hassard (Depts. of Physics and Bioengineering) Bob Spence (Dept. of Electrical Engineering)


1 Discovery Net
Yike Guo, John Darlington (Dept. of Computing); John Hassard (Depts. of Physics and Bioengineering); Bob Spence (Dept. of Electrical Engineering); Tony Cass (Department of Biochemistry); Sevket Durucan (T. H. Huxley School of Environment)
Imperial College London

2 Aim
To design, develop and implement an infrastructure to support real-time processing, interaction, integration, visualisation and mining of massive amounts of time-critical data generated by high-throughput devices.

3 The Consortium
Industry connection: 4 spin-off companies plus related companies (AstraZeneca, Pfizer, GSK, Cisco, IBM, HP, Fujitsu, Gene Logic, Applera, Evotec, International Power, Hydro Quebec, BP, British Energy, …)

4 Industrial Contribution
Hardware: sensors (photodiode arrays, hybrid photodiodes, PMTs) and systems (optics, mechanical systems, DSPs, FPGAs).
Software: analysis packages, algorithms, data warehousing and mining systems.
Intellectual property: access to the IP portfolio suite at no cost.
Data: raw and processed data from biotechnology, pharmacogenomics and remote sensing (GUSTO installations, satellite data from geohazard programmes), plus renewable energy data (from our own remote tidal power systems).

5 High Throughput Sensing Characteristics
Different devices, but the same computational characteristics: data intensive and data dispersive; large-scale, heterogeneous, distributed data; real-time data manipulation; the need to calibrate, integrate and analyse.
GRID issues: wide area, high volume, scalability (data, users), collaboration.
Data issues: different measurements of the same object require data registration, normalisation, calibration and quality control.
Information issues: annotation semantics, reference data, an integrated view of the data.
Discovery issues: distributed knowledge discovery and management; incremental, interactive and collaborative discovery.
[Diagram: distributed devices, distributed warehousing, distributed reference DBs, distributed users, collaborative applications]
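When the same kind of object is measured by several devices, each device usually has its own offset and scale, so readings are often normalised per device before they are integrated. A minimal sketch, assuming a simple per-device z-score is an acceptable normalisation; the function `normalise_by_device` and the `(device_id, value)` input format are hypothetical illustrations, not part of Discovery Net.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalise_by_device(readings):
    """Z-score each reading against the mean and population std of its own device.

    readings: list of (device_id, value) pairs (hypothetical format).
    Returns (device_id, normalised_value) pairs in the same order as the input.
    """
    # Group raw values by the device that produced them.
    by_device = defaultdict(list)
    for device, value in readings:
        by_device[device].append(value)

    # Per-device mean and spread; a constant device gets spread 1.0 to avoid dividing by zero.
    stats = {}
    for device, values in by_device.items():
        spread = pstdev(values)
        stats[device] = (mean(values), spread if spread > 0 else 1.0)

    return [(device, (value - stats[device][0]) / stats[device][1])
            for device, value in readings]


# Two hypothetical devices measuring on very different scales.
readings = [("a", 10.0), ("a", 12.0), ("b", 100.0), ("b", 140.0)]
print(normalise_by_device(readings))
```

After normalisation both devices report on a common, unitless scale, which is what makes cross-device comparison possible; real calibration would typically use reference standards rather than the data's own statistics.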

6 DNet Architecture
HTS applications: large-scale dynamic real-time decision support; large-scale dynamic system knowledge discovery.
High Throughput Computing Services (based on the Kensington Discovery Platform):
Distributed data engineering: data registration, data normalisation, data quality.
Information structuring: information integration and composition, semantics and domain-based ontologies, sharing.
Grid-based knowledge discovery: grid-based data mining, collaborative visualisation.
Utilising grid infrastructure for HT computing (based on Globus and ORB infrastructure).
Grid basic infrastructure: Globus/Condor/SRB.

7 Testbed Applications
HTS applications: large-scale dynamic real-time decision support; large-scale dynamic system knowledge discovery.
Bio chip applications: protein-folding chips (SNP chips, Diff. gene chips using LFII); protein-based fluorescent microarrays.
Renewable energy applications: tidal energy; connections to other renewable initiatives (solar, biomass, fuel cells) and to CHP and baseload stations.
Remote sensing applications: air sensing (GUSTO); geological and geohazard analysis.

Throughput (GB/s)  Size (petabytes)  Node number  Operations
1-100              10-100            >50,000      Image registration, visualisation, predictive modelling, RT decisions
1-1000             10-1000           >10,000      Data quality, visualisation, structuring, clustering, distributed dynamic knowledge management
1-10               1-10              >20,000      Structuring, mining, optimisation, RT decisions
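The throughput and size columns are linked: at a sustained ingest rate, the time needed to accumulate a given volume is simply size divided by throughput. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and continuous ingest at the stated rate; the helper `days_to_accumulate` is a hypothetical name used only for illustration.

```python
SECONDS_PER_DAY = 86_400


def days_to_accumulate(size_pb, throughput_gb_per_s):
    """Days of continuous ingest needed to reach size_pb petabytes (decimal units)."""
    size_bytes = size_pb * 1e15
    rate_bytes_per_s = throughput_gb_per_s * 1e9
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# Endpoints taken from the ranges in the table above.
for size, rate in [(10, 1), (10, 100), (1, 10)]:
    print(f"{size} PB at {rate} GB/s: {days_to_accumulate(size, rate):.1f} days")
```

For example, 10 PB at a sustained 1 GB/s takes 10^7 seconds, roughly four months, which shows why the lower throughput bounds still imply petabyte-scale archives.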

8 Large-scale Urban Air Sensing Applications
Each GUSTO air pollution system produces 1 kbit per second, or about 10^10 bits per year. We expect to increase the number of systems from the present 2 to over 20,000 over the next 3 years, reaching a total of 0.6 petabytes of data within the 3-year ramp-up.
[Figure: GUSTO NO simulant, 6.7.2001]
The useful information comes from time-resolved correlations among remote stations, and with other environmental data sets.
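The data volumes on this slide follow from a few multiplications. A sketch, assuming every station runs continuously at 1 kbit/s and that all 20,000 stations are online for the full 3 years (an upper bound, since the slide describes a ramp-up); the helper names are hypothetical. With the slide's rounded figure of 10^10 bits per station-year, the total is 6×10^14 bits, i.e. 0.6 petabits or about 75 TB; the unrounded per-station figure is closer to 3×10^10 bits per year.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 s


def station_bits_per_year(rate_bps=1_000):
    """Bits produced by one station in a year at a constant rate (assumes 100% uptime)."""
    return rate_bps * SECONDS_PER_YEAR


def fleet_bits(stations, years, bits_per_station_year):
    """Total bits from a fleet of identical stations, all online for the whole period."""
    return stations * years * bits_per_station_year


exact = station_bits_per_year()          # unrounded per-station figure
rounded = fleet_bits(20_000, 3, 1e10)    # using the slide's rounded 10^10 bits/year
print(f"per station per year: {exact:.2e} bits")
print(f"20,000 stations x 3 years: {rounded:.1e} bits = {rounded / 8 / 1e12:.0f} TB")
```

Keeping bits and bytes separate (a factor of 8) matters when sizing storage for the warehouse.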

9 Electrical Grid
There is large potential in embedded generation from renewable sources, which will dominate new-build power stations (nuclear, hydro and carbon). Decentralised power is the new paradigm.
Renewables are characterised by a large number of small units, often in remote areas, with wireless connectivity and fluctuating, unpredictable loading.
Once renewables exceed 12% of the total, grid control becomes very difficult without a real-time (RT) e-grid: active management, RT monitoring, RT control, minute-to-minute security and pan-network optimisation.
This requires very high-bandwidth RT remote station data acquisition, warehousing and analysis.

10 The IC Advantage
The IC infrastructure: a micro-grid for the testbed.
ICPC resource: +20 TB of disk storage, +25 TB of tape storage, 3 clusters (>1 teraflops).
Network upgrade: over 12,000 end devices; 10 Mb/s to 1 Gb/s to end devices; 1 Gb/s between floors; 10 Gb/s to the backbone; 10 Gb/s between the backbone router matrix; wireless capability; 2x1 Gb/s to LMAN II (10 Gb/s scheduled for 2004); access to disparate off-campus sites (IC hospitals, Wye College, etc.).
[Diagram: end devices (workstation, cluster, storage, SMP, wireless), floor switches, building router switches, core router switches, Central Computing Facilities, proposed firewall, London MAN/JANET]
£3m SRIF funding: 150 Gflops processing, >100 GB memory, 5 TB of disk storage.

11 Particle Physics and Astronomy Research Council (PPARC)
ASTROGRID (http://www.astrogrid.ac.uk/): a ~£5M project aimed at building a data grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory.

12 Particle Physics and Astronomy Research Council (PPARC)
GridPP (http://www.gridpp.ac.uk/): developing the Grid technologies required to meet the LHC computing challenge, in collaboration with international grid developments in Europe and the US.

13 EPSRC Testbeds (1)
MyGrid: personalised extensible environments for data-intensive in silico experiments in biology.
Distributed Aircraft Maintenance Environment.
RealityGrid: closely coupling high-performance computing, high-throughput experiment and visualisation.

14 EPSRC Testbeds (2)
GEODISE: Grid Enabled Optimisation and DesIgn Search for Engineering.
CombiChem: Combinatorial Chemistry Structure-Property Mapping.
Discovery Net: High Throughput Sensing.

