BioinfoGRID Project
Milanesi Luciano
National Research Council, Institute of Biomedical Technologies, Milan, Italy
EGEE User Forum, Clermont-Ferrand, France, February 2008
Milanesi Luciano BioinfoGRID Symposium, Milan December
Networks of resources
New biological and biomedical technology platforms, combined with HPC and GRID technology, will be particularly useful for dealing with the increasing amount, complexity, and heterogeneity of biological and biomedical data. Bioinformatics applications for eHealth have become an ideal research area where computer scientists can apply and further develop new intelligent computation methods, in both experimental and theoretical settings.
BioinfoGRID Project
BioinfoGRID Project web site:
Consortium
BioinfoGRID Objectives
Objective of the BioinfoGRID project
Interaction with related projects
At present the BioinfoGRID project has established co-operations with the following projects and initiatives: EGEE, BELIEF, EMBRACE, EUCHINAGRID, EUMEDGRID, EELA, DILIGENT, ICEAGE, LITBIO, LIBI, HEALTHGRID, WISDOM
BioinfoGRID Work Packages
Work-package No    Work Package title
WP1                Genomics Applications in GRID
WP2                Proteomics Applications in GRID
WP3                Transcriptomics Applications in GRID
WP4                Database and Functional Genomics Applications
WP5                Molecular Dynamics Applications
WP6                Coordination of technical aspects and relation with Grid infrastructure Projects, user training, application support and resources integration
WP7                Dissemination and Outreach
WP8                Project Management Office
WP1 – Genomics Applications
HUSAR Program Package
–GCG
–EMBOSS
–Databases
–SRS (Sequence Retrieval System)
–In-house developments
–Third-party programs
Diagram annotations (in slide order): ~130 programs; >300, prompt updates (daily, weekly); ~150 programs; own programs, automated tasks
WP1 – Genomics Applications
Integrating W3H, SoapLab and the GRID
[Figure: architecture diagram. Labels: W2H (HTML interface); SoapLab (WebService interface); W3H analysis tasks; Grid API; Grid Client toolkit; Grid CE; Solaris (OS); ScLinux (OS); commands "% formatdb …", "% blastall …", "% submit_formatdb …", "% submit_blastall"; ssh target setup; preliminary setup; "or anywhere else"; "any more software ??"]
WP2 – Proteomics Applications
Functional protein analysis in GRID: functional protein domain annotation of large protein families, using GRID resources and related databases.
All 518 human protein kinases and 5129 proteins from the non-redundant chain set of the Protein Data Bank were analyzed with InterProScan.
WP2 – Proteomics Applications
Protein surface calculation in GRID: the grid was used to compute a volumetric description of the proteins, yielding a precise representation of the corresponding surface. Protein interactions could then be screened quickly by means of surface analysis.
–The ProSite domains were analyzed all-against-all
–ATP-E against its inhibitor
–Collagen against integrin
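An all-against-all screening grows quadratically with the number of items, which is why it is a natural candidate for splitting into independent Grid jobs. The following is a minimal sketch of such a split, not the project's actual code: the function name `pair_batches`, the batch size, and the choice of unordered pairs without self-comparisons are all assumptions made for illustration.

```python
from itertools import combinations, islice


def pair_batches(items, batch_size):
    """Yield lists of unordered pairs, at most batch_size pairs per list.

    Hypothetical helper: each yielded list could be handed to one Grid job.
    Assumes that comparing A with B is the same as comparing B with A and
    that an item is never compared with itself.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pairs = combinations(items, 2)  # n * (n - 1) / 2 pairs in total
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Placeholder identifiers, not real ProSite entries.
    domains = ["domain_a", "domain_b", "domain_c", "domain_d"]
    for number, batch in enumerate(pair_batches(domains, 4)):
        print(number, batch)
```

Because the pairs are generated lazily, the full pair list never has to be held in memory, which matters once the number of items reaches the thousands.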
WP3 – Transcriptomics applications
Phylogenetics: reconstructing the evolutionary history of a group of taxa is a major research thrust in computational biology and a standard part of exploratory sequence analysis. An evolutionary history not only gives the relationships among taxa, but is also an important tool for inferring structural, physiological, and biochemical properties of sequences from other similar sequences, and for reconstructing tissue evolution.
WP4 – Databases & Genomics Applications
Work Package 4: Databases and Functional Genomics Applications
–Testing the main biological databases in the Grid environment: optimization of storage space, bandwidth, download time
–Testing performance and scalability of database-based applications: performance/scalability testing according to various use cases and submission algorithms
–1 challenge: Gene Analogous Finder. 55+ years of computation on a single CPU, not feasible in a local environment.
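To put the 55+ CPU-year figure in perspective, the sketch below estimates the idealized wall-clock time when the work is spread over many concurrent jobs. It assumes an embarrassingly parallel workload; the `efficiency` factor and the worker counts are illustrative assumptions, not measurements from the project.

```python
def wall_clock_days(cpu_years, workers, efficiency=1.0):
    """Idealized wall-clock time in days for a perfectly divisible workload.

    cpu_years  -- total single-CPU time (the slide quotes 55+ years)
    workers    -- number of jobs running concurrently (assumed value)
    efficiency -- fraction of time spent on useful work, in (0, 1]
                  (assumed; queue time and failures lower it in practice)
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return cpu_years * 365.0 / (workers * efficiency)


if __name__ == "__main__":
    for workers in (1, 100, 1000):
        print(workers, "workers:", wall_clock_days(55, workers), "days")
```

The estimate is a lower bound: it ignores queue time, database download time, and resubmissions.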
WP4 – Databases handling
GridDBManager
–Automatic Updater: timer-based monitoring and update of Grid-ported databases
–Adaptive replica manager: constantly adapts the number of replicas of each database to its usage over the last 10 days
–Version Regression: keeps patches on the Grid so that each database can be regressed to an earlier version
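The adaptive replica idea can be illustrated with a small sketch. The slide only states that the replica count follows usage over the last 10 days; the proportional policy, the function name `target_replicas`, and the parameters `accesses_per_replica`, `min_replicas`, and `max_replicas` below are assumptions for illustration and do not describe GridDBManager's actual implementation.

```python
import math
from datetime import date, timedelta

WINDOW_DAYS = 10  # from the slide: usage over the last 10 days


def target_replicas(databases, access_log, today,
                    accesses_per_replica=50, min_replicas=1, max_replicas=20):
    """Return {database: desired replica count} from recent usage.

    databases  -- names of all Grid-ported databases
    access_log -- iterable of (database_name, access_date) records
    The proportional rule and every numeric default are hypothetical.
    """
    cutoff = today - timedelta(days=WINDOW_DAYS)
    counts = {name: 0 for name in databases}
    for name, day in access_log:
        # Count only accesses to known databases inside the window.
        if name in counts and cutoff < day <= today:
            counts[name] += 1
    return {
        name: max(min_replicas,
                  min(max_replicas, math.ceil(n / accesses_per_replica)))
        for name, n in counts.items()
    }


if __name__ == "__main__":
    today = date(2007, 12, 10)
    # Made-up access records, for demonstration only.
    log = [("db_a", today - timedelta(days=1))] * 120
    log += [("db_b", today - timedelta(days=30))] * 500
    print(target_replicas(["db_a", "db_b"], log, today))
```

A database that has not been used inside the window falls back to the minimum replica count instead of being removed, so it stays reachable.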
WP4 – Methods - GridDBManager
WP4 – Methods - DBApp Perf. Testing
Testing performance and scalability of Database-Oriented Bioinformatics Applications (DBApp) in the EGEE GRID
–Testing performance and scalability
 Grid: too many variables (queue time, database download time, queue failures, execution failures)
 Submission mode: too many variables (number of jobs, rate-limiting settings, resubmission algorithm)
 Application: too many variables (performance of the specific application, location of the database)
–Probing of Grid performance
–Numeric simulation for all algorithms
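A numeric simulation of a submission algorithm can be as simple as a Monte Carlo loop over jobs. The sketch below is one possible minimal model, not the simulator used in the project: exponentially distributed queue times, a fixed run time, failures detected only at the end of a run, unlimited concurrent slots, and every parameter name are assumptions.

```python
import random


def simulate_campaign(n_jobs, fail_prob, mean_queue_s, run_s,
                      max_retries, seed=0):
    """Simulate a job campaign with immediate resubmission on failure.

    Returns the number of completed jobs, the number lost after all
    retries, and the makespan (finish time of the slowest completed job).
    All modelling choices here are simplifying assumptions.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    finish_times = []
    lost = 0
    for _ in range(n_jobs):
        elapsed = 0.0
        for _attempt in range(max_retries + 1):
            elapsed += rng.expovariate(1.0 / mean_queue_s)  # queue wait
            elapsed += run_s                                # execution
            if rng.random() >= fail_prob:                   # attempt succeeded
                finish_times.append(elapsed)
                break
        else:
            lost += 1  # every attempt failed
    makespan = max(finish_times) if finish_times else 0.0
    return {"completed": len(finish_times), "lost": lost,
            "makespan_s": makespan}


if __name__ == "__main__":
    # Illustrative parameters only; none are measured values.
    for retries in (0, 1, 3):
        print(retries, simulate_campaign(150, 0.2, 600.0, 3600.0, retries))
```

Comparing runs with different `max_retries` values gives a rough feel for the trade-off between lost jobs and total campaign time; a realistic study would replace the assumed distributions with probed Grid measurements.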
WP4 – Methods - DBApp Perf. Testing
Probing Grid performance (example)
–Grid queue times and reliability: sent 150 jobs in 3 groups of 50 at different times
WP5 – Molecular docking
Viral neuraminidase is considered a valid target for antiviral drugs
WP5 – Molecular docking
Docking: predict how small molecules bind to a receptor of known 3D structure
There are successful examples
–rapid,
–cost effective…
But there are limitations
–CPU and storage needed
More specific talk by Ana Lucia Da Costa, Wednesday 13th, 11:15, Room: Bordeaux
WP7 – Dissemination
The following events were specifically associated with or organized by the BioinfoGRID project:
–BioinfoGRID Symposium 2007: December 10th-13th 2007, Milan
–BioinfoGRID Session at EGEE '07: October 4th 2007, Budapest
–Biomed Grid School, Varenna, Italy, May 14th-19th 2007
–BioinfoGRID Workshop at Healthgrid 2007 Conference, Geneva, Switzerland, 24th April 2007
–NETTAB 2006 Workshop: Distributed Applications, Web Services, Tools and GRID Infrastructures for Bioinformatics, Santa Margherita di Pula, Sardinia, Italy, July 2006
–BioinfoGRID Initial Training Course, Bari, Italy, March 8th-10th 2006
In addition, the BioinfoGRID project has been represented at 58 national and international conferences and workshops.
WP7 – Dissemination
24 journal articles written within the frame of the BioinfoGRID project:
–9 - BMC Bioinformatics
–4 - IEEE Transactions on Nanobioscience
–3 - Studies in Health Technology and Informatics
–1 - Journal of Parallel and Distributed Computing
–1 - Journal of Chemical Information and Modeling
–1 - Parallel Computing
–1 - Int. J. of Bioinformatics Research and Applications
–1 - IEEE Transactions on Systems Science and Applications
–1 - Nucleic Acids Research
–1 - BMC Genetics
–1 - Bioinformatics
WP7 – Dissemination
19 conference proceedings published within BioinfoGRID:
–6 - NETTAB '06
–2 - EGEE User Forum 06/07
–2 - BITS '06
–2 - HPDC '07
–1 - EGEE 06/07
–1 - CAPI 2006
–1 - Bioinformatics of African Pathogens and Disease Vectors, Nairobi 2007
–1 - MAS-BIOMED '06 Workshop
–1 - CCGrid '07 Symposium
–1 - EvoBIO '08
–1 - CHEP '07
People Acknowledgments
Cristina Aiftimiei, Roberta Alfieri, Claudio Arlandini, Roberto Barbera, Endre Barta, Francesco Beltrame, Attila Bende, Chiara Bishop, Christophe Blanchet, Ignacio Blanquer, Vincent Bloch, Gianpaolo Bottoni, Vincent Breton, Andrea Calabria, Andrea Caprera, Tiziana Castrignanò, Federica Chiappori, Dario Corrada, Paolo Cozzi, Stefano Cozzini, Enza D’Alba, Pasqualina D’Ursi, Ana Da Costa, Paride Dagna, Giulia De Sario, Davide Di Pasquale, Giacinto Donvito, Vihang Dudhalkar, Peter Ernst, David Fergusson, Geraldine Fettahi, Sandro Fiore, Riccardo Gervasoni, Karl-Heinz Glatting, John Hatton, Ally Hume, Nicolas Jacq, Atul Jain, Miklos Kozlovszky, Giuseppe La Rocca, Yannick Legré, Pietro Liò, Charles Loomis, Mario Marchisio, Hajnal Marton, Rafael Mayo Garcia, Mirco Mazzucato, Giovanni Meloni, Ivan Merelli, Emanuela Merelli, Luciano Milanesi, Elisa Molinari, Ettore Mosca, Georgina Moulton, Loukas Moutsianas, Tibor Nagy, Alessandro Negro, Laszlo Oroszi, Alessandro Orro, Giovanni Paolella, Silvano Paoli, Antonio Pierro, Giorgio Pietro Maggi, Marco Pirola, Raffaele Ponzini, Ivan Porro, Paolo Ramieri, Paolo Romano, Ermanna Rovida, Erika Salvi, Jean Salzemann, Diego Sardaci, Salvatore Scifo, Martin Senger, Giuliano Taffoni, Livia Torterolo, Gabriele Trombetti, Angelica Tulipano, Vania Ugè, Elizabeth van der Wath, Richard van der Wath, Kasam Vinod, Federica Viti, Guy Warner, Ted Wen, Pierfrancesco Zuccato
Projects Acknowledgements
EUGRID ISS e G