Analysis of Affymetrix expression data using R on Azure Cloud Anne Owen Department of Mathematical Sciences University of Essex 15/16 March, 2012 SAICG.


1 Analysis of Affymetrix expression data using R on Azure Cloud
Anne Owen, Department of Mathematical Sciences, University of Essex
15/16 March, 2012, SAICG Workshop, Oxford
Dr Andrew Harrison, University of Essex
Dr Hugh Shanahan, Royal Holloway, University of London

2 Introduction
- The Affymetrix GeneChip
- Micro-array data
- Venus-C pilot project
- R scripts on Azure Cloud
- Results to date
- Our Experience

3 We are developing informatics tools to aid the analysis of Affymetrix chips (GeneChips, Exon arrays). Micro-arrays are the data read from GeneChips. ArrayExpress is an example of a public database containing micro-arrays and other data from biological experiments. (Figure: Affymetrix GeneChip)

4 DNA and RNA

5 Probe cells of an Affymetrix GeneChip contain millions of identical 25-mers (figure label: 25-mer)

6 Affymetrix GeneChip Hybridization – fragments of RNA stick to the probes
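
Slides 5 and 6 describe probes as 25-mers to which RNA fragments stick. A minimal Python sketch of that idea follows, assuming perfect Watson-Crick pairing only and ignoring mismatches and binding chemistry. The function names and example sequences are hypothetical, and Python is used purely for illustration (the talk itself used R scripts).

```python
# Hypothetical illustration: perfect-match hybridization modelled as
# reverse-complement matching between a DNA probe and an RNA fragment.

PROBE_LENGTH = 25  # Affymetrix probes are 25-mers (slide 5)

# DNA probe base -> RNA base that pairs with it
DNA_TO_RNA_PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(probe: str) -> str:
    """Return the RNA sequence (read 5'->3') that pairs perfectly with a DNA probe."""
    if len(probe) != PROBE_LENGTH:
        raise ValueError(f"expected a {PROBE_LENGTH}-mer, got {len(probe)} bases")
    # Paired strands are antiparallel, so walk the probe in reverse.
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(probe))


def can_hybridize(probe: str, fragment: str) -> bool:
    """True if the RNA fragment contains the probe's perfect complement."""
    return complementary_rna(probe) in fragment


probe = "ACGT" * 6 + "A"                      # a made-up 25-mer
fragment = "GG" + complementary_rna(probe) + "AA"
print(can_hybridize(probe, fragment))         # fragment carries the complement
print(can_hybridize(probe, "A" * 40))         # unrelated fragment
```

Real hybridization is not all-or-nothing, which is one reason probe-level biases such as the one cited on slide 16 arise.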

7 Affymetrix GeneChip Fluorescence

8 Micro-array datasets
- Fluorescence data put into .cel files
- Many 1000s of experiments
- Many 100s of micro-arrays for each GeneChip
- More than 1 TB of data to analyse
- 1000s of published papers using Affymetrix GeneChips
- This data is a free resource to researchers

9 Going Forward...
- Currently we analyse flaws in GeneChip data
- The future is a new genomic technology known as next generation sequencing
- Petabytes of data are being generated faster than they can be analysed
- Cloud solutions are needed for storage of and access to this data

10 Venus-C Pilot Project
- VENUS-C is a project funded under the European Commission's 7th Framework Programme, with computing resources from Microsoft
- Joint co-operation between computing service providers and scientific user communities
- Aim: to develop, test and deploy a large Cloud computing infrastructure for science and SMEs (small and medium-sized enterprises) in Europe

11

12 Venus-C Infrastructure
3 main areas dealing with standards:
- VM management (OCCI and OVF)
- Job submission (BES)
- Cloud data storage (CDMI)
Other specifications, such as:
- WS-Security
Programming model:
- Task based submission: Generic Worker role
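
The task-based programming model on slide 12 can be pictured as independent jobs pulled from a queue by workers. The sketch below only illustrates that general pattern locally with the Python standard library. It does not use the VENUS-C Generic Worker or any Azure API, and the `(name, func, arg)` task shape and `run_tasks` helper are hypothetical.

```python
import queue
import threading


def run_tasks(tasks, worker_count=2):
    """Run independent (name, func, arg) tasks on a small pool of worker threads.

    Returns a dict mapping each task name to its result.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker keeps pulling tasks until the queue is empty.
        while True:
            try:
                name, func, arg = pending.get_nowait()
            except queue.Empty:
                return
            value = func(arg)
            with lock:
                results[name] = value

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Stand-in tasks; in the project each task would be an analysis script run on one dataset.
summary = run_tasks([("probe_count", len, ["p1", "p2", "p3"]),
                     ("total_signal", sum, [10, 20, 30])])
print(summary)
```

Because the tasks share no state, adding workers scales the throughput without changing the task code, which is the property that makes this model attractive on a Cloud platform.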

13 cTQm Project Overview (diagram labels: BLOB Storage; Public database; Scripts, R libs and key data uploaded via Azure webpage)

14 Cloud / Grid Interfaces
- Amazon EC2: Command line interface into Linux terminal
- NGS: Portal or Command Line to Linux machine
- Azure: Webpage interface to a Windows machine, Visual Studio 2010, C#

15

16 Bioinformatics Results to date
- Uploading of datasets into Cloud storage is underway
- Success with R scripts on Azure to confirm results in published paper*
- Minor problems with ArrayExpress to solve
- Work is extending to more GeneChip types
- Still need user authentication / accounting
* Hugh P. Shanahan, Farhat N. Memon, Graham J. G. Upton and Andrew P. Harrison, "Normalised Affymetrix expression data are biased by G-quadruplex formation", Nucleic Acids Research, 2011, 1-9
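
The paper cited on slide 16 concerns bias from G-quadruplex formation. As a hedged illustration of the kind of probe-level check involved, the Python sketch below flags probe sequences containing a run of consecutive guanines. The threshold of four, the helper names and the example sequences are assumptions made here for illustration; the slides do not state the criterion, and this is not the authors' R code.

```python
import re

# Assumption: a run of four or more consecutive guanines marks a probe
# as a candidate for G-quadruplex formation.
G_RUN = re.compile(r"G{4,}")


def has_g_run(probe: str) -> bool:
    """True if the probe sequence contains a run of at least four Gs."""
    return G_RUN.search(probe.upper()) is not None


def split_probes(probes):
    """Separate probes into (flagged, clean) lists by the G-run check."""
    flagged = [p for p in probes if has_g_run(p)]
    clean = [p for p in probes if not has_g_run(p)]
    return flagged, clean


# Made-up 25-mers: the first contains GGGG, the second only GGG runs.
probes = ["ACGGGGTACGTACGTACGTACGTAC",
          "ACGGGTACGGGTACGTACGTACGTA"]
flagged, clean = split_probes(probes)
print(len(flagged), len(clean))
```

One use of such a split would be to compare expression summaries computed with and without the flagged probes.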

17 Our Experience
- Azure Cloud has a steep learning curve for a Linux-based scientist
- Vast datasets can be made available
- Applications can be user-friendly
- Scalability makes the Cloud approach attractive
- Costs need to be assessed
- Enables scientists in developing countries to perform genome analysis

18 Acknowledgements and thanks to:
- Dr Andrew Harrison, University of Essex
- Dr Hugh Shanahan, Royal Holloway, University of London
- Department of Mathematical Sciences, University of Essex
- European Commission's 7th Framework Programme Venus-C
- Microsoft and Venus-C project Organisers

