Presentation of the CRG Bioinformatics Core facility Jean-François Taly.

1 Presentation of the CRG Bioinformatics Core facility Jean-François Taly

2 People in the BioCore
• Jean-François (@CRG 2009, @BioCore 2012): acting head; structural bioinformatics; MSA; NGS analyst; Galaxy server; training
• Luca (@BioCore 2010): NGS analyst; small ncRNA prediction; motif analysis; training
• Toni (@BioCore 2009): wikis; Web/DB development; DB mirrors; structural bioinformatics; training
• Sarah (@BioCore 2014): micro-arrays; NGS analyst; Galaxy; training

3 Our mission
• Expertise in bioinformatics
  - Service
  - Consultation
  - Training, internal and external
• Support in infrastructures
  - In collaboration with the SIT and TIC
• Part of the CRG bioinformaticians network
  - 83 @ bioinformatics retreat
  - Many more in PRBB/CNAG

4 Our services
• Analysis
  - Microarray
  - ChIP-seq
  - RNA-seq DE and assembly
  - Genome assembly
  - Variant calling
• Informatics support
  - Wiki
  - Web server
  - API
• Training
  - Galaxy, Perl, Linux, advanced bioinformatics

5 Fee per service

6 Our contribution to projects
Project conception → Bioinfo exp. design → Bioinfo exp. realization → Bioinfo output interpretation → Project conclusions

7 Our contribution to projects
Project conception → Bioinfo exp. design → Bioinfo exp. realization → Bioinfo output interpretation → Project conclusions
• Applying defined procedures

8 Our contribution to projects
Project conception → Bioinfo exp. design → Bioinfo exp. realization → Bioinfo output interpretation → Project conclusions
• Customized analysis

9 CRG bioinformatics community
• Big Data WG: EGA initiative; Data Engineering; NoSQL; HPC
• NGS Tech. Sem.: RNA-seq; G. assembly; Variant Annot.; Metagenomics
• Other topics: Integrated -omics; Good practice in code dev.; Galaxy dev.; …

10 Micro-arrays
Gene expression array data analysis:
• Background correction and normalization
• Differential expression analysis
• Gene Ontology and pathway analysis
• Various graphics / plots
Additional array-based technologies the Bioinformatics unit supports include:
• qPCR arrays
• Comparative Genomic Hybridization (CGH) arrays
Main tools are based on the R / Bioconductor environment.
Image source: Creative Commons, Wikipedia

11 RNA-seq

13 DNA-seq

14 Pevzner P A et al. PNAS 2001;98:9748-9753

15 ChIP-seq

17 Growing to the next level
• From gene DE to transcript DE
  - Users now have access to longer reads and deeper coverage
• Metagenomics
  - 16S ribosomal amplicon sequencing with MiSeq
• Data integration framework
  - Combining different data types into a single analysis: RNA-seq DE, histone marks, metabolomics data, proteomics
• Data analysis workflows on Galaxy
  - Leave the basic processing to users and focus on advanced analysis

18 Database mirroring
• Biological file sources
  - ENSEMBL
  - UCSC
  - NCBI BLAST DBs
  - UniProt
  - PDB
  - iGenomes (Illumina; only human for now, the rest is upcoming)
• All indexed and formatted for
  - NCBI BLAST+ (makeblastdb for proteins and nucleic acids)
  - Bowtie & Bowtie2
  - BWA
  - Fastaindex (Exonerate)
  - GEM
  - faToTwoBit
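As an illustration of what "indexed and formatted" involves, the sketch below builds the command lines for a few of the indexers listed on this slide. It is a minimal example, not the BioCore's actual mirroring script: the function name `index_commands` and the output layout are assumptions, and nothing is executed, so it can be tried without the tools installed.

```python
from pathlib import Path


def index_commands(fasta: str, out_dir: str, molecule: str = "nucl") -> dict[str, list[str]]:
    """Return one argument list per indexer for a single FASTA file.

    Hypothetical helper: it only builds the commands. Pass a list to
    subprocess.run() to actually create the corresponding index.
    """
    if molecule not in ("nucl", "prot"):
        raise ValueError("molecule must be 'nucl' or 'prot'")
    # Index prefix = output directory + FASTA file name without extension
    prefix = (Path(out_dir) / Path(fasta).stem).as_posix()
    commands = {
        # BLAST+ handles both proteins and nucleic acids via -dbtype
        "blast": ["makeblastdb", "-in", fasta, "-dbtype", molecule, "-out", prefix],
    }
    if molecule == "nucl":
        # Read mappers and the 2bit format only apply to nucleotide sequences
        commands["bowtie2"] = ["bowtie2-build", fasta, prefix]
        commands["bwa"] = ["bwa", "index", "-p", prefix, fasta]
        commands["2bit"] = ["faToTwoBit", fasta, prefix + ".2bit"]
    return commands
```

Keeping the command construction separate from execution makes it easy to log or review exactly what was run for each mirrored release.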

19 Where are they stored?
• In CRG common storage: /db
• More information: http://biocore.crg.cat/wiki/Category:Mirrors
• IMPORTANT: /db/seq (formerly /seq) IS DEPRECATED

20 WEB and Database services
• Applications
  - Data and project management
  - Platforms for big data analysis and complex information querying
  - Promotion and publication of scientific results

21 WEB and Database services
• Example: Superfly, for Yogi Jaëger
  - Visual catalogue of gene expression during embryo development in different fly species

22 WEB and Database services
• Example: PRGDB, with Walter Sanseverino
  - Wiki-based database of plant resistance genes

23 Activity per category in 2014

24 Presentation of the Galaxy platform
Jean-François Taly, Bioinformatics Core Facility, CRG (Barcelona, Catalonia, Spain)
September 18th 2014, EMBO Global Exchange Course, Pasteur Institute of Tunis, Tunisia

25 Why Should I Use Galaxy?
• Biologists:
  - Linux-free data analysis with a graphical interface
• Bioinformaticians:
  - Ensure reproducibility when sharing analyses and workflows
  - Teach their knowledge to a broad audience
  - Get access to workflows for topics they are not familiar with
• Software developers:
  - Distribute their tools on a standardized platform

26 The Galaxy Team
Galaxy is developed by:
• The Nekrutenko lab in the Center for Comparative Genomics and Bioinformatics at Penn State University
• The Taylor lab at Johns Hopkins University
• The community
https://wiki.galaxyproject.org/GalaxyTeam

27 Rationale behind Galaxy
From Goecks et al., Genome Biol. 2010:
“Computation has become an essential tool in life science research. This is exemplified in genomics, where first microarrays and now massively parallel DNA sequencing have enabled a variety of genome-wide functional assays, such as ChIP-seq and RNA-seq (and many others), that require increasingly complex analysis tools. However, sudden reliance on computation has created an 'informatics crisis' for life science researchers: computational resources can be difficult to use, and ensuring that computational experiments are communicated well and hence reproducible is challenging. Galaxy helps to address this crisis by providing an open, web-based platform for performing accessible, reproducible, and transparent genomic science.”

29 Makes bioinformatics accessible

30 From a command line …

31 … to a graphical interface

32 One step

33 Multi-step protocol 1 2 3 4 5

34 Workflow

35 Galaxy Tutorials
• https://usegalaxy.org/u/jeremy/p/galaxy-rna-seq-analysis-exercise
• https://wiki.galaxyproject.org/Learn

36 NGS on a laptop
MinION brings NGS to your laptop: http://youtu.be/UtXlr19xTh8

38 Reproducibility
• Bioinformaticians suffer from this too!
  - Results can change depending on library and software versions, and on genome annotations
  - Results are published without the code
• Want to share your findings with everybody?
  - Freeze an environment in a virtual machine
  - Use an application container (Docker)
  - Prepare a Galaxy workflow
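Short of freezing a whole virtual machine, a first step is simply to record which versions produced a result. The sketch below is one minimal way to do that for a Python-based analysis, using only the standard library. The function name `environment_manifest` and the package names in the usage lines are illustrative assumptions, not something taken from the slides.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages: list[str]) -> dict:
    """Collect interpreter, OS and package versions for the given package names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names only; list whatever the analysis really imports.
    manifest = environment_manifest(["numpy", "pandas"])
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

Saving this JSON next to the results (together with the genome annotation release used) lets a reader see at a glance why a re-run might give different numbers.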

39 Improve the visibility of a paper
Why not have this as well?
“A Galaxy workflow and the corresponding wrappers are available for download at https://mylab.com. A virtual machine containing a pre-configured server can be downloaded at the same address.”

40 Galaxy Workflows

42 Wrapping software
The wrapper (an XML file) prepares the command line for the software.
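To make the idea of "an XML file that prepares the command line" concrete, here is a minimal sketch. It is not Galaxy's implementation: Galaxy fills its command templates with the Cheetah engine, whereas this example substitutes Python's `string.Template` as a stand-in. The tool description is a hypothetical, heavily simplified one, and `build_command` is an illustrative name.

```python
import shlex
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical, heavily simplified tool description (not a complete Galaxy tool file).
TOOL_XML = """
<tool id="line_count" name="Count lines">
  <command>wc -l $input &gt; $output</command>
  <inputs>
    <param name="input" type="data" label="File to count"/>
  </inputs>
  <outputs>
    <data name="output" format="txt"/>
  </outputs>
</tool>
"""


def build_command(tool_xml: str, values: dict[str, str]) -> str:
    """Fill the <command> template with shell-quoted values for every declared name."""
    root = ET.fromstring(tool_xml)
    # Names the description declares: input parameters and output datasets
    declared = [p.get("name") for p in root.findall("./inputs/param")]
    declared += [d.get("name") for d in root.findall("./outputs/data")]
    missing = [name for name in declared if name not in values]
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    # Quote each value so file names with spaces stay a single shell argument
    quoted = {name: shlex.quote(values[name]) for name in declared}
    template = Template(root.findtext("command").strip())
    return template.substitute(quoted)
```

Calling `build_command(TOOL_XML, {"input": "reads.txt", "output": "count.txt"})` returns the string `wc -l reads.txt > count.txt`, which is what the user would otherwise have typed by hand.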

43 Simple wrapper example

44 venn_diagram.sh
• Wrappers can launch scripts

45 TopHat wrapper (1)
• XML file describing TopHat parameters

46 TopHat wrapper (2)
• XML file describing TopHat parameters

47 Community Tools/Wrappers

48 Should I install Galaxy?
Galaxy public servers
• Good points
  - Free
  - No IT tasks
  - Come with reference genomes and workflows
• Bad points
  - Offer limited resources (disk/CPUs)
  - Data transfer may be slow
  - Give access only to the tools their administrators choose
  - Data security may not be guaranteed

49 Galaxy Public Servers
• https://wiki.galaxyproject.org/PublicGalaxyServers

50 Should I install Galaxy?
Galaxy local server
• Good points
  - Total control over data and tools
  - Limited only by your own disk and CPUs
  - Some companies sell a ready-to-use infrastructure
  - The Tool Shed helps to install wrappers and software
• Bad points
  - Cost of installation and maintenance
  - Needs IT support for an advanced multi-user setup

51 Get Galaxy
• https://wiki.galaxyproject.org/Admin/GetGalaxy
• Can only be installed on Linux or Mac

52 [Architecture diagram. Labels: User; FTP; Galaxy server; Tools; DATA; Software; HPC; NFS; NFS:/software; NFS:/db; /scratch; Sequences; Indexes; Files, Back-up, tmp; 30 days max.; Files > 2Gb]

53
• Database engine
  - The Galaxy team recommends PostgreSQL, but MySQL can also be used
  - Stores user details and data information
• Tools = wrappers
  - File describing all possible parameters of a piece of software
  - Script preparing the correct command line
• Apache server

55
• Shared file system
  - NFS (2 PB)
  - 10 €/TB/group/month
  - Access to the shared biological resources
    Ensembl, UCSC genomes and indexes
    UniProt, Pfam, SMART, PDB
  - Access to the shared software repository
• High Performance Computing
  - 7 cores
  - 8 CPUs each (56 total)
  - 47 GB memory

57
• FTP server
  - ProFTPD on the server side
  - I recommend FileZilla for the client (multiplatform)
  - Upload from Galaxy
  - Files are moved to the shared file system
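The architecture slide carries the label "Files > 2Gb" next to the FTP path. Read as "large files should go through FTP rather than the browser", that rule can be sketched as below. This is an interpretation of the diagram, not a documented Galaxy@CRG setting: the threshold unit (2 GiB here), the route names and the function names are all assumptions.

```python
import os

# "Files > 2Gb" read as 2 GiB; the exact unit is an assumption.
FTP_THRESHOLD_BYTES = 2 * 1024**3


def upload_route(size_bytes: int, threshold: int = FTP_THRESHOLD_BYTES) -> str:
    """Return 'ftp' for files strictly above the threshold, otherwise 'browser'."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "ftp" if size_bytes > threshold else "browser"


def route_for_file(path: str) -> str:
    """Convenience wrapper: pick the route from the size of a file on disk."""
    return upload_route(os.path.getsize(path))
```

A typical FASTQ file from a sequencing run would land on the FTP route, while a small annotation table would go through the browser upload form.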

58 Summary
• Galaxy is an open, web-based platform for computational biomedical research.
  - Accessible: users without programming experience can run tools and workflows
  - Reproducible: Galaxy captures analysis details
  - Transparent: users can share and publish analyses
• Wiki: https://wiki.galaxyproject.org/FrontPage

59 Demo on Galaxy@CRG
• http://galaxy.crg.es/
