DBBM CESMG G. Paolella CEINGE. CSI INTERNET CEINGE University Campus.

Presentation transcript:

DBBM CESMG G. Paolella CEINGE

CSI INTERNET CEINGE University Campus

Research and Services in Bioinformatics: CAPRI, image restoration and analysis, comparative genomics. Francesco Salvatore 0503

Research subjects
- Comparative genomics
- DG-CST
- KinWeb
- Non-coding RNAs
- Bacterial
- Eukaryotic
- Cell motility

Conserved Sequence Tags (CST)

DG-CST

DG-CST DB

Genome browser

KinWeb

KinWeb DB (figure, panels a to e)

Three genes (figure, panels a to c): domain layouts with CST positions marked. Labels: Ig-I, Ig-II, Ig-III, TM, Tyr kinase; Ser-Thr kinase; CSTs.

Pipeline
- Selection of homologous chromosome regions from the human and mouse genomes.
- Comparison of the selected regions using BLASTZ, a program based on a local similarity algorithm.
- Further analysis of the dataset, looking for subpopulations that share specific characteristics, using different programs, such as:
  - BLAST of CSTs vs. ESTs and vs. human and other species' genomes
  - a program for calculating the CPS (Coding Potential Score)
  - RNA structure prediction programs
- Selection of the definitive set of CSTs based on specified thresholds (identity >= 70%; length >= 100 bp) using StrongHits.
- Insertion of the selected CSTs into the DB and extensive annotation for:
  - type (i.e. intergenic, exonic, etc.) according to Ensembl
  - coding capability according to Ensembl
  - distances from other genes and coding regions
  - Log Score, calculated according to the UCSC comparison of the human and mouse genomes
- Masking of repetitive elements with RepeatMasker, to reduce the noise that repeated sequences inevitably introduce.
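The threshold step of the pipeline (identity >= 70%, length >= 100 bp) could be sketched as below. This is only an illustration and not the StrongHits program: the `Alignment` record, its field names and the sample blocks are hypothetical, and identity is assumed to mean identical positions divided by aligned length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One human/mouse local alignment block (hypothetical record layout)."""
    name: str
    matches: int  # identical aligned positions
    length: int   # aligned length in bp

    @property
    def identity(self) -> float:
        # Percent identity; assumed definition: matches / aligned length.
        return 100.0 * self.matches / self.length if self.length else 0.0


# Thresholds quoted on the slide.
MIN_IDENTITY = 70.0  # percent
MIN_LENGTH = 100     # bp


def select_csts(alignments, min_identity=MIN_IDENTITY, min_length=MIN_LENGTH):
    """Keep only the blocks that pass both thresholds (inclusive)."""
    return [
        a for a in alignments
        if a.identity >= min_identity and a.length >= min_length
    ]


# Made-up blocks, for illustration only.
candidates = [
    Alignment("block_1", matches=90, length=120),  # 75% over 120 bp
    Alignment("block_2", matches=60, length=100),  # identity too low
    Alignment("block_3", matches=80, length=90),   # too short
    Alignment("block_4", matches=70, length=100),  # exactly on both thresholds
]
selected = select_csts(candidates)
print([a.name for a in selected])
```

Both comparisons are inclusive, matching the ">=" signs on the slide, so a block sitting exactly on the thresholds is kept.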

Pipeline units

Non-coding RNAs (diagram). Information flow: DNA, transcription, reverse transcription, mRNA, translation, proteins. ncRNA labels: tRNA, rRNA, antisense, miRNA, snoRNA, self-splicing intron, snRNA, small RNAs. Process labels: transcription/maturation, maturation, imprinting (H19, AIR), X inactivation (XIST), chromatin structure dynamics, DNA demethylation (KHPS1a).

Bacterial SLSs

SLS Families

Position in the genome

Alignment

Secondary structures: RNAz (P = 0.99), PFOLD

Processing time

Cluster: 4x14x2 = 112 processors at 2.8 GHz; 4x14x2 = 112 GB RAM; 2 GB/s per card, 4 GB/s aggregate

Bioinfo portal

Bioinformatics services for research already active
- About 100 databases of biological interest accessible through SRS (nucleotide sequences, genomes, mutations, hereditary diseases, enzymes, etc.)
- An integrated system for the analysis of biological data, with over 150 programs for sequence analysis, evolutionary models, the study of mutations, proteins, etc.
- Databases built as part of research projects (DG-CST, KinWEB, etc.)
- Systems for managing experimental data (biological samples, sequences, microscopy images, etc.)

Research and services (Research and Services in Bioinformatics): CAPRI, image restoration and analysis, comparative genomics

Services: who has access?
- CEINGE
- DBBM
- IIGB
- BIOGEM
- Faculty of Medicine
- Faculty of Biotechnology
- Other faculties
- Public (limited access)

Services organization (diagram). Labels: web server; CAPRI, SRS, PISE, other; EMBOSS, FASTA, BLAST; user data; DB; primary remote databases; ENSEMBL.

Graphic interface to programs

CAPRI

CAPRI workflow: various operations in a row, for example DNA -> Complement -> Translation -> Isoelectric point of the resulting protein.
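A chain of this kind can be sketched in a few lines. This is not CAPRI code, and it does not call EMBOSS. It assumes that "Complement" means the reverse complement, that translation uses the standard genetic code in frame 1 up to the first stop codon, and that the isoelectric point is estimated from one commonly used set of approximate pKa values. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Approximate pKa values (an assumption; published tables differ slightly).
POSITIVE_PKA = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def reverse_complement(dna):
    """Step 1: reverse complement of an A/C/G/T string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Step 2: translate frame 1 until the first stop codon.

    Codons containing anything other than A/C/G/T raise KeyError.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def net_charge(protein, ph):
    """Net charge at a given pH (Henderson-Hasselbalch approximation)."""
    counts = {aa: protein.count(aa) for aa in "KRHDECY"}
    counts["N_TERM"] = counts["C_TERM"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return pos - neg


def isoelectric_point(protein, tol=1e-4):
    """Step 3: pH where the net charge crosses zero, found by bisection."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid
        else:
            hi = mid
    return round((lo + hi) / 2, 2)


def run_workflow(dna):
    """Feed each step's output into the next, as in the slide."""
    protein = translate(reverse_complement(dna))
    return protein, isoelectric_point(protein)


protein, pi_estimate = run_workflow("TTATTTGGCCAT")
print(protein, pi_estimate)
```

Each step takes only the previous step's output, which is what lets a workflow tool chain independent programs without the user moving data by hand.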

CAPRI architecture (diagram). Client and server. Objects: CAPRI Program Object, Tasks Object, Menu Table, Disk Buffering, Base Object, and Plugin Objects for CGI, Pise, CLI simple programs, CURL, SOAP and JEMBOSS. Programs: BLAST, FASTA, EMBOSS, HMMer, Genscan, ClustalW, Phylip. Server disks. Legend: relationships between objects (use, inheritance), program execution, data transfer, temporal relationship.

Distributed execution: for each user request, a process is launched on a different node. (Diagram: access servers in front of the cluster nodes.)

Cluster activity (diagram, over http). Components: web application server, cluster broker, cluster managers, DB server with relational DB.
1 - Run a command
2 - Request a node IP
3 - Request the status of the cluster
4 - Search for the best resource and return the corresponding node IP
5 - Launch the command on the node
6 - Return the result
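The six numbered steps of the cluster activity diagram can be mimicked in memory as follows. This is a sketch under stated assumptions, not the system described in the slides: the class and function names are hypothetical, the "best resource" is assumed to be the node with the lowest ratio of running jobs to CPUs, and the command runs as a local Python callable instead of on a remote node.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Status of one cluster node (hypothetical fields)."""
    ip: str
    running_jobs: int
    cpus: int

    @property
    def load(self) -> float:
        return self.running_jobs / self.cpus


class ClusterManager:
    """Knows the nodes, reports their status and launches commands."""

    def __init__(self, nodes):
        self._nodes = {n.ip: n for n in nodes}

    def status(self):
        return list(self._nodes.values())

    def launch(self, ip, command):
        node = self._nodes[ip]
        node.running_jobs += 1
        try:
            return command()  # stand-in for remote execution
        finally:
            node.running_jobs -= 1


class Broker:
    """Chooses a node for each request."""

    def __init__(self, manager):
        self.manager = manager

    def request_node_ip(self):
        status = self.manager.status()                    # step 3
        best = min(status, key=lambda n: (n.load, n.ip))  # step 4 (assumed criterion)
        return best.ip


def run_command(broker, manager, command):
    """Step 1: entry point used by the web application."""
    ip = broker.request_node_ip()         # step 2
    result = manager.launch(ip, command)  # step 5
    return ip, result                     # step 6


# Made-up nodes and addresses, for illustration only.
manager = ClusterManager([
    NodeStatus("10.0.0.1", running_jobs=2, cpus=2),
    NodeStatus("10.0.0.2", running_jobs=0, cpus=2),
    NodeStatus("10.0.0.3", running_jobs=1, cpus=2),
])
broker = Broker(manager)
chosen_ip, output = run_command(broker, manager, lambda: "job done")
print(chosen_ip, output)
```

Keeping node selection in the broker means the web application never needs to know the cluster layout; it only asks for an address and uses it.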

Broker virtual node virtual node DB Grid node

Image archival and management: a web interface stores images in the DB together with the following metadata for each research project:
- Project title
- Experiment name
- Author, group, group leader, etc.
- Cell line
- Culture conditions
- Fixation and inclusion methods, stainings, etc.
- Objective
- Focus position
- Stage position x/y
- Exposure time
- Resolution, etc.

Image-DB interface

timelapse at 6 positions timelapse actin wound healing timelapse 2 adhesion actin staining IPROC

IPROC architecture (diagram). Labels: HPC on cluster nodes; gateway; iPage (page, image area, data + images); iPane (processing steps).

Distributed execution of parallel requests: a tool can require the execution of multiple, simultaneous processes. (Diagram: access servers in front of the cluster nodes.)
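The idea of one tool fanning out into several simultaneous processes can be illustrated with Python's standard `concurrent.futures` module. In this sketch local worker threads stand in for cluster nodes, and the processing step (inverting 8-bit pixel values) is a made-up placeholder; none of it is taken from IPROC itself.

```python
from concurrent.futures import ThreadPoolExecutor


def invert_frame(frame):
    """Placeholder processing step: invert 8-bit grey values."""
    return [255 - pixel for pixel in frame]


def process_in_parallel(frames, step, max_workers=4):
    """Apply one processing step to every frame concurrently.

    Each frame becomes a separate task; pool.map returns results
    in the same order as the input frames.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(step, frames))


# A tiny made-up time-lapse: three frames of four pixels each.
timelapse = [
    [0, 10, 20, 30],
    [255, 128, 64, 0],
    [5, 5, 5, 5],
]
processed = process_in_parallel(timelapse, invert_frame)
print(processed)
```

Because the frames are independent, the same pattern applies whether the workers are threads, local processes or remote nodes; only the dispatch mechanism changes.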

What software may be linked
- PHP internal routines (basic drawing, processing)
- ImageMagick (more advanced processing)
- Image converters
- Special tools (PDL, deconvolution)
- Tools developed in-house (cell tracking)

Advantages
- Convenient graphic interface
- Access to a vast library of image processing steps
- No specific interface requirements
- Remote processing on parallel hardware
- Support for a large number of concurrent users
- System independent (works on Mac, PC, Linux, etc.)
- No need to install: a browser is enough.