DBBM CESMG G. Paolella CEINGE
Figure: network layout (CSI, Internet, CEINGE, University Campus)
Research and Services in Bioinformatics: Comparative Genomics; CAPRI; Image restoration and analysis (Francesco Salvatore 0503)
Research subjects
- Comparative genomics
  - DG-CST
  - KinWeb
- Non-coding RNAs
  - Bacterial
  - Eukaryotic
- Cell motility
Conserved Sequence Tags (CST)
DG-CST
DG-CST DB
Genome browser
KinWeb
KinWeb DB, panels (a) to (e)
Figure: three genes (a, b, c) with their domain structures (Ig-I, Ig-II, Ig-III, TM and Tyr kinase domains; Ser-Thr kinase domains) and the positions of the CSTs.
Pipeline
- Selection of homologous chromosome regions from the human and mouse genomes.
- Masking of repetitive elements with RepeatMasker, to reduce the noise inevitably introduced by repeated sequences.
- Comparison of the selected regions using BLASTZ, a program based on a local similarity algorithm.
- Further analysis of the dataset, looking for subpopulations sharing specific characteristics, with programs such as:
  - BLAST of CSTs against ESTs and against the human and other species' genomes
  - a program computing the CPS (Coding Potential Score)
  - RNA structure prediction programs
- Selection of the definitive set of CSTs with StrongHits, based on specified thresholds (identity >= 70%; length >= 100 bp).
- Insertion of the selected CSTs into the DB and extensive annotation of:
  - type (e.g. intergenic, exonic), according to Ensembl
  - coding capability, according to Ensembl
  - distances from other genes and coding regions
  - Log Score, calculated from the UCSC comparison of the human and mouse genomes
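The threshold step of the pipeline can be illustrated with a minimal sketch. It assumes each BLASTZ hit has already been reduced to a record with a percent identity and an aligned length in bp; the `Hit` record and `select_csts` function are hypothetical and do not reproduce StrongHits' actual input format or interface. Only the two thresholds (identity >= 70%, length >= 100 bp) come from the pipeline description.

```python
from dataclasses import dataclass

# Thresholds stated in the pipeline description (both inclusive).
MIN_IDENTITY = 70.0  # percent identity
MIN_LENGTH = 100     # aligned length in bp


@dataclass(frozen=True)
class Hit:
    """One human-mouse local alignment block (hypothetical record layout)."""
    name: str
    identity: float  # percent identity of the aligned block
    length: int      # length of the aligned block in bp


def select_csts(hits: list[Hit]) -> list[Hit]:
    """Keep only the hits that pass both the identity and the length threshold."""
    return [h for h in hits if h.identity >= MIN_IDENTITY and h.length >= MIN_LENGTH]


if __name__ == "__main__":
    demo = [
        Hit("hit1", 92.5, 240),  # passes both thresholds
        Hit("hit2", 68.0, 500),  # identity too low
        Hit("hit3", 81.0, 99),   # too short
        Hit("hit4", 70.0, 100),  # exactly at both thresholds: kept
    ]
    print([h.name for h in select_csts(demo)])
```

Both comparisons use `>=`, matching the thresholds as written, so a hit sitting exactly at 70% identity and 100 bp is retained.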
Pipeline units
Non-coding RNAs. Figure: DNA -> (transcription) -> mRNA -> (translation) -> proteins, with reverse transcription; ncRNA classes and roles: tRNA, rRNA, antisense RNAs, miRNA (transcription/maturation), snoRNA (maturation), self-splicing introns, snRNA; imprinting (H19, AIR), X inactivation (XIST), chromatin structure dynamics (small RNAs), DNA demethylation (KHPS1a).
Bacterial SLSs
SLS Families
Position in the genome
Alignment
Secondary structures predicted with RNAz (P = 0.99) and PFOLD
Processing time
Cluster: 4 x 14 x 2 = 112 processors at 2.8 GHz; 4 x 14 x 2 = 112 GB RAM; 2 GB/s per card, 4 GB/s aggregate
Bioinfo portal
Bioinformatics services for research already in operation
- About 100 databases of biological interest accessible through SRS (nucleotide sequences, genomes, mutations, inherited diseases, enzymes, etc.)
- An integrated system for biological data analysis, with over 150 programs for sequence analysis, evolutionary models, mutation studies, proteins, etc.
- Databases built within research projects (DG-CST, KinWeb, etc.)
- Systems for managing experimental data (biological samples, sequences, microscopy images, etc.)
Research and services: Comparative Genomics; CAPRI; Image restoration and analysis
Services: who has access?
- CEINGE
- DBBM
- IIGB
- BIOGEM
- Faculty of Medicine
- Faculty of Biotechnology
- Other faculties
- The public (limited access)
Services organization. Figure: a web server hosting CAPRI, SRS, PISE and other interfaces, which drive EMBOSS, FASTA, BLAST and other programs; a user data DB; primary remote databases such as Ensembl.
Graphic interface to programs
CAPRI
CAPRI workflow: several operations chained in a row, DNA -> Complement -> Translation -> Isoelectric point of the resulting protein.
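The chained workflow can be sketched as a list of steps, each consuming the previous step's output. This is a minimal illustration, not CAPRI's implementation: "Complement" is read here as the reverse complement (the strand that would actually be translated), translation uses the standard genetic code in frame 1, and the pKa values are one approximate textbook set chosen as an assumption (tools such as EMBOSS use their own tables). All function names are hypothetical.

```python
from functools import reduce

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}

# Approximate pKa values (assumption: one common textbook set).
_PKA_POS = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
_PKA_NEG = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse (assumed meaning of 'Complement')."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate frame 1; stops become '*', unknown codons 'X', trailing bases are dropped."""
    dna = dna.upper()
    codons = (dna[p:p + 3] for p in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def net_charge(protein: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge; only ionizable groups are counted."""
    counts = {aa: protein.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in _PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in _PKA_NEG.items())
    return pos - neg


def isoelectric_point(protein: str) -> float:
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    if not protein.replace("*", ""):
        raise ValueError("no residues to evaluate")
    lo, hi = 0.0, 14.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid  # still positive: the pI is at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def run_workflow(data, steps):
    """Feed the output of each step into the next, as in a chained workflow."""
    return reduce(lambda value, step: step(value), steps, data)


if __name__ == "__main__":
    pi = run_workflow("TTAGGCCAT", [reverse_complement, translate, isoelectric_point])
    print(f"pI = {pi:.2f}")
```

Keeping each step a plain function of one input makes the chain easy to extend: adding a new operation means appending one more callable to the list.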
CAPRI architecture. Figure: a Program Object with plugin objects (CGI plugin, PISE plugin, CLI simple-programs plugin, CURL base object, SOAP plugin, JEMBOSS plugin) that run the programs on the server disks (BLAST, FASTA, EMBOSS, HMMer, Genscan, ClustalW, Phylip); other components are task objects, a menu table and disk buffering, across client and server. Legend: relationships between objects (use, inheritance), program execution, data transfer, temporal relationship.
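The plugin idea in the architecture diagram (one Program Object interface, several plugin types behind it) can be sketched with an abstract base class. This is a hypothetical illustration, not CAPRI's code: only a command-line plugin and an in-process function plugin are shown, standing in for the CGI, PISE, CURL, SOAP and JEMBOSS variants, and the class names, `REGISTRY` and the example program names are invented for the sketch.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from typing import Callable


class ProgramObject(ABC):
    """Common interface for every analysis program (hypothetical class name)."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def run(self, input_text: str) -> str:
        """Execute the program on input_text and return its output."""


class CLIPlugin(ProgramObject):
    """Runs a local command-line program, passing the input on stdin."""

    def __init__(self, name: str, argv: list[str]):
        super().__init__(name)
        self.argv = argv

    def run(self, input_text: str) -> str:
        done = subprocess.run(self.argv, input=input_text,
                              capture_output=True, text=True, check=True)
        return done.stdout


class FunctionPlugin(ProgramObject):
    """Wraps an in-process function (stand-in for the other plugin types)."""

    def __init__(self, name: str, func: Callable[[str], str]):
        super().__init__(name)
        self.func = func

    def run(self, input_text: str) -> str:
        return self.func(input_text)


REGISTRY: dict[str, ProgramObject] = {}


def register(program: ProgramObject) -> None:
    REGISTRY[program.name] = program


def execute(name: str, input_text: str) -> str:
    """The interface only calls ProgramObject.run, whatever plugin is behind it."""
    return REGISTRY[name].run(input_text)


# Example registrations (hypothetical program names).
register(CLIPlugin("upper", [sys.executable, "-c",
                             "import sys; sys.stdout.write(sys.stdin.read().upper())"]))
register(FunctionPlugin("length", lambda s: str(len(s))))
```

Because the menu and task layers would only ever call `run`, adding support for a new kind of program means writing one more subclass rather than changing the interface.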
Distributed execution. Figure: access servers in front of the cluster nodes; for each user request, a process is launched on a different node.
Cluster activity
Components: web application server, cluster broker, cluster manager, DB server (relational DB); communication over HTTP.
1 – Run a command
2 – Request a node IP
3 – Request the status of the cluster
4 – Search for the best resource and return the corresponding node IP
5 – Launch the command on the node
6 – Return the result
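The six-step exchange can be sketched with in-memory stand-ins for the broker and the cluster manager. This is a simplified illustration under stated assumptions: "best resource" is taken to mean the online node with the fewest running jobs per CPU (ties broken by IP), the relational DB and the HTTP layer are left out, and the class names and the `command@ip` result string are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    ip: str
    cpus: int
    running_jobs: int = 0
    online: bool = True

    @property
    def load(self) -> float:
        return self.running_jobs / self.cpus


class ClusterManager:
    """Stand-in for the cluster manager: knows node status and starts jobs."""

    def __init__(self, nodes: list[NodeStatus]):
        self._nodes = {n.ip: n for n in nodes}

    def status(self) -> list[NodeStatus]:            # answers step 3
        return list(self._nodes.values())

    def launch(self, ip: str, command: str) -> str:  # step 5
        self._nodes[ip].running_jobs += 1
        return f"{command}@{ip}"                      # placeholder for the job's result


class Broker:
    def __init__(self, manager: ClusterManager):
        self.manager = manager

    def request_node_ip(self) -> str:                 # steps 2-4
        candidates = [n for n in self.manager.status() if n.online]
        if not candidates:
            raise RuntimeError("no cluster node available")
        # Assumption: "best" = lowest jobs-per-CPU ratio; IP breaks ties deterministically.
        best = min(candidates, key=lambda n: (n.load, n.ip))
        return best.ip


def run_command(broker: Broker, manager: ClusterManager, command: str) -> str:
    """Step 1 arrives at the web application server, which drives steps 2-6."""
    ip = broker.request_node_ip()
    return manager.launch(ip, command)                # step 6: result back to the caller
```

A real broker would also record job state in the relational DB and decrement the load when jobs finish; the sketch only models node selection.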
Figure: broker, DB, virtual nodes and a grid node
Image archival and management
Research project metadata, entered through the web interface and stored in the DB:
*Project title
*Experiment name
*Author, group, group leader, etc.
*Cell line
*Culture conditions
*Fixation and inclusion methods, stainings, etc.
*Objective
*Focus position
*Stage position x/y
*Exposure time
*Resolution, etc.
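One way to store these metadata is a two-table relational schema, one row per project and one per image. The sketch below uses SQLite purely for illustration; the table and column names, and milliseconds as the exposure unit, are assumptions rather than the actual DB layout.

```python
import sqlite3

# Hypothetical schema mirroring the metadata fields listed above.
SCHEMA = """
CREATE TABLE project (
    id            INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    experiment    TEXT,
    author        TEXT,
    group_name    TEXT,   -- "group" is an SQL keyword
    group_leader  TEXT
);
CREATE TABLE image (
    id                 INTEGER PRIMARY KEY,
    project_id         INTEGER NOT NULL REFERENCES project(id),
    file_path          TEXT NOT NULL,
    cell_line          TEXT,
    culture_conditions TEXT,
    preparation        TEXT,  -- fixation, inclusion, stainings
    objective          TEXT,
    focus_position     REAL,
    stage_x            REAL,
    stage_y            REAL,
    exposure_ms        REAL,  -- unit is an assumption
    resolution         TEXT
);
"""

IMAGE_FIELDS = {"cell_line", "culture_conditions", "preparation", "objective",
                "focus_position", "stage_x", "stage_y", "exposure_ms", "resolution"}


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_project(conn, title, experiment=None, author=None) -> int:
    cur = conn.execute("INSERT INTO project (title, experiment, author) VALUES (?, ?, ?)",
                       (title, experiment, author))
    return cur.lastrowid


def add_image(conn, project_id, file_path, **fields) -> int:
    """Insert one image; column names are checked against a whitelist before use."""
    unknown = set(fields) - IMAGE_FIELDS
    if unknown:
        raise ValueError(f"unknown image fields: {sorted(unknown)}")
    columns = ["project_id", "file_path", *fields]
    placeholders = ", ".join("?" for _ in columns)
    cur = conn.execute(f"INSERT INTO image ({', '.join(columns)}) VALUES ({placeholders})",
                       (project_id, file_path, *fields.values()))
    return cur.lastrowid


def images_for_cell_line(conn, cell_line):
    """Return (project title, file path) pairs for one cell line, sorted by path."""
    rows = conn.execute(
        "SELECT p.title, i.file_path FROM image AS i "
        "JOIN project AS p ON p.id = i.project_id "
        "WHERE i.cell_line = ? ORDER BY i.file_path",
        (cell_line,))
    return rows.fetchall()
```

Keeping project fields in their own table avoids repeating title, author and group on every image of a time-lapse series.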
Image-DB interface
IPROC. Figure: example image sets (time-lapse at 6 positions, actin time-lapse, wound healing time-lapse, adhesion, actin staining)
IPROC architecture. Figure: the iPage (image area, data and images) and the iPane (processing steps) communicate through a gateway with HPC resources on the cluster nodes.
Distributed execution of parallel requests. Figure: access servers in front of the cluster nodes; a tool can require the execution of multiple simultaneous processes.
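A tool that needs several simultaneous processes (for example, one per frame of a time-lapse) can be sketched on a single machine with a thread pool that launches external processes. Threads here only wait on the subprocesses, so the actual work runs in parallel OS processes; on the cluster those processes would be placed on different nodes instead. The per-frame command is a hypothetical placeholder.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def frame_command(frame_index: int) -> list[str]:
    """Hypothetical per-frame job; a real tool would call an image program here."""
    return [sys.executable, "-c", f"print({frame_index} ** 2)"]


def run_one(argv: list[str]) -> str:
    """Run one external process and return its trimmed standard output."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def run_parallel(commands: list[list[str]], max_workers: int = 4) -> list[str]:
    """Keep up to max_workers processes running at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    print(run_parallel([frame_command(i) for i in range(6)]))
```

`pool.map` returns results in submission order even when processes finish out of order, which keeps frame results aligned with their indices.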
What software may be linked
- PHP internal routines (basic drawing, processing)
- ImageMagick (more advanced processing)
- Image converters
- Special tools (PDL, deconvolution)
- Tools developed in-house (cell tracking)
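Linking an external package such as ImageMagick usually means translating the user's chosen processing steps into a command line. The sketch below only builds the argument list and does not run it; the step names and the `imagemagick_argv` helper are hypothetical, while `-blur radiusxsigma`, `-resize N%`, `-negate` and `-normalize` are standard ImageMagick options (ImageMagick 7 uses the `magick` binary, version 6 uses `convert`).

```python
from typing import Callable

# Hypothetical mapping from step names to ImageMagick command-line options.
STEP_OPTIONS: dict[str, Callable[..., list[str]]] = {
    "blur": lambda sigma: ["-blur", f"0x{sigma}"],   # radius 0 lets ImageMagick pick it
    "resize": lambda percent: ["-resize", f"{percent}%"],
    "negate": lambda: ["-negate"],
    "normalize": lambda: ["-normalize"],
}


def imagemagick_argv(src: str, dst: str, steps: list[tuple], binary: str = "convert") -> list[str]:
    """Build one command applying the steps in order; pass the result to subprocess.run."""
    argv = [binary, src]
    for name, *args in steps:
        argv += STEP_OPTIONS[name](*args)
    argv.append(dst)
    return argv
```

Building an argument list (rather than a shell string) lets the command be passed straight to `subprocess.run` without shell quoting issues in file names.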
Advantages
- Convenient graphic interface
- Access to a vast library of image processing steps
- No specific interface requirements
- Remote processing on parallel hardware
- Support for a large number of concurrent users
- System independent (works on Mac, PC, Linux, etc.)
- No installation needed: a browser is enough