STAT115 STAT225 BIST512 BIO298 - Intro to Computational Biology.

March 28, 2012. Daniel Fernandez, Alejandro Quiroz

1st ACT
– Information theory correction
– Motif Finding
– The Genome Browser
– Homework help Q1, Q2
INTERLUDE
– Electronic music with DJ Cistrome (10 min)
2nd ACT
– Dah Cistrome
– MA2C
– Homework help Q3

Information Theory

Information Theory
The amount of information transmitted through a channel equals the entropy (or uncertainty) of the source. Entropy is maximized when the source can produce n possible outcomes, all with equal probability (1/n); in that case the entropy is log2(n). Biologists borrowed this concept to characterize the amount of uncertainty associated with a motif, represented as a PWM. But your TF got confused… see why!
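To make the log2(n) claim concrete, here is a minimal Python sketch of Shannon entropy for a discrete source. The function name and the two example distributions are illustrative assumptions, not part of the course material.

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    # Terms with p == 0 contribute nothing (the limit of p*log2(p) is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical sources over the four nucleotides A, C, G, T.
uniform = [0.25, 0.25, 0.25, 0.25]   # all outcomes equally likely
skewed = [0.7, 0.1, 0.1, 0.1]        # one outcome dominates

print(shannon_entropy(uniform))  # equals log2(4), the maximum for n = 4
print(shannon_entropy(skewed))   # lower, because the source is more predictable
```

A fully conserved motif position (one base with probability 1) has entropy 0, which is the other extreme of the same scale.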

Information Theory
[Figure: source → channel → destination, labeled INFORMATION and ENTROPY, with example sequence ATCGATCG]

Information Theory
But what happens when we want to compare the uncertainty of two sources, or compare two probability distributions, i.e., the background sequence PWM and the motif PWM? We use RELATIVE ENTROPY, also called KULLBACK-LEIBLER DIVERGENCE or INFORMATION CONTENT.
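As a rough illustration of comparing a motif against a background, the sketch below computes relative entropy per PWM column and sums it over the motif. The function names, the uniform background, and the three-column PWM are assumptions made for the example; real PWMs would come from a motif finder.

```python
import math

def relative_entropy(p, q):
    """KL divergence D(p || q) in bits; p and q list probabilities in the same order."""
    # Positions where p is 0 contribute nothing to the sum.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def information_content(pwm, background):
    """Sum of per-column relative entropies of a PWM against a background."""
    return sum(relative_entropy(column, background) for column in pwm)

# Hypothetical data, base order A, C, G, T.
background = [0.25, 0.25, 0.25, 0.25]
pwm = [
    [1.0, 0.0, 0.0, 0.0],      # fully conserved A
    [0.25, 0.25, 0.25, 0.25],  # identical to background, carries no information
    [0.5, 0.5, 0.0, 0.0],      # A or C
]

print(information_content(pwm, background))
```

Relative entropy is not symmetric: D(p || q) generally differs from D(q || p), so the order of motif and background matters.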

Motif Example I: Prokaryotic Co-expression
Objective. Find the binding sites that control the regulation of co-expressed genes in Mycobacterium tuberculosis.
File. mt.fasta
Note. We assume that the genes are co-expressed because they are under the control of the same transcription factor(s), and we use Gibbs sampling to try to identify the putative binding motif for the factor(s).
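The idea behind Gibbs sampling for motif finding can be sketched in a few lines of Python. This is a toy version under strong assumptions (one site per sequence, a uniform background, uppercase A/C/G/T only, no fragmentation or palindrome modeling) and is not the Gibbs sampler used in the course; the function name, parameters, and sequences are all hypothetical.

```python
import random

def gibbs_motif_search(seqs, width, iterations=500, seed=0, pseudocount=1.0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    alphabet = "ACGT"
    # Start from random site positions.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        # Hold one sequence out and build a count profile from the others.
        held_out = rng.randrange(len(seqs))
        counts = [{b: pseudocount for b in alphabet} for _ in range(width)]
        for j, (seq, start) in enumerate(zip(seqs, starts)):
            if j == held_out:
                continue
            for k in range(width):
                counts[k][seq[start + k]] += 1
        total = (len(seqs) - 1) + pseudocount * len(alphabet)
        # Score every window of the held-out sequence under the profile.
        # With a uniform background the background term is constant,
        # so the motif likelihood alone can serve as the sampling weight.
        target = seqs[held_out]
        weights = []
        for start in range(len(target) - width + 1):
            w = 1.0
            for k in range(width):
                w *= counts[k][target[start + k]] / total
            weights.append(w)
        # Sample the new site in proportion to its likelihood.
        starts[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Made-up sequences that each contain TTGACA somewhere.
toy_seqs = ["ACGTTTGACAGGCT", "TTGACATCCGGATA", "GGCATTTGACACGA", "CCATGTTGACATTC"]
for seq, start in zip(toy_seqs, gibbs_motif_search(toy_seqs, width=6)):
    print(seq[start:start + 6])
```

Because the sampler is stochastic, different seeds can settle on different sites; running several seeds and comparing the resulting profiles is a common sanity check.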

Motif Example I: Prokaryotic Co-expression
Motif parameters are designed to capture the features of binding sites for a classic bacterial helix-turn-helix (HTH) type transcription factor. HTH-type TFs are typically symmetric homodimers, so they bind to symmetric (palindromic) DNA binding sites. Furthermore, the two HTH regions of the dimeric TF typically contact bases in two adjacent major grooves of the DNA, so the two halves of the palindromic binding site span well over 10 bases (the approximate number of bases per helical turn of B-form DNA). The bases contacted by a TF are not necessarily contiguous, so we use fragmentation to allow the Gibbs sampler to ignore positions that do not participate in the protein-DNA interaction and are therefore not conserved as part of the binding site. To see an example, search for 1lmb at http://melolab.org/pdidb/web/content/home
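"Palindromic" here means that a site reads the same as its reverse complement. The short Python sketch below checks this and also counts mismatches, since real binding sites are rarely perfect palindromes. The function names and example sites are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site == reverse_complement(site)

def palindrome_mismatches(site):
    """Number of positions where the site differs from its reverse complement."""
    return sum(a != b for a, b in zip(site, reverse_complement(site)))

print(is_palindromic("GAATTC"))         # a perfect palindrome
print(palindrome_mismatches("TTGACA"))  # an asymmetric site
```

A mismatch count is a softer criterion than a strict equality test and is closer to how a symmetric motif model tolerates variation.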

Motif Example I: Prokaryotic Co-expression

DNA as Hereditary Material

Central Dogma of Molecular Biology: Gene Expression, Splicing

The Human Genome Project
The goal is to understand the human genome and its role in health and disease.
– “The true payoff from the HGP will be the ability to better diagnose, treat and prevent disease.” Francis Collins, Director of the HGP and NHGRI

Sequencing
– Thousands of researchers from 20 centers worked on the HGP
Assembly
– The sequence existed as millions of clones of small fragments
– Finding overlaps and putting together “contigs” was a huge challenge
Annotation
– What does it all mean? Where are the genes? What do they do?

UCSC Genome Browser

Basic Features
– Species, assemblies
– Genome browser
– Gene sorter
– Sequence search (BLAT)
Advanced Features
– Coordinate conversion
– Custom tracks
– Table Browser

UCSC Genome Browser
A suite of tools for viewing and mining genomic data.

Organization of Genomic Data

Genome Gateway start page, basic search

Overview of the browser

The browser

Genome Gateway start page, basic search
– Genome version
– Chromosome/region
– Gene
– Cytogenetic coordinates
– Phenotype of interest
– Key words: zinc fingers, kinase
Try the following example: Autism
– How many UCSC genes are located on chromosome X?
– How many RefSeq genes are associated with Autism?
– Pick the gene: AUTS2 (uc011keg.1) at chr7:

[Figure: browser view with the base position and gene annotation tracks labeled]
Tracks! Where we obtain information

UCSC Table Browser
Retrieve the data associated with a track in text format
– To calculate intersections between tracks
– To retrieve DNA sequence covered by a track
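What "intersection between tracks" means can be sketched outside the Table Browser. The Python example below assumes each track is a list of (chromosome, start, end) tuples with half-open coordinates, as in the BED convention; the function name and all coordinates are made up for illustration, and the quadratic loop is only suitable for small inputs.

```python
def intersect_tracks(track_a, track_b):
    """Return the overlapping parts of intervals from two tracks.

    Each interval is (chrom, start, end) with a half-open range [start, end).
    """
    overlaps = []
    for chrom_a, start_a, end_a in track_a:
        for chrom_b, start_b, end_b in track_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap
                overlaps.append((chrom_a, start, end))
    return overlaps

# Hypothetical tracks, e.g. gene annotations and binding peaks.
genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks = [("chr1", 150, 250), ("chr1", 590, 700), ("chr3", 0, 50)]

print(intersect_tracks(genes, peaks))
```

For genome-scale tracks, sorting the intervals and sweeping through them once would be the usual choice instead of comparing every pair.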

Homework help Q2
– How many RefSeq genes have more than 15 exons in human chromosome 1?
– How many genes on chromosome 22, on the positive strand, are associated with a disease in the OMIM database?
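One way to approach the first question offline is to export the RefSeq gene table as text and count in a script. The Python sketch below assumes a tab-separated export whose header line starts with "#" and includes the columns name, chrom, strand, exonCount and name2; that layout is an assumption, and the rows are made-up placeholders, not real annotations.

```python
import csv
import io

def genes_with_many_exons(table_text, chrom, more_than):
    """Return gene symbols on `chrom` with a transcript of more than `more_than` exons."""
    # Drop the leading '#' so the first line can serve as the CSV header.
    reader = csv.DictReader(io.StringIO(table_text.lstrip("#")), delimiter="\t")
    genes = set()
    for row in reader:
        if row["chrom"] == chrom and int(row["exonCount"]) > more_than:
            # Several transcripts can belong to one gene, so collect symbols in a set.
            genes.add(row["name2"])
    return genes

# Hypothetical export; accession IDs and gene symbols are placeholders.
rows = [
    ["name", "chrom", "strand", "exonCount", "name2"],
    ["NM_000001", "chr1", "+", "20", "GENEA"],
    ["NM_000002", "chr1", "-", "18", "GENEA"],
    ["NM_000003", "chr1", "+", "3", "GENEB"],
    ["NM_000004", "chr2", "+", "30", "GENEC"],
]
sample = "#" + "\n".join("\t".join(r) for r in rows)

print(len(genes_with_many_exons(sample, "chr1", 15)))
```

Whether the homework expects a count of transcripts or of distinct genes is worth checking, since the two numbers can differ; the sketch counts distinct gene symbols.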

The Cistrome: Understanding Genetic Regulation
CisTrOme stands for Cis-acting regulatory elements searched across (Trans) the whole genOme.
– Visit and register at
The objective is to map the binding regions of a transcription factor across (trans) the genome in order to understand the regulatory mechanisms of gene expression on the chromosome where the gene is located (cis).

Types of Data and Peak-Calling Methods
ChIP-chip data (ChIP on chip)
– Affymetrix one-color arrays
– NimbleGen two-color arrays
ChIP-seq data (ChIP and NGS)
– Sequencing data (Illumina, Roche 454)
Methods
– MACS: Model-based Analysis for ChIP-Seq
– MA2C: Model-based Analysis for 2-Color arrays
– MAT: Model-based Analysis for Tiling arrays

MA2C: Homework help Q3
Model-based Analysis for 2-Color arrays
http://liulab.dfci.harvard.edu/MA2C/MA2C.htm
Installation
– You need Java Runtime Environment (JRE) 5.0 or higher. You can download it from
– Download MA2C.zip and uncompress it.
– Windows: open MA2C\dist\MA2C.bat
– Terminal: go to MA2C/dist/ and execute the command java -Xmx600m -jar MA2C.jar (or just double-click MA2C.jar)

MA2C Data Normalization
– Download the data from the homework (SDC3 zip file)
– Uncompress it and open MA2C
– Upload SampleKeyIVtoX.txt as the sample key
– Select your control group (IP channel)
– Go to the normalization tab and normalize your data (the default parameters are OK)

MA2C Peak Finding
– Go to the peak-detection tab
– Change the parameters accordingly
– Select find peaks
– Voilà! The results have been written to the MA2C_output folder.