Transcription factor binding motifs (part II) 10/22/07.

Information from negative control
Motivation: combine information from TF-bound and non-bound sequences to identify discriminative motifs.
Methods:
–REDUCE (Bussemaker et al. 2001)
–Motif Regressor (Conlon et al. 2003)

Motif Regressor Algorithm
–Rank all genes by expression and obtain their upstream sequences.
–Use MDscan to find motifs from the most induced and most repressed genes.
–Score each upstream sequence for matches to each MDscan-reported motif.
–Perform simple linear regression between motif-matching score and gene expression to remove insignificant motifs.
–Perform stepwise regression on the significant motifs to find groups acting together to affect expression.

Motif matching score
Extract the upstream sequence X_mg (e.g. 800 bp) of each gene. Define a motif-matching score, computed as a sum over all sliding windows of the upstream sequence, that measures the overall enrichment of a motif.
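The sliding-window score can be sketched as follows. This is a minimal illustration, assuming the score has the log-ratio form we understand Conlon et al. 2003 to use: log2 of the sum, over all windows of motif width w, of Pr(window | motif) / Pr(window | background). The function name, the dictionary-based weight matrix, and the zeroth-order background are assumptions made for the example, and the sequence is assumed to contain only A, C, G, T.

```python
import math

def motif_match_score(upstream, pwm, background):
    """log2 of the summed motif/background likelihood ratios over all windows.

    upstream   -- upstream sequence of one gene (A/C/G/T only, an assumption)
    pwm        -- list of dicts, one per motif position, base -> probability
    background -- dict, base -> background probability (zeroth-order model)
    """
    w = len(pwm)
    total = 0.0
    # Slide a window of motif width across the upstream sequence.
    for i in range(len(upstream) - w + 1):
        window = upstream[i:i + w]
        ratio = 1.0
        for pos, base in enumerate(window):
            ratio *= pwm[pos][base] / background[base]
        total += ratio
    return math.log2(total)

# Toy two-position motif that prefers "AG".
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
```

A sequence with several good matches ("AGAG") scores higher than one with none ("TTTT"), because every window contributes to the sum rather than only the best one.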

Motif Regressor Approach
–Look at one expression experiment (expression log ratio for each gene).
–Look for candidate motifs with MDscan.
–Refine the motifs.
–Regress between upstream motif-match score and downstream expression.

Motif Regressor Linear Regression
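The per-motif filtering step (simple linear regression of expression on one motif's matching score) can be sketched as below. This is an illustrative least-squares fit written for this transcript, not the Motif Regressor implementation; the function name and the choice to report a t-statistic for the slope are assumptions.

```python
import math

def simple_regression(scores, expression):
    """Fit expression = intercept + slope * score by least squares.

    Returns (slope, intercept, t_stat), where t_stat is the slope divided
    by its standard error. Requires at least 3 genes and non-constant scores.
    """
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(expression) / n
    sxx = sum((s - mean_s) ** 2 for s in scores)
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_s
    residual = sum((y - intercept - slope * s) ** 2
                   for s, y in zip(scores, expression))
    stderr = math.sqrt(residual / (n - 2) / sxx)
    # A perfect fit has zero standard error; report an infinite t-statistic.
    t_stat = slope / stderr if stderr > 0 else math.copysign(math.inf, slope)
    return slope, intercept, t_stat
```

A motif whose slope has a small t-statistic in absolute value would be dropped before the stepwise stage; the cutoff itself is a modelling choice not specified here.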

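Multiple regression model: expression explained as the sum of motifs’ effects
$Y_g = \alpha + \sum_m \beta_m S_{mg} + \epsilon_g$
where $Y_g$ is the expression of gene g, $\alpha$ is the baseline expression, $\beta_m$ is the regression coefficient of motif m, $S_{mg}$ is the upstream motif-match score of motif m in gene g, and $\epsilon_g$ is the error term.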

Further motif selection by stepwise regression
Stepwise regression is used to further select significant motifs.
–Step 1: Include only the intercept.
–Step 2: Sequentially add the new motif that gives the largest reduction in error.
–Step 3: Sequentially remove the motif whose removal gives the smallest increase in error.
–Repeat Steps 2 and 3 until convergence.
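The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions: error is measured as the residual sum of squares of a least-squares fit of the multiple regression model, and a fixed threshold `min_gain` (hypothetical) stands in for the significance test a real implementation would use to accept or reject a motif. The score columns are assumed to be linearly independent.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(scores, y, motifs):
    """Residual sum of squares of the fit using intercept + the given motifs."""
    rows = [[1.0] + [scores[g][m] for m in motifs] for g in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yg for r, yg in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)  # normal equations
    return sum((yg - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yg in zip(rows, y))

def stepwise_select(scores, y, min_gain=1e-6):
    """scores[g][m] is the match score of motif m in gene g; y[g] is expression."""
    selected = []                     # Step 1: intercept only
    current = rss(scores, y, selected)
    changed = True
    while changed:
        changed = False
        # Step 2: add the motif giving the largest reduction in error.
        candidates = [m for m in range(len(scores[0])) if m not in selected]
        if candidates:
            best = min(candidates, key=lambda m: rss(scores, y, selected + [m]))
            new = rss(scores, y, selected + [best])
            if current - new > min_gain:
                selected.append(best)
                current, changed = new, True
        # Step 3: remove the motif whose removal increases error the least.
        if len(selected) > 1:
            def without(m):
                return [s for s in selected if s != m]
            worst = min(selected, key=lambda m: rss(scores, y, without(m)))
            new = rss(scores, y, without(worst))
            if new - current < min_gain:
                selected.remove(worst)
                current, changed = new, True
    return sorted(selected)
```

On toy data where expression depends only on motifs 0 and 2, the procedure keeps those two and leaves out the uninformative motif 1.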

Application Yeast cells are grown under amino acid starvation. Gene expression (~6000 genes) was measured at 30 minutes after amino acid starvation. Motif Regressor was applied to identify sequence motifs.

Comparative genomics
–Evolutionary tree
–Darwin’s principle of evolution
–Cross-species sequence alignment
–Conservation of genes
–Conservation of regulatory sequence
–Quantifying sequence conservation
Methods
–MCS score (Kellis)
–PhyloCon
Results
–Yeast (Kellis)
Advantage: no requirement for prior functional information.
Drawback: species-specific motifs may not be learned (Fraenkel).

Non-uniform conservation rates
Genes are typically conserved. Intergenic regions are typically not conserved. Why?

Motif finding by using multiple genomes
Basic assumption: functional sequences evolve more slowly than non-functional sequences, because they are subject to selection pressure.
Basic approach:
–Identify conserved regions with sequence alignment algorithms.
–Restrict motif finding to the conserved regions.
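The "restrict to conserved regions" step can be illustrated with a deliberately crude filter. The sketch below assumes a multiple alignment is already available as equal-length strings with "-" for gaps, and calls a column conserved only if every species has the same base; the function name and the `min_len` parameter are hypothetical, and published methods use more graded conservation measures.

```python
def conserved_regions(aligned, min_len=6):
    """Return (start, end) column ranges, end exclusive, of runs of at least
    min_len columns in which all aligned sequences carry the same base."""
    length = len(aligned[0])
    regions = []
    start = None
    # Iterate one column past the end so that a trailing run is closed.
    for col in range(length + 1):
        same = (col < length
                and aligned[0][col] != "-"
                and all(seq[col] == aligned[0][col] for seq in aligned))
        if same and start is None:
            start = col
        elif not same and start is not None:
            if col - start >= min_len:
                regions.append((start, col))
            start = None
    return regions
```

A motif finder would then be run only on the returned column ranges instead of on the full intergenic sequence.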

Motif: Gal4 – CGGNNNNNNNNNNNCCG (CGG and CCG half-sites separated by 11 unspecified bases). The Gal4 motif is highly conserved.
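As a small illustration, the consensus above can be scanned for with a regular expression. This is a sketch using Python's standard `re` module; the function name is an assumption, and a lookahead is used so that overlapping sites are reported. Because CGG-N11-CCG is its own reverse complement, scanning one strand is enough.

```python
import re

# CGG, then any 11 bases, then CCG; the lookahead allows overlapping matches.
GAL4_PATTERN = re.compile(r"(?=(CGG[ACGT]{11}CCG))")

def find_gal4_sites(sequence):
    """Return (start, site) pairs for every consensus match in the sequence."""
    return [(m.start(), m.group(1))
            for m in GAL4_PATTERN.finditer(sequence.upper())]
```

For example, `find_gal4_sites("ttCGGACGTACGTACGCCGaa")` returns one site starting at position 2.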

Methods
–Wasserman et al. 2000
–MCS (Kellis et al. 2003; Xie et al. 2005)
–PhyloCon (Wang and Stormo 2003)
–EMnEM (Moses et al. 2004)
–OrthoMEME (Prakash et al. 2004)
–PhyME (Sinha et al. 2004)
–CompareProspector (Liu et al. 2004)
–PhyloGibbs (Siddharthan et al. 2005)
–Ortholog Sampler (Li and Wong 2005)
–MultiModule (Zhou and Wong 2005)

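MCS
Basic idea: select the highly conserved motifs, i.e. those whose observed conservation rate is far above the expected rate, $p_{obs} \gg p_0$ (Xie et al. 2005).
[Figure: frequency versus conservation rate, with $p_{obs}$ and $p_0$ marked.]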

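MCS
Definition of MCS:
$MCS = \frac{p_{obs} - p_0}{\sqrt{p_0 (1 - p_0) / N}}$
where $N$ is the total number of occurrences, $p_0$ is the expected frequency, and $p_{obs}$ is the observed frequency. $p_0$ is estimated by random sampling. Choose the cutoff at MCS = 6 (Xie et al. 2005).
[Figure: frequency versus conservation rate, with $p_{obs}$ and $p_0$ marked.]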
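A worked example may help with the scale of the score. The sketch below assumes MCS is the binomial z-score built from the three quantities named on the slide (total occurrences, expected frequency, observed frequency); the function name and the toy numbers are made up for illustration.

```python
import math

def motif_conservation_score(n_total, n_conserved, p0):
    """Standard deviations by which the observed conservation rate exceeds p0.

    n_total     -- total number of occurrences of the motif
    n_conserved -- number of those occurrences that are conserved
    p0          -- expected conservation rate (estimated by random sampling)
    """
    p_obs = n_conserved / n_total
    return (p_obs - p0) / math.sqrt(p0 * (1 - p0) / n_total)

# Toy numbers: 100 of 400 occurrences conserved, against an expected rate of 10%.
toy_mcs = motif_conservation_score(400, 100, 0.1)
```

Here the observed rate is 0.25, the standard error is 0.015, and the score is about 10, which would pass the MCS = 6 cutoff.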

Application to human regulatory motifs

Results

Tissue specificity of detected motifs

PhyloCon
Basic idea (Wang and Stormo 2003):
–Both sequence conservation and gene co-regulation information are used for motif finding.
–Orthologous regions are viewed as sequence profiles.
–Align sequence profiles instead of sequences.
[Figure: aligned sequences from species 1 to 4 summarized as a profile.]

PhyloCon

Profile Comparison
Compare two columns first.
–$f_b = \{f_A, f_C, f_G, f_T\}$: a column of the profile
–$p_b = \{p_A, p_C, p_G, p_T\}$: background base frequencies
–$n_b = \{n_A, n_C, n_G, n_T\}$: observed counts at the specified position
Likelihood ratio: $LR = \prod_b (f_b / p_b)^{n_b}$
Log-likelihood ratio: $LLR = \sum_b n_b \ln(f_b / p_b)$

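Profile Comparison
Compare two columns first. The average log-likelihood ratio (ALLR) measures the similarity between two columns i and j:
$ALLR(i, j) = \frac{\sum_b n_{bj} \ln(f_{bi} / p_b) + \sum_b n_{bi} \ln(f_{bj} / p_b)}{\sum_b (n_{bi} + n_{bj})}$
where $n_{bi}$ and $f_{bi}$ are the count and frequency of base b in column i, $p_b$ is the background frequency, and the denominator is the total count. Sum the ALLR over all positions to get a score comparing two profiles.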
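The column comparison can be sketched directly from the ALLR definition of Wang and Stormo 2003. In this illustration each column is a dictionary of base counts; the pseudocount used to keep frequencies away from zero is an assumption of the example, as are the function names.

```python
import math

BASES = "ACGT"

def column_freqs(counts, background, pseudocount=1.0):
    """Base frequencies of one column, smoothed with a background pseudocount."""
    total = sum(counts[b] for b in BASES) + pseudocount
    return {b: (counts[b] + pseudocount * background[b]) / total for b in BASES}

def allr(counts_i, counts_j, background, pseudocount=1.0):
    """Average log-likelihood ratio between two profile columns."""
    f_i = column_freqs(counts_i, background, pseudocount)
    f_j = column_freqs(counts_j, background, pseudocount)
    # Each column's counts are scored against the other column's frequencies.
    numerator = sum(
        counts_j[b] * math.log(f_i[b] / background[b])
        + counts_i[b] * math.log(f_j[b] / background[b])
        for b in BASES
    )
    total = sum(counts_i[b] + counts_j[b] for b in BASES)
    return numerator / total

def profile_score(profile_a, profile_b, background):
    """Sum of ALLR over aligned columns of two equal-length profiles."""
    return sum(allr(ci, cj, background) for ci, cj in zip(profile_a, profile_b))

uniform_bg = {b: 0.25 for b in BASES}
col_a = {"A": 4, "C": 0, "G": 0, "T": 0}
col_t = {"A": 0, "C": 0, "G": 0, "T": 4}
```

Two identical, informative columns give a positive ALLR, while columns that disagree (`col_a` against `col_t`) give a negative one, which is what lets the summed score separate similar profiles from dissimilar ones.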

Profile merging
Iteratively merge non-orthologous groups that have high ALLR scores.

Sampling motifs on phylogenetic trees
Motivation: the alignment-based method does not work well if the species are distant.
Basic idea:
–Avoid aligning multiple species to gather orthologous gene information.
–Directly model the evolution of the genomic sequences.
–Assume that motifs evolve more slowly than background sequences.

An evolution model

Evolution model Probability of a nucleotide change

Main Algorithm
Step 1: Build an evolution model.
–Motif evolution is modeled by shortening the branch lengths by a fixed proportion, say 50%.
Step 2: Infer the model parameters with a Gibbs sampler.

Limitation of comparative genomics approach Species-specific motifs cannot be learned from this approach.

Divergence of TF binding Borneman et al. 2007

Divergence of TF binding
Divergent binding can be caused by divergence of TF motifs (e.g. Ste12) or by some unknown mechanism (e.g. Tec1) (Borneman et al. 2007).

Other directions
–Combine multiple motif-finding algorithms (e.g. Harbison et al. 2004; Jensen and Liu 2005).
–Directly identify TF binding sites through experiments (ChIP-chip), then apply motif-finding algorithms (e.g. MDscan) to the experimental binding data.

Challenge of Specificity
–A 7-mer is expected to occur once every 4^7 = 16,384 base pairs by chance.
–In human, this means 3 × 10^9 / 16,384 ≈ 180,000 sites in total.
–The total number of genes is ~25,000.
–Most predicted binding sites are false positives!
–Other restrictive information is needed to reduce false positives.
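The arithmetic above can be checked in a few lines. This sketch assumes equal base frequencies and counts a single strand, which is the simplification the slide's numbers imply; the function name is made up for the example.

```python
def expected_chance_sites(motif_length, genome_size):
    """Return (expected spacing in bp, expected number of chance matches)
    for one fixed k-mer, assuming each base occurs with probability 1/4."""
    spacing = 4 ** motif_length
    return spacing, genome_size / spacing

spacing, sites = expected_chance_sites(7, 3e9)
```

With a 7-mer and a 3 × 10^9 bp genome this gives a spacing of 16,384 bp and roughly 183,000 chance matches, several times the number of genes.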

Some Biological Notes
TF binding does not imply that the TF is functional.
–Some TFs always bind to DNA, but they are functional only if they are phosphorylated.
Motif sites contain a large number of false positives.
–Motifs are short DNA elements (~10 bp). Higher eukaryotes have large genomes, so these short elements may occur frequently by chance.
Epigenetic factors also play an important role in the regulation of TF binding.
–Chromatin structure, histone modifications, DNA methylation, etc.

Reading list
–Conlon et al. 2003: proposed Motif Regressor. Filters out motifs that are not associated with gene expression changes.
–Xie et al. 2005: MCS. Uses a comparative approach to identify human regulatory motifs. Highly biological.
–Wang and Stormo 2003: PhyloCon. An elegant “multi-gene, multi-species” approach for motif finding.

Acknowledgements X.S.Liu
