Comp. Genomics Recitation 12 Bayesian networks Taken from Artificial Intelligence course, MIT, 6.034


Comp. Genomics Recitation 12 Bayesian networks Taken from Artificial Intelligence course, MIT, Exam questions

Question 1.1
Draw a Bayesian network among the following binary variables that model the outcome of an election:
I: candidate is Incumbent
M: has lots of Money for advertising
A: uses advertisements that focus on Attacking the candidate's opponent
Q: uses advertisements that focus on the candidate's Qualifications
L: candidate is Liked
D: opponent is Distrusted
E: candidate is Elected

Question 1.1 - cont'd
Your network should encode the following beliefs:
Incumbents tend to raise lots of money.
Money can be used to buy advertising that either focuses on the candidate's qualifications or attacks the candidate's opponent. But if one does the first, there is less money to do the latter.
Attack advertisements tend to make voters distrust the opponent, but they also tend to make voters dislike the candidate.
Advertisements focusing on qualifications tend to make voters like the candidate.
Candidates that people like tend to get elected.
Candidates whose opponent people distrust tend to get elected.

Question solution

Question 1.2
For each of the following, say whether it is or is not asserted by the network structure you drew (without assuming anything about the numerical entries in the CPTs).
1. P(L | A,Q,D) = P(L | A,Q)
2. P(A | M,Q) = P(A | M)
3. P(L,D | A,Q) = P(L | A,Q) P(D | A,Q)

Question 1.2 - solution
1. P(L | A,Q,D) = P(L | A,Q): Asserted
2. P(A | M,Q) = P(A | M): Not asserted
3. P(L,D | A,Q) = P(L | A,Q) P(D | A,Q): Asserted
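The solution network itself was a figure and is not in this transcript. As a hedged reconstruction from the stated beliefs and the three answers above, one structure that fits is I → M, M → A, M → Q, Q → A, A → L, Q → L, A → D, L → E, D → E. The direction of the edge between Q and A is an assumption; only the existence of a direct A-Q link is forced by answer 2. Under that assumption the joint distribution factorizes as:

```latex
\begin{aligned}
P(I, M, A, Q, L, D, E) ={}& P(I)\, P(M \mid I)\, P(Q \mid M)\, P(A \mid M, Q) \\
                          & \times P(L \mid A, Q)\, P(D \mid A)\, P(E \mid L, D)
\end{aligned}
```

Statement 1 holds because D is a non-descendant of L and the parents of L are A and Q. Statement 2 is not asserted because Q appears in the conditional of A. Statement 3 holds because L and D share only the parent A and their common child E is not observed.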

Question 2
Show a Bayesian network structure that encodes the following relationships:
A is independent of B
A is dependent on B given C
A is dependent on D
A is independent of D given C

Question 2 - solution
Nodes A and B have no parents
Node C has two parents: A and B
Node D has one parent: C
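The four relationships can be checked mechanically on the network A → C ← B, C → D. The sketch below is one possible d-separation test (ancestral graph, moralization, removal of the observed nodes); the function name `d_separated` and the `parents` dictionary layout are illustrative choices, not part of the recitation. Note that d-separation only certifies independence; "dependent" here means "not d-separated", i.e. dependent for generic CPT values.

```python
def d_separated(parents, x, y, given=()):
    """Return True if x and y are d-separated by `given` in the DAG.

    parents: dict mapping each node to the set of its parents.
    """
    given = set(given)

    # 1. Keep only x, y, the observed nodes, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents[v])

    # 2. Moralize: drop edge directions and connect co-parents.
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = list(parents[v])
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for a in ps:
            for b in ps:
                if a != b:
                    nbrs[a].add(b)

    # 3. Remove the observed nodes and test whether y is reachable from x.
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v in seen or v in given:
            continue
        seen.add(v)
        stack.extend(nbrs[v])
    return True


# The network from the solution of Question 2.
parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}

print(d_separated(parents, "A", "B"))          # A independent of B
print(d_separated(parents, "A", "B", ["C"]))   # A, B given C
print(d_separated(parents, "A", "D"))          # A, D
print(d_separated(parents, "A", "D", ["C"]))   # A, D given C
```

The same function can be applied to the other questions once their graphs are written down as `parents` dictionaries.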

Question 3
Which of the following conditional independence assumptions are true?
1. A and E are independent
2. A and E are independent given D
3. B and C are independent
4. B and C are independent given A
5. B and C are independent given D
6. A and E are independent given B
7. A and E are independent given F
8. B and C are independent given E

Question 3 - solution
1. A and E are independent: False
2. A and E are independent given D: True
3. B and C are independent: False
4. B and C are independent given A: True
5. B and C are independent given D: False
6. A and E are independent given B: False
7. A and E are independent given F: False
8. B and C are independent given E: False

Question 4 For each statement, name all of the graph structures, G1-G4, or “none” that imply it.

Question 4 - cont'd
1. A is conditionally independent of B given C
2. A is conditionally independent of B given D
3. B is conditionally independent of D given A
4. B is conditionally independent of D given C
5. B is independent of C
6. B is conditionally independent of C given A

Question 4 - solution
1. A is conditionally independent of B given C: G2
2. A is conditionally independent of B given D: none
3. B is conditionally independent of D given A: G3, G4
4. B is conditionally independent of D given C: none
5. B is independent of C: G2, G3
6. B is conditionally independent of C given A: G1, G2, G4

Question from Moed A, 5773
A biologist performs an experiment. The experiment has a certain probability of success. If the experiment succeeds, the expression level of two proteins rises: protein A and protein B. These two proteins regulate protein C (each one separately and also together). Each protein can be in one of two states, UP or DOWN, in terms of its expression level.

Question from Moed A, 5773 - part A
Draw a Bayesian network that includes 4 variables: E, A, B and C. E indicates whether the experiment succeeded, and A, B, C indicate for each protein whether it is UP or DOWN.
The edges: (E, A), (E, B), (A, C), (B, C)

Question from Moed A, 5773 - cont'd
For each of the following independence claims, prove or refute it. A proof must not depend on the numerical values. A refutation must be given by a counterexample showing that the variables are indeed dependent. You may use any theorem learned in class.

Question from Moed A, 5773 - part B
Claim: A and B are independent given E.
The claim is true. We learned a theorem stating that a variable is independent of its non-descendants given its parents. E, the only parent of A and of B, is given, and neither A nor B is a descendant of the other, so they are independent.

Question from Moed A, 5773 - part C
Claim: A and B are independent.
They are dependent. Counterexample:
P(E=0)=P(E=1)=0.5.
P(A=0|E=0)=P(B=0|E=0)=1.
P(A=1|E=1)=P(B=1|E=1)=1.
Then P(A=1,B=1)=0.5, while P(A=1)=0.5 and P(B=1)=0.5, so P(A=1)P(B=1)=0.25 ≠ P(A=1,B=1).

Question from Moed A, 5773 - a part that was not on the exam
Claim: A and B are independent given C.
They are dependent. Counterexample:
P(E=0)=P(E=1)=0.5.
P(A=0|E=0)=P(B=0|E=0)=1.
P(A=1|E=1)=P(B=1|E=1)=0.5.
P(C=0|AB=00)=1, P(C=1|AB=01)=P(C=1|AB=10)=0.5, P(C=1|AB=11)=1.
P(AB=11|C=1)=P(C=1|AB=11)P(AB=11)/P(C=1)=0.5.
P(A=1|C=1)=P(AB=11|C=1)+P(AB=10|C=1)=0.75.
P(B=1|C=1)=P(A=1|C=1)=0.75.
Hence P(A=1|C=1)P(B=1|C=1)=0.5625 ≠ 0.5=P(AB=11|C=1).

Question from Moed A, 5773 - another part that was not on the exam
Claim: A and B are independent given E and C.
They are dependent. Counterexample:
P(E=0)=P(E=1)=0.5.
P(A=0|E=0)=P(B=0|E=0)=1.
P(A=1|E=1)=P(B=1|E=1)=0.5.
P(C=0|AB=00)=1, P(C=1|AB=01)=P(C=1|AB=10)=0.5, P(C=1|AB=11)=1.
P(A=0|EC=11) = P(B=0|EC=11) > 0, but P(AB=00|EC=11) = 0.
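The counterexamples for the network E → A, E → B, A → C, B → C can be verified by brute-force enumeration of the joint distribution. The sketch below is only a checking aid; the helper names `joint`, `prob` and `cond` are hypothetical. It reuses the CPT of C from the last two counterexamples for the first one as well, which is an assumption that does not affect the marginal over A and B.

```python
from itertools import product

# P(C=1 | A, B) as given in the counterexamples.
P_C1 = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.0}


def joint(p_up):
    """Joint table P(e, a, b, c), where p_up = P(A=1|E=1) = P(B=1|E=1).

    P(E=1) = 0.5, and A and B are always 0 when E = 0.
    """
    table = {}
    for e, a, b, c in product((0, 1), repeat=4):
        p_one = p_up if e else 0.0            # P(A=1|e) = P(B=1|e)
        pa = p_one if a else 1 - p_one
        pb = p_one if b else 1 - p_one
        pc = P_C1[a, b] if c else 1 - P_C1[a, b]
        table[e, a, b, c] = 0.5 * pa * pb * pc
    return table


def prob(table, **fixed):
    """Marginal probability of an assignment such as prob(t, a=1, c=1)."""
    order = "eabc"
    return sum(p for key, p in table.items()
               if all(key[order.index(n)] == v for n, v in fixed.items()))


def cond(table, event, given):
    """Conditional probability P(event | given); both are dicts."""
    return prob(table, **event, **given) / prob(table, **given)


# A and B are not marginally independent (P(A=1|E=1) = 1).
t1 = joint(1.0)
print(prob(t1, a=1, b=1), prob(t1, a=1) * prob(t1, b=1))

# A and B are not independent given C (P(A=1|E=1) = 0.5).
t2 = joint(0.5)
print(cond(t2, {"a": 1, "b": 1}, {"c": 1}),
      cond(t2, {"a": 1}, {"c": 1}) * cond(t2, {"b": 1}, {"c": 1}))

# A and B are not independent given E and C.
print(cond(t2, {"a": 0, "b": 0}, {"e": 1, "c": 1}),
      cond(t2, {"a": 0}, {"e": 1, "c": 1}) * cond(t2, {"b": 0}, {"e": 1, "c": 1}))
```

In each printed pair the first number is the joint (conditional) probability and the second is the product of the marginals; the claims are refuted because the two differ.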

Moed B
You are given a set of strings S_1, S_2, ..., S_k, each of length C, and each string S_i is associated with a positive score B_i. S_i appears in an alignment if there is a sequence of gapless matches in the alignment that contains S_i. We subtract B_i from the score of an alignment for every appearance of S_i, including overlapping appearances. Describe a global alignment algorithm.

Question
True or false: the following algorithm is a global alignment algorithm for the problem.
For every cell [i,j] in the DP matrix we save the number of consecutive matches that the optimal alignment between x_1,…,x_i and y_1,…,y_j has made since the last gap. If this value is ≥ C, we check every S_i and subtract B_i as needed.

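Solution
The suggested algorithm does not work.
Counter-example: S[G,G]=10, S[A,A]=1, indel=-1, S_1=AAAG with a penalty of 100 per appearance. Both sequences are AAAG.
The DP matrix filled by the suggested algorithm:

         A    A    A    G
    0   -1   -2   -3   -4
A  -1    1    0   -1   -2
A  -2    0    2    1    0
A  -3   -1    1    3    2
G  -4   -2    0    2    1

In the last cell the diagonal move completes the seed AAAG and scores 3+10-100, so the algorithm takes a gap instead.
Alignment found (score 1):
AAA_G
AAAG_
Optimal alignment (score 10):
_AAAG
A_AAG
The optimal alignment extends a suboptimal alignment of AAA with AAA (score 0, ending with 2 matches), which the suggested algorithm does not keep.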
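The failure can be reproduced with a short sketch of the suggested (incorrect) algorithm, which keeps a single score and a single run length per cell. The function name `naive_alignment`, the `seeds` dictionary (seed string to penalty) and the mismatch score of -1 are assumptions made for illustration; the counter-example does not specify a mismatch score, and its result does not depend on it.

```python
def naive_alignment(x, y, seeds, score, indel):
    """Flawed DP: each cell keeps only the best score and the number of
    consecutive matches at the end of the alignment achieving it."""
    n, m = len(x), len(y)
    C = len(next(iter(seeds)))               # all seeds have length C
    T = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = (i * indel, 0)
    for j in range(1, m + 1):
        T[0][j] = (j * indel, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s, run = T[i - 1][j - 1]
            s += score(x[i - 1], y[j - 1])
            if x[i - 1] == y[j - 1]:
                run += 1
                if run >= C:
                    # Charge the penalty if the last C matched characters form a seed.
                    s -= seeds.get(x[i - C:i], 0)
            else:
                run = 0
            candidates = [
                (s, run),                        # diagonal move
                (T[i - 1][j][0] + indel, 0),     # gap in y
                (T[i][j - 1][0] + indel, 0),     # gap in x
            ]
            T[i][j] = max(candidates, key=lambda c: c[0])
    return T[n][m][0]


def score(a, b):
    # Scores from the counter-example; the mismatch value -1 is an assumption.
    if a != b:
        return -1
    return 10 if a == "G" else 1


print(naive_alignment("AAAG", "AAAG", {"AAAG": 100}, score, -1))
```

On the counter-example this returns 1, whereas the alignment _AAAG / A_AAG scores 10.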

Question
True or false: the algorithm that worked for positive bonuses will work here too.
Add terms of the following form to the recursive update rule:
-I_{S_i}·B_i + ∑_{k=0..3} S[i-k,j-k]
where I_{S_i} is 1 if the nucleotides i-3,…,i and j-3,…,j are the seed S_i, and ∞ otherwise. The last component is the normal score for matching 4 nucleotides.

Solution
It will not work here. Since -I_{S_i}·B_i ≤ 0, and since the option of four consecutive matches is also considered by the ordinary update rule, the algorithm will never use the new update rule. The score that the algorithm computes will therefore not be consistent with the scoring scheme.

Question
What is the correct algorithm?
Divide every cell of the DP matrix into C+1 cells. The cell M[i,j,k] represents the optimal alignment between the prefixes x_1,…,x_i and y_1,…,y_j that ends with k consecutive matches.

Solution
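The update rule shown on this slide is not part of the transcript. The following is a hedged sketch of one recurrence that fits the M[i,j,k] description and the correctness argument: k counts the matches at the end of the alignment, with k = C standing for "C or more". The function name `seed_penalized_alignment`, the `seeds` dictionary (seed string to penalty B_i) and the mismatch score are assumptions, not the recitation's own formulation.

```python
NEG = float("-inf")


def seed_penalized_alignment(x, y, seeds, score, indel):
    """Global alignment score where every appearance of a seed in a run of
    gapless matches costs its penalty. All seeds share the same length C."""
    C = len(next(iter(seeds)))
    n, m = len(x), len(y)
    # M[i][j][k]: best score for x[:i] vs y[:j] ending with exactly k matches
    # (k == C means at least C matches).
    M = [[[NEG] * (C + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    M[0][0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cell = M[i][j]
            best0 = NEG                        # alignments ending with 0 matches
            if i > 0:
                best0 = max(best0, max(M[i - 1][j]) + indel)
            if j > 0:
                best0 = max(best0, max(M[i][j - 1]) + indel)
            if i > 0 and j > 0:
                s = score(x[i - 1], y[j - 1])
                diag = M[i - 1][j - 1]
                if x[i - 1] != y[j - 1]:
                    best0 = max(best0, max(diag) + s)   # mismatch resets the run
                else:
                    for k in range(1, C):
                        cell[k] = diag[k - 1] + s       # run too short for a seed
                    # Run of length >= C: the last C matched characters may be a seed.
                    window = x[i - C:i] if i >= C else None
                    penalty = seeds.get(window, 0)
                    cell[C] = max(diag[C - 1], diag[C]) + s - penalty
            cell[0] = best0
    return max(M[n][m])


def score(a, b):
    # Scores from the counter-example; the mismatch value -1 is an assumption.
    if a != b:
        return -1
    return 10 if a == "G" else 1


print(seed_penalized_alignment("AAAG", "AAAG", {"AAAG": 100}, score, -1))
```

On the counter-example (both sequences AAAG, seed AAAG with penalty 100) this returns 10, the score of the optimal alignment. The table has O(nmC) cells, and the gap transitions take a maximum over C+1 entries.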

Correctness: Assume that we have the correct values for all cells M[i’,j’,k’] that precede M[i,j,k] and we want to compute the score at the cell M[i,j,k]. If k<C, then we are not creating a sequence of C matches, and therefore by the inductive assumption and the defined operations M[i,j,k] will contain the optimal score.

Solution
If k≥C and the last C characters are not in {S_i}, we are done for the same reasons.
If k≥C and the last C characters are in {S_i}, there are two options.
First option: the optimal alignment contains the seed. Since we check the cells M[i-1,j-1,k-1] and M[i-1,j-1,k], we obtain the score of the optimal alignment.

Solution
Second option (k≥C and the last C characters are in {S_i}): the seed is not in the optimal alignment. Since the alignment between [i-C+1,…,i] and [j-C+1,…,j] does not contain C consecutive matches but ends with a match, its prefix, which aligns [i-C+1,…,i-1] with [j-C+1,…,j-1], must end with 0,1,…,C-2 matches. Hence the optimal alignment between [1,…,i-1] and [1,…,j-1] ends with 0,1,…,C-2 matches. By the inductive assumption we have the optimal score for all these alignments, and the update rule tests them all.

Moed B An inverted-repeat is an appearance of some sequence and its inverse in a string, without overlapping. For example, in the string abcdelmnedcblmnknm there is an inverted repeat of size 4, because bcde and its inverse appear in it, and do not overlap. The sequence lmn appears twice but in the same order and therefore it does not constitute an inverted repeat. The sequence mnk appears twice but the two appearances overlap, and therefore it does not constitute an inverted repeat either. Describe a linear time algorithm for finding the longest inverted repeat in a string.

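Solution
Build a suffix tree for S and S^R (the reverse of S).
S = abcdelmnedcblmnknm, S^R = mnknmlbcdenmledcba
i = start index of the repeat in S (red on the slide) = 2
j = start index of the repeat in S^R (green on the slide) = 7 = |S| - |REP| - (start index of the inverted copy in S) + 2
So there is no overlap if i+|REP|-1 < |S|-|REP|-j+2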

Solution
Each node is marked if it has children from both S and S^R. MAX ← 0.
The postorder search proceeds as follows: if v is marked and does not have marked descendants, compare the indices of the leaves with minimal indices in S and in S^R. The indices give the maximal repeat with no overlap. If it is longer than MAX, update MAX.

Solution
The midpoint between the end of the first appearance and the start of the inverted appearance is (i+|REP|-1+|S|-|REP|-j+2)/2 = (|S|+1+i-j)/2.
The length of the non-overlapping string is (|S|+1+i-j)/2 - i.
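A suffix-tree implementation is too long to show here, but a brute-force reference is handy for testing one. The sketch below is not the linear-time algorithm described above: it simply tries every substring, longest first, and looks for its reverse to the right of it. The function name `longest_inverted_repeat` and the 0-based return convention are illustrative assumptions.

```python
def longest_inverted_repeat(s):
    """Brute-force reference (far from linear time).

    Returns (length, a, b): a substring of that length starts at 0-based
    index a, and its reverse starts at index b >= a + length, so the two
    copies do not overlap. Returns (0, -1, -1) if there is none.
    """
    n = len(s)
    for length in range(n // 2, 0, -1):
        for a in range(n - length + 1):
            rev = s[a:a + length][::-1]
            # Searching only to the right is enough: if the reverse lies to
            # the left, the same pair is found when the left copy plays the
            # role of the substring.
            b = s.find(rev, a + length)
            if b != -1:
                return length, a, b
    return 0, -1, -1


print(longest_inverted_repeat("abcdelmnedcblmnknm"))
```

For the example string this reports length 4 with bcde at 0-based index 1 and edcb at 0-based index 8, i.e. 1-based indices 2 and 9, which agrees with i = 2 and |S|-|REP|-j+2 = 9 above.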