Comp. Genomics Recitation 12 Bayesian networks Taken from Artificial Intelligence course, MIT, 6.034

http://courses.csail.mit.edu/6.034s/handouts/6034-review-sol.pdf Exam questions

Question 1.1
Draw a Bayesian network among the following binary variables that model the outcome of an election:
I: candidate is Incumbent
M: has lots of Money for advertising
A: uses advertisements that focus on Attacking the candidate’s opponent
Q: uses advertisements that focus on the candidate’s Qualifications
L: candidate is Liked
D: opponent is Distrusted
E: candidate is Elected

Question 1.1 – cont’d
Your network should encode the following beliefs:
Incumbents tend to raise lots of money.
Money can be used to buy advertising that either focuses on the candidate’s qualifications or that attacks the candidate’s opponent. But if one does the first, there is less money to do the latter.
Attack advertisements tend to make voters distrust the opponent, but they also tend to make voters not like the candidate.
Advertisements focusing on qualifications tend to make voters like the candidate.
Candidates that people like tend to get elected.
Candidates whose opponent people distrust tend to get elected.

Question 1.1 - solution
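One structure consistent with the beliefs listed in the question can be written down as a parent list. This is a sketch, not necessarily the structure drawn in the original solution: in particular, the direction of the edge between Q and A (here Q -> A, for "if one does the first, there is less money to do the latter") is an assumption, and `topological_order` is a hypothetical helper used only to confirm that the graph is acyclic.

```python
# Each variable maps to its assumed parents in the network.
parents = {
    "I": [],
    "M": ["I"],        # incumbents tend to raise lots of money
    "Q": ["M"],        # money buys qualification ads
    "A": ["M", "Q"],   # money buys attack ads; Q -> A is an assumed direction
    "L": ["A", "Q"],   # attack ads hurt liking, qualification ads help it
    "D": ["A"],        # attack ads make voters distrust the opponent
    "E": ["L", "D"],   # liked candidates / distrusted opponents get elected
}

def topological_order(parents):
    """Return the nodes in an order where parents precede children."""
    order, placed = [], set()
    while len(order) < len(parents):
        ready = [v for v in parents
                 if v not in placed and all(p in placed for p in parents[v])]
        if not ready:
            raise ValueError("graph has a cycle, so it is not a Bayesian network")
        for v in ready:
            order.append(v)
            placed.add(v)
    return order

order = topological_order(parents)
print(order)
```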

Question 2
Show a Bayesian network structure that encodes the following relationships:
A is independent of B
A is dependent on B given C
A is dependent on D
A is independent of D given C

Question 2 - solution
Nodes A and B have no parents.
Node C has two parents: A and B.
Node D has one parent: C.
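The four relationships can be checked mechanically with a d-separation test. The sketch below uses the ancestral moral graph criterion; the function name `d_separated` and the parent-list encoding are illustrative choices, not part of the original slides. Note that d-connection only means the structure does not imply independence; for generic parameter values the variables are then dependent.

```python
def d_separated(parents, x, y, given):
    """True if x and y are d-separated by `given` in the DAG `parents`."""
    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    relevant, stack = set(), [x, y, *given]
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents.get(v, []))
    # 2. Moralize: connect each node to its parents and co-parents to each other.
    nbrs = {v: set() for v in relevant}
    for v in relevant:
        ps = list(parents.get(v, []))
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for a in ps:
            for b in ps:
                if a != b:
                    nbrs[a].add(b)
    # 3. Remove the conditioning set and test whether y is reachable from x.
    blocked, seen, stack = set(given), {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in nbrs[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return True

# The Question 2 solution: A -> C <- B, C -> D.
net = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(d_separated(net, "A", "B", []))     # A independent of B
print(d_separated(net, "A", "B", ["C"]))  # A dependent on B given C
print(d_separated(net, "A", "D", []))     # A dependent on D
print(d_separated(net, "A", "D", ["C"]))  # A independent of D given C
```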

Question 3
Which of the following conditional independence assumptions are true?
1. A and E are independent
2. A and E are independent given D
3. B and C are independent
4. B and C are independent given A
5. B and C are independent given D
6. A and E are independent given B
7. A and E are independent given F
8. B and C are independent given E

Question 3 - solution
1. A and E are independent: False
2. A and E are independent given D: True
3. B and C are independent: False
4. B and C are independent given A: True
5. B and C are independent given D: False
6. A and E are independent given B: False
7. A and E are independent given F: False
8. B and C are independent given E: False

Question 4 For each statement, name all of the graph structures, G1-G4, or “none” that imply it.

Question 4 – cont’d
1. A is conditionally independent of B given C
2. A is conditionally independent of B given D
3. B is conditionally independent of D given A
4. B is conditionally independent of D given C
5. B is independent of C
6. B is conditionally independent of C given A

Question 4 - solution
1. A is conditionally independent of B given C: G2
2. A is conditionally independent of B given D: none
3. B is conditionally independent of D given A: G3, G4
4. B is conditionally independent of D given C: none
5. B is independent of C: G2, G3
6. B is conditionally independent of C given A: G1, G2, G4

Question from Moed A, 5773
A biologist performs an experiment. The experiment has a certain probability of success. If the experiment succeeds, the expression levels of two proteins rise: protein A and protein B. These two proteins regulate protein C (each on its own and also together). Each protein can be in one of two states, UP or DOWN, in terms of its expression level.

Question from Moed A, 5773 – part a
Draw a Bayesian network with four nodes: E, A, B and C. Node E marks whether the experiment succeeded, and A, B, C mark, for each protein, whether it is UP or DOWN.
The edges: (E, A), (E, B), (A, C), (B, C)

Question from Moed A, 5773 – cont’d
For each of the following independence claims, prove or refute it. A proof must not depend on the numerical values. A refutation must be given by a counter-example and must show that the variables are indeed dependent. You may use any theorem learned in class.

Question from Moed A, 5773 – part b
A and B are independent given E.
The claim is true. We learned a theorem stating that a variable is independent of its non-descendants given its parents. E, the only parent of A and of B, is given, and A and B are not descendants of one another, so they are independent.

Question from Moed A, 5773 – part c
A and B are independent.
They are dependent. Counter-example: P(E=0)=P(E=1)=0.5. P(A=0|E=0)=P(B=0|E=0)=1. P(A=1|E=1)=P(B=1|E=1)=1.
Then P(A=1,B=1)=0.5, while P(A=1)=0.5 and P(B=1)=0.5, so P(A=1)P(B=1)=0.25 ≠ P(A=1,B=1).
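The counter-example for part c can be verified by summing the joint distribution directly. This is a minimal sketch using exact fractions; the variable names are illustrative, and C is left out because it is marginalized away when only A and B are queried.

```python
from fractions import Fraction as F
from itertools import product

p_e = {0: F(1, 2), 1: F(1, 2)}        # P(E=e)
p_a1_given_e = {0: F(0), 1: F(1)}     # P(A=1 | E=e)
p_b1_given_e = {0: F(0), 1: F(1)}     # P(B=1 | E=e)

def bern(p1, v):
    """Probability that a binary variable with P(1)=p1 takes value v."""
    return p1 if v == 1 else 1 - p1

# Marginal joint of (A, B), obtained by summing over E.
joint_ab = {}
for e, a, b in product((0, 1), repeat=3):
    p = p_e[e] * bern(p_a1_given_e[e], a) * bern(p_b1_given_e[e], b)
    joint_ab[(a, b)] = joint_ab.get((a, b), 0) + p

p_a1 = joint_ab[(1, 0)] + joint_ab[(1, 1)]
p_b1 = joint_ab[(0, 1)] + joint_ab[(1, 1)]
print(joint_ab[(1, 1)], p_a1 * p_b1)  # the two values differ, so A and B are dependent
```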

Question from Moed A, 5773 – a part that was not on the exam
A and B are independent given C.
They are dependent. Counter-example: P(E=0)=P(E=1)=0.5. P(A=0|E=0)=P(B=0|E=0)=1. P(A=1|E=1)=P(B=1|E=1)=0.5. P(C=0|AB=00)=1, P(C=1|AB=01/10)=0.5, P(C=1|AB=11)=1.
P(AB=11|C=1) = P(C=1|AB=11)P(AB=11)/P(C=1) = 0.5.
P(A=1|C=1) = [P(AB=11,C=1)+P(AB=10,C=1)]/P(C=1) = 0.75.
P(B=1|C=1) = P(A=1|C=1) = 0.75.
Hence P(A=1|C=1)P(B=1|C=1) = 0.5625 ≠ 0.5 = P(AB=11|C=1).

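Question from Moed A, 5773 – another part that was not on the exam
A and B are independent given E and C.
They are dependent. Counter-example: P(E=0)=P(E=1)=0.5. P(A=0|E=0)=P(B=0|E=0)=1. P(A=1|E=1)=P(B=1|E=1)=0.5. P(C=0|AB=00)=1, P(C=1|AB=01/10)=0.5, P(C=1|AB=11)=1.
P(A=0|EC=11) = P(B=0|EC=11) > 0, but P(AB=00|EC=11) = 0, whereas independence would require P(AB=00|EC=11) = P(A=0|EC=11)P(B=0|EC=11) > 0.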
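The last two counter-examples share one parameterization, so both can be checked by enumerating the full joint over (E, A, B, C). The sketch below uses exact fractions; `prob` is a hypothetical helper that computes a conditional probability by brute-force summation.

```python
from fractions import Fraction as F
from itertools import product

P_E1 = F(1, 2)                               # P(E=1)
P_A1 = {0: F(0), 1: F(1, 2)}                 # P(A=1 | E=e)
P_B1 = {0: F(0), 1: F(1, 2)}                 # P(B=1 | E=e)
P_C1 = {(0, 0): F(0), (0, 1): F(1, 2),       # P(C=1 | A=a, B=b)
        (1, 0): F(1, 2), (1, 1): F(1)}

def bern(p1, v):
    """Probability that a binary variable with P(1)=p1 takes value v."""
    return p1 if v == 1 else 1 - p1

joint = {
    (e, a, b, c): bern(P_E1, e) * bern(P_A1[e], a) * bern(P_B1[e], b) * bern(P_C1[(a, b)], c)
    for e, a, b, c in product((0, 1), repeat=4)
}

def prob(event, given=lambda e, a, b, c: True):
    """P(event | given), both expressed as predicates over (e, a, b, c)."""
    den = sum(p for k, p in joint.items() if given(*k))
    num = sum(p for k, p in joint.items() if given(*k) and event(*k))
    return num / den

c1 = lambda e, a, b, c: c == 1
e1c1 = lambda e, a, b, c: e == 1 and c == 1

# Given C=1: the joint of A=1, B=1 differs from the product of the marginals.
p_ab11_c = prob(lambda e, a, b, c: a == 1 and b == 1, c1)
p_a1_c = prob(lambda e, a, b, c: a == 1, c1)
p_b1_c = prob(lambda e, a, b, c: b == 1, c1)

# Given E=1, C=1: A=0 and B=0 are each possible, but not together.
p_ab00_ec = prob(lambda e, a, b, c: a == 0 and b == 0, e1c1)
p_a0_ec = prob(lambda e, a, b, c: a == 0, e1c1)
p_b0_ec = prob(lambda e, a, b, c: b == 0, e1c1)

print(p_ab11_c, p_a1_c * p_b1_c)
print(p_ab00_ec, p_a0_ec * p_b0_ec)
```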

Moed B 26.2.2010
You are given a set of strings S_1, S_2, ..., S_k of length C each, and each string S_i is associated with a positive score B_i. S_i appears in an alignment if there is a sequence of gapless matches in the alignment that contains S_i. We subtract B_i from the score of an alignment for every appearance of S_i, including overlapping ones. Describe a global alignment algorithm.

Question
True or false: The following algorithm is a global alignment algorithm for the problem: For every cell [i,j] in the DP matrix we save the number of consecutive matches that the optimal alignment between x_1,…,x_i and y_1,…,y_j has made since the last gap. If this value is ≥ C, we check every S_i and subtract B_i as needed.

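Solution
The suggested algorithm does not work.
Counter-example: S[G,G] = 10, S[A,A] = 1, indel = -1, S_1 = AAAG, B_1 = 100.
Aligning AAAG with AAAG, the suggested algorithm fills the DP matrix as follows:

         A    A    A    G
    0   -1   -2   -3   -4
A  -1    1    0   -1   -2
A  -2    0    2    1    0
A  -3   -1    1    3    2
G  -4   -2    0    2    1

The diagonal move into the last cell would complete the seed AAAG and pay B_1, so the algorithm enters that cell with a gap instead. It never reconsiders the lower-scoring alignment of AAA with AAA that ends in fewer than three matches.

Alignment found (score 1):
AAA_G
AAAG_

Optimal alignment (score 10):
_AAAG
A_AAG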

Question
True or false: The algorithm that worked for positive bonuses will work here too: add terms of the following form to the recursive update rule:
-I_{S_i}·B_i + Σ_{k=0..3} S[i-k, j-k]
where I_{S_i} is 1 if the nucleotides i-3,…,i and j-3,…,j are the seed S_i and ∞ otherwise. The last component is the normal score for matching 4 nucleotides.

Solution
It will not work here. Since -I_{S_i}·B_i ≤ 0, and since the option of four consecutive matches is also considered, the algorithm will never use the new update rule. The score that the algorithm computes will therefore not be consistent with the scoring scheme.

Question
What is the correct algorithm?
Divide every cell of the DP matrix into C+1 cells. The cell M[i,j,k] represents the optimal alignment between x_1,…,x_i and y_1,…,y_j that ends with k consecutive matches.

Solution

Correctness: Assume that we have the correct values for all cells M[i’,j’,k’] that precede M[i,j,k] and we want to compute the score at the cell M[i,j,k]. If k<C, then we are not creating a sequence of C matches, and therefore by the inductive assumption and the defined operations M[i,j,k] will contain the optimal score.

Solution
If k≥C and the last C characters are not in {S_i}, we are done for the same reasons.
If k≥C and the last C characters are in {S_i}, then there are several options:
The optimal alignment contains the seed. Since we are checking the cells M[i-1,j-1,k-1] and M[i-1,j-1,k], we will obtain the score of the optimal alignment.

Solution
If k≥C and the last C characters are in {S_i}, then the other option is:
The seed is not in the optimal alignment. Since the alignment between [i-C+1,…,i] and [j-C+1,…,j] does not contain C consecutive matches but ends with a match, its prefix, which aligns [i-C+1,…,i-1] with [j-C+1,…,j-1], must end with 0,1,…,C-2 matches. Hence the optimal alignment between [1,…,i-1] and [1,…,j-1] ends with 0,1,…,C-2 matches. By the inductive assumption, we have the optimal score for all these alignments, and the update rule tests them all.
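The table M[i,j,k] can be sketched in Python as follows. The recurrence here is a reconstruction under stated assumptions rather than the update rule from the original slide: a "match" is taken to mean two identical aligned characters, k = C stands for "C or more consecutive matches", mismatches get a single assumed `mismatch` score, and `seeds` maps each S_i to its penalty B_i. The function name and parameters are illustrative.

```python
NEG = float("-inf")

def align_with_seed_penalties(x, y, match_score, mismatch, indel, seeds):
    """Best global alignment score when each seed appearance costs seeds[seed]."""
    C = len(next(iter(seeds)))  # all seeds are assumed to have the same length C
    n, m = len(x), len(y)
    # M[i][j][k]: best score for x[:i] vs y[:j] ending with k consecutive matches.
    M = [[[NEG] * (C + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    M[0][0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cell = M[i][j]
            best0 = NEG  # alignments ending in a gap or a mismatch (k = 0)
            if i > 0:
                best0 = max(best0, max(M[i - 1][j]) + indel)
            if j > 0:
                best0 = max(best0, max(M[i][j - 1]) + indel)
            if i > 0 and j > 0:
                prev = M[i - 1][j - 1]
                if x[i - 1] != y[j - 1]:
                    best0 = max(best0, max(prev) + mismatch)
                else:
                    s = match_score[x[i - 1]]
                    for k in range(1, C):          # run grows from k-1 to k < C
                        cell[k] = prev[k - 1] + s
                    # Run reaches or stays at >= C: pay if the last C characters are a seed.
                    penalty = seeds.get(x[i - C:i], 0) if i >= C else 0
                    cell[C] = max(prev[C - 1], prev[C]) + s - penalty
            cell[0] = best0
    return max(M[n][m])

# The counter-example from the slides: S[G,G]=10, S[A,A]=1, indel=-1, seed AAAG costs 100.
scores = {"A": 1, "G": 10}
best = align_with_seed_penalties("AAAG", "AAAG", scores, -1, -1, {"AAAG": 100})
print(best)
```

On the counter-example this keeps the lower-scoring prefixes that end in fewer than three matches, which is what the single-value-per-cell algorithm discards.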

Moed B 26.2.2010 An inverted-repeat is an appearance of some sequence and its inverse in a string, without overlapping. For example, in the string abcdelmnedcblmnknm there is an inverted repeat of size 4, because bcde and its inverse appear in it, and do not overlap. The sequence lmn appears twice but in the same order and therefore it does not constitute an inverted repeat. The sequence mnk appears twice but the two appearances overlap, and therefore it does not constitute an inverted repeat either. Describe a linear time algorithm for finding the longest inverted repeat in a string.
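To make the definition concrete, here is a brute-force reference implementation. It is only a sketch for checking small examples and is far from linear time; the function name and the returned 0-based indices are illustrative choices.

```python
def longest_inverted_repeat_bruteforce(s):
    """Return (length, i, j): s[i:i+length] reversed occurs at j without overlap.

    Indices are 0-based. Returns (0, -1, -1) if there is no inverted repeat.
    """
    n = len(s)
    for length in range(n // 2, 0, -1):        # try the longest candidates first
        for i in range(n - length + 1):
            reversed_copy = s[i:i + length][::-1]
            j = s.find(reversed_copy)
            while j != -1:
                # Accept only occurrences that do not overlap s[i:i+length].
                if j + length <= i or j >= i + length:
                    return length, i, j
                j = s.find(reversed_copy, j + 1)
    return 0, -1, -1

result = longest_inverted_repeat_bruteforce("abcdelmnedcblmnknm")
print(result)
```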

Solution
Build a suffix tree for S and S^R.
S = abcdelmnedcblmnknm, i = start index of the repeat in S (red) = 2
S^R = mnknmlbcdenmledcba, j = start index of the repeat in S^R (green) = 7 = |S| - |REP| - (start index of the inverted copy in S) + 2
So there is no overlap if i + |REP| - 1 < |S| - |REP| - j + 2
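The index bookkeeping on this slide can be checked on the example string. This is a small sketch with 1-based indices as on the slide; the helper name `start_in_S_of_inverted_copy` is illustrative.

```python
S = "abcdelmnedcblmnknm"
S_R = S[::-1]
REP = "bcde"

def start_in_S_of_inverted_copy(j, rep_len, n):
    """1-based start in S of the inverted copy found at 1-based index j in S^R."""
    return n - rep_len - j + 2

i = S.index(REP) + 1      # 1-based start of REP in S
j = S_R.index(REP) + 1    # 1-based start of REP in S^R
start_inv = start_in_S_of_inverted_copy(j, len(REP), len(S))

# The inverted copy really sits at that position in S.
inverted_copy = S[start_inv - 1:start_inv - 1 + len(REP)]

# No overlap: the first copy ends before the inverted copy starts.
no_overlap = i + len(REP) - 1 < start_inv
print(i, j, start_inv, inverted_copy, no_overlap)
```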

Solution
Each node is marked if it has children from both S and S^R.
MAX ← 0

Solution
The postorder search will proceed as follows:
If v does not have marked descendants and v is marked, compare the indices of the leaves with minimal indices in S and S^R. The indices give the maximal repeat with no overlap. If it is longer than MAX, update MAX.

Solution
The midpoint between the end of the first appearance and the start of the inverted appearance is
(i + |REP| - 1 + |S| - |REP| - j + 2)/2 = (|S| + 1 + i - j)/2
The length of the non-overlapping string is (|S| + 1 + i - j)/2 - i
