Protein – Protein Interactions Simon Kanaan Advisor: Dr. Izaguirre Others: Dr. Chen, Dr. Wuchty, ChengBang Huang.


1 Protein – Protein Interactions Simon Kanaan Advisor: Dr. Izaguirre Others: Dr. Chen, Dr. Wuchty, ChengBang Huang

2 What is a “protein-protein” (P-P) interaction, and why are such interactions important? Encoded by the genetic material within a cell, proteins fold and interact in intricate arrangements that provide functionality to the components of a cell, which in turn work cooperatively to form whole body systems. Protein-protein interactions serve as the chemical basis of all living organisms.

3 What causes P-P interactions? Protein domains drive proteins to fold and interact as they do.

4 What are protein domains? – Significant portions of proteins – Composed of distinct peptides – The key to intricate arrangements

5 Domains and Proteins A single protein molecule can possess multiple domains, which makes it difficult to find a simple formula that dictates how protein-protein interactions occur. Yet affinities exist between certain protein domains and are frequently seen in living organisms. This drives our research, which seeks to explain protein-protein interactions in terms of the domain-domain interactions behind them. The model system is the yeast cell, with several of its proteins serving as the test cases. The data come from a protein family data bank available online.

6 Our “Formula” dictating which P-P interactions occur We approximate the minimum number of domain pairs needed to explain all the protein interactions read from the family data bank. A protein interaction (P1, P2) is explained if a domain pair (D1, D2) is chosen such that P1 includes either D1 or D2 as one of its domains, while P2 includes the other as one of its domains. This is an instance of the “Minimum Set Cover Problem”.
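The "explained" condition on this slide can be written as a small predicate. This is a minimal sketch, not the project's code: the function name `explains` and the dictionary `domains_of` (protein name to set of domain names) are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def explains(pair: Tuple[str, str],
             interaction: Tuple[str, str],
             domains_of: Dict[str, Set[str]]) -> bool:
    """Return True if the domain pair explains the protein interaction.

    The pair (d1, d2) explains (p1, p2) when one protein contains d1 and
    the other contains d2, in either order.
    """
    d1, d2 = pair
    p1, p2 = interaction
    first, second = domains_of[p1], domains_of[p2]
    return (d1 in first and d2 in second) or (d2 in first and d1 in second)


# Hypothetical toy data, used only to exercise the predicate.
toy_domains = {"P1": {"D2"}, "P2": {"D2", "D3"}}
print(explains(("D3", "D2"), ("P1", "P2"), toy_domains))  # True
print(explains(("D3", "D3"), ("P1", "P2"), toy_domains))  # False
```

The check is symmetric in the domain pair, so (D2, D3) and (D3, D2) are treated as the same explanation.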

7 Why the Minimum Set of Domains? Let's look at the following case: – P1 contains domain D2 – P2 contains domains D2 and D3 – P3 contains domains D2 and D4 – P4 contains domains D2 and D5 And let's assume the protein interactions are: P1 - P1, P1 - P2, P1 - P3, P1 - P4. The P-P interactions can be explained by: – (D2 - D2) – (D2 - D3) – (D2 - D4) – (D2 - D5) Or by: – (D2 - D2) alone
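The case on this slide is small enough to check exhaustively. The sketch below enumerates candidate domain pairs and tries subsets of increasing size until every interaction is explained. The helper names (`candidate_pairs`, `covers`, `smallest_explaining_set`) are illustrative assumptions, and brute force like this is only workable for toy inputs.

```python
from itertools import combinations

# Domain content and interactions copied from the slide's example.
domains_of = {
    "P1": {"D2"},
    "P2": {"D2", "D3"},
    "P3": {"D2", "D4"},
    "P4": {"D2", "D5"},
}
interactions = [("P1", "P1"), ("P1", "P2"), ("P1", "P3"), ("P1", "P4")]


def candidate_pairs(p, q):
    """Unordered domain pairs that could explain the interaction p - q."""
    return {tuple(sorted((a, b))) for a in domains_of[p] for b in domains_of[q]}


# Map each candidate domain pair to the interactions it can explain.
covers = {}
for p, q in interactions:
    for pair in candidate_pairs(p, q):
        covers.setdefault(pair, set()).add((p, q))


def smallest_explaining_set():
    """Try subsets of increasing size; feasible only for tiny examples."""
    all_pairs = sorted(covers)
    target = set(interactions)
    for size in range(1, len(all_pairs) + 1):
        for subset in combinations(all_pairs, size):
            explained = set().union(*(covers[pair] for pair in subset))
            if explained == target:
                return list(subset)
    return None


print(smallest_explaining_set())  # [('D2', 'D2')]
```

The single pair (D2, D2) accounts for all four interactions, which is the point the slide makes.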

8 Minimum Set Cover Problem Given a collection of sets, find the smallest sub-collection whose union equals the union of all the sets. This is an NP-complete problem.
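Because an exact solution is expensive in general, set cover is usually approximated. The sketch below shows the classical greedy heuristic (repeatedly take the set that covers the most still-uncovered elements). It is a generic illustration and not necessarily the selection rule used in this project. The function name and the input layout are assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation for set cover.

    universe: iterable of elements to cover.
    subsets:  dict mapping a label to the set of elements it covers.
    Returns the labels chosen, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy instance.
toy_subsets = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 4}, "D": {4, 5}}
print(greedy_set_cover({1, 2, 3, 4, 5}, toy_subsets))  # ['A', 'D']
```

In the protein setting, the universe would be the observed P-P interactions and each subset would be the interactions that one domain pair can explain.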

9 Implementation/Algorithm The base algorithm consists of functions that read the protein structure and interaction information and store them in separate data structures. It also builds a domain-domain matrix, which holds information about interacting domains. Each entry in the matrix is the number of protein-protein interactions in which domains Di and Dj were observed as a possible cause. Example: – P1: {D1, D2, D3} and P2: {D1, D5} interact. The candidate domain pairs are (D1, D1), (D1, D5), (D2, D1), (D2, D5), (D3, D1) and (D3, D5).
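A minimal sketch of how such a matrix could be built follows. Two things are assumptions rather than details from the slides: the matrix is stored sparsely as a `Counter`, and it is treated as symmetric, so (D2, D1) and (D1, D2) share one entry. The function names are illustrative.

```python
from collections import Counter


def candidate_pairs(first_domains, second_domains):
    """All domain pairs, one domain from each interacting protein."""
    return [(a, b) for a in first_domains for b in second_domains]


def build_domain_matrix(domains_of, interactions):
    """Count, per unordered domain pair, the interactions it could explain."""
    counts = Counter()
    for p, q in interactions:
        # A set ensures each pair is counted once per interaction.
        pairs = {tuple(sorted(pair))
                 for pair in candidate_pairs(domains_of[p], domains_of[q])}
        counts.update(pairs)
    return counts


# The slide's example: P1 and P2 interact.
example_domains = {"P1": ["D1", "D2", "D3"], "P2": ["D1", "D5"]}
print(candidate_pairs(example_domains["P1"], example_domains["P2"]))
# [('D1', 'D1'), ('D1', 'D5'), ('D2', 'D1'), ('D2', 'D5'), ('D3', 'D1'), ('D3', 'D5')]

matrix = build_domain_matrix(example_domains, [("P1", "P2")])
```

With a single interaction, every one of the six candidate pairs ends up with a count of 1.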

10 Implementation/Algorithm The algorithm approximates the minimum set of domain pairs. It needs to choose d-d pairs in an educated rather than a randomized fashion. This is done with weight functions: each domain pair is given a weight, and the pair with the largest weight is chosen.

11 Different Weight Functions Four different weight functions have been developed and implemented for the purposes of this project.

12 First Weight Function Assumption: – The most commonly observed interacting domain pair among the protein interactions is probably the cause of those interactions. While there are P-P interactions to be explained { – Choose the most commonly observed interacting domain pair Di-Dj. – Remove Di-Dj. – Remove all P-P interactions that Di-Dj explains from the data being observed. – Undo the effect of those P-P interactions on the matrix. }
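The loop on this slide can be sketched as follows. This is one reading of the slide, not the project's code: the names are illustrative, ties are broken alphabetically (an assumption), and the matrix is a `Counter` keyed by unordered domain pairs.

```python
from collections import Counter


def candidate_pairs(p, q, domains_of):
    """Unordered domain pairs that could explain the interaction p - q."""
    return {tuple(sorted((a, b))) for a in domains_of[p] for b in domains_of[q]}


def greedy_most_common_pair(domains_of, interactions):
    """Repeatedly choose the most frequently observed candidate domain pair."""
    remaining = set(interactions)
    counts = Counter()
    for p, q in remaining:
        counts.update(candidate_pairs(p, q, domains_of))

    chosen = []
    while remaining and counts:
        # Most common pair; sorting first makes tie-breaking deterministic.
        best = max(sorted(counts), key=lambda pair: counts[pair])
        chosen.append(best)
        explained = {(p, q) for (p, q) in remaining
                     if best in candidate_pairs(p, q, domains_of)}
        # Undo the explained interactions' contribution to the matrix.
        for p, q in explained:
            counts.subtract(candidate_pairs(p, q, domains_of))
        counts = +counts  # drop entries whose count fell to zero
        remaining -= explained
    return chosen


# Data from the "Why the Minimum Set of Domains?" slide.
slide_domains = {"P1": {"D2"}, "P2": {"D2", "D3"},
                 "P3": {"D2", "D4"}, "P4": {"D2", "D5"}}
slide_interactions = [("P1", "P1"), ("P1", "P2"), ("P1", "P3"), ("P1", "P4")]
print(greedy_most_common_pair(slide_domains, slide_interactions))
# [('D2', 'D2')]
```

On the slide 7 data the pair (D2, D2) has count 4 and explains everything, so the loop stops after one pick.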

13 Second Weight Function Assumption: – The domain that interacts most often among the p-p interactions is probably the cause of the protein interactions. Performs the same tasks as the first weight function, with the following modification: – It creates a vector, sum_vector, whose size is the number of rows in the matrix. – It sums each row of the matrix and stores the result in the corresponding entry of sum_vector. – It finds the maximum element of sum_vector and returns the maximum element in the corresponding row of the domain-domain matrix.
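The selection step described here can be sketched with a dense matrix stored as a list of lists. The function name is an assumption, and returning the (row, column) indices rather than the entry itself is a choice made for this illustration.

```python
def pick_pair_by_busiest_domain(matrix):
    """Select a domain pair via row sums, as on the slide.

    matrix[i][j] is the count for domains i and j. Returns the indices
    (i, j) of the largest entry in the row with the largest sum.
    """
    sum_vector = [sum(row) for row in matrix]
    busiest = max(range(len(matrix)), key=lambda r: sum_vector[r])
    partner = max(range(len(matrix[busiest])), key=lambda c: matrix[busiest][c])
    return busiest, partner


# Hypothetical symmetric count matrix for four domains.
toy_matrix = [
    [0, 5, 0, 0],
    [5, 0, 0, 0],
    [0, 0, 2, 4],
    [0, 0, 4, 3],
]
print(pick_pair_by_busiest_domain(toy_matrix))  # (3, 2)
```

In this toy matrix the largest single entry is 5 at (0, 1), which the first weight function would pick, while the row-sum rule picks (3, 2) because domain 3 has the largest row total.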

14 Third Weight Function Assumption: – Incorporate the absence of p-p interactions. – Initialize the matrix just as in the first weight function. – Go through every entry in the matrix and divide it by the number of proteins that contain the first domain times the number of proteins that contain the second domain. Each entry now represents the probability that domains i and j interact. – The weight function then chooses the highest probability in the matrix, determines which protein interactions this domain pair explains, removes their influence from the data, and repeats.
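The normalization step can be sketched as below, following the slide's formula literally (count divided by the product of the two protein counts, including when both domains are the same). The function name and the sparse dictionary layout are assumptions.

```python
from collections import Counter


def normalized_scores(counts, domains_of):
    """Divide each pair count by (#proteins with a) * (#proteins with b)."""
    proteins_with = Counter(
        domain for domains in domains_of.values() for domain in set(domains)
    )
    return {
        (a, b): n / (proteins_with[a] * proteins_with[b])
        for (a, b), n in counts.items()
    }


# Counts for the "Why the Minimum Set of Domains?" slide's example.
slide_domains = {"P1": {"D2"}, "P2": {"D2", "D3"},
                 "P3": {"D2", "D4"}, "P4": {"D2", "D5"}}
slide_counts = {("D2", "D2"): 4, ("D2", "D3"): 1,
                ("D2", "D4"): 1, ("D2", "D5"): 1}
scores = normalized_scores(slide_counts, slide_domains)
print(scores[("D2", "D2")], scores[("D2", "D3")])  # 0.25 0.25
```

On this example D2 occurs in all four proteins, so (D2, D2) scores 4 / 16 and (D2, D3) scores 1 / 4: the normalization removes the raw-count advantage of the widespread domain.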

15 Fourth Weight Function Assumption: – The absence of p-p edges is due to incomplete data, not to the proteins failing to interact. So even if a domain pair is present among many of the observed proteins, only the interacting proteins are taken into account. In the d-d matrix, each entry represents the probability that domain i interacts with domain j, calculated as one minus the probability that the domain pair (i, j) is not the cause of the interaction.

16 Testing – From the available data bank, create training sets containing different fractions of the data (0.01, 0.25, 0.5, 0.75, 1). – Run a program that takes the domain pairs chosen by our algorithm from the training data. – Generate all possible P-P interactions by looking at the protein structure. – Compare the calculated P-P interactions with the observed interactions (number of matches, false positive, and false negative p-p interactions). – Calculate fold, specificity, and sensitivity in order to compare with previous research.
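The prediction and comparison steps could look like the sketch below. The slides do not define their measures, so the definitions here are assumptions: sensitivity is taken as matches divided by observed interactions, and specificity as matches divided by predicted interactions. "Fold" is left out because the slides do not say how it is computed. All function names are illustrative.

```python
from itertools import combinations_with_replacement


def predict_interactions(chosen_pairs, domains_of):
    """Predict every protein pair that contains some chosen domain pair."""
    predicted = set()
    for p, q in combinations_with_replacement(sorted(domains_of), 2):
        first, second = domains_of[p], domains_of[q]
        for d1, d2 in chosen_pairs:
            if (d1 in first and d2 in second) or (d2 in first and d1 in second):
                predicted.add((p, q))
                break
    return predicted


def evaluate(predicted, observed):
    """Compare predicted and observed interactions (unordered pairs)."""
    predicted = {tuple(sorted(pair)) for pair in predicted}
    observed = {tuple(sorted(pair)) for pair in observed}
    matches = predicted & observed
    return {
        "matches": len(matches),
        "false_positives": len(predicted - observed),
        "false_negatives": len(observed - predicted),
        # Assumed definitions; the slides do not spell these out.
        "sensitivity": len(matches) / len(observed) if observed else 0.0,
        "specificity": len(matches) / len(predicted) if predicted else 0.0,
    }


# Hypothetical toy data.
toy_domains = {"P1": {"D2"}, "P2": {"D2", "D3"}, "P3": {"D4"}}
toy_predicted = predict_interactions([("D2", "D3")], toy_domains)
toy_report = evaluate(toy_predicted, {("P1", "P2"), ("P1", "P3")})
print(sorted(toy_predicted))  # [('P1', 'P2'), ('P2', 'P2')]
print(toy_report["matches"], toy_report["false_positives"],
      toy_report["false_negatives"])  # 1 1 1
```

Self-interactions such as P2 - P2 are included among the candidates because the slide 7 example lists P1 - P1 as an observed interaction.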

17 Work in Progress – Fixing a bug that was found – Getting the testing done – Possibly adding a few more weight functions, adding or subtracting weight depending on different assumptions – Comparing with other algorithms and papers

