geneInfo[geneInfo$ensembl_gene_id == "ENSG ", ] ensembl_gene_id hgnc_symbol description 4829 ENSG STAM2 signal transducing adaptor molecule (SH3 domain and ITAM motif) 2 How many exons does the ensemble id ENSG has? 51 exons attr(gene, "ens") ensembl_gene_id ensembl_transcript_id ensembl_exon_id exon_chrom_start exon_chrom_end rank strand biotype 1 ENSG ENST ENSE protein_coding 2ENSG ENST ENSE protein_coding …… 48 ENSG ENST ENSE protein_coding 49 ENSG ENST ENSE protein_coding 50 ENSG ENST ENSE protein_coding 51 ENSG ENST ENSE protein_coding"> geneInfo[geneInfo$ensembl_gene_id == "ENSG ", ] ensembl_gene_id hgnc_symbol description 4829 ENSG STAM2 signal transducing adaptor molecule (SH3 domain and ITAM motif) 2 How many exons does the ensemble id ENSG has? 51 exons attr(gene, "ens") ensembl_gene_id ensembl_transcript_id ensembl_exon_id exon_chrom_start exon_chrom_end rank strand biotype 1 ENSG ENST ENSE protein_coding 2ENSG ENST ENSE protein_coding …… 48 ENSG ENST ENSE protein_coding 49 ENSG ENST ENSE protein_coding 50 ENSG ENST ENSE protein_coding 51 ENSG ENST ENSE protein_coding">

Presentation is loading. Please wait.

Presentation is loading. Please wait.

GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill.

Similar presentations


Presentation on theme: "GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill."— Presentation transcript:

1 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 1 Homework 1 and 2 review session Presented by Kirill Bessonov November 2012

2 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 2 HW1: classical Q & A (GenomeGraphs) (1) First two questions were on Bioconductor libraries. There are BioC 608 packages To get citations on particular library use citation("library_name") You were asked to get genomic data on specific gene library(GenomeGraphs) #download the whole database of Ensemble IDs ensembl_Human_Genes = useMart("ensembl",dataset="hsapiens_gene_ensembl"); #get info on gene form the database on the Ensemble ID gene <- makeGene(id = "ENSG00000115145", type="ensembl_gene_id", biomart = ensembl_Human_Genes ) #get info on transcript transcript <- makeTranscript(id = "ENSG00000115145", type="ensembl_gene_id", biomart= ensembl_Human_Genes) gdPlot ( list("gene"=gene, "transcripts"=transcript)) #retrieve info from the database displaying first 25 entries getBM(c("ensembl_gene_id", "hgnc_symbol", "description"), filter=c("with_exon_transcript", "with_protein_id", "with_transcript_variation"),values=list(TRUE, TRUE, TRUE), ensembl_Human_Genes )[1:25,]

3 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 3 HW1: classical Q & A (GenomeGraphs) (2) What is the gene name (i.e. hgnc_symbol) and function represented by the Ensembl ID - ENSG00000115145? geneInfo=getBM(c("ensembl_gene_id", "hgnc_symbol", "description"), filter=c("with_exon_transcript", "with_protein_id", "with_transcript_variation"),values=list(TRUE, TRUE, TRUE), ensembl_Human_Genes ) > geneInfo[geneInfo$ensembl_gene_id == "ENSG00000115145", ] ensembl_gene_id hgnc_symbol description 4829 ENSG00000115145 STAM2 signal transducing adaptor molecule (SH3 domain and ITAM motif) 2 How many exons does the ensemble id ENSG00000115145 has? 51 exons attr(gene, "ens") ensembl_gene_id ensembl_transcript_id ensembl_exon_id exon_chrom_start exon_chrom_end rank strand biotype 1 ENSG00000115145 ENST00000263904 ENSE00001351655 153032117 153032506 1 -1 protein_coding 2ENSG00000115145 ENST00000263904 ENSE00002888710 153006659 153006743 2 -1 protein_coding …… 48 ENSG00000115145 ENST00000494589 ENSE00002785037 153004538 153004636 3 -1 protein_coding 49 ENSG00000115145 ENST00000494589 ENSE00002808134 153003676 153003822 4 -1 protein_coding 50 ENSG00000115145 ENST00000494589 ENSE00002929781 153001402 153001471 5 -1 protein_coding 51 ENSG00000115145 ENST00000494589 ENSE00001828491 153000503 153000527 6 -1 protein_coding

4 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 4 HW1: classical Q & A (GenomeGraphs) (3) Execute the following command. How many chromosomes do you see? 25 chromosomes. 22 autosomal pairs, 1 sex pair and one mitochondrial chromosome Why the number of chromosomes in this Ensembl dataset is greater than 23 chromosome pairs? What does “MT”, “X” and “Y” refer to? Because of the MT chromosome, since X and Y can be grouped to a single pair > getBM("chromosome_name","","", ensembl_Human_Genes)[c(1:22,433:435),1] [1] "1" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2" "20" "21" "22" "3" "4" "5" "6" "7" "8" "9" "MT" "X" "Y"

5 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 5 HW2: Pairwise alignments (classical Q&A)

6 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 6 HW2: Pairwise alignments (classical Q&A) Q1 Please align globally using Needleman–Wunsch algorithm the following DNA sequences. Use The following scoring rules: a) gap -5; b) match between two bases +5; c) mismatch between two bases +3;

7 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 7 HW2: Pairwise alignments (classical Q&A) Q3 Do local protein alignment using BLOSUM 62 matrix on the HEAGAWGHEE and PAWHAE sequence. The scoring rules are a) gap -8; matches and mismatches are given in BLOSUM 62 matrix.

8 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 8 HW2: Pairwise alignments (classical Q&A) Q5 Produce a dot plot of Human and Mouse p53 proteins from previous question and paste the plot below. Complete the lines of R code to get the dot plot. Are both proteins similar? Yes, very similar since we see clear diagonal corresponding to >90% of sequences length Where is/are the region(s) of greatest variation occur? Between 50-100

9 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 9 HW2: Pairwise alignments (classical Q&A) Q7 What global alignment score do you get for the two p53 proteins, when you use the BLOSUM62 alignment matrix, a gap opening penalty of -10 and a gap extension penalty of - 0.5? Answer: score of 1556 query("p53_HUMAN", "AC=P04637"); p53_HUMAN_seq = getSequence(p53_HUMAN); query("p53_MOUSE", "AC=P02340"); p53_MOUSE_seq = getSequence(p53_MOUSE); globalAlign <- pairwiseAlignment( p53_HUMAN_seq, p53_MOUSE_seq, substitutionMatrix = "BLOSUM62", gapOpening = -10, gapExtension = -0.5) Errors: the R-code was not stated and the ID of proteins were not given such as Uniprot ID P04637

10 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 10 HW2: Computer Style Implementation of NW algorithm in R

11 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 11 HW2: Computer style (NW algorithm) [1] Given the pseudo-code implement NW algorithm in R – Algorithm has two parts Calculation of the alignment F-matrix Finding the optimal path(s) through the matrix for to length(A) F(i,0) ← d*i for j=0 to length(B) F(0,j) ← d*j for i=1 to length(A){ for j=1 to length(B) { Match ← F(i-1,j-1) + S(Ai, Bj) Delete ← F(i-1, j) + d Insert ← F(i, j-1) + d F(i,j) ← max(Match, Insert, Delete) } } d = gap penalty score i and j = positions in A & B sequences

12 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 12 HW2: Computer style (NW algorithm) [2] Fmatrix = function(A,B){ fmatrix = matrix(0, nrow = (nchar(A)+1), ncol = nchar(B)+1) d = -8 #this is gap penalty for(i in 0 : nchar(A)){ fmatrix[i+1,1] = d * i #populates initial row with gap penalty } for(j in 0 : nchar(B)){ fmatrix[1,j+1] = d * i } for(i in 1 : nchar(A)){ for(j in 1 : nchar(B)) { score = rules(A,B) #get me sccore for the pair of aa or nt match = fmatrix[i,j] + score delete = fmatrix[i,j+1] + d insert = fmatrix[i+1,j] + d fmatrix[i+1,j+1] = max(match,delete,insert) } colnames(fmatrix) = strsplit( paste(" ", B, sep=""), "")[[1]]; rownames(fmatrix) = strsplit( paste(" ", A, sep=""), "")[[1]]; return(fmatrix) }

13 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 13 HW2: Computer style (NW algorithm) [3] rules = function(A,B){ s.matrix <- matrix(rep(0,16), nrow = 4, ncol=4, byrow=TRUE, dimnames = list(c("A","C","G","T"),c("A","C","T","G"))) s.matrix["A",] = c(2,-1,-1,-1) s.matrix["C",] = c(-1,2,-1,-1) s.matrix["T",] = c(-1,-1,2,-1) s.matrix["G",] = c(-1,-1,-1,2) } > s.matrix A C T G A 2 -1 -1 -1 C -1 2 -1 -1 G -1 -1 2 -1 T -1 -1 -1 2

14 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 14 HW2: Computer style (NW algorithm) [4] Check the F-matrix fmatrix=Fmatrix("ATCG", "TG") T G -32 -32 -32 A -8 -16 -24 T -16 -6 -14 C -24 -14 -4 G -32 -22 -12 Start finding the optimal path(s) through the matrix AlignmentA = "" AlignmentB = "" i = nchar(A) + 1 j = nchar(B) + 1 while(i > 1 && j > 1){ CurrentScore = fmatrix[i,j] #get score at current position of F-matrix ScoreDiag = fmatrix[i - 1, j - 1] ScoreUp = fmatrix[i, j - 1] what is around that F-matrix cell? ScoreLeft = fmatrix[i - 1, j]

15 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 15 HW1: Computer style (NW algorithm) [5] Selecting the bottom right cell and starting to trace-back the path of optimal alignment AlignmentA = "" AlignmentB = "" while(i > 1 && j > 1){ CurrentScore = fmatrix[i,j] ScoreDiag = fmatrix[i - 1, j - 1] ScoreUp = fmatrix[i, j - 1] ScoreLeft = fmatrix[i - 1, j] #considering the score came from diagonal if (CurrentScore == ScoreDiag + s.matrix[substr(A,i,i), substr(B,j,j)) ){ AlignmentA = paste(substr(A,i-1,i-1),AlignmentA, sep = "") AlignmentB = paste(substr(B,j-1,j-1),AlignmentB, sep = "") i = i - 1 j = j - 1 } On diagonal path: previous + next cell Which cell of the F-matrix I am now?

16 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 16 HW2: Computer style (NW algorithm) [6] #considering if the score comes from left (introducing a gap) else if(CurrentScore == ScoreLeft + d){ AlignmentA = paste(substr(A,i-1,i-1),AlignmentA, sep = "") AlignmentB = paste( "-", AlignmentB, sep = "") i = i - 1 } #considering if the score comes from upper cell (introducing a gap) else if(CurrentScore == ScoreUp + d){ AlignmentA = paste( "-", AlignmentA, sep = "") AlignmentB = paste(substr(B,j-1,j-1), AlignmentB, sep = "") j = j – 1} print(AlignmentA) print(AlignmentB) finalScore = cat("Final score :",fmatrix[(nchar(A)+1),(nchar(B)+1)])

17 GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill Bessonov 17 HW2: Computer style (NW algorithm) [7] The scoring matrices could have been accessed though character indices not requiring conversion and making code faster How one would output more than one BEST possible alignments? Please use more comments in your R-code Would be nice to see trace-backs visually Also the scoring rules were not stated clearly


Download ppt "GBIO009-1 - Bioinformatics ____________________________________________________________________________________________________________________ Kirill."

Similar presentations


Ads by Google