Lecture 4 BNFO 235 Usman Roshan. IUPAC Nucleic Acid symbols.


1 Lecture 4 BNFO 235 Usman Roshan

2 IUPAC Nucleic Acid symbols

3 IUPAC Amino Acid symbols

4 Genetic code

5 Splitting and joining strings
split: splits a string by a regular expression and returns an array
– @s = split(/,/);
– @s = split(/\s+/);
join: joins the elements of an array and returns a string (the opposite of split)
– $seq = join("", @pieces);
– $seq = join("X", @pieces);
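
A minimal sketch of the two functions in use. The variable names and sample strings are made up for illustration. Note that the one-argument form on the slide, split(/,/), splits the default variable $_; the sketch passes the string explicitly instead.

```perl
use strict;
use warnings;

# Split a comma-separated record into its fields.
my $line   = "ATG,CCA,TGA";
my @codons = split(/,/, $line);           # ("ATG", "CCA", "TGA")

# Split on runs of whitespace (spaces or tabs).
my $text = "AAGACTT  TAGGCCTT\tGGCTT";
my @seqs = split(/\s+/, $text);           # three sequences

# join is the inverse: glue the pieces back together.
my $seq    = join("", @codons);           # "ATGCCATGA"
my $marked = join("X", @codons);          # "ATGXCCAXTGA"

print "$seq\n$marked\n";
print scalar(@seqs), " sequences\n";
```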

6 Searching and substitution
$x =~ /$y/ --- true if the expression $y is found in $x
$x =~ /ATG/ --- true if the start codon ATG is found in $x
$x !~ /GC/ --- true if GC is not found in $x
$x =~ s/T/U/g --- replace all T’s with U’s
$x =~ s/g/G/g --- convert all lowercase g to uppercase G
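
The operators above can be tried on a short sequence. This is a sketch with made-up sample data. It copies the string before substituting so the original is preserved, which is a design choice and not something the slide requires.

```perl
use strict;
use warnings;

my $dna = "AAGATGTTT";

# Match: does the sequence contain the start codon ATG?
my $has_start = ($dna =~ /ATG/) ? 1 : 0;

# Negated match: true when the pattern does not occur anywhere.
my $no_gc = ($dna !~ /GC/) ? 1 : 0;

# Substitute on a copy: transcribe DNA to RNA by replacing every T with U.
(my $rna = $dna) =~ s/T/U/g;              # "AAGAUGUUU"

# Substitute on a copy: upper-case every lowercase g.
my $mixed = "aggcGt";
(my $fixed = $mixed) =~ s/g/G/g;          # "aGGcGt"

print "start codon found\n" if $has_start;
print "no GC found\n"       if $no_gc;
print "$rna\n$fixed\n";
```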

7 DNA regular expressions Taken from Jagota’s Perl for Bioinformatics

8 DNA Sequence Evolution
[Figure: evolutionary tree of a DNA sequence; "_" marks a gap (a deleted position).]
-3 mil yrs: AAGACTT
-2 mil yrs: T_GACTT, AAGGCTT
-1 mil yrs: _GGGCTT, TAGACCTT, A_CACTT
today (as observed): ACCTT (Cat), ACACTTC (Lion), TAGCCCTTA (Monkey), TAGGCCTT (Human), GGCTT (Mouse)
today (with gaps shown): A_C_CTT (Cat), A_CACTTC (Lion), TAGCCCTTA (Monkey), TAGGCCTT (Human), _G_GCTT (Mouse)

9 Comparative Bioinformatics Fundamental notion of biology: all life is related by an unknown evolutionary Tree of Life. Therefore, if we know something about one species we can make inferences about other ones. Also, by comparing multiple species we can make inferences about sets of species. How do we compare DNA or protein sequences of two different species?

10 Comparative Bioinformatics We need to know how often mutations such as A to T or A to C occur. To determine this, we manually create a set of “true” alignments and estimate the likelihood of A changing to C, for example, by counting the number of times A changes to C and computing related statistics. This gives a realistic “scoring matrix” that can be used to evaluate how closely related two species are based on their DNA.
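
The counting step can be sketched as follows. The two aligned pairs are hypothetical toy data, and the statistic shown is just the relative frequency of each aligned nucleotide pair. The slide does not say which statistics are used to build the actual scoring matrix.

```perl
use strict;
use warnings;

# Hypothetical "true" alignments: each entry holds two aligned
# sequences of equal length ("_" would mark a gap).
my @pairs = (
    [ "AAGACTT", "AAGGCTT" ],
    [ "ACGT",    "CCGT"    ],
);

my %count;       # $count{"AC"} = number of columns with A above C
my $total = 0;   # number of aligned, gap-free columns

for my $pair (@pairs) {
    my ($s, $t) = @$pair;
    for my $i (0 .. length($s) - 1) {
        my $x = substr($s, $i, 1);
        my $y = substr($t, $i, 1);
        next if $x eq '_' or $y eq '_';   # skip columns containing a gap
        $count{"$x$y"}++;
        $total++;
    }
}

# Report the relative frequency of every observed pair.
for my $key (sort keys %count) {
    printf "%s -> %s: %d/%d = %.3f\n",
        substr($key, 0, 1), substr($key, 1, 1),
        $count{$key}, $total, $count{$key} / $total;
}
```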

11 Problems
Write a Perl subroutine called readmatrix that reads a DNA substitution scoring matrix from a file called “dna.txt” and stores it in a two-dimensional array. The format of the scoring matrix in the file is
   A  C  G  T
A 10  3  1  4
C  3 12  3  5
G  1  3 15  2
T  4  5  2 11
Write a Perl subroutine called translate that takes an mRNA sequence, converts it into a protein sequence, and returns the protein sequence.
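
One possible approach to both subroutines, offered as a sketch and not as the course's reference solution. It rests on several assumptions:

- The matrix file has a header line of column letters followed by one whitespace-separated row per nucleotide.
- readmatrix takes the file name as an argument (the problem fixes it as “dna.txt”).
- translate reads codons from the first base, stops at the first stop codon, and writes X for any codon it does not recognize.

The codon table is the standard genetic code, enumerated with bases in the order U, C, A, G.

```perl
use strict;
use warnings;

# Read a scoring matrix into a two-dimensional array.
# ASSUMPTION: first line = column letters; each later line = row letter
# followed by whitespace-separated scores.
sub readmatrix {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $header = <$fh>;                   # column labels, not stored
    my @matrix;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;        # skip blank lines
        my ($label, @scores) = split(' ', $line);
        push @matrix, [@scores];          # one row per nucleotide
    }
    close($fh);
    return @matrix;                       # access as $matrix[$row][$col]
}

# Build the standard genetic code: codons in U, C, A, G order.
my @bases = qw(U C A G);
my $amino = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
my %code;
my $k = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $code{"$b1$b2$b3"} = substr($amino, $k++, 1);
        }
    }
}

# Translate an mRNA string into a protein string and return it.
sub translate {
    my ($mrna) = @_;
    $mrna = uc $mrna;
    my $protein = "";
    for (my $i = 0; $i + 3 <= length($mrna); $i += 3) {
        my $codon = substr($mrna, $i, 3);
        my $aa = exists $code{$codon} ? $code{$codon} : 'X';
        last if $aa eq '*';               # stop codon ends translation
        $protein .= $aa;
    }
    return $protein;
}

print translate("AUGGCCUAA"), "\n";       # prints MA
```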

12 Problems
Write a Perl program that reads in a substitution scoring matrix from a file called “matrix.txt”, reads in a pair of DNA sequences of equal length from a file called “dna.txt”, and returns the total substitution score between the two sequences.
Write a Perl program that reads pairs of DNA sequences from a file called “DNApairs.txt” and estimates the frequency of nucleotide substitutions.
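
The scoring step of the first problem might look like the sketch below. File reading is left out so the example stays self-contained. The matrix is hard-coded with values that assume one reading of the matrix on the previous Problems slide (rows and columns in the order A, C, G, T), and the subroutine name score is hypothetical.

```perl
use strict;
use warnings;

# Row/column position of each nucleotide in the matrix.
my %index = (A => 0, C => 1, G => 2, T => 3);

# ASSUMPTION: values follow one reading of the matrix on the
# previous Problems slide; a real program would read "matrix.txt".
my @matrix = (
    [ 10,  3,  1,  4 ],
    [  3, 12,  3,  5 ],
    [  1,  3, 15,  2 ],
    [  4,  5,  2, 11 ],
);

# Sum the matrix score of every aligned position in two sequences.
sub score {
    my ($s, $t, $m) = @_;
    die "Sequences differ in length\n" unless length($s) == length($t);
    my $total = 0;
    for my $i (0 .. length($s) - 1) {
        my $r = $index{ uc(substr($s, $i, 1)) };
        my $c = $index{ uc(substr($t, $i, 1)) };
        die "Unexpected symbol at position $i\n"
            unless defined $r && defined $c;
        $total += $m->[$r][$c];
    }
    return $total;
}

print score("ACGT", "ACGA", \@matrix), "\n";   # 10 + 12 + 15 + 4 = 41
```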

