
Lecture 7: Gen(om)e duplications 9/23/09. Homework 1. Clustal and trees 2. Ensembl links 3. OMIM.


1 Lecture 7: Gen(om)e duplications 9/23/09

2 Homework 1. Clustal and trees 2. Ensembl links 3. OMIM

3 HW #1  GNAT1

4 Fasta file
>Human_GNAT1
MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLEECLEFIAIIY
GNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMSDIIQRLWKDSGIQACFERAS
EYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGIIETQFSFKDLNFRMFDVGGQRSERKKWIHC
FEGVTCIIFIAALSAYDMVLVEDDEVNRMHESLHLFNSICNHRYFATTSIVLFLNKKDVFFEKIKKAHLS
ICFPDYDGPNTYEDAGNYIKVQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF
>Chimp_GNAT1
MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLEECLEFIAIIY
GNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMSDIIQRLWKDSGIQACFERAS
EYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGIIETQFSFKDLNFRMFDVGGQRSERKKWIHC
FEGVTCIIFIAALSAYDMVLVEDDEVNRMHESLHLFNSICNHRYFATTSIVLFLNKKDVFFEKIKKAHLS
ICFPDYDGPNTYEDAGNYIKVQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF
>Dog_GNAT1
MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLEECLEFIAIIY
GNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMSDIIQRLWKDSGIQACFERAS
EYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGIIETQFSFKDLNFRMFDVGGQRSERKKWIHC
FEGVTCIIFIAALSAYDMVLVEDDEVNRMHESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLS
ICFPDYDGPNTYEDAGNYIKVQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF
Note: Programs will use whatever is in the identifier up to the 1st space as labels. If you don’t like GenBank #s, you can change this to species names.
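The note on the label rule can be illustrated with a short Python sketch. It is only an illustration of "everything in the identifier up to the first space"; the function name `fasta_labels` and the description text after the first identifier are made up for the example, and real alignment programs may apply further rules (such as length limits) that are not modeled here.

```python
def fasta_labels(text):
    """Return one label per FASTA record: the header text up to the first whitespace."""
    labels = []
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            # Keep only the part before the first space, as most programs do.
            labels.append(header.split()[0] if header else "")
    return labels


# Hypothetical two-record input; the trailing description is invented for illustration.
example = """>Human_GNAT1 some description that programs will drop
MGAGASAEEK
>Chimp_GNAT1
MGAGASAEEK
"""

labels = fasta_labels(example)
print(labels)
```

Renaming the header to a species-style label before aligning keeps the tree tips readable later.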

5 CLUSTAL 2.0.8 multiple sequence alignment

Human_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Chimp_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Dog_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Cow_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Rat_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Mouse_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE 60
Zfish_GNAT1     MGAGASAEEKHSRELEKKLKEDADKDARTVKLLLLGAGESGKSTIVKQMKIIHKDGYSLE 60
                ***********************:*****************************:******

Human_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Chimp_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Dog_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Cow_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Rat_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Mouse_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS 120
Zfish_GNAT1     ECLEFIVIIYSNTMQSILAVVRAMTTLNIGYGDAAAQDDARKLMHLADTIEEGTMPKELS 120
                ******.***.**:*****:********* ***:* *********:************:*

Human_GNAT1     DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Chimp_GNAT1     DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Dog_GNAT1       DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Cow_GNAT1       DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Rat_GNAT1       DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Mouse_GNAT1     DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI 180
Zfish_GNAT1     DIILRLWKDSGIQACFDRASEYQLNDSAGYYLNDLERLIQPGYVPTEQDVLRSRVKTTGI 180
                *** ************:***************.*****: ********************

Human_GNAT1     IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Chimp_GNAT1     IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Dog_GNAT1       IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Cow_GNAT1       IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Rat_GNAT1       IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Mouse_GNAT1     IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
Zfish_GNAT1     IETQFSFKDLNFRMFDVGGQRSERKKWIHCFEGVTCIIFIAALSAYDMVLVEDDEVNRMH 240
                ************************************************************

Human_GNAT1     ESLHLFNSICNHRYFATTSIVLFLNKKDVFFEKIKKAHLSICFPDYDGPNTYEDAGNYIK 300
Chimp_GNAT1     ESLHLFNSICNHRYFATTSIVLFLNKKDVFFEKIKKAHLSICFPDYDGPNTYEDAGNYIK 300
Dog_GNAT1       ESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLSICFPDYDGPNTYEDAGNYIK 300
Cow_GNAT1       ESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLSICFPDYNGPNTYEDAGNYIK 300
Rat_GNAT1       ESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLSICFPDYDGPNTYDDAGNYIK 300
Mouse_GNAT1     ESLHLFNSICNHRYFATTSIVLFLNKKDVFSEKIKKAHLSICFPDYDGPNTYEDAGNYIK 300
Zfish_GNAT1     ESLHLFNSICNHRYFATTSIVLFLNKKDVFVEKIKKAHLSMCFPEYDGPNTFEDAGNYIK 300
                ****************************** *********:***:*:****::*******

Human_GNAT1     VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Chimp_GNAT1     VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Dog_GNAT1       VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Cow_GNAT1       VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Rat_GNAT1       VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Mouse_GNAT1     VQFLELNMRRDVKEIYSHMTCATDTQNVKFVFDAVTDIIIKENLKDCGLF 350
Zfish_GNAT1     VQFLDLNLRRDIKEIYSHMTCATDTENVKFVFDAVTDIIIKENLKDCGLF 350
                ****:**:***:*************:************************

350 sites. * = fixed: 324 sites. 324/350 = 92.6%

6 HW #1: GNGT1. Fixed = 49/74 = 66%

Human_GNGT1     MPVINIEDLTEKDKLKMEVDQLKKEVTLERMLVSKCCEEVRDYVEERSGEDPLVKGIPED 60
Chimp_GNGT1     MPVINIEDLTEKDKLKMEVDQLKKEVTLERMLVSKCCEEVRDYVEERSGEDPLVKGIPED 60
Dog_GNGT1       MPVINIEDLTEKDKLKMEVDQLKKEVTLERMLVSKCCEEVRDYVEERSGEDPLVKGIPED 60
Cow_GNGT1       MPVINIEDLTEKDKLKMEVDQLKKEVTLERMLVSKCCEEFRDYVEERSGEDPLVKGIPED 60
Mouse_GNGT1     MPVINIEDLTEKDKLKMEVDQLKKEVTLERMMVSKCCEEVRDYIEERSGEDPLVKGIPED 60
Rat_GNGT1       MPVINIEDLTEKDKLKMEVDQLKKEVTLERVMVSKCCEEVRDYIEERSREDPLVKGIPED 60
Zfish_GNGT1     MPIIDVENMTDLDKAKMEVTQLKTEVKLERAKVSKCCEEITEYIQGGADEDPLVKGIPEE 60
                **:*::*::*: ** **** ***.**.***  *******. :*::  : **********:

Human_GNGT1     KNPFKELKGGCVIS 74
Chimp_GNGT1     KNPFKELKGGCVIS 74
Dog_GNGT1       KNPFKELKGGCVIS 74
Cow_GNGT1       KNPFKELKGGCVIS 74
Mouse_GNGT1     KNPFKELKGGCVIS 74
Rat_GNGT1       KNPFKELKGGCVIS 74
Zfish_GNGT1     KNPFKE-KGGCVIC 73
                ****** ******.

7 Protein interactions Rhodopsin GNAT1 GNB1 GNGT1

8 Relative constraint, % of fixed sites
Gene     Fixed / total   % fixed
GNAT1    324 / 350       92.6%
GNB1     306 / 340       88%
GNGT1    49 / 74         66%
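The percentages above come from counting the `*` columns in the Clustal conservation line and dividing by the alignment length. A minimal Python sketch of that count, using the GNGT1 conservation line from slide 6, is given below; the function name `percent_fixed` is an assumption made for this example, and the alignment length (74 columns) is taken from the slide.

```python
def percent_fixed(consensus_blocks, n_sites):
    """Count '*' (fully conserved) columns and express them as a percentage of all sites."""
    fixed = sum(block.count("*") for block in consensus_blocks)
    return fixed, 100.0 * fixed / n_sites


# Conservation lines for the two GNGT1 alignment blocks (slide 6).
gngt1_consensus = [
    "**:*::*::*: ** **** ***.**.***  *******. :*::  : **********:",
    "****** ******.",
]

fixed, pct = percent_fixed(gngt1_consensus, 74)
print(fixed, round(pct, 1))
```

Only `*` is counted as fixed here; the `:` and `.` symbols mark conservative and semi-conservative substitutions and are treated as variable sites, which matches the 49/74 figure on the slide.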

9 Trees

10 Ensembl search finds lots of groups
InterPro domain - identifies and groups proteins by protein signatures
Ensembl families - proteins grouped by phylogenetic relationship
Vega / Havana - the human hand-curated part of the Ensembl database. They confirm each predicted gene in different genomes
Find proteins, pseudogenes, processed pseudogenes

11 We want the Ensembl protein_coding gene. Check that it is rhodopsin and not some rhodopsin-related gene.

12 Transcript and protein info are useful

13 Protein - use links at left to look at the sequence

14 Protein sequence

15 The Exon view shows sequences of exons as well as those of UTRs and introns. Figure labels: Start, 5’ UTR, Intron

16 cDNA sequence includes known SNPs: variation in the human population

17 Can export sequence

18 Ensembl
There is a dizzying array of data and info on this web site.
We will try to use it as a “helpful” tool to gather more sequences.
Often we just want to get all the homologs from all the species where Ensembl has made that link.

19 At bottom of sequence list is link to sequence display

20 Go back to the gene page and scroll down to find orthologs. This shows pairwise comparisons in Clustalw format.

21 OMIM

22 Q4. Making trees
Clustalw is a bit limited
  Sequences are compared using distances
  Trees are drawn by neighbor joining
Nice to have more options
  Max likelihood, distance, parsimony
Phylip - set of modules that you can mix and match to make trees
  Phylemon
  Pasteur Institute

23 Methods
Parsimony: Alignment -> Input characters to parsimony tree program
Distance: Alignment -> Calculate distances -> Input distances to tree program
Maximum likelihood: Alignment -> Input characters to ML program

24 Steps to make a distance tree
Steps                           Program
Align sequences                 Clustalw-multialign
Calculate distances             DNAdist, Protdist
Use distances to make a tree    Neighbor
Display tree                    External program

25 Steps to make a distance tree: align sequences. Can do in Clustalw at the EBI web site or at the Pasteur web site.

26 Pasteur Institute - Phylogenetics

27 Clustalw2 at Pasteur - under alignment and under multiple. Either paste in sequences or select a fasta file and upload.

28 Leave defaults and hit Run

29 Save files to keep results. Clustal does make a dendrogram which you can save.

30 Save files to keep results. You can pass the results of this to the next program here.

31 Calculate distances  If DNA use DNAdist  If protein (AA) use Protdist
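To make "calculate distances" concrete, here is a minimal Python sketch of the simplest possible distance, the uncorrected p-distance (fraction of aligned positions that differ). This is deliberately not what DNAdist or Protdist compute, since those programs apply substitution models to correct for multiple hits; the function name `p_distance` and the choice to skip gapped columns are assumptions made for the illustration. The two sequences are the last alignment block of GNGT1 from slide 6.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0


# Last alignment block of GNGT1 (slide 6), human vs zebrafish.
human = "KNPFKELKGGCVIS"
zfish = "KNPFKE-KGGCVIC"

d = p_distance(human, zfish)
print(round(d, 4))
```

Computing this for every pair of sequences gives the kind of distance matrix that a neighbor-joining program takes as input.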

32 Pass alignment to protdist

33 Use Protdist under distance. Upload or paste data and hit Run.

34 Save distance matrix then send to neighbor joining program to make tree

35 Tell it which taxon # is the outgroup - this will root your tree! Here it is 7.

36 CLUSTAL 2.0.11 multiple sequence alignment

Human_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE
Chimp_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE
Dog_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE
Cow_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE
Rat_GNAT1       MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE
Mouse_GNAT1     MGAGASAEEKHSRELEKKLKEDAEKDARTVKLLLLGAGESGKSTIVKQMKIIHQDGYSLE
Zfish_GNAT1     MGAGASAEEKHSRELEKKLKEDADKDARTVKLLLLGAGESGKSTIVKQMKIIHKDGYSLE
                ***********************:*****************************:******

Human_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS
Chimp_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS
Dog_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS
Cow_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS
Rat_GNAT1       ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS
Mouse_GNAT1     ECLEFIAIIYGNTLQSILAIVRAMTTLNIQYGDSARQDDARKLMHMADTIEEGTMPKEMS
Zfish_GNAT1     ECLEFIVIIYSNTMQSILAVVRAMTTLNIGYGDAAAQDDARKLMHLADTIEEGTMPKELS
                ******.***.**:*****:********* ***:* *********:************:*

Human_GNAT1     DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI
Chimp_GNAT1     DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI
Dog_GNAT1       DIIQRLWKDSGIQACFERASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI
Cow_GNAT1       DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI
Rat_GNAT1       DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI
Mouse_GNAT1     DIIQRLWKDSGIQACFDRASEYQLNDSAGYYLSDLERLVTPGYVPTEQDVLRSRVKTTGI
Zfish_GNAT1     DIILRLWKDSGIQACFDRASEYQLNDSAGYYLNDLERLIQPGYVPTEQDVLRSRVKTTGI
                *** ************:***************.*****: ********************

Note: Zebrafish is taxon #7

37 Save tree

38 What does this tree mean???
Tree shows relationships and branch lengths:
(((Cow_GNAT1:0.00281,Rat_GNAT1:0.00281):0.00004,Mouse_GNAT:-0.00004):0.00070,((Human_GNAT:0.00000,Chimp_GNAT:0.00001):0.00244,Dog_GNAT1:0.00037):0.00210,Zfish_GNAT:0.06645);
Just relationships:
(((Cow,Rat),Mouse),((Human,Chimp),Dog),Zfish)
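Going from the full tree string to "just relationships" is a mechanical step: delete every `:number` branch length, then shorten the labels. A small Python sketch of that step follows; the function names and the `_GNAT1?` label pattern are assumptions that fit this particular tree, not a general Newick parser.

```python
import re


def strip_branch_lengths(newick):
    """Remove ':<number>' branch lengths (including negative values) from a Newick string."""
    return re.sub(r":-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?", "", newick)


def species_only(newick):
    """Drop branch lengths, then cut the '_GNAT'/'_GNAT1' suffix from each label."""
    return re.sub(r"_GNAT1?", "", strip_branch_lengths(newick))


# Tree string from the slide, with the line-wrap spaces removed.
tree = (
    "(((Cow_GNAT1:0.00281,Rat_GNAT1:0.00281):0.00004,Mouse_GNAT:-0.00004):0.00070,"
    "((Human_GNAT:0.00000,Chimp_GNAT:0.00001):0.00244,Dog_GNAT1:0.00037):0.00210,"
    "Zfish_GNAT:0.06645);"
)

topology = species_only(tree)
print(topology)
```

The nested parentheses that remain encode the grouping: each pair of parentheses is one clade.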

39 You can download FigTree for drawing trees Mac PC

40 Tree - does this make sense?

41 What is the difference between homologs, orthologs and paralogs?????

42 Orthologs: have common ancestor, derived by descent
Paralogs: gene duplicates within the same organism
Homologs = orthologs + paralogs

43 [Figure: vertebrate opsin classes LWS, RH2, SWS2, SWS1, RH1, with lamprey genes Lamprey LWS, Lamprey RHB, Lamprey RHA, Lamprey S2, Lamprey S1]

44 How do gen(ome)s evolve?
What can change?
  DNA mutation
  DNA deletions / insertions (indels)
  Recombination
  Selection - change in gene frequency
  Gene transfer
  Duplications

45 [Figure: tree with tips Human, Chicken, Frog, Zebrafish, Dog (listed twice) and Lamprey; one node is labeled "Gene duplication"]

46 Ohno, Evolution by Gene Duplication, 1970
Gene duplication is the primary way that you get new genes to work with
Genome duplications
  Double # of chromosomes
  Keep balance in biochemical machinery
  Duplicate regulatory structure
New genes can evolve to do new jobs!

47 Gene vs genome duplications  How do you know what has duplicated?

48 Mechanisms for duplication
1. Tandem duplication
2. Insertion of retrotransposed gene
3. Genome / chromosome duplication

49 1. Mismatched recombination
Leads to extra genes inserted right next to original gene
Unequal crossover

50 Normal DNA recombination: switches genes from one chromosome to the other. Leads to new gene combinations.

51 Mismatched recombination: if chromosomes misalign, recombination leads to gain of a gene on one chromosome and loss of a gene on the other. The result is tandem arrays of genes.
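The gain-and-loss outcome can be shown with a toy model in Python, where each chromosome is just a list of gene names and a crossover swaps everything after a breakpoint. The function name `crossover`, the gene names, and the breakpoint positions are all illustrative assumptions; real recombination acts on DNA sequence, not on whole-gene lists.

```python
def crossover(chrom_a, chrom_b, break_a, break_b):
    """Swap the segments that follow the breakpoints on two chromosomes."""
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two copies of a chromosome carrying a red and a green opsin gene (toy example).
chrom = ["red", "green"]

# Properly aligned: both breakpoints fall after the first gene, so nothing is gained or lost.
normal_1, normal_2 = crossover(chrom, chrom, 1, 1)

# Misaligned: the break falls after "green" on one copy but after "red" on the other.
gain, loss = crossover(chrom, chrom, 2, 1)

print(normal_1, normal_2)
print(gain, loss)
```

In the misaligned case one product carries an extra gene next to the original (a tandem array) and the other product has lost a gene.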

52 Opsin gene tandem arrays on X chromosome. Only the first 2 genes are expressed, so it doesn’t matter if there are more green genes. They are just along for the ride.

53 Misaligned recombination: if recombination happens within a gene, you get a chimera. Intermediate phenotype - changes pigment light sensitivity. Opsin genes on X chromosome.

54 Human red and green opsins [Figure labels: 530 nm, 560 nm, 554 nm; residues A / S / A and Y / F / T]
Tuning sites: A164S = +2 nm, F261Y = +10 nm, A269T = +14 nm
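One way to read the three shift values is as a rough additive model: start from the 530 nm pigment and add the shift for each substitution present. The Python sketch below does that arithmetic. The additivity, the function name `predicted_max`, and the idea that the 554 nm label corresponds to a hybrid carrying F261Y and A269T are assumptions made for this illustration, not statements from the slide.

```python
# Spectral shifts in nm for each substitution, as listed on the slide.
SHIFTS_NM = {"A164S": 2, "F261Y": 10, "A269T": 14}


def predicted_max(base_nm, substitutions):
    """Add the per-site shifts to a base absorbance maximum (simple additive assumption)."""
    return base_nm + sum(SHIFTS_NM[s] for s in substitutions)


# Hypothetical hybrid pigment with two of the three substitutions.
hybrid = predicted_max(530, ["F261Y", "A269T"])

# All three substitutions on the 530 nm background.
all_three = predicted_max(530, ["A164S", "F261Y", "A269T"])

print(hybrid, all_three)
```

Under these assumptions the two-site hybrid lands at 554 nm, while all three sites together give 556 nm, a little short of the 560 nm shown for the other pigment, so the additive picture should be treated as approximate.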

55 Normal human visual pigments Normal max = 420, 535, 565 nm

56 Deuteranomaly - green pigment shifted towards red. max = 420, 550, 565 nm. 5% male, 0.04% female

57

58 2. Insertion of retrotransposed gene
Gene can be transcribed to mRNA
mRNA then gets reverse transcribed and inserted into DNA
Clue a gene is retrotransposed?
  No introns - all coding sequence

59 Comparison of rhodopsin genes Vertebrate rhodopsin gene Fish rhodopsin gene

60 Possibilities  Lost introns and stayed in place  mRNA sequence reinserted somewhere else in the genome

61 Fugu - human comparison Rh1 Human chr 3 Fugu scaffold 830 Human chr Z Fugu Rh gene has been inserted into chromosome

