

2 Parametric Inference and Drosophila Alignments

3 A project to compare and contrast Drosophila. [Figure: female, male and karyotype images, from http://species.flybase.net/]

4 http://rana.lbl.gov/drosophila/

5 Alignment of an exon:
DroAna_20041206_ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_        GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                 ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

Alignment of an intron:
DroAna_20041206_ CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_ CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_        CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                 ***    * *         *

DroAna_20041206_ AATC-----ACTTAC
DroMel_4_        ATTCTATGGACTCAC
DroMoj_20041206_ ----TATTTACTCAC
DroPse_1_        ------TGTACTTAC
DroSim_20040829_ ATTCTATGGACTCAC
DroVir_20041029_ ----TATTTACTCAC
DroYak_1_        ATTTCATAAACTCAC
                          *** **

6 Alignment of an exon: as on slide 5.

Alignment of an intron:
droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-----------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAA----AGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA------CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAA----AGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA--------TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAAT----AGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
dm2.chr2L           -----------------------------------------TATGGACTCAC
droMoj1.contig_2959 -------------------------AAATATTT--------TATTGACTCAC
dp3.chr4_group3     -----------------------------------------TGT--ACTTAC
droSim1.chr2L       -----------------------------------------TATGGACTCAC
droVir1.scaffold_6  ---------------------------------AAATATTTGGTCCACTCAC
droYak1.chr2L       -----------------------------------------CATAAACTCAC
                                                                  *** **

7 Alignment of an exon: as on slide 5.

Alignment of an intron:
droAna CTGAAGGAATT--CTATATTAAAGAAGATTTCTCATCATT-GGTTGAATCACTTAC----
droMel CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC----
droMoj CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAA-ATTCAAATATTTTATTGAC
droPse CTGGAAGAGTT--TTGATTAGTAGGGGATCCATGGGGGCG-AGGAGAGGCCATCATCGTG
droSim CTGCGGGATTAGGAGTCATTAGAGTGCGGAAAAGCGGGTT-ATTCTATGGACTCAC----
droVir CTGCAGCAGTTAAATA-ATTGTAATAAACAA--TTCTCTA-ATTTAAATATTTGGTCCAC
droYak CTGCGGGATTAGCGGTCATTGGTGTGAAGAATAGATCCTTTATTTCATAAACTCAC----
       ***    * *        *                           *     *

droAna -------
droMel -------
droMoj TCAC---
droPse TACTTAC
droSim -------
droVir TCAC---
droYak -------

8 The three intron alignments of slides 5, 7 and 6, in that order, shown side by side with the labels 64%, 50% and 43%.

9 Alignment of an exon: as on slide 5. The same exon with D. erecta added:
DroAna_20041206_ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroEre_20041028_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
DroMel_4_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_        GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_ GTCGCTCAGCCAGCA-TTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                 ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **
X

10 Core Promoter Sequences Contribute to ovo-B Regulation in the Drosophila melanogaster Germline
Beata Bielinska, Jining Lü, David Sturgill and Brian Oliver
Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892
Vol. 169, 161-172, January 2005

11
DroAna_20041206_ TTTCGGTGATTTTGAGTCT---------------CATATTGTATATTGTCTTCTT-----
DroEre_20041028_ TCCGGGTGATTTTCCGTTG---------------CTTTTT-TTTTTTGCCTGCTT-----
DroMel_4_        TC--GGTGATTTTCCGTTG---------------CTTTTT-TATTGTGTGTGCAC-----
DroMoj_20041206_ TTTCGTTGTTATTACATTCTATTTTAATTTCGGAGTAATCTTCGTT--------CTCTTG
DroPse_1_        TCTCGGCAGTTTTTCGTTGTAATATA-TTGGGGACTATTTGT------------------
DroSim_20040829_ ------------------------------------------------------------
DroVir_20041029_ TTTCGTTGTTATTTAATT--------ATTTAAGGCTCGTTTTCTTTTGCCCACCCCCCTA
DroYak_1_        TC--GGTGATTTTCCGTTG---------------TTTCTT-T-TTTCGCCCGCAC-----

DroAna_20041206_ ------CTCGAAAGTTCCTTGACTCCTAGCATCCA------TTACATTACATTAGA----
DroEre_20041028_ ------TCGAAAAGTTCTAT------TGGGTTCCACACGGTTTTCATATAGTTTGAA---
DroMel_4_        ------TCG-AAAGTTCTAT------TAGGTTCCACAGGGTTTTTATA-------CA---
DroMoj_20041206_ CGCTTTTCGC----TTTCGGGCAAGTGCCGTT----AACTTTTGCTTTACA--AGAATGT
DroPse_1_        --------GAAATTTTCT----------------------TTTAGATACAAAAATAC---
DroSim_20040829_ ------------------------------------------------------------
DroVir_20041029_ CCCTATTCGCTCGGTTTCGGGCAACTGCCGTTGCACATTTATAACGTAAC----GAATGT
DroYak_1_        ------TC-----GTTCTAT------TAGGTTCCACAAGGTTTTCATA-------TA---

DroAna_20041206_ ----------------------------------------TCTATTATT-------TCTA
DroEre_20041028_ ----------------------------------CATAAT--------------------
DroMel_4_        ----------------------------------TATGATT----AATT-------CGTA
DroMoj_20041206_ AAAACTTATG--------------------CGCGCATCAGTGCATACATACAAACATA--
DroPse_1_        ----------------------------------AAAGGATCGGT--TT-------TATC
DroSim_20040829_ ------------------------------------------------------------
DroVir_20041029_ AAAACTCATGATGCGCATGCAGCACTAACACATGCATACATGCATACATACATACATATA
DroYak_1_        ----------------------------------CATAGTTTGATAGTT-------TGTA
Bielinska et al. (2005), see slide 10.

12 http://rana.lbl.gov/drosophila/

13 Differences among the Drosophila. [Figure: female, male and karyotype images, from http://species.flybase.net/]

14 Available Drosophila whole-genome multiple alignments:
MAVID: http://hanuman.math.berkeley.edu/kbrowser
MULTIZ: http://genome.ucsc.edu/ (currently no D. erecta)

15 MAVID:
DroAna_20041206_ CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroEre_20041028_ CTGCGGGATTAGGGGTCATTGGTGT---------GCCAAAAGTCGC---------GTTT
DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_ CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_        CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                 ***    * *         *

DroAna_20041206_ AATC-----ACTTAC
DroEre_20041028_ ACTTTATAGACTCAC
DroMel_4_        ATTCTATGGACTCAC
DroMoj_20041206_ ----TATTTACTCAC
DroPse_1_        ------TGTACTTAC
DroSim_20040829_ ATTCTATGGACTCAC
DroVir_20041029_ ----TATTTACTCAC
DroYak_1_        ATTTCATAAACTCAC
                          *** **
N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004), pp. 693-699.

16 MAVID: the intron alignment of slide 5 (without D. erecta).


18 MULTIZ: the intron alignment shown on slide 6.
Blanchette et al., Aligning multiple sequences with the threaded blockset aligner, Genome Research 14 (2004), pp. 708-715.


20 The MULTIZ alignment with the first block as on slide 18 and only the last 11 columns of the second block:
droAna1.2448876     -----ACTTAC
dm2.chr2L           TATGGACTCAC
droMoj1.contig_2959 TATTGACTCAC
dp3.chr4_group3     TGT--ACTTAC
droSim1.chr2L       TATGGACTCAC
droVir1.scaffold_6  GGTCCACTCAC
droYak1.chr2L       CATAAACTCAC
                         *** **

21 The MAVID intron alignment again (as on slide 16).

22 The MULTIZ intron alignment again (as on slide 18).

23 CLUSTAL W: the intron alignment shown on slide 7.
J. D. Thompson, D. G. Higgins and T. J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Research 22 (1994), pp. 4673-4680.

24 The MULTIZ alignment again (as on slide 20).

25
dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC

>dm2.chr2L
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC
>dp3.chr4_group3
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC

How is an alignment made from the sequences?
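A standard answer to the question above is dynamic programming. The sketch below is a Gotoh-style global alignment under the scoring used later in this talk: each match scores M, each mismatch X, each space S, and each gap (a maximal run of spaces in one row) an additional G. The function name `best_score` is hypothetical. It returns only the optimal score, not the alignment itself, and it is not the software that produced the alignments shown here.

```python
NEG_INF = float("-inf")


def best_score(a: str, b: str, M: float, X: float, S: float, G: float) -> float:
    """Optimal global score M*#M + X*#X + S*#S + G*#G over all alignments of a and b."""
    n, m = len(a), len(b)
    # D[i][j]: best score for a[:i] vs b[:j] whose last column pairs a[i-1] with b[j-1]
    # U[i][j]: ... whose last column puts a[i-1] over a space
    # L[i][j]: ... whose last column puts a space over b[j-1]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    U = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    L = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # empty alignment; treated like a pair column so the first space opens a gap
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                pair = M if a[i - 1] == b[j - 1] else X
                D[i][j] = pair + max(D[i - 1][j - 1], U[i - 1][j - 1], L[i - 1][j - 1])
            if i > 0:
                # continuing a run of spaces in the same row costs S; starting one costs S + G
                U[i][j] = S + max(D[i - 1][j] + G, U[i - 1][j], L[i - 1][j] + G)
            if j > 0:
                L[i][j] = S + max(D[i][j - 1] + G, U[i][j - 1] + G, L[i][j - 1])
    return max(D[n][m], U[n][m], L[n][m])


if __name__ == "__main__":
    mel = "CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC"
    pse = "CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC"
    print(best_score(mel, pse, M=100, X=-100, S=-30, G=-400))
```

Three matrices are needed because the cost of a space depends on whether it continues a gap in the same row; a single matrix cannot tell a new gap from an extended one.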

26
dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC
#M=30, #X=24, #G=3, #S=12

DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----

DroMel_4_ ATTCTATGGACTCAC
DroPse_1_ ------TGTACTTAC
#M=27, #X=19, #G=3, #S=28

Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G, maximal runs of consecutive spaces in one sequence) and spaces (#S). Since 2(#M+#X)+#S = n+m, where n and m are the lengths of the two sequences, #X, #G and #S suffice. This notation follows Chapter 7 (Parametric Sequence Alignment) by Colin Dewey and Kevin Woods in the new book Algebraic Statistics for Computational Biology (edited by L. Pachter and B. Sturmfels).
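The four counts can be read off an aligned pair mechanically, and the identity 2(#M+#X)+#S = n+m gives a quick consistency check on hand counts. The sketch below makes several assumptions: '-' marks a space, columns with a space in both rows (left over when a pair is extracted from a multiple alignment) are dropped, and a gap is a maximal run of consecutive spaces in one row, so a run split across two display blocks counts once. The helper `summarize` is hypothetical.

```python
from typing import NamedTuple


class Summary(NamedTuple):
    M: int  # matches
    X: int  # mismatches
    G: int  # gaps: maximal runs of consecutive spaces in one row
    S: int  # spaces: total number of '-' characters


def summarize(top: str, bottom: str) -> Summary:
    """Summarize a pairwise alignment given as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    # Drop columns that are spaces in both rows.
    cols = [(p, q) for p, q in zip(top, bottom) if not (p == "-" and q == "-")]
    M = X = G = S = 0
    for k, (p, q) in enumerate(cols):
        if p == "-" or q == "-":
            S += 1
            row = 0 if p == "-" else 1
            # A space opens a new gap unless the previous column has a space in the same row.
            if k == 0 or cols[k - 1][row] != "-":
                G += 1
        elif p == q:
            M += 1
        else:
            X += 1
    return Summary(M, X, G, S)


if __name__ == "__main__":
    # The two displayed blocks of the first alignment on this slide, joined end to end.
    mel = "CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC" + "TATGGACTCAC"
    pse = "CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG" + "TGT--ACTTAC"
    s = summarize(mel, pse)
    n, m = len(mel.replace("-", "")), len(pse.replace("-", ""))
    print(s, 2 * (s.M + s.X) + s.S == n + m)  # Summary(M=30, X=24, G=3, S=12) True
```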

27 We can mark a point in space for every alignment, at the coordinates given by its summary (#X, #S, #G). For our two sequences there are 379522884096444556699773447791552717765633 different alignments but only 53890 different summaries, so we don't need to mark nearly that many points. Still, 53890 is a large number. Fortunately, only 69 of these points are vertices of the convex hull, and that is something we can draw…
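The number of alignments grows very quickly. Suppose an alignment is any sequence of columns of three kinds: a character over a character, a character over a space, or a space over a character. Then alignments of sequences of lengths n and m correspond to lattice paths from (0, 0) to (n, m) with steps (1, 0), (0, 1) and (1, 1). Their number is the Delannoy number D(n, m) = D(n-1, m) + D(n, m-1) + D(n-1, m-1), with D(n, 0) = D(0, m) = 1. We assume, without having checked it against the talk, that this is the counting convention behind the figure above. The function name `delannoy` is hypothetical. Python integers are exact, so the 40-plus-digit results do not overflow.

```python
def delannoy(n: int, m: int) -> int:
    """Number of alignments of sequences of lengths n and m (the Delannoy number D(n, m))."""
    row = [1] * (m + 1)  # D(0, j) = 1 for every j
    for _ in range(n):
        new = [1] * (m + 1)  # D(i, 0) = 1
        for j in range(1, m + 1):
            # last column: character over space, space over character, or a pair
            new[j] = row[j] + new[j - 1] + row[j - 1]
        row = new
    return row[m]


if __name__ == "__main__":
    print([delannoy(k, k) for k in range(5)])  # [1, 3, 13, 63, 321]
```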

28
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
For these sequences, vertex 49 of the polytope is the summary #X=24, #S=10, #G=2. There are eight alignments that have this summary. The polytope is:
[Figure: the alignment polytope]

29 The eight alignments with summary #X=24, #S=10, #G=2:
mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

30 Consensus at a vertex:
mel CTGCGGGATTAGGGGTCATTAGAGT===------===GCCGAAAAGCGAGTTTATTCTA=TGGAC
pse CTGGAAGAGTTTTGATTAGTAG===GGGATCCATGGGGGCGAGGAGAGGCCATCATC==GTGTAC

31 The vertices of the polytope have special significance: they correspond to optimal alignments, that is, alignments that maximize M·#M + X·#X + S·#S + G·#G for some choice of the parameters M, X, S and G. Because #M is determined by #X and #S, this is the same as maximizing a linear function of #X, #S and #G. Example: the eight alignments summarized by vertex 49 (#X=24, #S=10, #G=2) are optimal for the parameters M = 100, X = -100, S = -30, G = -400.
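Since an alignment's score depends only on its summary, scoring a vertex for a given parameter choice is simple arithmetic. The sketch below (the names `score` and `reduced_score` are hypothetical) scores vertex 49 with the example parameters, using #M = (n+m-#S)/2 - #X with n = 52 and m = 60, the lengths of the sequences on slide 28. It then eliminates #M to show that the score is a constant plus a linear function of #X, #S and #G. It only evaluates scores; it does not check that the vertex is optimal.

```python
from fractions import Fraction


def score(M, X, S, G, nM, nX, nS, nG):
    """Score M*#M + X*#X + S*#S + G*#G of an alignment with the given summary."""
    return M * nM + X * nX + S * nS + G * nG


def reduced_score(M, X, S, G, nX, nS, nG, n, m):
    """The same score with #M eliminated using 2(#M + #X) + #S = n + m.

    score = M*(n + m)/2 + (X - M)*#X + (S - M/2)*#S + G*#G
    The first term is shared by every alignment of the two sequences.
    """
    return (Fraction(M * (n + m), 2) + (X - M) * nX
            + (S - Fraction(M, 2)) * nS + G * nG)


if __name__ == "__main__":
    n, m = 52, 60            # lengths of mel and pse on slide 28
    nX, nS, nG = 24, 10, 2   # summary of vertex 49
    nM = (n + m - nS) // 2 - nX
    params = dict(M=100, X=-100, S=-30, G=-400)
    print(nM, score(**params, nM=nM, nX=nX, nS=nS, nG=nG))         # 27 -800
    print(reduced_score(**params, nX=nX, nS=nS, nG=nG, n=n, m=m))  # -800
```

The reduced form shows why only three parameter directions matter: shifting every score by the shared constant does not change which alignment is optimal.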

32 Vertex (#X, #S, #G) [#alignments]
40 (15,16,16) [1080]
41 (17,30,2) [4]
42 (18,14,5) [4]
43 (18,16,4) [56]
44 (20,10,6) [16]
45 (20,10,7) [24]
46 (23,8,6) [6]
47 (23,8,8) [165]
48 (24,8,3) [38]
49 (24,10,2) [8]
50 (25,8,2) [24]
51 (25,62,3) [2]
52 (28,48,2) [1]
53 (29,8,1) [6]
Finding the polytope is what we call parametric inference. Colin Dewey's polytope propagation software can find the vertices in 20 seconds.

33 The MAVID, MULTIZ and CLUSTAL W multiple alignments did not contain a D. melanogaster/D. pseudoobscura pairwise alignment corresponding to a vertex of the polytope. Reasons: D. melanogaster and D. pseudoobscura are not neighbors on the tree, and were therefore aligned during a heuristic “progressive alignment” step. The correct alignment requires a model that includes a parameter for the transition/transversion ratio.
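A model with a transition/transversion parameter needs the mismatch count split in two. The sketch below assumes that '-' marks a space and uses the usual base classes: A and G are purines, C and T are pyrimidines. A transition stays within a class and a transversion crosses between them. `split_mismatches` is a hypothetical helper and is not part of any tool named above.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(p: str, q: str) -> bool:
    """True for a purine<->purine or pyrimidine<->pyrimidine substitution."""
    pair = {p, q}
    return p != q and (pair <= PURINES or pair <= PYRIMIDINES)


def split_mismatches(top: str, bottom: str) -> tuple[int, int]:
    """Return (#transitions, #transversions) over the mismatched columns of an alignment."""
    ts = tv = 0
    for p, q in zip(top.upper(), bottom.upper()):
        if p == q or p not in "ACGT" or q not in "ACGT":
            continue  # matches, spaces ('-') and ambiguity codes such as N are skipped
        if is_transition(p, q):
            ts += 1
        else:
            tv += 1
    return ts, tv


if __name__ == "__main__":
    mel = "CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTCTATGGACTCAC"
    pse = "CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGT--ACTTAC"
    print(split_mismatches(mel, pse))
```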

34 An example where robust alignments are crucial: Green et al., Transcription-associated mutational asymmetry in mammalian evolution, Nature Genetics 33 (2003).
Observation: A→G > T→C and G→A > C→T in mammalian genes, on the coding strand of transcribed regions. In fact, A→G transitions were 58% more frequent than T→C transitions, and G→A transitions were 18% more frequent than C→T transitions. This was established by examining human-chimpanzee-baboon alignments, with baboon as the outgroup. [Figure: human, chimp, baboon]
Peter Huggins has confirmed that there is transcription-associated mutational asymmetry in Drosophila (ratios of 40%). [Figure: D. mel, D. sim, D. yak]
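The direction of each substitution comes from the outgroup. The following is a minimal parsimony sketch. It assumes three aligned rows oriented on the coding strand: two ingroup species and an outgroup, such as human, chimp and baboon, or D. mel, D. sim and D. yak. A column is used only when the two ingroup bases differ and the outgroup agrees with one of them, which is then taken as the ancestral base. This illustrates the idea only and is not the procedure used by Green et al. or by Peter Huggins. The names `directed_substitutions` and `excess` are hypothetical.

```python
from collections import Counter


def directed_substitutions(sp1: str, sp2: str, outgroup: str) -> Counter:
    """Count (ancestral, derived) base pairs on the two ingroup lineages by parsimony."""
    counts: Counter = Counter()
    for a, b, o in zip(sp1.upper(), sp2.upper(), outgroup.upper()):
        if "-" in (a, b, o) or a == b:
            continue  # skip columns with spaces and columns without an ingroup difference
        if o == a:
            counts[(a, b)] += 1  # ancestral a, changed to b on the sp2 lineage
        elif o == b:
            counts[(b, a)] += 1  # ancestral b, changed to a on the sp1 lineage
        # otherwise the outgroup matches neither base and the column is ambiguous
    return counts


def excess(counts: Counter, forward: tuple[str, str], reverse: tuple[str, str]) -> float:
    """How much more frequent `forward` is than `reverse` (0.58 means 58% more frequent)."""
    return counts[forward] / counts[reverse] - 1.0


if __name__ == "__main__":
    c = directed_substitutions("AAATGG", "GGGCAG", "AAATGA")
    print(c, excess(c, ("A", "G"), ("T", "C")))
```

A real analysis would restrict the columns to annotated transcribed regions and correct for multiple substitutions; this sketch does neither.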

35 [Figure from Green et al.] But the real problem is testing non-coding DNA…

36
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
Associated to every pair of sequences is a polynomial built from the summaries of its alignments: each summary contributes a monomial whose coefficient is the number of alignments with that summary. For example, vertex 49 (#X=24, #S=10, #G=2), shared by eight alignments, corresponds to the monomial 8 X^24 S^10 G^2. How do we build the polytope for this polynomial?
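The polynomial can be computed with the same dynamic programming as an alignment, replacing "take the best" with "add up". The sketch below stores the polynomial as a Counter that maps exponent vectors (#X, #S, #G) to coefficients. It assumes that a gap is a maximal run of spaces in one row, so a space in one row followed by a space in the other row opens two gaps. `alignment_polynomial` is hypothetical and is meant for short sequences. Whether its convention reproduces the coefficient 8 above exactly has not been checked here.

```python
from collections import Counter

DIAG, UP, LEFT = 0, 1, 2  # last column: a pair, a[i-1] over a space, a space over b[j-1]


def alignment_polynomial(a: str, b: str) -> Counter:
    """Map each summary (#X, #S, #G) to the number of alignments of a and b having it."""
    n, m = len(a), len(b)
    # poly[i][j][state] is a Counter over (#X, #S, #G) for the prefixes a[:i], b[:j]
    poly = [[[Counter(), Counter(), Counter()] for _ in range(m + 1)] for _ in range(n + 1)]
    poly[0][0][DIAG][(0, 0, 0)] = 1  # empty alignment; the first space opens a gap
    for i in range(n + 1):
        for j in range(m + 1):
            cell = poly[i][j]
            if i > 0 and j > 0:
                dx = 0 if a[i - 1] == b[j - 1] else 1
                for prev in poly[i - 1][j - 1]:
                    for (x, s, g), c in prev.items():
                        cell[DIAG][(x + dx, s, g)] += c
            if i > 0:
                for state, prev in enumerate(poly[i - 1][j]):
                    opens = 0 if state == UP else 1  # a new gap unless extending one in this row
                    for (x, s, g), c in prev.items():
                        cell[UP][(x, s + 1, g + opens)] += c
            if j > 0:
                for state, prev in enumerate(poly[i][j - 1]):
                    opens = 0 if state == LEFT else 1
                    for (x, s, g), c in prev.items():
                        cell[LEFT][(x, s + 1, g + opens)] += c
    total: Counter = Counter()
    for part in poly[n][m]:
        total.update(part)
    return total


if __name__ == "__main__":
    poly = alignment_polynomial("AC", "AG")
    for (x, s, g), c in sorted(poly.items()):
        print(f"{c} X^{x} S^{s} G^{g}")
    print(sum(poly.values()))  # 13, the number of alignments of two length-2 sequences
```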

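37 Polytope propagation
NP_{i,j} = S·NP_{i-1,j} + S·NP_{i,j-1} + (X or M)·NP_{i-1,j-1}
Here NP_{i,j} is the Newton polytope for positions [1,i] and [1,j] of the two sequences, and the last term uses X if the i-th and j-th characters differ and M if they match. "+" is the convex hull of the union, and multiplication by a monomial is a Minkowski sum.
[Figure: alignment grid for the sequences A A C A T T A G A and AGATTACCACA]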
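The recurrence on this slide can be run directly. This sketch covers the model in the recurrence, which has M, X and S but no separate gap parameter, so its vertices are not those of the three-parameter polytope on slide 32. Because 2(#M+#X)+#S = i+j, each NP_{i,j} lies in the (#X, #S) plane. It is therefore a polygon and can be stored as its list of vertices. Multiplying by S or X translates every vertex, which is a Minkowski sum with a single point. "+" takes the convex hull of the union, computed here with Andrew's monotone-chain algorithm. This illustrates the idea and is not Colin Dewey's software. The names `convex_hull` and `propagate` are hypothetical.

```python
Point = tuple[int, int]  # (#X, #S); #M is fixed by 2(#M + #X) + #S = i + j


def convex_hull(points) -> list[Point]:
    """Vertices of the convex hull of 2-D integer points (Andrew's monotone chain).

    Points lying in the interior of an edge are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, p: Point, q: Point) -> int:
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def propagate(a: str, b: str, prune: bool = True) -> list[Point]:
    """Vertices of NP_{n,m}, the Newton polygon of all alignments of a and b.

    With prune=True each NP_{i,j} is reduced to its vertices; with prune=False
    every achievable (#X, #S) point is kept (useful only as a check).
    """
    n, m = len(a), len(b)
    NP = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    NP[0][0] = [(0, 0)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cand: list[Point] = []
            if i > 0:  # S * NP_{i-1,j}: translate by one space
                cand += [(x, s + 1) for x, s in NP[i - 1][j]]
            if j > 0:  # S * NP_{i,j-1}
                cand += [(x, s + 1) for x, s in NP[i][j - 1]]
            if i > 0 and j > 0:  # X * NP_{i-1,j-1} or M * NP_{i-1,j-1}
                dx = 0 if a[i - 1] == b[j - 1] else 1  # a match leaves (#X, #S) unchanged
                cand += [(x + dx, s) for x, s in NP[i - 1][j - 1]]
            # "+" in the recurrence: convex hull of the union
            NP[i][j] = convex_hull(cand) if prune else sorted(set(cand))
    return sorted(convex_hull(NP[n][m]))


if __name__ == "__main__":
    print(propagate("AACATTAGA", "AGATTACCACA"))
```

Pruning to vertices at every cell is what keeps propagation fast: the hull of a union depends only on the vertices of the pieces, so interior points can be discarded as soon as they appear.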

38 Next Steps
Biology
Align all introns and intergenic regions between all pairs of Drosophila species parametrically. This will result in thousands of polytopes.


40 Next Steps
Biology
Align all introns and intergenic regions between all pairs of Drosophila species parametrically. This will result in thousands of polytopes.
Distinguish robust alignments, which do not depend critically on changes in parameters, from unreliable alignments.
Investigate biological questions parametrically.
Mathematics
Optimize polytope propagation, and investigate other fast methods for building alignment polytopes.
Study the structure of alignment polytopes.
Develop a parametric framework for multiple alignment.

