1
Gene and Genome Evolution
2
Model Organisms
Most interesting experiments can’t ethically be performed on humans, so we use model organisms as stand-ins. How closely they resemble humans depends on the situation, and the differences occasionally cause problems. We would also like to understand how all living things work, but finite resources lead us to concentrate on just a few organisms that are easy to work with. The main model organisms, widely used for many purposes: mice (Mus musculus), Drosophila melanogaster, Caenorhabditis elegans (C. elegans: nematode), Saccharomyces cerevisiae (yeast), Escherichia coli (E. coli: bacteria), Arabidopsis thaliana (plants). All of these have completely sequenced genomes from several different strains, large collections of mutants, a way to transform them (i.e. insert DNA into their genomes), and a large body of knowledge about how to work with them. Other model organisms: rhesus monkeys, rats, Xenopus (African clawed frog), zebrafish, fugu, Schizosaccharomyces pombe, and many others.
3
Escherichia coli
E. coli is a Gram-negative, rod-shaped bacterium that lives in the human gut. It has been an important lab organism since the beginning of molecular biology (1940 or so). Originally it was used to grow the bacteriophage that early molecular biologists (notably Salvador Luria and Max Delbrück) wanted to study to determine what genes were and how they worked. It then became the organism of choice for studying gene expression, recombination, and many other fundamental genetic properties. E. coli grows quickly (about 20 minute doubling time) under easy lab conditions: aerobic, 37 °C, with cheap, easy-to-make growth medium. It can be grown in liquid culture (mass quantities) or on Petri plates (colonies from single isolated cells). The main E. coli strain used, K12, is non-pathogenic and has lost the ability to grow in the human gut. Much biotechnology uses E. coli to grow cloned DNA segments, or uses enzymes derived from E. coli.
4
Saccharomyces cerevisiae
Saccharomyces cerevisiae is “yeast”. It is used to make alcohol from sugars: almost all beer, wine, and distilled spirits use S. cerevisiae in their production. It is also the yeast used to make bread rise, by producing bubbles of carbon dioxide that get trapped by the gluten proteins in the bread dough. S. cerevisiae is a eukaryote, a member of the fungus kingdom. As such it is more closely related to humans than plants are. S. cerevisiae is single celled, and many of the microbiological techniques used to study E. coli and other bacteria can be used with it. Many processes basic to eukaryotes have been studied in yeast, including control of the cell cycle and protein-protein interactions. It can be grown as a haploid or as a diploid, which allows easy detection of mutants (as haploids) as well as the ability to maintain lethal mutations (as diploids). There are deletion mutants covering most of the genome, and you can order knockout mutation strains for every gene. It can be propagated vegetatively, but it also undergoes sexual reproduction readily.
5
Caenorhabditis elegans
C. elegans is a small, free-living nematode (nematodes are also called roundworms, or just “worms”). They are animals with nervous systems and all other typical animal tissues, and they have 3 germ layers (endoderm, mesoderm, ectoderm) like humans. C. elegans was established as a model organism in the 1960’s by Sydney Brenner, who had previously worked with bacteriophage. The adult has about 1000 cells, and every cell’s origin and fate is determined. This is very unlike higher animals, where cells depend on external cues (like morphogen gradients) to determine how they should develop. The worms live on E. coli growing on Petri plates, and can be stored indefinitely by freezing them. RNA interference was discovered in C. elegans, and the worm has been used for many studies of simple nervous systems and meiosis.
6
Drosophila melanogaster
Drosophila (“flies”) have been used in genetics research since the early 1900’s. Thomas Hunt Morgan started using them in 1910 at Columbia University, and they have remained popular ever since. As most students know, flies have a rapid life cycle, are easy to grow, and have many interesting morphological mutants. Much of our knowledge of genetics came from fly research. More recently, fundamental knowledge of development came from studying various Drosophila mutations. The salivary glands of the larvae have giant polytene chromosomes, which allowed specific genes to be located and gene activity to be detected: the polytene chromosomes puff out where active transcription is occurring.
7
Mus musculus
The house mouse, a long-term associate of humans, is considered vermin whose life is not worth protecting. This has made mice the lab animals of choice for a very long time. Use of mice in the lab is not regulated by the Animal Welfare Act. However, the National Institutes of Health have standards for mouse care that we at NIU follow. The use of mice in genetics started around 1909, when Clarence Cook Little produced the first inbred strains. Little later founded the Jackson Laboratory in Bar Harbor, Maine, which is the primary stock center for mouse genetics today. Mice provide the main mammalian model for humans in genetics and medicine. Unlike humans, mice can be made homozygous at almost all loci by inbreeding (brother-sister matings for many generations). Mouse genes can be manipulated in vitro and re-inserted in the genome of embryos to produce transgenic mice. Similar techniques allow any specific gene to be inactivated: knockout mice. The immune system is almost completely inactivated in the nude mouse. These mice can accept tissue transplants from humans, producing mice with a human immune response.
8
Arabidopsis thaliana
Arabidopsis thaliana is the primary model plant (angiosperm = flowering plant). It is sometimes called thale cress, but is mostly just known as Arabidopsis. Its use as a model organism started in the 1980’s. It has a very small genome: about 135 Mbp (million base pairs), as compared to humans (3000 Mbp) or even rice (430 Mbp). Arabidopsis is small and has a short generation time (6 weeks), which makes for easy genetics. It has a huge collection of mutant strains, is easy to transform, and has a large research community. Lots of work on basic plant development has been easily transferred to crop species.
12
Genome Changes in Evolutionary Time
A basic principle: all current life on Earth arose from a single common ancestor, the Last Universal Common Ancestor (LUCA). There were certainly other living things before the time of LUCA, and alongside it as well. LUCA lived perhaps 3.5 billion years ago. It must have had the same DNA → RNA → protein system that we have, plus several other features common to all living organisms today. Thus we want to explore the forces of mutation and selection that have converted LUCA into the diversity we see today. Some mechanisms we will discuss: whole genome duplications; chromosomal rearrangements (translocations, inversions, transposon movements); gene family expansions; horizontal gene transfer; natural selection within genes and in regulatory regions.
13
Evolution by Natural Selection
A fundamental principle: lots of mutations occur, but only a small number end up fixed (i.e. present in all individuals) within a species. Natural selection removes deleterious mutations. Some mutations are selectively neutral, neither selected for nor against. Their survival depends on random chance events. A simple way of looking at the effects of selection is to compare homologous genes (genes in different species that have the same function and are derived from a common ancestor). Two types of selection can be detected when comparing homologous genes. Most selection is negative or purifying selection: most genes perform the same function in closely related species, and mutations that disrupt that function are eliminated. A few genes undergo positive selection: the homologous genes are evolving different functions, and so require different amino acid sequences.
14
Base Substitutions
The simplest type of mutation is the base substitution, also called a point mutation or a single nucleotide polymorphism (SNP). One nucleotide has been substituted for another. Substitutions are caused by tautomeric shifts, incorrect DNA repair, and random events. Two basic types: a transition converts one purine to the other purine, or one pyrimidine into the other pyrimidine; a transversion converts a purine to a pyrimidine or the reverse. Logically, transversions should be twice as frequent, since there are twice as many possible transversions as transitions. However, in practice, transitions are about twice as common as transversions, due to a combination of natural selection and ease of occurrence. Neutral substitution rate: how often do nucleotides change in the absence of selection pressure? In a comparison of the human and mouse genomes, 165 Mbp of DNA associated with non-functional transposon sequences were identified in both species. These had about 67% identical bases, which implied a rate of 0.46 substitutions per position over the 75 million years since the human and mouse lineages diverged. This works out to 2 × 10⁻⁹ substitutions per year for each site, in the presumed absence of selection pressure. This estimate agrees with other estimates based on different methods.
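The transition/transversion distinction can be illustrated with a minimal Python sketch. The function names here are made up for illustration and are not from any library. Enumerating all possible single-base changes shows why transversions would be expected to be twice as frequent if every change were equally likely.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_possible_changes() -> dict:
    """Count every possible ordered base change by type."""
    counts = {"transition": 0, "transversion": 0}
    for ref in "ACGT":
        for alt in "ACGT":
            if ref != alt:
                counts[classify_substitution(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitution("A", "G"))
    print(count_possible_changes())
```

Of the 12 possible ordered changes, 4 are transitions and 8 are transversions, so the observed excess of transitions reflects biology rather than simple counting.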
15
Substitutions Within Genes
We mostly care about the functional parts of the genome, the genes and their control regions. Since most of the genes are presumably necessary for life, some mutations will be deleterious and others not. In the human-mouse genome comparison, variation in the rate of substitutions across the various portions of genes was clear: fewest in the exons, most in the introns, and an intermediate amount in the UTRs and flanking regions. For coding regions, the degeneracy of the genetic code has a large effect. Some sites are non-degenerate: any change results in a different amino acid. These are mostly the first or second bases of codons. Other sites are two-fold degenerate: transitions give the same amino acid while transversions give a different amino acid. Other sites are four-fold degenerate: any mutation gives the same amino acid. These sites are all third positions of codons. Mutations that give the same amino acid are called silent or synonymous mutations. They are presumed to be selectively neutral.
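The degeneracy classes described above can be checked directly against the standard genetic code. The following is a minimal Python sketch; the helper name site_degeneracy is hypothetical, and the table is the standard nuclear code written in the usual TCAG order. The function reports how many of the four possible bases at a codon position encode the same amino acid: 1 means non-degenerate, 2 two-fold degenerate, 4 four-fold degenerate.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_degeneracy(codon: str, position: int) -> int:
    """Number of bases at `position` (0, 1, or 2) that keep the amino acid."""
    codon = codon.upper()
    original = CODON_TABLE[codon]
    same = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == original:
            same += 1
    return same


if __name__ == "__main__":
    for codon in ("GGT", "GAT", "ATG"):
        print(codon, [site_degeneracy(codon, p) for p in range(3)])
```

A few sites do not fit the three classes neatly: the third position of the isoleucine codons (ATT, ATC, ATA) comes out as three-fold degenerate in this count.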
16
More on Substitution
In addition to synonymous mutations, some amino acid changes are conservative in that they have little or no effect on the protein’s function. For example, isoleucine and valine are both hydrophobic and readily substitute for each other. Other amino acid substitutions are very unlikely: leucine (hydrophobic) for aspartic acid (hydrophilic and charged). This would be a non-conservative substitution. Some amino acids play unique roles: cysteines form disulfide bridges, prolines induce kinks in the chain, etc. In addition, some amino acids are critical for active sites and cannot be substituted. Tables of substitution frequencies for all pairs of amino acids have been generated. These are based on counts from homologous sequences that have been aligned. They are just counts of changes along the whole length of the proteins, not accounting for active sites, etc. Figure: BLOSUM62 table. Numbers on the diagonal indicate the likelihood of the amino acid staying the same. The off-diagonal numbers are relative substitution frequencies. Numbers greater than zero indicate that the change is seen more often than predicted by random chance; negative numbers imply that the substitution is less frequent than predicted by chance.
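The sign convention of such a table can be illustrated with a log-odds calculation. This is a minimal sketch, assuming the commonly described BLOSUM-style scoring in half-bit units (2 × log2 of observed over expected pair frequency). The frequencies used below are invented for illustration and are not taken from the real BLOSUM62 data.

```python
import math


def log_odds_score(pair_freq: float, freq_a: float, freq_b: float,
                   same_residue: bool = False) -> int:
    """Rounded log-odds score in half-bit units.

    pair_freq: observed frequency of the aligned pair (hypothetical input).
    freq_a, freq_b: background frequencies of the two amino acids.
    same_residue: True when scoring an amino acid against itself.
    """
    # An unordered pair of different residues can arise two ways (a/b or b/a).
    expected = freq_a * freq_b if same_residue else 2 * freq_a * freq_b
    return round(2 * math.log2(pair_freq / expected))


if __name__ == "__main__":
    # Made-up numbers: a pair seen 4x less often than chance scores negative.
    print(log_odds_score(0.005, 0.1, 0.1))
    # A residue aligned with itself twice as often as chance scores positive.
    print(log_odds_score(0.02, 0.1, 0.1, same_residue=True))
```

A pair observed exactly as often as chance predicts scores zero, which is why positive and negative entries can be read as "more often" and "less often" than random.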
17
Short Indels
Indel: insertion/deletion, a position in a protein or DNA sequence where one species has nucleotides or amino acids and the other species doesn’t. Since we usually can’t tell whether one species had an insertion or the other species had a deletion, we just call it an indel. Short indels (1-10 bp or so) are the second most frequent type of mutation seen, after base substitutions. In the human genome, the current estimate is that short indels occur at about 1/20 the frequency of base substitutions. The cause of short indels is slippage of DNA polymerase during the replication process. The sliding clamp mechanism keeps the polymerase bound to the DNA most of the time, but random events (like Brownian motion) can cause it to temporarily fall off. This can generate small indels. Slippage is especially common in Simple Sequence Repeats (SSRs), in which a short (2-5 bp) sequence is repeated many times in a row.
18
Simple Sequence Repeats
Simple sequence repeats (SSRs) are found all over the genome. The first high quality human genetic maps were made using SSRs as loci. Realize that in humans, using visible mutant phenotypes or genetic diseases won’t work: no one has very many of them, and controlled genetic crosses aren’t possible. SSRs work well because the number of repeats at a given SSR locus is usually stable enough to be almost always inherited unchanged from parent to child, and because they are scattered throughout the genome. Since everyone has all of the markers, any mating will give informative results. Trinucleotide repeats (TNRs) are a type of SSR that have an array of 3 bp repeats. Because a codon is 3 bp long, TNRs within a coding region don’t change the reading frame. However, some TNRs cause diseases even though they are in the UTRs. Below a certain number, the repeats are relatively stable. Above that number, the copy number can change drastically in both mitosis and meiosis due to DNA polymerase slippage. These alleles are called pre-mutation alleles. Above an even higher point, the mutant phenotype appears. Figure: SSR of the 3 base sequence CTT. Alleles A, B, and C differ in the number of CTT repeats present.
19
Huntington Disease
Huntington Disease (HD) is a dominant autosomal disease, with most affected people being heterozygotes. It is caused by trinucleotide repeat mutations. Onset is usually in middle age. The disease is neurological: it starts with irritability and depression, includes fidgety behavior and involuntary movement (chorea), and is followed by psychosis and death. It is caused by CAG repeats within the coding region, giving a tract of glutamines. Below 28 copies is normal; between 28 and 34 copies is the premutation allele, with a normal phenotype but an unstable copy number that puts the next generation at risk. Above 34 copies gives the disease. HD shows “anticipation”: the age of onset gets earlier with every generation. This is due to a direct correlation between copy number and age of onset. There is a genetic test for the disease, but in the absence of effective treatment few people actually take the test. The function of the protein remains unknown; the excess glutamines cause it to aggregate and (probably) poison the nerve cells.
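The copy-number categories on this slide can be written as a short Python sketch. The function names are hypothetical, the thresholds (28 and 34) are simply the ones quoted above, and the code is an illustration of the idea rather than a diagnostic tool.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Length, in repeat units, of the longest uninterrupted CAG tract."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_hd_allele(cag_copies: int) -> str:
    """Apply the slide's thresholds: <28 normal, 28-34 premutation, >34 disease."""
    if cag_copies < 28:
        return "normal"
    if cag_copies <= 34:
        return "premutation"
    return "disease"


if __name__ == "__main__":
    # Toy sequence with 30 CAG repeats flanked by arbitrary bases.
    allele = "ATG" + "CAG" * 30 + "CCG"
    copies = longest_cag_run(allele)
    print(copies, classify_hd_allele(copies))
```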
20
Comparative Genomics
Start with 2 completely sequenced genomes. Find regions of sequence similarity (homology) using BLAST or some other alignment program. The basic principle of comparative genomics is that sequence conservation across species lines implies natural selection for that sequence. The sequence must be important, because it affects fitness. Mutations that alter the sequence mostly have negative effects and tend to be eliminated by natural selection. Some conserved regions are genes, while others are regulatory, or have functions we don't know yet. The greater the evolutionary distance between two species, the less sequence conservation there is. Amino acid sequences are preserved better than nucleotide sequences, mostly due to the degeneracy of the genetic code. Figure: Medicago genes.
21
Dotplots
In a dotplot, the chromosomal positions of one genome are on the x-axis, and those of the other genome are on the y-axis. Sequences that match are marked with a dot. A long diagonal line shows a region with significant cross-species homology. Reverse diagonal lines indicate inversions between the species. On the left: 2 strains of E. coli are almost completely collinear. Above, human and mouse chromosomes show many scattered regions of homology.
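The idea behind a dotplot can be sketched in a few lines of Python. This is a toy version that marks exact matches of short words (k-mers); real genome dotplots are built from alignment programs rather than exact matching, and the function names and word size here are assumptions for illustration.

```python
def dotplot_matches(seq_x: str, seq_y: str, k: int = 4) -> list:
    """Return (x, y) positions where the k-mers of the two sequences match."""
    # Index every k-mer in seq_y by its start position.
    index = {}
    for j in range(len(seq_y) - k + 1):
        index.setdefault(seq_y[j:j + k], []).append(j)
    # Each shared k-mer becomes one dot.
    dots = []
    for i in range(len(seq_x) - k + 1):
        for j in index.get(seq_x[i:i + k], []):
            dots.append((i, j))
    return dots


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (uppercase A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    seq = "ACGTTGCAAGCT"
    # A sequence compared with itself gives an unbroken main diagonal.
    print(dotplot_matches(seq, seq))
    # Matching against the reverse complement is one way inverted
    # segments can be found; in a plot they run along the reverse diagonal.
    print(dotplot_matches(seq, reverse_complement(seq)))
```

Runs of dots with x and y increasing together form the diagonal lines described above; short words also produce scattered chance matches, which is why larger word sizes or alignment scores are used on real genomes.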
22
Types of Sequences Conserved Between Species
Genes: if it looks like a gene (i.e. an open reading frame) and is conserved between species, it probably is a gene. Conversely, ORFs that aren't conserved have often been shown to be random events, not part of a gene. Many RNA genes have been found because they are conserved between species. Lots of conserved sequences between human and mouse haven't been assigned a definite function yet: 3.9% of the genome vs. 1.1% that is coding. Ultra-conserved elements (UCEs): greater than 200 bp with 100% sequence identity between species. Originally about 400 UCEs were found between human and mouse, but now some have been found between Drosophila and humans, sea urchins and humans, etc. They are often found near important genes: transcription factors, developmental regulators, ion channels. They are probably involved in gene regulation, but this is still unclear. Some may be undetected RNA genes. Human-accelerated regions (HARs): a set of 49 regions that are conserved in vertebrate evolution but very different between humans and chimpanzees. They are quite short (140 bp average) and mostly not in genes. One well known one, HAR1, is an RNA gene. Others are enhancers of nearby gene activity. Many are associated with neural development.
23
Genome Changes in Evolution
There are very few genes found in humans and nowhere else. Most of the differences between us and our closest relatives are changes in gene families, altered functions of existing genes, and changes in regulatory sequences. Human vs. chimpanzee, for sequences that can be aligned: 1.2% base substitutions, plus 3% differences in insertions and deletions (indels). There are fewer indels than base substitutions, but indels can cover many more bases. 1500 inversions, ranging from very small (23 bp) to 62 Mbp. 23 bp is at the detection limit for BLAST searches, and there are probably plenty of smaller inversions. Several hundred changes in gene family copy number. Lots of changes in repeat sequences (3 times as many Alu elements in humans as in chimps). Loss of function in about 80 genes (half of which are olfactory receptors). About 29% of all proteins with clear orthologs are identical between humans and chimps, and most of the rest differ by only 1 or 2 amino acids.
24
Whole Genome Duplication
As the name implies, a whole genome duplication is an event where the genome size doubles, going from diploid to tetraploid. These events also require the chromosomes to pair up as if they were diploids during meiosis. Otherwise the organism would not produce offspring. Whole genome duplications are common in plants, but very rare in animals. Plants can undergo many generations of clonal (non-sexual) propagation. Two duplications are thought to have occurred early in the vertebrate lineage, after it split from the other chordates: the tunicates (urochordates) and the cephalochordates (like amphioxus). A third duplication occurred in the bony fish lineage, after it split from the tetrapod lineage. Maintaining a polyploid state occurs frequently in amphibians and reptiles, but it is thought that X chromosome inactivation and the problems of maintaining gene balance with 2 different sex chromosomes make this very difficult in the mammals. The problem can be seen in the abnormalities associated with XXY and XO individuals: Klinefelter and Turner syndromes.
25
Diploidization
After a genome duplication, most of the genes are duplicated. What follows is a period of diploidization, a return to the stable diploid state, during which many genes lose one or the other copy. The result is that most genes end up with just one copy. Some genes retain both copies, and often there will be a functional divergence: they take on different roles. Notably, the Hox genes have retained all 4 copies: there are 4 clusters on different chromosomes that are recognizably similar all the way from the coelacanths (lobe-finned fishes on the tetrapod side of the fish/tetrapod split) to humans.
26
Hox Genes
Hox genes specify segment identity: different members of the cluster are expressed in different segments as you move from anterior to posterior. Hox genes make transcription factors. The order of the genes on the chromosome is the same as the order of their expression along the body. The same mechanism is used in all bilaterian animals. It was first described and understood in Drosophila. Conservation is strong enough that a Drosophila Hox gene works correctly when put into chickens. Hox genes contain a homeobox domain, which is also found in plants and serves a similar role in development.
27
Chromosome Rearrangements
When comparing mammalian genomes, it is clear that synteny is common: when two genes are neighbors in one species, they are usually neighbors in other species. However, comparing the genomes of two species shows the results of multiple translocations and inversions. Blocks of syntenic genes are seen, but often spread across multiple chromosomes. The average size of synteny blocks between mouse and humans is 10 Mbp. This is partly a consequence of the fact that genes on a chromosome mostly don’t interact with their neighbors. New centromeres often form in what was previously euchromatin, and centromere sequences evolve rapidly. The difference between human and chimp chromosome numbers (23 vs. 24 pairs) is due to a translocation that connected the long arms of two ape chromosomes into a single human chromosome. A notable exception is the X chromosome: most X genes stay on the X over long evolutionary time, because moving them causes problems with dosage compensation.
29
Gene Duplication in Gene Families
Tandem arrays of genes can very easily expand or contract their numbers through unequal crossing over, as in the beta-globin genes. Different gene families expand in different lineages: an expanded gene family is presumably doing something important for that lineage. Between humans and mice, there are about 15,000 genes that match 1-to-1 as homologs. However, there are another 5000 genes in gene families with very different copy numbers. Sometimes the main effect is simply to increase the amount of gene product. A good example: the salivary amylase genes (AMY1); amylase converts starch into sugar. Apes have only 1 amylase gene, but humans have multiple copies of the gene in a tandem array. Since the agricultural revolution (about 10,000 years ago), we eat much more starch than our hunter-gatherer ancestors and our ape cousins. The copy number of AMY1 genes is different in different populations, and it correlates with starch levels in the diet. We know it's a recent duplication because the different copies are all very similar: they have picked up very few random mutations, even at synonymous sites.
30
Amylase Gene Duplication
Copy number is roughly correlated with starch levels in the diet. The Hadza are hunter-gatherers in Tanzania who rely on starchy roots and tubers. The BiAka and the Mbuti are hunter-gatherers in the African rain forest. The Datog are pastoralists (they herd cattle) in east Africa. The Yakut are hunters and fishermen from Siberia.
31
More on Gene Families
When genes get duplicated, the two copies are referred to as paralogs. (Orthologs are the same gene in two different species.) There are several possibilities for the newly formed paralogs. One copy gets inactivated by mutation and becomes a pseudogene. One paralog evolves a new function while the other keeps the old function; this is called neofunctionalization. The two paralogs split the previous function: they get expressed in different tissues or at different times in development; this is called subfunctionalization. Figure: orthologs are the same gene in two different species; paralogs are two genes in the same species derived from a common ancestral gene.
32
Globin Gene Evolution
Start with an ancestral globin gene, 800 million years ago. Today there are 3 single genes on different chromosomes, all of which work as monomers. Myoglobin carries oxygen in muscle cells; the others have less well known functions. There are also two gene clusters, for the alpha and beta globins. These work as tetramers to carry oxygen in the blood. Zeta and epsilon are active in the embryo, gamma-A and gamma-G in the fetus, and alpha, delta, and beta in the adult. There are also several pseudogenes in the tandem clusters.
33
Horizontal Gene Transfer
Horizontal gene transfer: transfer of DNA between distantly related species, as opposed to vertical gene transfer, the normal method, in which genes are transferred from parent to offspring. It’s quite unusual (but it does happen) in eukaryotes (at least, in things like plants and animals), but a major issue in prokaryotes, where 10% or more of the DNA in a species has been transferred in across large evolutionary distances. Prokaryotic sexual processes (conjugation, transduction, transformation) often work very well between species. Horizontal transfer is detected because a gene’s sequence resembles orthologs in very different species more than those in closely related species. Different members of the same bacterial species often differ in 20% or more of their genes: genes present in one strain but absent in another. This makes the definition of “species” difficult in bacteria. The outer circle above is a comparison of 2 E. coli strains. Shared regions are in blue; red regions are found in strain EDL933 only, and yellow regions are in strain MG1655 only.
34
Transposon Insertions
At first glance, transposons seem to be intra-nuclear parasites, bent on increasing their copy number without helping the organism at all. This is the selfish DNA hypothesis. Viruses are another example of selfish DNA. Some closely related organisms differ widely in the number of transposable elements present in their genomes. Transposons can cause trouble by interrupting important genes, but they mostly have little effect. Arabidopsis has about 27,000 genes and 25 Mbp of transposon DNA; maize has about 40,000 genes and about 1800 Mbp of transposon DNA. Figure: five dipterans (flies) showing differences in genome size, intron length, CDS length, and transposon (TE) numbers.
35
Transposon Insertions and Evolution
However, transposons can also affect gene regulation, altering the pattern of gene expression in different tissues. This is potentially a positive role: the raw material for natural selection. Also, non-autonomous DNA transposons consist of nothing but a pair of short inverted repeats that are recognized by transposase. Often, random pieces of genomic DNA are trapped between pairs of inverted repeats and moved to new locations. Functional copies of LINE-1 elements, Alu sequences, and some endogenous retroviral sequences (LTR retrotransposons) exist in the human genome. They occasionally transpose into genes, where they give a detectable phenotype. The first examples found were two independent insertions of LINE-1 into exons of the clotting factor 8 gene. These events caused hemophilia: the inability of blood to clot. Transposable element movement has also been implicated in cancer and the chromosome rearrangements that accompany it. Recombination between Alu sequences in different parts of the genome can generate deletions and perform exon shuffling: the insertion of a new exon into a gene from a completely unrelated gene.