Presentation on theme: "Genomics in Tree Breeding and Forest Ecosystem Management"— Presentation transcript:
1 Genomics in Tree Breeding and Forest Ecosystem Management -----Module 2 – Genes, Genomes and MendelWe begin instruction with a review of fundamental genetic properties and definitions as they pertain to virtually all living things. The goal of this module, and the two or three that follow, is to provide a quick refresher of topics most of you will have studied previously. In so doing, we hope to bring focus to the genetic basis of marker informed breeding.
2 Quick review: Genes and genomes In eukaryotes, DNA is found in the...NucleusMitochondriaChloroplasts (plants)Organelle inheritance is often uniparental, making it powerful for certain types of applicationsWhile chromosomes were proven to be the “hereditary units” in 1903, it was not until 1944 that DNA was shown to be the “genetic material”. In higher organisms, DNA is contained largely in chromosomes within the nucleus of each cell, though organelles such as mitochondria and chloroplasts house their own small amount of DNA with a limited number of genes encoded. Nuclear DNA is inherited in a Mendelian manner, while organelle DNA is typically passed on uniparentally. In general, mitochondria in plants and animals is inherited from the mother only. In most conifers, the chloroplast DNA is inherited from the father. While organelle DNA offers many interesting and powerful applications, the following modules will focus primarily on nuclear DNA and Mendel's laws of inheritance.Image credit:
3 ChromosomesLinear strands of DNA and associated proteins in the nucleus of eukaryotic cellsChromosomes carry the genes and function in the transmission of hereditary informationDiploid cells have two copies of each chromosomeOne copy comes from each parentPaternal and maternal chromosomes may have different allelesWalter Flemming, a German biologist and father of cytogenetics, first identified chromosomes in 1878 and described the process of mitosis, but, unaware of Mendel’s work 13 years earlier, never made the connection with inheritance. It was not until 1910 that genes were defined to lie on chromosomes. Germ cells such as sperm or ova carry but one copy of each chromosome in an organism’s chromosomal complement. They are haploid cells. All other cells are diploid and carry two copies of each chromosome, one originating from each parent. For any given gene on a chromosome, an individual may sport two copies of the same allele or one copy each of different alleles, inherited from the parents. The figure on this slide shows a complete chromosomal complement of a human genome which has 23 pairs, 22 of which are autosomal and one is a pair of sex chromosomes (X and Y). Most conifers have 12 sets of chromosomes. Do plants have sex chromosomes?Image Credit: Jane Ades -
4 Genes Units of information on heritable traits In eukaryotes, genes are distributed along chromosomesEach gene has a particular physical location: a locusGenes encompass regulatory switches and include both coding and non-coding regionsGenes are separated by intergenic regions whose function is not understoodMendel is credited with the discovery of genes, which he described as “hereditary factors” in Thomas Hunt Morgan, working with fruit flies, developed the gene theory in 1910, which included the principle of linkage and distribution of genes along the chromosomes. In 1950, Erwin Chargaff identified the one-to-one relationship DNA bases , followed soon thereafter by Watson and Crick’s discovery of the 3-dimensional structure of DNA. But it wasn’t until 1966 that the genetic code was cracked. Much has been learned about gene structure in the intervening years, spurred by Sanger’s 1977 development of methods to sequence the order of nucleotides in stretches of DNA.Image Credit: National Human Genome Research Initiative, Darryl Leja
5 The central dogma of molecular biology Much of what we know and understand about the relationship between genes and physical expression of traits is based on the Central Dogma of Molecular Biology. The Central Dogma was first described by Francis Crick in 1958, though it required the cracking of the genetic code by Nirenberg and Khorana in 1966 to fully describe the process. The Central Dogma defines how information encoded in DNA is transcribed into messenger RNA and subsequently translated into protein chains (polypeptides). In eukarotic cells the primary transcript (pre-mRNA) is often processed further via alternative splicing. In this process, blocks of mRNA are cut out and rearranged, to produce different arrangements of the original sequence which explains the increase in size of the transcriptome relative to the genome as noted in the figure above. The Central Dogma was preceded by the One Gene One Enzyme concept advanced by Beadle and Tatum in Eventually it was recognized that the one gene one enzyme concept was too simplistic to stand, though it remains a reasonable model for instructional purposes.University of Georgia
6 Alleles Alternative forms of a gene A diploid cell has two copies of each gene (i.e. two alleles) at each locusNew alleles arise through mutationAlleles on homologous chromosomes may be the same or different (homozygous vs. heterozygous)Homologous chromosomes from male and female parents are seldom, if ever, identical with respect to nucleotide sequence. Mutations, which accumulate through time, occur throughout the chromosome, though it is now known that they occur more frequently in some areas (hot spots) than others. Mutations in DNA that code for genes define alternative forms, or alleles, of that gene. Any diploid cell has two copies of each gene. If they are identical to one another, they are said to be homozygous. If they differ in sequence, they are said to be heterozygous. In a population of individuals there may be many allelic forms for a given gene.Image Credit: Megan McKenzie-Conca, Oregon State University
7 Markers reflect genetic polymorphisms that are inherited in a Mendelian fashion DNA markers 'mark' locations where DNA sequence varies (2 or more alleles)Such polymorphisms can vary within and among individuals (e.g. heterozygotes vs. homozygotes) and populationsMarkers may be located in genes or elsewhere in the genomeHistorically, we've had too few markers to inform breedingGenomics tools provide an almost unlimited supply of markersGenetic markers are detectable sequence variants or “polymorphisms” that are inherited in a Mendelian fashion. In population genetics theory, a locus is considered polymorphic only if the frequency of the most common allele is less than some arbitrarily defined upper limit, like 90 or 95%. In practice, polymorphisms may have value as markers regardless of the frequency of the alleles. It all depends on the marker application. Markers occur throughout the genome, both within and outside DNA sequence that codes for genes. Intuitively, markers found within genes have potentially greater utility for evolutionarily relevant applications. In forest trees, geneticists have been historically constrained from doing “genetics” because of the lack of markers. The development of electrophoretic techniques in the 1970’s opened new horizons as dozens of allozyme markers (protein variants) became available. Today, we have a virtually unlimited supply of genetic markers to work with thanks to new sequencing and genotyping technologies and a constantly improving set of software for handling and interpreting the wealth of sequence information available.
8 Mutations may take many forms Some simple, single nucleotide mutations can totally alter a protein product by producing a frameshift. This results in a new amino acid being producedInsertions and deletions: The addition or loss of one or more nucleotide(s) in coding sequenceMutations are the source of all genetic variation, the basis upon which the science of genetics was developed, the reagents of evolutionary change, and stuff of which markers are derived. They come in many different forms, some simple and others quite complex. Simple insertion or deletion of a single nucleotide within a gene coding region of DNA can result in a non-functional gene product. Obviously, such a mutation would have little selective value and would no doubt be quickly lost, though one such rather famous mutation is known to exist in loblolly pine. It occurred in a gene that codes for a protein in the lignin biosynthetic pathway. A marker developed for this mutation has been used in an operational tree breeding application. This mutation, found in only one parent tree out of thousands, seems to confer more than 5% increase in pulp yield and as much as 10% increase in growth.Image Credit: Genes VII: B. Lewin. Oxford University Press, 2000
9 Single Nucleotide Polymorphisms (SNPs) embedded within a DNA sequence DNA sequences are alignedPolymorphic sites are identifiedHaplotypes (closely linked markers of a specific configuration) are deduced by direct observation or statistical inference* * *atggctacctgaactggtcaactcatcgaaagctaa 1atgcctacctgaactggtcaactcatcgaaagctaa 2atgcctacctgaactggtcaactcatcgaaggctaa 3atgcctacctgaactggtcaacacatcgaaggctaa 4The most common form of mutation is the SNP, or single nucleotide polymorphism. Unlike the insertion or deletion mutation, the SNP results in “replacement” of one nucleotide by another. If this occurs in the coding region of a gene, it may lead to the change in one amino acid in a protein sequence and affect the functionality of that protein. Since the genetic code is redundant or third base degenerate, not all SNPs result in protein change. Though SNPs are common, SNP abundance varies widely among organisms and among genes or regions of the genome. In the figure noted here, DNA sequence is represented by letters that designate the 4 types of nucleotides: a for adenine, c for cytosine, g for quanine, and t for thymine. Note there are 4 distinct haplotypes illustrated in the figure on this page.Image Credit: David Harry, Oregon State University9
10 Single Nucleotide Polymorphism (SNP) This slide is simply intended to reinforce some key concepts. For any given location in a DNA sequence there may exist up to four alternative nucleotides (A,C,G, and T), though we typically only see two represented at a polymorphic site. An individual may be homozygous for either nucleotide (or allele) or carry both, the heterozygous condition. A site with two or more alleles is considered polymorphic and a site that is invariant is considered monomorphic.Image Credit: Glenn Howe, Oregon State University
11 The genome An individual’s complete genetic complement For eukaryotes, a haploid set of chromosomesFor bacteria, often a single chromosomeFor viruses, one or a few DNA or RNA moleculesGenome size is typically reported as the number of base pairs (nucleotide pairs) in one genome complement (i.e. haploid for eukaryotes)Until recently, we studied genes and alleles one or a few at a time (genetics)Aided by high throughput technologies we can now study virtually all genes simultaneously (genomics)Historically, geneticists were content to work on one or two genes at a time, in model organisms like Neurospora (bread mold), C. elegans (nematode), Drosophila (fruit fly), and Mus (house mouse). Today, there are fewer restrictions on how many genes we study and which organisms we choose to work with. The concept of model species has less relevance. The science of genomics is simply the study of many to all genes in an organism at one time.
12 Genome sizeHow large is a typical genome? There is no simple answer of course, for organisms vary widely in genome size. Arabidopsis, the tiny model species in the mustard family was the first plant to have a fully sequenced genome. It sports a genome of 160 million bases. Poplar, the first tree sequenced, has about 480 million bases in its genome. The corn genome has nearly 2.5 billion bases, and humans around 3 billion. Genome size for conifers is substantially greater. In the figure above, genome size is given for 181 gymnosperms (mostly conifers). They vary in size from 6 to well over 30 billion bases. Some members of the lily family exceed 100 billion bases in size. Explanations for why genome size varies as it does for individual organisms are many and often speculative.Image Credit: Daniel Peterson, Mississippi State University (modified)
13 How large are genes?The longest human gene is 2,220,223 nucleotides long. It has 79 exons, with a total of only 11,058 nucleotides, which specify the sequence of the 3,685 amino acids and codes for a protein dystrophin. It is part of a protein complex located in the cell membrane, which transfers the force generated by the actin-myosin structure inside the muscle fiber to the entire fiberThe smallest human gene is 252 nucleotides long. It specifies a polypeptide of 67 amino acids and codes for an insulin-like growth factor IIThe longest bacterial gene is 110,418 nucleotides long, which specifies the sequence of 36,805 amino acids. Its function is unknown, most likely a surface protein. The smallest bacterial gene is 54 nucleotides long. It specifies a polypeptide of 17 amino acids, and codes for a regulatory protein in cyanobacteria. The average gene in a tree is probably on the order of 2,500 to 5,000 nucleotides in length.
14 How many genes are there in a genome? Estimates for the number of genes found in a genome vary widely and reflect the state of the art in gene detection of fully–sequenced organisms. Not long ago, the estimate for the number of genes in humans was nearly quadruple that shown here (Levin, 2000). It was rather humbling to find there may only be as many genes in humans as there is in a diminutive mustard plant that thrives for a few months and then disappears. Conifers, which are ancestral to flowering plants, are believed to have as many as 50,000 genes but that is only a guess. Until one is fully sequenced and studied, we will not know for certain.
15 Why so much DNA? Genome size / Gene number do not add up Duplicated genesRepetitive elementsRegulatory elements(Other?)Clearly, the size of a genome does not necessarily reflect the number of genes in an organism. Called the C-value paradox, it is now hypothesized that genome size variation arises from lineage-specific differences in the relative rates of expansion of repetitive sequences and deletion of nearly neutral sequences. Repetitive sequences come in many forms like transposons, pseudogenes, segmental duplications, microsatellites, and heterochromatin.(Gibson and Muse, 2004)
16 The central dogma of molecular biology To close this bit of introductory material we take an alternative view of the Central Dogma to review, and expand on, some of the concepts covered so far. We begin at the top, with DNA. A locus or gene is denoted as occurring within a stretch of non-coding, repetitive DNA, the likes of which make up much of the genome for many higher plants. The gene is composed of exons and introns, is preceded by a promoter or regulatory region, and possesses a termination sequence. Through the process of transcription, facilitated by the enzyme RNA polymerase, one to many mRNA molecules are formed. In this figure, a short stretch of the mRNA is depicted as an EST, or Expressed Sequence Tag. We will discuss ESTs more in future modules. Without a genome sequence, the EST has been the preferred approach to learning about the expressed genome or gene space of many organisms. The mRNA is subsequently translated into protein, facilitated by a number of other RNA molecules. Using an enzyme called reverse transcriptase, mRNA can also be used as a template to produce a complementary or cDNA molecule that contains only the exons of the original gene.Figure Credit: Modified from Randy Johnson, USFS
17 Applied geneticsApplying genomics to breeding and resource management requires an introduction to sub-disciplines of genetics…Mendelian genetics describes inheritance from parents to offspringDiscrete qualitative traits (including genetic markers)Predicts frequencies of offspring given specific matingsPopulation genetics describes allele and genotype frequencies over space and time, includingChanges in allele frequencies between generationsEnvironmental factors contributing to fitnessModels are limited to a small number of genesAnalyzes variation within and among populationsQuantitative genetics describes variation in traits influenced by multiple genes (continuous rather than discrete attributes)Relies on statistical tools describing correlations among relativesMany genes, each with a small effect, influence a specific traitWe will finish this module with a brief review of some Mendelian concepts. In subsequent modules we will introduce population and quantitative genetics concepts and attempt to point out how they are relevant to our long-term goal of applying genomics to plant breeding and gene resource management at the landscape level.
18 Mendelian genetics (and the basis of genetic markers)
19 Mendelian inheritance How do we explain family resemblance?Image Credit:In the early 1800’s, plant and animal breeders were successful in altering breeds through artificial selection. Indeed, this provided Darwin with the foundation for his theory of “natural selection”. Though familial resemblance was well known, it was not clear what the basis of it was. The concept of blending inheritance of traits between parents was generally accepted. This no doubt resulted from a number of factors. Presumably, most traits of interest were controlled by many genes and in those instances where simply inherited traits were observed, breeders did not carefully track traits past one generation or, as it is known, the F1.
20 Mendel’s experimentsSelected 14 garden pea varieties to conduct breeding trialsThese represented alternative forms of seven distinct traitsCross bred alternative forms, followed by inbreeding the progeny of those crosses (F2)Born to a poor farming family in 1822, Mendel joined a monastery so that he may obtain an education. He started his breeding experiments in 1857 in the monastery gardens at Brunn. He apparently selected the garden pea (Pisum sativum) carefully, since seedsmen had collected many known varieties and the plant was easy to cross, though it is a naturally self fertile plant. His success, ultimately, was a function of many factors: a) he made a series of crosses between varieties that possessed alternative forms of a given character, b) he subsequently selfed the progeny of the first generation crosses to obtain a second generation of progeny, c) he carefully kept track of all seed produced. His success was in part facilitated by the fortuitous genetics of the traits he chose. Not only were they all controlled by a single gene, or as he was to define it, a single “factor”, but all traits were expressed as dominant recessive pairs of alleles. This led to his consistently finding 3:1 ratios in the F2 progeny.Image Credit: Strickberger, Genetics Fig. 6.1
21 SegregationSegregation – Members of an allelic pair separate cleanly from each other in germ cellsIn addition to defining the basis of heredity as due to “factors” or genes, Mendel defined two other important genetic concepts, segregation and independent assortment. Segregation refers to the behavior of alleles at a single locus. During meiosis or the production of germ cells, alleles separate cleanly and are expected to be represented equally in the next generation. Exceptions to this 1:1 expectation occur when one of the alleles may confer a significant decline in fitness (such as lethal or semi-lethal effects). This results in something called segregation distortion.Image Credit: Strickberger, Genetics Fig. 6.2
22 Independent assortment Independent assortment – Members of different pairs of alleles assort independently of each otherExceptions:When loci are linkedWhen loci are in linkage disequilibriumMendel also made crosses in which two traits were segregating simultaneously and consistently observed the ratio of combined traits to meet the predicted 9:3:3:1. He correctly concluded that members of different pairs of alleles assort independently of each other. Here again Mendel was rather lucky. The garden pea has 7 sets of chromosomes. The seven traits he selected to work with not only were controlled by single genes, but those genes occurred on separate chromosomes. Of course, there are important exceptions to the rule of independent assortment. When genes occur in close proximity to each other on the same chromosome they tend to be genetically “linked”. A separate, though not entirely unrelated condition, called linkage disequilibrium may also violate the rule of independent assortment. We will discuss each in much greater detail in later modules as they are the foundation for things like genetic maps and association genetics, respectively.Image Credit: Strickberger. Genetics, Fig. 7.1
23 Genetic linkage and recombination Genes on different chromosomes are inherited independentlyGenes located on the same chromosome tend to be inherited together because they are physically linked – except that widely separated genes behave as if they are unlinkedRecombination during gametogenesis breaks up parental configurations into new (recombinant) classesThe relative frequency of parental and recombinant gametes reflects the degree of genetic linkageGenetic mapping is the process of determining the order and relative distance between genes or markersGenes located on the same chromosome tend to be inherited together simply because they are physically linked. If the integrity of chromosomes could be retained through generations, large blocks of genes would remain perpetually linked through time. Such is not the case of course. It is common for chromosomes, during germ cell production (gametogenesis), to undergo recombination between homologous pairs though the process called crossing over.
24 Parental and recombinant chromosomes As depicted in this figure, crossing over results in the exchange of materials between homologous chromosomes. The colors here reflect different parental contributions while the upper/lower case letters represent alternative allelic forms for a gene or marker. In this cartoon we can deduce that a cross over has occurred because recombinant products were produced and tracked with markers. If these were the only markers available, we would have no idea how large the cross over section actually was. All we would know is that the event occurred somewhere between the two markers. The more markers available, the better we could define genetic proximity between them.24
25 Markers track inheritance Markers allow for tracking of inheritanceLinkage between markers allows for the creation of genetic mapsAssociation between markers and phenotypic traits, observed within crosses, allows for mapping of QTLSo markers help track inheritance of traits. In Mendel’s case, he used morphological markers that were simply inherited and reflected segregating alleles at independent loci. It is possible to identify markers that have 4 or more allelic forms in a population such that all four chromosomes in a parental pair can be uniquely identified and followed through generations (as shown in the top figure). By observing the ratios of parental to recombinant marker classes we can create genetic maps, as indicated in the lower figure, where markers are arrayed on different linkage groups or chromosomes. By measuring morphological or “phenotypic” traits on the same progeny used for genetic mapping, we can test for association between marker alleles and trait conditions. This can result in identifying chromosomal regions where genetic factors, controlling the traits of interest, reside as shown by the pink lines and colored segments on Chr 1 and Chr 4.Figure Credit: David Harry, Oregon State University25
26 Linkage disequilibrium The correlation of alleles at different lociLD is a measure of non-random association among alleles – it describes the extent to which the presence of an allele at one locus predicts the presence of a specific allele at a second locusThough partly a function of physical distance, LD has several other causes including recombination frequency, inbreeding, selection, etcLD is the basis upon which association genetics functionsA more complex concept is that of linkage disequilibrium or LD. As noted here, LD, which is a measure of the non-random association among alleles at different loci, may result from many factors. Since LD is central to the application of association genetics, it will be discussed in much greater detail in Module 10.
27 Genotype and phenotype Genotype refers to the particular gene or genes an individual carriesPhenotype refers to an individual’s observable traitsOnly rarely can we determine genotype by observing phenotypeGenomics offers tools to better understand the relationship between genotype and phenotype (genetic markers in abundance)Before leaving this module we need to introduce a few more commonly used terms that can occasionally create confusion depending on the context of their usage. The term genotype can refer to the genetic condition of an individual at a single locus (for instance AA vs aa), at multiple loci (for instance AaBB vs aaBb), or simply as the individual itself. Phenotype refers to traits we can see and measure. They may be morphological, physiological, or even metabolic in nature. The traits may be binomial (like those chosen by Mendel) or metrical, varying continuously. Conditions like those experienced by Mendel, where genotype was reflected by phenotype (at least in part), are rare, especially in trees. The development of abundant genetic markers for many organisms is opening the door for better understanding the relationship between genotype and phenotype.
28 Single-gene traits in trees are rare… Here’s one in alder (F Single-gene traits in trees are rare… Here’s one in alder (F. pinnatisecta)In the year 1900 Mende’ls principles were independently rediscovered by no fewer than three scientists: Correns, de Vries, and von Tschermak. This marked the beginning of modern genetics. For years, doing Mendelian genetics in trees was hindered by the lack of single-gene traits. One of the few observed, illustrated here, is a leaf variant. Crosses between normal and deeply dissected leaves of alders result in normal looking leaves. The subsequent cross among the progeny of the first generation result in a 3:1 segregation of leaf types, just as Mendel predicted with peas. Indeed, genetic principles were found by scientists to hold with virtually all crops and livestock.Image Credit: David Harry, Oregon State University28
29 Traits run in families but are not typically simply inherited Virtually all traits of interest in forest trees, such as growth or adaptation to environmental variables like cold or drought, are controlled by many genes, each with a small effect. For example, consider cone shape and size in knobcone pine, as illustrated in this slide. Clearly there is a genetic component to cone appearance as reflected by cone differences and similarities among and within sibs of two families. No single gene is likely to control all facets of cone morphology like length, girth, number of cone scales, umbo swelling, and prickle development. As we shall see however, markers may be used to dissect the inheritance of each of these traits independently. It is reasonable to believe that we will someday be able to identify most of the genes controlling traits of interest and determine their metabolic functions.Image Credit: David Harry, Oregon State University
30 Landscape genomicsHopefully this module has provided a succinct refresher of concepts and ideas you have learned in the past, and in so doing, drawn attention to the elements that are key to understanding how knowledge of genetics can be used in an applied sense. Though discussion has largely focused on genetic applications in plant breeding, the foundations have also been set for our discussions of landscape genomics and gene resource management that will follow. Regardless of application, we shall see the importance of understanding a common set of principles dealing with the genetics of populations, the topic of the next module.Image Credit: Nicholas Wheeler, Oregon State University
31 References cited in this module Lewin, B Genes VII. Oxford University Press, New York.Gibson, G., and S. V. Muse A primer of genome science. Sinauer Associates, Sunderland, MA.Strickberger, M. W Genetics 2nd edition. Macmillan Publishing, New York.