Molecular Genetic Methods in Psychology Tom Price www.well.ox.ac.uk/~tprice/presentations.xml
Recap: Heredity Heritable characteristics are influenced by genetic variation (Mendel's pea plants). Traits are correlated within families (Galton). Twin and adoption studies provide evidence of heritability.
How? Crick and Watson (1953) provided the mechanism: the double-helix structure of DNA, the single biggest advance in molecular biology.
DNA DNA exists in the nucleus as two paired strands. Each strand consists of A, C, G, T bases on a sugar-phosphate backbone. Each base binds only to its complement (A with T, C with G). The sequence of bases along a strand is called the DNA sequence.
DNA Replication During replication the DNA molecule unwinds, with each single strand becoming a template for synthesis of a new, complementary strand. Each daughter molecule, consisting of one old and one new DNA strand, is an exact copy of the parent molecule.
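The pairing rule and template copying described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the replication machinery; the function names and the example sequence are assumptions made for the sketch.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the complementary strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]

example = "ATGCCG"  # hypothetical template sequence
print(complement(example))          # TACGGC
print(reverse_complement(example))  # CGGCAT
```

Because complementing twice returns the original sequence, each template strand fully determines its partner, which is why both daughter molecules are copies of the parent.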
Transcription & Translation DNA is first transcribed (copied) to a molecule of messenger RNA in a process similar to DNA replication. The mRNA molecules then leave the cell nucleus and enter the cytoplasm to be translated into protein at the ribosomes. Triplets of bases (codons) in the mRNA form the genetic code that specifies the particular amino acids that make up an individual protein.
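As a rough sketch of the two steps, the Python below transcribes a short coding-strand sequence and reads it codon by codon. The codon table is deliberately partial (only the four codons used here), and the input sequence and function names are hypothetical.

```python
# Partial codon table (assumption: only the codons needed for this example).
CODON_TABLE = {"AUG": "Met", "GCC": "Ala", "UGG": "Trp", "UAA": "Stop"}

def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")

def translate(mrna: str) -> list[str]:
    """Read codons in frame from position 0 until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

mrna = transcribe("ATGGCCTGGTAA")  # hypothetical coding-strand sequence
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # ['Met', 'Ala', 'Trp']
```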
Genes A gene is a region of DNA whose sequence encodes a protein. The human genome contains roughly 20,000–25,000 protein-coding genes (early estimates were ~30,000). Only about 1–2% of the genome consists of the protein-coding sequences (exons) of genes. [Figure: gene structure, showing the start of transcription, exons, and introns]
Chromosomes Humans are diploid: each chromosome is present in 2 copies, one from each parent. Humans have 22 pairs of autosomes and 1 pair of sex chromosomes (XX for females, XY for males). Figure: The extra copy of chromosome 21 identifies this individual as having Down syndrome.
Genetic Variation Genetic variants (polymorphisms) arise by mutation, either spontaneously or from radiation, viruses, cancer, toxins… Mutations in coding regions can change the gene product (coding variations) or leave it unchanged (silent mutations). Variations in non-coding regions can affect transcription (gene expression). Most variation occurs in non-coding ("junk") DNA.
Polymorphisms Deletion (e.g. Williams Syndrome); polysomy (e.g. Down Syndrome); variable-number repeat (e.g. Fragile X); single-nucleotide polymorphism (e.g. FOXP2 mutation in the KE family with severe speech disorder); insertions, inversions, translocations…
Meiosis and Recombination During meiosis, the chromosomes duplicate, then cross over (recombine), and the cell divides to produce haploid gametes (sperm/egg). Each gamete therefore carries genetic variants derived from both of the individual's own parents. Meiosis is the basis for heredity. [Figure: mother → egg and father → sperm via meiosis; egg and sperm → child via fertilisation]
Alleles and Genotype Alleles = the genetic variants that exist at a particular genetic location (locus). Genotype = the alleles present at a locus (cf. phenotype = trait(s) of the organism). Homozygous = 2 copies of the same allele. Heterozygous = 2 different alleles. Allele frequency = % of a given allele among all copies of the locus in a population.
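These definitions translate directly into a small calculation. The sketch below counts alleles across a handful of made-up diploid genotypes; the data and function names are hypothetical and chosen only to illustrate the terms.

```python
from collections import Counter

# Hypothetical genotypes at one locus; each tuple holds an individual's two alleles.
genotypes = [("A", "A"), ("A", "a"), ("a", "a"), ("A", "a"), ("A", "A")]

def allele_frequencies(genotypes):
    """Allele frequency = copies of an allele / all allele copies in the sample."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_heterozygous(genotype):
    """A genotype is heterozygous when its two alleles differ."""
    return genotype[0] != genotype[1]

print(allele_frequencies(genotypes))             # {'A': 0.6, 'a': 0.4}
print(sum(is_heterozygous(g) for g in genotypes))  # 2 heterozygotes
```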
How to Find A Gene Candidate genes: you already have good reason to believe the gene is implicated, e.g. pharmacological evidence (dopamine transporter and receptor genes in ADHD). Functional genes: candidates based on what the gene is known to do, e.g. expression patterns in relevant tissue. BUT ~15,000 genes are expressed in the brain.
Positional Cloning The identification of a gene based solely on its position in the genome. Most widespread strategy in human genetics in the last 15 years. Strengths: no knowledge of gene product required; very strong track record in single-gene disorders. Weaknesses: understanding of function not a certain outcome; poor track record with multifactorial traits.
Sequencing of Human Genome Facilitates Positional Cloning Collins, F.S. Positional cloning moves from perditional to traditional, Nat Genet, 9: , 1995
Mendel's Laws: I. Segregation There are two elements of heredity governing a trait in each individual, and these segregate (separate) during reproduction. [Figure: dominant and recessive alleles]
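The law of segregation can be illustrated by enumerating the allele each parent may pass on. The sketch below is a Punnett-square style count, assuming a single locus with alleles written "A" (dominant) and "a" (recessive); the notation and function name are illustrative choices.

```python
from collections import Counter
from itertools import product

def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes when each parent transmits one of its two alleles."""
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring

result = cross("Aa", "Aa")  # two heterozygous parents
dominant = result["AA"] + result["Aa"]
recessive = result["aa"]
print(dict(result))
print(f"dominant : recessive phenotype = {dominant} : {recessive}")
```

For two heterozygous parents this gives the familiar 1 : 2 : 1 genotype ratio and, with complete dominance, a 3 : 1 phenotype ratio.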
Mendelian Disorders Measured phenotype caused by a single gene (there may be multiple mutations in the gene, and there may be additional environmental causes). Follow clear segregation in families. Typically rare in the population. Examples: Duchenne Muscular Dystrophy; Cystic Fibrosis (1989); Huntington's Disease (1993); ~1200 have been mapped.
Pedigree Analysis Genetic disorders (e.g. PKU, caused by a recessive allele) have characteristic patterns of inheritance within families. [Figure: pedigrees. Above: autosomal dominant. Below: autosomal recessive.]
Mendel's Laws: II. Independent Assortment Traits are inherited independently of each other. NB. This law is violated for traits governed by genes close together on the same chromosome. Alleles of these linked loci will tend to co-segregate, because recombination rarely separates them.
Linkage Only ~1 recombination per chromosome. Loci that are close together on the same chromosome tend to be inherited together (linked or in LD). The closer the loci, the tighter the linkage. Degree of linkage is a measure of genetic distance. Linkage is measured by the recombination fraction, θ = proportion of recombinants. θ = 0: complete linkage. θ = 0.5: no linkage.
Recombinants & Nonrecombinants Grandchildren in generation III who received either A1B1 or A2B2 from their father are the product of nonrecombinant sperm; persons who received A1B2 or A2B1 are recombinant. Estimated recombination fraction = 2 / 7 ≈ 0.29. We cannot classify any of the individuals in generations I and II as recombinant or nonrecombinant, or identify recombinants arising from oogenesis in individual II-2. [Figure: pedigree showing paternal alleles (where they can be worked out)]
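The counting argument on this slide can be written out as follows. The list of paternal haplotypes is hypothetical (the pedigree figure is not reproduced here); it is only constructed to match the slide's totals of 2 recombinants among 7 children, and the function name is an assumption.

```python
def recombination_fraction(haplotypes, parental=("A1B1", "A2B2")):
    """Proportion of transmitted haplotypes that differ from the parental ones."""
    recombinants = sum(1 for h in haplotypes if h not in parental)
    return recombinants / len(haplotypes)

# Hypothetical paternal haplotypes for the 7 grandchildren:
# 5 nonrecombinant (A1B1 or A2B2) and 2 recombinant (A1B2 or A2B1).
paternal = ["A1B1", "A2B2", "A1B2", "A1B1", "A2B2", "A2B1", "A1B1"]

theta_hat = recombination_fraction(paternal)
print(round(theta_hat, 2))  # 0.29
```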
Markers A polymorphic marker locus can be informative about a disease locus over 10^6 base pairs away. Originally, phenotypic markers were used in place of genotype, e.g. blood groups, and APOE4 in Alzheimer's Disease. Sequencing of the genome → many markers. The vast majority of markers have no effect on phenotype.
Genetic Linkage Trait co-segregates with marker allele within families. Requirements: (i) many families with the trait of interest; (ii) informative markers.
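Linkage Analysis We do not usually have this much information to work out recombinants / nonrecombinants. If the mode of inheritance (e.g. dominant / recessive) is known, the likelihood of linkage can be calculated as a LOD score: the log10 ratio of the likelihood of the data at a given recombination fraction θ to its likelihood under no linkage (θ = 0.5): $\mathrm{LOD}(\theta) = \log_{10} \frac{P(\text{data} \mid \theta)}{P(\text{data} \mid \theta = 0.5)}$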
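For the simplest case, where every meiosis can be scored as recombinant or nonrecombinant, the LOD score has a closed form. The sketch below assumes that phase-known setting and the usual convention that θ = 0.5 means free recombination; the likelihood θ^r (1 − θ)^(n − r) and the function name are assumptions of this illustration, not the general pedigree likelihood used by linkage software.

```python
import math

def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10 of L(theta) / L(0.5) for phase-known meioses.

    L(theta) = theta**r * (1 - theta)**(n - r); requires 0 < theta < 1.
    """
    if not 0 < theta < 1:
        raise ValueError("theta must lie strictly between 0 and 1")
    nonrecombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null

# 2 recombinants in 7 meioses, evaluated at the estimate theta = 2/7.
print(round(lod_score(2, 7, 2 / 7), 2))  # 0.29
```

A single small family like this gives a LOD far below the conventional significance thresholds, which is why LOD scores are summed across many families.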
Single Gene Linkage Analysis
Nonparametric Linkage Analysis In practice, complex inheritance is the norm, and nonparametric linkage analysis, which does not require the genetic model to be specified, is most commonly used. A design employing affected sib pairs allows model-free analysis in nuclear families using programs like MAPMAKER/SIBS or GENEHUNTER. A LOD > 3.3 is generally accepted as the threshold for genome-wide significance.
Netherton Syndrome Linkage Chavanas et al., Am J Hum Genet, 66: , 2000
Netherton Syndrome Haplotypes
Netherton Syndrome Gene Chavanas et al. 2000, Nature Genetics
Linkage: Success Stories Linkage analysis has been successfully used to map many single-gene disorders, e.g. early-onset Alzheimer's Disease and many forms of mental retardation.
Linkage: Problems For complex traits, there have been many unreplicated findings. True linkage is hard to find.
Multifactorial (Complex) Traits No clear segregation pattern in families. Caused by > 1 gene. Possibly triggered / moderated by environment. Each gene (or environment) may have a small effect. Epistasis (gene-gene interactions) likely. Pleiotropy, environmental influences, gene x environment interactions likely. Epigenetic influences possible. Measurement of phenotype not highly reliable. Heterogeneity.
Why such limited success with Complex Trait Linkage studies? Power: power calculations have always indicated the need for many hundreds, probably thousands, of families to detect genes of even moderate effect; N ~ 200 for most studies conducted to date; for a QTL, this is about enough to detect a locus explaining 25% of the total variance in the trait. Hope for low-hanging fruit: if there are one or a few monogenic-like loci within the oligogenic spectrum, these could lead to pathway information; not supported by data. Practical problems: errors in data.
A Link in the Chain Linkage analysis can do no more than point to broad regions (linkage hotspots): at best ~20 cM, ~200 genes. More powerful methods must be used to home in on the crucial gene.
The Next Link
(Allelic) Association Why? Markers remain in LD with the founding mutation over many generations. Trait correlates with marker allele in the population.
Association = same ancestral origin Generation 1: a disease-causing mutation occurs on a chromosome. Generation 2: about 50% of the children receive the mutation and the surrounding chromosomal segment from the mutated founder. Generation 3: segments originating from the mutated founder chromosome get shorter. … Generation N: very short segments around the mutated locus are conserved.
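The shrinking of the founder segment can be put in numbers with a textbook population-genetic result that is not stated on the slide: in a large random-mating population, the disequilibrium D between two loci decays by a factor (1 − θ) per generation, so D_t = D_0 (1 − θ)^t. The sketch below assumes that idealised model; the function name and parameter values are illustrative.

```python
def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Expected disequilibrium after t generations: D_t = D_0 * (1 - theta)**t."""
    return d0 * (1 - theta) ** generations

# Compare a tightly linked marker with a loosely linked one after 50 generations.
for theta in (0.001, 0.01, 0.1):
    remaining = ld_after_generations(1.0, theta, 50)
    print(f"theta = {theta}: fraction of initial LD remaining = {remaining:.3f}")
```

Only markers very close to the mutation (small θ) keep appreciable LD after many generations, which is why association can localise a variant much more finely than linkage within families.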
Linkage: Allelic association within families
Allelic Association: Extension of linkage to the population For both families, the same marker is linked with the trait, but a different allele is implicated
Allelic Association: Extension of linkage to the population Trait is linked with the same marker in all families: Allele 6 is associated with trait.
Allelic Association Allele 6 is associated with disease
Allelic Association: Three Common Forms Direct association: the allele of interest (a mutant or susceptibility polymorphism) is itself involved in the phenotype. Indirect association: the allele itself is not involved, but a nearby correlated variant changes the phenotype. Spurious association: apparent association not related to genetic aetiology, including natural selection, statistical artifact, and population stratification (see later).
Indirect & Direct Allelic Association Direct association: measure trait relevance (*) directly, ignoring correlated markers nearby. Indirect association & LD: assess trait effects on D via correlated markers (M_i) rather than susceptibility/etiologic variants. Linkage disequilibrium: correlation between (any) markers in the population. Allelic association: correlation between marker allele and trait.
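The "correlation between markers" that defines linkage disequilibrium is usually summarised by D and r². The sketch below uses the standard two-locus definitions, D = p_AB − p_A p_B and r² = D² / (p_A (1 − p_A) p_B (1 − p_B)); the frequencies in the example are made up and the function name is an assumption.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Hypothetical frequencies: under independence p_ab would be 0.25.
d, r2 = linkage_disequilibrium(p_ab=0.40, p_a=0.5, p_b=0.5)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")  # D = 0.15, r^2 = 0.36
```

The higher r² is between a marker and the causal variant, the more of the trait effect an indirect association study can pick up through that marker.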
Population Stratification Recent admixture of populations. Requirements: group differences in allele frequency; group differences in outcome. Leads to spurious association. In epidemiology, this is a classic matching problem, with genetics as a confounding variable. Most often cited reason for lack of replication of association findings.
Population Stratification Association induced by sample mixing
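A small worked example shows how mixing samples can induce association. The counts below are invented: in each of two hypothetical populations the allele is unrelated to disease (odds ratio 1), but the populations differ in both carrier frequency and prevalence, so the pooled sample shows an apparent effect.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table given as
    (carrier & affected, carrier & unaffected,
     noncarrier & affected, noncarrier & unaffected)."""
    a, b, c, d = table
    return (a * d) / (b * c)

# Hypothetical population 1: 80% carriers, 20% prevalence, no true association.
pop1 = (160, 640, 40, 160)
# Hypothetical population 2: 20% carriers, 5% prevalence, no true association.
pop2 = (10, 190, 40, 760)
# Pooling the two samples without accounting for ancestry.
pooled = tuple(x + y for x, y in zip(pop1, pop2))

print(odds_ratio(pop1), odds_ratio(pop2))  # 1.0 1.0
print(round(odds_ratio(pooled), 2))        # 2.36
```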
Population Stratification: Solutions Because of fear of stratification, complex trait genetics turned away from case/control studies. 1. Family-based controls (e.g. TDT). 2. Genetic control: extra genotyping; look for evidence of background population substructure and account for it.
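As an illustration of family-based control, the transmission disequilibrium test (TDT) compares how often heterozygous parents transmit versus do not transmit a given allele to affected children, using the McNemar-type statistic (b − c)² / (b + c) with 1 degree of freedom. The counts below are hypothetical and the function name is an assumption of this sketch.

```python
import math

def tdt(transmitted: int, untransmitted: int):
    """TDT chi-square statistic (1 df) and its p-value.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the allele of interest to an affected child.
    """
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = tdt(transmitted=60, untransmitted=40)  # hypothetical counts
print(f"chi2 = {chi2:.1f}, p = {p_value:.3f}")
```

Because each transmitted allele is compared with the untransmitted allele of the same parent, cases and "controls" are automatically matched for ancestry.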
Linkage v. Association
Linkage | Association
Requires families | Families or unrelateds
Matching/ethnicity generally unimportant | Matching/ethnicity important
Few markers for genome coverage (STRs) | Many markers for genome coverage (10^5 – 10^6 SNPs)
Weak design (allele-sharing based on covariances) | Powerful design (based on mean differences)
Yields coarse location | Yields fine-scale location
Good for initial detection, poor for fine-mapping | Good for fine-mapping, poor for initial detection
Powerful for rare variants | Powerful for common variants; rare variants generally impossible
Association Study Outcomes Reported p-values from association studies in Am J Med Genet or Psychiatric Genet, 1997 Terwilliger & Weiss, Curr Opin Biotech, 9: , 1998
Why limited success with association studies? 1. Small sample sizes: results overinterpreted. 2. Phenotypes are complex: candidate genes difficult to choose. 3. Allelic/genotypic contributions are complex: even true associations are difficult to see. 4. Background patterns of LD are unknown: difficult to appreciate the signal when you can't assess the noise. 5. Spurious results due to population stratification.
Effects of Linkage Disequilibrium Roses, Nature 2000
Alzheimer's Disease Common disease of old age: main cause of dementia in the elderly; 4th leading cause of death; prevalence increases with age, with much earlier onset in rare cases. Progressive loss of memory, cognitive deterioration, and emotional disturbance. Loss of neurons, with many amyloid-containing plaques and neurofibrillary tangles.
Genetic Epidemiology Early-onset disease is sometimes Mendelian and autosomal dominant. Standard lod score analysis in dominant early-onset families allowed mapping and subsequently cloning of three genes. Multicase late-onset families showed evidence of linkage to chromosome 19 when analyzed by the affected pedigree member method.
Apolipoprotein E 3 alleles: E2 (8%), E3 (77%), E4 (15%). Risk relative to E3/E3 at age 65+: E3/E4 ~3; E4/E4 ~14. Accounts for ~20% of susceptibility. APOE risk is associated with age of onset, clinical manifestations of AD, and a selective effect on episodic memory.
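To see how common the risk genotypes are, the allele frequencies above can be converted to expected genotype frequencies. The sketch below assumes Hardy-Weinberg proportions (random mating, no selection), which the slide does not state; it is an arithmetic illustration only, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement

# Allele frequencies from the slide.
allele_freq = {"E2": 0.08, "E3": 0.77, "E4": 0.15}

def hardy_weinberg(allele_freq):
    """Expected genotype frequencies: p^2 for homozygotes, 2pq for heterozygotes."""
    expected = {}
    for a, b in combinations_with_replacement(sorted(allele_freq), 2):
        p, q = allele_freq[a], allele_freq[b]
        expected[f"{a}/{b}"] = p * p if a == b else 2 * p * q
    return expected

for genotype, freq in hardy_weinberg(allele_freq).items():
    print(f"{genotype}: {freq:.4f}")
```

Under this assumption about 23% of people would be E3/E4 and about 2% E4/E4, so the high-risk homozygotes are a small minority despite the allele being common.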
Investigation of APOE Risk Mechanism currently not known. Possible ethnic differences. Genetic risk interacts with head injury, education, possibly nutrition (antioxidants?). Clinical trials of folic acid, statins, NSAIDs as protective factors.
AD & APOE Poster child for behavioural genetics? Or cautionary tale?
Further Reading Plomin R, DeFries JC, McClearn GE & McGuffin P (2001). Behavioral Genetics (4th ed.). Worth. Strachan T & Read AP (1999). Human Molecular Genetics. Bios. (look online) Lahiri DK, Sambamurti K & Bennett DA. Apolipoprotein gene and its interaction with the environmentally driven risk factors: molecular, genetic and epidemiological studies of Alzheimer's disease. Neurobiology of Aging 25:651–660.