Human Gene Mapping & Disease Gene Identification Cont.

MAPPING HUMAN GENES BY LINKAGE ANALYSIS

Determining Whether Two Loci Are Linked

Linkage analysis is a method of mapping genes that uses family studies to determine whether two genes show linkage (are linked) when passed on from one generation to the next. To decide whether two loci are linked, and if so, how close or far apart they are, we rely on two pieces of information. First, we ascertain whether the recombination fraction (θ) between the two loci deviates significantly from 0.5. Second, if θ is less than 0.5, we need to make the best estimate we can of θ, since that will tell us how close or far apart the linked loci are.

For these determinations, a statistical tool called the likelihood ratio is used. Likelihoods are probability values; odds are ratios of likelihoods. One proceeds as follows:
◦ Examine a set of actual family data and count the number of children who show or do not show recombination between the loci.
◦ Calculate the likelihood of observing the data at various possible values of θ between 0 and 0.5.
◦ Calculate a second likelihood based on the null hypothesis that the two loci are unlinked, that is, θ = 0.5.
◦ Take the ratio of the likelihood of observing the family data for various values of θ to the likelihood that the loci are unlinked to create an odds ratio.

The odds ratios computed for different values of θ are usually expressed as the log10 of the ratio, which is called a LOD score (Z), for "logarithm of the odds."
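The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming every child can be scored unambiguously as recombinant or nonrecombinant (phase known). The function name `lod_score` and the example counts (1 recombinant among 10 children) are hypothetical and are not taken from any pedigree in this presentation.

```python
import math

def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD score Z(theta) for meioses scored as recombinant (R) or nonrecombinant (NR)."""
    n = recombinants + nonrecombinants
    # Likelihood of the data at this theta: theta^R * (1 - theta)^NR
    likelihood_linked = theta ** recombinants * (1 - theta) ** nonrecombinants
    # Null hypothesis (unlinked loci): theta = 0.5 for every meiosis
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0:
        # The data are impossible at this theta (e.g., a recombinant at theta = 0)
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)

# Hypothetical data: 1 recombinant and 9 nonrecombinants
for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"theta = {theta:.1f}   Z = {lod_score(1, 9, theta):.2f}")
```

Scanning Z over a grid of θ values, as the loop does, is the simplest way to locate the value of θ at which Z is largest.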

Model-Based Linkage Analysis of Mendelian Diseases

Linkage analysis is called model-based (or parametric) when it assumes that there is a particular mode of inheritance (autosomal dominant, autosomal recessive, or X-linked) that explains the inheritance pattern. LOD score analysis allows mapping of genes in which mutations cause diseases that follow mendelian inheritance. The LOD score gives both:
◦ a best estimate of the recombination frequency, θmax, between a marker locus and the disease locus; and
◦ an assessment of how strong the evidence is for linkage at that value of θmax. Values of the LOD score above 3 are considered strong evidence.

Linkage at a particular θmax of a disease gene locus to a marker with known physical location implies that the disease gene locus must be near the marker.

The odds ratio is important in two ways. First, it provides a statistically valid method for using the family data to estimate the recombination frequency between the loci. If θmax differs from 0.50, you have evidence of linkage. Second, even if θmax is the best estimate you can make, how good an estimate is it? The odds ratio also answers this question, because the higher the value of Z, the better an estimate θmax is.

Positive values of Z (odds > 1) at a given θ suggest that the two loci are linked, whereas negative values (odds < 1) suggest that linkage is less likely than the possibility that the two loci are unlinked. By convention, a combined LOD score of +3 or greater (equivalent to odds of 1000:1 or greater in favor of linkage) is considered definitive evidence that two loci are linked.

Mapping genes by linkage analysis provides an opportunity to localize medically relevant genes by following the inheritance of the condition and the inheritance of alleles at polymorphic markers to see whether the disease locus and the polymorphic marker locus are linked. Return to the family shown in Figure 10-6. The mother has an autosomal dominant form of retinitis pigmentosa (RP). She is also heterozygous for two loci on chromosome 7, one at 7p14 and one at the distal end of the long arm.

 One can see that transmission of the RP mutant allele (D) invariably "follows" that of allele B at marker locus 2 from the first generation to the second generation in this family.  All three offspring with the disease (who therefore must have inherited their mother's mutant allele D at the RP locus) also inherited the B allele at marker locus 2. All the offspring who inherited their mother's normal allele, d, inherited the b allele and will not develop RP. The gene encoding RP, however, shows no tendency to follow the allele at marker locus 1.

Suppose we let θ be the "true" recombination fraction between RP and locus 2, the fraction we would see if we had unlimited numbers of offspring to test. Because either a recombination occurs or it does not, the probability of a recombination, θ, and the probability of no recombination must add up to 1. Therefore, the probability that no recombination will occur is 1 − θ. There are only six offspring, all of whom show no recombination. Because each meiosis is an independent event, one multiplies the probability of a recombination, θ, or of no recombination, (1 − θ), for each child. The likelihood of seeing zero offspring with a recombination and six offspring with no recombination between RP and marker locus 2 is therefore $\theta^{0}(1-\theta)^{6}$. The LOD score between RP and marker 2, then, is:

$$Z = \log_{10} \frac{\theta^{0}(1-\theta)^{6}}{(1/2)^{0}(1/2)^{6}}$$

The maximum value of Z is 1.81, which occurs when θ = 0, and is suggestive of but not definitive evidence for linkage because Z is positive but less than 3.

 In the same way that each meiosis in a family that produces a nonrecombinant or recombinant offspring is an independent event, so too are the meioses that occur in other families.  We can therefore multiply the likelihoods in the numerators and denominators of each family's likelihood odds ratio together.  An equivalent but more convenient calculation is to add the log10 of each likelihood ratio, calculated at the various values of θ, to form an overall Z score for all families combined.

In the case of RP in Figure 10-6, suppose two other families were studied; the second showed no recombination between locus 2 and RP in four children, and the third showed no recombination in five children. The individual LOD scores can be generated for each family and added together (Table 10-1). In this case, one could say that the RP gene in this group of families is linked to locus 2. Because the chromosomal location of polymorphic locus 2 was known to be at 7p14, the RP in these families can be mapped to the region around 7p14, which is near RP9, an already identified locus for one form of autosomal dominant RP.

Table 10-1. LOD Score Table for Three Families with Retinitis Pigmentosa

            θ = 0   θ = 0.01   θ = 0.05   θ = 0.1   θ = 0.2   θ = 0.3   θ = 0.4
Family 1    1.8     1.78       1.67       1.53      1.22      0.88      0.48
Family 2    1.2     1.19       1.11       1.02      0.82      0.58      0.32
Family 3    1.5     1.48       1.39       1.28      1.02      0.73      0.39
Total       4.5     4.45       4.17       3.83      3.06      2.19      1.19

Zmax = 4.5 at θmax = 0
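The entries of Table 10-1 can be recomputed directly. The sketch below assumes, as stated in the text, that the three families have six, four, and five children and that no child is a recombinant; the variable and function names are illustrative only.

```python
import math

# Number of children per family; all are nonrecombinant for RP and marker locus 2
families = {"Family 1": 6, "Family 2": 4, "Family 3": 5}
thetas = [0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4]

def lod_no_recombinants(n_children: int, theta: float) -> float:
    # Likelihood ratio (1 - theta)^n / (1/2)^n, expressed as a log10
    return n_children * math.log10((1 - theta) / 0.5)

# Because families are independent, their LOD scores add at each theta
totals = [sum(lod_no_recombinants(n, t) for n in families.values()) for t in thetas]

for name, n in families.items():
    row = "  ".join(f"{lod_no_recombinants(n, t):5.2f}" for t in thetas)
    print(f"{name:9s} {row}")
print(f"{'Total':9s} " + "  ".join(f"{z:5.2f}" for z in totals))

z_max, theta_max = max(zip(totals, thetas))
print(f"Zmax = {z_max:.2f} at theta = {theta_max}")
```

Adding log10 values across families is equivalent to multiplying the families' likelihood ratios, which is why the Total row is a simple column sum.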

 If, however, some of the families being used for the study have RP due to mutations at another locus, the LOD scores between families will diverge, with some showing a trend to being positive at small values of θ and others showing strongly negative LOD scores at these values.  One can still add the Z scores together, but the result will show a sharp decline in the overall LOD score.  Thus, in linkage analysis involving more than one family, unsuspected locus heterogeneity can obscure what may be real evidence for linkage in a subset of families.
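The effect of locus heterogeneity can be illustrated with a small sketch. The first family below is the six-child, zero-recombinant family from the text; the second family (three recombinants among six phase-known meioses) is entirely hypothetical and stands in for a family whose disease is caused by a mutation at a different, unlinked locus.

```python
import math

def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    # log10 of [theta^R (1 - theta)^NR] / (1/2)^(R + NR); valid for 0 < theta < 1
    n = recombinants + nonrecombinants
    return math.log10(theta ** recombinants * (1 - theta) ** nonrecombinants / 0.5 ** n)

linked_family = (0, 6)    # from the text: no recombinants in six children
unlinked_family = (3, 3)  # hypothetical: disease caused by a different locus

for theta in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4):
    z_linked = lod_score(*linked_family, theta)
    z_unlinked = lod_score(*unlinked_family, theta)
    print(f"theta = {theta:<4}  linked Z = {z_linked:5.2f}  "
          f"unlinked Z = {z_unlinked:6.2f}  combined Z = {z_linked + z_unlinked:6.2f}")
```

At small values of θ the strongly negative score from the second family outweighs the positive score from the first, so the combined score gives no hint of the linkage that is present in the first family.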

Phase information is important in linkage analysis. Figure 10-14 shows two pedigrees of autosomal dominant neurofibromatosis, type 1 (NF1). In the three-generation family on the left, the affected mother, II-2, is heterozygous at both the NF1 locus (D/d) and a marker locus (M/m), but we have no genotype information on her parents. Her unaffected husband, II-1, is homozygous for the normal allele d at the NF1 locus and also happens to be homozygous for allele M at the marker locus. He can transmit to his offspring only a chromosome that has the normal allele (d) and the M allele.

By inspection, then, we can infer which alleles in each child have come from the mother. The two affected children received the m allele along with the D disease allele, and the one unaffected child received the M allele along with the normal d allele. Because we do not know the phase of these alleles in the mother, either all three offspring are recombinants or all three are nonrecombinants.

Figure 10-14 Two pedigrees of autosomal dominant neurofibromatosis, type 1 (NF1). A, Phase of the disease allele D and marker alleles M and m in individual II-2 is unknown. B, Availability of genotype information for generation I allows a determination that the disease allele D and marker allele M are in coupling in individual II-2. NR, nonrecombinant; R, recombinant.

Which of these two possibilities is correct? There is no way to know for certain, and thus we must compare the likelihoods of the two possible results. Given that II-2 is an M/m heterozygote, we assume the correct phase on her two chromosomes is D-m and d-M half of the time and D-M and d-m the other half. If the phase of the disease allele is D-m, all three children have inherited a chromosome in which no recombination occurred between NF1 and the marker locus. If the probability of recombination between NF1 and the marker is θ, the probability of no recombination is (1 − θ), and the likelihood of having zero recombinant and three nonrecombinant chromosomes is $\theta^{0}(1-\theta)^{3}$.

The contribution to the total likelihood, assuming this phase is correct half the time, is $\frac{1}{2}\theta^{0}(1-\theta)^{3}$. The other half of the time, however, the correct phase is D-M and d-m, which makes each of these three children recombinants; the likelihood, assuming this phase is correct half the time, is $\frac{1}{2}\theta^{3}(1-\theta)^{0}$. To calculate the overall likelihood of this pedigree, we add the likelihood calculated assuming one phase in the mother is correct to the likelihood calculated assuming the other phase is correct. Therefore, the overall likelihood is $\frac{1}{2}(1-\theta)^{3} + \frac{1}{2}\theta^{3}$, and the relative odds in favor of linkage are

$$\frac{\frac{1}{2}(1-\theta)^{3} + \frac{1}{2}\theta^{3}}{1/8}$$

where $1/8 = (1/2)^{3}$ is the likelihood of the observed data if the loci are unlinked.

By evaluating the relative odds for values of θ from 0 to 0.5, the maximum value of the LOD score, Zmax, is found to be log10(4) = 0.602 when θ = 0.0 (Table 10-2). Because this is far short of a LOD score greater than 3, we would need at least five equivalent families to establish linkage (at θ = 0.0) between this marker locus and NF1. With slightly more complex calculations (made much easier by computer programs written to facilitate linkage analysis), one can calculate the LOD scores for other values of θ (see Table 10-2).
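The phase-unknown calculation for the pedigree in Figure 10-14A can be sketched as follows. The code assumes, as in the text, that the two phases are equally likely and that the children are either all nonrecombinant or all recombinant; the function name is illustrative and is not taken from any linkage analysis program.

```python
import math

def lod_phase_unknown(n_children: int, theta: float) -> float:
    """Z(theta) when n children are all NR under one phase and all R under the other."""
    # Average the likelihoods of the two equally likely phases
    likelihood = 0.5 * (1 - theta) ** n_children + 0.5 * theta ** n_children
    # Null hypothesis: unlinked loci, each child has probability 1/2
    return math.log10(likelihood / 0.5 ** n_children)

for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"theta = {theta:.1f}   Z = {lod_phase_unknown(3, theta):.3f}")

z_max = lod_phase_unknown(3, 0.0)
# Smallest number of equivalent families whose summed LOD score reaches 3
families_needed = math.ceil(3 / z_max)
print(f"Zmax = {z_max:.3f}; equivalent families needed = {families_needed}")
```

The score falls to 0 at θ = 0.5, where the linked and unlinked likelihoods coincide.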

Why are the two phases in individual II-2 in the pedigree shown in Figure 10-14A equally likely? First, unless the marker locus and NF1 are so close together as to produce linkage disequilibrium between alleles at these loci, we would expect them to be in linkage equilibrium. Second, new mutations represent a substantial fraction of all the alleles in an autosomal dominant disease with reduced fitness, such as NF1. If new mutations are occurring independently and repeatedly, the alleles that happened to be present at the neighboring linked loci when each mutation occurred in the NF1 gene will then be the alleles in coupling with the new disease mutation. A group of unrelated families is likely to have many different mutant alleles, each of which is as likely to be in coupling with one polymorphic marker allele at a linked locus as with any other.

Suppose now that additional genotype information, shown in Figure 10-14B, becomes available in the family in Figure 10-14A. By inspection, it is now clear that the maternal grandfather, I-1, must have transmitted both the NF1 allele (D) and the M allele to his daughter. This finding does not require any assumption about whether a crossover occurred in the grandfather's germline; all that matters is that we can be sure the paternally derived chromosome in individual II-2 must have been D-M and the maternally derived chromosome was d-m.

The availability of genotypes in the first generation makes this a phase-known pedigree. The three children can now be scored definitively as nonrecombinant, and we do not have to consider the opposite phase. The probability of having three children with the observed genotypes is now $(1-\theta)^{3}$. As in the previous phase-unknown pedigree, the probability of the observed data if there is no linkage between the loci is $(1/2)^{3} = 1/8$. Overall, the relative odds for this pedigree are $(1-\theta)^{3} \div 1/8$ in favor of linkage, and the maximum LOD score, at θ = 0.0, is Z = 0.903, corresponding to odds of 8 to 1 (see Table 10-2). Thus, the strength of the evidence supporting linkage (8 to 1) is twice as great in the phase-known situation as in the phase-unknown situation (4 to 1).

As shown in the pedigree in Figure 10-14B, having grandparental genotypes may be helpful in establishing phase in the next generation. However, depending on what the genotypes are, phase may not always be definitively determined. For example, if the grandmother, I-2, had been an M/m heterozygote, it would not be possible to determine the phase in the affected parent, individual II-2.

For linkage analysis in X-linked pedigrees, the mother's father's genotype is particularly important because, as illustrated in Figure 10-15, it provides direct information on linkage phase in the mother. Because there can be no recombination between X-linked genes in a male and because the mother always receives her father's only X, any X-linked marker present in her genotype, but not in her father's, must have been inherited from her mother. Knowledge of phase, so important for genetic counseling, can thus be readily ascertained from the appropriate male members of an X-linked pedigree, if they are available for study.

Figure 10-15 Pedigree of X-linked hemophilia. The affected grandfather in the first generation has the disease (mutant allele h) and is hemizygous for allele M at an X-linked locus. No matter how far apart the marker locus and the factor VIII gene are on the X, there is no recombination involving the X-linked portion of the X chromosome in a male, and he will pass the hemophilia mutation h and allele M together. The phase in his daughter must be that h and M are in coupling.

MAPPING OF COMPLEX TRAITS

Two major approaches are used to locate and identify genes that predispose to complex disease or contribute to the genetic variance of quantitative traits:
1. Affected pedigree member method: if a region of the genome is shared more frequently than expected by relatives concordant for a particular disease, the inference is that alleles at one or more loci in that region predispose to the disease.
2. Association: looks for an increased frequency of particular alleles in affected compared with unaffected individuals in the population.

 Linkage analysis is called model-free (or nonparametric) when it does not assume any particular mode of inheritance (autosomal dominant, autosomal recessive, or X-linked) to explain the inheritance pattern.

Nonparametric LOD (NPL) score analysis allows mapping of genes in which variants contribute to susceptibility for diseases (so-called qualitative traits) or to physiological measurements (known as quantitative traits) that do not follow a straightforward mendelian inheritance pattern.

NPL scores are based on testing for excessive allele-sharing among relatives, such as pairs of siblings, who are both affected with a disease or who show greater similarity to each other for some quantitative trait compared with the average for the population.

The NPL score gives an assessment of how strong the evidence is for increased allele-sharing near polymorphic markers. A value of the NPL score greater than 3.6 is considered evidence for increased allele-sharing; an NPL score greater than 5.4 is considered strong evidence.

One type of model-free analysis is the affected sibpair method. Only siblings concordant for a disease are used. No assumptions need be made about the number of loci involved or the inheritance pattern. Sibs are analyzed to determine whether there are loci at which affected sibpairs share alleles more frequently than the 50% expected by chance alone. In this method, DNA of affected sibs is systematically analyzed by use of hundreds of polymorphic markers throughout the entire genome in a search for regions that are shared by the two sibs significantly more frequently than is expected on a purely random basis.

 When elevated degrees of allele-sharing are found at a polymorphic marker, it suggests that a locus involved in the disease is located close to the marker.  Whether the degree of allele-sharing diverges significantly from the 50% expected by chance alone can be assessed by use of a maximum likelihood odds ratio to generate a nonparametric LOD score for excessive allele sharing.
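A much simplified version of this test can be sketched as below. It is an assumption-laden illustration, not the NPL statistic computed by any particular linkage program: it treats every parental allele as an independent trial with sharing probability p, estimates p by maximum likelihood (restricted to p ≥ 0.5, since only excess sharing is of interest), and compares that likelihood with the 50% sharing expected by chance. The sib-pair counts are hypothetical.

```python
import math

def sharing_lod(pairs_sharing_0: int, pairs_sharing_1: int, pairs_sharing_2: int):
    """Return (estimated sharing proportion, LOD for excess sharing) at one marker."""
    n_alleles = 2 * (pairs_sharing_0 + pairs_sharing_1 + pairs_sharing_2)
    shared = pairs_sharing_1 + 2 * pairs_sharing_2
    unshared = n_alleles - shared
    # One-sided maximum likelihood estimate: never below the chance value of 0.5
    p_hat = max(0.5, shared / n_alleles)
    log_l_alt = shared * math.log10(p_hat)
    if unshared:
        log_l_alt += unshared * math.log10(1 - p_hat)
    log_l_null = n_alleles * math.log10(0.5)
    return p_hat, log_l_alt - log_l_null

# Hypothetical marker: 100 affected sib pairs sharing 0, 1, or 2 alleles
p_hat, lod = sharing_lod(15, 45, 40)
print(f"Estimated sharing = {p_hat:.3f}, LOD for excess sharing = {lod:.2f}")

# With sharing exactly at the chance expectation (25/50/25), the LOD is 0
print(sharing_lod(25, 50, 25))
```

Real analyses must also deal with markers that are not fully informative and with sharing that is identical by state rather than by descent, which this sketch ignores.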

Model-free linkage methods based on allele-sharing can also be used to map loci involved in quantitative complex traits. Although a number of approaches are available, one interesting example is the highly discordant sibpair method. Once again, no assumptions need be made about the number of loci involved or the inheritance pattern. Sibpairs with values of a physiological measurement that are at opposite ends of the bell-shaped curve are considered discordant for that quantitative trait and can be assumed to be less likely to share alleles at loci that contribute to the trait.

 The DNA of highly discordant sibs is then systematically analyzed by use of polymorphic markers throughout the entire genome in a search for regions that are shared by the two sibs significantly less frequently than is expected on a purely random basis.  When reduced levels of allele-sharing are found at a polymorphic marker, it suggests that the marker is linked to a locus whose alleles contribute to whatever physiological measurement is under study.

An entirely different approach to identification of the genetic contribution to complex disease relies on finding particular alleles that are associated with the disease. The presence of a particular allele at a locus at increased or decreased frequency in affected individuals compared with controls is known as a disease association. In an association study, the frequency of a particular allele (such as for an HLA haplotype or a particular SNP or SNP haplotype) is compared among affected and unaffected individuals in the population.

Disease Association

                  Patients   Controls   Total
With allele       a          b          a + b
Without allele    c          d          c + d
Total             a + c      b + d

The RRR is approximately equal to the odds ratio when the disease is rare (i.e., a < b and c < d). Do not confuse the RRR (relative risk ratio) with λr, the risk ratio in relatives; λr is the prevalence of a particular disease phenotype in an affected individual's relatives versus that in the general population.

In a case-control study, individuals with the disease are selected from the population, a matching group of controls without disease is chosen, and the genotypes of individuals in the two groups are determined. An association between disease and genotype is then measured by an odds ratio. Odds are ratios. With use of the table above, the odds of an allele carrier's developing the disease are the number of allele carriers who develop the disease (a) divided by the number of allele carriers who do not develop the disease (b). Similarly, the odds of a noncarrier's developing the disease are the number of noncarriers who develop the disease (c) divided by the number of noncarriers who do not develop the disease (d). The disease odds ratio is then the ratio of these odds, that is, a ratio of ratios: (a/b) ÷ (c/d) = ad/bc.

If the study was designed as a cross-sectional or cohort study, in which a random sample of the entire population is chosen and then analyzed both for disease and for the presence of the susceptibility genotype, the strength of an association can be measured by the relative risk ratio (RRR). The RRR compares the frequency of disease in all those who carry a susceptibility allele, a/(a + b), with the frequency of disease in all those who do not carry a susceptibility allele, c/(c + d): RRR = [a/(a + b)] ÷ [c/(c + d)].
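The two measures of association can be written as short functions of the 2 × 2 table entries a, b, c, and d. The cohort counts in the example are hypothetical and were chosen only to show how close the two measures are when the disease is rare among both carriers and noncarriers.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    # Odds of disease in carriers (a/b) divided by odds in noncarriers (c/d)
    return (a / b) / (c / d)

def relative_risk_ratio(a: int, b: int, c: int, d: int) -> float:
    # Frequency of disease in carriers divided by frequency in noncarriers
    return (a / (a + b)) / (c / (c + d))

# Hypothetical cohort: disease affects 3% of carriers and 1% of noncarriers
a, b = 30, 970      # carriers: with disease, without disease
c, d = 90, 8910     # noncarriers: with disease, without disease

print(f"Odds ratio          = {odds_ratio(a, b, c, d):.2f}")
print(f"Relative risk ratio = {relative_risk_ratio(a, b, c, d):.2f}")
```

The odds ratio is always somewhat further from 1 than the RRR, and the gap widens as the disease becomes more common.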

The RRR is approximately equal to the odds ratio when the disease is rare (i.e., a < b and c < d). The significance of any association can be assessed in one of two ways:
◦ One is simply to ask, by a χ² test, whether the values of a, b, c, and d differ from what would be expected if there were no association.
◦ The other is to determine a 95% confidence interval for the relative risk ratio. This interval is the range in which one would expect the RRR to fall 95% of the time that you genotype a similar group of cases and controls by chance alone. If the frequency of the allele in question were the same in patients and controls, the RRR would be 1. Therefore, when the 95% confidence interval excludes the value of 1, the RRR deviates from what would be expected for no association with a P value < 0.05.

                             Patients with CVT   Controls without CVT   Total
20210G > A allele present    23                  4                      27
20210G > A allele absent     97                  116                    213
Total                        120                 120                    240

For example, suppose there were a case-control study in which a group of 120 patients with cerebral vein thrombosis (CVT) and 120 matched controls were genotyped for the 20210G > A allele in the prothrombin gene. There is clearly a significant increase in the number of patients carrying the 20210G > A allele versus controls (χ² = 15 with 1 df; P ≈ 10⁻⁴). Since this is a case-control study, we use an odds ratio (OR) to assess the strength of the association.
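The figures for the CVT example can be checked with the standard library alone. The sketch below makes two assumptions that the presentation does not state: the χ² statistic is the Pearson statistic without continuity correction, and the 95% confidence interval for the odds ratio is computed on the log scale (the Woolf method).

```python
import math

# 2 x 2 counts from the CVT table
a, b = 23, 4      # 20210G>A allele present: patients, controls
c, d = 97, 116    # 20210G>A allele absent:  patients, controls
n = a + b + c + d

# Pearson chi-square for a 2 x 2 table (no continuity correction; an assumption)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# Upper-tail probability of a chi-square variable with 1 degree of freedom
p_value = math.erfc(math.sqrt(chi2 / 2))

# Odds ratio and its 95% confidence interval on the log scale (Woolf method; an assumption)
odds_ratio = (a / b) / (c / d)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"chi-square = {chi2:.1f} (1 df), P = {p_value:.1e}")
print(f"OR = {odds_ratio:.1f}, 95% CI = {ci_low:.1f} to {ci_high:.1f}")
```

Because the confidence interval lies entirely above 1, the association would be judged significant at the 0.05 level by the criterion described above.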

Genome-Wide Association and the Haplotype Map

Association studies for human disease genes have been limited to particular sets of variants in restricted sets of genes. For example, geneticists might look for association with variants in genes encoding proteins thought to be involved in a pathophysiological pathway in a disease. Many such association studies were undertaken before the Human Genome Project era, with use of the HLA loci, because these loci are highly polymorphic and easily genotyped in case-control studies.

A more powerful approach, however, would be to test systematically for association genome-wide between the more than 10 million variants in the genome and a disease phenotype, without any preconception of what genes and genetic variants might be contributing to the disease. Although such a massive undertaking is not currently feasible, recent advances in genomics, building on the HapMap, make possible an approximation to a full-scale genome-wide association that still retains sufficient power to detect significant associations across the entire genome.

By examining all the haplotypes within an LD block and measuring the degree of LD between them, it is possible to identify the most useful, minimum set of SNP alleles (so-called tag SNPs) that are capable of defining most of the haplotypes contained in each LD block with minimum redundancy. In theory, a set of well-chosen tag SNPs constitutes the minimal number of SNPs that need to be genotyped to provide nearly complete information on which haplotypes are present on any chromosome.
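The idea of a minimal, nonredundant set of tag SNPs can be illustrated with a toy search. The haplotypes below are hypothetical, and the exhaustive search is only a sketch of the principle for a small LD block; it is not the procedure used by the HapMap project, which selects tag SNPs from pairwise LD measures.

```python
from itertools import combinations

# Hypothetical haplotypes observed in one LD block: one string per haplotype,
# one character per SNP position (illustrative only, not HapMap data)
haplotypes = ["ACGTA", "ACGCA", "GTGTC", "GTATC"]

def minimal_tag_snps(haps):
    """Return the smallest tuple of SNP positions whose alleles distinguish every haplotype."""
    n_snps = len(haps[0])
    n_distinct = len(set(haps))
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            # Alleles each haplotype carries at the candidate tag positions
            signatures = {tuple(h[i] for i in positions) for h in haps}
            if len(signatures) == n_distinct:
                return positions
    return tuple(range(n_snps))

tags = minimal_tag_snps(haplotypes)
print("Tag SNP positions (0-based):", tags)
```

In this toy block, positions 0, 1, and 4 always vary together, so genotyping more than one of them adds no information; the search therefore keeps a single representative of that group plus the two SNPs that split the remaining haplotypes.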

In practice, genotyping a few hundred thousand tag SNPs is only a bit less useful for an association study than genotyping every one of the more than 10 million known variants in the genome. Tag SNPs need to be examined and refined before we know whether results based on the four populations studied in the HapMap project are applicable worldwide.

Positional cloning: mapping the location of a disease gene by linkage analysis or other means, followed by identifying the gene on the basis of its map position. This strategy has led to identification of genes associated with hundreds of mendelian disorders and to a small but increasing number of genes associated with complex disorders.
