Lecture overview Polymerase chain reaction (PCR) and its applications. X-ray crystallography and the Protein Data Bank (PDB). Microarrays and applications.
Polymerase Chain Reaction (PCR) A method that generates a relatively large amount of a particular DNA sequence, even from an extremely small sample. Exquisitely sensitive; even the DNA from a single cell may suffice! Numerous applications in biotechnology.
PCR: main ideas You need to know what you are looking for, e.g. the DNA sequence for a particular gene (the target). The sample, primers, nucleotides to build new DNA strands, and Taq polymerase are mixed together. The mixture is subjected to repeated cycles of heating, cooling, and reheating, each on the order of a few minutes. If the target is present in the initial sample, the amount of it in the mixture will grow exponentially with the number of cycles.
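The exponential growth mentioned above can be sketched numerically. This is an idealized model, not a description of any real protocol: the function name pcr_copies and the efficiency parameter (the assumed fraction of templates copied in each cycle) are illustrative choices, and real reactions eventually plateau.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of target copies after a given number of PCR cycles.

    efficiency is an assumed per-cycle copying fraction:
    1.0 means every template is duplicated in every cycle (perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting molecule, perfect doubling: 2**30 = 1073741824 copies after 30 cycles.
for n in (0, 10, 20, 30):
    print(n, int(pcr_copies(1, n)))
```

Even with a lower assumed efficiency (say 0.8), thirty cycles still multiply the starting material many millions of times, which is why a single cell's DNA may be enough.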
[Figure: ds-DNA target with primers.] Primers are complementary to opposite ends of the target sequence.
PCR cycle The mixture is heated to 90°C for 1-2 minutes to separate the DNA strands (denature). The temperature is dropped to 50-60°C so that the primers can anneal to complementary regions. The temperature is raised to 70°C for 1-2 minutes to allow Taq polymerase to synthesize new DNA strands, starting at the primers; synthesis goes from 5’ to 3’ on both strands. Note: Taq polymerase is a DNA polymerase from Thermus aquaticus, a bacterium that lives in hot springs.
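To make the role of the two primers concrete, here is a minimal "in silico PCR" sketch. It assumes exact primer matches only, with no mismatches or melting-temperature considerations; the function names and the toy sequences are hypothetical. The template is given as the top strand written 5’ to 3’; the forward primer matches the start of the target on that strand, and the reverse primer (also written 5’ to 3’) is complementary to the end of the target.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the stretch of the top strand bounded by the two primers.

    Returns None if either primer has no exact binding site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    # On the top strand, the reverse primer's site reads as its reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Toy example (sequences invented for illustration).
template = "TTTTATGCCGAAACCCGGATTCTTTT"
print(amplicon(template, forward="ATGCCG", reverse="GAATCC"))
```

The returned product begins and ends exactly at the primer sites, which is the "flush ends" property of the PCR product.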
PCR notes Primer selection is critical. The primers should be at least 15-20 bases long to ensure specificity. If you are unsure of the exact sequence, you can use “degenerate” primers, i.e. a mixture of primers that differ at the uncertain positions (typically the third codon position). Note that almost all of the product is exactly the target sequence you want, i.e. with flush ends.
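A degenerate primer can be written compactly with the standard IUPAC ambiguity codes (e.g. N for any base, Y for C or T); that notation is brought in here for illustration and is not part of the slides. The sketch below expands such a primer into the explicit mixture of sequences it stands for. The function name expand_degenerate and the example primer are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete primer represented by a degenerate primer string."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# GCN codes for alanine and TAY for tyrosine, so only the third
# codon positions vary: 4 * 2 = 8 primers in the mixture.
mixture = expand_degenerate("GCNTAY")
print(len(mixture), mixture)
```

The size of the mixture is the product of the number of choices at each position, so heavily degenerate primers dilute each individual sequence quickly.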
PCR applications Making a lot of protein! Use RT-PCR (“reverse transcriptase” PCR) to create DNA with the introns removed, i.e. cDNA copied from mRNA, and then insert it into bacteria to clone the gene, e.g. to make proteins for X-ray crystallography. Medical diagnosis: e.g. detect HIV genetic material long before AIDS symptoms arise, or a rapid tuberculosis test. Forensics: detect trace amounts of DNA at a crime scene.
Methods to determine protein structures X-ray crystallography (most important; over 80% of structures in the PDB are obtained this way). NMR spectroscopy (Nuclear Magnetic Resonance). Electron microscopy: uses a beam of electrons to create images (sample preparation and resolution can be limiting when it is applied to protein structure determination).
Protein crystallography steps Grow crystals of the protein that diffract well (a difficult step, can take from weeks to years!). Obtain the X-ray diffraction data. Compute electron density maps. Refinement: calculate an atomic model to fit electron density; compare the diffraction data computed from the model with the actual data; refine the model to fit the data (iterate).
Protein crystals http://www-structure.llnl.gov/crystal_lab/Crys_lab.html
Protein crystal [Figure labels: molecule, crystal.] The unit cell is the basic unit of symmetry in the crystal.
Facts about protein crystals In contrast to, e.g., salt or quartz crystals, protein crystals are mostly water (due to the irregular shape of the molecule) and therefore fragile. Since they are mostly water, the protein structures obtained are expected to be similar to their conformations in vivo. To preserve the crystal in the X-ray beam, it is kept at a very low temperature (100 K).
X-ray diffraction The incident beam of X-rays is diffracted by the electrons in the protein molecules in the crystal. Some of the diffracted waves will interfere constructively, and others will interfere destructively. This results in a diffraction pattern of spots of varying intensity on the detector.
Illustration of diffraction http://www.eserc.stonybrook.edu/ProjectJava/Bragg/index.html
Analysis of the diffraction pattern The diffraction pattern is analyzed by mathematical/computational methods (Fourier analysis) to produce an electron density map. This gives a 3-dimensional image of the molecule that will be subjected to further processing and analysis.
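The Fourier step can be illustrated with a one-dimensional toy model. This is only a sketch under strong assumptions: the "crystal" is a list of density values on a grid, and both the amplitudes and the phases of the Fourier coefficients are assumed known (a detector records intensities only, so in practice the phases have to be obtained by other means). The function names and the max_h cutoff are illustrative.

```python
import cmath


def structure_factors(density):
    """Discrete Fourier transform of a 1-D density sampled on N grid points."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * cmath.pi * h * x / n)
            for x, rho in enumerate(density))
        for h in range(n)
    ]


def density_from_factors(factors, max_h=None):
    """Inverse transform back to density.

    If max_h is given, terms with a spatial frequency above max_h are
    dropped, which mimics a map computed from lower-resolution data.
    """
    n = len(factors)
    result = []
    for x in range(n):
        total = 0j
        for h, f in enumerate(factors):
            frequency = min(h, n - h)  # h and n - h describe the same frequency
            if max_h is not None and frequency > max_h:
                continue
            total += f * cmath.exp(-2j * cmath.pi * h * x / n)
        result.append(total.real / n)
    return result


# Toy density with three "atoms" of different weight (values invented).
density = [0, 0, 5, 1, 0, 0, 3, 0]
factors = structure_factors(density)
print([round(v, 3) for v in density_from_factors(factors)])           # full resolution
print([round(v, 3) for v in density_from_factors(factors, max_h=2)])  # blurred map
```

With all terms included the original density is recovered; with only the low-frequency terms the peaks smear out, which is the one-dimensional analogue of a low-resolution electron density map.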
Electron density maps at different resolutions http://www-structure.llnl.gov/Xray/101index.html
Refinement Refinement is an iterative process: one constructs an atomic model based on the electron density, then computes diffraction data from the model, which is compared to the actual diffraction data. The crystallographic R-factor is a measure of how well the model fits the diffraction data. The model can still be subject to error! For example, the electron density for certain pairs of amino acid residues is extremely similar.
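The R-factor is commonly defined as the sum of the absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The slides do not give the formula, so the sketch below uses that common definition; the function name and the numbers are made up for illustration.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    f_obs and f_calc are equal-length sequences of structure-factor
    amplitudes (observed data vs. values computed from the model).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


# Invented amplitudes: differences of 2 + 2 + 3 + 3 = 10 over a total of 100.
observed = [10.0, 20.0, 30.0, 40.0]
calculated = [12.0, 18.0, 33.0, 37.0]
print(r_factor(observed, calculated))
```

A perfect fit gives 0; each refinement iteration adjusts the model to push this value down.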
Fitting amino acid residues into the electron density map
NMR Based on magnetic moments of atomic nuclei. NMR spectra give information about distances between atoms in the molecule. Applied to protein molecules in solution (no crystals needed!). Only works well for smaller proteins, e.g. roughly 100 residues or fewer. A different set of mathematical/computational tools is involved. Note: The different “models” represent different structures compatible with the distance constraints, not actual conformations of the molecule.
From Coordinates to Models 1EJ9: Human topoisomerase I
Annotating Secondary Structure 1EJ9: Human topoisomerase I α-Helices β-strands coils/loops
Creating 3D Domains 3D Domain 0: 1EJ9A0 = entire polypeptide
Creating 3D Domains [Figure labels: 3D Domains 1EJ9A1, 1EJ9A2, 1EJ9A3, 1EJ9A4, 1EJ9A5; < 3 Secondary Structure Elements.]
Microarrays Used to study gene expression levels in cells. Cells can differ dramatically in the amounts of various proteins that they synthesize; e.g. due to different cell types or different external/internal conditions. In fact, in higher level organisms only a fraction of the genes in a cell are expressed at a given time, and that subset depends on the cell type. Via microarrays it is possible to study the expression levels of tens of thousands of genes simultaneously.
Microarray technology Physically, a microarray is just a glass slide with spots of DNA on it; each spot is a probe (or target). The DNA is single-stranded cDNA (complementary DNA) and may consist of an entire gene or part of one (an oligonucleotide consisting of 50 bases or so). If the microarray is exposed to a solution containing mRNA, then the mRNA molecules will bind to those probes to which they are complementary.
Microarray probes ssDNA gene sequences or oligos
Microarray technology Thousands of probes can fit on a single slide. The slides can be spotted by robots. Of course, what genes you can study with a given microarray depends on the collection of probes on it. There are a number of commercial manufacturers; e.g. Affymetrix, Agilent, Amersham. They’re expensive!
Microarray experiments Start with two cell types, e.g. “healthy” and “diseased”. Isolate mRNA from each cell type, generate cDNA with fluorescent dyes attached, e.g. green for healthy and red for diseased. Mix the cDNA samples and incubate with the microarray. After incubation the cDNA in the samples has had a chance to bind (hybridize) with the probes on the chip. The chip is read by a scanner that uses lasers to excite the fluorescent tags; the intensity levels of the dyes are recorded for each probe gene and stored in a computer.
Microarray data representation There is a “standard” color scale representation, as follows. Red means the gene produced more mRNA in the experimental condition; green means the gene produced more mRNA in the control. Black means equal amounts of mRNA for both experiment and control. If e.g. there were 5 times as much mRNA for the experimental condition compared to the control, we would say there was a 5-fold induction; 1/5 as much would be 5-fold repression. The data is recorded numerically as the log base 2 of the expression ratio.
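The log base 2 representation can be sketched in a few lines. The function names, the tolerance, and the intensity values are illustrative assumptions; the color labels follow the scale described above.

```python
import math


def log2_ratio(experiment, control):
    """log2 of the expression ratio (experimental / control intensity)."""
    if experiment <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(experiment / control)


def describe(value, tolerance=1e-9):
    """Translate a log2 ratio into fold induction or repression."""
    if abs(value) <= tolerance:
        return "unchanged (black)"
    fold = 2 ** abs(value)
    if value > 0:
        return f"{fold:.1f}-fold induction (red)"
    return f"{fold:.1f}-fold repression (green)"


# Invented intensities: 5 times as much, equal amounts, 1/5 as much.
for exp, ctl in [(500, 100), (100, 100), (100, 500)]:
    value = log2_ratio(exp, ctl)
    print(f"{value:+.2f}", describe(value))
```

One reason for using logarithms is symmetry: 5-fold induction and 5-fold repression become +2.32 and -2.32, whereas the raw ratios 5 and 0.2 sit at very different distances from 1.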
Microarray data analysis Since there are typically so many genes, it is useful to cluster the genes based on similar expression patterns. Different clustering algorithms may be used, e.g. hierarchical clustering with different metrics, k-means, or k-medians. It may also be useful to cluster the samples (we’ll see this shortly). Other statistical methods may be useful, e.g. support vector machines (SVM).
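As a minimal sketch of one of the algorithms named above, here is a plain k-means clustering of genes by their log2 expression ratios across samples. Several things are assumptions made for the example: Euclidean distance as the metric, the first k genes as initial centers (which keeps the run deterministic), and the gene names and values, which are invented.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means with Euclidean distance.

    points: list of equal-length numeric vectors (one per gene).
    Returns (labels, centers). The first k points seed the centers.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each gene joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical log2 ratios for four genes measured in three samples.
genes = {
    "geneA": [2.0, 2.1, 1.9],
    "geneB": [-1.8, -2.0, -2.2],
    "geneC": [2.2, 1.8, 2.0],
    "geneD": [-2.1, -1.9, -2.0],
}
labels, centers = kmeans(list(genes.values()), k=2)
for name, label in zip(genes, labels):
    print(name, "-> cluster", label)
```

Here the induced genes (geneA, geneC) end up in one cluster and the repressed genes (geneB, geneD) in the other. Clustering the samples instead of the genes amounts to running the same procedure on the transposed data.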
Acute Lymphoblastic Leukemia (ALL) Constitutes 75% of annual diagnoses of childhood leukemia. Long-term outlook has improved dramatically since about 1970. At that time the long-term disease-free survival (LTDFS) rate was under 10%; at present it is over 80%. There is still a risk of relapse in 20% of patients.
ALL (cont.) The LTDFS rate improved because it was recognized that ALL is heterogeneous, and the therapy should be tailored to the subtype so as to improve the odds of a successful treatment (e.g. bone marrow transplant vs. chemotherapy). Important subtypes include: T-ALL, E2A-PBX1, BCR-ABL, TEL-AML1, MLL rearrangement, and hyperdiploid > 50 chromosomes.