Microarray - Introduction

Ka-Lok Ng Department of Bioinformatics Asia University

Graduate syllabus
Week  Date       Graduate Level Topic
1     2010/2/23  Introduction - RNA expression, chip technology
2     2010/3/2   Suffix tree, application of microarray
3     2010/3/9   Data normalization, filtering, log transform
4     2010/3/16  Data normalization, Lowess normalization, SMD
5     2010/3/23  Statistical analysis of gene expression data
6     2010/3/30  Midterm
7     2010/4/6   Distance measure, entropy
8     2010/4/13  Statistics, normal distribution
9     2010/4/20  Test of hypothesis, t-test of microarray data
10    2010/4/27  ArrayExpress - data analysis
11    2010/5/4   Gene expression databases
12    2010/5/11  Gene regulation network prediction
13    2010/5/18  Microarray data analysis tools
14    2010/5/25  Student presentation
15    2010/6/1
16    2010/6/8
17    2010/6/15
18    2010/6/22

Undergraduate syllabus
Session  Date       Undergraduate Level Topic                        Instructor
1        2010/2/23  Introduction - RNA expression, chip technology   吳家樂
2        2010/3/2   Suffix tree, application of microarray
3        2010/3/9   Data normalization, filtering, log transform
4        2010/3/16  Data normalization, Lowess normalization, SMD
5        2010/3/23  Statistical analysis of gene expression data
6        2010/3/30  Midterm
7        2010/4/6   Distance measure, entropy
8        2010/4/20  Statistics, normal distribution
9        2010/4/27  Test of hypothesis, t-test of microarray data
10       2010/5/4   ArrayExpress - data analysis
11       2010/5/11  Chi-square test of microarray data
12       2010/5/18  Second examination
13       2010/5/25                                                   張培均
14       2010/6/1
15       2010/6/8
16       2010/6/15
17       2010/6/22
18       2010/6/29

Topics to be covered
Introduction - RNA expression
Experimental design, image processing, microarray databases
Data normalization, filtering and analysis
Statistical analysis of gene expression data
Clustering methods
Time series data (cell cycle) and reverse engineering
Gene regulatory networks
Gene regulatory networks and protein-protein interaction networks

Classwork, homework, class attendance
Mid-term, final exam. or SCI paper/thesis presentation (for graduate students)

References
Gibson G. and Muse S. A Primer of Genome Science, Ch. 4. 2nd edition, Sinauer (2004).
Causton H., Quackenbush J., and Brazma A. Microarray Gene Expression Data Analysis: A Beginner's Guide. Blackwell (2003).
Baxevanis A. and Ouellette B.F. Francis. Bioinformatics, Ch. 16. J. Wiley (2005).
Knudsen S. A Biologist's Guide to Analysis of DNA Microarray Data. J. Wiley (2002).
Benfey P. and Protopapas A.D. Genomics, Ch. 5. Prentice Hall (2005).
Setubal J. and Meidanis J. Introduction to Computational Molecular Biology. PWS Publishing (1997).
Guénoche A. (2005). "About the design of oligo-chips", Discrete Applied Mathematics, v147(1), pp

Contents
Introduction – the central dogma of molecular biology, applications, data analysis
Microarray slide surface
Printing technologies – spotting, photolithography, ink-jet
Selection of genes for spotting on arrays
Selection of primers for PCR – suffix tree
Microarray application – four different types of brain tumors
Gene co-expression and gene expression profile
Data management

Knowledge is the process of piling up facts; wisdom lies in their simplification.
Martin H. Fischer

Introduction
The last 10 years have brought spectacular achievements in genome sequencing (such as the Human Genome Project, HGP).
It took more than 1000 years for science to progress from human anatomy to understanding how genomes function.
Even if we assume that all the genes have been correctly identified, the result represents only sequence.
High-throughput DNA sequencing technology created a systems approach to biology.

The central dogma of molecular biology http://www. hort. purdue
Glossary
Transcripts – mRNA
Transcriptome – the complete set of transcripts
Hybridization

Microarray technology allows one to identify the genes that are expressed in different cell types, to learn how their expression levels change in different developmental stages or disease states, and to identify the cellular processes in which they participate.
Microarray technology provides clues about how genes and gene products interact and about their interaction networks.
Microarray gene expression data analysis: experimental design → data transformations from raw data to gene expression matrices → data mining and analysis of gene expression matrices

What are microarrays and how do they work ?
A microarray is typically a glass or polymer slide. DNA molecules are attached to it at fixed locations called spots or features.

Smooth surface enables even deposition of surface chemistries and perfect spot morphology.

What are microarrays and how do they work ?
~10,000 spots on an array
Each spot contains ~10^7 identical DNA molecules, with lengths from tens to hundreds of bp
Spots are either printed on the microarray by a robot or jet, or synthesised by photolithography or by ink-jet printing
Principle of cDNA microarrays: EST fragments arrayed in 96- or 384-well plates are spotted at high density onto a glass microarray slide. Subsequently, two different fluorescently labeled cDNA populations derived from independent mRNA samples are hybridized to the array.

Ink-jet printer microarrays
Ink-jet printhead draws up DNA Printhead moves to specific location on solid support DNA ejected through small hole Used to spot DNA or synthesize oligonucleotides directly on glass slide Use pioneered by Agilent Technologies, Inc. Another method for depositing DNA on glass slides is the use of ink-jet printer technology. Ink-jet printers are designed to deposit very small quantities of ink in precise locations. As adapted for microarray spotting, the printhead draws up a small amount of DNA, moves to a particular point on the slide, and ejects the DNA through a very small hole. This technology has also been adapted to provide for the synthesis of oligonucleotides on slides. After each base is added through the printhead to the growing oligonucleotide, the slide is washed free of excess nucleotides, and the exposed bases are primed for the addition of the next nucleotide. Agilent Technologies has pioneered the use of ink-jet printing for making microarrays. The primary drawback of this approach as compared with photolithography is that the maximum number of spots on a slide is far fewer with the ink-jet printing of DNA.

Types of printing pins
(A) Tweezer or split-pin designs transfer low nanoliter (10^-9 liter) amounts of DNA to the array by capillary action as the tip strikes the solid surface.
(B) TeleChem™ tips and pins apply small droplets by contact between the pin and substrate.
(C) The pin-and-loop design picks up the DNA in a small loop, and a pin stamps solution on a slide at a uniform density.
(D) Ink jets spray picoliter (10^-12 liter) droplets of liquid under pressure.
Robotic spotting: the DNA is deposited by capillary action and sticks to the slide through electrostatic interactions.
The spacing between spot centers is set according to the density required. The entire microarray usually covers an area of 2.5 × 5.0 cm, though shorter grids can be printed when fewer clones are to be represented.

DNA spotting I DNA spotting usually uses multiple pins
DNA in microtiter plate DNA usually PCR amplified Oligonucleotides can also be spotted Most robotic spotters use pins that act by capillary action similar to that of fountain pens. Multiple pins are mounted together and dipped simultaneously into DNA aliquoted into the different wells of a microtiter plate. The DNA in the wells has usually been amplified from cDNA or genomic DNA, using PCR. Oligonucleotides can also be spotted in this way.

Commercial DNA spotter
In this microarray spotter made by GeneMachines®, microtiter plates are stacked on the left, awaiting the pins, which are poised over a set of microscope slides. The action of printing microscope slides is shown in the next slide.

Oligonucleotide microarrays – pioneered by Affymetrix Affymetrix GeneChips
Oligonucleotides
Usually at least 20–25 bases in length; optimal at 45~60 bases
10–20 different oligonucleotides for each gene
Oligonucleotides for each gene are selected by a computer program to be:
Unique in the genome (4^(20 to 25) = 2^(40 to 50) >> 3 × 10^9 ≈ 2^31.5, so a given sequence is not likely to appear twice)
Non-overlapping (if the sequence is too short, specificity is low; if it is too long, self-hybridization can occur)
Consistent with composition-based design rules
Consistent with empirically derived rules (the ratio of G-C pairs to A-T pairs affects the melting temperature of the sequence; the Tm rule ends in the terms ... × (GC%) − 675/L, where L is the length of the oligonucleotide)
On Affymetrix GeneChips, there are between 10 and 20 oligonucleotides for each gene. The choice of oligonucleotides is determined using a computer program that searches for nonoverlapping stretches of bases that are unique in the genome (in order to prevent cross-hybridization). Furthermore, the computer searches for oligonucleotides that will fit empirically derived design rules that dictate the ratio of G–C pairs vs. A–T pairs and that attempt to reduce the likelihood that the oligonucleotide will hybridize to itself, creating hairpin structures.
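The uniqueness argument and the composition rule can be checked with a few lines of Python. This is a minimal sketch, not the program Affymetrix uses. The leading terms of the Tm rule are not legible on the slide, so the code assumes the commonly quoted salt-adjusted form Tm = 81.5 + 16.6 log10[Na+] + 0.41 (GC%) − 675/L; the function names and the default sodium concentration of 1.0 M are illustrative assumptions.

```python
import math

GENOME_SIZE = 3 * 10**9  # approximate human genome size in bp, as on the slide


def possible_oligos(length):
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** length


def gc_percent(oligo):
    """Percentage of G and C bases in the oligonucleotide."""
    oligo = oligo.upper()
    return 100.0 * (oligo.count("G") + oligo.count("C")) / len(oligo)


def melting_temperature(oligo, na_molar=1.0):
    """Assumed formula: Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(GC%) - 675/L."""
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(oligo) - 675.0 / len(oligo))


if __name__ == "__main__":
    # A 20-mer has 4^20 = 2^40 possible sequences, far more than 3 x 10^9.
    print(possible_oligos(20), possible_oligos(20) > GENOME_SIZE)
    print(melting_temperature("ACGT" * 5))
```

Raising the GC fraction or the length L both raise the estimated Tm, which is why design rules constrain the two together.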

Oligonucleotide microarrays – pioneered by Affymetrix
Construction of oligonucleotide arrays. Oligonucleotides are synthesized in situ on the silicon chip. (A) In each step, a flash of light "deprotects" the oligonucleotides at the desired location on the chip; then "protected" nucleotides of one of the four types (A, C, G or T) are added so that a single nucleotide can add to the desired chains.

Oligonucleotide microarrays
Construction of oligonucleotide arrays. The light flash is produced by photolithography using a mask to allow light to strike only the required features on the surface of the chip.

Photolithography Light-activated chemical reaction
Used for addition of bases to the growing oligonucleotide
Custom masks prevent light from reaching spots where bases are not wanted
Mirrors are also used; NimbleGen™ uses this approach
(Figure: lamp, mask, chip)
In photolithography, each step of the oligonucleotide synthesis process is activated by light. In the Affymetrix manufacturing process, masks are used to allow bases to be added to growing oligonucleotides at specific locations on the chip. The masks prevent light from reaching locations on the silicon wafer where a base is not to be added in that round. Instead of masks, digitally controlled mirrors can be used to shine light only on those spots where activation is desired. The biotechnology company NimbleGen™ uses this approach to produce chips through photolithography.

Example: building oligonucleotides by photolithography
Want to add nucleotide G
Mask all other spots on the chip
Light shines only where addition of G is desired (light "deprotects" the oligonucleotides at the desired location on the chip)
G is added and reacts
Now G is on a subset of the oligonucleotides
To understand how photolithography works with masks we use an example of adding the base G. A mask is used that prevents light from reaching all the oligonucleotides on the chip where G is not supposed to be the next base. When light is turned on, it reaches only those positions where a G is to be added. The light activates the growing chain, allowing a base to be added. A solution containing the base G is then added to the chip, and it becomes attached to the oligonucleotides activated by the light.

Example: adding a second base
Want to add T
New mask covers spots where T is not wanted
Light shines on the mask
T is added
Continue for all four bases
Need 80 masks in total for a 20-mer oligonucleotide (4 bases × 20 positions)
To add the base T, a new mask is used that covers all spots except those where a T is needed. Light is then shone on the mask, activating the specified oligonucleotides, and the T is then added. This process is performed sequentially for all four bases, until all of the oligonucleotides on the chip are synthesized. Thus, 80 custom masks are needed to make a chip that has 20-base-long oligonucleotides. When mirrors are used to control the photolithography process, there is no need to manufacture the custom masks. This means that there is no cost associated with changing the sequences on the chip. The use of mirrors also allows for much longer oligonucleotide chains to be synthesized on chips.
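The layer-by-layer masking can be simulated in Python. This is a minimal sketch under the simple scheme described in the notes, where every position of the oligonucleotide is built with one mask per base, giving 4 × L masks. The function name `synthesize` and the fixed base order A, C, G, T are illustrative assumptions.

```python
BASES = "ACGT"  # assumed order in which the four bases are offered per layer


def synthesize(probes):
    """Simulate mask-based synthesis of equal-length probes, one layer at a time.

    Returns the sequences grown on the chip and the list of masks used.
    Each mask is (layer, base, exposed), where exposed[i] is True when
    spot i is exposed to light and therefore receives the base.
    """
    length = len(probes[0])
    assert all(len(p) == length for p in probes), "probes must share one length"
    grown = ["" for _ in probes]
    masks = []
    for layer in range(length):
        for base in BASES:
            exposed = [p[layer] == base for p in probes]
            masks.append((layer, base, exposed))
            for spot, lit in enumerate(exposed):
                if lit:
                    grown[spot] += base  # only deprotected spots are extended
    return grown, masks


if __name__ == "__main__":
    chip = ["ACGT" * 5, "TTTTGGGGCCCCAAAATTTT"]  # two hypothetical 20-mers
    grown, masks = synthesize(chip)
    print(grown == chip, len(masks))
```

With mirrors instead of physical masks the same loop applies; only the cost of producing each `exposed` pattern changes.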

Comparisons of microarrays
Photolithography, mechanical printing, ink-jet printing
This slide compares the three methods for preparing microarrays. The top panel illustrates photolithography, the middle panel illustrates mechanical printing, and the bottom panel illustrates ink-jet printing.

Design of oligonucleotides by photolithography
There are four types of masks, one for each nucleotide that can be added.
Given a set of oligos to synthesize, the mask sequence is a common supersequence of the oligo set; in other words, each oligo is a subsequence of the mask sequence (its characters may be separated, but they remain in the same order).
Minimizing the number of masks needed, i.e. building a shortest common supersequence of a given set of words, is the so-called shortest common supersequence problem (SCS problem), which is NP-hard.
We call a realization of an oligo a sequence of masks capable of synthesizing it.
The number of realizations
Count the number of realizations of the probe sequence GTATC (L = 5) in the mask sequence GGTTATC (L = 7). The following four sets of positions match the probe sequence: (1,3,5,6,7), (1,4,5,6,7), (2,3,5,6,7) and (2,4,5,6,7).
(Figure: mask positions 1 to 7 against the probe characters. The left copies are indicated by the sign +. The instances of identical (repeated) characters in these intervals are marked by an 'X'.)

Design of oligonucleotides by photolithography
Count the realizations of the probe sequence ATTAC in the mask sequence ATTATTACAC. The left and right copies are indicated by the signs + and -. The instances of identical characters in these intervals are marked by an 'X'.
23 realizations: (1,2,3,4,8), (1,3,5,7,8), (1,5,6,7,8), (4,5,6,7,8), etc.
The total number of possible paths from Start (S) to End (E) is 23. A circle denotes a possible position of a probe character within the mask sequence. An edge connects consecutive positions in the probe sequence.
吳哲賢, "The problem of counting probe recognitions on biochips" (生物晶片之探針辨識數目問題), 24th Workshop on Combinatorial Mathematics and Computation Theory (第二十四屆組合數學與計算理論研討會)
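Counting realizations amounts to counting the ways the probe occurs as a subsequence of the mask sequence. The sketch below uses a standard dynamic-programming count and, for small inputs, an explicit enumeration of the position sets. It is not necessarily the method of the cited paper, and the function names are illustrative.

```python
from itertools import combinations


def count_realizations(probe, mask):
    """Count the ways `probe` can be read as a subsequence of `mask`."""
    # ways[j] = number of ways to realize the first j probe characters so far
    ways = [1] + [0] * len(probe)
    for ch in mask:
        # go from right to left so each mask character is used at most once
        for j in range(len(probe), 0, -1):
            if probe[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(probe)]


def list_realizations(probe, mask):
    """Enumerate the 1-based position sets explicitly (small inputs only)."""
    positions = range(1, len(mask) + 1)
    return [combo for combo in combinations(positions, len(probe))
            if all(mask[p - 1] == c for p, c in zip(combo, probe))]


if __name__ == "__main__":
    print(count_realizations("GTATC", "GGTTATC"))
    print(list_realizations("GTATC", "GGTTATC"))
    print(count_realizations("ATTAC", "ATTATTACAC"))
```

The dynamic-programming count runs in time proportional to the product of the two lengths, whereas the enumeration grows combinatorially and is only useful for checking small examples.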

Comparison of microarray hybridization
Spotted microarrays Competitive hybridization Two labeled cDNAs hybridized to same slide  measure the relative difference between the signal intensity of two targets binding to the same spot of DNA Affymetrix GeneChips One labeled RNA population per chip Comparison made between hybridization intensities of same oligonucleotides on different chips In addition to the differences in their manufacturing, spotted microarrays and GeneChips (as well as NimbleGen chips) differ in how the hybridization is performed. For spotted microarrays, usually the two labeled targets to be compared are hybridized to the same microarray. This procedure is known as competitive hybridization. For GeneChips, only one labeled target is hybridized to each chip. Comparisons are made at the analysis stage between hybridization intensities measured on two different chips. With competitive hybridization, one is measuring the relative difference between the signal intensity of two targets binding to the same spot of DNA. The practical reason for this approach is that there is often variability in the quality of the spotted DNA, in terms of amount and integrity. This measurement compensates for differences in the quality of the spot. Microarrays made with photolithography tend to have higher reproducibility from slide to slide, making competitive hybridization less important.

Selection of genes for spotting on arrays
Suppose you are interested in a family of proteins, say a particular class of receptors.
To identify all the genes that are part of the family, you can do a homology search (PSI-BLAST) or a PubMed keyword search.
Another way is to use a commercial Affymetrix array.
In the context of spotted arrays, the term probe often refers to the labelled population of nucleic acid in solution, while in connection with GeneChips™ it is used to refer to the nucleic acid attached to the array. In the MIAME convention, the mobile population of nucleic acid is referred to as the labelled extract, and the nucleic acid attached to the array as the reporter, feature or spot.
Labeled cDNA or RNA in solution: target (GeneChips™), probe (spotted arrays), labelled extract (MIAME)
DNA bound to the array: probe (GeneChips™), target (spotted arrays), reporter, feature or spot (MIAME)

Selection of regions within genes
Once you have the list of genes you wish to spot on the array, the next question is cross-hybridization.
How can you avoid spotting probes that are complementary to more than one gene (target mRNA or cDNA sequence) if you are working with a gene family, i.e. many genes with similar sequences (such as > 70% similarity)?
A probe could cross-hybridize with different mRNAs, so that some genes appear to be more abundant than they really are.
A gene's mRNA (alternative splicing could generate different mRNAs) could cross-hybridize with different probes → non-specific signal → not the true expression level of the gene under study.
A solution to the cross-hybridization problem cannot always be found.
Approach: use the ProbeWiz server, which uses BLAST to find the regions in those genes that are the least homologous to other genes.
ProbeWiz -

Selection of primers for PCR
Once those unique regions have been identified, the probe needs to be designed → use PCR amplification to generate the probe.
Approach: use the ProbeWiz or OligoArray servers.
ProbeWiz predicts optimal PCR primer pairs for generating probes for cDNA arrays; it avoids self-hybridization (hairpin structures) → high specificity.
OligoArray: genome-scale oligonucleotide design for microarrays.
Other option: use long oligonucleotides (50 to 70 bases) instead of PCR primers.
Other complicating issues: alternative splicing, SNPs.

Selection of primers for PCR
Minimal primer set (MPS) problem
Given a set of ORF sequences S = {S1, S2, …, Sn} and a primer length L, find the minimal set of primers P = {P1, P2, …, Pk} such that every Si contains at least one sequence from P.
In other words, identify a small set of primers P that together cover all the ORF sequences in S.
Then select from P the highly specific primers (those dissimilar to the complementary strand of the template; otherwise they will hybridize to many positions along the template).
Example: S = {ATTC, GATT, TTAC}
L = 3 → P = {ATT, TTA} or P = {ATT, TAC}
L = 2 → P = {TT}
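The example can be reproduced with an exhaustive search over candidate primers. This is a minimal sketch for toy inputs only: it tries every subset in order of increasing size, so its cost grows exponentially and it is not a practical algorithm for genome-scale data. The function names are illustrative.

```python
from itertools import combinations


def kmers(seq, k):
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minimal_primer_set(seqs, k):
    """Smallest set of length-k primers such that every sequence contains one.

    Candidates are tried in sorted order, so ties are broken alphabetically.
    Returns an empty set when no cover exists.
    """
    candidates = sorted(set().union(*(kmers(s, k) for s in seqs)))
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            if all(any(primer in s for primer in combo) for s in seqs):
                return set(combo)
    return set()


if __name__ == "__main__":
    orfs = ["ATTC", "GATT", "TTAC"]
    print(minimal_primer_set(orfs, 3))  # one of the two-primer solutions
    print(minimal_primer_set(orfs, 2))
```

For L = 3 several two-primer solutions exist; the sketch returns whichever comes first alphabetically, and the specificity filter described above would then choose among them.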

Selection of whole genome oligonucleotide or cDNA primers
Automatic generation of whole-genome oligonucleotide or cDNA probes
Probe pre-selection
Uses a suffix tree algorithm; the memory space is O(n), about 40n, where n is the length of the input sequence (e.g. the Hs gene sequences are about 35 MB in length, so for human gene sequences the memory space is ~ 40 × 35 × 3.0 MB = 4200 MB!)
Probes are filtered for length and GC content, and must not contain self-complementary regions > 4 bp
Hybridization prediction
The most time-consuming part
Need to predict melting temperatures Tm for all probes (on average 4 probes/gene → 4 × 30000 probes against 30000 genes, i.e. 4 × 30000 × 30000 = 3.6 × 10^9 runs of the tool Mfold)
Probe selection
Compare probe-target with probe-non-target sequences and select the probes
Pipeline: probe pre-selection → hybridization prediction → probe selection
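The pre-selection filters (length, GC content, no self-complementary region longer than 4 bp) can be sketched in Python. The slide gives no numeric thresholds for length or GC content, so the defaults below (20 to 25 bases, GC fraction 0.4 to 0.6) are assumptions chosen only for illustration, and the self-complementarity test is a simple substring check rather than a structure prediction such as Mfold performs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_self_complementary_region(seq, min_len=5):
    """True if some window of min_len bases has its reverse complement in seq.

    min_len=5 corresponds to 'self-complementary regions > 4 bp'.
    """
    for i in range(len(seq) - min_len + 1):
        if reverse_complement(seq[i:i + min_len]) in seq:
            return True
    return False


def passes_filters(seq, min_length=20, max_length=25, gc_min=0.4, gc_max=0.6):
    """Apply the three pre-selection filters (thresholds are assumed values)."""
    if not min_length <= len(seq) <= max_length:
        return False
    if not gc_min <= gc_fraction(seq) <= gc_max:
        return False
    return not has_self_complementary_region(seq)


if __name__ == "__main__":
    print(passes_filters("ACACACACACACACACACAC"))   # passes all three filters
    print(passes_filters("AAAAACCCCCTTTTTGGGGG"))   # AAAAA can pair with TTTTT
    print(4 * 30000 * 30000)                        # number of Tm calculations
```

Cheap filters like these are applied first precisely because the hybridization prediction that follows is the expensive step.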

Suffix tree - Basic notation
The concatenation of two strings s and t is denoted by st and is formed by appending all characters of t after s, in the order they appear in t. For instance, if s = GGCTA and t = CAAC, then st = GGCTACAAC. The length of st is |s| + |t|.
A prefix of s is any substring of s of the form s[1…j] for 0 ≤ j ≤ |s|. We admit j = 0 and define s[1…0] as the empty string, which is a prefix of s as well. Note that t is a prefix of s if and only if there is another string u such that s = tu. Sometimes one needs to refer to the prefix of s with exactly k characters, with 0 ≤ k ≤ |s|, and we use the notation prefix(s,k) to denote this string.
prefix(s,3) → ATT is a prefix of ATTCGATTTTAC
A suffix of s is a substring of the form s[i…|s|] for a certain i such that 1 ≤ i ≤ |s| + 1. We admit i = |s| + 1, in which case s[|s|+1…|s|] denotes the empty string. A string t is a suffix of s if and only if there is another string u such that s = ut. The notation suffix(s,k) denotes the unique suffix of s with k characters, for 0 ≤ k ≤ |s|.
suffix(s,3) → TAC is a suffix of ATTCGATTTTAC
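The notation maps directly onto Python string slicing. A minimal sketch follows; note that Python indexes from 0 whereas the slide indexes from 1, and the function names simply mirror the notation prefix(s,k) and suffix(s,k).

```python
def prefix(s, k):
    """prefix(s, k): the first k characters of s, for 0 <= k <= |s|."""
    if not 0 <= k <= len(s):
        raise ValueError("k must satisfy 0 <= k <= |s|")
    return s[:k]


def suffix(s, k):
    """suffix(s, k): the last k characters of s, for 0 <= k <= |s|."""
    if not 0 <= k <= len(s):
        raise ValueError("k must satisfy 0 <= k <= |s|")
    return s[len(s) - k:]  # s[-k:] would wrongly return all of s when k == 0


def all_suffixes(s):
    """Every suffix of s, longest first, ending with the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


if __name__ == "__main__":
    s = "ATTCGATTTTAC"
    print("GGCTA" + "CAAC")          # concatenation st
    print(prefix(s, 3), suffix(s, 3))
    print(all_suffixes("AATG"))
```

A string of length n has n + 1 suffixes when the empty string is included, which is the set a suffix tree organizes.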

Suffix tree
A suffix tree contains all suffixes of a string S, factoring out common prefixes as much as possible in the tree structure.
Edges are directed away from the root, and each edge is labeled by a substring of S. All edges coming out of a given vertex have different labels, and all such labels have different prefixes (not counting the empty prefix). Each leaf corresponds to a suffix of S, and this suffix is obtained by concatenating all labels on all edges on the path from the root to the leaf.

Identifying substring, S
Suffix tree: a further example
X = AATAATGC$, where $ signals the end of the sequence.
Let the identifying substring S for position i be the shortest substring beginning at i which does not occur elsewhere in X.
The longest repeat within the string is AAT.
Position  Identifying substring, S
1         AATA
2         ATA
3         TA
4         AATG
5         ATG
6         TG
7         G
8         C
9         $
Suffix tree of X (figure), where () denotes the position. Leaf labels as drawn: under the branch A: ATA (1), ATG (4), TA (2), TG (5); directly under the root: C (8), G (7), TA (3), TG (6), $ (9).
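The table of identifying substrings can be recomputed directly from the definition. The sketch below is a naive version that scans the string instead of building a suffix tree, so it is only meant for short examples; the function names are illustrative.

```python
def occurrences(text, pattern):
    """Number of (possibly overlapping) occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def identifying_substrings(x):
    """For each position, the shortest substring starting there that is unique in x."""
    result = []
    for i in range(len(x)):
        found = None
        for j in range(i + 1, len(x) + 1):
            if occurrences(x, x[i:j]) == 1:
                found = x[i:j]
                break
        result.append(found)  # None if no unique substring starts here
    return result


def longest_repeat(x):
    """Longest substring occurring at least twice in x ('' if there is none)."""
    for length in range(len(x) - 1, 0, -1):
        for i in range(len(x) - length + 1):
            if occurrences(x, x[i:i + length]) >= 2:
                return x[i:i + length]
    return ""


if __name__ == "__main__":
    x = "AATAATGC$"
    for position, sub in enumerate(identifying_substrings(x), start=1):
        print(position, sub)
    print(longest_repeat(x))
```

A suffix tree yields the same answers far more efficiently, because each identifying substring ends one character past the point where a path in the tree stops being shared.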

Suffix tree
Given a set of three ORF sequences S = {S1, S2, S3}, with S1 = AATG, S2 = TTTG and S3 = TTTC.
Merge S1, S2 and S3 to form AATG$1TTTG$2TTTC$3, with a total length of 15 (each terminator $1, $2, $3 counts as one symbol).
The non-overlapping longest repeat between S1 and S2 is TG, and between S2 and S3 it is TTT.
Suffixes of the merged string, grouped by the leaf (first symbol) under which they fall; a suffix of length k is the last k symbols of AATG$1TTTG$2TTTC$3:
Leaf A:   lengths 15, 14
Leaf C:   length 2
Leaf G:   lengths 12, 7
Leaf T:   lengths 3, 13, 8, 4, 9, 5, 10
Leaf $1:  length 11
Leaf $2:  length 6
Leaf $3:  length 1
H. Chen and Y.-S. Hou, A study on specific primer selection algorithms using suffix trees, Journal of Information Technology and Applications, Vol. 1, No. 1, 25-30, 2006.
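The grouping of suffixes by first symbol, and the longest repeats shared between pairs of sequences, can be checked with a short Python sketch. It treats each terminator $1, $2, $3 as a single symbol and uses naive scanning rather than an actual suffix tree; the function names are illustrative and this is not the algorithm of the cited paper.

```python
def merge_with_terminators(seqs):
    """Concatenate sequences as a list of symbols, adding $1, $2, ... terminators."""
    symbols = []
    for index, seq in enumerate(seqs, start=1):
        symbols.extend(seq)
        symbols.append("$%d" % index)
    return symbols


def suffix_lengths_by_first_symbol(symbols):
    """Map each first symbol to the lengths of the suffixes that start with it."""
    groups = {}
    n = len(symbols)
    for i, symbol in enumerate(symbols):
        groups.setdefault(symbol, []).append(n - i)
    return groups


def longest_common_substring(a, b):
    """Longest substring shared by a and b ('' if none); naive search."""
    for length in range(min(len(a), len(b)), 0, -1):
        for i in range(len(a) - length + 1):
            if a[i:i + length] in b:
                return a[i:i + length]
    return ""


if __name__ == "__main__":
    orfs = ["AATG", "TTTG", "TTTC"]
    merged = merge_with_terminators(orfs)
    print(len(merged), "".join(merged))
    print(suffix_lengths_by_first_symbol(merged))
    print(longest_common_substring(orfs[0], orfs[1]),
          longest_common_substring(orfs[1], orfs[2]))
```

The unique terminators guarantee that no suffix runs across a sequence boundary and still matches elsewhere, which is why they are inserted before the tree is built.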

cDNA microarrays
Microarrays are used to measure gene expression levels in two different conditions: a green label for the control sample and a red one for the experimental sample.
DNA-cDNA or DNA-mRNA hybridization.
The hybridised microarray is excited by a laser and scanned at the appropriate wavelengths for the red and green dyes.
The amount of fluorescence emitted (intensity) upon laser excitation ~ the amount of mRNA bound to each spot.
If the sample in the control/experimental condition is in abundance → green/red, which indicates the relative amount of transcript for the mRNA (EST) in the samples.
If both are equal → yellow
If neither is present → black

Scanning of microarrays
Confocal laser scanning microscopy Laser beam excites each spot of DNA Amount of fluorescence detected Different lasers used for different wavelengths Cy3 Cy5 laser detection Confocal laser scanning microscopy is used to determine the amount of fluorescently labeled target that has hybridized to the DNA on the microarray. In this process, a laser beam is aimed at each spot on the microarray. The fluorescent light that is emitted upon excitation of the dye passes through a pinhole that effectively eliminates all surrounding light. This condition permits a precise determination of the level of fluorescence coming from the hybridized target at a single spot on the microarray. For competitive hybridization, the microarray is scanned twice, using different wavelengths for each of the fluorescent dyes Cy3 and Cy5.

Analysis of hybridization
Results given as ratios
Images use colors: Cy3 = green, Cy5 = red
Yellow is equal intensity, i.e. no change in expression
Once the levels of fluorescence are determined for each spot, software is used to compare the relative levels for the two dyes. This comparison is usually given as a ratio and is depicted by gradations of color. The Cy3 hybridization is normally shown in green, and the Cy5 hybridization is given as red. These colors are actually pseudocolors generated by the software used to analyze the output from the confocal laser scanning microscope. Thus, when there are equal levels of hybridization, the resulting color is yellow. This condition indicates that there has been no change in the levels of RNA between the two experimental conditions that are being tested.
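The ratio-to-color idea can be illustrated with a small Python sketch. Real analysis software renders continuous color gradations; here the ratio is reduced to four labels for clarity. The two-fold threshold (|log2 ratio| ≥ 1) and the detection floor of 100 intensity units are assumptions made only for illustration.

```python
import math


def log2_ratio(cy5, cy3):
    """log2 of the Cy5 (red) to Cy3 (green) intensity ratio."""
    return math.log2(cy5 / cy3)


def pseudocolor(cy5, cy3, threshold=1.0, floor=100.0):
    """Classify a spot as red, green, yellow or black.

    threshold: assumed |log2 ratio| needed to call a change (1.0 = two-fold).
    floor: assumed intensity below which a channel counts as absent.
    """
    if cy5 < floor and cy3 < floor:
        return "black"  # neither sample hybridized detectably
    ratio = log2_ratio(max(cy5, 1.0), max(cy3, 1.0))  # guard against zero
    if ratio >= threshold:
        return "red"    # more transcript in the Cy5-labeled sample
    if ratio <= -threshold:
        return "green"  # more transcript in the Cy3-labeled sample
    return "yellow"     # roughly equal hybridization


if __name__ == "__main__":
    for cy5, cy3 in [(4000, 1000), (1000, 4000), (1500, 1500), (20, 30)]:
        print(cy5, cy3, pseudocolor(cy5, cy3))
```

Working with the log2 ratio makes a two-fold increase (+1) and a two-fold decrease (−1) symmetric around zero, which the raw ratio does not.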

Example of spotted microarray
RNA from irradiated cells (red)
Compare with untreated cells (green)
Most genes have little change (yellow)
Gene CDKN1A: red = increase in expression
Gene MYC: green = decrease in expression
An experiment performed with spotted cDNA microarrays was the comparison of RNA from cells that had been subjected to radiation (Cy5 = red) with RNA from untreated control cells (Cy3 = green). Most spots on the microarray were yellow, indicating no change in gene expression. The spot with the gene CDKN1A was red, indicating an increase in its expression, while the spot where the myc oncogene was spotted was green, indicating that its expression had decreased.

Visualizing the hybridized target on a microarray can be performed by using either a confocal detector or a charge-coupled device (CCD) camera.
Microarray images produced with a pin-and-loop arrayer. (A) Two common undesirable features are indicated, namely high local background (arrowhead) and scratches (two arrows), which would suggest "flagging" of the associated spots. (B) A close-up of a portion of the array demonstrates the uniformity of relative hybridization within each spot and differences in the red:green ratio of each clone.

Microarray – overview
By Hanne Jarmer, BioCentrum-DTU, Technical University of Denmark
(Figure labels: cDNA labeled with Cy3 (green) or Cy5 (red); probe genes; target.)

What can we learn from the microarray data ?
Microarrays permit an integrated approach to biology, in which genetic regulation can be examined → allows us to build a gene network
Classification of disease, diagnosis, prognostic prediction (judgment of the likely or expected development of a disease) and pharmaceutical applications

Co-expression of gene expression
Co-expressed genes  genes involved in common processes  clustering of genes Examples Genes required for nutrition and stress responses Genes whose products encode components of metabolic pathways Genes encoding subunits of multi-subunit complexes such as the ribosome, the proteasome and the nucleosome are coordinately expressed Ribosome - site of cellular protein synthesis Proteasome - large multi-enzyme complexes that digest proteins Nucleosome – A length of DNA consisting of about 140 base pairs makes two turns around the histone core thus forming a nucleosome. Animation -

Co-expression of gene expression
Waves of co-expressed, temporally regulated genes have been observed during the development of the rat spinal cord.
The expression levels of 112 genes were measured at nine different time points during the development of the rat cervical spinal cord, and those of 70 genes during development and following injury of the hippocampus.

Gene expression profile and phenotype
Profile, or so-called signature: the combination of the mRNAs (representing a subset of the total genotype) being expressed by the cell [Thomas A. Houpt, Nutrition, 827 (2000)]
Can be thought of as a precise molecular definition of the cell in a specific state
An expression profile is a way to describe a phenotype, and can be used to characterize a wide variety of samples
Example: human cancer cell lines treated with agents, independently or in combination, have been used to link drug activity with mode-of-action genes and putative drug targets

Affymetrix GeneChip experiment - Profiling tumors
RNA extracted from four different types of brain tumors
Extracted RNA hybridized to GeneChips containing approximately 6,800 human genes
Identified gene expression profiles specific to each type of tumor
Image portrays gene expression profiles showing differences between four different types of brain tumors
Tumors: MD (medulloblastoma), Mglio (malignant glioma), Rhab (rhabdoid), PNET (primitive neuroectodermal tumor)
Ncer: normal cerebella
Across the top of the image are the four different types of tumors: medulloblastoma (MD), malignant glioma (Mglio), rhabdoid (Rhab), and primitive neuroectodermal (PNET). Genes that had expression patterns specific to each type of tumor are listed along the vertical axis. The colors represent intensities read from the GeneChips, with red being the highest intensity and purple the lowest (shown on the bar at the bottom of the image). The algorithms used to cluster the genes are described in the chapter on bioinformatics. Very clear differences were found in the expression profiles of the four tumor types.

Affymetrix GeneChip experiment - Cancer diagnosis by microarray
For a single type of tumor, medulloblastoma (MD), RNA from 60 different tumor samples was analyzed Response to chemotherapy was known for each of the tumor samples Gene expression differences for MD correlated with response to chemotherapy Patients who failed to respond had a different profile from survivors (who did respond and survived longer) Can use this approach to determine which tumor samples are likely to respond to treatment 60 different samples A more detailed analysis was performed by the same authors for a single type of tumor, medulloblastoma. RNA from 60 different tumor samples was analyzed. The response to chemotherapy was known for each of the tumors. Analysis of the gene expression profiles indicated that it was possible to correlate specific gene expression patterns with response to chemotherapy: Patients who failed to respond had a different expression profile than those who did respond and survived longer. This result is shown in the graph at the top of the image and the clustered expression profiles beneath it. This type of analysis holds the promise that in the future, microarrays could be used to determine which tumors are likely to respond to different treatments.

Microarray data generation, processing and analysis
Two parts Material processing and data collection Information processing Five steps - Material processing and data collection Array fabrication Preparation of the biological samples to be studied Extraction and labeling of the RNA from the samples Hybridization of the labeled extracts to the array Scanning of the hybridized array

Microarray data generation, processing and analysis
Four steps - Information processing
Image quantitation – locating the spots and measuring their fluorescence intensities
Data normalization and integration – construction of the gene expression matrix from sets of spots
Gene expression data analysis and mining – finding differentially expressed genes or clusters of similarly expressed genes
Generation from these analyses of new hypotheses about the underlying biological processes → these in turn should be tested in follow-up experiments
(Figure labels: image analysis, data analysis, clustering)

Microarray data processing and analysis
Microarray experimental raw data (image data) → spot quantitation matrices (row = spot on the array, column = a quantitation of that spot, e.g. mean, median, background) → gene expression matrix → data analysis (clustering or classification; SVD or PCA)

Microarray data processing and analysis
Clustering – unsupervised method, i.e. no prior knowledge about function is assigned to the genes and/or samples
Classification – supervised method, i.e. some prior knowledge about function is assigned to the genes and/or samples
Next, the reverse engineering of gene regulatory networks → based on the hypothesis that genes with similar expression profiles under a variety of conditions are likely to be regulated by common mechanisms
Cluster of genes → the promoter sequences of some of these genes are obtained → they may contain a 'signal', e.g. a specific sequence pattern relevant to gene regulation
Applying different algorithms, different parameters (such as distance measures), or different data filtering methods produces different results. What is happening? It reflects the fact that cells typically carry out multiple processes simultaneously via multiple interacting pathways
Future research directions – data analysis methods, quality or reliability of data → in the next generation of microarrays, each spot is printed or synthesized multiple times → estimate the measurement reliability using the standard deviation between the individual measurements → data mining
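The idea of estimating reliability from replicate spots can be sketched with Python's standard `statistics` module. The gene names and log-ratio values below are made up for illustration, and the sample standard deviation is one reasonable choice of spread measure rather than a prescribed one.

```python
from statistics import mean, stdev


def summarize_replicates(measurements):
    """Mean and sample standard deviation of replicate spot values per gene.

    measurements: dict mapping a gene name to a list of replicate values.
    Genes with a single replicate get a standard deviation of 0.0.
    """
    summary = {}
    for gene, values in measurements.items():
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[gene] = (mean(values), spread)
    return summary


if __name__ == "__main__":
    # Hypothetical log2 ratios from three replicate spots per gene
    replicates = {
        "geneA": [1.9, 2.1, 2.0],   # consistent replicates: reliable
        "geneB": [0.2, 2.5, -1.1],  # scattered replicates: unreliable
    }
    for gene, (avg, spread) in summarize_replicates(replicates).items():
        print(gene, round(avg, 2), round(spread, 2))
```

A gene whose replicates scatter widely can then be flagged or down-weighted before clustering, which reduces the sensitivity of the results to noisy measurements.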

Microarray data management
A microarray database consists of three major parts – the gene expression matrix, gene annotation, and sample annotation
No established standards for microarray experiments or raw data processing
No standard ways of measuring gene expression levels
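
The three parts of a microarray database can be pictured as one small data structure. The Python sketch below is only an illustration: the class MicroarrayDataset, its field names, and all annotation values are hypothetical and do not correspond to any real database schema.

```python
from dataclasses import dataclass, field


@dataclass
class MicroarrayDataset:
    """Hypothetical container for the three parts of a microarray database."""
    genes: list        # row labels of the gene expression matrix
    samples: list      # column labels of the gene expression matrix
    expression: list   # expression[i][j] = level of genes[i] in samples[j]
    gene_annotation: dict = field(default_factory=dict)    # gene -> description
    sample_annotation: dict = field(default_factory=dict)  # sample -> conditions

    def is_consistent(self):
        """True if the matrix shape matches the gene and sample labels."""
        return len(self.expression) == len(self.genes) and all(
            len(row) == len(self.samples) for row in self.expression
        )

    def value(self, gene, sample):
        """Expression level of one gene in one sample."""
        return self.expression[self.genes.index(gene)][self.samples.index(sample)]


# Made-up example data.
dataset = MicroarrayDataset(
    genes=["geneA", "geneB"],
    samples=["control", "treated"],
    expression=[[0.1, 1.8], [-0.2, -1.5]],
    gene_annotation={
        "geneA": {"description": "hypothetical kinase"},
        "geneB": {"description": "hypothetical transporter"},
    },
    sample_annotation={
        "control": {"treatment": "none"},
        "treated": {"treatment": "heat shock"},
    },
)

print(dataset.is_consistent(), dataset.value("geneB", "treated"))
```

Keeping the annotations next to the matrix is what makes the numbers interpretable later: a value means little without knowing which gene and which sample it belongs to.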

Microarray data management
The Microarray Gene Expression Data Society (MGED) has developed recommendations for the Minimum Information About a Microarray Experiment (MIAME), which attempt to define the set of information sufficient to interpret the experiment and its results unambiguously, and to enable verification of the data
MIAME is a set of guidelines for describing an experiment; the guidelines are translated into protocols enabling the electronic exchange of data in a standard format
The MIAME standard has been adopted and supported by the EBI ArrayExpress database, NCBI GEO, and the CIBEX database at the DDBJ
Members of MGED joined with Rosetta Inpharmatics, which led to the development of the microarray gene expression object model (MAGE-OM) and an XML-based markup language (MAGE-ML)
MAGE is now built into a wide range of freely available software, including BASE, BioConductor, and TM4.

Microarray image processing
Labeled probe → fluorescence intensity is transformed into transcript abundance → most of these steps are done by software provided with commercial scanners
Image processing essentially involves four steps: (1) image acquisition, (2) spot location, (3) computation of spot intensities, and (4) data reporting
(1) image acquisition
Raw image of a microarray scan → a 16-bit image file of the intensity of fluorescence associated with each pixel → a number between 0 and 65,535 (i.e. 2^16 - 1)
Higher resolution uses a 32-bit image file (i.e. 0 to about 4 × 10^9)
However, the sources of experimental error are greater than the image resolution!
Gain on the laser – if the gain is too high → high-intensity spots converge on the same upper value (saturation); if the gain is too low → information at the low end is lost in the background
Dyes (Cy3 and Cy5) quench (fade) with time, and at different rates, so it is not a good idea to repeatedly scan the same array
(2) spot location
Spot location → achieved by laying a grid over the image that places a square or circle around each spot
There are always imperfections in the spacing of spots → spots must be re-centered by deforming the grid so as to maximize the coverage of the spots by the circles
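
The pixel-depth arithmetic and the effect of too high a gain can be checked with a short Python sketch. The function names and the example pixel values are hypothetical; the sketch only assumes that a saturated pixel is one sitting at the largest value the bit depth can hold.

```python
def max_pixel_value(bits):
    """Largest intensity an unsigned image of the given bit depth can store."""
    return 2 ** bits - 1


def saturated_fraction(pixels, bits=16):
    """Fraction of pixels stuck at the upper limit of the intensity range."""
    top = max_pixel_value(bits)
    return sum(1 for p in pixels if p >= top) / len(pixels)


print(max_pixel_value(16))  # upper limit of a 16-bit image
print(max_pixel_value(32))  # upper limit of a 32-bit image, about 4 x 10^9

# Made-up pixel values from one spot scanned with too high a gain.
spot_pixels = [65535, 65535, 1200, 800]
print(saturated_fraction(spot_pixels))
```

A spot with a large saturated fraction carries little quantitative information, because all of its brightest pixels report the same upper value regardless of the true signal.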

Microarray image processing
(3) computation of spot intensities
Spot intensity = mean intensity of the pixels within the circle surrounding a spot, minus the mean intensity of the background pixels immediately surrounding the spot
(4) data reporting
Data are usually reported as a tab-delimited text file
Linkage of the data to genome databases
Microarray data or protocols are built on XML-based languages that allow storage in and retrieval from public databases
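
The spot intensity rule in step (3) can be sketched in Python as follows. This is a minimal illustration, not the algorithm of any particular scanner software: the image is a plain list of rows, the spot is assumed to be a circle with a known centre and radius, and the background is taken from a ring of hypothetical width bg_width just outside that circle.

```python
def spot_intensity(image, cx, cy, radius, bg_width=2):
    """Mean foreground intensity minus mean local background intensity.

    Foreground: pixels within `radius` of the spot centre (cx, cy).
    Background: pixels in the ring between `radius` and `radius + bg_width`.
    """
    foreground, background = [], []
    outer = radius + bg_width
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= radius ** 2:
                foreground.append(value)
            elif d2 <= outer ** 2:
                background.append(value)
    return sum(foreground) / len(foreground) - sum(background) / len(background)


# Made-up 9 x 9 image: a spot of intensity 500 on a flat background of 100.
size, centre, radius = 9, 4, 2
image = [
    [500 if (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2 else 100
     for x in range(size)]
    for y in range(size)
]

print(spot_intensity(image, centre, centre, radius))
```

Using only the pixels immediately around the spot as background, rather than a single global value, accounts for background that varies across the slide.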

A comparison between cDNA and oligonucleotide arrays
cDNA arrays | Oligonucleotide arrays
Long sequences | Short sequences, due to the limitations of the synthesis technology
Two-color array platforms | Single-color array platforms such as Affymetrix GeneChips™
Spot small DNA sequences, whole genes or arbitrary PCR products | Spot known sequences
More variability in the system | More reliable data
Easier to analyze with appropriate experimental design, but the choice of direct comparisons on each chip may limit the feasibility of other comparisons | More difficult to analyze. All comparisons are inferred, in the sense that different chips are used for each measurement. As a result, chip-to-chip variation can lead to errors in any comparison
Regardless of the choice of platform, one of the most significant aspects of experimental design is determining the level of replication that is necessary to achieve significance in any study.
Two general types of replicates:
(1) biological replicates – even inbred strains of a species held under the same conditions can exhibit fairly significant inter-individual variation in gene expression
(2) technical replicates – repeated measurements of the same samples
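
The difference between the two types of replicates can be made concrete with a short Python sketch that compares the spread within technical replicates to the spread between biological replicates. The numbers are invented log2 expression values for a single gene, and the names measurements and replicate_summary are hypothetical.

```python
import statistics

# Made-up log2 expression values for one gene.
# Each key is one animal (biological replicate); each list holds
# repeated measurements of that animal's sample (technical replicates).
measurements = {
    "animal_1": [8.1, 8.3, 8.2],
    "animal_2": [9.0, 9.2, 9.1],
    "animal_3": [7.6, 7.4, 7.5],
}


def replicate_summary(data):
    """Per-sample means, technical SD per sample, and SD between sample means."""
    sample_means = {name: statistics.mean(values) for name, values in data.items()}
    technical_sd = {name: statistics.stdev(values) for name, values in data.items()}
    biological_sd = statistics.stdev(list(sample_means.values()))
    return sample_means, technical_sd, biological_sd


sample_means, technical_sd, biological_sd = replicate_summary(measurements)
print("technical SD per sample:", technical_sd)
print("biological SD between samples:", biological_sd)
```

In this invented example the variation between animals is much larger than the measurement noise, which is the situation in which adding biological replicates matters more than repeating the same hybridization.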

Patterns of gene expression
Deduce gene function based on patterns of expression
– Infer gene function by monitoring changes in expression resulting from experimental perturbations.
– Disadvantage: even simple changes can often produce a large number of transcriptional changes, and these may be difficult to link to the underlying biological perturbation.
Use patterns of gene expression as a biomarker to classify samples
– Search for genes exhibiting patterns of expression that differentiate the various groups.
– If the transcriptional differences between groups can be validated, these expression patterns can then be used as "biomarkers" in classifying other experimental subjects.
– In applications such as these, it is not essential that the genes themselves be linked causally to the underlying disease or other phenomenon that separates the classes.
Gene function
– Functional studies and searches for biomarkers are not mutually exclusive. Ultimately, the most useful and informative biomarkers are likely those that can be linked causally to a disease or outcome.
– Northern blotting experiments are generally used to test a hypothesis based on biology. Microarrays generate hypotheses that should be tested to validate them.

Summary
A comparison between cDNA and oligonucleotide arrays
Patterns of gene expression: deducing gene function based on patterns of expression vs. using patterns of gene expression as a biomarker to classify samples

Hyperlink to the National Library thesis database (國家圖書館全國博碩士論文資訊網), http://etds. ncl
(1) Register an account first.
(2) Search by keyword, such as "microarray".
(3) Look only for theses with a full-text PDF (a master's thesis is required; a doctoral dissertation is also acceptable), then download the file.
(4) Sort by year of publication, starting from the most recent.
(5) Select one thesis to report; each student takes a different thesis.
(6) Prepare a PPT file for your presentation, covering the background of the research, a brief description of the methods, the important findings, and the conclusion.
