Steps
1. Grow cells or isolate tissue (brain, liver, muscle)
2. Isolate total RNA
3. Isolate mRNA from total RNA (poly A+ select)
4. Fragment RNA
5. Make and amplify cDNA
6. Sequence the ends of the cDNA
7. Map the sequences to the human genome
8. Count the number of sequence tags at each known gene (and at locations for which no gene is known)
9. Correct for background
10. Analyze the data
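The counting step can be illustrated with a minimal sketch. It assumes the reads have already been mapped to (chromosome, position) pairs and that gene annotations are non-overlapping intervals. The gene names, coordinates, and the `no_gene` bucket are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical gene annotations: (chromosome, start, end, gene name).
# Coordinates are 1-based and inclusive; genes are assumed not to overlap.
GENES = [
    ("chr1", 100, 500, "GENE_A"),
    ("chr1", 800, 1200, "GENE_B"),
    ("chr2", 50, 400, "GENE_C"),
]


def build_index(genes):
    """Group genes by chromosome, sorted by start, with a parallel list of starts."""
    by_chrom = {}
    for chrom, start, end, name in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _, _ in intervals], intervals)
    return index


def count_tags(mapped_reads, genes):
    """Count mapped read positions per gene; hits outside any gene go to 'no_gene'."""
    index = build_index(genes)
    counts = Counter()
    for chrom, pos in mapped_reads:
        starts, intervals = index.get(chrom, ([], []))
        # Find the last gene whose start is <= pos, then check its end.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= intervals[i][1]:
            counts[intervals[i][2]] += 1
        else:
            counts["no_gene"] += 1
    return counts


if __name__ == "__main__":
    reads = [("chr1", 150), ("chr1", 499), ("chr1", 600), ("chr1", 900), ("chr2", 50)]
    print(dict(count_tags(reads, GENES)))
```

Keeping a separate `no_gene` tally reflects the step's note that tags are also counted at locations for which no gene is known. A real pipeline would also have to handle overlapping genes, spliced reads, and reads that map to more than one place.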
PROCEDURE - Solexa - Cluster Amplification
1. Load Samples to Flow Cell
Samples are loaded into the 8 lanes of the flow cell for simultaneous analysis.
2. Attach DNA to Surface
Single-stranded DNA fragments bind randomly to the inside surface of the flow cell.
3. Bridge Amplification
Unlabeled nucleotides and enzyme are added to initiate solid-phase bridge amplification.
PROCEDURE - Solexa - Cluster Amplification
4. Fragments Become Double Stranded
The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate.
5. Double Stranded Molecules are Denatured
Denaturation leaves single-stranded template anchored to the substrate.
6. Amplification is Completed
Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell.
PROCEDURE - Solexa Sequencing & Genome Analyzer
1. Determine 1st Base
The first sequencing cycle is initiated by adding all 4 labeled reversible terminators, primers, and DNA polymerase to the flow cell.
2. Image 1st Base
After laser excitation, an image of the emitted fluorescence from each cluster on the flow cell is captured.
3. Determine 2nd Base
The 2nd sequencing cycle is initiated by adding all 4 labeled reversible terminators and enzymes.
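The imaging step yields, for each cluster and each cycle, one fluorescence measurement per labeled terminator. A toy sketch of turning those measurements into base calls is shown here. It assumes one intensity value per base in the fixed order A, C, G, T and simply picks the brightest channel. The channel order and the intensity values are hypothetical. Production base callers also correct for effects such as channel crosstalk and phasing, which this sketch ignores.

```python
# Assumed channel order for the four labeled terminators (illustrative only).
CHANNELS = "ACGT"


def call_base(intensities):
    """Return the base whose channel is brightest for one cluster in one cycle."""
    if len(intensities) != len(CHANNELS):
        raise ValueError("expected one intensity per channel")
    brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    return CHANNELS[brightest]


def call_read(cycles):
    """Concatenate one base call per sequencing cycle into a read."""
    return "".join(call_base(c) for c in cycles)


if __name__ == "__main__":
    # Hypothetical intensities for one cluster over three cycles.
    cluster_cycles = [
        (0.90, 0.10, 0.05, 0.10),
        (0.10, 0.05, 0.80, 0.20),
        (0.20, 0.10, 0.10, 0.70),
    ]
    print(call_read(cluster_cycles))
```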
PROCEDURE - Solexa Sequencing & Genome Analyzer
4. Image 2nd Base
After laser excitation, image data is collected as before. The identity of the 2nd base for each cluster is recorded.
5. Sequence Read Continues Over Multiple Chemistry Cycles
The sequencing cycle is repeated 35 times to determine the sequence of bases in a given fragment, a single base at a time.
6. Align and Map Data
Align data and map the sequences to the reference genome.
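As a rough illustration of the align-and-map step, the sketch below looks up short reads in a reference by exact match on either strand. It assumes all reads have the same length `k` and contain no sequencing errors. The function names and the toy reference sequence are hypothetical. Real short-read aligners tolerate mismatches and use far more compact indexes.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, index):
    """Return (position, strand) for every exact match of the read or its reverse complement."""
    hits = [(pos, "+") for pos in index.get(read, [])]
    hits += [(pos, "-") for pos in index.get(reverse_complement(read), [])]
    return hits


if __name__ == "__main__":
    reference = "ACGTTGCATGCCGATAGGCTAACG"  # toy reference, not real genome sequence
    k = 8  # toy read length; the run described above would give 35
    index = build_kmer_index(reference, k)
    forward_read = reference[4:12]
    reverse_read = reverse_complement(reference[10:18])
    for read in (forward_read, reverse_read, "AAAAAAAA"):
        print(read, map_read(read, index))
```

A read with exactly one hit maps uniquely, a read with several hits is ambiguous, and an empty list means the read is unmapped. How ambiguous reads are counted is a separate design decision.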
What are the similarities and differences?
What can you learn with each one?
– What can you learn from one but not from the other?
How is the primary data acquired?
How are systematic biases eliminated?
– How do you normalize?
How would you look for differential expression?
How would you cluster?
How can you combine data from multiple experiments?
– Which is more sensitive?
What kinds of additional software do you need?
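For sequence-count data, one possible answer to the normalization question is the reads per kilobase per million mapped reads (RPKM) measure from Mortazavi et al. (2008), which divides a gene's read count by its length and by the sequencing depth. The sketch below computes RPKM and, as a naive first look at differential expression, a log2 fold change between two samples. The pseudocount and all example numbers are assumptions made for illustration, not values from the paper.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(rpkm_a, rpkm_b, pseudocount=1.0):
    """log2 ratio of sample B to sample A; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((rpkm_b + pseudocount) / (rpkm_a + pseudocount))


if __name__ == "__main__":
    # Hypothetical counts for one 2,000 bp gene in two samples of different depth.
    sample_a = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
    sample_b = rpkm(read_count=3000, gene_length_bp=2000, total_mapped_reads=20_000_000)
    print(sample_a, sample_b, log2_fold_change(sample_a, sample_b))
```

A fold change on its own says nothing about significance. Deciding whether a gene is differentially expressed would also need replicates and a statistical model of count variability.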
References
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008) 5: 621.
http://www.illumina.com/pages.ilmn?ID=203