DNA SEQUENCING: determining the precise sequence of nucleotides in a DNA sample. Two methods were published in 1977: Maxam/Gilbert (USA), the chemical cleavage protocol, and Sanger (England), the dideoxy or chain termination method, which is based on the natural process of DNA replication. * A dideoxynucleotide lacks the 3'-OH group, so a phosphodiester bond cannot form with the incoming nucleotide, leading to termination of DNA synthesis.
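The logic of chain termination can be illustrated with a minimal Python sketch. It assumes an idealized experiment in which every possible terminated fragment is produced and resolved by length; the function names `termination_fragments` and `read_sequence` are hypothetical and only serve this illustration.

```python
def termination_fragments(new_strand):
    """Return, for each ddNTP reaction, the fragment lengths at which synthesis stops.

    Idealized assumption: a terminated fragment exists for every position.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length ends in the dideoxy form of `base`.
        lanes[base].append(length)
    return lanes


def read_sequence(lanes):
    """Read the sequence back by ordering all fragments from shortest to longest."""
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_length)


lanes = termination_fragments("GATTACA")
print(lanes)                 # fragment lengths per terminating base
print(read_sequence(lanes))  # GATTACA
```

Sorting fragments by length plays the role of the gel or capillary: the terminating base of each successively longer fragment gives the next nucleotide of the newly synthesized strand.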
Chromatogram: Coloured peaks corresponding to specific nucleotides in specific locations
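As a rough illustration of how a chromatogram is read, the sketch below picks the strongest signal at each peak position. The intensity values are made up for the example, and real base-calling software also has to detect peaks and estimate quality, which is not shown here.

```python
def call_bases(trace):
    """Call one base per peak by choosing the channel with the highest intensity.

    `trace` is a list of dicts, one per peak position, mapping base -> intensity.
    """
    return "".join(max(peak, key=peak.get) for peak in trace)


# Hypothetical intensities for three peak positions (not real data).
example_trace = [
    {"A": 950, "C": 40, "G": 30, "T": 25},
    {"A": 35, "C": 20, "G": 880, "T": 60},
    {"A": 15, "C": 45, "G": 50, "T": 910},
]
print(call_bases(example_trace))  # AGT
```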
Expressed Sequence Tags (ESTs): ESTs are short DNA sequences (usually 200 to 500 nt) that are generated by sequencing one or both ends of an expressed gene. These small strands of DNA represent genes expressed in certain cells from different organisms. They are used as "tags" to fish a gene out of a portion of chromosomal DNA by matching base pairs. Challenge of gene identification: it depends on genome size (number of introns/exons). Isolating mRNA is key to identifying expressed genes. However, mRNA is very unstable outside the cell, so before working with its sequence the mRNA is converted to cDNA (complementary DNA), which is much more stable. cDNA is synthesized by reverse transcribing the mRNA. Since cDNA is made directly from mRNA, it represents ONLY expressed DNA sequence (the gene!).
Reverse Transcriptase: Reverse transcriptase is the common name for an enzyme that functions as an RNA-dependent DNA polymerase. These enzymes are encoded by retroviruses, where they copy the viral RNA genome into DNA prior to its integration into the host cell genome. Reverse transcriptases have 2 main functions:
1. DNA polymerase activity: needs a primer to initiate synthesis (oligo(dT)).
2. RNase H activity: RNase H is a ribonuclease that degrades the RNA from RNA-DNA hybrids, such as those formed during reverse transcription of an RNA template.
All retroviruses have a reverse transcriptase, but the enzymes that are available commercially are derived from one of two retroviruses:
* Moloney murine leukemia virus: a single polypeptide
* Avian myeloblastosis virus: composed of two peptide chains
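The polymerase step can be pictured with a small Python sketch: the first cDNA strand is the reverse complement of the mRNA, written in the DNA alphabet. The function name `first_strand_cdna` and the toy mRNA are hypothetical; the sketch assumes a complete copy of the template and ignores the RNase H step.

```python
# Base pairing used when copying RNA into DNA.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (written 5'->3') for an mRNA given 5'->3'."""
    # Synthesis starts at the 3' end of the mRNA, so the template is read in reverse.
    return "".join(COMPLEMENT[base] for base in reversed(mrna.upper()))


# Toy mRNA ending in a short poly(A) tail (made up for illustration).
print(first_strand_cdna("AUGGCCAAAA"))  # TTTTGGCCAT
```

The cDNA begins with a run of T's because an oligo(dT) primer anneals to the poly(A) tail at the 3' end of the mRNA.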
Use T7 sequencing primer. Your plasmid will be sent for sequencing this week. I will analyze each EST sequence using the database prior to submission. Sequence should be posted by Tues, Feb 6. Your assignment 1 is due Feb 19, giving you sufficient time to finish it. [Figure labels: EcoRI, HindIII]
Until your sequence comes in, you may use the sample sequence posted on the course website to practice how to navigate your way around the NCBI website.
Results and Assignments: Results are handed in with the assignments (see course website for details). Although the assignments rely on group results, each student will hand in their own results and assignment. I cannot help you with the assignments. Why? They are essentially your “tests”, since there are no midterm exams. Results and Assignment 1 is the most time consuming, so start the assignment as soon as your EST sequence is uploaded onto the course website. Meanwhile, I recommend that you do the following:
- Learn how to use and explore the Honeybee Database using the sample sequence
- There is extensive on-line help on how to use this site
- You must learn to do this independently
Page 1: Picture of your agarose gel from experiment 1
- You must LABEL your lane very clearly. Include a scientific figure legend and comments about the gel.
Page 2: Printout of the ‘Mock Sequence Submission’
Pages 3 and 4: Analysis of the results (2 pages only)
- 12 point font
- 1.5 line spacing
- 1 inch margins
- Figures included in the 2 pages
This must be individual work; no collaboration. You will use the NCBI website (http://www.ncbi.nlm.nih.gov/) to do this assignment.
PURPOSE OF ASSIGNMENT 1: To test your ability to learn how to use an electronic database available for genetic research (the Honeybee Genome Database).
Assignment 1 Format (Cont'd)
- Incorporate answers to all questions into 2 pages.
- Figures should be useful and kept to a minimal size.
6. To get full marks you will have to explore the web for at least 4 other important pieces of information about the gene that you have not been asked about. For example, what do we know about the role of the gene in other organisms? This is just one example; you don't have to use it.
[Figure examples, captioned: "Not a figure" and "Good figure"]
What is NCBI? The National Center for Biotechnology Information was established in 1988 as a national resource for molecular biology information. It creates public databases, conducts research in computational biology, and develops software tools for analyzing genome data. The goal is for this information to ultimately lead to a better understanding of the molecular processes affecting human health and disease.
EST Sequence Submission and Analysis
1. Mock Sequence Submission to GenBank: GenBank is a large database kept by the US government which includes nucleotide sequences submitted by scientists worldwide. Using Sequin to submit data to GenBank:
- Sequin is software developed by NCBI to submit and update sequences to GenBank, EMBL and DDBJ.
- The goal of Sequin is to convert raw sequence data to an assembled record that can be viewed, edited and submitted to databases.
- Data is normally submitted in FASTA format.
2. Analysis of EST sequence: BLAST (Basic Local Alignment Search Tool) compares a query nucleotide (or protein) sequence to a database of known sequences in order to determine similarity to previously published sequences. If the sequence has not yet been published, similarity to known sequences provides insight into the function of the DNA or protein.
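To make "local alignment" concrete, here is a minimal Python sketch of the Smith-Waterman algorithm, which finds the best-scoring matching region between two sequences. This is not how BLAST is implemented: BLAST uses a much faster heuristic, and the scoring values below (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def smith_waterman(query, subject, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned query, aligned subject) for the best local alignment."""
    rows, cols = len(query) + 1, len(subject) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair_score(i, j):
        return match if query[i - 1] == subject[j - 1] else mismatch

    # Fill the scoring matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned_q, aligned_s = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned_q.append(query[i - 1])
            aligned_s.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_s))


print(smith_waterman("ACGT", "TTACGTTT"))  # (8, 'ACGT', 'ACGT')
```

A database search repeats this kind of comparison against many known sequences and reports the highest-scoring hits.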
BLAST Search blastn compares your query nucleotide sequence with database nucleotide sequences. blastx translates your query sequence into amino acids and then compares the protein sequences with protein databases.
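The translation step behind blastx can be sketched in Python as a six-frame translation: three reading frames on each strand. The sketch assumes the standard genetic code; the helper names are hypothetical and the code is not taken from BLAST itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3 ('*' marks a stop codon)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translations("ATGGCC"))
```

Each of the six protein sequences can then be compared against a protein database, which is useful for an EST because its reading frame and strand are usually not known in advance.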
FASTA is a DNA and protein sequence alignment software package first described in 1988. It stems from the original FASTP (protein vs. all). The FASTA program can be used to search protein or DNA sequence databases, and can compare a protein sequence to a DNA sequence database by translating the DNA database as it is searched. The program follows a largely heuristic method, which contributes to its high speed of execution. The FASTA format, named after the program, allows sequence names and comments to precede the sequences. Example (FASTA format):
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNL
VEWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDF
LGLLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
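Reading a FASTA file takes only a few lines of code. The Python sketch below assumes the common convention that each record starts with a ">" header line followed by one or more sequence lines; the function name `parse_fasta` and the sample record are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without line breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only.
sample = """>example_est hypothetical sample record
ATGGCCAAGT
TTGACC
"""
print(parse_fasta(sample))
# [('example_est hypothetical sample record', 'ATGGCCAAGTTTGACC')]
```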
Sequence uploaded on course website. What does this sequence represent?