Presentation is loading. Please wait.

Presentation is loading. Please wait.

TEMPLATE DESIGN © 2008 www.PosterPresentations.com BIOINFORMATICS REFERENCES Your name and the names of the people who have contributed to this presentation.

Similar presentations


Presentation on theme: "TEMPLATE DESIGN © 2008 www.PosterPresentations.com BIOINFORMATICS REFERENCES Your name and the names of the people who have contributed to this presentation."— Presentation transcript:

1 TEMPLATE DESIGN © BIOINFORMATICS REFERENCES Your name and the names of the people who have contributed to this presentation go here. The names and addresses of the associated institutions go here. BIOLOGICAL SEQUENCE B IOINFORMATICS is about searching biological databases, comparing sequences, looking at protein structures, and (more generally) asking biological and biomedical questions with a computer. It is the computational branch of molecular biology. ANALYZING PROTEIN SEQUENCE  Steak eating familiarizes you with protein  Proteins are found in both fish and vegetables  They are made up of the same basic building blocks known as Amino Acids – these are complex organic molecules, called carbon, hydrogen, oxygen, nitrogen, and sulfur atoms. PROTEINS Proteins are like small machines in the cell. Proteins carry out most of the work in a cell. Proteins are synthesized from RNA sequences. Proteins are like small machines in the cell. Proteins carry out most of the work in a cell. Proteins are synthesized from RNA sequences. AMINO ACIDS Proteins are made of 20 amino acids. Each amino acid is small molecule made up of fewer than 100 atoms. The 20 amino acids have similar terminations; they can be chained to one another like Lego bricks. PROTEIN SEQUENCES Proteins are made of amino acids chained by peptide bonds. Protein sequences are written from the N to the C-terminus. Your average protein is 400 amino acids long. The longest protein is 30,000 amino acids long. Proteins have well-defined 3-dimensional structures. Hydrophobic amino acids are in the protein’s core. Hydrophilic amino acids are on the protein’s surface. PROTEIN STRUCTURES Proteins have well-defined 3-dimensional structures. Hydrophobic amino acids are in the protein’s core. Hydrophilic amino acids are on the protein’s surface. DNA: DeoxyriboNucleic Acid Genomes and genes are made of DNA DNA is the main support of heredity DNA SEQUENCES DNA sequences are made of 4 nucleotides Adenine A Guanine G Cytosine C ThymineT DNA Sequences can be very long Human chromosomes contain hundreds of millions of nucleotides NUCLEOTIDES Nucleotides have similar terminations. Nucleotides are meant to be chained like Lego bricks. Nucleotides can interact with each other: Adenine with thymine (A with T) Guanine with cytosine (G with C) A tiny bacterium can contain a genome of several million nucleotides DOUBLE-STRAND DNA DNA sequences always come in two strands. The strands are complementary and opposite in orientation. By convention, biologists write only the 5’ and 3’ strands. Database-search programs search both strands automatically. RNA: Ribonucleic Acid RNA is a close relative of DNA RNA has many functions Provides coding for proteins Helps synthesize proteins Helps many basic processes in the cell RNA is not very stable RNA is synthesized and very often degraded DNA, by contrast, is very stable THE RNA SEQUENCE RNA contains 4 nucleotides: A, G, C, U U is Uracil RNA does not contain Thymine (T) Uracil replaces Thymine in RNA RNA is single-stranded RNA SECONDARY STRUCTURES RNA can make secondary structures RNA can make 1 strand with itself as a secondary structure Secondary structures are made of stems and loops PUBMED/MEDLINE MULTIPLE-SEQUENCE ALIGNMENTS (MSAS) RETRIEVING PROTEIN SEQUENCES IN SWISS-PROT TYPICAL PROKARYOTIC GENOME GENBANK EXPLORING THE HUMAN GENOME WITH ENSEMBL OPTIONAL LOGO HERE TURNING DNA INTO PROTEINS: THE GENETIC CODE DNA gets transcribed into RNA using nucleotide complimentarily. RNA gets translated into proteins using the genetic code: UCU UAU GCG UAA SER-TYR-ALA-STOP PubMed is a database containing all the recent scientific publications in biology PubMed is free You can search PubMed using any keyword you are interested in. Open Type your favorite keywords Press Return or Enter Click the Limits tab Check the boxes you are interested in, such as Review English AIDS Restrict the search with fields [AU] Author [SO] Source (journal) [TI] Title [AD]Address [MH]Keywords The words will be searched only in the corresponding fields Medline contains only papers published after 1965 Use no more than 10 names for papers before 1995 Swiss-Prot is a database containing all the proteins with known functions Swiss-Prot is available from the ExPAsy server at ExPASy: Expert Protein Analysis System ExPASy contains many useful online tools Each Swiss-Prot entry is dedicated to a protein A Swiss-Prot entry summarizes everything that is known about a given protein The entry contains functional information and links to other databases mentioning this protein LOOKING FOR DNA SEQUENCES There are many types of DNA sequences The most common are Regulatory regions, often before genes Untranslated regions, often around the genes Protein-coding regions Intergenic regions (between the genes) All these sequences can be found in GenBank FETCHING A DNA SEQUENCE AT THE NCBI Navigate to nk/ nk/ Type in a keyword. Press Return or Enter. You get a list of entries matching your keyword. Point, click, and explore… Multiple alignments reveal common features between sequences Multiple alignments are useful for :- C omparing very different sequences, Making phylogenetic trees, Making structure predictions Multiple-sequence alignments are abbreviated as MSAs MAKING AN MSA WITH M-COFFEE Open Click MCoffee::Regular Cut and paste your sequences Submit your MSA MAKING SENSE OF YOUR MSA Positions are marked: Completely conserved = asterisk ( * ) Highly conserved = colon (:) Conserved = period (.) Look for highly conserved blocks: The red box on this slide shows a highly conserved block. These blocks are often functionally important positions. PROKARYOTIC ORGANISMS - are organisms lacking a true nucleus. EUKSRYOTIC ORGANISMS - are organisms having a true nucleus. GENE – is defined as the contiguous genome segment encompassing all the nucleotide-sequence information necessary to bring about its successful expression – that is, the production of protein or RNA. The 3 most basic classes of living organism are the - PROKARYOTES – such as bacteria, ARCHAEA – these are bacteria-like organisms living in extreme conditions), and THE EUKARYOTES – going from microscopic yeast to humans, animals, and plants. FOR BIOINFORMATICS – Prokaryotes and Achaea are very much the same – with few exceptions. TYPICAL PROKARYOTIC PROTEIN - CODING GENE The gene has an uninterrupted sequence Prokaryotic mRNA contains The Ribosome Binding Site (RBS) The Open Reading Frame (ORF) in one piece In operons, the RNA can contain several ORFs Eukaryotes can be small (yeast) or big (whales) Genomes are made of linear pieces of DNA called chromosomes One chromosome: 10 to 700 Mb The Human Genome Contains 22+1 chromosomes Is 3 Gb long One gene every 100 Kb (human) 5 % of the genome is coding for proteins Prokaryotes Genome=one large circular chromosome + a few small circular chromosomes (plasmides) 0.5 to 8 Mb / chromosome Genes in one piece 70% of the genome is coding 1 gene / Kb Eukaryotes Genome= many large linear chromosomes 10 to 700 Mb / chromosome Genes split 5% of the genome is coding 1 gene/ 100 Kb (Human) PROKARYOTES VS. EUKARYOTES Housed by the National Center for Biotechnologies (NCBI) GenBank is the memory of biological science Contains EVERY DNA sequence ever published GenBank is the original information source for most biological databases GenBank is more complicated to use than gene-centric databases ACCESSION is the accession number Unique to each entry Permanent LOCUS contains information on gene size ORGANISM Defines the organism containing the gene REFERENCE indicates who produced the sequence FEATURES lists some functional features of the gene GenBank entries can contain more than one gene READING A PROKARYOTIC GENBANK ENTRY Accessible at ENSEMBL is a database of eukaryotic genomes Annotated entries Wide range of examples: human, mouse, dog, and so on ENSEMBL annotation is mostly automated ENSEMBL contains tools to Browse the complete genome Search the complete genome with BLAST Visualize the position of a gene Visualize all experimental information on this gene (transcripts) By pointing on a chromosome region you can zoom inside the chromosome All genes are cross-indexed with databases so you can find all related experimental information


Download ppt "TEMPLATE DESIGN © 2008 www.PosterPresentations.com BIOINFORMATICS REFERENCES Your name and the names of the people who have contributed to this presentation."

Similar presentations


Ads by Google