Online (DNA and Amino Acid) Sequence Information: Lecture 7


Bioinformatics Databases
Biological data generated by various labs is submitted to, and stored in, specific databases. The data can be:
– Nucleotide sequences: DNA and mRNA (cDNA)
– Protein sequences
The main nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: the EMBL Nucleotide Sequence Database (EMBL)
– Japan: the DNA Data Bank of Japan (DDBJ)
These databases also contain sequences related to:
– Expressed sequence tags (ESTs): short (~800 bp) sequences derived from mRNA that can be used to see which genes are expressed…

Protein Databases
The main protein database is UniProt, which combines data from three related databases:
– Swiss-Prot (manually curated, most up-to-date information)
– TrEMBL (computer translations of coding sequences)
– PIR (Protein Information Resource)
Both the nucleotide and protein databases contain much more detail than just sequences; this additional information is referred to as gene annotation data.

The Annotation of Genes
Once a gene sequence has been determined, the data must be annotated. This basic annotation includes (Klug 2010):
– Regulatory regions
– Coding sequences (CDS), i.e. the exons/introns (if the sequence is eukaryotic)…
– The amino acid sequence encoded by the gene
– Other organisms in which the DNA or amino acid sequence is found
– References to the journals the data came from
– Links to other databases that contain information about the gene
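To illustrate how an annotated CDS leads to the amino acid sequence, here is a minimal sketch. It assumes a simple GenBank-style location string such as join(1..6,12..17) (1-based, inclusive exon coordinates) and the standard genetic code. The helper names splice_cds and translate and the toy sequence are hypothetical. Real records also use complement(), fuzzy ends and other translation tables, none of which this sketch handles.

```python
import re

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice_cds(genomic, location):
    """Join the exon intervals named in a simple location such as 'join(1..6,12..17)'."""
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    # Convert 1-based inclusive coordinates to Python slices and concatenate the exons.
    return "".join(genomic[int(start) - 1:int(end)] for start, end in spans)


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")  # X marks an unrecognised codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene: exon 1 (ATGGCC), intron (GTAAG), exon 2 (TTTTAA).
genomic = "ATGGCC" + "GTAAG" + "TTTTAA"
cds = splice_cds(genomic, "join(1..6,12..17)")
print(cds, translate(cds))  # ATGGCCTTTTAA MAF
```

The intron is simply skipped because its coordinates are not listed in the join, which mirrors how the exon/intron structure in a record determines the translated protein.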

Bioinformatics Databases: Search Engines
To facilitate finding annotated data about genes and proteins, a number of sites provide specific search engines:
– NCBI has Entrez
– EMBL has the EBI search page (previously the SRS engine)
– The SIB has the ExPASy search engine (this is more focused on protein-related information)
Consider the following query:
– What is the DNA and amino acid sequence of the human BTEB gene?
– Type the following into the search text box:
– Human[organism] AND BTEB[title]
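The same Entrez query can also be sent programmatically through NCBI's E-utilities. The sketch below only builds the esearch URL. It assumes the nucleotide database and a result limit of 20; the function name entrez_search_url is hypothetical. Opening the URL (for example with urllib.request.urlopen) should return an XML list of matching record IDs.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint: the programmatic interface to Entrez searches.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_search_url(term, db="nucleotide", retmax=20):
    """Build an esearch URL for an Entrez query string (field tags in square brackets)."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes the brackets and turns spaces into '+'.
    return ESEARCH_URL + "?" + urlencode(params)


url = entrez_search_url("Human[organism] AND BTEB[title]")
print(url)
```

Building the URL separately from fetching it keeps the query easy to inspect and reuse in a browser.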

BTEB NCBI Nucleotide Record

Coding Section of the Gene
The exon/intron structure is also available in graphic form.

Further Information
The right-hand column contains links to online analytical resources, e.g. BLAST (including PSI-BLAST), a tool that searches the database for similar sequences.
The record also gives the amino acid sequence obtained from the CDS of the gene, and the text box provides a link to information on the protein in the UniProt database.
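BLAST's first step is to find short "words" that the query shares with database sequences. The sketch below shows only that seeding idea for exact k-letter matches. It is a simplification, not BLAST itself: for proteins, real BLAST also accepts similar "neighbourhood" words scored with a substitution matrix, then extends hits into alignments and reports their statistical significance. The function name word_hits and the toy sequences are hypothetical.

```python
def word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where both sequences share an exact k-letter word."""
    # Index every k-letter word in the subject sequence by its start positions.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    # Look up each query word in the index to find seed hits.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


print(word_hits("GATTACA", "TTACAGG", k=4))  # [(2, 0), (3, 1)]
```

Indexing the subject once makes each query lookup a dictionary access, which is why word-based seeding is fast compared with aligning every position.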

An EMBL Nucleotide Record
Annotated data can also be found in the EMBL database: the BTEB EMBL record shows the main record.
Clicking on the "text" link in the top right-hand corner gives the essential features of the gene: BTEB-EMBL-EBI_text_record
An ExPASy database search gives the following information for this gene: type BTEB, and then BTEB and Human.
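In the EMBL "text" view each line starts with a two-letter code (ID, AC, DE, OS, FT, SQ, ...), which makes the essential features easy to pick out. The sketch below groups lines by that code. It assumes the usual layout (code in columns 1-2, content from column 6, entry ending with "//"). The function name parse_embl_lines and the abbreviated entry are hypothetical and are not the real BTEB record.

```python
def parse_embl_lines(text):
    """Group the lines of one EMBL flat-file entry by their two-letter line code."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("//"):  # entry terminator
            break
        code, content = line[:2], line[5:].rstrip()
        if code.strip():  # sequence lines after SQ have a blank code and are skipped
            fields.setdefault(code, []).append(content)
    return fields


# Hypothetical, heavily abbreviated entry (not the real BTEB record).
example = """ID   EXAMPLE; SV 1; linear; mRNA; STD; HUM; 12 BP.
DE   Hypothetical example entry
OS   Homo sapiens (human)
FT   CDS             1..12
SQ   Sequence 12 BP;
     atggccttttaa                                                             12
//"""

entry = parse_embl_lines(example)
print(sorted(entry))  # ['DE', 'FT', 'ID', 'OS', 'SQ']
print(entry["OS"])    # ['Homo sapiens (human)']
```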

The BTEB Protein Record
A link to a graphic representation of the protein and the relevant annotated data can be found at: BTEB Human Protein

Other Databases
The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the "raw data" and are referred to as "primary databases".
– More specific databases derive their data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS.
– Some databases contain information about specific organisms, such as E. coli; these can be found using the Genomes Online Database (GOLD).
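PROSITE describes many protein motifs as patterns such as N-{P}-[ST]-{P} (the N-glycosylation site): "-" separates positions, [..] lists allowed residues, {..} excluded residues, x any residue, (n) or (n,m) a repeat count, and < or > anchor the pattern to the N- or C-terminus. The sketch below translates this basic syntax into a Python regular expression. It is an illustration under those assumptions, not PROSITE's own tool: it does not handle rarer forms such as terminus symbols inside brackets. The function name prosite_to_regex and the toy sequence are hypothetical.

```python
import re

# One pattern element: optional '<', a residue class, an optional (n) or (n,m) count, optional '>'.
ELEMENT = re.compile(r"(<)?(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>)?")


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') into a regex string."""
    out = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"  # repeat count
        out.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(out)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                             # N[^P][ST][^P]
print(re.search(regex, "AANGSA").start())  # 2
```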

Other Databases
– Databases for specific types of sequences, such as those associated with promoters and other regulatory elements; dbEST; the Homologous Structure Alignment Database.
– Structural data from the Protein Data Bank
– Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders
The January issue of the journal Nucleic Acids Research provides an up-to-date analysis of current online bioinformatics databases: Nucleic Acids Research database issue

Other Important Information Sources
PubMed: literature search covering journal articles, conference proceedings, books, etc.
– Search under many fields: keyword, author…
– Returns: journal articles/abstracts
– Two types of article: general/review
– BTEB PubMed search found at: tailsSearch
Users can register for an NCBI account to manage their activity and store the results of gene searches, PubMed searches…. This information can be downloaded…
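PubMed searches by field use tags in square brackets, e.g. [tiab] (title/abstract), [au] (author) and [pt] (publication type, which can restrict results to reviews). The sketch below combines such terms into a single query string. The result could be pasted into the PubMed search box or passed as the term parameter of an E-utilities esearch call with db=pubmed. The function name pubmed_query and the example terms are hypothetical.

```python
def pubmed_query(terms, reviews_only=False):
    """Join (value, field_tag) pairs into one PubMed query, e.g. [("BTEB", "tiab")] -> 'BTEB[tiab]'."""
    parts = [f"{value}[{tag}]" for value, tag in terms]
    if reviews_only:
        parts.append("review[pt]")  # restrict to review articles via the publication-type tag
    return " AND ".join(parts)


print(pubmed_query([("BTEB", "tiab")], reviews_only=True))  # BTEB[tiab] AND review[pt]
```

Taking a list of pairs rather than a dictionary allows the same field tag to be used more than once in a query.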

Exercise
– The EMBL-EBI record: BTEB_"text"_record
– The NCBI record: BTEB NCBI Nucleotide Record
– The DDBJ record: BTEB flatfile Record
Exercise: write a brief report comparing and contrasting the core elements of these records; refer to pages 8-16 in Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition (the book can be found in the library).

Exercise
Search for the DNA sequence of the following gene:
– Human leukocyte elastase gene, linear DNA [hint: it should be 5292 bp long].
– Retrieve the record, then download and save the FASTA file.
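A saved FASTA file consists of a ">" header line followed by sequence lines, so the length hint can be checked with a few lines of code. The sketch below parses FASTA text into a header-to-sequence dictionary. The function name read_fasta and the two toy records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}; the header is the '>' line without the '>'."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # store the previous record before starting a new one
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical toy records.
sample = ">seq1 example\nACGT\nACG\n>seq2\nTT\n"
for header, seq in read_fasta(sample).items():
    print(header, len(seq))
```

For the exercise, reading the saved file (the file name is up to you) with `read_fasta(open(path).read())` should give one record, whose sequence length can be compared with the 5292 bp hint.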

