
1 Function preserves sequences
Christophe Roos - MediCel ltd
Molecular evolution: mutations change sequences, function preserves sequences
Part 3: sequence databases & comparisons
Similarity is the result of conservation or convergent evolution; either way, there is a reason for it.

2 The public biological databases
EMBL, GenBank or DDBJ for DNA
emblnew for daily updates; it is merged into the main database four times a year
SwissProt or PIR for proteins
Trembl, tremblnew, remtrembl
PDB for structures
The databases are in flat-file format, yet quite informative and convertible.
Fasta format is a ‘universal’ sequence format: the first line starts with ‘>’ followed by free text, and the sequence starts on the second line (50 or 60 characters per line). Use the first line for the name or the Accession Number (AC).
There is a wealth of databases; these are some of the most central ones.
Christophe Roos - 3/6 Sequence databases & comparison Spring 2002
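The Fasta layout described on this slide is simple enough to read with a few lines of code. The following is a minimal sketch in Python, not a full parser: the function name `parse_fasta` and the two example records are made up for illustration, and real files may need extra handling (for example duplicate headers).

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header text: sequence} for Fasta-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record: everything after '>' is free text (name or AC).
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated until the next '>' line.
            records[header] += line
    return records


# Made-up example records, for illustration only.
example = """>seq1 made-up example record
GATCGGAATAG
GATTACA
>seq2 another made-up record
GACGGATTAG
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

Keeping the whole header line as the key mirrors the advice above: put the name or the Accession Number first so that it can be recovered later.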

3 Database homes
The European database home is in Hinxton, Cambridge, UK: the European Bioinformatics Institute (EBI). Access is through the Sequence Retrieval System, SRS.
The American database home is in Bethesda, Maryland, near Washington DC: the National Center for Biotechnology Information (NCBI). Access is through Entrez.
Both centers exchange their data on a daily basis; however, there are differences in annotations, consistency, speed and quality.
There is also a Japanese database provider, DDBJ.
Public databases, corporate databases, private databases, merged databases, primary databases, derived databases, databases with more errors or less accuracy... There is one for every taste. Go to EBI or NCBI to start with.

4 A look at one entry from EMBL
From the EBI home page, take the SRS link under Databases or go directly to . On the Top Page, choose the database (in this case EMBL) and ask for a standard or extended query page (buttons at left). Use the field specifications and enter the query string. To find out which fields are available and what they mean, follow the links from the EBI main page: Databases → EMBL → Documentation → User manual. part 1/3

5 A look at one entry from EMBL
Each line in the database is described in . part 2/3

6 A look at one entry from EMBL
The feature table of the entry contains several linked items, such as the exon assembly (mRNA) and the coding sequence (CDS). There are also cross-references to other databases, so some degree of integration is available. For example, the protein databases may refer to DNA databases, since the DNA contains the coding sequences. part 3/3
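Because EMBL entries are flat files in which each line begins with a two-letter line code (for example ID, AC, DE, FT for the feature table, SQ for the sequence header, and // as terminator), an entry can be split into its parts with very little code. The sketch below is illustrative only: the entry text is a made-up toy record, not a real EMBL entry, and the helper name `group_by_line_code` is hypothetical.

```python
from collections import defaultdict

# Made-up toy entry for illustration; real entries are much longer.
toy_entry = """\
ID   TOY00001; illustrative entry, not a real record
AC   X00000;
DE   Hypothetical example gene
FT   mRNA            join(1..10,21..30)
FT   CDS             3..28
SQ   Sequence 30 BP;
     gatcggaata ggatcggaat aggatcggaa        30
//
"""


def group_by_line_code(entry: str) -> dict:
    """Group the lines of one flat-file entry by their two-letter line code."""
    groups = defaultdict(list)
    for line in entry.splitlines():
        if line.startswith("//"):
            break  # '//' terminates the entry
        # Sequence lines start with blanks instead of a line code.
        code = line[:2].strip() or "seq"
        groups[code].append(line[5:].strip())
    return dict(groups)


fields = group_by_line_code(toy_entry)
feature_keys = [ft.split()[0] for ft in fields["FT"]]
print(fields["AC"], feature_keys)
```

In this toy record the feature table yields the two linked items mentioned above, the mRNA (exon assembly) and the CDS.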

7 A look at one entry from SwissProt
The eyeless gene: a master regulatory gene in eye formation
Use SRS to find the fruit fly protein for the eyeless gene. Search for ‘Drosophila melanogaster’ in the organism field and ‘eyeless’ in the description field; both fields are selectable on the query page. Two entries are listed (April 2002): pax6_drome and O96791, the latter being only a fragment of the full protein.

8 The effect of the eyeless gene
The eyeless gene is a master regulatory gene in eye formation.
When it is absent, no eyes are formed.
When it is present where it should not be, it induces eye formation.
Figure panels: Normal; Absent; Overexpressed in antennae and wings.
The eyeless gene has been conserved during evolution: it is used in the formation of eyes as varied as the insect compound (facetted) eye and the eyes of the octopus, the owl and the human. It is responsible for turning on the eye development program. When it is mutated, no eyes are formed (in humans, no iris is formed). When its expression is artificially induced where it is normally not expressed, it induces eye-like development.

9 A look at one entry from SwissProt
Part 2: the annotations about the function and location

10 A look at one entry from SwissProt
Part 3: The feature table and the amino acid sequence

11 A look at one entry from SwissProt
The eyeless gene is also called PAX6 and can be found in several groups of species: birds, mammals, reptiles, fish and invertebrates.
Use SRS to find the other PAX6 proteins. SwissProt holds entries from chicken, quail, human, mouse, frog, fishes and fruit fly, so the protein can be expected to exist in many more species in real life. Extract the sequences by choosing ‘Save’. Keep all sequences (except the first, which is only a fragment) for future use in sequence comparison.

12 Sequence comparison - Why?
Function by analogy: if sequences are conserved, their function is probably also conserved.
Functional domains: if some parts of the sequences are more conserved than others, there must be an underlying biological reason for it.
Establishing relationships and differences in function: by quantifying sequence relationships it is possible to estimate the function of novel genes.
Establishing relationships between species.
Pairwise sequence comparison is the primary means of linking biological function to a sequence and of propagating known information from one sequence to another. This applies to individual sequences as well as to whole genomes.

13 Sequence comparison – how?
Compare two sequences of similar length
Compare two sequences of very different length
Compare several sequences
Allow gaps or not?
Scoring: yes-no or good-intermediate-bad
The best or all above a threshold?
Knowledge-based single-sequence analysis for sequence characteristics, pairwise sequence comparisons, sequence-based searching and multiple sequence alignments: the criteria for comparison vary, and the metrics must be formalised.
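Several of these choices can be made concrete with a small example. The sketch below, in Python, compares a short sequence against a much longer one without gaps, uses yes-no scoring per position, and reports either all windows above a threshold or only the best one. The function name `ungapped_hits`, the sequences and the threshold value are assumptions chosen for illustration; this is not a method prescribed by the slides.

```python
def ungapped_hits(query: str, target: str, threshold: float):
    """Slide query along target without gaps.

    Returns (offset, identity) for every window whose fraction of
    identical positions is at least `threshold`.
    """
    hits = []
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        # Yes-no scoring: each position either matches or it does not.
        matches = sum(q == t for q, t in zip(query, window))
        identity = matches / len(query)
        if identity >= threshold:
            hits.append((offset, identity))
    return hits


query = "CGGA"           # short sequence (illustrative)
target = "GATCGGAATAG"   # longer sequence (illustrative)

# "All above a threshold": every window with at least 75% identity.
all_hits = ungapped_hits(query, target, threshold=0.75)

# "The best": keep every window, then pick the highest identity.
best = max(ungapped_hits(query, target, threshold=0.0), key=lambda h: h[1])

print(all_hits, best)
```

Allowing gaps, or grading matches as good-intermediate-bad, requires a richer scoring scheme than this yes-no count.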

14 Sequence comparison – metrics
GA-CGGATTAG
GATCGGAATAG
(position 3 is a gap, position 8 is a mismatch, all other positions are matches)
The scoring matrix:
The score for a match
The penalty for a mismatch
The penalty for the insertion of a gap (gap-open)
The penalty for elongating a gap (gap-length)
Local or global similarities?
We must define what is biologically similar and what is not. It is also good to grade the scale to more than two values.
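To show how these four quantities combine, here is a minimal Python sketch that scores an already aligned pair of sequences, using the alignment GA-CGGATTAG / GATCGGAATAG from this slide. The numeric values (match +1, mismatch -1, gap-open -2, gap-extension -1) and the function name `score_alignment` are assumptions for illustration; the slides do not give values. As a simplification, a gap that switches from one sequence to the other is treated as a continuation of the same gap.

```python
def score_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap_open: int = -2, gap_extend: int = -1) -> int:
    """Score two aligned sequences of equal length ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap = False
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            # First gap position costs gap_open, each further one gap_extend.
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score


# 9 matches, 1 mismatch and 1 gap of length one.
total = score_alignment("GA-CGGATTAG", "GATCGGAATAG")
print(total)
```

With these assumed values the alignment scores 9 - 1 - 2 = 6. Replacing the single match score and mismatch penalty with a scoring matrix is what grades the scale to more than two values.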

