BioPython Workshop Gershon Celniker Tel Aviv University.


1 BioPython Workshop Gershon Celniker Tel Aviv University

2 Introduction The Biopython Project is an international association of developers of freely available Python (http://www.python.org) tools for computational molecular biology. Python is an object-oriented, interpreted, flexible language that is becoming increasingly popular for scientific computing. Python is easy to learn, has a very clear syntax and can easily be extended with modules. The Biopython web site (http://www.biopython.org) provides an online resource for modules, scripts, and web links for developers of Python-based software for bioinformatics use and research. Basically, the goal of Biopython is to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and classes. Biopython features include parsers for various bioinformatics file formats (BLAST, Clustalw, FASTA, GenBank, ...) and access to online services (NCBI, ExPASy, Clustalw, DSSP, MSMS, ...).

3 Introduction The full tutorial is located here: http://biopython.org/DIST/docs/tutorial/Tutorial.html Example files are located here: https://github.com/biopython/biopython/tree/master/Doc/examples

4 BioPython, let's try it!

5 FASTA format http://en.wikipedia.org/wiki/FASTA_format FASTA is pronounced "fast A" and stands for "FAST-All", because it works with any alphabet; it is an extension of the earlier "FAST-P" (protein) and "FAST-N" (nucleotide) alignment tools.
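To make the layout concrete, here is a minimal sketch that builds one record in memory and renders it as FASTA text with SeqRecord.format(). The id, description and sequence are hypothetical placeholders, not workshop data.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical record: id, description and sequence are placeholders.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGC"),
    id="example1",
    description="hypothetical example sequence",
)

# A FASTA record is a ">" header line (id, then free-text description)
# followed by one or more lines of sequence letters.
fasta_text = record.format("fasta")
print(fasta_text)
```

Note that nothing in the header says whether the letters are DNA or protein.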

6 Let's write our first parsing script Parsing sequence file formats. The example data are sequences of Cypripedioideae (the subfamily of lady slipper orchids): a search for them gave only 94 hits, which were saved in FASTA format as ls_orchid.fasta. The file begins:
>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACG
ATCGAGTGAATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTG
ATTTGTTGTTGGG
Notice that the FASTA format does not specify the alphabet, so Bio.SeqIO has defaulted to the rather generic SingleLetterAlphabet() rather than something DNA-specific.

7 Let's write our first parsing script Output:
gi|2765658|emb|Z78533.1|CIZ78533
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...CGC', SingleLetterAlphabet())
740
...
gi|2765564|emb|Z78439.1|PBZ78439
Seq('CATTGTTGAGATCACATAATAATTGATCGAGTTAATCTGGAGGATCTGTTTACT...GCC', SingleLetterAlphabet())
592
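The script itself is not preserved in this transcript. A minimal sketch that produces output of this shape: Bio.SeqIO.parse yields one SeqRecord per FASTA entry, and the id, the sequence repr and the length are printed for each. The function name summarize_fasta is hypothetical. To run without ls_orchid.fasta (from the Biopython Doc/examples folder), it parses the truncated first record from the previous slide out of memory. Bio.Alphabet was removed in Biopython 1.78, so current versions print Seq('CGTAAC...') without the SingleLetterAlphabet() part.

```python
from io import StringIO

from Bio import SeqIO


def summarize_fasta(source):
    """Print id, sequence repr and length of every record in a FASTA source.

    `source` may be a filename (e.g. "ls_orchid.fasta") or an open text handle.
    Returns (id, length) pairs so the results can be reused.
    """
    summary = []
    for seq_record in SeqIO.parse(source, "fasta"):
        print(seq_record.id)
        print(repr(seq_record.seq))
        print(len(seq_record))
        summary.append((seq_record.id, len(seq_record)))
    return summary


# Truncated first record from the slide, parsed from memory instead of a file.
EXCERPT = """>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACG
ATCGAGTGAATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTG
ATTTGTTGTTGGG
"""
summary = summarize_fasta(StringIO(EXCERPT))
```

With the real file, summarize_fasta("ls_orchid.fasta") walks all 94 records.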

8 Sequence slicing Output: gi|2765658|emb|Z78533.1|CIZ78533
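The slide's code is not preserved. Seq objects slice like Python strings and return new Seq objects; a minimal sketch on an assumed example sequence (the one used in the Biopython tutorial):

```python
from Bio.Seq import Seq

# Assumed example sequence, not taken from the slide.
my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")

first_codon = my_seq[0:3]    # first three letters, returned as a new Seq
every_third = my_seq[0::3]   # start at index 0, step 3
reversed_seq = my_seq[::-1]  # plain reversal, NOT the reverse complement

print(first_codon)  # GAT
print(every_third)  # GCTGTAGTAAG
print(reversed_seq)
```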

9 GC content exercise Output: My seq length: 32 G: 9
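A minimal sketch of the exercise, counting letters with Seq.count. The input is an assumption: it is the Biopython tutorial's example sequence, which is consistent with the output above (32 letters, nine G), but the workshop's own code is not preserved.

```python
from Bio.Seq import Seq

# Assumed input (Biopython tutorial example), not confirmed by the slide.
my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")

length = len(my_seq)
g_count = my_seq.count("G")  # non-overlapping count, like str.count
c_count = my_seq.count("C")
gc_percent = 100 * (g_count + c_count) / length

print("My seq length:", length)           # My seq length: 32
print("G:", g_count)                      # G: 9
print("GC content: %.3f%%" % gc_percent)  # GC content: 46.875%
```

Biopython 1.80 and later also provide Bio.SeqUtils.gc_fraction for this calculation; counting by hand keeps the exercise explicit.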

10 Transcription Output:
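The output of this slide is not preserved. A minimal transcription sketch, assuming a short coding-strand DNA (the Biopython tutorial's example). Biopython transcribes from the coding strand, so transcribe() simply replaces T with U.

```python
from Bio.Seq import Seq

# Assumed coding-strand DNA, not taken from the slide.
coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

# The template strand is the reverse complement of the coding strand.
template_dna = coding_dna.reverse_complement()

# Transcription from the coding strand: T -> U.
messenger_rna = coding_dna.transcribe()

# back_transcribe() turns U back into T.
back_to_dna = messenger_rna.back_transcribe()

print(messenger_rna)  # AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
```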

11 Translation
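A minimal translation sketch on an assumed mRNA (the transcript of the tutorial's example coding DNA). translate() uses the standard genetic code by default and writes stop codons as "*"; to_stop=True ends at the first stop.

```python
from Bio.Seq import Seq

# Assumed mRNA, not taken from the slide.
messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")

protein = messenger_rna.translate()                      # stops shown as "*"
protein_to_stop = messenger_rna.translate(to_stop=True)  # stop at first stop codon

print(protein)          # MAIVMGR*KGAR*
print(protein_to_stop)  # MAIVMGR
```

Translating the DNA coding strand directly gives the same protein.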

12 Translation tables
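Biopython ships the NCBI genetic code tables in Bio.Data.CodonTable. A minimal sketch comparing the standard table (id 1) with the vertebrate mitochondrial table (id 2), on the same assumed coding DNA as before:

```python
from Bio.Data import CodonTable
from Bio.Seq import Seq

# Look tables up by name or by NCBI table id.
standard_table = CodonTable.unambiguous_dna_by_name["Standard"]
mito_table = CodonTable.unambiguous_dna_by_id[2]  # Vertebrate Mitochondrial

print(standard_table.stop_codons)        # stop codons of table 1
print(mito_table.stop_codons)            # stop codons of table 2
print(mito_table.forward_table["TGA"])   # W: TGA codes for Trp in table 2

# Assumed coding DNA, not taken from the slide.
coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
standard_protein = coding_dna.translate()     # table 1 by default
mito_protein = coding_dna.translate(table=2)  # same as table="Vertebrate Mitochondrial"
print(mito_protein)  # MAIVMGRWKGAR*
```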

13 Translation – continued
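This slide's content is not preserved. One common follow-up, sketched here as an assumption, is translating a complete coding sequence with cds=True. Biopython then checks that the sequence starts with a valid start codon (translated as M even for an alternative start such as GTG), ends with a single stop codon (dropped from the protein), and has a length that is a multiple of three. The tiny sequences below are hypothetical.

```python
from Bio.Data.CodonTable import TranslationError
from Bio.Seq import Seq

# Hypothetical tiny "gene": GTG start, one AAA codon, TAA stop.
gene = Seq("GTGAAATAA")

# Without cds=True, GTG is just valine and the stop is shown as "*".
plain = gene.translate(table=11)  # table 11 = Bacterial

# With cds=True, the alternative start GTG becomes M and the stop is dropped.
as_cds = gene.translate(table=11, cds=True)

print(plain)   # VK*
print(as_cds)  # MK

# A sequence that does not end in a stop codon is rejected as a CDS.
rejected = False
try:
    Seq("ATGAAA").translate(cds=True)
except TranslationError:
    rejected = True
print("ATGAAA rejected as CDS:", rejected)
```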

14 Retrieving data from the net Output:
O23729
CHS3_BROFI
RecName: Full=Chalcone synthase 3; EC=2.3.1.74; AltName: Full=Naringenin-chalcone synthase 3;
Seq('MAPAMEEIRQAQRAEGPAAVLAIGTSTPPNALYQADYPDYYFRITKSEHLTELK...GAE', ProteinAlphabet())
Length 394
['Acyltransferase', 'Flavonoid biosynthesis', 'Transferase']
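A minimal sketch that would print these fields for the chalcone synthase entry O23729: fetch the raw Swiss-Prot text from ExPASy with Bio.ExPASy.get_sprot_raw and parse it with SeqIO.read in the "swiss" format. It needs network access, so nothing is fetched at import time; the function names fetch_swissprot and describe are hypothetical.

```python
from Bio import ExPASy, SeqIO


def fetch_swissprot(accession):
    """Download one Swiss-Prot entry and return it as a SeqRecord (needs network)."""
    with ExPASy.get_sprot_raw(accession) as handle:
        return SeqIO.read(handle, "swiss")


def describe(record):
    """Print the fields shown in the output above."""
    print(record.id)
    print(record.name)
    print(record.description)
    print(repr(record.seq))
    print("Length %i" % len(record))
    print(record.annotations["keywords"])
```

Usage (network required): describe(fetch_swissprot("O23729")).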

15 Parsing data from fasta – part B
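This slide's code is not preserved. A common second step is collecting parsed records into ordinary Python structures. The sketch below assumes a small hypothetical FASTA text and builds a list of records, a length table, and an id-keyed dictionary with SeqIO.to_dict.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA text, for illustration only.
fasta_text = """>seq1 first example
ACGTACGTAA
>seq2 second example
GGGCCCAT
"""

# A list keeps every record in file order.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
ids = [rec.id for rec in records]
lengths = {rec.id: len(rec) for rec in records}

# to_dict gives random access by id (all records are held in memory).
by_id = SeqIO.to_dict(SeqIO.parse(StringIO(fasta_text), "fasta"))

print(ids)
print(lengths)
print(by_id["seq2"].seq)
```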

16 Alignment
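The slide does not say which alignment tool was used (the introduction mentions Clustalw). As one possibility, here is a minimal pairwise sketch with Bio.Align.PairwiseAligner using its default scoring (match 1, mismatch 0, no gap penalties) on two short hypothetical sequences.

```python
from Bio.Align import PairwiseAligner

aligner = PairwiseAligner()
aligner.mode = "global"  # align end to end; "local" finds the best-scoring region

target, query = "GAACT", "GAT"
score = aligner.score(target, query)
alignments = aligner.align(target, query)
best = alignments[0]

print("Score:", score)  # Score: 3.0 (three matched letters)
print(best)
```

Realistic scoring is set through attributes such as aligner.match_score, aligner.mismatch_score and aligner.open_gap_score.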

17 Blast
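A minimal sketch of a remote BLAST search with Bio.Blast.NCBIWWW.qblast, parsing the XML result with the classic Bio.Blast.NCBIXML parser and keeping hits below an e-value cutoff. The cutoff 0.04 follows the Biopython tutorial's example and is an assumption here, as is the function name blast_hits; the search needs network access and can take minutes.

```python
from Bio.Blast import NCBIWWW, NCBIXML

E_VALUE_THRESH = 0.04  # assumed cutoff (Biopython tutorial example value)


def blast_hits(sequence, program="blastn", database="nt"):
    """Run a remote NCBI BLAST search and return (title, e-value) pairs.

    Needs network access; NCBI searches can be slow.
    """
    result_handle = NCBIWWW.qblast(program, database, sequence)
    blast_record = NCBIXML.read(result_handle)  # one query -> one record
    hits = []
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                hits.append((alignment.title, hsp.expect))
    return hits
```

Usage (network required): blast_hits(record.seq) for a SeqRecord read from ls_orchid.fasta.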

18 Plots

19 Plots - result
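The plot and its result are not preserved in this transcript. As an assumption about what such a slide typically shows, here is a sketch of a histogram of sequence lengths (for example over ls_orchid.fasta) drawn with matplotlib, which is a separate package and not part of Biopython. The function names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
from Bio import SeqIO


def record_lengths(fasta_source):
    """Return the sorted sequence lengths of all records in a FASTA file or handle."""
    return sorted(len(rec) for rec in SeqIO.parse(fasta_source, "fasta"))


def plot_length_histogram(lengths, bins=20):
    """Draw a histogram of sequence lengths and return the Figure."""
    fig, ax = plt.subplots()
    ax.hist(lengths, bins=bins)
    ax.set_title("%i sequences, lengths %i to %i" % (len(lengths), min(lengths), max(lengths)))
    ax.set_xlabel("Sequence length (bp)")
    ax.set_ylabel("Count")
    return fig
```

Usage: plot_length_histogram(record_lengths("ls_orchid.fasta")).savefig("lengths.png").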

20 Going 3D: The PDB module Bio.

21 Going 3D: The PDB module Bio.
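Only the titles of these slides survive. Biopython's structure module is Bio.PDB, which represents a structure as a Structure/Model/Chain/Residue/Atom hierarchy. A minimal sketch of walking that hierarchy: to stay offline it parses two hypothetical ATOM records (the N and CA atoms of one alanine, with made-up coordinates) from memory; for a real entry you would pass a downloaded PDB file path to get_structure instead.

```python
from io import StringIO

from Bio.PDB import PDBParser

# Two hypothetical fixed-column ATOM records; coordinates are illustrative only.
TOY_PDB = (
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
    "ATOM      2  CA  ALA A   1      12.560   6.000  -6.500  1.00  0.00           C\n"
    "END\n"
)

parser = PDBParser(QUIET=True)  # QUIET suppresses construction warnings
structure = parser.get_structure("toy", StringIO(TOY_PDB))

# Walk Structure -> Model -> Chain -> Residue -> Atom.
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(model.id, chain.id, residue.get_resname(), residue.id[1], atom.get_id(), atom.coord)

# Subtracting two Atom objects gives their distance in angstroms.
n_atom = structure[0]["A"][1]["N"]
ca_atom = structure[0]["A"][1]["CA"]
distance = ca_atom - n_atom
print("N-CA distance: %.2f" % distance)
```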

