GE3M25: Computer Programming for Biologists Python, Class 5

1 GE3M25: Computer Programming for Biologists Python, Class 5
TCD, 08/12/2015 Karsten Hokamp, PhD Genetics

2 Overview
http://bioinf.gen.tcd.ie/GE3M25/
Recap
Modules
Dictionaries
Working from the command line
Weekly task

3 Recap
Collections: list(), tuple(), set()
List methods: 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'
Built-in functions: all(), any(), len(), max(), min(), sorted(), sum(), zip()
→ Find out more through the help() function
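As a reminder of how the three collection types differ, here is a minimal sketch. The DNA string is made up for illustration and the variable names are not taken from the slides.

```python
seq = "ACGTAC"            # made-up DNA string, for illustration only

dna_list = list(seq)      # ordered, can be modified
dna_tuple = tuple(seq)    # ordered, cannot be modified
dna_set = set(seq)        # unordered, keeps each element only once

dna_list.append("G")      # lists grow in place

print(len(dna_list), len(dna_tuple), len(dna_set))
print(sorted(dna_set))        # sorted() works on any collection, returns a list
print(dna_list.count("A"))    # how often 'A' occurs in the list
```

Calling help(list), help(tuple) or help(set) in the interpreter lists the methods each type supports.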

4 Exercise:
Create a variable 'seq' containing a DNA string
Create a list 'dna1' from the DNA string
Create a tuple 'dna2' from the DNA string
Create a set 'dna3' from the DNA string
Compare structure and content of the collections
Try to access the first element of each collection
Try to modify the first element of each collection
Add an element to each of your collections
Try to remove the last element from each of your collections

5 Weekly task:
Read in a DNA sequence in FASTA format from a file
Prompt the user for a short motif
Split the sequence at the sites that match
Print the fragment lengths in sorted order
Do not report fragments of zero length
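One possible way to approach the steps of this task is sketched below. It is not the official solution: the FASTA content and the motif are invented, and the file is simulated with io.StringIO so that the example runs on its own. A real script would use open() on a file name and input() for the motif.

```python
import io

# Hypothetical FASTA content; a real script would use open(filename) instead.
fasta = io.StringIO(">example\nACGTGAATTCGG\nTTGAATTCGAATTCAA\n")

# Skip the header line and join the remaining lines into one sequence.
seq = "".join(line.strip() for line in fasta if not line.startswith(">"))

motif = "GAATTC"   # in the task this would come from input()

# str.split() cuts the sequence at every non-overlapping match of the motif.
fragments = seq.split(motif)

# Two adjacent matches produce an empty fragment, which is filtered out here.
lengths = sorted(len(f) for f in fragments if len(f) > 0)
print(lengths)
```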

6 Python modules
Software packages that add functionality
Part of the distribution (random, math, string, ...)
External packages: wiki.python.org/moin/UsefulModules

7 Python modules
Load module:
    import module_name
Use module:
    module_name.variable
    module_name.method()
Documentation:
    help(module_name)

8 Python modules
Examples:
    import random
    random.random() → 0.231185
    random.randint(1,10) → 3
    random.choice('ACGT') → 'G'
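The values shown on the slide will differ on every run. The sketch below repeats the three calls as a small script. The call to random.seed() is an addition, used only so that repeated runs give the same values.

```python
import random

random.seed(42)   # optional: fixed seed so that repeated runs are reproducible

x = random.random()            # float with 0.0 <= x < 1.0
n = random.randint(1, 10)      # integer from 1 to 10, both ends included
base = random.choice("ACGT")   # one randomly picked character

print(x, n, base)
```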

9 Python modules
Exercises:
Create a random number
Create a random integer between 50 and 100
Get a random letter from the word 'mississippi'
Check out the help for module 'string'
Print all lowercase letters, one per line
Sort the ascii_letters string: which letter is first?
Check out the help for module 'math'
Calculate the log2 value of 0.5
Print the value of pi

10 Python modules
Exercise:
Revisit the script 'gene_list.py' from last lesson
Change it to read a file name from the command line (instead of hard-coding it into the script)
Tip: Use module 'sys', object 'argv'
Run your script from the command line:
    python3 gene_list.py ~/Downloads/gene_list.txt
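In Python, command-line arguments are available as the list sys.argv. The sketch below shows one way to pick up a file name from it. The helper name file_from_args is made up for this example, and the contents of 'gene_list.py' are not reproduced here.

```python
import sys

def file_from_args(argv):
    """Return the first command-line argument, or None if there is none."""
    # argv[0] is the name of the script; real arguments start at index 1.
    if len(argv) > 1:
        return argv[1]
    return None

filename = file_from_args(sys.argv)
if filename is None:
    print("usage: python3 gene_list.py <file>")
else:
    print("reading from", filename)
```

Checking the length of sys.argv first avoids an IndexError when the script is started without a file name.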

11 Exercise:
Read in a file with probe ids, gene ids, fold-change and p-values, separated by tabs
1. Print out only gene ids and fold-change
2. Print out gene ids and fold-change as log2 values
3. Print all the lines with absolute fold-change > 2 and p-value <= 0.05
4. Print values to a file instead of the screen
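A rough sketch of how such a file could be processed follows. The three data rows are invented and held in memory with io.StringIO so that the example runs on its own. It also assumes that fold-changes are signed and that the log2 value is taken from the absolute value. The exercise does not spell this out.

```python
import io
import math

# Made-up rows: probe id, gene id, fold-change, p-value (tab-separated).
data = io.StringIO(
    "p1\tgeneA\t4.0\t0.01\n"
    "p2\tgeneB\t1.5\t0.20\n"
    "p3\tgeneC\t-3.0\t0.04\n"
)

hits = []
for line in data:
    probe, gene, fc, pval = line.rstrip("\n").split("\t")
    fc, pval = float(fc), float(pval)   # text from a file must be converted
    # Assumption: log2 is taken from the absolute fold-change,
    # because log2 is not defined for negative numbers.
    print(gene, fc, math.log2(abs(fc)))
    if abs(fc) > 2 and pval <= 0.05:
        hits.append(gene)

print(hits)
```

To write to a file instead of the screen, open one with open(name, 'w') and pass it to print() through the file= argument.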

12 DNA → Protein translation
1. Process a DNA string three nucleotides at a time
2. Translate that codon
3. Print the amino acid

13 DNA → Protein translation
1. Process a DNA string three nucleotides at a time
    dna = 'ATGCCAGGTTTACACGGT'
    codon = dna[0:3]
    print(codon)
[Diagram: sequence with index positions 3, 6 and 15 marked]

14 DNA → Protein translation
1. Process a DNA string three nucleotides at a time
    dna = 'ATGCCAGGTTTACACGGT'
    i = 0
    codon = dna[i:i+3]
    print(codon)
[Diagram: sequence with index positions 3, 6 and 15 marked]

15 DNA → Protein translation
1. Process a DNA string three nucleotides at a time
    dna = 'ATGCCAGGTTTACACGGT'
    for i in range(0, 16, 3):
        codon = dna[i:i+3]
        print(codon)
[Diagram: sequence with index positions 3, 6 and 15 marked]

16 DNA → Protein translation
1. Process a DNA string three nucleotides at a time
    dna = 'ATGCCAGGTTTACACGGT'
    for i in range(0, len(dna)-2, 3):
        codon = dna[i:i+3]
        print(codon)
[Diagram: sequence with index positions 3, 6 and 15 marked]
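The reason for stopping at len(dna)-2 becomes visible when the sequence length is not a multiple of three. The sketch below appends two extra bases (made up) to the slide's sequence and collects the codons in a list instead of printing them.

```python
dna = "ATGCCAGGTTTACACGGT" + "AC"   # slide sequence plus two made-up extra bases

codons = []
# The last start position is at most len(dna) - 3,
# so every slice contains exactly three letters.
for i in range(0, len(dna) - 2, 3):
    codons.append(dna[i:i+3])

print(codons)
```

The trailing 'AC' is skipped rather than being returned as an incomplete codon.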

17 DNA → Protein translation
2. Translate the codon
    dna = 'ATGCCAGGTTTACACGGT'
    for i in range(0, len(dna)-2, 3):
        codon = dna[i:i+3]
        if codon == 'AAA':
            print('K')
        elif codon == 'AAC':
            print('N')

18 DNA → Protein translation
2. Translate the codon
    dna = 'ATGCCAGGTTTACACGGT'
    for i in range(0, len(dna)-2, 3):
        codon = dna[i:i+3]
        if codon == 'AAA':
            print('K')
        elif codon == 'AAC':
            print('N')
We need a look-up table!

19 Dictionary
Collection of key-value pairs
Symbols: {} and []
Initialisation:
    table = {}
    table = dict()
Storing values:
    table = { 'AAA' : 'K', 'AAG' : 'K' }
    table['AAC'] = 'N'
[Labels in the slide: key, value]
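The statements above can be tried out as a short script. The membership test with 'in' and the get() method are additions that do not appear on the slide. They are shown because looking up a missing key with square brackets raises a KeyError.

```python
table = {"AAA": "K", "AAG": "K"}   # initialise with two key-value pairs
table["AAC"] = "N"                 # store a further value under a new key

print(table["AAA"])           # look up the value stored for a key
print(len(table))             # number of key-value pairs
print("TTT" in table)         # test whether a key exists
print(table.get("TTT", "?"))  # look up with a fallback for missing keys
```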

20 Dictionary
Accessing keys and values
    aa = table['AAC']
    aa = table[codon]
    codons = table.keys()
    amino_acids = set(table.values())
    for codon in table.keys():
        print("translate %s into %s" %
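The print statement in the transcript is cut off. The sketch below shows one plausible way such a loop could be completed. The tuple after the % operator is an assumption, not the slide's text, and the three table entries are the ones from the previous slide.

```python
table = {"AAA": "K", "AAG": "K", "AAC": "N"}

# Assumed completion: the two %s placeholders are filled from a tuple.
for codon in sorted(table.keys()):
    print("translate %s into %s" % (codon, table[codon]))

# set() removes duplicates, so each amino acid appears only once.
amino_acids = set(table.values())
print(sorted(amino_acids))
```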

21 Dictionary
Exercise:
Generate one million random integers from 1 to 10
Use a dictionary (occ) to count how often each integer occurs
Calculate and print the frequency of each integer
Tips:
check if a key exists: if key in occ.keys()
increase the value of an existing key: occ[key] += 1
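One way to put the two tips together is sketched below. The fixed seed and the variable name total are additions for this example. Writing 'key in occ' is equivalent to 'key in occ.keys()'.

```python
import random

random.seed(1)      # optional: makes the counts reproducible
total = 1000000
occ = {}

for _ in range(total):
    key = random.randint(1, 10)
    if key in occ:          # key already seen: increase its count
        occ[key] += 1
    else:                   # first occurrence: create the entry
        occ[key] = 1

for key in sorted(occ):
    print(key, occ[key], occ[key] / total)   # count and relative frequency
```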

22 Dictionary Look-up table for codons:

23 Dictionary Generate table on the fly:
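The content of this slide did not survive in the transcript. As an illustration only, the sketch below shows one common way to build a codon table with three nested loops, assuming the standard genetic code with '*' for stop codons. It may differ from the approach shown in class.

```python
bases = "TCAG"
# Amino acids of the standard genetic code, in the order in which the
# nested loops below generate the codons (TTT, TTC, TTA, TTG, TCT, ...).
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

table = {}
n = 0
for b1 in bases:
    for b2 in bases:
        for b3 in bases:
            table[b1 + b2 + b3] = amino_acids[n]
            n += 1

# Translate the example sequence from the earlier slides.
dna = "ATGCCAGGTTTACACGGT"
protein = ""
for i in range(0, len(dna) - 2, 3):
    protein += table[dna[i:i+3]]
print(protein)
```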

24 Dictionary
Exercise:
Read a DNA sequence from a file and translate it into a protein sequence
Make it work for upper and lower case

25 Weekly task 5
Option a: 100 HOXA protein sequences have been assembled from UniProt
First align the sequences with the tool of your choice from the EBI website and then load the tree file into TreeDraw.
Use the controls to generate a tree that is best suited to indicate the clustering of sequences and relationships between genes from different species.
Submit an image of your tree together with a short description of how you generated the alignment and the tree and a discussion of the presented relationships.
Possible points of discussion:
Can you think of a suitable sequence to use for rooting the tree?
Can you detect any inconsistencies/surprises in the tree with respect to the known/expected evolutionary relationships of the species?

26 Weekly task 5
Option b: Write a Python script that does the following:
Read in a DNA sequence from a file in FASTA format
Translate the DNA into a protein sequence and print it to the screen
Repeat mutating one nucleotide at a time and stop if
a) the start codon is changed
b) a stop codon is introduced before the end of the sequence
Report for each mutation where it occurs and what substitution is made

27 Weekly task 5
To be submitted by e-mail to kahokamp@tcd.ie before Thursday, 17th December, 5 pm

