# Primitives.

Simple values: logical (Boolean), integer, float, string, None

Expressions: numeric operators, logical operations, string operations
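The value types and operator families listed above can be illustrated with a minimal sketch. All names and literal values here are made up for illustration and are not taken from the slides.

```python
# Simple values: one literal of each primitive type.
flag = True            # logical (Boolean)
count = 42             # integer
ratio = 0.25           # float
codon = 'ATG'          # string
nothing = None         # the special "no value" object

# Numeric operators
quotient = 7 / 2       # true division always yields a float
floor_q = 7 // 2       # floor division
remainder = 7 % 2      # modulo
power = 2 ** 10        # exponentiation

# Logical operations
both = flag and count > 40
either = (ratio > 1) or (codon == 'ATG')

# String operations
joined = codon + 'TAA'     # concatenation
repeated = 'TA' * 3        # repetition
contains = 'TG' in codon   # substring test
first_base = codon[0]      # indexing starts at 0
```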

Calls

Function calls:

    len('TATA')
    print('AAT', 'AAC', 'AAG', 'AAA')

Method calls:

    'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.count('DL')

Running Python interactively

- Ctrl-A: Go to the beginning of the line.
- Ctrl-E: Go to the end of the line.
- Ctrl-B or left arrow: Move one character to the left.
- Ctrl-F or right arrow: Move one character to the right.
- Backspace: Delete the preceding character.
- Ctrl-D: Delete the next character.
- Ctrl-K: Delete the rest of the line after the cursor.
- Ctrl-Y: “Yank” the last killed text into the line at the location of the cursor.
- Ctrl-_ (underscore): Undo; can be repeated.
- Ctrl-R: Search incrementally for a preceding input line.
- Ctrl-S: Search incrementally for a subsequent input line.

Tracebacks

- NameError: name 'Non' is not defined. Python doesn't recognize a name (more on this in the next chapter).
- IndexError: string index out of range. For a string of length N, an index (i.e., the value between square brackets) must be in the range -N <= index < N.
- SyntaxError: Python syntax violation.
- ZeroDivisionError: /, //, or % with 0 as the second operand.
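Each of the errors listed above can be triggered on purpose. The sketch below uses a hypothetical helper, `error_name` (not from the slides), that evaluates an expression given as a string and reports which exception type was raised, if any.

```python
def error_name(expression):
    """Hypothetical helper: evaluate expression (a string) and return the
    name of the exception it raises, or None if it evaluates cleanly."""
    try:
        eval(expression)
    except Exception as err:
        return type(err).__name__
    return None

name_error = error_name('Non')           # misspelling of None
index_error = error_name("'TATA'[4]")    # valid indexes are -4 through 3
last_valid = error_name("'TATA'[-4]")    # -N is still in range
syntax_error = error_name('1 +')         # incomplete expression
zero_error = error_name('1 % 0')         # 0 as the second operand
```

Here `'TATA'` has length 4, so index 4 is out of range while index -4 is the first character.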

Names, Functions, and Modules

Names bound to objects

Names in different namespaces bound to objects

Assigning Names

Defining Functions

Function calls

Function returns

Do-nothing statement
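The topics above (assigning names, defining and calling functions, returning values, and the do-nothing statement `pass`) can be sketched together in a few lines. The function names `base_count` and `not_written_yet` are illustrative assumptions, not taken from the slides.

```python
# Assigning a name binds it to an object.
seq = 'TATA'
alias = seq                   # a second name bound to the same object

def base_count(base_seq):     # a def statement binds the name base_count
    """Return the number of characters in base_seq."""
    return len(base_seq)      # the value handed back to the caller

def not_written_yet():
    pass                      # do-nothing statement used as a placeholder body

n = base_count(seq)           # function call
nothing = not_written_yet()   # a function with no return statement returns None
```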

Example

    def validate_base_sequence(base_sequence):
        """Return True if the string base_sequence contains only upper- or
        lowercase T, C, A, and G characters, otherwise False"""
        seq = base_sequence.upper()
        return len(seq) == \
            seq.count('A') + seq.count('G') + \
            seq.count('T') + seq.count('C')

GC content of a given DNA sequence

    def gc_content(base_seq):
        """Return the fraction (0.0 to 1.0) of G and C characters in base_seq"""
        seq = base_seq.upper()
        return (seq.count('G') + seq.count('C')) / len(seq)

Assertion

    def gc_content(base_seq):
        """Return the fraction (0.0 to 1.0) of G and C characters in base_seq"""
        assert validate_base_sequence(base_seq), \
            'argument has invalid characters'
        seq = base_seq.upper()
        # Count on the uppercased copy so lowercase input is handled too.
        return (seq.count('G') + seq.count('C')) / len(seq)
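To see what the assertion does at run time, the sketch below repeats both functions so that it runs on its own and adds a hypothetical wrapper, `try_gc_content` (an assumption, not part of the slides), that returns the assertion message instead of letting the program stop.

```python
def validate_base_sequence(base_sequence):
    """Return True if base_sequence contains only T, C, A, and G (any case)."""
    seq = base_sequence.upper()
    return len(seq) == (seq.count('A') + seq.count('G') +
                        seq.count('T') + seq.count('C'))

def gc_content(base_seq):
    """Return the fraction of G and C characters in base_seq."""
    assert validate_base_sequence(base_seq), 'argument has invalid characters'
    seq = base_seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq)

def try_gc_content(base_seq):
    """Hypothetical wrapper: return the GC fraction, or the assertion
    message if the argument fails validation."""
    try:
        return gc_content(base_seq)
    except AssertionError as err:
        return str(err)

ok = try_gc_content('ACTG')    # valid sequence
bad = try_gc_content('ACXG')   # 'X' is not a DNA base
```

Note that assert statements are skipped when Python runs with the `-O` option, so they suit sanity checks during development rather than validation of user input.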

Default Parameter Values

    def validate_base_sequence(base_sequence, RNAflag=False):
        """Return True if the string base_sequence contains only upper- or
        lowercase T (or U, if RNAflag), C, A, and G characters, otherwise False"""
        seq = base_sequence.upper()
        return len(seq) == (seq.count('A') + seq.count('G') +
                            seq.count('U' if RNAflag else 'T') +
                            seq.count('C'))

Using Modules

    from random import randint

    def random_base(RNAflag=False):
        return ('UCAG' if RNAflag else 'TCAG')[randint(0, 3)]

    def random_codon(RNAflag=False):
        return random_base(RNAflag) + random_base(RNAflag) + random_base(RNAflag)
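The two common import forms can be compared side by side. The sketch below repeats `random_base` and adds a hypothetical variant, `random_base_choice` (an assumption, not from the slides), that reaches `choice` through the module name instead of importing it directly.

```python
import random                 # binds the module name; use random.choice(...)
from random import randint    # binds one function name directly

def random_base(RNAflag=False):
    """Pick a base by indexing with a random integer from 0 through 3."""
    return ('UCAG' if RNAflag else 'TCAG')[randint(0, 3)]

def random_base_choice(RNAflag=False):
    """Hypothetical variant: let random.choice pick the character."""
    return random.choice('UCAG' if RNAflag else 'TCAG')

dna_base = random_base()
rna_base = random_base_choice(True)
```

Both forms give access to the same functions; the difference is only in which names end up bound in the importing file.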

Python Files

    def validate_base_sequence(base_sequence, RNAflag=False):
        """Return True if the string base_sequence contains only upper- or
        lowercase T (or U, if RNAflag), C, A, and G characters, otherwise False"""
        seq = base_sequence.upper()
        return len(seq) == (seq.count('U' if RNAflag else 'T') +
                            seq.count('C') + seq.count('A') + seq.count('G'))

    def gc_content(base_seq):
        """Return the fraction of bases in base_seq that are C or G"""
        assert validate_base_sequence(base_seq), \
            'argument has invalid characters'
        seq = base_seq.upper()
        # Count on the uppercased copy so lowercase input is handled too.
        return (seq.count('G') + seq.count('C')) / len(seq)

    def recognition_site(base_seq, recognition_seq):
        """Return the first position in base_seq where recognition_seq
        occurs, or -1 if not found"""
        return base_seq.find(recognition_seq)

    def test():
        assert validate_base_sequence('ACTG')
        assert validate_base_sequence('')
        # U is valid only for RNA, and T only for DNA.
        assert not validate_base_sequence('ACUG')
        assert not validate_base_sequence('ACUG', False)
        assert validate_base_sequence('ACUG', True)
        assert not validate_base_sequence('ACTG', True)
        assert .5 == gc_content('ACTG')
        assert 1.0 == gc_content('CCGG')
        assert .25 == gc_content('ACTT')
        print('All tests passed.')

    test()

Collections

Collections (also called containers) are compound objects: they group together multiple objects. Some collection types can even contain items with a mixture of types, including other collections.

- sets: don't allow access to individual items
- sequences: use numerical indexes
- mappings: use keys

Sets: an unordered collection of items that contains no duplicates

    set('TCAGTTAT')    # {'A', 'C', 'G', 'T'} (display order may vary)
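Because a set discards duplicates and has no positions, it is usually used through membership tests and set algebra. A minimal sketch follows, with the variable names chosen here for illustration only.

```python
bases = set('TCAGTTAT')    # duplicates collapse to four distinct items

dna = set('TCAG')
rna = set('UCAG')

shared = dna & rna         # intersection: bases found in both alphabets
either = dna | rna         # union: bases found in at least one alphabet
dna_only = dna - rna       # difference: bases found only in DNA
has_u = 'U' in bases       # membership test
```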

Sequences: ordered collections that may contain duplicate elements

Sequences - String Testing

    str1.isalpha()
    str1.isalnum()
    str1.isdigit()
    str1.isnumeric()
    str1.isdecimal()
    str1.islower()
    str1.isupper()
    str1.istitle()
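Each testing method returns True or False. The sample strings in this sketch are made up for illustration.

```python
all_letters = 'ACGT'.isalpha()          # only letters
letters_digits = 'ACGT1'.isalnum()      # letters and digits only
all_digits = '42'.isdigit()             # only digits
not_digits = '4.2'.isdigit()            # the period is not a digit
lower = 'acgt'.islower()                # all cased characters are lowercase
upper = 'ACGT'.isupper()                # all cased characters are uppercase
title = 'Homo Sapiens'.istitle()        # each word starts with a capital
empty = ''.isalpha()                    # the empty string fails these tests
```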

Sequences - String Searching

    str1.startswith(str2[, startpos[, endpos]])
    str1.endswith(str2[, startpos[, endpos]])
    str1.find(str2[, startpos[, endpos]])
    str1.rfind(str2[, startpos[, endpos]])
    str1.index(str2[, startpos[, endpos]])
    str1.rindex(str2[, startpos[, endpos]])
    str1.count(str2[, startpos[, endpos]])
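The searching methods can be tried on the protein sequence used earlier for `count`. The main point of this sketch is that `find` signals "not found" by returning -1, while `index` raises a ValueError instead.

```python
seq = 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'

first = seq.find('DL')            # position of the first match
last = seq.rfind('DL')            # position of the last match
after_first = seq.find('DL', 5)   # start searching at position 5
missing = seq.find('XX')          # -1 means no match
how_many = seq.count('DL')

starts = seq.startswith('MNK')
ends = seq.endswith('VFA')

try:
    seq.index('XX')               # like find, but raises when nothing matches
    index_failed = False
except ValueError:
    index_failed = True
```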

Sequences - String Replacing and Changing Case

Replacing:

    str1.replace(oldstr, newstr[, count])
    str1.translate(dictionary)

Changing case:

    str1.lower()
    str1.upper()
    str1.capitalize()
    str1.title()
    str1.swapcase()
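A short sketch of replacing and case changing follows. The translation dictionary is built with `str.maketrans`, which is one standard way to produce the argument that `translate` expects; the sample sequence is made up for illustration.

```python
dna = 'ATGCCGTA'

rna = dna.replace('T', 'U')           # replace every T
one_only = dna.replace('C', 'c', 1)   # replace only the first C

# Map each base to its complement in a single pass.
complement_map = str.maketrans('ACGT', 'TGCA')
complement = dna.translate(complement_map)

upper = 'atg'.upper()
capitalized = 'homo sapiens'.capitalize()   # first character only
titled = 'homo sapiens'.title()             # first character of each word
swapped = 'AtG'.swapcase()
```

Strings are immutable, so every one of these methods returns a new string and leaves `dna` unchanged.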

Sequences - String Reformatting

    str1.lstrip([chars])
    str1.rstrip([chars])
    str1.strip([chars])
    str1.ljust(width[, fillchar])
    str1.rjust(width[, fillchar])
    str1.center(width[, fillchar])
    str1.expandtabs([tabsize])

The default fill character (fillchar) is a space.
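The reformatting methods are easiest to understand from their results. A minimal sketch with made-up sample strings:

```python
stripped = '  ATG \n'.strip()             # whitespace removed from both ends
left_stripped = 'xxATGxx'.lstrip('x')     # only leading x characters removed
right_stripped = 'xxATGxx'.rstrip('x')    # only trailing x characters removed

padded_right = 'ATG'.ljust(6, '-')        # left-justified, padded with '-'
padded_left = 'ATG'.rjust(6)              # right-justified, padded with spaces
centered = 'ATG'.center(7, '*')

expanded = 'A\tC'.expandtabs(4)           # tab replaced by spaces up to column 4
```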

Sequences - Tuples: immutable sequence that can contain any type of element

    ('TCAG', 'UCAG')
    bases = 'TCAG', 'UCAG'
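The sketch below shows the two things a tuple is most often used for, indexing and unpacking, and confirms that assigning to an element fails. The names `dna_bases`, `rna_bases`, and `mutated` are illustrative assumptions.

```python
bases = 'TCAG', 'UCAG'          # parentheses are optional here

first = bases[0]                # indexing works as with strings
dna_bases, rna_bases = bases    # unpacking binds one name per element

try:
    bases[0] = 'ACGT'           # tuples do not support item assignment
    mutated = True
except TypeError:
    mutated = False
```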

Sequences - Lists: mutable sequence of any kind of element
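Unlike a tuple, a list can be changed in place. A minimal sketch with a made-up list of codons:

```python
codons = ['ATG', 'CCG']

codons.append('TAA')        # add to the end
codons[1] = 'CCC'           # replace an element in place
codons.insert(1, 'GGG')     # insert before position 1
after_insert = list(codons) # copy the current state for comparison

codons.sort()               # sort in place
removed = codons.pop()      # remove and return the last element
```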

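Mappings: mutable collections of key/value pairs, accessed by key rather than by position (since Python 3.7, dictionaries preserve insertion order). Mappings are also known as associative arrays, lookup tables, and hash tables.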

The RNA amino acid translation table

Mappings - Dictionary
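A dictionary is a natural fit for a codon translation table. The sketch below holds only a small excerpt of the standard RNA codon table with one-letter amino acid codes. The use of '*' for a stop codon, '?' for an unknown codon, and the helper name `translate_codon` are assumptions made for this example.

```python
# Excerpt only: a full table would have 64 entries.
codon_table = {
    'AUG': 'M',   # methionine (start)
    'UUU': 'F',   # phenylalanine
    'GGC': 'G',   # glycine
    'UAA': '*',   # stop (assumed marker)
}

codon_table['UGG'] = 'W'    # add an entry: tryptophan

def translate_codon(codon):
    """Hypothetical helper: look up one codon, returning '?' if it is
    not in this excerpt."""
    return codon_table.get(codon.upper(), '?')

start = translate_codon('aug')
unknown = translate_codon('CCC')    # not in the excerpt
known = 'UUU' in codon_table        # membership tests look at keys
```

Using `get` with a default avoids the KeyError that `codon_table[codon]` would raise for a missing key.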
