The if statement and files. The if statement Do a code block only when something is True if test: print "The expression is true"

Slides:



Advertisements
Similar presentations
Chapter 4 - Control Statements
Advertisements

Introduction to Computing Science and Programming I
16-May-15 Sudden Python Drinking from the Fire Hose.
Perl for Bioinformatics Lecture 4. Variables - review A variable name starts with a $ It contains a number or a text string Use my to define a variable.
Python (yay!) November 16, Unit 7. Recap We can store values in variables using an assignment statement >>>x = We can get input from the user using.
Python November 18, Unit 7. So Far We can get user input We can create variables We can convert values from one type to another using functions We can.
Files in Python Caution about readlines vs. read and split.
Files in Python Input techniques. Input from a file The type of data you will get from a file is always string or a list of strings. There are two ways.
“Everything Else”. Find all substrings We’ve learned how to find the first location of a string in another string with find. What about finding all matches?
General Programming Introduction to Computing Science and Programming I.
PYTHON: PART 2 Catherine and Annie. VARIABLES  That last program was a little simple. You probably want something a little more challenging.  Let’s.
Programming Fundamentals. Today’s lecture Decisions If else …… Switch Conditional Operators Logical Operators.
Lists and the ‘ for ’ loop. Lists Lists are an ordered collection of objects >>> data = [] >>> print data [] >>> data.append("Hello!") >>> print data.
Working on exercises (a few notes first). Comments Sometimes you want to make a comment in the Python code, to remind you what’s going on. Python ignores.
What does C store? >>A = [1 2 3] >>B = [1 1] >>[C,D]=meshgrid(A,B) c) a) d) b)
Functions. Built-in functions You’ve used several functions already >>> len("ATGGTCA")‏ 7 >>> abs(-6)‏ 6 >>> float("3.1415")‏ >>>
CSC 110 Using Python [Reading: chapter 1] CSC 110 B 1.
Python Conditionals chapter 5
More about Strings. String Formatting  So far we have used comma separators to print messages  This is fine until our messages become quite complex:
Lecture 2 Conditional Statement. chcslonline.org Conditional Statements in PHP Conditional Statements are used for decision making. Different actions.
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
Guide to Programming with Python Chapter Seven Files and Exceptions: The Trivia Challenge Game.
Working on exercises (a few notes first)‏. Comments Sometimes you want to make a comment in the Python code, to remind you what’s going on. Python ignores.
Susie’s lecture notes are in the presenter’s notes, below the slides Disclaimer: Susie may have made errors in transcription or understanding. If there.
Files Tutor: You will need ….
Strings in Python. Computers store text as strings GATTACA >>> s = "GATTACA" s Each of these are characters.
1 CSC 221: Introduction to Programming Fall 2011 Input & file processing  input vs. raw_input  files: input, output  opening & closing files  read(),
FILES. open() The open() function takes a filename and path as input and returns a file object. file object = open(file_name [, access_mode][, buffering])
Midterm Exam Topics (Prof. Chang's section) CMSC 201.
SEQUENCES:STRINGS,LISTS AND TUPLES. SEQUENCES Are items that are ordered sequentially and accessible via index offsets into its set of elements. Examples:
File input and output and conditionals Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
9. ITERATIONS AND LOOP STRUCTURES Rocky K. C. Chang October 18, 2015 (Adapted from John Zelle’s slides)
Chapter 2: Fundamental Programming Structures in Java Adapted from MIT AITI Slides Control Structures.
Perl for Bioinformatics Part 2 Stuart Brown NYU School of Medicine.
While loops. Iteration We’ve seen many places where repetition is necessary in a problem. We’ve been using the for loop for that purpose For loops are.
Python – Part 4 Conditionals and Recursion. Conditional execution If statement if x>0:# CONDITION print (‘x is positive’) Same structure as function definition.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
ENGINEERING 1D04 Tutorial 2. What we’re doing today More on Strings String input Strings as lists String indexing Slice Concatenation and Repetition len()
Discussion 4 eecs 183 Hannah Westra.
Introduction to Python
Python: Control Structures
ECS10 10/10
Miscellaneous Items Loop control, block labels, unless/until, backwards syntax for “if” statements, split, join, substring, length, logical operators,
Psuedo Code.
While Loops in Python.
Topics Introduction to Repetition Structures
Exceptions and files Taken from notes by Dr. Neil Moore
Selection CIS 40 – Introduction to Programming in Python
Introduction to Python
Topics Introduction to File Input and Output
Chapter 7 Files and Exceptions
Conditions and Ifs BIS1523 – Lecture 8.
Today in CS161 Week #3 Learn about… Writing our First Program
Exceptions and files Taken from notes by Dr. Neil Moore
Coding Concepts (Basics)
子夜吳歌 ~ 李白 長安一片月, 萬戶擣衣聲; 秋風吹不盡, 總是玉關情。 何日平胡虜, 良人罷遠征。
Module 4 Loops.
Selection Statements.
子夜吳歌 ~ 李白 長安一片月, 萬戶擣衣聲; 秋風吹不盡, 總是玉關情。 何日平胡虜, 良人罷遠征。
Topics Introduction to File Input and Output
Introduction to Python
Introduction to Computer Science
Lists and the ‘for’ loop
Unit 3: Variables in Java
“Everything Else”.
Topics Introduction to File Input and Output
While Loops in Python.
Getting Started in Python
Presentation transcript:

The if statement and files

The if statement Do a code block only when something is True if test: print "The expression is true"

Example if "GAATTC" in "ATCTGGAATTCATCG": print "EcoRI site is present"

if the test is true... if "GAATTC" in "ATCTGGAATTCATCG": print "EcoRI site is present" The test is: "GAATTC" in "ATCTGGAATTCATCG"

if "GAATTC" in "ATCTGGAATTCATCG": print "EcoRI site is present" Then print the message >>> if "GAATTC" in "ATCTGGAATTCATCG":... print "EcoRI is present"... EcoRI is present >>> Here is it done in the Python shell

What if you want the false case? if not "GAATTC" in "AAAAAAAAA": print "EcoRI will not cut the sequence" There are several possibilities; here’s two 2) The not operator switches true and false 1) Python has a not in operator if "GAATTC" not in "AAAAAAAAA": print "EcoRI will not cut the sequence"

In the Python shell >>> x = True >>> x True >>> not x False >>> not not x True >>> if "GAATTC" not in "AAAAAAAAA":... print "EcoRI will not cut the sequence"... EcoRI will not cut the sequence >>> if not "GAATTC" in "ATCTGGAATTCATCG":... print "EcoRI will not cut the sequence"... >>> if not "GAATTC" in "AAAAAAAAA":... print "EcoRI will not cut the sequence"... EcoRI will not cut the sequence >>>

else: What if you want to do one thing when the test is true and another thing when the test is false? if "GAATTC" in "ATCTGGAATTCATCG": print "EcoRI site is present" else: print "EcoRI will not cut the sequence" Do the first code block (after the if:) if the test is true Do the second code block (after the else:) if the test is false

Examples with else >>> if "GAATTC" in "ATCTGGAATTCATCG":... print "EcoRI site is present"... else:... print "EcoRI will not cut the sequence"... EcoRI site is present >>> if "GAATTC" in "AAAACTCGT":... print "EcoRI site is present"... else:... print "EcoRI will not cut the sequence"... EcoRI will not cut the sequence >>>

Where is the site? The ‘find’ method of strings returns the index of a substring in the string, or -1 if the substring doesn’t exist >>> seq = "ATCTGGAATTCATCG" >>> seq.find("GAATTC") 5 >>> seq.find("GGCGC") >>> There is a GAATTC at position 5 But there is no GGCGC in the sequence } }

But where is the site? >>> seq = "ATCTGGAATTCATCG" >>> pos = seq.find("GAATTC") >>> if pos == -1:... print "EcoRI does not cut the sequence"... else:... print "EcoRI site starting at index", pos... EcoRI site starting at index 5 >>>

seq = "ATCTGGAATTCATCG" pos = seq.find("GAATTC") if pos == -1: print "EcoRI does not cut the sequence" else: print "EcoRI site starting at index", pos Start by creating the string “ATCTGGAATTCATCG” and assigning it to the variable with name ‘seq’

seq = "ATCTGGAATTCATCG" pos = seq.find("GAATTC") if pos == -1: print "EcoRI does not cut the sequence" else: print "EcoRI site starting at index", pos Using the seq string, call the method named find. This looks for the string “GAATTC” in the seq string

seq = "ATCTGGAATTCATCG" pos = seq.find("GAATTC") if pos == -1: print "EcoRI does not cut the sequence" else: print "EcoRI site starting at index", pos The string “GAATC” is at position 5 in the seq string. Assign the 5 object to the variable named pos. The variable name “pos” is often used for positions. Common variations are “pos1”, “pos2”, “start_pos”, “end_pos”

seq = "ATCTGGAATTCATCG" pos = seq.find("GAATTC") if pos == -1: print "EcoRI does not cut the sequence" else: print "EcoRI site starting at index", pos Do the test for the if statement Is the variable pos equal to -1?

seq = "ATCTGGAATTCATCG" pos = seq.find("GAATTC") if pos == -1: print "EcoRI does not cut the sequence" else: print "EcoRI site starting at index", pos Since pos is 5 and 5 is not equal to -1, this test is false. The test is False

seq = "ATCTGGAATTCATCG" pos = seq.find("GAATTC") if pos == -1: print "EcoRI does not cut the sequence" else: print "EcoRI site starting at index", pos Skip the first code block (that is only run if the test is True) Instead, run the code block after the else:

seq = "ATCTGGAATTCATCG" pos = seq.find("GAATTC") if pos == -1: print "EcoRI does not cut the sequence" else: print "EcoRI site starting at index", pos This is a print statement. Print the index of the start position This prints EcoRI site starting at index 5

seq = "ATCTGGAATTCATCG" pos = seq.find("GAATTC") if pos == -1: print "EcoRI does not cut the sequence" else: print "EcoRI site starting at index", pos There are no more statements so Python stops.

A more complex example restriction_sites = [ "GAATTC", # EcoRI "GGATCC", # BamHI "AAGCTT", # HindIII ] seq = raw_input("Enter a DNA sequence: ") for site in restriction_sites: if site in seq: print site, "is a cleavage site" else: print site, "is not present" Using if inside a for

Nested code blocks restriction_sites = [ "GAATTC", # EcoRI "GGATCC", # BamHI "AAGCTT", # HindIII ] seq = raw_input("Enter a DNA sequence: ") for site in restriction_sites: if site in seq: print site, "is a cleavage site" else: print site, "is not present" } This is the code block for the for statement

This is the code block for the True part of the if statement restriction_sites = [ "GAATTC", # EcoRI "GGATCC", # BamHI "AAGCTT", # HindIII ] seq = raw_input("Enter a DNA sequence: ") for site in restriction_sites: if site in seq: print site, "is a cleavage site" else: print site, "is not present" }

This is the code block for the False part of the if statement restriction_sites = [ "GAATTC", # EcoRI "GGATCC", # BamHI "AAGCTT", # HindIII ] seq = raw_input("Enter a DNA sequence: ") for site in restriction_sites: if site in seq: print site, "is a cleavage site" else: print site, "is not present" }

The program output Enter a DNA sequence: AATGAATTCTCTGGAAGCTTA GAATTC is a cleavage site GGATCC is not present AAGCTT is a cleavage site

Read lines from a file raw_input() asks the user for input Most of the time you’ll get data from a file. (Or would you rather type in the sequence every time?) To read from a file you need to tell Python to open that file.

The open function >>> infile = open("/usr/coursehome/dalke/10_sequences.seq") >>> print infile >>> open returns a new object of type file A file can’t be displayed like a number or a string. It is useful because it has methods for working with the data in the file.

the readline() method >>> infile = open("/usr/coursehome/dalke/10_sequences.seq") >>> print infile >>> infile.readline() 'CCTGTATTAGCAGCAGATTCGATTAGCTTTACAACAATTCAATAAAATAGCTTCGCGCTAA \n' >>> readline returns one line from the file The line includes the end of line character (represented here by “\n”) (Note: the last line of some files may not have a “\n”)

readline finishes with "" >>> infile = open("/usr/coursehome/dalke/10_sequences.seq") >>> print infile >>> infile.readline() 'CCTGTATTAGCAGCAGATTCGATTAGCTTTACAACAATTCAATAAAATAGCTTCGCGCTAA\n' >>> infile.readline() 'ATTTTTAACTTTTCTCTGTCGTCGCACAATCGACTTTCTCTGTTTTCTTGGGTTTACCGGAA\n' >>> infile.readline() 'TTGTTTCTGCTGCGATGAGGTATTGCTCGTCAGCCTGAGGCTGAAAATAAAATCCGTGGT\n' >>> infile.readline() 'CACACCCAATAAGTTAGAGAGAGTACTTTGACTTGGAGCTGGAGGAATTTGACATAGTCGAT\n' >>> infile.readline() 'TCTTCTCCAAGACGCATCCACGTGAACCGTTGTAACTATGTTCTGTGC\n' >>> infile.readline() 'CCACACCAAAAAAACTTTCCACGTGAACCGAAAACGAAAGTCTTTGGTTTTAATCAATAA\n' >>> infile.readline() 'GTGCTCTCTTCTCGGAGAGAGAAGGTGGGCTGCTTGTCTGCCGATGTACTTTATTAAATCCAATAA\n' >>> infile.readline() 'CCACACCAAAAAAACTTTCCACGTGTGAACTATACTCCAAAAACGAAGTATTGGTTTATCATAA\n' >>> infile.readline() 'TCTGAAAAGTGCAAAGAACGATGATGATGATGATAGAGGAACCTGAGCAGCCATGTCTGAACCTATAGC\n' >>> infile.readline() 'GTATTGGTCGTCGTGCGACTAAATTAGGTAAAAAAGTAGTTCTAAGAGATTTTGATGATTCAATGCAAAGTTCTATTAATCGTTCAATTG\n' >>> infile.readline() '' >>> When there are no more lines, readline returns the empty string

Using for with a file A simple way to read lines from a file >>> filename = "/usr/coursehome/dalke/10_sequences.seq" >>> for line in open(filename):... print line[:10]... CCTGTATTAG ATTTTTAACT TTGTTTCTGC CACACCCAAT TCTTCTCCAA CCACACCAAA GTGCTCTCTT CCACACCAAA TCTGAAAAGT GTATTGGTCG >>> for starts with the first line in the file... then the second... then the third and finishes with the last line.

A more complex task List the sequences starting with a cytosine >>> filename = "/usr/coursehome/dalke/10_sequences.seq" >>> for line in open(filename):... line = line.rstrip()... if line.startswith("C"):... print line... CCTGTATTAGCAGCAGATTCGATTAGCTTTACAACAATTCAATAAAATAGCTTCGCGCTAA CACACCCAATAAGTTAGAGAGAGTACTTTGACTTGGAGCTGGAGGAATTTGACATAGTCGAT CCACACCAAAAAAACTTTCCACGTGAACCGAAAACGAAAGTCTTTGGTTTTAATCAATAA CCACACCAAAAAAACTTTCCACGTGTGAACTATACTCCAAAAACGAAGTATTGGTTTATCATAA >>> Use rstrip to get rid of the “\n”

Exercise I Get a sequence from the user. If there is an A in the sequence, print the number of times it appears in the sequence. Do the same for T, C and G. If a base does not exist, don’t print anything. Enter a sequence: ACCAGGCA A count: 3 C count: 3 G count: 2 Enter a sequence: TTTTTGGGG T count: 5 G count: 4 Test input #1: Test input #2:

Excercise 2 Get a sequence from the user. If there is an A in the sequence, print the number of times it appears in the sequence. If it does not exist, print “A not found”. Do the same for T, C and G. Enter a sequence: ACCAGGCA A count: 3 T not found C count: 3 G count: 2 Enter a sequence: TTTTTGGGG A not found T count: 5 C not found G count: 4 Test input #1: Test input #2:

Exercise 3 number lines in a file Read the file /usr/coursehome/dalke/10_sequences.seq. Print out the line number (starting with 1) then the line. Remember to use rstrip() to remove the extra newline. 1 CCTGTATTAGCAGCAGATTCGATTAGCTTTACAACAATTCAATAAAATAGCTTCGCGCTAA 2 ATTTTTAACTTTTCTCTGTCGTCGCACAATCGACTTTCTCTGTTTTCTTGGGTTTACCGGAA 3 TTGTTTCTGCTGCGATGAGGTATTGCTCGTCAGCCTGAGGCTGAAAATAAAATCCGTGGT 4 CACACCCAATAAGTTAGAGAGAGTACTTTGACTTGGAGCTGGAGGAATTTGACATAGTCGAT 5 TCTTCTCCAAGACGCATCCACGTGAACCGTTGTAACTATGTTCTGTGC 6 CCACACCAAAAAAACTTTCCACGTGAACCGAAAACGAAAGTCTTTGGTTTTAATCAATAA 7 GTGCTCTCTTCTCGGAGAGAGAAGGTGGGCTGCTTGTCTGCCGATGTACTTTATTAAATCCAATAA 8 CCACACCAAAAAAACTTTCCACGTGTGAACTATACTCCAAAAACGAAGTATTGGTTTATCATAA 9 TCTGAAAAGTGCAAAGAACGATGATGATGATGATAGAGGAACCTGAGCAGCCATGTCTGAACCTATAGC 10 GTATTGGTCGTCGTGCGACTAAATTAGGTAAAAAAGTAGTTCTAAGAGATTTTGATGATTCAATGCAAAGTTCTATTAATCGTTCAATTG The output should look like this

Exercise 4 List the sequences in /usr/coursehome/dalke/10_sequences.seq which have the pattern CTATA. Hint: You should find two of them. Once that works, print the index of the first time that pattern is found.

Exercise 5 - Filtering Using /usr/coursehome/dalke/sequences.seq A. How many sequences are in that file? B. How many have the pattern CTATA? C. How many have more than 1000 bases? D. How many have over 50% GC composition? E. How many have more than 2000 bases and more than 50% GC composition? Note: for %GC use float to convert the counts into floats before doing the division for percentage.