Computer Programming for Biologists, Class 2, Oct 31st, 2014, Karsten Hokamp


1 Computer Programming for Biologists, Class 2, Oct 31st, 2014, Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3M25/programming

2 Overview
- recap
- project
- scalar variables
- built-in functions
- exercises

3 Recap
Topics covered in the first class:
1. Unix
2. Perl

4 Recap
1. Unix details
- commands: mkdir, cd, ls, pwd, rm, chmod
- command parameters (ls -l)
- command-line completion with the TAB key
- command history (arrow up or down)
- special treatment of spaces (quotes or backslash)
- information and help in manual pages (man ls)

5 Recap
2. Perl details
- variables ($repeat, $message)
- statements ($repeat = 4;)
- print function (print 'Hello world!';)
- newline (print "Hello world!\n";)
- x operator (print $message x 4;)
- reading user input from the command line ($repeat = shift; or $repeat = <>;)
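The recap items above can be combined into one short script. This is a minimal sketch rather than the script written in class; the fallback value of 4 is an assumption taken from the slide's example statement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Double quotes are needed so that \n is turned into a newline;
# inside single quotes it would be printed literally.
my $message = "Hello world!\n";

# shift takes the first command-line argument; fall back to 4
# (an assumed default) when no argument is given.
my $repeat = @ARGV ? shift : 4;

# The x operator repeats the string on its left $repeat times.
my $output = $message x $repeat;
print $output;
```

Saved under a hypothetical name such as hello.pl, the script could be run as `perl hello.pl 3` to print the message three times.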

6 Course project
Overall goal: development of a Perl script for sequence analysis
Input: file with sequences in FASTA format and command line options
Output:
- number and length of sequences
- GC content
- base / amino acid composition
- reverse complement
- translations
- virtual enzyme digest

7 FASTA Format
headers (starting with ‘>’ followed by sequence ID)
>124249405
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACAGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA
>124249383
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACGGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA
>110350667
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACGGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA

8 FASTA Format
sequence (split across multiple lines)
>124249405
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACAGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA
>124249383
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACGGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA
>110350667
ATGCCACCGAAGTTCGACCCCAACGAGATCAAGGTCGTATACCTGAGGTGCACCAGAGGTGAAGTCGGTG
CCACTTCTGCCCTGGCCCCCAAGATCGGCCCCCTGGGTCTGTCTCCAAAAAAGGTTGGTGATGACATTGC
CAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACGGTGAAACTGACCATTGAGAACAGACAGGCCCAG
AACAGAAAAACATTAAACACAATGGGAATATCACTTTTGATGAGATCGTCAACATTGCTCGACAGATGCG
CTAGTTAA
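One possible way to read this format is sketched below: header lines start a new record, and every following line is appended to that record's sequence. The two short records and the variable names are hypothetical, and the sketch uses arrays, hashes and loops, which go beyond the scalar variables covered in this class.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA text, shortened for illustration.
my $fasta = ">seq1\nATGCCACC\nGAAGTT\n>seq2\nCTAGTTAA\n";

my %sequence;   # sequence ID => full sequence
my @ids;        # IDs in order of appearance
my $id;

foreach my $line (split /\n/, $fasta) {
    if ($line =~ /^>(\S+)/) {
        # Header line: remember the ID that follows '>'.
        $id = $1;
        push @ids, $id;
        $sequence{$id} = '';
    }
    elsif (defined $id) {
        # Sequence line: append it to the current record.
        $sequence{$id} .= $line;
    }
}

# Report the ID and length of each record.
foreach my $name (@ids) {
    print "$name\t", length($sequence{$name}), "\n";
}
```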

9 Course project
Example usage: sequence length and GC content
(the slide shows the command and its output)
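As a rough illustration of how these two values could be calculated (not the course project's actual code), the sketch below uses length and the counting behaviour of tr on the first 20 bases of the first record from the FASTA slides. The variable names are assumptions.

```perl
use strict;
use warnings;

# First 20 bases of record 124249405; a real run would read a FASTA file.
my $seq = 'ATGCCACCGAAGTTCGACCC';

my $length = length $seq;

# tr returns the number of characters it matched; with an empty
# replacement list the string itself is left unchanged.
my $gc = ($seq =~ tr/GCgc//);

my $gc_percent = 100 * $gc / $length;
printf "length: %d, GC content: %.1f%%\n", $length, $gc_percent;
```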

10 Course project
Example usage: base composition and reverse complement

11 Course project
Example usage: translation and digest

12 Variables
Basic elements for programming:
- hold information
- allow changing of information
- organize data
- complex constructs possible
- special operations for data handling

13 Variables
Three different types in Perl:
1. scalar
2. array
3. hash

14 Variables
1) Scalars:
- Content: number or string of characters
- Variable name starts with a dollar sign ($) followed by a letter or underscore; the rest of the name can contain letters, digits and underscores
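A few scalar variables in use might look like the following sketch. The names and values are made up for illustration and are not taken from the class material.

```perl
use strict;
use warnings;

# A scalar can hold a number ...
my $repeat = 4;
# ... or a string of characters.
my $dna_seq = 'ATGCCACC';

# Names may contain letters, digits and underscores after the $.
my $seq_2 = 'GGTTAA';

# The content can be changed after the variable has been created.
$repeat  = $repeat + 1;          # now 5
$dna_seq = $dna_seq . $seq_2;    # the . operator joins two strings

print "$repeat\n";
print "$dna_seq\n";
```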

15 Variables

16 Variables

17 Variables
Special scalars:
- $_        default scalar
- $a, $b    placeholders for comparisons (as used by sort)
- $0        name of program
- $!        system error messages
See man perlvar for many more special variables
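The special scalars listed above could be seen in action as follows. This is only a sketch: it relies on loops, arrays and file handling that come later in the course, and the file name is hypothetical and assumed not to exist.

```perl
use strict;
use warnings;

# $_ is the default variable: foreach puts each element into it,
# and functions such as uc use it when given no argument.
my @upper;
foreach ('acgt', 'ttga') {
    push @upper, uc;
}
print "@upper\n";

# $a and $b are set by sort to the two values being compared.
my @sorted = sort { $a <=> $b } (10, 9, 100);
print "@sorted\n";

# $0 holds the name of the running program.
print "This program is called $0\n";

# $! holds the system error message after a failed system call.
# The file name is hypothetical and assumed not to exist.
my $opened = open(my $fh, '<', 'no_such_file.fasta');
my $error  = $opened ? '' : "$!";
print "Could not open file: $error\n" unless $opened;
```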

18 Scalars
Practical session:
- Go to http://bioinf.gen.tcd.ie/GE3M25/class2 and try the ‘Recap’ exercises

19 Built-in functions for scalars
- lc (change letters in string to lower case)
- uc (change letters in string to upper case)
- chop (remove last character)
- chomp (remove the trailing newline, if there is one)
- reverse (reverse list or string)
- length (calculate length of a string)
- split (split a string into a list)
- substr (extract parts of a string)
- tr (translation of text: replace characters one by one)
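Each of the functions above is applied once in the sketch below. The input string is borrowed from the exercises at the end of the class; the variable names are assumptions.

```perl
use strict;
use warnings;

my $seq = "aatTTgggcca\n";          # input as it might arrive, with a newline

chomp $seq;                         # removes the trailing newline
my $lower = lc $seq;                # aatttgggcca
my $upper = uc $seq;                # AATTTGGGCCA
my $len   = length $seq;            # 11
my $rev   = reverse $seq;           # accgggTTtaa (string reversed in scalar context)
my $start = substr($seq, 0, 3);     # aat: 3 characters starting at position 0
my @bases = split //, $seq;         # one list element per character

my $chopped = $seq;
chop $chopped;                      # removes the last character, whatever it is

# tr replaces each character by its counterpart in the second list.
# Copy first, then translate the copy, so that $upper stays unchanged.
(my $complement = $upper) =~ tr/ACGT/TGCA/;

print "$lower $upper $len $rev $start $chopped $complement\n";
print scalar(@bases), " bases\n";
```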

20 Built-in functions for scalars
Different ways of using functions:
$out = uc($in);
$out = uc $in;
$out = uc; # works on default variable $_
Combination of functions:
$out = uc(reverse($in));
$out = uc reverse $in;

21 Built-in functions
- online help: http://perldoc.perl.org/
- more help available on the command line:
  - man perlfunc (overview of all built-in functions)
  - perldoc -f command (information on a specific command only)

22 Built-in functions
Practical session:
- Go to http://bioinf.gen.tcd.ie/GE3M25/class2 and try the ‘Functions’ exercises

23 Exercises
1) Write a program that takes some text from the command line, prints it out in capital letters and also reports the length of the text, e.g.:
caps.pl 'Hello World!'
returns:
HELLO WORLD! (length: 12)
2) Write a program that takes a DNA sequence from the command line and prints out the reverse complement. Make sure that it works with both small and capital letters, e.g.:
revcomp.pl aatTTgggcca
returns:
TGGCCCAAATT

