96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl. 8/13~8/22 蘇中才, 8/24~8/29 張天豪, 8/31 曾宇鳯.


1 96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl
  8/13~8/22 蘇中才
  8/24~8/29 張天豪
  8/31 曾宇鳯

2 Before class
  Course web page
    http://gene.csie.ntu.edu.tw/~sbb/summer-course/
  Setup steps
    Download Putty / Pietty
    Connect to 140.112.28.186
    wget http://gene.csie.ntu.edu.tw/~sbb/summer-course/doc/course1.tgz
    tar zxvf course1.tgz

3
    No.  Name    Account
    1    許郁彬  course1
    2    杜羿樞  course2
    3    黃裕雄  course3
    4    王建智  course4
    5    陳士杰  course5
    6    莊智傑  course6
    7    朱柏威  course7
    8    洪文峯  course8
    9    吳耿豪  course9
    10   張雯琪  course10
    11   王悅    course11
    12   張嘉芸  course12
    13   林義峰  course13
    14   游棨元  course14
    15   許育堂  course15
    16   陳建瑋  course16
    17   黃國鑫  course17
    18   翁小涵  course18
    19   郭建鴻  course19
    20   曾意儒  course20

4 Appendix Scalar, Array, Hash

5 Variable reset (1/2)
    $scalar = undef;
    $scalar = "";
    $scalar = 0;
    @array = ();
    %hash = ();

6 Variable reset (2/2)
    @array = undef;          # @array now holds one element, undef
    print scalar(@array);    # prints 1, not 0
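The difference between the reset forms can be checked directly. A minimal sketch; the variable names $after_assign, $after_empty and $after_undef are made up for this illustration and do not appear on the slides:

```perl
use strict;
use warnings;

my @array = (1, 2, 3);

@array = undef;                      # assigns a one-element list containing undef
my $after_assign = scalar(@array);   # 1

@array = ();                         # assigns the empty list
my $after_empty = scalar(@array);    # 0

@array = (1, 2, 3);
undef @array;                        # undef as a function call also empties the array
my $after_undef = scalar(@array);    # 0

print "$after_assign $after_empty $after_undef\n";   # prints "1 0 0"
```

To empty an array, use @array = () or undef @array; the assignment @array = undef leaves one element behind.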

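7 Array
    my @number = ("one", "two", "three");
    my $number = ("one", "two", "three");   # comma operator in scalar context: keeps "three"
    print "@number\n";             # one two three
    print scalar(@number)."\n";    # 3 (number of elements)
    print $#number."\n";           # 2 (index of the last element)
    print @number."\n";            # 3 (array in scalar context)
    print $number."\n";            # three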

8 Array
    @array = qw"5 4 9 8 1 3 6 2 7 10";
    print "@array\n";     # elements separated by spaces
    print @array."\n";    # 10 (number of elements)
    print @array;         # elements with no separator: 54981362710

9 Array – sort by number
    #! /usr/bin/perl
    @test = (1, 5, 4, 22, 9, 101);
    @mmm = sort { $a <=> $b } @test;    # <=> compares numerically
    print join(',', @mmm), "\n\n";      # 1,4,5,9,22,101
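The numeric comparison matters because sort without a comparison block compares elements as strings. A short sketch using the same list; the names @by_string, @by_number and @descending are illustrative only:

```perl
use strict;
use warnings;

my @test = (1, 5, 4, 22, 9, 101);

my @by_string  = sort @test;                  # default: string comparison
my @by_number  = sort { $a <=> $b } @test;    # ascending numeric order
my @descending = sort { $b <=> $a } @test;    # descending numeric order

print "@by_string\n";     # 1 101 22 4 5 9
print "@by_number\n";     # 1 4 5 9 22 101
print "@descending\n";    # 101 22 9 5 4 1
```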

10 Hash – show all elements
    #! /usr/bin/perl -w
    %nucleotide_bases = (
        A => 'Adenine',
        T => 'Thymine',
        G => 'Guanine',
        C => 'Cytosine',
    );
    # each returns one (key, value) pair per call
    while (($key, $value) = each %nucleotide_bases) {
        print "$key ====> $value\n";
    }
    # keys returns the list of all keys
    foreach $key (keys %nucleotide_bases) {
        print "$key ====> $nucleotide_bases{$key}\n";
    }

11 Hash – reverse with identical values
    %nucleotide_bases = (
        A => 'Adenine',
        T => 'Thymine',
        G => 'Adenine',
        C => 'Cytosine',
    );
    while (($key, $value) = each %nucleotide_bases) {
        print "$key ====> $value\n";
    }
    # reverse swaps keys and values; A and G share the value Adenine,
    # so only one of them survives in %reverse
    %reverse = reverse %nucleotide_bases;
    while (($key, $value) = each %reverse) {
        print "$key ====> $value\n";
    }
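When values repeat, one way to invert the hash without losing keys is to collect the keys of each value in an array. This is a sketch of a common idiom, not something shown on the slides; the name %keys_for is made up for the example:

```perl
use strict;
use warnings;

my %nucleotide_bases = (
    A => 'Adenine',
    T => 'Thymine',
    G => 'Adenine',
    C => 'Cytosine',
);

# Map each value to the list of keys that carry it.
my %keys_for;
for my $key (sort keys %nucleotide_bases) {
    push @{ $keys_for{ $nucleotide_bases{$key} } }, $key;
}

for my $value (sort keys %keys_for) {
    print "$value ====> @{ $keys_for{$value} }\n";
}
# Adenine ====> A G
# Cytosine ====> C
# Thymine ====> T
```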

12 Hash – the number of elements
  How to know the number of elements in a hash?
  Ex:
    my %hash = ( 'a'=>1, 'b'=>2);
    print scalar(keys(%hash))."\n";    # 2

13 Comment
    # This is a comment
    =This is a comment, too
    =This is a comment, three
    =cut
    print "Really ?\n";

14 Appendix STDIN, <>, our/my

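15 $_ - extract data from <STDIN>
    while (<STDIN>) { print; }    # a lone readline in a while condition is assigned to $_
    if (<STDIN>) { print; }       # in an if condition the line is read but not assigned to $_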

16 <>; $line = <>;
    #! /usr/bin/perl -w
    while ( $line = <> ) {
        print $line;
    }
  Processing Data Files (like UNIX command : cat)
    #! /usr/bin/perl -w
    while (<>) {
        print;
    }

17 Others …
    while (defined($_ = <>)) { print; }
    while ($_ = <>) { print; }
    while (<>) { print; }
    for (;<>;) { print; }
    print while defined($_ = <>);
    print while ($_ = <>);
    print while <>;

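18 our/my
    my $var;
    $var = 1;
    {
        my $var;             # a new lexical variable, visible only inside this block
        $var = 2;
        print $var,"\n";     # 2
    }
    print $var, "\n";        # 1

    our $var;
    $var = 1;
    {
        our $var;            # the same package variable as outside the block
        $var = 2;
        print $var,"\n";     # 2
    }
    print $var, "\n";        # 2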

19 Appendix Regular expression

20 Reserved word
    # log is also the name of a built-in function, so it is a poor choice of filehandle name
    open log, ">test.txt" or die "…";
    print log "test\n";
    close log;

21 Magic diamond - <>
    print "$_" while (<>);
    print "$_" while (<STDIN>);

22 Get the list of files in the current directory
    my @files = <*.pl>;
    my @files = glob("*.pl");

23 Greedy matching
    my $string = "course1:x:509:510::/home/course1:/bin/bash";
    if ($string =~ /(.*):/) {
        print "matched string = [$1]\n";    # [course1:x:509:510::/home/course1]
    }
    # How to match the first column ?

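24 Greedy matching
    my $string = "course1:x:509:510::/home/course1:/bin/bash";
    if ($string =~ /^([\S]*):/) {     # greedy: runs up to the last colon
        print "matched string = [$1]\n";    # [course1:x:509:510::/home/course1]
    }
    if ($string =~ /^([\S]*?):/) {    # non-greedy: stops at the first colon
        print "matched string = [$1]\n";    # [course1]
    }
    if ($string =~ /([^:]*):/) {      # cannot cross a colon at all
        print "matched string = [$1]\n";    # [course1]
    }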
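For colon-delimited records like this one, split is a common alternative to a capturing pattern. This is a sketch rather than something the slides show; $first and @fields are illustrative names:

```perl
use strict;
use warnings;

my $string = "course1:x:509:510::/home/course1:/bin/bash";

# List assignment keeps only the first field.
my ($first) = split /:/, $string;

# Or keep every column; the empty fifth field is preserved.
my @fields = split /:/, $string;

print "first column = [$first]\n";              # [course1]
print "number of columns = ", scalar(@fields), "\n";   # 7
print "last column = [$fields[-1]]\n";          # [/bin/bash]
```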

25 Substitution – remove all x
    $_ = "China xxxxxx Taiwan";
    s/x*//;    # x* matches the empty string at the start, so nothing is removed. How to rewrite ?
    print;
  Output:
    China xxxxxx Taiwan
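Two possible rewrites that do remove every x are a global substitution and a deleting transliteration. A minimal sketch; the variable names are made up for the illustration:

```perl
use strict;
use warnings;

my $text = "China xxxxxx Taiwan";

(my $unchanged = $text) =~ s/x*//;     # matches the empty string at position 0
(my $global    = $text) =~ s/x//g;     # /g repeats the substitution for every x
(my $translit  = $text) =~ tr/x//d;    # /d deletes each listed character

print "[$unchanged]\n";    # [China xxxxxx Taiwan]
print "[$global]\n";       # [China  Taiwan]
print "[$translit]\n";     # [China  Taiwan]
```

Both rewrites leave the two spaces that surrounded the run of x characters.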

26 Quoted syntax
    Symbol    General    Description           Interpolated
    ' '       q/ /       String                No
    " "       qq/ /      String                Yes
    ` `       qx/ /      Execution             Yes
    ( )       qw/ /      List of words         No
    / /       m/ /       Pattern matching      Yes
              s/ / /     Substitution          Yes
    y/ / /    tr/ / /    Transliteration       No
    " "       qr/ /      Regular expression    Yes

27 Appendix Useful techniques

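28 Shell command – file/directory
    mkdir("doc", 0744);    # permission modes are octal (leading 0), not hexadecimal
    chdir("doc");
    rmdir("doc");
    unlink("log.txt");
    chmod(0700, "log1.txt", "log2.txt", "log3.txt");
    rename("old_name", "new_name");
    chown($uid, $gid, "log1.txt", "log2.txt", "log3.txt");    # first two arguments: numeric user ID and group ID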

29 Perl
  Usage: perl [switches] [--] [programfile] [arguments]
    -c              check syntax only (runs BEGIN and CHECK blocks)
    -d[:debugger]   run program under debugger
    -e program      one line of program (several -e's allowed, omit programfile)
    -i[extension]   edit <> files in place (makes backup if extension supplied)
    -n              assume "while (<>) {... }" loop around program
    -p              assume loop like -n but print line also, like sed
    -u              dump core after parsing program
    -v              print version, subversion
    -w              enable many useful warnings (RECOMMENDED)
    -W              enable all warnings
    -X              disable all warnings

30 Removal of ^M perl -pi.bak -e 's/\r//g;' index.html

31 File Copy
    #! /usr/bin/perl
    use File::Copy;
    copy("file1", "file2");

32 Reserved word for debug __FILE__ __LINE__ Ex: print "FILE:".__FILE__." LINE:".__LINE__."\n";

33 Debug
    perl -d "program name"

34 Debug
    $ perlcc -d test.pl

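35 Special variable
    $_            the default variable (e.g. the line just read by while (<>))
    $!            error message
    $$            current process ID
    $?            the status returned when the previous child process ended
    $"            the separator used when a list is interpolated into a string
    $/            the input record separator
    $`, $&, $'    string matching (text before the match, the match, text after the match)
    $+            the last backreference
    @-            @LAST_MATCH_START
    @+            @LAST_MATCH_END
    @_            arguments of a subroutine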

36 Bytecode generator $perlcc -B -o test test3.pl

37 CPAN perl -MCPAN -e "install GD"

38 BioPerl

39 PSI-BLAST
  Position Specific Iterative BLAST constructs a multiple sequence alignment, then creates a position-specific scoring matrix (PSSM).
  Workflow (diagram): Query sequence -> BLAST against the sequence database -> Homologous proteins -> Multiple sequence alignment -> PSSM -> BLAST -> New homologous proteins

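40 PSSM (1/4)
  Query sequence and homologous proteins (aligned):
    GHEGVGKVVKLGAGA
    GHEKKGYFEDRGPSA
    GHEGYGGRSRGGGYS
    GHEFEGPKGCGALYI
    GHELRGTTFMPALEC
  Residue counts per alignment column:
         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    A    0  0  0  0  0  0  0  0  0  0  0  2  1  0  2
    C    0  0  0  0  0  0  0  0  0  1  0  0  0  0  1
    D    0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
    E    0  0  5  0  1  0  0  0  1  0  0  0  0  1  0
    F    0  0  0  1  0  0  0  1  1  0  0  0  0  0  0
    G    5  0  0  2  0  5  1  0  1  0  2  3  1  1  0
    H    0  5  0  0  0  0  0  0  0  0  0  0  0  0  0
    I    0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
    K    0  0  0  1  1  0  1  1  0  1  0  0  0  0  0
    L    0  0  0  1  0  0  0  0  0  0  1  0  2  0  0
    M    0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
    N    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    P    0  0  0  0  0  0  1  0  0  0  1  0  1  0  0
    Q    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    R    0  0  0  0  1  0  0  1  0  1  1  0  0  0  0
    S    0  0  0  0  0  0  0  0  1  0  0  0  0  1  1
    T    0  0  0  0  0  0  1  1  0  0  0  0  0  0  0
    V    0  0  0  0  1  0  0  1  1  0  0  0  0  0  0
    W    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    Y    0  0  0  0  1  0  1  0  0  0  0  0  0  2  0
  Frequency:
    Column 1:  f_{A,1} = 0/5, f_{C,1} = 0/5, …, f_{G,1} = 5/5, …
    Column 2:  f_{A,2} = 0/5, f_{C,2} = 0/5, …, f_{H,2} = 5/5, …
    …
    Column 15: f_{A,15} = 2/5, f_{C,15} = 1/5, …, f_{S,15} = 1/5, …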

41 PSSM (2/4)
  The original data:
    Column 1:  f_{A,1} = 0/5, f_{C,1} = 0/5, …, f_{G,1} = 5/5, …
    Column 2:  f_{A,2} = 0/5, f_{C,2} = 0/5, …, f_{H,2} = 5/5, …
    …
    Column 15: f_{A,15} = 2/5, f_{C,15} = 1/5, …, f_{S,15} = 1/5, …
  Set a pseudocount of 1 (add 1 to each of the 20 residue counts, so the denominator becomes 5+20):
    Column 1:  f'_{A,1} = (0+1)/(5+20), f'_{C,1} = (0+1)/(5+20), …, f'_{G,1} = (5+1)/(5+20), …
    Column 2:  f'_{A,2} = (0+1)/(5+20), f'_{C,2} = (0+1)/(5+20), …, f'_{H,2} = (5+1)/(5+20), …
    …
    Column 15: f'_{A,15} = (2+1)/(5+20), f'_{C,15} = (1+1)/(5+20), …, f'_{S,15} = (1+1)/(5+20), …

42 PSSM (3/4)
  The score is derived from the ratio of the observed to the expected frequencies. More precisely, the logarithm of this ratio is taken and referred to as the log-likelihood ratio:
    $$\mathrm{Score}_{i,j} = \log \frac{f'_{ij}}{q_i}$$
  where Score_{i,j} is the score for residue i at position j, f'_{ij} is the relative frequency for residue i at position j, and q_i is the expected relative frequency of residue i in a random sequence.

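43 PSSM (4/4)
            1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
    A    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   1.3   0.7  -0.2   1.3
    C    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   0.7  -0.2  -0.2  -0.2  -0.2   0.7
    D    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   0.7  -0.2  -0.2  -0.2  -0.2  -0.2
    E    -0.2  -0.2   2.3  -0.2   0.7  -0.2  -0.2  -0.2   0.7  -0.2  -0.2  -0.2  -0.2   0.7  -0.2
    F    -0.2  -0.2  -0.2   0.7  -0.2  -0.2  -0.2   0.7   0.7  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2
    G     2.3  -0.2  -0.2   1.3  -0.2   2.3   0.7  -0.2   0.7  -0.2   1.3   1.7   0.7   0.7  -0.2
    H    -0.2   2.3  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2
    I    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   0.7
    K    -0.2  -0.2  -0.2   0.7   0.7  -0.2   0.7   0.7  -0.2   0.7  -0.2  -0.2  -0.2  -0.2  -0.2
    L    -0.2  -0.2  -0.2   0.7  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   0.7  -0.2   1.3  -0.2  -0.2
    M    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   0.7  -0.2  -0.2  -0.2  -0.2  -0.2
    N    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2
    P    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   0.7  -0.2  -0.2  -0.2   0.7  -0.2   0.7  -0.2  -0.2
    Q    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2
    R    -0.2  -0.2  -0.2  -0.2   0.7  -0.2  -0.2   0.7  -0.2   0.7   0.7  -0.2  -0.2  -0.2  -0.2
    S    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   0.7  -0.2  -0.2  -0.2  -0.2   0.7   0.7
    T    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   0.7   0.7  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2
    V    -0.2  -0.2  -0.2  -0.2   0.7  -0.2  -0.2   0.7   0.7  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2
    W    -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2
    Y    -0.2  -0.2  -0.2  -0.2   0.7  -0.2   0.7  -0.2  -0.2  -0.2  -0.2  -0.2  -0.2   1.3  -0.2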
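The PSSM steps (count, add a pseudocount, take the log ratio) can be sketched in Perl for the five aligned sequences of slide 40. Two assumptions are not stated on the slides: the logarithm is base 2, and the background frequency q_i is uniform at 1/20. They are chosen here because they reproduce the positive scores 2.3, 1.7, 1.3 and 0.7 in the matrix. The subroutine name pssm_score and the variable names are made up for this illustration. Under the same assumptions a residue that never occurs in a column scores about -0.3.

```perl
use strict;
use warnings;

# The five aligned sequences from slide 40.
my @alignment = qw(
    GHEGVGKVVKLGAGA
    GHEKKGYFEDRGPSA
    GHEGYGGRSRGGGYS
    GHEFEGPKGCGALYI
    GHELRGTTFMPALEC
);
my @residues    = split //, 'ACDEFGHIKLMNPQRSTVWY';
my $pseudocount = 1;
my $background  = 1 / scalar(@residues);    # ASSUMPTION: uniform q_i = 1/20

# Score of one residue at one 1-based alignment column.
sub pssm_score {
    my ($residue, $column) = @_;
    # Observed count of the residue in this column.
    my $count = grep { substr($_, $column - 1, 1) eq $residue } @alignment;
    # Pseudocount-adjusted frequency: (count + 1) / (5 + 20).
    my $freq = ($count + $pseudocount)
             / (scalar(@alignment) + $pseudocount * scalar(@residues));
    # ASSUMPTION: base-2 logarithm of the observed/expected ratio.
    return log($freq / $background) / log(2);
}

printf "G at column 1:  %.1f\n", pssm_score('G', 1);     # 2.3
printf "G at column 12: %.1f\n", pssm_score('G', 12);    # 1.7
printf "A at column 12: %.1f\n", pssm_score('A', 12);    # 1.3
printf "A at column 13: %.1f\n", pssm_score('A', 13);    # 0.7
```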

