1 why to become a Pyologist Perl is for plumbers – Python is for biologists Stefan Maetschke Teasdale Group
2 why
  Biologists suffer for no good reason
  Perl is difficult to write and read
  Perl gives weak error feedback
  Perl obscures basic concepts
  Limited understanding of principles
  Low productivity
  Reduced research scope
  Perl is for plumbers - Python is for scientists
  I want to have an easy life
  why, why, why …
3 plumbers and others
  spectrum of tasks, tools and roles
  sys admin: plumbing; vi, awk/Perl, grep/diff
  SW developer: designing; Emacs/IDE, C/C++/Java, UML/Unit test
  scientist: Python
4 equals(, )
  Both: cross-platform, open-source, scripting language, multi-paradigm, dynamic typing, statement ratio: 6
  Python                     Perl
  Guido van Rossum           Larry Wall
  There should be one way    There's more than one way
  Easy                       Difficult
5 you must be joking!
  Perl:
    @list = ('a', 'b', 'c');
    my %hash;
    $hash{'letters'} = \@list;
    print @{$hash{'letters'}};
  Python:
    list = ['a', 'b', 'c']
    hash = {}
    hash['letters'] = list
    print(hash['letters'])
  Perl:
    package Person;
    use strict;
    sub new {
        my $class = shift;
        my $age = shift or die "Must pass age";
        my $rSelf = {'age' => $age};
        bless ($rSelf, $class);
        return $rSelf;
    }
  Python:
    class Person:
        def __init__(self, age):
            self.age = age
  Perl:
    @list = ( ['a', 'b', 'c'], [1, 2, 3] );
    print @{$list[0]};
  Python:
    list = [ ['a', 'b', 'c'], [1, 2, 3] ]
    print(list[0])
6 More Perl bashing…
  Perl:
    sub add { $_[0] + $_[1]; }
    sub add { my ($a, $b) = @_; return $a + $b; }
    sub add { my $a = shift; my $b = shift; return $a + $b; }
    sub add($, $) { local ($a, $b) = @_; return $a + $b; }
  Python:
    def add(a, b):
        return a + b
  Perl:
    sub diff { my ($aref, $bref) = @_; my …; my …; return … + …; }
  Python:
    def diff(a, b):
        return len(a) - len(b)
7 complexity wall
  simple scripts ≈ 100 lines => fun stops
  Higher order concepts: data structures, functions, classes
  => Python allows you to break through the complexity wall
  everything you can do in Python you can do in Perl, but you don't
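8 googliness
  [table: kilo-hits, May 2008, for the searches "X language", "X load file" and "X bioinformatics", with X = C, Java, C++, C#, Perl, Python, Ruby, Scala, Haskell; only fragments of the values survive: C 53,000; Java 7,760; C++ 1,290]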
9 and the winner is… <- without Psyco
10 damn lies and stats
  SourceForge projects: Perl declining, Python increasing?
  May 2008, keyword search: Perl 3474, Python 4063
11 see the light…
  classify Iris plants
  Fisher, R.A. (1936) "The use of multiple measurements in taxonomic problems", Annals of Eugenics, 7, Part II
  Three species: Iris setosa, Iris versicolor, Iris virginica
  Four attributes: sepal length, sepal width, petal length, petal width
12 Iris – convert data
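A minimal sketch of such a conversion step, using only the standard library. The six rows are an illustrative excerpt in the comma-separated layout commonly used for the Iris data (four measurements, then the species); the names `RAW`, `FIELDS` and `convert` are hypothetical and not taken from the slides.

```python
import csv
import io

# Illustrative excerpt (assumed layout): sepal length, sepal width,
# petal length, petal width, species
RAW = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

FIELDS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def convert(text):
    """Parse CSV text into a list of (features, species) pairs."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        # First four columns become named floats, the fifth is the label.
        features = dict(zip(FIELDS, map(float, row[:4])))
        records.append((features, row[4]))
    return records


records = convert(RAW)
print(len(records), records[0])
```

Keeping each record as a dictionary plus a label means later steps can pick attributes by name rather than by column index.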
13 Iris – correlation
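A sketch of a correlation step, assuming the Pearson coefficient between petal length and petal width is what is wanted. It is written by hand with the standard library so that it runs without SciPy or NumPy; the nine value pairs are an illustrative sample (three plants per species) and `pearson` is a hypothetical helper name.

```python
from math import sqrt

# Illustrative sample: three plants per species (setosa, versicolor, virginica)
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
petal_width = [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(petal_length, petal_width)
print("r = %.3f" % r)
```

On this small sample the two petal measurements are strongly positively correlated.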
14 Iris – do stats
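A sketch of a per-species summary, assuming "stats" means mean and sample standard deviation of one attribute. It uses the standard-library `statistics` module; the petal-length values are an illustrative sample and the names `SAMPLE`, `by_species` and `summary` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Illustrative sample: (petal length, species)
SAMPLE = [
    (1.4, "Iris-setosa"), (1.4, "Iris-setosa"), (1.3, "Iris-setosa"),
    (4.7, "Iris-versicolor"), (4.5, "Iris-versicolor"), (4.9, "Iris-versicolor"),
    (6.0, "Iris-virginica"), (5.1, "Iris-virginica"), (5.9, "Iris-virginica"),
]

# Group the measurements by species label.
by_species = defaultdict(list)
for length, species in SAMPLE:
    by_species[species].append(length)

# Mean and sample standard deviation per species.
summary = {s: (mean(v), stdev(v)) for s, v in by_species.items()}

for species, (m, sd) in sorted(summary.items()):
    print("%-16s mean=%.2f sd=%.2f" % (species, m, sd))
```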
15 Iris – linear regression
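A sketch of a simple linear regression, assuming an ordinary least-squares line that predicts petal width from petal length. It is hand-written with the standard library rather than taken from a scientific library; the data are an illustrative sample and `fit_line` is a hypothetical helper name.

```python
# Illustrative sample: three plants per species
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
petal_width = [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(petal_length, petal_width)
print("width = %.3f * length + %.3f" % (slope, intercept))
```

The fitted line always passes through the point of the two means, which gives a quick sanity check on the result.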
16 Iris – plot data
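A text-only sketch of a scatter plot of petal length against petal width. It is a stand-in that runs with the standard library alone; a plotting library such as matplotlib would normally draw the figure. The marker letters, the sample points and the helper `ascii_scatter` are hypothetical, and the helper assumes the x and y values are not all identical.

```python
# Illustrative sample: (petal length, petal width, marker)
# s = setosa, c = versicolor, g = virginica
POINTS = [
    (1.4, 0.2, "s"), (1.4, 0.2, "s"), (1.3, 0.2, "s"),
    (4.7, 1.4, "c"), (4.5, 1.5, "c"), (4.9, 1.5, "c"),
    (6.0, 2.5, "g"), (5.1, 1.9, "g"), (5.9, 2.1, "g"),
]


def ascii_scatter(points, width=40, height=12):
    """Render (x, y, marker) points on a character grid, y increasing upward."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y, marker in points:
        # Scale each coordinate to a column/row index inside the grid.
        col = int((x - x_min) / (x_max - x_min) * (width - 1))
        row = int((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = marker
    return "\n".join("".join(line) for line in grid)


plot = ascii_scatter(POINTS)
print(plot)
```

Even at this resolution the setosa points sit in the lower-left corner, well apart from the other two species.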
17 libs for life science
  Scientific computing: SciPy, NumPy, matplotlib
  Bioinformatics: BioPython
  Phylogenetic trees: Mavric, Plone, P4, Newick
  Microarrays: SciGraph, CompClust
  Molecular modeling: MMTK, OpenBabel, CDK, RDKit, cinfony, mmLib
  Dynamic systems modeling: PyDSTool
  Protein structure visualization: PyMOL, UCSF Chimera
  Networks/Graphs: NetworkX, PyGraphViz
  Symbolic math: SymPy, Sage
  Wrapper for C/C++ code: SWIG, Pyrex, Cython
  R/S-Plus interface: RSPython, RPy
  Java interface: Jython
  Fortran to Python: F2PY
  …
  Check also out: and:
18 last words
  Perl perfect for plumbing
  Python excellent for scientific programming
  Easy to learn, write and maintain
  Suited for scripting and mid-size projects
  Huge number of scientific libraries
  Python is an attractive alternative to Matlab/R
  Easy integration of Java, C/C++ or Fortran code
19 questions Interest: Python Course? isn’t Python lovely…
21 links
  Wikipedia: Python
  Instant Python
  How to think like a computer scientist
  Dive into Python
  Python course in bioinformatics
  Beginning Python for bioinformatics
  SciPy Cookbook
  Matplotlib Cookbook
  Biopython tutorial and cookbook
  Huge collection of Python tutorials
  What's wrong with Perl
  20 Stages of Perl to Python conversion
  Why Python
22 some papers
  Bassi S. (2007) A Primer on Python for Life Science Researchers. PLoS Comput Biol 3(11): e199. doi: /journal.pcbi
  Mangalam H. (2002) The Bio* toolkits: a brief overview. Brief Bioinform. 3(3):
  Fourment M., Gillings MR. (2008) A comparison of common programming languages used in bioinformatics. BMC Bioinformatics 9:82.
23 to whom it may concern
  NP = Non-Programmer
  NPs who don't use Perl yet
  NPs who want to see the light
  NPs who want to give their code away without being rightfully ashamed
  Matlab aficionados
24 one of ten Perl myths
  “…Perl works the way you do…”
  “…That's one, fairly natural way to think about it…”
  “…we can happily consign the idea that ‘Perl is hard’ to mythology.”
  Swap two sections of a string: "aaa:bbb" -> "bbb:aaa"
  Perl, regular expression:
    while (<>) { s/(.*):(.*)/$2:$1/; print; }
  Perl, split:
    while (<>) { chomp; ($first, $second) = split /:/; print $second, ":", $first, "\n"; }
  Python, split:
    for line in file:
        line = line.strip()
        first, second = line.split(':')
        print(second + ':' + first)
  Python, regular expression:
    from re import sub
    for line in file:
        print(sub('(.*):(.*)', r'\2:\1', line))
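The two Python variants of the swap can be wrapped into small functions and run as written. This is a sketch only: the function names `swap_split` and `swap_regex` are hypothetical, and the split variant assumes each line contains exactly one colon (more or fewer would raise a `ValueError`).

```python
import re


def swap_split(line):
    """Swap the two colon-separated sections using str.split."""
    first, second = line.strip().split(":")
    return second + ":" + first


def swap_regex(line):
    """Swap the two sections using a regular expression substitution."""
    return re.sub(r"(.*):(.*)", r"\2:\1", line.strip())


for line in ["aaa:bbb\n"]:
    print(swap_split(line))  # bbb:aaa
    print(swap_regex(line))  # bbb:aaa
```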
25 camel chaos
  does not scale well
  complex syntax
  cryptic commands
  does not encourage clear code
  difficult to read/maintain
  hard to understand the principles
  error prone
  no check of subroutine arguments
  variables are global by default
  …
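For contrast with the last two points, a small sketch of how Python behaves: a call with the wrong number of arguments is rejected with a `TypeError`, and a variable assigned inside a function stays local to it. The names `add`, `total`, `error_message` and `leaked` are chosen for illustration only.

```python
def add(a, b):
    total = a + b  # 'total' exists only inside add()
    return total


error_message = None
try:
    add(1)  # second argument missing
except TypeError as err:
    # Python refuses the call instead of silently using an undefined value.
    error_message = str(err)
    print("rejected:", error_message)

result = add(1, 2)
leaked = "total" in globals()  # did the local variable become global?

print(result)  # 3
print(leaked)  # False
```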
26 why Python
  overcome the complexity wall
  many, excellent scientific libraries
  clear, easy to learn syntax
  hard to do it wrong
  does not require prior suffering/experience
27 my bias
  R&D: C/C++ -> applied ML in robotics, image processing, quality control
  SW Development: Java -> Speech Processing, Data Mining
  Computational Biology: Java, Python
  Other languages I played with: Ada, APL, Basic, Matlab, Modula, Pascal, Perl, Prolog, R, Groovy, Forth, Fortran, Scala, Assembly code