PERL SCRIPTING. COMPUTER BASICS CPU, RAM, Hard drive CPU can only use data in the register directly CPU RAM HARD DRIVE.


1 PERL SCRIPTING

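2 COMPUTER BASICS: CPU, RAM, hard drive. The CPU can only use data held in its registers directly; data on the hard drive must first be loaded into RAM, and from RAM into the registers. (Diagram: CPU, RAM, HARD DRIVE)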

3 COMPUTER LANGUAGES
Machine language: binary code executed directly by the CPU. Usually specific to a CPU model. Fast.
Assembly language: maps binary instructions to short (often three-letter) mnemonics. Platform-dependent. Fast.
High-level language: “human-like” syntax, often independent of the CPU. Compiled into machine code before use. Fast. E.g. C, C++, Fortran, Pascal, Basic.
Scripting language: usually not compiled into binary code; interpreted and executed on request. Slow. E.g. Perl, PHP, Python, JavaScript, Bash script, Ruby.
Byte-code language: source code is converted into platform-independent intermediate code that can be compiled quickly at run time. E.g. Java, Microsoft .NET. Intermediate speed.

4 TWO ELEMENTS OF A PROGRAM: data structures and algorithms. In computer science, different data structures come with corresponding, well-optimized algorithms for processing and extracting information. For example: inserting (algorithm) a node (data structure) into a linked list (data structure).
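To make the linked-list example concrete, here is a minimal Perl sketch of inserting a node after an existing one. The node layout (a hash reference with `value` and `next` keys) and the helper names `fnInsertAfter` and `fnListToArray` are illustrative assumptions, not part of the slides.

```perl
use strict;
use warnings;

# Each node is a hash reference: { value => ..., next => <next node or undef> }
# (the key names are illustrative choices)
sub fnInsertAfter {
    my ($refNode, $value) = @_;
    # The new node points at whatever used to follow $refNode
    my $refNew = { value => $value, next => $refNode->{next} };
    # Relink: constant time, no elements have to be shifted as in an array
    $refNode->{next} = $refNew;
    return $refNew;
}

# Walk the list from the head and collect the values in order
sub fnListToArray {
    my ($refHead) = @_;
    my @arrValues;
    for (my $refCur = $refHead; defined $refCur; $refCur = $refCur->{next}) {
        push @arrValues, $refCur->{value};
    }
    return @arrValues;
}

# Build A -> C, then insert B after A
my $refHead = { value => 'A', next => undef };
fnInsertAfter($refHead, 'C');
fnInsertAfter($refHead, 'B');
print join(' -> ', fnListToArray($refHead)), "\n";   # A -> B -> C
```

The insertion only rewires two `next` links, which is why a linked list pairs well with frequent insertions.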

5 BASIC TYPES
Bit: 1 bit has 2 states, 1 or 0.
Byte: 1 byte = 8 bits, so the largest unsigned value in 1 byte is binary 11111111 = 255. Each character in the ASCII encoding fits in 1 byte. In C, the byte type is in fact written as “char”. The byte is the smallest addressable unit of storage.
Boolean (true/false): in theory needs only 1 bit, but in practice usually takes 1 byte. How many Boolean states can you store using 1 byte?
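One way to see the answer to the slide's question: each of the 8 bits in a byte can act as an independent true/false flag. The Perl sketch below packs flags into a single integer kept in the range 0..255 using bitwise operators; the helper names `fnSetFlag` and `fnGetFlag` are hypothetical.

```perl
use strict;
use warnings;

# Turn bit $nBit (0..7) of $nByte on or off and return the new byte value
sub fnSetFlag {
    my ($nByte, $nBit, $bValue) = @_;
    return $bValue ? ($nByte | (1 << $nBit))             # set the bit
                   : ($nByte & ~(1 << $nBit) & 0xFF);    # clear it, keep only 8 bits
}

# Read bit $nBit (0..7) of $nByte: returns 1 or 0
sub fnGetFlag {
    my ($nByte, $nBit) = @_;
    return ($nByte >> $nBit) & 1;
}

my $nFlags = 0;
$nFlags = fnSetFlag($nFlags, 0, 1);      # flag 0 on
$nFlags = fnSetFlag($nFlags, 7, 1);      # flag 7 on
printf "%08b = %d\n", $nFlags, $nFlags;  # 10000001 = 129
print fnGetFlag($nFlags, 7), "\n";       # 1
```

So 1 byte holds 8 Boolean flags, which together can take 2^8 = 256 combinations.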

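6 BASIC TYPES
Integer: 32 bit. Signed range -2^31 to 2^31 - 1; unsigned range 0 to 2^32 - 1.
Long integer: 64 bit.
Float: 32 bit: 1 sign bit, 8 bits for the exponent, and 24 bits of significand precision (23 stored bits plus 1 implicit leading bit).
Floating-point numbers can lose precision. Try this in Perl: print 0.6/0.2-3; It prints a tiny nonzero number (about -4.44e-16) instead of 0.
A safer way is to round first (the zero check avoids dividing by zero when $n is 0):
sub round { my ($n) = @_; return 0 if $n == 0; return int($n + $n/abs($n*2)); }
print round(0.6/0.2)-3;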
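A runnable version of the rounding fix is sketched below. It shows the exact stored value of 0.6/0.2, the `round` helper (which rounds half away from zero by adding +0.5 or -0.5 before truncating), and, as a swapped-in alternative not taken from the slide, a comparison against a tolerance. The tolerance value `1e-9` is an assumption chosen for illustration.

```perl
use strict;
use warnings;

# 0.6/0.2 is not exactly 3 in binary floating point
my $fRatio = 0.6 / 0.2;
printf "%.17g\n", $fRatio;   # 2.9999999999999996

# Round half away from zero; return early for 0 to avoid dividing by zero
sub round {
    my ($n) = @_;
    return 0 if $n == 0;
    return int($n + $n / abs($n * 2));   # $n / abs(2 * $n) is +0.5 or -0.5
}
print round($fRatio) - 3, "\n";   # 0

# Alternative: treat two floats as equal when they differ by less than a tolerance
my $fEpsilon = 1e-9;   # illustrative tolerance
print abs($fRatio - 3) < $fEpsilon ? "equal\n" : "different\n";   # equal
```

Rounding suits values that should be whole numbers; the tolerance comparison is the more general choice when the expected value is itself fractional.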

7 POINTERS / REFERENCES Pointers (called references in other languages) are essentially integers. The integer stores a memory address, and that memory address refers to another variable. http://perldoc.perl.org/perlref.html
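In Perl the address idea is visible directly: a backslash creates a reference, dereferencing reaches the original variable, and a reference used as a number yields the memory address it holds. A minimal sketch (variable names are illustrative):

```perl
use strict;
use warnings;

my $nCount = 10;
my $refCount = \$nCount;       # backslash creates a reference to $nCount
$$refCount += 5;               # dereference: changes the original variable
print "$nCount\n";             # 15

my @arrLoci = (101, 202, 303);
my $refLoci = \@arrLoci;       # reference to an array
print $refLoci->[1], "\n";     # 202
print scalar(@$refLoci), "\n"; # 3

# Used as a number, a reference gives the memory address it stores
printf "address: %d\n", $refLoci + 0;   # a large integer that varies per run
print ref($refLoci), "\n";     # ARRAY
```

Two references to the same variable compare equal numerically, because they hold the same address.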

8 COMPLEX TYPES
Set: unordered, distinct values.
Array (vector): ordered values of the same basic type. Indices start from 0 in most languages, so the last index = length - 1.
Hash: key => value pairs. Each key must be unique. An array can be thought of as a special hash whose keys are ordered, consecutive integers.
String: in C, a string is simply an array of characters. In many other languages, strings are treated as a “basic type”. Most algorithms for arrays also work for strings.
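The three complex types map onto Perl as sketched below: an array with 0-based indexing, a hash with unique keys, and a string split into an array of characters so that an array algorithm (`reverse`) can be applied. The sample values are illustrative.

```perl
use strict;
use warnings;

# Array: ordered values, index starts at 0, last index = length - 1
my @arrBases = ('A', 'C', 'G', 'T');
print "first: $arrBases[0], last: $arrBases[-1], length: ", scalar(@arrBases), "\n";
print 'last index: ', $#arrBases, "\n";          # last index: 3

# Hash: key => value pairs; assigning to an existing key overwrites its value
my %hashCodon = (ATG => 'M', TGG => 'W');
$hashCodon{ATG} = 'Met';
foreach my $sKey (sort keys %hashCodon) {
    print "$sKey => $hashCodon{$sKey}\n";       # ATG => Met, then TGG => W
}

# String: split into characters, then reuse an array algorithm
my $sSeq = 'GATTACA';
my @arrChars = split //, $sSeq;
print join('', reverse @arrChars), "\n";         # ACATTAG
```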

9 COMPLEX TYPES Classes (object-oriented programming): a class packages related data of different data types, together with the algorithms associated with them, into a neat black box for you to use.
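A minimal sketch of the black-box idea in Perl, where a class is a package and an object is a blessed hash reference. The class `Gene`, its fields, and its methods are hypothetical examples, not part of the slides.

```perl
use strict;
use warnings;

package Gene;   # hypothetical class, for illustration only

# Constructor: bundle related data (name, sequence) into one blessed hash reference
sub new {
    my ($sClass, %hashArgs) = @_;
    my $self = { sName => $hashArgs{name}, sSeq => uc($hashArgs{seq}) };
    return bless $self, $sClass;
}

# Methods: algorithms that operate on the object's own data
sub fnGetLength {
    my ($self) = @_;
    return length $self->{sSeq};
}

sub fnGetGCContent {
    my ($self) = @_;
    my $nGC = ($self->{sSeq} =~ tr/GC//);   # tr/// with empty replacement just counts
    my $nLen = $self->fnGetLength();
    return $nLen ? $nGC / $nLen : 0;
}

package main;

my $objGene = Gene->new(name => 'demo', seq => 'atggcc');
print $objGene->fnGetLength(), "\n";      # 6
print $objGene->fnGetGCContent(), "\n";   # 0.666666666666667
```

The caller only uses `new`, `fnGetLength` and `fnGetGCContent`; how the data is stored inside stays hidden.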

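10 PERL
Perl lumps all “basic types” together as “scalars”, written with “$”. The Perl interpreter decides what a value is from what it “looks like”. This is convenient, but sometimes problematic, especially when you parse a user-provided data file.
Arrays: defined with @; a reference to an array is a scalar, written with $.
Hashes: defined with %; a reference to a hash is a scalar, written with $.
Also: regular expressions (RegExp) and handlers (file handles).
Always use strict;
Perl has an ugly grammar. It also has many shortcuts, such as $_. DO NOT USE THEM!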
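The sketch below illustrates two points from this slide: how a scalar's type depends on how it is used (a pitfall when reading user data), and which sigil goes with definitions, single elements, and references. The sample values are illustrative.

```perl
use strict;     # every variable must be declared with my
use warnings;

# Scalars: the same value is compared as a number with == and as a string with eq
my $sInput = '10.0';   # e.g. a field read from a user-provided data file
my $nValue = 10;
print(($sInput == $nValue) ? "numerically equal\n" : "numerically different\n");   # numerically equal
print(($sInput eq $nValue) ? "string equal\n" : "strings differ\n");               # strings differ

# Arrays are defined with @; one element, or a reference to the array, is a scalar ($)
my @arrLoci = (5, 6, 7);
my $nFirst  = $arrLoci[0];
my $refLoci = \@arrLoci;

# Hashes are defined with %; one value, or a reference to the hash, is a scalar ($)
my %hashLen = (chr1 => 248, chr2 => 242);   # illustrative values
my $nChr1   = $hashLen{chr1};
my $refLen  = \%hashLen;

print "$nFirst $refLoci->[2] $nChr1 $refLen->{chr2}\n";   # 5 7 248 242
```

Choosing `==` or `eq` explicitly is what tells Perl how to interpret a value read from a file.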

11 FLOW CONTROL for, foreach, while, unless, until, if/elsif/else http://perldoc.perl.org/perlsyn.html#Compound-Statements
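Each construct listed on the slide appears once in the short Perl sketch below, using named loop variables rather than the implicit `$_`. The codon list is an illustrative example.

```perl
use strict;
use warnings;

my @arrCodons = ('ATG', 'GCC', 'TAA', 'GGG');
my %hashStop  = (TAA => 1, TAG => 1, TGA => 1);

# foreach + if/elsif/else
foreach my $sCodon (@arrCodons) {
    if ($sCodon eq 'ATG') {
        print "$sCodon: start\n";
    } elsif ($hashStop{$sCodon}) {
        print "$sCodon: stop\n";
    } else {
        print "$sCodon: sense\n";
    }
}

# while: advance until the first stop codon (or the end of the array)
my $nPos = 0;
while ($nPos < @arrCodons && !$hashStop{$arrCodons[$nPos]}) {
    $nPos++;
}
print "first stop at index $nPos\n";               # first stop at index 2

# until means "while not"; unless means "if not"
my $nCount = 0;
until ($nCount >= 3) {
    $nCount++;
}
print "counted to $nCount\n" unless $nCount == 0;  # counted to 3

# for: C-style loop with an explicit index
for (my $i = 0; $i < @arrCodons; $i++) {
    print "$i=$arrCodons[$i] ";
}
print "\n";
```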

12 FUNCTIONS (SUBROUTINES) Traditionally, “subroutines” do not accept parameters, so “function” is the better term; but because Perl is ugly, it still uses the keyword sub.
sub functionname {
    my ($param1, $param2) = @_;   # get the parameters
    # ... do the work ...
    return $result;
}
Call: functionname($param1, $param2);
I prefix all private functions with “fn”, but you don't need to do that. However, do capitalize the first letter of each word, and use Verb + Noun phrases as function names, e.g. fnGetFileName(), fnDownloadPicture().
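A runnable sketch of the pattern above: parameters are copied out of `@_` into named variables, and each function has a Verb + Noun name with the optional “fn” prefix. Both functions are hypothetical examples. For real code, the core module File::Basename already provides `basename`.

```perl
use strict;
use warnings;

# Return the last component of a Unix-style path (hypothetical helper)
sub fnGetFileName {
    my ($sPath) = @_;                  # copy the parameter out of @_
    my @arrParts = split m{/}, $sPath;
    return $arrParts[-1];
}

# Count how often one base occurs in a sequence, ignoring case
sub fnCountBase {
    my ($sSeq, $sBase) = @_;           # two parameters
    my $nCount = 0;
    foreach my $sChar (split //, uc $sSeq) {
        $nCount++ if $sChar eq uc $sBase;
    }
    return $nCount;
}

print fnGetFileName('/data/run1/dist_0.1.txt'), "\n";   # dist_0.1.txt
print fnCountBase('GATTACA', 'a'), "\n";                  # 3
```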

13 HOW TO NAME VARIABLES Variable names should reflect their basic types. Give them descriptive names, with each word capitalized. I use C-style prefixes on them:

Type          Prefix     Example
boolean       b          $bGenomeLoaded
integer       n          $nLen
float         n/f        $fAlleleFreq
string        s          $sInFile
file handle   h          $hInFile
array         arr        $arrLoci
hash          arr        $arrGeneID
constant      ALL CAPS   MAX_LINE
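The naming table applied in a small Perl sketch. As in the table, `$arrLoci` and `$arrGeneID` hold references to an array and a hash. To keep the example self-contained, the file handle `$hInFile` is opened on an in-memory string instead of a real file; that substitution and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;
use constant MAX_LINE => 1000;              # constant: ALL CAPS

my $bGenomeLoaded = 0;                      # boolean
my $nLen          = 0;                      # integer
my $fAlleleFreq   = 0.25;                   # float
my $sData         = "chr1\t100\nchr2\t200\n";   # string (stands in for file contents)
my $arrLoci       = [];                     # array, stored as a reference
my $arrGeneID     = {};                     # hash, stored as a reference (prefix as in the table)

# File handle, opened here on an in-memory string
open(my $hInFile, '<', \$sData) or die "Cannot open input: $!";
while (my $sLine = <$hInFile>) {
    last if $nLen >= MAX_LINE;
    chomp $sLine;
    my ($sChr, $nPos) = split /\t/, $sLine;
    push @$arrLoci, $nPos;
    $arrGeneID->{$sChr} = $nPos;
    $nLen++;
}
close($hInFile);
$bGenomeLoaded = 1;
print "loaded $nLen loci: @$arrLoci\n";     # loaded 2 loci: 100 200
```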

14 1. Start with the DNA sequence ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT. Report its length, check whether its length is divisible by 3, and check whether it is a valid DNA sequence. If any check fails, do not continue.
2. Translate it into a peptide sequence using the universal codon table.
3. Display it on screen in the following format: the DNA on the first line, and the translated amino acids on the second line, each aligned with the middle letter of its codon.
4. This DNA sequence goes through generation after generation of replication.
5. At each replication, there is a user-specified probability (0-1) of a single-nucleotide mutation. This mutation probability is specified on the command line.

15 6. If a mutation happens, 1 random letter in the DNA is changed to A, T, C or G with equal probability. It's okay if the letter “changes” to the same letter.
7. At each generation, display the generation number and the DNA and protein sequences as described in step 3.
8. Check at each generation whether a stop codon has occurred. If so, the protein has lost its function: stop the evolution and output the generation at which the stop codon occurred.
9. This program should be able to handle DNA sequences in upper- or lowercase letters.
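The steps above could be combined into a mutation.pl along the lines of the sketch below. It is one possible approach, not a reference solution. Assumptions: the universal codon table is encoded as the common 64-letter string in TCAG order with `*` marking stop codons; the default probability of 0.1 when no argument is given and the generation cap are hypothetical additions; function names follow the slides' naming style but are otherwise illustrative. Run it as, e.g., `perl mutation.pl 0.01`.

```perl
use strict;
use warnings;

# Universal codon table, bases in TCAG order; '*' marks a stop codon
my @arrBases = ('T', 'C', 'A', 'G');
my $sAminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %hashCodonTable;
my $nIndex = 0;
foreach my $sB1 (@arrBases) {
    foreach my $sB2 (@arrBases) {
        foreach my $sB3 (@arrBases) {
            $hashCodonTable{"$sB1$sB2$sB3"} = substr($sAminoAcids, $nIndex++, 1);
        }
    }
}

# Step 1: non-empty, length divisible by 3, only A/C/G/T in any case (step 9)
sub fnCheckDNA {
    my ($sDNA) = @_;
    my $nLen = length($sDNA);
    return ($nLen > 0 && $nLen % 3 == 0 && $sDNA =~ /^[ACGT]+$/i) ? 1 : 0;
}

# Step 2: translate codon by codon, uppercasing each codon (step 9)
sub fnTranslate {
    my ($sDNA) = @_;
    my $sProtein = '';
    for (my $i = 0; $i + 3 <= length($sDNA); $i += 3) {
        $sProtein .= $hashCodonTable{uc substr($sDNA, $i, 3)};
    }
    return $sProtein;
}

# Step 3: DNA on line 1; each amino acid under the middle letter of its codon
sub fnFormatAlignment {
    my ($sDNA, $sProtein) = @_;
    my $sLine2 = '';
    foreach my $sAA (split //, $sProtein) {
        $sLine2 .= " $sAA ";
    }
    return "$sDNA\n$sLine2\n";
}

# Step 6: set one random position to A/T/C/G (the same letter is allowed)
sub fnMutate {
    my ($sDNA) = @_;
    my @arrNucleotides = ('A', 'T', 'C', 'G');
    substr($sDNA, int(rand(length($sDNA))), 1) = $arrNucleotides[int(rand(4))];
    return $sDNA;
}

# Step 5: mutation probability from the command line (0.1 is an illustrative default)
my $fMutProb = @ARGV ? $ARGV[0] : 0.1;
die "Mutation probability must be between 0 and 1\n"
    unless $fMutProb >= 0 && $fMutProb <= 1;

my $sDNA = 'ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT';
print 'Length: ', length($sDNA), "\n";
die "Invalid DNA sequence\n" unless fnCheckDNA($sDNA);

my $nMaxGenerations = 100_000;   # hypothetical safety cap, not part of the exercise
my $nGeneration = 0;
my $sProtein = fnTranslate($sDNA);
print "Generation $nGeneration\n", fnFormatAlignment($sDNA, $sProtein);

# Steps 4, 7, 8: replicate until a stop codon ('*') appears in the protein
while (index($sProtein, '*') < 0 && $nGeneration < $nMaxGenerations) {
    $nGeneration++;
    $sDNA = fnMutate($sDNA) if rand() < $fMutProb;
    $sProtein = fnTranslate($sDNA);
    print "Generation $nGeneration\n", fnFormatAlignment($sDNA, $sProtein);
}
print "Stop codon first occurred at generation $nGeneration\n" if index($sProtein, '*') >= 0;
```

Checking for `*` in the translated protein only detects stop codons in the reading frame, which is what matters for the protein; a TAA spanning two codons does not count.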

16 Create a shell script called getdistr.sh that does the following:
1. Run the simulation mutation.pl 1000 times with each of the mutation probabilities 0.01, 0.1 and 0.5.
2. Collect all DNA and protein sequence outputs in dist_$mutationprob.log.
3. Collect the generation at which a stop codon first occurs in dist_$mutationprob.txt.
4. Use R to plot dist_0.01.txt, dist_0.1.txt and dist_0.5.txt on a histogram (a different color for each parameter). The X axis should be log10(Generation).

