Applied Bioinformatics Introduction to Linux and R Bing Zhang Department of Biomedical Informatics Vanderbilt University


1 Applied Bioinformatics
Introduction to Linux and R
Bing Zhang
Department of Biomedical Informatics, Vanderbilt University
bing.zhang@vanderbilt.edu

2 Quick summary of the introduced Linux commands
Command            Meaning
rsh                Remote shell
passwd             Modify a user’s password
exit               Exit the shell
pwd                Display the path of the current directory
ls                 List files and directories
ls -a              List all files and directories
ls -a -l           List all files and directories in a long listing format
mkdir directory    Make a directory
cd directory       Change to named directory
cd                 Change to home directory
cd ~               Change to home directory
cd ..              Change to parent directory
rmdir directory    Remove a directory
more file          View the contents of a file
cp file1 file2     Copy file1 and name the copied file file2
mv file1 file2     Move or rename file1 to file2
rm file            Remove a file
man command        Display manual pages for a command

3 Getting help
man (display manual pages for a command)
- space bar to show the next page
- up and down arrows to move up and down
- q to exit

4 Exercise
Go to home directory:
    cd
Display manual pages for the command ls:
    man ls
List the contents of the current directory:
    ls
List the contents of the current directory, including entries starting with . and using a long listing format:
    ls -a -l
Create a test directory (skip this if you already have one):
    mkdir test
Go to the test directory:
    cd test
Copy the file sample_data.txt under directory /home/igptest to the current directory with the same name:
    cp /home/igptest/sample_data.txt .
View the content of the created file:
    more sample_data.txt
Make a copy of the file:
    cp sample_data.txt sample_data_copy.txt
View the content of the new copy:
    more sample_data_copy.txt
List the contents of the current directory:
    ls
Remove the new copy:
    rm sample_data_copy.txt
List the contents of the current directory:
    ls
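The exercise above can be tried without access to the course server. The following is only a sketch under stated assumptions: it works in a scratch directory created with mktemp instead of your home directory, and it writes a small stand-in sample_data.txt because /home/igptest/sample_data.txt exists only on the course system. The file contents and variable names are made up for illustration.

```shell
set -e

# Work in a throwaway directory (stand-in for the home directory in the exercise).
workdir=$(mktemp -d)
cd "$workdir"

# Create the test directory and move into it.
mkdir test
cd test

# Stand-in for: cp /home/igptest/sample_data.txt .
# (hypothetical contents; the real file lives on the course server)
printf 'Line_01\t1\nLine_02\t2\n' > sample_data.txt

# Make a copy, list the directory, then remove the copy and list again.
cp sample_data.txt sample_data_copy.txt
ls
files_before=$(ls | wc -l | tr -d ' ')
rm sample_data_copy.txt
ls
files_after=$(ls | wc -l | tr -d ' ')

echo "files before rm: $files_before, after rm: $files_after"
```

The two ls calls show the copy appearing and then disappearing, which is the point of the last steps of the exercise.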

5 Data manipulation with filters
Filters: programs that accept textual data and then transform it in a particular way. Examples: head, tail, cut, sort, uniq, sed …
View the content of a file:
    more sample_data.txt
Get the first 10 lines of the file:
    head sample_data.txt
Get the first 5 lines of the file:
    head -n 5 sample_data.txt
Get all but the last 5 lines of the file:
    head -n -5 sample_data.txt
Get the last 10 lines of the file:
    tail sample_data.txt
Get the last 5 lines of the file:
    tail -n 5 sample_data.txt
Get all lines starting from line 5:
    tail -n +5 sample_data.txt
Get the first three columns of the file:
    cut -f 1-3 sample_data.txt
Get selected columns of the file:
    cut -f 1,3,5 sample_data.txt
Sort all lines based on the numerical values in the second column (non-numeric entries are interpreted as zero):
    sort -k 2 -n sample_data.txt
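To see what the filters above return, it can help to run them on a file small enough to check by eye. This is a minimal sketch, assuming a made-up six-line, tab-separated file (demo_data.txt and its contents are hypothetical, not the course's sample_data.txt). The head -n -5 form from the slide is left out because negative counts are a GNU extension and may not work on other systems.

```shell
set -e

# Hypothetical demo file: 6 rows, 3 tab-separated columns.
workdir=$(mktemp -d)
data="$workdir/demo_data.txt"
printf 'g1\t30\ta\ng2\t4\tb\ng3\t12\tc\ng4\t7\td\ng5\t25\te\ng6\t1\tf\n' > "$data"

# First two lines and last two lines.
first_two=$(head -n 2 "$data")
last_two=$(tail -n 2 "$data")

# All lines starting from line 5.
from_line_5=$(tail -n +5 "$data")

# Columns 1 and 3 only (cut splits on tabs by default).
cols_1_3=$(cut -f 1,3 "$data")

# Sort numerically on the second column.
sorted=$(sort -k 2 -n "$data")

printf 'first two lines:\n%s\n' "$first_two"
printf 'last two lines:\n%s\n' "$last_two"
printf 'from line 5:\n%s\n' "$from_line_5"
printf 'columns 1 and 3:\n%s\n' "$cols_1_3"
printf 'sorted by column 2:\n%s\n' "$sorted"
```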

6 Data manipulation with piping and redirection
Piping (|): sending data from one program to another program.
Redirection: sending output from one program to a file
- >: save output to a file
- >>: append output to a file
Get the first 10 lines of the file and then get the first three columns:
    head sample_data.txt | cut -f 1-3
Get the first 10 lines of the file, then get the first three columns of these lines, and then redirect the content to a new file:
    head sample_data.txt | cut -f 1-3 > sample_data_subset.txt
View the new file:
    more sample_data_subset.txt
Append the last 10 lines of the old file to the end of the new file:
    tail sample_data.txt >> sample_data_subset.txt
View the new file:
    more sample_data_subset.txt
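The difference between > and >> is easiest to see on a tiny file. The sketch below assumes a made-up four-line, tab-separated file (demo_data.txt, demo_subset.txt and their contents are hypothetical) and repeats the pipe-then-redirect pattern from the slide on a smaller scale.

```shell
set -e

# Hypothetical demo file: 4 rows, 4 tab-separated columns.
workdir=$(mktemp -d)
data="$workdir/demo_data.txt"
subset="$workdir/demo_subset.txt"
printf 'r1\t1\tx\tA\nr2\t2\ty\tB\nr3\t3\tz\tC\nr4\t4\tw\tD\n' > "$data"

# Pipe: take the first 2 lines, keep columns 1-2, and save with > (creates or overwrites).
head -n 2 "$data" | cut -f 1-2 > "$subset"

# Append the last line of the original file with >> (keeps what is already there).
tail -n 1 "$data" >> "$subset"

# The subset now holds two trimmed lines followed by one full line.
cat "$subset"
subset_lines=$(wc -l < "$subset" | tr -d ' ')
echo "lines in subset: $subset_lines"
```

Running the first redirection again with > would replace the file, while >> only ever adds to the end.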

7 Editing files with nano
nano is a user-friendly text editor.
A quick tutorial: http://staffwww.fullcoll.edu/sedwards/Nano/IntroToNano.html
Open sample_data.txt for editing:
    nano sample_data.txt
Delete the text “Line_01” and the space after it, save the file, and then exit:
    In nano, ^O for saving and ^X for exit
View the edited file:
    more sample_data.txt
View the content of the .bashrc file, which is located under your home directory. The file includes commands that are executed each time a new interactive shell starts:
    more ~/.bashrc
Open the .bashrc file under your home directory for editing:
    nano ~/.bashrc
Add “setpkgs -a R” to the end of this file. This will allow you to use the R environment, which has been installed on the ACCRE system for statistical computing:
    In nano, ^O for saving and ^X for exit
View the edited .bashrc file:
    more ~/.bashrc
Run the .bashrc file:
    source ~/.bashrc
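The last two steps above (add a line to a startup file, then run the file) can be sketched without touching your real ~/.bashrc. The example below is an illustration under assumptions: it uses a scratch file in place of ~/.bashrc, and the variable DEMO_GREETING is a made-up stand-in for the ACCRE-specific setpkgs line. It uses the "." command, which bash also accepts under the name source.

```shell
set -e

# Scratch file standing in for ~/.bashrc (the real file is left untouched).
rcfile=$(mktemp)

# Append a line to the end of the file, as you would do by hand in nano.
# DEMO_GREETING is a hypothetical setting used only for this illustration.
echo 'export DEMO_GREETING="hello from rc file"' >> "$rcfile"

# View the edited file.
cat "$rcfile"

# Run the file in the current shell; "." is the portable spelling of "source".
. "$rcfile"

# The setting made inside the file is now visible in this shell.
echo "$DEMO_GREETING"
```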

8 What is R
R is a free software environment for statistical computing and graphics. It includes:
- an effective data handling and storage facility
- a suite of operators for calculations on arrays, in particular matrices
- a large, coherent, integrated collection of intermediate tools for data analysis
- graphical facilities for data analysis and display either on-screen or on hardcopy
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities

9 R Installation and tutorial
Download and install R
- http://www.r-project.org/
- Choose a CRAN (Comprehensive R Archive Network) mirror
- Binary distributions of the base system and contributed packages
  - Windows version
  - Mac OS X version
  - Linux version (already installed on the ACCRE cluster, will be used for this module)
Tutorials
- http://cran.r-project.org/doc/manuals/r-release/R-intro.html
- An introduction to R

10 R interface
Command-line R: Linux/OS X
- Type R in your Linux shell to start R; type q() in the R interface to close R.
R GUI: OS X (the Windows GUI is similar)
- Download and install on your laptop
RStudio: a powerful and user-friendly interface for R. Excellent for both beginners and developers (http://www.rstudio.com/)

11 Install and load packages
CRAN packages
- http://cran.r-project.org/web/packages/
- >6000 packages
Bioconductor packages
- http://www.bioconductor.org/
- ~1000 packages for the analysis of high-throughput genomics data
Install a CRAN package:
    install.packages("package name")
Install a Bioconductor package:
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("package name")
Load a package/library:
    library("package name")

12 Basic R syntax
Object <- function(arguments)
- <-: assignment operator
Object <- object[arguments]
Assign a numeric vector with six numbers to object x using the c() function:
    x <- c(1.3, 10.4, 5.6, 3.1, 6.4, 21.7)
Assign a subset of x to a new object y:
    y <- x[1:3]
Show the content of x:
    x
Show the content of y:
    y
Get information on function c:
    ?c
Display the output of a function without assignment:
    c(1,2,5)

13 Data types
Numeric data
- 1, 2, 3
Character data
- "a", "b", "c"
Logical data
- TRUE, FALSE, TRUE
Assign a numeric vector with six numbers to object x using the c() function:
    x <- c(1.3, 10.4, 5.6, 3.1, 6.4, 21.7)
Create a character vector from x:
    as.character(x)
Create a logical vector from x:
    x>5

14 Data objects
Vectors: an ordered collection of items of the same data type (numeric, character, or logical), 1-dimensional
Matrices: 2-dimensional objects, all items must have the same data type
Arrays: similar to matrices but can have more than two dimensions
Data frames: similar to matrices, but different columns can have different data types
Lists: an ordered collection of objects
Functions
Create a numeric vector with numbers ranging from 1 to 9:
    c(1:9)
Create a 3x3 numeric matrix:
    matrix(c(1:9),nrow=3,ncol=3,byrow=TRUE)
Create another 3x3 numeric matrix by changing an argument:
    matrix(c(1:9),nrow=3,ncol=3,byrow=FALSE)

15 Operators and calculations
Comparison operators: ==, !=, <, >, <=, >=
Logical operators: & (AND), | (OR), ! (NOT)
Calculations
- Arithmetic operators: +, -, *, /, ^
- Arithmetic functions: log, exp, sqrt, mean, var, sd, sum, etc.
Comparisons:
    3==5
    3!=5
    3<5
Logical operators:
    x<-5
    y<-(-8)
    x>0 | y>0
    x>0 & y>0
Calculations:
    (4+2^2)/(2*2)
    x<-c(1,3,5,7,9)
    y<-c(2,4,6,8,10)
    x+y
    sum((x-mean(x))^2)/(length(x)-1)
    var(x)

16 Data import, simple analyses, and export
Import data from a tabular file:
    myData<-read.table("~/test/sample_data.txt",header=TRUE,sep="\t")
Display the new object:
    myData
Get class name of the object:
    class(myData)
Convert data frame to matrix:
    myMatrix<-as.matrix(myData)
Get class name of the matrix:
    class(myMatrix)
Display the matrix object:
    myMatrix
Get dimensions of the matrix:
    dim(myMatrix)
Get a high-level summary:
    summary(myMatrix)
Log transformation of the data:
    myMatrix_log<-log2(myMatrix)
Calculate variance for row #1:
    var(myMatrix_log[1,])
Calculate variances for all rows:
    variances<-apply(myMatrix_log,1,var)
Calculate means for all rows:
    means<-apply(myMatrix_log,1,mean)
Data subsetting:
    myMatrix_log[1:3,1:2]
    myMatrix_log[c("Line_02","Line_04"),]
    myMatrix_log[means>median(means),]
Combining data:
    results<-cbind(myMatrix_log,means,variances)
Write data to a tabular file:
    write.table(results, "~/test/sample_data_output.txt", sep="\t", quote=FALSE)
Quit R:
    q()
Go to your test directory, and check the file sample_data_output.txt

17 Copying files to/from a local computer
Windows
- Application: Bitvise SSH (https://www.bitvise.com/ssh-client-download)
Mac
- Application: Cyberduck (https://cyberduck.io/)
- Click on “Open Connection”
- Select “SFTP (SSH File Transfer Protocol)”
- Server: vmplogin.accre.vanderbilt.edu
- Username: your_user_name
- Password: your-password
- Don’t change other items

18 Copying files to/from a local computer (using Bitvise SFTP in Windows)

19 Copying files to/from a local computer (using Cyberduck in Mac)

