Applied Bioinformatics: Introduction to Linux and R
Bing Zhang
Department of Biomedical Informatics, Vanderbilt University


Quick summary of the introduced Linux commands

Command            Meaning
rsh                Remote shell
passwd             Modify a user's password
exit               Exit the shell
pwd                Display the path of the current directory
ls                 List files and directories
ls -a              List all files and directories
ls -a -l           List all files and directories in a long listing format
mkdir directory    Make a directory
cd directory       Change to named directory
cd                 Change to home directory
cd ~               Change to home directory
cd ..              Change to parent directory
rmdir directory    Remove a directory
more file          View the contents of a file
cp file1 file2     Copy file1 and name the copied file file2
mv file1 file2     Move or rename file1 to file2
rm file            Remove a file
man command        Display manual pages for a command

Getting help

man command (display manual pages for a command)
- Space bar to show the next page
- Up and down arrows to move up and down
- q to exit

Exercise

Task
    Command

Go to home directory
    cd
Display manual pages for the command ls
    man ls
List the contents of the current directory
    ls
List the contents of the current directory, including entries starting with "." and using a long listing format
    ls -a -l
Create a test directory (skip this step if you already have one)
    mkdir test
Go to the test directory
    cd test
Copy the file sample_data.txt under directory /home/igptest to the current directory with the same name
    cp /home/igptest/sample_data.txt .
View the content of the created file
    more sample_data.txt
Make a copy of the file
    cp sample_data.txt sample_data_copy.txt
View the content of the new copy
    more sample_data_copy.txt
List the contents of the current directory
    ls
Remove the new copy
    rm sample_data_copy.txt
List the contents of the current directory
    ls
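The same sequence can be rehearsed without access to /home/igptest. The sketch below is only an illustration under stated assumptions: it works in a throwaway directory created with mktemp, and the file demo_data.txt and its contents are hypothetical stand-ins for the course file.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
# The directory and file names below are hypothetical stand-ins.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir test                      # create the test directory
cd test || exit 1               # move into it
pwd                             # confirm the current directory

# Stand-in for copying /home/igptest/sample_data.txt: create a tiny file instead.
printf 'ID\tS1\nLine_01\t12\nLine_02\t5\n' > demo_data.txt

cp demo_data.txt demo_data_copy.txt   # make a copy
before=$(ls)                          # listing taken while the copy exists
rm demo_data_copy.txt                 # remove the copy
after=$(ls)                           # listing taken after the copy is removed

echo "before: $before"
echo "after:  $after"
```

Comparing the two listings shows the effect of rm: the copy appears in the first listing and is gone from the second.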

Data manipulation with filters

Filters: programs that accept textual data and then transform it in a particular way.
Examples: head, tail, cut, sort, uniq, sed, …

Task
    Command

View the content of a file
    more sample_data.txt
Get the first 10 lines of the file
    head sample_data.txt
Get the first 5 lines of the file
    head -n 5 sample_data.txt
Get all but the last 5 lines of the file
    head -n -5 sample_data.txt
Get the last 10 lines of the file
    tail sample_data.txt
Get the last 5 lines of the file
    tail -n 5 sample_data.txt
Get all lines starting from line 5
    tail -n +5 sample_data.txt
Get the first three columns of the file
    cut -f 1-3 sample_data.txt
Get selected columns of the file
    cut -f 1,3,5 sample_data.txt
Sort all lines based on the numerical values in the second column (non-numeric entries are interpreted as zero)
    sort -k 2 -n sample_data.txt
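To see what each filter returns, it can help to run them on a file small enough to check by eye. The following is a minimal sketch, assuming a tab-separated file; the name demo_data.txt and its values are hypothetical and are not the course's sample_data.txt.

```shell
# Build a small tab-separated file (hypothetical stand-in for sample_data.txt).
workdir=$(mktemp -d)
demo="$workdir/demo_data.txt"
printf 'ID\tS1\tS2\nLine_01\t12\t7\nLine_02\t5\t9\nLine_03\t30\t2\nLine_04\t8\t4\n' > "$demo"

first_two=$(head -n 2 "$demo")       # header plus the first data row
last_two=$(tail -n 2 "$demo")        # the final two rows
from_line_2=$(tail -n +2 "$demo")    # everything from line 2 on, i.e. no header
col_1_and_3=$(cut -f 1,3 "$demo")    # columns 1 and 3 (tab is cut's default delimiter)
by_s1=$(sort -k 2 -n "$demo")        # numeric sort on the second column

printf '%s\n' "$by_s1"
```

In the sorted output the header row comes first, because its second field (S1) is non-numeric and is therefore treated as zero, followed by the rows in ascending order of S1.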

Data manipulation with piping and redirection

Piping (|): sending data from one program to another program.
Redirection: sending output from one program to a file
- >: save output to a file
- >>: append output to a file

Task
    Command

Get the first 10 lines of the file and then get the first three columns
    head sample_data.txt | cut -f 1-3
Get the first 10 lines of the file, then get the first three columns of these lines, and then redirect the content to a new file
    head sample_data.txt | cut -f 1-3 > sample_data_subset.txt
View the new file
    more sample_data_subset.txt
Append the last 10 lines of the old file to the end of the new file
    tail sample_data.txt >> sample_data_subset.txt
View the new file
    more sample_data_subset.txt
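The difference between > (overwrite) and >> (append) is easiest to see by counting lines after each step. This is a sketch under stated assumptions: the files demo_data.txt and demo_subset.txt and their contents are hypothetical and are created in a temporary directory.

```shell
# Hypothetical six-line, tab-separated input: one header row plus five data rows.
workdir=$(mktemp -d)
demo="$workdir/demo_data.txt"
subset="$workdir/demo_subset.txt"
printf 'ID\tS1\tS2\tS3\n' > "$demo"
for i in 1 2 3 4 5; do
  printf 'Line_0%s\t%s\t%s\t%s\n' "$i" "$((i * 2))" "$((i * 3))" "$((i * 4))" >> "$demo"
done

# Pipe: the first 3 lines flow into cut, which keeps columns 1-2.
# ">" creates (or overwrites) the subset file with that result.
head -n 3 "$demo" | cut -f 1-2 > "$subset"
lines_after_write=$(wc -l < "$subset")

# ">>" appends the last 2 lines of the input without touching what is already there.
tail -n 2 "$demo" >> "$subset"
lines_after_append=$(wc -l < "$subset")

echo "after >:  $lines_after_write lines"
echo "after >>: $lines_after_append lines"
```

The subset file holds three lines after the first command and five after the second. The appended rows keep all four columns, because they never passed through cut.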

Editing files with nano

nano is a user-friendly text editor. A quick tutorial:

Task
    Command

Open sample_data.txt for editing
    nano sample_data.txt
Delete the text "Line_01" and the space after it, save the file, and then exit
    In nano, ^O (Ctrl+O) saves and ^X (Ctrl+X) exits
View the edited file
    more sample_data.txt
View the content of the .bashrc file, which is located under your home directory. The file contains commands that are executed each time an interactive bash shell starts.
    more ~/.bashrc
Open the .bashrc file under your home directory for editing
    nano ~/.bashrc
Add "setpkgs -a R" to the end of this file. This will allow you to use the R environment, which has been installed on the ACCRE system for statistical computing.
    In nano, ^O (Ctrl+O) saves and ^X (Ctrl+X) exits
View the edited .bashrc file
    more ~/.bashrc
Run the .bashrc file
    source ~/.bashrc
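Running a file with source executes its commands in the current shell, so any settings it makes remain in effect afterwards. The sketch below illustrates this without touching a real ~/.bashrc. The file demo_rc and the variable DEMO_GREETING are hypothetical, and the portable "." form is used in place of source, which is the bash spelling of the same command.

```shell
# Create a small startup-style file in a temporary directory.
# demo_rc and DEMO_GREETING are hypothetical names used only for illustration.
workdir=$(mktemp -d)
rc="$workdir/demo_rc"
printf 'DEMO_GREETING="hello from demo_rc"\n' > "$rc"
printf 'export DEMO_GREETING\n' >> "$rc"

# "." is the portable spelling of bash's "source": the commands in the file
# run in the current shell, so the variable is still set afterwards.
. "$rc"

echo "$DEMO_GREETING"
```

This is why a line added to ~/.bashrc takes effect in the current session only after the file has been sourced, or after a new shell has been started.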

What is R

R is a free software environment for statistical computing and graphics. It includes:
- an effective data handling and storage facility
- a suite of operators for calculations on arrays, in particular matrices
- a large, coherent, integrated collection of intermediate tools for data analysis
- graphical facilities for data analysis and display either on-screen or on hardcopy
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities

R installation and tutorial

Download and install R
- Choose a CRAN (Comprehensive R Archive Network) mirror
- Binary distributions of the base system and contributed packages
  - Windows version
  - Mac OS X version
  - Linux version (already installed on the ACCRE cluster; will be used for this module)

Tutorials
- An Introduction to R

R interface

- Command-line R (Linux/OS X): type R in your Linux shell to start R; type q() in the R interface to close R.
- R GUI (OS X; the Windows GUI is similar): download and install on your laptop.
- RStudio: a powerful and user-friendly interface for R, excellent for both beginners and developers.

Install and load packages

CRAN packages
- >6000 packages
Bioconductor packages
- ~1000 packages for the analysis of high-throughput genomics data

Task
    R code

Install a CRAN package
    install.packages("package name")
Install a Bioconductor package
    source("...")
    biocLite("package name")
Load a package/library
    library("package name")

Basic R syntax

object <- function(arguments)
- <-: assignment operator
object <- object[arguments]

Task
    R code

Assign a numeric vector with six numbers to object x using the c() function
    x <- c(1.3, 10.4, 5.6, 3.1, 6.4, 21.7)
Assign a subset of x to a new object y
    y <- x[1:3]
Show the content of x
    x
Show the content of y
    y
Get help on the function c
    ?c
Display the output of a function without assignment
    c(1,2,5)

Data types

- Numeric data: 1, 2, 3
- Character data: "a", "b", "c"
- Logical data: TRUE, FALSE, TRUE

Task
    R code

Assign a numeric vector with six numbers to object x using the c() function
    x <- c(1.3, 10.4, 5.6, 3.1, 6.4, 21.7)
Create a character vector from x
    as.character(x)
Create a logical vector from x
    x>5

Data objects

- Vectors: an ordered collection of items of the same data type (numeric, character, or logical), 1-dimensional
- Matrices: 2-dimensional objects; all items must have the same data type
- Arrays: similar to matrices, but can have more than two dimensions
- Data frames: similar to matrices, but columns can have different data types
- Lists: an ordered collection of objects
- Functions

Task
    R code

Create a numeric vector with numbers ranging from 1 to 9
    c(1:9)
Create a 3x3 numeric matrix
    matrix(c(1:9),nrow=3,ncol=3,byrow=TRUE)
Create another 3x3 numeric matrix by changing an argument
    matrix(c(1:9),nrow=3,ncol=3,byrow=FALSE)

Operators and calculations

Comparison operators: ==, !=, <, >, <=, >=
Logical operators: & (AND), | (OR), ! (NOT)
Calculations
- Arithmetic operators: +, -, *, /, ^
- Arithmetic functions: log, exp, sqrt, mean, var, sd, sum, etc.

Task
    R code

Comparisons
    3==5
    3!=5
    3<5
Logical operators
    x <- 5
    y <- (-8)
    x>0 | y>0
    x>0 & y>0
Calculations
    (4+2^2)/(2*2)
    x <- c(1,3,5,7,9)
    y <- c(2,4,6,8,10)
    x+y
    sum((x-mean(x))^2)/(length(x)-1)
    var(x)

Data import, simple analyses, and export

Task
    R code

Import data from a tabular file
    myData <- read.table("~/test/sample_data.txt", header=TRUE, sep="\t")
Display the new object
    myData
Get class name of the object
    class(myData)
Convert data frame to matrix
    myMatrix <- as.matrix(myData)
Get class name of the matrix
    class(myMatrix)
Display the matrix object
    myMatrix
Get dimensions of the matrix
    dim(myMatrix)
Get a high-level summary
    summary(myMatrix)
Log transformation of the data
    myMatrix_log <- log2(myMatrix)
Calculate variance for row #1
    var(myMatrix_log[1,])
Calculate variances for all rows
    variances <- apply(myMatrix_log,1,var)
Calculate means for all rows
    means <- apply(myMatrix_log,1,mean)
Data subsetting
    myMatrix_log[1:3,1:2]
    myMatrix_log[c("Line_02","Line_04"),]
    myMatrix_log[means>median(means),]
Combining data
    results <- cbind(myMatrix_log,means,variances)
Write data to a tabular file
    write.table(results, "~/test/sample_data_output.txt", sep="\t", quote=FALSE)
Quit R
    q()
Go to your test directory, and check the file sample_data_output.txt

Copying files to/from a local computer

Windows
- Application: Bitvise SSH
Mac
- Application: Cyberduck
- Click on "Open Connection"
- Select "SFTP (SSH File Transfer Protocol)"
- Server: vmplogin.accre.vanderbilt.edu
- Username: your_user_name
- Password: your_password
- Don't change other items

Copying files to/from a local computer (using Bitvise SFTP in Windows)

Copying files to/from a local computer (using Cyberduck in Mac)