Introduction to Linux and R


1 Introduction to Linux and R

2 Justification for Linux
Linux is one of several variants of Unix; others include Solaris and macOS.
There are several "flavours" of Linux, e.g. Ubuntu, Debian, Fedora and Red Hat.
Many bioinformatics tools are only available for Linux.
Nearly all large computers used for analysing large data sets run Unix or Linux.
There are many commands and programs for managing large files.
You can run a Linux machine from anywhere: if you have a large dataset that you cannot analyse on your laptop, you can analyse it on a Linux machine on another continent, working from your laptop.
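As a small illustration of the file-handling commands mentioned above, the sketch below builds a throwaway file and inspects it with wc, head and split. The file name big.txt and its size are made up for the example; a real data set would be far larger.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a hypothetical 1000-line file (a stand-in for a large data set).
seq 1 1000 > big.txt

# Count the lines without opening the file in an editor.
n_lines=$(wc -l < big.txt)

# Look at just the first three lines.
first_three=$(head -n 3 big.txt)

# Split the file into pieces of 250 lines each: part_aa, part_ab, ...
split -l 250 big.txt part_
n_parts=$(ls part_* | wc -l)

echo "lines: $n_lines, pieces: $n_parts"
```

Each of these commands reads the file as a stream, which is why they stay usable on files too large to open in an editor.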

3 Learning Linux
Graphical interfaces are available, but we use the command line.
You can access remote Linux machines from a Windows machine with "PuTTY", available from [link missing].
There is much to learn, but a lot of help is available on the internet.

4 Virtual Machines
Diagram layers, outermost first: Windows (host operating system); VirtualBox (virtual machine); Xubuntu (guest operating system); Docker container.
Docker provides a way to run applications securely isolated in a container, packaged with all their dependencies and libraries.
Docker is free from [link missing].
The H3Africa project is developing workflows for GWAS for distribution on the cloud in Docker containers.
Python workflow for fastStructure.
Other software, e.g. Beagle, Plink, vcftools, etc.

5 Cut and Paste Between Windows and Linux
The most recent set of Linux installation notes has instructions for enabling cut and paste between Windows and Linux.
If you have already done the installation but not enabled cut and paste between Windows and Linux, then Google "virtualbox cut and paste between host and guest" to get instructions.

6 Getting a terminal
To get the Application menu, right-click on the desktop or press Alt-F1.

7 R
R is a programming language popular with statisticians.
Many bioinformatics packages are written in R.
As a programming language, R is complex to learn.
Very little knowledge of R is sufficient to run many R packages.
We will mainly use R because it can generate good graphics.
You should have installed R on your computers.
There is a lot of help online.
Plink is one program with many options; R is a programming language in which anyone can write a program.

8 Lists, Arrays and Vectors
Used for storing sets of similar information.

Zero-based index:     0    1    2    3    4    5    6
Array of integers:    672  242  530  501  972  417  180

One-based index (R):  1      2       3      4      5       6          7      8
Array of strings:     Apple  orange  mango  melon  banana  pineapple  guava  lemon

If I enter Fruit[2], the program will return orange.
If I enter Fruit[2] = "melon", the program will change the contents of cell 2 to melon.
The R tutorial will teach you how to create vectors and extract data from them in R.
Arrays can have many dimensions. In R, two-dimensional vectors are called matrices, and those with more dimensions are called arrays.
Other languages use the same concept but different syntax.
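R counts from one, but the bash shell used later in this course is one of the languages that counts from zero. The sketch below reuses the fruit array from the slide to show the same idea in bash syntax; the variable name Fruit comes from the slide, and the rest is illustrative only.

```shell
#!/usr/bin/env bash
# Bash arrays are zero-based: the first cell is index 0.
Fruit=(Apple orange mango melon banana pineapple guava lemon)

# In R, Fruit[2] is "orange"; in bash, "orange" sits at index 1.
second="${Fruit[1]}"
echo "$second"

# Index 2 in bash holds the third item.
third="${Fruit[2]}"
echo "$third"

# Assigning to a cell replaces its contents.
Fruit[2]="melon"
echo "${Fruit[2]}"

# Number of cells in the array.
echo "${#Fruit[@]}"
```

The same index therefore points at different cells in the two languages, so it is worth checking which convention applies before copying an index from one to the other.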

9 Bash scripting
The shell is an application that allows users to communicate with the computer.
The "bash" shell is the most widely used shell for Linux.
The shell can be used to write simple programs, called shell scripts.
We will use a couple of scripts to run the same command many times, each time with different parameters.
Scripts are very fussy about the exact use of capitals and punctuation.
Bad things can happen if you copy from Word documents into Linux commands. Punctuation may need to be re-entered.
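One common way that copying from Word goes wrong is that straight quotes are silently replaced by curly ("smart") quotes, which the shell does not treat as quotes. A minimal sketch of repairing them with sed follows; the pasted command line is invented for illustration, and the sketch assumes the text is UTF-8.

```shell
#!/usr/bin/env bash
# A command line as it might look after pasting from a word processor:
# the quotes around the pattern are curly, not straight.
pasted='grep “bh” eucalypt.clean3.ped'

# Replace curly double and single quotes with their plain equivalents.
fixed=$(printf '%s\n' "$pasted" |
  sed -e 's/“/"/g' -e 's/”/"/g' -e "s/‘/'/g" -e "s/’/'/g")

echo "$fixed"
```

Word processors may also change other characters, such as turning two hyphens into a long dash, so retyping the punctuation by hand remains the safest check.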

10 Example bash shell script
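eucalypt="eucalypt.clean3"
# Beginning of loop: one pass per population code
for first in bh br cr dr kh lj lr qv rt
do
    # Select the lines for this population and write columns 1, 2 and 1 again to within.txt
    grep "$first" "${eucalypt}.ped" | awk '{print $1, $2, $1}' > within.txt
    # Each pass writes to temp.*, replacing the output of the previous pass
    plink --file "$eucalypt" --fst --within within.txt --allow-no-sex --out temp
done

To learn how to write loops in the bash shell, Google 'bash loops'.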
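The loop pattern above can be tried without plink or the eucalypt data. The sketch below invents a tiny three-line stand-in for a .ped file (the file name toy.ped and its contents are made up) and writes one within file per population, so that no pass overwrites another. As a deliberate change from the slide, it matches the population code against the first column only, using awk, instead of using grep on the whole line.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical miniature .ped-style file: family ID, individual ID, then data.
printf '%s\n' 'bh ind1 A A' 'br ind2 A G' 'bh ind3 G G' > toy.ped

# Run the same commands once per population code.
for pop in bh br; do
  # Keep only lines whose first column is this population and print
  # family ID, individual ID, and the family ID again as the cluster name.
  awk -v p="$pop" '$1 == p {print $1, $2, $1}' toy.ped > "within_${pop}.txt"
  echo "$pop: $(wc -l < "within_${pop}.txt") individuals"
done
```

Putting the loop variable into the output file name is one simple way to keep the results of every pass.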

11 Exercises
Do the online Linux tutorials 1-6 at [link missing].
Complete the Rpractical.doc.

