A gentle introduction to R – how to load in data and produce summary statistics BRC MH Bioinformatics group.

Tutorial outline
How to install R on your own computers
  – It's free
  – But it's already installed on these computers
Loading data from Excel
Plotting
Summary statistics

Files Data and slides on: bioinformatics-workshop-october-2012

Show file extensions

Uncheck "Hide extensions for known file types", then click Apply

Installing R – skip as already installed

Installing R – skip as already installed
And follow the installation instructions specific to your operating system

Starting R on these computers

Help files

Loading help files
A useful function is read.table()
  – It allows you to read data from spreadsheets into R
To see its help file you can use
    ?read.table
You can use ?function_name for any function to see a help file

Loading data into R from Excel

From Excel
Open testdata.xls

From Excel
You need to save it as a comma-separated values file (.csv): go to File > Save As > Other Formats

From Excel

R working directory
To open a file you will need to point R towards the folder that contains it. You can do this with setwd(), but we'll do it using the mouse.
Suppose you have the file in My Documents

Browsing folders
To check that you are in the right folder type
    getwd()
To see files in this folder you can type
    list.files()
To list the current variables type
    ls()
Nothing should be loaded yet
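The folder commands above can also be tried without the mouse. The sketch below is illustrative only: a temporary folder stands in for My Documents, and the file name example.csv is a made-up placeholder, not one of the tutorial files.

```r
# Sketch only: a temporary folder stands in for "My Documents".
old.dir  <- getwd()            # remember where we started
demo.dir <- tempdir()          # hypothetical data folder
setwd(demo.dir)                # point R at that folder
getwd()                        # confirm the working directory
file.create("example.csv")     # create an empty placeholder file
list.files()                   # the placeholder should now be listed
ls()                           # objects currently in the workspace
setwd(old.dir)                 # go back to the original folder
```

Restoring the original working directory at the end keeps later commands from silently reading files out of the wrong folder.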

Loading data
To follow along with this section, make sure your R working directory is the folder that contains the tutorial data

Read the contents of file testdata.csv into an R variable my.data with:
    my.data <- read.csv("testdata.csv")
read.csv is a wrapper for read.table, which lets you specify more details about your file, e.g.:
    my.data <- read.table("testdata.csv", sep=",", header=TRUE)

read.table()
sep : Column separator
header : Does the first row of the file contain column headers?
skip : Number of rows to skip at the top of the file
See ?read.table for other useful parameters
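To see what sep and header actually do, here is a small sketch that writes a made-up three-line file to a temporary location and reads it back in three ways. The file contents and the variable names (tmp, via.csv, via.table, no.header) are invented for illustration and are not part of the tutorial data.

```r
# Toy file with made-up values, written to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("height,weight,gender",
             "1.70,65,F",
             "1.82,80,M"), tmp)

via.csv   <- read.csv(tmp)                               # sep="," and header=TRUE are preset
via.table <- read.table(tmp, sep = ",", header = TRUE)   # same settings, spelled out
names(via.table)                                         # column names come from the first line
dim(via.table)                                           # 2 rows, 3 columns

no.header <- read.table(tmp, sep = ",", header = FALSE)  # first line is treated as data
dim(no.header)                                           # 3 rows, 3 columns
```

Forgetting header=TRUE with read.table() is a common slip: the column names end up as the first data row and every column is read as text.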

Looking at loaded data

Check your new variable is in the R environment:
    ls()
Take a look at the top couple of lines:
    head(my.data)
Generate some basic summary stats:
    summary(my.data)

Check the dimensions of your dataset:
    dim(my.data)
Number of rows and columns
    nrow(my.data)
    ncol(my.data)
Row and column names
    rownames(my.data)
    colnames(my.data)

Subsetting Data

Look at the first row:
    my.data[1,]
Look at the first column:
    my.data[,1]
Look at the third column of row 10:
    my.data[10,3]

Look at the first column for rows 100 to 110
    my.data[100:110,1]
Same as above, but save to a variable
    my.subset <- my.data[100:110,1]
Look at rows 30, 40, 50 and 60
    my.data[c(30,40,50,60),]
Same as above but pre-defining the index vector
    my.indices <- c(30, 40, 50, 60)
    my.data[my.indices,]

You can subset on names instead of indices:
Look at the column named 'weight' for row 1:
    my.data[1,"weight"]
Look at the columns named 'height' and 'weight' for row 1:
    my.data[1,c("weight","height")]
Same as above but pre-define the colnames vector
    cols <- c("weight","height")
    my.data[1,cols]

Negative indices exclude elements:
Look at all columns except the second for row 1
    my.data[1,-2]
Extract all rows except 1 to 100
    my.new.data <- my.data[-1:-100,]
Extract all rows except 35, 67, 101
    my.indices <- -1 * c(35, 67, 101)
    my.new.data <- my.data[my.indices,]
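The same indexing rules can be tried on a data frame small enough to check by eye. The five-row data frame toy below is made up for illustration; only its column names are borrowed from the tutorial data.

```r
# Made-up data frame, small enough to verify every result by eye.
toy <- data.frame(height = c(1.60, 1.75, 1.82, 1.68, 1.90),
                  weight = c(55, 70, 85, 62, 95),
                  gender = c("F", "M", "M", "F", "M"))

toy[2, ]                              # second row, all columns
toy[, "height"]                       # one column by name, returned as a vector
toy[c(1, 3), c("height", "weight")]   # rows 1 and 3, two named columns
toy[-1:-2, ]                          # everything except rows 1 and 2
toy[-1 * c(2, 4), ]                   # everything except rows 2 and 4
```

Note that -1:-2 means the sequence -1, -2, because the minus sign is applied before the colon operator.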

Quiz!

How tall is the person in the 7th row?
What gender is the person in the 300th row?
For the people in rows 20-30, who is the heaviest?
For the people in rows 110, 350, 219, 74, who is the tallest?
Save all rows except in a variable my.new.data
How many males and females are in this new dataset?
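As a hint for the quiz, two base R helpers are handy: which.max() finds the position of the largest value and table() counts how often each value occurs. The sketch below uses a made-up data frame rather than the tutorial data, so it does not give away the answers.

```r
# Made-up data frame; the quiz answers must come from my.data instead.
toy <- data.frame(height = c(1.60, 1.75, 1.82, 1.68, 1.90),
                  weight = c(55, 70, 85, 62, 95),
                  gender = c("F", "M", "M", "F", "M"))

rows <- toy[2:4, ]                  # restrict to rows 2 to 4
rows[which.max(rows$weight), ]      # the heaviest person among those rows
table(toy$gender)                   # how many of each gender
```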

Formatting problems

Data isn't comma-separated?
Specify the separator in read.table
Tab-delimited text is another common format, for which you can use sep="\t"
Load "testdata.txt", a tab-delimited version of the data

Data has extra header information at the top?
Either delete this data in Excel before exporting to csv
Or, use the skip=N argument to read.table
Have a look at "testdata_1.csv" in Excel and then load it into R using read.table
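Both fixes, a different separator and extra lines at the top, can be combined in one call. The sketch below writes a made-up tab-delimited file with two extra lines above the column headers; the file contents are invented and are not those of testdata.txt or testdata_1.csv.

```r
# Made-up tab-delimited file with two extra lines above the column headers.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Study: hypothetical example",
             "Exported: by hand",
             "height\tweight\tgender",
             "1.70\t65\tF",
             "1.82\t80\tM"), tmp)

# skip = 2 ignores the two extra lines; sep = "\t" splits columns on tabs
toy <- read.table(tmp, sep = "\t", header = TRUE, skip = 2)
toy
```

The value for skip has to be counted by looking at the file first, for example in a text editor.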

Factors are inconsistently named
R will just read in the data you give it. If you aren't consistent in naming the levels of your factors, it will see them as different levels.
R is case sensitive: 'MyLevel' != 'mylevel'
Load the data from testdata_2.csv and have a look at the gender variable. Try to fix the problems in Excel and reload.
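The slide suggests fixing the labels in Excel; as an alternative, the same clean-up can be sketched in R itself. The messy values below are made up and may not match what testdata_2.csv actually contains.

```r
# Made-up messy entries; testdata_2.csv may contain different problems.
gender <- c("M", "m", "F", "f ", "Male", "F")
table(gender)                           # every spelling counts as its own level

clean <- toupper(trimws(gender))        # strip stray spaces, unify the case
clean[clean == "MALE"] <- "M"           # map a long form onto the short code
gender.factor <- factor(clean, levels = c("F", "M"))
table(gender.factor)                    # two levels remain
```

Listing the allowed levels explicitly in factor() has a useful side effect: any label that was not anticipated becomes NA instead of quietly forming a new level.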

Measurements and units in a single column
If you store values like 10kg, R will not interpret this as a numeric column
Try loading file 'testdata_3.csv' - what has happened to the weights and heights information?
Try loading again so that the two are loaded as character vectors.
Have a look at the sub() function and see if you can fix the problem
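One possible approach with sub() is sketched below on a made-up file. The units (cm and kg) and the values are assumptions for illustration; testdata_3.csv may use different ones.

```r
# Made-up file in which each value carries its unit.
tmp <- tempfile(fileext = ".csv")
writeLines(c("height,weight",
             "170cm,65kg",
             "182cm,80kg"), tmp)

# Keep the columns as character vectors rather than factors
toy <- read.csv(tmp, stringsAsFactors = FALSE)

# Remove the unit text, then convert what is left to numbers
toy$weight <- as.numeric(sub("kg", "", toy$weight, fixed = TRUE))
toy$height <- as.numeric(sub("cm", "", toy$height, fixed = TRUE))
str(toy)
```

fixed = TRUE tells sub() to treat the pattern as literal text rather than a regular expression, which is the safer choice when the unit contains special characters.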

Excel has just screwed up your data
Older versions of Excel have a limit on the number of rows. If you open a larger dataset in Excel it will be truncated. If you then save this dataset you will be saving the truncated version.
Avoid opening large datasets in Excel; use R.
Excel tries to be helpful by formatting elements for you. Try the following and then open in Excel, save as csv and reload into R. What has happened?
    my.genes <- c('MASH1','SOX2','OCT4')
    write.csv(my.genes, file='mygenes.csv')
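For comparison, it helps to know what the file looks like when it goes through R only. The sketch below writes the same gene names to a temporary file (a stand-in for mygenes.csv) and reads them straight back without opening Excel.

```r
# Round trip through R only, as a baseline for the Excel exercise.
my.genes <- c("MASH1", "SOX2", "OCT4")
tmp <- tempfile(fileext = ".csv")        # stand-in for mygenes.csv
write.csv(my.genes, file = tmp)

back <- read.csv(tmp, stringsAsFactors = FALSE)
back[["x"]]                              # write.csv stored the vector in a column named x
identical(back[["x"]], my.genes)         # the names come back unchanged
```

Any difference you see after the Excel round trip was therefore introduced by Excel, not by write.csv() or read.csv().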

Plotting

Drawing histograms
    hist(my.data$weight)
Optional exercises:
  1) Try drawing a histogram of height
  2) Try to label the x axis [hint: read the help file]

Drawing normal QQ plots
    qqnorm(my.data$weight); qqline(my.data$weight)

Drawing scatterplots
    plot(height~weight,data=my.data)
Optional exercises: try these, do you understand this plot?
    plot(height~weight,data=my.data,col=as.numeric(gender))
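A note on col=as.numeric(gender): this relies on gender being stored as a factor. If it was read in as plain text (the default in R 4.0 and later), as.numeric() returns NA, so converting with factor() first is safer. The sketch below uses a made-up data frame to show the idea.

```r
# Made-up data frame with gender stored as plain text.
toy <- data.frame(height = c(1.60, 1.75, 1.82, 1.68, 1.90),
                  weight = c(55, 70, 85, 62, 95),
                  gender = c("F", "M", "M", "F", "M"),
                  stringsAsFactors = FALSE)

# factor() assigns integer codes in alphabetical order of the labels: F = 1, M = 2
point.cols <- as.numeric(factor(toy$gender))
plot(height ~ weight, data = toy, col = point.cols)
legend("topleft", legend = levels(factor(toy$gender)), col = 1:2, pch = 1)
```

The numbers are then used as colour codes, so each gender gets its own colour in the scatterplot.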

Drawing boxplots
    boxplot(height~gender,data=my.data)

Saving plots
JPEGs
    jpeg("boxplot.jpg")
    boxplot(height~gender,data=my.data)
    dev.off()
PDFs
    pdf("boxplot.pdf")
    boxplot(height~gender,data=my.data)
    dev.off()
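A PDF device can hold more than one plot: each new plot drawn before dev.off() starts a new page. The sketch below uses a made-up data frame and a temporary output file (toy_plots.pdf is an invented name) to show this.

```r
# Made-up data frame; the output file goes to a temporary folder.
toy <- data.frame(height = c(1.60, 1.75, 1.82, 1.68, 1.90, 1.66),
                  weight = c(55, 70, 85, 62, 95, 58),
                  gender = c("F", "M", "M", "F", "M", "F"))

out.file <- file.path(tempdir(), "toy_plots.pdf")
pdf(out.file, width = 6, height = 4)      # page size in inches
boxplot(height ~ gender, data = toy)      # page 1
hist(toy$weight)                          # page 2
dev.off()                                 # close the device so the file is written
file.exists(out.file)
```

If a saved file turns out empty or unreadable, the usual cause is a missing dev.off().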

Summary statistics

Functions Covered
read.table() head() dim() write.table() mean() sd() cor() cor.test() t.test() shapiro.test() wilcox.test() kruskal.test() lm() anova() coefficients() fitted() residuals()
NB: to find help type ?function
E.g.: ?cor

Writing tables
    my.data <- read.table("testdata.csv", header=TRUE, sep=",")
Here we have height and weight for males and females. We now want to calculate Body Mass Index and save the data as *.csv
    my.data$BMI <- my.data$weight/(my.data$height^2)
    head(my.data)
    write.table(my.data, file="my_testdata.csv", sep=",", quote=FALSE, row.names=FALSE)
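A worked example of the BMI step on two made-up people is sketched below. It assumes weight in kilograms and height in metres, which is the usual convention for BMI; the source does not state the units used in testdata.csv.

```r
# Made-up values; assumes weight in kg and height in metres.
toy <- data.frame(height = c(1.60, 1.80),
                  weight = c(64, 81))
toy$BMI <- toy$weight / (toy$height^2)    # 64 / 2.56 and 81 / 3.24
round(toy$BMI, 2)

# Save without quotes or row names, then read back to check the columns
tmp <- tempfile(fileext = ".csv")
write.table(toy, file = tmp, sep = ",", quote = FALSE, row.names = FALSE)
names(read.csv(tmp))
```

If the heights were recorded in centimetres, the BMI values would come out 10,000 times too small, so checking the result against a known case is a useful sanity check.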

Calculate Mean and SD
    mean_height <- mean(my.data$height)
    sd_height <- sd(my.data$height)
    mean_height
    sd_height
Try this with the other phenotypes
Now let's get mean & sd for just the males
    mean_height_M <- mean(my.data$height[my.data$gender=="M"])
    sd_height_M <- sd(my.data$height[my.data$gender=="M"])
    mean_height_M
    sd_height_M
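Instead of subsetting once per group, base R can compute a statistic for every group in one call with tapply() or aggregate(). These functions are not covered on the slides; the sketch below uses a made-up data frame.

```r
# Made-up data frame with two groups.
toy <- data.frame(height = c(1.60, 1.75, 1.82, 1.68, 1.90),
                  gender = c("F", "M", "M", "F", "M"))

group.means <- tapply(toy$height, toy$gender, mean)   # one mean per gender
group.sds   <- tapply(toy$height, toy$gender, sd)     # one sd per gender
group.means
group.sds

aggregate(height ~ gender, data = toy, FUN = mean)    # same means, as a data frame
```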

Correlate phenotypes and test for group differences
You can use the cor() function to produce correlations and t.test() to test for group differences
    cor(my.data$height,my.data$weight)
    cor.test(my.data$height,my.data$weight)
The t-test assesses whether the means of two groups are statistically different from each other
    t.test(height~gender,data=my.data)
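The objects returned by cor.test() and t.test() can be stored, and individual results pulled out by name. The sketch below runs on simulated data; the means, standard deviations and sample sizes are arbitrary choices for illustration, not estimates from the tutorial data.

```r
# Simulated data; all parameters are arbitrary illustrative choices.
set.seed(1)
toy <- data.frame(gender = rep(c("F", "M"), each = 20))
toy$height <- rnorm(40, mean = ifelse(toy$gender == "M", 1.78, 1.65), sd = 0.07)
toy$weight <- 22 * toy$height^2 + rnorm(40, sd = 5)

r <- cor.test(toy$height, toy$weight)
r$estimate        # the correlation coefficient
r$p.value         # p-value for the test of zero correlation

tt <- t.test(height ~ gender, data = toy)
tt$p.value        # p-value for the difference in means
tt$conf.int       # confidence interval for that difference
```

Storing the result this way makes it easy to collect p-values from several tests into one table.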

It is always important to check model assumptions before making statistical inferences
Test for normality: try this for all phenotypes
    shapiro.test(my.data$height)
Non-parametric alternatives to the t-test
R provides functions for carrying out Mann-Whitney U, Wilcoxon Signed Rank and Kruskal-Wallis tests
    wilcox.test(height~gender,data=my.data)
    kruskal.test(height~gender,data=my.data)
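When the data contain two groups with different means, the pooled values can look non-normal even if each group is normal on its own, so it can be informative to run the normality test within each group. The sketch below does this on simulated data with arbitrary parameters; it is one possible workflow, not the procedure prescribed by the slides.

```r
# Simulated data; all parameters are arbitrary illustrative choices.
set.seed(2)
toy <- data.frame(gender = rep(c("F", "M"), each = 25))
toy$height <- rnorm(50, mean = ifelse(toy$gender == "M", 1.78, 1.65), sd = 0.07)

# Shapiro-Wilk p-value computed separately for each gender
by.group <- tapply(toy$height, toy$gender,
                   function(x) shapiro.test(x)$p.value)
by.group

# Non-parametric comparison of the two groups
w <- wilcox.test(height ~ gender, data = toy)
w$p.value
```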

Linear regression
    fit <- lm(height~gender + BMI,data=my.data)
    summary(fit)
Anova table
    anova(fit)
Other useful functions
    coefficients(fit)          # model coefficients
    confint(fit, level=0.95)   # CIs for model parameters
    fitted(fit)                # predicted values
    residuals(fit)             # residuals
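To make the helper functions concrete, the sketch below fits the same kind of model to simulated data and checks how the pieces relate. The simulated effect sizes are arbitrary and say nothing about the tutorial data.

```r
# Simulated data; all parameters are arbitrary illustrative choices.
set.seed(3)
toy <- data.frame(gender = rep(c("F", "M"), each = 20),
                  BMI    = rnorm(40, mean = 24, sd = 3))
toy$height <- 1.65 + 0.12 * (toy$gender == "M") + rnorm(40, sd = 0.06)

fit <- lm(height ~ gender + BMI, data = toy)

# gender is coded against its first level (F), so the coefficient is named genderM
names(coefficients(fit))

# Each observed value equals its fitted value plus its residual
all.equal(unname(fitted(fit) + residuals(fit)), toy$height)
```

The genderM coefficient is the estimated difference in height between males and females at the same BMI.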