# Data in R


Data in R

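General form of data

| | ID number | Sex | Weight | Length | Diseased | … |
|---|---|---|---|---|---|---|
| 1 | 12 | m | 4.53 | 38.6 | 0 | … |
| 2 | 56 | f | 3.61 | NA | 1 | … |
| 3 | … | … | … | … | … | … |
| 4 | … | … | … | … | … | … |
| n | 91 | m | 5.17 | 1 | 1 | … |

NOTE: A DATASET IS NOT A MATRIX! WHY?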
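
One way to answer the "WHY?": a matrix can hold only one type of value, while a dataset mixes types (numbers for Weight, letters for Sex). The sketch below uses a few hypothetical values loosely taken from the table. It binds them into a matrix with `cbind()`, which turns every value into text, and into a data frame with `data.frame()`, which keeps each column's own type. It assumes R 4.0 or later, where text columns stay character instead of becoming factors.

```r
# Hypothetical values loosely following the example table
id     <- c(12, 56, 91)
sex    <- c("m", "f", "m")
weight <- c(4.53, 3.61, 5.17)

# cbind() builds a matrix; a matrix has ONE type, so the numbers become text
m <- cbind(id, sex, weight)
class(m[, "weight"])   # "character"

# data.frame() keeps the type of each column separately
d <- data.frame(id, sex, weight)
sapply(d, class)       # id "numeric", sex "character", weight "numeric"
```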

Datasets in R

- A dataset in R (a data frame) is a collection of vectors of equal length.
- One vector holds the observations of one variable.
- The length of the vectors is the number of observations, e.g. the number of sampled individuals.
- One row in a dataset is one "data unit", i.e. one observation: the information collected from one individual.
- A dataset can be created by binding measurement vectors together with the `data.frame()` function.
- When creating a dataset, the variables (vectors) can be given names: `data = data.frame(ID = aa, Sex = bb, …)`
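
A minimal sketch of the `data.frame(ID = aa, Sex = bb, …)` pattern above. The vectors `aa` to `ee` and their values are hypothetical, chosen only to show that each vector becomes one named column and each position across the vectors becomes one row (one individual).

```r
# One vector per variable, one element per individual (hypothetical values)
aa <- c(12, 56, 91)          # ID number
bb <- c("m", "f", "m")       # Sex
cc <- c(4.53, 3.61, 5.17)    # Weight
dd <- c(38.6, NA, 41.0)      # Length (NA = missing value)
ee <- c(0, 1, 1)             # Diseased (0 = no, 1 = yes)

# Bind the vectors together as columns and name the variables
data <- data.frame(ID = aa, Sex = bb, Weight = cc, Length = dd, Diseased = ee)

nrow(data)   # 3 observations (rows)
ncol(data)   # 5 variables (columns)
data[2, ]    # one "data unit": everything recorded for the second individual
```

All vectors must have the same length; `data.frame()` stops with an error if they differ (apart from recycling a length-one value).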

Anatomy of a data frame

- Structure of a dataset: `dataset$variable`, e.g. `data$ID`. This is how you point to one variable in a dataset.
- Rows, columns and individual values:
  - Column: `dataset[, column_number]`, e.g. `data[, 1]` or `data$ID`
  - Row: `dataset[row_number, ]`, e.g. `data[1, ]`
  - Cell: `dataset[row_number, column_number]`, e.g. `data[1, 1]` or `data$ID[row_number]`
- Basic information about a dataset:
  - Structure: `str(dataset)`
  - Dimensions: `dim(dataset)`
  - Basic statistics: `summary(dataset)`

-> DEMO 1
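
The indexing rules above, applied to a small hypothetical dataset. This is only a sketch of what a demo like DEMO 1 might cover, not the demo itself.

```r
data <- data.frame(
  ID     = c(12, 56, 91),
  Sex    = c("m", "f", "m"),
  Weight = c(4.53, 3.61, 5.17)
)

data$ID          # one variable: 12 56 91
data[, 1]        # the same column, selected by number
data[1, ]        # first row (first individual)
data[1, 1]       # a single cell: 12
data$ID[1]       # the same cell, reached through the variable: 12

str(data)        # type of each column
dim(data)        # 3 3 (rows, columns)
summary(data)    # per-column summaries (min, quartiles, mean for numbers)
```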

Importing and exporting data

- Most commonly, data has to be imported into R from Excel. There are R packages for this, e.g. `read.xls()` in the gdata package (`library(gdata)`) or `read_excel()` in the readxl package.
- The most universal way to import data into R, however, is to first save it as a .txt file and read it with `read.table()`.
- To make this easy:
  - Save missing values as NA
  - Use `.` as the decimal separator, not `,`
  - Separate variables with tabs
  - Do not use spaces in variable names or values
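
A sketch of a text file that follows these conventions (tab-separated, `.` decimals, `NA` for missing, no spaces), written to a temporary file so the example is self-contained. In practice the file would come from saving an Excel sheet as text; the column names and values here are hypothetical.

```r
# Tab-separated, "." decimals, NA for missing, no spaces in names or values
path <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tSex\tWeight\tLength\tDiseased",
  "12\tm\t4.53\t38.6\t0",
  "56\tf\t3.61\tNA\t1"
), path)

dat <- read.table(path, header = TRUE, sep = "\t")
str(dat)   # Weight and Length are read as numbers; NA is a missing value
```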

Function read.table()

- `read.table(file, header = TRUE, sep = "\t", …)`
  - `file`: path to the file, e.g. `"F:/data.txt"`
  - `header = TRUE`: the first line holds the variable names
  - `sep`: the separator between variables, e.g. `","` or `"\t"`
  - If decimals are separated with a comma, add `dec = ","`
- Whether the data was read correctly can be checked with `dim(dataset)` and `summary(dataset)`.
- Raw data can also be viewed and edited in R's data editor: `fix(dataset)`
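
The `dec = ","` case, sketched with a hypothetical tab-separated file that uses comma decimals. Reading it without `dec = ","` leaves Weight as text, which is exactly what the `dim()` / `summary()` check is meant to catch. Note that with comma decimals the variables cannot also be separated by commas; a tab is used here.

```r
# Hypothetical file with comma decimals, separated by tabs
path <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tWeight\tLength",
  "12\t4,53\t38,6",
  "56\t3,61\tNA"
), path)

wrong <- read.table(path, header = TRUE, sep = "\t")            # Weight is text
dat   <- read.table(path, header = TRUE, sep = "\t", dec = ",") # Weight is numeric

# Check that the reading succeeded
dim(dat)       # 2 3
summary(dat)   # Weight and Length should show numeric summaries
```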

Exporting data from R

- Data can be exported from R with the `write.table()` function: `write.table(dataset, "file path", sep = " ", …)`
- Or with `write.csv()`

-> DEMO 2
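
A small round trip with both export functions, written to temporary files (the dataset is hypothetical). `row.names = FALSE` stops R from writing its row numbers as an extra first column. `write.csv()` always uses `,` as the separator; trying to change `sep` there is ignored with a warning.

```r
dat <- data.frame(ID = c(12, 56), Sex = c("m", "f"), Length = c(38.6, NA))

# Tab-separated text file
txt_path <- tempfile(fileext = ".txt")
write.table(dat, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Comma-separated file
csv_path <- tempfile(fileext = ".csv")
write.csv(dat, csv_path, row.names = FALSE)

# Read both back to confirm the export worked
back_txt <- read.table(txt_path, header = TRUE, sep = "\t")
back_csv <- read.csv(csv_path)
```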

Exploring data loaded into R

- Once again, `summary()` and `dim()` are the first functions to use to inspect the contents and size of the data.
- It is important to check that the variable types are correct! (use e.g. `summary()` or `str()` for this)
- For categorically structured data, the `tapply()` function is very handy: `tapply(target_vector, list(factor1, factor2, …), function_to_apply, na.rm = TRUE)`
- This returns the function value for every combination of categories of the factors given in the list.
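
A sketch of `tapply()` with two grouping factors, on hypothetical data. The result is a table with one cell per Sex × Diseased combination. `na.rm = TRUE` is passed on to `mean()`, so the missing weight is skipped instead of turning its group mean into NA.

```r
dat <- data.frame(
  Sex      = c("m", "f", "m", "f", "m", "f"),
  Diseased = c(0, 0, 1, 1, 0, 1),
  Weight   = c(4.53, 3.61, 5.17, NA, 4.80, 3.90)
)

# Mean weight for every combination of Sex and Diseased
means <- tapply(dat$Weight, list(dat$Sex, dat$Diseased), mean, na.rm = TRUE)
means
#      0    1
# f 3.610 3.90
# m 4.665 5.17
```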

Some additional utilities

- R comes with a large variety of built-in datasets. To get a list of them: `data()`
- Pointing to variables without `$`: `attach(dataset)`
- To remove this effect: `detach(dataset)`

DEMO 3
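
A sketch combining these utilities, using ToothGrowth, one of the datasets shipped with R in the datasets package. `data(package = "datasets")` restricts the listing to that one package; its `results` component is a matrix whose `Item` column holds the dataset names. Between `attach()` and `detach()`, the columns `len` and `supp` can be used without `ToothGrowth$`.

```r
# List the datasets in the "datasets" package
ds <- data(package = "datasets")
head(ds$results[, "Item"])

head(ToothGrowth)

attach(ToothGrowth)        # columns are now visible without ToothGrowth$
tg_mean <- mean(len)
tapply(len, supp, mean)    # mean tooth length per supplement type
detach(ToothGrowth)        # undo attach(); len is no longer visible
```

Forgetting `detach()` leaves the columns on the search path, where they can be confused with other variables of the same name.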
