

R objects
- All R entities exist as objects
- They can all be operated on as data
- We will cover:
    - Vectors
    - Factors
    - Lists
    - Data frames
    - Tables
    - Indexing
    - R packages and datasets

Vectors
- Think of a vector as the equivalent of a single column of numbers in a spreadsheet
- You can create a vector using the c( ) function (concatenate) as follows: x <- c( )
- For example: x <- c(1,2,4,8) creates a column of the numbers 1, 2, 4, 8

Vectors
Other ways of creating columns of numbers (vectors):
- The seq function
    seq(1,10,1) = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    seq(1,4,0.5) = 1, 1.5, 2, 2.5, 3, 3.5, 4
- The colon operator, x:y
    1:10 = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    2 * 1:10 = 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
- The rep function
    rep(2,4) = 2, 2, 2, 2
- For help on these functions, type ?seq or ?rep
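As a quick check of the constructions above, here is a minimal sketch that builds the same vectors and inspects them. The variable names (a, b, d, e) are made up for illustration; the named arguments by and times are just the explicit spellings of the positional arguments used on the slide.

```r
# Four ways to build a numeric vector
a <- c(1, 2, 4, 8)          # combine explicit values
b <- seq(1, 4, by = 0.5)    # from 1 to 4 in steps of 0.5
d <- 2 * 1:10               # ':' is evaluated before '*', giving 2, 4, ..., 20
e <- rep(2, times = 4)      # the value 2 repeated four times

length(b)                   # number of entries in b
print(d)
print(e)
```

Note that 2 * 1:10 multiplies the whole sequence 1:10 by 2; it is not the sequence from 2 to 10.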

Indexing
Referencing (indexing) specific 'cells' in a column
Example: if x is the vector 1, 2, 5 then x[1] = 1, x[2] = 2, x[3] = 5 and
    x[1:2] = 1, 2      first two listed items in x
    x[2:3] = 2, 5      2nd & 3rd listed items in x
    x[x>2] = 5         use of '>' and '<' characters
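The indexing rules above can be tried directly at the prompt. This is a small sketch using the same vector; the intermediate variable keep is an illustrative name, added only to show that a condition such as x > 2 is itself a vector of TRUE/FALSE values.

```r
x <- c(1, 2, 5)

x[1]            # first element
x[1:2]          # first two elements
x[2:3]          # second and third elements

keep <- x > 2   # logical vector: FALSE FALSE TRUE
x[keep]         # only the elements where keep is TRUE
x[x < 5]        # the same idea with '<'
```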

Performing simple operations on vectors
- In R, when you carry out simple operations (+ - * /) on vectors that have the same number of entries, R performs the operation on the numbers in the vectors, entry by entry
- If the vectors don't have the same number of entries, R recycles the shorter vector: it cycles through its entries again until the longer vector is used up (and warns if the longer length is not a multiple of the shorter one)
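A short sketch of both cases described above, using made-up vectors a, b and short (these names and values are illustrative, not taken from the slides):

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

a + b            # entry by entry: 11 22 33 44
a * b            # entry by entry: 10 40 90 160

short <- c(1, 2)
a + short        # short is reused as 1, 2, 1, 2, giving 2 4 4 6
```

Recycling is convenient but easy to trigger by accident, so it is worth checking length() when two vectors are expected to match.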

Performing simple operations on vectors
Examples: [slide images not captured in this transcript]

Performing simple operations on vectors
Vectors (columns of numbers) can also be built by putting together other vectors with c( ).
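Putting vectors together works with the same c( ) function used to create them. A minimal sketch, with illustrative names and values:

```r
x <- c(1, 2, 4, 8)
y <- c(10, 20)

z <- c(x, y, 99)   # x, then y, then a single extra value
z                  # 1 2 4 8 10 20 99
length(z)          # 7 entries in total
```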

Functions
- R functions take arguments (information that you put into the function, which goes between the brackets) and can perform a range of tasks
- In the case of the 'help' function, the task is to display information from the R documentation files
- A comprehensive list of R functions can be obtained from the R reference manual under the help menu
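To make the idea of arguments concrete, here is a small sketch with a made-up vector. It shows an argument passed by position and arguments passed by name; na.rm and digits are standard arguments of mean() and round().

```r
x <- c(1, 2, NA, 4)      # NA marks a missing value

sqrt(16)                 # one argument, passed by position
m_all <- mean(x)         # NA, because one value is missing
m_obs <- mean(x, na.rm = TRUE)        # named argument: ignore missing values
r <- round(3.14159, digits = 2)       # named argument: keep two decimals

# help(mean) or ?mean opens the documentation for a function
```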

Simple statistic functions
R comes with some useful functions:
    sqrt( )    square root
    mean( )    arithmetic mean
    hist( )    calculating & plotting histograms
R also comes with pre-loaded datasets, which we'll discuss later.

Basic statistic functions on vectors
    > X1 <- c(1.1, 4.3, 5, 2, 1, 4, 9.5)
    > sum(X1)         sum = 26.9
    > mean(X1)        mean = 3.842857
    > median(X1)      median = 4
    > var(X1)         variance = 8.762857
    > sd(X1)          standard deviation = 2.960212
    > summary(X1)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1.000   1.550   4.000   3.843   4.650   9.500
    > quantile(X1)
      0%  25%  50%  75% 100%
    1.00 1.55 4.00 4.65 9.50

Mixing vectors and scalars
- R has the very convenient feature of having operators that work with vectors
- It is even possible to mix vectors and scalars
- For example:
    > X1 <- c(1.1, 4.3, 5, 2, 1, 4, 9.5)
    > X1 + 1
    [1]  2.1  5.3  6.0  3.0  2.0  5.0 10.5
    > X1 * 2
    [1]  2.2  8.6 10.0  4.0  2.0  8.0 19.0

Vectors to record data
    > x = c(45,43,46,48,51,46,50,47,46,45)
    > length(x)
    [1] 10
    > x = c(x,48,49,51,50,49)       # append values to x
    > length(x)
    [1] 15
    > x[16] = 41                    # add to a specified index
    > length(x)
    [1] 16
    > mean(x)
    [1] 47.1875
    > x[17:20] = c(40,38,35,40)     # add to many specified indices
    > length(x)
    [1] 20
    > mean(x)
    [1] 45.4

Factors
- A factor is a vector that encodes information about the group to which a particular observation belongs
- Categorical data is often used to classify observations into groups; the possible groups are called the levels of the factor
- Making a factor is easy, using the factor function

Factors – smoking survey example
A survey asks people if they smoke or not. The data is: Yes, No, No, Yes, Yes
    > x=c("Yes","No","No","Yes","Yes")
    > x                     # print out values in x
    [1] "Yes" "No"  "No"  "Yes" "Yes"
    > factor(x)             # print out value in factor(x)
    [1] Yes No  No  Yes Yes
    Levels: No Yes          # notice levels are printed
Notice the difference in how R treats factors with this example

Factors – student height example
Suppose the recorded heights of South African and British students are as follows:
    heights <- c(1.7,1.95,1.63,1.54,1.29)
You make a new vector, fac_heights, to record the nationality that each observation pertains to:
    fac_heights <- factor(c("GB", "SA", "GB", "GB", "SA"))
Useful when testing for differences between groups
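One way to see why the grouping factor is useful is to summarise the heights separately for each nationality. The sketch below uses tapply(), which is not mentioned on the slides; it is offered only as one possible illustration, and group_means and group_sizes are made-up names.

```r
heights <- c(1.7, 1.95, 1.63, 1.54, 1.29)
fac_heights <- factor(c("GB", "SA", "GB", "GB", "SA"))

# Apply mean() to the heights within each level of the factor
group_means <- tapply(heights, fac_heights, mean)
group_means

# Number of observations in each group
group_sizes <- table(fac_heights)
group_sizes
```

The straight double quotes matter: curly quotes pasted from a slide or word processor cause a syntax error in R.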

Factors – gender survey example
Consider a survey that has data on 691 females and 692 males
    > gender <- c(rep("female",691), rep("male",692))    # create vector
    > gender <- factor(gender)                            # change vector to factor
- Once stored as a factor, the space required for storage is reduced
- Values "female" and "male" are the levels of the factor
    > levels(gender)        # assumes gender is a factor
    [1] "female" "male"
- Internally, the factor 'gender' is stored as 691 1's, followed by 692 2's. It has stored with it a table that looks like this:
    1    female
    2    male
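The internal storage described above can be inspected directly. This sketch converts the factor to its underlying integer codes with as.integer(); the variable name codes is illustrative.

```r
gender <- factor(c(rep("female", 691), rep("male", 692)))

levels(gender)                # the lookup table: "female" "male"
nlevels(gender)               # number of levels

codes <- as.integer(gender)   # the stored codes: 1 for female, 2 for male
table(codes)                  # how many 1's and how many 2's
levels(gender)[codes[1:3]]    # codes translated back into labels
```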

Lists
A set of objects (e.g. vectors) can be combined under a single name as a list (similar to a spreadsheet in Excel)
Example:
    x <- c(1, 7, 8, 9, 10)
    y <- c("red", "yellow", "blue", "green")
    example_list <- list(size = x, colour = y)
Note: vectors can consist of characters (i.e. letters/words) instead of numbers, but never numbers AND characters; if you mix them, R converts the numbers to character strings
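A brief sketch of how the components of this list can be pulled back out, and of what happens when numbers and characters are mixed in one vector. The names lens and mixed are made up for illustration.

```r
x <- c(1, 7, 8, 9, 10)
y <- c("red", "yellow", "blue", "green")
example_list <- list(size = x, colour = y)

example_list$size             # the numeric vector, by name
example_list[["colour"]]      # the character vector, by name
example_list[[1]]             # the first component, by position

# Unlike columns of a data frame, list components may differ in length
lens <- sapply(example_list, length)
lens

# Mixing numbers and characters in one vector turns everything into text
mixed <- c(1, "red")
class(mixed)
```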

Data frames
The function data.frame( ):
- This is a special kind of list, in which the entries in a specific position in the elements of the list correspond to one another
- Each element of the list has the same length
- It is a rectangular table, with rows and columns

Data frames
Example 1:
- Simple data frames can be created
- Enter the following information at the prompt line:
    h <- c(150, 170, 168, 179, 130)
    w <- c(65, 70, 72, 80, 51)
    patient_data <- data.frame(weight=w, height=h)
- Type in patient_data to see what's just been created

Access of elements in data frames
- Individual elements can be accessed using a pair of square brackets "[ ]" and by specifying their index, or name
- Here are some ways to access a cell, row or column:
    patient_data$height        accesses a column
    patient_data[, i]          accesses the i th column
    patient_data[i, ]          accesses the i th row
    patient_data$height[i]     the i th cell in the height column
    patient_data[i, j]         the cell in the i th row and j th column
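The access patterns above, applied to the patient_data example from the earlier slide. The row and column numbers chosen here are arbitrary illustrations.

```r
h <- c(150, 170, 168, 179, 130)
w <- c(65, 70, 72, 80, 51)
patient_data <- data.frame(weight = w, height = h)

patient_data$height        # the height column, by name
patient_data[, 1]          # the first column (weight), by position
patient_data[2, ]          # the second row, returned as a one-row data frame
patient_data$height[3]     # third cell of the height column
patient_data[3, 2]         # row 3, column 2: the same cell
```

The row index always comes first and the column index second, so patient_data[3, 2] and patient_data[2, 3] are different requests (the second fails here, since there is no third column).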

Data frames
- More complex tables can be created
- Data within each column must have the same type (e.g., number, text), but different columns may have different types, like a spreadsheet

Data frames
Accessing specific cells, or data
Note: "$" is a shortcut for selecting a column by name; a minus sign "-" in front of an index means "not" (everything except that row or column).
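Since the original slide example is not available, here is a small stand-in sketch. The data frame pets and all of its values are invented purely for illustration; it has one text column, one numeric column and one logical column, and shows the "$" shortcut and the minus sign in use.

```r
# Hypothetical data, invented for this illustration
pets <- data.frame(
  name   = c("Rex", "Tom", "Bubbles"),
  weight = c(30.5, 4.2, 0.1),
  indoor = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE      # keep text as text in older versions of R
)

str(pets)                       # shows the type of each column

pets$weight                     # "$" shortcut: the weight column
pets[, -1]                      # every column except the first
pets[-2, ]                      # every row except the second
pets[pets$weight > 1, "name"]   # names of the pets heavier than 1
```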

Tables
- We often view categorical data with tables
- The table function builds a table by counting how often each value occurs
- Its simplest usage is table(x), where x is a categorical variable

Tables
Example: smoking survey
A survey asks people if they smoke or not. The data is: Yes, No, No, Yes, Yes
    > x=c("Yes","No","No","Yes","Yes")
    > table(x)
    x
     No Yes
      2   3
The table command simply adds up the frequency of each unique value of the data
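The result of table() is itself an object that can be stored and used. A minimal sketch, assuming the same survey data; counts and shares are illustrative names, and prop.table() is an additional base R function not mentioned on the slide.

```r
x <- c("Yes", "No", "No", "Yes", "Yes")

counts <- table(x)
counts["Yes"]                  # the count for one category
names(counts)                  # the categories, in alphabetical order

shares <- prop.table(counts)   # counts converted to proportions
shares
```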

R packages and datasets
- View a list of R packages: library()
- Access datasets with the data function:
    data( )                provides a list of all the datasets
    data(Titanic)          loads the Titanic dataset
    summary(Titanic)       provides summary information about the Titanic dataset
    attributes(Titanic)    provides more information
    Titanic                typing the dataset name displays the data
- List all datasets in a package, e.g., data(package='datasets')

Working through some examples
- List preloaded datasets in R: data( )
- Display the "women" dataset: women
Now let's access specific data
- Access data from each column:
    women$height or women[,1]
    women$weight or women[,2]
- Access data from individual rows:
    women[1, ] or women[10, ] etc.
- Try it

Working through some examples
Now that you can access sample data, let's work with it:
- Get the mean weight and height of the women in our example
- Remember the help function: help(mean)
- Also, R can show an example: example(mean)
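One possible way to do the exercise above is sketched here. It uses the built-in women dataset (15 rows, with columns height and weight); colMeans() is an extra base R function, not mentioned on the slides, that averages every column at once.

```r
# Mean of each column, one at a time
mean_height <- mean(women$height)
mean_weight <- mean(women$weight)
mean_height
mean_weight

# Both means in a single call
col_means <- colMeans(women)
col_means

# A fuller overview of each column
summary(women)
```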

Common useful functions
    print()     # prints a single R object
    cat()       # prints multiple objects, one after the other
    length()    # number of elements in a vector, or of a list
    mean()
    median()
    range()
    unique()    # gives the vector of distinct values
    sort()      # sort elements into order
    order()     # x[order(x)] orders elements of x
    rev()       # reverse the order of vector elements
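A short sketch showing several of these functions on one small vector. The vector x and its values are made up for illustration.

```r
x <- c(3, 1, 2, 3, 1)

unique(x)         # distinct values, in order of first appearance: 3 1 2
sort(x)           # 1 1 2 3 3
order(x)          # positions that would sort x: 2 5 3 1 4
x[order(x)]       # same result as sort(x)
rev(sort(x))      # largest first: 3 3 2 1 1
range(x)          # smallest and largest value: 1 3

cat("x has", length(x), "elements\n")   # cat() prints several objects in a row
```

order() is most useful for sorting one vector, or the rows of a data frame, by the values of another.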