R Lecture 5 Naomi Altman Department of Statistics.



Example: Regression
The data are available at
?read.table
body=read.table("body.txt",header=TRUE)
plot(body$hips,body$weight)
plot(body$waist,body$weight)
?formula
lm.out=lm(weight~hips+waist,data=body)
attributes(lm.out)

Formulas
lm fits the regression of Y on a set of X variables. The response Y and the predictors are denoted by a formula of the form Y~X1+X2+...
You can also use formulas in other contexts, e.g.
plot(weight~waist, data=body)
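As a small sketch of the formula interface, the following uses R's built-in mtcars data as a stand-in for body.txt (an assumption, since the lecture's data file is not reproduced here). The variable names mpg, wt and hp belong to mtcars, not to the lecture data.

```r
# Stand-in data: mtcars ships with R; it is NOT the lecture's body.txt.
fit <- lm(mpg ~ wt + hp, data = mtcars)   # response ~ predictor1 + predictor2
coef(fit)                                 # intercept plus one slope per predictor

# The same kind of formula works for plotting: y ~ x
plot(mpg ~ wt, data = mtcars)

# A formula is itself an R object with a class
f <- mpg ~ wt + hp
class(f)       # "formula"
all.vars(f)    # the variable names used in the formula
```

Storing the formula in a variable, as with f above, makes it easy to reuse one model specification across several fitting or plotting calls.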

Object Oriented Programming in R
or how a bunch of smart programming types made R easier to use and harder to program - at least in the eyes of a statistician

In the bad old days
If I wanted to write a function similar to something already in R, I would edit the R code:
myFun=edit(Rfun)
myDensity=edit(density)
Sometimes the R code would call a C or C++ program, but the code for that is also available.

But now...
plot
boxplot
rnorm

Classes and Generic Functions
I have already mentioned that one of the attributes an R object can have is a class. A generic function is a function that determines the class of an object and then calls another function to do the actual work. If the generic function is called fun and the class is called cls, the function that does the work is (almost always) called fun.cls. If there is no suitable fun.cls, then fun.default is used.
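The fun.cls naming rule can be sketched with a toy generic. Everything here (the generic area and the classes "circle" and "square") is hypothetical and chosen only for illustration.

```r
# Hypothetical generic: it looks at the class of its argument and dispatches.
area <- function(shape, ...) UseMethod("area")

# The method for class "circle" is named generic.class, i.e. area.circle.
area.circle <- function(shape, ...) pi * shape$r^2

# Fallback used when no area.<class> method exists.
area.default <- function(shape, ...) NA_real_

circ <- structure(list(r = 2), class = "circle")
sq   <- structure(list(side = 3), class = "square")   # no area.square is defined

area(circ)   # dispatches to area.circle
area(sq)     # no area.square, so area.default is used
```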

e.g.
plot(body$hips,body$weight)
plot(lm.out)
plot.default
plot.lm
methods(plot)

Classes
Actually, a class can be a pair c("first","second") in which "first" "inherits from", i.e. is a special case of, "second". In practice, this means that an object of class "first" has all the components of class "second" objects, but possibly some additional ones. If there is no fun.first, then the generic function will search for fun.second. Only if there is also no fun.second will fun.default be used.
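The search order described above can be checked with a toy example. The generic speak and the classes "first", "second" and "third" are hypothetical names used only for this sketch.

```r
# Hypothetical generic with a method for "second" and a default, but none for "first".
speak <- function(x, ...) UseMethod("speak")
speak.second  <- function(x, ...) "method for second"
speak.default <- function(x, ...) "default method"

obj <- structure(list(), class = c("first", "second"))
speak(obj)      # no speak.first, so speak.second is used

other <- structure(list(), class = "third")
speak(other)    # no speak.third and nothing inherited, so speak.default is used

# Once a method for "first" exists, it takes precedence over speak.second.
speak.first <- function(x, ...) "method for first"
speak(obj)
```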

e.g. plot uses plot.lm on an object with class "lm" and also on an object with class c("glm","lm")

'inherits' indicates whether its first argument inherits from any of the classes specified in the 'what' argument.
glm.out=glm(weight~hips+waist,data=body)
class(glm.out)
"glm" "lm"
inherits(lm.out,"lm")
inherits(glm.out,"lm")
inherits(lm.out,"glm")
inherits(glm.out,"glm")
plot.lm
plot.glm
plot(glm.out)
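The same checks can be run without body.txt. This sketch substitutes the built-in mtcars data (an assumption) and uses the names lm.fit and glm.fit so as not to be confused with the lecture's objects. getS3method with optional=TRUE returns NULL when no method is registered, which gives a way to ask whether a plot method exists for a class without triggering an error.

```r
# Stand-in data: mtcars, not the lecture's body.txt.
lm.fit  <- lm(mpg ~ wt, data = mtcars)
glm.fit <- glm(mpg ~ wt, data = mtcars)   # default family is gaussian

class(glm.fit)             # "glm" "lm"
inherits(lm.fit,  "lm")    # TRUE
inherits(glm.fit, "lm")    # TRUE: "glm" objects inherit from "lm"
inherits(lm.fit,  "glm")   # FALSE: inheritance runs one way only
inherits(glm.fit, "glm")   # TRUE

# Ask whether a plot method is registered for each class (NULL means none).
has.plot.lm  <- !is.null(getS3method("plot", "lm",  optional = TRUE))
has.plot.glm <- !is.null(getS3method("plot", "glm", optional = TRUE))
c(lm = has.plot.lm, glm = has.plot.glm)
```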

unclass
If you remove the class, most objects are just lists.
lm.out
unclass(lm.out)
For example, the "lm" objects are lists with the following components: "coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"
Some of these components are obvious. Some of them are matrix computations that can be used to compute, e.g. the leverages and Cook's Distance (notice that these have not been stored). Some of them are often empty - they are used primarily when a predictor variable is a factor (ANOVA).
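A short sketch of what unclass reveals, again with mtcars standing in for body.txt (an assumption). The leverages and Cook's distances are not stored in the fit, but the standard functions hatvalues and cooks.distance compute them on demand.

```r
# Stand-in data: mtcars, not the lecture's body.txt.
fit <- lm(mpg ~ wt + hp, data = mtcars)

raw <- unclass(fit)   # same contents, class attribute removed
is.list(raw)          # the fitted model is a plain list underneath
names(raw)            # component names

# Not stored in the object, so computed when asked for:
h <- hatvalues(fit)        # leverages
d <- cooks.distance(fit)   # Cook's distances

# Holds factor levels; there are no factor predictors in this model.
length(fit$xlevels)
```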

Why use classes
For the user: less to think about, e.g. you can try generic functions like plot and summary with any output.
For the programmer: provides a framework, e.g. you might think about having a plot.myfun and summary.myfun for the function you are writing. Also, you can use inheritance so that you do not need to write your own functions.

Generic Functions
Functions that act on many different types of objects are termed "generic functions". Examples include: plot, print, summary, coefficients, anova, residuals

Generic Functions
We have already seen that generic functions behave differently for different classes. The idea is that the user should not have to remember a lot of different function names. Generic functions are a "good thing" when you want R to do what someone else thinks it should do and can be a "bad thing" when you are trying to do something else with your data.

Generic Functions
The form of the generic function "genfun" is
genfun=function(object,...) {
  UseMethod("genfun")
}

Generic Functions
We can use UseMethod to give aliases to the same function.
genfun=function(object,...) { UseMethod("genfun") }
gen=function(object,...) { UseMethod("genfun") }
gfun=function(object,...) { UseMethod("genfun") }
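A runnable sketch of the alias idea: because gen passes the name "genfun" to UseMethod, it dispatches to the genfun.* methods. The class "widget" and the returned strings are hypothetical, chosen only to show which method ran.

```r
genfun <- function(object, ...) UseMethod("genfun")
gen    <- function(object, ...) UseMethod("genfun")   # alias: same method family

# Hypothetical methods that simply report which one was called.
genfun.default <- function(object, ...) "genfun.default"
genfun.widget  <- function(object, ...) "genfun.widget"

w <- structure(list(), class = "widget")

genfun(w)   # "genfun.widget"
gen(w)      # "genfun.widget", reached through the alias
gen(1)      # "genfun.default": no method for a plain number
```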

Generic Functions
If you want an argument other than the first to be the one whose class controls the generic function, then the name of the argument must be sent to UseMethod.
genfun=function(x,y,z,...) {
  UseMethod("genfun",z)
}
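To see that the third argument really controls dispatch, here is a small sketch. The class "special" and the returned strings are hypothetical.

```r
# Dispatch on the class of z rather than the first argument.
genfun <- function(x, y, z, ...) UseMethod("genfun", z)

genfun.default <- function(x, y, z, ...) "default"
genfun.special <- function(x, y, z, ...) "special"

s <- structure(list(), class = "special")

genfun(s, 2, 3)   # "default": x has class "special", but z is a plain number
genfun(1, 2, s)   # "special": the class of z selects the method
```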

Generic Functions
If UseMethod finds that the calling object inherits from a class, it searches for a function "genfun.class". If there is no function that matches the class, it looks through the inheritance list. If there is no match, or no class, the function "genfun.default" is used.

Generic Functions
There is a lot more on this in the "S Poetry" manual - it looks very complete to me. I have been writing programs in S/R since 1981, and have not needed to create classes or methods but...

Generic Functions
I have often used an existing function to create new functions, and I have been confused when I failed to understand generic functions (especially "summary" and "print").
One way to become well-known is to distribute your methodology as an R package. To be distributed from CRAN or other project repositories, your package must adhere to R programming standards.

Generic Functions
Some of the newer packages (particularly packages for bioinformatics) rely heavily on the use of generic functions, and you cannot understand what they are doing without understanding at least the basics of this material.

Slots
I was not able to find an intuitive definition for "slot" so this is my own heuristic. An object is a list with a class. A slot is a function that extracts data from an object. It may be one of the elements stored in the object, or a derived data element.

Slots
For example: an lm object includes the list: "coefficients" "residuals" "effects" "rank" "fitted.values" "assign" "qr" "df.residual" "xlevels" "call" "terms" "model"
We might build a new class, "Elm" (extended "lm").

Slots
Suppose we wanted to write a method that draws a histogram of any of the dependent variable, residuals, studentized residuals, or fitted values. We could have a method of the form:
hist.Elm=function(object,slot)
Our slots would be: dependent, residuals, student, fitted

Slots
If we set
class(lm.out)=c("Elm","lm")
then hist(lm.out,residuals) would extract the residuals from the list and draw the histogram. hist(lm.out,student) would compute the studentized residuals (which are not stored) and draw the histogram.
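One possible sketch of such a method follows. It is not the lecturer's implementation: it takes the slot name as a quoted string (the slide writes it unquoted), names the first argument x to match the existing hist generic, and uses mtcars in place of body.txt. The standard extractors residuals, fitted and rstudent still work on the object because class "Elm" inherits from "lm".

```r
# Hypothetical hist method for the extended class "Elm".
hist.Elm <- function(x, slot = c("dependent", "residuals", "student", "fitted"), ...) {
  slot <- match.arg(slot)
  values <- switch(slot,
    dependent = model.response(model.frame(x)),  # stored in the model frame
    residuals = residuals(x),                    # stored in the object
    student   = rstudent(x),                     # derived: computed on demand
    fitted    = fitted(x))                       # stored in the object
  # as.numeric drops names so that the default histogram method is used
  hist(as.numeric(values), main = paste("Histogram of", slot), xlab = slot, ...)
}

# Stand-in data: mtcars, not the lecture's body.txt.
fit <- lm(mpg ~ wt + hp, data = mtcars)
class(fit) <- c("Elm", "lm")

hist(fit, "residuals")       # uses a stored component
h <- hist(fit, "student")    # computes studentized residuals first
```

Because "lm" is still in the class vector, summary(fit) and plot(fit) continue to dispatch to the lm methods; only hist gains new behaviour.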

Slots
By convention, the slots of an object can be extracted either by: or slotname(objectname)
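For comparison, in R's formal (S4) class system from the methods package, slots are named, typed components declared with setClass. They are read with the @ operator or with slot(), and packages often also supply an accessor function in the slotname(objectname) style. The class "Fit" and the accessor resids below are hypothetical names used only for this sketch.

```r
library(methods)

# Hypothetical S4 class with two declared slots.
setClass("Fit", representation(coefficients = "numeric", residuals = "numeric"))

obj <- new("Fit", coefficients = c(a = 1, b = 2), residuals = c(0.1, -0.1))

obj@residuals             # direct slot access with @
slot(obj, "residuals")    # the same slot via slot()
slotNames("Fit")          # names of the declared slots

# Hypothetical accessor, so users can write resids(obj) instead of obj@residuals.
setGeneric("resids", function(object) standardGeneric("resids"))
setMethod("resids", "Fit", function(object) object@residuals)
resids(obj)
```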

Slots
Again, I have used S/R for many years without writing or even encountering slots. But some of the recent packages use this programming concept, so it is important to understand it. My understanding is that slots are used primarily in areas like data-mining and microarrays, where the data storage requirements are large.

Learning to Use Objects and other Extensions
Calling C or C++ from R: Writing R extensions
Object oriented programming in R (S3 protocol) R Language Definition (S4 protocol) R Internals