Introductory Data Analysis F73DA2. Contact Times (Spring Term 2008) Monday 4:15 - 5.15: Lecture in LT3 Tuesday 2:15 - 3.15: Lecture in LT3 Wednesday 10.15.


1 Introductory Data Analysis F73DA2

2 Contact Times (Spring Term 2008)
Monday 4.15 - 5.15: Lecture in LT3
Tuesday 2.15 - 3.15: Lecture in LT3
Wednesday 10.15 - 11.15: Lecture in LT3
Group 1 Tuesday 1.15 - 2.15: Practical in SRG12/13
Group 2 Tuesday 4.15 - 5.15: Practical in SRG12/13

3 Group 1

4 Group 2

5 The web pages for this module are linked from John Phillips' home page: http://www.macs.hw.ac.uk/~jphillips
John Phillips
Office: CM S06
Email: j.phillips@hw.ac.uk

6 Aims This module aims to develop students' abilities in understanding and solving practical statistical problems, and to teach them how to choose appropriate techniques, analyse data and present results.

7 The module will consist of a mixture of lectures and practical work. Lectures will focus on statistical modelling, including the selection of appropriate models, the analysis and interpretation of results, and diagnostics. Exploratory and graphical techniques will be considered, as well as more formal statistical procedures.

8 Both parametric procedures (e.g. linear and generalized linear models) and nonparametric methods will be discussed, as will modern robust techniques. There will be considerable emphasis on examples, applications, and case studies, especially for continuous response variables. Computing facilities, especially R, will be used extensively.

9 Assessments The module will be assessed by the student's completion of two practical assignments, to be handed in by specified times during the term.

10 Installing R on PC Caledonia

11 Simply double click on the “Installer” then select the “R” icon. This will produce a short-cut to R which should be available every time you log on.

12 Installing R on your own PC

13 Download R free of charge from the Comprehensive R Archive Network (CRAN): http://cran.r-project.org

14 R screen

15

16 Type commands here. What you type appears in red.

17 R screen
The arrow keys on the keyboard are very useful. Pressing the up arrow repeatedly allows you to retrieve previous commands entered.

18 Many keys and function names are very much as you would expect.
> 6+4
[1] 10
> 18*3
[1] 54
> log(100)
[1] 4.60517
> pi
[1] 3.141593
> sin(pi)
[1] 1.224606e-16

19 Many keys and function names are very much as you would expect.
> cos(pi)
[1] -1
> x=7
> y=10
> x+y
[1] 17
> sqrt(x*x+7*x*y-2*y*y)
[1] 18.41195
>
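One detail of the session above is worth spelling out: log() in R is the natural logarithm, which is why log(100) gives 4.60517 rather than 2. The following is a minimal sketch of the related base R functions, and of why sin(pi) prints a tiny number instead of exactly 0.

```r
# log() is the natural logarithm (base e)
log(100)              # 4.60517, as in the session above
log10(100)            # base-10 logarithm
log(100, base = 10)   # the same, with the base given explicitly
exp(log(100))         # exp() undoes log()

# pi is stored to finite precision, so sin(pi) is only approximately 0;
# all.equal() compares two numbers up to a small tolerance
all.equal(sin(pi), 0)
```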

20 Example: A survey produced the following 200 results for individuals' salaries:
23454 20622 19314 19882 22467 16611 17790 17613 19892 17397
22340 17731 20058 22083 18055 18212 24114 20396 20394 20521
17643 19692 24214 16876 22545 17608 24631 21333 21797 20734
17836 20930 16709 18319 19097 20512 17693 23130 20316 19209
21220 17315 22102 21472 19974 22764 18183 20918 19358 20685
21261 21394 22333 21732 19734 19280 18696 21055 25762 18258
20255 19762 17016 20326 19479 18699 18686 17483 20843 20395
19734 19911 18990 19220 17313 21357 17514 17455 21932 21523
21606 23169 21461 19624 18931 18785 20225 25406 21376 20141
18541 23768 19024 21353 19802 19216 19442 19450 19385 20995
21162 21399 18805 18217 17847 19992 17105 14488 20522 21032
19191 20268 19996 17428 21877 19433 20625 19453 19081 21502
21890 21844 20116 17601 22296 21751 19513 19300 21031 19784
19767 16619 24021 22686 17818 22233 17774 20918 17180 19279
21029 19983 19703 23421 18140 20845 22054 17858 21523 20041
19968 20537 17755 19872 19005 19835 19717 20134 21757 19093
19692 21445 19219 19669 20769 22049 20561 20810 22525 21458
21618 16973 19093 18551 20841 17032 20549 18219 19224 19999
21367 22332 19235 22697 23620 22420 16811 20250 21124 19267
20400 18743 22448 20443 19634 21185 18448 21236 24047 20621
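The slides do not show how the values were read into R. One possible way, sketched here with only the first ten of the 200 values, is scan() with its text argument. The name salaries10 is made up for this illustration; the slides call the full vector salaries.

```r
# Read whitespace-separated numbers from a character string.
# Only the first ten survey values are used here, to keep the sketch short.
salaries10 <- scan(text = "23454 20622 19314 19882 22467
                           16611 17790 17613 19892 17397",
                   quiet = TRUE)

length(salaries10)   # how many values were read
range(salaries10)    # smallest and largest value
```

Calling scan() with no arguments instead reads values typed or pasted at the prompt, ending at a blank line.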

21 Graphical Representation
- Histogram
- Stem-and-Leaf
- Boxplot
- Frequency Polygon

22 > hist(salaries)

23 > hist(salaries, nclass=5)

24 > stem(salaries)

  The decimal point is 3 digit(s) to the right of the |

  14 | 5
  15 |
  16 | 66789
  17 | 0001233445556666778888889
  18 | 112222334567777889
  19 | 000111122222223333344445555667777777888889999
  20 | 00000001111233333444445555566667788888999
  21 | 00001122223344444445555556678888999
  22 | 01112333344555778
  23 | 124568
  24 | 00126
  25 | 48

25 > boxplot(salaries)
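The numbers behind these plots can also be inspected directly. This is a rough sketch using the first ten survey values only (salaries10 is a made-up name, not the lecturer's full salaries vector): fivenum() gives the five-number summary that a boxplot is drawn from, and hist() with plot = FALSE returns the bins without drawing them.

```r
# First ten of the 200 survey values (illustration only)
salaries10 <- c(23454, 20622, 19314, 19882, 22467,
                16611, 17790, 17613, 19892, 17397)

# Minimum, lower hinge, median, upper hinge, maximum
fivenum(salaries10)

# Bin the data without plotting; breaks = 5 is only a suggestion to R,
# which picks "pretty" break points near that number
h <- hist(salaries10, breaks = 5, plot = FALSE)
h$breaks   # bin boundaries chosen by R
h$counts   # number of values in each bin

# The same boxplot call as on the slide, with a title and horizontal layout
boxplot(salaries10, horizontal = TRUE, main = "First ten salaries")
```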

26

27 Summary Statistics

33 > mean(salaries)
[1] 20123.01
> median(salaries)
[1] 20020
> sd(salaries)
[1] 1878.09
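To show what these three functions compute, here is a small sketch that reproduces them by hand. It uses the first ten survey values only (salaries10 is a made-up name), so the results differ from the full-sample figures above. Note that sd() divides by n - 1, not n.

```r
# First ten of the 200 survey values (illustration only)
salaries10 <- c(23454, 20622, 19314, 19882, 22467,
                16611, 17790, 17613, 19892, 17397)
n <- length(salaries10)

# Mean: sum divided by the number of values
m <- sum(salaries10) / n

# Median for an even n: average of the two middle sorted values
sorted <- sort(salaries10)
med <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2

# Sample standard deviation: note the n - 1 divisor
s <- sqrt(sum((salaries10 - m)^2) / (n - 1))

# Each hand calculation agrees with the built-in function
all.equal(m, mean(salaries10))
all.equal(med, median(salaries10))
all.equal(s, sd(salaries10))
```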

34 Scatter Diagrams

35
 x     y
 5   6.2
 7   9.3
 3   6.0
 4   6.1
11  12.8
 7   8.1
 6   8.1
15  16.7
20  23.4
 3   4.7
 8  10.5
 7   7.7
12  14.0
15  16.6
22  24.2

36 > plot(x,y)

37 > plot(y~x)
> abline(lm(y~x))
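The slides plot the fitted line but do not look at the fit itself. As a sketch, the fifteen (x, y) pairs from the table can be entered with c() and the result of lm() stored, so that the intercept, slope and correlation can be inspected. The name fit is chosen here for illustration.

```r
# The fifteen (x, y) pairs from the scatter diagram slide
x <- c(5, 7, 3, 4, 11, 7, 6, 15, 20, 3, 8, 7, 12, 15, 22)
y <- c(6.2, 9.3, 6.0, 6.1, 12.8, 8.1, 8.1, 16.7, 23.4, 4.7,
       10.5, 7.7, 14.0, 16.6, 24.2)

# Fit the least-squares line y = intercept + slope * x
fit <- lm(y ~ x)
coef(fit)    # named vector: "(Intercept)" and "x" (the slope)
cor(x, y)    # sample correlation between x and y

# Scatter diagram with the fitted line, as on the slide
plot(y ~ x)
abline(fit)
```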

38 Pie Chart Example

40 > television=scan( )
1: 1 1 2 2 1 4 3 3 5 5 1 1 1 2 1 3 3 3 3 3 4 1 2 1 3 4
27:
Read 26 items
> barplot(table(television))

41

42 > television.counts=table(television)
> names(television.counts)=c("BBC1","BBC2", "ITV1","CH4","Other")
> pie(television.counts,col=c("purple","green2", "cyan","yellow","white"))
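An alternative to renaming the table afterwards is to attach the channel labels to the codes first, using factor(). The sketch below assumes the same coding as the names() call above (1 = BBC1 through 5 = Other); the variable name channels is made up for this example.

```r
# The 26 responses from the scan() session, entered with c()
television <- c(1, 1, 2, 2, 1, 4, 3, 3, 5, 5, 1, 1, 1, 2, 1,
                3, 3, 3, 3, 3, 4, 1, 2, 1, 3, 4)

# Attach labels to the codes (coding assumed from the names() call above)
channels <- factor(television, levels = 1:5,
                   labels = c("BBC1", "BBC2", "ITV1", "CH4", "Other"))

television.counts <- table(channels)
television.counts                                # frequency of each channel
round(100 * prop.table(television.counts), 1)    # percentages

# Same charts as on the slides, now labelled automatically
barplot(television.counts)
pie(television.counts,
    col = c("purple", "green2", "cyan", "yellow", "white"))
```

Setting levels = 1:5 means a channel that nobody chose would still appear in the table, with a count of zero.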

43

44 Binomial Distribution
It takes ages to calculate a series of probabilities by hand.

45 If n = 5, a = 0.2 and x runs from 0 to 5:
$p(0) = \frac{5!}{0!\,5!}\,0.2^{0}\,0.8^{5} = 0.32768$

46 If n = 5, a = 0.2 and x runs from 0 to 5:
$p(1) = \frac{5!}{1!\,4!}\,0.2^{1}\,0.8^{4} = 0.4096$

47 If n = 5, a = 0.2 and x runs from 0 to 5:
$p(2) = \frac{5!}{2!\,3!}\,0.2^{2}\,0.8^{3} = 0.2048$

48 ... and so on
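The hand calculation can be written out in R for all six values of x at once. This sketch uses the slide's notation (n for the number of trials, a for the success probability); the name by.hand is made up here. It then checks the result against the built-in dbinom().

```r
n <- 5      # number of trials
a <- 0.2    # probability of success on each trial
x <- 0:n    # possible numbers of successes

# Binomial probabilities from the formula: n!/(x!(n-x)!) * a^x * (1-a)^(n-x)
by.hand <- factorial(n) / (factorial(x) * factorial(n - x)) *
  a^x * (1 - a)^(n - x)
by.hand

# choose(n, x) computes the same binomial coefficient directly
all.equal(by.hand, choose(n, x) * a^x * (1 - a)^(n - x))

# The built-in binomial probability function gives the same values
all.equal(by.hand, dbinom(x, n, a))
```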

50 Using R
> dbinom(0:5,5,0.2)
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
> pf=dbinom(0:5,5,0.2)
> pf
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
>

51 Using R
> pf
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
> barplot(pf)
>
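Two quick checks on pf may be useful at this point, sketched below: the probabilities should sum to 1, and their running total should match the cumulative probabilities from pbinom(). The axis labels passed to barplot() are suggestions, not part of the original slide.

```r
pf <- dbinom(0:5, 5, 0.2)

# A probability function must sum to 1 over all possible values
sum(pf)

# Cumulative probabilities P(X <= x), two ways
cumsum(pf)
pbinom(0:5, 5, 0.2)

# Bar plot with each bar labelled by its value of x
barplot(pf, names.arg = 0:5, xlab = "x", ylab = "p(x)")
```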

52

53 R Packages

54 R is built from packages of datasets and functions. The base package and a few other standard packages (including stats, which absorbed the older ctest package) are loaded by default and contain everything necessary for basic statistical analysis. Other packages may be loaded on demand, either via the Packages menu or via the R function library.

55 Once a package is loaded, the functions within it are automatically available. To make available a dataset from within a package, use the function data.
Of particular interest to advanced statistical users is the package MASS, which contains the functions and datasets from the book Modern Applied Statistics with S by W N Venables and B D Ripley. This package can be loaded with
> library(MASS)

56

57 To make available the dataset chem from within MASS, use additionally
> data(chem)
Documentation on any package is available via the R help system. Missing or further packages may usually be obtained from CRAN.
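Put together, the two steps look roughly like this. The requireNamespace() guard is an addition for this sketch: it skips the example if MASS is not installed, although MASS normally ships with R.

```r
# Run only if MASS is installed
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)    # attach the package: its functions become available
  data(chem)       # make the chem dataset available in the workspace

  str(chem)        # structure of the dataset
  summary(chem)    # minimum, quartiles, mean and maximum
}

# Documentation for the dataset is available through the help system:
# help(chem) or ?chem once MASS is attached
```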

58

59 Some data sets are already in R when you open it.
> data(iris)
> iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa

60 Notice, though, that in older versions of R, if you haven't used the data command, R will not know that iris exists. (Recent versions attach the datasets package at start-up, so iris is found without calling data.)

Type `demo()' for some demos, `help()' for on-line help, or
`help.start()' for a HTML browser interface to help.
Type `q()' to quit R.

[Previously saved workspace restored]

> iris
Error: Object "iris" not found
>

61 Similarly, if you use a data set from a package without first loading that package with the library command, R will not know that the data set exists.

Type `demo()' for some demos, `help()' for on-line help, or
`help.start()' for a HTML browser interface to help.
Type `q()' to quit R.

[Previously saved workspace restored]

> data(chem)
Warning message:
Data set `chem' not found in: data(chem)
>
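When a data set cannot be found, it can help to check which packages are attached and which data sets a package provides. The following is a sketch using standard base and utils functions; the name ds is made up here, and the MASS part is skipped if MASS is not installed.

```r
# Which packages are currently attached?
search()

# List the data sets that a package provides
ds <- data(package = "datasets")
head(ds$results[, "Item"])    # names of the first few data sets

# Is an object with this name visible right now?
exists("iris")

# A data set can also be loaded from a package without attaching it
if (requireNamespace("MASS", quietly = TRUE)) {
  data(chem, package = "MASS")
  exists("chem")
}
```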

