Introduction to Geospatial Analysis in R SURF – 24 April 2012 Daniel Marlay.


1 Introduction to Geospatial Analysis in R SURF – 24 April 2012 Daniel Marlay

2 Synopsis
This month's talk looks at the geospatial capabilities of R. We'll look at how to import common geographical data formats into R, and at some of the free geographic data sources and map layers available. We'll then look at how to create maps in R using this data, and some of the ways to style them to display our data. We'll look at how R stores geographic data and how we can perform queries against it, for example identifying which points fall into a particular region. Finally, we'll take a brief look at modelling geospatial data and some of the issues to be aware of.

3 Introduction
There are extensive geospatial capabilities in R
– I’ve just started to scratch the surface
This presentation will give a little bit of theory
– Most of the content is a walk through of doing geospatial analysis in R
I’ve picked data sets that are freely available
– Trying this yourself is the best way to learn
And maybe we’ll learn something about the way Australians vote…

4 R Geospatial Packages
sp – provides a generic set of functions, classes and methods for handling spatial data
rgdal – provides an R interface into the Geospatial Data Abstraction Library (GDAL), which is used to read and write geospatial data from R

5 Types of Geospatial Data
Vector data
– Points
– Lines
– Areas
Bitmap
– Often used for image data (e.g. aerial photos)
– Needs to be registered to a coordinate system
“Labelled” data
– Has geographic information, but needs to be matched before it can be used

6 Setting up the R Environment
## Set working directory to where the data is. Update as required if running this yourself
setwd("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation");
## Load the relevant libraries
library(sp);    # Basic R classes for handling geographic data
library(rgdal); # Interface to the Geospatial Data Abstraction Library (GDAL)
library(nlme);  # Library that gives us generalised least squares

7 Obtain Census Data (1/6)

8 Obtain Census Data (2/6)

9 Obtain Census Data (3/6)

10 Obtain Census Data (4/6)

11 Obtain Census Data (5/6)

12 Obtain Census Data (6/6)

13 Read In Census Data (1/3)
## Read in and clean the census data (Note: a lot of this cleaning could be done more easily in Excel)
EducationLevel <- read.csv("EducationData.csv",skip=6,na.strings="");
EducationLevel <- EducationLevel[c(-1,-2),c(-1,-27)]; # Remove leading and trailing blank columns and blank second row
EducationLevel <- EducationLevel[-(97:100),]; # Remove trailing blank lines
#### Create some useable column names
EduDataCols <- paste(c(rep("Male",8),rep("Female",8),rep("Total",8)),
                     rep(c("NotStated","InadDescr","Postgrad","GradDipCert","Bachelor","Diploma","Certificate","NA"),3),
                     sep=".");
colnames(EducationLevel) <- c("SED",EduDataCols);
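The column-naming step above pairs every group with every qualification level. A minimal sketch of the same pattern in base R, using a shortened, hypothetical set of levels, shows what the `paste()` and `rep()` combination produces:

```r
# Hypothetical, shortened label sets (the slide uses 3 groups x 8 levels)
groups <- c("Male", "Female", "Total")
quals  <- c("NotStated", "Postgrad", "Bachelor")

# rep(..., each = n) repeats each group n times in a row;
# rep(..., times = n) cycles the whole qualification vector n times
col_names <- paste(rep(groups, each = length(quals)),
                   rep(quals, times = length(groups)),
                   sep = ".")
print(col_names)
```

Using `each =` is equivalent to the `c(rep("Male",8), rep("Female",8), rep("Total",8))` construction on the slide, just shorter.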

14 Read In Census Data (2/3)
#### Recode the data into character and numeric data to avoid weird errors from factors
EducationLevel[,1] <- as.character(EducationLevel[,1]);
for (col in EduDataCols) {
  EducationLevel[,col] <- as.numeric(as.character(EducationLevel[,col]));
}
#### Eyeball the data to make sure it is ok.
summary(EducationLevel);
head(EducationLevel,10);
tail(EducationLevel,10);
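The "weird errors from factors" are worth seeing once. The sketch below, on a made-up three-value factor, shows why the recode goes through `as.character()` first. Note that since R 4.0.0 `read.csv()` no longer converts strings to factors by default, so on a current R installation the columns may already arrive as character.

```r
# A hypothetical column that was read in as a factor
f <- factor(c("10", "250", "31"))

codes  <- as.numeric(f)                # internal level codes, not the values
values <- as.numeric(as.character(f))  # the numbers we actually want

print(codes)
print(values)
```

Converting the factor directly returns the position of each value in the sorted level set, which silently produces plausible-looking but wrong numbers.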

15 Read In Census Data (3/3)

16 Obtain Electoral Data (1/4)

17 Obtain Electoral Data (2/4)

18 Obtain Electoral Data (3/4)

19 Obtain Electoral Data (4/4)

20 Read In Electoral Data (1/2)
## Read in the electoral data
ElectionResults <- read.csv("2011NSWElectionResults.csv");
#### Eyeball data to make sure it is ok
summary(ElectionResults);
head(ElectionResults);
tail(ElectionResults);

21 Read In Electoral Data (2/2)

22 Obtain Geography (1/4)

23 Obtain Geography (2/4)

24 Obtain Geography (3/4)

25 Obtain Geography (4/4)

26 Read In SED Geography (1/3)
## Read in the state electoral division boundaries (geography) and explore the SpatialPolygonsDataFrame class
SED <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation\\Geographies","SED06aAUST_region");
#### Have an initial look at the SED data set that we've just read in
summary(SED);
plot(SED);

27 Read In SED Geography (2/3)

28 Read In SED Geography (3/3)

29 Examining the SpatialPolygonsDataFrame (1/2)
#### SED is a SpatialPolygonsDataFrame, an S4 object. We can have a look at how it is constructed
mode(SED);
slotNames(SED);
summary(SED@data);
summary(SED@polygons);
SED@plotOrder;
SED@bbox;
SED@proj4string;

30 Examining the SpatialPolygonsDataFrame (2/2)
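To explore the same structure without downloading the boundary files, a toy object can be built by hand. This is a minimal sketch assuming only the sp package is installed; the two unit squares, their IDs and the `NAME` column are hypothetical stand-ins for electoral divisions.

```r
library(sp)

# Build one closed square ring starting at x0 and wrap it as a Polygons object
make_square <- function(x0, id) {
  ring <- cbind(x0 + c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}
polys <- SpatialPolygons(list(make_square(0, "A"), make_square(1, "B")))

# Attribute rows are matched to the polygons by row name / polygon ID
attrs <- data.frame(NAME = c("Alpha", "Beta"), row.names = c("A", "B"))
toy <- SpatialPolygonsDataFrame(polys, attrs)

isS4(toy)        # S4 objects keep their parts in slots, reached with @
slotNames(toy)
toy@data         # the attribute table
toy@bbox         # bounding box of all polygons

# Row-style subsetting works as it does on a data frame
beta_only <- toy[toy$NAME == "Beta", ]
```

The `toy$NAME` and `toy[rows, ]` forms are the same ones used on `SED` in the mapping slides.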

31 Simple Mapping of SpatialPolygonsDataFrames (1/2)
#### Let's now look at some more mapping, we've seen that we can plot all of Australia
plot(SED[SED$STATE_2006 == "1",]); # Plot NSW
plot(SED[SED$STATE_2006 == "1",],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); # Plot Sydney - xlim and ylim from google maps ;-)
plot(SED[SED$STATE_2006 == "1",],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); # Plot Sydney and put on some electoral district names
text(coordinates(SED[SED$STATE_2006 == "1",]),labels=(SED[SED$STATE_2006 == "1",])$NAME_2006,cex=0.5);

32 Simple Mapping of SpatialPolygonsDataFrames (2/2)

33 Thematic Mapping (1/8)
## Thematic mapping
SED.NSW <- SED[SED$STATE_2006 == "1",]; # subset of SED for convenience
#### Create a ThemeData data set with a summary of the data we are interested in - proportion of people with a tertiary education
ThemeData <- data.frame(SED = as.character(EducationLevel$SED),
                        PropTertiaryEd = (EducationLevel$Total.Postgrad + EducationLevel$Total.GradDipCert + EducationLevel$Total.Bachelor + EducationLevel$Total.Diploma + EducationLevel$Total.Certificate) /
                                         (EducationLevel$Total.Postgrad + EducationLevel$Total.GradDipCert + EducationLevel$Total.Bachelor + EducationLevel$Total.Diploma + EducationLevel$Total.Certificate + EducationLevel$Total.NA),
                        stringsAsFactors=FALSE);
hist(ThemeData$PropTertiaryEd); # Histogram of the proportions to work out the appropriate cut points
ThemeData$PropTertiaryEdFact <- cut(ThemeData$PropTertiaryEd,c(0,0.25,0.3,0.35,0.4,0.5,1.0)); # Create a factor for the proportion variable
levels(ThemeData$PropTertiaryEdFact) <- c("25% or Less","25% to 30%","30% to 35%","35% to 40%","40% to 50%","More than 50%");
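The banding step can be tried on its own. The sketch below applies the same cut points to a handful of made-up proportions, and also shows one behaviour of `cut()` to be aware of: intervals are right-closed by default, so a value exactly equal to the lowest break falls outside every band unless `include.lowest = TRUE` is set.

```r
# Hypothetical proportions, one per band
prop   <- c(0.22, 0.27, 0.33, 0.38, 0.45, 0.62)
breaks <- c(0, 0.25, 0.3, 0.35, 0.4, 0.5, 1.0)

band <- cut(prop, breaks)
levels(band)   # default labels of the form "(0,0.25]"
levels(band) <- c("25% or Less", "25% to 30%", "30% to 35%",
                  "35% to 40%", "40% to 50%", "More than 50%")
table(band)

# A value of exactly 0 is not in (0, 0.25], so it becomes NA ...
zero_default <- cut(0, breaks)
# ... unless the lowest break is explicitly included
zero_included <- cut(0, breaks, include.lowest = TRUE)
```

This matters for vote shares in particular, where a district with no votes for a party would otherwise drop off the map.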

34 Thematic Mapping (2/8)

35 Thematic Mapping (3/8)
#### Display a thematic map for all of NSW
bands <- length(levels(ThemeData$PropTertiaryEdFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]]); # Note the use of match() to get the right rows
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01);
#### Display a thematic map for Sydney
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4));
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01);
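The `match()` call is what keeps the colours lined up with the polygons, since the map layer and the data table are not in the same order. A minimal base R sketch, with hypothetical district names and bands:

```r
# Order of polygons in the map layer (hypothetical)
map_names <- c("Albury", "Ballina", "Coogee")

# The data table lists the same districts in a different order
theme <- data.frame(SED  = c("Coogee", "Albury", "Ballina"),
                    band = factor(c("High", "Low", "Mid"),
                                  levels = c("Low", "Mid", "High")),
                    stringsAsFactors = FALSE)

# For each polygon, the row of 'theme' that describes it
idx <- match(map_names, theme$SED)

# A factor used as an index selects by its integer level code
pal  <- heat.colors(nlevels(theme$band))
cols <- pal[theme$band[idx]]
```

`cols` now has one colour per polygon, in polygon order, which is what `plot(..., col=)` expects. A district missing from the data table gives `NA` in `idx` and is drawn unfilled.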

36 Thematic Mapping (4/8)

37 Thematic Mapping (5/8)
#### Now we'll add the election results to our ThemeData data set
rownames(ElectionResults) <- as.character(ElectionResults$District); # Adding rownames allows us to index by them when matching
ThemeData$PropGreenVote <- ElectionResults[ThemeData$SED,"GRN"] / ElectionResults[ThemeData$SED,"Total"]; # Create a green vote proportion variable
hist(ThemeData$PropGreenVote,breaks=20); # Have a look at the distribution
ThemeData$PropGreenVoteFact <- cut(ThemeData$PropGreenVote,c(0,0.05,0.06,0.08,0.1,0.15,1.0)); # Create a factor
levels(ThemeData$PropGreenVoteFact) <- c("Less than 5%","5% to 6%","6% to 8%","8% to 10%","10% to 15%","More than 15%");
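Indexing by row name is an alternative to `match()` for joining two tables. A small sketch with made-up vote counts shows the lookup, and what happens when a name has no counterpart:

```r
# Hypothetical results table; the column names mirror the ones used above
results <- data.frame(District = c("Albury", "Ballina"),
                      GRN      = c(2000, 9000),
                      Total    = c(45000, 45000),
                      stringsAsFactors = FALSE)
rownames(results) <- results$District

# Look districts up by name, in whatever order the other table uses
wanted <- c("Ballina", "Albury", "Nowhere")
green_share <- results[wanted, "GRN"] / results[wanted, "Total"]
print(green_share)
```

A name that is not present yields `NA` rather than an error, so it is worth checking `summary()` of the result for unexpected missing values.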

38 Thematic Mapping (6/8)

39 Thematic Mapping (7/8)
#### And do some thematic maps of the election results
bands <- length(levels(ThemeData$PropGreenVoteFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropGreenVoteFact[match(SED.NSW$NAME_2006,ThemeData$SED)]])
legend("bottomright", legend=levels(ThemeData$PropGreenVoteFact), fill=pal, title="Prop. Voted Green",inset=0.01)
plot(SED.NSW,col=pal[ThemeData$PropGreenVoteFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4))
legend("bottomright", legend=levels(ThemeData$PropGreenVoteFact), fill=pal, title="Prop. Voted Green",inset=0.01)

40 Thematic Mapping (8/8)

41 Obtain Topographic Map Data (1/9)

42 Obtain Topographic Map Data (2/9)

43 Obtain Topographic Map Data (3/9)

44 Obtain Topographic Map Data (4/9)

45 Obtain Topographic Map Data (5/9)

46 Obtain Topographic Map Data (6/9)

47 Obtain Topographic Map Data (7/9)

48 Obtain Topographic Map Data (8/9)

49 Obtain Topographic Map Data (9/9)

50 Geographic Querying (1/4)
## Demonstration of geographic querying
#### Read in the Localities layer from the TOPO 2.5M data set
Locs <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation\\Geographies\\localities","aus25lgd_p");
Mtns <- Locs[Locs$LOCALITY == "6",]; # Select only mountains
plot(Mtns)
#### Use the over function to find a list of mountains in SEDs with more than 10% green votes
over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns); # Only gets one mountain per SED
over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns,returnList=TRUE); # Gets all mountains, but in a less useful format
do.call("rbind",over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns,returnList=TRUE)); # Gives us something a bit more useable
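The three forms of the `over()` query can be tried on toy data. This is a sketch assuming only the sp package; the two square "districts" and three "peaks" are hypothetical.

```r
library(sp)

# Two adjacent unit squares standing in for electoral districts
make_square <- function(x0, id) {
  ring <- cbind(x0 + c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0))
  Polygons(list(Polygon(ring, hole = FALSE)), ID = id)
}
districts <- SpatialPolygons(list(make_square(0, "A"), make_square(1, "B")))

# Three points with attributes: two fall in district A, one in district B
peaks <- SpatialPointsDataFrame(
  cbind(x = c(0.2, 0.8, 1.5), y = c(0.5, 0.5, 0.5)),
  data.frame(NAME = c("Peak1", "Peak2", "Peak3"), stringsAsFactors = FALSE))

first_only <- over(districts, peaks)                     # one row per district
all_peaks  <- over(districts, peaks, returnList = TRUE)  # list of data frames
combined   <- do.call("rbind", all_peaks)                # single data frame

# Swapping the arguments answers the reverse question:
# which district does each point fall in?
district_of_peak <- over(peaks, districts)
```

The argument order decides the shape of the answer: the result always has one element per feature of the first argument.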

51 Geographic Querying (2/4)

52 Geographic Querying (3/4)

53 Geographic Querying (4/4)

54 Geospatial Modelling (1/6)
## Spatial GLS relating proportion who vote green to proportion with a higher education
#### Add some spatial data to the ThemeData data set - using equidistant conic coordinates - lat-long give greater distance distortion
SED.NSW.coords.eqdc <- coordinates(spTransform(SED.NSW,CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0=-33.5 +lon_0=151 +x_0=0 +y_0=0")));
rownames(SED.NSW.coords.eqdc) <- as.character(SED.NSW$NAME_2006);
colnames(SED.NSW.coords.eqdc) <- c("x","y");
plot(spTransform(SED.NSW,CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0=-33.5 +lon_0=151 +x_0=0 +y_0=0"))); # shows how the conic projection looks
lines(spTransform(gridlines(SED.NSW,easts=seq(140,160,by=2.5),norths=seq(-37.5,-27.5,by=2.5)),CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0=-33.5 +lon_0=151 +x_0=0 +y_0=0")));
tail(ThemeData);
ThemeData2 <- ThemeData[-(94:96),]; # Remove the last few rows of ThemeData - they don't have geographic locations
ThemeData2 <- cbind(ThemeData2,SED.NSW.coords.eqdc[ThemeData2$SED,]);
head(ThemeData2);
summary(ThemeData2);
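The remark that lat-long coordinates distort distance can be made concrete with a little arithmetic. The sketch below assumes a spherical Earth with a mean radius of 6371 km, which is an approximation, and compares the ground length of one degree in each direction at the latitude of Sydney.

```r
earth_radius_km <- 6371                           # spherical approximation
km_per_degree   <- 2 * pi * earth_radius_km / 360 # length of 1 degree on a great circle

lat <- -33.5                                      # roughly the latitude of Sydney
km_per_deg_lat <- km_per_degree                   # north-south
km_per_deg_lon <- km_per_degree * cos(lat * pi / 180)  # east-west shrinks with latitude

round(c(north_south = km_per_deg_lat, east_west = km_per_deg_lon), 1)
```

At this latitude a degree of longitude covers roughly 93 km against roughly 111 km for a degree of latitude, so Euclidean distances computed on raw degrees overstate east-west separation. Projecting first gives the correlation structures in the following slides distances that mean the same thing in every direction.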

55 Geospatial Modelling (2/6)

56 Geospatial Modelling (3/6)
#### Start with a basic linear model
model1 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,na.action=na.omit);
summary(model1);
plot(model1);
plot(Variogram(model1, form=~x+y)); # Note the correlation structure

57 Geospatial Modelling (4/6)

58 Geospatial Modelling (5/6)
#### Now try some gls models with spatial correlation structures
model2 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,correlation=corExp(form=~x+y),na.action=na.omit);
summary(model2);
plot(model2);
plot(Variogram(model2, form=~x+y));
model3 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,correlation=corGaus(form=~x+y),na.action=na.omit);
summary(model3);
plot(model3);
plot(Variogram(model3, form=~x+y));
model4 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,correlation=corSpher(form=~x+y),na.action=na.omit);
summary(model4);
plot(model4);
plot(Variogram(model4, form=~x+y));
#### Compare the models using AIC
AIC(model1,model2,model3,model4); # Looks like adding the correlation structure gave no benefit
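The same workflow can be rehearsed on simulated data before running it on the census and election tables. The sketch below is not the analysis from the talk: the coordinates, predictor, coefficients and correlation range are all invented, and the errors are deliberately generated with exponential spatial correlation so that there is something for `corExp()` to find.

```r
library(nlme)

set.seed(42)
n <- 60
sim <- data.frame(x = runif(n, 0, 10),        # hypothetical projected coordinates
                  y = runif(n, 0, 10),
                  educ = runif(n, 0.2, 0.6))  # hypothetical predictor

# Spatially correlated errors: covariance decays exponentially with distance
d     <- as.matrix(dist(sim[, c("x", "y")]))
sigma <- 0.02^2 * exp(-d / 2)
err   <- as.vector(t(chol(sigma)) %*% rnorm(n))
sim$vote <- 0.02 + 0.2 * sim$educ + err

# Model without and with a spatial correlation structure
m_plain <- gls(vote ~ educ, data = sim)
m_exp   <- gls(vote ~ educ, data = sim, correlation = corExp(form = ~ x + y))

# Empirical semivariogram of the plain model's residuals against distance
vg <- Variogram(m_plain, form = ~ x + y)

aic_table <- AIC(m_plain, m_exp)
print(aic_table)
```

A lower AIC for the spatial model suggests the correlation structure is earning its extra parameter. In the talk's real data the comparison went the other way, which is itself a useful result.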

59 Geospatial Modelling (6/6)

60 Nice Looking Map (1/2)
## Finally, let's put together a good looking map.
Roads <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\201204 - SURF Geospatial Analysis Presentation\\Geographies\\roads","aus25vgd_l");
SED.NSW.coords <- coordinates(SED.NSW);
sydrows <- (SED.NSW.coords[,1] > 150.5) & (SED.NSW.coords[,1] < ...) & (SED.NSW.coords[,2] > -34.3) & (SED.NSW.coords[,2] < -33.4); # upper longitude bound is missing from the transcript
SED.SYD <- SED.NSW[sydrows,];
sydgrid <- gridlines(SED.SYD,easts=seq(150.4,151.6,by=0.1),norths=seq(-34.3,-33.4,by=0.1));
sydgridat <- gridat(SED.SYD,easts=seq(150.4,151.6,by=0.1),norths=seq(-34.3,-33.4,by=0.1));
pdf("FinalMap.pdf");
bands <- length(levels(ThemeData$PropTertiaryEdFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.5,-33.4))
lines(Roads,col="black",xlim=c(150.6,151.4),ylim=c(-34.5,-33.4));
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01,bty="n",bg="white")
title(c("Proportion of People with Tertiary Education","by Sydney State Electoral Divisions"),sub="Data from 2006 Census");

61 Nice Looking Map (2/2)

62 Example Data Sources
Census geographies
– me/Geography?opendocument#from-banner=LN
Census results (CDATA Online)
NSW State Electoral Results
Geoscience Australia – Topographic Maps

