wiki:rstat:r_spat_covar

This shows you the differences between two versions of the page.

wiki:rstat:r_spat_covar [2017/12/05 22:53] |
wiki:rstat:r_spat_covar [2020/07/17 06:39] (current) |
||
---|---|---|---|

Line 1: | Line 1: | ||

+ | ====== Spatial Covariance ====== | ||

+ | Back to [[wiki:rstat:r_geography| Spatial analysis in R]] **main menu**.\\ | ||

+ | \\ | ||

+ | In probability theory and statistics, [[https://en.wikipedia.org/wiki/Covariance_function| covariance]] is a measure of how much two variables change together. While dealing with spatial statistics we test the geographical induced changes and analyze the **spatial covariance** between variable in pairwise correlations between locations. We can define the degree of the intensity (how much) and direction (same direction or inverse correlation) of the variation.\\ | ||

+ | While analysing the spatial covariance on a broad area, we define spatial **stationarity** or **non-stationarity** of the spatial covariance: That is if correlation are stable within a determined region or if they change in intensity and direction depending on the sub-region within an area, respectively. \\ | ||

+ | A classical example [[https://www.stat.washington.edu/research/reports/1992/tr236.pdf| Guttorp and Samsong 1992]] on the use of spatial covariance is the selection of redundant field monitoring sites within a network of environmental sensors. How we design the distribution of senors, should we remove and move redundant stations. We can test if our monitoring sites produce correlates recordings or not, if data are strongly related between stations we might choose to move a station in a area where data are more needed in unmonitored area.\\ | ||

+ | In R we can use SpatialPack to asses the significance of the correlation between two spatial processes.\\ | ||

+ | |||

+ | ===== References ===== | ||

+ | Reference of this analisys can be found in [[http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0068437| Casalegno et al. 2013]] and [[http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107822| 2014]] which are applications following the more theoretical and statistical approaches explained by the folowing: | ||

+ | * [[http://dx.doi.org/10.2307/2532039 |Clifford P, Richardson S, Hemon D (1989)]] Assessing the significance of the correlation between two spatial processes. Biometrics 45: 123–134. | ||

+ | * [[http://dx.doi.org/10.2307/2532625| Dutilleul P (1993) ]] Modifying the t test for assessing the correlation between two spatial processes. Biometrics 49: 305–314. | ||

+ | * [[http://dx.doi.org/10.2307/2265776|Thomson JD, Weiblen BA, Thomson BA, Alfaro S, Legendre P (1996) ]] Untangling multiple factors in spatial distributions: Lilies, Gophers, and Rocks. Ecology 77: 1698–1715. | ||

+ | * [[http://dx.doi.org/10.1046/j.1523-1739.1999.98364.x| Carroll C, Zielinski W, Noss R (1999) ]] Using presence-absence data to build and test spatial habitat models for the fisher in the Klamath region, USA Cons. Biol. 13: 1344–1359. | ||

+ | * [[http://dx.doi.org/10.1111/j.2007.0906-7590.05171.x|Dormann CF, McPherson JM, Araujo MB, Bivand R, Bolliger J, et al. (2007)]] Methods to account for spatial autocorrelation in the analysis of species distributional data. Ecography 30: 609–628. | ||

+ | * Zar JH (2007) Biostatistical analysis. Upper Saddle River, NJ. Prentice Hall, 960 pag. | ||

+ | * Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57: 289–300. | ||

+ | * [[https://www.stat.washington.edu/research/reports/1992/tr236.pdf| Guttorp and Samsong 1992]] Methods for estimating heterogeneous spatial covariance function with environmental applications. | ||

+ | |||

+ | |||

+ | |||

+ | <note important>N.B. We have downloaded the maps needed here during the previous tutorial on raster [[wiki:rstat:r_load_rast| data visualization]].</note> | ||

+ | If you have already downloaded the maps, you can migrate into their folder, start an R session and load them using : | ||

+ | <code R> | ||

+ | # LOAD RASTERS | ||

+ | library(rgdal) | ||

+ | library(raster) | ||

+ | map_names=c("agri_m9.asc","carA_m9.asc","carS_m9.asc","mitR_m9.asc","prod_m9.asc","pvl_R_m9.asc","recA_m9.asc","recL_m9.asc","recT_m9.asc","urbR_m9.asc","winR_m9.asc","ZONES_m9.asc") | ||

+ | import12=vector("list",12) | ||

+ | for ( x in c(1:12) ){ | ||

+ | import12[[x]]=raster(readGDAL(map_names[x])) | ||

+ | } | ||

+ | MAPS=brick(import12[[1]],import12[[2]],import12[[3]],import12[[4]],import12[[5]],immport12[[11]],import12[[12]]) | ||

+ | |||

+ | # LOAD VECTORS | ||

+ | stat = read.csv("SYNT_2002.csv", header=T) | ||

+ | ward <- shapefile("cornwall.shp") | ||

+ | agri <- merge(ward, stat, by.x='PI', by.y='ID') | ||

+ | </code> | ||

+ | |||

+ | ===== Spatial covariance between vector maps ===== | ||

+ | We can use the agricultural statistics available as vector layer and test significance, direction and intensity of correlation between spatial patterns of distribution between crop and livestock types. As an example we can test the distribution of mature and young bovines livestocks and expect them to be highly correlated as well as should be different cereals types spatial distributions.\\ | ||

+ | Here we compare different features (crop/ livestock type) within the same geographical units (polygons of administrative boundary) so is fine to use centroids of polygons as geographical dimension.\\ | ||

+ | We are going to use the Clifford, Hemond and Richardson modified t test available in the [[http://www.inside-r.org/packages/cran/SpatialPack/docs/modified.ttest| R SpatialPack]] wich performs a modified version of the t test to assess the correlation between two spatial processes. To use this method, we rank transform the variables and compute the Pearson's coefficient adjusted for spatial autocorrelation.\\ | ||

+ | In case of comparison between different vector layers (different polygon distributions) the [[https://cran.r-project.org/web/packages/spatialCovariance/spatialCovariance.pdf|SpatialCovariance]] R package could be applied | ||

+ | |||

+ | <code r> | ||

+ | install.packages("SpatialPack") | ||

+ | library(SpatialPack) | ||

+ | |||

+ | # Compare young and mature beef lifestocks. | ||

+ | modified.ttest(rank(agri@data$cattle_under1yr_v),rank(agri@data$beef_v),coordinates(coordinates(agri))) | ||

+ | # F-statistic: 89.6019 on 1 and 134.5361 DF, p-value: 0 | ||

+ | # alternative hypothesis: true autocorrelation is not equal to 0 | ||

+ | # sample correlation: 0.6323 | ||

+ | |||

+ | </code> | ||

+ | |||

+ | * First we look at the p-value p-value which tells you the probability of observing an effect due to chance alone. If the probability is lower then 5% (p-value <= 0.05) our results are significant and not due to chance according to statistical conventions. | ||

+ | * Then we can look at intensity and direction of the correlation (between -1 and +1) and we found a positive and intense spatial correlation between variables (0.6). | ||

+ | |||

+ | As an example we found a lesser intense correlation between dairy and beef livestocks distributions. | ||

+ | |||

+ | <code r> | ||

+ | modified.ttest(rank(agri@data$dairy),rank(agri@data$beef_v),coordinates(coordinates(agri))) | ||

+ | # F-statistic: 5.6505 on 1 and 140.7158 DF, p-value: 0.0188 | ||

+ | # alternative hypothesis: true autocorrelation is not equal to 0 | ||

+ | # sample correlation: 0.1965 | ||

+ | </code> | ||

+ | |||

+ | ===== Spatial covariance between raster maps ===== | ||

+ | |||

+ | We have loaded a set of raster data including different ecosystem and environmental services (natural capitals) for Cornwall: | ||

+ | * Agricultural value | ||

+ | * Carbon above ground | ||

+ | * Carbon in soil | ||

+ | * Flood mitigation | ||

+ | * Primary plant productivity | ||

+ | * Renewable - Wind energy | ||

+ | * Renewable - solar energy | ||

+ | * Aesthetic value of landscapes | ||

+ | * recreational value | ||

+ | * Touristic value | ||

+ | * Urban development | ||

+ | |||

+ | Additionally we have a map of zones in Corwall: costal areas, east, central and west. We can test spatial covariance between all these layers in the overall study area and within zones. | ||

+ | |||

+ | We have [[wiki:rstat:r_load_rast | previously]] learned how to visualize these maps in R.\\ | ||

+ | |||

+ | We can perform some zonal statistics to look at discrepencies of services in the overall study area and within zones thereoff.\\ | ||

+ | |||

+ | <code r> | ||

+ | TITLE=c("Agriculture value","Carbon above ground","Carbon in soil","Flood mitigation","Plant production","Solar energy","Aesthetic","Recreation","Tourism","Urban development","Wind energy","Zones of Cornwall") | ||

+ | tiff("Zonal statistics.tif", width = 800, height = 900) | ||

+ | nf <- layout(matrix(c(1:12),4,3,byrow=TRUE), c(1,1), c(1,1), TRUE) ; layout.show(nf) ; par(mar=c(5,5,3,1)) | ||

+ | for (band in c(1,2,3,4,5,7,8,9,10,11,6)) { | ||

+ | COS=subset(na.omit(MAPS@data@values[,band][MAPS@data@values[,12]==1]), na.omit(MAPS@data@values[,band][MAPS@data@values[,12]==1]>0)) | ||

+ | WES=subset(na.omit(MAPS@data@values[,band][MAPS@data@values[,12]==2]), na.omit(MAPS@data@values[,band][MAPS@data@values[,12]==2]>0)) | ||

+ | CEN=subset(na.omit(MAPS@data@values[,band][MAPS@data@values[,12]==3]), na.omit(MAPS@data@values[,band][MAPS@data@values[,12]==3]>0)) | ||

+ | EAS=subset(na.omit(MAPS@data@values[,band][MAPS@data@values[,12]==4]), na.omit(MAPS@data@values[,band][MAPS@data@values[,12]==4]>0)) | ||

+ | ALL=subset(na.omit(MAPS@data@values[,band][MAPS@data@values[,12]!=5]), na.omit(MAPS@data@values[,band][MAPS@data@values[,12]!=5]>0)) | ||

+ | boxplot(COS,WES,CEN,EAS,ALL, col=(c("blue","red","green4","orange","grey")),main=TITLE[band], cex.main=2,cex.axis=1.5) | ||

+ | } | ||

+ | plot(1,1,type="n",xlab="",ylab="",axes=F) | ||

+ | legend(0.7,1.5,c("Coastal","West","Centre","East","Overall"), bty = "n", pch=c(15,15,15,15), pt.cex=c(2.5,2.5,2.5), cex=2.5, col=c("blue","red","green4","orange","gray")) | ||

+ | dev.off() | ||

+ | </code> | ||

+ | The output image show the median value (central line), upper quartile (edges of boxes), maximum and minimum values excluding outliers (whiskers) and outliers (dots) of environmental services maps in Cornwall and within zones thereof. | ||

+ | \\ | ||

+ | {{:wiki:rstat:zonalstat_r.png?300|}} | ||

+ | |||

+ | Now we can look at the spatial covariance between agriculture and carbon stocks above and below the soil: | ||

+ | <code r> | ||

+ | XYcoor=xyFromCell(Agri,1:12769) | ||

+ | Agri=import12[[1]] | ||

+ | CarSoil=import12[[3]] | ||

+ | Agri.CarSoil<-na.omit(data.frame(Agri@data@values,CarSoil@data@values,XYcoor)) | ||

+ | modified.ttest(rank(Agri.CarSoil[[1]]),rank(Agri.CarSoil[[2]]),matrix(ncol=2,nrow=length(Agri.CarSoil[[1]]),c(Agri.CarSoil[[3]],Agri.CarSoil[[4]]))) | ||

+ | # Corrected Pearson’s correlation for spatial autocorrelation | ||

+ | # | ||

+ | # F-statistic: 11.2706 on 1 and 350.2359 DF, p-value: 9e-04 | ||

+ | # alternative hypothesis: true autocorrelation is not equal to 0 | ||

+ | # sample correlation: -0.1766 | ||

+ | </code> | ||

+ | |||

+ | We found statistical significant correlation of low intensity and inverse direction.\\ | ||

+ | |||

+ | Additional statistics can be performed and the analyses of all pairwise correlations can be carried out per each distinct sub zone. More specifications and R scripting are available at this [[ http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107822| PloS ONE]] paper. As an example the code below perform the whole set of correlations by the following steps:\\ | ||

+ | * Create an object list with five elements, each element corresponding to the overall study area and the zone therein; this lists //zMAP// is created for storing raster grid cell values (in total 12769), corresponding to each of the twelve environmental service layers. | ||

+ | * Create a second object list with the same elements as above but only storing a the resulting Pearson's (spatial correlation adjusted) values in a matrix of 11 x 11 binary intersections between layers. | ||

+ | * Create a third layer as above for storing p-values. | ||

+ | * Create a matrix //coords// for storing lat long coordinates of each grid cell of the study area. | ||

+ | * Build a loop for each zone within the study area and an embedded loop in the loop for computing p-value and correlation coefficients between each pairwise combination of layers. | ||

+ | * Compute p-values and correlation coefficients for the whole study area. | ||

+ | |||

+ | <code r> | ||

+ | zMAP=vector("list",5) | ||

+ | zMAP[[5]]=matrix(ncol=12,nrow=12769) | ||

+ | zCORR=vector("list",5) | ||

+ | zPVAL=vector("list",5) | ||

+ | coords=matrix(ncol=2,nrow=12769) | ||

+ | coords=xyFromCell(MAPS,1:12769) | ||

+ | |||

+ | for ( ZONE in c(1:4)){ | ||

+ | zMAP[[ZONE]]=MAPS[MAPS@data@values[,12]==ZONE] | ||

+ | zCORR[[ZONE]]=matrix(ncol=11,nrow=11) | ||

+ | zPVAL[[ZONE]]=matrix(ncol=11,nrow=11) | ||

+ | for ( x in c(1:11)){ | ||

+ | for ( y in c(1:11)){ | ||

+ | xdata=na.omit(data.frame(zMAP[[ZONE]][,x],data.frame(zMAP[[ZONE]][,y],coords[which(MAPS@data@values[,12]==ZONE),1],coords[which(MAPS@data@values[,12]==ZONE),2]))) | ||

+ | temp=modified.ttest(rank(xdata[[1]]),rank(xdata[[2]]),matrix(ncol=2,nrow=length(xdata[[1]]),c(xdata[[3]],xdata[[4]]))) | ||

+ | zCORR[[ZONE]][x,y]=temp$corr | ||

+ | zPVAL[[ZONE]][x,y]=temp$p.value | ||

+ | }}} | ||

+ | |||

+ | ZONE=5 | ||

+ | zMAP[[ZONE]]=MAPS@data@values | ||

+ | zCORR[[ZONE]]=matrix(ncol=11,nrow=11) | ||

+ | zPVAL[[ZONE]]=matrix(ncol=11,nrow=11) | ||

+ | for ( x in c(1:11)){for ( y in c(1:11)){ | ||

+ | xdata=na.omit(data.frame(zMAP[[ZONE]][,x],data.frame(zMAP[[ZONE]][,y],coords[,1],coords[,2]))) | ||

+ | temp=modified.ttest(rank(xdata[[1]]),rank(xdata[[2]]),matrix(ncol=2,nrow=length(xdata[[1]]),c(xdata[[3]],xdata[[4]]))) | ||

+ | zCORR[[ZONE]][x,y]=temp$corr | ||

+ | zPVAL[[ZONE]][x,y]=temp$p.value | ||

+ | }} | ||

+ | |||

+ | </code> |

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International