SDM1 : Montane woodcreper - Model

Preparation

Start R in the bash terminal and run the following lines to install the libraries.

install.packages("e1071")
install.packages("caret")
install.packages("rworldmap")
install.packages("maptools")
install.packages("rgeos")
install.packages("reshape")
install.packages("randomForest")
[1]:
library(ggplot2)
library(rgdal)
library(raster)
library(rgeos)
library(reshape)
library(rasterVis)
library(dismo)
library(InformationValue)
library(mgcv)
library(randomForest)
library(e1071)
library(caret)
Loading required package: sp
rgdal: version: 1.5-23, (SVN revision 1121)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.3.3, released 2018/12/14
Path to GDAL shared files: /home/user/miniconda3/share/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ runtime: Rel. 5.2.0, September 15th, 2018, [PJ_VERSION: 520]
Path to PROJ shared files: /home/user/miniconda3/share/proj
Linking to sp version:1.4-5
rgeos version: 0.5-5, (SVN revision 640)
 GEOS runtime version: 3.7.1-CAPI-1.11.1
 Linking to sp version: 1.4-5
 Polygon checking: TRUE

Loading required package: terra
terra version 1.1.4

NOTE: using GDAL version 2.3.3
For full functionality you need at least version 3.0.4


Attaching package: ‘terra’

The following object is masked from ‘package:reshape’:

    expand

The following object is masked from ‘package:rgdal’:

    project

Loading required package: lattice
Loading required package: latticeExtra

Attaching package: ‘latticeExtra’

The following object is masked from ‘package:ggplot2’:

    layer


Attaching package: ‘dismo’

The following object is masked from ‘package:terra’:

    voronoi

Loading required package: nlme

Attaching package: ‘nlme’

The following object is masked from ‘package:terra’:

    collapse

The following object is masked from ‘package:raster’:

    getData

This is mgcv 1.8-34. For overview type 'help("mgcv-package")'.
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.

Attaching package: ‘randomForest’

The following object is masked from ‘package:ggplot2’:

    margin


Attaching package: ‘e1071’

The following object is masked from ‘package:terra’:

    interpolate

The following object is masked from ‘package:raster’:

    interpolate


Attaching package: ‘caret’

The following objects are masked from ‘package:InformationValue’:

    confusionMatrix, precision, sensitivity, specificity

[2]:
set.seed(30)

Data Exploration

We will use Montane woodcreper (Lepidocolaptes lacrymiger) as example species.

This species has a large range, occurring from the coastal cordillera of Venezuela along the Andes south to south-east Peru and central Bolivia.

8286cf8c5dda451c90e64f9fdd76d185

Read in points data

Load observation presence dataset

Let’s suppose that you have done field work and you have collected the bird presence in the Lepidocolaptes_lacrymiger_allpoints.csv file

[1]:
 points_field <- read.csv("./geodata/shp/Lepidocolaptes_lacrymiger_allpoints.csv")
 str(points_field)
'data.frame':   3438 obs. of  3 variables:
 $ lon            : num  -76.2 -76.2 -74.3 -74.3 -76.1 ...
 $ lat            : num  3.98 3.93 4.61 4.61 4.75 ...
 $ scientific_name: chr  "Lepidocolaptes_lacrymiger" "Lepidocolaptes_lacrymiger" "Lepidocolaptes_lacrymiger" "Lepidocolaptes_lacrymiger" ...

Morover you also download aditional points data from the https://www.gbif.org/ .

[3]:
# gbif_points = gbif('Lepidocolaptes' , 'lacrymiger' , download=T , geo=T , ext=c(-82,-59,-21,14) , removeZeros=TRUE )
# save(gbif_points , file="~/SE_data/exercise/geodata/SDM/gbif_points.Rdata")
load("./geodata/SDM/gbif_points.Rdata")
points=rbind.data.frame(
  data.frame(lat=gbif_points$lat,lon=gbif_points$lon),
  data.frame(lat=points_field$lat,lon=points_field$lon)
)
str(points)
'data.frame':   34835 obs. of  2 variables:
 $ lat: num  0.02978 5.3731 4.76848 0.00634 6.40736 ...
 $ lon: num  -78.7 -75.9 -75.5 -78.7 -75.7 ...

Load the environmental data layers

[4]:
rCld   <- raster("./geodata/cloud/SA_meanannual_crop_msk.tif")
# compute min and max
rCld   =  setMinMax(rCld)
rCldIA <- raster("./geodata/cloud/SA_intra_crop_msk.tif")
rCldIA =  setMinMax(rCldIA)
rElv   <- raster("./geodata/dem/SA_elevation_mn_GMTED2010_mn_crop_msk.tif")
rElv   =  setMinMax(rElv)
rVeg   <- raster("./geodata/vegetation/SA_tree_mn_percentage_GFC2013_crop_msk.tif")
rVeg   =  setMinMax(rVeg)
rElv
Error in raster("./geodata/cloud/SA_meanannual_crop_msk.tif"): could not find function "raster"
Traceback:

Load expert range map

[6]:
birdrange <- readOGR("./geodata/shp", "cartodb-query")
OGR data source with driver: ESRI Shapefile
Source: "/media/sf_LVM_shared/my_SE_data/exercise/geodata/shp", layer: "cartodb-query"
with 2 features
It has 7 fields
[7]:
plot(rElv)
points(points$lon, points$lat, col = "red", cex = .3)
plot(birdrange,add=TRUE)
../_images/CASESTUDY_SDM1_MWood_Rmodel_15_0.png
[8]:
# indicate that these data are presences
presence <- matrix(1,nrow(points),1)
points <- cbind(points,presence)
[9]:
head(points)
latlonpresence
0.029785-78.682241
5.373100-75.891001
4.768477-75.452831
0.006336-78.676351
6.407356-75.664171
11.107720-74.048441
[10]:
# building spatial dataframe
coordinates(points)=c('lon','lat')
[11]:
str(points)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 34835 obs. of  1 variable:
  .. ..$ presence: num [1:34835] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ coords.nrs : int [1:2] 2 1
  ..@ coords     : num [1:34835, 1:2] -78.7 -75.9 -75.5 -78.7 -75.7 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "lon" "lat"
  ..@ bbox       : num [1:2, 1:2] -81.1 -18.8 -62.6 11.2
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "lon" "lat"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr NA
[12]:
# assign projection
projection(points) <- "+proj=longlat +datum=WGS84"

Loading eBird sampling dataset, in order to obtain “absence” data

[13]:
# link to global sampling raster
# first crop with the gdal and then load the cropversion
system("gdal_translate -projwin -82 14 -59 -21 -co COMPRESS=DEFLATE -co ZLEVEL=9 ./geodata/SDM/eBirdSampling_filtered.tif ./geodata/SDM/eBirdSampling_filtered_crop.tif")
gsampling <- raster("./geodata/SDM/eBirdSampling_filtered_crop.tif")
[14]:
# assign projection
projection(gsampling)="+proj=longlat +datum=WGS84"
gsampling
class      : RasterLayer
dimensions : 4200, 2760, 11592000  (nrow, ncol, ncell)
resolution : 0.008333333, 0.008333333  (x, y)
extent     : -82, -59, -20.99999, 14  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
source     : /media/sf_LVM_shared/my_SE_data/exercise/geodata/SDM/eBirdSampling_filtered_crop.tif
names      : eBirdSampling_filtered_crop
values     : 0, 65535  (min, max)

[15]:
# convert to points within data region
samplingp <- as(gsampling,"SpatialPointsDataFrame")
samplingp <- samplingp[samplingp$eBirdSampling_filtered_crop>0,]
str(samplingp)
head(samplingp)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 17420 obs. of  1 variable:
  .. ..$ eBirdSampling_filtered_crop: num [1:17420] 1 2 1 2 1 1 1 4 1 2 ...
  ..@ coords.nrs : num(0)
  ..@ coords     : num [1:17420, 1:2] -61 -61 -61 -61 -61 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "x" "y"
  ..@ bbox       : num [1:2, 1:2] -82 -21 -59 14
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "x" "y"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
eBirdSampling_filtered_crop
25181
25192
25201
25232
25251
25291
[16]:
 # edit column names
 colnames(samplingp@data) <- c("observation")
 samplingp$presence=0
 plot(samplingp, col="green",pch=21,cex=.5)#absences
 plot(points, col="red",add=TRUE)#presences
 plot(birdrange, col="cyan",add=TRUE)#species range
../_images/CASESTUDY_SDM1_MWood_Rmodel_25_0.png
[17]:
summary(samplingp)
Object of class SpatialPointsDataFrame
Coordinates:
        min       max
x -81.97917 -59.00417
y -20.99583  13.99584
Is projected: FALSE
proj4string :
[+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0]
Number of points: 17420
Data attributes:
  observation          presence
 Min.   :   1.000   Min.   :0
 1st Qu.:   1.000   1st Qu.:0
 Median :   2.000   Median :0
 Mean   :   5.935   Mean   :0
 3rd Qu.:   3.000   3rd Qu.:0
 Max.   :1412.000   Max.   :0

combine presence and non-presence point datasets

[18]:
pdata <- rbind(points[,"presence"],samplingp[,"presence"])
pdata@data[,c("lon","lat")] <- coordinates(pdata)
table(pdata$presence)

    0     1
17420 34835

Plot the environmental data layers

[19]:
env <- stack(c(rCld,rCldIA,rElv,rVeg))
env
class      : RasterStack
dimensions : 4200, 2760, 11592000, 4  (nrow, ncol, ncell, nlayers)
resolution : 0.008333333, 0.008333333  (x, y)
extent     : -82, -59, -21, 14  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
names      : SA_meanannual_crop_msk, SA_intra_crop_msk, SA_elevation_mn_GMTED2010_mn_crop_msk, SA_tree_mn_percentage_GFC2013_crop_msk
min values :                    859,                 0,                                   -72,                                      0
max values :                  10000,              3790,                                  6460,                                  10000

[20]:
# rename layers for convenience
vars <- c("cld","cld_ia","elev","forest")

names(env) <- vars

# visual result
options(repr.plot.width=15, repr.plot.height=9)
 # check out the plot
plot(env)
../_images/CASESTUDY_SDM1_MWood_Rmodel_31_0.png

Scaling and centering the environmental variables to zero mean and variance of 1

[21]:
senv <- scale(env[[vars]])
senv
# this operation is quite long. Would be possible to do in gdal? how?
class      : RasterBrick
dimensions : 4200, 2760, 11592000, 4  (nrow, ncol, ncell, nlayers)
resolution : 0.008333333, 0.008333333  (x, y)
extent     : -82, -59, -21, 14  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
source     : /tmp/Rtmpmz1dGG/raster/r_tmp_2021-04-20_190050_14811_30290.grd
names      :        cld,     cld_ia,       elev,     forest
min values : -4.5919892, -1.7513057, -0.6661006, -1.7261695
max values :  2.1937157,  4.5331428,  5.0445753,  0.7798705

[22]:
hist(env)
hist(senv)
Warning message in .hist1(raster(x, y[i]), maxpixels = maxpixels, main = main[y[i]], :
“1% of the raster cells were used. 100000 values used.”Warning message in .hist1(raster(x, y[i]), maxpixels = maxpixels, main = main[y[i]], :
“1% of the raster cells were used. 100000 values used.”Warning message in .hist1(raster(x, y[i]), maxpixels = maxpixels, main = main[y[i]], :
“1% of the raster cells were used. 100000 values used.”Warning message in .hist1(raster(x, y[i]), maxpixels = maxpixels, main = main[y[i]], :
“1% of the raster cells were used. 100000 values used.”Warning message in .hist1(raster(x, y[i]), maxpixels = maxpixels, main = main[y[i]], :
“1% of the raster cells were used. 100000 values used.”Warning message in .hist1(raster(x, y[i]), maxpixels = maxpixels, main = main[y[i]], :
“1% of the raster cells were used. 100000 values used.”Warning message in .hist1(raster(x, y[i]), maxpixels = maxpixels, main = main[y[i]], :
“1% of the raster cells were used. 100000 values used.”Warning message in .hist1(raster(x, y[i]), maxpixels = maxpixels, main = main[y[i]], :
“1% of the raster cells were used. 100000 values used.”
../_images/CASESTUDY_SDM1_MWood_Rmodel_34_1.png
../_images/CASESTUDY_SDM1_MWood_Rmodel_34_2.png

Annotate the point records with the scaled environmental data

[23]:
df.xact <- raster::extract(senv,pdata,sp=T)
[24]:
df.xact <- (df.xact[! is.na(df.xact$forest),])

Correlation plots

[25]:
## convert to 'long' format for easier plotting
df.xactl <- reshape::melt(df.xact@data,id.vars=c("lat","lon","presence"),variable.name="variable")
[26]:
head(df.xactl)
latlonpresencevariablevalue
0.029785-78.682241 cld 1.927959
5.373100-75.891001 cld 1.966560
4.768477-75.452831 cld 2.029659
0.006336-78.676351 cld 1.997739
6.407356-75.664171 cld 1.138112
11.107720-74.048441 cld 1.684472
[27]:
tail(df.xactl)
latlonpresencevariablevalue
207399-20.97083-70.137500 forest -1.726169
207400-20.97083-68.562500 forest -1.726169
207401-20.97916-70.154170 forest -1.726169
207402-20.98749-68.554170 forest -1.726169
207403-20.99583-70.137500 forest -1.726169
207404-20.99583-67.429170 forest -1.726169
[28]:
ggplot(df.xactl,aes(x=value,y=presence))+facet_wrap(~variable)+
  geom_point()+
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), col="red")+
  geom_smooth(method="gam",formula=y ~ s(x, bs = "cs")) +
  theme(text = element_text(size = 20))
../_images/CASESTUDY_SDM1_MWood_Rmodel_42_0.png

Model Fitting

cross validation

10731e0884464a2a9ad711f0eae4a66d

[29]:
df.xact <- as.data.frame(df.xact)
[30]:
df.xact$grp <- kfold(df.xact,2)
[31]:
head(df.xact)
presencelonlatcldcld_iaelevforestlon.1lat.1grp
1 -78.68224 0.029785 1.927959 -1.40474920.7799278 0.2916939-78.68224 0.029785 2
1 -75.89100 5.373100 1.966560 -1.54900960.9285523 0.4776421-75.89100 5.373100 2
1 -75.45283 4.768477 2.029659 -1.42796353.0433908 -1.4251941-75.45283 4.768477 2
1 -78.67635 0.006336 1.997739 -1.40972361.0081102 0.4074729-78.67635 0.006336 2
1 -75.66417 6.407356 1.138112 -0.87911061.6227160 -0.6916761-75.66417 6.407356 1
1 -74.04844 11.107720 1.684472 -1.06648331.5134332 -0.6926786-74.04844 11.107720 1
[32]:
mdl.glm <- glm(presence~cld+cld_ia*I(cld_ia^2)+elev*I(elev^2)+forest, family=binomial(link=logit), data=subset(df.xact,grp==1))
[33]:
summary(mdl.glm)

Call:
glm(formula = presence ~ cld + cld_ia * I(cld_ia^2) + elev *
    I(elev^2) + forest, family = binomial(link = logit), data = subset(df.xact,
    grp == 1))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.0094  -0.0254   0.2207   0.3529   3.7362

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -3.12534    0.09761 -32.020  < 2e-16 ***
cld                 1.07658    0.07419  14.512  < 2e-16 ***
cld_ia              0.36013    0.08632   4.172 3.02e-05 ***
I(cld_ia^2)         0.33685    0.03352  10.050  < 2e-16 ***
elev                5.89509    0.21499  27.420  < 2e-16 ***
I(elev^2)          -2.96605    0.18242 -16.259  < 2e-16 ***
forest              1.14142    0.04212  27.097  < 2e-16 ***
cld_ia:I(cld_ia^2) -0.18557    0.02972  -6.244 4.27e-10 ***
elev:I(elev^2)      0.34195    0.04548   7.519 5.50e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 32897  on 25925  degrees of freedom
Residual deviance: 11118  on 25917  degrees of freedom
AIC: 11136

Number of Fisher Scoring iterations: 8

Prediction

Calculate estimates of p(occurrence) for each cell. We can use the predict function in the raster package to make the predictions across the full raster grid and save the output.

[34]:
pred.glm1 <- predict(mdl.glm,df.xact[which(df.xact$grp==1),vars],type="response")
pred.glm2 <- predict(mdl.glm,df.xact[which(df.xact$grp==2),vars],type="response")
[35]:
plotROC(df.xact[which(df.xact$grp==1),"presence"],pred.glm1)
../_images/CASESTUDY_SDM1_MWood_Rmodel_52_0.png
[36]:
plotROC(df.xact[which(df.xact$grp==2),"presence"],pred.glm2)
../_images/CASESTUDY_SDM1_MWood_Rmodel_53_0.png

Out mapping

[37]:
p1 <- raster::predict(senv,mdl.glm,type="response")

Plot the results as a map:

[38]:
options(repr.plot.width=8, repr.plot.height=9)
gplot(p1)+geom_tile(aes(fill=value))+
  scale_fill_gradientn(
    colours=c("blue","green","yellow","orange","red"),
    na.value = "transparent")+
  geom_polygon(aes(x=long,y=lat,group=group),
               data=fortify(birdrange),fill="transparent",col="darkred")+
  geom_point(aes(x = lon, y = lat), data = subset(df.xact,presence==1),col="black",size=0.5)+
  coord_equal()
Regions defined for each Polygons
../_images/CASESTUDY_SDM1_MWood_Rmodel_57_1.png

Cross Validation

The library caret allow an easy implementation of the Cross Validation for several models.

[39]:
ctrl <- trainControl(method = "cv", number = 10)

mdl.glm.cv <- train(  as.factor(presence) ~cld+cld_ia*I(cld_ia^2)+elev*I(elev^2)+forest,
family=binomial(link=logit),  data = df.xact, method = "glm", trControl = ctrl, metric='Accuracy')
[40]:
mdl.glm.cv
Generalized Linear Model

51851 samples
    4 predictor
    2 classes: '0', '1'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 46665, 46666, 46666, 46667, 46667, 46666, ...
Resampling results:

  Accuracy   Kappa
  0.9235502  0.8210051

Predict in terms of presence/absence

[41]:
pred.mdl.glm.cv.raw <- raster::predict(senv,mdl.glm.cv ,type="raw")
[42]:
plot(pred.mdl.glm.cv.raw)
../_images/CASESTUDY_SDM1_MWood_Rmodel_63_0.png

Predict and ploting in terms of probability

[63]:
pred.mdl.glm.cv.prob1 <- raster::predict(senv,mdl.glm.cv ,type="prob" , index=1)
plot(pred.mdl.glm.cv.prob1)
../_images/CASESTUDY_SDM1_MWood_Rmodel_65_0.png
[64]:
pred.mdl.glm.cv.prob2 <- raster::predict(senv,mdl.glm.cv ,type="prob" , index=2)
plot(pred.mdl.glm.cv.prob2)
../_images/CASESTUDY_SDM1_MWood_Rmodel_66_0.png

References

Random Forest model

Random Forest is an ensemble learning method for classification, regression and other tasks that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean/average prediction (regression) of the individual trees (source https://en.wikipedia.org/wiki/Random_forest)

The randomForest packages (https://cran.r-project.org/web/packages/randomForest/index.html) allows to run reggression or classification case in R. (check also ranger: A Fast Implementation of Random Forests https://cran.r-project.org/web/packages/ranger/index.html )

[44]:
mdl.rf <- randomForest(as.factor(presence) ~ cld + cld_ia + elev +forest, data=subset(df.xact,grp==1))
[45]:
mdl.rf

Call:
 randomForest(formula = as.factor(presence) ~ cld + cld_ia + elev +      forest, data = subset(df.xact, grp == 1))
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 6.46%
Confusion matrix:
     0     1 class.error
0 7446  1119  0.13064799
1  557 16804  0.03208341

Predict as presence / absence

[46]:
pred.rf.resp  <- raster::predict(senv , mdl.rf , type='response')
[47]:
plot(pred.rf.resp)
../_images/CASESTUDY_SDM1_MWood_Rmodel_73_0.png

Predict as probability of presence / absence

[62]:
pred.rf.prob1 <- raster::predict(senv , mdl.rf , type='prob' , index=1)
pred.rf.prob0 <- raster::predict(senv , mdl.rf , type='prob' , index=2)
[65]:
plot(pred.rf.prob1)
plot(pred.rf.prob0)
../_images/CASESTUDY_SDM1_MWood_Rmodel_76_0.png
../_images/CASESTUDY_SDM1_MWood_Rmodel_76_1.png

Predict occurence presence

Create a raste with 10 km resoultion (0.083333333333)

[50]:
occ = raster(nrows=420, ncols=276, xmn=-82, xmx=-59, ymn=-21, ymx=14, crs="+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0" , vals=0)
[51]:
occ
class      : RasterLayer
dimensions : 420, 276, 115920  (nrow, ncol, ncell)
resolution : 0.08333333, 0.08333333  (x, y)
extent     : -82, -59, -21, 14  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
source     : memory
names      : layer
values     : 0, 0  (min, max)

[52]:
pointcount = function(r, pts){
  # make a raster of zeroes like the input
  r2 = r
  r2[] = 0
  # get the cell index for each point and make a table:
  counts = table(cellFromXY(r,pts))
  # fill in the raster with the counts from the cell index:
  r2[as.numeric(names(counts))] = counts
  return(r2)
 }

occ_raster <- pointcount(occ, points)
[53]:
plot(occ_raster)
../_images/CASESTUDY_SDM1_MWood_Rmodel_81_0.png
[54]:
occ_point = rasterToPoints(occ_raster , spatial=TRUE)
[55]:
str(occ_point)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 115920 obs. of  1 variable:
  .. ..$ layer: num [1:115920] 0 0 0 0 0 0 0 0 0 0 ...
  ..@ coords.nrs : num(0)
  ..@ coords     : num [1:115920, 1:2] -82 -81.9 -81.8 -81.7 -81.6 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "x" "y"
  ..@ bbox       : num [1:2, 1:2] -82 -21 -59 14
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "x" "y"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

Sampling the occurrence points by eliminating the extremes.

[56]:
occ_point_sel = subset(occ_point , (layer > 0 & layer < 100)  )
[57]:
hist(occ_point_sel@data$layer)
../_images/CASESTUDY_SDM1_MWood_Rmodel_86_0.png
[58]:
occ_point_sel_predictor <- raster::extract(senv,occ_point_sel,sp=T)
str(occ_point_sel_predictor)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 1098 obs. of  5 variables:
  .. ..$ layer : num [1:1098] 1 3 1 80 12 1 1 1 1 1 ...
  .. ..$ cld   : num [1:1098] -0.862 -0.801 -0.508 1.267 1.043 ...
  .. ..$ cld_ia: num [1:1098] 2.7755 2.6462 -0.0699 0.0876 0.3247 ...
  .. ..$ elev  : num [1:1098] -0.4921 -0.5087 -0.493 0.0657 0.4634 ...
  .. ..$ forest: num [1:1098] -1.4159 0.0819 -1.4743 0.5834 0.2877 ...
  ..@ coords.nrs : num(0)
  ..@ coords     : num [1:1098, 1:2] -74.2 -74.1 -72.4 -74.1 -74 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "x" "y"
  ..@ bbox       : num [1:2, 1:2] -81.1 -18.8 -62.6 11.2
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "x" "y"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
[59]:
mdl.rf.occ <- randomForest( layer ~ cld + cld_ia + elev + forest , data=occ_point_sel_predictor )
mdl.rf.occ

Call:
 randomForest(formula = layer ~ cld + cld_ia + elev + forest,      data = occ_point_sel_predictor)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 1

          Mean of squared residuals: 211.6602
                    % Var explained: 0.85
[60]:
pred.rf.occ <- raster::predict(senv , mdl.rf.occ , type='response')
[61]:
plot(pred.rf.occ)
../_images/CASESTUDY_SDM1_MWood_Rmodel_90_0.png