Spatial Autocorrelation

April 18, 2018
Spatial autocorrelation Data Seminar

Load packages

What is autocorrelation (spatial or temporal)?

Autocorrelation is the measure of similarity (correlation) between nearby observations (in both time and space)

Temporal

The most recent event is related to the event before and after it

set.seed(1234)

randDat <- data.frame(x = sample(1:1000, 20))

randDat %>% 
  mutate(x_1 = lead(x)) %>% 
  filter(!is.na(x_1)) -> randDat


ggplot(data = randDat) + 
  geom_point(aes(x = x, y = x_1), size = 2) +
  labs(y = "x + 1") +
  theme_classic()

cor(randDat$x, randDat$x_1)
## [1] 0.112907
cor.test(randDat$x, randDat$x_1)
## 
##  Pearson's product-moment correlation
## 
## data:  randDat$x and randDat$x_1
## t = 0.46852, df = 17, p-value = 0.6454
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3597515  0.5394514
## sample estimates:
##      cor 
## 0.112907
 randDat %>% 
  arrange(x) %>% 
  mutate(x_1a = lead(x)) %>% 
  filter(!is.na(x_1a)) -> randDat

ggplot(data = randDat) + 
  geom_point(aes(x = x, y = x_1a), size = 2) +
  labs(y = "x + 1") +
  theme_classic()

cor(randDat$x, randDat$x_1a)
## [1] 0.9794674
cor.test(randDat$x, randDat$x_1a)
## 
##  Pearson's product-moment correlation
## 
## data:  randDat$x and randDat$x_1a
## t = 19.434, df = 16, p-value = 1.489e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9445037 0.9924884
## sample estimates:
##       cor 
## 0.9794674
acf(randDat$x_1)

Spatial

Spatial autocorrelation is an extension of temporal autocorrelation. Proximal points in space are more similar than distant points.

Measures of spatial autocorrelation describe the degree that two observations (whether they are points, areas, or raster cells) are similar to each other.

Spatial autocorrelation in a variable can be exogenous ( caused by another spatially autocorrelated variable like rainfall) or endogenous (caused by the process at play, like spread of disease)

p <- shapefile(system.file("external/lux.shp", package="raster"))

p@data$id <- rownames(p@data)
p.points <- ggplot2::fortify(p, region = "id")
head(p.points)
##       long      lat order  hole piece id group
## 1 6.026519 50.17767     1 FALSE     1  0   0.1
## 2 6.031361 50.16563     2 FALSE     1  0   0.1
## 3 6.035646 50.16410     3 FALSE     1  0   0.1
## 4 6.042747 50.16157     4 FALSE     1  0   0.1
## 5 6.043894 50.16116     5 FALSE     1  0   0.1
## 6 6.048243 50.16008     6 FALSE     1  0   0.1
p.df <- full_join(p.points, p@data,by = "id")

ggplot(data = p.df, aes(x = long, y = lat, group = group, fill = NAME_1)) +
  geom_polygon() +
  geom_path(color = "white") + 
  coord_equal() +
  scale_fill_brewer(palette = "Set1") + 
  theme_minimal() + 
  theme(legend.position = "bottom")

Diekirch <- p[p$NAME_1 == "Diekirch",]
Diekirch$value <- c(10,6,4,11,6)
data.frame(Diekirch)
##   ID_1   NAME_1 ID_2   NAME_2 AREA id value
## 0    1 Diekirch    1 Clervaux  312  0    10
## 1    1 Diekirch    2 Diekirch  218  1     6
## 2    1 Diekirch    3  Redange  259  2     4
## 3    1 Diekirch    4  Vianden   76  3    11
## 4    1 Diekirch    5    Wiltz  263  4     6
Diekirch.points <- ggplot2::fortify(Diekirch, region = "id")
Diekirch.df <- full_join(Diekirch.points, Diekirch@data,by = "id")


ggplot(data = Diekirch.df, aes(x = long, y = lat, group = group, fill = value)) +
  geom_polygon() +
  geom_path(color = "white") + 
  coord_equal() +
  scale_fill_viridis() + 
  theme_minimal() + 
  theme(legend.position = "bottom")

Adjacent polygons

w <- spdep::poly2nb(Diekirch)
summary(w)
## Neighbour list object:
## Number of regions: 5 
## Number of nonzero links: 14 
## Percentage nonzero weights: 56 
## Average number of links: 2.8 
## Link number distribution:
## 
## 2 3 4 
## 2 2 1 
## 2 least connected regions:
## 2 3 with 2 links
## 1 most connected region:
## 1 with 4 links

Compute Morans I

Index of spatial autocorrelation

ww <- spdep::nb2listw(w,style = "B")
ww
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 5 
## Number of nonzero links: 14 
## Percentage nonzero weights: 56 
## Average number of links: 2.8 
## 
## Weights style: B 
## Weights constants summary:
##   n nn S0 S1  S2
## B 5 25 14 28 168
spdep::moran(Diekirch$value, ww, n= length(ww$neighbours), S0 = spdep::Szero(ww))
## $I
## [1] 0.1728896
## 
## $K
## [1] 1.432464
moran.test(Diekirch$value, ww, randomisation = FALSE)
## 
##  Moran I test under normality
## 
## data:  Diekirch$value  
## weights: ww    
## 
## Moran I statistic standard deviate = 2.3372, p-value = 0.009714
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##         0.1728896        -0.2500000         0.0327381
moran.mc(Diekirch$value, ww, nsim =99)
## 
##  Monte-Carlo simulation of Moran I
## 
## data:  Diekirch$value 
## weights: ww  
## number of simulations + 1: 100 
## 
## statistic = 0.17289, observed rank = 99, p-value = 0.01
## alternative hypothesis: greater
spdata <- read.table("https://stats.idre.ucla.edu/stat/r/faq/thick.csv", header = TRUE, sep = ",")

ggplot(data = spdata) + 
  geom_point(aes(x = east, y = north, color = thick), size = 2) + 
  theme_classic()

coords <- as.matrix(spdata[,c("east","north")])

thicknb<-knn2nb(knearneigh(coords, k = 5), row.names = spdata$X)

moran.test(spdata$thick, nb2listw(thicknb))
## 
##  Moran I test under randomisation
## 
## data:  spdata$thick  
## weights: nb2listw(thicknb)    
## 
## Moran I statistic standard deviate = 12.782, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.834666523      -0.013513514       0.004402973
cGram<-ncf::spline.correlog(x = spdata$east,
                           y = spdata$north,
                           z = spdata$thick,
                           resamp = 100,
                           quiet = TRUE)
## 10  of  100 
20  of  100 
30  of  100 
40  of  100 
50  of  100 
60  of  100 
70  of  100 
80  of  100 
90  of  100 
100  of  100 
summary(cGram)
## $call
## [1] "ncf::spline.correlog(x = spdata$east, y = spdata$north, z = spdata$thick, "
## [2] "    resamp = 100, quiet = TRUE)"                                           
## 
## $estimate
##                 x        e        y
## estimate 32.68189 23.12038 1.143257
## 
## $quantiles
##              x         e         y
## 0     27.14400  8.823383 0.4831635
## 0.025 27.99162 11.985492 0.5687580
## 0.25  31.22288 18.995407 0.9712692
## 0.5   32.70983 23.565976 1.0954065
## 0.75  34.37152 25.759070 1.3585538
## 0.975 38.57292 30.018873 1.7975420
## 1     38.76838 30.685757 2.2687780
plot(cGram)

mod_1 <- lme(fixed = thick ~ soil,
             random = ~ 1|dummy,
             method = "ML",
             data = spdata)
summary(mod_1)
## Linear mixed-effects model fit by maximum likelihood
##  Data: spdata 
##        AIC      BIC    logLik
##   342.3182 351.5881 -167.1591
## 
## Random effects:
##  Formula: ~1 | dummy
##           (Intercept) Residual
## StdDev: 0.00004815352 2.247569
## 
## Fixed effects: thick ~ soil 
##                Value Std.Error DF   t-value p-value
## (Intercept) 31.94203 3.1569891 73 10.117878  0.0000
## soil         2.25521 0.8655887 73  2.605407  0.0111
##  Correlation: 
##      (Intr)
## soil -0.997
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.68798974 -0.53279498  0.03896491  0.66007203  2.20612991 
## 
## Number of Observations: 75
## Number of Groups: 1
mod_2 <- lme(fixed = thick ~ soil,
             random = ~ 1|dummy,
             correlation = corGaus(1, form = ~ east + north),
             method = "ML",
             data = spdata)
summary(mod_2)
## Linear mixed-effects model fit by maximum likelihood
##  Data: spdata 
##        AIC      BIC    logLik
##   91.50733 103.0948 -40.75366
## 
## Random effects:
##  Formula: ~1 | dummy
##           (Intercept) Residual
## StdDev: 0.00008155645 2.088385
## 
## Correlation Structure: Gaussian spatial correlation
##  Formula: ~east + north | dummy 
##  Parameter estimate(s):
##    range 
## 20.43725 
## Fixed effects: thick ~ soil 
##                Value Std.Error DF  t-value p-value
## (Intercept) 40.32797 0.5877688 73 68.61196  0.0000
## soil         0.00348 0.0160363 73  0.21693  0.8289
##  Correlation: 
##      (Intr)
## soil -0.102
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.9882502 -0.7133769 -0.1146244  0.6745689  2.0877371 
## 
## Number of Observations: 75
## Number of Groups: 1
bbmle::AICtab(mod_1, mod_2)
##       dAIC  df
## mod_2   0.0 5 
## mod_1 250.8 4

Time Series Analysis

April 27, 2018
Time Series Data Seminar

Dose Response

April 27, 2018
Dose Response Data Seminar

Partial Least Squares Regression

April 13, 2018
Refelectance Partial Least Squares Regression Data Seminar
comments powered by Disqus