This note is a high-fidelity Markdown migration of the Dependent Data: Time Series and Spatial Statistics chapter from the LaTeX source.

Parent map: index
Prerequisites: probability-and-mathstats, linear-regression, maximum-likelihood-and-machine-learning

Concept map

```mermaid
flowchart TD
  A[Dependent Data] --> B[Time Series]
  B --> C[Stationarity]
  C --> D[Ergodicity]
  B --> E[AR MA ARMA]
  B --> F[Unit Root]
  F --> G[Cointegration]
  B --> H[HAC Inference]
  A --> I[Spatial Statistics]
  I --> J[Kriging]
  I --> K[Spatial Autocorrelation]
  K --> L[Variogram]
  I --> M[Spatial Regression]
  I --> N[GMRF GP CAR]
```

Dependent Data: Time series and spatial statistics

Time Series

A time series is a sequence of data points observed over time. In a random sample, points are iid, so the joint distribution factorises as $f(y_1, \dots, y_n) = \prod_{i=1}^n f(y_i)$. In time series, this is clearly violated, since observations that are temporally close to each other tend to be more similar.

A stochastic process is a sequence of random variables $\{Y_t\}_{t \in \mathcal{T}}$ indexed by elements $t$ in a set of indices $\mathcal{T}$. Hypothetical repeated realisations of a stochastic process trace out different sample paths $\{y_t^{(1)}\}, \{y_t^{(2)}\}, \dots$ over the same index set.

The index set $\mathcal{T}$ may be either countable, in which case we get a discrete-time process, or uncountable, in which case we get a continuous-time process.

State Space

We assume each $Y_t$ takes values in a set $\mathcal{S}$. Then $\mathcal{S}$ is called the state space of the stochastic process.

Consider a random process $\{Y_t\}$ and an increasing sequence of information sets $\{\mathcal{F}_t\}$, i.e. a collection of $\sigma$-fields s.t. $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots$. If $Y_t$ belongs to the information set $\mathcal{F}_t$, is absolutely integrable [i.e. $E|Y_t| < \infty$], and $E[Y_{t+1} \mid \mathcal{F}_t] = Y_t$, then $\{Y_t\}$ is called a martingale. In words, the conditional expected value of the next observation, given all the past observations, is equal to the most recent observation.

The autocovariance of $Y_t$ at lag $j$ is the covariance between $Y_t$ and its lagged value:

$$\gamma_j = \mathrm{Cov}(Y_t, Y_{t-j}).$$

The variance-covariance matrix of $(Y_1, \dots, Y_T)$ has Toeplitz form:

$$\begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{T-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{T-2} \\ \vdots & & \ddots & \vdots \\ \gamma_{T-1} & \gamma_{T-2} & \cdots & \gamma_0 \end{pmatrix}$$

The $j$-th order correlation coefficient is $\rho_j = \gamma_j / \gamma_0$.

A random process is said to be (strictly) stationary if the joint distribution functions of $(Y_{t_1}, \dots, Y_{t_k})$ and $(Y_{t_1 + h}, \dots, Y_{t_k + h})$ are the same for every $k$, every set of indices $t_1, \dots, t_k$, and every shift $h$.

A process is said to be covariance (or weakly) stationary if

$$E[Y_t] = \mu \quad \text{and} \quad \mathrm{Cov}(Y_t, Y_{t-j}) = \gamma_j \quad \text{for all } t, j,$$

i.e. neither the mean nor the autocovariances depend on the date $t$: stationary expectation, variance, and covariance. Most relevant variables aren’t stationary, but their detrended or first-differenced versions may be.
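As a minimal pure-Python sketch (the AR(1) simulation and its parameter 0.5 are illustrative assumptions, not from the text), the sample autocovariance $\hat\gamma_j = \frac{1}{T}\sum_{t=j+1}^{T}(y_t - \bar y)(y_{t-j} - \bar y)$ can be checked against the theoretical values of a covariance-stationary process:

```python
import random

def sample_autocov(y, j):
    """Sample autocovariance at lag j: (1/T) * sum_t (y_t - ybar)(y_{t-j} - ybar)."""
    T = len(y)
    ybar = sum(y) / T
    return sum((y[t] - ybar) * (y[t - j] - ybar) for t in range(j, T)) / T

# Simulate a covariance-stationary AR(1), y_t = 0.5 y_{t-1} + eps_t, eps ~ N(0, 1);
# theory gives gamma_0 = 1/(1 - 0.5^2) ~ 1.33 and gamma_1 = 0.5 * gamma_0 ~ 0.67.
random.seed(0)
y = [0.0]
for _ in range(50_000):
    y.append(0.5 * y[-1] + random.gauss(0, 1))

g0, g1 = sample_autocov(y, 0), sample_autocov(y, 1)
print(round(g0, 2), round(g1, 2))
```

The estimates do not depend on where in the sample they are computed, which is exactly the weak-stationarity property.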

If $\{Y_t\}$ is a Markov process,

$$P(Y_{t+1} \le y \mid Y_t, Y_{t-1}, \dots) = P(Y_{t+1} \le y \mid Y_t),$$

that is, the conditional distribution of $Y_{t+1}$ given the past depends only on $Y_t$ and does not depend on $Y_{t-1}, Y_{t-2}, \dots$.

Markov Chain

A Markov chain is simply a Markov process in which the state space is a countable set. Since a Markov chain is a Markov process, the conditional distribution of $Y_{t+1}$ depends only on $Y_t$. The conditional distribution is often represented by a transition matrix $P$ where

$$P_{ij} = \Pr(Y_{t+1} = j \mid Y_t = i).$$

If $P$ is the same for all $t$, we say the Markov chain has stationary transition probabilities.
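A small sketch of a two-state chain with stationary transition probabilities (the particular matrix is a made-up example): iterating $\pi_{t+1} = \pi_t P$ converges to the stationary distribution solving $\pi = \pi P$.

```python
# Two-state Markov chain with stationary transition probabilities:
# P[i][j] = Pr(Y_{t+1} = j | Y_t = i); rows sum to one.
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Iterate pi_{t+1} = pi_t P; the limit is the stationary distribution pi = pi P.
pi = [1.0, 0.0]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

print([round(p, 4) for p in pi])  # → [0.8333, 0.1667], i.e. (5/6, 1/6)
```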

A stationary process is ergodic if any two variables positioned far apart in the sequence are almost independently distributed.

$\{Y_t\}$ is ergodic if, for any two bounded functions $f$ in $k+1$ variables and $g$ in $l+1$ variables,

$$\lim_{n \to \infty} \left| E\left[ f(Y_t, \dots, Y_{t+k})\, g(Y_{t+n}, \dots, Y_{t+n+l}) \right] \right| = \left| E\left[ f(Y_t, \dots, Y_{t+k}) \right] \right| \left| E\left[ g(Y_{t+n}, \dots, Y_{t+n+l}) \right] \right|,$$

i.e. the dependence between events far apart in time vanishes.

A sufficient condition for ergodicity is that $\{Y_t\}$ be covariance stationary and

$$\sum_{j=0}^{\infty} |\gamma_j| < \infty.$$

Ergodic processes have the following property:

$$\bar{Y}_T = \frac{1}{T} \sum_{t=1}^{T} Y_t \xrightarrow{p} E[Y_t];$$

this result implies that time averages converge to population moments. This permits us to swap $E$s for sample means and derive asymptotic theory with dependent observations, such as LLNs and CLTs.

A family of r.v.s $\{W(t)\}$ indexed by a continuous variable $t$ over $[0, \infty)$ is a Brownian motion iff

  • $W(0) = 0$,
  • increments over an arbitrary collection of disjoint intervals are independent r.v.s, with $W(t) - W(s) \sim N(0, t - s)$ for $s < t$,
  • sample paths are continuous (a.s.).

White noise is a sequence $\{\varepsilon_t\}$ whose elements have mean zero and variance $\sigma^2$, and for which the $\varepsilon_t$’s are uncorrelated over time: $E[\varepsilon_t \varepsilon_s] = 0$ for $t \ne s$.

A moving average of order $q$, MA($q$), is a weighted average of the $q$ most recent values of a white noise, defined as

$$Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}.$$

An autoregressive process of order $p$, AR($p$), gives $Y_t$ as a linear combination of $p$ lags of itself and one white noise term:

$$Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \varepsilon_t.$$

ARMA($p, q$) combines AR($p$) and MA($q$):

$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}.$$

Consider AR(1): $Y_t = \phi Y_{t-1} + \varepsilon_t$. Since this holds at $t$, it holds at $t-1$: $Y_{t-1} = \phi Y_{t-2} + \varepsilon_{t-1}$. Substitute into the original to get $Y_t = \phi^2 Y_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t$. Repeat ad infinitum to obtain, as long as $|\phi| < 1$,

$$Y_t = \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}.$$

In other words, AR(1) $=$ MA($\infty$); they are different representations of the same underlying stochastic process.
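The equivalence is easy to verify numerically: running the AR(1) recursion from zero and summing the truncated MA($\infty$) weights over the same shocks gives the same value (the shocks and $\phi = 0.7$ below are illustrative assumptions).

```python
import random

random.seed(1)
phi = 0.7
eps = [random.gauss(0, 1) for _ in range(500)]

# AR(1) recursion y_t = phi * y_{t-1} + eps_t, started at y_0 = 0 in the distant "past"
y = 0.0
for e in eps:
    y = phi * y + e

# Truncated MA(infinity) on the same shocks: y_t = sum_j phi^j eps_{t-j}
y_ma = sum(phi ** j * e for j, e in enumerate(reversed(eps)))

print(abs(y - y_ma))  # ~0: the two representations coincide
```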

Wold representation: all covariance-stationary time series processes can be represented by / decomposed into a deterministic component and an MA($\infty$) component:

$$Y_t = \mu_t + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}, \qquad \sum_{j} \psi_j^2 < \infty.$$

In a stationary process, $E[Y_t] = \mu$ for all $t$, which is seldom true in practice. A less restrictive assumption that allows for nonstationarity is to specify the mean as a function of time, $E[Y_t] = \mu(t)$.

A random walk is a process such that $Y_t = Y_{t-1} + \varepsilon_t$.

A random walk is an AR(1) process with a unit root ($\phi = 1$). Rewrite it as

$$\Delta Y_t = Y_t - Y_{t-1} = \varepsilon_t.$$

A random walk with drift adds a constant: $Y_t = c + Y_{t-1} + \varepsilon_t$.

For the following reparameterisation of the AR(1) model,

$$\Delta Y_t = \delta Y_{t-1} + \varepsilon_t, \qquad \delta = \phi - 1,$$

test $H_0 : \delta = 0$ (a unit root). The distribution of $\hat\delta$ under the null is non-standard: the CLT is not valid. Tests to use: Dickey-Fuller, Augmented Dickey-Fuller, Phillips-Perron.
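A sketch of the regression behind these tests (the simulated series and sample size are illustrative assumptions; a real application would use proper Dickey-Fuller critical values rather than eyeballing $\hat\delta$):

```python
import random

def df_slope(y):
    """OLS slope delta in: Delta y_t = delta * y_{t-1} + e_t (no constant)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ylag = y[:-1]
    return sum(a * b for a, b in zip(ylag, dy)) / sum(a * a for a in ylag)

random.seed(2)
rw, ar = [0.0], [0.0]
for _ in range(5000):
    rw.append(rw[-1] + random.gauss(0, 1))        # unit root: true delta = 0
    ar.append(0.8 * ar[-1] + random.gauss(0, 1))  # stationary: true delta = -0.2

print(round(df_slope(rw), 3), round(df_slope(ar), 3))
```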

Let $Y_t, X_t \sim I(1)$. $Y_t$ and $X_t$ are said to be cointegrated if there exists $\beta$ such that $Y_t - \beta X_t \sim I(0)$. For example, let

$$X_t = X_{t-1} + \varepsilon_t, \qquad Y_t = X_t + u_t,$$

where $u_t$ is white noise. Then $Y_t \sim I(1)$, but $Y_t - X_t = u_t \sim I(0)$, with cointegration vector $(1, -1)$.

The Hodrick-Prescott filter decomposes an observed time series $y_t$ into a trend $\tau_t$ and a stationary component so that the trend minimises

$$\sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \left[ (\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1}) \right]^2.$$

$\lambda$ is a tuning parameter. In quarterly data, $\lambda = 1600$.

Regression with time series

The basic assumption in conventional OLS with time series is strict exogeneity: $E[\varepsilon_t \mid X] = 0$. Equivalently, $E[x_s \varepsilon_t] = 0$ for all $s, t$. The second classical assumption is spherical errors: $E[\varepsilon \varepsilon' \mid X] = \sigma^2 I$.

$E[\varepsilon_t \varepsilon_s] \ne 0$ for $t \ne s$ is called autocorrelation. Fix: the Newey-West HAC consistent variance estimator, whose ‘meat’ is

$$\hat{S} = \hat{\Gamma}_0 + \sum_{j=1}^{L} \left( 1 - \frac{j}{L+1} \right) \left( \hat{\Gamma}_j + \hat{\Gamma}_j' \right), \qquad \hat{\Gamma}_j = \frac{1}{T} \sum_{t=j+1}^{T} x_t \hat{\varepsilon}_t \hat{\varepsilon}_{t-j} x_{t-j}',$$

with the ‘bread’ of the sandwich variance estimated the normal way, giving $\widehat{\mathrm{Var}}(\hat\beta) = T (X'X)^{-1} \hat{S} (X'X)^{-1}$.
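To make the Bartlett weighting concrete, here is a minimal pure-Python sketch of the scalar long-run variance (the ‘meat’ for a sample mean); the AR(1) error process and the lag choice $L = 50$ are illustrative assumptions, not part of the text.

```python
import random

def bartlett_lrv(u, L):
    """Newey-West long-run variance of a scalar series:
    gamma_0 + 2 * sum_{j=1}^{L} (1 - j/(L+1)) * gamma_j."""
    T = len(u)
    ubar = sum(u) / T
    d = [x - ubar for x in u]
    gamma = lambda j: sum(d[t] * d[t - j] for t in range(j, T)) / T
    return gamma(0) + 2 * sum((1 - j / (L + 1)) * gamma(j) for j in range(1, L + 1))

random.seed(3)
# AR(1) errors with phi = 0.5: the true long-run variance is
# gamma_0 * (1 + phi) / (1 - phi) = (4/3) * 3 = 4.
u = [0.0]
for _ in range(100_000):
    u.append(0.5 * u[-1] + random.gauss(0, 1))

lrv = bartlett_lrv(u, 50)
print(round(lrv, 2))  # close to 4
```

The naive iid variance estimate (here $\gamma_0 \approx 4/3$) would understate the sampling variance of the mean by a factor of three.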

Consider the model

$$y_t = \alpha + \phi y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t.$$

Subtracting $y_{t-1}$ from both sides and adding and subtracting $\beta_0 x_{t-1}$ on the r.h.s., we get the error-correction form

$$\Delta y_t = \alpha + \beta_0 \Delta x_t - (1 - \phi)\left( y_{t-1} - \theta x_{t-1} \right) + \varepsilon_t,$$

where $\theta = \dfrac{\beta_0 + \beta_1}{1 - \phi}$ is the long-run effect of $x$ on $y$.

A Quandt likelihood ratio (QLR) test begins with no knowledge of when the trend break occurs [although researchers typically know the timing for substantive reasons], and sequentially estimates the following model:

$$\Delta y_t = \beta_0 + \beta_1 D_t(\tau) + \varepsilon_t,$$

where $\Delta y_t$ is the first difference of the outcome, and $D_t(\tau)$ is an indicator variable equal to zero for all years before $\tau$ and one for all subsequent years. The researcher varies $\tau$ and tests the null that $\beta_1 = 0$, and the largest F-statistic is used to determine the best possible break point. Use Andrews (2003) critical values to account for multiple testing.
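A sketch of the break-scanning step (simulated data, break date, and 15% trimming are illustrative assumptions; the maximal F must be compared against Andrews critical values, not the usual F table):

```python
import random

def break_fstat(dy, tau):
    """F-statistic for H0: beta_1 = 0 in  dy_t = beta_0 + beta_1 * 1{t >= tau} + e_t.
    With one restriction this equals the squared t-statistic for a mean shift."""
    pre, post = dy[:tau], dy[tau:]
    n1, n2 = len(pre), len(post)
    m1, m2 = sum(pre) / n1, sum(post) / n2
    s2 = (sum((x - m1) ** 2 for x in pre) + sum((x - m2) ** 2 for x in post)) / (n1 + n2 - 2)
    return (m2 - m1) ** 2 / (s2 * (1 / n1 + 1 / n2))

rng = random.Random(7)
T = 200
# First differences with a mean shift of one s.d. at t = 120
dy = [(0.0 if t < 120 else 1.0) + rng.gauss(0, 1) for t in range(T)]

# QLR: scan interior candidate breaks (15% trimming) and keep the largest F
fmax, tau_hat = max((break_fstat(dy, tau), tau) for tau in range(30, 170))
print(tau_hat, round(fmax, 1))
```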

Spatial Statistics

A spatial stochastic process is a collection of random variables indexed by location $s$: $\{Y(s) : s \in D\}$, where $D$ is either a continuous surface or a finite set of discrete locations.

For each location $s$, $Y(s)$ is a random variable, and thus needs to be modeled. The basic approach is to assume the first two moments exist, and decompose

$$Y(s) = \mu(s) + e(s)$$

into a mean function $\mu(s)$ and a stochastic error process $e(s)$ with $E[e(s)] = 0$.

Kriging - modeling

Main reference: Christensen (2019, ch. 8).

Assume a linear structure for $\mu(s)$: with $x_1(s), \dots, x_p(s)$ known functions of $s$,

$$\mu(s) = \sum_{j=1}^{p} \beta_j x_j(s) = x(s)'\beta.$$

This is the universal kriging model. A special case of this is the ordinary kriging model, where

$$\mu(s) = \mu$$

for unknown $\mu$. The most basic model is simple kriging, where

$$\mu(s) = \mu$$

with $\mu$ known.

Assume the universal kriging model holds, we have data on locations $s_1, \dots, s_n$, and we wish to predict the value of $y_0 = Y(s_0)$. The model can be written

$$Y = X\beta + e, \qquad E[e] = 0, \quad \mathrm{Cov}(e) = \Sigma.$$

Let

$$\sigma_0 = \mathrm{Cov}(e, e_0), \qquad \hat\beta = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1} Y.$$

The best linear unbiased predictor of $y_0$ is

$$\hat{y}_0 = x_0'\hat\beta + \sigma_0'\Sigma^{-1}(Y - X\hat\beta),$$

where $x_0 = x(s_0)$ and $\hat\beta$ is the GLS estimator above.
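A toy sketch of the predictor in the simple-kriging case (known mean 0, so $\hat y_0 = \sigma_0'\Sigma^{-1}Y$); the three sites, values, and exponential covariance $C(d) = e^{-d}$ are invented for illustration:

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for small systems A x = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Simple kriging (known mean 0) with an exponential covariance C(d) = exp(-d)
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
y = [1.2, 0.4, 0.8]
s0 = (0.2, 0.2)  # prediction location

dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
Sigma = [[math.exp(-dist(a, b)) for b in sites] for a in sites]
sigma0 = [math.exp(-dist(a, s0)) for a in sites]

# BLUP: y0_hat = sigma0' Sigma^{-1} Y
w = solve(Sigma, sigma0)  # kriging weights
y0 = sum(wi * yi for wi, yi in zip(w, y))
print(round(y0, 3))  # dominated by the nearest site, (0, 0)
```

Note that predicting at one of the data sites returns the observed value exactly: kriging interpolates the data.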

Spatial Autocorrelation: Modelling

Spatial autocorrelation is expressed as

$$\mathrm{Cov}\left( Y(s_i), Y(s_j) \right) \ne 0 \quad \text{for } i \ne j.$$

Covariance is often modelled in terms of an unknown parameter $\theta$, in which case we write $\Sigma(\theta)$. Assumptions made about the covariance include:

  • second-order stationarity,
  • strict stationarity,
  • intrinsic stationarity,
  • increment stationarity,
  • isotropy.

Covariance functions can be modelled in three basic ways:

  • specify a functional form for the stochastic process generating , and derive covariance from that process,
  • model covariance directly as a function of a small number of parameters,
  • leave covariance unspecified and estimate nonparametrically.

A process is strictly stationary if for all $n$, locations $s_1, \dots, s_n$, Borel sets $B_1, \dots, B_n$, and shifts $h$,

$$P\left( Y(s_1) \in B_1, \dots, Y(s_n) \in B_n \right) = P\left( Y(s_1 + h) \in B_1, \dots, Y(s_n + h) \in B_n \right).$$

This implies translation invariance of the joint law. In particular:

$$E[Y(s)] = \mu \quad \text{and} \quad \mathrm{Cov}\left( Y(s), Y(s + h) \right) = C(h) \quad \text{for all } s.$$

If, in addition, the finite-dimensional distributions are multivariate Gaussian, the process is a Gaussian process.

Second-order (weak) stationarity imposes the same constant mean and a covariance depending only on the separation $h$, but does not require full strict stationarity.

Increment-stationarity requires invariant increment laws:

$$Y(s + h) - Y(s) \overset{d}{=} Y(s' + h) - Y(s') \quad \text{for all } s, s', h.$$

Brownian motion is increment-stationary but not strictly stationary.

For increment-stationary processes, the semivariogram is

$$\gamma(h) = \frac{1}{2} \mathrm{Var}\left( Y(s + h) - Y(s) \right).$$

The variogram is $2\gamma(h)$. Under increment-stationarity, $\gamma(h)$ does not depend on $s$:

$$\frac{1}{2} \mathrm{Var}\left( Y(s + h) - Y(s) \right) = \frac{1}{2} \mathrm{Var}\left( Y(s' + h) - Y(s') \right) \quad \text{for all } s, s'.$$

An intrinsically stationary process satisfies the constant-mean restriction and this semivariogram invariance condition. All second-order stationary processes are intrinsically stationary, but not vice versa.

For a linear model, stipulate a nonnegative definite weighting matrix and fit

$$Y = X\beta + e$$

to obtain residuals $\hat e_i$. For any vector $h$, there is a finite number $N(h)$ of pairs of observations for which $s_i - s_j = h$. For each of these pairs, list the corresponding residual pairs $(\hat e_i, \hat e_j)$. If $N(h) > 0$, the traditional empirical covariance estimator is

$$\hat{C}(h) = \frac{1}{N(h)} \sum_{s_i - s_j = h} \hat e_i \hat e_j.$$

The traditional empirical semivariogram estimator in ordinary kriging (no covariates) is

$$\hat{\gamma}(h) = \frac{1}{2 N(h)} \sum_{s_i - s_j = h} \left( y_i - y_j \right)^2.$$
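A pure-Python sketch of the binned empirical semivariogram (the simulated sites, the smooth-signal-plus-noise data, and the distance bins are illustrative assumptions):

```python
import math, random

def empirical_semivariogram(sites, y, bins):
    """gamma_hat(h) = (1 / (2 |N(h)|)) * sum over pairs with distance in the bin of (y_i - y_j)^2."""
    sums, counts = [0.0] * len(bins), [0] * len(bins)
    n = len(sites)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(sites[i], sites[j])
            for k, (lo, hi) in enumerate(bins):
                if lo <= d < hi:
                    sums[k] += (y[i] - y[j]) ** 2
                    counts[k] += 1
                    break
    return [s / (2 * c) if c else float("nan") for s, c in zip(sums, counts)]

random.seed(4)
sites = [(random.random(), random.random()) for _ in range(300)]
# Spatially smooth signal plus small noise: nearby sites take similar values
y = [sx + sy + random.gauss(0, 0.1) for sx, sy in sites]

bins = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6)]
gam = empirical_semivariogram(sites, y, bins)
print([round(g, 3) for g in gam])  # increases with distance
```

Binning over $\lVert h \rVert$ implicitly assumes isotropy, as defined next.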

A second-order stationary process is said to be isotropic if

$$C(h) = C_0(\lVert h \rVert)$$

for some function $C_0$. An intrinsically stationary process is isotropic if

$$\gamma(h) = \gamma_0(\lVert h \rVert).$$

A parsimonious specification of the covariance matrix in terms of a small number of parameters is typically presumed, e.g.

$$\mathrm{Cov}(e_i, e_j) = \sigma^2 \rho\left( d_{ij}; \theta \right),$$

where $e_i, e_j$ are residuals, $\sigma^2$ is the error variance, $d_{ij}$ is the distance between $s_i$ and $s_j$, and $\rho$ is a distance-decay function such that $\rho(0) = 1$ and $\rho(d) \to 0$ as $d \to \infty$, with $\theta$ being a parameter vector.

The generalised Moran’s I is a weighted, scaled cross-product

$$I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} (y_i - \bar y)(y_j - \bar y)}{\sum_i (y_i - \bar y)^2}.$$

Its expected value under the null of no spatial autocorrelation is $E[I] = -\frac{1}{n - 1}$.

A test for Moran’s I involves shuffling the locations of points and computing $I$ a total of $B$ times. This produces a randomisation distribution under $H_0$ of no spatial autocorrelation.

A Monte Carlo p-value is

$$p = \frac{1 + \#\{ I_b \ge I_{\text{obs}} \}}{1 + B}.$$
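The whole procedure fits in a few lines of pure Python (the line-graph adjacency weights and trending data are illustrative assumptions):

```python
import random

def morans_i(y, W):
    """Generalised Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = y - ybar."""
    n = len(y)
    ybar = sum(y) / n
    z = [v - ybar for v in y]
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    s0 = sum(W[i][j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)

rng = random.Random(5)
n = 30
y = [i + rng.gauss(0, 2) for i in range(n)]  # strong upward trend along a line
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]  # adjacency weights

i_obs = morans_i(y, W)

# Randomisation test: shuffle values over locations B times
B = 999
vals = y[:]
perms = []
for _ in range(B):
    rng.shuffle(vals)
    perms.append(morans_i(vals, W))

p = (1 + sum(ib >= i_obs for ib in perms)) / (1 + B)
print(round(i_obs, 3), p)  # large positive I, tiny p-value
```

The permuted values centre near $-1/(n-1)$, the null expectation given above.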

Spatial Linear Regression

A simple spatial regression is

$$y = \rho W y + X\beta + \varepsilon;$$

solving for $y$, its reduced form is

$$y = (I - \rho W)^{-1} (X\beta + \varepsilon).$$

The spatial lag term $Wy$ induces correlation between the error and explanatory variables, and thus must be treated as an endogenous variable.

A spatial error model is simply a linear model with a non-spherical but typically parametric structure in the error covariance matrix:

$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon.$$

A covariance function decomposes into a systematic part and idiosyncratic noise as follows:

$$\mathrm{Cov}(e_i, e_j) = \sigma^2 \rho(d_{ij}) + \tau^2 \mathbf{1}(i = j),$$

where $\rho$ is a correlation function and $d_{ij}$ is the distance between points $s_i, s_j$.

Kelly recommends using a Whittle-Matern function, defined next. These parameters can be fitted on the error distribution to estimate the covariance matrix.

A covariance function describes the joint variability between a stochastic process at two locations $s$ and $s'$. This covariance function is vital in spatial prediction. The R package fields includes common parametric covariance families (e.g. exponential and Matern) as well as nonparametric models (e.g. radial and tensor basis functions).

When modeling $C$ we are often forced to make simplifying assumptions.

  • Stationarity assumes we can represent the covariance function as

$$C(s, s') = C_0(h)$$

for some function $C_0$, where $h = s - s'$.

  • Isotropy assumes we can represent the covariance function as

$$C(s, s') = C_0(\lVert h \rVert)$$

for some function $C_0$, where $\lVert \cdot \rVert$ is a vector norm.

Exponential:

$$C(h) = \sigma^2 \exp\left( -\lVert h \rVert / \rho \right).$$

Matern:

$$C(h) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\lVert h \rVert}{\rho} \right)^{\nu} K_\nu\left( \frac{\lVert h \rVert}{\rho} \right),$$

where $K_\nu$ is a modified Bessel function of the second kind, of order $\nu$.

The Matern covariance depends on $(\rho, \sigma^2, \tau^2, \nu)$, while the exponential depends on $(\rho, \sigma^2, \tau^2)$, where

  • $\rho$: the range of the process, the distance at which observations become (effectively) uncorrelated

  • $\sigma^2$: marginal variance / ‘sill’

  • $\tau^2$: small-scale variation such as measurement error (the ‘nugget’)

  • $\nu$: smoothness (the exponential is the Matern with $\nu = 1/2$)

Here, $W$ is a weight matrix (typically row-standardised), so $Wy$ is a spatial lag of $y$. In spatial econometrics, the general model $y = \rho W y + X\beta + W X \gamma + u$, $u = \lambda W u + \varepsilon$, nests many popular regressions:

  • Spatially autoregressive (SAR) model: $y = \rho W y + X\beta + \varepsilon$

  • Spatially lagged X (SLX): $y = X\beta + W X \gamma + \varepsilon$

  • Spatial Durbin model: $y = \rho W y + X\beta + W X \gamma + \varepsilon$

  • Spatial error model: $y = X\beta + u, \quad u = \lambda W u + \varepsilon$

In the social interactions literature (e.g., Manski 1993), the above expression is written in the form of conditional expectations, e.g.

$$y_i = \alpha + \rho\, E[y_j \mid j \in g(i)] + x_i'\beta + E[x_j \mid j \in g(i)]'\gamma + \varepsilon_i,$$

where $g(i)$ is $i$’s reference group; in practice, the expectations are replaced with empirical counterparts (group means, spatial lags) and so on, so the estimation steps are isomorphic.

Define unobservables as $\varepsilon$, and assume they are uncorrelated with observables $X$; that is, there is no sorting and no omitted spatial variables. Then, we can write

$$y = \rho W y + X\beta + W X \gamma + \varepsilon.$$

Premultiplying by $W$ gives

$$W y = \rho W^2 y + W X \beta + W^2 X \gamma + W \varepsilon.$$

This shows that $Wy$ is correlated with $\varepsilon$, i.e. $E[(Wy)'\varepsilon] \ne 0$, and least squares estimates of the above regression are biased.

If we assume $W$ is idempotent, $W^2 = W$ (by constructing a block-diagonal, transitive matrix), we can simplify the above expression to

$$W y = \frac{1}{1 - \rho} \left( W X (\beta + \gamma) + W \varepsilon \right),$$

and substituting back,

$$y = X\beta + W X\, \frac{\rho\beta + \gamma}{1 - \rho} + \frac{\rho}{1 - \rho} W\varepsilon + \varepsilon.$$

In summary, $(\rho, \gamma)$ cannot be separately identified from the composite parameter $\frac{\rho\beta + \gamma}{1 - \rho}$. This is the reflection problem discussed by Manski (1993).

Spatial Modelling

Based on Rue and Held (2005) and lecture notes.

$x_1$ and $x_3$ are conditionally independent given $x_2$ if, for a given value of $x_2$, learning $x_1$ gives one no additional information about $x_3$. The density representation is therefore

$$\pi(x_1, x_2, x_3) = \pi(x_1)\, \pi(x_2 \mid x_1)\, \pi(x_3 \mid x_2),$$

which is a simplification of the general representation

$$\pi(x_1, x_2, x_3) = \pi(x_1)\, \pi(x_2 \mid x_1)\, \pi(x_3 \mid x_1, x_2).$$

This can be re-expressed as a first-order autoregression. So, for $t = 2, \dots, n$,

$$x_t \mid x_1, \dots, x_{t-1} \sim N(\phi x_{t-1}, 1), \qquad |\phi| < 1.$$

In addition to the conditional distributions, also assume the marginal distribution of $x_1$ to be $N(0, 1/(1 - \phi^2))$, which is the stationary distribution of this process. Then, the joint distribution of $x = (x_1, \dots, x_n)'$ is $N(0, Q^{-1})$,

where $Q$ is a precision matrix of the form

$$Q = \begin{pmatrix} 1 & -\phi & & & \\ -\phi & 1 + \phi^2 & -\phi & & \\ & \ddots & \ddots & \ddots & \\ & & -\phi & 1 + \phi^2 & -\phi \\ & & & -\phi & 1 \end{pmatrix}.$$

This tridiagonal form is due to the fact that $x_t \perp x_s$ given the rest of the sequence whenever $|t - s| > 1$. This is generally true for any GMRF: $x_i \perp x_j \mid x_{-ij} \iff Q_{ij} = 0$.
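The sparse precision and the dense covariance describe the same distribution, which can be checked numerically: with $Q$ as above and $\Sigma_{ij} = \phi^{|i-j|}/(1-\phi^2)$, the product $Q\Sigma$ is the identity ($\phi = 0.6$ and $n = 6$ are arbitrary choices for the check).

```python
phi, n = 0.6, 6

# Tridiagonal precision of a stationary AR(1): 1 at the corners, 1 + phi^2 inside, -phi off-diagonal
Q = [[0.0] * n for _ in range(n)]
for i in range(n):
    Q[i][i] = 1 + phi ** 2 if 0 < i < n - 1 else 1.0
    if i > 0:
        Q[i][i - 1] = -phi
    if i < n - 1:
        Q[i][i + 1] = -phi

# Dense covariance with entries phi^|i-j| / (1 - phi^2)
S = [[phi ** abs(i - j) / (1 - phi ** 2) for j in range(n)] for i in range(n)]

# Check Q S = I: sparse precision, dense covariance, same GMRF
QS = [[sum(Q[i][k] * S[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
err = max(abs(QS[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))
print(err < 1e-12)  # → True
```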

While the conditional independence structure is readily apparent from the precision matrix, it isn’t evident in the covariance matrix $\Sigma = Q^{-1}$, which is completely dense, with entries

$$\Sigma_{ij} = \frac{\phi^{|i - j|}}{1 - \phi^2}.$$

Entries of the covariance matrix only give direct information about the marginal dependence structure, not the conditional one.

A spatial process is said to follow a Gaussian process if any realisation at a finite number of locations $s_1, \dots, s_n$ follows an $n$-variate Gaussian. More precisely, let $\mu(s)$ denote a mean function returning a mean at location $s$ (typically assumed to be linear in covariates: $\mu(s) = x(s)'\beta$), and let $C(s, s')$ denote a covariance function. Then $Y = (Y(s_1), \dots, Y(s_n))'$ follows a spatial Gaussian process, and has density

$$\pi(Y) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (Y - \mu)'\Sigma^{-1}(Y - \mu) \right),$$

where $\mu = (\mu(s_1), \dots, \mu(s_n))'$ is the mean vector and $\Sigma_{ij} = C(s_i, s_j)$ is the covariance matrix. Evaluating this density requires $O(n^3)$ operations and $O(n^2)$ memory, which means it does not scale well to large datasets. See Heaton et al. (2019) for an overview of alternatives.

Let $x = (x_1, \dots, x_n)'$ be associated with some property of points (typically location), with no natural ordering of the indices. The joint density of a zero-mean GMRF can be specified through the full conditionals

$$x_i \mid x_{-i} \sim N\left( \sum_{j \ne i} \beta_{ij} x_j,\; \kappa_i^{-1} \right);$$

these are called CAR (conditionally autoregressive) models. The associated precision matrix has

$$Q_{ii} = \kappa_i, \qquad Q_{ij} = -\kappa_i \beta_{ij},$$

which must be symmetric (requiring $\kappa_i \beta_{ij} = \kappa_j \beta_{ji}$) and positive definite.

A random vector $x$ is called a GMRF wrt a labelled graph $G = (V, E)$ with mean $\mu$ and precision matrix $Q$ iff its density has the form

$$\pi(x) = (2\pi)^{-n/2} |Q|^{1/2} \exp\left( -\frac{1}{2} (x - \mu)' Q (x - \mu) \right)$$

and $Q_{ij} \ne 0 \iff \{i, j\} \in E$ for all $i \ne j$. If $Q$ is completely dense, $G$ is completely connected. In spatial settings, $Q$ is typically sparse [depending on how neighbours are defined].

Key summary quantities:

$$E[x_i \mid x_{-i}] = \mu_i - \frac{1}{Q_{ii}} \sum_{j \ne i} Q_{ij} (x_j - \mu_j), \qquad \mathrm{Prec}(x_i \mid x_{-i}) = Q_{ii},$$

and

$$\mathrm{Corr}(x_i, x_j \mid x_{-ij}) = -\frac{Q_{ij}}{\sqrt{Q_{ii} Q_{jj}}}.$$

Let $x$ be a GMRF wrt $G = (V, E)$. The following are equivalent:

  • Pairwise Markov property: $x_i \perp x_j \mid x_{-ij}$ whenever $\{i, j\} \notin E$ and $i \ne j$.

  • Local Markov property: $x_i \perp x_{-\{i,\, \mathrm{ne}(i)\}} \mid x_{\mathrm{ne}(i)}$, where $\mathrm{ne}(i)$ is the set of neighbours of $i$.

  • Global Markov property: $x_A \perp x_B \mid x_C$ for disjoint sets $(A, B, C)$ where $C$ separates $A$ and $B$, and $A$ and $B$ are nonempty.

Let the spatial process at location $s_i$ be

$$Y(s_i) = x(s_i)'\beta + w(s_i) + \varepsilon_i,$$

where $x(s_i)$ collects a vector of covariates for site $i$, and $\beta$ is a $p$-vector of coefficients. Spatial dependence can be imposed by modelling $w(s)$ as a zero-mean stationary Gaussian process. Distributionally, this implies that for any $s_1, \dots, s_n$, if we let $Y = (Y(s_1), \dots, Y(s_n))'$ and $\theta$ be the parameters of the model,

$$Y \mid \beta, \theta \sim N\left( X\beta,\; \Sigma(\theta) \right),$$

where $\Sigma(\theta)$ is the covariance matrix of an $n$-dimensional normal density. We need $\Sigma(\theta)$ to be symmetric and positive definite for this distribution to be proper.

Special cases:

Exponential covariance matrix: $\Sigma(\theta) = \tau^2 I + \sigma^2 H(\phi)$, where the $(i,j)$-th element of $H$ is $H_{ij} = \exp(-\phi\, d_{ij})$. The ‘nugget’ $\tau^2$ is the variance of the non-spatial error, $\sigma^2$ dictates the scale, and $\phi$ dictates the range of the spatial dependence.

Matern covariance: for distance $d_{ij}$,

$$C(d_{ij}) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} (\phi\, d_{ij})^{\nu} K_\nu(\phi\, d_{ij}),$$

where $K_\nu$ is a modified Bessel function of order $\nu$.

Specifying $\Sigma(\theta)$ directly can be awkward when dealing with irregular spatial data [i.e. every real use case].

So, random effects are modelled conditionally. Let $w_{-i}$ denote the vector $w = (w(s_1), \dots, w(s_n))'$ excluding $w(s_i)$. Model $w(s_i)$ in terms of its full conditional

$$w(s_i) \mid w_{-i} \sim N\left( \sum_{j \ne i} b_{ij} w(s_j),\; \tau_i^2 \right),$$

where $\{b_{ij}\}$ describes the neighbourhood structure.

Besag (1974) proved that the implied joint distribution is a valid zero-mean multivariate normal with precision matrix $Q$, provided $Q$ is symmetric PD, with $1/\tau_i^2$ in the diagonals and $-b_{ij}/\tau_i^2$ in the off-diagonals. The simplest version assumes a common precision parameter $\tau^2$.

Intrinsic GMRF: the improper limiting case in which $Q$ is only positive semi-definite. When $b_{ij} = \mathbf{1}(j \sim i)/n_i$ for neighbours (i.e. an adjacency matrix instead of distances), it simplifies further to

$$w(s_i) \mid w_{-i} \sim N\left( \frac{1}{n_i} \sum_{j \sim i} w(s_j),\; \frac{\tau^2}{n_i} \right),$$

where $n_i$ is the number of neighbours of site $i$.

Let $Y(s)$ and $w(s)$ be two spatial processes on $D$. Assume the $Y(s_i)$s are conditionally independent given random effects $w(s_i)$, and that they follow some common distributional form, with

$$Y(s_i) \mid w(s_i) \sim f(y \mid \eta_i).$$

Let $g(\eta_i)$ be defined for some known link function $g$, e.g. $g(\eta) = \log\left( \eta / (1 - \eta) \right)$ for logit. Assume a linear form for the projection:

$$g(\eta_i) = x(s_i)'\beta + w(s_i).$$

Spatial dependence enters via $w \sim GP(0, C(\cdot, \cdot; \theta))$, where $C$ is often Matern.