This note is a high-fidelity Markdown migration of the Dependent Data: Time Series and Spatial Statistics chapter from the LaTeX source.
Parent map: index
Prerequisites: probability-and-mathstats, linear-regression, maximum-likelihood-and-machine-learning
Concept map
flowchart TD
    A[Dependent Data] --> B[Time Series]
    B --> C[Stationarity]
    C --> D[Ergodicity]
    B --> E[AR MA ARMA]
    B --> F[Unit Root]
    F --> G[Cointegration]
    B --> H[HAC Inference]
    A --> I[Spatial Statistics]
    I --> J[Kriging]
    I --> K[Spatial Autocorrelation]
    K --> L[Variogram]
    I --> M[Spatial Regression]
    I --> N[GMRF GP CAR]
Dependent Data: Time series and spatial statistics
Time Series
A time series is a sequence of data points observed over time. In a random sample, points are iid, so the joint distribution . In time series, this is clearly violated, since observations that are temporally close to each other tend to be more similar.
A stochastic process is a sequence of random variables \(\{Y_t\}\) indexed by elements \(t\) in a set of indices \(T\). Hypothetical repeated realisations of a stochastic process look like \(\{y_t^{(1)}\}_{t \in T}, \{y_t^{(2)}\}_{t \in T}, \dots\)
The index set \(T\) may be either countable, in which case we get a discrete-time process, or uncountable, in which case we get a continuous-time process.
State Space
We assume \(Y_t \in S\) for some set \(S\). Then, \(S\) is called the state space of the stochastic process.
Consider a random process \(\{Y_t\}\) and an increasing sequence of information sets, i.e. a collection of \(\sigma\)-fields \(\{\mathcal{F}_t\}\) s.t. \(\mathcal{F}_t \subseteq \mathcal{F}_{t+1}\). If \(Y_t\) belongs to the information set \(\mathcal{F}_t\), is absolutely integrable [i.e. \(\mathbb{E}|Y_t| < \infty\)], and
\[\mathbb{E}[Y_{t+1} \mid \mathcal{F}_t] = Y_t,\]
then \(\{Y_t\}\) is called a martingale. In words, the conditional expected value of the next observation, given all the past observations, is equal to the most recent observation.
The autocovariance of \(Y_t\) is the covariance between \(Y_t\) and its \(j\)-th lagged value:
\[\gamma_j = \mathrm{Cov}(Y_t, Y_{t-j}) = \mathbb{E}\left[ (Y_t - \mu)(Y_{t-j} - \mu) \right]\]
The variance-covariance matrix of \((Y_1, \dots, Y_n)\) has Toeplitz form:
\[\begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{n-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{n-1} & \gamma_{n-2} & \cdots & \gamma_0 \end{pmatrix}\]
The \(j\)-th order autocorrelation coefficient is \(\rho_j = \gamma_j / \gamma_0\).
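The lag-\(j\) autocovariance and autocorrelation are easy to estimate from a single realisation. A minimal NumPy sketch, using an illustrative MA(1) series with \(\theta = 0.5\) (whose theoretical first autocorrelation is \(\theta/(1+\theta^2) = 0.4\)):

```python
import numpy as np

def acov(y, j):
    """Sample autocovariance at lag j: mean of (y_t - ybar)(y_{t-j} - ybar)."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    return np.mean((y[j:] - ybar) * (y[:len(y) - j] - ybar))

def acorr(y, j):
    """Sample autocorrelation rho_j = gamma_j / gamma_0."""
    return acov(y, j) / acov(y, 0)

rng = np.random.default_rng(0)
eps = rng.standard_normal(10_000)
y = np.convolve(eps, [1.0, 0.5], mode="valid")   # MA(1) with theta = 0.5

# theoretical rho_1 = theta / (1 + theta^2) = 0.4; higher lags ~ 0
print(acorr(y, 1), acorr(y, 5))
```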
A random process is said to be (strictly) stationary if the distribution functions of \((Y_{t_1}, \dots, Y_{t_k})\) and \((Y_{t_1 + h}, \dots, Y_{t_k + h})\) are the same for every \(k\), every set of dates \(t_1, \dots, t_k\), and every shift \(h\).
A process is said to be covariance (or weakly) stationary if
\[\mathbb{E}[Y_t] = \mu \quad \text{and} \quad \mathrm{Cov}(Y_t, Y_{t-j}) = \gamma_j \quad \forall\, t, j,\]
i.e. neither the mean nor the autocovariances depend on the date \(t\): stationary expectation, variance, and covariances. Most relevant variables aren’t stationary, but their detrended or first-differenced versions may be.
If \(\{Y_t\}\) is a Markov process,
\[\Pr(Y_{t+1} \leq y \mid Y_t, Y_{t-1}, \dots) = \Pr(Y_{t+1} \leq y \mid Y_t),\]
that is, the conditional distribution of \(Y_{t+1}\) given the entire history depends only on \(Y_t\).
Markov Chain
A Markov chain is simply a Markov process in which the state space is a countable set. Since a Markov chain is a Markov process, the conditional distribution of \(Y_{t+1}\) depends only on \(Y_t\). The conditional distribution is often represented by a transition matrix \(P\) where
\[P_{ij} = \Pr(Y_{t+1} = j \mid Y_t = i)\]
If \(P_{ij}\) is the same for all \(t\), we say the Markov chain has stationary transition probabilities.
A stationary process is ergodic if any two variables positioned far apart in the sequence are almost independently distributed.
\(\{Y_t\}\) is ergodic if, for any two bounded functions \(f: \mathbb{R}^k \to \mathbb{R}\) and \(g: \mathbb{R}^\ell \to \mathbb{R}\),
\[\lim_{n \to \infty} \left| \mathbb{E}\left[ f(Y_t, \dots, Y_{t+k-1})\, g(Y_{t+n}, \dots, Y_{t+n+\ell-1}) \right] \right| = \left| \mathbb{E}\, f(Y_t, \dots, Y_{t+k-1}) \right| \cdot \left| \mathbb{E}\, g(Y_{t+n}, \dots, Y_{t+n+\ell-1}) \right|\]
A sufficient condition for ergodicity is that \(\{Y_t\}\) be covariance stationary and \(\sum_{j=0}^{\infty} |\gamma_j| < \infty\).
Ergodic processes obey an ergodic theorem:
\[\frac{1}{n} \sum_{t=1}^{n} Y_t \xrightarrow{a.s.} \mathbb{E}[Y_t]\]
i.e. the time average of a single realisation converges to the population mean. This permits us to swap \(\frac{1}{n}\sum_t\)s for \(\mathbb{E}\)s and derive asymptotic theory with dependent observations, such as LLNs and CLTs.
A family of r.v.s \(\{W(t)\}\) indexed by a continuous variable \(t\) over \([0, \infty)\) is a Brownian motion iff
- \(W(0) = 0\) and sample paths are continuous a.s.,
- increments \(W(t_2) - W(t_1), W(t_4) - W(t_3), \dots\) over an arbitrary collection of disjoint intervals are independent r.v.s,
- \(W(t) - W(s) \sim N(0, t - s)\) for \(t > s\).
White noise is a sequence \(\{\varepsilon_t\}\) whose elements have mean zero and variance \(\sigma^2\), and for which the \(\varepsilon_t\)'s are uncorrelated over time:
\[\mathbb{E}[\varepsilon_t] = 0, \quad \mathbb{E}[\varepsilon_t^2] = \sigma^2, \quad \mathbb{E}[\varepsilon_t \varepsilon_s] = 0 \ \text{for} \ t \neq s\]
A moving average of order \(q\), MA(\(q\)), is a weighted average of the \(q\) most recent values of a white noise, defined as
\[Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}\]
An autoregressive process of order \(p\), AR(\(p\)), gives \(Y_t\) as a linear combination of \(p\) lags of itself and one white noise term:
\[Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \varepsilon_t\]
ARMA(\(p, q\)) combines AR(\(p\)) and MA(\(q\)):
\[Y_t = c + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}\]
Consider AR(1): \(Y_t = \phi Y_{t-1} + \varepsilon_t\). Since this holds at \(t\), it holds at \(t - 1\): \(Y_{t-1} = \phi Y_{t-2} + \varepsilon_{t-1}\). Substitute into the original to get \(Y_t = \phi^2 Y_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t\). Repeat ad infinitum to obtain, as long as \(|\phi| < 1\),
\[Y_t = \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}\]
In other words, AR(1) = MA(\(\infty\)); they are different representations of the same underlying stochastic process.
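The AR(1) = MA(\(\infty\)) equivalence can be checked numerically: simulate the recursion and compare it with a truncated \(\sum_j \phi^j \varepsilon_{t-j}\) built from the same shocks. A sketch; \(\phi = 0.8\), the burn-in length, and the truncation point \(J = 100\) are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, T = 0.8, 200
eps = rng.standard_normal(T + 500)          # 500 burn-in shocks

# AR(1) by recursion: y_t = phi * y_{t-1} + eps_t
y = np.zeros(T + 500)
for t in range(1, T + 500):
    y[t] = phi * y[t - 1] + eps[t]
y = y[500:]

# Truncated MA(inf): y_t ~ sum_{j=0}^{J} phi^j eps_{t-j}
J = 100
weights = phi ** np.arange(J + 1)
y_ma = np.array([weights @ eps[t - J:t + 1][::-1] for t in range(500, 500 + T)])

print(np.max(np.abs(y - y_ma)))             # tiny: remote shocks carry weight phi^{J+1}
```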
Wold representation: all covariance-stationary time series processes can be represented by / decomposed into a deterministic component and a stochastic MA(\(\infty\)) component:
\[Y_t = \kappa_t + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}, \qquad \psi_0 = 1, \quad \sum_{j=0}^{\infty} \psi_j^2 < \infty\]
In a stationary process, \(\mathbb{E}[Y_t] = \mu\) for all \(t\), which is seldom true in practice. A less restrictive assumption that allows for nonstationarity is to specify the mean as a function of time.
A random walk (\(\phi = 1\)) is a process such that
\[Y_t = Y_{t-1} + \varepsilon_t\]
= an AR(1) process with a unit root. Rewrite as
\[\Delta Y_t = \varepsilon_t \quad \text{or} \quad Y_t = Y_0 + \sum_{s=1}^{t} \varepsilon_s\]
Random walk with drift: \(Y_t = \delta + Y_{t-1} + \varepsilon_t\).
For the AR(1) model
\[Y_t = \phi Y_{t-1} + \varepsilon_t,\]
test \(H_0: \phi = 1\) against \(H_1: \phi < 1\); equivalently, estimate \(\Delta Y_t = \rho Y_{t-1} + \varepsilon_t\) with \(\rho = \phi - 1\) and test \(\rho = 0\). The distribution of the \(t\)-statistic under the null is non-standard: the CLT is not valid. Tests to use: Dickey-Fuller, Augmented Dickey-Fuller, Phillips-Perron.
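A sketch of the Dickey-Fuller idea in NumPy: regress \(\Delta y_t\) on \(y_{t-1}\) and form the usual \(t\)-ratio, remembering that it must be compared against Dickey-Fuller (not normal) critical values. In practice one would use a library implementation (e.g. `adfuller` in statsmodels); the hand-rolled version below omits the constant and trend and is purely illustrative:

```python
import numpy as np

def df_tstat(y):
    """t-ratio from the Dickey-Fuller regression dy_t = rho * y_{t-1} + e_t
    (no constant, no trend); compare with DF critical values, not N(0,1)."""
    dy, ylag = np.diff(y), y[:-1]
    rho_hat = (ylag @ dy) / (ylag @ ylag)
    resid = dy - rho_hat * ylag
    s2 = resid @ resid / (len(dy) - 1)
    return rho_hat / np.sqrt(s2 / (ylag @ ylag))

rng = np.random.default_rng(2)
walk = np.cumsum(rng.standard_normal(500))   # unit root: rho = 0
ar1 = np.zeros(500)
for t in range(1, 500):                      # stationary: phi = 0.5
    ar1[t] = 0.5 * ar1[t - 1] + rng.standard_normal()

# the stationary series gives a large negative statistic; the random walk
# typically does not (the 5% DF critical value is about -1.95 in this case)
print(df_tstat(walk), df_tstat(ar1))
```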
Let \(X_t, Y_t \sim I(1)\). \(X_t\) and \(Y_t\) are said to be cointegrated if there exists \(\beta\) such that \(Y_t - \beta X_t \sim I(0)\). For example, let
\[Y_t = \beta X_t + u_t, \qquad X_t = X_{t-1} + v_t\]
where \((u_t, v_t)\) is white noise. Then \(X_t, Y_t \sim I(1)\), but \(Y_t - \beta X_t = u_t \sim I(0)\), with cointegration vector \((1, -\beta)\).
The Hodrick-Prescott (HP) filter decomposes an observed time series \(y_t\) into a trend \(\tau_t\) and a stationary component so that the trend minimises
\[\sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \left[ (\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1}) \right]^2\]
\(\lambda\) is a tuning parameter. In quarterly data, \(\lambda = 1600\).
Regression with time series
The basic assumption in conventional OLS with time series is no serial correlation in the errors: \(\mathbb{E}[\varepsilon_t \varepsilon_s] = 0\) for \(t \neq s\). Equivalently, \(\mathrm{Var}(\varepsilon \mid X) = \sigma^2 I\), where \(\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)'\). The second classical assumption is exogeneity, \(\mathbb{E}[\varepsilon_t \mid X] = 0\).
\(\mathbb{E}[\varepsilon_t \varepsilon_s] \neq 0\) for \(t \neq s\) is called autocorrelation. Fix: the Newey-West HAC consistent variance estimator, with ‘meat’
\[\hat{S} = \hat{\Gamma}_0 + \sum_{j=1}^{L} \left( 1 - \frac{j}{L+1} \right) \left( \hat{\Gamma}_j + \hat{\Gamma}_j' \right), \qquad \hat{\Gamma}_j = \frac{1}{n} \sum_{t=j+1}^{n} \hat{\varepsilon}_t \hat{\varepsilon}_{t-j}\, x_t x_{t-j}'\]
with the variance estimated the normal (sandwich) way:
\[\widehat{\mathrm{Var}}(\hat{\beta}) = n \left( X'X \right)^{-1} \hat{S} \left( X'X \right)^{-1}\]
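A minimal NumPy sketch of the Newey-West sandwich (the AR(1) error design and the lag length \(L = 8\) are illustrative assumptions; the \(1/n\) scale factors cancel in the sandwich, so they are omitted):

```python
import numpy as np

def newey_west_vcov(X, resid, L):
    """HAC sandwich (X'X)^{-1} S (X'X)^{-1} with Bartlett-kernel meat
    S = Gamma_0 + sum_{j=1}^{L} (1 - j/(L+1)) (Gamma_j + Gamma_j')."""
    u = X * resid[:, None]                 # score contributions x_t * e_t
    S = u.T @ u
    for j in range(1, L + 1):
        w = 1.0 - j / (L + 1.0)
        G = u[j:].T @ u[:-j]               # Gamma_j = sum_t u_t u_{t-j}'
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ S @ XtX_inv

# usage sketch: OLS with serially correlated (AR(1)) errors
rng = np.random.default_rng(4)
n = 1000
x = rng.standard_normal(n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + rng.standard_normal()
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + e
beta = np.linalg.lstsq(X, y, rcond=None)[0]
V = newey_west_vcov(X, y - X @ beta, L=8)
se_hac = np.sqrt(np.diag(V))               # HAC standard errors
```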
Consider the ADL(1,1) model
\[y_t = \alpha + \phi y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t\]
Subtracting \(y_{t-1}\) from both sides, and subtracting and adding \(\beta_0 x_{t-1}\) on the r.h.s., we get the error-correction form
\[\Delta y_t = \alpha + \beta_0 \Delta x_t - (1 - \phi)\left( y_{t-1} - \theta x_{t-1} \right) + \varepsilon_t\]
where
\[\theta = \frac{\beta_0 + \beta_1}{1 - \phi}\]
is the long-run effect.
A Quandt likelihood ratio (QLR) test begins with no knowledge of when the trend break occurs [although researchers typically know of the timing for substantive reasons], and sequentially estimates the following model:
\[\Delta y_t = \alpha + \delta D_t(\tau) + \varepsilon_t\]
where \(\Delta y_t\) is the first difference of the outcome, and \(D_t(\tau)\) is an indicator variable equal to zero for all years before \(\tau\) and one for all subsequent years. The researcher varies \(\tau\) and tests the null that \(\delta = 0\), and the largest F-statistic is used to determine the best possible break point. Use Andrews (2003) critical values to account for multiple testing.
Spatial Statistics
A spatial stochastic process is a collection of random variables indexed by location: \(\{Y(s) : s \in S\}\), where \(S\) is either a continuous surface or a finite set of discrete locations.
For each location \(s\), \(Y(s)\) is a random variable, and thus needs to be modeled. The basic approach is to assume \(\mathbb{E}[Y(s)]\) exists, and decompose
\[Y(s) = \mu(s) + \varepsilon(s)\]
into a mean function \(\mu(\cdot)\) and a stochastic error process \(\varepsilon(\cdot)\).
Kriging - modeling
Main reference: Christensen (2019, ch. 8).
Assume a linear structure for \(\mu(\cdot)\): take \(x_1(\cdot), \dots, x_p(\cdot)\), known functions of the locations, s.t.
\[\mu(s) = \sum_{j=1}^{p} \beta_j x_j(s)\]
A special case of this is the ordinary kriging model where
\[\mu(s) = \mu\]
for unknown \(\mu\). The most basic model is simple kriging, where
\[\mu(s) = \mu_0(s)\]
with \(\mu_0(\cdot)\) known.
Assume the universal kriging model holds, that we have data at locations \(s_1, \dots, s_n\), and that we wish to predict the value of \(y(s_0)\) at a new location \(s_0\). The model can be written
\[Y = X\beta + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \qquad \mathrm{Cov}(\varepsilon) = \Sigma\]
Let \(x_0 = (x_1(s_0), \dots, x_p(s_0))'\) and \(\sigma_0 = \mathrm{Cov}(Y, y(s_0))\).
The best linear unbiased predictor (BLUP) of \(y(s_0)\) is
\[\hat{y}(s_0) = x_0' \hat{\beta}_{GLS} + \sigma_0' \Sigma^{-1} \left( Y - X \hat{\beta}_{GLS} \right)\]
where \(\hat{\beta}_{GLS} = (X' \Sigma^{-1} X)^{-1} X' \Sigma^{-1} Y\).
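A sketch of the universal-kriging BLUP under an assumed exponential covariance; the covariance parameters, locations, and data below are all illustrative, and a tiny jitter stabilises the solve. With no nugget, the predictor interpolates exactly at observed sites:

```python
import numpy as np

def exp_cov(S1, S2, sigma2=1.0, rho=1.0):
    """Exponential covariance between two sets of 2-D locations (assumed form)."""
    d = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=2)
    return sigma2 * np.exp(-d / rho)

def universal_krige(X, S, y, x0, s0, sigma2=1.0, rho=1.0):
    """BLUP: y_hat(s0) = x0' b_gls + sig0' Sigma^{-1} (y - X b_gls)."""
    Sigma = exp_cov(S, S, sigma2, rho) + 1e-10 * np.eye(len(y))  # jitter
    sig0 = exp_cov(S, s0[None, :], sigma2, rho)[:, 0]
    Si = np.linalg.inv(Sigma)
    b = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)              # GLS coefficients
    return x0 @ b + sig0 @ Si @ (y - X @ b)

rng = np.random.default_rng(5)
S = rng.uniform(0, 1, size=(30, 2))       # observed locations
X = np.column_stack([np.ones(30), S])     # intercept + coordinates as covariates
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.standard_normal(30)
s0 = np.array([0.5, 0.5])                 # prediction location
x0 = np.array([1.0, 0.5, 0.5])
yhat = universal_krige(X, S, y, x0, s0)
```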
Spatial Autocorrelation: Modelling
Spatial autocorrelation is expressed as
\[\mathrm{Cov}\left( Y(s_i), Y(s_j) \right) \neq 0 \quad \text{for } i \neq j\]
Covariance is often modelled in terms of an unknown parameter \(\theta\), in which case we write \(\Sigma(\theta)\). Assumptions made about \(\Sigma(\theta)\) include:
- second-order stationarity,
- strict stationarity,
- intrinsic stationarity,
- increment stationarity,
- isotropy.
Covariance functions can be modelled in three basic ways:
- specify a functional form for the stochastic process generating \(Y(s)\), and derive covariance from that process,
- model covariance directly as a function of a small number of parameters,
- leave covariance unspecified and estimate nonparametrically.
A process is strictly stationary if for all \(n\), locations \(s_1, \dots, s_n\), Borel sets \(B_1, \dots, B_n\), and shifts \(h\),
\[\Pr\left( Y(s_1 + h) \in B_1, \dots, Y(s_n + h) \in B_n \right) = \Pr\left( Y(s_1) \in B_1, \dots, Y(s_n) \in B_n \right)\]
This implies translation invariance of the joint law. In particular, \(\mathbb{E}[Y(s)]\) is constant and \(\mathrm{Cov}(Y(s), Y(s+h))\) depends only on \(h\).
If, in addition, the finite-dimensional distributions are multivariate Gaussian, the process is a Gaussian Process.
Second-order (weak) stationarity imposes the same constant mean and covariance depending only on distance, but does not require full strict stationarity.
Increment-stationarity requires invariant increment laws:
\[Y(s + h) - Y(s) \overset{d}{=} Y(s' + h) - Y(s') \quad \text{for all } s, s', h\]
Brownian motion is increment-stationary but not strictly stationary.
For increment-stationary processes, the semivariogram is
\[\gamma(h) = \frac{1}{2} \mathrm{Var}\left( Y(s + h) - Y(s) \right)\]
The variogram is \(2\gamma(h)\). Under increment-stationarity, \(\gamma(h)\) does not depend on the base location \(s\).
An intrinsically-stationary process satisfies the constant-mean restriction and this semivariogram invariance condition. All second-order stationary processes are intrinsically stationary, but not vice versa.
For a linear model, stipulate a nonnegative definite weighting matrix, and fit
\[Y = X\beta + \varepsilon\]
to obtain residuals \(\hat{\varepsilon}\). For any vector \(h\), there is a finite number \(N(h)\) of pairs of observations for which \(s_i - s_j = h\). For each of these pairs, list the corresponding residual pairs \((\hat{\varepsilon}_i, \hat{\varepsilon}_j)\). If \(N(h) > 0\), the traditional empirical covariance estimator is
\[\hat{C}(h) = \frac{1}{N(h)} \sum_{(i,j):\, s_i - s_j = h} \hat{\varepsilon}_i \hat{\varepsilon}_j\]
The traditional empirical semivariogram estimator in ordinary kriging (no covariates) is
\[\hat{\gamma}(h) = \frac{1}{2 N(h)} \sum_{(i,j):\, s_i - s_j = h} \left( y_i - y_j \right)^2\]
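In practice the estimator is binned: pairs whose separation distance falls in the same bin are pooled. A NumPy sketch; the iid-noise data are an illustrative choice for which the semivariogram should be flat at the marginal variance (here 1):

```python
import numpy as np

def empirical_semivariogram(S, y, bins):
    """gamma_hat(bin) = (1 / 2N) * sum over pairs in the distance bin of (y_i - y_j)^2."""
    d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    iu = np.triu_indices(len(y), k=1)              # count each pair once
    dist, sqdiff = d[iu], (y[iu[0]] - y[iu[1]]) ** 2
    gamma = np.empty(len(bins) - 1)
    for b in range(len(bins) - 1):
        mask = (dist >= bins[b]) & (dist < bins[b + 1])
        gamma[b] = 0.5 * sqdiff[mask].mean() if mask.any() else np.nan
    return gamma

rng = np.random.default_rng(6)
S = rng.uniform(0, 10, size=(400, 2))
y = rng.standard_normal(400)                       # no spatial structure
bins = np.linspace(0, 5, 6)
g = empirical_semivariogram(S, y, bins)            # roughly flat at 1
```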
A second-order stationary process is said to be isotropic if \(C(h)\) depends on \(h\) only through its length:
\[C(h) = C(\lVert h \rVert)\]
An intrinsically stationary process is isotropic if
\[\gamma(h) = \gamma(\lVert h \rVert)\]
A parsimonious specification of the covariance matrix in terms of a small number of parameters is typically presumed, e.g.
\[\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = \sigma^2 f(d_{ij}; \theta)\]
where \(\varepsilon\) are residuals, \(\sigma^2\) is the error variance, \(d_{ij}\) is the distance between \(i\) and \(j\), and \(f\) is a distance-decay function such that \(f(0; \theta) = 1\) and \(f(d; \theta) \to 0\) as \(d \to \infty\), with \(\theta\) being a parameter vector.
The generalised Moran’s I is a weighted, scaled cross-product
\[I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_i (y_i - \bar{y})^2}\]
where \(W = (w_{ij})\) is a spatial weight matrix. Its expected value under the null of no spatial autocorrelation is \(\mathbb{E}[I] = -\frac{1}{n-1}\).
A test for Moran’s I involves shuffling the values across locations and recomputing \(I\), \(M\) times. This produces a randomisation distribution under \(H_0\) of no spatial autocorrelation. A Monte Carlo p-value is
\[\hat{p} = \frac{1 + \#\{m : I_m \geq I_{\text{obs}}\}}{1 + M}\]
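A sketch of Moran's I with a permutation (randomisation) p-value, on an assumed 10×10 rook-contiguity grid with a smooth spatial gradient in the outcome:

```python
import numpy as np

def morans_i(y, W):
    """I = (n / sum W) * (z' W z / z'z), z = y - ybar."""
    z = y - y.mean()
    return len(y) / W.sum() * (z @ W @ z) / (z @ z)

def moran_perm_test(y, W, M=999, seed=0):
    """Permutation test: shuffle values across locations M times;
    p = (1 + #{I_perm >= I_obs}) / (1 + M)."""
    rng = np.random.default_rng(seed)
    i_obs = morans_i(y, W)
    i_perm = np.array([morans_i(rng.permutation(y), W) for _ in range(M)])
    return i_obs, (1 + np.sum(i_perm >= i_obs)) / (1 + M)

# rook-neighbour weights on a 10x10 grid
n_side = 10
n = n_side ** 2
W = np.zeros((n, n))
for i in range(n):
    r, c = divmod(i, n_side)
    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= rr < n_side and 0 <= cc < n_side:
            W[i, rr * n_side + cc] = 1.0

xs, ys = np.meshgrid(np.arange(n_side), np.arange(n_side))
y = (xs + ys).ravel().astype(float)      # smooth gradient: strong autocorrelation
i_obs, pval = moran_perm_test(y, W)
```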
Spatial Linear Regression
A simple spatial (autoregressive) regression is
\[y = \rho W y + X\beta + \varepsilon\]
Solving for \(y\), its reduced form is
\[y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} \varepsilon\]
The spatial lag term \(Wy\) induces correlation between the error and explanatory variables, and thus must be treated as an endogenous variable.
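The reduced form can be verified mechanically: build a weight matrix, solve the linear system, and check that the structural equation holds. The ring-shaped \(W\) and the parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 50, 0.4
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # row-standardised ring

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, 2.0])
eps = rng.standard_normal(n)

# reduced form: y = (I - rho W)^{-1} (X beta + eps)
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
# y now satisfies the structural equation y = rho W y + X beta + eps,
# and W y is a function of eps, which is the source of the endogeneity
```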
A spatial error model is simply a linear model with a non-spherical, but typically parametric, structure in the error covariance matrix.
A covariance function decomposes into a systematic part and idiosyncratic noise as follows:
\[\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = \sigma^2 \rho(d_{ij}) + \tau^2 \mathbf{1}(i = j)\]
where \(\rho(\cdot)\) is a correlation function and \(d_{ij}\) is the distance between points \(i\) and \(j\).
Kelly recommends using a Whittle-Matern function, defined below. These parameters can be fitted on the error distribution to estimate the covariance matrix.
A covariance function describes the joint variability between a stochastic process at two locations and . This covariance function is vital in spatial prediction. The fields package includes common parametric covariance families (e.g. exponential and Matern) as well as nonparametric models (e.g. radial and tensor basis functions).
When modeling we are often forced to make simplifying assumptions.
- Stationarity assumes we can represent the covariance function as
\[\mathrm{Cov}(Y(s), Y(s')) = C(h)\]
for some function \(C\), where \(h = s - s'\).
- Isotropy assumes we can represent the covariance function as
\[\mathrm{Cov}(Y(s), Y(s')) = C(\lVert s - s' \rVert)\]
for some function \(C\), where \(\lVert \cdot \rVert\) is a vector norm.
Exponential:
\[C(d) = \sigma^2 \exp(-d / \rho)\]
Matern:
\[C(d) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, d}{\rho} \right)^{\nu} K_{\nu}\!\left( \frac{\sqrt{2\nu}\, d}{\rho} \right)\]
where \(K_{\nu}\) is a modified Bessel function of the second kind, of order \(\nu\).
The Matern covariance depends on \((\rho, \sigma^2, \tau^2, \nu)\), while the exponential depends on \((\rho, \sigma^2, \tau^2)\), where
- \(\rho\): the range of the process, beyond which observations become (effectively) uncorrelated
- \(\sigma^2\): marginal variance / ‘sill’
- \(\tau^2\): small-scale variation such as measurement error (‘nugget’)
- \(\nu\): smoothness
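A sketch of the Matern family using `scipy.special.kv`; a standard sanity check is that \(\nu = 1/2\) recovers the exponential covariance \(\sigma^2 \exp(-d/\rho)\):

```python
import numpy as np
from scipy.special import gamma as gamma_fn, kv

def matern_cov(d, sigma2=1.0, rho=1.0, nu=0.5):
    """Matern: sigma2 * 2^{1-nu}/Gamma(nu) * (sqrt(2 nu) d / rho)^nu * K_nu(...)."""
    d = np.asarray(d, dtype=float)
    out = np.full(d.shape, sigma2)        # C(0) = sigma2 (limit as d -> 0)
    pos = d > 0
    u = np.sqrt(2.0 * nu) * d[pos] / rho
    out[pos] = sigma2 * (2.0 ** (1.0 - nu) / gamma_fn(nu)) * u ** nu * kv(nu, u)
    return out

d = np.linspace(0.0, 3.0, 50)
c_half = matern_cov(d, nu=0.5)            # equals exp(-d) for sigma2 = rho = 1
```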
Here, \(W\) is a weight matrix (typically row-standardised), so \(Wy\) is a spatial lag of \(y\). In spatial econometrics, the general nesting form
\[y = \rho W y + X\beta + W X \gamma + u, \qquad u = \lambda W u + \varepsilon\]
nests many popular regressions:
- Spatially autoregressive (SAR) model: \(y = \rho W y + X\beta + \varepsilon\)
- Spatially lagged \(X\) (SLX): \(y = X\beta + W X \gamma + \varepsilon\)
- Spatial Durbin model: \(y = \rho W y + X\beta + W X \gamma + \varepsilon\)
- Spatial error model: \(y = X\beta + u\), \(u = \lambda W u + \varepsilon\)
In the Social Interactions literature (e.g., Manski 1993), the above expression is written in the form of conditional expectations:
\[y_i = \alpha + \rho\, \mathbb{E}[y \mid g_i] + x_i'\beta + \mathbb{E}[x \mid g_i]'\gamma + u_i\]
where \(g_i\) denotes \(i\)'s reference group. In practice, the expectations are replaced with empirical counterparts (group means \(\bar{y}_{g_i}, \bar{x}_{g_i}\)) and so on, so the estimation steps are isomorphic.
Define unobservables as \(u\), and assume they are uncorrelated with observables \(X\); that is, there is no sorting and no omitted spatial variables. Then, we can write
\[y = \rho W y + X\beta + W X \gamma + u\]
Premultiplying by \(W\) gives
\[W y = \rho W^2 y + W X \beta + W^2 X \gamma + W u\]
This shows that \(Wy\) is a function of \(u\), i.e. \(\mathrm{Cov}(Wy, u) \neq 0\), and least squares estimates of the above regression are biased.
If we assume \(W\) is idempotent (\(W^2 = W\), by constructing a block-diagonal, transitive matrix), we can simplify the above expression to
\[W y = \frac{1}{1 - \rho} \left( W X (\beta + \gamma) + W u \right)\]
In summary, \(\rho\) and \(\gamma\) cannot be separately identified from the composite parameter \(\frac{\beta + \gamma}{1 - \rho}\). This is the reflection problem discussed by Manski (1993).
Spatial Modelling
Based on Rue and Held (2005) and lecture notes.
\(x\) and \(y\) are conditionally independent given \(z\) if, for a given value of \(z\), learning \(y\) gives one no additional information about \(x\). The density representation is therefore
\[\pi(x, y \mid z) = \pi(x \mid z)\, \pi(y \mid z),\]
which is a simplification of the general representation \(\pi(x, y \mid z) = \pi(x \mid z)\, \pi(y \mid x, z)\). Equivalently, \(x \perp y \mid z\) iff
\[\pi(x, y, z) = f(x, z)\, g(y, z)\]
for some functions \(f\) and \(g\).
Consider a first-order autoregression \(x_t = \phi x_{t-1} + \varepsilon_t\), \(\varepsilon_t \sim N(0, 1)\). Its joint density can be re-expressed as
\[\pi(x) = \pi(x_1) \prod_{t=2}^{n} \pi(x_t \mid x_{t-1})\]
So, for \(1 < t < n\),
\[\pi(x_t \mid x_{-t}) = \pi(x_t \mid x_{t-1}, x_{t+1})\]
In addition to the conditional distributions, also assume the marginal distribution of \(x_1\) to be the stationary distribution of this process. Then, the joint distribution of \(x = (x_1, \dots, x_n)'\) is
\[\pi(x) \propto \exp\left( -\frac{1}{2} x' Q x \right)\]
where \(Q\) is a precision matrix of the tridiagonal form
\[Q = \begin{pmatrix} 1 & -\phi & & & \\ -\phi & 1 + \phi^2 & -\phi & & \\ & \ddots & \ddots & \ddots & \\ & & -\phi & 1 + \phi^2 & -\phi \\ & & & -\phi & 1 \end{pmatrix}\]
This tridiagonal form is due to the fact that \(x_t \perp x_s \mid x_{-\{t,s\}}\) whenever \(|t - s| > 1\). This is generally true for any GMRF: \(Q_{ij} = 0 \iff x_i \perp x_j \mid x_{-\{i,j\}}\).
While the conditional independence structure is readily apparent from the precision matrix, it isn’t evident in the covariance matrix \(\Sigma = Q^{-1}\), which is completely dense with entries \(\Sigma_{ij} \propto \phi^{|i - j|}\). Entries of the covariance matrix only give direct information about the marginal dependence structure, not the conditional one.
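The AR(1) example can be checked numerically: build the tridiagonal precision and confirm that its inverse is the familiar dense AR(1) covariance \(\phi^{|i-j|} / (1 - \phi^2)\) (unit innovation variance assumed):

```python
import numpy as np

phi, n = 0.7, 8

# tridiagonal precision of a stationary AR(1), unit innovation variance:
# diag = (1, 1 + phi^2, ..., 1 + phi^2, 1), off-diagonal = -phi
Q = np.zeros((n, n))
for t in range(n):
    Q[t, t] = 1.0 + phi ** 2 if 0 < t < n - 1 else 1.0
    if t > 0:
        Q[t, t - 1] = Q[t - 1, t] = -phi

Sigma = np.linalg.inv(Q)                  # completely dense
i, j = np.indices((n, n))
Sigma_theory = phi ** np.abs(i - j) / (1.0 - phi ** 2)
```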
A spatial process is said to follow a Gaussian process if any realisation at a finite number of locations \(s_1, \dots, s_n\) follows an \(n\)-variate Gaussian. More precisely, let \(\mu(s)\) denote a mean function returning a mean at location \(s\) (typically assumed to be linear in covariates \(x(s)\)) and \(C(s, s')\) denote a covariance function. Then \(Y = (Y(s_1), \dots, Y(s_n))'\) follows a spatial Gaussian process, and has density
\[\pi(y) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (y - \mu)' \Sigma^{-1} (y - \mu) \right)\]
where \(\mu = (\mu(s_1), \dots, \mu(s_n))'\) is the mean vector and \(\Sigma_{ij} = C(s_i, s_j)\) is the covariance matrix. Evaluating this density requires \(O(n^3)\) operations and \(O(n^2)\) memory, which means it does not scale well to large datasets. See Heaton et al. (2019) for an overview of alternatives.
Let \(x_1, \dots, x_n\) be associated with some property of points (typically location), with no natural ordering of the indices. The joint density of a zero-mean GMRF can be specified through each of the full conditionals
\[x_i \mid x_{-i} \sim N\left( \sum_{j \neq i} \beta_{ij} x_j,\ \kappa_i^{-1} \right)\]
These are called CAR (conditionally autoregressive) models. The associated precision matrix is
\[Q_{ii} = \kappa_i, \qquad Q_{ij} = -\kappa_i \beta_{ij} \quad (i \neq j),\]
which is required to be symmetric (\(\kappa_i \beta_{ij} = \kappa_j \beta_{ji}\)) and positive-definite.
A random vector \(x = (x_1, \dots, x_n)'\) is called a GMRF wrt a labelled graph \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\) with mean \(\mu\) and precision matrix \(Q \succ 0\) iff its density has the form
\[\pi(x) = (2\pi)^{-n/2} |Q|^{1/2} \exp\left( -\frac{1}{2} (x - \mu)' Q (x - \mu) \right)\]
and \(Q_{ij} \neq 0 \iff \{i, j\} \in \mathcal{E}\) for all \(i \neq j\). If \(Q\) is completely dense, \(\mathcal{G}\) is completely connected. In spatial settings, \(Q\) is typically sparse [depending on how neighbours are defined.]
Key summary quantities:
\[\mathbb{E}[x_i \mid x_{-i}] = \mu_i - \frac{1}{Q_{ii}} \sum_{j \neq i} Q_{ij} (x_j - \mu_j), \qquad \mathrm{Prec}(x_i \mid x_{-i}) = Q_{ii},\]
and
\[\mathrm{Corr}(x_i, x_j \mid x_{-\{i,j\}}) = -\frac{Q_{ij}}{\sqrt{Q_{ii} Q_{jj}}}\]
Let \(x\) be a GMRF wrt \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\). The following are equivalent:
- Pairwise Markov property: \(x_i \perp x_j \mid x_{-\{i,j\}}\) for \(\{i, j\} \notin \mathcal{E}\), \(i \neq j\).
- Local Markov property: \(x_i \perp x_{-\{i,\, \mathrm{ne}(i)\}} \mid x_{\mathrm{ne}(i)}\), where \(\mathrm{ne}(i)\) denotes the neighbours of \(i\).
- Global Markov property: \(x_A \perp x_B \mid x_C\) for disjoint sets \((A, B, C)\) where \(C\) separates \(A\) and \(B\), and \(A\) and \(B\) are nonempty.
Let the spatial process at location \(s_i\) be
\[Y(s_i) = x(s_i)'\beta + w(s_i) + \varepsilon(s_i)\]
where \(x(s_i)\) collects a vector of covariates for site \(i\), and \(\beta\) is a \(p\)-vector of coefficients. Spatial dependence can be imposed by modelling \(w(\cdot)\) as a zero-mean stationary Gaussian process. Distributionally, this implies that for any \(n\), if we let \(w = (w(s_1), \dots, w(s_n))'\) and \(\theta\) be the parameters of the model,
\[w \mid \theta \sim N_n\left( 0, C(\theta) \right)\]
where \(C(\theta)\) is the covariance matrix of an \(n\)-dimensional normal density. We need \(C(\theta)\) to be symmetric and positive-definite for this distribution to be proper.
Special cases:
Exponential covariance matrix: \(C(\theta) = \tau^2 I_n + \sigma^2 H(\phi)\), where the \((i,j)\)-th element of \(H(\phi)\) is \(\exp(-\phi d_{ij})\). The ‘nugget’ \(\tau^2\) is the variance of the non-spatial error, \(\sigma^2\) dictates the scale, and \(\phi\) dictates the range of the spatial dependence.
Matern covariance:
\[C(d) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, d}{\rho} \right)^{\nu} K_{\nu}\!\left( \frac{\sqrt{2\nu}\, d}{\rho} \right)\]
for distance \(d\), where \(K_{\nu}\) is a modified Bessel function of order \(\nu\).
Specifying \(C(\theta)\) directly can be awkward when dealing with irregular spatial data [i.e. every real use case].
So, random effects are modelled conditionally. Let \(w_{-i}\) denote the vector of \(w\) excluding \(w_i\). Model \(w_i\) in terms of its full conditional:
\[w_i \mid w_{-i} \sim N\left( \sum_{j \neq i} a_{ij} w_j,\ \tau_i^2 \right)\]
where \(a_{ij}\) describes the neighbourhood structure.
Besag (1974) proved that these full conditionals correspond to a valid joint distribution \(w \sim N(0, \Sigma_w)\) if \(\Sigma_w^{-1}\) is symmetric positive-definite, with \(1/\tau_i^2\) on the diagonals and \(-a_{ij}/\tau_i^2\) in the off-diagonals. The simplest version assumes a common precision parameter \(\tau^2\).
Intrinsic GMRF: drop the positive-definiteness requirement, yielding an improper joint density. When \(a_{ij} \in \{0, 1\}\) for neighbours (i.e. an adjacency matrix instead of distances), the full conditional simplifies further to
\[w_i \mid w_{-i} \sim N\left( \bar{w}_{\mathrm{ne}(i)},\ \frac{\tau^2}{n_i} \right)\]
where \(\mathrm{ne}(i)\) are the neighbours of \(i\) and \(n_i\) is their number.
Let \(Y(s)\) and \(w(s)\) be two spatial processes on \(S\). Assume the \(Y(s_i)\)'s are conditionally independent given random effects \(w(s_i)\), and that they follow some common distributional form (e.g. an exponential family), so
\[\pi\left( y \mid w \right) = \prod_{i=1}^{n} \pi\left( y(s_i) \mid w(s_i) \right)\]
Let \(\eta_i = g\left( \mathbb{E}[Y(s_i) \mid w(s_i)] \right)\) for some known link function \(g\), e.g. \(g(p) = \log\frac{p}{1-p}\) for logit. Assume a linear form for the projection:
\[\eta_i = x(s_i)'\beta + w(s_i)\]
Spatial dependence enters via \(w \sim N(0, C(\theta))\), where \(C\) is often Matern.