This note is a high-fidelity Markdown migration of the Mathematical Background appendix from the LaTeX source.
Parent map: index Related chapters: probability-and-mathstats, linear-regression, maximum-likelihood-and-machine-learning
Concept map
flowchart TD A[Proof Techniques] --> B[Set Theory] B --> C[Analysis and Topology] C --> D[Measure and Integration] D --> E[Probability Theory] C --> F[Linear Algebra] F --> G[Function Spaces] G --> H[Calculus and Optimisation]
Mathematical Background
Proof Techniques
-
Direct Proof / modus ponens: If is a true statement and is a true conditional statement, then is a true statement. Direct proofs typically involve backwards-forwards reasoning - take all statements that follow from that might relate to , list them in . Then, take all statements that follow from , list them in . Then, look for statements that have a straightforward proof, and write proof of the form .
-
Contrapositive: Since every conditional statement is equivalent to its contrapositive, proving is equivalent to proving .
-
Proof by contradiction: Assume is true, and assume is false [i.e. is true], and show that (using and other possible intermediate results) where is known to be false. Conclude that must be false, so must be true, and we have proved that .
-
Induction [only applies to statements pertaining to well ordered sets]
-
Assume a base case - is a true statement
-
Prove whenever is true, is true
-
Therefore is true for every
-
Set Theory
A set is a collection of objects. E.g. .
Set operations:
-
Intersection :
-
Union :
-
Difference:
-
cartesian product:
Definition. Set of all subsets of S is itself a set. Denoted as
Relations
Given two sets and , any subset of their Cartesian product is called a binary relation. For any pair of elements .

Properties of binary relations:
-
reflexive
-
transitive if
-
symmetric If
-
antisymmetric If
-
asymmetric if
-
complete if either or or both
Definition. An equivalence relation on a set is a relation that is reflexive, transitive, and symmetric. Given an equivalence relation , the set of elements that are related to a given element :
is called the equivalence class of . e.g. Indifference preference relation is an equivalence relation, but the preference relation is not because it isn’t symmetric.
Definition. A relation that is reflexive and transitive but not symmetric is called an order relation: . This is also called a weak order. is not an order relation because it is not reflexive (and is called a strong order). Every order relation also induces an equivalence relation:
An ordered set consists of a set together with an order relation defined on X.
Intervals and Contour Sets
Given an ordered set and two elements , we can define
-
The open interval : set of all elements strictly between and .
-
The closed interval : set of all elements between and
Analogously, for arbitrary ordered sets, we can define
-
Upper contour set : set of all elements that follow or dominate a
-
Lower contour set : set of all elements that precede in the order
A partial order is a relation that is reflexive, transitive, and antisymmetric.
Definition. The join of a partially ordered set is the supremum and is denoted . is sometimes written .
The meet of a poset is the infimum and is denoted . is sometimes written .
Algebra
Definition. A set and an operation defined on . Then is called a group if the following conditions hold:
-
Closure of under :
-
Associativity:
-
Neutral element:
-
Inverse element: .
-
if additionally , then is an Abelian/Commutative group
are all groups
Definition. A real valued vector space is a vector space with two operations
Where
-
is an Abelian Group
-
Distributivity
-
-
Associativity :
-
Neutral Item wrt outer operation:
Analysis and Topology
Preliminaries:
Vectors : , where
Metric Spaces
Definition.
Requirements for a metric(e.g. ):
-
: a point is at zero distance from itself
-
: distance is symmetric
-
: triangle inequality
We can generalise this definition to arbitrary nonempty sets .
Definition. A metric space is a nonempty set and a metric of distance
For example, is a metric space. Many additional metric spaces in are generated by a norm.
Definition. A norm on is a mapping satisfying
-
Nonnegativity:
-
Non degeneracy:
-
Homogeneity:
-
Triangle Inequality:
Each norm on generates a metric on via .
E.g. generates Euclidean distance .
The pair consisting of a vector space together with a norm is called a normed linear space.
Definition. A Banach space is a normed linear space that is a complete (in the Cauchy-convergence sense) metric space with respect to the metric derived from its norm.
Definition. Also known as Minkowski Norm
A class of norms that includes as a special case is the norm defined by
norms give rise to a class of metric spaces where .
Examples:
-
1: Taxicab
-
2: Euclidian
-
: Chebychev
Definition. Frobinius Norm of a matrix is
Definition. , denotes the set of real numbers satisfying . or denotes a strict inequality (i.e. closed from above or below).
If is bounded from above, .. Then is the least upper bound or supremum of . If is not bounded from above, we write .
Similarly, the greatest lower bound of a set or infimum is denoted
Definition. A sequence is denoted by or when the range of the indices is clear.
Let be an infinite sequence of real numbers and and . Then, is the .
If is not Bounded from above, .
Definition. A sequence in a metric space is said to be a Cauchy sequence if, whenever (intuitively, points in a Cauchy sequence get tighter together).
Let be a sequence of vectors in . Suppose for any . Then, has a limit.
More basic definition: if
Sequences
Let be a metric space. A sequence is said to converge to if .
Theorem. A sequence in can have at most one limit
Definition. centered on with radius is the set
Set Definitions
Definition. A subset of is called bounded if for some and some suitably large (intuition - some arbitrarily large ball can fit inside it).
A sequence in is called bounded if its range is a bounded set.
Definition. A set is closed IFF for every convergent sequence contained in , the limit of the sequence is also in .
A closed set contains all its limit points. That is , if is a convergent sequence of points in , then is in S as well.
Definition. A subset of an arbitrary metric space is open iff its complement is closed, and closed iff its complement is open.
A set is called open if, $\forall x \in S \exists \epsilon
0 \suchthat y \in B(\epsilon, x), \rho(x,y) < \epsilon$ is in S.
If is a closed, bounded subset of , then .
-
A set is open iff its complement is closed.
-
the union of any number of open sets is open
-
the intersection of a finite number of open sets is open.
-
the intersection of any number of closed sets is closed
-
the union of a finite number of closed sets is closed.
Definition. A point is called an interior point of if the set is contained in for all sufficiently small. A point is called a boundary point if is non-empty for all sufficiently small. The set of all boundary points in is denoted by .
The closure of a set is the set combined with all points that are the limits of sequence of points in .
Definition. A subset is said to be complete iff every cauchy sequence in A converges to some point in .
Definition. The set is called compact if every sequence contained in has a subsequence that converges to a point in .
Definition. A set is called convex if, and , we have . (i.e. all convex combinations of two points in a set are also in the set).
Theorem. Every bounded sequence in Euclidean space has at least one convergent subsequence.
Theorem. A subset of is precompact in that same space iff it is bounded and compact.
IOW : Compact Closed Bounded
Theorem. All metrics on induced by a norm are equivalent.
Functions
A function from set to , written as or is a rule associating every element in A to one and only one element in . The point is also written as , and is called the image of under . For , the set is the set of all points in A that map into D under F, and is called the preimage of D under F.
a function is called
-
injective / one-to-one if distinct elements of are always mapped into distinct elements of
-
surjective / onto if every element of B is the image under of at least one point in
-
bijective if a function is both injective and surjective
Definition. A real valued function on is continuous at point if .
Equivalently,
A function is said to be continuous on the set if, . Equivalently, in , we require the sequence of points that converge to to be entirely in .
-
The sum of two continuous functions is continuous
-
The product of two continuous functions is continuous
-
The quotient of two continuous functions is continuous at any point where the denominator is nonzero
Definition.
Definition. Given two metric spaces , , a function is called Lipschitz continuous if ,
such a is referred to as a Lipschitz constant for the function .
A Real valued function is Lipschitz if such that
This limits how fast a function can change. Every function that has bounded first derivatives is Lipschitz continuous. A differentiable function is Lipschitz if and only if it has a bounded derivative.
Definition. A function defined on is said to be Holder of order if such that
this is also called Uniform Lipschitz.
Theorem. A function f is continuous iff the preimage of every open set is open in .
Definition. is continuous if , .
Theorem. Let function f , where are metric spaces and is continuous. If is compact, then so is , the image of under .
Result.
Beta function :
Theorem. Let , where (an arbitrary metric space). If is continuous and is compact, then attains its supremum and infimum on .
In case of continuous functions on compact domains, optima always exist.
Definition. The function is differentiable at if , i.e. has a limit as .
The derivative of at is this limit and is denoted or
Theorem. Differentiability Continuity Limit i.e. not all functions with limits are continuous, not all continuous functions are differentiable.
More generally,
Continuously Differentiable Lipschitz Continuous Holder Continuous Uniformly Continuous Continuous
Theorem.
-
Linearity: are differentiable at , then and are differentiable at with and .
-
Chain Rule: If and are differentiable, then is differentiable and .
Theorem. Let , is continuous and differentiable. .
Theorem. , is continuous and differentiable. Then,
Definition. The epigraph of a function is (i.e. area above the function).
Definition. Let . Then,
-
F is convex if . : i.e. the epigraph of is a convex set.
-
F is concave if .
Fixed Points
Definition. Let , where is any set. An is called a fixed point of on S if .
If , then fixed points of are those points in where meets the 45 degree line.
Theorem. Consider the space , where is the metric induced by any norm. Let , and let . If is continuous and is both compact and convex, then has at least one fixed point in S.
Definition. Let be a metric space. is a map. it is called
-
nonexpansive if
-
contracting if
-
uniformly contracting with modulus if
Theorem. Let , where is a complete metric space. If is a uniform contraction on with modulus , then has a unique fixed point . Moreover for every and , we have
Measure
Definition. A -algebra (also -field) is a collection of subsets of that
-
: includes itself and the null set
-
: is closed under complement
-
: is closed under countable unions.
This is effectively the definition of the event space for a sample space .
Definition. A measure on a set assigns a nonnegative value to many subsets of . For a collection subsets of , a measure is a map
Given is a measure of the ‘size’ of set .
A function on a field of is a measure of
-
Null empty-set:
-
Non-Negativity:
-
Countable Additivity: If are disjoint elements of (i.e. ),
Existence ensured by Caratheodory’s Extension Theorem.
Examples of measures :
-
If is countable, let number of points in A. This counting measure can be defined for any subset , then the field is the collection of all subsets of , the power set of .
-
If , define
Definition. Given a topology on , a Borel -field is a field generated by the family of open subsets of , i.e. the smallest field that contains all the open sets.
The Lebesgue measure of a set can be defined implicitly for any set B in a field called the Borel sets of . is the smallest field that contains all ‘rectangles’
Result. Suppose . We say that is a bounded interval if . Define , the smallest algebra that contains is denoted by and is called the Borelian algebra
Thus, a countable union of open intervals belongs to the Borelean -algebra. Since every open subset of can be written as a countable union of open intervals, it is therefore also a Borelean set. The closed subsets are Borelean since a closed set is the complement of an open set.
Definition. Basic problem: how to assign each subset of , i.e. each element of a real number that will represent its ‘size’.
With , is the length, area, or volume of , respectively. is a Lebesgue measure on .
Definition. If is a field of subsets of , the pair is called a measurable space, and if is a measure on , the triple is called a measure space.
Definition. A measure is called a probability measure if , and then the triple is called a probability space.
Definition. If is a measurable space and is a real-valued function on , if measureable if
for every Borel set .
Integration
An integral is a map assigning a number to a function, where the number is viewed as the area/volume ‘under’ the function. Given a measure space and a measureable function , an integral is a map from to number such that the following three properties hold
-
If , then
-
-
Definition. Suppose is a bounded function defined on . An increasing sequence defines a partition of the interval. The mesh size of the partition is defined to be
To each partition we associate two approximations of the area under the graph of , by the rules
these are called the upper and lower Riemann Sums For any partition, .
Definition. A bounded function defined on an interval is said to be Riemann integrable if
Suppose is piecewise continuous, defined on . Then, is Riemann integrable and
Theorem. If is continuous on then is differentiable on the open interval and . is called the anti-derivative.
Definition. Given a measurable space and a set , we define an indicator function defined by
This function is measurable.
Definition. Any function of the form , where is a finite partition of . A countable sum of measurable functions is measurable, which implies that is measurable. Then we define
Definition. For any measurable function , define and , which are also measurable. We also have and . When at least one of or is finite, we define the integral
When both and are finite, we say is integrable w.r.t. . Lebesgue integrals intuitively slice the function horizontally, while Riemann integrals slice vertically.
Probability Theory
Definition. The triple is a probability space if it satisfies the following
-
Unitarity:
-
Non Negativity:
-
Countable Additivity: If are pairwise disjoint[i.e. ], Then
Other properties for any event
Stated differently:
Definition. Let be a probability measure on a measurable space , so is a probability space. Sets are called events, points are called outcomes, and is called the probability of B.
Let be a measurable space. Let be a ‘set’ function mapping the algebra of subsets of into the real line. We say is a probability measure if, for events ,
-
: Events range from never happening to always happening
-
: Something must happen
-
: Nothing never happens
-
: A must either happen or not happen
-
: additivity for countable disjoint events
- Boole’s Inequality for any sequence of events
-
Monotonicity: for events
Definition. A measurable function (event space) is called a random variable. In other words, a random variable is a function from the sample space to the real line , and the probability of its value being in a given interval is well defined.
Definition. The probability measure living on such that for any ,
for Borel sets is called the distribution of . The notation is used to indicate that has distribution .
Definition. Map such that
for
Properties of CDF
-
Boundary property:
-
Nondecreasing: if
-
Right continuous:
Densities
Theorem. If a finite measure is absolutely continuous wrt a finite measure , then a nonnegative measurable function s.t.
The function in this theorem is called the Radon-Nikodym derivative of with respect to , or the density of with respect to , denoted
Definition. If a random variable has density wrt Lebesgue measure on , then or its distribution is called absolutely continuous with density . Then, from R-N ,
Using the fundamental theorem of calculus, can be found from the CDF by differentiation, .
Definition. Let be a countable subset of . The measure for borel sets is also called counting measure on . Then,
Suppose is a random variable s.t. . Then, is called a discrete random variable.
The density of w.r.t. satisfies
In particular, if , then , and so
The density is called the mass function for X.
Moments
Definition. If is a random variable on a probability space , then the expectation of (i.e. density ), is defined as
For discrete RV with for a countable set , if is counting measure on , and is the mass function given by ,
Definition. The variance of a random variable with finite expectation is defined as
If is absolutely continuous with density ,
If is discrete with mass function ,
Random vectors
If are random variables, then the function defined by
is called a random vector. The definitions above extend naturally to random vectors, e.g. the distribution of is
for Borel sets . The expectation of a random vector is the vector of expectations
A random vector is said to be absolutely continuous if the CDF can be written as
Definition. A matrix is called a random matrix if the entries are random variables.
Definition. The covariance of a random vector is the matrix of covariances of the variables in
If and is the transpose of the mean deviation, then
so
Product Measures and Independence
Let and be measure spaces. Then a unique product measure on such that . The -field is defined formally as the smallest field containing all sets with , .
Theorem. Integration against the product measure can be accomplished by iterated integration against and , in either order
Definition. Suppose are random variables.
are indepepndent for all Borel subsets of , it is true that
Conditional Expectations
Definition. Given two r.v.s with finite second moments, is defined as a measurable function such that
For continuous with pdf , For any , the conditional probability of the event is defined as a function satisfying
The conditional CDF is
where the conditional density is .
Definition. Let . Define the class of ‘Bounded Lipschitz functions’ with Lipschitz constant 1 as
Then the Bounded Lipschitz Distance is
and are said to be ‘close’ if is small.
Order Statistics
Theorem. Suppose are r.v.s with distribution . To each define . We want the distribution of .
If has a density, has a density too
More generally, the distribution function of is given by
Reference: Severini (2005), Chapter 7.
Result. has a distribution with density
Result. (Useful for Vickrey auctions)
Linear Functions and Linear Algebra
Linear Functions
Definition. A function between two linear spaces is linear if it preserves the linearity of sets and through the following properties ()
-
Additivity
-
Homogeneity
-
linear and bijective is called an isomorphism
-
A linear function that maps onto is called a linear functional.
-
A linear function that maps onto itself is called a linear operator / automorphism.
Every linear function mapping from a finite-dimensional domain can be represented by a matrix.
Theorem. Linear Functions Imply
-
Additivity : generalises to
-
Convex Functions generalises to
-
Quasiconvex functions
-
-
Homogeneity: generalises to
-
Homogeneous functions generalises to
-
Homothetic functions
-
Definition. An inner product on a vector space is a mapping that satisfies,
This generalises to an inner product of two functionals where
An inner product defines a norm . This gives us a restatement of the Cauchy-Schwartz inequality
Definition.
-
(Orthogonality). Furthermore, if , then they are said to be orthonormal.
-
x parallel to y
Definition. is symmetric, positive definite if > 0.
Definition. For conformable matrices and with identical dimensions, the element-wise product
is called the Hadamard product.
Definition. Euclidian norm of a vector is defined as
Theorem. Angle between and is given by
Theorem.
Definition.
For conformable matrices
Definition. For a square matrix , scalar and vector that satisfies constitute an eigenvalue and eigenvector respectively.
Definition. A square matrix is orthogonal iff its columns are orthonormal so that
Definition. For , we define the kernel/null space as
and the image/range
The dimension of the image is called the rank of . The dimension of the kernel is called the nullity of . If is finite dimensional,
This is the rank-nullity result
A linear function has full rank if .
Definition. Nonsingular Matrices
A matrix with columns is non-singular or one-to-one if
Projection
Definition. Let be a vector space and is a subspace of . A linear mapping is called a projection if . Since homeomorphisms can be expressed by a transformation matrix, projections can be represented as a projection matrix with the property . Projection matrices are always symmetric.
Result. We look at orthogonal projections of vectors onto lower dimensional subspaces with . Assume is an ordered basis of . Therefore, any projection . The problem, then, is to find coordinates of the projection (with respect to basis ) where given and . The solution is the familiar OLS coef vector
where is also called the pseudo-inverse of , which can be computed as long as is full rank.
The projection matrix is therefore
Definition. Any basis of an dimensional vector space can be transformed into an orthogonal/orthonormal basis where as follows
where the th basis vector is projected onto the subspace spanned by the first constructed orthogonal vectors .
This is the same as FWL Theorem, but older.
Matrix Decompositions
Definition. A square matrix admits to an eigen-decomposition if it can be factorised as where
-
is a matrix whose th column is the eigenvector of (orthogonal matrix)
-
is a diagonal matrix with corresponding eigenvalues
Theorem. If are orthogonal matrices
-
is also orthogonal
-
is orthogonal
-
Definition. If is positive definite, then it admits to
-
where is non-singular upper triangular
-
where is non-singular lower triangular
Definition. If is a matrix with full column rank, a factorisation where
-
is an orthogonal matrix
-
is upper triangular and nonsingular (invertible)
Result. is often numerically unstable, so we can define . The, the OLS estimate can be written as . The homoscedastic variance is .
Using the same decomposition, .
Definition. Any matrix may be written as
where is a orthogonal matrix, is a orthogonal matrix, and is a diagonal matrix with non-negative elements.
Result. For a square covariance matrix , if , then
where contains the square singular values. In other words,
Matrix Identities
For conformable matrices ,
Partitioned Matrices
Definition. It can be useful to partition a matrix as follows
Multiplying a partitioned matrix with a stacked vector
Theorem.
where
Function Spaces
Almost all of this section is based on Larry Wasserman’s notes: functionspaces.pdf, and Racine (2013).
Definition. Let be any set, let be the collection of all bounded functions s.t. (i.e. and let
Spaces of functions can be treated as linear vector spaces
Definition.
which leads to a norm for functions
Definition. An operator is a higher-order function that maps from one function to another. A derivative and integral are both operators.
Operators can have eigenvalues and eigenfunctions such that
is an eigenfunction for both differentiation and integration.
Definition. is a complete ( every Cauchy sequence in the space converges to a point in it), inner product space. Equivalently, it is a vector space endowed with an inner product and an associated norm and metric such that every Cauchy sequence has a limit in .
Intuitively, it means it doesn’t have any ‘holes’ in it ( is not a complete space because is missing from it).
Every Hilbert space is a Banach space but the reverse is not true in general. In a hilbert space, .
If is a hilbert space and is a closed subspace then called a projection of onto that minimises over . The set of elements orthogonal every is denoted . Every can be written as where is the projection of onto and .
Result. , the set of random variables defined on a common probability space is a Hilbert space with inner product , associated norm and metric .
Result. the space of Borel-measurable real functions on given density satisfying and associated norm and metric is a hilbert space.
Definition. If and are spaces such that every is orthogonal to every , then we define the orthogonal sum as
A set of vectors is orthonormal if when and . This is also called an orthonormal basis. Every hilbert space has an orthonormal basis. A Hilbert space is said to be separable if there exists a countable orthonormal basis.
spaces
Let be a collection of functions . The norm on is defined by
For , we define the sup norm .
The space is defined as
Every space is a Banach Space.
-
Cauchy Schwartz:
-
Minkowski : where
-
Holder: where
Result. Functions where are said to be square-integrable, and the space of square-integrable functions is called . Many familiar results from vector spaces carries over into .
is a Hilbertspace. The inner product between two functions is and the norm of is . With this inner product, is a separable Hilbert space; that is, we can find a countable orthonormal basis , that is , and . It follows that if ,
are the coefficients. Parseval’s identity .
The span of is
The projection of onto the span is , which we call the n-term linear approximation of .
Definition. A sequence of functions can be considered a basis. An orthonormal basis is one that admits to
Mononomials are a basis for on and , but they aren’t orthogonal.
Result. A popular basis for on are the sines and cosines, which may be written as , , . Coefficients in this expansion are referred to as the Fourier transform of the original function.
A cosine basis on is
Legendre basis on is
The Haar basis on [0,1] consists of functions where and
This is a doubly indexed set of functions so when is expanded in this basis we write where and The Haar basis is an example of a wavelet basis.
Definition. Let be a positive integer. Let The Holder space is the set of functions such that
The special case is sometimes called the Lipschitz space. If then we have
Roughly speaking, this means that the functions have bounded second derivatives.
Multivariate version
There is also a multivariate version of Holder spaces. Let . Given a vector define and
The Hölder class is the set of functions such that
for all and all such that .
Definition. A Sobolev space is a space of functions possessing sufficiently many derivatives for some application domain. Formally,
Let be integrable on every bounded interval. Then is weakly differentiable if there exists a function that is integrable on every bounded interval, such that whenever We call the weak derivative of Let denote the weak derivative of
The Sobolev space of order is defined by The Sobolev ball of order and radius is defined by
Definition. A Mercer kernel is a continuous function such that and such that is positive semidefinite, meaning that
for all finite sets of points and all real numbers The function
is an example of a Mercer kernel. The most commonly used kernel is the Gaussian kernel
Definition.
Given a kernel let be the function obtained by fixing the first coordinate. That is, For the Gaussian kernel, is a Normal, centered at We can create functions by taking liner combinations of the kernel:
Let denote all such functions: Given two such functions and we define an inner product In general, and might be representable in more than one way. You can check that is independent of how or is represented. The inner product defines a norm: where and is the matrix with
The Reproducing Property
.
Let Note the following crucial property: This follows from the definition of where we take This implies that This is called the reproducing property. It also implies that is the representer of the evaluation functional.
The completion of with respect to is denoted by and is called the RKHS generated by .
Evaluation Functionals. A key property of RKHS’s is the behavior of the evaluation functional. The evaluation functional assigns a real number to each function. It is defined by In general, the evaluation functional is not continuous. This means we can have but does not converge to For example, let and Then But which does not converge to Intuitively, this is because Hilbert spaces can contain very unsmooth functions.
But in an , the evaluation functional is continuous. Intuitively, this means that the functions in the space are well-behaved. To see this, suppose that Then
so the evaluation functional is continuous.
A Hilbert space is a RKHS if and only if the evaluation functionals are continuous.
Theorem. Let be a loss function depending on and on Let minimize where is any monotone increasing function. Then has the form for some
Calculus and Optimisation
Calculus
Definition. The derivative of a function at point , when defined, is the tangent to the function at .
Definition. For function , we define the
which collects the partial derivatives in a column vector.
The matrix of partial derivatives of is called the Hessian, denoted by
For a vector-valued function , we can construct the Jacobian, which collects all partial derivatives.
Theorem. admits to Taylor expansion around such that
For a function with multiple arguments , the second-order Taylor expansion around the point is
Theorem. Let have continuous first and second order partial derivatives in the neighbourhood of the optimum .
If and is positive definite, then is a local minimum. If and is negative definite, then is a local maximum.
Theorem. Let be nonnegative real numbers, and suppose maximises the Lagrangian
Then, maximises subject to constraints ()
Theorem. If is differentiable at and is invertible, then such that is bijective. Further, the inverse function is differentiable.
Theorem. Let be a domain and be a differentiable function. If , we’ll concatenate the two vectors and write .
Suppose , and . Then, differentiable function .
Further, . IoW, has a unique solution .
Theorem. Let be differentiable and consider the implicitly defined curve
(i.e. a level set of ). Pick , suppose . By IFT, we know the -coordinate of this curve can locally be expressed as a differentiable function of . Directly differentiating gives
Differentiation Rules
| Rule | ||
|---|---|---|
| Power rule | ||
| Exponential rule | ||
| Log rule | ||
| Linear rule | ||
| Product rule | ||
| Quotient rule | ||
| Chain rule |
Matrix Derivatives
Let , and be a conformable matrix
General Results from Optimisation Theory
References: Luenberger (1997), Rustagi (2014).
-
Projection Theorem - In , the shortest line from a point to the plane is furnished by the perpendicular from the point to the plane. Core idea carries through to higher dimensions and infinite-dimensional Hilbert Space
-
Hahn-Banach Theorem: given a sphere and a point not in the sphere, there exists a hyperplane separating the point and the sphere.
-
Duality: The shortest distance from a point to a convex set is equal to the maximum of the distances from the point to a hyperplane separating the point from the convex set.
-
Differentials: Set derivative of the objective function to zero.
Linear Programming
Maximise
subject to
where is the choice vector, is a given vector, is a known matrix of constants, and is a vector of constants.
Definition. By introducing slack variables , for every inequality with , we can convert every linear programming into its standard form
Definition. Primal
Dual
Theorem. A feasible solution to the primal is optimal IFF there exists a feasible solution to the dual problem such that
Dantzig’s Simplex method, Karmarkar’s Algorithm.
Nonlinear Optimisation
Saddle Point
Suppose we have and is a real valued function. Then, is a saddle-point of if
.
Definition.
where , is symmetric, positive definite, is a matrix of constraints, and .
Constrained Maximisation
Theorem. We want to maximise (implicitly defined by ). Suppose . If attains a constrained local maximum (or minimum) at on the surface , .
Generic problem of the form
Definition.
First, write
differentiating wrt yields FOCs
which gives us three (potentially nonlinear) equations with three unknowns , that can be solved simultaneously.
To check sufficiency, the second-order condition analogue is the determinant of the bordered hessian matrix
If , then it is negative definite, which implies that the that solves the system is indeed a local maximum.
Definition. The Hessian for of a [twice differentiable] function is defined by the matrix
-
If , is said to be negative semi-definite.
-
If , is said to be negative definite.
-
If , is said to be positive semi-definite.
-
If , is said to be positive definite.
Numerical Optimisation
Root-finding
We want to evaluate the roots of the equation
Assume the inverse of , denoted exists.
Finding the root of is equivalent to evaluating .
Canonical newton-raphson is
Quasi-Newton
General version of update rule:
Step length for both N-R and BHHH.
Definition. set
Update rule:
For Log-likelihood,
Definition. Uses Information-matrix equality. Set to be outer product of scores