This note is a high-fidelity Markdown migration of the Mathematical Background appendix from the LaTeX source.

Parent map: index Related chapters: probability-and-mathstats, linear-regression, maximum-likelihood-and-machine-learning

Concept map

flowchart TD
  A[Proof Techniques] --> B[Set Theory]
  B --> C[Analysis and Topology]
  C --> D[Measure and Integration]
  D --> E[Probability Theory]
  C --> F[Linear Algebra]
  F --> G[Function Spaces]
  G --> H[Calculus and Optimisation]

Mathematical Background

Proof Techniques

  1. Direct Proof / modus ponens: If is a true statement and is a true conditional statement, then is a true statement. Direct proofs typically involve backwards-forwards reasoning - take all statements that follow from that might relate to , list them in . Then, take all statements that follow from , list them in . Then, look for statements that have a straightforward proof, and write proof of the form .

  2. Contrapositive: Since every conditional statement is equivalent to its contrapositive, proving is equivalent to proving .

  3. Proof by contradiction: Assume is true, and assume is false [i.e. is true], and show that (using and other possible intermediate results) where is known to be false. Conclude that must be false, so must be true, and we have proved that .

  4. Induction [only applies to statements pertaining to well ordered sets]

    • Assume a base case - is a true statement

    • Prove whenever is true, is true

    • Therefore is true for every

Set Theory

A set is a collection of objects. E.g. .

Set operations:

  • Intersection :

  • Union :

  • Difference:

  • cartesian product:

Definition. Set of all subsets of S is itself a set. Denoted as

Relations

Given two sets and , any subset of their Cartesian product is called a binary relation. For any pair of elements .

Types of Relations

Properties of binary relations:

  • reflexive

  • transitive if

  • symmetric If

  • antisymmetric If

  • asymmetric if

  • complete if either or or both

Definition. An equivalence relation on a set is a relation that is reflexive, transitive, and symmetric. Given an equivalence relation , the set of elements that are related to a given element :

is called the equivalence class of . e.g. Indifference preference relation is an equivalence relation, but the preference relation is not because it isn’t symmetric.

Definition. A relation that is reflexive and transitive but not symmetric is called an order relation: . This is also called a weak order. is not an order relation because it is not reflexive (and is called a strong order). Every order relation also induces an equivalence relation:

An ordered set consists of a set together with an order relation defined on X.

Intervals and Contour Sets

Given an ordered set and two elements , we can define

  • The open interval : set of all elements strictly between and .

  • The closed interval : set of all elements between and

Analogously, for arbitrary ordered sets, we can define

  • Upper contour set : set of all elements that follow or dominate a

  • Lower contour set : set of all elements that precede in the order

A partial order is a relation that is reflexive, transitive, and antisymmetric.

Definition. The join of a partially ordered set is the supremum and is denoted . is sometimes written .

The meet of a poset is the infimum and is denoted . is sometimes written .

Algebra

Definition. A set and an operation defined on . Then is called a group if the following conditions hold:

  1. Closure of under :

  2. Associativity:

  3. Neutral element:

  4. Inverse element: .

  5. if additionally , then is an Abelian/Commutative group

are all groups

Definition. A real valued vector space is a vector space with two operations

Where

  1. is an Abelian Group

  2. Distributivity

  3. Associativity :

  4. Neutral Item wrt outer operation:

Analysis and Topology

Preliminaries:

Vectors : , where

Metric Spaces

Definition.

Requirements for a metric(e.g. ):

  • : a point is at zero distance from itself

  • : distance is symmetric

  • : triangle inequality

We can generalise this definition to arbitrary nonempty sets .

Definition. A metric space is a nonempty set and a metric of distance

For example, is a metric space. Many additional metric spaces in are generated by a norm.

Definition. A norm on is a mapping satisfying

  • Nonnegativity:

  • Non degeneracy:

  • Homogeneity:

  • Triangle Inequality:

Each norm on generates a metric on via .

E.g. generates Euclidean distance .

The pair consisting of a vector space together with a norm is called a normed linear space.

Definition. A Banach space is a normed linear space that is a complete (in the Cauchy-convergence sense) metric space with respect to the metric derived from its norm.

Definition. Also known as Minkowski Norm

A class of norms that includes as a special case is the norm defined by

norms give rise to a class of metric spaces where .

Examples:

  • 1: Taxicab

  • 2: Euclidian

  • : Chebychev

Definition. Frobinius Norm of a matrix is

Definition. , denotes the set of real numbers satisfying . or denotes a strict inequality (i.e. closed from above or below).

If is bounded from above, .. Then is the least upper bound or supremum of . If is not bounded from above, we write .

Similarly, the greatest lower bound of a set or infimum is denoted

Definition. A sequence is denoted by or when the range of the indices is clear.

Let be an infinite sequence of real numbers and and . Then, is the .

If is not Bounded from above, .

Definition. A sequence in a metric space is said to be a Cauchy sequence if, whenever (intuitively, points in a Cauchy sequence get tighter together).

Let be a sequence of vectors in . Suppose for any . Then, has a limit.

More basic definition: if

Sequences

Let be a metric space. A sequence is said to converge to if .

Theorem. A sequence in can have at most one limit

Definition. centered on with radius is the set

Set Definitions

Definition. A subset of is called bounded if for some and some suitably large (intuition - some arbitrarily large ball can fit inside it).

A sequence in is called bounded if its range is a bounded set.

Definition. A set is closed IFF for every convergent sequence contained in , the limit of the sequence is also in .

A closed set contains all its limit points. That is , if is a convergent sequence of points in , then is in S as well.

Definition. A subset of an arbitrary metric space is open iff its complement is closed, and closed iff its complement is open.

A set is called open if, $\forall x \in S \exists \epsilon

0 \suchthat y \in B(\epsilon, x), \rho(x,y) < \epsilon$ is in S.

If is a closed, bounded subset of , then .

  • A set is open iff its complement is closed.

  • the union of any number of open sets is open

  • the intersection of a finite number of open sets is open.

  • the intersection of any number of closed sets is closed

  • the union of a finite number of closed sets is closed.

Definition. A point is called an interior point of if the set is contained in for all sufficiently small. A point is called a boundary point if is non-empty for all sufficiently small. The set of all boundary points in is denoted by .

The closure of a set is the set combined with all points that are the limits of sequence of points in .

Definition. A subset is said to be complete iff every cauchy sequence in A converges to some point in .

Definition. The set is called compact if every sequence contained in has a subsequence that converges to a point in .

Definition. A set is called convex if, and , we have . (i.e. all convex combinations of two points in a set are also in the set).

Theorem. Every bounded sequence in Euclidean space has at least one convergent subsequence.

Theorem. A subset of is precompact in that same space iff it is bounded and compact.

IOW : Compact Closed Bounded

Theorem. All metrics on induced by a norm are equivalent.

Functions

A function from set to , written as or is a rule associating every element in A to one and only one element in . The point is also written as , and is called the image of under . For , the set is the set of all points in A that map into D under F, and is called the preimage of D under F.

a function is called

  • injective / one-to-one if distinct elements of are always mapped into distinct elements of

  • surjective / onto if every element of B is the image under of at least one point in

  • bijective if a function is both injective and surjective

Definition. A real valued function on is continuous at point if .

Equivalently,

A function is said to be continuous on the set if, . Equivalently, in , we require the sequence of points that converge to to be entirely in .

  • The sum of two continuous functions is continuous

  • The product of two continuous functions is continuous

  • The quotient of two continuous functions is continuous at any point where the denominator is nonzero

Definition.

Definition. Given two metric spaces , , a function is called Lipschitz continuous if ,

such a is referred to as a Lipschitz constant for the function .

A Real valued function is Lipschitz if such that

This limits how fast a function can change. Every function that has bounded first derivatives is Lipschitz continuous. A differentiable function is Lipschitz if and only if it has a bounded derivative.

Definition. A function defined on is said to be Holder of order if such that

this is also called Uniform Lipschitz.

Theorem. A function f is continuous iff the preimage of every open set is open in .

Definition. is continuous if , .

Theorem. Let function f , where are metric spaces and is continuous. If is compact, then so is , the image of under .

Result.

Beta function :

Theorem. Let , where (an arbitrary metric space). If is continuous and is compact, then attains its supremum and infimum on .

In case of continuous functions on compact domains, optima always exist.

Definition. The function is differentiable at if , i.e. has a limit as .

The derivative of at is this limit and is denoted or

Theorem. Differentiability Continuity Limit i.e. not all functions with limits are continuous, not all continuous functions are differentiable.

More generally,

Continuously Differentiable Lipschitz Continuous Holder Continuous Uniformly Continuous Continuous

Theorem.

  • Linearity: are differentiable at , then and are differentiable at with and .

  • Chain Rule: If and are differentiable, then is differentiable and .

Theorem. Let , is continuous and differentiable. .

Theorem. , is continuous and differentiable. Then,

Definition. The epigraph of a function is (i.e. area above the function).

Definition. Let . Then,

  • F is convex if . : i.e. the epigraph of is a convex set.

  • F is concave if .

Fixed Points

Definition. Let , where is any set. An is called a fixed point of on S if .

If , then fixed points of are those points in where meets the 45 degree line.

Theorem. Consider the space , where is the metric induced by any norm. Let , and let . If is continuous and is both compact and convex, then has at least one fixed point in S.

Definition. Let be a metric space. is a map. it is called

  • nonexpansive if

  • contracting if

  • uniformly contracting with modulus if

Theorem. Let , where is a complete metric space. If is a uniform contraction on with modulus , then has a unique fixed point . Moreover for every and , we have

Measure

Definition. A -algebra (also -field) is a collection of subsets of that

  • : includes itself and the null set

  • : is closed under complement

  • : is closed under countable unions.

This is effectively the definition of the event space for a sample space .

Definition. A measure on a set assigns a nonnegative value to many subsets of . For a collection subsets of , a measure is a map

Given is a measure of the ‘size’ of set .

A function on a field of is a measure of

  • Null empty-set:

  • Non-Negativity:

  • Countable Additivity: If are disjoint elements of (i.e. ),

Existence ensured by Caratheodory’s Extension Theorem.

Examples of measures :

  • If is countable, let number of points in A. This counting measure can be defined for any subset , then the field is the collection of all subsets of , the power set of .

  • If , define

Definition. Given a topology on , a Borel -field is a field generated by the family of open subsets of , i.e. the smallest field that contains all the open sets.

The Lebesgue measure of a set can be defined implicitly for any set B in a field called the Borel sets of . is the smallest field that contains all ‘rectangles’

Result. Suppose . We say that is a bounded interval if . Define , the smallest algebra that contains is denoted by and is called the Borelian algebra

Thus, a countable union of open intervals belongs to the Borelean -algebra. Since every open subset of can be written as a countable union of open intervals, it is therefore also a Borelean set. The closed subsets are Borelean since a closed set is the complement of an open set.

Definition. Basic problem: how to assign each subset of , i.e. each element of a real number that will represent its ‘size’.

With , is the length, area, or volume of , respectively. is a Lebesgue measure on .

Definition. If is a field of subsets of , the pair is called a measurable space, and if is a measure on , the triple is called a measure space.

Definition. A measure is called a probability measure if , and then the triple is called a probability space.

Definition. If is a measurable space and is a real-valued function on , if measureable if

for every Borel set .

Integration

An integral is a map assigning a number to a function, where the number is viewed as the area/volume ‘under’ the function. Given a measure space and a measureable function , an integral is a map from to number such that the following three properties hold

  • If , then

Definition. Suppose is a bounded function defined on . An increasing sequence defines a partition of the interval. The mesh size of the partition is defined to be

To each partition we associate two approximations of the area under the graph of , by the rules

these are called the upper and lower Riemann Sums For any partition, .

Definition. A bounded function defined on an interval is said to be Riemann integrable if

Suppose is piecewise continuous, defined on . Then, is Riemann integrable and

Theorem. If is continuous on then is differentiable on the open interval and . is called the anti-derivative.

Definition. Given a measurable space and a set , we define an indicator function defined by

This function is measurable.

Definition. Any function of the form , where is a finite partition of . A countable sum of measurable functions is measurable, which implies that is measurable. Then we define

Definition. For any measurable function , define and , which are also measurable. We also have and . When at least one of or is finite, we define the integral

When both and are finite, we say is integrable w.r.t. . Lebesgue integrals intuitively slice the function horizontally, while Riemann integrals slice vertically.

Probability Theory

Definition. The triple is a probability space if it satisfies the following

  • Unitarity:

  • Non Negativity:

  • Countable Additivity: If are pairwise disjoint[i.e. ], Then

Other properties for any event

Stated differently:

Definition. Let be a probability measure on a measurable space , so is a probability space. Sets are called events, points are called outcomes, and is called the probability of B.

Let be a measurable space. Let be a ‘set’ function mapping the algebra of subsets of into the real line. We say is a probability measure if, for events ,

  1. : Events range from never happening to always happening

  2. : Something must happen

  3. : Nothing never happens

  4. : A must either happen or not happen

  5. : additivity for countable disjoint events

    • Boole’s Inequality for any sequence of events
  6. Monotonicity: for events

Definition. A measurable function (event space) is called a random variable. In other words, a random variable is a function from the sample space to the real line , and the probability of its value being in a given interval is well defined.

Definition. The probability measure living on such that for any ,

for Borel sets is called the distribution of . The notation is used to indicate that has distribution .

Definition. Map such that

for

Properties of CDF

  • Boundary property:

  • Nondecreasing: if

  • Right continuous:

Densities

Theorem. If a finite measure is absolutely continuous wrt a finite measure , then a nonnegative measurable function s.t.

The function in this theorem is called the Radon-Nikodym derivative of with respect to , or the density of with respect to , denoted

Definition. If a random variable has density wrt Lebesgue measure on , then or its distribution is called absolutely continuous with density . Then, from R-N ,

Using the fundamental theorem of calculus, can be found from the CDF by differentiation, .

Definition. Let be a countable subset of . The measure for borel sets is also called counting measure on . Then,

Suppose is a random variable s.t. . Then, is called a discrete random variable.

The density of w.r.t. satisfies

In particular, if , then , and so

The density is called the mass function for X.

Moments

Definition. If is a random variable on a probability space , then the expectation of (i.e. density ), is defined as

For discrete RV with for a countable set , if is counting measure on , and is the mass function given by ,

Definition. The variance of a random variable with finite expectation is defined as

If is absolutely continuous with density ,

If is discrete with mass function ,

Random vectors

If are random variables, then the function defined by

is called a random vector. The definitions above extend naturally to random vectors, e.g. the distribution of is

for Borel sets . The expectation of a random vector is the vector of expectations

A random vector is said to be absolutely continuous if the CDF can be written as

Definition. A matrix is called a random matrix if the entries are random variables.

Definition. The covariance of a random vector is the matrix of covariances of the variables in

If and is the transpose of the mean deviation, then

so

Product Measures and Independence

Let and be measure spaces. Then a unique product measure on such that . The -field is defined formally as the smallest field containing all sets with , .

Theorem. Integration against the product measure can be accomplished by iterated integration against and , in either order

Definition. Suppose are random variables.

are indepepndent for all Borel subsets of , it is true that

Conditional Expectations

Definition. Given two r.v.s with finite second moments, is defined as a measurable function such that

For continuous with pdf , For any , the conditional probability of the event is defined as a function satisfying

The conditional CDF is

where the conditional density is .

Definition. Let . Define the class of ‘Bounded Lipschitz functions’ with Lipschitz constant 1 as

Then the Bounded Lipschitz Distance is

and are said to be ‘close’ if is small.

Order Statistics

Theorem. Suppose are r.v.s with distribution . To each define . We want the distribution of .

If has a density, has a density too

More generally, the distribution function of is given by

Reference: Severini (2005), Chapter 7.

Result. has a distribution with density

Result. (Useful for Vickrey auctions)

Linear Functions and Linear Algebra

Linear Functions

Definition. A function between two linear spaces is linear if it preserves the linearity of sets and through the following properties ()

  • Additivity

  • Homogeneity

  • linear and bijective is called an isomorphism

  • A linear function that maps onto is called a linear functional.

  • A linear function that maps onto itself is called a linear operator / automorphism.

Every linear function mapping from a finite-dimensional domain can be represented by a matrix.

Theorem. Linear Functions Imply

  • Additivity : generalises to

    • Convex Functions generalises to

    • Quasiconvex functions

  • Homogeneity: generalises to

    • Homogeneous functions generalises to

    • Homothetic functions

Definition. An inner product on a vector space is a mapping that satisfies,

This generalises to an inner product of two functionals where

An inner product defines a norm . This gives us a restatement of the Cauchy-Schwartz inequality

Definition.

  1. (Orthogonality). Furthermore, if , then they are said to be orthonormal.

  2. x parallel to y

Definition. is symmetric, positive definite if > 0.

Definition. For conformable matrices and with identical dimensions, the element-wise product

is called the Hadamard product.

Definition. Euclidian norm of a vector is defined as

Theorem. Angle between and is given by

Theorem.

Definition.

For conformable matrices

Definition. For a square matrix , scalar and vector that satisfies constitute an eigenvalue and eigenvector respectively.

Definition. A square matrix is orthogonal iff its columns are orthonormal so that

Definition. For , we define the kernel/null space as

and the image/range

The dimension of the image is called the rank of . The dimension of the kernel is called the nullity of . If is finite dimensional,

This is the rank-nullity result

A linear function has full rank if .

Definition. Nonsingular Matrices

A matrix with columns is non-singular or one-to-one if

Projection

Definition. Let be a vector space and is a subspace of . A linear mapping is called a projection if . Since homeomorphisms can be expressed by a transformation matrix, projections can be represented as a projection matrix with the property . Projection matrices are always symmetric.

Result. We look at orthogonal projections of vectors onto lower dimensional subspaces with . Assume is an ordered basis of . Therefore, any projection . The problem, then, is to find coordinates of the projection (with respect to basis ) where given and . The solution is the familiar OLS coef vector

where is also called the pseudo-inverse of , which can be computed as long as is full rank.

The projection matrix is therefore

Definition. Any basis of an dimensional vector space can be transformed into an orthogonal/orthonormal basis where as follows

where the th basis vector is projected onto the subspace spanned by the first constructed orthogonal vectors .

This is the same as FWL Theorem, but older.

Matrix Decompositions

Definition. A square matrix admits to an eigen-decomposition if it can be factorised as where

  • is a matrix whose th column is the eigenvector of (orthogonal matrix)

  • is a diagonal matrix with corresponding eigenvalues

Theorem. If are orthogonal matrices

  • is also orthogonal

  • is orthogonal

Definition. If is positive definite, then it admits to

  • where is non-singular upper triangular

  • where is non-singular lower triangular

Definition. If is a matrix with full column rank, a factorisation where

  • is an orthogonal matrix

  • is upper triangular and nonsingular (invertible)

Result. is often numerically unstable, so we can define . The, the OLS estimate can be written as . The homoscedastic variance is .

Using the same decomposition, .

Definition. Any matrix may be written as

where is a orthogonal matrix, is a orthogonal matrix, and is a diagonal matrix with non-negative elements.

Result. For a square covariance matrix , if , then

where contains the square singular values. In other words,

Matrix Identities

For conformable matrices ,

Partitioned Matrices

Definition. It can be useful to partition a matrix as follows

Multiplying a partitioned matrix with a stacked vector

Theorem.

where

Function Spaces

Almost all of this section is based on Larry Wasserman’s notes: functionspaces.pdf, and Racine (2013).

Definition. Let be any set, let be the collection of all bounded functions s.t. (i.e. and let

Spaces of functions can be treated as linear vector spaces

Definition.

which leads to a norm for functions

Definition. An operator is a higher-order function that maps from one function to another. A derivative and integral are both operators.

Operators can have eigenvalues and eigenfunctions such that

is an eigenfunction for both differentiation and integration.

Definition. is a complete ( every Cauchy sequence in the space converges to a point in it), inner product space. Equivalently, it is a vector space endowed with an inner product and an associated norm and metric such that every Cauchy sequence has a limit in .

Intuitively, it means it doesn’t have any ‘holes’ in it ( is not a complete space because is missing from it).

Every Hilbert space is a Banach space but the reverse is not true in general. In a hilbert space, .

If is a hilbert space and is a closed subspace then called a projection of onto that minimises over . The set of elements orthogonal every is denoted . Every can be written as where is the projection of onto and .

Result. , the set of random variables defined on a common probability space is a Hilbert space with inner product , associated norm and metric .

Result. the space of Borel-measurable real functions on given density satisfying and associated norm and metric is a hilbert space.

Definition. If and are spaces such that every is orthogonal to every , then we define the orthogonal sum as

A set of vectors is orthonormal if when and . This is also called an orthonormal basis. Every hilbert space has an orthonormal basis. A Hilbert space is said to be separable if there exists a countable orthonormal basis.

spaces

Let be a collection of functions . The norm on is defined by

For , we define the sup norm .

The space is defined as

Every space is a Banach Space.

  • Cauchy Schwartz:

  • Minkowski : where

  • Holder: where

Result. Functions where are said to be square-integrable, and the space of square-integrable functions is called . Many familiar results from vector spaces carries over into .

is a Hilbertspace. The inner product between two functions is and the norm of is . With this inner product, is a separable Hilbert space; that is, we can find a countable orthonormal basis , that is , and . It follows that if ,

are the coefficients. Parseval’s identity .

The span of is

The projection of onto the span is , which we call the n-term linear approximation of .

Definition. A sequence of functions can be considered a basis. An orthonormal basis is one that admits to

Mononomials are a basis for on and , but they aren’t orthogonal.

Result. A popular basis for on are the sines and cosines, which may be written as , , . Coefficients in this expansion are referred to as the Fourier transform of the original function.

A cosine basis on is

Legendre basis on is

The Haar basis on [0,1] consists of functions where and

This is a doubly indexed set of functions so when is expanded in this basis we write where and The Haar basis is an example of a wavelet basis.

Definition. Let be a positive integer. Let The Holder space is the set of functions such that

The special case is sometimes called the Lipschitz space. If then we have

Roughly speaking, this means that the functions have bounded second derivatives.

Multivariate version

There is also a multivariate version of Holder spaces. Let . Given a vector define and

The Hölder class is the set of functions such that

for all and all such that .

Definition. A Sobolev space is a space of functions possessing sufficiently many derivatives for some application domain. Formally,

Let be integrable on every bounded interval. Then is weakly differentiable if there exists a function that is integrable on every bounded interval, such that whenever We call the weak derivative of Let denote the weak derivative of

The Sobolev space of order is defined by The Sobolev ball of order and radius is defined by

Definition. A Mercer kernel is a continuous function such that and such that is positive semidefinite, meaning that

for all finite sets of points and all real numbers The function

is an example of a Mercer kernel. The most commonly used kernel is the Gaussian kernel

Definition.

Given a kernel let be the function obtained by fixing the first coordinate. That is, For the Gaussian kernel, is a Normal, centered at We can create functions by taking liner combinations of the kernel:

Let denote all such functions: Given two such functions and we define an inner product In general, and might be representable in more than one way. You can check that is independent of how or is represented. The inner product defines a norm: where and is the matrix with

The Reproducing Property

.

Let Note the following crucial property: This follows from the definition of where we take This implies that This is called the reproducing property. It also implies that is the representer of the evaluation functional.

The completion of with respect to is denoted by and is called the RKHS generated by .

Evaluation Functionals. A key property of RKHS’s is the behavior of the evaluation functional. The evaluation functional assigns a real number to each function. It is defined by In general, the evaluation functional is not continuous. This means we can have but does not converge to For example, let and Then But which does not converge to Intuitively, this is because Hilbert spaces can contain very unsmooth functions.

But in an , the evaluation functional is continuous. Intuitively, this means that the functions in the space are well-behaved. To see this, suppose that Then

so the evaluation functional is continuous.

A Hilbert space is a RKHS if and only if the evaluation functionals are continuous.

Theorem. Let be a loss function depending on and on Let minimize where is any monotone increasing function. Then has the form for some

Calculus and Optimisation

Calculus

Definition. The derivative of a function at point , when defined, is the tangent to the function at .

Definition. For function , we define the

which collects the partial derivatives in a column vector.

The matrix of partial derivatives of is called the Hessian, denoted by

For a vector-valued function , we can construct the Jacobian, which collects all partial derivatives.

Theorem. admits to Taylor expansion around such that

For a function with multiple arguments , the second-order Taylor expansion around the point is

Theorem. Let have continuous first and second order partial derivatives in the neighbourhood of the optimum .

If and is positive definite, then is a local minimum. If and is negative definite, then is a local maximum.

Theorem. Let be nonnegative real numbers, and suppose maximises the Lagrangian

Then, maximises subject to constraints ()

Theorem. If is differentiable at and is invertible, then such that is bijective. Further, the inverse function is differentiable.

Theorem. Let be a domain and be a differentiable function. If , we’ll concatenate the two vectors and write .

Suppose , and . Then, differentiable function .

Further, . IoW, has a unique solution .

Theorem. Let be differentiable and consider the implicitly defined curve

(i.e. a level set of ). Pick , suppose . By IFT, we know the -coordinate of this curve can locally be expressed as a differentiable function of . Directly differentiating gives

Differentiation Rules

Rule
Power rule
Exponential rule
Log rule
Linear rule
Product rule
Quotient rule
Chain rule

Matrix Derivatives

Let , and be a conformable matrix


General Results from Optimisation Theory

References: Luenberger (1997), Rustagi (2014).

  • Projection Theorem - In , the shortest line from a point to the plane is furnished by the perpendicular from the point to the plane. Core idea carries through to higher dimensions and infinite-dimensional Hilbert Space

  • Hahn-Banach Theorem: given a sphere and a point not in the sphere, there exists a hyperplane separating the point and the sphere.

  • Duality: The shortest distance from a point to a convex set is equal to the maximum of the distances from the point to a hyperplane separating the point from the convex set.

  • Differentials: Set derivative of the objective function to zero.

Linear Programming

Maximise

subject to

where is the choice vector, is a given vector, is a known matrix of constants, and is a vector of constants.

Definition. By introducing slack variables , for every inequality with , we can convert every linear programming into its standard form

Definition. Primal

Dual

Theorem. A feasible solution to the primal is optimal IFF there exists a feasible solution to the dual problem such that

Dantzig’s Simplex method, Karmarkar’s Algorithm.

Nonlinear Optimisation

Saddle Point

Suppose we have and is a real valued function. Then, is a saddle-point of if

.

Definition.

where , is symmetric, positive definite, is a matrix of constraints, and .

Constrained Maximisation

Theorem. We want to maximise (implicitly defined by ). Suppose . If attains a constrained local maximum (or minimum) at on the surface , .

Generic problem of the form

Definition.

First, write

differentiating wrt yields FOCs

which gives us three (potentially nonlinear) equations with three unknowns , that can be solved simultaneously.

To check sufficiency, the second-order condition analogue is the determinant of the bordered hessian matrix

If , then it is negative definite, which implies that the that solves the system is indeed a local maximum.

Definition. The Hessian for of a [twice differentiable] function is defined by the matrix

  • If , is said to be negative semi-definite.

  • If , is said to be negative definite.

  • If , is said to be positive semi-definite.

  • If , is said to be positive definite.

Numerical Optimisation

Root-finding

We want to evaluate the roots of the equation

Assume the inverse of , denoted exists.

Finding the root of is equivalent to evaluating .

Canonical newton-raphson is

Quasi-Newton

General version of update rule:

Step length for both N-R and BHHH.

Definition. set

Update rule:

For Log-likelihood,

Definition. Uses Information-matrix equality. Set to be outer product of scores