Matrix Calculus

In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, and/or of a multivariate function with respect to a single variable, into vectors and matrices that can be treated as single entities. This greatly simplifies operations such as finding the maximum or minimum of a multivariate function and solving systems of differential equations. The notation used here is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.

Two competing notational conventions split the field of matrix calculus into two separate groups. The two groups can be distinguished by whether they write the derivative of a scalar with respect to a vector as a column vector or a row vector. Both of these conventions are possible even when the common assumption is made that vectors should be treated as column vectors when combined with matrices (rather than row vectors). A single convention can be somewhat standard throughout a single field that commonly uses matrix calculus (e.g. econometrics, statistics, estimation theory and machine learning). However, even within a given field different authors can be found using competing conventions. Authors of both groups often write as though their specific convention is standard. Serious mistakes can result when combining results from different authors without carefully verifying that compatible notations are used. Therefore great care should be taken to ensure notational consistency. Definitions of these two conventions and comparisons between them are collected in the layout conventions section.

1 Scope

Matrix calculus refers to a number of different notations that use matrices and vectors to collect the derivative of each component of the dependent variable with respect to each component of the independent variable. In general, the independent variable can be a scalar, a vector, or a matrix while the dependent variable can be any of these as well. Each different situation will lead to a different set of rules, or a separate calculus, using the broader sense of the term. Matrix notation serves as a convenient way to collect the many derivatives in an organized way.

As a first example, consider the gradient from vector calculus. For a scalar function of three independent variables, $f(x_1, x_2, x_3)$, the gradient is given by the vector equation

$$\nabla f = \frac{\partial f}{\partial x_1}\hat{x}_1 + \frac{\partial f}{\partial x_2}\hat{x}_2 + \frac{\partial f}{\partial x_3}\hat{x}_3,$$

where $\hat{x}_i$ represents a unit vector in the $x_i$ direction for $1 \le i \le 3$. This type of generalized derivative can be seen as the derivative of a scalar, $f$, with respect to a vector, $\mathbf{x}$, and its result can be easily collected in vector form:

$$\nabla f = \frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \dfrac{\partial f}{\partial x_3} \end{bmatrix}.$$

More complicated examples include the derivative of a scalar function with respect to a matrix, known as the gradient matrix, which collects the derivative with respect to each matrix element in the corresponding position in the resulting matrix. In that case the scalar must be a function of each of the independent variables in the matrix. As another example, if we have an $n$-vector of dependent variables, or functions, of $m$ independent variables we might consider the derivative of the dependent vector with respect to the independent vector. The result could be collected in an $m \times n$ matrix consisting of all of the possible derivative combinations. There are, of course, a total of nine possibilities using scalars, vectors, and matrices. Notice that as we consider higher numbers of components in each of the independent and dependent variables we can be left with a very large number of possibilities.

The six kinds of derivatives that can be most neatly organized in matrix form are collected in the following table.[1]

Types of matrix derivative (columns: type of the dependent variable; rows: type of the independent variable)

$$\begin{array}{l|ccc}
\text{Types} & \text{Scalar} & \text{Vector} & \text{Matrix} \\ \hline
\text{Scalar} & \dfrac{\partial y}{\partial x} & \dfrac{\partial \mathbf{y}}{\partial x} & \dfrac{\partial \mathbf{Y}}{\partial x} \\
\text{Vector} & \dfrac{\partial y}{\partial \mathbf{x}} & \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}} & \\
\text{Matrix} & \dfrac{\partial y}{\partial \mathbf{X}} & &
\end{array}$$

Here, we have used the term "matrix" in its most general sense, recognizing that vectors and scalars are simply matrices with one column and then one row respectively. Moreover, we have used bold letters to indicate vectors and bold capital letters for matrices. This notation is used throughout.

Notice that we could also talk about the derivative of a vector with respect to a matrix, or any of the other unfilled cells in our table. However, these derivatives are most naturally organized in a tensor of rank higher than 2, so that they do not fit neatly into a matrix. In the following three sections we will define each one of these derivatives and relate them to other branches of mathematics. See the layout conventions section for a more detailed table.
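
The gradient example in this section can be made concrete with a short numerical sketch. Everything specific here is an assumption chosen for illustration: the function `f(x1, x2, x3) = x1*x2 + x3**2`, the helper names `gradient_exact` and `gradient_fd`, the evaluation point, and the step size. The sketch collects the three partial derivatives into one vector and compares them with central finite differences.

```python
def f(x1, x2, x3):
    """Hypothetical example function f(x1, x2, x3) = x1*x2 + x3**2."""
    return x1 * x2 + x3 ** 2


def gradient_exact(x1, x2, x3):
    """Partial derivatives collected in one vector: [df/dx1, df/dx2, df/dx3]."""
    return [x2, x1, 2 * x3]


def gradient_fd(func, point, h=1e-6):
    """Central-difference estimate of each partial derivative in turn."""
    grad = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad


point = [1.0, 2.0, 3.0]
exact = gradient_exact(*point)
approx = gradient_fd(f, point)
print(exact)  # [2.0, 1.0, 6.0]
```

Whether the three numbers are then displayed as a row or as a column is exactly the layout question discussed later in the article; the list above carries no orientation.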

1.1 Relation to other derivatives

The matrix derivative is a convenient notation for keeping track of partial derivatives for doing calculations. The Fréchet derivative is the standard way in the setting of functional analysis to take derivatives with respect to vectors. In the case that a matrix function of a matrix is Fréchet differentiable, the two derivatives will agree up to translation of notations. As is the case in general for partial derivatives, some formulae may extend under weaker analytic conditions than the existence of the derivative as an approximating linear mapping.
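
For orientation, the Fréchet condition mentioned above can be written out explicitly. The notation is an assumption made for this sketch and is not used elsewhere in the article: $F$ is a matrix function of a matrix, $H$ is a perturbation of the argument, and $DF(X)$ denotes the approximating linear map. The second line works a simple case, $F(X) = X^2$ for square $X$.

```latex
% Fréchet differentiability of F at X: DF(X) is linear in H and
\lim_{\|H\| \to 0} \frac{\bigl\| F(X + H) - F(X) - DF(X)[H] \bigr\|}{\|H\|} = 0 .

% Worked case F(X) = X^2: expand and keep the part that is linear in H.
F(X + H) - F(X) = XH + HX + H^2
\quad\Longrightarrow\quad
DF(X)[H] = XH + HX .
```

The remainder $H^2$ satisfies $\|H^2\| \le \|H\|^2$ in any submultiplicative norm, so it vanishes faster than $\|H\|$. Note that the order of the factors in $XH + HX$ has to be kept, since matrix products do not commute.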

1.2 Usages

Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of Lagrange multipliers. This includes the derivation of:

• Kalman filter
• Wiener filter
• Expectation-maximization algorithm for Gaussian mixture

2 Notation

The vector and matrix derivatives presented in the sections to follow take full advantage of matrix notation, using a single variable to represent a large number of variables. In what follows we will distinguish scalars, vectors and matrices by their typeface. We will let $M(n,m)$ denote the space of real $n \times m$ matrices with $n$ rows and $m$ columns. Such matrices will be denoted using bold capital letters: $\mathbf{A}$, $\mathbf{X}$, $\mathbf{Y}$, etc. An element of $M(n,1)$, that is, a column vector, is denoted with a boldface lowercase letter: $\mathbf{a}$, $\mathbf{x}$, $\mathbf{y}$, etc. An element of $M(1,1)$ is a scalar, denoted with lowercase italic typeface: $a$, $t$, $x$, etc. $\mathbf{X}^T$ denotes matrix transpose, $\operatorname{tr}(\mathbf{X})$ is the trace, and $\det(\mathbf{X})$ is the determinant. All functions are assumed to be of differentiability class $C^1$ unless otherwise noted. Generally letters from the first half of the alphabet ($a, b, c, \ldots$) will be used to denote constants, and from the second half ($t, x, y, \ldots$) to denote variables.

NOTE: As mentioned above, there are competing notations for laying out systems of partial derivatives in vectors and matrices, and no standard appears to be emerging yet. The next two introductory sections use the numerator layout convention simply for the purposes of convenience, to avoid overly complicating the discussion. The section after them discusses layout conventions in more detail. It is important to realize the following:

1. Despite the use of the terms "numerator layout" and "denominator layout", there are actually more than two possible notational choices involved. The reason is that the choice of numerator vs. denominator (or in some situations, numerator vs. mixed) can be made independently for scalar-by-vector, vector-by-scalar, vector-by-vector, and scalar-by-matrix derivatives, and a number of authors mix and match their layout choices in various ways.

2. The choice of numerator layout in the introductory sections below does not imply that this is the "correct" or "superior" choice. There are advantages and disadvantages to the various layout types. Serious mistakes can result from carelessly combining formulas written in different layouts, and converting from one layout to another requires care to avoid errors. As a result, when working with existing formulas the best policy is probably to identify whichever layout is used and maintain consistency with it, rather than attempting to use the same layout in all situations.

2.1 Alternatives

The tensor index notation with its Einstein summation convention is very similar to the matrix calculus, except one writes only a single component at a time. It has the advantage that one can easily manipulate arbitrarily high rank tensors, whereas tensors of rank higher than two are quite unwieldy with matrix notation. All of the work here can be done in this notation without use of the single-variable matrix notation. However, many problems in estimation theory and other areas of applied mathematics would result in too many indices to properly keep track of, pointing in favor of matrix calculus in those areas. Also, Einstein notation can be very useful in proving the identities presented here, as an alternative to typical element notation, which can become cumbersome when the explicit sums are carried around. Note that a matrix can be considered a tensor of rank two.

3 Derivatives with vectors

Main article: Vector calculus

Because vectors are matrices with only one column, the simplest matrix derivatives are vector derivatives.

The notations developed here can accommodate the usual operations of vector calculus by identifying the space $M(n,1)$ of $n$-vectors with the Euclidean space $\mathbb{R}^n$, and the scalar $M(1,1)$ is identified with $\mathbb{R}$. The corresponding concept from vector calculus is indicated at the end of each subsection.

NOTE: The discussion in this section assumes the numerator layout convention for pedagogical purposes. Some authors use different conventions. The section on layout conventions discusses this issue in greater detail. The identities given further down are presented in forms that can be used in conjunction with all common layout conventions.

3.1 Vector-by-scalar

The derivative of a vector $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$, by a scalar $x$ is written (in numerator layout notation) as

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \dfrac{\partial y_2}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix}.$$

In vector calculus the derivative of a vector $\mathbf{y}$ with respect to a scalar $x$ is known as the tangent vector of the vector $\mathbf{y}$, $\frac{\partial \mathbf{y}}{\partial x}$. Notice here that $\mathbf{y} : \mathbb{R} \to \mathbb{R}^m$.

Example: Simple examples of this include the velocity vector in Euclidean space, which is the tangent vector of the position vector (considered as a function of time). Also, the acceleration is the tangent vector of the velocity.

3.2 Scalar-by-vector

The derivative of a scalar $y$ by a vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, is written (in numerator layout notation) as

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}.$$

In vector calculus, the gradient of a scalar field $y$ in the space $\mathbb{R}^n$ (whose independent coordinates are the components of $\mathbf{x}$) is the derivative of a scalar by a vector. In physics, the electric field is the negative vector gradient of the electric potential.

The directional derivative of a scalar function $f(\mathbf{x})$ of the space vector $\mathbf{x}$ in the direction of the unit vector $\mathbf{u}$ is defined using the gradient as follows.

$$\nabla_{\mathbf{u}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u}$$

Using the notation just defined for the derivative of a scalar with respect to a vector we can re-write the directional derivative as $\nabla_{\mathbf{u}} f = \frac{\partial f}{\partial \mathbf{x}} \mathbf{u}$. This type of notation will be nice when proving product rules and chain rules that come out looking similar to what we are familiar with for the scalar derivative.

3.3 Vector-by-vector

Each of the previous two cases can be considered as an application of the derivative of a vector with respect to a vector, using a vector of size one appropriately. Similarly we will find that the derivatives involving matrices will reduce to derivatives involving vectors in a corresponding way.

The derivative of a vector function (a vector whose components are functions) $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$, with respect to an input vector, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, is written (in numerator layout notation) as

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}.$$

In vector calculus, the derivative of a vector function $\mathbf{y}$ with respect to a vector $\mathbf{x}$ whose components represent a space is known as the pushforward or differential, or the Jacobian matrix.

The pushforward along a vector function $\mathbf{f}$ with respect to vector $\mathbf{v}$ in $\mathbb{R}^n$ is given by $d\,\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \mathbf{v}$.

4 Derivatives with matrices

There are two types of derivatives with matrices that can be organized into a matrix of the same size. These are the derivative of a matrix by a scalar and the derivative of a scalar by a matrix respectively. These can be useful in minimization problems found in many areas of applied mathematics and have adopted the names tangent matrix and gradient matrix respectively after their analogs for vectors.

NOTE: The discussion in this section assumes the numerator layout convention for pedagogical purposes. Some authors use different conventions. The section on layout conventions discusses this issue in greater detail. The identities given further down are presented in forms that can be used in conjunction with all common layout conventions.

4.1 Matrix-by-scalar

The derivative of a matrix function $\mathbf{Y}$ by a scalar $x$ is known as the tangent matrix and is given (in numerator layout notation) by

$$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x} & \dfrac{\partial y_{12}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\
\dfrac{\partial y_{21}}{\partial x} & \dfrac{\partial y_{22}}{\partial x} & \cdots & \dfrac{\partial y_{2n}}{\partial x} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_{m1}}{\partial x} & \dfrac{\partial y_{m2}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x}
\end{bmatrix}.$$
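
The tangent matrix just defined can be checked numerically. The sketch below is an illustration only: the 2×2 rotation matrix used as $\mathbf{Y}(x)$, the helper names `rotation`, `tangent_matrix_exact`, `tangent_matrix_fd` and `max_abs_diff`, the evaluation point, and the step size are all assumptions made for this example. It compares the entry-by-entry derivative with a central finite difference.

```python
import math


def rotation(x):
    """Y(x): a 2x2 rotation matrix, used here as a matrix function of a scalar."""
    return [[math.cos(x), -math.sin(x)],
            [math.sin(x), math.cos(x)]]


def tangent_matrix_exact(x):
    """dY/dx, obtained by differentiating each entry y_ij with respect to x."""
    return [[-math.sin(x), -math.cos(x)],
            [math.cos(x), -math.sin(x)]]


def tangent_matrix_fd(Y, x, h=1e-6):
    """Central-difference approximation of dY/dx, same shape as Y(x)."""
    Yp, Ym = Y(x + h), Y(x - h)
    return [[(Yp[i][j] - Ym[i][j]) / (2 * h) for j in range(len(Yp[0]))]
            for i in range(len(Yp))]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


x0 = 0.7
error = max_abs_diff(tangent_matrix_exact(x0), tangent_matrix_fd(rotation, x0))
print(error < 1e-8)
```

The result has the same shape as $\mathbf{Y}$ itself, which is the sense in which this derivative "can be organized into a matrix of the same size".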

4.2 Scalar-by-matrix

The derivative of a scalar $y$ function of a matrix $\mathbf{X}$ of independent variables, with respect to the matrix $\mathbf{X}$, is given (in numerator layout notation) by

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{21}} & \cdots & \dfrac{\partial y}{\partial x_{p1}} \\
\dfrac{\partial y}{\partial x_{12}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{p2}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{1q}} & \dfrac{\partial y}{\partial x_{2q}} & \cdots & \dfrac{\partial y}{\partial x_{pq}}
\end{bmatrix}.$$

Notice that the indexing of the gradient with respect to $\mathbf{X}$ is transposed as compared with the indexing of $\mathbf{X}$. Important examples of scalar functions of matrices include the trace of a matrix and the determinant.

In analog with vector calculus this derivative is often written as the following.

$$\nabla_{\mathbf{X}}\, y(\mathbf{X}) = \frac{\partial y(\mathbf{X})}{\partial \mathbf{X}}$$

Also in analog with vector calculus, the directional derivative of a scalar $f(\mathbf{X})$ of a matrix $\mathbf{X}$ in the direction of matrix $\mathbf{Y}$ is given by

$$\nabla_{\mathbf{Y}} f = \operatorname{tr}\left( \frac{\partial f}{\partial \mathbf{X}} \mathbf{Y} \right).$$

It is the gradient matrix, in particular, that finds many uses in minimization problems in estimation theory, particularly in the derivation of the Kalman filter algorithm, which is of great importance in the field.

4.3 Other matrix derivatives

The three types of derivatives that have not been considered are those involving vectors-by-matrices, matrices-by-vectors, and matrices-by-matrices. These are not as widely considered and a notation is not widely agreed upon. As for vectors, the other two types of higher matrix derivatives can be seen as applications of the derivative of a matrix by a matrix by using a matrix with one column in the correct place. For this reason, in this subsection we consider only how one can write the derivative of a matrix by another matrix.

The differential or the matrix derivative of a matrix function $\mathbf{F}(\mathbf{X})$ that maps from $n \times m$ matrices to $p \times q$ matrices, $\mathbf{F} : M(n,m) \to M(p,q)$, is an element of $M(p,q) \otimes M(m,n)$, a fourth-rank tensor (the reversal of $m$ and $n$ here indicates the dual space of $M(n,m)$). In short it is an $m \times n$ matrix each of whose entries is a $p \times q$ matrix.

$$\frac{\partial \mathbf{F}}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial \mathbf{F}}{\partial X_{1,1}} & \cdots & \dfrac{\partial \mathbf{F}}{\partial X_{n,1}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial \mathbf{F}}{\partial X_{1,m}} & \cdots & \dfrac{\partial \mathbf{F}}{\partial X_{n,m}}
\end{bmatrix}$$

and note that each $\frac{\partial \mathbf{F}}{\partial X_{ij}}$ is a $p \times q$ matrix defined as above. Note also that this matrix has its indexing transposed; $m$ rows and $n$ columns. The pushforward along $\mathbf{F}$ of an $n \times m$ matrix $\mathbf{Y}$ in $M(n,m)$ is then

$$d\mathbf{F}(\mathbf{Y}) = \operatorname{tr}\left( \frac{\partial \mathbf{F}}{\partial \mathbf{X}} \mathbf{Y} \right).$$

Note that this definition encompasses all of the preceding definitions as special cases.

According to Jan R. Magnus and Heinz Neudecker, the following notations are both unsuitable, as the determinant of the second resulting matrix would have "no interpretation" and "a useful chain rule does not exist" if these notations are being used:[2]

1. Given $\phi$, a differentiable function of an $n \times q$ matrix $\mathbf{X} = (x_{i,j})$,

$$\frac{\partial \phi(\mathbf{X})}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial \phi}{\partial x_{1,1}} & \cdots & \dfrac{\partial \phi}{\partial x_{1,q}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial \phi}{\partial x_{n,1}} & \cdots & \dfrac{\partial \phi}{\partial x_{n,q}}
\end{bmatrix}$$

2. Given $\mathbf{F} = (f_{s,t})$, a differentiable $m \times p$ function of an $n \times q$ matrix $\mathbf{X}$,

$$\frac{\partial \mathbf{F}(\mathbf{X})}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial f_{1,1}}{\partial \mathbf{X}} & \cdots & \dfrac{\partial f_{1,p}}{\partial \mathbf{X}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_{m,1}}{\partial \mathbf{X}} & \cdots & \dfrac{\partial f_{m,p}}{\partial \mathbf{X}}
\end{bmatrix}$$

The Jacobian matrix, according to Magnus and Neudecker,[2] is

$$\mathrm{D}\,\mathbf{F}(\mathbf{X}) = \frac{\partial \operatorname{vec} \mathbf{F}(\mathbf{X})}{\partial (\operatorname{vec} \mathbf{X})'}.$$

5 Layout conventions

This section discusses the similarities and differences between notational conventions that are used in the various fields that take advantage of matrix calculus. Although there are largely two consistent conventions, some authors find it convenient to mix the two conventions in forms that are discussed below. After this section equations will be listed in both competing forms separately.

The fundamental issue is that the derivative of a vector with respect to a vector, i.e. $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, is often written in two competing ways. If the numerator $\mathbf{y}$ is of size $m$ and the denominator $\mathbf{x}$ of size $n$, then the result can be laid out as either an $m \times n$ matrix or $n \times m$ matrix, i.e. the elements of $\mathbf{y}$ laid out in columns and the elements of $\mathbf{x}$ laid out in rows, or vice versa. This leads to the following possibilities:

1. Numerator layout, i.e. lay out according to $\mathbf{y}$ and $\mathbf{x}^T$ (i.e. contrarily to $\mathbf{x}$). This is sometimes known as the Jacobian formulation.

2. Denominator layout, i.e. lay out according to $\mathbf{y}^T$ and $\mathbf{x}$ (i.e. contrarily to $\mathbf{y}$). This is sometimes known as the Hessian formulation. Some authors term this layout the gradient, in distinction to the Jacobian (numerator layout), which is its transpose. (However, "gradient" more commonly means the derivative $\frac{\partial y}{\partial \mathbf{x}}$, regardless of layout.)

3. A third possibility sometimes seen is to insist on writing the derivative as $\frac{\partial \mathbf{y}}{\partial \mathbf{x}'}$ (i.e. the derivative is taken with respect to the transpose of $\mathbf{x}$) and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces results the same as the numerator layout.

When handling the gradient $\frac{\partial y}{\partial \mathbf{x}}$ and the opposite case $\frac{\partial \mathbf{y}}{\partial x}$, we have the same issues. To be consistent, we should do one of the following:

1. If we choose numerator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a row vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a column vector.

2. If we choose denominator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a column vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a row vector.

3. In the third possibility above, we write $\frac{\partial y}{\partial \mathbf{x}'}$ and $\frac{\partial \mathbf{y}}{\partial x}$, and use numerator layout.

Not all math textbooks and papers are consistent in this respect throughout the entire paper. That is, sometimes different conventions are used in different contexts within the same paper. For example, some choose denominator layout for gradients (laying them out as column vectors), but numerator layout for the vector-by-vector derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$.

Similarly, when it comes to scalar-by-matrix derivatives $\frac{\partial y}{\partial \mathbf{X}}$ and matrix-by-scalar derivatives $\frac{\partial \mathbf{Y}}{\partial x}$, then consistent numerator layout lays out according to $\mathbf{Y}$ and $\mathbf{X}^T$, while consistent denominator layout lays out according to $\mathbf{Y}^T$ and $\mathbf{X}$. In practice, however, following a denominator layout for $\frac{\partial \mathbf{Y}}{\partial x}$, and laying the result out according to $\mathbf{Y}^T$, is rarely seen because it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the following layouts can often be found:

1. Consistent numerator layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}^T$.

2. Mixed layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}$.

3. Use the notation $\frac{\partial y}{\partial \mathbf{X}'}$, with results the same as consistent numerator layout.

In the following formulas, we handle the five possible combinations $\frac{\partial y}{\partial \mathbf{x}}$, $\frac{\partial \mathbf{y}}{\partial x}$, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, $\frac{\partial y}{\partial \mathbf{X}}$ and $\frac{\partial \mathbf{Y}}{\partial x}$ separately. We also handle cases of scalar-by-scalar derivatives that involve an intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the curve is taken with respect to the scalar that parameterizes the curve.) For each of the various combinations, we give numerator-layout and denominator-layout results, except in the cases above where denominator layout rarely occurs. In cases involving matrices where it makes sense, we give numerator-layout and mixed-layout results. As noted above, cases where vector and matrix denominators are written in transpose notation are equivalent to numerator layout with the denominators written without the transpose.

Keep in mind that various authors use different combinations of numerator and denominator layouts for different types of derivatives, and there is no guarantee that an author will consistently use either numerator or denominator layout for all types. Match up the formulas below with those quoted in the source to determine the layout used for that particular type of derivative, but be careful not to assume that derivatives of other types necessarily follow the same kind of layout.

When taking derivatives with an aggregate (vector or matrix) denominator in order to find a maximum or minimum of the aggregate, it should be kept in mind that using numerator layout will produce results that are transposed with respect to the aggregate. For example, in attempting to find the maximum likelihood estimate of a multivariate normal distribution using matrix calculus, if the domain is a $k \times 1$ column vector, then the result using the numerator layout will be in the form of a $1 \times k$ row vector. Thus, either the results should be transposed at the end or the denominator layout (or mixed layout) should be used.

The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.
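
The difference between the two main layouts is easiest to see in the shape of a vector-by-vector derivative. The following is a minimal sketch under stated assumptions: the function `y` (with $m = 2$ outputs and $n = 3$ inputs), the helper names `jacobian_numerator_layout`, `transpose` and `shape`, the evaluation point, and the finite-difference step are all hypothetical choices for illustration. It builds the $m \times n$ numerator-layout matrix and obtains the $n \times m$ denominator-layout matrix as its transpose.

```python
def y(x):
    """Hypothetical vector function y: R^3 -> R^2 (m = 2 outputs, n = 3 inputs)."""
    x1, x2, x3 = x
    return [x1 * x2, x2 + x3 ** 2]


def jacobian_numerator_layout(func, x, h=1e-6):
    """Central-difference derivative as an m x n list of rows.

    Row i follows y_i (the numerator); column j follows x_j (the denominator).
    """
    m, n = len(func(x)), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        y_up, y_down = func(up), func(down)
        for i in range(m):
            jac[i][j] = (y_up[i] - y_down[i]) / (2 * h)
    return jac


def transpose(matrix):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(column) for column in zip(*matrix)]


def shape(matrix):
    """(rows, columns) of a list-of-rows matrix."""
    return (len(matrix), len(matrix[0]))


x0 = [1.0, 2.0, 3.0]
numerator = jacobian_numerator_layout(y, x0)  # m x n = 2 x 3
denominator = transpose(numerator)            # n x m = 3 x 2
print(shape(numerator), shape(denominator))   # (2, 3) (3, 2)
```

The entries are identical in both results; only their arrangement differs, which is why formulas from sources using different layouts cannot be combined without first transposing one of them.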
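
5.1 Numerator-layout notation

Using numerator-layout notation, we have:[1]

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \dfrac{\partial y_2}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}.$$

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{21}} & \cdots & \dfrac{\partial y}{\partial x_{p1}} \\
\dfrac{\partial y}{\partial x_{12}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{p2}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{1q}} & \dfrac{\partial y}{\partial x_{2q}} & \cdots & \dfrac{\partial y}{\partial x_{pq}}
\end{bmatrix}.$$

The following definitions are only provided in numerator-layout notation:

$$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x} & \dfrac{\partial y_{12}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\
\dfrac{\partial y_{21}}{\partial x} & \dfrac{\partial y_{22}}{\partial x} & \cdots & \dfrac{\partial y_{2n}}{\partial x} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_{m1}}{\partial x} & \dfrac{\partial y_{m2}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x}
\end{bmatrix}.$$

$$d\mathbf{X} = \begin{bmatrix}
dx_{11} & dx_{12} & \cdots & dx_{1n} \\
dx_{21} & dx_{22} & \cdots & dx_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
dx_{m1} & dx_{m2} & \cdots & dx_{mn}
\end{bmatrix}.$$

5.2 Denominator-layout notation

Using denominator-layout notation, we have:[3]

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} & \dfrac{\partial y_2}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_1}{\partial x_n} & \dfrac{\partial y_2}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}.$$

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{12}} & \cdots & \dfrac{\partial y}{\partial x_{1q}} \\
\dfrac{\partial y}{\partial x_{21}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{2q}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y}{\partial x_{p1}} & \dfrac{\partial y}{\partial x_{p2}} & \cdots & \dfrac{\partial y}{\partial x_{pq}}
\end{bmatrix}.$$

6 Identities

As noted above, in general, the results of operations will be transposed when switching between numerator-layout and denominator-layout notation.

To help make sense of all the identities below, keep in mind the most important rules: the chain rule, product rule and sum rule. The sum rule applies universally, and the product rule applies in most of the cases below, provided that the order of matrix products is maintained, since matrix products are not commutative. The chain rule applies in some of the cases, but unfortunately does not apply in matrix-by-scalar derivatives or scalar-by-matrix derivatives (in the latter case, mostly involving the trace operator applied to matrices). In the latter case, the product rule can't quite be applied directly, either, but the equivalent can be done with a bit more work using the differential identities.

6.1 Vector-by-vector identities

This is presented first because all of the operations that apply to vector-by-vector differentiation apply directly to vector-by-scalar or scalar-by-vector differentiation simply by reducing the appropriate vector in the numerator or denominator to a scalar.

6.2 Scalar-by-vector identities

The fundamental identities are placed above the thick black line.

6.3 Vector-by-scalar identities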
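
As a small worked illustration of a vector-by-scalar identity, consider the standard rule that a constant matrix factors out of the derivative while keeping its position on the left. The particular matrix $\mathbf{A}$ and vector function $\mathbf{u}(x)$ below are assumptions chosen only to make the arithmetic easy to follow.

```latex
% Rule (A constant, u = u(x)):
\frac{\partial (\mathbf{A}\mathbf{u})}{\partial x} = \mathbf{A}\,\frac{\partial \mathbf{u}}{\partial x}

% Worked instance:
\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},
\qquad
\mathbf{u}(x) = \begin{bmatrix} x \\ x^2 \end{bmatrix}
\quad\Longrightarrow\quad
\mathbf{A}\mathbf{u} = \begin{bmatrix} x + 2x^2 \\ 3x + 4x^2 \end{bmatrix}

\frac{\partial (\mathbf{A}\mathbf{u})}{\partial x}
= \begin{bmatrix} 1 + 4x \\ 3 + 8x \end{bmatrix}
= \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
  \begin{bmatrix} 1 \\ 2x \end{bmatrix}
= \mathbf{A}\,\frac{\partial \mathbf{u}}{\partial x}
```

Differentiating the product entry by entry and applying the rule give the same column vector, as expected in numerator layout.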
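
NOTE: The formulas involving the vector-by-vector derivatives $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$ and $\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ (whose outputs are matrices) assume the matrices are laid out consistent with the vector layout, i.e. numerator-layout matrix when numerator-layout vector and vice versa; otherwise, transpose the vector-by-vector derivatives.

6.4 Scalar-by-matrix identities

Note that exact equivalents of the scalar product rule and chain rule do not exist when applied to matrix-valued functions of matrices. However, the product rule of this sort does apply to the differential form (see below), and this is the way to derive many of the identities below involving the trace function, combined with the fact that the trace function allows transposing and cyclic permutation, i.e.:

$$\operatorname{tr}(\mathbf{A}) = \operatorname{tr}(\mathbf{A}^T)$$

$$\operatorname{tr}(\mathbf{ABCD}) = \operatorname{tr}(\mathbf{BCDA}) = \operatorname{tr}(\mathbf{CDAB}) = \operatorname{tr}(\mathbf{DABC})$$

For example, to compute $\frac{\partial \operatorname{tr}(\mathbf{AXBX}^T\mathbf{C})}{\partial \mathbf{X}}$:

$$\begin{aligned}
d \operatorname{tr}(\mathbf{AXBX}^T\mathbf{C}) &= d \operatorname{tr}(\mathbf{CAXBX}^T) = \operatorname{tr}\left(d(\mathbf{CAXBX}^T)\right) \\
&= \operatorname{tr}\left(\mathbf{CAX}\, d(\mathbf{BX}^T) + d(\mathbf{CAX})\, \mathbf{BX}^T\right) \\
&= \operatorname{tr}\left(\mathbf{CAX}\, d(\mathbf{BX}^T)\right) + \operatorname{tr}\left(d(\mathbf{CAX})\, \mathbf{BX}^T\right) \\
&= \operatorname{tr}\left(\mathbf{CAXB}\, d(\mathbf{X}^T)\right) + \operatorname{tr}\left(\mathbf{CA}(d\mathbf{X})\mathbf{BX}^T\right) \\
&= \operatorname{tr}\left(\mathbf{CAXB}(d\mathbf{X})^T\right) + \operatorname{tr}\left(\mathbf{CA}(d\mathbf{X})\mathbf{BX}^T\right) \\
&= \operatorname{tr}\left(\left(\mathbf{CAXB}(d\mathbf{X})^T\right)^T\right) + \operatorname{tr}\left(\mathbf{CA}(d\mathbf{X})\mathbf{BX}^T\right) \\
&= \operatorname{tr}\left((d\mathbf{X})\mathbf{B}^T\mathbf{X}^T\mathbf{A}^T\mathbf{C}^T\right) + \operatorname{tr}\left(\mathbf{CA}(d\mathbf{X})\mathbf{BX}^T\right) \\
&= \operatorname{tr}\left(\mathbf{B}^T\mathbf{X}^T\mathbf{A}^T\mathbf{C}^T(d\mathbf{X})\right) + \operatorname{tr}\left(\mathbf{BX}^T\mathbf{CA}(d\mathbf{X})\right) \\
&= \operatorname{tr}\left(\left(\mathbf{B}^T\mathbf{X}^T\mathbf{A}^T\mathbf{C}^T + \mathbf{BX}^T\mathbf{CA}\right) d\mathbf{X}\right)
\end{aligned}$$

Therefore,

$$\frac{\partial \operatorname{tr}(\mathbf{AXBX}^T\mathbf{C})}{\partial \mathbf{X}} = \mathbf{B}^T\mathbf{X}^T\mathbf{A}^T\mathbf{C}^T + \mathbf{BX}^T\mathbf{CA}.$$

(For the last step, see the 'Conversion from differential to derivative form' section.)

6.5 Matrix-by-scalar identities

Further see Derivative of the exponential map.

6.6 Scalar-by-scalar identities

6.6.1 With vectors involved

6.6.2 With matrices involved

6.7 Identities in differential form

It is often easier to work in differential form and then convert back to normal derivatives. This only works well using the numerator layout. In these rules, $a$ is a scalar.

To convert to normal derivative form, first convert it to one of the following canonical forms, and then use these identities:

7 See also

• Derivative (generalizations)
• Product integral

8 Notes

[1] Minka, Thomas P. "Old and New Matrix Algebra Useful for Statistics." December 28, 2000.

[2] Magnus, Jan R.; Neudecker, Heinz (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley Series in Probability and Statistics (2nd ed.). Wiley. pp. 171–173.

[3]

[4] Here, $\mathbf{0}$ refers to a column vector of all 0s, of size $n$, where $n$ is the length of $\mathbf{x}$.

[5] Here, $\mathbf{0}$ refers to a matrix of all 0s, of the same shape as $\mathbf{X}$.

[6] Petersen, Kaare Brandt and Michael Syskind Pedersen. The Matrix Cookbook. November 14, 2008. http://matrixcookbook.com. This book uses a mixed layout, i.e. by $\mathbf{Y}$ in $\frac{\partial \mathbf{Y}}{\partial x}$, by $\mathbf{X}$ in $\frac{\partial y}{\partial \mathbf{X}}$.

[7] Duchi, John C. "Properties of the Trace and Matrix Derivatives" (PDF). University of California at Berkeley. Retrieved 19 July 2011.

[8] See Determinant#Derivative for the derivation.

[9] The constant $a$ disappears in the result. This is intentional. In general,

$$\frac{d \ln au}{dx} = \frac{1}{au}\frac{d(au)}{dx} = \frac{1}{au}\,a\,\frac{du}{dx} = \frac{1}{u}\frac{du}{dx} = \frac{d \ln u}{dx}.$$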

9 External links

• Linear Algebra: Determinants, Inverses, Rank, appendix D from the Introduction to Finite Element Methods book at the University of Colorado at Boulder. Uses the Hessian (transpose to Jacobian) definition of vector and matrix derivatives.
• Matrix Reference Manual, Mike Brookes, Imperial College London.
• The Matrix Cookbook (2006), with a derivatives chapter. Uses the Hessian definition.
• The Matrix Cookbook (2012), an updated version of the Matrix Cookbook.
• Linear Algebra and its Applications (author information page; see Chapter 9 of book), Peter Lax, Courant Institute.
• Matrix Differentiation (and some other stuff), Randal J. Barnes, Department of Civil Engineering, University of Minnesota.
• Notes on Matrix Calculus, Paul L. Fackler, North Carolina State University.
• Matrix Differential Calculus (slide presentation), Zhang Le, University of Edinburgh.
• Introduction to Vector and Matrix Differentiation (notes on matrix differentiation, in the context of Econometrics), Heino Bohn Nielsen.
• A note on differentiating matrices (notes on matrix differentiation), Pawel Koval, from Munich Personal RePEc Archive.
• Vector/Matrix Calculus. More notes on matrix differentiation.
• Matrix Identities (notes on matrix differentiation), Sam Roweis. (needs user login)

10 Text and image sources, contributors, and licenses

10.1 Text
Matrix calculus Source: https://en.wikipedia.org/wiki/Matrix_calculus?oldid=675002213 Contributors: Michael Hardy, Fred Bauder,

AugPi, Charles Matthews, Jitse Niesen, DJ Clayworth, Bevo, Bloodshedder, Benwing, Altenmann, Giftlite, Lethe, Rich Farmbrough,
Gauge, Pan Chenguang, Pearle, CamCairns, Wtmitchell, RJFJR, Forderud, SDC, BD2412, Mathbot, Srleer, Kri, Giro720, NawlinWiki,
Arthur Rubin, SmackBot, Oli Filth, Esoth~enwiki, Ashigabou, Ctacmo, Hu12, Oussjarrouse, Dlohcierekim, CBM, Cydebot, Blaisorblade,
DumbBOT, Thijs!bot, DRHagen, Albmont, Steve98052, Thomasda, Epistemenical, Cuzkatzimhut, VolkovBot, JohnBlackburne, Anonymous Dissident, Geometry guy, Strife911, FLengyel, Melcombe, Dattorro, The Thing That Should Not Be, Jan mei118, Eroblar, Lgstarn,
Excirial, SchreiberBike, SorcererofDM, SyedAshrafulla, Addbot, Willking1979, Dimarudoy, AkhtaBot, MrOllie, Download, TStein, Ozob,
Bob K31416, Kirbin, Xiaodi.Hou, Yobot, AnomieBOT, Enisbayramoglu, Ahmadabdolkader, Xqbot, Thehelpfulbot, FrescoBot, ArloLora,
Cs32en, Sanchan89, Stpasha, Cooli46, Guohonghao, Freddy2222, EmausBot, John of Reading, Immunize, Yuzisee, Wham Bam Rock
II, Brent Perreault, Vgmddg, Aalopes, Surya Prakash.S.A., Atcold, Maschen, IznoRepeat, ClueBot NG, Hhchen1105, Alelbre, Ctsourak,
Roberticus, F=q(E+v^B), PeterShook, Brad7777, Morning Sunshine, Danielbaa, Gundamlh, Saung Tadashi, SBareSSomErMig, Lerutit,
SomeFreakOnTheInternet, Vanamonde93, CsDix, Cyrapas, Wangronin, Harikarsum, Dmaserang, ZikO24, Maltimore and Anonymous:
136

10.2 Images

File:Gnome-searchtool.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/1e/Gnome-searchtool.svg License: LGPL Contributors: http://ftp.gnome.org/pub/GNOME/sources/gnome-themes-extras/0.9/gnome-themes-extras-0.9.0.tar.gz Original artist: David Vignoni
File:Nuvola_apps_edu_mathematics_blue-p.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/3e/Nuvola_apps_edu_mathematics_blue-p.svg License: GPL Contributors: Derivative work from Image:Nuvola apps edu mathematics.png and Image:Nuvola apps edu mathematics-p.svg Original artist: David Vignoni (original icon); Flamurai (SVG convertion); bayo (color)

10.3 Content license
Creative Commons Attribution-Share Alike 3.0
