Matrix Derivative Formulas - laoliulaoliu - ChinaUnix Blog

miraclemiracle.blog.chinaunix.net

Matrix Derivative Formulas

Category: IT industry

2018-09-20 15:13:23

https://blog.csdn.net/pizi0475/article/details/46793947
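While deriving a formula today I ran into derivatives with respect to matrices and had no idea how to handle them. Fortunately someone online has already summarized the rules, so I am copying them here as a backup.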

Basic formulas (X is the variable; A and B are constant):
Y = A * X --> DY/DX = A'
Y = X * A --> DY/DX = A
Y = A' * X * B --> DY/DX = A * B'
Y = A' * X' * B --> DY/DX = B * A'

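1. Derivative of a matrix Y with respect to a scalar x:

Differentiate every element, then transpose. Note that an M×N matrix becomes N×M after differentiation. (This transposed layout belongs to this summary only; the Notation section further down defines d/dx (Y) without the transpose.)

Y = [y(ij)] --> dY/dx = [dy(ji)/dx]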

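2. Derivative of a scalar y with respect to a column vector X:

Unlike the previous case, the entries are partial derivatives and nothing is transposed: the derivative with respect to an N×1 vector is again an N×1 vector.

y = f(x1,x2,..,xn) --> dy/dX = (Dy/Dx1,Dy/Dx2,..,Dy/Dxn)'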

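3. Derivative of a row vector Y' with respect to a column vector X:

The derivative of a 1×M vector with respect to an N×1 vector is an N×M matrix.

Differentiate each column (component) of Y' with respect to X, and place the resulting columns side by side to form the matrix.

Important results:

dX'/dX = I

d(AX)'/dX = A'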

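4. Derivative of a column vector Y with respect to a row vector X':

Convert it to the derivative of the row vector Y' with respect to the column vector X, then transpose.

The derivative of an M×1 vector with respect to a 1×N vector is an M×N matrix.

dY/dX' = (dY'/dX)'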

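5. Product rules for differentiating vector products with respect to a column vector X:

Note that these differ slightly from the scalar product rule.

d(UV')/dX = (dU/dX)V' + U(dV'/dX)

d(U'V)/dX = (dU'/dX)V + (dV'/dX)U

Important results:

d(X'A)/dX = (dX'/dX)A + (dA'/dX)X = IA + 0X = A

d(AX)/dX' = (d(X'A')/dX)' = (A')' = A

d(X'AX)/dX = (dX'/dX)AX + (d(AX)'/dX)X = AX + A'X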
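
The last result, d(X'AX)/dX = AX + A'X, is easy to sanity-check numerically. The following is a minimal sketch in plain Python, assuming a small non-symmetric test matrix; the helper names (`quad_form`, `analytic_grad`, `numeric_grad`) and the sample values are illustrative choices, not part of the original notes.

```python
def quad_form(A, x):
    """Return the scalar x' A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def analytic_grad(A, x):
    """Closed form from the text: d(x'Ax)/dx = A x + A' x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]


def numeric_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


A = [[1.0, 2.0], [3.0, 4.0]]  # deliberately non-symmetric (assumed test data)
x = [0.5, -1.5]

g_exact = analytic_grad(A, x)
g_num = numeric_grad(lambda v: quad_form(A, v), x)
max_err = max(abs(a - b) for a, b in zip(g_exact, g_num))
print(g_exact, g_num, max_err)
```

Using a non-symmetric A matters here: with a symmetric A the result collapses to 2AX, which would hide a missing transpose.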

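6. Derivative of a matrix Y with respect to a column vector X:

Differentiate Y with respect to each component of X, and stack the results into a "super-vector".

Note that every element of this vector is itself a matrix.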

7. Product rules for differentiating matrix products with respect to a column vector:

d(uV)/dX = (du/dX)V + u(dV/dX)

d(UV)/dX = (dU/dX)V + U(dV/dX)

Important result:

d(X'A)/dX = (dX'/dX)A + X'(dA/dX) = IA + X'0 = A

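8. Derivative of a scalar y with respect to a matrix X:

This is analogous to the derivative of a scalar y with respect to a column vector X:

take the partial derivative of y with respect to each element of X, with no transpose.

dy/dX = [ Dy/Dx(ij) ]

Important results:

y = U'XV = ΣΣu(i)x(ij)v(j), so dy/dX = [u(i)v(j)] = UV'

y = U'X'XU --> dy/dX = 2XUU'

y = (XU-V)'(XU-V) --> dy/dX = d(U'X'XU - 2V'XU + V'V)/dX = 2XUU' - 2VU' + 0 = 2(XU-V)U'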
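
The least-squares result dy/dX = 2(XU-V)U' can be checked entry by entry with finite differences. This is a rough sketch in plain Python, assuming a 2×3 matrix X and made-up vectors U and V; all names and numbers are illustrative.

```python
def residual(X, U, V):
    """Return the vector XU - V, with X given as a list of rows."""
    return [sum(xij * uj for xij, uj in zip(row, U)) - vi for row, vi in zip(X, V)]


def loss(X, U, V):
    """y = (XU - V)'(XU - V), the squared length of the residual."""
    return sum(r * r for r in residual(X, U, V))


def analytic_grad(X, U, V):
    """Closed form from the text: dy/dX = 2 (XU - V) U' (an outer product)."""
    return [[2.0 * ri * uj for uj in U] for ri in residual(X, U, V)]


def numeric_grad(X, U, V, h=1e-6):
    """Central differences of the loss with respect to every entry x(ij)."""
    grad = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [list(r) for r in X]
            Xm = [list(r) for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((loss(Xp, U, V) - loss(Xm, U, V)) / (2 * h))
        grad.append(row)
    return grad


X = [[1.0, 2.0, 0.0], [0.5, -1.0, 3.0]]  # assumed test data
U = [1.0, -2.0, 0.5]
V = [0.0, 1.0]

G_exact = analytic_grad(X, U, V)
G_num = numeric_grad(X, U, V)
max_err = max(abs(a - b) for ra, rb in zip(G_exact, G_num) for a, b in zip(ra, rb))
print(G_exact, max_err)
```

The gradient has the same M×N shape as X, as the "no transpose" rule of this section requires.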

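9. Derivative of a matrix Y with respect to a matrix X:

Differentiate each element of Y with respect to X, then arrange the results into a "super-matrix".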

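10. Derivative of a product

d(f*g)/dx = (df'/dx)g + (dg/dx)f'

Here f is a row vector and g a column vector, and the derivative of a column vector follows the basic formulas at the top (d(Ax)/dx = A').

Result:

d(x'Ax)/dx = (d(x'')/dx)Ax + (d(Ax)/dx)(x'') = Ax + A'x (note: '' denotes transposing twice)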

More detailed write-ups:

http://lzh21cen.blog.163.com/blog/static/145880136201051113615571/

http://hi.baidu.com/wangwen926/blog/item/eb189bf6b0fb702b720eec94.html

Other references:

Notation

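For a scalar x and a vector y, d/dx (y) is a vector whose (i) element is dy(i)/dx
For a vector x and a scalar y, d/dx (y) is a vector whose (i) element is dy/dx(i)
For vectors x and y, d/dx (y^T) is a matrix whose (i,j) element is dy(j)/dx(i)
For a scalar x and a matrix Y, d/dx (Y) is a matrix whose (i,j) element is dy(i,j)/dx
For a matrix X and a scalar y, d/dX (y) is a matrix whose (i,j) element is dy/dx(i,j)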

Note that the Hermitian transpose is not used because complex conjugates are not analytic.

In the expressions below matrices and vectors A, B, C do not depend on X.

Derivatives of Linear Products

d/dx (AYB) = A * d/dx (Y) * B
- d/dx (Ay) = A * d/dx (y)
d/dx (x^TA) = A
- d/dx (x^T) = I
- d/dx (x^Ta) = d/dx (a^Tx) = a
d/dX (a^TXb) = ab^T
- d/dX (a^TXa) = d/dX (a^TX^Ta) = aa^T
d/dX (a^TX^Tb) = ba^T
d/dx (YZ) =Y * d/dx (Z) + d/dx (Y) * Z

Derivatives of Quadratic Products

d/dx (Ax+b)^TC(Dx+e) = A^TC(Dx+e) + D^TC^T(Ax+b)
- d/dx (x^TCx) = (C+C^T)x
- - [C: symmetric]: d/dx (x^TCx) = 2Cx
- - d/dx (x^Tx) = 2x
- d/dx (Ax+b)^T (Dx+e) = A^T (Dx+e) + D^T (Ax+b)
- - d/dx (Ax+b)^T (Ax+b) = 2A^T (Ax+b)
- [C: symmetric]: d/dx (Ax+b)^TC(Ax+b) = 2A^TC(Ax+b)
d/dX (a^TX^TXb) = X(ab^T + ba^T)
- d/dX (a^TX^TXa) = 2Xaa^T
d/dX (a^TX^TCXb) = C^TXab^T + CXba^T
- d/dX (a^TX^TCXa) = (C + C^T)Xaa^T
- [C:Symmetric] d/dX (a^TX^TCXa) = 2CXaa^T
d/dX ((Xa+b)^TC(Xa+b)) = (C+C^T)(Xa+b)a^T

Derivatives of Cubic Products

d/dx (x^TAxx^T) = (A+A^T)xx^T+x^TAxI

Derivatives of Inverses

d/dx (Y^-1) = -Y^-1d/dx (Y)Y^-1
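
As an illustration of the inverse rule, the sketch below compares -Y^-1 (dY/dx) Y^-1 with a finite-difference derivative of Y^-1. The matrix-valued function `Y_of` is a made-up 2×2 example, and d/dx (Y) is taken element-wise, as in the Notation section above.

```python
def inv2(Y):
    """Inverse of a 2x2 matrix given as a list of rows."""
    det = Y[0][0] * Y[1][1] - Y[0][1] * Y[1][0]
    return [[Y[1][1] / det, -Y[0][1] / det],
            [-Y[1][0] / det, Y[0][0] / det]]


def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def Y_of(x):
    """Hypothetical matrix-valued function of a scalar x."""
    return [[1.0 + x, x * x], [2.0, 3.0 + x]]


def dY_of(x):
    """Element-wise derivative dY/dx of Y_of."""
    return [[1.0, 2.0 * x], [0.0, 1.0]]


x0 = 0.5
Yinv = inv2(Y_of(x0))

# Closed form from the text: d(Y^-1)/dx = -Y^-1 (dY/dx) Y^-1
analytic = [[-v for v in row] for row in matmul(matmul(Yinv, dY_of(x0)), Yinv)]

# Central-difference derivative of each entry of Y^-1
h = 1e-6
plus = inv2(Y_of(x0 + h))
minus = inv2(Y_of(x0 - h))
numeric = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

max_err = max(abs(analytic[i][j] - numeric[i][j]) for i in range(2) for j in range(2))
print(analytic, max_err)
```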

Derivative of Trace

Note: matrix dimensions must result in an n*n argument for tr().

d/dX (tr(X)) = I
d/dX (tr(X^k)) = k(X^(k-1))^T
d/dX (tr(AX^k)) = SUM_r=0:k-1 (X^r A X^(k-r-1))^T
d/dX (tr(AX^-1B)) = -(X^-1BAX^-1)^T
- d/dX (tr(AX^-1)) = d/dX (tr(X^-1A)) = -X^-TA^TX^-T
d/dX (tr(A^TXB^T)) = d/dX (tr(BX^TA)) = AB
- d/dX (tr(XA^T)) = d/dX (tr(A^TX)) = d/dX (tr(X^TA)) = d/dX (tr(AX^T)) = A
d/dX (tr(AXBX^T)) = A^TXB^T + AXB
- d/dX (tr(XAX^T)) = X(A+A^T)
- d/dX (tr(X^TAX)) = (A+A^T)X
- d/dX (tr(AX^TX)) = X(A+A^T)
d/dX (tr(AXBX)) = A^TX^TB^T + B^TX^TA^T
[C:symmetric] d/dX (tr((X^TCX)^-1A)) = d/dX (tr(A(X^TCX)^-1)) = -(CX(X^TCX)^-1)(A+A^T)(X^TCX)^-1
[B,C:symmetric] d/dX (tr((X^TCX)^-1(X^TBX))) = d/dX (tr((X^TBX)(X^TCX)^-1)) = -2(CX(X^TCX)^-1)X^TBX(X^TCX)^-1 + 2BX(X^TCX)^-1
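
Trace identities like these are easy to get wrong by a transpose, so a numerical spot check helps. The sketch below tests one entry from the list, d/dX (tr(AXBX^T)) = A^TXB^T + AXB, with a non-square X so that shape mistakes would show up. The matrices and helper names are assumptions made for the illustration.

```python
def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def f(A, X, B):
    """Scalar function tr(A X B X^T)."""
    return trace(matmul(matmul(matmul(A, X), B), transpose(X)))


def analytic_grad(A, X, B):
    """Closed form from the list above: A^T X B^T + A X B."""
    left = matmul(matmul(transpose(A), X), transpose(B))
    right = matmul(matmul(A, X), B)
    return [[l + r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


def numeric_grad(A, X, B, h=1e-6):
    """Central differences with respect to every entry of X."""
    grad = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [list(r) for r in X]
            Xm = [list(r) for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(A, Xp, B) - f(A, Xm, B)) / (2 * h))
        grad.append(row)
    return grad


A = [[1.0, 2.0], [0.0, -1.0]]                           # 2x2, assumed test data
B = [[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 3.0, -1.0]]  # 3x3
X = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.5]]                 # 2x3

G_exact = analytic_grad(A, X, B)
G_num = numeric_grad(A, X, B)
max_err = max(abs(a - b) for ra, rb in zip(G_exact, G_num) for a, b in zip(ra, rb))
print(G_exact, max_err)
```

The same harness can be pointed at any other line of the list by swapping `f` and `analytic_grad`.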

Derivative of Determinant

Note: matrix dimensions must result in an n*n argument for det().

d/dX (det(X)) = d/dX (det(X^T)) = det(X)*X^-T
- d/dX (det(AXB)) = det(AXB)*X^-T
- d/dX (ln(det(AXB))) = X^-T
d/dX (det(X^k)) = k*det(X^k)*X^-T
- d/dX (ln(det(X^k))) = kX^-T
[Real] d/dX (det(X^TCX)) = det(X^TCX)*(CX(X^TCX)^-1 + C^TX(X^TC^TX)^-1)
- [C: Real,Symmetric] d/dX (det(X^TCX)) = 2det(X^TCX)* CX(X^TCX)^-1
[C: Real,Symmetric] d/dX (ln(det(X^TCX))) = 2CX(X^TCX)^-1
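
For a concrete feel of the first determinant rules, the sketch below checks d/dX (det(X)) = det(X)*X^-T and d/dX (ln(det(X))) = X^-T on a 2×2 matrix, where the inverse can be written out by hand. The sample matrix is an assumption; it is chosen with a positive determinant so the logarithm is defined.

```python
import math


def det2(X):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def inv_transpose2(X):
    """X^-T for a 2x2 matrix: the transposed adjugate divided by det(X)."""
    d = det2(X)
    return [[X[1][1] / d, -X[1][0] / d],
            [-X[0][1] / d, X[0][0] / d]]


def numeric_grad(f, X, h=1e-6):
    """Central differences of a scalar function f with respect to every entry of X."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Xp = [list(r) for r in X]
            Xm = [list(r) for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            grad[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return grad


X = [[2.0, 1.0], [0.5, 3.0]]  # assumed test data, det = 5.5

XinvT = inv_transpose2(X)
grad_det = [[det2(X) * v for v in row] for row in XinvT]  # det(X) * X^-T
grad_logdet = XinvT                                        # X^-T

num_det = numeric_grad(det2, X)
num_logdet = numeric_grad(lambda M: math.log(det2(M)), X)

err_det = max(abs(grad_det[i][j] - num_det[i][j]) for i in range(2) for j in range(2))
err_logdet = max(abs(grad_logdet[i][j] - num_logdet[i][j]) for i in range(2) for j in range(2))
print(grad_det, err_det, err_logdet)
```

In the 2×2 case det(X)*X^-T is just the cofactor matrix, which is a quick way to remember where the transpose comes from.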

Jacobian

If y is a function of x, then dy^T/dx is the Jacobian matrix of y with respect to x.

Its determinant, |dy^T/dx|, is the Jacobian of y with respect to x and represents the ratio of the hyper-volumes dy and dx. The Jacobian occurs when changing variables in an integration: Integral(f(y)dy)=Integral(f(y(x)) |dy^T/dx| dx).
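
A familiar instance of this is the change from polar to Cartesian coordinates, where the Jacobian determinant equals r. The sketch below builds dy^T/dx by central differences, with entry (i,j) holding dy(j)/dx(i) as in the Notation section. The function names and the sample point are illustrative assumptions.

```python
import math


def polar_to_cartesian(p):
    """Map x = (r, theta) to y = (r cos(theta), r sin(theta))."""
    r, theta = p
    return [r * math.cos(theta), r * math.sin(theta)]


def jacobian_T(f, x, h=1e-6):
    """Estimate dy^T/dx: row i holds the derivatives of all y(j) with respect to x(i)."""
    rows = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        yp, ym = f(xp), f(xm)
        rows.append([(a - b) / (2 * h) for a, b in zip(yp, ym)])
    return rows


r, theta = 2.0, 0.7  # assumed sample point
J = jacobian_T(polar_to_cartesian, [r, theta])
det_J = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(J, det_J)  # det_J is approximately r
```

This is the factor r in the familiar substitution dx dy = r dr dθ.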

Hessian matrix

If f is a function of x then the symmetric matrix d²f/dx² = d/dx^T(df/dx) is the Hessian matrix of f(x). A value of x for which df/dx = 0 corresponds to a minimum, maximum or saddle point according to whether the Hessian is positive definite, negative definite or indefinite.

d²/dx² (a^Tx) = 0
d²/dx² (Ax+b)^TC(Dx+e) = A^TCD + D^TC^TA
- d²/dx² (x^TCx) = C+C^T
- - d²/dx² (x^Tx) = 2I
- d²/dx² (Ax+b)^T (Dx+e) = A^TD + D^TA
- - d²/dx² (Ax+b)^T (Ax+b) = 2A^TA
- [C: symmetric]: d²/dx² (Ax+b)^TC(Ax+b) = 2A^TCA
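
The Hessian entries above can also be verified with second-order finite differences. This minimal sketch checks d²/dx² (x^TCx) = C+C^T for a non-symmetric 3×3 matrix; the matrix, the point x, and the helper names are assumptions made for the illustration.

```python
def quad(C, x):
    """Return the scalar x^T C x, with C given as a list of rows."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))


def numeric_hessian(f, x, h=1e-3):
    """Estimate every second partial derivative of f at x by central differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f with x(i) moved by si*h and x(j) moved by sj*h.
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


C = [[1.0, 2.0, 0.0], [4.0, 3.0, -1.0], [0.5, 0.0, 2.0]]  # assumed, non-symmetric
x = [1.0, -2.0, 0.5]

H_exact = [[C[i][j] + C[j][i] for j in range(3)] for i in range(3)]  # C + C^T
H_num = numeric_hessian(lambda v: quad(C, v), x)
max_err = max(abs(H_exact[i][j] - H_num[i][j]) for i in range(3) for j in range(3))
print(H_exact, max_err)
```

The result is symmetric even though C is not, which matches the statement that the Hessian is a symmetric matrix.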