CUR分解
要理解CUR分解,需要先看下SVD分解。SVD理论以及Python实现
算法流程
给定输入的矩阵A。
A = C ∗ U ∗ R A = C* U *R A = C ∗ U ∗ R
- 随机选r个列构成C和r个行构成R(也可以使用,平方和加权过的行和列( 常用 ))
- 然后选取W矩阵(C和R的交集,也就是被选出来的部分,在C和R中同时出现的A矩阵中的位置。)
- 对W做SVD分解,得到 X ∑ Y T X\sum Y^T X ∑ Y T
- 对 ∑ \sum ∑ 做广义逆矩阵 ( ∑ ) + (\sum)^+ ( ∑ ) + ,也就是只有非0元的部分才变成原来的倒数。
- U = Y ∗ ( ∑ ) + ∗ X T U = Y*(\sum)^+* X^T U = Y ∗ ( ∑ ) + ∗ X T
Python实现
- 导入包
import
numpy
as
np
- 数据
A
=
np
.
linspace
(
0
,
14
,
15
)
.
reshape
(
(
3
,
-
1
)
)
- 算法
def
CUR
(
A
,
n
)
:
A_sq
=
A
**
2
sum_A_sq
=
np
.
sum
(
A_sq
)
sum_A_sq_0
=
np
.
sum
(
A_sq
,
axis
=
0
)
sum_A_sq_1
=
np
.
sum
(
A_sq
,
axis
=
1
)
P_x_c
=
sum_A_sq_0
/
sum_A_sq
P_x_r
=
sum_A_sq_1
/
sum_A_sq
r
,
c
=
A
.
shape
c_index
=
[
np
.
random
.
choice
(
np
.
arange
(
0
,
c
)
,
p
=
P_x_c
)
for
i
in
range
(
n
)
]
r_index
=
[
np
.
random
.
choice
(
np
.
arange
(
0
,
r
)
,
p
=
P_x_r
)
for
i
in
range
(
n
)
]
# print(c_index, r_index)
C
=
A
[
:
,
c_index
]
R
=
A
[
r_index
,
:
]
W
=
C
[
r_index
]
# print(C, R, W)
def
SVD
(
A
,
n
)
:
M
=
np
.
dot
(
A
,
A
.
T
)
eigval
,
eigvec
=
np
.
linalg
.
eig
(
M
)
indexes
=
np
.
argsort
(
-
eigval
)
[
:
n
]
U
=
eigvec
[
:
,
indexes
]
sigma_sq
=
eigval
[
indexes
]
M
=
np
.
dot
(
A
.
T
,
A
)
eigval
,
eigvec
=
np
.
linalg
.
eig
(
M
)
indexes
=
np
.
argsort
(
-
eigval
)
[
:
n
]
V
=
eigvec
[
:
,
indexes
]
sigma
=
sigma_sq
# not diag and not sqrt
return
U
,
sigma
,
V
X
,
sigma
,
Y
=
SVD
(
W
,
n
)
for
i
in
range
(
len
(
sigma
)
)
:
if
sigma
[
i
]
==
0
:
continue
else
:
sigma
[
i
]
=
1
/
sigma
[
i
]
sigma
=
np
.
diag
(
sigma
)
U
=
np
.
dot
(
np
.
dot
(
Y
,
sigma
)
,
X
.
T
)
return
np
.
dot
(
np
.
dot
(
C
,
U
)
,
R
)
- 调用
CUR
(
A
,
3
)