GCRN Code Walkthrough

Intro

If you spot any mistakes, corrections are welcome.

GCN is a method built on Spectral Graph Theory (SGT); its main advantage is that it can draw on existing results and techniques from SGT to extract properties of a graph. GCRN is a model that combines a GCN with an RNN, so it can handle data that has both spatial and temporal structure.

Directory structure of the source code:

gconvRNN
- datasets
- ptb.char.test.txt
- ptb.char.train.txt
- ptb.char.valid.txt
- gcrn_main.py # entry point of the whole program
- config.py # hyperparameter configuration
- graph.py # graph-related operations, e.g. the Laplacian matrix
- model.py # model definition
- trainer.py # the training procedure
- utils.py # data preprocessing and utility functions

In this codebase, GCRN is used to predict character sequences of words.

This article walks through the GCRN source code in the following parts:

  • Data preprocessing
  • GCRN implementation
  • Training
  • Code appendix

Data preprocessing:

The train/valid/test splits all share the same format:

a e r b a n k n o t e b e r l i t z c a l l o w a y c e n t r u s t c l u e t t f r o m s t e i n g i t a n o g u t e r m a n h y d r o - q u e b e c i p o k i a m e m o t e c m l x n a h b p u n t s r a k e r e g a t t a r u b e n s s i m s n a c k - f o o d s s a n g y o n g s w a p o w a c h t e r
p i e r r e
< u n k > N y e a r s o l d w i l l j o i n t h e b o a r d a s a n o n e x e c u t i v e d i r e c t o r n o v . N
m r .
< u n k > i s c h a i r m a n o f < u n k > n . v . t h e d u t c h p u b l i s h i n g _ g r o u p

  • UNK - the "unknown token" - is used to replace rare words that did not fit in the vocabulary. So the sentence My name is guotong1998 becomes My name is _unk_

1. Map each sentence to an integer sequence via the vocabulary

for every sentence:
    append "|" to the end of the sentence (an artificial end-of-sentence marker)
    map *each character* in the sentence to *its corresponding index* in the vocabulary

For example:

p i e r r e < u n k > N y e a r s o l d w i l l j o i n t h e b o a r d a s a n o n e x e c u t i v e d i r e c t o r n o v . N |

[24, 10, 1, 2, 2, 1, 3, 27, 15, 5, 6, 28, 3, 29, 3, 14, 1, 0, 2, 16, 3, 7, 9, 21, 3, 13, 10, 9, 9, 3, 30, 7, 10, 5, 3, 8, 20, 1, 3, 4, 7, 0, 2, 21, 3, 0, 16, 3, 0, 3, 5, 7, 5, 1, 25, 1, 12, 15, 8, 10, 31, 1, 3, 21, 10, 2, 1, 12, 8, 7, 2, 3, 5, 7, 31, 32, 3, 29, 26]
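A minimal sketch of this mapping (char2idx is an assumed name for the vocabulary dict; the exact whitespace handling in utils.py may differ):

def encode(sentence, char2idx):
    # Append the artificial end-of-sentence marker "|", then map
    # every character to its index in the vocabulary.
    return [char2idx[ch] for ch in sentence + '|']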

2. Build the adjacency matrix

Computing the values of the adjacency matrix:
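A minimal sketch, assuming the standard construction (not necessarily the repo's exact graph.py), of turning an adjacency matrix into the Laplacian that cheby_conv consumes:

import numpy as np
import scipy.sparse as sp

def laplacian(A, normalized=True):
    # Graph Laplacian from a (sparse) adjacency matrix A.
    d = np.asarray(A.sum(axis=1)).flatten()   # node degrees
    if not normalized:
        return sp.diags(d) - A                # L = D - A
    with np.errstate(divide='ignore'):
        d_inv_sqrt = 1.0 / np.sqrt(d)
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0    # guard against isolated nodes
    D_inv_sqrt = sp.diags(d_inv_sqrt)
    I = sp.identity(A.shape[0], dtype=A.dtype)
    return I - D_inv_sqrt @ A @ D_inv_sqrt    # L = I - D^{-1/2} A D^{-1/2}

For the normalized Laplacian the eigenvalues lie in [0, 2], so the lmax needed later by cheby_conv can be taken as 2 or computed with scipy.sparse.linalg.eigsh.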

GCRN implementation:

GCRN = GCN + RNN.

Formulas
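For reference, the graph-convolutional LSTM of the GCRN paper (its "Model 2"), with $*_\mathcal{G}$ denoting graph convolution and $\odot$ the Hadamard product, reads roughly:

$$
\begin{aligned}
i_t &= \sigma(W_{xi} *_\mathcal{G} x_t + W_{hi} *_\mathcal{G} h_{t-1} + w_{ci} \odot c_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} *_\mathcal{G} x_t + W_{hf} *_\mathcal{G} h_{t-1} + w_{cf} \odot c_{t-1} + b_f) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} *_\mathcal{G} x_t + W_{hc} *_\mathcal{G} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} *_\mathcal{G} x_t + W_{ho} *_\mathcal{G} h_{t-1} + w_{co} \odot c_t + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

As far as the code in the appendix shows, the peephole terms $w_{ci} \odot c_{t-1}$, $w_{cf} \odot c_{t-1}$ and $w_{co} \odot c_t$ are absent there.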

However, the implementation in the code differs slightly.

GCN

Each Chebyshev term $T_k(x)$ is concatenated onto x, and the final stacked x is multiplied by W.
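Concretely, with the rescaled Laplacian $\tilde{L}$, the Chebyshev terms follow the standard recurrence and are stacked before the single multiplication by $W$:

$$T_0(\tilde{L})x = x, \qquad T_1(\tilde{L})x = \tilde{L}x, \qquad T_k(\tilde{L})x = 2\tilde{L}\,T_{k-1}(\tilde{L})x - T_{k-2}(\tilde{L})x$$

$$y = \big[\,T_0(\tilde{L})x \;\|\; T_1(\tilde{L})x \;\|\; \cdots \;\|\; T_{K-1}(\tilde{L})x\,\big]\,W$$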

LSTM
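In the cell implementation (listed in the code appendix), every dense matrix multiplication of a standard LSTM gate is replaced by a cheby_conv graph convolution; the input gate, for example, becomes:

$$i_t = \sigma\big(W_{xi} *_\mathcal{G} x_t + W_{hi} *_\mathcal{G} h_{t-1} + b_i\big)$$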

Training

Data preprocessing -> model definition -> training

The output has shape (batch_size, num_node, num_unit), i.e. (20, 50, 50). Since the cell is configured with 50 LSTM units, the outputs of all units must be linearly transformed down to a single value:

  • prediction = output * W - b; the shape of prediction here is (20, 50, 1)

The predictions over all time steps therefore have shape (50, 20, 50, 1).
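A minimal sketch of that projection, following the post's output * W - b (variable names here are illustrative, not the repo's exact code):

import tensorflow as tf

output = tf.placeholder(tf.float32, [20, 50, 50])   # (batch_size, num_node, num_unit)
W = tf.get_variable("W_out", [50, 1])               # (num_unit, 1)
b = tf.get_variable("b_out", [1])

flat = tf.reshape(output, [-1, 50])                 # (batch_size*num_node, num_unit)
prediction = tf.reshape(tf.matmul(flat, W) - b,
                        [20, 50, 1])                # (batch_size, num_node, 1)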

Code appendix

GCN implementation

import numpy as np
import tensorflow as tf

import graph  # local module of the repo: Laplacian helpers (rescale_L, ...)


def cheby_conv(x, L, lmax, feat_out, K, W):
    '''
    x        : [batch_size, N_node, feat_in] - input of each time step
    nSample  : number of samples = batch_size
    nNode    : number of nodes in the graph
    feat_in  : number of input features
    feat_out : number of output features
    L        : graph Laplacian (scipy sparse matrix)
    lmax     : largest eigenvalue of L, used to rescale its spectrum
    K        : size of the kernel (number of Chebyshev coefficients)
    W        : cheby_conv weight, [K * feat_in, feat_out]
    '''
    nSample, nNode, feat_in = x.get_shape()
    nSample, nNode, feat_in = int(nSample), int(nNode), int(feat_in)
    L = graph.rescale_L(L, lmax)  # rescale the Laplacian so its spectrum lies in [-1, 1]
    L = L.tocoo()

    indices = np.column_stack((L.row, L.col))
    L = tf.SparseTensor(indices, L.data, L.shape)
    L = tf.sparse_reorder(L)

    # T_0(L)x = x
    x0 = tf.transpose(x, perm=[1, 2, 0])  # change it to [nNode, feat_in, nSample]
    x0 = tf.reshape(x0, [nNode, feat_in * nSample])
    x = tf.expand_dims(x0, 0)  # make it [1, nNode, feat_in*nSample]

    def concat(x, x_):
        x_ = tf.expand_dims(x_, 0)
        return tf.concat([x, x_], axis=0)

    if K > 1:
        # T_1(L)x = Lx
        x1 = tf.sparse_tensor_dense_matmul(L, x0)
        x = concat(x, x1)

    for k in range(2, K):
        # Chebyshev recurrence: T_k(L)x = 2L T_{k-1}(L)x - T_{k-2}(L)x
        x2 = 2 * tf.sparse_tensor_dense_matmul(L, x1) - x0
        x = concat(x, x2)
        x0, x1 = x1, x2

    # Stack the K Chebyshev terms along the feature axis, then apply W once.
    x = tf.reshape(x, [K, nNode, feat_in, nSample])
    x = tf.transpose(x, perm=[3, 1, 2, 0])
    x = tf.reshape(x, [nSample * nNode, feat_in * K])

    x = tf.matmul(x, W)  # no bias term here; the biases are added by the caller
    out = tf.reshape(x, [nSample, nNode, feat_out])
    return out
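A hypothetical usage sketch (the shapes match the variable table further below; L and lmax would come from the helpers in graph.py):

x = tf.placeholder(tf.float32, [20, 50, 1])   # 20 signals over 50 nodes, 1 feature each
K, feat_out = 2, 50
W = tf.get_variable("W_cheby", [K * 1, feat_out])
y = cheby_conv(x, L, lmax, feat_out, K, W)    # -> shape (20, 50, 50)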

LSTM implementation

# Method of the graph-convolutional LSTM cell defined in model.py.
# (tf is tensorflow; LSTMStateTuple comes from tf.contrib.rnn.)
def __call__(self, inputs, state, scope=None):
    with tf.variable_scope(scope or type(self).__name__):
        if self._state_is_tuple:
            c, h = state
        else:
            c, h = tf.split(value=state, num_or_size_splits=2, axis=1)
        laplacian = self._laplacian
        lmax = self._lmax
        K = self._K
        feat_in = self._feat_in

        # inputs: a [batch_size, nNode, feat_in] tensor (a single time step)
        if feat_in is None:
            # take the shape from the input
            batch_size, nNode, feat_in = inputs.get_shape()

        feat_out = self._num_units

        if K is None:
            K = 2

        scope = tf.get_variable_scope()
        with tf.variable_scope(scope) as scope:
            try:
                # Four gconv weights for the input plus four for the hidden state
                Wzxt = tf.get_variable("Wzxt", [K*feat_in, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Wixt = tf.get_variable("Wixt", [K*feat_in, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Wfxt = tf.get_variable("Wfxt", [K*feat_in, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Woxt = tf.get_variable("Woxt", [K*feat_in, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))

                Wzht = tf.get_variable("Wzht", [K*feat_out, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Wiht = tf.get_variable("Wiht", [K*feat_out, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Wfht = tf.get_variable("Wfht", [K*feat_out, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Woht = tf.get_variable("Woht", [K*feat_out, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
            except ValueError:
                # The variables already exist in this scope: reuse them.
                scope.reuse_variables()
                Wzxt = tf.get_variable("Wzxt", [K*feat_in, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Wixt = tf.get_variable("Wixt", [K*feat_in, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Wfxt = tf.get_variable("Wfxt", [K*feat_in, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Woxt = tf.get_variable("Woxt", [K*feat_in, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))

                Wzht = tf.get_variable("Wzht", [K*feat_out, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Wiht = tf.get_variable("Wiht", [K*feat_out, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Wfht = tf.get_variable("Wfht", [K*feat_out, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))
                Woht = tf.get_variable("Woht", [K*feat_out, feat_out], dtype=tf.float32,
                                       initializer=tf.random_uniform_initializer(minval=-0.1, maxval=0.1))

            bzt = tf.get_variable("bzt", [feat_out])
            bit = tf.get_variable("bit", [feat_out])
            bft = tf.get_variable("bft", [feat_out])
            bot = tf.get_variable("bot", [feat_out])

            # gconv calculation: candidate cell state
            zxt = cheby_conv(inputs, laplacian, lmax, feat_out, K, Wzxt)
            zht = cheby_conv(h, laplacian, lmax, feat_out, K, Wzht)
            zt = zxt + zht + bzt
            zt = tf.tanh(zt)

            # input gate
            ixt = cheby_conv(inputs, laplacian, lmax, feat_out, K, Wixt)
            iht = cheby_conv(h, laplacian, lmax, feat_out, K, Wiht)
            it = ixt + iht + bit
            it = tf.sigmoid(it)

            # forget gate
            fxt = cheby_conv(inputs, laplacian, lmax, feat_out, K, Wfxt)
            fht = cheby_conv(h, laplacian, lmax, feat_out, K, Wfht)
            ft = fxt + fht + bft
            ft = tf.sigmoid(ft)

            # output gate
            oxt = cheby_conv(inputs, laplacian, lmax, feat_out, K, Woxt)
            oht = cheby_conv(h, laplacian, lmax, feat_out, K, Woht)
            ot = oxt + oht + bot
            ot = tf.sigmoid(ot)

            # new cell state
            new_c = ft*c + it*zt

            # new hidden state
            new_h = ot*tf.tanh(new_c)

            if self._state_is_tuple:
                new_state = LSTMStateTuple(new_c, new_h)
            else:
                new_state = tf.concat([new_c, new_h], 1)
            return new_h, new_state

Other variables

Variable        Shape              Meaning
rnn_input       (20, 50, 1, 50)    (batch_size, num_node, feat_in, num_time_steps)
rnn_input_seq   (20, 50, 1) * 50   (batch_size, num_node, feat_in) * num_time_steps
rnn_output      (20, 50)           (batch_size, num_time_steps)
rnn_output_seq  (20) * 50          (batch_size) * num_time_steps
num_hidden      50                 number of hidden units
x_batches       (5017, 20, 50)     [-1, batch_size, seq_length]
y_batches       (5017, 20, 50)     [-1, batch_size, seq_length]
outputs         (50, 20, 50, 50)   (num_time_steps, batch_size, num_node, num_unit): one output per time step
output          (20, 50, 50)       a single time step of outputs: (batch_size, num_node, num_unit)

Laplacian matrix

Properties

For an undirected graph $G$ and its Laplacian matrix $L$, with eigenvalues $\lambda_0 \leq \lambda_1 \leq \lambda_2 \leq \cdots$:

  • L is symmetric
  • L is positive semi-definite (i.e. all $\lambda_i \geq 0$); this can be verified via the incidence matrix, and also follows from L being symmetric and diagonally dominant
  • L is an M-matrix (its off-diagonal entries are non-positive, and the real parts of its eigenvalues are non-negative)
  • Every row and every column sums to 0
  • The smallest non-zero eigenvalue of L is called the spectral gap
  • The number of connected components of the graph equals the dimension of the null space of L, i.e. the algebraic multiplicity of the eigenvalue 0
  • The Laplacian matrix is singular
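These properties are easy to check numerically on a toy graph (a sketch, not part of the repo):

import numpy as np

# Two connected components: a triangle {0, 1, 2} and an edge {3, 4}.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A            # L = D - A

eigvals = np.linalg.eigvalsh(L)           # symmetric -> real eigenvalues
print(np.allclose(L, L.T))                # True: L is symmetric
print(np.all(eigvals > -1e-10))           # True: positive semi-definite
print(np.allclose(L.sum(axis=1), 0))      # True: every row sums to 0
print(np.sum(np.isclose(eigvals, 0)))     # 2: one zero eigenvalue per component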

Questions

  1. Is the formula implemented in the code really different from the one in the paper?
  2. What does x2 = 2 * tf.sparse_tensor_dense_matmul(L, x1) - x0 do?

Postscript

A follow-up: after discussing this with labmates, I now have a much better understanding of GCN. The second question above is in fact about the size of K in GCN: K = 1 means the node only aggregates information from within one hop. The reason is that each multiplication by the Laplacian (which is derived from the adjacency matrix) pulls in the information of nodes one more hop away.
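A tiny numerical illustration of the one-more-hop-per-multiplication point (a toy example, not from the repo): the sparsity pattern of $(A + I)^k$ marks exactly the nodes reachable within $k$ hops.

import numpy as np

# Path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
M = A + np.eye(4, dtype=int)

print((M > 0)[0])                             # [ True  True False False]: 1 hop from node 0
print((np.linalg.matrix_power(M, 2) > 0)[0])  # [ True  True  True False]: 2 hops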