Motivation
Many graph analytics problems can be solved by iterative algorithms that follow the graph structure, and the solutions of these algorithms are often characterized by a set of steady-state conditions. For example:

PageRank: the score of a node

Mean field inference: the posterior distribution of a variable
Instead of designing algorithms for each individual graph problem, we take a different perspective:
Can we design a learning framework for a diverse range of graph problems that learns the algorithm over large graphs, achieving the steady-state solutions efficiently and effectively?

How do we represent the meta learner for such an algorithm, and how do we carry out the learning of these algorithms?
Iterative algorithm over graphs
For a graph $\mathcal{G}=(\mathcal{V}, \mathcal{E})$ with node set $\mathcal{V}$ and edge set $\mathcal{E}$, many iterative algorithms over graphs can be formulated as:

$$h_{v}^{(t+1)} \leftarrow \mathcal{T}\left(\left\{h_{u}^{(t)}\right\}_{u \in \mathcal{N}(v)}\right), \forall t \geqslant 1, \quad \text{and} \quad h_{v}^{(0)} \leftarrow \text{constant}, \forall v \in \mathcal{V} \quad (1)$$

iterated until the steady-state conditions are met:

$$h_{v}^{*}=\mathcal{T}\left(\left\{h_{u}^{*}\right\}_{u \in \mathcal{N}(v)}\right), \forall v \in \mathcal{V} \quad (2)$$

- $v, u \in \mathcal{V}$: nodes
- $\mathcal{N}(v)$: the set of neighbor nodes of $v$
- $\mathcal{T}(\cdot)$: some operator
- $h_v$: intermediate representation of node $v$
- $h_v^{(t)}$: intermediate representation of node $v$ at step $t$
- $h_v^{*}$: final (converged/steady-state) representation of node $v$
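As a concrete illustration of template (1)-(2), here is a minimal sketch (toy graph and function names are my own, not from the paper) that applies a local operator to every node until the values stop changing. The operator used is max-propagation, as in the component-detection example of section 1) below; I include the node's own label in the max so that the synchronous iteration is monotone and terminates.

```python
# Generic fixed-point template (1)-(2): apply a local operator T at every
# node until no value changes, i.e., the steady-state condition (2) holds.
def fixed_point(nbrs, T, h0):
    h = dict(h0)
    while True:
        new_h = {v: T(v, h) for v in nbrs}
        if new_h == h:            # steady-state condition (2) reached
            return h
        h = new_h

# Illustrative operator: max-propagation from a source node (the node's own
# label is included, a common convention that makes the iteration monotone).
nbrs = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}   # two components
y0 = {v: 0 for v in nbrs}
y0[0] = 1                                            # source s = 0
T = lambda v, h: max([h[v]] + [h[u] for u in nbrs[v]])
y = fixed_point(nbrs, T, y0)
# y[v] == 1 exactly for the nodes in the same component as s
```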
Specifically:
1) Graph component detection problem
Goal : find all nodes within the same connected component as a source node $s \in \mathcal{V}$

How : iteratively propagate the label at node $s$ to the other nodes:

$$y_{v}^{(t+1)}=\max _{u \in \mathcal{N}(v)} y_{u}^{(t)}, \quad y_{s}^{(0)}=1, \quad y_{v}^{(0)}=0, \forall v \in \mathcal{V} \setminus \{s\}$$

- $y_s$: label of node $s$
- $y_s^{(t)}$: label of node $s$ at step $t$

At the initial step $t=0$, the label at node $s$ is set to 1 (infected) and to 0 for all other nodes.

Steady state : the nodes in the same connected component as $s$ are infected (labelled as 1):
$$y_v^{*}= \max_{u\in \mathcal{N}(v)}y_u^{*}, \forall v \in \mathcal{V}$$

2) PageRank

Goal : estimate the importance of each node in a graph
How : update the score of each node iteratively:

$$r_{v}^{(t+1)} \leftarrow \frac{1-\lambda}{|\mathcal{V}|}+\frac{\lambda}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} r_{u}^{(t)}, \forall v \in \mathcal{V}$$

- $r_v^{(t)}$: score of node $v$ at step $t$
- Initialization: $r_v^{(0)} = 0, \forall v \in \mathcal{V}$

Steady state :

$$r_{v}^{*} = \frac{1-\lambda}{|\mathcal{V}|}+\frac{\lambda}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} r_{u}^{*}$$
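The recursion above can be checked numerically. A small sketch (toy undirected graph and $\lambda = 0.85$ are illustrative choices): since the update is an affine contraction with factor $\lambda < 1$, plain iteration reaches the steady state.

```python
import numpy as np

# Toy undirected graph; lambda = 0.85 (illustrative, not from the notes).
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
n, lam = len(nbrs), 0.85

def update(r):
    # one sweep of the PageRank-style recursion from the notes
    return np.array([(1 - lam) / n + lam / len(nbrs[v]) * sum(r[u] for u in nbrs[v])
                     for v in range(n)])

r = np.zeros(n)               # r_v^(0) = 0
for _ in range(300):          # contraction factor lam < 1 => convergence
    r = update(r)

# steady-state condition: one more update leaves r (numerically) unchanged
assert np.allclose(r, update(r))
```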
3) Mean field inference in a graphical model

Goal : approximate the marginal distributions of a set of variables $x_v$ in a graphical model defined on $\mathcal{G}$

In the graphical model, the joint distribution factorizes as:

$$p\left(\left\{x_{v}\right\}_{v \in \mathcal{V}}\right) \propto \prod_{v \in \mathcal{V}} \phi\left(x_{v}\right) \prod_{(u, v) \in \mathcal{E}} \phi\left(x_{u}, x_{v}\right)$$

- $p(\{x_v\}_{v\in\mathcal{V}})$: the true joint distribution
- $\phi(x_v)$ and $\phi(x_u, x_v)$: node and edge potentials, respectively
How : the marginal approximations can be obtained iteratively by the following mean field update:

$$q^{(t+1)}\left(x_{v}\right) \leftarrow \phi\left(x_{v}\right) \prod_{u \in \mathcal{N}(v)} \exp \left(\int q^{(t)}\left(x_{u}\right) \log \phi\left(x_{u}, x_{v}\right) \mathrm{d} x_{u}\right)$$

- $q^{(t+1)}(x_v)$: marginal approximation of variable $x_v$ at step $t+1$

Steady state :

$$q^{*}\left(x_{v}\right) = \phi\left(x_{v}\right) \prod_{u \in \mathcal{N}(v)} \exp \left(\int q^{*}\left(x_{u}\right) \log \phi\left(x_{u}, x_{v}\right) \mathrm{d} x_{u}\right)$$
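For discrete variables the integral becomes a sum, and the update can be sketched as follows. All choices here are my own assumptions: binary variables on a toy chain, a symmetric shared edge potential, explicit normalization of each $q(x_v)$, and damped synchronous updates for stability.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 6, 2                                   # 6 nodes, binary variables
edges = [(i, i + 1) for i in range(n - 1)]    # chain graph
nbrs = {v: [] for v in range(n)}
for u, v in edges:
    nbrs[u].append(v)
    nbrs[v].append(u)

node_pot = rng.uniform(0.5, 1.5, size=(n, K))     # phi(x_v) > 0
edge_pot = rng.uniform(0.5, 1.5, size=(K, K))
edge_pot = (edge_pot + edge_pot.T) / 2            # symmetric shared phi(x_u, x_v)

def mf_update(q):
    # discrete mean field sweep: q(x_v) ∝ phi(x_v) * exp(sum_u E_q[log phi(x_u, x_v)])
    new_q = np.empty_like(q)
    for v in range(n):
        log_q = np.log(node_pot[v]).copy()
        for u in nbrs[v]:
            log_q = log_q + q[u] @ np.log(edge_pot)   # expectation over x_u
        new_q[v] = np.exp(log_q - log_q.max())
        new_q[v] /= new_q[v].sum()                    # normalize
    return new_q

q = np.full((n, K), 1.0 / K)                  # uniform initialization
for _ in range(300):
    q = 0.5 * q + 0.5 * mf_update(q)          # damping for stable convergence
```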
4) Compute long-range graph convolution features (node classification)

Goal: extract long-range features from the graph and use them to capture the relation between the graph topology and external labels

How: one possible parametrization of the graph convolution features $h_v$, updated from zero initialization:

$$h_{v}^{(t+1)} \leftarrow \sigma\left(W_{1} x_{v}+W_{2} \sum_{u \in \mathcal{N}(v)} h_{u}^{(t)}\right)$$

- $h_v^{(t+1)}$: graph convolution features of node $v$ at step $t+1$
- $\sigma$: a nonlinear element-wise operation
- $W_1, W_2$: parameters of the operator

Steady state:

$$h_{v}^{*} = \sigma\left(W_{1} x_{v}+W_{2} \sum_{u \in \mathcal{N}(v)} h_{u}^{*}\right)$$

After that, the label of each node is determined from the steady-state feature $h_v^*$ by a labeling function $f(h_v^*)$.
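A runnable sketch of this fixed point (toy graph, shapes, and weight scales are my own assumptions): the weights $W_2$ are drawn small so that the update map is a contraction and the zeros-initialized iteration converges, with $\sigma = \tanh$.

```python
import numpy as np

rng = np.random.default_rng(0)
nbrs = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2, 4], 4: [3]}
n, d_in, d = len(nbrs), 3, 4
X = rng.normal(size=(n, d_in))                # node features x_v
W1 = rng.normal(size=(d, d_in))
W2 = 0.05 * rng.normal(size=(d, d))           # small norm => contraction

def update(H):
    # one sweep of h_v <- sigma(W1 x_v + W2 sum_{u in N(v)} h_u)
    return np.stack([np.tanh(W1 @ X[v] + W2 @ sum(H[u] for u in nbrs[v]))
                     for v in range(n)])

H = np.zeros((n, d))                          # zeros initialization
for _ in range(100):
    H = update(H)

# steady state: H is (numerically) a fixed point of the update
assert np.allclose(H, update(H))
```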
The Algorithm Learning Problem: framework of algorithm design
Assumption : we have collected the output of an iterative algorithm $\mathcal{T}$ over a single large graph.

The training dataset (the input of the proposed framework) consists of:

- the input graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$
- the output of the iterative algorithm for a subset of nodes $\mathcal{V}^{(y)} \subseteq \mathcal{V}$ (the labeled nodes) from the graph:

$$\mathcal{D}=\left\{f_v^{*}:=f(h_v^{*}) \,\middle|\, h_v^{*}=\mathcal{T}\left[\{h_u^{*}\}_{u\in \mathcal{N}(v)}\right],\ v\in \mathcal{V}^{(y)}\right\} \quad (3)$$

- $h_v^{*}$: the quantity in the iterative algorithm satisfying the steady-state conditions
- $f(\cdot)$: an additional labeling function that takes the steady-state quantity as input and produces the final label of each node
- $f_v^{*}$: the ground truth of node $v$
Given the above $\mathcal{D}$:

Goal: learn a parameterized algorithm $\mathcal{A}_{\Theta}$ whose output mimics the output of the original algorithm $\mathcal{T}$.

Namely, the output of $\mathcal{A}_{\Theta}$, $\mathcal{A}_{\Theta}[\mathcal{G}]=\{\hat{f}_v\}_{v\in \mathcal{V}^{(y)}}$, should be close to $\{f_v^{*}\}$ according to some loss function.
The algorithm learning problem for $\mathcal{A}_{\Theta}$ can be formulated as the following optimization problem:

$$\min_{\Theta} \sum_{v \in \mathcal{V}^{(y)}} \ell\left(f_{v}^{*}, \widehat{f}_{v}\right) \quad (4)$$

$$\text{s.t. } \left\{\widehat{f}_{v}\right\}_{v \in \mathcal{V}^{(y)}}=\mathcal{A}_{\Theta}[\mathcal{G}] \quad (5)$$

- $\ell(f_v^{*}, \widehat{f}_v)$: loss function
Design goal : respect the steady-state conditions and learn fast.

Core components of $\mathcal{A}_{\Theta}$:

- a steady-state operator $\mathcal{T}_{\Theta}$ acting on vector embedding representations of nodes
- a link function $g$ mapping the embeddings to the algorithm output

Note : namely, $\mathcal{A}_{\Theta}$ consists of $\mathcal{T}_{\Theta}$ and $g$.
Steady-state operator and link function

$$\text{output}: \left\{\widehat{f}_{v}:=g(\widehat{h}_{v})\right\}_{v \in \mathcal{V}} \quad (6)$$

$$\text{s.t. } \widehat{h}_{v}=\mathcal{T}_{\Theta}\left[\left\{\widehat{h}_{u}\right\}_{u \in \mathcal{N}(v)}\right] \quad (7)$$

Initialization: $\widehat{h}_v \leftarrow \text{constant}$ for all $v\in \mathcal{V}$, then update using equation (7).
Operator $\mathcal{T}_{\Theta}$: a two-layer neural network

- a general nonlinear function class
- the operator enforces the steady-state condition on node embeddings using 1-hop local neighborhood information
- due to the variety of graph structures, this function should be able to handle a varying number of inputs (i.e., different numbers of neighbor nodes)

$$\widehat{h}_v = \mathcal{T}_{\Theta}\left[\left\{\widehat{h}_{u}\right\}_{u \in \mathcal{N}(v)}\right]=W_{1} \sigma\left(W_{2}\left[x_{v}, \sum_{u \in \mathcal{N}(v)}\left[\widehat{h}_{u}, x_{u}\right]\right]\right) \quad (9)$$

- $\sigma(\cdot)$: element-wise activation function (e.g., Sigmoid or ReLU)
- $W_1, W_2$: weight matrices of the network; in eq. (9), $W_2$ is applied first (first layer) and $W_1$ second (second layer)
- $x_v$: the optional feature representation of node $v$
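Eq. (9) can be sketched in a few lines. The shapes, toy graph, and weight scales below are illustrative assumptions, not the paper's implementation; $\sigma$ is taken to be ReLU.

```python
import numpy as np

def T_theta(v, H, X, nbrs, W1, W2):
    # Eq. (9): concatenate x_v with the sum of concatenated neighbor [h_u, x_u],
    # then apply the two-layer network W1 sigma(W2 [...]), sigma = ReLU.
    agg = sum(np.concatenate([H[u], X[u]]) for u in nbrs[v])
    z = np.concatenate([X[v], agg])
    return W1 @ np.maximum(W2 @ z, 0.0)

rng = np.random.default_rng(0)
d_x, d_h, hidden = 3, 4, 8
nbrs = {0: [1], 1: [0, 2], 2: [1]}
X = rng.normal(size=(3, d_x))                         # node features
H = np.zeros((3, d_h))                                # current embeddings
W2 = 0.1 * rng.normal(size=(hidden, d_x + d_h + d_x)) # first layer
W1 = 0.1 * rng.normal(size=(d_h, hidden))             # second layer
h_new = T_theta(0, H, X, nbrs, W1, W2)                # updated embedding of node 0
```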
Link function (prediction function) $g$: a two-layer neural network

- a general nonlinear function class
- predicts the corresponding algorithm outputs (e.g., the label of a node)

$$g\left(\widehat{h}_{v}\right)=\sigma\left(V_{2}^{\top} \operatorname{ReLU}\left(V_{1}^{\top} \widehat{h}_{v}\right)\right) \quad (10)$$

- $\widehat{h}_{v}$: node embedding
- $V_1, V_2$: parameters of $g$ ($V_1$: first layer, $V_2$: second layer)
- $\sigma$: task-specific activation function
  - linear regression: identity function $\sigma(x)=x$
  - multi-class classification: $\sigma(\cdot)$ is softmax (outputs a point on the probability simplex)
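Eq. (10) for the multi-class case can be sketched directly (shapes and the number of classes are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())       # shift for numerical stability
    return e / e.sum()

def g(h, V1, V2):
    # eq. (10): two-layer MLP with softmax output for multi-class labels
    return softmax(V2.T @ np.maximum(V1.T @ h, 0.0))

rng = np.random.default_rng(0)
d_h, hidden, n_classes = 4, 8, 3
V1 = rng.normal(size=(d_h, hidden))
V2 = rng.normal(size=(hidden, n_classes))
p = g(rng.normal(size=d_h), V1, V2)   # class probabilities for one node
```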
The overall optimization problem

$$\min _{\{W_{i}, V_{i}\}_{i=1}^{2}} \mathcal{L}\left(\{W_{i}, V_{i}\}_{i=1}^{2}\right):=\frac{1}{|\mathcal{V}^{(y)}|} \sum_{v \in \mathcal{V}^{(y)}} \ell\left(f_{v}^{*}, g(\widehat{h}_{v})\right) \quad (11)$$

$$\text{s.t. } \widehat{h}_{v}=\mathcal{T}_{\Theta}\left[\left\{\widehat{h}_{u}\right\}_{u \in \mathcal{N}(v)}\right], \forall v \in \mathcal{V}$$

- $W_1, W_2$: parameters of $\mathcal{T}_{\Theta}$
- $V_1, V_2$: parameters of $g$

My understanding: this is semi-supervised learning, since the loss uses only the labeled nodes $\mathcal{V}^{(y)}$ while the steady-state constraint covers all of $\mathcal{V}$.
How to solve (11): Stochastic Steady-state Embedding (SSE)

An alternating algorithm that alternates between:

- using the current model to find the embeddings and make predictions
- using the gradient of the loss with respect to $\{W_1, W_2, V_1, V_2\}$ to update these parameters
Intuition (an analogy to policy iteration in RL): improve the "policy" that minimizes the cost defined by $f^{*}$ by updating the parameters of $\mathcal{T}_{\Theta}$ and $g$:

- the steady-state embedding $\hat{h}_v$ of each node plays the role of the "value function"
- the embedding operator $\mathcal{T}_{\Theta}$ and the classifier $g$ play the role of the "policy"
"Value" estimation: estimate steady-state
h v ^ \hat{h_v} h v ^ limitation : it is prohibitive to solve the steady-state equation exactly in large-scale graph with millions of vertices since it requires visiting all the nodes in the graph .
Solution : stochastic fixed point iteration, the extra randomness on the constraints for sampling the constraints to tackle the groups of equations approximately.
At the $k\text{-th}$ step, first sample a set of nodes $\tilde{\mathcal{V}}=\{v_1,v_2,\dots,v_N\}\subseteq \mathcal{V}$ from the entire node set (rather than just the labeled set), then update the embeddings by a moving average:

$$\widehat{h}_{v_{i}}^{(k)} \leftarrow(1-\alpha) \widehat{h}_{v_{i}}^{(k-1)}+\alpha \mathcal{T}_{\Theta}\left[\left\{\widehat{h}_{u}^{(k-1)}\right\}_{u \in \mathcal{N}(v_{i})}\right], \forall v_{i} \in \tilde{\mathcal{V}} \quad (12)$$

- $\alpha$: step size, $0 \leq \alpha \leq 1$
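One stochastic "value" step of eq. (12) can be sketched as follows; the graph, the batch size, and the stand-in operator (mean of neighbor embeddings instead of the paper's $\mathcal{T}_{\Theta}$) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
nbrs = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
H = rng.normal(size=(4, 2))       # current embeddings
alpha = 0.5                       # damping step size

def T(v, H):
    # stand-in for T_Theta: mean of neighbor embeddings
    return np.mean([H[u] for u in nbrs[v]], axis=0)

batch = rng.choice(4, size=2, replace=False)   # sampled from ALL nodes, eq. (12)
H_old = H.copy()
for v in batch:
    # damped moving-average update, eq. (12); unsampled nodes are untouched
    H[v] = (1 - alpha) * H_old[v] + alpha * T(v, H_old)
```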
"Policy" improvement: update parameters of
T Θ and g \mathcal{T}_{\Theta} \text{ and } g T Θ and g At the k -th k\text{-th} k -th step, once we have { h ^ v ( k ) } v ∈ V \{\widehat{h}_{v}^{(k)}\}_{v\in \mathcal{V}} { h v ( k ) } v ∈ V satisfying the steady-state equation, we use vanilla stochastic gradient descent to update parameters { W 1 , W 2 , V 1 , V 2 } \{W_1, W_2, V_1, V_2\} { W 1 , W 2 , V 1 , V 2 } :
∂ L ∂ V i = E ^ [ ∂ ℓ ( f v ∗ , g ( h ^ v k ) ) ∂ g ( h ^ v k ) ∂ g ( h ^ v k ) ∂ V i ] ∂ L ∂ W i = E ^ [ ∂ ℓ ( f v ∗ , g ( h ^ v k ) ) ∂ h ^ v k . ∂ T Θ ∂ W i ] \begin{aligned}
\frac{\partial \mathcal{L}}{\partial \color{red}V_{i}} &=\widehat{\mathbb{E}}\left[\frac{\partial \ell\left(f_{v}^{*}, g\left(\hat{h}_{v}^{k}\right)\right)}{\partial g\left(\hat{h}_{v}^{k}\right)} \frac{\partial g\left(\hat{h}_{v}^{k}\right)}{\partial \color{red}V_{i}}\right] \\
\frac{\partial \mathcal{L}}{\partial \color{red}W_{i}} &=\widehat{\mathbb{E}}\left[\frac{\partial \ell\left(f_{v}^{*}, g\left(\hat{h}_{v}^{k}\right)\right)}{\partial \widehat{h}_{v}^{k}} \frac{.\partial \mathcal{T}_{\Theta}}{\partial \color{red}W_{i}}\right]
\end{aligned} ∂ V i ∂ L ∂ W i ∂ L = E ∂ g ( h ^ v k ) ∂ ℓ ( f v ∗ , g ( h ^ v k ) ) ∂ V i ∂ g ( h ^ v k ) = E ∂ h v k ∂ ℓ ( f v ∗ , g ( h ^ v k ) ) ∂ W i . ∂ T Θ E ^ [ ⋅ ] \widehat{\mathbb{E}}[\cdot] E [ ⋅ ] : the expectation is w.r.t. uniform distribution over labeled nodes V ( y ) \mathcal{V}^{(y)} V ( y ) .
- $n_h$: the number of inner loops in "value" estimation
- $n_f$: the number of inner loops in "policy" improvement
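The overall alternation can be sketched on a scalar toy model. Everything below is an illustrative stand-in: a linear operator $T_w(v) = x_v + w \cdot \text{mean}_{u \in \mathcal{N}(v)} h_u$ instead of the neural $\mathcal{T}_{\Theta}$, a linear link $g(h) = \theta h$, squared loss, full-batch updates, and labels $f^*$ generated from a ground-truth $(w^*, \theta^*)$; the gradients are closed-form analogues of the SSE gradient expressions above.

```python
import numpy as np

rng = np.random.default_rng(0)
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
n = len(nbrs)
x = rng.normal(size=n)                        # node features

def nbr_mean(h):
    return np.array([np.mean([h[u] for u in nbrs[v]]) for v in range(n)])

def solve_h(w, iters=100):                    # exact steady state, for labels
    h = np.zeros(n)
    for _ in range(iters):
        h = x + w * nbr_mean(h)
    return h

f_star = 2.0 * solve_h(0.3)                   # ground truth: w* = 0.3, th* = 2.0

w, th, alpha, lr = 0.0, 1.0, 0.5, 0.05
h = np.zeros(n)
loss0 = np.mean((th * solve_h(w) - f_star) ** 2)   # loss before training
for _ in range(500):
    for _ in range(5):                        # "value" estimation, eq. (12)
        h = (1 - alpha) * h + alpha * (x + w * nbr_mean(h))
    err = th * h - f_star                     # "policy" improvement: SGD
    th -= lr * np.mean(err * h)               # dL/d(th), via dg/d(th)
    w -= lr * np.mean(err * th * nbr_mean(h)) # dL/dw, via dT/dw as in SSE
loss = np.mean((th * h - f_star) ** 2)        # loss decreases during training
```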
Complexity
Memory:

- SSE: $O(|\mathcal{V}|)$; the dominating part is the persistent node embedding matrix $\{\widehat{h}_v\}_{v\in \mathcal{V}}$
- GNN family: $O(T|\mathcal{V}|)$ for $T\text{-hop}$ unrolling

Time: the computational cost of each iteration is proportional to the number of edges in each mini-batch:

- "policy" improvement: $O(M\frac{|\mathcal{E}|}{|\mathcal{V}|})$
- "value" estimation: $O(N\frac{|\mathcal{E}|}{|\mathcal{V}|})$