Current methods for combinatorial optimization problems cannot learn from experience.
Can we learn from experience?
Given a graph optimization problem G and a distribution D of problem instances, can we learn better greedy heuristics that generalize to unseen instances from D?
Three common graph optimization problems
Given a weighted graph G(V,E,w), where w is the edge weight function and w(u,v) is the weight of edge (u,v) ∈ E.
Minimum Vertex Cover (MVC):
Given a graph G, find a minimum-size subset of nodes S ⊆ V such that every edge is covered:
∀(u,v) ∈ E: u ∈ S or v ∈ S
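As a quick sanity check of the coverage condition, here is a small Python sketch (function and variable names are illustrative, not from any particular library):

```python
def is_vertex_cover(edges, S):
    """Check the MVC feasibility condition: every edge (u, v) has u in S or v in S."""
    return all(u in S or v in S for (u, v) in edges)

# Example: a path 0-1-2. {1} covers both edges; {0} leaves (1, 2) uncovered.
edges = [(0, 1), (1, 2)]
print(is_vertex_cover(edges, {1}))  # True
print(is_vertex_cover(edges, {0}))  # False
```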
Maximum Cut (MAXCUT):
Given a graph G, find a subset of nodes S ⊆ V such that the weight of the cut-set, ∑_{(u,v)∈C} w(u,v), is maximized.
Cut-set C: the set of edges with one endpoint in S and the other endpoint in V\S.
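The cut-set weight follows directly from this definition; a minimal sketch (illustrative names):

```python
def cut_weight(weighted_edges, S):
    """Sum w(u, v) over the cut-set C: edges with exactly one endpoint in S."""
    return sum(w for (u, v, w) in weighted_edges if (u in S) != (v in S))

# Example: a triangle with unit weights; S = {0} cuts two of the three edges.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
print(cut_weight(triangle, {0}))  # 2.0
```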
Traveling Salesman Problem (TSP):
Given a graph G, find a tour that visits every node exactly once and returns to the start, such that the total weight of the tour's edges is minimized.
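For completeness, the tour cost being minimized can be computed as below (a sketch; the dict-of-distances representation is an assumption for illustration):

```python
def tour_length(w, tour):
    """Total weight of a tour, including the edge back from the last node to the first."""
    return sum(w[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

# Example: 3 nodes with symmetric distances.
w = {(0, 1): 2.0, (1, 0): 2.0, (1, 2): 3.0, (2, 1): 3.0, (0, 2): 4.0, (2, 0): 4.0}
print(tour_length(w, [0, 1, 2]))  # 2.0 + 3.0 + 4.0 = 9.0
```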
Basic idea
1. Start from the original graph with an initial (empty) partial solution.
2. Feed the graph into an embedding model (structure2vec) and run it for T steps.
3. Each node gets a score based on its structure2vec embedding.
4. Pick the node with the highest score and add it to the partial solution (one node per iteration).
5. Go back to step 2 and repeat until the termination criterion is met.
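A minimal sketch of this greedy loop, with placeholder callables for the scoring model and the termination test (neither is the paper's exact interface):

```python
def greedy_construct(graph, score_fn, is_done):
    """Generic greedy construction loop.

    score_fn(graph, S) -> dict mapping each candidate node to a score
        (in the learned heuristic, this is where structure2vec + Q-values go).
    is_done(graph, S)  -> True once the termination criterion is met
        (e.g. all edges covered for MVC).
    """
    S = []
    while not is_done(graph, S):
        scores = score_fn(graph, S)          # steps 2-3: embed the graph and score nodes
        best = max(scores, key=scores.get)   # step 4: node with the highest score
        S.append(best)                       # grow the partial solution
    return S
```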
Relation to Q-learning (taking MVC as an example)
For MVC (Minimum Vertex Cover):
State S: the partial solution built so far (the set of selected nodes).
Reward: the score we earn at the current step.
Action: analogous to "move left/right" in a game; here, add a node to S.
Action-value function Q̂(S, v): the predicted total future reward of adding node v in state S.
How to choose the action:
Greedily pick the node with the highest predicted value: v∗ = argmax_v Q̂(S, v)
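A hedged sketch of the action selection and the standard one-step Q-learning target it can be trained against (the paper trains a fitted Q-learning variant; epsilon and gamma below are illustrative defaults, not values from the paper):

```python
import random

def choose_action(q_values, candidates, epsilon=0.0):
    """v* = argmax over remaining candidate nodes of Q-hat(S, v),
    with optional epsilon-greedy exploration during training."""
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda v: q_values[v])

def q_learning_target(reward, next_q_values, gamma=1.0):
    """One-step target: r + gamma * max_v Q-hat(S', v)."""
    if not next_q_values:          # terminal state: no future reward
        return reward
    return reward + gamma * max(next_q_values)
```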
How to represent the node embedding:
Each node v gets an embedding μ_v, updated for T rounds by aggregating its own feature (whether v is already in S), its neighbors' embeddings, and the weights of its incident edges (structure2vec).
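One way this can look in code (a simplified structure2vec-style update; the parameter names and exact form are assumptions, not the paper's precise parameterization):

```python
import numpy as np

def embed_nodes(adj, x, theta1, theta2, theta3, T=4):
    """Simplified structure2vec-style node embeddings.

    adj    : (n, n) weighted adjacency matrix
    x      : (n,)   node feature, e.g. 1 if the node is already in S else 0
    theta1 : (p,)   weight for the node feature
    theta2 : (p, p) weight for the aggregated neighbor embeddings
    theta3 : (p,)   weight for the aggregated incident edge weights
    """
    n, p = adj.shape[0], theta1.shape[0]
    mu = np.zeros((n, p))
    for _ in range(T):
        neighbor_sum = adj @ mu        # aggregate neighbor embeddings (edge-weighted here)
        edge_sum = adj.sum(axis=1)     # total incident edge weight per node
        mu = np.maximum(0.0,           # ReLU
                        np.outer(x, theta1)
                        + neighbor_sum @ theta2.T
                        + np.outer(edge_sum, theta3))
    return mu
```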
How to measure the Q-value for each node:
Q̂(S, v) is read out from the embeddings: a pooled (summed) embedding of the whole graph is combined with node v's own embedding and mapped to a scalar score.
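A matching sketch for the readout (again simplified; parameter names are illustrative):

```python
import numpy as np

def q_values(mu, theta4, theta5, theta6):
    """One Q-hat(S, v) per node, read out from the embeddings mu of shape (n, p)."""
    n, p = mu.shape
    graph_emb = mu.sum(axis=0)                      # pooled embedding of the whole graph
    pooled = np.tile(graph_emb @ theta4.T, (n, 1))  # (n, p): shared graph context
    per_node = mu @ theta5.T                        # (n, p): node-specific part
    hidden = np.maximum(0.0, np.concatenate([pooled, per_node], axis=1))  # ReLU, (n, 2p)
    return hidden @ theta6                          # (n,) one Q-value per node
```

In the full pipeline, q_values(embed_nodes(...), ...) would play the role of the placeholder score_fn in the greedy loop sketched earlier.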