Learning combinatorial optimization algorithms over graphs

NIPS 2017 9-22-2020

Motivation

Current methods for combinatorial optimization problems cannot learn from experience.

Can we learn from experience?

Given a graph optimization problem $G$ and a distribution $D$ of problem instances, can we learn better greedy heuristics that generalize to unseen instances from $D$?

Three common optimization problems on graphs

Given a weighted graph $G = (V, E, w)$, where $w$ is the edge weight function and $w(u, v)$ is the weight of edge $(u, v) \in E$.

Minimum Vertex Cover (MVC):

Given a graph $G$, find a subset of nodes $S \subseteq V$ such that every edge is covered:

  • $(u, v) \in E \Rightarrow u \in S \text{ or } v \in S$ (every edge has at least one endpoint in $S$)

  • $|S|$ is minimized
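For concreteness, a tiny Python check of the covering condition (`is_vertex_cover` is a hypothetical helper, not from the paper):

```python
def is_vertex_cover(edges, S):
    """Return True if every edge (u, v) has at least one endpoint in S."""
    S = set(S)
    return all(u in S or v in S for u, v in edges)

# Example: a triangle graph; {0, 1} covers all three edges, {2} does not.
edges = [(0, 1), (1, 2), (0, 2)]
assert is_vertex_cover(edges, {0, 1})
assert not is_vertex_cover(edges, {2})
```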

Maximum Cut (MAXCUT):

Given a graph $G$, find a subset of nodes $S \subseteq V$ such that the weight of the cut-set, $\sum_{(u, v) \in C} w(u, v)$, is maximized.

Cut-set $C$: the set of edges with one endpoint in $S$ and the other in $V \setminus S$.
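Similarly, a small helper (hypothetical, not from the paper) that evaluates the cut weight of a candidate set $S$:

```python
def cut_weight(weights, S):
    """Sum of w(u, v) over edges with exactly one endpoint in S.

    weights: dict mapping an edge (u, v) to its weight w(u, v).
    """
    S = set(S)
    return sum(w for (u, v), w in weights.items() if (u in S) != (v in S))

# Example: a path 0-1-2; putting node 1 alone on one side cuts both edges.
weights = {(0, 1): 2.0, (1, 2): 3.0}
assert cut_weight(weights, {1}) == 5.0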

Traveling Salesman Problem (TSP):

Given a set of nodes with pairwise distances (a complete weighted graph), find a tour of minimum total weight that visits each node exactly once and returns to the start.
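And a small helper (hypothetical) that evaluates the length of a candidate tour:

```python
def tour_length(dist, tour):
    """Total weight of a closed tour; dist[u][v] is the distance between u and v."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

# Example: 3 nodes with pairwise distances 1, 2, 3.
dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
assert tour_length(dist, [0, 1, 2]) == 6  # 1 + 2 + 3
```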

Overview

basic idea

  1. start with the original graph and an initial (empty) state

  2. feed the graph into the embedding model (structure2vec) and run T rounds of embedding updates

  3. compute a score for each node from its structure2vec embedding

  4. pick the node with the highest score and add it to the partial solution (one node is added per iteration)

  5. go back to step 2 and repeat until termination (a minimal sketch of this loop follows the list)
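A minimal sketch of this greedy loop, where `embed_graph`, `score_nodes`, and `is_done` are hypothetical placeholders for the learned structure2vec model, the Q-value readout, and the problem-specific termination check (they are not names from the paper's code):

```python
def greedy_construct(graph, embed_graph, score_nodes, is_done, T=4):
    """Greedily build a solution by repeatedly adding the highest-scoring node.

    embed_graph(graph, partial_solution, T) -> per-node embeddings after T
        rounds of structure2vec-style message passing.
    score_nodes(embeddings, partial_solution) -> dict {node: Q-value estimate}.
    is_done(graph, partial_solution) -> True when the solution is complete
        (e.g., all edges covered for MVC).
    """
    partial_solution = []
    while not is_done(graph, partial_solution):          # step 5: repeat
        embeddings = embed_graph(graph, partial_solution, T)  # step 2
        scores = score_nodes(embeddings, partial_solution)    # step 3
        best = max(scores, key=scores.get)                    # step 4
        partial_solution.append(best)
    return partial_solution
```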

Relation to Q-learning (taking MVC as an example)

|  | Reinforcement Learning (e.g., an Atari game) | MVC (Minimum Vertex Cover) |
| --- | --- | --- |
| Reward | score we earned at the current step |  |
| State | current screen | currently selected nodes |
| Action | move the board (paddle) left / right | select a node |
| Action value function $Q$ | predicted future total rewards | computed from the structure2vec embedding |
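Training fits these Q-values with n-step Q-learning and experience replay; roughly (a sketch of the paper's setup, with notation adapted), the regression target for a sampled pair $(S_t, v_t)$ is

$$
y = \sum_{i=0}^{n-1} r(S_{t+i}, v_{t+i}) + \gamma \max_{v'} \hat{Q}\big(h(S_{t+n}), v'; \Theta\big),
$$

and the parameters are updated to minimize $\big(y - \hat{Q}(h(S_t), v_t; \Theta)\big)^2$. For MVC the per-step reward is the change in the cost function, i.e. $-1$ for every node added to the cover.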

How to choose the action
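Roughly, the next node is chosen greedily with respect to the predicted Q-values (with $\varepsilon$-greedy exploration during training):

$$
v^{*} = \arg\max_{v \notin S} \hat{Q}\big(h(S), v; \Theta\big)
$$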

How to represent the node embedding:
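A sketch of the structure2vec update from the paper (my reconstruction; $x_v$ is a binary tag that is 1 if node $v$ is already in the partial solution $S$, $\mathcal{N}(v)$ is the neighborhood of $v$, and $\theta_1, \dots, \theta_4$ are learned parameters), repeated for $t = 0, \dots, T-1$:

$$
\mu_v^{(t+1)} = \mathrm{relu}\Big(\theta_1 x_v + \theta_2 \sum_{u \in \mathcal{N}(v)} \mu_u^{(t)} + \theta_3 \sum_{u \in \mathcal{N}(v)} \mathrm{relu}\big(\theta_4\, w(v, u)\big)\Big)
$$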

How to measure the Q-value for each node:
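After $T$ rounds of updates, the Q-value for node $v$ combines a pooled graph embedding with $v$'s own embedding (again a sketch of the paper's readout; $[\cdot, \cdot]$ denotes concatenation and $\theta_5, \theta_6, \theta_7$ are learned):

$$
\hat{Q}\big(h(S), v; \Theta\big) = \theta_5^{\top}\, \mathrm{relu}\Big(\big[\,\theta_6 \sum_{u \in V} \mu_u^{(T)},\; \theta_7\, \mu_v^{(T)}\,\big]\Big)
$$

A compact NumPy sketch of this forward pass (randomly initialized parameters and hypothetical shapes; purely illustrative, not the paper's code):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def s2v_q_values(adj, x, p=16, T=4, seed=0):
    """Structure2vec-style embedding followed by a Q-value readout.

    adj: (n, n) weighted adjacency matrix (0 where there is no edge).
    x:   (n,) binary vector, 1 if the node is already in the partial solution.
    Returns an (n,) vector of (untrained) Q-value estimates.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    # Randomly initialized parameters (theta_1..theta_7 in the sketch above).
    t1 = rng.normal(size=p)            # scales the node tag x_v
    t2 = rng.normal(size=(p, p))       # mixes summed neighbor embeddings
    t3 = rng.normal(size=(p, p))       # mixes aggregated edge-weight features
    t4 = rng.normal(size=p)            # lifts each edge weight to p dimensions
    t5 = rng.normal(size=2 * p)        # final readout
    t6 = rng.normal(size=(p, p))       # pooled graph embedding
    t7 = rng.normal(size=(p, p))       # per-node embedding

    mask = (adj > 0).astype(float)     # (n, n) 0/1 neighbor mask
    mu = np.zeros((n, p))
    # Sum over neighbors u of relu(theta_4 * w(v, u)); fixed across iterations.
    edge_feat = relu(adj[:, :, None] * t4[None, None, :]) * mask[:, :, None]
    edge_term = edge_feat.sum(axis=1) @ t3.T                 # (n, p)
    for _ in range(T):
        neigh_term = (mask @ mu) @ t2.T                      # (n, p)
        mu = relu(np.outer(x, t1) + neigh_term + edge_term)  # (n, p)

    pooled = mu.sum(axis=0) @ t6.T                           # (p,)
    per_node = mu @ t7.T                                     # (n, p)
    feats = relu(np.concatenate([np.tile(pooled, (n, 1)), per_node], axis=1))
    return feats @ t5                                        # (n,)

# Tiny usage example: a triangle graph with unit edge weights, nothing selected yet.
adj = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
x = np.array([0., 0., 0.])
print(s2v_q_values(adj, x))  # one (untrained) Q estimate per node
```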

Reference

Khalil, E., Dai, H., Zhang, Y., Dilkina, B., Song, L. Learning Combinatorial Optimization Algorithms over Graphs. NIPS 2017.
