Current methods for combinatorial optimization problems cannot learn from experience.
Can we learn from experience?
Given a graph optimization problem G and a distribution D of problem instances, can we learn better greedy heuristics that generalize to unseen instances from D?
Three common graph optimization problems
Given a weighted graph G(V,E,w), where w is the edge weight function and w(u,v) is the weight of edge (u,v) ∈ E.
Minimum Vertex Cover (MVC):
Given a graph G, find a minimum-size subset of nodes S ⊆ V such that every edge is covered:
∀(u,v) ∈ E: u ∈ S or v ∈ S
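As a quick sanity check of the coverage condition, here is a small Python sketch (function and variable names are illustrative, not from any particular library):

```python
def is_vertex_cover(edges, S):
    """Check the MVC feasibility condition: every edge (u, v) has u in S or v in S."""
    return all(u in S or v in S for (u, v) in edges)

# Example: a path 0-1-2. {1} covers both edges; {0} leaves (1, 2) uncovered.
edges = [(0, 1), (1, 2)]
print(is_vertex_cover(edges, {1}))  # True
print(is_vertex_cover(edges, {0}))  # False
```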
Maximum Cut (MAXCUT):
Given a graph G, find a subset of nodes S ⊆ V such that the weight of the cut-set, ∑_{(u,v)∈C} w(u,v), is maximized.
Cut-set C: the set of edges with one endpoint in S and the other endpoint in V\S.
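The cut-set weight follows directly from this definition; a minimal sketch (illustrative names):

```python
def cut_weight(weighted_edges, S):
    """Sum w(u, v) over the cut-set C: edges with exactly one endpoint in S."""
    return sum(w for (u, v, w) in weighted_edges if (u in S) != (v in S))

# Example: a triangle with unit weights; S = {0} cuts two of the three edges.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
print(cut_weight(triangle, {0}))  # 2.0
```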
Traveling Salesman Problem (TSP):
Given a graph G, find a tour that visits every node exactly once and returns to the start, such that the total weight of the tour's edges is minimized.
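For completeness, the tour cost being minimized can be computed as below (a sketch; the dict-of-distances representation is an assumption for illustration):

```python
def tour_length(w, tour):
    """Total weight of a tour, including the edge back from the last node to the first."""
    return sum(w[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

# Example: 3 nodes with symmetric distances.
w = {(0, 1): 2.0, (1, 0): 2.0, (1, 2): 3.0, (2, 1): 3.0, (0, 2): 4.0, (2, 0): 4.0}
print(tour_length(w, [0, 1, 2]))  # 2.0 + 3.0 + 4.0 = 9.0
```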
Basic idea
1. Start from the original graph with an initial (empty) partial solution.
2. Feed the graph into an embedding model (structure2vec) and run it for T steps.
3. Each node gets a score based on its structure2vec embedding.
4. Pick the node with the highest score and add it to the partial solution (one node per iteration).
5. Go back to step 2 and repeat until the termination criterion is met.
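A minimal sketch of this greedy loop, with placeholder callables for the scoring model and the termination test (neither is the paper's exact interface):

```python
def greedy_construct(graph, score_fn, is_done):
    """Generic greedy construction loop.

    score_fn(graph, S) -> dict mapping each candidate node to a score
        (in the learned heuristic, this is where structure2vec + Q-values go).
    is_done(graph, S)  -> True once the termination criterion is met
        (e.g. all edges covered for MVC).
    """
    S = []
    while not is_done(graph, S):
        scores = score_fn(graph, S)          # steps 2-3: embed the graph and score nodes
        best = max(scores, key=scores.get)   # step 4: node with the highest score
        S.append(best)                       # grow the partial solution
    return S
```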
Relation to Q-learning (taking MVC as an example)
For MVC (Minimum Vertex Cover):
State S: the partial solution built so far (the set of selected nodes).
Reward: the score we earn at the current step.
Action: analogous to "move left/right" in a game; here, add a node to S.
Action-value function Q̂(S, v): the predicted total future reward of adding node v in state S.
How to choose the action:
Greedily pick the node with the highest predicted value: v∗ = argmax_v Q̂(S, v)
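A hedged sketch of the action selection and the standard one-step Q-learning target it can be trained against (the paper trains a fitted Q-learning variant; epsilon and gamma below are illustrative defaults, not values from the paper):

```python
import random

def choose_action(q_values, candidates, epsilon=0.0):
    """v* = argmax over remaining candidate nodes of Q-hat(S, v),
    with optional epsilon-greedy exploration during training."""
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda v: q_values[v])

def q_learning_target(reward, next_q_values, gamma=1.0):
    """One-step target: r + gamma * max_v Q-hat(S', v)."""
    if not next_q_values:          # terminal state: no future reward
        return reward
    return reward + gamma * max(next_q_values)
```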
How to represent the node embedding:
Each node v gets an embedding μ_v, updated for T rounds by aggregating its own feature (whether v is already in S), its neighbors' embeddings, and the weights of its incident edges (structure2vec).
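One way this can look in code (a simplified structure2vec-style update; the parameter names and exact form are assumptions, not the paper's precise parameterization):

```python
import numpy as np

def embed_nodes(adj, x, theta1, theta2, theta3, T=4):
    """Simplified structure2vec-style node embeddings.

    adj    : (n, n) weighted adjacency matrix
    x      : (n,)   node feature, e.g. 1 if the node is already in S else 0
    theta1 : (p,)   weight for the node feature
    theta2 : (p, p) weight for the aggregated neighbor embeddings
    theta3 : (p,)   weight for the aggregated incident edge weights
    """
    n, p = adj.shape[0], theta1.shape[0]
    mu = np.zeros((n, p))
    for _ in range(T):
        neighbor_sum = adj @ mu        # aggregate neighbor embeddings (edge-weighted here)
        edge_sum = adj.sum(axis=1)     # total incident edge weight per node
        mu = np.maximum(0.0,           # ReLU
                        np.outer(x, theta1)
                        + neighbor_sum @ theta2.T
                        + np.outer(edge_sum, theta3))
    return mu
```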
How to measure the Q-value for each node:
Q̂(S, v) is read out from the embeddings: a pooled (summed) embedding of the whole graph is combined with node v's own embedding and mapped to a scalar score.
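A matching sketch for the readout (again simplified; parameter names are illustrative):

```python
import numpy as np

def q_values(mu, theta4, theta5, theta6):
    """One Q-hat(S, v) per node, read out from the embeddings mu of shape (n, p)."""
    n, p = mu.shape
    graph_emb = mu.sum(axis=0)                      # pooled embedding of the whole graph
    pooled = np.tile(graph_emb @ theta4.T, (n, 1))  # (n, p): shared graph context
    per_node = mu @ theta5.T                        # (n, p): node-specific part
    hidden = np.maximum(0.0, np.concatenate([pooled, per_node], axis=1))  # ReLU, (n, 2p)
    return hidden @ theta6                          # (n,) one Q-value per node
```

In the full pipeline, q_values(embed_nodes(...), ...) would play the role of the placeholder score_fn in the greedy loop sketched earlier.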