Track 2: Reinforcement Learning
The problem is the same as in Track 1, but spans multiple instances. We consider an environment (simulator) that generates instances drawn from the same distribution and expects as output (partial) solutions that specify the order in which the nodes should be visited. The simulator returns general instance features and the time-dependent cost of traversing the last edge in a given solution. The goal is to minimize the total path cost over multiple samples of selected test instances.
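The interaction described above can be sketched as a simple loop: sample an instance, extend a partial solution one node at a time, and accumulate the cost returned for each newly added edge. The sketch below is only an illustration under stated assumptions. `ToySimulator`, its `reset` and `step` methods, the feature dictionary, and the cost formula are all hypothetical stand-ins and not the competition's actual interface.

```python
import random


class ToySimulator:
    """Hypothetical stand-in for the environment; NOT the real competition API."""

    def __init__(self, n_nodes=5, seed=0):
        self.n_nodes = n_nodes
        self.rng = random.Random(seed)
        self.coords = []

    def reset(self):
        # Sample a new instance from a fixed distribution (uniform 2-D points).
        self.coords = [(self.rng.random(), self.rng.random())
                       for _ in range(self.n_nodes)]
        return {"n_nodes": self.n_nodes, "coords": list(self.coords)}

    def step(self, partial_solution):
        # Return the cost of the LAST edge of the submitted partial solution.
        if len(partial_solution) < 2:
            return 0.0
        u, v = partial_solution[-2], partial_solution[-1]
        (x1, y1), (x2, y2) = self.coords[u], self.coords[v]
        distance = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        t = len(partial_solution) - 2  # position of this edge along the path
        return distance * (1.0 + 0.1 * t)  # assumed toy time dependence


def rollout(env, policy):
    """Build one full visiting order and sum the per-edge costs."""
    features = env.reset()
    solution = [0]  # assumption: every path starts at node 0
    total_cost = 0.0
    while len(solution) < features["n_nodes"]:
        unvisited = [i for i in range(features["n_nodes"]) if i not in solution]
        solution.append(policy(features, solution, unvisited))
        total_cost += env.step(solution)
    return solution, total_cost


def make_random_policy(seed):
    """Baseline policy: pick the next node uniformly at random."""
    rng = random.Random(seed)
    return lambda features, solution, unvisited: rng.choice(unvisited)


def average_cost(env, policy, n_episodes):
    """Mean total path cost over several sampled instances."""
    return sum(rollout(env, policy)[1] for _ in range(n_episodes)) / n_episodes
```

A learned policy would replace `make_random_policy` and use the returned edge costs as (negative) rewards. Comparing `average_cost` values across policies mirrors the stated objective of minimizing total path cost over multiple samples.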