Track 1: Online Supervised Learning (surrogates)

Problem: the time-dependent orienteering problem with stochastic weights and time windows (TD-OPSWTW) [1]. Given a single instance, a set of previously tried routes, and the rewards those routes obtained, the goal is to learn a surrogate model that predicts the reward of a new route. An optimizer then finds the route with the best predicted reward under that model. Evaluating this route yields a new data point, which is used to update the model. This procedure is repeated for a fixed number of steps.
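The iterative procedure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the track's actual code: `evaluate_route` is a hypothetical toy stand-in for the noisy TD-OPSWTW reward, the surrogate is a crude additive per-node model, and the optimizer is plain random search over candidate routes.

```python
import random


def evaluate_route(route, rng):
    """Hypothetical stand-in for the noisy reward of a route (not the real TD-OPSWTW)."""
    # Toy reward: node i is worth i, long routes are penalized, plus Gaussian noise.
    return sum(route) - 0.5 * len(route) ** 2 + rng.gauss(0.0, 0.1)


def random_route(n_nodes, rng):
    """Sample a non-empty route visiting distinct nodes from 1..n_nodes."""
    length = rng.randint(1, n_nodes)
    return tuple(rng.sample(range(1, n_nodes + 1), length))


def fit_surrogate(history):
    """Fit a toy additive surrogate: each node gets its average share of observed rewards."""
    totals, counts = {}, {}
    for route, reward in history:
        share = reward / len(route)  # split the reward evenly over the visited nodes
        for node in route:
            totals[node] = totals.get(node, 0.0) + share
            counts[node] = counts.get(node, 0) + 1
    return {node: totals[node] / counts[node] for node in totals}


def predict(model, route):
    """Predicted reward: sum of learned node credits (unseen nodes count as 0)."""
    return sum(model.get(node, 0.0) for node in route)


def surrogate_loop(n_nodes=5, n_steps=20, n_candidates=50, seed=0):
    """Run the evaluate / update model / optimize cycle for a fixed number of steps."""
    rng = random.Random(seed)
    history = []  # list of (route, observed reward) pairs
    route = random_route(n_nodes, rng)  # initial route, chosen at random
    for _ in range(n_steps):
        reward = evaluate_route(route, rng)  # evaluate the proposed route
        history.append((route, reward))      # new data point
        model = fit_surrogate(history)       # update the surrogate model
        # Optimizer: random search for the candidate with the best predicted reward.
        candidates = [random_route(n_nodes, rng) for _ in range(n_candidates)]
        route = max(candidates, key=lambda r: predict(model, r))
    return history


if __name__ == "__main__":
    history = surrogate_loop()
    best_route, best_reward = max(history, key=lambda item: item[1])
    print("best observed route:", best_route, "reward:", round(best_reward, 2))
```

In practice, the surrogate and the optimizer would be replaced by stronger choices; the sketch only shows how the evaluated routes feed back into the model at each step.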

[1] C. Verbeeck, P. Vansteenwegen, and E.-H. Aghezzaf. Solving the stochastic time-dependent orienteering problem with time windows. European Journal of Operational Research, 255(3):699–718, 2016.


