Title
Pessimistic reward models for off-policy learning in recommendation
Author
Olivier Jeunen, Bart Goethals
Abstract
Methods for bandit learning from user interactions often require a model of the reward a given context-action pair will yield – for example, the probability of a click on a recommendation. This common machine learning task is highly non-trivial, as the data-generating process for contexts and actions is often skewed by the recommender system itself. Indeed, when the deployed recommendation policy at data collection time does not pick its actions uniformly at random, it introduces a selection bias that can impede effective reward modelling. This in turn makes off-policy learning – the typical setup in industry – particularly challenging. In this work, we propose and validate a general pessimistic reward modelling approach for off-policy learning in recommendation. Bayesian uncertainty estimates allow us to express scepticism about our own reward model, which can in turn be used to generate a conservative decision rule. We show how this alleviates a well-known decision-making phenomenon known as the Optimiser's Curse, and draw parallels with existing work on pessimistic policy learning. Leveraging the available closed-form expressions for both the posterior mean and variance when a ridge regressor models the reward, we show how to apply pessimism effectively and efficiently to an off-policy recommendation use case. Empirical observations in a wide range of environments show that being conservative in decision-making leads to a significant and robust increase in recommendation performance. The merits of our approach are most pronounced in realistic settings with limited logging randomisation, limited training samples, and larger action spaces.
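The conservative decision rule summarised above can be sketched in a few lines. The following Python snippet is a minimal illustration of the general idea, not the paper's implementation: a Bayesian ridge regressor whose posterior mean and covariance are available in closed form, with each action scored by a lower confidence bound (posterior mean minus a multiple of the posterior standard deviation). The class name and the hyperparameters lam (ridge regulariser), sigma2 (noise variance) and alpha (degree of pessimism) are assumptions made for this sketch.

    # Illustrative sketch only: a Bayesian ridge reward model with a
    # pessimistic (lower-confidence-bound) decision rule. Hyperparameter
    # names lam, sigma2 and alpha are assumptions, not the paper's exact
    # parameterisation.
    import numpy as np

    class PessimisticRidgeReward:
        def __init__(self, dim, lam=1.0, sigma2=1.0, alpha=1.0):
            self.lam, self.sigma2, self.alpha = lam, sigma2, alpha
            self.A = lam * np.eye(dim)        # X^T X + lam * I
            self.b = np.zeros(dim)            # X^T y
            self.A_inv = np.linalg.inv(self.A)
            self.mu = np.zeros(dim)           # posterior mean of the weights

        def fit(self, X, y):
            # Accumulate sufficient statistics from logged context-action
            # features X (n x dim) and observed rewards y (n,), then update
            # the closed-form posterior.
            self.A += X.T @ X
            self.b += X.T @ y
            self.A_inv = np.linalg.inv(self.A)
            self.mu = self.A_inv @ self.b

        def lcb(self, x):
            # Lower confidence bound on the expected reward of feature x:
            # posterior mean minus alpha posterior standard deviations.
            mean = x @ self.mu
            var = self.sigma2 * (x @ self.A_inv @ x)
            return mean - self.alpha * np.sqrt(var)

        def act(self, action_features):
            # Conservative decision rule: pick the action with the highest
            # pessimistic reward estimate, not the highest mean estimate.
            return int(np.argmax([self.lcb(x) for x in action_features]))

Setting alpha to zero recovers the usual rule of acting on the posterior mean – the estimator the Optimiser's Curse afflicts – while larger values of alpha trade expected reward for robustness against actions whose rewards are overestimated.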
Language
English
Source (book)
RecSys '21: Fifteenth ACM Conference on Recommender Systems, September 2021
Publication
Association for Computing Machinery, 2021
ISBN
978-1-4503-8458-2
DOI
10.1145/3460231.3474247
Volume/pages
p. 63-74
ISI
000744461300007