Publication
Title
Pessimistic reward models for off-policy learning in recommendation
Author
Abstract
Methods for bandit learning from user interactions often require a model of the reward a certain context-action pair will yield – for example, the probability of a click on a recommendation. This common machine learning task is highly non-trivial, as the data-generating process for contexts and actions is often skewed by the recommender system itself. Indeed, when the deployed recommendation policy at data collection time does not pick its actions uniformly-at-random, this leads to a selection bias that can impede effective reward modelling. This in turn makes off-policy learning – the typical setup in industry – particularly challenging. In this work, we propose and validate a general pessimistic reward modelling approach for off-policy learning in recommendation. Bayesian uncertainty estimates allow us to express scepticism about our own reward model, which can in turn be used to generate a conservative decision rule. We show how it alleviates a well-known decision making phenomenon known as the Optimiser’s Curse, and draw parallels with existing work on pessimistic policy learning. Leveraging the available closed-form expressions for both the posterior mean and variance when a ridge regressor models the reward, we show how to apply pessimism effectively and efficiently to an off-policy recommendation use-case. Empirical observations in a wide range of environments show that being conservative in decision-making leads to a significant and robust increase in recommendation performance. The merits of our approach are most outspoken in realistic settings with limited logging randomisation, limited training samples, and larger action spaces.
Language
English
Source (book)
RecSys '21 : Fifteenth ACM Conference on Recommender Systems, September 2021
Publication
Association for Computing Machinery , 2021
ISBN
978-1-4503-8458-2
DOI
10.1145/3460231.3474247
Volume/pages
p. 63-74
ISI
000744461300007
Full text (Publisher's DOI)
Full text (publisher's version - intranet only)
UAntwerpen
Faculty/Department
Research group
Project info
City of Things
Publication type
Subject
Affiliation
Publications with a UAntwerp address
External links
Web of Science
Record
Identifier
Creation 22.09.2021
Last edited 03.10.2024
To cite this reference