2. Spotify is a global audio subscription service
By the numbers (2019):
● 232M monthly active users
● 108M premium subscribers
● 79 markets
● 50M+ tracks
● 450k+ podcast titles
3. What’s at stake on the Homepage?
The Homepage is the first thing you see when you open the app. It is many things: a discovery tool, a personal music assistant, and a marketplace for artists and their fans.
Spotify’s mission is to unlock the potential of human creativity —
by giving a million creative artists the opportunity to live off their art
and billions of fans the opportunity to enjoy and be inspired by it.
Personalization is powerful in this challenging content space, with its vast volume and variety.
4. Talk outline
01 More on the Spotify Homepage
02 Overview of the ranking algorithm and the bandit policy
03 Sanity checks used in practice for policy debiasing and model behavior
5. Homepage organization
The Homepage is made up of cards: podcast shows or episodes, albums, playlists, radio stations, artist pages, etc.
Cards are organized into shelves.
[Slide figure: a Homepage screenshot annotated with Shelf A and Shelf B]
6. Each user is eligible for hundreds of candidate shelves, which can be editorially or programmatically curated. Shelves pull from a pool of millions of cards.
All shelf candidates and their respective cards are ranked in real time when you load Home.
[Slide figure: example shelves. Programmatic curation: "Made for X", "Your Favorite Albums", "Similar to Y", "Recommended for Today". Editorial curation: "Iconic 80s Soundtracks", "Discovered in Greenwich Village"]
7. [Slide figure: the recommendation funnel: an embedding network feeding a ranking stage]
8. Homepage ranking as an end-to-end ML problem
Learn to rank the Homepage based on logged feedback data.
[Slide figure: feedback loop. The ranking algorithm serves recommendations; user feedback such as clicks, likes, and streams is logged; the ranking algorithm is then trained on that logged feedback]
9. Consequences of Feedback Loops
Without randomization in the feedback loop, you risk:
● Homogenized user behavior (Chaney et al. 2018)
● Diminishing diversity over time (Nguyen et al. 2014)
● Poor representation of the long tail (Mehrotra et al. 2018)
Continuous exploration and content-pool expansion help (Jiang et al. 2019).
10. Introduce exploration
[Slide figure: the same feedback loop, now with an exploration step added]
11. Introduce exploration
[Slide figure: the feedback loop with an exploration policy that introduces randomness; the logged feedback now also records the policy's propensities]
12. Ways to introduce exploration
● Fully randomized experiment: randomize the Homepage for a small fraction of users.
● Random data collection: randomize the Homepage for a small fraction of requests.
● Bandit policy: explore/exploit as the Homepage is assembled (McInerney et al., 2018).
Bandit approaches are becoming popular:
● Artwork personalization at Netflix (Amat et al. 2018)
● News article recommendation at Yahoo (Chu et al. 2012)
● Personalization at Amazon Music (ICML 2019)
● The REVEAL ’19 workshop here
13. Explore/Exploit on the Homepage
An example of an epsilon-greedy policy for ranking the Spotify Homepage. Three card candidates have predicted stream rates of 0.8, 0.7, and 0.2.
14. First position: with probability 1 − 𝝐 the policy exploits and places the highest-scoring card (0.8); with probability 𝝐 it explores uniformly among the three candidates. The propensity of the 0.8 card landing first is 𝜋 = (1 − 𝝐) + 𝝐/3.
15. Second position: suppose exploration picks the 0.2 card from the two remaining candidates; its propensity is 𝜋 = 𝝐/2.
16. Last position: only the 0.7 card remains, so it is placed deterministically with 𝜋 = 1.
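The epsilon-greedy walkthrough above can be sketched in code. This is a minimal illustration, not Spotify's production policy: it ranks a list of card candidates position by position, exploiting with probability 1 − 𝜖 and exploring uniformly otherwise, and logs the propensity of each pick exactly as in the slides.

```python
import random

def epsilon_greedy_rank(cards, scores, epsilon=0.1, rng=random):
    """Rank cards with an epsilon-greedy policy, logging propensities.

    cards: card identifiers; scores: predicted stream rates (same order).
    Returns a list of (card, propensity) in ranked order.
    """
    remaining = list(zip(cards, scores))
    ranked = []
    while remaining:
        n = len(remaining)
        best_idx = max(range(n), key=lambda i: remaining[i][1])
        if n == 1:
            # Forced pick: the last remaining card has propensity 1.
            idx, prop = 0, 1.0
        elif rng.random() < epsilon:
            # Explore: pick uniformly among the remaining candidates.
            idx = rng.randrange(n)
            # The best card can also be reached by exploiting, so its
            # marginal propensity includes the exploit mass.
            prop = epsilon / n + ((1 - epsilon) if idx == best_idx else 0.0)
        else:
            # Exploit: pick the highest-scoring remaining card.
            idx = best_idx
            prop = (1 - epsilon) + epsilon / n
        card, _ = remaining.pop(idx)
        ranked.append((card, prop))
    return ranked
```

With three candidates, the top pick's propensity is (1 − 𝜖) + 𝜖/3, an explored non-best pick from two remaining gets 𝜖/2, and the forced last pick gets 1, matching the slides.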
17. Training the reward model*
Counterfactual inference for model parameters
* Explore, Exploit, Explain: Personalizing Explainable Recommendations with Bandits. J. McInerney, B. Lacker, S. Hansen, K. Higley, H. Bouchard, A. Gruson & R. Mehrotra. RecSys 2018.
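One common way to use logged propensities for counterfactual training of a reward model is inverse propensity scoring (IPS): weight each example's loss by the inverse of the probability the logging policy assigned to the action. The sketch below shows one IPS-weighted SGD step for a logistic reward model; the clipping constant and learning rate are illustrative assumptions, not values from the talk.

```python
import numpy as np

def ips_logistic_update(w, x, reward, propensity, lr=0.01, clip=10.0):
    """One IPS-weighted SGD step for a logistic reward model.

    Weighting the log-loss gradient by 1/propensity (clipped to control
    variance) corrects for the logging policy's action distribution.
    """
    weight = min(1.0 / propensity, clip)          # clipped IPS weight
    p = 1.0 / (1.0 + np.exp(-x @ w))              # predicted stream prob.
    grad = weight * (p - reward) * x              # weighted log-loss gradient
    return w - lr * grad
```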
18. Research Directions & Practical Challenges
Many research directions we work on:
● Designing better reward models (REVEAL, talk by Mounia Lalmas)
● Optimizing for the marketplace (Marketplaces tutorial, Rishabh and Ben)
● Careful feature engineering to mitigate feedback loop side effects and better
rank new content
● Creating a more representative Homepage (Henriette Cramer in Responsible
Recommendation Panel)
But we also need something like integration tests, so that we are confident we have got the basics right.
20. Sanity Checks for policy debiasing
We need a way to validate that policy debiasing yields roughly unbiased training data.
Method:
● Remove position bias by using training data from the top position only.
● Train a linear model with a single feature (shelf_name) to predict a metric that is observable online (CTR).
● Compare the debiased model's prediction to the outcome observed during exploration in that position.
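The three steps above can be sketched as follows. A per-shelf IPS-weighted mean is equivalent to the linear model with shelf_name as its only feature; the field names (shelf_name, position, clicked, propensity, explore) are an illustrative schema, not Spotify's actual logging format.

```python
from collections import defaultdict

def shelf_ctr_check(logs):
    """Compare IPS-debiased per-shelf CTR estimates against the CTR
    observed under exploration, using only top-position impressions
    to remove position bias.

    logs: iterable of dicts with keys shelf_name, position,
    clicked (0/1), propensity, explore (bool).
    """
    num, den = defaultdict(float), defaultdict(float)
    clicks, views = defaultdict(int), defaultdict(int)
    for row in logs:
        if row["position"] != 0:
            continue  # top position only
        w = 1.0 / row["propensity"]
        num[row["shelf_name"]] += w * row["clicked"]   # IPS-weighted clicks
        den[row["shelf_name"]] += w                    # IPS-weighted views
        if row["explore"]:
            clicks[row["shelf_name"]] += row["clicked"]
            views[row["shelf_name"]] += 1
    return {s: {"debiased": num[s] / den[s],
                "observed": clicks[s] / views[s] if views[s] else None}
            for s in den}
```

A large gap between the debiased estimate and the exploration-only CTR for some shelf is a red flag that debiasing is not working for that shelf.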
21. Sanity Checks for policy debiasing
[Slide figure: per-shelf predictions vs. observed outcomes, with and without importance sampling]
22. Sanity Checks for problem-specific model behavior
Aggregate ranking metrics (e.g. NDCG) have low resolution and offer little visibility into model behavior. But stakeholders in the product strategy (artists, curators, users) have expectations about what the model should do in specific situations. We build trust in the model, internally and externally, by creating metrics around these expectations and using them as sanity checks.
23. Favorite Shelf Position Sanity Check
Music has repetitive consumption patterns, and users have habitual behavior on Home. If a user has a clear preference for a specific shelf, models should rank that shelf high on the page, regardless of what it is.
A user has a "favorite" shelf if a significant amount of their consumption can be attributed to that shelf. Measure the average row where that shelf is placed for those users.
[Slide figure: average favorite-shelf position for shelfX, shelfY, and shelfZ under modelA vs. modelB]
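The favorite-shelf check can be sketched as a small aggregation. The session schema (user, shelf, streams, row) and the 50% consumption-share threshold are assumptions for illustration; the talk only says "a significant amount."

```python
from collections import defaultdict

def favorite_shelf_positions(sessions, share_threshold=0.5):
    """For each user whose consumption is dominated by one shelf
    (>= share_threshold of their streams), report the average row
    at which the model placed that favorite shelf.

    sessions: iterable of dicts with keys user, shelf, streams, row.
    """
    streams = defaultdict(lambda: defaultdict(int))
    rows = defaultdict(lambda: defaultdict(list))
    for s in sessions:
        streams[s["user"]][s["shelf"]] += s["streams"]
        rows[s["user"]][s["shelf"]].append(s["row"])
    positions = []
    for user, per_shelf in streams.items():
        total = sum(per_shelf.values())
        fav, fav_streams = max(per_shelf.items(), key=lambda kv: kv[1])
        if total and fav_streams / total >= share_threshold:
            shelf_rows = rows[user][fav]
            positions.append(sum(shelf_rows) / len(shelf_rows))
    # Average row of the favorite shelf across qualifying users
    # (lower is better: row 0 is the top of the page).
    return sum(positions) / len(positions) if positions else None
```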
24. Daily & Hourly Patterns Sanity Check
"Why don't I see 'Peaceful Piano' on top of my homepage every night?"
● Zoom into repetitive consumption patterns and habitual behavior.
● Measure whether the row position is higher at the right time of day, when applicable.
[Slide figure: stream rate by hour of day]
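One simple way to quantify this check: find a shelf's peak consumption hour and compare its average row at that hour against all other hours. The impression schema (hour, row, streamed) is an illustrative assumption.

```python
from collections import defaultdict

def time_of_day_position_gap(impressions):
    """For one shelf (e.g. "Peaceful Piano"), compare its average row
    during its peak streaming hour against the rest of the day. A
    well-behaved model places it higher (smaller row) at the right time.

    impressions: iterable of dicts with keys hour (0-23), row, streamed (0/1).
    """
    stream_by_hour = defaultdict(int)
    rows_by_hour = defaultdict(list)
    for imp in impressions:
        stream_by_hour[imp["hour"]] += imp["streamed"]
        rows_by_hour[imp["hour"]].append(imp["row"])
    peak = max(stream_by_hour, key=stream_by_hour.get)  # peak-consumption hour
    peak_rows = rows_by_hour[peak]
    other_rows = [r for h, rs in rows_by_hour.items() if h != peak for r in rs]
    avg_peak = sum(peak_rows) / len(peak_rows)
    avg_other = sum(other_rows) / len(other_rows) if other_rows else avg_peak
    return avg_peak, avg_other
```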
25. Conclusions
01 Motivation for exploration when collecting training data
02 Methods for collection policies, with an epsilon-greedy example
03 Three simple sanity checks we use in production while navigating the complex ecosystem of Homepage personalization
26. Thank you!
References:
[1] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A Contextual-Bandit Approach to Personalized News Article Recommendation. arXiv preprint arXiv:1003.0146.
[2] Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, and Fernando Diaz. 2018. Towards a Fair Marketplace: Counterfactual Evaluation of the Trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems. CIKM '18. ACM, New York, NY, USA, 2243-2251.
[3] Allison J. B. Chaney, Brandon Stewart, and Barbara Engelhardt. 2017. How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility. arXiv preprint arXiv:1710.11214.
[4] J. McInerney, B. Lacker, S. Hansen, K. Higley, H. Bouchard, A. Gruson, and R. Mehrotra. Explore, Exploit, Explain: Personalizing Explainable Recommendations with Bandits. In ACM Conference on Recommender Systems (RecSys), October 2018.
[5] Ray Jiang, Silvia Chiappa, Tor Lattimore, Andras Gyorgy, and Pushmeet Kohli. 2019. Degenerate Feedback Loops in Recommender Systems. arXiv preprint arXiv:1902.10730.
[6] Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. Unbiased Learning from Biased User Feedback. arXiv preprint arXiv:1608.04468.
[7] Fernando Amat, Ashok Chandrashekar, Tony Jebara, and Justin Basilico. 2018. Artwork Personalization at Netflix. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18).
https://www.spotifyjobs.com