1. RTB Optimizer: Behind the scenes with a Predictive API
Nicolas Kruchten, PAPIs.io – November 18, 2014
REAL TIME MACHINE LEARNING
DECISIONS AS A SERVICE
2. About Datacratic
•Software company specializing in high-performance systems and machine learning
•30 employees, founded in 2009, based in Montréal, Québec, Canada with an office in New York
•3 Predictive APIs in market today
•Building a Machine Learning Database to help others build Predictive APIs and Apps
3. Real-Time Bidding for online advertising
[Diagram: a web browser issues a GET ad request to a real-time exchange, which fans out bid requests to multiple bidders]
4. Real-Time Bidding for online advertising
[Diagram: the bidders return bids, the exchange runs an auction, and the winning ad is served back to the browser]
5. Real-Time Bidding for online advertising
[Same exchange/bidders diagram]
This happens millions of times per second
Bidders must respond within 100 milliseconds
6. Real-Time Bidding for online advertising
[Same exchange/bidders diagram]
RTB Optimizer enables bidders to achieve campaign goals
7. Campaign goals
•Advertising campaigns are typically outcome-oriented
–Clicks
–Video views
–Conversions: app installs, purchases, sign-ups
•e.g. Ad network has sold someone 1,000 outcomes for $1,000
•e.g. Advertiser has $1,000 to get as many outcomes as possible
•Essentially maximize profit or minimize cost-per-outcome
8. Datacratic’s RTB Optimizer
•Client bidder relays bid-requests to API, API tells it how to bid
•Handles 100,000 queries per second, for hundreds of campaigns
•API says which campaign should bid and how much
•API also needs real-time outcome data and the campaign goals
10. A Predictive API that learns
•Datacratic has no proprietary data set
•API can learn from scratch from the bid-request stream what works for each campaign:
–Contextual features: website, time of day, banner size and placement
–User features: geo-location, browser, language, # of impressions shown
–Customer-provided data: about the user, about the website
•Provides insights into what features are driving performance
•Can re-use learnings from previous campaigns
11. Second price auctions
•First Price Auctions
–You bid $1, I bid $2: I win, and I pay $2
•RTB uses Second Price Auctions
–You bid $1, I bid $2: I win, and I pay $1
•Optimal bid = E[ value ]
–Say it’s worth $2 to me
–I will never bid more than $2
–If I bid $1.50 and you bid $1.75: I’ve lost an opportunity for $0.25 surplus!
–I should always bid $2
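The truthfulness argument above can be checked numerically. The sketch below (illustrative only; the function names and the uniform competitor model are assumptions, not part of the talk) simulates a second-price auction and shows that shading below or padding above your true value of $2 never beats bidding it exactly:

```python
import random

def second_price_auction(bids):
    """Return (winner index, price paid): the highest bidder wins
    but pays the second-highest bid."""
    ranked = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if len(bids) > 1 else bids[winner]
    return winner, price

# The slide's example: you bid $1, I bid $2 -> I win and pay $1.
winner, price = second_price_auction([1.00, 2.00])
assert (winner, price) == (1, 1.00)

def surplus(my_bid, value=2.00, trials=100_000, seed=0):
    """Average surplus (value minus price paid) against a random
    competing bid, for a given bid of mine."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        other = rng.uniform(0, 3)   # hypothetical competitor's bid
        if my_bid > other:
            total += value - other  # I pay the second price
    return total / trials

# Bidding my true value dominates both underbidding and overbidding.
assert surplus(2.00) >= surplus(1.50)
assert surplus(2.00) >= surplus(2.50)
```

With the same random draws, underbidding only removes auctions I would have won at a profit, and overbidding only adds auctions I win at a loss, so truthful bidding is optimal draw by draw, not just on average.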
12. Don’t buy lottery tickets!
E[ value ] = payout * P( getting the payout )
13. What’s it to you?
•If client gets paid $10,000 for 1,000 outcomes, then payout = $10
•E[ value | bid-request ] = $10 * P( conversion | bid-request )
•What was an economics problem is now a prediction problem
•We need to calibrate to predict true probabilities
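As a worked version of the slide's arithmetic (the function name and the example probability are illustrative, not from the talk), the bid computation reduces to one multiplication once a calibrated probability is available:

```python
def expected_value_bid(payout_per_outcome, p_outcome):
    """Optimal second-price bid: the expected value of the impression."""
    return payout_per_outcome * p_outcome

# Client is paid $10,000 for 1,000 outcomes -> payout of $10 per outcome.
payout = 10_000 / 1_000

# Suppose a calibrated model scores this bid request at P(conversion) = 0.002.
bid = expected_value_bid(payout, 0.002)
assert abs(bid - 0.02) < 1e-12   # bid two cents per impression
```

This is why calibration matters: the bid is the probability times a constant, so a model that ranks well but over- or under-states probabilities bids the wrong amounts.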
15. Collecting the data
•To compute P( X | Y ) we need examples of Y’s with an X label
•RTB Optimizer uses mix of strategies to meet campaign goals
•Probe strategy bids randomly to collect data
•Optimized strategy bids with E[ value ]
•Automatic training/retraining when API sees enough examples
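One way to picture the "retrain when enough examples arrive" rule is a simple counter that fires once a threshold of new labelled examples has accumulated. This is a minimal sketch under assumed names and thresholds, not Datacratic's actual mechanism:

```python
class RetrainTrigger:
    """Signal that a model should be (re)trained once enough new
    labelled examples have arrived since the last training run."""

    def __init__(self, min_new_examples=10_000):  # threshold is illustrative
        self.min_new_examples = min_new_examples
        self.new_examples = 0

    def record_example(self):
        self.new_examples += 1

    def should_retrain(self):
        """True (and reset the counter) once the threshold is reached."""
        if self.new_examples >= self.min_new_examples:
            self.new_examples = 0
            return True
        return False

trigger = RetrainTrigger(min_new_examples=3)
trigger.record_example()
trigger.record_example()
assert not trigger.should_retrain()
trigger.record_example()
assert trigger.should_retrain()
```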
17. Bias control
•Never stop the probe strategy
•Always need control group for evaluation, retraining
•Risk of filter bubbles: future models trained on previous output
•A random share of bid requests is routed to the probe strategy, shrinking over time
•Models automatically back-tested before deployment
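The routing policy above can be sketched as a probe probability that starts at 100% (no model yet), decays as data accumulates, and never drops below a floor so a control group always remains. The decay shape and every constant here are assumptions for illustration:

```python
import math
import random

def probe_probability(examples_seen, floor=0.01, initial=1.0, half_life=50_000):
    """Share of traffic routed to the random (probe) strategy:
    exponential decay in examples seen, clamped at a floor so the
    probe is never fully switched off."""
    decayed = initial * math.exp(-examples_seen * math.log(2) / half_life)
    return max(floor, decayed)

assert probe_probability(0) == 1.0             # all probe before any data
assert probe_probability(50_000) == 0.5        # halved after one half-life
assert probe_probability(10_000_000) == 0.01   # floor: control group survives

def route(rng, examples_seen):
    """Randomly route one bid request to 'probe' or 'optimized'."""
    return "probe" if rng.random() < probe_probability(examples_seen) else "optimized"

rng = random.Random(0)
```

Keeping the floor above zero is what prevents the filter-bubble failure mode: future models are always partly trained and evaluated on traffic the previous model did not select.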
18. How to learn in real-time
•Classify using bagged generalized linear models
•Generate non-linear features with statistics tables
•Periodically retrain classifier
•Continuously update stats tables
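A stats table of the kind described above can be sketched as a running per-key outcome rate, smoothed toward a prior so rare keys don't produce extreme feature values for the linear model. Class names, keys, and the smoothing constants below are all illustrative assumptions, not Datacratic's implementation:

```python
from collections import defaultdict

class StatsTable:
    """Continuously updated per-key outcome rates (e.g. conversions
    per website), used as non-linear features for a linear classifier."""

    def __init__(self, prior_rate=0.001, prior_weight=100):
        self.counts = defaultdict(lambda: [0, 0])  # key -> [outcomes, trials]
        self.prior_rate = prior_rate
        self.prior_weight = prior_weight

    def update(self, key, outcome):
        """Record one example for this key (called on every event)."""
        self.counts[key][0] += int(outcome)
        self.counts[key][1] += 1

    def rate(self, key):
        """Smoothed rate: blend observed rate with the prior, weighted
        by prior_weight pseudo-examples."""
        outcomes, trials = self.counts[key]
        return (outcomes + self.prior_rate * self.prior_weight) / (
            trials + self.prior_weight)

site_rate = StatsTable()
for _ in range(1000):
    site_rate.update("news-site.example", outcome=False)
for _ in range(10):
    site_rate.update("news-site.example", outcome=True)

# The smoothed rate becomes a real-valued feature of the raw "site"
# categorical, giving the GLM a non-linear view of a high-cardinality key.
assert 0.005 < site_rate.rate("news-site.example") < 0.015
```

This split is what makes real-time learning tractable: the cheap table updates run continuously on every event, while the expensive classifier retrain happens only periodically.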
20. RTB Optimizer
[Architecture diagram: in the real-time path, the Bids API routes requests to the Probe strategy or to E[ value ] bidding backed by the GLZ Classifier and Stats Tables; in the batch path, the Outcomes API feeds Training, which rebuilds the classifier, while the Stats Tables are updated continuously]
21. Implementation details (are everything)
•100k requests per second, 10 millisecond latency, running 24/7, 1 trillion predictions to date
•Distributed system, written in C++ 11
•AWS: data in S3, training runs on Amazon EC2 spot market
•http://opensource.datacratic.com/
–RTBkit
–JML
–StarCluster
22. Does it work?
Classification success? ROC and calibration curves…
Optimization success? 80% reductions in cost-per-outcome…
Customer success! 25% monthly growth