The newly redesigned TripAdvisor.com emphasizes traveler photos throughout the site, but not all of these photos make the best first impression. Deep learning networks provide an excellent opportunity for us to improve our users’ experience by highlighting the most attractive and useful photos for varying presentation contexts. This talk will discuss our approach for gathering training data, developing a model, and scaling it up to 150+ million photos and 7+ million places of interest. Technologies discussed: Keras, TensorFlow, PySpark, Python multiprocessing, siamese networks, and to a lesser degree, S3, Hadoop/Hive/HDFS, and Kubernetes.
Greg Amis, Principal Software Engineer at TripAdvisor
Greg Amis is a Principal Software Engineer on the Machine Learning team at TripAdvisor, where the team focuses on very pragmatic projects: ML that will quickly and directly improve the business. He’s been at TripAdvisor for over 3.5 years, working on machine vision, text processing (e.g., catching inappropriate content), and metadata processing (e.g., catching fraudulent reviews). Prior to TripAdvisor, he worked on government contracts, doing everything from adaptive radar jamming to forecasting Navy personnel needs. Greg has a PhD from Boston University in Cognitive and Neural Systems, studying a type of neural network called Adaptive Resonance Theory and its application to semi-supervised learning and remote sensing.
View the presentation video here: http://videos.re-work.co/videos/929-improving-tripadvisor-photo-selection-with-deep-learning
View additional Deep Learning presentations here: http://videos.re-work.co/discover
Join the upcoming Deep Learning Summit in Boston here: https://www.re-work.co/events/deep-learning-summit-boston-2019
2. About Me: Applied ML, new to MV
● TripAdvisor
○ Machine vision: photo selection
○ Text processing: inappropriate reviews
○ Metadata processing: review fraud
● Government contracts
○ Online learning: adaptive radar jamming
○ Text processing: topic time series
○ Agent-based modeling: personnel forecasting
● Boston University, Department of Cognitive & Neural Systems
○ Brain models
○ Brain-inspired architectures
○ Semi-supervised learning
○ Some classes on biological and machine vision
3. About TripAdvisor
● Largest travel website
● ~400 engineers
● ~40 data scientists and ML engineers
● 150M total photos, including 40M from professionals
(1) Includes 1.1M hotels, inns, and bed & breakfasts, as well as 800K vacation rental listings
(2) TripAdvisor internal log files, average monthly unique visitors during Q2 2017
11. Gather Training Data
● Interns/MTurk labeled photos
● Pairwise ranking
○ 200,000 photo pairs
○ “Which one motivates you to click?”
● Label photos containing humans
● Label photos by scene type
○ Pool, beach, room, etc.
○ Food, drink, inside, outside, etc.
● Simple infrastructure: Python, Pandas, HTML, JavaScript, CherryPy
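As a sketch of what that simple labeling backend might persist (the column names and helper here are my illustrative assumptions, not TripAdvisor’s schema), a Pandas frame can accumulate the pairwise judgments:

```python
import pandas as pd

# Each labeling round appends one record of which of two photos the
# labeler said motivates them to click. Names are illustrative only.
def record_pair(rows, photo_a, photo_b, winner):
    assert winner in (photo_a, photo_b), "winner must be one of the pair"
    rows.append({'photo_a': photo_a, 'photo_b': photo_b, 'winner': winner})

rows = []
record_pair(rows, 'pool.jpg', 'lobby.jpg', 'pool.jpg')
labels = pd.DataFrame(rows)  # one row per labeled pair
```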
12. Feature Space
● Start with 50-layer ResNet convolutional neural network trained on 1,000-class ImageNet data (He et al., 2015)
● Remove upper layers concerned with classification
● Remaining lower layers make an excellent feature extractor for other machine vision problems
[Diagram: image → ResNet-50 → 2,048 “bottleneck features”]
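A minimal Keras sketch of that feature extractor (weights=None keeps this sketch download-free; in practice weights='imagenet' gives the pretrained network the slide assumes):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# include_top=False removes the 1,000-class ImageNet head; pooling='avg'
# global-average-pools the final conv block into one 2,048-dim vector per
# image: the "bottleneck features". Use weights='imagenet' in practice.
extractor = ResNet50(weights=None, include_top=False, pooling='avg')

batch = preprocess_input(255.0 * np.random.rand(2, 224, 224, 3).astype('float32'))
features = extractor.predict(batch)  # shape (2, 2048)
```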
13. Model architecture
● For subjective scoring: Siamese network
● For classification: multi-layer feedforward networks with dropout
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Dropout

def create_mlp(input_size=2048, output_size=1,
               hidden_layer_sizes=(2048, 2048),
               dropout_rates=(0.5, 0.5)) -> Model:
    model = Sequential()
    model.add(Dense(hidden_layer_sizes[0],
                    activation='relu', input_shape=(input_size,)))
    model.add(Dropout(dropout_rates[0]))
    for h, d in zip(hidden_layer_sizes[1:], dropout_rates[1:]):
        model.add(Dense(h, activation='relu'))
        model.add(Dropout(d))
    model.add(Dense(output_size, activation='sigmoid'))
    model.compile(optimizer="adadelta",
                  loss="binary_crossentropy",
                  metrics=["binary_crossentropy", "accuracy"])
    return model
[Diagram: Siamese network. Two weight-shared towers (Dense 2048 → Dropout 0.5 → Dense 2048 → Dropout 0.5 → Dense 1); bottleneck features for the better image feed the + tower, those for the worse image the − tower; maximize σ of the score difference.]
Inspired by Microsoft’s RankNet (Burges et al., 2005) and by Michael Alcorn’s Keras implementation.
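A hedged sketch of that RankNet-style Siamese setup (function names are mine; the tower mirrors the slide’s MLP but ends in a linear score, so the sigmoid applies to the score difference):

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Dropout, Subtract, Activation
from tensorflow.keras.models import Model, Sequential

def create_scorer(input_size=2048):
    # Shared tower: Dense 2048 -> Dropout 0.5 -> Dense 2048 -> Dropout 0.5
    # -> Dense 1 (raw attractiveness score, no final activation).
    return Sequential([
        Dense(2048, activation='relu', input_shape=(input_size,)),
        Dropout(0.5),
        Dense(2048, activation='relu'),
        Dropout(0.5),
        Dense(1),
    ])

def create_siamese(input_size=2048):
    scorer = create_scorer(input_size)   # one set of weights...
    better = Input(shape=(input_size,))
    worse = Input(shape=(input_size,))
    # ...applied to both inputs; sigmoid of the score difference estimates
    # P(better photo beats worse photo), trained toward 1 on labeled pairs.
    prob = Activation('sigmoid')(Subtract()([scorer(better), scorer(worse)]))
    model = Model([better, worse], prob)
    model.compile(optimizer='adadelta', loss='binary_crossentropy')
    return model
```

Training then pushes σ(score_better − score_worse) toward 1 on the labeled pairs; at serving time only the shared tower is needed to score single photos.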
14. Training and Evaluation
● R&D tech stack
○ Two consumer-grade GPUs
○ Keras + TensorFlow
○ Pandas
● Random hyperparameter search
○ Hidden layer width
○ Dropout rate
○ Mini-batch size
○ Epoch count
● Evaluation
○ Cross validation
○ A/B testing (50% of users see photos selected by machine vision)
Kenmore - our “mini-fridge” of GPUs
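The random search over those four hyperparameters can be sketched like this (the sampled ranges are my illustrative assumptions, not the talk’s actual values):

```python
import random

def sample_hyperparams(rng=random):
    # Draw one random configuration over the four axes listed above.
    return {
        'hidden_width': rng.choice([512, 1024, 2048, 4096]),
        'dropout_rate': round(rng.uniform(0.2, 0.6), 2),
        'batch_size': rng.choice([32, 64, 128, 256]),
        'epochs': rng.randint(5, 50),
    }

# Train and cross-validate one model per trial, then keep the best.
trials = [sample_hyperparams() for _ in range(20)]
```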
15. Deployment
● Kubernetes cluster for computation
● Spark+YARN cluster for storage
● Tables: t_photo, t_photo_bottlenecks, t_photo_vision_models
● Hardware: NVIDIA GeForce GTX 1080 Ti

1. Get URLs as PySpark DataFrame:
   SELECT … FROM t_photo LEFT ANTI JOIN t_photo_vision_models ...
2. Split into partitions
3. Feed partitions to process pool:
   a. Get image bytes from CDN (thread pool)
   b. Calc stats
   c. Calc bottlenecks (if necessary)
   d. Calc model outputs
4. Write back partitions asynchronously:
   INSERT INTO t_photo_bottlenecks PARTITION(...)
   SELECT ... FROM tmp_new_partition
   WHERE bottlenecks IS NOT NULL

   INSERT INTO t_photo_vision_models PARTITION(...)
   SELECT ...
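The per-partition work in step 3 might look roughly like this sketch; fetch_image, calc_bottlenecks, and score_models are stand-in stubs for the real CDN fetch and model calls, not TripAdvisor’s code:

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

# Stand-in stubs for the real pipeline stages.
def fetch_image(url):
    return b'bytes-for-' + url.encode()        # 3a: CDN fetch

def calc_bottlenecks(image_bytes):
    return [float(len(image_bytes))]           # 3c: 2,048-dim in reality

def score_models(features):
    return {'score': features[0] / 100.0}      # 3d: model outputs

def process_partition(urls):
    # 3a: I/O-bound CDN fetches overlap in a thread pool.
    with ThreadPoolExecutor(max_workers=8) as tp:
        images = list(tp.map(fetch_image, urls))
    # 3b-3d: per-image stats, bottleneck features, and model outputs.
    return [score_models(calc_bottlenecks(img)) for img in images]

def run(partitions, workers=2):
    # Step 3: partitions fan out to a process pool.
    with Pool(workers) as pool:
        return pool.map(process_partition, partitions)
```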
23. TripAdvisor is hiring!
○ Software Engineer - Machine Learning
○ Data Scientist - Attractions and Rentals
○ Data Scientist - Search Engine Marketing
○ Data Analyst - Attractions and Rentals
○ Software Engineer - Full Stack Web
24. Acknowledgements
● At TripAdvisor
○ Jeff Palmucci
○ Aaron Gonzales
○ Anyi Wang
○ Tyler O’Brien
● Outside TripAdvisor
○ Google: Keras, TensorFlow
○ NVIDIA: CUDA
○ Microsoft: ResNet-50
○ Original research at Google, Microsoft, Facebook, University of Toronto, NYU