Improving TripAdvisor Photo Selection With Deep Learning
Greg Amis
May 25, 2018
About Me: Applied ML, new to MV
Machine vision
● Photo selection
Text processing
● Inappropriate reviews
Metadata processing
● Review fraud
Online learning
● Adaptive radar jamming
Text processing
● Topic time series
Agent-based modeling
● Personnel forecasting
● Brain models
● Brain-inspired architectures
● Semi-supervised learning
● Some classes on biological and machine vision
Department of Cognitive & Neural Systems
About TripAdvisor
(1) Includes 1.1M hotels, inns, and bed & breakfasts, as well as 800K vacation rental listings
(2) TripAdvisor internal log files, average monthly unique visitors during Q2 2017
+ 40M photos from professionals = 150M total
● Largest travel website
● ~400 engineers
● ~40 data scientists and ML engineers
TripAdvisor Redesign: Photo-centric
Sometimes we show great photos...
Sometimes… not so much
Amenity-specific shelves don’t always show the amenity
Photo ordering matters
#6 hotel in Atlantic City
4-bubble hotel in Cancun, 6.7k reviews
Goal: Show attractive, useful photos
1. Good primary photos
2. Relevant amenity photos
3. Good default sort order
Approach
15 interns, a mini-fridge of GPUs, great OSS
Gather Training Data
● Interns/MTurk labeled photos
● Pairwise ranking
○ 200,000 photo pairs
○ “Which one motivates you to click?”
● Label photos containing humans
● Label photos by scene type
○ Pool, beach, room, etc.
○ Food, drink, inside, outside, etc.
● Simple infrastructure: Python, Pandas, HTML, JavaScript, CherryPy
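One way to hold pairwise labels like these is one record per comparison. The sketch below is illustrative only: the field names (`left_id`, `right_id`, `choice`) and the sample rows are assumptions, not TripAdvisor's schema. It normalizes each answer into a (better, worse) pair, which is the form a pairwise ranking model trains on.

```python
from typing import Iterable, List, NamedTuple, Tuple


class PairLabel(NamedTuple):
    """One answer to "Which one motivates you to click?" (hypothetical schema)."""
    left_id: str
    right_id: str
    choice: str  # expected to be "left" or "right"


def to_ordered_pairs(labels: Iterable[PairLabel]) -> List[Tuple[str, str]]:
    """Return (better_id, worse_id) tuples, skipping unusable answers."""
    pairs = []
    for label in labels:
        if label.choice == "left":
            pairs.append((label.left_id, label.right_id))
        elif label.choice == "right":
            pairs.append((label.right_id, label.left_id))
        # Any other value is dropped rather than guessed at.
    return pairs


# Made-up sample data for illustration.
labels = [
    PairLabel("photo_a", "photo_b", "left"),
    PairLabel("photo_c", "photo_a", "right"),
    PairLabel("photo_b", "photo_c", "skip"),
]
ordered = to_ordered_pairs(labels)
print(ordered)
```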
Feature Space
● Start with 50-layer ResNet convolutional neural network trained on 1,000-class ImageNet data (He et al, 2015)
● Remove upper layers concerned with classification
● Remaining lower layers make an excellent feature extractor for other machine vision problems
Diagram: image → ResNet-50 → 2,048 “bottleneck features”
Model architecture
For subjective scoring: Siamese Network
For classification: Multi-layer feedforward networks with dropout
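from keras.layers import Dense, Dropout
from keras.models import Model, Sequential


def create_mlp(input_size=2048, output_size=1,
               hidden_layer_sizes=(2048, 2048),
               dropout_rates=(0.5, 0.5)) -> Model:
    """Build a feedforward classifier over bottleneck features."""
    model = Sequential()
    # The first hidden layer also declares the input width (2,048 bottleneck features).
    model.add(Dense(hidden_layer_sizes[0],
                    activation='relu', input_shape=(input_size,)))
    model.add(Dropout(dropout_rates[0]))
    # Remaining hidden layers, each followed by dropout.
    for (h, d) in zip(hidden_layer_sizes[1:], dropout_rates[1:]):
        model.add(Dense(h, activation='relu'))
        model.add(Dropout(d))
    # Sigmoid output for binary classification.
    model.add(Dense(output_size, activation='sigmoid'))
    model.compile(optimizer="adadelta",
                  loss="binary_crossentropy",
                  metrics=["binary_crossentropy", "accuracy"])
    return model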
Siamese Network
Diagram: bottleneck features for the better image and for the worse image each pass through the same stack (Dense 2048, Dropout 0.5, Dense 2048, Dropout 0.5, Dense 1). The worse image's score is subtracted from the better image's score (+ −), the difference goes through a sigmoid (σ), and training maximizes the result.
Inspired by Microsoft’s RankNet (Burges et al, 2005) and by Michael Alcorn’s Keras implementation.
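The objective in the Siamese diagram can be written out in a few lines. This is a minimal sketch of a RankNet-style pairwise loss in plain Python, not the talk's Keras code: the two scores stand in for the outputs of the shared network, and the function names are made up for illustration. Maximizing the sigmoid of the score difference is equivalent to minimizing its negative log, which is what `pairwise_loss` computes.

```python
import math


def pairwise_probability(score_better: float, score_worse: float) -> float:
    """Sigmoid of the score difference: modeled P(better image beats worse image)."""
    diff = score_better - score_worse
    # Numerically stable sigmoid for large |diff|.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def pairwise_loss(score_better: float, score_worse: float) -> float:
    """Negative log-likelihood of the labeled ordering (lower is better)."""
    diff = score_better - score_worse
    # -log(sigmoid(diff)) == log(1 + exp(-diff)), rearranged to avoid overflow.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Scores that agree with the label give a small loss; reversed scores a large one.
print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

Because both images go through the same weights, the trained network yields a single score per photo, so ranking a gallery needs one forward pass per photo rather than one per pair.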
Training and Evaluation
● R&D tech stack
○ Two consumer-grade GPUs
○ Keras + TensorFlow
○ Pandas
● Random hyperparameter search
○ Hidden layer width
○ Dropout rate
○ Mini-batch size
○ Epoch count
● Evaluation
○ Cross validation
○ A/B Testing (50% of users see photos selected by machine vision)
Kenmore - our “mini-fridge” of GPUs
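A random search over the four hyperparameters listed above could look roughly like this. The ranges in `SEARCH_SPACE` and the `toy_evaluate` function are assumptions for illustration; in practice `evaluate` would train a model with the sampled settings and return a cross-validated score.

```python
import random

# Illustrative ranges only; the talk does not state the values that were searched.
SEARCH_SPACE = {
    "hidden_width": [512, 1024, 2048, 4096],
    "dropout_rate": (0.1, 0.7),   # sampled uniformly from this interval
    "batch_size": [32, 64, 128, 256],
    "epochs": (5, 50),            # sampled as an integer, inclusive
}


def sample_config(rng: random.Random) -> dict:
    """Draw one hyperparameter setting at random."""
    d_lo, d_hi = SEARCH_SPACE["dropout_rate"]
    e_lo, e_hi = SEARCH_SPACE["epochs"]
    return {
        "hidden_width": rng.choice(SEARCH_SPACE["hidden_width"]),
        "dropout_rate": rng.uniform(d_lo, d_hi),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "epochs": rng.randint(e_lo, e_hi),
    }


def random_search(evaluate, n_trials: int, seed: int = 0):
    """Return the best (config, score) over n_trials random draws."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config: dict) -> float:
    # Stand-in for "train and cross-validate"; prefers dropout near 0.5.
    return -abs(config["dropout_rate"] - 0.5)


best, score = random_search(toy_evaluate, n_trials=20, seed=42)
print(best, score)
```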
Deployment
Kubernetes Cluster For Computation
Spark+YARN Cluster For Storage
Tables: t_photo, t_photo_bottlenecks, t_photo_vision_models
1. Get URLs as PySpark DataFrame
2. Split into partitions
3. Feed partitions to process pool:
   a. Get image bytes from CDN (thread pool)
   b. Calc stats
   c. Calc bottlenecks (if necessary)
   d. Calc model outputs
4. Write back partitions asynchronously
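Steps 2 and 3 above can be sketched with the standard library. This is a simplified illustration, not the production job: partitions are handled in a plain loop rather than a process pool, `fake_fetch` and `fake_score` are placeholders for the CDN download and the vision models, and the `cdn.example.com` URLs are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def split_into_partitions(urls: List[str], n_partitions: int) -> List[List[str]]:
    """Round-robin split so partition sizes differ by at most one."""
    return [urls[i::n_partitions] for i in range(n_partitions)]


def process_partition(urls: List[str],
                      fetch: Callable[[str], bytes],
                      score: Callable[[bytes], float],
                      max_threads: int = 8) -> Dict[str, float]:
    """Fetch every image in a partition concurrently, then score each one."""
    # Step 3a: downloads are I/O-bound, so a thread pool overlaps them.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        images = list(pool.map(fetch, urls))
    # Steps 3b-3d are collapsed into a single `score` call in this sketch.
    return {url: score(image) for url, image in zip(urls, images)}


def fake_fetch(url: str) -> bytes:
    # Placeholder for an HTTP GET against the CDN.
    return url.encode("utf-8")


def fake_score(image_bytes: bytes) -> float:
    # Placeholder for bottleneck extraction plus model inference.
    return float(len(image_bytes))


urls = [f"https://cdn.example.com/photo/{i}.jpg" for i in range(10)]
partitions = split_into_partitions(urls, 3)
results: Dict[str, float] = {}
for part in partitions:
    results.update(process_partition(part, fake_fetch, fake_score))
print(len(results))
```

Threads suit the download step because it mostly waits on the network, while the compute-heavy steps are the reason the slide feeds whole partitions to a process pool.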
SELECT …
FROM t_photo
LEFT ANTI JOIN t_photo_vision_models ...

FROM tmp_new_partition
INSERT INTO t_photo_bottlenecks PARTITION(...)
  SELECT ...
  WHERE bottlenecks IS NOT NULL
INSERT INTO t_photo_vision_models PARTITION(...)
  SELECT ...
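A LEFT ANTI JOIN keeps only the left-side rows that have no match on the right, which is how the first query selects photos that do not yet have model outputs. The same idea in plain Python, with a hypothetical `photo_id` key and made-up rows:

```python
from typing import Dict, List


def left_anti_join(left_rows: List[Dict], right_rows: List[Dict], key: str) -> List[Dict]:
    """Return rows from left_rows whose key value never appears in right_rows."""
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] not in right_keys]


# Hypothetical contents of t_photo and t_photo_vision_models.
t_photo = [{"photo_id": 1}, {"photo_id": 2}, {"photo_id": 3}]
t_photo_vision_models = [{"photo_id": 2, "score": 0.8}]

unprocessed = left_anti_join(t_photo, t_photo_vision_models, "photo_id")
print(unprocessed)
```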
NVIDIA GeForce GTX 1080 Ti
Results
Pretty food, fewer bathrooms
Better Restaurant Hero Photos
Figure: initial hero photos vs. photos selected using machine vision (Restaurant A, Restaurant B, Restaurant C)
Better Hotel Hero Photos
Figure: initial hero photos vs. photos selected using machine vision (Hotel A, Hotel B, Hotel C)
Hotels With A Pool
Original shelf, just showing hero photos
Same hotels, pool photos selected using machine vision
Beachfront Hotels
Original shelf, just showing hero photos
Same hotels, beach photos selected using machine vision
Better Sort Order
Original sort order
Same hotel, photos sorted using machine vision
Better Sort Order
Original sort order
Same hotel, photos sorted using machine vision
TripAdvisor is hiring!
○ Software Engineer - Machine Learning
○ Data Scientist - Attractions and Rentals
○ Data Scientist - Search Engine Marketing
○ Data Analyst - Attractions and Rentals
○ Software Engineer - Full Stack Web
Acknowledgements
● At TripAdvisor
○ Jeff Palmucci
○ Aaron Gonzales
○ Anyi Wang
○ Tyler O’Brien
● Outside TripAdvisor
○ Google: Keras, TensorFlow
○ NVIDIA: CUDA
○ Microsoft: ResNet-50
○ Original research at Google, Microsoft, Facebook, University of Toronto, NYU
