Applications of Machine Learning at USC

APPLICATIONS OF MACHINE
LEARNING
Alex Tellez + Amy Wang + H2O Team
USC, 4/8/2015

AGENDA
1. Introduction to Big Data / ML
2. What is H2O.ai?
3. Use Cases:
a) Beat Bill Belichick
b) Fight Crime in Chicago
c) Whiskey Recommendation Engine
d) Bordeaux Wine Vintage
4. Data Science Competition

1. INTRO TO BIG DATA / ML
BIG DATA IS LIKE TEENAGE SEX:
everyone talks about it,
nobody really knows how to do it,
everyone thinks everyone else is
doing it, so everyone claims
they are doing it…
Dan Ariely, Prof. @ Duke

BIG VS. SMALL DATA
When you try to open the file in Excel, Excel CRASHES
SMALL = Data fits in RAM
BIG = Data does NOT fit in RAM
Basically…
Big Data is data too big to process using conventional methods
(e.g. Excel, Access)

V + V + V
Today, we have access to more data than we know what to do with!
1. Wearables (Fitbit, iWatch, etc.)
2. Click streams from web visitors
3. Sensor readings
4. Social media outlets (e.g. Twitter, Facebook, etc.)
Volume - Data volumes are becoming unmanageable
Variety - More data types being captured
Velocity - Data arrives rapidly and must be processed / stored

THE HOPE OF BIG DATA
1. Data contains information of great business / personal value
Examples:
a) Predicting future stock movements = $$$
b) Netflix movie recommendations = Better experience = $$$
2. IF you can extract those insights from the data, you can make better decisions
So how the hell do you do it?
Enter Machine Learning (ML)…

MACHINE LEARNING
The Wikipedia Definition:
…a scientific discipline that explores the construction and study
of algorithms that can learn from data. Such algorithms operate
by building a model…. ZZZzzzzzZZZzzzzzz
My Definition:
The development, analysis, and application of algorithms that enable
machines to: make predictions and / or better understand data
2 Types of Learning:
SUPERVISED + UNSUPERVISED

SUPERVISED LEARNING
What is it?
Methods that infer a function from labeled training data. Key task:
Predicting ________ . (Insert your task here)
Examples of supervised learning tasks:
1. Classification Tasks - Benign / Malignant tumor
2. Regression Tasks - Predicting future stock market prices
3. Image Recognition - Highlighting faces in pictures
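To make "infer a function from labeled training data" concrete, here is a minimal sketch using a 1-nearest-neighbor classifier in plain Python. The feature names and all numbers are invented for illustration; this is not the method or data used in the talk.

```python
import math

# Labeled training data: (features, label). All values are made up.
# Hypothetical features: (tumor radius in mm, texture score)
training = [
    ((5.0, 1.0), "benign"),
    ((6.0, 1.5), "benign"),
    ((14.0, 4.0), "malignant"),
    ((16.0, 4.5), "malignant"),
]

def predict(features, labeled_examples):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(labeled_examples, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

# Classify a new, unlabeled example.
print(predict((15.0, 4.2), training))
```

The "function" learned here is simply a lookup of the closest labeled example; real models generalize in more sophisticated ways, but the train-on-labels, predict-on-new-data shape is the same.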

UNSUPERVISED LEARNING
What is it?
Methods to understand the general structure of input data where
no prediction is needed.
NO CURATION NEEDED!
Examples of unsupervised learning tasks:
1. Clustering - Discovering customer segments
2. Topic Extraction - What topics are people tweeting about?
3. Information Retrieval - IBM Watson: Question + Answer
4. Anomaly Detection - Detecting irregular heart-beats

2. WHAT IS H2O?
What is H2O? (water, duh!)
It is ALSO an open-source, parallel processing engine for machine
learning.
What makes H2O different?
Cutting-edge algorithms + parallel architecture + ease-of-use
=
Happy Data Scientists / Analysts

TEAM @ H2O.AI
16,000 commits
H2O World Conference 2014

COMMUNITY REACH
120 meetups in 2014
11,000 installations
2,000 corporations
First Friday Hack-A-Thons

TRY IT!
Don’t take my word for it… www.h2o.ai
Simple Instructions
1. cd to the download location
2. Unzip the H2O file
3. java -jar h2o.jar
4. Point browser to: localhost:54321
Interfaces shown: GUI, R

3. USE CASES (LOTS OF ’EM)
BEAT BILL BELICHICK

TB + BB
Bill Belichick + Tom Brady = 15 years together, 3 Super Bowls

PASS OR RUN?
On any given offensive play…
Coach Bill can either call a PASS or a RUN
What determines this?
Game situation
Opposing team
Time remaining, etc, etc
Yards to go (until 1st down)
Personnel
Basically, LOTS of stuff.

BUT WHAT IF??
Question:
Can we try to predict whether the next play will be PASS or RUN
using historical data?
Approach:
Download every offensive play from Belichick-Brady era since 2000
Extract known features to build model inputs
Use various Machine Learning approaches to model PASS / RUN
Disclaimer: I’m not a Seahawks fan!

DATA COLLECTION
Data:
13 years of data (2002 - 2013 seasons)
194 games total
14,547 total offensive plays (excludes punts, kickoffs, returns)
Response Variable: PASS / RUN
Model Inputs:
Quarter, Minutes, Seconds, Opposing Team, Down, Distance,
Line of Scrimmage, NE Score, Opposing Team Score, Season,
Formation, Game Status (is NE losing / winning / tied)
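Some of these inputs are derived from others. As a rough sketch, Game Status can be computed from the two scores. The `Play` class and its field names are hypothetical, chosen only to mirror the inputs listed above; the actual data layout from the talk is not shown.

```python
from dataclasses import dataclass

@dataclass
class Play:
    # Hypothetical field names mirroring a subset of the listed model inputs.
    quarter: int
    minutes: int
    seconds: int
    down: int
    distance: int
    ne_score: int
    opp_score: int

def game_status(play):
    """Derive the Game Status input: is NE losing / winning / tied?"""
    if play.ne_score > play.opp_score:
        return "winning"
    if play.ne_score < play.opp_score:
        return "losing"
    return "tied"

def seconds_left_in_quarter(play):
    """Collapse the Minutes and Seconds inputs into one number."""
    return play.minutes * 60 + play.seconds

play = Play(quarter=4, minutes=2, seconds=30, down=3, distance=8,
            ne_score=17, opp_score=21)
print(game_status(play), seconds_left_in_quarter(play))
```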

FIGHTING CRIME IN CHICAGO
Spark + H2O

OPEN CRIME DATA
Crime Dataset: Crimes from 2001 - Present Day
~ 4.6 million crimes

THE WINDY CITY
Harvest Chicago Weather data since 2001

SOCIOECONOMIC FACTORS
Crimes segmented into Community Area IDs
Percent of households below poverty, unemployed, etc.

SPARK + H2O
Crimes + Weather + Census -> Data munging -> Spark SQL join -> Deep Learning -> Evaluate models
GOAL:
For a given crime, predict if an arrest is more / less likely to be made!

JOIN DATASETS
crime data + weather data + census data
Using Spark, we join 3 datasets together
to make one mega dataset!
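The talk did this join with Spark SQL. The sketch below is a plain-Python stand-in that shows only the join logic, not Spark itself. It assumes crimes link to weather by date and to census data by Community Area ID; those keys, the field names, and every value are invented for illustration.

```python
# Hypothetical records; field names and values are made up.
crimes = [
    {"date": "2015-01-10", "community_area": 8, "arrest": True},
    {"date": "2015-01-10", "community_area": 25, "arrest": False},
    {"date": "2015-07-04", "community_area": 8, "arrest": False},
]
weather_by_date = {
    "2015-01-10": {"mean_temp_f": 21},
    "2015-07-04": {"mean_temp_f": 78},
}
census_by_area = {
    8: {"pct_below_poverty": 12.9},
    25: {"pct_below_poverty": 28.6},
}

def join_datasets(crimes, weather_by_date, census_by_area):
    """Inner-join each crime with its day's weather and its area's census row."""
    joined = []
    for crime in crimes:
        weather = weather_by_date.get(crime["date"])
        census = census_by_area.get(crime["community_area"])
        if weather is None or census is None:
            continue  # inner join: drop crimes with no match
        joined.append({**crime, **weather, **census})
    return joined

mega = join_datasets(crimes, weather_by_date, census_by_area)
print(len(mega), sorted(mega[0].keys()))
```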

DATA VISUALIZATION
[Plots: arrest rate by season of crime, by temperature during crime, and by community crime is committed in]

SPLIT DATA INTO TEST / TRAIN SETS
Training set: train model on this segment, 80% of data
Test set: validate the model on this segment (remaining 20%)
~40% of crimes lead to arrest
[Plots: training set arrest rate, test set arrest rate]
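A minimal sketch of an 80 / 20 split in plain Python, with a check that the arrest rate looks similar in both segments. The rows are synthetic (built to have a 40% arrest rate), and the function names and seed are arbitrary choices, not the talk's code.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows, then cut it into train and test segments."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def arrest_rate(rows):
    """Fraction of rows where an arrest was made."""
    return sum(1 for r in rows if r["arrest"]) / len(rows)

# Synthetic data: 2 of every 5 rows are arrests, i.e. 40%.
rows = [{"id": i, "arrest": i % 5 < 2} for i in range(1000)]
train, test = train_test_split(rows)
print(len(train), len(test))
print(arrest_rate(train), arrest_rate(test))
```

Shuffling before cutting matters: if the rows were sorted by date, an unshuffled split would train on old crimes and test only on recent ones.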

DEEP LEARNING
Problem:
For a given crime, is an arrest more / less likely?
Deep Learning:
A multi-layer feed-forward neural network that starts w/ an input layer
(crime + weather data) followed by multiple layers of
non-linear transformations
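As a rough illustration of "multiple layers of non-linear transformations", here is the forward pass of a tiny feed-forward network in plain Python. The weights are arbitrary and untrained, the three inputs are placeholders, and the layer sizes are invented; this is not H2O's Deep Learning implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One layer: weighted sum per neuron, then a non-linear activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(features, layers):
    """Feed the input layer through each layer in turn."""
    activations = features
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations

# Arbitrary, untrained weights: 3 inputs -> 2 hidden -> 2 hidden -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1], math.tanh),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], math.tanh),
    ([[0.7, -0.6]], [0.0], sigmoid),
]

# Placeholder inputs standing in for scaled crime + weather features.
p_arrest = forward([0.2, 1.0, -0.5], layers)[0]
print(p_arrest)
```

The sigmoid on the last layer squashes the output into (0, 1), so it can be read as an arrest likelihood once the weights have actually been trained.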

HOW’D WE DO?
nice!
~ 10 mins

SINGLE-MALT SCOTCH
Single-Malt Scotch
A whiskey made at one particular distillery from a mash that only uses
malted grain (barley)
Solid Standards:
Must be aged at least 3 years in oak casks
Many famous distilleries are located in the northern regions of Scotland

OF COURSE, THERE’S A DATASET FOR THAT!
THE Single Malt Dataset
85 distilleries from Northern Scotland
12 descriptor features:
E.g. Sweetness, Smoky, Tobacco, Honey, Spicy, Malty, etc
Each descriptor rated 0 (weak) to 4 (strong)
Problem:
Can we build a whiskey recommendation engine based on whiskeys I
have tried (and liked!) already?

DIMENSIONALITY REDUCTION + K-MEANS
First, let’s reduce the 12 features to a lower dimensional space using a
linear transformation (Principal Components Analysis)
7 principal components explain ~ 85% of the variance in the dataset
Then let’s use a clustering algorithm to group similar whiskeys
using the new PCA’d dataset
11 clusters are appropriate
Pipe out the cluster assignments and start buying whiskey!
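The clustering-then-recommend step can be sketched in plain Python as below. It assumes the PCA step has already produced low-dimensional points; the whiskey names and coordinates are invented, and the toy uses 2 clusters in 2 dimensions rather than the 11 clusters on 7 components mentioned above. Starting the centroids at the first k points is a simplification to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

def recommend(liked, names, assignments):
    """Other whiskeys that share a cluster with one I already like."""
    cluster = assignments[names.index(liked)]
    return [n for n, a in zip(names, assignments) if a == cluster and n != liked]

# Hypothetical whiskeys with made-up coordinates in PCA space.
names = ["Whiskey A", "Whiskey B", "Whiskey C", "Whiskey D", "Whiskey E", "Whiskey F"]
points = [(0.1, 0.2), (4.0, 4.1), (0.3, 0.1), (4.2, 3.9), (0.2, 0.4), (3.8, 4.3)]

assignments = kmeans(points, k=2)
print(recommend("Whiskey A", names, assignments))
```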

MODEL RESULTS
I ENJOY:
OTHER WHISKEYS THAT CLUSTER WITH THESE:

OTHER POPULAR BRANDS
APPARENTLY, LOTS OF PEOPLE LIKE:
OTHER WHISKEYS THAT CLUSTER WITH THESE:

AUTOENCODER + H2O
[Diagram: Input (x1, x2, x3, x4) -> Hidden Features -> Output (x1, x2, x3, x4); information flows from input to output]
Dogs, Dogs and Dogs

ANOMALY DETECTION OF VINTAGE YEAR BORDEAUX WINE

BORDEAUX WINE
Largest wine-growing region in France
700+ million bottles of wine produced / year!
Some years better than others: Great ($$$) vs. Typical ($)
Last Great years: 2010, 2009, 2005, 2000

GREAT VS. TYPICAL VINTAGE?
Question:
Can we study weather patterns in Bordeaux
leading up to harvest to identify ‘anomalous’ weather years that
correlate with Great ($$$) vs. Typical ($) vintages?
The Bordeaux Dataset (1952 - 2014 Yearly)
Amount of Winter Rain (Oct > Apr of harvest year)
Average Summer Temp (Apr > Sept of harvest year)
Rain during Harvest (Aug > Sept)
Years since last Great Vintage

AUTOENCODER + ANOMALY DETECTION
Goal:
‘en primeur of en primeur’ - Can we use weather patterns to identify
anomalous years >> indicates great vintage quality?
ML Workflow:
1) Train autoencoder to learn ‘typical’ vintage weather pattern
2) Append ‘great’ vintage year weather data to original dataset
3) IF great vintage year weather data does NOT match learned
weather pattern, autoencoder will produce high reconstruction
error (MSE)
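The workflow above hinges on one step: score each year by reconstruction error and flag the years whose MSE is high. Here is a minimal sketch of that thresholding step only. The autoencoder itself is replaced by a crude stand-in that always "reconstructs" the mean typical year, the year labels and scaled weather values are invented, and the 0.10 cutoff is used as an assumed threshold.

```python
def mse(row, reconstruction):
    """Mean squared error between a row and its reconstruction."""
    return sum((x - r) ** 2 for x, r in zip(row, reconstruction)) / len(row)

def fit_typical_pattern(typical_rows):
    """Stand-in for a trained autoencoder: reconstructs every row as the
    mean of the 'typical' years. A real autoencoder learns a richer pattern."""
    n = len(typical_rows)
    mean = [sum(col) / n for col in zip(*typical_rows)]
    return lambda row: mean

def anomalous_years(rows_by_year, reconstruct, threshold=0.10):
    """Years whose reconstruction error exceeds the threshold."""
    return sorted(
        year for year, row in rows_by_year.items()
        if mse(row, reconstruct(row)) > threshold
    )

# Made-up features scaled to 0..1: (winter rain, summer temp, harvest rain).
typical = [[0.5, 0.5, 0.5], [0.55, 0.45, 0.5], [0.45, 0.55, 0.5]]
candidates = {
    "year X": [0.9, 0.95, 0.1],    # far from the typical pattern
    "year Y": [0.52, 0.48, 0.55],  # close to the typical pattern
}

reconstruct = fit_typical_pattern(typical)
print(anomalous_years(candidates, reconstruct))
```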

RESULTS (MSE > 0.10)
[Plot: Mean Square Error by year; labeled vintages: 1961V, 1982V, 1989V, 1990V, 2000V, 2005V, 2009V, 2010V]

2014 BORDEAUX??
[Plot: Mean Square Error by year; labeled points: 2013, 2014 (?)]

4. DATA SCIENCE COMPETITION
Apply / Learn More @: apps.h2o.ai
Check out our YouTube channel for last year’s talks @ H2O World
