MACHINE LEARNING 
PIPELINES 
Evan R. Sparks 
Graduate Student, AMPLab 
With: Shivaram Venkataraman, Tomer Kaftan, Gylfi Gudmundsson, 
Michael Franklin, Benjamin Recht, and others!
WHAT IS MACHINE 
LEARNING?
Model 
“Machine learning is a scientific discipline that deals 
with the construction and study of algorithms that can 
learn from data. Such algorithms operate by building 
a model based on inputs and using that to make 
predictions or decisions, rather than following only 
explicitly programmed instructions.” 
–Wikipedia 
Data
ML PROBLEMS 
• Real data often not ∈ ℝ^d 
• Real data not well-behaved 
according to my algorithm. 
• Features need to be 
engineered. 
• Transformations need to be 
applied. 
• Hyperparameters need to be 
tuned. 
SVM Input: 
Real Data:
SYSTEMS PROBLEMS 
• Datasets are huge. 
• Distributed computing is 
hard. 
• Mapping common ML 
techniques to distributed 
setting may be untenable.
WHAT IS MLBASE? 
• Distributed Machine 
Learning - Made Easy! 
• Spark-based platform to 
simplify the development 
and usage of large scale 
machine learning.
A STANDARD MACHINE LEARNING PIPELINE 
[Diagram: Data → Train Classifier → Model] 
Right?
A STANDARD MACHINE LEARNING PIPELINE 
That’s more like it! 
[Diagram: Data → Feature Extraction → Train Linear Classifier → Model; Test Data → Model → Predictions]
A REAL PIPELINE FOR 
IMAGE CLASSIFICATION 
Inspired by Coates & Ng, 2012 
[Diagram: Data → Image Parser → Normalizer → Convolver → Symmetric Rectifier → Pooler → Linear Solver → Model. Also shown: Patch Extractor, Patch Whitener, Patch Selector, Label Extractor, Feature Extractor. Evaluation: Test Data, Feature Extractor, Label Extractor, Model, Error Computer, Test Error.]
A SIMPLE EXAMPLE 
• Load up some images. 
• Featurize. 
• Apply a transformation. 
• Fit a linear model. 
• Evaluate on test data. 
Replicates Fast Food Features Pipeline - Le et al., 2012
PIPELINES API 
• A pipeline is made of nodes 
which have an expected 
input and output type. 
• Nodes fit together in a 
sensible way. 
• Pipelines are just nodes. 
• Nodes should be things that 
we know how to scale.
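A minimal sketch of the idea behind these bullets, in plain Python rather than the Spark-based API from the talk. The `Node` class, its `then` method, and the two example nodes are hypothetical names introduced only for illustration.

```python
from typing import Callable, Generic, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


class Node(Generic[A, B]):
    """A pipeline node: a function from an input type A to an output type B."""

    def __init__(self, fn: Callable[[A], B]) -> None:
        self.fn = fn

    def __call__(self, x: A) -> B:
        return self.fn(x)

    def then(self, nxt: "Node[B, C]") -> "Node[A, C]":
        # Composing two nodes yields another node, so a pipeline is just a node.
        return Node(lambda x: nxt(self(x)))


# Hypothetical example nodes: text -> tokens -> token count.
tokenize: Node[str, List[str]] = Node(lambda s: s.lower().split())
count: Node[List[str], int] = Node(len)

pipeline = tokenize.then(count)
result = pipeline("Pipelines are just nodes")
```

Because `then` returns a `Node`, a finished pipeline can itself be chained into a larger one, and a static type checker can reject nodes whose input and output types do not fit together.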
WHAT’S IN THE TOOLBOX? 
Nodes 
Images - Patches, Gabor Filters, HoG, Contrast 
Normalization 
Text - n-grams, lemmatization, TF-IDF, POS, NER 
General Purpose - ZCA Whitening, FFT, Scaling, 
Random Signs, Linear Rectifier, Windowing, Pooling, 
Sampling, QR Decomposition 
Statistics - Borda Voting, Linear Mapping, Matrix 
Multiply 
ML - Linear Solvers, TSQR, Cholesky Solver, MLlib 
Speech and more - coming soon! 
Pipelines 
Example pipelines across domains CIFAR, MNIST, 
ImageNet, ACL Argument Extraction, TIMIT. 
Hyper Parameter Tuning 
Stay Tuned! 
[Diagram: software stack with Spark, Libraries (GraphX, MLlib, ml-matrix, Featurizers, Stats, Utils), Pipelines, MLI]
A REAL PIPELINE FOR 
IMAGE CLASSIFICATION 
Inspired by Coates & Ng, 2012 
YOU’RE GOING TO BUILD THIS!!
BEAR WITH 
ME 
Photo: Andy Rouse, (c) Smithsonian Institute
COMPUTER VISION CRASH 
COURSE
SVM Model
FEATURE EXTRACTION 
[Diagram: Data → Image Parser → Normalizer → Convolver → Symmetric Rectifier → Pooler → Linear Solver → Model. Also shown: Patch Extractor, Patch Whitener, Patch Selector, Label Extractor.]
NORMALIZATION 
• Moves pixels from [0, 255] to 
[-1.0,1.0]. 
• Why? Math! 
• -1 * -1 = 1, 1 * 1 = 1 
• If I overlay two pixels on each 
other and they have similar values 
(both dark or both bright), their 
product will be close to 1; 
otherwise, it will be close to 0 or -1. 
• Necessary for whitening. 
[Figure: pixel range 0 to 255 mapped to -1 to +1]
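The normalization step above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are assumptions, and the image is represented as a plain list of rows rather than whatever image type the pipeline actually uses.

```python
def normalize_pixel(p: int) -> float:
    """Map a pixel intensity in [0, 255] linearly onto [-1.0, 1.0]."""
    return p / 127.5 - 1.0


def normalize_image(image):
    """Normalize every pixel of a 2-D image given as a list of rows."""
    return [[normalize_pixel(p) for p in row] for row in image]


image = [[0, 255],
         [255, 0]]
normalized = normalize_image(image)

# Two dark pixels multiply to 1; a dark and a bright pixel multiply to -1.
same = normalized[0][0] * normalized[1][1]
different = normalized[0][0] * normalized[0][1]
```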
PATCH EXTRACTION 
• Image patches become our 
“visual vocabulary” 
• Intuition from text classification. 
• If I’m trying to classify a 
document as “sports” - I’d look 
for words like “football”, 
“batter”, etc. 
• For images - classifying pictures as 
“face” - I’m looking for things that 
look like eyes, ears, noses, etc. 
Visual Vocabulary
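As a rough sketch of what a patch extractor does, the Python below collects every square patch from a small image. The name `extract_patches` and its `size` and `stride` parameters are assumptions for illustration; the slides do not specify patch sizes or how patches are sampled.

```python
def extract_patches(image, size, stride=1):
    """Return every size x size patch of a 2-D image (a list of rows)."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            # Take `size` rows starting at r, and `size` columns starting at c.
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
patches = extract_patches(image, size=2)
```

A 3x3 image yields four overlapping 2x2 patches; a selection of such patches is what plays the role of the visual vocabulary.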
CONVOLUTION 
• A convolution filter applies a weighted 
average to sliding patches of data. 
• Can be used for lots of things - finding 
edges, blurring, etc. 
• Normalized Input: 
• Image, Ear Filter 
• Output: 
• New image - close to 1 for areas 
that look like the ear filter. 
• Apply many of these simultaneously.
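A minimal sketch of the sliding weighted sum described above, in pure Python. It assumes "valid" output (no padding) and does not flip the filter, which strictly speaking makes it a cross-correlation; the function name and the toy filter are illustrative, not taken from the talk.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# A 2x2 averaging filter applied to a normalized image.
kernel = [[0.25, 0.25],
          [0.25, 0.25]]
image = [[1.0, 1.0, -1.0],
         [1.0, 1.0, -1.0]]
filtered = convolve2d(image, kernel)
```

The left window is uniformly bright and scores 1.0, while the right window mixes bright and dark pixels and scores 0.0. Applying many filters means calling this once per filter and keeping one output image for each.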
LINEAR RECTIFICATION 
• For each feature, x, given 
some a (= 0.25): 
• x_new = max(x - a, 0) 
• What does it do? 
• Removes a bunch of 
noise.
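The rectification formula above, written out as a short Python sketch. The function name is an assumption; the threshold default of 0.25 is the value given on the slide.

```python
def rectify(features, a=0.25):
    """Apply x_new = max(x - a, 0) to each feature value."""
    return [max(x - a, 0.0) for x in features]


features = [-0.5, 0.1, 0.25, 1.0]
rectified = rectify(features)
```

Only the strong response (1.0) survives, shifted down to 0.75; everything at or below the threshold becomes 0, which is the noise removal the slide refers to.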
POOLING 
• convolve(image, k filters) => k 
filtered images. 
• Lots of info - super granular. 
• Pooling lets us break the (filtered) 
images into regions and sum. 
• Think of the “sum” as how much 
an image quadrant is activated. 
• Image summarized into 4*k 
numbers. 
[Figure: example pooled values for the four quadrants: 0.5, 8, 0, 2]
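A small Python sketch of quadrant sum-pooling as described above. It assumes exactly four regions split at the image midpoint; the function name and the toy filtered image are made up for illustration.

```python
def pool_quadrants(image):
    """Sum each of the four quadrants of a 2-D filtered image."""
    h, w = len(image), len(image[0])
    mid_r, mid_c = h // 2, w // 2

    def region_sum(r0, r1, c0, c1):
        return sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))

    return [
        region_sum(0, mid_r, 0, mid_c),   # top-left
        region_sum(0, mid_r, mid_c, w),   # top-right
        region_sum(mid_r, h, 0, mid_c),   # bottom-left
        region_sum(mid_r, h, mid_c, w),   # bottom-right
    ]


filtered = [[0, 1, 2, 0],
            [0, 0, 1, 1],
            [0, 0, 0, 3],
            [1, 0, 0, 0]]
pooled = pool_quadrants(filtered)

# With k filtered images, concatenating the pooled values gives 4*k features.
filtered_images = [filtered, filtered]  # k = 2
features = [v for img in filtered_images for v in pool_quadrants(img)]
```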
LINEAR CLASSIFICATION
Data: A Labels: b Model: x 
Hypothesis: 
Ax = b + error 
Find the x that minimizes the error |Ax - b|. 
WHY LINEAR CLASSIFIERS? 
They’re simple. They’re fast. They’re well studied. They scale. 
With the right features, they do a good job!
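To make the hypothesis Ax = b + error concrete, here is a tiny least-squares solver in pure Python. It is a sketch under stated assumptions: it uses the normal equations (A^T A) x = A^T b with Gauss-Jordan elimination, assumes A^T A is invertible, and handles a single label column. It is not the distributed solver from the talk, and all names are illustrative.

```python
def solve_least_squares(A, b):
    """Minimize |Ax - b| via the normal equations (A^T A) x = A^T b."""
    n = len(A[0])
    # Build the augmented matrix [A^T A | A^T b].
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(A, b))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Each row is one example with features [bias, value]; labels are +1 or -1.
A = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
b = [-1.0, -1.0, 1.0, 1.0]
x = solve_least_squares(A, b)

# Classify by the sign of the linear prediction.
predictions = [1 if sum(a * w for a, w in zip(row, x)) >= 0 else -1 for row in A]
```

For several classes, one common approach is to solve once per label column, which gives the #features x #classes shape of x.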
BACK TO OUR PROBLEM 
• What is A in our problem? 
• #images x #features (4f) 
• What about x? 
• #features x #classes 
• For f < 10000, pretty easy to 
solve! 
• Bigger - we have to get 
creative. 
[Figure: A (10m x 100k) times x (100k x 1k) = b (10m x 1k)]
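A back-of-the-envelope calculation shows why the larger case calls for creativity. The dimensions below are the ones from the slide; the assumption that every matrix is stored densely in 8-byte double precision is mine, not the talk's.

```python
BYTES_PER_DOUBLE = 8  # assumption: dense double-precision storage


def dense_bytes(rows, cols):
    """Bytes needed to store a dense rows x cols matrix of doubles."""
    return rows * cols * BYTES_PER_DOUBLE


n_images, n_features, n_classes = 10_000_000, 100_000, 1_000

a_bytes = dense_bytes(n_images, n_features)       # data matrix A
x_bytes = dense_bytes(n_features, n_classes)      # model x
gram_bytes = dense_bytes(n_features, n_features)  # A^T A, as used by normal equations
```

Under these assumptions A takes 8 TB, the model 800 MB, and A^T A alone 80 GB, so neither the data nor a direct normal-equations solve fits comfortably on one machine.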
TODAY’S EXERCISE 
• Build 3 image classification pipelines - simple, 
intermediate, advanced. 
• Qualitatively (with your eyes) and quantitatively 
(with statistics) compare their effectiveness.
ML PIPELINES 
• Reusable, general purpose components. 
• Built with distributed data in mind from day 1. 
• Used together, they give a complex system composed 
of well-understood parts. 
GO BEARS

More Related Content

What's hot

Data Science vs Machine Learning – What’s The Difference? | Data Science Cour...
Data Science vs Machine Learning – What’s The Difference? | Data Science Cour...Data Science vs Machine Learning – What’s The Difference? | Data Science Cour...
Data Science vs Machine Learning – What’s The Difference? | Data Science Cour...Edureka!
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Neo4j
 
Design cycles of pattern recognition
Design cycles of pattern recognitionDesign cycles of pattern recognition
Design cycles of pattern recognitionAl Mamun
 
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete DeckAI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete DeckSlideTeam
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge GraphsJeff Z. Pan
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual IntroductionLukas Masuch
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxSaiPragnaKancheti
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Suraj Aavula
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language ModelsLeon Dohmen
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and ApplicationsEmanuele Ghelfi
 
3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -Kazuyuki Miyazawa
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro9xdot
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxKnoldus Inc.
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksChristian Perone
 

What's hot (20)

Deep learning
Deep learningDeep learning
Deep learning
 
Data Science vs Machine Learning – What’s The Difference? | Data Science Cour...
Data Science vs Machine Learning – What’s The Difference? | Data Science Cour...Data Science vs Machine Learning – What’s The Difference? | Data Science Cour...
Data Science vs Machine Learning – What’s The Difference? | Data Science Cour...
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
 
Design cycles of pattern recognition
Design cycles of pattern recognitionDesign cycles of pattern recognition
Design cycles of pattern recognition
 
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete DeckAI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptx
 
1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
And then there were ... Large Language Models
And then there were ... Large Language ModelsAnd then there were ... Large Language Models
And then there were ... Large Language Models
 
Machine learning
Machine learning Machine learning
Machine learning
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -3D Perception for Autonomous Driving - Datasets and Algorithms -
3D Perception for Autonomous Driving - Datasets and Algorithms -
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Scikit Learn intro
Scikit Learn introScikit Learn intro
Scikit Learn intro
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptx
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 

Viewers also liked

A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
COCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate AscentCOCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate Ascentjeykottalam
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersJean-Paul Azar
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkDataWorks Summit
 
Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineRobert Dempsey
 
Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...PyData
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CaptureJeff Klukas
 
Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learnJeff Klukas
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanHakka Labs
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructurejoshwills
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningStepan Pushkarev
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonSimon Frid
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operationsStepan Pushkarev
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataRobert Dempsey
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsTuri, Inc.
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In ProductionSamir Bessalah
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architectureStepan Pushkarev
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017Carol Smith
 

Viewers also liked (19)

A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
COCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate AscentCOCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate Ascent
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced Producers
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache Spark
 
Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning Pipeline
 
Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
 
Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learn
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong Yan
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructure
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learning
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in Python
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Similar to Machine Learning Pipelines

Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxIvo Andreev
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSpark Summit
 
Sparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsSparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsMaya Hristakeva
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareTigerGraph
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingDatabricks
 
Trinity of AI: data, algorithms and cloud
Trinity of AI: data, algorithms and cloudTrinity of AI: data, algorithms and cloud
Trinity of AI: data, algorithms and cloudAnima Anandkumar
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersAlbert Y. C. Chen
 
30thSep2014
30thSep201430thSep2014
30thSep2014Mia liu
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowDatabricks
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用CHENHuiMei
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
machine learning workflow with data input.pptx
machine learning workflow with data input.pptxmachine learning workflow with data input.pptx
machine learning workflow with data input.pptxjasontseng19
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017Manish Pandey
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptxDr.Shweta
 
Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedOmid Vahdaty
 
Automated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsAutomated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsLionel Briand
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceSridhara R
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureIvo Andreev
 

Similar to Machine Learning Pipelines (20)

Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya Hristakeva
 
Sparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsSparking Science up with Research Recommendations
Sparking Science up with Research Recommendations
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA Hardware
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
Trinity of AI: data, algorithms and cloud
Trinity of AI: data, algorithms and cloudTrinity of AI: data, algorithms and cloud
Trinity of AI: data, algorithms and cloud
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
 
30thSep2014
30thSep201430thSep2014
30thSep2014
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
machine learning workflow with data input.pptx
machine learning workflow with data input.pptxmachine learning workflow with data input.pptx
machine learning workflow with data input.pptx
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data Demystified
 
Automated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsAutomated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance Systems
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with Azure
 
專題報告
專題報告專題報告
專題報告
 

More from jeykottalam

AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Introjeykottalam
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQLjeykottalam
 
Concurrency Control for Parallel Machine Learning
Concurrency Control for Parallel Machine LearningConcurrency Control for Parallel Machine Learning
Concurrency Control for Parallel Machine Learningjeykottalam
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Libraryjeykottalam
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scalejeykottalam
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stackjeykottalam
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Communityjeykottalam
 

More from jeykottalam (7)

AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Concurrency Control for Parallel Machine Learning
Concurrency Control for Parallel Machine LearningConcurrency Control for Parallel Machine Learning
Concurrency Control for Parallel Machine Learning
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Library
 
SparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at ScaleSparkR: Enabling Interactive Data Science at Scale
SparkR: Enabling Interactive Data Science at Scale
 
SampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS StackSampleClean: Bringing Data Cleaning into the BDAS Stack
SampleClean: Bringing Data Cleaning into the BDAS Stack
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Community
 


Machine Learning Pipelines

  • 1. MACHINE LEARNING PIPELINES Evan R. Sparks Graduate Student, AMPLab With: Shivaram Venkataraman, Tomer Kaftan, Gylfi Gudmundsson, Michael Franklin, Benjamin Recht, and others!
  • 2. WHAT IS MACHINE LEARNING?
  • 3. Model “Machine learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions.” –Wikipedia Data
  • 4. ML PROBLEMS • Real data often not ∈ ℝ^d • Real data not well-behaved according to my algorithm. • Features need to be engineered. • Transformations need to be applied. • Hyperparameters need to be tuned. SVM Input: Real Data:
  • 5. SYSTEMS PROBLEMS • Datasets are huge. • Distributed computing is hard. • Mapping common ML techniques to distributed setting may be untenable.
  • 6. WHAT IS MLBASE? • Distributed Machine Learning - Made Easy! • Spark-based platform to simplify the development and usage of large scale machine learning.
  • 7. Data Train Classifier Model A STANDARD MACHINE LEARNING PIPELINE Right?
  • 8. Test Data A STANDARD MACHINE LEARNING PIPELINE That’s more like it! Data Train Linear Classifier Feature Model Extraction Predictions
  • 9. Data Image Parser Normalizer Convolver A REAL PIPELINE FOR IMAGE CLASSIFICATION Inspired by Coates & Ng, 2012 Linear Solver Feature Extractor Symmetric Rectifier Patch Extractor Patch Whitener Patch Selector Label Extractor Test Feature Data Model Extractor Label Extractor Test Error Error Computer Pooler
  • 10. A SIMPLE EXAMPLE • Load up some images. • Featurize. • Apply a transformation. • Fit a linear model. • Evaluate on test data. Replicates Fast Food Features Pipeline - Le et al., 2012
  • 11. PIPELINES API • A pipeline is made of nodes which have an expected input and output type. • Nodes fit together in a sensible way. • Pipelines are just nodes. • Nodes should be things that we know how to scale.
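The idea that nodes have typed inputs and outputs, and that a chain of nodes is itself a node, can be sketched in a few lines. This is only an illustration in plain Python, not the API from the talk: the `Node` class, its `then` method, and the two example stages are all hypothetical names chosen for this sketch.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


class Node(Generic[A, B]):
    """A pipeline stage that maps an input of type A to an output of type B."""

    def __init__(self, fn: Callable[[A], B]) -> None:
        self.fn = fn

    def __call__(self, value: A) -> B:
        return self.fn(value)

    def then(self, nxt: "Node[B, C]") -> "Node[A, C]":
        # Chaining two nodes yields another Node, so a pipeline is itself a node.
        return Node(lambda value: nxt(self(value)))


# Hypothetical stages, named only for this illustration.
tokenize = Node(lambda text: text.lower().split())   # str -> list of str
count = Node(lambda tokens: len(tokens))             # list of str -> int

word_count = tokenize.then(count)                    # str -> int
print(word_count("Pipelines are just nodes"))        # prints 4
```

Because `then` returns a `Node`, the composed `word_count` can be chained again like any single stage.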
  • 12. WHAT’S IN THE TOOLBOX? Nodes: • Images - Patches, Gabor Filters, HoG, Contrast Normalization • Text - n-grams, lemmatization, TF-IDF, POS, NER • General Purpose - ZCA Whitening, FFT, Scaling, Random Signs, Linear Rectifier, Windowing, Pooling, Sampling, QR Decomposition • Statistics - Borda Voting, Linear Mapping, Matrix Multiply • ML - Linear Solvers, TSQR, Cholesky Solver, MLlib • Speech and more - coming soon! Pipelines: Example pipelines across domains - CIFAR, MNIST, ImageNet, ACL Argument Extraction, TIMIT. Stay Tuned! Hyper Parameter Tuning Libraries GraphX MLlib ml-matrix Featurizers Stats Spark Utils Pipelines MLI
  • 13. Data Image Parser Normalizer Convolver A REAL PIPELINE FOR IMAGE CLASSIFICATION Inspired by Coates & Ng, 2012 Linear Solver Feature Extractor Symmetric Rectifier Patch Extractor Patch Whitener Patch Selector Label Extractor Test Feature Data Model Extractor Label Extractor Test Error Error Computer Pooler YOU’RE GOING TO BUILD THIS!!
  • 14. BEAR WITH ME Photo: Andy Rouse, (c) Smithsonian Institute
  • 17. FEATURE EXTRACTION Data Image Parser Normalizer Convolver Linear Solver Feature Extractor Symmetric Rectifier Patch Extractor Patch Whitener Patch Selector Label Extractor Model Pooler
  • 19. NORMALIZATION • Moves pixels from [0, 255] to [-1.0,1.0]. • Why? Math! • -1*-1 = 1, 1*1 =1 • If I overlay two pixels on each other and they’re similar values, their product will be close to 1 - otherwise, it will be close to 0 or -1. • Necessary for whitening. 0 255 -1 +1
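The arithmetic on the normalization slide can be checked directly. The sketch below assumes a plain linear rescaling (divide by 127.5, subtract 1); the slide only gives the two ranges, so the exact formula and the function name `normalize` are assumptions.

```python
def normalize(pixels):
    """Linearly rescale 8-bit pixel values from [0, 255] to [-1.0, 1.0]."""
    return [p / 127.5 - 1.0 for p in pixels]


dark = normalize([0, 0])        # two similar dark pixels
bright = normalize([255, 255])  # two similar bright pixels
mixed = normalize([0, 255])     # one dark, one bright

print(dark[0] * dark[1])        # 1.0  (similar values)
print(bright[0] * bright[1])    # 1.0  (similar values)
print(mixed[0] * mixed[1])      # -1.0 (opposite values)
```

Similar pixels multiply to a value near 1 whether they are both dark or both bright, which is the property the later convolution step relies on.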
  • 21. PATCH EXTRACTION • Image patches become our “visual vocabulary” • Intuition from text classification. • If I’m trying to classify a document as “sports” - I’d look for words like “football”, “batter”, etc. • For images - classifying pictures as “face” - I’m looking for things that look like eyes, ears, noses, etc. Visual Vocabulary
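As a rough illustration of where such a visual vocabulary comes from, the sketch below cuts every square patch out of a small image stored as a list of rows. The function name `extract_patches` and its `size` and `stride` parameters are hypothetical, and the 3 x 3 image is toy data.

```python
def extract_patches(image, size, stride=1):
    """Return every size x size sub-grid of a 2-D list, scanning row by row."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            # Slice `size` rows, then `size` columns out of each of those rows.
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
patches = extract_patches(image, size=2)
print(len(patches))   # 4
print(patches[0])     # [[1, 2], [4, 5]]
```

In the pipeline on the slides, a subset of such patches is then selected and whitened before being used as convolution filters.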
  • 23. CONVOLUTION • A convolution filter applies a weighted average to sliding patches of data. • Can be used for lots of things - finding edges, blurring, etc. • Normalized Input: • Image, Ear Filter • Output: • New image - close to 1 for areas that look like the ear filter. • Apply many of these simultaneously.
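A minimal sketch of that sliding weighted average, assuming a square filter and an image already normalized to [-1, 1]. The function below does not flip the filter, so strictly speaking it computes a cross-correlation; the name `convolve` and the toy values are assumptions made for this example.

```python
def convolve(image, kernel):
    """Slide a square kernel over the image; each output is the mean of the
    elementwise products between the kernel and the patch under it."""
    k = len(kernel)
    n = k * k
    out = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total / n)
        out.append(row)
    return out


image = [
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
]
kernel = [
    [1.0, -1.0],
    [-1.0, 1.0],
]
print(convolve(image, kernel))  # [[1.0, -0.5]]
```

The left patch matches the filter exactly and scores 1.0; the right patch does not and scores lower.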
  • 25. LINEAR RECTIFICATION • For each feature, x, given some a (=0.25): • x_new = max(x - a, 0) • What does it do? • Removes a bunch of noise.
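The rectification rule is short enough to try on a handful of values. The function name `rectify` and the sample features are made up for this sketch; the threshold a = 0.25 is the one given on the slide.

```python
def rectify(features, a=0.25):
    """Apply x_new = max(x - a, 0) to every feature."""
    return [max(x - a, 0.0) for x in features]


print(rectify([1.0, 0.5, 0.25, 0.1, -0.5]))  # [0.75, 0.25, 0.0, 0.0, 0.0]
```

Weak or negative responses are zeroed out, and only activations above the threshold survive.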
  • 27. POOLING • convolve(image, k filters) => k filtered images. • Lots of info - super granular. • Pooling lets us break the (filtered) images into regions and sum. • Think of the “sum” as how much an image quadrant is activated. • Image summarized into 4*k numbers. 0.5 8 0 2
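Quadrant sum-pooling can be sketched as follows. The function name `pool_quadrants` is hypothetical, and the 4 x 4 filtered image is toy data chosen so that the four sums come out to 0.5, 8, 0 and 2; it is not taken from the talk.

```python
def pool_quadrants(image):
    """Sum a 2-D list over its four quadrants, in the order
    top-left, top-right, bottom-left, bottom-right."""
    mid_r, mid_c = len(image) // 2, len(image[0]) // 2
    top, bottom = range(0, mid_r), range(mid_r, len(image))
    left, right = range(0, mid_c), range(mid_c, len(image[0]))

    def region_sum(rows, cols):
        return sum(image[r][c] for r in rows for c in cols)

    return [
        region_sum(top, left),
        region_sum(top, right),
        region_sum(bottom, left),
        region_sum(bottom, right),
    ]


# Toy filtered image (illustrative values only).
filtered = [
    [0.25, 0.25, 4.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(pool_quadrants(filtered))  # [0.5, 8.0, 0.0, 2.0]

# With k filtered images, concatenating the pooled values gives 4*k features.
features = [v for img in [filtered, filtered] for v in pool_quadrants(img)]
print(len(features))  # 8
```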
  • 29. WHY LINEAR CLASSIFIERS? They’re simple. They’re fast. They’re well studied. They scale. With the right features, they do a good job! Data: A. Labels: b. Model: x. Hypothesis: Ax = b + error. Find the x that minimizes the error |Ax - b|.
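For a small problem, the x that minimizes |Ax - b| can be found by solving the normal equations (AᵀA)x = Aᵀb. The sketch below does that in plain Python with Gauss-Jordan elimination. It assumes A has full column rank, and it is a swapped-in textbook method for illustration, not the distributed solvers mentioned in the talk; all function names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(m, rhs):
    """Solve m @ x = rhs by Gauss-Jordan elimination with partial pivoting.
    Assumes m is square and non-singular."""
    n = len(m)
    aug = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def least_squares(a, b):
    """Return x minimizing |a @ x - b| via the normal equations."""
    at = transpose(a)
    return solve(matmul(at, a), matmul(at, b))


# Toy data: b = 1 + 2*t exactly, so x should come out close to [[1], [2]].
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [[1.0], [3.0], [5.0], [7.0]]
x = least_squares(A, b)
print(x)
```

Forming AᵀA is fine at this size; at large scale it is costly and can be numerically fragile, which is one reason other solvers are used in practice.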
  • 30. BACK TO OUR PROBLEM • What is A in our problem? • #images x #features (4f) • What about x? • #features x #classes • For f < 10000, pretty easy to solve! • Bigger - we have to get creative. Example shapes: (10m x 100k) · (100k x 1k) = (10m x 1k)
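A back-of-the-envelope calculation shows why the larger case calls for creativity. The sketch assumes that "10m", "100k" and "1k" on the slide mean ten million images, one hundred thousand features and one thousand classes, and that matrices are stored densely as 8-byte doubles; both are assumptions, and `dense_bytes` is a hypothetical helper.

```python
def dense_bytes(rows, cols, bytes_per_value=8):
    """Memory needed to store a dense rows x cols matrix."""
    return rows * cols * bytes_per_value


# Assumed reading of the slide's shapes.
n_images, n_features, n_classes = 10_000_000, 100_000, 1_000

a_bytes = dense_bytes(n_images, n_features)       # data matrix A
x_bytes = dense_bytes(n_features, n_classes)      # model x
gram_bytes = dense_bytes(n_features, n_features)  # A^T A, if formed explicitly

print(a_bytes / 1e12)    # 8.0  (terabytes)
print(x_bytes / 1e9)     # 0.8  (gigabytes)
print(gram_bytes / 1e9)  # 80.0 (gigabytes)
```

Under these assumptions A alone would not fit in the memory of a single machine, so the solve has to be distributed or restructured.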
  • 31. TODAY’S EXERCISE • Build 3 image classification pipelines - simple, intermediate, advanced. • Qualitatively (with your eyes) and quantitatively (with statistics) compare their effectiveness.
  • 32. ML PIPELINES • Reusable, general purpose components. • Built with distributed data in mind from day 1. • Used together: give a complex system composed of well-understood parts. GO BEARS