© 2013 Datameer, Inc. All rights reserved.
Top 3 Things to Consider with
Machine Learning on Big Data
Karen Hsu
Elliott Cordo

About our Speakers
Karen Hsu
• Karen is Senior Director, Product Marketing at Datameer. With over 15 years of experience in enterprise software, she has co-authored 4 patents and worked in a variety of engineering, marketing and sales roles.
• Most recently she came from Informatica, where she worked with the start-ups Informatica purchased to bring data quality, master data management, B2B and data security solutions to market.
• Karen has a Bachelor of Science degree in Management Science and Engineering from Stanford University.

About our Speakers
Elliott Cordo
• Elliott is a data warehouse and information management expert. He brings more than a decade of experience in implementing data solutions, with hands-on experience in every component of the data warehouse software development lifecycle.
• At Caserta Concepts, Elliott oversees large-scale major technology projects, including those involving business intelligence, data analytics, Big Data and data warehousing.

• Drivers & Challenges
• Use Cases
• Key Criteria
• Best Practices
• Next Steps
Drivers & Challenges
Big Data Drives Results
Big Data Analytics Drives Results
[Chart: Amazon vs Barnes & Noble. Vertical axis $0 to $300; horizontal axis quarterly dates from 12/31/09 through 03/13.]
[Chart: Netflix vs Blockbuster. Vertical axis $0 to $300; horizontal axis quarterly dates from 12/31/09 through 03/13.]
Alternatives Are Lacking

Data Mining
• Hard to use
• Requires PhD experts
• Must write code
• Expensive

Traditional BI
• Fixed DW models
• Must write code for analytics
• Very high IT labor costs
• Not agile

Visualization
• Easy for small teams
• Can’t manage large data volume
• Lacks support for advanced analytics
Costs of Building Can be $1M+

$1M+ in Salaries

Job Title                        Bay Area          New York
IT Project Manager               $140,000.00       $126,000.00
System Administrator             $117,000.00       $105,000.00
Network Administrator            $119,000.00       $107,000.00
Database Administrator           $125,000.00       $119,000.00
IT Security Manager              $116,000.00       $104,000.00
Business Intelligence Analyst    $137,000.00       $133,000.00
Data Scientist                   $138,000.00       $133,000.00
Java Developer                   $136,000.00       $133,000.00
QA Engineer                      $120,000.00       $114,000.00
Total                            $1,148,000.00     $1,074,000.00

$1M+ in Capital

Solution          Cost / 100TB
Teradata EDW      $1,650,000.00
Oracle Exadata    $1,400,000.00
IBM Netezza       $1,000,000.00
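
The two salary totals on this slide can be checked with a few lines of Python. This is only a sketch: the nine figures per region are copied from the slide in the order they appear, and the variable names are illustrative.

```python
# Annual salaries per role, copied from the slide (USD), nine roles per region.
bay_area = [140_000, 117_000, 119_000, 125_000, 116_000,
            137_000, 138_000, 136_000, 120_000]
new_york = [126_000, 105_000, 107_000, 119_000, 104_000,
            133_000, 133_000, 133_000, 114_000]

# Sum each region to compare against the totals printed on the slide.
bay_area_total = sum(bay_area)
new_york_total = sum(new_york)

print(f"Bay Area: ${bay_area_total:,}")
print(f"New York: ${new_york_total:,}")
```

Both sums agree with the totals shown on the slide (1,148,000 and 1,074,000), which is where the "$1M+ in Salaries" label comes from.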

Use Cases
Use Case                                           What is Revealed
Profiling and segmentation                         Customer, product, market characteristics and segments
Acquisition and retention                          What leads a person to become a customer or stop being a customer
Product development and operations optimization    What led to product or network failure
Campaign management                                Patterns of successful campaigns
Cross-sell / up-sell                               Recommendations on services, products, or advisors for a given user/customer profile
Customer Examples
Use cases by industry:

Financial Services
• Show correlation between services purchased and investments/trades made
• Identify customer segments
• Recommendations for research articles to drive trading

eCommerce
• Show types of events person will like
• Decision tree based on likelihood to click through
• Recommendations for a large “cold start” population

Gaming
• Clustering for user profiles
• Correlation between attributes of a game and behavior
• Churn analysis

Healthcare
• Recommend tests or other offerings
• Identify factors/trends that lead to disease
Polling Question I
Key Criteria
Ease of Use


Quality
Clustering
Clustering Overview
• K-means is a popular and versatile general-purpose clustering algorithm.
• Commonly used to group people and objects together to form segments
• Often leveraged to enhance recommendation and search systems

K-Means

How it works
1. Treats items as coordinates
2. Places a number of random “centroids” and assigns the nearest items
3. Moves the centroids around based on average location
4. Process repeats until the assignments stop changing
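
The four steps above can be sketched in plain Python. This is a minimal illustration rather than the implementation used by any of the tools in this deck. It assumes squared Euclidean distance and picks the initial centroids from the data points themselves, and the function name, parameters and sample points are all hypothetical.

```python
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Cluster points (tuples of floats) into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Step 2: start from k randomly chosen items as the initial centroids (an assumption).
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assign each item to its nearest centroid (squared Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the average location of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Step 1: items are treated as coordinates (hypothetical 2-D sample data).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the starting centroids are random, k-means can settle on different groupings from run to run, which is one reason to try several seeds and cluster counts.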

*Diagram from Collective Intelligence by Toby Segaran
Ease of Use
First, the setup...
And then run the results...
And write additional code to scale...

In Datameer, you select the columns... And get the results
And the quality of results increases with larger data sets…
Ease of Use
First, you have to set up...
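# Project the four iris measurements onto principal components for plotting
pca <- princomp(iris[1:4])
# Cluster the same four columns into 3 groups with k-means
colors <- kmeans(iris[1:4], 3)$cluster
# Plot the first two components, colored by cluster assignment
plot(pca$scores[,1], pca$scores[,2], col=colors, pch=5)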
And then run the results...

And then write more code to scale...
In Datameer, you select the columns... And
get the results
Ease of Use
First, select the data...

In Datameer, you select the columns... And
get the results
Second, you need to create the cluster...

And then see the results

Ease of Use
1. First, a dataset’s attributes must be converted to numeric representations:

User       Location      Company     Favorite Algo
Elliott    New Jersey    Caserta     K-Means
Karen      California    Datameer    K-Means

User       Location      Company     Favorite Algo
1001       1             101         1001
1002       2             102         1001

2. This numeric dataset is then converted to a sequence file and then to sparse vectors, using seqdirectory and seq2sparse
3. Mahout is called, specifying the number of clusters and the distance calculation

bin/mahout kmeans -i /user/kmeans/vectors -c /user/kmeans/input -o /user/kmeans/output -k 200 -dm CosineSimilarity -x 20 -ow

4. The sparse vector output is then converted back to a delimited format
5. Textual attributes will be appended back to the record, with numeric values preserved for ad-hoc distance comparison of members within a cluster
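
Steps 1 and 5 (turning text attributes into numeric keys and mapping them back afterwards) can be sketched in Python. This is an assumption-laden illustration, not the pipeline the speakers used. The helper `encode_column` and the `starts` offsets are hypothetical, with the offsets chosen to reproduce the numbers in the slide's example table.

```python
def encode_column(values, start):
    """Map each distinct text value to an integer surrogate key, in first-seen order."""
    keys = {}
    for value in values:
        if value not in keys:
            keys[value] = start + len(keys)
    return keys

rows = [
    {"User": "Elliott", "Location": "New Jersey", "Company": "Caserta", "Favorite Algo": "K-Means"},
    {"User": "Karen", "Location": "California", "Company": "Datameer", "Favorite Algo": "K-Means"},
]
# Starting offset per column, taken from the slide's example table.
starts = {"User": 1001, "Location": 1, "Company": 101, "Favorite Algo": 1001}

# Step 1: build one lookup per column and encode every row.
lookups = {col: encode_column([r[col] for r in rows], start) for col, start in starts.items()}
encoded = [{col: lookups[col][r[col]] for col in starts} for r in rows]

# Step 5: reverse lookups allow the text values to be appended back after clustering.
reverse = {col: {key: text for text, key in lookup.items()} for col, lookup in lookups.items()}

for row in encoded:
    print(row)
```

Keeping the reverse lookup alongside the encoded rows is what lets the clustered output carry both the readable text and the numeric values.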

*Diagram from Collective Intelligence by Toby Segaran

In Datameer, you select the columns... And
get the results
Quality Comparison

Column Dependencies
Column Dependencies Overview
A    B    C    D
a    x    a    x
b    y    b    x
b    y    b    y
a    x    a    z
c    z    c    y
a    y    a    y

Column Dependency ~ 0.99
Column Dependency ~ 0.01

Value
• See how data is related after joining multiple sets of data
• See column dependencies on multiple types of data
Quality Comparison
[Figure: six panels plotting Column B against Column A, labeled ColumnDependency(A,B) = 0.5, 0.5, 0, 1, 0.5 and 1. Two of the panels plot Column B (STRING) against Column A (NUMBER).]
Decision Tree
Decision Tree Overview
Goal: Create a model that predicts the value of a target
based on several inputs.
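
As a rough illustration of that goal, the sketch below finds the single best split of a tiny dataset, which is the basic operation a decision tree repeats at every node. It assumes Gini impurity as the split criterion (one common choice, and not necessarily what any tool in this deck uses), and the data and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, targets):
    """Return (feature_index, threshold, weighted_impurity) of the best single split."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put at least one row on each side
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(targets)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best

# Hypothetical toy data: (petal length, petal width) -> class label.
rows = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (5.1, 1.8)]
targets = ["setosa", "setosa", "setosa", "other", "other", "other"]

feature, threshold, impurity = best_split(rows, targets)
print(feature, threshold, impurity)  # prints: 0 1.5 0.0
```

A full tree would apply `best_split` again to the rows on each side until the groups are pure or too small to split further.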

Ease of Use
First, you need to code...
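# Install and load the rpart package for recursive partitioning
install.packages("rpart")
library(rpart)
# Read the iris data (placeholder path from the slide)
treeInput <- read.csv("/PathToData/iris.csv")
# Fit a classification tree predicting class from the four measurements
fit <- rpart(class ~ sepalLength + sepalWidth + petalLength + petalWidth,
             data=treeInput)
# Plot the fitted tree and label each node with observation counts
par(mfrow=c(1,2), xpd=NA)
plot(fit)
text(fit, use.n=TRUE)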
And then run the results...

And then write more code to scale...

In Datameer, you select the columns... And
get the results
Ease of Use

First, select the data...

In Datameer, you select the columns... And
get the results
Second, you configure the settings...

And then see the results

Quality Comparison
Tool        Iris      Wine      Breast Cancer Wisconsin
R           92.66%    86.47%    92.86%
Weka        95.33%    89.33%    93.5%
Datameer    93.33%    91.18%    93.04%
Recommendations
Recommendations Overview
• Increased revenue
• Your customers expect them
• What makes a good recommendation?
• The combination of algorithms and Hadoop makes an effective recommendations platform achievable
Ease of Use
First, the set up...
# run factorization of ratings matrix
$MAHOUT parallelALS --input ${WORK_DIR}/dataset/trainingSet/ --output ${WORK_DIR}/als/out \
    --tempDir ${WORK_DIR}/als/tmp --numFeatures 20 --numIterations 10 --lambda 0.065 --numThreadsPerSolver 2
# compute recommendations
$MAHOUT recommendfactorized --input ${WORK_DIR}/als/out/userRatings/ --output ${WORK_DIR}/recommendations/ \
    --userFeatures ${WORK_DIR}/als/out/U/ --itemFeatures ${WORK_DIR}/als/out/M/ \
    --numRecommendations 6 --maxRating 5 --numThreads 2

In Datameer, you select the columns... And
get the results

And then run the results...
1 [845:5.0,550:5.0,546:5.0,25:5.0,531:5.0,529:5.0,527:5.0,31:5.0,515:5.0,514:5.0]
2 [546:5.0,288:5.0,11:5.0,25:5.0,531:5.0,527:5.0,515:5.0,508:5.0,496:5.0,483:5.0]
3 [137:5.0,284:5.0,508:4.832,24:4.82,285:4.8,845:4.75,124:4.7,319:4.703,29:4.67,591:4.6]
4 [748:5.0,1296:5.0,546:5.0,568:5.0,538:5.0,508:5.0,483:5.0,475:5.0,471:5.0,876:5.0]
5 [732:5.0,550:5.0,9:5.0,546:5.0,11:5.0,527:5.0,523:5.0,514:5.0,511:5.0,508:5.0]
6 [739:5.0,9:5.0,546:5.0,11:5.0,25:5.0,531:5.0,528:5.0,527:5.0,526:5.0,521:5.0]
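
Output in this shape (a user ID followed by a bracketed list of item:score pairs) still has to be parsed before it can be used. The sketch below shows one way to do that in Python, using the entry for user 3 rejoined onto a single line. The function name and the 4.8 cut-off are illustrative assumptions.

```python
def parse_recommendations(user_line, items_line):
    """Parse a user ID and an '[item:score,...]' string into (user_id, [(item_id, score), ...])."""
    user_id = int(user_line.strip())
    body = items_line.strip().strip("[]")
    pairs = []
    for entry in body.split(","):
        item, score = entry.split(":")
        pairs.append((int(item), float(score)))
    return user_id, pairs

user, recs = parse_recommendations(
    "3",
    "[137:5.0,284:5.0,508:4.832,24:4.82,285:4.8,845:4.75,124:4.7,319:4.703,29:4.67,591:4.6]",
)
# Keep only items whose predicted rating is at least 4.8 (an arbitrary threshold).
top = [item for item, score in recs if score >= 4.8]
print(user, top)  # prints: 3 [137, 284, 508, 24, 285]
```

Item IDs would normally be joined back to titles in a separate lookup, which is the extra step the raw output leaves to the user.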
Quality Comparison
User       Shawshank    Godfather    Pulp Fiction    Fight Club
Dianna     4.76         4.98         1.95            2.44
Jon        1.99         2.51         2.87            4.83
Karen      3.28         4.72         1.89            2.95
Elliott    2.92         3.64         2.97            4.83

Same Results
Best Practices
Big Data Analytics Process

[Diagram: process stages labeled Integrate, Define, Ad Hoc, Prepare and Analyze, Deploy, Visualize, Production]
Clustering
• Leverage Hierarchies
• If possible, use numbering schemes
• Scale the surrogate key of attributes
• Try different cluster sizes
• Avoid numeric similarities when building your data

Recommendations
• Leverage a combination of algorithms
• Clustering is your friend!
• Treat cold start situations differently
• Think about ranking
• Don’t let recommendations go wild

[Diagram labels: K-Means: Similar; Item-Based; Item Similarity; Best Recommendations]
Process Best Practices

• Map
• Chain
• Iterate
Demonstration
Polling Question II
Return on Investment
Use Case                  Result
Funnel Optimization       Increase customer conversion by 3x
Behavioral Analytics      Increase revenue by 2x
Fraud Prevention          Identify $2B in potential fraud
EDW Optimization          98% OpEx savings; $1M+ CapEx savings
Customer Segmentation     Lower customer acquisition costs by 30%
Call to Action
Workshop
Contact

• Elliott Cordo elliott@casertaconcepts.com
• Karen Hsu khsu@datameer.com

© 2013 Datameer, Inc. All rights reserved.

More Related Content

What's hot

Big Data Solutions Executive Overview
Big Data Solutions Executive OverviewBig Data Solutions Executive Overview
Big Data Solutions Executive OverviewRCG Global Services
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera, Inc.
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 
Creating an Enterprise AI Strategy
Creating an Enterprise AI StrategyCreating an Enterprise AI Strategy
Creating an Enterprise AI StrategyAtScale
 
Becoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeBecoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeCloudera, Inc.
 
Big Data Analytics in Government
Big Data Analytics in GovernmentBig Data Analytics in Government
Big Data Analytics in GovernmentDeepak Ramanathan
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThomas Kelly, PMP
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in EnterpriseJosh Yeh
 
Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714Niu Bai
 
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017Cloudera, Inc.
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionCloudera, Inc.
 
Using hadoop for enterprise data management
Using hadoop for enterprise data managementUsing hadoop for enterprise data management
Using hadoop for enterprise data managementEstuate, Inc.
 
IBM Industry Models and Data Lake
IBM Industry Models and Data Lake IBM Industry Models and Data Lake
IBM Industry Models and Data Lake Pat O'Sullivan
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data LakeCaserta
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceInformation Security Awareness Group
 

What's hot (20)

Big Data Solutions Executive Overview
Big Data Solutions Executive OverviewBig Data Solutions Executive Overview
Big Data Solutions Executive Overview
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 
Creating an Enterprise AI Strategy
Creating an Enterprise AI StrategyCreating an Enterprise AI Strategy
Creating an Enterprise AI Strategy
 
Becoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeBecoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural Change
 
Big Data Analytics in Government
Big Data Analytics in GovernmentBig Data Analytics in Government
Big Data Analytics in Government
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
 
Data Science in Enterprise
Data Science in EnterpriseData Science in Enterprise
Data Science in Enterprise
 
Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714Value proposition for big data isv partners 0714
Value proposition for big data isv partners 0714
 
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
Using hadoop for enterprise data management
Using hadoop for enterprise data managementUsing hadoop for enterprise data management
Using hadoop for enterprise data management
 
IBM Industry Models and Data Lake
IBM Industry Models and Data Lake IBM Industry Models and Data Lake
IBM Industry Models and Data Lake
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
Setting Up the Data Lake
Setting Up the Data LakeSetting Up the Data Lake
Setting Up the Data Lake
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Big data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security AllianceBig data analysis concepts and references by Cloud Security Alliance
Big data analysis concepts and references by Cloud Security Alliance
 

Viewers also liked

BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
Top 3 Challenges to Profitable Mortgage Lending
Top 3 Challenges to Profitable Mortgage LendingTop 3 Challenges to Profitable Mortgage Lending
Top 3 Challenges to Profitable Mortgage LendingEquifax
 
Growth Hacking Marketing Plan for New Design Products - Born.com
Growth Hacking Marketing Plan for New Design Products - Born.comGrowth Hacking Marketing Plan for New Design Products - Born.com
Growth Hacking Marketing Plan for New Design Products - Born.comGrowth Hakka
 
Graph analytic and machine learning
Graph analytic and machine learningGraph analytic and machine learning
Graph analytic and machine learningStanley Wang
 
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...npinto
 
Your Rock Solid Digital Approach to Attract More Industry Attention
Your Rock Solid Digital Approach to Attract More Industry AttentionYour Rock Solid Digital Approach to Attract More Industry Attention
Your Rock Solid Digital Approach to Attract More Industry AttentionAtlas Integrated
 
Predictive Analytics and Machine Learning 101
Predictive Analytics and Machine Learning 101Predictive Analytics and Machine Learning 101
Predictive Analytics and Machine Learning 101Poya Manouchehri
 
Intro au Big Data & Machine Learning
Intro au Big Data & Machine LearningIntro au Big Data & Machine Learning
Intro au Big Data & Machine LearningEric Daoud
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...Dataconomy Media
 
Hortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopHortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopMats Johansson
 
Using Artificial Intelligence to power Service Virtualization
Using Artificial Intelligence to power Service VirtualizationUsing Artificial Intelligence to power Service Virtualization
Using Artificial Intelligence to power Service VirtualizationCA Technologies
 
Cursos de Big Data y Machine Learning
Cursos de Big Data y Machine LearningCursos de Big Data y Machine Learning
Cursos de Big Data y Machine LearningStratebi
 
Arrow AI: Automated Customer Care
Arrow AI: Automated Customer CareArrow AI: Automated Customer Care
Arrow AI: Automated Customer CareUtkarsh Shukla
 
Artifical Intelligence in Customer Service
Artifical Intelligence in Customer ServiceArtifical Intelligence in Customer Service
Artifical Intelligence in Customer ServiceSam Hirsch
 
Machine Learning for Actuaries
Machine Learning for ActuariesMachine Learning for Actuaries
Machine Learning for ActuariesArthur Charpentier
 
parlamind - NOAH16 London
parlamind - NOAH16 Londonparlamind - NOAH16 London
parlamind - NOAH16 LondonNOAH Advisors
 
Integrated Marketing Campaign for AVC hair products
Integrated Marketing Campaign for AVC hair productsIntegrated Marketing Campaign for AVC hair products
Integrated Marketing Campaign for AVC hair productsSaurabh Mhase
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksTrucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksKelly Kohlleffel
 
How Twitter Timeline works
How Twitter Timeline worksHow Twitter Timeline works
How Twitter Timeline worksAnn Smarty
 

Viewers also liked (20)

Learning Analytics
Learning AnalyticsLearning Analytics
Learning Analytics
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
Top 3 Challenges to Profitable Mortgage Lending
Top 3 Challenges to Profitable Mortgage LendingTop 3 Challenges to Profitable Mortgage Lending
Top 3 Challenges to Profitable Mortgage Lending
 
Growth Hacking Marketing Plan for New Design Products - Born.com
Growth Hacking Marketing Plan for New Design Products - Born.comGrowth Hacking Marketing Plan for New Design Products - Born.com
Growth Hacking Marketing Plan for New Design Products - Born.com
 
Graph analytic and machine learning
Graph analytic and machine learningGraph analytic and machine learning
Graph analytic and machine learning
 
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
 
Your Rock Solid Digital Approach to Attract More Industry Attention
Your Rock Solid Digital Approach to Attract More Industry AttentionYour Rock Solid Digital Approach to Attract More Industry Attention
Your Rock Solid Digital Approach to Attract More Industry Attention
 
Predictive Analytics and Machine Learning 101
Predictive Analytics and Machine Learning 101Predictive Analytics and Machine Learning 101
Predictive Analytics and Machine Learning 101
 
Intro au Big Data & Machine Learning
Intro au Big Data & Machine LearningIntro au Big Data & Machine Learning
Intro au Big Data & Machine Learning
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
 
Hortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopHortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with Hadoop
 
Using Artificial Intelligence to power Service Virtualization
Using Artificial Intelligence to power Service VirtualizationUsing Artificial Intelligence to power Service Virtualization
Using Artificial Intelligence to power Service Virtualization
 
Cursos de Big Data y Machine Learning
Cursos de Big Data y Machine LearningCursos de Big Data y Machine Learning
Cursos de Big Data y Machine Learning
 
Arrow AI: Automated Customer Care
Arrow AI: Automated Customer CareArrow AI: Automated Customer Care
Arrow AI: Automated Customer Care
 
Artifical Intelligence in Customer Service
Artifical Intelligence in Customer ServiceArtifical Intelligence in Customer Service
Artifical Intelligence in Customer Service
 
Machine Learning for Actuaries
Machine Learning for ActuariesMachine Learning for Actuaries
Machine Learning for Actuaries
 
parlamind - NOAH16 London
parlamind - NOAH16 Londonparlamind - NOAH16 London
parlamind - NOAH16 London
 
Integrated Marketing Campaign for AVC hair products
Integrated Marketing Campaign for AVC hair productsIntegrated Marketing Campaign for AVC hair products
Integrated Marketing Campaign for AVC hair products
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - HortonworksTrucking demo w Spark ML - Paul Hargis - Hortonworks
Trucking demo w Spark ML - Paul Hargis - Hortonworks
 
How Twitter Timeline works
How Twitter Timeline worksHow Twitter Timeline works
How Twitter Timeline works
 

Similar to Top 3 Considerations for Machine Learning on Big Data

Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon Web Services
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스Amazon Web Services Korea
 
AWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAWS Neptune - A Fast and reliable Graph Database Built for the Cloud
AWS Neptune - A Fast and reliable Graph Database Built for the CloudAmazon Web Services
 
NEW LAUNCH! Infinitely Scalable Machine Learning Algorithms with Amazon AI - ...
NEW LAUNCH! Infinitely Scalable Machine Learning Algorithms with Amazon AI - ...NEW LAUNCH! Infinitely Scalable Machine Learning Algorithms with Amazon AI - ...
NEW LAUNCH! Infinitely Scalable Machine Learning Algorithms with Amazon AI - ...Amazon Web Services
 
Datastage Online Training in Hyderabad
Datastage Online Training in HyderabadDatastage Online Training in Hyderabad
Datastage Online Training in HyderabadUgs8008
 
3 Ways Tableau Improves Predictive Analytics
3 Ways Tableau Improves Predictive Analytics3 Ways Tableau Improves Predictive Analytics
3 Ways Tableau Improves Predictive AnalyticsNandita Nityanandam
 
Scaling AutoML-Driven Anomaly Detection With Luminaire
Scaling AutoML-Driven Anomaly Detection With LuminaireScaling AutoML-Driven Anomaly Detection With Luminaire
Scaling AutoML-Driven Anomaly Detection With LuminaireDatabricks
 
Dataiku productive application to production - pap is may 2015
Dataiku    productive application to production - pap is may 2015 Dataiku    productive application to production - pap is may 2015
Dataiku productive application to production - pap is may 2015 Dataiku
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudInside Analysis
 
Machine Learning 101 - AWS Machine Learning Web Day
Machine Learning 101 - AWS Machine Learning Web DayMachine Learning 101 - AWS Machine Learning Web Day
Machine Learning 101 - AWS Machine Learning Web DayAWS Germany
 
New Features 9.2 – Payroll for North America and T&L
New Features 9.2 – Payroll for North America and T&LNew Features 9.2 – Payroll for North America and T&L
New Features 9.2 – Payroll for North America and T&LEmtec Inc.
 
New features 9.2 - Payroll for North America and T&L
New features 9.2 - Payroll for North America and T&LNew features 9.2 - Payroll for North America and T&L
New features 9.2 - Payroll for North America and T&LEmtec Inc.
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
Migrating Oracle to Aurora PostgreSQL Utilizing AWS Database Migration Servic...
Migrating Oracle to Aurora PostgreSQL Utilizing AWS Database Migration Servic...Migrating Oracle to Aurora PostgreSQL Utilizing AWS Database Migration Servic...
Migrating Oracle to Aurora PostgreSQL Utilizing AWS Database Migration Servic...Amazon Web Services
 
Database@Home : The Future is Data Driven
Database@Home : The Future is Data DrivenDatabase@Home : The Future is Data Driven
Database@Home : The Future is Data DrivenTammy Bednar
 

Top 3 Considerations for Machine Learning on Big Data

  • 1. © 2013 Datameer, Inc. All rights reserved.
  • 2. Top 3 Things to Consider with Machine Learning on Big Data. Karen Hsu, Elliott Cordo
  • 3. About our Speakers: Karen Hsu
      • Karen is Senior Director, Product Marketing at Datameer. With over 15 years of experience in enterprise software, she has co-authored 4 patents and worked in a variety of engineering, marketing, and sales roles.
      • Most recently she came from Informatica, where she worked with the start-ups Informatica purchased to bring data quality, master data management, B2B, and data security solutions to market.
      • Karen has a Bachelor of Science degree in Management Science and Engineering from Stanford University.
  • 4. About our Speakers: Elliott Cordo
      • Elliott is a data warehouse and information management expert. He brings more than a decade of experience in implementing data solutions, with hands-on experience in every component of the data warehouse software development lifecycle.
      • At Caserta Concepts, Elliott oversees large-scale technology projects, including those involving business intelligence, data analytics, Big Data, and data warehousing.
  • 5. Drivers & Challenges, Use Cases, Key Criteria, Best Practices, Next Steps
  • 7. Big Data Drives Results: Big Data Analytics Drives Results. [Two charts: Amazon vs Barnes & Noble, and Netflix vs Blockbuster; vertical axis $0 to $300, horizontal axis quarterly dates from 12/31/09 through March 2013.]
  • 8. Alternatives Are Lacking
      Data Mining
        • Hard to use
        • Requires PhD experts
        • Must write code
        • Expensive
      Traditional BI
        • Fixed DW models
        • Must write code for analytics
        • Very high IT labor costs
        • Not agile
      Visualization
        • Easy for small teams
        • Can’t manage large data volumes
        • Lacks support for advanced analytics
  • 9. Costs of Building Can Be $1M+
      $1M+ in Capital (Solution: Cost / 100TB)
        • Teradata EDW: $1,650,000.00
        • Oracle Exadata: $1,400,000.00
        • IBM Netezza: $1,000,000.00
      $1M+ in Salaries (Job Title: Bay Area / New York)
        • IT Project Manager: $140,000.00 / $126,000.00
        • System Administrator: $117,000.00 / $105,000.00
        • Network Administrator: $119,000.00 / $107,000.00
        • Database Administrator: $125,000.00 / $119,000.00
        • IT Security Manager: $116,000.00 / $104,000.00
        • Business Intelligence Analyst: $137,000.00 / $133,000.00
        • Data Scientist: $138,000.00 / $133,000.00
        • Java Developer: $136,000.00 / $133,000.00
        • QA Engineer: $120,000.00 / $114,000.00
        • Total: $1,148,000.00 / $1,074,000.00
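The salary totals on this slide can be checked by summing the per-role figures. The sketch below assumes that roles and figures pair up in the order they are listed on the slide; the dictionary and variable names are illustrative only.

```python
# Annual salaries per role in USD, as listed on the slide.
# Assumption: roles and figures pair up in the slide's listing order.
bay_area = {
    "IT Project Manager": 140_000,
    "System Administrator": 117_000,
    "Network Administrator": 119_000,
    "Database Administrator": 125_000,
    "IT Security Manager": 116_000,
    "Business Intelligence Analyst": 137_000,
    "Data Scientist": 138_000,
    "Java Developer": 136_000,
    "QA Engineer": 120_000,
}
new_york = {
    "IT Project Manager": 126_000,
    "System Administrator": 105_000,
    "Network Administrator": 107_000,
    "Database Administrator": 119_000,
    "IT Security Manager": 104_000,
    "Business Intelligence Analyst": 133_000,
    "Data Scientist": 133_000,
    "Java Developer": 133_000,
    "QA Engineer": 114_000,
}

# One person per role: the team cost is the plain sum of the salaries.
bay_area_total = sum(bay_area.values())
new_york_total = sum(new_york.values())

print(f"Bay Area team: ${bay_area_total:,}")
print(f"New York team: ${new_york_total:,}")
```

Both sums match the totals printed on the slide, which is what supports the "$1M+ in Salaries" heading for a nine-person team.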
  • 11. Use Cases (Use Case: What is Revealed)
      • Profiling and segmentation: Customer, product, market characteristics and segments
      • Acquisition and retention: What leads a person to become a customer or stop being a customer
      • Product development and operations optimization: What led to product or network failure
      • Campaign management: Patterns of successful campaigns
      • Cross-sell / up-sell: Recommendations on services, products, or advisors for a given user/customer profile
  • 12. Customer Examples (Industry: Use Case)
      Financial Services
        • Show correlation between services purchased and investments/trades made
        • Identify customer segments
        • Recommendations for research articles to drive trading
      eCommerce
        • Show types of events a person will like
        • Decision tree based on likelihood to click through
        • Recommendations for a large “cold start” population
      Gaming
        • Clustering for user profiles
        • Correlation between attributes of a game and behavior
        • Churn analysis
      Healthcare
        • Recommend tests or other offerings
        • Identify factors/trends that lead to disease
  • 15. Ease of Use, Quality
  • 17. Clustering Overview
      • K-means is a popular and versatile general-purpose clustering algorithm.
      • Commonly used to group people and objects together to form segments
      • Often leveraged to enhance recommendation and search systems
      K-Means: How it works
        1. Treats items as coordinates
        2. Places a number of random “centroids” and assigns the nearest items
        3. Moves the centroids around based on average location
        4. Process repeats until the assignments stop changing
      *Diagram from Programming Collective Intelligence by Toby Segaran
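The four K-means steps on this slide can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation used by Datameer, R, or Mahout: the function name `kmeans`, its parameters, and the toy points are hypothetical, and the initial centroids are drawn from the items themselves rather than placed at arbitrary random coordinates.

```python
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster coordinate tuples following the four steps on the slide."""
    rng = random.Random(seed)
    # Step 1: items are treated as coordinates (tuples of numbers).
    # Step 2: place k centroids; as a simplification, pick k distinct items.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assign every item to its nearest centroid (squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the average location of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centroids = kmeans(points, k=2)
print(assignments, centroids)
```

Which cluster receives label 0 or 1 depends on the random start, so only the grouping is meaningful, not the label values.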
  • 18. Ease of Use
      First, the set up... And then run the results... And write additional code to scale...
      In Datameer, you select the columns... And get the results
      And the quality of results increases with larger data sets…
  • 19. Ease of Use
      First, you have to set up...
          pca <- princomp(iris[1:4]);                 # principal components of the four iris measurements
          colors <- kmeans(iris[1:4], 3)$cluster;     # k-means cluster label (3 clusters) for each row
          plot(pca$scores[,1], pca$scores[,2], col=colors, pch=5);   # first two components, colored by cluster
      And then run the results...
      And then write more code to scale...
      In Datameer, you select the columns... And get the results
  • 20. Ease of Use
      First, select the data...
      Second, you need to create the cluster...
      And then see the results
      In Datameer, you select the columns... And get the results
  • 21. Ease of Use
      1. First, a dataset’s attributes must be converted to numeric representations
          User, Location, Company, Favorite Algo
          Elliott, New Jersey, Caserta, K-Means
          Karen, California, Datameer, K-Means
        becomes
          User, Location, Company, Favorite Algo
          1001, 1, 101, 1001
          1002, 2, 102, 1001
      2. This numeric dataset is then converted to a sequence file, then to sparse vectors, leveraging seqdirectory and seq2sparse
      3. Mahout is called, with the number of clusters and the distance calculation specified
          bin/mahout kmeans -i /user/kmeans/vectors -c /user/kmeans/input -o /user/kmeans/output -k 200 -dm CosineSimilarity -x 20 -ow
      4. The sparse vector output is then converted back to a delimited format
      5. Textual attributes will be appended back to the record, with numeric values preserved for ad-hoc distance comparison of members within a cluster
      *Diagram from Programming Collective Intelligence by Toby Segaran
      In Datameer, you select the columns... And get the results
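Steps 1 and 5 on this slide (turning textual attributes into numbers, then restoring the text afterwards) can be sketched as below. The function names `encode_columns` and `decode_row` are hypothetical, and the codes simply count up from 1 in each column, which differs from the numbering shown on the slide (1001, 101, and so on).

```python
def encode_columns(rows, columns):
    """Replace each textual value with an integer code, one codebook per column."""
    codebooks = {col: {} for col in columns}
    encoded = []
    for row in rows:
        numeric_row = {}
        for col in columns:
            book = codebooks[col]
            # A value seen for the first time gets the next code: 1, 2, 3, ...
            numeric_row[col] = book.setdefault(row[col], len(book) + 1)
        encoded.append(numeric_row)
    return encoded, codebooks


def decode_row(numeric_row, codebooks):
    """Map integer codes back to the original text (step 5 on the slide)."""
    reverse = {
        col: {code: value for value, code in book.items()}
        for col, book in codebooks.items()
    }
    return {col: reverse[col][code] for col, code in numeric_row.items()}


columns = ["User", "Location", "Company", "Favorite Algo"]
rows = [
    {"User": "Elliott", "Location": "New Jersey", "Company": "Caserta", "Favorite Algo": "K-Means"},
    {"User": "Karen", "Location": "California", "Company": "Datameer", "Favorite Algo": "K-Means"},
]
encoded, codebooks = encode_columns(rows, columns)
print(encoded)
print(decode_row(encoded[1], codebooks))
```

Keeping the codebooks around is what makes the final step possible: the clustering output carries only numbers, and the text is joined back on afterwards.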
  • 22. Quality Comparison
  • 24. Column Dependencies Overview
      Columns A and B (Column Dependency ~ 0.99) next to columns C and D (Column Dependency ~ 0.01):
          A, B, C, D
          a, x, a, x
          b, y, b, x
          b, y, b, y
          a, x, a, z
          c, z, c, y
          a, y, a, y
      Value
        • See how data is related after joining multiple sets of data
        • See column dependencies on multiple types of data
  • 25. Quality Comparison. [Six scatter plots of Column B against Column A, covering numeric and string columns, with ColumnDependency(A,B) values of 0, 0.5, and 1.]
  • 27. Decision Tree Overview. Goal: Create a model that predicts the value of a target based on several inputs.
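To make the goal concrete, the sketch below fits a one-level decision tree (a "stump") that predicts a target from several numeric inputs by choosing the single split with the lowest weighted Gini impurity. A full decision tree applies the same split search recursively to each side. The function names and the toy data are hypothetical, and this is not the algorithm used by rpart, Weka, or Datameer.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) minimizing weighted Gini impurity."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send items to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def fit_stump(rows, labels):
    """Build a predictor that returns the majority label on each side of the split."""
    def majority(labs):
        return Counter(labs).most_common(1)[0][0]

    split = best_split(rows, labels)
    if split is None:  # no usable split: always predict the overall majority
        overall = majority(labels)
        return lambda row: overall
    f, threshold = split
    left_label = majority([lab for row, lab in zip(rows, labels) if row[f] <= threshold])
    right_label = majority([lab for row, lab in zip(rows, labels) if row[f] > threshold])
    return lambda row: left_label if row[f] <= threshold else right_label


# Hypothetical toy data: two inputs per item, two target classes.
rows = [(1.4, 0.2), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5)]
labels = ["a", "a", "b", "b"]
predict = fit_stump(rows, labels)
print(best_split(rows, labels), predict((1.0, 0.1)), predict((5.0, 2.0)))
```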
  • 28. Ease of Use
      First, you need to code...
          install.packages("rpart");
          library(rpart);
          treeInput <- read.csv("/PathToData/iris.csv");
          fit <- rpart(class ~ sepalLength + sepalWidth + petalLength + petalWidth, data=treeInput);
          par(mfrow=c(1,2), xpd=NA);
          plot(fit);
          text(fit, use.n=TRUE);
      And then run the results...
      And then write more code to scale...
      In Datameer, you select the columns... And get the results
  • 29. Ease of Use
      First, select the data...
      Second, you configure the settings...
      And then see the results
      In Datameer, you select the columns... And get the results
  • 30. Quality Comparison (data sets: Iris, Wine, Breast Cancer Wisconsin)
      • R: Iris 92.66%, Wine 86.47%, Breast Cancer Wisconsin 92.86%
      • Weka: Iris 95.33%, Wine 89.33%, Breast Cancer Wisconsin 93.5%
      • Datameer: Iris 93.33%, Wine 91.18%, Breast Cancer Wisconsin 93.04%
  • 32. Recommendations Overview
      • Increased revenue
      • Your customers expect them
      • What makes a good recommendation?
      • The combination of algorithms and Hadoop makes an effective recommendations platform achievable
  • 33. Ease of Use
      First, the set up...
          # run factorization of ratings matrix
          $MAHOUT parallelALS --input ${WORK_DIR}/dataset/trainingSet/ --output ${WORK_DIR}/als/out \
              --tempDir ${WORK_DIR}/als/tmp --numFeatures 20 --numIterations 10 --lambda 0.065 --numThreadsPerSolver 2
          # compute recommendations
          $MAHOUT recommendfactorized --input ${WORK_DIR}/als/out/userRatings/ --output ${WORK_DIR}/recommendations/ \
              --userFeatures ${WORK_DIR}/als/out/U/ --itemFeatures ${WORK_DIR}/als/out/M/ \
              --numRecommendations 6 --maxRating 5 --numThreads 2
      And then run the results...
          1 [845:5.0,550:5.0,546:5.0,25:5.0,531:5.0,529:5.0,527:5.0,31:5.0,515:5.0,514:5.0]
          2 [546:5.0,288:5.0,11:5.0,25:5.0,531:5.0,527:5.0,515:5.0,508:5.0,496:5.0,483:5.0]
          3 [137:5.0,284:5.0,508:4.832,24:4.82,285:4.8,845:4.75,124:4.7,319:4.703,29:4.67,591:4.6]
          4 [748:5.0,1296:5.0,546:5.0,568:5.0,538:5.0,508:5.0,483:5.0,475:5.0,471:5.0,876:5.0]
          5 [732:5.0,550:5.0,9:5.0,546:5.0,11:5.0,527:5.0,523:5.0,514:5.0,511:5.0,508:5.0]
          6 [739:5.0,9:5.0,546:5.0,11:5.0,25:5.0,531:5.0,528:5.0,527:5.0,526:5.0,521:5.0]
      In Datameer, you select the columns... And get the results
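The raw recommendation output on this slide still needs post-processing before it is usable. A minimal sketch of that step follows, assuming each line consists of a user ID, whitespace, and a bracketed list of `item:score` pairs as shown on the slide; the function name `parse_recommendations` is hypothetical.

```python
def parse_recommendations(line):
    """Parse 'user [item:score,item:score,...]' into (user_id, [(item_id, score), ...])."""
    # Split the user ID from the bracketed list on the first run of whitespace.
    user_part, items_part = line.strip().split(None, 1)
    items_part = items_part.strip().strip("[]")
    pairs = []
    for entry in items_part.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty list or a trailing comma
        item_id, score = entry.split(":")
        pairs.append((int(item_id), float(score)))
    return int(user_part), pairs


# Sample line for user 3, taken from the slide's output.
sample = "3 [137:5.0,284:5.0,508:4.832,24:4.82,285:4.8,845:4.75,124:4.7,319:4.703,29:4.67,591:4.6]"
user_id, recommendations = parse_recommendations(sample)
print(user_id, recommendations[:3])
```

The parsed pairs contain only numeric item IDs, so a real pipeline would still have to join them back to item names, which is part of the extra work the slide contrasts with selecting columns in Datameer.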
  • 36. Big Data Analytics Process. [Process diagram; labels as extracted: Integrate, Define, Ad Hoc, Prepare and Analyze, Deploy, Visualize, Production.]
  • 37. Clustering
      • Leverage hierarchies
      • If possible, use numbering schemes
      • Scale the surrogate key of attributes
      • Try different cluster sizes
      • Avoid numeric similarities when building your data
  • 38. Recommendations
      • Leverage a combination of algorithms
      • Clustering is your friend!
      • Treat cold start situations differently
      • Think about ranking
      • Don’t let recommendations go wild
      [Diagram labels: K-Means: Similar; Item-Based; Item Similarity; Best Recommendations]
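Two of these practices, treating cold-start users differently and keeping recommendations in check, can be sketched as below. This is one possible reading and not the speakers' method: falling back to overall popularity for users without a personalized list, filtering out items a user has already seen, and capping the list length are all assumptions, as are the function name `recommend` and the toy data.

```python
from collections import Counter


def recommend(user_id, personalized, interactions, top_n=3):
    """Return up to top_n items, falling back to popular items for cold-start users."""
    seen = {item for user, item in interactions if user == user_id}
    candidates = personalized.get(user_id)
    if not candidates:
        # Cold start: no personalized list exists, so rank items by overall popularity.
        popularity = Counter(item for _, item in interactions)
        candidates = [item for item, _ in popularity.most_common()]
    # Keep recommendations in check: drop already-seen items and cap the list.
    return [item for item in candidates if item not in seen][:top_n]


# Hypothetical toy data: (user, item) interaction pairs and one precomputed list.
interactions = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "c"), ("u2", "c"), ("u3", "a")]
personalized = {"u1": ["c", "a", "d"]}

print(recommend("u1", personalized, interactions))        # known user
print(recommend("new_user", personalized, interactions))  # cold-start user
```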
  • 39. Process Best Practices: Map, Chain, Iterate
  • 42. Return on Investment
      • Funnel Optimization: Increase customer conversion by 3x
      • Behavioral Analytics: Increase revenue by 2x
      • Fraud Prevention: Identify $2B in potential fraud
      • EDW Optimization: 98% OpEx savings, $1M+ CapEx savings
      • Customer Segmentation: Lower customer acquisition costs by 30%
  • 43. Call to Action
      Workshop
      Contact
        • Elliott Cordo, elliott@casertaconcepts.com
        • Karen Hsu, khsu@datameer.com