Apache Spark MLlib
● What is Apache Spark?
● What is MLlib?
● Functionality
● Dependencies
● Books
● Ecosystem
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Spark – What is it?
● An alternative to MapReduce for certain applications
● A low-latency cluster computing system
● For very large data sets
● May be up to 100 times faster than MapReduce for in-memory workloads
● Used with Hadoop / HDFS
● Uses in-memory cluster computing
● Memory access is faster than disk access
● Has APIs in Scala / Java / Python
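The in-memory point above can be sketched with the Python API. This is a minimal illustration rather than anything taken from the slides: it assumes a local PySpark install, and the names `parse_line` and `count_words_cached` are hypothetical helpers invented for this example. The Spark import is done lazily inside the function so the pure-Python helper can be used without Spark.

```python
def parse_line(line):
    """Split one text line into lowercase words (pure Python, no Spark needed)."""
    return line.lower().split()


def count_words_cached(lines):
    """Run two actions over one cached RDD; the second reuses the in-memory data.

    Assumption: PySpark is installed and a local master is acceptable.
    """
    from pyspark.sql import SparkSession  # lazy import: requires a Spark install

    spark = (
        SparkSession.builder.master("local[1]").appName("cache-sketch").getOrCreate()
    )
    try:
        # cache() marks the RDD to be kept in memory once it has been computed.
        words = spark.sparkContext.parallelize(lines).flatMap(parse_line).cache()
        total = words.count()                # first action: computes and caches
        distinct = words.distinct().count()  # second action: reads cached partitions
        return total, distinct
    finally:
        spark.stop()
```

Calling `count_words_cached(["Spark uses memory", "memory is faster than disk"])` returns the total and distinct word counts; without `cache()`, the second action would recompute the `flatMap` step from the source data.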
Spark MLlib – What is it?
● Spark Machine Learning Library
● Provided with the Spark install
● Code in Scala / Java / Python
● Contains two packages
– spark.mllib
– spark.ml (from V1.2)
● Provides common functionality
– classification, regression, clustering
– collaborative filtering, dimensionality reduction
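As one example of the functionality listed above, clustering can be sketched with the DataFrame-based `pyspark.ml` package. This is a hedged illustration, not code from the slides: it assumes a recent PySpark install (newer than the V1.2 release the slide mentions), and `sample_points` and `cluster_with_mllib` are hypothetical names with made-up toy data.

```python
def sample_points():
    """Return four toy 2-D points forming two obvious groups (illustrative data)."""
    return [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]


def cluster_with_mllib(points, k=2):
    """Fit k-means on the points and return the cluster centres as plain lists.

    Assumption: PySpark is installed and a local master is acceptable.
    """
    from pyspark.sql import SparkSession  # lazy imports: require a Spark install
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.linalg import Vectors

    spark = (
        SparkSession.builder.master("local[1]").appName("mllib-sketch").getOrCreate()
    )
    try:
        # KMeans reads its input vectors from a column named "features" by default.
        rows = [(Vectors.dense(list(p)),) for p in points]
        df = spark.createDataFrame(rows, ["features"])
        model = KMeans(k=k, seed=1).fit(df)
        return [centre.tolist() for centre in model.clusterCenters()]
    finally:
        spark.stop()
```

`cluster_with_mllib(sample_points())` should return two centres, one near each group of points. The older RDD-based `spark.mllib` package offers a comparable `KMeans.train` entry point.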
Spark MLlib – Functionality
● Basic statistics
● Classification and regression
● Collaborative filtering
● Clustering
● Dimensionality reduction
● Feature extraction and transformation
● Optimization
Spark MLlib – Dependencies
● NumPy (for Python)
● Breeze (linear algebra)
● netlib-java
● jblas
● gfortran runtime library
Available Books
● See our Hadoop book from Apress / Springer
– “Big Data Made Easy”
● Look out for our Apache Spark based book
– from Packt in 2015
Spark Ecosystem
Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems
