1. Apache Spark MLlib
● What is Apache Spark ?
● What is MLlib ?
● Functionality
● Dependencies
● Books
● Eco-system
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
2. Spark – What is it ?
● An alternative to MapReduce for certain applications
● A low-latency cluster computing system
● For very large data sets
● May be up to 100 times faster than MapReduce
● Used with Hadoop / HDFS
● Uses in-memory cluster computing
● Memory access is faster than disk access
● Has APIs in Scala / Java / Python
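The map and reduce steps that Spark offers an alternative engine for can be sketched in plain Python. This is only an illustration of the pattern on a single machine, not Spark itself: the helper names `map_phase` and `reduce_by_key` and the sample lines are made up for this example, and everything here sits in one process's memory rather than across a cluster.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def reduce_by_key(pairs, combine):
    """Group pairs by key, then fold each group's values with `combine`."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(combine, values) for key, values in groups.items()}


# Hypothetical input; a real job would read these lines from HDFS.
lines = ["spark uses memory", "map reduce uses disk", "spark is fast"]
counts = reduce_by_key(map_phase(lines), lambda a, b: a + b)
print(counts["spark"], counts["uses"], counts["disk"])  # prints: 2 2 1
```

In Spark the same shape is written as a chain of transformations on a distributed data set, and intermediate results can be kept in memory between steps instead of being written to disk.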
3. Spark MLlib – What is it ?
● Spark Machine Learning Library
● Provided with the Spark install
● Code in Scala / Java / Python
● Contains two packages
– spark.mllib
– spark.ml (from V1.2)
● Provides common functionality
– classification, regression, clustering
– collaborative filtering, dimensionality reduction
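To make "clustering" concrete, here is a minimal single-machine sketch of k-means in plain Python. It is not MLlib's code: the function names, the sample points, and the choice of the first k points as starting centres are all assumptions made for illustration, and MLlib's own implementation is distributed across the cluster.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, iterations=10):
    """Return k cluster centres after a fixed number of iterations."""
    # Assumption for this sketch: start from the first k points.
    centers = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign every point to its nearest centre.
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its points; keep it if empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Hypothetical data: two obvious groups, near (1, 1) and near (9, 9).
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.8, 1.2), (9.2, 8.8)]
centers = kmeans(points, k=2)
print(centers)
```

With this data the two centres settle close to (1, 1) and (9, 9). A poor choice of starting centres can leave k-means stuck in a worse grouping, which is one reason library implementations take more care over initialisation.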
6. Available Books
● See our Hadoop book from Apress / Springer
– “Big Data Made Easy”
● Look out for our Apache Spark-based book
– from Packt in 2015
8. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems